Roles & Responsibilities:
- Design and implement end-to-end data architectures leveraging GCP services (e.g., BigQuery, Cloud Storage, Dataflow, Pub/Sub, Cloud Composer) for large-scale data ingestion and processing.
- Build and optimize large-scale data pipelines using Apache Spark on GCP (via Dataproc or other Spark services). Ensure high performance and scalability in Spark-based data processing workloads.
- Lead the integration of SAP S/4HANA data with GCP for real-time and batch data processing. Manage data extraction, transformation, and loading (ETL) processes from SAP S/4HANA into cloud storage and data lakes.
- Develop and manage scalable data ingestion pipelines for structured and unstructured data using tools like Cloud Dataflow, Cloud Pub/Sub, and Apache Spark.
- Provide architectural guidance for designing secure, scalable, and efficient data solutions on Google Cloud Platform, integrating with on-premises and cloud systems such as SAP S/4HANA.
- Implement both real-time streaming and batch processing pipelines using Apache Spark, Dataflow, and other GCP services to meet business requirements.
- Implement data governance, access controls, and security best practices to ensure the integrity, confidentiality, and compliance of data across systems.
- Collaborate with business stakeholders, data scientists, and engineering teams to define data requirements, ensuring the architecture aligns with business goals.
- Optimize Apache Spark jobs for performance, scalability, and cost-efficiency, ensuring that the architecture can handle growing data volumes.
- Provide technical leadership to the data engineering team, mentoring junior engineers in data architecture, Apache Spark development, and GCP best practices.
Experience & Skills Fitment:
- Over 10 years of professional experience in data engineering, specializing in implementing large-scale enterprise data engineering projects with the latest technologies.
- Over 5 years of hands-on experience with GCP technologies and over 3 years of experience in a lead role.
- Expert-level programming proficiency in Python, Java, and Scala.
- Extensive hands-on experience with big data technologies, including Spark, Hadoop, Hive, YARN, MapReduce, Pig, Kafka, and PySpark.
- Proficient in Google Cloud Platform services such as BigQuery, Dataflow, Cloud Storage, Dataproc, Cloud Composer, Pub/Sub, and Cloud Functions.
- Expertise in Apache Spark for both batch and real-time processing, as well as proficiency in Apache Beam, Hadoop, or other big data frameworks.
- Experienced in using Cloud SQL, BigQuery, and Looker Studio (formerly Google Data Studio) for cloud-based data solutions.
- Skilled in orchestration and deployment tools like Cloud Composer, Airflow, and Jenkins for continuous integration and deployment (CI/CD).
- Expertise in designing and developing integration solutions involving Hadoop/HDFS, real-time systems, data warehouses, and analytics solutions.
- Experience with DevOps practices, including version control (Git), CI/CD pipelines, and infrastructure-as-code (e.g., Terraform, Cloud Deployment Manager).
- Strong background in working with relational databases, NoSQL databases, and in-memory databases.
- Experience managing large datasets within Data Lake and Data Fabric architectures.
- Strong knowledge of security best practices, IAM, encryption mechanisms, and compliance frameworks (GDPR, HIPAA) within GCP environments.
- Experience in implementing data governance, data lineage, and data quality frameworks.
- In-depth knowledge of web technologies, application programming languages, OLTP/OLAP technologies, data strategy disciplines, relational databases, data warehouse development, and big data solutions.
- Proven track record of leading the end-to-end design, development, deployment, and maintenance of data engineering projects.
- Excellent debugging and problem-solving skills.
- Retail and e-commerce domain knowledge is a plus.
- Positive attitude with strong analytical skills and the ability to guide teams effectively.
Good to have:
- GCP certifications such as Professional Data Engineer or Professional Cloud Architect.
- Apache Spark and Python certifications.
- Experience with data visualization tools such as Tableau and Power BI.
Benefits:
- Kloud9 provides a robust compensation package and a forward-looking opportunity for growth in emerging fields.
Equal Opportunity Employer:
- Kloud9 is an equal opportunity employer and will not discriminate against any employee or applicant on the basis of age, color, disability, gender, national origin, race, religion, sexual orientation, veteran status, or any classification protected by federal, state, or local law.