Full time
1-9 Employees · Advertising

Job description

Overview and Mission

We are seeking a seasoned Senior Data Engineer to join our growing Data Platform team. You will be instrumental in designing, building, and optimizing our next-generation data lake and data warehouse architecture on AWS. This role requires deep expertise in scalable ETL/ELT pipelines and data governance, and in ensuring that high-quality, reliable data is available for analytics, reporting, and machine learning initiatives. If you are passionate about data quality, automation, and cloud-native solutions, we want you on our team.

Key Responsibilities

Architecture and Development

  • Design, construct, and maintain robust, scalable, and efficient ETL/ELT pipelines using tools like Apache Spark (Databricks/EMR) and cloud services (AWS Glue, Step Functions).
  • Develop and manage the centralized data lake storage (on S3) and the structured data warehouse layer (Snowflake or Redshift), focusing on performance, cost optimization, and partitioning strategies.
  • Implement data quality checks and validation frameworks directly into the pipelines to ensure data integrity and reliability.
  • Lead the migration of legacy data processes to modern cloud-native solutions, reducing technical debt and improving latency.

Data Governance and Automation

  • Establish and enforce data governance, lineage, and metadata management policies across the entire data platform.
  • Automate infrastructure provisioning for the data environment using Infrastructure-as-Code (IaC) tools like Terraform or CloudFormation.
  • Oversee CI/CD processes for data pipelines and infrastructure code, integrating automated testing and monitoring.

Collaboration and Optimization

  • Collaborate closely with Data Scientists, Business Intelligence (BI) analysts, and application engineers to understand their data requirements and translate them into efficient data models.
  • Troubleshoot and optimize query performance for internal users, focusing on cost-effective resource utilization.
  • Mentor junior data engineers and champion best practices in Python development, documentation, and cloud architecture.

Required Technical Qualifications

  • Experience: 6+ years of professional experience in data engineering, data warehousing, or a related role, with at least 3 years focused on a major cloud platform (AWS preferred).
  • Programming: Expert proficiency in Python (specifically for data processing) and deep familiarity with data manipulation libraries (Pandas, PySpark).
  • SQL Mastery: Advanced proficiency in SQL and experience designing dimensional models (Star/Snowflake schema). Experience with a columnar database (Snowflake, Redshift, BigQuery) is required.
  • Big Data Frameworks: Proven, hands-on experience building production pipelines using Apache Spark (via Databricks, EMR, or similar).
  • Cloud Infrastructure (AWS): Strong practical knowledge of core AWS data services, including S3, IAM, AWS Glue, Lambda, and RDS. Familiarity with Kubernetes (EKS) for containerized workloads is a plus.
  • Data Orchestration: Extensive experience with a workflow orchestrator like Apache Airflow, Prefect, or Dagster for scheduling complex ETL dependencies.
  • IaC & CI/CD: Practical experience with version control (Git) and building deployment pipelines using Terraform or similar tools.

Desired Skills and Attributes

  • Experience with real-time data ingestion using streaming technologies (Kafka, Kinesis, Flink).
  • Knowledge of advanced data formats like Parquet and Delta Lake for efficient storage and querying.
  • Background in data security and compliance (GDPR, HIPAA) as it relates to data access and anonymization.
  • Strong analytical and problem-solving skills with a proactive approach to anticipating data infrastructure challenges.
  • Excellent verbal and written communication skills for documenting architectures and educating stakeholders.

Benefits, Culture, and Growth

We offer a competitive salary, annual performance bonuses, and a generous equity package, along with comprehensive health, dental, and vision coverage. You will have a large annual budget for professional development, including conferences and certifications, plus flexible work arrangements and a collaborative team environment where innovation is valued and rewarded.

Advanced Data Modeling and Query Optimization: The Senior Data Engineer will take ownership of the data modeling layer, ensuring that schema evolution is handled gracefully and without interrupting downstream consumers. You must be skilled in techniques such as slowly changing dimensions (SCDs) and materialized views to improve read performance for BI applications. Deep expertise in tuning performance parameters within Spark clusters and managing compute resource allocation to meet strict SLA requirements is essential. You will regularly review and refactor complex SQL queries written by analysts, providing guidance on optimization and resource reduction.

Cloud-Native Data Security: A core part of the role involves designing and implementing security measures for data at rest and in transit. This includes setting up fine-grained access controls using AWS Lake Formation or similar tools, implementing encryption across all data storage layers (S3, Redshift), and designing secure VPC architectures to isolate sensitive data processing environments. You will work with the Security team to perform regular audits of IAM roles and policies pertaining to data access. Compliance requirements necessitate rigorous auditing capabilities within the data lake.

Data Pipeline Observability: You will implement a dedicated monitoring stack to track pipeline health, execution time, and failure rates. This includes setting up detailed logging and alerting using CloudWatch, Prometheus, and Grafana. Data lineage tooling will be crucial for quickly tracing the source of any data errors and shortening incident resolution time. You will define and monitor key data quality metrics (completeness, freshness, validity) and build automated notification systems to alert data owners of critical degradation.

Technical Strategy and Vision: As a Senior Engineer, you are expected to look beyond the immediate roadmap and identify future technical challenges and opportunities. This involves proposing and prototyping new architectural patterns, such as serverless data processing using AWS Lambda for event-driven pipelines, or adopting graph databases for relationship modeling. Your recommendations will influence the entire trajectory of the data platform for the next three years. This requires excellent communication skills to articulate complex technical risks and benefits to non-technical leadership.

CI/CD and Testing Discipline: You are responsible for ensuring every piece of code—from the Python ETL scripts to the Terraform infrastructure configuration—is subjected to automated testing and deployed through a controlled CI/CD pipeline. This includes unit testing, integration testing, and data contract testing to prevent unexpected breaks when schema changes occur. You must champion the principle of immutable infrastructure across the team.

Mentorship and Skill Development: Your role includes actively coaching and mentoring mid-level data engineers on modern cloud practices, debugging complex distributed systems (like Spark), and adhering to software engineering best practices (code review, documentation). You will lead technical brown bags and internal training sessions to foster continuous improvement within the wider engineering organization.

This job post has been translated by AI and may contain minor differences or errors.