You have a passion for building scalable, reliable data systems that enable data scientists, ML engineers, and analysts to do their best work. You understand that great data products require more than just moving data: they need robust pipelines, data quality assurance, and thoughtful architecture. You put reliability and scalability at the heart of everything you do, and you are adept at enabling data-driven decisions through sound data modeling and pipeline design. You are comfortable working cross-functionally with Product, Engineering, Data Science, Analytics, and MLOps teams to develop our products and improve the end-user experience. You have a strong track record of successful prioritization, meeting critical deadlines, and enthusiastically tackling challenges with an eye toward problem-solving.
Key Responsibilities
Data Pipeline Development & Management:
- Design, build, and maintain robust ETL/ELT pipelines that support analytics, ML models, and business intelligence
- Develop scalable batch and streaming data pipelines to process millions of auction events, user interactions, and transactions daily
- Implement workflow orchestration using Airflow, Dagster, or similar tools to manage complex data dependencies (a minimal sketch follows this list)
- Build data validation and quality monitoring frameworks to ensure data accuracy and reliability
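For illustration, a minimal orchestration sketch of the kind described above might look like this (assuming Airflow 2.4+; the DAG name, schedule, and callables are hypothetical placeholders, not an actual pipeline spec):

```python
# Minimal illustrative DAG: extract a day's auction events, then load them.
# Names, schedule, and callables are placeholders, not a real pipeline spec.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_auction_events(**context):
    # Placeholder: pull the day's auction events from the source system.
    print(f"extracting events for {context['ds']}")


def load_to_warehouse(**context):
    # Placeholder: load the transformed partition into the warehouse.
    print(f"loading partition {context['ds']}")


with DAG(
    dag_id="auction_events_daily",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
):
    extract = PythonOperator(task_id="extract", python_callable=extract_auction_events)
    load = PythonOperator(task_id="load", python_callable=load_to_warehouse)

    extract >> load  # the load task waits for extraction to succeed
```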
ML & Analytics Infrastructure:
- Build feature engineering pipelines to support ML models for search, recommendations, and personalization (see the sketch after this list)
- Integrate with feature stores to enable consistent feature computation across training and inference
- Create datasets for model training, validation, and testing with proper versioning
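A hedged sketch of what a feature engineering step could look like, assuming PySpark and an events table with user_id, event_time, and bid_amount columns (the S3 paths, column names, and cutoff date are illustrative only):

```python
# Illustrative feature job: 30-day per-user bidding aggregates with a fixed
# cutoff so training features do not leak information from after the cutoff.
from datetime import date, timedelta

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("user_bid_features").getOrCreate()

events = spark.read.parquet("s3://example-bucket/auction_events/")  # hypothetical path

cutoff = date(2024, 6, 1)                       # hypothetical training cutoff
window_start = cutoff - timedelta(days=30)

features = (
    events
    .where((F.col("event_time") >= F.lit(str(window_start)))
           & (F.col("event_time") < F.lit(str(cutoff))))
    .groupBy("user_id")
    .agg(
        F.count("*").alias("bids_30d"),
        F.avg("bid_amount").alias("avg_bid_amount_30d"),
        F.max("event_time").alias("last_bid_at"),
    )
)

# Write a versioned snapshot so a training run can pin an exact dataset.
features.write.mode("overwrite").parquet(
    f"s3://example-bucket/features/user_bids/cutoff={cutoff}/"
)
```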
Data Quality & Monitoring:
- Implement comprehensive data quality checks, anomaly detection, and alerting systems (see the sketch after this list)
- Monitor pipeline health, data freshness, and SLA compliance
- Create dashboards and reporting tools for data pipeline observability
- Debug and resolve data quality issues and pipeline failures
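A minimal sketch of a freshness and completeness check, assuming a pandas DataFrame of the day's pipeline output with tz-aware UTC timestamps (column names, thresholds, and the file path are hypothetical):

```python
# Illustrative data quality gate: fail the run when the partition is stale or
# a key identifier has too many nulls. Thresholds here are placeholders.
from datetime import datetime, timedelta, timezone

import pandas as pd


def check_partition(df: pd.DataFrame, max_lag_hours: int = 6, max_null_rate: float = 0.01) -> list[str]:
    """Return human-readable failures; an empty list means the checks passed."""
    failures = []

    # Freshness: the newest event should be recent enough to meet the SLA.
    # Assumes event_time is a tz-aware UTC timestamp column.
    lag = datetime.now(timezone.utc) - df["event_time"].max()
    if lag > timedelta(hours=max_lag_hours):
        failures.append(f"stale data: newest event is {lag} old (SLA {max_lag_hours}h)")

    # Completeness: key identifiers should almost never be null.
    null_rate = df["user_id"].isna().mean()
    if null_rate > max_null_rate:
        failures.append(f"user_id null rate {null_rate:.2%} exceeds {max_null_rate:.2%}")

    return failures


if __name__ == "__main__":
    df = pd.read_parquet("auction_events_today.parquet")  # hypothetical output file
    problems = check_partition(df)
    if problems:
        # In production this would page on-call or post to an alerting channel.
        raise RuntimeError("; ".join(problems))
```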
Collaboration & Best Practices:
- Work closely with Data Scientists and ML Engineers to understand data requirements and deliver reliable datasets
- Partner with Software Engineers to integrate data pipelines with application systems
- Establish and document data engineering best practices, coding standards, and design patterns
- Mentor junior engineers on data engineering principles and best practices
Key Requirements
Required Qualifications:
- BSc or MSc in Computer Science, Data Engineering, Software Engineering, or a related field, or equivalent practical experience
- 5+ years of experience building and maintaining data pipelines and infrastructure in production environments
- Strong programming skills in Python, with experience in data processing libraries (Pandas, PySpark)
- Expert-level SQL skills with experience in query optimization and performance tuning
- Proven experience with workflow orchestration tools (Airflow, Dagster, Prefect, or similar)
- Hands-on experience with cloud platforms (AWS preferred) including S3, Redshift, EMR, Glue, Lambda
- Experience with data warehousing solutions (Redshift, Snowflake, BigQuery, or similar)
- Experience with version control systems (Git) and CI/CD practices for data pipelines
Technical Skills:
- Experience with distributed computing frameworks (Apache Spark, Dask, or similar)
- Knowledge of both batch and streaming data processing (Kafka, Kinesis, or similar)
- Familiarity with data formats (Parquet, ORC, Avro, JSON) and their trade-offs (see the toy comparison after this list)
- Understanding of data quality frameworks and testing strategies
- Previous work with vector databases (Pinecone, Milvus, etc.)
- Experience with monitoring and observability tools (Prometheus, Grafana, CloudWatch)
- Knowledge of infrastructure-as-code tools (Terraform, CloudFormation)
- Understanding of containerization (Docker) and orchestration (Kubernetes) is a plus
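As a toy illustration of the format trade-off mentioned above (synthetic data; assumes pandas with pyarrow installed), the same records are typically far smaller and faster to scan in a columnar format than in row-oriented JSON:

```python
# Write the same synthetic dataset as Parquet and JSON and compare file sizes.
import os

import pandas as pd

df = pd.DataFrame({
    "auction_id": range(100_000),
    "category": ["watches", "art", "jewelry", "cars"] * 25_000,
    "hammer_price": [100.0 + i % 500 for i in range(100_000)],
})

df.to_parquet("sample.parquet")                           # columnar, compressed, schema-aware
df.to_json("sample.json", orient="records", lines=True)   # row-oriented text

for path in ("sample.parquet", "sample.json"):
    print(path, os.path.getsize(path) // 1024, "KiB")
```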
Nice-to-Have:
- Familiarity with dbt (data build tool) for data transformation workflows
- Knowledge of Elasticsearch or similar search technologies
- Experience in eCommerce, marketplace, or auction platforms
- Understanding of GDPR, data privacy, and compliance requirements
- Experience with real-time analytics and event-driven architectures (Flink, Materialize)