You have a passion for building scalable, reliable data systems that enable data scientists, ML engineers, and analysts to do their best work. You understand that great data products require more than just moving data: they need robust pipelines, data quality assurance, and thoughtful architecture. You put reliability and scalability at the heart of everything you do, and you are adept at enabling data-driven decisions through sound data modeling and pipeline design. You are comfortable working cross-functionally with Product, Engineering, Data Science, Analytics, and MLOps teams to develop our products and improve the end-user experience. You have a strong track record of successful prioritization, meeting critical deadlines, and enthusiastically tackling challenges with an eye toward problem-solving.
Key Responsibilities
Data Pipeline Development & Management:
- Design, build, and maintain robust ETL/ELT pipelines that support analytics, ML models, and business intelligence
- Develop scalable batch and streaming data pipelines to process millions of auction events, user interactions, and transactions daily
- Implement workflow orchestration using Airflow, Dagster, or similar tools to manage complex data dependencies (a minimal sketch follows this list)
- Build data validation and quality monitoring frameworks to ensure data accuracy and reliability
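For illustration, a minimal orchestration sketch of the kind described above might look like this (assuming Airflow 2.4+; the DAG name, schedule, and callables are hypothetical placeholders, not an actual pipeline spec):

```python
# Minimal illustrative DAG: extract a day's auction events, then load them.
# Names, schedule, and callables are placeholders, not a real pipeline spec.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_auction_events(**context):
    # Placeholder: pull the day's auction events from the source system.
    print(f"extracting events for {context['ds']}")


def load_to_warehouse(**context):
    # Placeholder: load the transformed partition into the warehouse.
    print(f"loading partition {context['ds']}")


with DAG(
    dag_id="auction_events_daily",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
):
    extract = PythonOperator(task_id="extract", python_callable=extract_auction_events)
    load = PythonOperator(task_id="load", python_callable=load_to_warehouse)

    extract >> load  # the load task waits for extraction to succeed
```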
ML & Analytics Infrastructure:
- Build feature engineering pipelines to support ML models for search, recommendations, and personalization (see the sketch after this list)
- Integrate with feature stores to enable consistent feature computation across training and inference
- Create datasets for model training, validation, and testing with proper versioning
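A hedged sketch of what a feature engineering step could look like, assuming PySpark and an events table with user_id, event_time, and bid_amount columns (the S3 paths, column names, and cutoff date are illustrative only):

```python
# Illustrative feature job: 30-day per-user bidding aggregates with a fixed
# cutoff so training features do not leak information from after the cutoff.
from datetime import date, timedelta

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("user_bid_features").getOrCreate()

events = spark.read.parquet("s3://example-bucket/auction_events/")  # hypothetical path

cutoff = date(2024, 6, 1)                       # hypothetical training cutoff
window_start = cutoff - timedelta(days=30)

features = (
    events
    .where((F.col("event_time") >= F.lit(str(window_start)))
           & (F.col("event_time") < F.lit(str(cutoff))))
    .groupBy("user_id")
    .agg(
        F.count("*").alias("bids_30d"),
        F.avg("bid_amount").alias("avg_bid_amount_30d"),
        F.max("event_time").alias("last_bid_at"),
    )
)

# Write a versioned snapshot so a training run can pin an exact dataset.
features.write.mode("overwrite").parquet(
    f"s3://example-bucket/features/user_bids/cutoff={cutoff}/"
)
```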
Data Quality & Monitoring:
- Implement comprehensive data quality checks, anomaly detection, and alerting systems (see the sketch after this list)
- Monitor pipeline health, data freshness, and SLA compliance
- Create dashboards and reporting tools for data pipeline observability
- Debug and resolve data quality issues and pipeline failures
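A minimal sketch of a freshness and completeness check, assuming a pandas DataFrame of the day's pipeline output with tz-aware UTC timestamps (column names, thresholds, and the file path are hypothetical):

```python
# Illustrative data quality gate: fail the run when the partition is stale or
# a key identifier has too many nulls. Thresholds here are placeholders.
from datetime import datetime, timedelta, timezone

import pandas as pd


def check_partition(df: pd.DataFrame, max_lag_hours: int = 6, max_null_rate: float = 0.01) -> list[str]:
    """Return human-readable failures; an empty list means the checks passed."""
    failures = []

    # Freshness: the newest event should be recent enough to meet the SLA.
    # Assumes event_time is a tz-aware UTC timestamp column.
    lag = datetime.now(timezone.utc) - df["event_time"].max()
    if lag > timedelta(hours=max_lag_hours):
        failures.append(f"stale data: newest event is {lag} old (SLA {max_lag_hours}h)")

    # Completeness: key identifiers should almost never be null.
    null_rate = df["user_id"].isna().mean()
    if null_rate > max_null_rate:
        failures.append(f"user_id null rate {null_rate:.2%} exceeds {max_null_rate:.2%}")

    return failures


if __name__ == "__main__":
    df = pd.read_parquet("auction_events_today.parquet")  # hypothetical output file
    problems = check_partition(df)
    if problems:
        # In production this would page on-call or post to an alerting channel.
        raise RuntimeError("; ".join(problems))
```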
Collaboration & Best Practices:
- Work closely with Data Scientists and ML Engineers to understand data requirements and deliver reliable datasets
- Partner with Software Engineers to integrate data pipelines with application systems
- Establish and document data engineering best practices, coding standards, and design patterns
- Mentor junior engineers on data engineering principles and best practices
Key Requirements
Required Qualifications:
- BSc or MSc in Computer Science, Data Engineering, Software Engineering, or a related field, or equivalent practical experience
- 5+ years of experience building and maintaining data pipelines and infrastructure in production environments
- Strong programming skills in Python, with experience in data processing libraries (Pandas, PySpark)
- Expert-level SQL skills with experience in query optimization and performance tuning
- Proven experience with workflow orchestration tools (Airflow, Dagster, Prefect, or similar)
- Hands-on experience with cloud platforms (AWS preferred) including S3, Redshift, EMR, Glue, Lambda
- Experience with data warehousing solutions (Redshift, Snowflake, BigQuery, or similar)
- Experience with version control systems (Git) and CI/CD practices for data pipelines
Technical Skills:
- Experience with distributed computing frameworks (Apache Spark, Dask, or similar)
- Knowledge of both batch and streaming data processing (Kafka, Kinesis, or similar)
- Familiarity with data formats (Parquet, ORC, Avro, JSON) and their trade-offs (see the toy comparison after this list)
- Understanding of data quality frameworks and testing strategies
- Previous work with vector databases (Pinecone, Milvus, etc.)
- Experience with monitoring and observability tools (Prometheus, Grafana, CloudWatch)
- Knowledge of infrastructure-as-code tools (Terraform, CloudFormation)
- Understanding of containerization (Docker) and orchestration (Kubernetes) is a plus
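As a toy illustration of the format trade-off mentioned above (synthetic data; assumes pandas with pyarrow installed), the same records are typically far smaller and faster to scan in a columnar format than in row-oriented JSON:

```python
# Write the same synthetic dataset as Parquet and JSON and compare file sizes.
import os

import pandas as pd

df = pd.DataFrame({
    "auction_id": range(100_000),
    "category": ["watches", "art", "jewelry", "cars"] * 25_000,
    "hammer_price": [100.0 + i % 500 for i in range(100_000)],
})

df.to_parquet("sample.parquet")                           # columnar, compressed, schema-aware
df.to_json("sample.json", orient="records", lines=True)   # row-oriented text

for path in ("sample.parquet", "sample.json"):
    print(path, os.path.getsize(path) // 1024, "KiB")
```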
Nice-to-Have:
- Familiarity with dbt (data build tool) for data transformation workflows
- Knowledge of Elasticsearch or similar search technologies
- Experience in eCommerce, marketplace, or auction platforms
- Understanding of GDPR, data privacy, and compliance requirements
- Experience with real-time analytics and event-driven architectures (Flink, Materialize)