F1 Analytics Pipeline

Personal Project

TL;DR - Quick Summary

Building a full ELT pipeline using dbt for transformations and DuckDB as a warehouse, orchestrated by Apache Airflow. Planning to containerize the entire stack with Docker, scheduling daily runs to ingest raw data, run dbt models, and execute data quality tests automatically. This will create a reliable, automated system for F1 analytics.

Apache Airflow, Docker, dbt Core, Cosmos (Airflow Provider), DuckDB, SQL, Python
Team: Solo project
Role: Data Engineer
Status: In Development
Come back soon to see the results! 🚀

The Problem

Analytical data pipelines require more than just transformation logic; they need to be scheduled, monitored, and resilient to failure. Manually running ingestion and dbt scripts is neither scalable nor reliable enough to provide stakeholders with timely, accurate data.

The Solution

Planning to develop an Airflow DAG that orchestrates the entire process: a `BashOperator` will first ingest the raw data, then the `Cosmos` provider will dynamically parse the dbt project and expand it into a corresponding Airflow task group, preserving the model dependency graph. The DAG will run on a daily schedule, ensuring the entire pipeline from raw CSVs to analytics-ready tables is automated and reliable.

Business Impact

Will automate the entire data workflow, eliminating manual runs and ensuring data is always fresh

Planning to increase pipeline reliability with Airflow's built-in retry and alerting mechanisms

Aiming to create a fully documented and reproducible data pipeline using dbt and Airflow
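The reliability mechanisms mentioned above are configured in Airflow through `default_args`. A minimal sketch, with illustrative values and a hypothetical alert address:

```python
# Sketch of the retry/alerting settings described above, expressed as the
# default_args dict an Airflow DAG would receive. Values are illustrative.
from datetime import timedelta

default_args = {
    "retries": 2,                         # rerun a failed task up to twice...
    "retry_delay": timedelta(minutes=5),  # ...waiting 5 minutes between tries
    "email_on_failure": True,             # alert once a task exhausts retries
    "email": ["alerts@example.com"],      # hypothetical alert address
}

# In the DAG file this would be passed as DAG(..., default_args=default_args),
# so every task, including Cosmos-generated dbt tasks, inherits the behavior.
print(default_args["retries"])  # 2
```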