Building a centralized observability system for data pipelines, using PostgreSQL for metrics storage and Streamlit for visualization. Features configurable threshold-based alerting, historical trend analysis, and integration with dbt and Airflow. Tracks runtime, record counts, and data quality checks, and surfaces actionable alerts for pipeline failures.
Data pipelines can fail silently: completing without errors but processing zero records, running far longer than usual, or producing low-quality data. Without centralized monitoring, data engineers spend hours manually checking logs across different systems to diagnose issues, delaying detection of data problems and eroding stakeholder trust.
Planning to build a lightweight instrumentation library that any pipeline (Python scripts, dbt models, Airflow DAGs) can use to log metrics to a central PostgreSQL database. A Streamlit dashboard will provide real-time and historical views of pipeline health, with configurable threshold-based alerts (e.g., "alert if runtime > 2x average" or "alert if row count = 0").
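As a minimal sketch of what the instrumentation API could look like: a context manager times a run, captures its outcome, and hands a metrics record to a pluggable sink. The names `track_run`, `RunMetrics`, and the `sink` callable are illustrative assumptions, not a settled design; in production the sink would INSERT the row into the central PostgreSQL table.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field

@dataclass
class RunMetrics:
    """One row destined for the central metrics table (hypothetical schema)."""
    pipeline: str
    status: str = "running"
    records_processed: int = 0
    runtime_seconds: float = 0.0
    quality_checks: dict = field(default_factory=dict)

@contextmanager
def track_run(pipeline_name, sink):
    """Time a pipeline run and hand the metrics to `sink`
    (e.g. a function that writes to PostgreSQL)."""
    metrics = RunMetrics(pipeline=pipeline_name)
    start = time.monotonic()
    try:
        yield metrics                     # pipeline code fills in counts/checks
        metrics.status = "success"
    except Exception:
        metrics.status = "failed"
        raise                             # don't swallow the pipeline's error
    finally:
        metrics.runtime_seconds = time.monotonic() - start
        sink(metrics)                     # fires on success and on failure

# Usage with an in-memory sink (a real sink would write to PostgreSQL):
collected = []
with track_run("daily_orders_load", collected.append) as m:
    m.records_processed = 1200
    m.quality_checks["null_order_ids"] = 0
```

Keeping the sink as a plain callable keeps the library dependency-free for callers, so a Python script, a dbt hook, or an Airflow task can all use the same interface with whatever connection handling suits them.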
Will reduce mean time to detection (MTTD) for pipeline issues from hours to minutes
Planning to provide a single pane of glass for monitoring all data pipelines across the organization
Aiming to eliminate manual log checking by centralizing all pipeline metrics in one dashboard
Will enable proactive issue detection through trend analysis and configurable alerting
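The threshold rules mentioned above ("runtime > 2x average", "row count = 0") could be evaluated as a small pure function over recent run history. This `evaluate_alerts` function and its tuple layout are illustrative assumptions for a sketch, not a fixed schema:

```python
from statistics import mean

def evaluate_alerts(current, history, runtime_factor=2.0):
    """Return alert messages for one run, compared against recent history.

    `current` is a (runtime_seconds, records) tuple; `history` is a list of
    the same tuples for earlier runs of the same pipeline (hypothetical
    layout, would come from the PostgreSQL metrics table).
    """
    alerts = []
    runtime, records = current
    if records == 0:
        alerts.append("row count = 0")
    if history:
        avg_runtime = mean(r for r, _ in history)
        if runtime > runtime_factor * avg_runtime:
            alerts.append(
                f"runtime {runtime:.0f}s exceeds {runtime_factor}x "
                f"trailing average ({avg_runtime:.0f}s)"
            )
    return alerts

# Example: a run that took 250s and loaded nothing, against a ~100s baseline
history = [(100.0, 500), (110.0, 520), (90.0, 480)]
alerts = evaluate_alerts((250.0, 0), history)  # both rules fire here
```

Keeping rule evaluation as a pure function over query results means thresholds can be stored as configuration rows and re-evaluated by the dashboard or a scheduled job without touching pipeline code.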