Selected work

Pipelines, warehouses, and the systems that move data.

Each project follows the same structure: challenge, approach, and measurable impact. Open a case study for an architecture walkthrough and a code highlight.

Football Data Pipeline

End-to-end pipeline that ingests, transforms, and loads football match data for downstream analytics.

Challenge

Orchestrating multi-step ingestion and transformation reliably across 6 dependent stages.

Approach

Designed a 6-task Airflow DAG (daily schedule); used Spark for heavy transforms; modelled marts with dbt backed by 10 data-quality tests.
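The dependency ordering behind a six-stage chain like this can be sketched in plain Python. This is not the project's Airflow DAG: it uses the standard-library `graphlib` module as a stand-in for the scheduler, and the stage names are hypothetical, since only the number of tasks is stated here.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names (assumption): each key maps to the set of
# stages that must finish before it can start.
STAGES = {
    "extract_api": set(),
    "load_raw": {"extract_api"},
    "spark_transform": {"load_raw"},
    "load_warehouse": {"spark_transform"},
    "dbt_run": {"load_warehouse"},
    "dbt_test": {"dbt_run"},
}


def execution_order(stages):
    """Return stage names in an order that respects every dependency."""
    return list(TopologicalSorter(stages).static_order())


def run_pipeline(stages, runner):
    """Run stages in dependency order and stop at the first failure.

    `runner` is a callable that takes a stage name and returns True on
    success. Stages after a failed one are skipped, and the names of the
    stages that completed are returned.
    """
    completed = []
    for name in execution_order(stages):
        if not runner(name):
            break
        completed.append(name)
    return completed
```

Calling `run_pipeline(STAGES, runner)` with a runner that fails on one stage returns only the stages that completed before it, which mirrors the way a scheduler would skip downstream tasks.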

Impact

Fully automated daily pipeline with zero manual steps from API source to analytics-ready mart, guarded by 10 dbt tests.

Python, Apache Airflow, BigQuery, Apache Spark, dbt
GitHub

Taxi Analytics (dbt)

Analyse Chicago Taxi Trips from the BigQuery Public Dataset to surface monthly revenue and trip-volume trends.

Challenge

Cleaning millions of raw trip records with inconsistent formats and nulls.

Approach

Implemented a staging → mart architecture across 3 dbt models; enforced schema tests and a custom positive-revenue assertion throughout.
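As a rough illustration of the staging → mart idea and the positive-revenue check, the sketch below uses an in-memory SQLite database in place of BigQuery and dbt. The column names `trip_start_timestamp` and `trip_total` and the view names are assumptions for the example, as is applying the check at the monthly level. Like a dbt singular test, the check is a query that returns the offending rows, so an empty result means it passes.

```python
import sqlite3


def build_monthly_revenue(rows):
    """Stage raw trips, build a monthly mart, and return (mart, failing_rows).

    `rows` is an iterable of (trip_start_timestamp, trip_total) tuples,
    with timestamps as 'YYYY-MM-DD HH:MM:SS' strings (assumed layout).
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE raw_trips (trip_start_timestamp TEXT, trip_total REAL)"
    )
    con.executemany("INSERT INTO raw_trips VALUES (?, ?)", rows)

    # Staging layer: drop records with a missing timestamp or total.
    con.execute("""
        CREATE VIEW stg_trips AS
        SELECT trip_start_timestamp, trip_total
        FROM raw_trips
        WHERE trip_start_timestamp IS NOT NULL
          AND trip_total IS NOT NULL
    """)

    # Mart layer: revenue and trip count per calendar month.
    con.execute("""
        CREATE VIEW mart_monthly_revenue AS
        SELECT substr(trip_start_timestamp, 1, 7) AS month,
               SUM(trip_total) AS revenue,
               COUNT(*) AS trips
        FROM stg_trips
        GROUP BY substr(trip_start_timestamp, 1, 7)
    """)

    mart = con.execute(
        "SELECT month, revenue, trips FROM mart_monthly_revenue ORDER BY month"
    ).fetchall()

    # Positive-revenue assertion: any row returned here is a failure.
    failing = con.execute(
        "SELECT month, revenue FROM mart_monthly_revenue "
        "WHERE revenue <= 0 ORDER BY month"
    ).fetchall()
    con.close()
    return mart, failing
```

Returning the failing rows rather than a boolean keeps the offending months visible when the check does not pass.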

Impact

Delivered 2 reliable marts (monthly revenue & trip volume) over millions of Chicago taxi trips, ready for dashboarding.

dbt, BigQuery, SQL
GitHub

UwuFufu Playlist Automator

Automate creation of a UwuFufu music quiz from any Spotify or YouTube playlist.

Challenge

Reliably resolving YouTube URLs for every track across large playlists (250+ songs).

Approach

Built a Python tool that chains Spotify → YouTube lookup → API submission with error handling for unmatched tracks.
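The error handling for unmatched tracks might look something like the sketch below. It makes no real Spotify or YouTube calls: `lookup` is a hypothetical callable standing in for the YouTube search step, and `resolve_tracks` is an illustrative name rather than the tool's actual function.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def resolve_tracks(
    tracks: Iterable[str],
    lookup: Callable[[str], Optional[str]],
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split tracks into (track, url) matches and a list of unmatched tracks.

    `lookup` (hypothetical) takes a track title and returns a video URL,
    returns None when nothing is found, or raises on an API error.
    """
    matched: List[Tuple[str, str]] = []
    unmatched: List[str] = []
    for track in tracks:
        try:
            url = lookup(track)
        except Exception:
            # One failed lookup should not abort a 250-track run;
            # treat it as unmatched and keep going.
            url = None
        if url:
            matched.append((track, url))
        else:
            unmatched.append(track)
    return matched, unmatched
```

Only the `matched` pairs would go on to the submission step, while `unmatched` can be reported back for manual review.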

Impact

Reduced manual quiz setup from about 30 minutes to under 60 seconds for playlists of 250+ tracks.

Python, Spotify API, YouTube Data API, REST API
GitHub