We'll go beyond basic ETL:
- Handling nested & evolving schemas
- Accelerating pipeline creation with LLMs
- Moving from scripts to reliable ingestion workflows
- Validating data and schema changes using the dlt dashboard and dlt MCP
From APIs to Warehouses
On Feb 17 (16:30 CET), together with DataTalks.Club, Aashish Nair will walk through building end-to-end ingestion pipelines with dlt, from raw APIs to production-ready warehouse loads.
Register here
What if dimensional modeling didn't mean hours of boilerplate SQL?
We built an AI workflow that turns raw data into semantic models in minutes, powered by 20 questions.
Rethinking data transformation
Who's speaking in Berlin
- Francesco Mucio: integrating 20+ APIs
- Bijan Soltani: real-world analytics
- Nemanja Bibic: ingestion for AI memory
- Ken Schröder: analyst-friendly ingestion w/ dlt on AWS
- Violetta Mishechkina: AI agents, data quality & what's next
See you there
Berlin, it's meetup time!
Join us for the dltHub Community Meetup, an evening of real-world demos, lessons learned, and conversations with builders.
Rosebud, Berlin
Feb 17 | 18:00–21:00
Curious about what we're building at dltHub? Come by.
Production pipelines don't fail loudly, they drift.
Feb 12 · 16:00 CET - Online
Hands-on workshop on operating pipelines in production:
• schema changes
• backfills
• CI/CD
• long-term reliability
Register → https://community.dlthub.com/workshop-maintaining-servicing-production-data-pipelines
Data Valentine Challenge started today.
5 days. 5 live data sessions with:
@datarecce.bsky.social, Greybeam, @databasetycoon.bsky.social, @bauplan.bsky.social
Our slot: Wednesday – Pipelines That Don't Ghost You
Feb 9–13 | 9am PT | Online
https://reccehq.com/data-valentine-week-challenge/
• Markets (Kalshi, Polymarket, DEX Screener)
• AI platforms (fal, Jina AI, Kie AI)
• Macro data (World Bank, Finnhub, Alpha Vantage, Frankfurter)
• Entertainment (PokéAPI, OpenF1)
Explore the contexts
January's Rising Stars in the dlt ecosystem
Builders are vibe coding pipelines around real-time markets, AI dev platforms, macro data, and more.
What's trending right now:
Arrow + ADBC + dlt just broke the EL speed limit.
5M rows DuckDB→MySQL:
SQLAlchemy: 344s
Arrow + ADBC: 92s (3.7× faster)
One line of code. Columnar end-to-end.
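As a minimal sketch of what that one-line switch can look like with dlt's sql_database source, assuming the pyarrow backend (the exact backend value used for the ADBC path in the benchmark may differ, and the connection string here is hypothetical):

```python
import dlt
from dlt.sources.sql_database import sql_database

# Read tables through an Arrow-based backend instead of row-by-row
# SQLAlchemy extraction; connection string and table are hypothetical.
source = sql_database(
    "mysql+pymysql://user:pass@localhost:3306/shop",
    table_names=["orders"],
    backend="pyarrow",  # the one-line switch; the ADBC variant may use a different backend value
)

pipeline = dlt.pipeline(
    pipeline_name="arrow_el_demo",
    destination="duckdb",
    dataset_name="shop_raw",
)

print(pipeline.run(source))
```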
Benchmarks:
The Modern Data Stack™ was a comfy lie that turned data engineers into passive consumers. Now the bill's due, and the market is splitting into vendor-locked hell vs builder freedom.
Read the Builder's Manifesto:
Amsterdam
Jan 29 · 3–6 PM (GMT+1)
Agenda highlights:
• Vision – Matthaus Krzykowski (@matthausk.bsky.social) & Julian Alves | dltHub
• Demos – Vincent D. Warmerdam (@koaning.bsky.social) | Marimo, Mehdi Ouazza (@mehdio.com) | MotherDuck
• First impressions – Thomas in't Veld | Tasman Analytics
Together with @motherduck.com, @duckdb.org, and marimo, we're bringing together a toolkit built for full-stack data developers:
• ingest with dlt
• query fast
• serve instantly
Built for builders, not enterprise overhead.
Want to influence the tools you use every day?
We're hosting a Builder's Data Stack meetup focused on developer flow, fast iteration, and shaping the roadmap with the community.
Pull up a chair:
An AI agent ignored a code freeze, wiped a prod DB, then hallucinated data to cover it up.
Data quality in the LLM era isn't optional, it's a safety problem.
We call it "data as plutonium": powerful and dangerous without containment.
Call for Speakers
Using dlt in your projects? We're opening the mic to the community for short 10–15 min talks sharing:
• real use cases
• lessons learned
If this sounds like you, reach out via the event page.
Let's learn from each other in Paris!
Paris data folks
We're hosting a dlt Community Meetup in Paris on Feb 4th (6–9 PM) together with Polycea.
A community meetup focused on practical takeaways, shared learnings, and conversations with people using dlt hands-on.
Join us here:
Data quality is the vegetables of data engineering: everyone agrees it's important, but nobody wants to implement it.
To increase your v̶e̶g̶e̶t̶a̶b̶l̶e̶ ̶i̶n̶t̶a̶k̶e̶ test coverage, check out these 11 delicious recipes.
https://dlthub.com/blog/practical-data-quality-recipes-with-dlt
This gives you a self-healing system that keeps your semantic layer in sync as your data changes.
Huge thanks to Julien Hurault and Hussain Sultan for their contributions.
The blueprint:
1. ingest with dlt to capture metadata
2. use LLMs to infer semantic relationships
3. auto-generate the BI layer
We went from a raw Sakila DB to a governed semantic layer automatically. Moving beyond manual mappings to automate the context itself.
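A minimal sketch of step 1, with a toy record standing in for the Sakila source: the point is that the schema dlt infers during ingestion is the metadata the LLM consumes in step 2.

```python
import dlt

pipeline = dlt.pipeline(
    pipeline_name="sakila_semantic",
    destination="duckdb",
    dataset_name="sakila_raw",
)

# hypothetical row standing in for the Sakila database
pipeline.run(
    [{"film_id": 1, "title": "ACADEMY DINOSAUR", "rental_rate": 0.99}],
    table_name="film",
)

# the inferred schema (tables, columns, types, hints) is the metadata
# an LLM can use to propose dimensions, measures, and joins
print(pipeline.default_schema.to_pretty_yaml())
```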
Semantic layers are the most important "boring" part of data.
Building them manually is a bottleneck for Chat-BI. dlt is changing the game by "autofilling" the metadata gap, turning months of modeling into minutes of automation.
Most data quality failures happen because checks come too late.
dlt + dltHub treat quality as a lifecycle: in-flight checks, safe staging, and production monitoring.
Catch issues earlier, fix less and trust your data more.
docs: https://dlthub.com/docs/general-usage/data-quality-lifecycle
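As one illustration of the in-flight part, a minimal sketch using a resource-level row check (the table and the rule are hypothetical, not taken from the docs page above):

```python
import dlt

@dlt.resource(table_name="payments")
def payments():
    # hypothetical rows standing in for an API extract
    yield {"id": 1, "amount": 42.0, "currency": "EUR"}
    yield {"id": 2, "amount": -5.0, "currency": "EUR"}

def check_amount(row):
    # in-flight check: fail the run before a bad row reaches the destination
    if row["amount"] < 0:
        raise ValueError(f"negative amount in payment {row['id']}")
    return row

pipeline = dlt.pipeline("quality_demo", destination="duckdb", dataset_name="finance")
pipeline.run(payments().add_map(check_amount))
```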
@ssp.sh drops another great deep-dive: a declarative data stack (dlt + @clickhouse.com + @rilldata.com) that simplifies tracking cloud spend across multiple platforms.
A complete cloud-native FinOps setup in minutes.
Explore the full walkthrough
We're thrilled to take the stage at the Snowflake for Startups Pitch Night!
Join us at the Silicon Valley AI Hub to see how dlt's code & LLM-first infra-native data ingestion library is the fastest way to get compliant data into Snowflake.
AI workflows break on LLM updates?
Our Anti-Entropy pattern fixes this with declarative scaffolds, error loops, and dashboards for antifragile convergence.
Saves time, acts like an invisible senior engineer.
Ditch the crutches, skip the "Deer in Headlights" panic!
Each integration came together quickly: Stripe with just a token, AWS CUR from S3, and GCP billing straight from BigQuery.
This powerful tool stack, combining dlt's connectors, @duckdb.org, and @rilldata.com's interactive dashboards, delivers a real-time, consolidated analytics view.
At the core is a single incremental pipeline powered by dlt, loading everything into Parquet & DuckDB for fast analysis.
Handling auth, pagination, and schema changes, the pipeline remains simple end to end, and because it's fully pluggable, adding Azure or Cloudflare is easy.
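A minimal sketch of that shape, with a hypothetical fetch_cur_rows stub in place of the real S3/CUR read; the incremental cursor and the Parquet load format carry over to the other sources.

```python
import dlt

def fetch_cur_rows(since):
    # hypothetical stand-in for reading AWS CUR files from S3
    return [{"line_item_usage_start_date": "2026-02-01", "cost": 1.23}]

@dlt.resource(table_name="aws_cur", write_disposition="append")
def aws_costs(
    usage_start=dlt.sources.incremental("line_item_usage_start_date", initial_value="2026-01-01"),
):
    # dlt tracks the cursor so only usage rows newer than the last run are kept
    yield from fetch_cur_rows(since=usage_start.last_value)

pipeline = dlt.pipeline("cloud_costs", destination="duckdb", dataset_name="finops")

# Parquet keeps the load files reusable outside DuckDB as well
print(pipeline.run(aws_costs, loader_file_format="parquet"))
```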
To overcome complex cloud cost analysis, @ssp.shย showed how dlt can ingest and normalize AWS, GCP, and Stripe data into a unified cost dashboard.
The result is a single view for ROI analysis powered by a simple ELT.
Check out the full solution!
European data teams can enjoy lightning-fast analytics & production-ready pipelines now that the @motherduck EU region is fully available.
Choose loading via MotherDuck or dlt's native DuckLake destination, with support for Postgres, DuckDB, SQLite, and MySQL.
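A minimal sketch of pointing a pipeline at MotherDuck; the dataset and rows are hypothetical, the MotherDuck token is expected in dlt's secrets config, and swapping in the DuckLake destination would be a similar one-line change.

```python
import dlt

# destination="motherduck" picks up the MotherDuck token from dlt's secrets
pipeline = dlt.pipeline(
    pipeline_name="eu_analytics",
    destination="motherduck",
    dataset_name="web_events",
)

print(pipeline.run(
    [{"event": "page_view", "ts": "2026-02-01T10:00:00Z"}],
    table_name="events",
))
```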
Ever launched a data pipeline & wondered what's happening under the hood? The dlt Workspace Dashboard gives real-time visibility into pipeline state, schemas, live dataset queries, and run traces, all in one web app. Built with @marimo.io.
Try it now: https://dlthub.com/docs/general-usage/dashboard