The open-source companies built their success on top of open-source platforms and benefited from community contributions and adoption, but now they must abandon open-source principles to survive commercially.
@ananthdurai.bsky.social
Editor, Data Engineering Weekly; subscribe at www.dataengineeringweekly.com. In progress: LakeByte
The 244th edition of Data Engineering Weekly dives into:
AI agents as execution engines, LLM inference economics, databases for AI, personalization, and product evidence.
Read more: www.dataengineeringw...
#DataEngineering #AI #LLMs
Cricket has been India's greatest force in overcoming centuries of colonial suppression. Today's Women's World Cup win echoes the spirit of 1983, a triumph that will inspire generations to come. 🇮🇳
03.11.2025 00:40
This is the most personal essay I have written in Data Engineering Weekly. I shared a few key moments in my life and how fortunate I was to meet mentors along my professional journey who shaped my career.
23.10.2025 00:25
Data Vault vs. Dimensional Modeling vs. Medallion Architecture: when viewed through a modern enterprise data lens, these techniques interlock.
I break down how in Part 2 of my "Revisiting the Medallion Architecture" series.
Fivetran and dbt form a strong foundation for modern data infrastructure, known for bringing simplicity to complex engineering workflows. That said, calling the combination "open" data infrastructure feels like a stretch.
17.10.2025 12:02
Should we update the definition of an "Analytical Engineer"?
13.10.2025 17:53
As a data engineer, you can't treat zero-party (consent) and third-party (inferred) data the same way. This distinction is critical for building systems that are scalable, private, and trustworthy.
Here's my guide:
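As an illustration of the distinction (not taken from the guide itself), one way to keep the two kinds of data apart is to carry provenance on every attribute and resolve values by source. The minimal Python sketch below assumes a hypothetical policy: a consented zero-party value always wins, a revoked one suppresses the attribute entirely, and third-party (inferred) values are used only above a confidence threshold. The names `Source`, `Attribute`, `resolve`, and `min_confidence` are invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    ZERO_PARTY = "zero_party"    # stated by the user, with consent
    THIRD_PARTY = "third_party"  # inferred or acquired from outside parties


@dataclass(frozen=True)
class Attribute:
    name: str
    value: str
    source: Source
    consent_active: bool = False  # meaningful for zero-party data
    confidence: float = 1.0       # meaningful for inferred (third-party) data


def resolve(attrs: list[Attribute], name: str, min_confidence: float = 0.8) -> str | None:
    """Pick the value to use for `name` under a hypothetical policy:
    1. a stated (zero-party) value outranks any inference;
    2. if that consent was revoked, the attribute is suppressed entirely;
    3. otherwise, use the most confident third-party value above the threshold.
    """
    candidates = [a for a in attrs if a.name == name]
    zero_party = [a for a in candidates if a.source is Source.ZERO_PARTY]
    if zero_party:
        latest = zero_party[-1]  # assumes attrs are ordered oldest -> newest
        return latest.value if latest.consent_active else None
    inferred = [a for a in candidates
                if a.source is Source.THIRD_PARTY and a.confidence >= min_confidence]
    if inferred:
        return max(inferred, key=lambda a: a.confidence).value
    return None
```

Suppressing inferred values after a revocation is one design choice among several; a real system would follow its own consent and legal requirements.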
Could be. Composable CDPs have not gained significant market share, as identity resolution is a key component that is often proprietary.
04.10.2025 16:34
With Census already in with Fivetran, and now dbt, the combination is most likely to evolve into a composable CDP.
04.10.2025 02:11
Airbnb: Real-Time Key-Value Store
Airbnb's next-gen key-value store supports real-time ingestion and bulk uploads with sub-second latency, powering feature stores and fraud detection.
Read the full story here: www.dataengineeringw...
Grab: Partner Gateway Metrics at Sub-Second Speed
Real-time partner analytics at scale is tough. Grab uses Apache Pinot, Kafka-Flink ingestion, partitioning, and star-tree indexing to cut query latency to under 300 ms, enabling efficient API monitoring and fast issue resolution.
Netflix Muse: Scaling Analytics at Trillion-Row Scale
Netflix evolved its Muse architecture to handle huge datasets efficiently: HyperLogLog sketches, Hollow in-memory feeds, and Druid optimizations cut query latency by ~50% and reduced concurrency load.
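HyperLogLog is one of the levers named above: it answers "how many distinct X?" from a small fixed-size sketch instead of scanning raw rows, and sketches for different partitions merge by taking a register-wise max. The minimal Python version below illustrates the technique only; it is not Netflix's implementation, and the SHA-1 hash and the default `p=10` (1,024 registers, roughly 3% standard error) are assumptions chosen for the example.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog cardinality estimator (illustrative, not production)."""

    def __init__(self, p: int = 10):
        self.p = p                     # number of index bits
        self.m = 1 << p                # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant for m >= 128 (Flajolet et al.)
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # 64-bit hash taken from the first 8 bytes of SHA-1
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)               # first p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # rank = 1-based position of the leftmost 1-bit in `rest`
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> None:
        # Sketches built with the same p combine by register-wise max
        if self.p != other.p:
            raise ValueError("cannot merge sketches with different p")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self) -> float:
        estimate = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting
        if estimate <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)
        return estimate


if __name__ == "__main__":
    hll = HyperLogLog(p=10)
    for i in range(10_000):
        hll.add(f"user-{i}")
    print(f"estimated distinct users: {hll.count():.0f}")
```

Memory stays at 1,024 small integers no matter how many rows are added, which is why sketch-based distinct counts are cheap to store and to merge at query time.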
Latency Every Data Streaming Engineer Should Know
"Real-time" has limits: disk, network, and replication delays add up. StreamNative explains latency tiers, common costs, and tuning levers such as batching and async processing.
Must-read for data streaming engineers!
I enjoyed this post by @ananthdurai.bsky.social. It does a great job of tying a bunch of recent papers and concepts together.
27.09.2025 17:46
MCP (Model Context Protocol) promises a new way for LLMs to use tools.
Chris Riccomini argues it mostly reinvents OpenAPI, gRPC & CLIs.
Resources = docs
Tools = RPC
Prompts = configs
So... could MCP have just been a JSON file?
More insights: www.dataengineeringw...
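To make the "just a JSON file" question concrete, here is a hypothetical sketch of what such a file might look like: a plain manifest listing resources (docs), tools (RPC signatures), and prompts (configs), loaded and dispatched with Python's standard library. The field names and the helper functions `load_manifest`, `call_tool`, and `render_prompt` are invented for illustration; they are not the MCP specification's schema or any SDK's API.

```python
import json

# Hypothetical manifest: field names are illustrative, NOT the MCP spec schema.
MANIFEST = """
{
  "resources": [
    {"uri": "docs://orders/schema", "description": "Orders table documentation"}
  ],
  "tools": [
    {"name": "run_query",
     "description": "Execute a read-only SQL query",
     "params": {"sql": "string"}}
  ],
  "prompts": [
    {"name": "summarize_table", "template": "Summarize the table {table}."}
  ]
}
"""


def load_manifest(text: str) -> dict:
    """Parse the manifest and check that the three sections exist as lists."""
    manifest = json.loads(text)
    for section in ("resources", "tools", "prompts"):
        if not isinstance(manifest.get(section), list):
            raise ValueError(f"missing or invalid section: {section}")
    return manifest


def call_tool(manifest: dict, name: str, handlers: dict, **kwargs):
    """Treat a tool like an RPC: find its declaration, check params, call the handler."""
    tool = next((t for t in manifest["tools"] if t["name"] == name), None)
    if tool is None:
        raise KeyError(f"unknown tool: {name}")
    missing = set(tool["params"]) - set(kwargs)
    if missing:
        raise TypeError(f"missing params: {sorted(missing)}")
    return handlers[name](**kwargs)


def render_prompt(manifest: dict, name: str, **values) -> str:
    """Treat a prompt like a config entry: a named template filled with values."""
    prompt = next(p for p in manifest["prompts"] if p["name"] == name)
    return prompt["template"].format(**values)
```

What a static file like this leaves out, such as session state, capability negotiation, and server-initiated updates, is where the debate about whether MCP adds anything new actually lives.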
How Tables Got Smarter: Iceberg → DuckLake. From static snapshots to stream-native updates and catalog-first metadata, tables are evolving fast. Choose by intent, not hype.
Subscribe → www.dataengineeringw...
Full story → medium.com/fresha-da...
I wrote my thoughts on Supporting Our AI Overlords.
25.09.2025 13:15
How Tables Grew a Brain: Iceberg → DuckLake
Snapshots → incremental → stream-native → catalog-first.
Metadata is the bottleneck.
More insights → www.dataengineeringw...
Full story → medium.com/fresha-da...
BlaBlaCar scales like a pro!
dbt Core: transform like a champ
Airflow: orchestrate effortlessly
CI/CD: deploy instantly
Dev Containers: standardized dev environments
Full story → medium.com/blablacar...
More insights → subscribe to DEW
#DataEngineering #dbt #Airflow #CICD #DevContainers
AI adoption is booming, but most data isn't ready!
AI-ready data is:
Unified
Real-time
Human-verified
Governed
Without it, AI can confidently fail. With it? Reliable, scalable results.
Read more
More insights → Data Engineering Weekly
#AI #AIReady #DataEngineering
Stripe's Real-Time Billing Analytics
Stripe wanted real-time visibility into subscriptions.
Traditional batch systems weren't fast enough.
They built a pipeline using Flink, Spark, and Pinot v2.
Now, analytics arrive in minutes, not hours. Queries return in under 300 ms.
Learn more: stripe.com/blog/how-...
More insights → Data Engineering Weekly
#DataEngineering #Stripe #RealTimeAnalytics #ApacheFlink
The 238th edition of Data Engineering Weekly is available, featuring exciting Data & AI articles.
Read more:
www.dataengineeringw...
Apache Iceberg is now entering the classic paradox.
Reference:
www.dataengineeringw...
www.warpstream.com/b...