Vignesh Chandramohan

@vigneshc.bsky.social

Stream processing, data infra, Table formats and Pickleball. https://datapapers.substack.com/

1,212 Followers  |  87 Following  |  36 Posts  |  Joined: 03.11.2024  |  1.8453

Latest posts by vigneshc.bsky.social on Bluesky


Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First Large Language Model (LLM) agents, acting on their users' behalf to manipulate and analyze data, are likely to become the dominant workload for data systems in the future. When working with data, agen...

Related items:
arxiv.org/abs/2509.00997 - discusses agents' query patterns and how data systems should adapt.

www.malloydata.dev - another promising query language. The claim is that complex queries can be expressed more simply than in SQL, so LLMs make fewer mistakes.

17.12.2025 03:27 - 👍 5    🔁 0    💬 1    📌 0

@jayaprabhakar.bsky.social Interesting read on formal verification.

09.12.2025 16:50 - 👍 2    🔁 0    💬 0    📌 0
Iceberg Use Cases at DoorDash
YouTube video by Apache Iceberg™ Meetup

Short talk on Iceberg use cases at last week's Seattle Iceberg meetup.
youtu.be/F7qpOVVnxek?...

31.10.2025 02:42 - 👍 1    🔁 0    💬 0    📌 0

It has three increasingly verbose levels of description. They are probably trying to optimize the initial set of searches. And with Markdown-based skills, custom workflows are approachable for a broader audience compared to MCP, and probably safer too.
It is only available in Claude desktop.

31.10.2025 02:40 - 👍 2    🔁 0    💬 0    📌 0
SF Systems Meetup: Databases and Stateful Apps · Luma The SF Systems Meetup is back with a pair of talks giving us a peek at the future of state management! This month, we're excited to have talks from Almog Gavra…

If you find yourself in SF next week, @almog.xyz is talking about SlateDB at the SF Systems Meetup on Wednesday!

21.10.2025 16:45 - 👍 17    🔁 4    💬 0    📌 0

Tools, prompts, sampling: all of these seem to be the result of generalizing how the Claude Code / research features were built over time, and extracting the asks from those patterns. Uniformity is the biggest and probably only benefit.

And maybe the goal is not about solutions that spend fewer tokens?

29.09.2025 22:48 - 👍 2    🔁 0    💬 0    📌 0

Intend to follow this along. Done with chapter 1, looking forward to the next one!

28.09.2025 01:23 - 👍 1    🔁 0    💬 0    📌 0

2/ This mindset improves productivity and outcomes significantly over time.

27.09.2025 20:59 - 👍 2    🔁 0    💬 0    📌 0
Time to Adapt and Become a Better Engineer A colleague recently asked if I think AI agents could do all the jobs humans do. My answer was an unequivocal yes. Given the same sensory…

1/ Nice read:
medium.com/@xiafan/time...
Software engineer growth in the AI era:

* Build composable tools.
* Assume non-deterministic outcomes from a group of AI agents.
* Understand how LLMs work one level deeper, the way you would learn to read a query plan.

27.09.2025 20:58 - 👍 2    🔁 0    💬 1    📌 0
SQLync

Just added sqlync.com to SlateDB's adopters list! They're building a streaming system that speaks MQTT or PostgreSQL across millions of connected users and devices. 🤯

25.08.2025 19:05 - 👍 4    🔁 1    💬 0    📌 0

3/ And by extension, how it handles maintenance operations, such as vacuum on Iceberg tables, that are done out of band.

24.08.2025 14:20 - 👍 3    🔁 0    💬 0    📌 0

2/ I assume the Iceberg writes use the Iceberg open source libraries. This would ensure the write path continues to evolve with Iceberg advancements.

I don't yet know if this handles compacted topics (which would introduce deletes on Iceberg).

24.08.2025 14:19 - 👍 2    🔁 0    💬 1    📌 0
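
Context for the compacted-topic question: in a compacted Kafka topic, only the latest record per key is retained, and a record with a null value (a tombstone) marks its key as deleted. A table that mirrors such a topic therefore needs row-level deletes, not just appends. Below is a minimal Python sketch of that mapping. The names `Record` and `to_table_ops` are made up for illustration, and this is not a claim about how Tableflow or any Kafka-to-Iceberg writer actually works.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a Kafka record; value=None models a tombstone."""
    key: str
    value: Optional[str]


def to_table_ops(records: List[Record]) -> List[Tuple[str, str, Optional[str]]]:
    """Collapse a batch to the last record per key, then emit table operations."""
    latest = {}
    for rec in records:
        # Later records for the same key override earlier ones.
        latest[rec.key] = rec.value

    ops = []
    for key, value in latest.items():
        if value is None:
            # Tombstone: the row for this key must be deleted from the table.
            ops.append(("delete", key, None))
        else:
            ops.append(("upsert", key, value))
    return ops


batch = [Record("a", "1"), Record("b", "2"), Record("a", None), Record("b", "3")]
ops = to_table_ops(batch)
```

Here key `a` ends in a tombstone and key `b` ends with a new value, so the batch reduces to one delete and one upsert.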
Introducing Tableflow: Unifying Streaming and Analytics Seamlessly integrate Apache Kafka data into your lakehouse as Apache Iceberg or Delta Lake tables, bridging the operational and analytical divide, with Tableflow. Read more in our blog post.

1/ Leveraging the Remote Storage Manager and storing Kafka segments as Parquet files + Iceberg metadata is really good. It avoids having to consume, serialize and manage a separate process.
I wonder if Confluent's Tableflow, launched about a year back, has a similar design. www.confluent.io/blog/introdu...

24.08.2025 14:15 - 👍 1    🔁 0    💬 1    📌 0

Love the idea. Could some of these eventually become subprojects, hosted in the SlateDB organization as separate repos? Starting projects that have that potential as GitHub issues with a specific tag would make them easy to track.

07.07.2025 01:37 - 👍 1    🔁 0    💬 1    📌 0

Insane amount of SlateDB work going on:

- snapshot reads
- split/merge DBs (zero copy)
- deterministic simulation testing

And someone just pushed Python bindings in a PR! 🤯

18.06.2025 14:48 - 👍 9    🔁 3    💬 0    📌 0
Internals of SlateDB: An Embedded Key Value Store Built On Object Storage
YouTube video by Data Council

My Data Council talk on SlateDB.
youtu.be/gcTRXZeKbNg?...

30.05.2025 05:33 - 👍 22    🔁 3    💬 0    📌 0

Got it. So, if I wanted a view to update, say once an hour incrementally, would I create an "hourly view" that uses now() and join against it?

27.05.2025 16:53 - 👍 2    🔁 0    💬 1    📌 0

Clock tick as an input is indeed a way to model it! Would the clock tick table be joined in all views that need this property?

27.05.2025 14:45 - 👍 0    🔁 0    💬 1    📌 0

Finally got to read this.
One additional aspect of IVM is reasoning about the data in the computed view. For a lot of use cases, it is often easier to think of a view/table as moving in predictable increments (day, hour, 15 minutes, etc.). This notion is not modeled as a first-class concept in many systems.

26.05.2025 23:28 - 👍 1    🔁 0    💬 1    📌 0
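
One way to picture the "predictable increments" idea from this thread is to treat the clock tick as just another input: events accumulate continuously, but the view only publishes a bucket once a tick shows that the bucket's hour has closed. The Python sketch below is illustrative only. `HourlyCountView` and its methods are hypothetical names, and it does not describe how any particular IVM engine models time.

```python
from collections import defaultdict

HOUR = 3600  # bucket width in seconds


class HourlyCountView:
    """Counts events per hour; results become visible only on a clock tick."""

    def __init__(self):
        self._pending = defaultdict(int)  # bucket start -> count, not yet visible
        self.visible = {}                 # bucket start -> published count
        self.watermark = 0                # view is complete up to this timestamp

    def on_event(self, ts: int) -> None:
        # Assign the event to the hour bucket that contains its timestamp.
        self._pending[ts - ts % HOUR] += 1

    def on_tick(self, now: int) -> None:
        # The tick is an ordinary input: it closes every bucket that ended
        # at or before the start of the current hour.
        boundary = now - now % HOUR
        closed = sorted(b for b in self._pending if b < boundary)
        for bucket in closed:
            self.visible[bucket] = self.visible.get(bucket, 0) + self._pending.pop(bucket)
        self.watermark = max(self.watermark, boundary)


view = HourlyCountView()
view.on_event(10)
view.on_event(20)
view.on_event(3700)
view.on_tick(3599)            # first hour still open: nothing is published
before = dict(view.visible)
view.on_tick(3600)            # first hour closed: its count becomes visible
after = dict(view.visible)
```

With this shape, a reader of the view can reason in whole hours: anything below `watermark` is final for the events seen so far, and the current hour is simply not there yet.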

SlateDB 0.6.0 is out!

github.com/slatedb/slat...

Highlights include a hybrid cache (using Foyer), a lot of internal cleanup, and more groundwork for transactions.

Oh, and put performance jumped ~80% for write-heavy workloads :)

slatedb.io/performance/...

24.04.2025 19:04 - 👍 8    🔁 1    💬 0    📌 0
SlateDB - An embedded storage engine built on object storage | SlateDB

Today marks SlateDB's one year anniversary! It's been a lot of fun. Thanks to @rohanpd.bsky.social @flaneur2024.bsky.social @almog.ai @vigneshc.bsky.social @paulbutler.org Jason Gustafson, David Moravek, and many others for joining the project. 😀

22.04.2025 21:55 - 👍 16    🔁 5    💬 0    📌 1
🎂 Commonhaus Turns One - A Look Back, and the Road Ahead Commonhaus Foundation celebrates its first anniversary and lays down expectations for its future

Commonhaus is 1! 🎂
14 projects, solid foundations, and more on the way.

If you believe in light governance, shared care, and thoughtful support for open source, come see what we're building.

www.commonhaus.org/activity/253...

10.04.2025 14:05 - 👍 31    🔁 19    💬 0    📌 1

Yo SF Bay Area #databs crew, want to talk lakehouses at a real Lake House? :)

Next week after Data Council, join the founders of @clickhouse.com, @motherduck.com, @startreedata.bsky.social, and @tobikodata.com to talk real-time databases and next-generation ETL.

www.rilldata.com/events/data-...

15.04.2025 23:44 - 👍 10    🔁 3    💬 1    📌 0
Release v0.5.0 · slatedb/slatedb What's Changed Refactor Block Tests to Use Table-Driven Test Cases by @samsond in #410 Update await calls in README.md by @criccomini in #425 chore: Apply table driven test for sst.rs by @jeffreyl...

SlateDB 0.5.0 is out!

Features:
- Checkpoints
- Clones
- Read only client
- Split/merge database foundation
- TTL filtering on reads
- Last version with breaking byte format changes

By the numbers:
- 62 commits
- 2 new contributors
- 10 total contributors

github.com/slatedb/slat...

17.03.2025 17:23 - 👍 22    🔁 3    💬 2    📌 1
Building composable data systems: Why, How and Standards Standards improve interoperability. Reusable libraries built around standards drive adoption. In this post, we explore key papers and real-world examples.

datapapers.substack.com/p/building-c...

New post.

02.03.2025 19:51 - 👍 4    🔁 1    💬 0    📌 0
CALL FOR GRAND CHALLENGE SOLUTIONS DEBS2025

The DEBS conference hosts a grand challenge every year. This year's challenge is detecting outliers in a stream of images from laser powder bed fusion.
The challenge involves submitting a Kubernetes app (constraint: 2 cores, 8 GB). Interesting to try if you have the time!
2025.debs.org/call-for-gra...

23.02.2025 18:40 - 👍 1    🔁 0    💬 0    📌 0

Great episode!
Towards the end, @vanlightly.bsky.social mentions alloytools.org finding a data model bug.
I never thought of an intersection between data models and formal verification. Do you have more details on this?

15.02.2025 04:06 - 👍 0    🔁 0    💬 1    📌 0

Python Folks - which data/workflow engine has the best developer experience for packaging code? We have looked into - Modal, Beam, Airflow, Flyte, AWS Lambda, Prefect, Dagster and Spark. Haven't seen any approach which is fast, reliable and intuitive.

17.12.2024 16:09 - 👍 10    🔁 2    💬 6    📌 0
Big data quality framework: a holistic approach to continuous quality management - Journal of Big Data Big Data is an essential research area for governments, institutions, and private agencies to support their analytics decisions. Big Data refers to all about data, how it is collected, processed, and ...

What are some papers or blogs about data quality challenges?

I see tools like Great Expectations and table format features like 'check constraints' in Delta. I don't yet see data quality as a first-class property of catalogs.

Found this, are there others?
journalofbigdata.springeropen.com/articles/10....

19.01.2025 19:26 - 👍 3    🔁 1    💬 1    📌 0
Big Data Bellevue: Apache Gluten: Accelerating SparkSQL with Spark on Velox
YouTube video by BDB

Great talk by Binwei Yang on Apache Gluten last week.

youtu.be/GWTj3INSzPg?...

Apache Gluten moves execution of Spark operators to a native backend like Velox, accelerating query performance.
It has basic Iceberg support too!
github.com/apache/incub...

19.01.2025 02:06 - 👍 1    🔁 0    💬 0    📌 0

@vigneshc is following 20 prominent accounts