Vignesh Chandramohan

@vigneshc.bsky.social

Stream processing, data infra, Table formats and Pickleball. https://datapapers.substack.com/

1,195 Followers  |  86 Following  |  25 Posts  |  Joined: 03.11.2024  |  2.1247

Latest posts by vigneshc.bsky.social on Bluesky

Love the idea. Could some of these eventually become sub projects, and hosted in the SlateDB organization as a separate repo? Starting projects that have that potential as GitHub issues with a specific tag would make it easy to track.

07.07.2025 01:37 - 👍 1    🔁 0    💬 1    📌 0

Insane amount of SlateDB work going on:

- snapshot reads
- split/merge DBs (zero copy)
- deterministic simulation testing

And someone just pushed Python bindings in a PR! 🤯

18.06.2025 14:48 - 👍 9    🔁 3    💬 0    📌 0
Internals of SlateDB: An Embedded Key Value Store Built On Object Storage
YouTube video by Data Council

My Data Council talk on SlateDB.
youtu.be/gcTRXZeKbNg?...

30.05.2025 05:33 - 👍 22    🔁 3    💬 0    📌 0

Got it. So, if I wanted a view to update incrementally, say once an hour, would I create an "hourly view" that uses now() and join against it?

27.05.2025 16:53 - 👍 2    🔁 0    💬 1    📌 0

Clock tick as an input is indeed a way to model it! Would the clock tick table be joined in all views that need this property?

27.05.2025 14:45 - 👍 0    🔁 0    💬 1    📌 0

Finally got to read this.
One additional aspect of IVM is reasoning about the data in the computed view. For a lot of use cases, it is easy to think of a view/table as moving in predictable increments (day, hour, 15 minutes, etc.). This notion is not modeled as a first-class concept in many systems.

26.05.2025 23:28 - 👍 1    🔁 0    💬 1    📌 0
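
The thread above floats two ideas: a view that advances in fixed increments, and a clock tick treated as just another input. A minimal sketch of how those could fit together is below. It is an illustration only, not how any particular IVM engine works; `HourlyView`, `on_event`, `on_tick`, and `hour_floor` are hypothetical names, and late-arriving events are ignored for simplicity.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def hour_floor(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


class HourlyView:
    """Hypothetical view that only exposes fully closed hourly buckets."""

    def __init__(self) -> None:
        self._pending = defaultdict(int)  # bucket start -> running count
        self.visible = {}                 # bucket start -> final count

    def on_event(self, ts: datetime) -> None:
        # Incrementally fold each event into its (not yet visible) bucket.
        self._pending[hour_floor(ts)] += 1

    def on_tick(self, now: datetime) -> None:
        # The clock tick is an ordinary input: publish every bucket
        # whose hour ended at or before `now`.
        for start in sorted(self._pending):
            if start + timedelta(hours=1) <= now:
                self.visible[start] = self._pending.pop(start)


view = HourlyView()
t0 = datetime(2025, 5, 26, 10, 0, tzinfo=timezone.utc)
view.on_event(t0 + timedelta(minutes=5))
view.on_event(t0 + timedelta(minutes=50))
view.on_event(t0 + timedelta(minutes=70))   # falls in the 11:00 bucket
view.on_tick(t0 + timedelta(minutes=75))    # tick at 11:15 closes only 10:00
```

Readers of `view.visible` see the 10:00 bucket only after a tick has passed 11:00, so the view moves in whole-hour steps even though events are folded in as they arrive.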

SlateDB 0.6.0 is out!

github.com/slatedb/slat...

Highlights include a hybrid cache (using Foyer), a lot of internal cleanup, and more groundwork for transactions.

Oh, and put performance jumped ~80% for write-heavy workloads :)

slatedb.io/performance/...

24.04.2025 19:04 - 👍 8    🔁 1    💬 0    📌 0
SlateDB - An embedded storage engine built on object storage | SlateDB

Today marks SlateDB's one year anniversary! It's been a lot of fun. Thanks to @rohanpd.bsky.social @flaneur2024.bsky.social @almog.ai @vigneshc.bsky.social @paulbutler.org Jason Gustafson, David Moravek, and many others for joining the project. 😀

22.04.2025 21:55 - 👍 15    🔁 5    💬 0    📌 1
🎂 Commonhaus Turns One - A Look Back, and the Road Ahead Commonhaus Foundation celebrates its first anniversary and lays down expectations for its future

Commonhaus is 1! 🎂
14 projects, solid foundations, and more on the way.

If you believe in light governance, shared care, and thoughtful support for open source, come see what we're building.

www.commonhaus.org/activity/253...

10.04.2025 14:05 - 👍 31    🔁 19    💬 0    📌 1

Yo SF Bay Area #databs crew, want to talk lakehouses at a real Lake House? :)

Next week after Data Council, join the founders of @clickhouse.com, @motherduck.com, @startreedata.bsky.social, and @tobikodata.com to talk real-time databases and next-generation ETL.

www.rilldata.com/events/data-...

15.04.2025 23:44 - 👍 9    🔁 3    💬 1    📌 0
Release v0.5.0 · slatedb/slatedb What's Changed Refactor Block Tests to Use Table-Driven Test Cases by @samsond in #410 Update await calls in README.md by @criccomini in #425 chore: Apply table driven test for sst.rs by @jeffreyl...

SlateDB 0.5.0 is out!

Features:
- Checkpoints
- Clones
- Read only client
- Split/merge database foundation
- TTL filtering on reads
- Last version with breaking byte format changes

By the numbers:
- 62 commits
- 2 new contributors
- 10 total contributors

github.com/slatedb/slat...

17.03.2025 17:23 - 👍 22    🔁 3    💬 2    📌 1
Building composable data systems: Why, How and Standards Standards improve interoperability. Reusable libraries built around standards drive adoption. In this post, we explore key papers and real-world examples.

datapapers.substack.com/p/building-c...

New post.

02.03.2025 19:51 - 👍 3    🔁 1    💬 0    📌 0
CALL FOR GRAND CHALLENGE SOLUTIONS DEBS2025

The DEBS conference hosts a grand challenge every year. This year's challenge is detecting outliers in a stream of images from laser powder bed fusion.
The challenge involves submitting a Kubernetes app (constraint: 2 cores, 8 GB). Interesting to try if you have the time!
2025.debs.org/call-for-gra...

23.02.2025 18:40 - 👍 1    🔁 0    💬 0    📌 0

Great episode!
Towards the end, @vanlightly.bsky.social mentions alloytools.org finding a data model bug.
I never thought of an intersection between data modeling and formal verification. Do you have more details on this?

15.02.2025 04:06 - 👍 0    🔁 0    💬 1    📌 0

Python Folks - which data/workflow engine has the best developer experience for packaging code? We have looked into - Modal, Beam, Airflow, Flyte, AWS Lambda, Prefect, Dagster and Spark. Haven't seen any approach which is fast, reliable and intuitive.

17.12.2024 16:09 - 👍 9    🔁 2    💬 6    📌 0
Big data quality framework: a holistic approach to continuous quality management - Journal of Big Data Big Data is an essential research area for governments, institutions, and private agencies to support their analytics decisions. Big Data refers to all about data, how it is collected, processed, and ...

What are some papers or blogs about data quality challenges?

I see tools like Great Expectations and table format features like 'check constraints' in Delta. I don't yet see it as a first-class property of catalogs.

Found this, are there others?
journalofbigdata.springeropen.com/articles/10....

19.01.2025 19:26 - 👍 3    🔁 1    💬 1    📌 0
Big Data Bellevue: Apache Gluten: Accelerating SparkSQL with Spark on Velox
YouTube video by BDB

Great talk by Binwei Yang on Apache Gluten last week.

youtu.be/GWTj3INSzPg?...

Apache Gluten moves execution of Spark operators to a native backend like Velox, accelerating query performance.
It has basic Iceberg support too!
github.com/apache/incub...

19.01.2025 02:06 - 👍 1    🔁 0    💬 0    📌 0

This book was on my list for the year, joining!

11.01.2025 19:20 - 👍 2    🔁 0    💬 0    📌 0

SlateDB 0.4.0 is out!

Features:
- Range scans
- No DynamoDB needed for S3
- Nightly perf tests
- Merge operator groundwork
- GC improvements

By the numbers:
- 57 commits
- 5 new contributors
- 11 total contributors

github.com/slatedb/slat...

31.12.2024 18:50 - 👍 26    🔁 4    💬 3    📌 0

Finally, the blog from 2023 says it isn't used in production yet. Any recent data points on production experience that can be shared now? :)

15.12.2024 21:09 - 👍 1    🔁 0    💬 1    📌 0

What do you think about Flink materialized views or dynamic tables? Can the Hoptimator concept (i.e. provision the Flink and other internal + external connectors to do what the user asked) be part of Flink eventually?

15.12.2024 21:08 - 👍 0    🔁 0    💬 1    📌 0
Declarative Data Pipelines with Hoptimator

I finally understood what it is after reading this blog :)
www.linkedin.com/blog/enginee...

Developer experience / testing is one of the hard aspects of declarative pipelines. Pipelines, while they take more steps, are more predictable. Is there a snappy preview built in to mitigate some of this?

15.12.2024 21:06 - 👍 0    🔁 0    💬 1    📌 0

Great writeup!
Did not realize Flink CDC is ultimately just another Flink job, and that it can use any of the Debezium source connectors!

On Iceberg, I do see a Flink connector in the Iceberg project. What needs to happen to make Flink Iceberg work with Flink CDC out of the box (on par with Kafka)?

12.12.2024 15:46 - 👍 1    🔁 0    💬 0    📌 0

Great write up!
Looking at the APIs, they only deal with the catalog aspect. Writing the manifest file etc. is still the responsibility of the client. Are writes in systems like the Rust client and Materialize difficult because of the lack of these APIs, or is writing metadata etc. hard as well?

04.12.2024 13:01 - 👍 1    🔁 0    💬 1    📌 0

I don't understand it either. The documentation has missing parts. The example shows a Spark package, specifically for S3 Tables. It might be an implementation of the catalog. The APIs that back this implementation are not in the documentation yet.

03.12.2024 20:09 - 👍 2    🔁 0    💬 0    📌 0
Apache XTable™ (Incubating) Apache XTable™ (Incubating) is a cross-table interop of lakehouse table formats Apache Hudi, Apache Iceberg, and Delta Lake. Apache XTable™ is NOT a new or separate format, Apache XTable™ provides abs...

I wonder if Fluss would extend and use XTable, since it deals with multiple table formats.
xtable.apache.org

29.11.2024 19:59 - 👍 1    🔁 0    💬 0    📌 0

Paimon is already the second version, with the first one being Flink Table Store. Paimon documentation explicitly says not to use it without a cluster framework like Flink, unlike Iceberg and Delta, which are building kernel libraries. There is no mention of Paimon in Fluss, so it is likely not an evolution.

29.11.2024 19:47 - 👍 1    🔁 0    💬 0    📌 0
Vortex: A Stream-oriented Storage Engine For Big Data Analytics

I think it plays into a similar space as the following

research.google/pubs/vortex-... storage API and background optimizations to iceberg

www.confluent.io/blog/introdu... - likely the backend of this solves some of the same problems.

29.11.2024 19:45 - 👍 4    🔁 1    💬 1    📌 0

Were the post-commit tests sequential, or did they combine multiple PRs together? Were there issues related to having to revert multiple commits due to an issue with one?

26.11.2024 17:32 - 👍 2    🔁 0    💬 1    📌 0

I recently started reading about Iceberg. Would using something like SlateDB with a custom schema/convention for storing the catalog info work?

26.11.2024 06:48 - 👍 2    🔁 0    💬 1    📌 0

@vigneshc is following 20 prominent accounts