Indeed, it's not simple unfortunately.. just a way to get started quickly atm.
❤️❤️
For an Iceberg catalog it's hard to find a simpler setup..
Nice! Can you orchestrate lambda or ecs tasks that way?
Just use PyIceberg with AWS Glue, probably the fastest way to get started.
In terms of volume of data exchanged over the marketplace? No idea
SF sales rep told me that the marketplace was THE feature that helped a lot to convert
Saw it a lot in finance!
Multiple pipelines will probably get tricky, no?
Dlt + duck + evidence + @blef.fr's baby?
For those lost in GCP terminology:
I wrote a summary of Iceberg integration in GCP a couple of weeks ago:
juhache.substack.com/p/gcp-and-ic...
New blog post: Building a 0$ Data Distribution System.
juhache.substack.com/p/0-data-dis...
30 million rows is just one month of data, right?
Do you support Swiss banks by any chance?
"bash / make knowledge, a single instance SQL processing engine (DuckDB, CHDB or a few python scripts), a distributed file system, git and a developer workflow (CI/CD)" what's your best option to orchestrate SQL models in such a setup?
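One way the quoted stack is often wired together: a plain driver script (invoked from make or CI) runs model files in filename order against the single-instance engine. A minimal sketch, with sqlite3 standing in for DuckDB/CHDB and all file and table names invented for illustration:

```python
import pathlib
import sqlite3
import tempfile

# Sketch: orchestrate SQL models with nothing but a driver script and
# filename ordering. sqlite3 stands in for the single-instance engine
# (DuckDB/CHDB); every name below is illustrative.
def run_models(model_dir: str, db_path: str) -> None:
    conn = sqlite3.connect(db_path)
    # Models run in lexicographic order: 01_ before 02_, etc.
    for sql_file in sorted(pathlib.Path(model_dir).glob("*.sql")):
        conn.executescript(sql_file.read_text())
    conn.commit()
    conn.close()

# Two ordered models: a raw load, then an aggregate built on top of it.
model_dir = tempfile.mkdtemp()
pathlib.Path(model_dir, "01_raw.sql").write_text(
    "CREATE TABLE raw(x INT); INSERT INTO raw VALUES (1), (2);"
)
pathlib.Path(model_dir, "02_agg.sql").write_text(
    "CREATE TABLE agg AS SELECT SUM(x) AS total FROM raw;"
)
db_path = pathlib.Path(model_dir, "models.db")
run_models(model_dir, str(db_path))
```

Immutability comes from rebuilding the whole database file on each run; git plus CI/CD then handles versioning and deployment of the .sql files.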
Some learnings after helping 50+ companies with high-performance data engineering projects
javisantana.com/2024/11/30/l...
Super good thx!
"immutable workflow + atomic operation" 100%!
ATTACH 'url_to_your_database.duckdb';
Oh yeah
wiki.postgresql.org/wiki/Don%27t...
S3 to Snowflake in the same AWS region is free, no?
Niiiice, your view is doing a read_parquet(*) on their bucket? Or do you copy the data?
1 Docker container embedding app code + SQLite DB → live chat app with 10k simultaneous users.
youtu.be/0rlATWBNvMw
Where is the data stored, then? In DuckDB itself?
So, if you have a 1GB dataset, does that mean you'll share a single .duckdb file containing the entire dataset? Or a view pointing to parquet files: CREATE VIEW... as read_parquet(*.parquet)?
Do you see DuckDB as a format?
For me:
• Parquet = Standard storage format
• Iceberg = Standard metadata format
• DuckDB = One possible distribution vector
Yup, there are almost fifteen million SQLite databases on Bluesky's PDS servers. It's wildly efficient and simple but not without trade-offs of course.
Makes sense for this use case in large part because each user's atproto repository is self-contained, with links to other repos, like a website.
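The one-self-contained-database-per-user pattern described above can be sketched in a few lines. This is a hypothetical illustration, not Bluesky's actual schema; every name (`open_user_db`, `records`, "alice") is invented:

```python
import os
import sqlite3
import tempfile

# Hypothetical sketch of one self-contained SQLite file per user,
# mirroring the per-repo layout described above (invented schema).
def open_user_db(root: str, user_id: str) -> sqlite3.Connection:
    conn = sqlite3.connect(os.path.join(root, f"{user_id}.sqlite"))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (key TEXT PRIMARY KEY, value BLOB)"
    )
    return conn

root = tempfile.mkdtemp()
alice = open_user_db(root, "alice")   # one .sqlite file per user
alice.execute("INSERT INTO records VALUES (?, ?)", ("profile", b"{}"))
alice.commit()
```

Because each user's data lives in its own file, repos can be copied, backed up, or migrated between servers independently, which is what makes the design so simple to operate.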
Here's what I put in my ~/.zshrc file to make sure my virtualenv auto-activates when I move to a directory with a .venv file. Works well for me so far. Do the rest of you do something like this?
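The post above mentions a ~/.zshrc snippet without showing it. One common way such a hook is written is via zsh's `chpwd` hook; this is a sketch, not the author's actual file, and it assumes the venv was created with `python -m venv .venv`:

```shell
# Sketch of a chpwd hook that activates ./.venv on directory change.
# Not the author's actual snippet; assumes zsh and `python -m venv .venv`.
autoload -U add-zsh-hook

auto_venv() {
  if [[ -f .venv/bin/activate ]]; then
    source .venv/bin/activate
  elif [[ -n "$VIRTUAL_ENV" ]]; then
    deactivate
  fi
}

add-zsh-hook chpwd auto_venv  # run on every directory change
auto_venv                     # and once for the shell's starting directory
```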
#Python #DataBS
Prediction: people will monetize custom feeds
github.com/bluesky-soci...
Something interesting is brewing in Iceberg-on-S3 land.
lists.apache.org/thread/v7x65...
cc @eatonphil.bsky.social