Indeed it's not simple, unfortunately... just a way to get started quickly atm.
13.12.2024 19:55
@hachej.bsky.social
Freelance Data | Weekly Data Eng. Newsletter juhache.substack.com - 4k+ readers
❤️❤️
13.12.2024 16:47

For an Iceberg catalog, it's hard to find a simpler setup..
13.12.2024 09:07

Nice! Can you orchestrate Lambda or ECS tasks that way?
13.12.2024 08:16

Just use PyIceberg with AWS Glue, probably the fastest way to get started.
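A minimal sketch of that setup, assuming AWS credentials are already configured, the Glue database exists, and the catalog name, bucket and table names below are placeholders:

import pyarrow as pa
from pyiceberg.catalog import load_catalog

# Use AWS Glue as the Iceberg catalog: no extra catalog service to run.
catalog = load_catalog("default", type="glue", warehouse="s3://my-bucket/warehouse")

# Toy data to write.
df = pa.table({"id": [1, 2, 3], "name": ["a", "b", "c"]})

table = catalog.create_table("my_db.my_table", schema=df.schema)
table.append(df)
print(table.scan().to_arrow())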
13.12.2024 08:02

In terms of volume of data exchanged over the marketplace? No idea.
06.12.2024 21:21

SF sales rep told me that the marketplace was THE feature that helped a lot to convert.
06.12.2024 20:53

Saw it a lot in finance!
06.12.2024 20:28

Multiple pipelines will probably get tricky, no?
06.12.2024 20:24

dlt + DuckDB + Evidence + @blef.fr's baby?
06.12.2024 20:03

For those lost in GCP terminology:
I wrote a summary of Iceberg integration in GCP a couple of weeks ago:
juhache.substack.com/p/gcp-and-ic...
New blog post: Building a $0 Data Distribution System.
juhache.substack.com/p/0-data-dis...
30 million rows is just one month of data, right?
04.12.2024 14:06

Do you support Swiss banks by any chance?
01.12.2024 12:27

"bash / make knowledge, a single instance SQL processing engine (DuckDB, chDB or a few Python scripts), a distributed file system, git and a developer workflow (CI/CD)" - what's your best option to orchestrate SQL models in such a setup?
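In that spirit, the orchestration can stay a plain script that make or CI calls: one .sql file per model, executed in dependency order against DuckDB. A rough sketch; the file names and models/ layout are made up:

from pathlib import Path
import duckdb

# Hypothetical layout: models/ holds one CREATE OR REPLACE ... statement
# per file; the order of this list encodes the dependency graph.
MODELS = ["staging_orders.sql", "fct_orders.sql", "agg_daily.sql"]

con = duckdb.connect("warehouse.duckdb")
for name in MODELS:
    con.execute(Path("models", name).read_text())
    print(f"built {name}")

CREATE OR REPLACE keeps reruns idempotent, so a full rebuild is always safe.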
30.11.2024 18:48

Some learnings after helping 50+ companies with high-performance data engineering projects:
javisantana.com/2024/11/30/l...
Super good thx!
"immutable workflow + atomic operation" 100%!
ATTACH 'url_to_your_database.duckdb';
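That one-liner is DuckDB's remote attach; spelled out as a sketch (placeholder URL, needs the httpfs extension, and remote databases attach read-only):

import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")
# Attach a published .duckdb file straight over HTTPS and query it in place.
con.execute("ATTACH 'https://example.com/shared.duckdb' AS shared (READ_ONLY)")
print(con.sql("SELECT * FROM shared.some_table LIMIT 5"))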
30.11.2024 15:39

Oh yeah
wiki.postgresql.org/wiki/Don%27t...
S3 to Snowflake in the same AWS region is free, no?
30.11.2024 08:49

Niiiice, your view is doing a read_parquet(*) on their bucket? Or do you copy the data?
30.11.2024 06:44

1 Docker container embedding app code + SQLite DB → live chat app with 10k simultaneous users.
youtu.be/0rlATWBNvMw
Where is the data stored, then? In DuckDB itself?
So, if you have a 1GB dataset, does that mean you'll share a single .duckdb file containing the entire dataset? Or a view pointing to Parquet files: CREATE VIEW ... AS SELECT * FROM read_parquet('*.parquet')?
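Both options side by side, as a DuckDB sketch with placeholder paths:

import duckdb

con = duckdb.connect("share_me.duckdb")

# Option A: materialize the data inside the .duckdb file itself,
# so the file you share carries the full 1GB.
con.execute("CREATE OR REPLACE TABLE events AS SELECT * FROM read_parquet('data/*.parquet')")

# Option B: ship only a view; the file stays tiny and readers pull the
# Parquet files at query time (so they need access to that location).
con.execute("CREATE OR REPLACE VIEW events_v AS SELECT * FROM read_parquet('s3://my-bucket/data/*.parquet')")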
Do you see DuckDB as a format?
For me:
• Parquet = Standard storage format
• Iceberg = Standard metadata format
• DuckDB = One possible distribution vector
Yup, there are almost fifteen million SQLite databases on Bluesky's PDS servers. It's wildly efficient and simple, but not without trade-offs of course.
Makes sense for this use case in large part because each user's atproto repository is self-contained, with links to other repos, like a website.
just do both -> catalog.boringdata.io/dashboard/in...
28.11.2024 19:22

A screenshot of .zshrc code with the following content:

# Function to auto-activate virtual environment
function auto_activate_virtualenv() {
  if [ -d ".venv" ]; then
    source .venv/bin/activate
  fi
}

# Hook the function to the 'chpwd' event (triggered when you change directories)
autoload -U add-zsh-hook
add-zsh-hook chpwd auto_activate_virtualenv

# Also call the function for the initial directory when the terminal starts
auto_activate_virtualenv
Here's what I put in my ~/.zshrc file to make sure my virtualenv auto-activates when I move to a directory containing a .venv. Works well for me so far. Do the rest of you do something like this?
#Python #DataBS
Prediction: people will monetize custom feeds
github.com/bluesky-soci...
Something interesting is brewing in Iceberg-on-S3 land.
lists.apache.org/thread/v7x65...
cc @eatonphil.bsky.social