
Julien Hurault

@hachej.bsky.social

Freelance Data | Weekly Data Eng. Newsletter 📨 juhache.substack.com - 4k+ readers

119 Followers  |  298 Following  |  45 Posts  |  Joined: 08.11.2024  |  2.4192

Latest posts by hachej.bsky.social on Bluesky

Indeed, it's not simple unfortunately... just a way to get started quickly atm.

13.12.2024 19:55 - 👍 2    🔁 0    💬 0    📌 0

โค๏ธโค๏ธ

13.12.2024 16:47 โ€” ๐Ÿ‘ 0    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0

For an Iceberg catalog, it's hard to find a simpler setup...

13.12.2024 09:07 - 👍 1    🔁 0    💬 1    📌 0

Nice! Can you orchestrate Lambda or ECS tasks that way?

13.12.2024 08:16 - 👍 1    🔁 0    💬 1    📌 0

Just use PyIceberg with AWS Glue, probably the fastest way to get started.

13.12.2024 08:02 - 👍 2    🔁 0    💬 1    📌 0

In terms of the volume of data exchanged over the marketplace? No idea.

06.12.2024 21:21 - 👍 1    🔁 0    💬 0    📌 0

An SF sales rep told me that the marketplace was THE feature that helped a lot to convert.

06.12.2024 20:53 - 👍 3    🔁 0    💬 1    📌 0

Saw it a lot in finance!

06.12.2024 20:28 - 👍 3    🔁 0    💬 1    📌 0

Multiple pipelines will probably get tricky, no?

06.12.2024 20:24 - 👍 0    🔁 0    💬 1    📌 0

Dlt + duck + evidence + @blef.fr's baby?

06.12.2024 20:03 - 👍 3    🔁 0    💬 1    📌 0
GCP & Iceberg | Ju Data Engineering Weekly - Ep 77

For those lost in GCP terminology:
I wrote a summary of Iceberg integration in GCP a couple of weeks ago:
juhache.substack.com/p/gcp-and-ic...

06.12.2024 19:46 - 👍 1    🔁 0    💬 0    📌 0
0$ Data Distribution | Ju Data Engineering Weekly - Ep 78

New blog post: Building a 0$ Data Distribution System.

juhache.substack.com/p/0-data-dis...

06.12.2024 19:44 - 👍 5    🔁 0    💬 0    📌 0

30 million rows is just one month of data, right?

04.12.2024 14:06 - 👍 0    🔁 0    💬 1    📌 0

check catalog.boringdata.io/dashboard/in...

02.12.2024 11:40 - 👍 2    🔁 0    💬 0    📌 0

Do you support Swiss banks by any chance?

01.12.2024 12:27 - 👍 2    🔁 0    💬 0    📌 0

"bash / make knowledge, a single instance SQL processing engine (DuckDB, CHDB or a few python scripts), a distributed file system, git and a developer workflow (CI/CD)" What's your best option to orchestrate SQL models in such a setup?

30.11.2024 18:48 - 👍 0    🔁 0    💬 1    📌 0
javisantana.com

Some learnings after helping +50 companies in high performance data engineering projects

javisantana.com/2024/11/30/l...

30.11.2024 12:17 - 👍 38    🔁 12    💬 4    📌 3

Super good thx!
"immutable workflow + atomic operation" 100%!

30.11.2024 18:25 - 👍 1    🔁 0    💬 1    📌 0

ATTACH 'url_to_your_database.duckdb';

30.11.2024 15:39 - 👍 1    🔁 0    💬 1    📌 0

Oh yeah

wiki.postgresql.org/wiki/Don%27t...

30.11.2024 08:59 - 👍 3    🔁 0    💬 0    📌 0

S3 to Snow in the same AWS region is free, no?

30.11.2024 08:49 - 👍 0    🔁 0    💬 1    📌 0

Niiiice, your view is doing a read_parquet(*) on their bucket? Or do you copy the data?

30.11.2024 06:44 - 👍 0    🔁 0    💬 1    📌 0
YouTube video by Aaron Francis: DHH discusses SQLite (and Stoicism)

1 Docker container embedding app code + SQLite DB → live chat app with 10k simultaneous users.

youtu.be/0rlATWBNvMw

29.11.2024 15:25 - 👍 1    🔁 0    💬 0    📌 0

Where is the data stored, then? In DuckDB itself?
So, if you have a 1GB dataset, does that mean you'll share a single .duckdb file containing the entire dataset? Or rather a view pointing to Parquet files: CREATE VIEW ... AS SELECT * FROM read_parquet('*.parquet')?

29.11.2024 15:13 - 👍 0    🔁 0    💬 0    📌 0
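
The two options raised in the question above (materialising the data inside the .duckdb file versus shipping only a view over Parquet files) differ by a single keyword in the DDL. Below is a minimal sketch that only builds the two statements as strings; the helper names, the table name `trips` and the bucket path are hypothetical, and nothing here is taken from the thread itself.

```python
def quote_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def view_over_parquet(name: str, parquet_glob: str) -> str:
    """Option 1: the database file stores only a view; data stays in Parquet."""
    return (
        f"CREATE VIEW {quote_ident(name)} AS "
        f"SELECT * FROM read_parquet({quote_literal(parquet_glob)});"
    )


def table_from_parquet(name: str, parquet_glob: str) -> str:
    """Option 2: the rows are copied into the database file itself."""
    return (
        f"CREATE TABLE {quote_ident(name)} AS "
        f"SELECT * FROM read_parquet({quote_literal(parquet_glob)});"
    )


if __name__ == "__main__":
    # Hypothetical location; replace with a real bucket or local glob.
    source = "s3://example-bucket/trips/*.parquet"
    print(view_over_parquet("trips", source))
    print(table_from_parquet("trips", source))
```

With the view, the shared .duckdb file stays tiny and readers fetch the Parquet files at query time; with the table, the file carries the full dataset. Either statement can be passed to a DuckDB connection's `execute` method once the source files exist.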

Do you see DuckDB as a format?

For me:
• Parquet = Standard storage format
• Iceberg = Standard metadata format
• DuckDB = One possible distribution vector

29.11.2024 08:25 - 👍 0    🔁 0    💬 1    📌 0

Yup, there are almost fifteen million SQLite databases on Bluesky's PDS servers. It's wildly efficient and simple but not without trade-offs of course.

Makes sense for this use case in large part because each user's atproto repository is self-contained, with links to other repos, like a website.

11.11.2024 06:51 - 👍 39    🔁 7    💬 2    📌 4
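
To make the "one SQLite database per user" idea concrete, here is a minimal sketch using only Python's standard library. It is not Bluesky's actual PDS implementation: the file layout, the `records` table and every function name are assumptions made up for illustration.

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import Optional


def repo_path(root: Path, user_id: str) -> Path:
    """Map a user id to its own database file (hypothetical layout)."""
    # Replace anything that is not alphanumeric so the id is a safe file name.
    safe = "".join(c if c.isalnum() else "_" for c in user_id)
    return root / f"{safe}.sqlite"


def open_repo(root: Path, user_id: str) -> sqlite3.Connection:
    """Open (and create if needed) the self-contained database of one user."""
    con = sqlite3.connect(repo_path(root, user_id))
    con.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        " rkey TEXT PRIMARY KEY,"
        " body TEXT NOT NULL,"
        " links_to TEXT)"  # optional reference to a record in another repo
    )
    return con


def add_record(con: sqlite3.Connection, rkey: str, body: str,
               links_to: Optional[str] = None) -> None:
    """Insert one record; the 'with' block commits or rolls back atomically."""
    with con:
        con.execute(
            "INSERT INTO records (rkey, body, links_to) VALUES (?, ?, ?)",
            (rkey, body, links_to),
        )


def count_records(con: sqlite3.Connection) -> int:
    """Return the number of records stored in this user's repository."""
    return con.execute("SELECT COUNT(*) FROM records").fetchone()[0]


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    alice = open_repo(root, "alice.example")
    bob = open_repo(root, "bob.example")
    add_record(alice, "1", "hello")
    # Bob's record links to Alice's repo by name, like a hyperlink.
    add_record(bob, "1", "reply", links_to="alice.example/1")
    print(sorted(p.name for p in root.glob("*.sqlite")))
```

Because each file is independent, a user's data can be moved, backed up or deleted by handling a single file, which is one reading of the "like a website" comparison; cross-user queries are the obvious trade-off.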

just do both -> catalog.boringdata.io/dashboard/in...

28.11.2024 19:22 - 👍 1    🔁 0    💬 1    📌 0
A screenshot of .zshrc code with the following content:

# Function to auto-activate virtual environment
function auto_activate_virtualenv() {
    if [ -d ".venv" ]; then
        source .venv/bin/activate
    fi
}

# Hook the function to the 'chpwd' event (triggered when you change directories)
autoload -U add-zsh-hook
add-zsh-hook chpwd auto_activate_virtualenv

# Also call the function for the initial directory when the terminal starts
auto_activate_virtualenv


Here's what I put in my ~/.zshrc file to make sure my virtualenv auto-activates when I move to a directory containing a .venv directory. Works well for me so far. Do the rest of you do something like this?

#Python #DataBS

27.11.2024 21:29 - 👍 38    🔁 2    💬 8    📌 0

Prediction: people will monetize custom feeds
github.com/bluesky-soci...

28.11.2024 18:53 - 👍 0    🔁 0    💬 0    📌 0

Something interesting is brewing in Iceberg-on-S3 land. 👀

lists.apache.org/thread/v7x65...

cc @eatonphil.bsky.social

26.11.2024 19:26 - 👍 29    🔁 5    💬 3    📌 2

@hachej is following 20 prominent accounts