Andrew Fisher

@andrewfisher.me.bsky.social

Data & software engineer. Currently building on TRM Labs’ Data Platform that fights crypto crime and processes TBs each day. Interested in databases, data engineering, data analysis, full stack engineering, 📸, and ☕️. https://www.andrewfisher.me

82 Followers  |  146 Following  |  16 Posts  |  Joined: 25.10.2024

Latest posts by andrewfisher.me on Bluesky

That blog post was well written and clear. I love the simplicity of just having the library and a DB provide durable execution guarantees. Really simplifies things. Could be useful in some of my data eng projects.

28.12.2024 11:11 - 👍 7    🔁 0    💬 1    📌 0

I wonder if GCP / BigQuery will use this as a forcing function to improve their managed iceberg offering? Exciting year in data.

04.12.2024 15:37 - 👍 5    🔁 0    💬 1    📌 0

Thanks for sharing! It will be interesting to see how much support exists for flexible partitioning (sharding) and clustering (sorting) schemes, which these tables need to be truly performant on larger datasets.

04.12.2024 15:32 - 👍 5    🔁 0    💬 1    📌 0

This is cool. I wonder if there’s a way to put a comment box directly on the blog post and use the user session (logon info) to the Bluesky app to reduce friction 🤔?

27.11.2024 14:46 - 👍 0    🔁 0    💬 0    📌 0
Incremental Jobs and Data Quality Are On a Collision Course - Part 1 - The Problem - Jack Vanlightly. Big data isn’t dead; it’s just going incremental. If you keep an eye on the data space ecosystem like I do, then you’ll be aware of the rise of DuckDB and its message that big data is dead. The idea comes from two industry papers (and associated data sets), one from the Redshift team (paper and dataset) and one from Snowflake (paper and dataset). Each paper analyzed the queries run on their platforms, and some surprising conclusions were drawn – one being that most queries were run over quite small data. The conclusion (of DuckDB) was that big data was dead, and you could use simpler query engines rather than a data warehouse. It’s far more nuanced than that, but data shows that most queries are run over smaller datasets. Why?

New blog post! Big data isn’t dead; it’s just going incremental. But bad things happen when uncontrolled changes collide with incremental jobs. Reacting to changes is a losing strategy.

jack-vanlightly.com/...

13.11.2024 14:36 - 👍 25    🔁 5    💬 7    📌 2

I wrote about this because I often need to be reminded of it. "How to use your impostor syndrome to learn anything":
davidasboth.com/impostor-syn...

20.11.2024 09:45 - 👍 3    🔁 1    💬 1    📌 1

I was not aware of that group! I hope to join y’all at a future one.

20.11.2024 20:03 - 👍 1    🔁 0    💬 0    📌 0
"Tidy First?" by Kent Beck. Kent Beck’s “Tidy First?” is a concise and engaging read, outlining several “tidyings” – small code improvements – that make software easier to understand and more adaptable to future changes. He emphasiz...

Kent Beck’s “Tidy First?” is a quick read that reinforces small incremental code improvements (tidyings): www.andrewfisher.me/development/...

18.11.2024 22:17 - 👍 1    🔁 0    💬 0    📌 0

Yucca plants at sunrise in White Sands National Park

17.11.2024 19:07 - 👍 0    🔁 0    💬 0    📌 0

I was not very active on X this past year, but I'm looking to use this more frequently going forward! Into big data, data engineering, and just a generalist at heart.

17.11.2024 18:11 - 👍 3    🔁 0    💬 0    📌 0
Feldera: Bridging Batch and Streaming with Incremental Computation Summary In this episode of the Data Engineering Podcast, the creators of Feldera talk about their incremental compute engine designed for continuous…

Feldera may be on to something with incremental compute using SQL. In 2 decades will CDC and complex streaming pipelines be a thing? www.dataengineeringpodcast.com/episodepage/...

16.11.2024 00:44 - 👍 2    🔁 0    💬 0    📌 1

Great bokeh on that first one!

13.11.2024 00:49 - 👍 1    🔁 0    💬 1    📌 0

Introduce yourself with some past jobs

- Church pianist/organist
- Construction worker
- PHP/Perl/MySQL freelancer in the early days of the web
- Tech consultant
- Customer success/support at a couple startups
- Data viz tooling (full stack)
- Data and software engineer

12.11.2024 00:55 - 👍 4    🔁 0    💬 0    📌 0

Nice long-form blog about migrating from DBT to SQLMesh. Added SQLMesh to my “to explore” list.

11.11.2024 19:35 - 👍 1    🔁 0    💬 0    📌 0

👋 Trying to be a more active participant. Really a jack of all trades personally and in the data eng space professionally.

02.11.2024 13:07 - 👍 1    🔁 0    💬 0    📌 0

I can’t speak to specific use cases but we’re using it. Massive (sorted) tables are generated; we update the catalog pointer and serve those datasets to end users, avoiding any “load” into a data store. The current metadata for Iceberg serves as an efficient “index” for range and point queries.

29.10.2024 18:33 - 👍 1    🔁 0    💬 0    📌 0

Looking forward to hearing more about how CB will support this open table format. Indexes in Postgres or via the Puffin spec would be compelling, IMO.

28.10.2024 20:11 - 👍 1    🔁 0    💬 0    📌 0

Good write-up indeed! I appreciated hearing about the gradual journey from a single service with Postgres to true federation. Lots to learn from that article.

27.10.2024 15:38 - 👍 1    🔁 0    💬 0    📌 0
