Andrew Fisher @andrewfisher.me

That blog post was well written and clear. I love the simplicity of just having the library and a DB provide durable execution guarantees. Really simplifies things. Could be useful in some of my data eng projects.

28.12.2024 11:11 — 👍 7 🔁 0 💬 1 📌 0

I wonder if GCP / BigQuery will use this as a forcing function to improve their managed Iceberg offering? Exciting year in data.

04.12.2024 15:37 — 👍 5 🔁 0 💬 1 📌 0

Thanks for sharing! It will be interesting to see how much support exists for flexible partitioning (sharding) and clustering (sorting) schemes to truly make these tables performant enough on larger datasets.

04.12.2024 15:32 — 👍 5 🔁 0 💬 1 📌 0

This is cool. I wonder if there’s a way to put a comment box directly on the blog post and use the user’s Bluesky session (login info) to reduce friction? 🤔

27.11.2024 14:46 — 👍 0 🔁 0 💬 0 📌 0

Incremental Jobs and Data Quality Are On a Collision Course - Part 1 - The Problem - Jack Vanlightly

Big data isn’t dead; it’s just going incremental

If you keep an eye on the data space ecosystem like I do, then you’ll be aware of the rise of DuckDB and its message that big data is dead. The idea comes from two industry papers (and associated data sets), one from the Redshift team (paper and dataset) and one from Snowflake (paper and dataset). Each paper analyzed the queries run on their platforms, and some surprising conclusions were drawn – one being that most queries were run over quite small data. The conclusion (of DuckDB) was that big data was dead, and you could use simpler query engines rather than a data warehouse. It’s far more nuanced than that, but data shows that most queries are run over smaller datasets. Why?

New blog post! Big data isn’t dead; it’s just going incremental. But bad things happen when uncontrolled changes collide with incremental jobs. Reacting to changes is a losing strategy.

jack-vanlightly.com/...

13.11.2024 14:36 — 👍 25 🔁 5 💬 7 📌 2

I wrote about this because I often need to be reminded of it. "How to use your impostor syndrome to learn anything":
davidasboth.com/impostor-syn...

20.11.2024 09:45 — 👍 3 🔁 1 💬 1 📌 1

I was not aware of that group! I hope to join y’all at a future one.

20.11.2024 20:03 — 👍 1 🔁 0 💬 0 📌 0

"Tidy First?" by Kent Beck

Kent Beck’s “Tidy First?” is a concise and engaging read, outlining several “tidyings”–small code improvements–that make software easier to understand and more adaptable to future changes. He emphasiz...

Kent Beck’s “Tidy First?” is a quick read that reinforces the value of small, incremental code improvements (tidyings): www.andrewfisher.me/development/...

18.11.2024 22:17 — 👍 1 🔁 0 💬 0 📌 0

Yucca plants at sunrise in White Sands National Park

17.11.2024 19:07 — 👍 0 🔁 0 💬 0 📌 0

I was not very active on X over the past year, but I’m looking to use this more frequently going forward! Into big data, data engineering, and just a generalist at heart.

17.11.2024 18:11 — 👍 3 🔁 0 💬 0 📌 0

Feldera: Bridging Batch and Streaming with Incremental Computation

Summary: In this episode of the Data Engineering Podcast, the creators of Feldera talk about their incremental compute engine designed for continuous…

Feldera may be on to something with incremental compute using SQL. In two decades, will CDC and complex streaming pipelines still be a thing? www.dataengineeringpodcast.com/episodepage/...

16.11.2024 00:44 — 👍 2 🔁 0 💬 0 📌 1

Great bokeh on that first one!

13.11.2024 00:49 — 👍 1 🔁 0 💬 1 📌 0

Introduce yourself with some past jobs

- Church pianist/organist
- Construction worker
- PHP/Perl/MySQL freelancer in the early days of the web
- Tech consultant
- Customer success/support at a couple startups
- Data viz tooling (full stack)
- Data and software engineer

12.11.2024 00:55 — 👍 4 🔁 0 💬 0 📌 0

Nice long-form blog post about migrating from dbt to SQLMesh. Added SQLMesh to my “to explore” list.

11.11.2024 19:35 — 👍 1 🔁 0 💬 0 📌 0

👋 Trying to be a more active participant. Really a jack of all trades personally and in the data eng space professionally.

02.11.2024 13:07 — 👍 1 🔁 0 💬 0 📌 0

I can’t speak to specific use cases but we’re using it. Massive (sorted) tables are generated; we update the catalog pointer and serve those datasets to end users, avoiding any “load” into a data store. The current metadata for Iceberg serves as an efficient “index” for range and point queries.

29.10.2024 18:33 — 👍 1 🔁 0 💬 0 📌 0

Looking forward to hearing more about how CB will support this open table format. Indexes in Postgres or via the Puffin spec would be compelling, IMO.

28.10.2024 20:11 — 👍 1 🔁 0 💬 0 📌 0

Good write-up indeed! I appreciated hearing about the gradual journey from a single service with Postgres to true federation. Lots to learn from that article.

27.10.2024 15:38 — 👍 1 🔁 0 💬 0 📌 0

Latest posts by andrewfisher.me on Bluesky
