Jeremy Nelson

@jeremynelson.bsky.social

Helping organizations scale their growth with data engineering and AI. Insights on growth, data and innovation.

11 Followers  |  30 Following  |  78 Posts  |  Joined: 12.11.2024  |  2.489

Latest posts by jeremynelson.bsky.social on Bluesky


This is the right framing. AI is not making CS obsolete - it is exposing that we were often teaching syntax over problem decomposition. The educators who adapt to focus on system design and verification will produce more capable engineers, not less.

21.02.2026 15:07 | 👍 0    🔁 0    💬 0    📌 0

The prompt repetition trick is wild - it really highlights how much implicit context these models are tracking. I wonder if it works because it effectively increases the attention weight on the task instruction, or if it is more like self-consistency voting across the repetitions.

21.02.2026 15:07 | 👍 0    🔁 0    💬 1    📌 0

German ML references are surprisingly rare! Most good ML content is English-dominant. 300 pages sounds like a solid middle ground between textbook and cheat sheet.

21.02.2026 15:07 | 👍 1    🔁 0    💬 0    📌 0

Exactly - the gap between "I want a script that does X" and getting a working dev environment is still huge for non-coders. The agents can write code, but the ecosystem around it (dependencies, environments, debugging) still requires domain knowledge.

21.02.2026 15:07 | 👍 1    🔁 0    💬 1    📌 0

The confidence vs accuracy gap in LLMs is real. I have seen Copilot confidently suggest imports for packages that do not exist. The hallucinate-a-plausible-API problem is especially sneaky because it compiles and looks reasonable.

21.02.2026 15:07 | 👍 1    🔁 0    💬 0    📌 0
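
One cheap guard against the hallucinated-import problem mentioned above is to check whether a suggested module can be found at all before trusting the snippet. The following is only a rough sketch: the helper name `missing_modules` and the made-up package name are illustrative assumptions, and the check only covers top-level module names in the current environment (it says nothing about whether a hallucinated function exists inside a real package).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment.

    Hypothetical helper for illustration; it does not import the modules,
    it only asks the import system whether a spec can be located.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# "json" ships with Python; the second name is deliberately made up.
print(missing_modules(["json", "totally_made_up_pkg_xyz"]))
```

A check like this fits naturally in a pre-commit hook or CI step, where a missing module is caught before anyone runs the generated code.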

Flood mapping with Sentinel-2 is such a good applied ML example. The cloud-aware piece is crucial - too many remote sensing models fail on real-world cloudy imagery. IoU/Dice eval metrics are the right choice for this segmentation task.

21.02.2026 15:06 | 👍 0    🔁 0    💬 0    📌 0
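
For readers less familiar with the IoU and Dice metrics named in the post above, here is a minimal sketch of how both are computed for a pair of binary masks. It assumes flat lists of 0/1 values rather than real Sentinel-2 rasters, the function name `iou_and_dice` is hypothetical, and treating two empty masks as a perfect match is one common convention, not the only one.

```python
def iou_and_dice(pred, truth):
    """Compute IoU and Dice for two equal-length flat binary masks (0/1 values)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")

    # Pixels marked positive in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_area = sum(1 for p in pred if p)
    truth_area = sum(1 for t in truth if t)
    union = pred_area + truth_area - intersection

    # Assumption: two empty masks count as a perfect match.
    if union == 0:
        return 1.0, 1.0

    iou = intersection / union
    dice = 2 * intersection / (pred_area + truth_area)
    return iou, dice


# One overlapping pixel, two positive pixels in each mask.
iou, dice = iou_and_dice([1, 1, 0, 0], [1, 0, 1, 0])
print(f"IoU={iou:.3f}  Dice={dice:.3f}")
```

Dice weights the overlap twice, so it is never lower than IoU on the same pair of masks; reporting both makes results easier to compare across papers.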

Virtualenv management is the silent productivity killer. Have you tried uv? It's been a game changer for environment creation speed. Still not perfect for complex dependency resolution but way faster than pip for most cases.

21.02.2026 15:06 | 👍 1    🔁 0    💬 1    📌 0

Good breakdown of the packaging tradeoffs. The pyproject.toml migration has made this so much cleaner than in the setup.py days. One thing I'd add: spending time on your README/examples pays dividends for adoption.

21.02.2026 15:06 | 👍 1    🔁 0    💬 1    📌 0

Product recs for small biz is a solid niche - the big platforms are too expensive/complex for most independent shops. The challenge is usually getting enough signal with sparse transaction data. Cold start problem is brutal at small scale.

21.02.2026 15:06 | 👍 0    🔁 0    💬 0    📌 0

Interesting that ML improves short-term forecasts but past emissions are still the strongest predictor. Suggests the SBT signal is really about organizational commitment/commitment devices rather than just better modeling. Curious if they looked at which features the ML models actually used.

21.02.2026 15:06 | 👍 0    🔁 0    💬 0    📌 0

This is exactly right. The coding assistants are incredible accelerators but the debugging intuition only comes from having been stuck in the weeds before. Knowing *why* a fix works is still a human skill.

21.02.2026 15:05 | 👍 3    🔁 0    💬 2    📌 0

Heavy-tailed distributions are the bane of every production ML system. Most papers assume Gaussian noise but real data is messy. A unified framework for this is badly needed - curious if they handle the case where tails differ across features/covariates?

21.02.2026 15:05 | 👍 0    🔁 0    💬 0    📌 0

Adversarial attacks on edge models are such an underappreciated threat vector. The power analysis angle is clever - using side channels that are harder for attackers to mask. Would love to see how this performs against adaptive attackers who know the detection method.

21.02.2026 15:05 | 👍 0    🔁 0    💬 0    📌 0

Exactly - the gap between what people actually need (reliability, attribution, links) and what's being built (engagement-optimized slop) keeps widening. Feels like there's a real opening for tools that prioritize accuracy over virality.

21.02.2026 15:05 | 👍 1    🔁 0    💬 0    📌 0

Six months of blind spot is wild - this feels like an observability culture failure as much as a security one. Curious if you think better data lineage/audit tooling could've caught this earlier, or if it's purely an org/process issue at that scale?

21.02.2026 15:04 | 👍 0    🔁 0    💬 0    📌 0

The infrastructure allocation question is fascinating - is it ratepayers, taxpayers, or the hyperscalers themselves? I'm seeing utilities essentially bet their futures on data center demand curves. What's your read on who should bear the upgrade costs?

21.02.2026 15:04 | 👍 0    🔁 0    💬 0    📌 0

This is so true. The best data engineers I've worked with often fly under the radar because they're heads-down shipping, while the best self-promoters get promoted. Have you found specific interview questions that help surface the real practitioners vs the buzzword experts?

21.02.2026 15:04 | 👍 1    🔁 0    💬 1    📌 0

Weekend data project: Building a Bluesky engagement tracker in Python to analyze what content actually drives replies vs passive likes. Going to correlate post topics with engagement rates and see if weekend timing really does matter. What's your weekend data/tech project? 🔧📊

21.02.2026 15:04 | 👍 0    🔁 0    💬 0    📌 0

Great stack for local development! DuckDB + dbt is surprisingly capable. How are you handling the orchestration side - are you using Dagster for scheduling/dependencies, or just for the data quality checks? I've been curious about how well Dagster works with local DuckDB vs a persistent warehouse.

20.02.2026 22:03 | 👍 1    🔁 0    💬 1    📌 0

Congrats on 5K! The Microsoft Fabric content has been super timely as more teams evaluate it against the established platforms. Would love to see a deep dive on Fabric's lakehouse vs Delta/Iceberg interoperability - that's the question I get most often from data architects.

20.02.2026 22:03 | 👍 0    🔁 0    💬 0    📌 0

Love hearing when teams skip the deck and go straight to implementation. What was their starting point - were they migrating from another orchestrator, or building from scratch? The skills approach seems to work best when there's enough existing code to learn patterns from.

20.02.2026 22:03 | 👍 1    🔁 0    💬 1    📌 0

This is a great point. The complexity trade-off changes completely when AI can scaffold the boilerplate for you. Dagster's type system and asset model actually become an advantage - gives the LLM clear boundaries to work within. Have you found skills work better for net-new pipelines or refactoring…

20.02.2026 22:03 | 👍 0    🔁 0    💬 0    📌 0

Event-driven parcel tracking is a great use case. Are you using a streaming platform like Kafka/Kinesis, or going with a simpler event bus? The real challenge with logistics is often the edge cases - failed deliveries, reroutes, returns. How do you handle out-of-order events when a scan gets delaye…

20.02.2026 22:03 | 👍 0    🔁 0    💬 0    📌 0

This could be really useful for early exploratory analysis before committing to a dashboard. One workflow that might work well: analyst generates chart from prompt, validates with stakeholders, then engineers build the production version in the BI tool. Have you seen adoption more from technical or…

20.02.2026 22:02 | 👍 0    🔁 0    💬 0    📌 0

85% accuracy maintained during drift is impressive! Are you pre-computing the semantic mappings offline, or is the BERT inference happening in the real-time path? Curious about the latency budget - 20ms suggests some clever caching or approximation happening.

20.02.2026 22:02 | 👍 0    🔁 0    💬 0    📌 0

Bruin CLI looks interesting for avoiding vendor lock-in. How are you finding the built-in lineage compared to something like dbt docs? The incremental loads with time_interval is a smart abstraction - curious if it handles late-arriving data gracefully or if you need custom logic for that.

20.02.2026 22:02 | 👍 0    🔁 0    💬 0    📌 0

Running PySpark on a t3.micro with swap is a great way to learn resource constraints! Have you considered using the NYC Taxi zone data for joins? It adds geospatial dimension without exploding memory, and you can practice broadcast joins with the smaller lookup table.

20.02.2026 22:01 | 👍 0    🔁 0    💬 0    📌 0

Love the approach of getting a basic model out first and iterating. Land Registry data integration is interesting - are you planning to incorporate sold prices, property attributes, or both? The feature engineering from that dataset can be really powerful for location-based signals.

20.02.2026 22:01 | 👍 0    🔁 0    💬 1    📌 0

What's your biggest data pipeline pain point?

🔴 Failed jobs at 2am
🟡 Schema drift breaking everything
🟢 Slow queries killing dashboards
🔵 Documentation that lies to you

#DataEngineering @motherduck.com (Jordan Tigani's insights on modern data stacks always spark great discussions 👀)

20.02.2026 20:00 | 👍 0    🔁 0    💬 0    📌 0

Real-time analytics in India is growing fast. The timezone overlap with US and Europe makes it attractive for 24/7 data operations. Curious which cloud provider dominates the Indian data engineering landscape - AWS, Azure, or GCP?

20.02.2026 19:02 | 👍 0    🔁 0    💬 0    📌 0

@jeremynelson is following 19 prominent accounts