
Jeff Smith

@jeffsmith.tech.bsky.social

Building AI things. Weaving textiles. Writing books and articles about both. https://www.jeffsmith.tech/

645 Followers  |  2,689 Following  |  214 Posts  |  Joined: 15.11.2024

Latest posts by jeffsmith.tech on Bluesky

The field is absolutely going to find a better solution for model selection than we have today. The current environment is so cluttered and wasteful, and it impedes productivity. There has to be a better way. And there will be one. #GenAI is too impactful for us to not solve this one.

10.04.2025 15:01 · 👍 3    🔁 0    💬 0    📌 0

It's not just benchmark grinding. There are also all sorts of open secrets devs are constantly adjusting for, e.g., most long-context model performance is actually still a WIP, with real results decaying over longer input and output sizes.

10.04.2025 15:01 · 👍 2    🔁 0    💬 1    📌 0

But the real story, IMO, is that you can't really trust a lot of benchmarks. Because they're open, they can be gamed, and thus they are gamed. Real world model performance often differs greatly from what benchmarks would lead you to believe.

10.04.2025 15:01 · 👍 1    🔁 0    💬 1    📌 0

There are some totally worthwhile considerations in this post. Bigger isn't always better. Reasoning is still a pretty young technology. And your task of interest matters.

10.04.2025 15:01 · 👍 0    🔁 0    💬 1    📌 0

This post gets into just how much complexity there is for #genAI devs when working on model selection. There are so many models these days. Which is great but also a huge shopping hassle.

10.04.2025 15:01 · 👍 5    🔁 0    💬 1    📌 0
2nd Set AI | Jeff Smith | Substack Exploratory research in generative media. Click to read 2nd Set AI, a Substack publication. Launched 5 months ago.

This post and some of our older work on multimodal understanding of fashion are here on #Substack: 2ndsetai.substack.com

08.04.2025 14:17 · 👍 1    🔁 0    💬 0    📌 0

There are also fun experimental results around models cheating, which are always entertaining.

08.04.2025 14:17 · 👍 0    🔁 0    💬 1    📌 0

For the #AIresearch folks, this really is a unique angle on how to understand progress in generative visual media and the role MLLMs can play in bootstrapping effectively better performance, through visual reasoning.

08.04.2025 14:17 · 👍 0    🔁 0    💬 1    📌 0

I also introduce a new test of human equivalence, the Banner Test, named in honor of @bernadettebanner.bsky.social . And like any true Berna Bro, I don't stop with the work until I continue it into the world of #guineapig #fashion.

08.04.2025 14:17 · 👍 0    🔁 0    💬 1    📌 0

I try to break down the accuracy in temporal grounding of image generation, using historical fashion as the guiding example. It's not so much about #fashion as it is about knowing what the visually verifiable facts of the real world are. #history #historicalfashion

08.04.2025 14:17 · 👍 0    🔁 0    💬 1    📌 0
Dress Code AI's Generative Progress, As Seen Through Fashion History

I've just put up a post about my latest attempt to characterize progress in #genAI imagery. Everyone knows stuff is getting better fast, but it's hard to translate that into humanly meaningful terms.
2ndsetai.substack.com/p/dress-code

08.04.2025 14:17 · 👍 0    🔁 0    💬 1    📌 0
Bridging AI Realities: Jeff Smith founding leader at PyTorch, Open Source and Startup Innovation The Socratic Embers · Episode

Spotify version: open.spotify.com/episode/7e3Q...

03.03.2025 18:51 · 👍 3    🔁 0    💬 0    📌 0
Bridging AI Realities: Jeff Smith founding leader at PyTorch, Open Source and Startup Innovation
YouTube video by Cyber Socratic

YouTube version: youtu.be/fVfTEk5JfXA?...

03.03.2025 18:51 · 👍 0    🔁 0    💬 1    📌 0

We also had time to ponder the future and what will still be coming online across synthetic data, #reasoning models, #RL, and more.

03.03.2025 18:51 · 👍 0    🔁 0    💬 1    📌 0

Lots of great reflections on the history of conversational agents, #PyTorch, and the huge amount of innovation that began with FAIR and exploded out into the wider world.

03.03.2025 18:51 · 👍 0    🔁 0    💬 1    📌 0

Last week Aditya Inamdar asked me to sit down and talk through the past ten years or so of my work across research, open source, and startups in AI for his podcast, The Socratic Embers.

03.03.2025 18:51 · 👍 2    🔁 0    💬 1    📌 0

This is a free community event with food and drink at the party after. If you're around NYC, I'd love to see you there.
#newmusic #contemporaryclassical #chambermusic #composition

04.02.2025 16:56 · 👍 2    🔁 0    💬 0    📌 0

My piece is a wild mashup born of a 1700s fortepiano and NES-style chiptunes. 🎹🎮

04.02.2025 16:56 · 👍 2    🔁 0    💬 1    📌 0
Contemporaneous: Open Mic Contemporaneous provides a free opportunity for composers to submit their work to be performed in a free open mic for new music.

🎻Concert Announcement!🎺
We're having a big free concert in NYC on March 4. Contemporaneous will be playing short pieces from a bunch of contemporary local composers (including me). The idea is something like an Open Mic night, but with a 23 person chamber orchestra. 🤯
roulette.org/event/contem...

04.02.2025 16:47 · 👍 5    🔁 0    💬 1    📌 0
Hammock Grove Sheep | Governors Island (en-US) For the fourth straight year the Trust for Governors Island is proud to welcome a family of five sheep for their summer landscaping jobs. Hailing from Friends o

I'd also be remiss if I didn't give a shoutout to my favorite flock. The five most central sheep in NYC are on Governors Island. www.govisland.com/things-to-do...

23.01.2025 21:51 · 👍 2    🔁 0    💬 0    📌 0

I fully expect to steal this in the future:
We all like to think we're right and very often we're not.
We don't always have all the facts to hand, even if we think we do.
This applies just as much to you as to me.
But yesterday it applied very much to me, sorry.

23.01.2025 19:19 · 👍 2    🔁 0    💬 1    📌 0
diamond geezer

The follow-up apology is somehow even better: diamondgeezer.blogspot.com/2025/01/a-sh...

23.01.2025 19:19 · 👍 2    🔁 0    💬 1    📌 0
diamond geezer

The analysis of London's most central sheep is definitely my favorite thing on the internet this year, and it might still be on December 31. diamondgeezer.blogspot.com/2025/01/lond...

23.01.2025 19:19 · 👍 1    🔁 0    💬 1    📌 0

So, you're constantly zooming out, trying to have yet one more model tame more models, into this incredibly high level form of #AI #research that feels less like programming and more like planning a corporate reorg.

16.01.2025 19:56 · 👍 2    🔁 0    💬 1    📌 0

Non-technical commentators loved to write up false narratives like, "The startup's prompt is actually their secret sauce, and that's enough of a moat." Which is nonsense. We're already at the point where nearly any prompt you can write, an LLM can rewrite better.

16.01.2025 19:56 · 👍 1    🔁 0    💬 1    📌 0

The job ends up being this weird sort of orchestration-style research where you can get so many gains from things that are basically traffic conducting based on some good intuition.

16.01.2025 19:56 · 👍 1    🔁 0    💬 1    📌 0

The only solution for all of these generative models is another generative model.

16.01.2025 19:56 · 👍 0    🔁 0    💬 1    📌 0

And remember: there's a ton of them! So, you totally end up needing to feed the API docs to your personal LLM just to have a chance of wiring up even a fraction of the models you might want to explore.

16.01.2025 19:56 · 👍 0    🔁 0    💬 1    📌 0

And so you think you're doing AI research, but what you're actually doing is tons of classic SWE API plumbing work. And whether you're calling to the big players or the new kids, those APIs are *rough.* Just totally fresh out of the oven and not as sane or as stable as you might hope.

16.01.2025 19:56 · 👍 0    🔁 0    💬 1    📌 0

So, you have a handful of major cloud APIs you should call, and then there's just an endlessly expanding space of research-quality models coming out all the time. You pretty quickly end up having to get on @replicate.com, Fal.ai, Hyperbolic, etc. just to get a common serving provider.

16.01.2025 19:56 · 👍 1    🔁 0    💬 1    📌 0
