The field is absolutely going to find a better solution for model selection than we have today. The current environment is so cluttered and wasteful, and it impedes productivity. There has to be a better way. And there will be one. #GenAI is too impactful for us to not solve this one.
10.04.2025 15:01
It's not just benchmark grinding. There are also all sorts of open secrets devs are constantly adjusting for, e.g., most long-context model performance is actually still a WIP, with real results decaying over longer input and output sizes.
10.04.2025 15:01
But the real story, IMO, is that you can't really trust a lot of benchmarks. Because they're open, they can be gamed, and thus they are gamed. Real world model performance often differs greatly from what benchmarks would lead you to believe.
10.04.2025 15:01
There are some totally worthwhile considerations in this post. Bigger isn't always better. Reasoning is still a pretty young technology. And your task of interest matters.
10.04.2025 15:01
This post gets into just how much complexity there is for #genAI devs when working on model selection. There are so many models these days. Which is great but also a huge shopping hassle.
10.04.2025 15:01
There are also fun experimental results around models cheating, which are always entertaining.
08.04.2025 14:17
For the #AIresearch folks, this really is a unique angle on how to understand progress in generative visual media and the role MLLMs can play in bootstrapping better performance through visual reasoning.
08.04.2025 14:17
I also introduce a new test of human equivalence, the Banner Test, named in honor of @bernadettebanner.bsky.social. And like any true Berna Bro, I don't stop with the work until I continue it into the world of #guineapig #fashion.
08.04.2025 14:17
I try to break down the accuracy in temporal grounding of image generation, using historical fashion as the guiding example. It's not so much about #fashion as it is about knowing what the visually verifiable facts of the real world are. #history #historicalfashion
08.04.2025 14:17
Dress Code
AI's Generative Progress, As Seen Through Fashion History
I've just put up a post about my latest attempt to characterize progress in #genAI imagery. Everyone knows stuff is getting better fast, but it's hard to translate that into human-meaningful terms.
2ndsetai.substack.com/p/dress-code
08.04.2025 14:17
YouTube video by Cyber Socratic
Bridging AI Realities: Jeff Smith founding leader at PyTorch, Open Source and Startup Innovation
YouTube version: youtu.be/fVfTEk5JfXA?...
03.03.2025 18:51
We also had time to ponder the future and what is still coming online across synthetic data, #reasoning models, #RL, and more.
03.03.2025 18:51
Lots of great reflections on the history of conversational agents, #PyTorch, and the huge amount of innovation that began with FAIR and exploded out into the wider world.
03.03.2025 18:51
Last week Aditya Inamdar asked me to sit down and talk through the past ten years or so of my work across research, open source, and startups in AI for his podcast, The Socratic Embers.
03.03.2025 18:51
This is a free community event with food and drink at the party after. If you're around NYC, I'd love to see you there.
#newmusic #contemporaryclassical #chambermusic #composition
04.02.2025 16:56
My piece is a wild mashup born of a 1700s fortepiano and NES-style chiptunes. 🎹🎮
04.02.2025 16:56
Contemporaneous: Open Mic
Contemporaneous provides a free opportunity for composers to submit their work to be performed in a free open mic for new music.
🎻 Concert Announcement! 🎺
We're having a big free concert in NYC on March 4. Contemporaneous will be playing short pieces from a bunch of contemporary local composers (including me). The idea is something like an Open Mic night, but with a 23-person chamber orchestra. 🤯
roulette.org/event/contem...
04.02.2025 16:47
I fully expect to steal this in the future:
We all like to think we're right and very often we're not.
We don't always have all the facts to hand, even if we think we do.
This applies just as much to you as to me.
But yesterday it applied very much to me, sorry.
23.01.2025 19:19
diamond geezer
Even the follow-up apology is somehow even better: diamondgeezer.blogspot.com/2025/01/a-sh...
23.01.2025 19:19
diamond geezer
The analysis of London's most central sheep is definitely my favorite thing on the internet this year, and it might still be on December 31. diamondgeezer.blogspot.com/2025/01/lond...
23.01.2025 19:19
So, you're constantly zooming out, trying to have yet one more model tame more models, into this incredibly high-level form of #AI #research that feels less like programming and more like planning a corporate reorg.
16.01.2025 19:56
Non-technical commentators loved to write up false narratives like, "The startup's prompt is actually their secret sauce, and that's enough of a moat." Which is nonsense. We're already at the point where nearly any prompt you can write, an LLM can rewrite better.
16.01.2025 19:56
The job ends up being this weird sort of orchestration-style research where you can get so many gains from things that are basically traffic conducting based on some good intuition.
16.01.2025 19:56
The only solution for all of these generative models is another generative model.
16.01.2025 19:56
And remember: there's a ton of them! So, you totally end up needing to feed the API docs to your personal LLM just to have a chance of wiring up even a fraction of the models you might want to explore.
16.01.2025 19:56
And, so you think you're doing AI research, but what you're actually doing is tons of classic SWE API plumbing work. And whether you're calling to the big players or the new kids, those APIs are *rough.* Just totally fresh out of the oven and not as sane or as stable as you might hope.
16.01.2025 19:56
So, you have a handful of major cloud APIs you should call, and then there's just an endlessly expanding space of research-quality models coming out all the time. You pretty quickly end up having to get on @replicate.com, Fal.ai, Hyperbolic, etc. just to get a common serving provider.
16.01.2025 19:56
Newsletter: https://gradientflow.substack.com/
Podcast: https://thedataexchange.media/
Blog: https://gradientflow.com/blog/
At Block, we believe open source is at the heart of innovation and community empowerment. Our vision is to nurture a diverse and vibrant open source ecosystem that removes barriers to technology and fosters economic opportunities for all.
I drink and I sew things.
YouTube.com/@bernadettebanner
Robust Open Online Safety Tools (ROOST) is a new non-profit entity providing open source, accessible, high-quality, transparent safety tools for digital organizations of all kinds.
roost.tools
Entrepreneur at heart, world traveler.
#dftba #entrepreneurship #startups #events #tech #space #politics #health #photography #education #innovation & #cocreation
Helsinki, Finland
🇫🇮 🇪🇪 🇺🇸
@framian.fi
Guy who tackles the other side of startups. Inbound marketing, transparent sales, real value.
May occasionally talk about: doggos, gaming, and relationship building
Currently Scaling:
tactycs.io
local-leads.ai
I am the co-founder of @cactuscon | ex @bishopfox ex @spiderlabs | currently pursuing a phd at UdeG in ML & OffSec. https://sensecurity.io
slayer of applications | not a super villain
Senior AI/ML Engineer, Causal Machine Learning at GSK.ai | ML, Statistics, Causal Inference, Genetics/Genomics, Statistical Physics
Co-founder at @BotCity (YC W22)
OSS Maintainer at MarvinJ and Marvin
Computer Scientist, AI, Open Source
Studying genomics, machine learning, and fruit. My code is like our genomes -- most of it is junk.
Guest Scientist IMP Vienna, Board of Directors NumFOCUS
Incoming Prof UMass Chan Medical
Previously Stanford Genetics, UW CSE.
Author of Bea Wolf, A City on Mars, and the comic SMBC
Website: www.smbc-comics.com
Patreon: https://www.patreon.com/ZachWeinersmith?ty=h
New book: http://www.acityonmars.com/
Campaign manager at @ioppublishing.bsky.social
#MachineLearning #BiomedicalEngineering #MedicalPhysics #Complexity #Biofabrication
Weaver. Reader. Cat roommate.
ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT). June 23rd to June 26th, 2025, in Athens, Greece. #FAccT2025
https://facctconference.org/
Run AI with an API
replicate.com
Association for Computers and the Humanities, the US-based professional society for the digital humanities. #digitalhumanities
Political Communication Professor at GWU. I write a lot about the history and future of tech and politics. Best known for that one time I made fun of Bret Stephens.
Davekarpf.substack.com
Moonmaker, Pathfinder, Wonderer. Art's apprentice, Color's mistress, Nature's admirer. Silk shibori ribbon maker. #fiberarts #Japan #textile #travel
Shiborigirlstudios.com
AI, Ethics, SmartCity
Sapienza University: lecturer Planning and Strategic Management
ENIA: member of National Authority for AI association
TIM: Account Manager
https://doi.org/10.1108/TG-04-2024-0096
https://www.linkedin.com/in/vriccardi
Midlife-dx'd ADHD Chaos Muppet, unexpectedly in "Now what?" mode again. Feral tech support, learning about data, urbanism, futures and systems.
@johannab on #ravelry #LSG, coSocial.ca, wandering.shop & pxlfd.ca. Random thoughts & 🧶yarnsplaining here.