@carrigmat.bsky.social · Engineer @huggingface. I'm the reason your LLM frontend has a jinja2cpp dependency. Sometimes yells about housing and trans rights instead of working. He/him
I suspect the base could be fine-tuned for other tasks too! With a bit of hackery this is probably also a very strong DNA foundation model.
03.02.2026 17:05

The key twist here is that they pretrained 11 output heads for different modalities, all of which match or exceed the existing task SOTA. This means no training is needed: if you want splice-site or transcription-factor-binding prediction, you can just feed in your sequence and read off the output.
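(For anyone wondering what "read off the output" looks like in code, here's a toy sketch of the many-heads-on-one-backbone idea. The architecture, head names and shapes below are made up for illustration and are not the actual DeepMind model.)

```python
# Toy sketch: one backbone, one pretrained head per task, so inference is just
# "run the backbone, read the head you care about" with no fine-tuning step.
import torch
import torch.nn as nn

DNA = {"A": 0, "C": 1, "G": 2, "T": 3}

def one_hot(seq: str) -> torch.Tensor:
    # (1, 4, L) one-hot encoding of a DNA string
    x = torch.zeros(1, 4, len(seq))
    for i, base in enumerate(seq):
        x[0, DNA[base], i] = 1.0
    return x

class MultiHeadDNAModel(nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.backbone = nn.Conv1d(4, hidden, kernel_size=11, padding=5)
        # One head per modality; the real model ships 11 of these
        self.heads = nn.ModuleDict({
            "splice_sites": nn.Conv1d(hidden, 3, kernel_size=1),  # donor/acceptor/none per base
            "tf_binding": nn.Conv1d(hidden, 1, kernel_size=1),    # binding score per base
        })

    def forward(self, x: torch.Tensor, task: str) -> torch.Tensor:
        h = torch.relu(self.backbone(x))
        return self.heads[task](h)

model = MultiHeadDNAModel()
logits = model(one_hot("ACGTACGTGGTACCT"), task="splice_sites")
print(logits.shape)  # (1, 3, 15): per-base splice-site logits, read straight off the head
```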
03.02.2026 17:05

Really nice bio model from DeepMind just got released! There have been quite a few DNA foundation models in the past, but labs usually had to gather their own data and fine-tune them for tasks of interest. This is something else! 🧵
03.02.2026 17:05

Though I'd add one addendum to that thread: it seems like some EPYC CPUs don't get the full socket bandwidth (possibly based on CCD count?), so going with the absolute cheapest ones might not be the best idea. If anyone knows the true memory bandwidths for those chips, I really want to know!
07.11.2025 18:27

The hardware for R1 should work perfectly, because K2 is actually slightly smaller despite the higher parameter count, thanks to INT4 quantization. You should be able to fit it at full quality (Q8 attention, Q4 MoE) in 768GB!
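(Rough back-of-envelope math behind that 768GB claim, using approximate parameter counts and an assumed MoE/dense split rather than exact checkpoint sizes:)

```python
# Rough figures, not exact checkpoint sizes. Assumption: Kimi-K2 is ~1T total
# params, with the overwhelming majority (assumed ~97% here) in MoE expert
# weights shipped as INT4, and the rest (attention/dense/embeddings) kept at 8-bit.
TOTAL_PARAMS = 1.0e12
MOE_FRACTION = 0.97  # assumed split, not an official number

moe_bytes = TOTAL_PARAMS * MOE_FRACTION * 0.5          # INT4 ~= 0.5 bytes/param
dense_bytes = TOTAL_PARAMS * (1 - MOE_FRACTION) * 1.0  # Q8 = 1 byte/param

print(f"K2 weights: ~{(moe_bytes + dense_bytes) / 1e9:.0f} GB")  # ~515 GB

# DeepSeek-R1 for comparison: ~671B params at 8-bit ~= 671 GB, so the same
# 768GB box that runs R1 has headroom for K2 despite K2's larger param count.
print(f"R1 weights: ~{671e9 / 1e9:.0f} GB")
```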
07.11.2025 18:27

Now seems like a good time to repeat this thread, since Kimi-K2-Thinking has just arrived and might actually be the strongest LLM in the world right now, open or closed: huggingface.co/moonshotai/K...
07.11.2025 18:27

PRs and issues on @hf.co have gotten a lot sloppier and weirder since the advent of code agents, but the weirdest ones still have an inexplicable human touch.
03.11.2025 13:40

In particular, this bit suggests that if you inject a concept too weakly the model doesn't notice, and if you inject it too strongly it just talks about the concept rather than 'introspecting'. But maybe that just means a medium strength biases the model towards the concept without totally overriding the original question?
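(For context, "injecting a concept" here means activation steering: adding a scaled concept vector to a hidden layer and turning the strength dial. A toy, self-contained sketch of that mechanism, not Anthropic's actual setup:)

```python
# Minimal activation-steering sketch: add a "concept vector" to one layer's
# output, scaled by a strength coefficient. The argument above is entirely
# about where on this `strength` dial you sit. Toy model, for illustration only.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 16))
concept_vector = torch.randn(32)  # direction standing in for "some concept"

def make_steering_hook(strength: float):
    def hook(module, inputs, output):
        # too small: the output barely moves; too large: the concept swamps everything
        return output + strength * concept_vector
    return hook

x = torch.randn(1, 16)
for strength in (0.0, 0.5, 5.0, 50.0):
    handle = model[0].register_forward_hook(make_steering_hook(strength))
    y = model(x)
    handle.remove()
    print(f"strength={strength:>5}: output norm {y.norm().item():.2f}")
```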
29.10.2025 19:32

Extremely fascinated by the latest Anthropic post, but parts of the results feel like they might just be the result of "the right amount of steering" rather than genuine introspection. www.anthropic.com/research/int...
29.10.2025 19:32

An underappreciated thing about the Turing test is that every teacher, writer and artist on the planet is now intimately familiar with the markers of AI output.
The post-ChatGPT era is like a global training montage to ensure the bot's job in that test is as hard as possible
Yup, you can very clearly see a halving of stock value right after GPT-4 is released
15.06.2025 21:06

I think a lot of people are dismissing it by analogy to crypto, where usage took off but it was clearly useless for anything but speculative investing or laundering the proceeds of crime. It even ate up all the GPUs for years too!
I mean, they're incredibly wrong, but I can see how they got there
One clear giveaway is that modern German still has an informal second-person "du" which bears obvious signs of shared heritage with "thou". Their similarity in sound, of course, but also their "-st" verb endings. Shakespearean "thou sayst" is almost identical to modern German "du sagst"!
13.05.2025 15:21

Underappreciated linguistic fact: "Thou" was originally an informal, friendly pronoun, but feels extremely archaic and formal to modern ears because of its association with Shakespeare and the KJV. You'd use it for speaking to family and friends (and to God).
13.05.2025 15:21

the betting markets are asking the real questions today
27.04.2025 16:09

The discussion pages for Open-R1 on @hf.co are such a goldmine for actual practical information on how to train a reasoning model.
Like look at this! If you're not reading those community tabs you're missing so much! huggingface.co/spaces/open-...
I call this The Paper. It gets written quite often in machine learning, and it's valuable every time!
The core of it is "Everyone had a complex setup to do X task. With enough scale, none of that complexity is necessary, and a simple model does it better."
huggingface.co/papers/2503....
Here's EsportsBench v5!
72k new matches added from 2025-01-01 through 2025-03-31 and some data quality improvements to past data as well.
Over 2.4 million rows of esports match data from 20 titles spanning over 25 years
huggingface.co/datasets/Esp...
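(Quick-start sketch for loading it with the `datasets` library; the repo id below is a placeholder, substitute the one from the link above:)

```python
# Minimal loading sketch, assuming the dataset is a standard Hub dataset.
from datasets import load_dataset

DATASET_REPO = "..."  # placeholder: use the actual repo id from the link above
ds = load_dataset(DATASET_REPO)  # some multi-config datasets also need a config name
print(ds)  # ~2.4M rows of match data across 20 titles, per the release notes
```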
I believe ArXiv and Archive Of Our Own should swap places for April 1st. I believe this more strongly than I believe anything else
29.03.2025 17:00

And when Leela Chess Zero did an open-source reproduction of it, they just distributed inference to volunteer computers around the globe. Of course, that probably won't work as well for a 700GB LLM as it did for a 100MB convnet, but in principle you could do the same.
25.03.2025 16:49

The analogy here is to projects like AlphaGo/AlphaZero: far more compute was spent calculating board positions to generate the training data than on actually updating the model with that data! DeepMind distributed that over tons of tiny TPUv1s, IIRC.
25.03.2025 16:49

People are reading MSFT dropping power contracts as a sign that AI investment will fall off, but if reasoning is the new paradigm then most training compute will be inference, and that doesn't have to be centralized.
Massive monolithic datacentres are much less necessary now
Preliminary take is that V3-0324 is a major upgrade on the V3 base. Increasingly confident that it's the strongest open-source LLM, and likely competitive with the top tier of closed source too
24.03.2025 22:23

This might also herald an upgraded R1 reasoning model, using the new V3 as an improved base, but that's pure speculation on my part - I don't have any secret info!
24.03.2025 18:43

DeepSeek V3-0324 just landed, an upgraded version of the V3 model that was used as the base for DeepSeek-R1. Weights are on @hf.co, and it'll start appearing on inference providers soon. It seems very strong in early testing, likely the best non-reasoning OS model (!)
huggingface.co/deepseek-ai/...
Last week, we launched a waitlist to move builders on @hf.co from LFS to Xet. This was made possible through months of hard work and staged migrations to test our infrastructure in real time.
This post provides an inside look into the day of our first migrations and the weeks after.
Anyone want to explain to me where Anthropic are getting "powerful AI will arrive somewhere from late 2026 to early 2027"?
I totally get being AGI-pilled and extrapolating scaling laws, but we've just moved to a new reasoning scaling regime! We don't even have enough points to extrapolate!
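(A toy illustration of the "not enough points" problem: fit a power law to three made-up data points, nudge one of them slightly, and watch the long-range extrapolation move. All numbers here are invented.)

```python
# Fit a power law (linear in log-log compute) to three synthetic points and
# extrapolate far beyond the data; then perturb one point and compare.
import numpy as np

compute = np.array([1e21, 1e22, 1e23])  # pretend "reasoning-regime" training runs
score = np.array([0.40, 0.48, 0.55])    # pretend benchmark scores

def extrapolate(scores: np.ndarray, target: float = 1e26) -> float:
    slope, intercept = np.polyfit(np.log10(compute), scores, 1)
    return slope * np.log10(target) + intercept

print(f"fit A: score at 1e26 FLOP ~ {extrapolate(score):.2f}")
print(f"fit B: score at 1e26 FLOP ~ {extrapolate(score + [0, 0, -0.02]):.2f}")
# A 2% wobble in a single observation noticeably shifts the extrapolated value:
# three points give you a line, not a forecast.
```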
And secondarily inspired by the experience of training a pure RL convnet to play Sonic and having to give it a shaped reward for getting further to the right, which broke down the moment the level layout required "up" or, god forbid, "left".
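(Roughly the kind of shaped reward I mean, sketched against a hypothetical Gym-style env that exposes an "x" position in its info dict; details are from memory and illustrative only.)

```python
# Shaped reward sketch: reward the agent for net progress to the right each step.
def shaped_reward(info_prev: dict, info_curr: dict) -> float:
    # Positive for moving right, zero for up, negative for left -- which is
    # exactly why it falls apart on levels that require going up or left.
    return float(info_curr["x"] - info_prev["x"])

# Typical usage in a training loop (env and agent are placeholders):
# obs, info_prev = env.reset()
# obs, _, terminated, truncated, info_curr = env.step(agent.act(obs))
# r = shaped_reward(info_prev, info_curr)
```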
06.03.2025 14:24

This thought was inspired by the experience of playing games where I intellectually know what to do but have to repeat the segment 10 times until my muscle memory fires accurately enough to get me through.
06.03.2025 14:21