
Matthew Carrigan

@carrigmat.bsky.social

Engineer @huggingface. I'm the reason your LLM frontend has a jinja2cpp dependency. Sometimes yells about housing and trans rights instead of working. He/him

237 Followers  |  152 Following  |  73 Posts  |  Joined: 21.08.2023

Latest posts by carrigmat.bsky.social on Bluesky

I suspect the base model could probably be fine-tuned for other tasks too! With a bit of hackery, this is probably also a very strong DNA foundation model.

03.02.2026 17:05 — 👍 0    🔁 0    💬 0    📌 0

The key twist here is that they pretrained 11 output heads for different modalities, and all of them match or exceed the existing task SOTA. This means no training is needed: if you want splice-site or transcription-factor binding prediction, you can just feed in your sequence and read off the output.

03.02.2026 17:05 — 👍 1    🔁 0    💬 1    📌 0
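A minimal sketch of what "feed in your sequence and read off the output" could look like in practice. The weight download call is real huggingface_hub API, but AlphaGenomeModel, predict(), and the "splice_sites" head name below are hypothetical placeholders, not the actual API shipped with the release.

```python
# Sketch of the "feed in a sequence, read off a head" workflow.
# snapshot_download is real huggingface_hub API; the commented-out wrapper,
# predict() method, and head name are hypothetical placeholders.
from huggingface_hub import snapshot_download

weights_dir = snapshot_download("google/alphagenome-all-folds")  # fetch the released weights

# model = AlphaGenomeModel.from_pretrained(weights_dir)
# outputs = model.predict(sequence="ACGT" * 256)   # your DNA sequence of interest
# splice_scores = outputs["splice_sites"]          # read one pretrained head directly, no fine-tuning
```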
Preview
google/alphagenome-all-folds · Hugging Face

Really nice bio model from DeepMind just got released! There have been quite a few DNA foundation models in the past, but labs usually had to gather their own data and fine-tune them for tasks of interest. This is something else! 🧵

03.02.2026 17:05 — 👍 1    🔁 0    💬 1    📌 0

Though I'd add one addendum to that thread: It seems like some EPYC CPUs don't get the full socket bandwidth (possibly based on CCD count?), so going with the absolute cheapest ones might not be the best idea. If anyone knows the true memory bandwidths for those chips, I really want to know!

07.11.2025 18:27 — 👍 0    🔁 0    💬 0    📌 0
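For context, a rough sketch of the theoretical ceiling in question; the channel count and DDR5 speed are assumptions for a Genoa-class socket, not measured figures for any specific SKU.

```python
# Theoretical peak memory bandwidth for an assumed 12-channel DDR5-4800 EPYC socket.
channels = 12             # assumed channel count (Genoa-class)
transfer_rate = 4800e6    # DDR5-4800: 4.8 billion transfers per second per channel
bytes_per_transfer = 8    # 64-bit channel width

peak_gb_s = channels * transfer_rate * bytes_per_transfer / 1e9
print(f"theoretical peak: {peak_gb_s:.0f} GB/s")  # ~461 GB/s

# Real chips may deliver less: each CCD reaches memory through a limited-width link to
# the IO die, so a low-CCD-count SKU may be unable to saturate all twelve channels.
```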

The hardware for R1 should work perfectly: despite the higher parameter count, K2 is actually slightly smaller thanks to INT4 quantization. You should be able to fit it at full quality (Q8 attention, Q4 MoE) in 768GB!

07.11.2025 18:27 — 👍 0    🔁 0    💬 1    📌 0
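Back-of-the-envelope arithmetic for that claim; the parameter split between MoE experts and everything else is an assumption, not an official figure.

```python
# Rough weight-memory estimate for a ~1T-parameter MoE model at Q8 attention / Q4 experts.
total_params = 1.0e12                      # ~1T parameters total (approximate)
moe_params = 0.97e12                       # assume the vast majority live in the MoE experts
other_params = total_params - moe_params   # attention, embeddings, shared layers

bytes_moe = moe_params * 0.5               # Q4 = 4 bits = 0.5 bytes per parameter
bytes_other = other_params * 1.0           # Q8 = 8 bits = 1 byte per parameter

print(f"~{(bytes_moe + bytes_other) / 1e9:.0f} GB of weights")  # ~515 GB, well under 768 GB of RAM
```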

Now seems like a good time to repeat this thread, since Kimi-K2-Thinking has just arrived and might actually be the strongest LLM in the world right now, open or closed huggingface.co/moonshotai/K...

07.11.2025 18:27 — 👍 0    🔁 0    💬 1    📌 0

PRs and issues on @hf.co have gotten a lot sloppier and weirder since the advent of code agents, but the weirdest ones still have an inexplicable human touch

03.11.2025 13:40 — 👍 1    🔁 0    💬 0    📌 0

In particular, this bit suggests that if you inject a concept too weakly the model doesn't notice, and if you inject it too strongly it just talks about the concept rather than 'introspecting'. But maybe that just means a medium strength biases the output towards the concept without totally overriding the original question?

29.10.2025 19:32 — 👍 1    🔁 0    💬 0    📌 0
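For readers who haven't seen concept injection before, here is a minimal sketch of the general activation-steering idea in PyTorch; the layer index, vector, and alpha values are illustrative assumptions, and this is not Anthropic's actual code.

```python
# Sketch of activation steering: add a scaled "concept vector" to a layer's output.
import torch

def make_steering_hook(concept_vector: torch.Tensor, alpha: float):
    def hook(module, inputs, output):
        # Returning a value from a forward hook replaces the module's output.
        # (For layers that return tuples, you would modify the hidden-state element instead.)
        return output + alpha * concept_vector
    return hook

# Illustrative usage (layer index and alpha are arbitrary choices):
# handle = model.model.layers[20].register_forward_hook(make_steering_hook(vec, alpha=4.0))
# Too small an alpha and nothing changes; too large and the model just talks about the
# concept instead of answering the original question.
```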
Preview
Emergent introspective awareness in large language models: Research from Anthropic on the ability of large language models to introspect

Extremely fascinated by the latest Anthropic post, but parts of the results feel like they might just come from "the right amount of steering" rather than genuine introspection. www.anthropic.com/research/int...

29.10.2025 19:32 — 👍 1    🔁 0    💬 2    📌 1

An underappreciated thing about the Turing test is that every teacher, writer and artist on the planet is now intimately familiar with the markers of AI output.

The post-ChatGPT era is like a global training montage to ensure the bot's job in that test is as hard as possible

10.10.2025 19:42 — 👍 0    🔁 0    💬 0    📌 0

Yup, you can very clearly see a halving of stock value right after GPT-4 is released

15.06.2025 21:06 — 👍 18    🔁 1    💬 0    📌 0

I think a lot of people are dismissing it by analogy to crypto, where usage took off but it was clearly useless for anything but speculative investing or laundering the proceeds of crime. It even ate up all the GPUs for years too!

I mean, they're incredibly wrong, but I can see how they got there

26.05.2025 17:55 — 👍 1    🔁 0    💬 1    📌 0

One clear giveaway is that modern German still has an informal second-person "du" which bears obvious signs of shared heritage with "thou". There's their similarity in sound, of course, but also their "-st" verb endings. Shakespearean "thou sayst" is almost identical to modern German "du sagst"!

13.05.2025 15:21 — 👍 1    🔁 0    💬 0    📌 0

Underappreciated linguistic fact: "Thou" was originally an informal, friendly pronoun, but feels extremely archaic and formal to modern ears because of its association with Shakespeare and the KJV. You'd use it for speaking to family and friends (and to God).

13.05.2025 15:21 — 👍 2    🔁 0    💬 1    📌 0

the betting markets are asking the real questions today

27.04.2025 16:09 — 👍 1    🔁 0    💬 0    📌 0
Preview
open-r1/README · [Experiment] Training R1-Zero-like models with Open R1: There are several recent research papers which explore various aspects of R1-Zero-like training on open base models like Qwen2.5-7B and Llama-3.1-8B

The discussion pages for Open-R1 on @hf.co are such a goldmine for actual practical information on how to train a reasoning model.

Like look at this! If you're not reading those community tabs you're missing so much! huggingface.co/spaces/open-...

25.04.2025 14:56 — 👍 11    🔁 3    💬 0    📌 0
Preview
Paper page - Your ViT is Secretly an Image Segmentation Model

I call this The Paper. It gets written quite often in machine learning, and it's valuable every time!

The core of it is "Everyone had a complex setup to do X task. With enough scale, none of that complexity is necessary, and a simple model does it better."

huggingface.co/papers/2503....

17.04.2025 14:28 — 👍 6    🔁 0    💬 0    📌 0
Preview
EsportsBench/EsportsBench · Datasets at Hugging Face

Here's EsportsBench v5!

72k new matches added from 2025-01-01 through 2025-03-31 and some data quality improvements to past data as well.

Over 2.4 million rows of esports match data from 20 titles spanning over 25 years

huggingface.co/datasets/Esp...

16.04.2025 03:50 — 👍 4    🔁 2    💬 0    📌 0
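A quick way to poke at the data; the config name passed to load_dataset below is a guess, so check the dataset card for the actual config and split names.

```python
# Load one slice of EsportsBench from the Hub and inspect a row.
from datasets import load_dataset

# "league_of_legends" is a guessed config name; the dataset card lists the real ones.
ds = load_dataset("EsportsBench/EsportsBench", "league_of_legends", split="train")
print(ds[0])
```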

I believe ArXiv and Archive Of Our Own should swap places for April 1st. I believe this more strongly than I believe anything else

29.03.2025 17:00 — 👍 46    🔁 15    💬 0    📌 0

And when Leela Chess Zero did an open-source reproduction of it, they just distributed inference to volunteer computers around the globe. Of course, that probably won't work as well for a 700GB LLM as it did for a 100MB convnet, but in principle you could do the same.

25.03.2025 16:49 — 👍 0    🔁 0    💬 0    📌 0

The analogy here is to projects like AlphaGo/AlphaZero - far more compute was spent calculating board positions to generate the training data than was spent actually updating the model with that training data! DeepMind distributed that over tons of tiny TPUv1s, iirc

25.03.2025 16:49 — 👍 0    🔁 0    💬 1    📌 0
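Rough numbers to make that compute split concrete; the per-move simulation count and training-step cost are AlphaZero-style assumptions, not exact figures.

```python
# Why self-play data generation dwarfs the cost of the gradient updates themselves.
sims_per_move = 800     # assumed MCTS simulations (network evaluations) per move played
uses_per_position = 1   # roughly how often a stored position is sampled for training
train_step_cost = 3     # a training step costs ~3 forward-pass equivalents (forward + backward)

ratio = sims_per_move / (uses_per_position * train_step_cost)
print(f"self-play vs. training compute: ~{ratio:.0f}x")  # hundreds of evaluations per update
```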

People are reading MSFT dropping power contracts as a sign that AI investment will fall off, but if reasoning is the new paradigm then most training compute will be inference and that doesn't have to be centralized

Massive monolithic datacentres are much less necessary now

25.03.2025 16:34 — 👍 0    🔁 0    💬 1    📌 0

Preliminary take is that V3-0324 is a major upgrade on the V3 base. Increasingly confident that it's the strongest open-source LLM, and likely competitive with the top tier of closed source too

24.03.2025 22:23 — 👍 0    🔁 0    💬 0    📌 0

This might also herald an upgraded R1 reasoning model, using the new V3 as an improved base, but that's pure speculation on my part - I don't have any secret info!

24.03.2025 18:43 — 👍 1    🔁 0    💬 0    📌 0
Preview
deepseek-ai/DeepSeek-V3-0324 · Hugging Face

DeepSeek V3-0324 just landed, an upgraded version of the V3 model that was used as the base for DeepSeek-R1. Weights are on @hf.co, and it'll start appearing on inference providers soon. It seems very strong in early testing, likely the best non-reasoning OS model (!)

huggingface.co/deepseek-ai/...

24.03.2025 18:43 — 👍 1    🔁 0    💬 1    📌 0
Preview
Xet is on the Hub

Last week, we launched a waitlist to move builders on @hf.co from LFS to Xet. This was made possible through months of hard work and staged migrations to test our infrastructure in real-time.

This post provides an inside look into the day of our first migrations and the weeks after.

18.03.2025 14:18 — 👍 11    🔁 3    💬 1    📌 1

Anyone want to explain to me where Anthropic are getting "powerful AI will arrive somewhere from late 2026 to early 2027"?

I totally get being AGI-pilled and extrapolating scaling laws, but we've just moved to a new reasoning scaling regime! We don't even have enough points to extrapolate!

17.03.2025 20:46 — 👍 1    🔁 0    💬 0    📌 0

And secondarily inspired by the experience of training a pure RL convnet to play Sonic and having to give it a shaped reward for getting further to the right, which broke down the moment the level layout required "up", or god forbid, "left"

06.03.2025 14:24 — 👍 1    🔁 0    💬 0    📌 0
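A minimal sketch of the kind of shaped reward described above (the variable names are illustrative); it also shows why the signal breaks once the level stops going right.

```python
# Reward shaping for a side-scroller: pay the agent for rightward progress.
def shaped_reward(prev_x: float, curr_x: float) -> float:
    return curr_x - prev_x  # positive when moving right, negative when backtracking

# Failure mode: any section that requires going up or left yields zero or negative reward
# for the correct behaviour, so the agent learns to avoid exactly what it needs to do.
```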

This thought inspired by the experience of playing games where I intellectually know what to do but have to repeat the segment 10 times until my muscle memory fires accurately enough to get me through

06.03.2025 14:21 — 👍 0    🔁 0    💬 1    📌 0
