
Amadeus

@amadeusz.bsky.social

hacker @hsp.sh | mlops | ai builder | socdem | 🇵🇱

191 Followers  |  424 Following  |  138 Posts  |  Joined: 13.11.2023

Latest posts by amadeusz.bsky.social on Bluesky


"Continuous Thought Machines"

Blog → sakana.ai/ctm

Modern AI is powerful, but it's still distinct from human-like flexible intelligence. We believe neural timing is key. Our Continuous Thought Machine is built from the ground up to use neural dynamics as a powerful representation for intelligence.

12.05.2025 02:33 โ€” ๐Ÿ‘ 73    ๐Ÿ” 15    ๐Ÿ’ฌ 3    ๐Ÿ“Œ 5
moonshotai/Kimi-Audio-7B-Instruct · Hugging Face

Kimi-Audio 🚀🎧 an OPEN audio foundation model released by Moonshot AI

huggingface.co/moonshotai/K...

✨ 7B
✨ 13M+ hours of pretraining data
✨ Novel hybrid input architecture
✨ Universal audio capabilities (ASR, AQA, AAC, SER, SEC/ASC, end-to-end conversation)

28.04.2025 07:34 โ€” ๐Ÿ‘ 3    ๐Ÿ” 1    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 1
A line chart titled โ€œFigure 1: Inference-time scaling performance with different RMs on all tested RM benchmarksโ€ shows performance on the y-axis (ranging from 66.5 to 72.5) and k: #sampled rewards (logscale) on the x-axis, with values from 1 to 32.

Key observations:
• DeepSeek-GRM-27B (MetaRM@k) (Ours) is the top performer, shown with a red line and star markers, rising steeply and leveling near 72.5.
• DeepSeek-GRM-27B (Voting@k) (Ours) follows, in blue with stars, peaking slightly above 70.5.
• GPT-4o (Greedy) is shown as a gray dashed line, sitting just under 71.
• Other models, shown in orange, green, brown, and gray lines (scalar or voting methods), plateau between ~66.5 and ~68.5.
• LLM-as-a-Judge w/ TokenProb, Skywork-Reward-Gemma-2-27B, and DeepSeek-BTRM-27B are among these lower-performing models.

Caption summary: The plot shows how performance scales with the number of reward samples at inference time. Results are up to 8 samples, with some (DeepSeek models) extrapolated to 32. Models in non-italic font use Gemma-2-27B as their base.


🚨New DeepSeek Model Incoming🚨

but first they released a paper describing generative reward modeling (GRM) via Self-Principled Critique Tuning (SPCT)

looking forward to DeepSeek-GRM!

arxiv.org/abs/2504.02495
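The inference-time scaling trick in the figure above is easy to sketch: sample k independent judgments from the generative RM and aggregate them. A toy Python sketch of the Voting@k idea (the score_with_grm stub below is a hypothetical placeholder, not the paper's implementation):

import random
from statistics import mean

def score_with_grm(prompt: str, response: str) -> float:
    # Hypothetical stand-in for one sampled judgment from a generative RM:
    # in the paper a critique is generated and a numeric score extracted
    # from it; here we just return a noisy placeholder score.
    return random.gauss(7.0, 1.0)

def voting_at_k(prompt: str, response: str, k: int = 8) -> float:
    # Sample k independent reward judgments and aggregate by averaging,
    # the simple aggregation the figure labels Voting@k.
    samples = [score_with_grm(prompt, response) for _ in range(k)]
    return mean(samples)

print(voting_at_k("Explain SPCT.", "SPCT tunes the RM to generate principles and critiques.", k=8))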

04.04.2025 10:45 โ€” ๐Ÿ‘ 30    ๐Ÿ” 6    ๐Ÿ’ฌ 4    ๐Ÿ“Œ 5

You've probably heard about how AI/LLMs can solve Math Olympiad problems (deepmind.google/discover/blo...).

So naturally, some people put it to the test — hours after the 2025 US Math Olympiad problems were released.

The result: They all sucked!

31.03.2025 20:33 โ€” ๐Ÿ‘ 174    ๐Ÿ” 50    ๐Ÿ’ฌ 9    ๐Ÿ“Œ 12

Kimi 1.6 is coming ...

Source: livecodebench.github.io/leaderboard....
Chat: kimi.ai

27.02.2025 22:53 โ€” ๐Ÿ‘ 17    ๐Ÿ” 3    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 1

This is new: Moonshot AI (i.e., kimi.ai) released two open-weight models.

Moonlight: 3B/16B MoE model trained with Muon on 5.7T tokens, advancing the Pareto frontier with better performance at fewer FLOPs.

huggingface.co/moonshotai

22.02.2025 20:20 โ€” ๐Ÿ‘ 15    ๐Ÿ” 4    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 1

The paper below describes Huawei's cloud AI platform designed to efficiently serve LLMs.

It uses four major design components: serverless abstraction and infrastructure, serving engine, scheduling algorithms, and scaling optimizations.

18.02.2025 13:43 โ€” ๐Ÿ‘ 13    ๐Ÿ” 1    ๐Ÿ’ฌ 2    ๐Ÿ“Œ 0

RAG is dead. Long live RAG.

LLMs suck at long context.

This paper shows what I have seen in most deployments.

With longer contexts, performance degrades.

13.02.2025 06:11 โ€” ๐Ÿ‘ 6    ๐Ÿ” 1    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0

🔥 allenai/Llama-3.1-Tulu-3-8B (trained with PPO) -> allenai/Llama-3.1-Tulu-3.1-8B (trained with GRPO)

We are happy to "quietly" release our latest GRPO-trained Tulu 3.1 model, which is considerably better in MATH and GSM8K!

12.02.2025 17:33 โ€” ๐Ÿ‘ 22    ๐Ÿ” 5    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 2
Probabilistic Inference Scaling

Can we scale small, open LMs to o1 level? Using classical probabilistic inference methods, YES!

A particle-filtering approach to improved inference, without any training!
Check out probabilistic-inference-scaling.github.io

By Aisha Puri et al. 📈🤖
A joint MIT CSAIL & Red Hat effort
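The core loop is easy to picture: keep N partial generations as particles, weight each one with a reward score, resample, extend, repeat. A toy sketch of that general particle-filtering idea (the extend and reward stubs below are hypothetical placeholders, not the project's code):

import random

def extend(partial: str) -> str:
    # Hypothetical stub: extend a partial solution by one step
    # (in practice, decode the next chunk of tokens from the LM).
    return partial + " step"

def reward(partial: str) -> float:
    # Hypothetical stub: a process-reward / verifier score for the partial
    # solution; random here just to keep the sketch runnable.
    return random.random()

def particle_filter(prompt: str, n_particles: int = 8, n_steps: int = 5) -> str:
    particles = [prompt] * n_particles
    for _ in range(n_steps):
        # Extend every particle, then weight it with the reward model.
        particles = [extend(p) for p in particles]
        weights = [reward(p) for p in particles]
        # Resample with replacement in proportion to weight, so promising
        # partial solutions get more of the compute budget.
        particles = random.choices(particles, weights=weights, k=n_particles)
    return max(particles, key=reward)  # best particle under the (toy) reward

print(particle_filter("Q: what is 2+2?"))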

07.02.2025 20:05 โ€” ๐Ÿ‘ 48    ๐Ÿ” 5    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0

Post-training an LLM for reasoning with GRPO in TRL, by @sergiopaniego.bsky.social

A guide to post-training an LLM using GRPO. It's particularly effective for scaling test-time compute for extended reasoning, making it an ideal approach for complex tasks such as mathematical problem-solving.
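Roughly, the setup looks like this (a minimal sketch along the lines of TRL's GRPO quickstart; the base model, dataset, and toy length reward are placeholders, see the guide for the real recipe):

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder prompt dataset (the TRL quickstart uses trl-lib/tldr).
dataset = load_dataset("trl-lib/tldr", split="train")

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions close to 20 characters.
    return [-abs(20 - len(c)) for c in completions]

args = GRPOConfig(output_dir="Qwen2-0.5B-GRPO", logging_steps=10)
trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",  # placeholder base model
    reward_funcs=reward_len,
    args=args,
    train_dataset=dataset,
)
trainer.train()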

07.02.2025 05:15 โ€” ๐Ÿ‘ 11    ๐Ÿ” 2    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0

Today, a meeting with the crew of the Ax-4 mission at Centrum Nauki Kopernik. Fingers crossed for Sławosz Uznański's flight!

with @polsa.studenci @astro_peggy @astro_slawosz @tibor_to_orbit

05.02.2025 21:57 โ€” ๐Ÿ‘ 3    ๐Ÿ” 1    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0
Running pytest against a specific Python version with uv run. While working on this issue I figured out a neat pattern for running the tests for my project locally against a specific Python version using uv run:

Tiny TIL: I just figured out how to run pytest with a different Python version against my pyproject.toml/setup.py projects using uv run

uv run --python 3.12 --with '.[test]' pytest

https://til.simonwillison.net/pytest/pytest-uv

04.02.2025 22:59 โ€” ๐Ÿ‘ 12    ๐Ÿ” 4    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0

And is atheism some kind of left-wing demand?

04.02.2025 19:23 โ€” ๐Ÿ‘ 0    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0
GitHub - Deep-Agent/R1-V: Witness the aha moment of VLM with less than $3.

R1-V: teaching a VLM how to count with RL with verifiable rewards

Starting with Qwen2VL-Instruct-2B, they spent $3 on compute and got it to outperform the 72B model.

github.com/Deep-Agent/R...
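The "verifiable rewards" part just means the reward can be checked programmatically instead of coming from a learned judge. A toy sketch of such a reward for counting (illustrative only, not R1-V's actual reward function):

import re

def counting_reward(completion: str, ground_truth: int) -> float:
    # Toy verifiable reward: 1.0 if the last number the model states matches
    # the ground-truth object count, else 0.0. R1-V's real reward also checks
    # answer formatting; this only illustrates the idea.
    numbers = re.findall(r"-?\d+", completion)
    if not numbers:
        return 0.0
    return 1.0 if int(numbers[-1]) == ground_truth else 0.0

print(counting_reward("I see 3 apples, so the answer is 3.", 3))  # 1.0
print(counting_reward("There are four apples.", 4))               # 0.0 (no digits found)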

03.02.2025 12:27 โ€” ๐Ÿ‘ 9    ๐Ÿ” 3    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0

Alibaba Qwen2.5-1M! 💥 Now supporting a 1 MILLION TOKEN CONTEXT LENGTH 🔥

📄 Blog: qwenlm.github.io/blog/qwen2.5...

26.01.2025 17:56 โ€” ๐Ÿ‘ 39    ๐Ÿ” 5    ๐Ÿ’ฌ 5    ๐Ÿ“Œ 0
The image depicts a monumental statue of Buddha, emphasizing serenity and grandeur. The statue's intricate design captures traditional Buddhist features, including a meditative posture with hands placed in a symbolic gesture, flowing robes, and a calm facial expression exuding peace. The perspective highlights the statue's immense size against a minimalistic white sky background, underscoring its significance as a spiritual and cultural landmark.


Explainer: What's R1 and Everything Else

This is an attempt to consolidate the dizzying rate of AI developments since Christmas. If you're into AI but not deep enough, this should get you oriented again.

timkellogg.me/blog/2025/01...

26.01.2025 03:17 โ€” ๐Ÿ‘ 116    ๐Ÿ” 27    ๐Ÿ’ฌ 5    ๐Ÿ“Œ 13

The โ€œActive Enumโ€ Pattern

Enums are objects, so why not give them attributes?

https://blog.glyph.im/2025/01/active-enum.html
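The gist, in a minimal Python sketch (the names here are illustrative, not taken from the post):

from enum import Enum

class Status(Enum):
    OK = ("ok", 200)
    NOT_FOUND = ("not found", 404)
    ERROR = ("error", 500)

    def __init__(self, label: str, http_code: int):
        # Each member's value tuple is unpacked into attributes,
        # so the member carries its own data...
        self.label = label
        self.http_code = http_code

    # ...and its own behavior, instead of callers switching on the member.
    def describe(self) -> str:
        return f"{self.http_code} {self.label}"

print(Status.NOT_FOUND.describe())  # 404 not found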

26.01.2025 05:15 โ€” ๐Ÿ‘ 10    ๐Ÿ” 1    ๐Ÿ’ฌ 2    ๐Ÿ“Œ 0
R1+Sonnet set SOTA on aider's polyglot benchmark. R1+Sonnet has set a new SOTA on the aider polyglot benchmark, at 14X less cost compared to o1.

Aider reports that R1+Sonnet (R1 Thinking + Sonnet) set a new SOTA on the aider polyglot benchmark at 14X less cost compared to o1.

64% R1+Sonnet
62% o1
57% R1
52% Sonnet
48% DeepSeek V3

aider.chat/2025/01/24/r...

24.01.2025 17:46 โ€” ๐Ÿ‘ 41    ๐Ÿ” 9    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 2
MiniMax - Intelligence with everyone

MiniMax-01 supports a 4M (!!!) context window

That's 100% on needle-in-the-haystack all the way through 4M, as I understand it (seems like a benchmark mistake tbqh, it's too good)

www.minimaxi.com/en/news/mini...

15.01.2025 03:10 โ€” ๐Ÿ‘ 29    ๐Ÿ” 6    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 1

InternLM v3

- Performance surpasses models like Llama3.1-8B and Qwen2.5-7B
- Capable of deep reasoning with system prompts
- Trained only on 4T high-quality tokens

huggingface.co/collections/...

15.01.2025 08:24 โ€” ๐Ÿ‘ 18    ๐Ÿ” 7    ๐Ÿ’ฌ 2    ๐Ÿ“Œ 0

A new, extremely strong embedding model based on ModernBERT-base is out: `cde-small-v2`. Both faster and stronger than its predecessor, it tops the MTEB leaderboard for its tiny size!

Details in 🧵

14.01.2025 13:21 โ€” ๐Ÿ‘ 31    ๐Ÿ” 7    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 1

"Sky-T1-32B-Preview, our reasoning model that performs on par with o1-preview on popular reasoning and coding benchmarks."
That was quick! Is this already the Alpaca moment for reasoning models?
Source: novasky-ai.github.io/posts/sky-t1/

14.01.2025 00:34 โ€” ๐Ÿ‘ 39    ๐Ÿ” 8    ๐Ÿ’ฌ 3    ๐Ÿ“Œ 0

Google's Titans: a new architecture with attention and a meta in-context memory that learns how to memorize at test time, as presented by one of the authors, @alibehrouz.bsky.social

13.01.2025 19:53 โ€” ๐Ÿ‘ 70    ๐Ÿ” 18    ๐Ÿ’ฌ 4    ๐Ÿ“Œ 5

What a week to open the year in open ML: all the things released at @hf.co 🤗

Here's everything released, find text-readable version here huggingface.co/posts/merve/...

All models are here huggingface.co/collections/...

10.01.2025 14:51 โ€” ๐Ÿ‘ 21    ๐Ÿ” 1    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0
Contemplative LLMs: Anxiety is all you need? Let the LLM 'contemplate' before answering

Contemplative LLMs: Anxiety is all you need? by Maharshi

Let the LLM 'contemplate' for a bit before answering using this simple system prompt, which might (in most cases) lead to the correct final answer!

maharshi.bearblog.dev/contemplativ...
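Plugging a prompt like that in is a one-liner with any chat API. A minimal sketch with the OpenAI Python client (the contemplation instruction below is a paraphrase of the idea, not the actual prompt from the blog post, and the model name is just a placeholder):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Paraphrased "contemplation" instruction; see the blog post for the real prompt.
CONTEMPLATIVE_SYSTEM_PROMPT = (
    "Before answering, think out loud: question your assumptions, consider "
    "how you might be wrong, and only then state your final answer."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[
        {"role": "system", "content": CONTEMPLATIVE_SYSTEM_PROMPT},
        {"role": "user", "content": "How many r's are in 'strawberry'?"},
    ],
)
print(response.choices[0].message.content)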

10.01.2025 03:44 โ€” ๐Ÿ‘ 25    ๐Ÿ” 8    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0
Release v1.2.0 · modelcontextprotocol/python-sdk. A big thank you to @jlowin for the creation of the fantastic FastMCP, which is now included in the MCP SDK. Backwards compatibility: this release is semver compatible; existing code will continue ...

This is a big one: Python MCP SDK 1.2.0 is out!

The best part? It contains the widely used FastMCP library as part of the SDK. Don't worry, the old code still works and there is no need to update, but writing new servers is much easier.

Check out the release notes: github.com/modelcontext...
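Writing a new server really is a few lines now. A minimal sketch following the SDK's FastMCP interface (the tool here is just an illustrative example):

# pip install "mcp>=1.2.0"
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    mcp.run()  # serves over stdio so an MCP client can connect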

03.01.2025 22:30 โ€” ๐Ÿ‘ 18    ๐Ÿ” 4    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0

That didn't take long! Nomic AI has finetuned the new ModernBERT-base encoder model into a strong embedding model for search, classification, clustering and more!

Details in 🧵

31.12.2024 15:43 โ€” ๐Ÿ‘ 37    ๐Ÿ” 10    ๐Ÿ’ฌ 2    ๐Ÿ“Œ 1
A screenshot of my Atuin wrapped, showing some fun stats about my command history in the style of "spotify wrapped"


Atuin v18.4 is out now!

Including `atuin wrapped`, your year in shell history 🐢

thanks @daveeddy.com for the suggestion!

27.12.2024 17:26 โ€” ๐Ÿ‘ 124    ๐Ÿ” 21    ๐Ÿ’ฌ 9    ๐Ÿ“Œ 11
Code written with box-drawing characters, as used in old software to make fake UIs


You're still arguing about tabs vs. spaces? May I present…

25.12.2024 18:37 โ€” ๐Ÿ‘ 5327    ๐Ÿ” 1293    ๐Ÿ’ฌ 157    ๐Ÿ“Œ 149
