Sung Kim

@sungkim.bsky.social

A business analyst at heart who enjoys delving into AI, ML, data engineering, data science, data analytics, and modeling. My views are my own. You can also find me at threads: @sung.kim.mw

6,960 Followers  |  1,147 Following  |  5,054 Posts  |  Joined: 22.01.2024  |  1.8848

Latest posts by sungkim.bsky.social on Bluesky

ML Systems Textbook

Free eBook: Machine Learning Systems

Principles and Practices of Engineering Artificially Intelligent Systems

This textbook bridges the gap between theoretical foundations and practical engineering, emphasizing the systems perspective required to build effective AI solutions.

www.mlsysbook.ai

01.11.2025 02:04  |  👍 12    🔁 3    💬 0    📌 0

Paper: arxiv.org/abs/2510.25781
Repo: github.com/AmirNoori68/...

31.10.2025 23:04  |  👍 1    🔁 0    💬 0    📌 0

A Practitioner's Guide to Kolmogorov-Arnold Networks

A systematic and comprehensive overview of the rapidly expanding KAN landscape, moving beyond simple performance comparisons to offer a structured synthesis of theoretical foundations, architectural variants, and practical implementation strategies.

31.10.2025 23:04  |  👍 8    🔁 1    💬 1    📌 0

Google: TPU + JAX and killing it.
Meta: ??? + PyTorch ...and you had a plan for years, maybe even a decade; where's the execution? ...perhaps partner with Microsoft?

31.10.2025 20:57  |  👍 11    🔁 0    💬 1    📌 0

On the left, the refurbished Lincoln bathroom. On the right, picture I took in Saddam Hussein's palace in Basra in 2005.

31.10.2025 18:00  |  👍 18227    🔁 5990    💬 1296    📌 621

Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
Large Language Models (LLMs) often struggle with problems that require multi-step reasoning. For small-scale open-source models, Reinforcement Learning with Verifiable Rewards (RLVR) fails when correc...

arxiv.org/abs/2510.25992

31.10.2025 19:47  |  👍 5    🔁 0    💬 0    📌 0

Google's Supervised Reinforcement Learning (SRL), a method designed to teach LLMs complex reasoning skills from expert demonstrations, for problems that are too difficult for conventional RL or SFT approaches.

"Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning"

31.10.2025 19:47  |  👍 22    🔁 2    💬 1    📌 1

Making Decisions under Model Misspecification
We use decision theory to confront uncertainty that is sufficiently broad to incorporate "models as approximations." We presume the existence of a featured collection of what we call "structured model...

To address this "misspecification fear", they propose a decision criterion that evaluates outcomes by minimizing across alternative "unstructured models" while imposing a penalty based on the Hausdorff statistical set distance (C(p,Q)) from the original set Q.
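
Read literally, the criterion described above might be written as below. This is a sketch under stated assumptions, not the paper's exact statement: the act $f$, the utility function $u$, the set $\Delta$ of all candidate ("unstructured") probability models, and the penalty weight $\lambda$ are notation introduced here for illustration. Only $C(p, Q)$ and $Q$ come from the post.

```latex
V(f) \;=\; \min_{p \in \Delta} \Big\{ \mathbb{E}_{p}\big[\, u(f) \,\big] \;+\; \lambda \, C(p, Q) \Big\}, \qquad \lambda > 0
```

Because the penalty grows as $p$ moves away from the structured set $Q$, distant models are still considered but are less likely to attain the minimum.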

arxiv.org/abs/2008.01071

31.10.2025 19:39  |  👍 3    🔁 0    💬 0    📌 0

In economics and science, "model misspecification" means that the specific models we use to predict the future or guide decisions are always simplifications and approximations, so they are inherently "wrong" or they don't perfectly represent the complex real world.

31.10.2025 19:39  |  👍 7    🔁 0    💬 1    📌 0

Project: emu.world
Model: huggingface.co/collections/...
Github: github.com/baaivision/E...
Paper: arxiv.org/abs/2510.26583

31.10.2025 19:16  |  👍 1    🔁 0    💬 0    📌 0

BAAI's Emu3.5

A large-scale multimodal world model that natively predicts the next vision-language state. Emu3.5 outperforms Nano Banana across image generation, editing, interleaved tasks and more.

31.10.2025 19:16  |  👍 17    🔁 2    💬 1    📌 1

Paper: arxiv.org/abs/2510.26788
Repo: github.com/sail-sg/Prec...

31.10.2025 19:06  |  👍 3    🔁 0    💬 0    📌 0

I like their simple conclusion.

Problem: The BF16 precision causes a large training-inference mismatch, leading to unstable RL training.

Solution: Just switch to FP16.
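
The precision gap behind this trade-off can be illustrated with a short sketch. It is not the paper's method: it only shows that BF16 keeps 7 mantissa bits while FP16 keeps 10, so the same value rounds more coarsely in BF16. The helper names `round_to_bf16` and `round_to_fp16` and the sample value are made up for illustration, and the code assumes finite inputs.

```python
import struct


def round_to_bf16(x: float) -> float:
    """Round a finite value to bfloat16 (8 exponent bits, 7 mantissa bits)."""
    # bfloat16 is the top 16 bits of an IEEE float32.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 low bits that get dropped.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def round_to_fp16(x: float) -> float:
    """Round a finite value to IEEE half precision (5 exponent bits, 10 mantissa bits)."""
    # The "e" format code packs an IEEE 754 binary16 value.
    return struct.unpack("<e", struct.pack("<e", x))[0]


x = 0.3337  # hypothetical value, e.g. a token probability
bf16_err = abs(round_to_bf16(x) - x)
fp16_err = abs(round_to_fp16(x) - x)

print(f"bf16: {round_to_bf16(x)!r}  error {bf16_err:.2e}")
print(f"fp16: {round_to_fp16(x)!r}  error {fp16_err:.2e}")
```

The extra precision has a cost: FP16 has only 5 exponent bits, so its largest finite value is 65504, which is why FP16 training is usually paired with loss scaling.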

"Defeating the Training-Inference Mismatch via FP16"

31.10.2025 19:06  |  👍 16    🔁 1    💬 2    📌 0

I can't help but wonder about this photo - I thought these two companies (Hyundai and Samsung) and families (Chung and Lee) HATE each other.

31.10.2025 04:09  |  👍 9    🔁 1    💬 1    📌 0

Moreover, Intel's track record with past AI processor acquisitions has been disastrous.

31.10.2025 04:04  |  👍 3    🔁 0    💬 1    📌 0

Intel may purchase SambaNova.

Good: It's a solid ASIC company - the only provider capable of running inference on DeepSeek, unlike Cerebras or Groq.

Bad: Lip-Bu Tan serves as the company's Chairperson, and SambaNova is facing funding difficulties.

31.10.2025 04:04  |  👍 8    🔁 1    💬 1    📌 0

Rethinking Thinking Tokens: LLMs as Improvement Operators
Reasoning training incentivizes LLMs to produce long chains of thought (long CoT), which among other things, allows them to explore solution strategies with self-checking. This results in higher accur...

"Rethinking Thinking Tokens: LLMs as Improvement Operators"

arxiv.org/abs/2510.01123

31.10.2025 02:08  |  👍 2    🔁 0    💬 0    📌 0

Reasoning training encourages LLMs to produce long chains of thought (CoT), improving accuracy via self-checking but increasing context length, compute cost, and latency. This work studies whether frontier models can achieve a better trade-off: higher accuracy at lower cost.

31.10.2025 02:08  |  👍 16    🔁 2    💬 1    📌 0

The Smol Training Playbook: The Secrets to Building World-Class LLMs

A practical journey through the challenges, decisions, and messy reality behind training state-of-the-art language models.

huggingface.co/spaces/Huggi...

31.10.2025 02:06  |  👍 33    🔁 3    💬 0    📌 1
Marin

Project: marin.community
Model: huggingface.co/marin-commun...
Repo: github.com/marin-commun...
Doc: marin.readthedocs.io/en/latest/
Discord: discord.com/invite/J9CTk...

30.10.2025 22:27  |  👍 3    🔁 0    💬 0    📌 0

Marin 32B Base - mantis (Open-source: Model, Code, and Data)

A key feature of Marin is reproducibility: every step, from raw data to the final model, is recorded, not just the end result. This includes failed experiments, so the entire research process is transparent.

30.10.2025 22:27  |  👍 23    🔁 3    💬 1    📌 0

moonshotai/Kimi-Linear-48B-A3B-Instruct · Hugging Face
We're on a journey to advance and democratize artificial intelligence through open source and open science.

usage and up to 6x decoding throughput at a 1M context length.

Model: huggingface.co/moonshotai/K...
Tech Report: github.com/MoonshotAI/K...

30.10.2025 22:20  |  👍 4    🔁 0    💬 0    📌 0

Moonshot AI's Kimi Linear (Open-weight)

A novel architecture that outperforms full attention with faster speeds and better performance, ready to serve as a drop-in replacement for full attention, featuring our open-sourced KDA kernels! Kimi Linear offers up to a 75% reduction in KV cache

30.10.2025 22:20  |  👍 16    🔁 1    💬 1    📌 1

LFM2-ColBERT-350M: One Model to Embed Them All | Liquid AI
Today, we release LFM2-ColBERT-350M, a late interaction retriever with excellent multilingual performance. It allows you to store documents in one language (for example, a product description in Engli...

Full Blog: liquid.ai/blog/lfm2-co...
Model: huggingface.co/LiquidAI/LFM...
Demo: huggingface.co/spaces/Liqui...

29.10.2025 14:48  |  👍 1    🔁 0    💬 0    📌 0

LiquidAI's (MIT CSAIL offshoot) LFM2-ColBERT-350M

A 350M-parameter embedding model that lets you store documents in one language and retrieve them in many languages with high accuracy, at inference speeds comparable to models a fraction of its size.
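
For readers unfamiliar with "late interaction", the scoring rule used by ColBERT-style retrievers can be sketched as below: each query token embedding is matched to its best document token embedding, and those per-token maxima are summed (often called MaxSim). The function names and the toy 2-D vectors are hypothetical and are not output from LFM2-ColBERT-350M.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late-interaction score: sum over query tokens of the best match in the document."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy token embeddings (hypothetical, unit length for readability).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.6, 0.8]]
doc_b = [[0.0, -1.0], [0.8, 0.6]]

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)

# Rank documents by score, highest first.
ranking = sorted([("doc_a", score_a), ("doc_b", score_b)], key=lambda item: -item[1])
print(ranking)
```

Because documents are stored as per-token embeddings and compared only at query time, the document index can be built once and reused for every query.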

29.10.2025 14:48  |  👍 13    🔁 1    💬 1    📌 0
Tongyi DeepResearch DeepResearch

🔗 Homepage: tongyi-agent.github.io
🔗 Technical Report: arxiv.org/pdf/2510.24701
🔗 Blog: tongyi-agent.github.io/blog/introdu...
🔗 Model HuggingFace: huggingface.co/Alibaba-NLP/...
🔗 Model ModelScope: modelscope.cn/models/iic/T...
🔗 GitHub Repo: github.com/Alibaba-NLP/...

29.10.2025 14:38  |  👍 0    🔁 0    💬 0    📌 0

Alibaba's Tongyi DeepResearch Technical Report

Dive deep into the technology and insights behind our 30B (A3B) open-source web agent that achieves SOTA performance: 32.9 on Humanity's Last Exam, 43.4 on BrowseComp, and 46.7 on BrowseComp-ZH.

29.10.2025 14:38  |  👍 12    🔁 0    💬 1    📌 0

A Geometric Analysis of PCA

What property of the data distribution determines the excess risk of principal component analysis? In this paper, they provide a precise answer to this question.

arxiv.org/abs/2510.20978

29.10.2025 12:29  |  👍 15    🔁 2    💬 1    📌 0

The Principles of Diffusion Models

It traces the core ideas that shaped diffusion modeling and explains how today's models work, why they work, and where they're heading.

www.arxiv.org/abs/2510.21890

29.10.2025 03:19  |  👍 33    🔁 3    💬 0    📌 0

Everything About Transformers: A visual story of how transformers came to life

www.krupadave.com/articles/eve...

29.10.2025 02:58  |  👍 25    🔁 2    💬 0    📌 0
