"Densing Law of LLMs" paper: arxiv.org/abs/2412.04315
• The amount of work an LLM can handle on the same hardware is growing even faster than the improvements in model density or chip power alone.
That's why researchers suggest focusing on improving "density" instead of just aiming for bigger and more powerful models.
Here are the key findings from the study:
• Costs to run models are dropping as they become more efficient.
• The release of ChatGPT sped up the growth in efficiency of new models by up to 50%!
• Techniques like pruning and distillation don't necessarily make models more efficient.
Estimating the effective parameter size:
It is a two-step process:
- Loss Estimation: Links a model's size and training data to its loss
- Performance Estimation: Uses a sigmoid function to predict how well a model performs based on its loss.
Scaling law:
The density of a model is the ratio of its effective parameter size to its actual parameter size.
If the effective size is larger than the actual size (density above 1), the model is highly efficient.
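To make this concrete, here's a minimal Python sketch of the two-step estimate (not the authors' code: the functional forms, every coefficient, and the example numbers are illustrative assumptions). A reference scaling law predicts loss from parameters and tokens, a sigmoid turns loss into a downstream score, and numerically inverting that curve gives the effective parameter size, hence the density.

```python
# Minimal sketch of the two-step density estimate; all coefficients are made up.
import numpy as np

# Step 1 (loss estimation): reference scaling law L(N, D) = a*N^-alpha + b*D^-beta
def reference_loss(n_params, n_tokens, a=4e2, alpha=0.34, b=4e3, beta=0.28):
    return a * n_params ** -alpha + b * n_tokens ** -beta

# Step 2 (performance estimation): sigmoid mapping loss -> downstream score
def performance(loss, c=1.0, gamma=6.0, l0=2.0):
    return c / (1.0 + np.exp(gamma * (loss - l0)))

def effective_params(observed_score, n_tokens, lo=1e6, hi=1e13, iters=100):
    """Smallest reference-model size whose predicted score matches the observed one."""
    for _ in range(iters):                      # bisection on a log scale; score grows with N
        mid = np.sqrt(lo * hi)
        if performance(reference_loss(mid, n_tokens)) < observed_score:
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)

# Density = effective parameter size / actual parameter size (example numbers are made up)
n_actual, score = 2.4e9, 0.62
n_eff = effective_params(score, n_tokens=1e12)  # reference token budget is an assumption
print(f"effective params ~{n_eff:.3g}, density ~{n_eff / n_actual:.2f}")
```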
Why is density important?
A higher-density model can deliver better results without needing more resources: it reduces computational costs, makes models suitable for devices with limited resources like smartphones, and avoids unnecessary energy use.
Interestingly, they found a trend they call the Densing Law:
The capacity density of LLMs is doubling every 3 months, meaning that newer models are getting much better at balancing performance and size.
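Written as a formula (just a paraphrase of the reported trend, with T the doubling time of roughly 3 months):

```latex
% Capacity density grows exponentially over time; T is the doubling time (~3 months per the post)
\rho_{\max}(t) \approx \rho_{\max}(t_0) \cdot 2^{(t - t_0)/T}, \qquad T \approx 3 \text{ months}
```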
Let's look at this more precisely:
Reading about scaling laws recently, I came across an interesting point:
Focusing on a balance between model size and performance matters more than aiming for larger models.
Tsinghua University and ModelBest Inc. propose the idea of "capacity density" to measure how efficiently a model uses its size.
Explore more interesting ML/AI news in our free weekly newsletter -> www.turingpost.com/p/fod79
2. AI system from World Labs, co-founded by Fei-Fei Li:
Transforms a single image into interactive 3D scenes with varied art styles and realistic physics. You can explore, interact with elements, and move within AI-generated environments directly in your web browser.
www.youtube.com/watch?v=lPYJ...
1. Google DeepMind's Genie 2
Generates 3D environments with object interactions, animations, and physical effects from one image or text prompt. You can interact with them in real-time using a keyboard and mouse.
Blog: deepmind.google/discover/blo...
Our example: www.youtube.com/watch?v=YjO6...
An incredible shift is happening in spatial intelligence!
Here are the 2 latest revolutionary World Models, which create interactive 3D environments:
1. Google DeepMind's Genie 2
2. AI system from World Labs, co-founded by Fei-Fei Li
Explore more below.
In our new AI 101 episode we discuss:
- FM concepts for optimizing the path from noise to realistic data
- Continuous Normalizing Flows (CNFs)
- Conditional Flow Matching
- Differences between FM and diffusion models
Find out more: turingpost.com/p/flowmatching
What is Flow Matching?
Flow Matching (FM) is used in top generative models like Flux, F5-TTS, E2-TTS, and MovieGen, with state-of-the-art results. Some experts even say that FM might surpass diffusion models.
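For intuition, here's a toy conditional flow matching training loop in PyTorch under the common straight-line (linear interpolation) formulation. It's a sketch only, not any of these models' code; the tiny MLP and the 2-D "data" are placeholders.

```python
# Toy conditional flow matching: regress a velocity field onto (x1 - x0)
# along the straight path x_t = (1 - t) * x0 + t * x1. Illustrative only.
import torch
import torch.nn as nn

velocity = nn.Sequential(nn.Linear(2 + 1, 64), nn.SiLU(), nn.Linear(64, 2))  # v_theta(x_t, t)
opt = torch.optim.Adam(velocity.parameters(), lr=1e-3)

for step in range(1000):
    x1 = torch.randn(256, 2) * 0.5 + 2.0      # stand-in "data" samples
    x0 = torch.randn(256, 2)                  # noise samples
    t = torch.rand(256, 1)                    # random times in [0, 1]
    xt = (1 - t) * x0 + t * x1                # point on the straight path
    target = x1 - x0                          # velocity of that path
    pred = velocity(torch.cat([xt, t], dim=1))
    loss = ((pred - target) ** 2).mean()      # flow matching objective
    opt.zero_grad(); loss.backward(); opt.step()

# Sampling: integrate dx/dt = v_theta(x, t) from noise (t=0) to data (t=1) with Euler steps.
with torch.no_grad():
    x = torch.randn(256, 2)
    for i in range(100):
        t = torch.full((256, 1), i / 100)
        x = x + velocity(torch.cat([x, t], dim=1)) / 100
```

At sampling time the learned velocity field is integrated as an ODE, which is where the "optimizing the path from noise to realistic data" framing comes from.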
Also, elevate your AI game with our free newsletter:
www.turingpost.com/subscribe
See other important AI/ML news in our free weekly newsletter: www.turingpost.com/p/fod78
INTELLECT-1 by Prime Intellect
INTELLECT-1 is a 10B open-source LLM trained over 42 days on 1T tokens across 14 global nodes, leveraging the PRIME framework for exceptional efficiency (400× bandwidth reduction).
github.com/PrimeIntelle...
MultiFoley by Adobe Research
MultiFoley is an AI model generating high-quality sound effects from text, audio, and video inputs. Cool demos highlight its creative potential.
arxiv.org/abs/2411.17698
ShowUI by Show Lab, NUS, Microsoft
ShowUI is a 2B vision-language-action model tailored for GUI tasks:
- features UI-guided token selection (33% fewer tokens)
- interleaved streaming for multi-turn tasks
- 256K dataset
- achieves 75.1% zero-shot grounding accuracy
arxiv.org/abs/2411.17465
OLMo 2 by Allen AI
OLMo 2, a family of fully open LMs with 7B and 13B parameters, is trained on 5 trillion tokens.
allenai.org/blog/olmo2
Alibaba's QwQ-32B
It impresses with strong math, coding, and reasoning scores, ranking between Claude 3.5 Sonnet and OpenAI's o1-mini.
- Optimized for consumer GPUs through quantization
- Open-sourced under Apache, revealing tokens and weights
huggingface.co/Qwen/QwQ-32B...
Amazing models of the week:
• Alibaba's QwQ-32B
• OLMo 2 by Allen AI
• ShowUI by Show Lab, NUS, Microsoft
• Adobe's MultiFoley
• INTELLECT-1 by Prime Intellect
🧵
Like/repost the 1st post to support our work!
Also, elevate your AI game with our free newsletter:
www.turingpost.com/subscribe
Find a complete list of the latest research papers in our free weekly digest: www.turingpost.com/p/fod78
Boundless Socratic Learning with Language Games, Google DeepMind
This framework leverages recursive language-based "games" for self-improvement, focusing on feedback, coverage, and scalability. It suggests a roadmap for scalable AI via autonomous data generation and feedback loops.
arxiv.org/abs/2411.16905
MH-MoE: Multi-Head Mixture-of-Experts
@msftresearch.bsky.social's MH-MoE improves sparse MoE by adding a multi-head mechanism, reducing perplexity without increasing FLOPs, and demonstrating robust performance under quantization.
arxiv.org/abs/2411.16205
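For a rough picture of the multi-head MoE idea (my own simplification with assumed shapes, not the paper's FLOPs-matched implementation): split each token into heads, route every head independently to a top-1 expert, then merge the heads back.

```python
# Rough multi-head MoE sketch: per-head top-1 routing over small experts. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadMoE(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_experts=8):
        super().__init__()
        self.h, self.dh = n_heads, d_model // n_heads
        self.router = nn.Linear(self.dh, n_experts)          # gating applied per head
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(self.dh, 2 * self.dh), nn.GELU(), nn.Linear(2 * self.dh, self.dh))
            for _ in range(n_experts)
        ])
        self.merge = nn.Linear(d_model, d_model)

    def forward(self, x):                                    # x: [batch, seq, d_model]
        b, s, d = x.shape
        heads = x.view(b * s * self.h, self.dh)              # split tokens into sub-tokens
        gates = F.softmax(self.router(heads), dim=-1)
        top_gate, top_idx = gates.max(dim=-1)                # top-1 expert per sub-token
        out = torch.zeros_like(heads)
        for e, expert in enumerate(self.experts):
            sel = top_idx == e
            if sel.any():
                out[sel] = top_gate[sel, None] * expert(heads[sel])
        return self.merge(out.view(b, s, d))                 # merge heads back per token

y = MultiHeadMoE()(torch.randn(2, 16, 256))                  # -> [2, 16, 256]
```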
LLM-as-a-Judge:
Presents a taxonomy of methodologies and applications of LLMs for judgment tasks, highlighting bias, vulnerabilities, and self-judgment, with future directions in human-LLM collaboration and bias mitigation
arxiv.org/abs/2411.16594
Star Attention:
NVIDIA introduced a block-sparse attention mechanism for Transformer-based LLMs. It uses local/global attention phases to achieve up to 11x inference speedup on sequences up to 1M tokens, retaining 95-100% accuracy.
arxiv.org/abs/2411.17116
Code: github.com/NVIDIA/Star-...
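Conceptually, the two phases look something like this toy single-device sketch (a simplification, not NVIDIA's code: it omits the anchor-block prefix and the distributed online-softmax aggregation described in the paper).

```python
# Toy two-phase attention pattern: block-local context encoding, then global query attention.
import torch
import torch.nn.functional as F

n_blocks, block, heads, d = 4, 64, 8, 32
ctx = torch.randn(1, heads, n_blocks * block, d)     # context states: [batch, heads, L, head_dim]
q = torch.randn(1, heads, 16, d)                     # query tokens

# Phase 1: each context position attends only within its own block (block-sparse pattern).
block_id = torch.arange(n_blocks * block) // block
local_mask = block_id[:, None] == block_id[None, :]  # [L, L] boolean mask, True = may attend
ctx_states = F.scaled_dot_product_attention(ctx, ctx, ctx, attn_mask=local_mask)

# Phase 2: query tokens attend globally to the whole cached context.
out = F.scaled_dot_product_attention(q, ctx_states, ctx_states)
print(out.shape)                                     # torch.Size([1, 8, 16, 32])
```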
Natural Language Reinforcement Learning:
Redefines reinforcement learning components using natural language for interpretable and knowledge-rich decision-making.
arxiv.org/pdf/2411.14251
bsky.app/profile/turi...
Top 5 research papers of the week:
• Natural Language Reinforcement Learning
• Star Attention, NVIDIA
• Opportunities and Challenges of LLM-as-a-judge
• MH-MoE: Multi-Head Mixture-of-Experts, @msftresearch.bsky.social
• Boundless Socratic Learning with Language Games, Google DeepMind
🧵