
Maha Elbayad

@elbayadm.bsky.social

Research Scientist at FAIR, Meta. 💬 My opinions are my own.

59 Followers  |  77 Following  |  13 Posts  |  Joined: 14.12.2024

Latest posts by elbayadm.bsky.social on Bluesky

👋 Hello world! We’re thrilled to announce the v0.4 release of fairseq2, an open-source library from FAIR powering many projects at Meta. pip install fairseq2 and explore our trainer API, instruction & preference finetuning (up to 70B), and native vLLM integration.

12.02.2025 12:31 · 👍 4    🔁 2    💬 1    📌 2
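To go with the install one-liner, a minimal sanity check might look like this; the __version__ attribute is an assumption on my part (most packages expose one) rather than something stated in the post.

```python
# Minimal check after running `pip install fairseq2`.
# NOTE: fairseq2.__version__ is assumed to exist; it is not confirmed by the post.
import fairseq2

print(fairseq2.__version__)  # expect a 0.4.x version for this release
```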

The LCM component here (green) is the only place where we have diffusion, i.e., denoising is only performed at the concept (sentence) level. The concept decoder is a regular subword-level decoder conditioning on a single vector (the sentence vector from the LCM).

16.12.2024 22:05 · 👍 1    🔁 0    💬 0    📌 0

3/3 Figure 13 from the paper shows the FLOPs under different settings of "context size in sentences" and "average length of a sentence". It would definitely be much costlier if a sentence were only 1-5 subwords.

16.12.2024 20:42 · 👍 1    🔁 0    💬 1    📌 0
Preview: Large Concept Models: Language Modeling in a Sentence Representation Space

2/3 There are no embedding or logits FLOPs in the LCM, and the context length is much shorter (a sentence is on average ~30 subwords), so a context of 3000 subwords is only ~100 concepts in the LCM. See section 2.5.1 of the paper arxiv.org/abs/2412.08821 for a comparison of inference FLOPs.

16.12.2024 20:42 · 👍 1    🔁 0    💬 1    📌 0
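To make the context-length argument concrete, here is a tiny back-of-the-envelope calculation; the quadratic-attention ratio is purely illustrative and is not the paper's FLOP accounting (see section 2.5.1 and figure 13 for that).

```python
# Toy arithmetic for the argument above: with ~30 subwords per sentence,
# a 3000-subword context becomes ~100 concepts for the LCM, and the LCM
# additionally pays no embedding or logits FLOPs.
subwords_per_sentence = 30
llm_context_subwords = 3000
lcm_context_concepts = llm_context_subwords // subwords_per_sentence  # ~100

# Self-attention cost grows roughly quadratically with sequence length,
# so the attention-FLOP ratio at these two context lengths is about:
attention_ratio = llm_context_subwords**2 / lcm_context_concepts**2

print(lcm_context_concepts, attention_ratio)  # 100 concepts, ~900x
```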

1/3 Yes, the LCM denoises one concept at a time. Once denoised, a concept is dispatched to a sentence decoder to generate the corresponding text. And no, it does not take 40x the FLOPs of a traditional subword-level decoder.

16.12.2024 20:42 · 👍 1    🔁 0    💬 1    📌 0

8/ A massive shout-out to the amazing team who made this happen! Loic, Artyom, Paul-Ambroise, David, Tuan, and many more awesome collaborators.

14.12.2024 18:59 · 👍 0    🔁 0    💬 0    📌 0
Preview: GitHub - facebookresearch/large_concept_model: Large Concept Models: Language modeling in a sentence representation space

7/ At FAIR (AI at Meta), we're committed to open research! The training code for our LCMs is freely available. I’m excited about the potential of concept-based language models and what new capabilities they can unlock. github.com/facebookrese...

14.12.2024 18:59 · 👍 2    🔁 0    💬 1    📌 0

6/ We scale our two-tower diffusion LCM to 7B parameters, achieving summarization performance competitive with similarly sized LLMs. Most importantly, the LCM demonstrates remarkable zero-shot generalization, effectively handling unseen languages.

14.12.2024 18:59 · 👍 0    🔁 0    💬 2    📌 0

5/ One main challenge with LCMs was designing search algorithms. We use an "end of document" concept and introduce a stopping criterion based on the distance to this special concept. Common inference parameters in diffusion models play a major role too (guidance scale, initial noise, ...).

14.12.2024 18:59 · 👍 0    🔁 0    💬 1    📌 0
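A minimal sketch of this kind of stopping rule, assuming a cosine-distance threshold to the special end-of-document concept; the distance measure and the threshold value are placeholders of mine, not the paper's settings.

```python
import numpy as np

def should_stop(new_concept: np.ndarray, eod_concept: np.ndarray,
                threshold: float = 0.5) -> bool:
    """Stop generating once the freshly denoised concept is close enough
    to the special "end of document" concept (placeholder threshold)."""
    cos = np.dot(new_concept, eod_concept) / (
        np.linalg.norm(new_concept) * np.linalg.norm(eod_concept) + 1e-8)
    return (1.0 - cos) < threshold  # small cosine distance => stop

eod = np.ones(16)
print(should_stop(np.ones(16), eod))   # True: same direction as the EOD concept
print(should_stop(-np.ones(16), eod))  # False: far from the EOD concept
```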

4/ Two diffusion architectures were proposed: "One-Tower", with a single Transformer decoder that encodes the context and denoises the next concept at once, and "Two-Tower", where context encoding is separated from denoising.

14.12.2024 18:59 · 👍 0    🔁 0    💬 1    📌 0
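Purely for intuition, a toy PyTorch sketch of the structural difference between the two variants; this is not the released model code, and the toy sizes, missing causal masking, and missing diffusion-timestep conditioning are all simplifications.

```python
import torch
import torch.nn as nn

d = 64  # toy concept dimension (real SONAR vectors are 1024-d)

class OneTower(nn.Module):
    """One stack jointly encodes the context and denoises the next concept."""
    def __init__(self):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.stack = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, context, noisy_next):
        x = torch.cat([context, noisy_next.unsqueeze(1)], dim=1)
        return self.stack(x)[:, -1]  # last position = denoised next concept

class TwoTower(nn.Module):
    """Context encoding is separated from denoising (denoiser cross-attends)."""
    def __init__(self):
        super().__init__()
        enc = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.contextualizer = nn.TransformerEncoder(enc, num_layers=2)
        dec = nn.TransformerDecoderLayer(d_model=d, nhead=4, batch_first=True)
        self.denoiser = nn.TransformerDecoder(dec, num_layers=2)

    def forward(self, context, noisy_next):
        memory = self.contextualizer(context)
        return self.denoiser(noisy_next.unsqueeze(1), memory)[:, 0]

ctx = torch.randn(2, 10, d)  # 10 preceding concepts per example
noisy = torch.randn(2, d)    # noisy candidate for the next concept
print(OneTower()(ctx, noisy).shape, TwoTower()(ctx, noisy).shape)
```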

3/ We explored different designs for the LCM (MSE regression, diffusion, quantized SONAR), a model that generates the next continuous SONAR embedding conditioned on the sequence of preceding embeddings. Our study revealed diffusion models to be the most effective approach.

14.12.2024 18:59 · 👍 0    🔁 0    💬 1    📌 0
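As a reference point for the simplest of these designs (which the paper found weaker than diffusion), a toy MSE-regression objective over next-concept prediction could look like the following; every dimension and the data are made up.

```python
import torch
import torch.nn as nn

d = 64  # toy concept dimension (real SONAR vectors are 1024-d)
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True), num_layers=2)
head = nn.Linear(d, d)

seq = torch.randn(8, 12, d)  # toy batch of 12-concept sequences
causal = nn.Transformer.generate_square_subsequent_mask(seq.size(1) - 1)
pred = head(backbone(seq[:, :-1], mask=causal))  # predict concept t+1 from concepts <= t
loss = nn.functional.mse_loss(pred, seq[:, 1:])  # plain MSE regression objective
loss.backward()
print(loss.item())
```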

2/ Within the SONAR space, the LCM is trained to predict the next concept in a sequence. The LCM architecture is hierarchical, incorporating SONAR encoders and decoders to seamlessly map into and from the internal space where the LCM performs its computations.

14.12.2024 18:59 · 👍 1    🔁 1    💬 1    📌 0
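To make the hierarchy concrete, a stand-in pipeline could look like this; sonar_encode, lcm_predict_next, and sonar_decode are dummy placeholders of mine, not the real SONAR or LCM models.

```python
import numpy as np

D = 16  # toy concept dimension (real SONAR vectors are 1024-d)

def sonar_encode(sentence: str) -> np.ndarray:
    """Stand-in for a SONAR encoder: sentence -> fixed-size concept vector."""
    rng = np.random.default_rng(abs(hash(sentence)) % 2**32)
    return rng.standard_normal(D)

def lcm_predict_next(concepts: list) -> np.ndarray:
    """Stand-in for the LCM: predict the next concept from the preceding ones."""
    return np.mean(concepts, axis=0)

def sonar_decode(concept: np.ndarray) -> str:
    """Stand-in for a SONAR decoder: concept vector -> text."""
    return f"<decoded sentence from a {concept.shape[0]}-d concept>"

doc = ["LCMs operate on sentence-level concepts.", "Each sentence is one SONAR vector."]
concepts = [sonar_encode(s) for s in doc]  # map text into the embedding space
next_concept = lcm_predict_next(concepts)  # the LCM only ever sees vectors
print(sonar_decode(next_concept))          # map the prediction back out to text
```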

1/ LCMs operate at the level of meaning, or what we label "concepts": a sentence in text or an utterance in speech. These units are then embedded into SONAR, a language- and modality-agnostic representation space. github.com/facebookrese...

14.12.2024 18:59 · 👍 1    🔁 0    💬 1    📌 0
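For the embedding step itself, the open-source SONAR repo provides a text-to-embedding pipeline; the snippet below is based on my recollection of its README, so the class, checkpoint names, and arguments should be read as assumptions rather than a confirmed API.

```python
# Assumed usage of facebookresearch/SONAR for sentence embeddings;
# names are recalled from the repo README and may differ in practice.
from sonar.inference_pipelines.text import TextToEmbeddingModelPipeline

t2vec = TextToEmbeddingModelPipeline(
    encoder="text_sonar_basic_encoder",
    tokenizer="text_sonar_basic_encoder",
)
sentences = ["Concepts correspond to sentences.",
             "SONAR is language- and modality-agnostic."]
embeddings = t2vec.predict(sentences, source_lang="eng_Latn")
print(embeddings.shape)  # expected (2, 1024): SONAR embeddings are 1024-d
```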
Preview: GitHub - facebookresearch/large_concept_model: Large Concept Models: Language modeling in a sentence representation space

Proud to share our work on Large Concept Models (LCMs)! This is a new direction in language modeling that moves beyond traditional token-level LLMs.
Paper: ai.meta.com/research/pub...
Code: github.com/facebookrese...

14.12.2024 18:59 · 👍 21    🔁 6    💬 1    📌 1
