
Maha Elbayad

@elbayadm.bsky.social

Research Scientist at FAIR, Meta. 💬 My opinions are my own.

59 Followers  |  77 Following  |  13 Posts  |  Joined: 14.12.2024

Latest posts by elbayadm.bsky.social on Bluesky

👋 Hello world! We’re thrilled to announce the v0.4 release of fairseq2, an open-source library from FAIR powering many projects at Meta. pip install fairseq2 and explore our trainer API, instruction & preference finetuning (up to 70B), and native vLLM integration.

12.02.2025 12:31 · 👍 4    🔁 2    💬 1    📌 2
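To go with the install one-liner, a minimal sanity check might look like this; the __version__ attribute is an assumption on my part (most packages expose one) rather than something stated in the post.

```python
# Minimal check after running `pip install fairseq2`.
# NOTE: fairseq2.__version__ is assumed to exist; it is not confirmed by the post.
import fairseq2

print(fairseq2.__version__)  # expect a 0.4.x version for this release
```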

The LCM component here (green) is the only place where we have diffusion, i.e., denoising is only performed at the concept (sentence) level. The concept decoder is a regular subword-level decoder conditioning on a single vector (the sentence vector from the LCM).

16.12.2024 22:05 · 👍 1    🔁 0    💬 0    📌 0

3/3 Figure 13 from the paper shows the FLOPs under different settings of "context size in sentences" and "average length of a sentence". It would definitely be much costlier if a sentence were only 1-5 subwords.

16.12.2024 20:42 · 👍 1    🔁 0    💬 1    📌 0
Preview: Large Concept Models: Language Modeling in a Sentence Representation Space

2/3 There are no embedding or logits FLOPs in the LCM, and the context length is much shorter (a sentence is on average ~30 subwords), so a context of 3000 subwords is only ~100 concepts in the LCM. See section 2.5.1 of the paper arxiv.org/abs/2412.08821 for a comparison of inference FLOPs.

16.12.2024 20:42 · 👍 1    🔁 0    💬 1    📌 0
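To make the context-length argument concrete, here is a tiny back-of-the-envelope calculation; the quadratic-attention ratio is purely illustrative and is not the paper's FLOP accounting (see section 2.5.1 and figure 13 for that).

```python
# Toy arithmetic for the argument above: with ~30 subwords per sentence,
# a 3000-subword context becomes ~100 concepts for the LCM, and the LCM
# additionally pays no embedding or logits FLOPs.
subwords_per_sentence = 30
llm_context_subwords = 3000
lcm_context_concepts = llm_context_subwords // subwords_per_sentence  # ~100

# Self-attention cost grows roughly quadratically with sequence length,
# so the attention-FLOP ratio at these two context lengths is about:
attention_ratio = llm_context_subwords**2 / lcm_context_concepts**2

print(lcm_context_concepts, attention_ratio)  # 100 concepts, ~900x
```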

1/3 Yes, the LCM denoises one concept at a time. Once denoised, a concept is dispatched to a sentence decoder to generate the corresponding text. And no, it does not take 40x the FLOPs of a traditional subword-level decoder.

16.12.2024 20:42 · 👍 1    🔁 0    💬 1    📌 0

8/ A massive shout-out to the amazing team who made this happen! Loic, Artyom, Paul-Ambroise, David, Tuan, and many more awesome collaborators.

14.12.2024 18:59 · 👍 0    🔁 0    💬 0    📌 0
Preview: GitHub - facebookresearch/large_concept_model: Large Concept Models: Language modeling in a sentence representation space

7/ At FAIR (AI at Meta), we're committed to open research! The training code for our LCMs is freely available. I’m excited about the potential of concept-based language models and what new capabilities they can unlock. github.com/facebookrese...

14.12.2024 18:59 · 👍 2    🔁 0    💬 1    📌 0

6/ We scale our two-tower diffusion LCM to 7B parameters, achieving summarization performance competitive with similarly sized LLMs. Most importantly, the LCM demonstrates remarkable zero-shot generalization, effectively handling unseen languages.

14.12.2024 18:59 · 👍 0    🔁 0    💬 2    📌 0

5/ One main challenge with LCMs was designing search algorithms. We use an "end of document" concept and introduce a stopping criterion based on the distance to this special concept. Common inference parameters in diffusion models play a major role too (guidance scale, initial noise, ...).

14.12.2024 18:59 · 👍 0    🔁 0    💬 1    📌 0
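A minimal sketch of this kind of stopping rule, assuming a cosine-distance threshold to the special end-of-document concept; the distance measure and the threshold value are placeholders of mine, not the paper's settings.

```python
import numpy as np

def should_stop(new_concept: np.ndarray, eod_concept: np.ndarray,
                threshold: float = 0.5) -> bool:
    """Stop generating once the freshly denoised concept is close enough
    to the special "end of document" concept (placeholder threshold)."""
    cos = np.dot(new_concept, eod_concept) / (
        np.linalg.norm(new_concept) * np.linalg.norm(eod_concept) + 1e-8)
    return (1.0 - cos) < threshold  # small cosine distance => stop

eod = np.ones(16)
print(should_stop(np.ones(16), eod))   # True: same direction as the EOD concept
print(should_stop(-np.ones(16), eod))  # False: far from the EOD concept
```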

4/ Two diffusion architectures were proposed: "One-Tower", with a single Transformer decoder that encodes the context and denoises the next concept at once, and "Two-Tower", where context encoding is separated from denoising.

14.12.2024 18:59 · 👍 0    🔁 0    💬 1    📌 0
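Purely for intuition, a toy PyTorch sketch of the structural difference between the two variants; this is not the released model code, and the toy sizes, missing causal masking, and missing diffusion-timestep conditioning are all simplifications.

```python
import torch
import torch.nn as nn

d = 64  # toy concept dimension (real SONAR vectors are 1024-d)

class OneTower(nn.Module):
    """One stack jointly encodes the context and denoises the next concept."""
    def __init__(self):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.stack = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, context, noisy_next):
        x = torch.cat([context, noisy_next.unsqueeze(1)], dim=1)
        return self.stack(x)[:, -1]  # last position = denoised next concept

class TwoTower(nn.Module):
    """Context encoding is separated from denoising (denoiser cross-attends)."""
    def __init__(self):
        super().__init__()
        enc = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.contextualizer = nn.TransformerEncoder(enc, num_layers=2)
        dec = nn.TransformerDecoderLayer(d_model=d, nhead=4, batch_first=True)
        self.denoiser = nn.TransformerDecoder(dec, num_layers=2)

    def forward(self, context, noisy_next):
        memory = self.contextualizer(context)
        return self.denoiser(noisy_next.unsqueeze(1), memory)[:, 0]

ctx = torch.randn(2, 10, d)  # 10 preceding concepts per example
noisy = torch.randn(2, d)    # noisy candidate for the next concept
print(OneTower()(ctx, noisy).shape, TwoTower()(ctx, noisy).shape)
```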

3/ We explored different designs for the LCM (MSE regression, diffusion, quantized SONAR), a model that generates the next continuous SONAR embedding conditioned on the sequence of preceding embeddings. Our study revealed diffusion models to be the most effective approach.

14.12.2024 18:59 · 👍 0    🔁 0    💬 1    📌 0
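As a reference point for the simplest of these designs (which the paper found weaker than diffusion), a toy MSE-regression objective over next-concept prediction could look like the following; every dimension and the data are made up.

```python
import torch
import torch.nn as nn

d = 64  # toy concept dimension (real SONAR vectors are 1024-d)
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True), num_layers=2)
head = nn.Linear(d, d)

seq = torch.randn(8, 12, d)  # toy batch of 12-concept sequences
causal = nn.Transformer.generate_square_subsequent_mask(seq.size(1) - 1)
pred = head(backbone(seq[:, :-1], mask=causal))  # predict concept t+1 from concepts <= t
loss = nn.functional.mse_loss(pred, seq[:, 1:])  # plain MSE regression objective
loss.backward()
print(loss.item())
```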

2/ Within the SONAR space, the LCM is trained to predict the next concept in a sequence. The LCM architecture is hierarchical, incorporating SONAR encoders and decoders to seamlessly map into and from the internal space where the LCM performs its computations.

14.12.2024 18:59 · 👍 1    🔁 1    💬 1    📌 0
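To make the hierarchy concrete, a stand-in pipeline could look like this; sonar_encode, lcm_predict_next, and sonar_decode are dummy placeholders of mine, not the real SONAR or LCM models.

```python
import numpy as np

D = 16  # toy concept dimension (real SONAR vectors are 1024-d)

def sonar_encode(sentence: str) -> np.ndarray:
    """Stand-in for a SONAR encoder: sentence -> fixed-size concept vector."""
    rng = np.random.default_rng(abs(hash(sentence)) % 2**32)
    return rng.standard_normal(D)

def lcm_predict_next(concepts: list) -> np.ndarray:
    """Stand-in for the LCM: predict the next concept from the preceding ones."""
    return np.mean(concepts, axis=0)

def sonar_decode(concept: np.ndarray) -> str:
    """Stand-in for a SONAR decoder: concept vector -> text."""
    return f"<decoded sentence from a {concept.shape[0]}-d concept>"

doc = ["LCMs operate on sentence-level concepts.", "Each sentence is one SONAR vector."]
concepts = [sonar_encode(s) for s in doc]  # map text into the embedding space
next_concept = lcm_predict_next(concepts)  # the LCM only ever sees vectors
print(sonar_decode(next_concept))          # map the prediction back out to text
```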

1/ LCMs operate at the level of meaning, or what we label "concepts": a sentence in text or an utterance in speech. These units are then embedded into SONAR, a language- and modality-agnostic representation space. github.com/facebookrese...

14.12.2024 18:59 · 👍 1    🔁 0    💬 1    📌 0
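For the embedding step itself, the open-source SONAR repo provides a text-to-embedding pipeline; the snippet below is based on my recollection of its README, so the class, checkpoint names, and arguments should be read as assumptions rather than a confirmed API.

```python
# Assumed usage of facebookresearch/SONAR for sentence embeddings;
# names are recalled from the repo README and may differ in practice.
from sonar.inference_pipelines.text import TextToEmbeddingModelPipeline

t2vec = TextToEmbeddingModelPipeline(
    encoder="text_sonar_basic_encoder",
    tokenizer="text_sonar_basic_encoder",
)
sentences = ["Concepts correspond to sentences.",
             "SONAR is language- and modality-agnostic."]
embeddings = t2vec.predict(sentences, source_lang="eng_Latn")
print(embeddings.shape)  # expected (2, 1024): SONAR embeddings are 1024-d
```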
Preview: GitHub - facebookresearch/large_concept_model: Large Concept Models: Language modeling in a sentence representation space

Proud to share our work on Large Concept Models (LCMs)! This is a new direction in language modeling that moves beyond traditional token-level LLMs.
Paper: ai.meta.com/research/pub...
Code: github.com/facebookrese...

14.12.2024 18:59 · 👍 21    🔁 6    💬 1    📌 1
