in collaboration with the tremendous research team at FAIR: @karen-ullrich.bsky.social, Jingtong Su, @arjunsubgraph.bsky.social, @claudiashi.bsky.social, Amir Bar, Ivan Evtimov, Nikolaos Tsilivis, Randall Balestriero, and @kempelab.bsky.social
10.12.2025 15:44
Start with OpenApps
Explore how the latest models, like GPT-5, can be used as digital agents to complete tasks on your behalf:
Docs: facebookresearch.github.io/OpenApps/
Paper: arxiv.org/abs/2511.20766
Video Tutorial: www.youtube.com/watch?v=gzNW...
10.12.2025 15:44
✅ Unlimited data (for evaluating and training agents): generate thousands of versions of each app
✅ Lightweight: runs on a single CPU; no Docker or OS emulators needed
✅ Ground truth rewards: task rewards are based on the underlying app state, and all app logic is transparent Python
10.12.2025 15:44
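Because rewards come from the underlying app state rather than screen parsing, a reward checker can be a few lines of plain Python. A minimal sketch of the idea (the app class, fields, and task here are hypothetical illustrations, not the actual OpenApps API):

```python
# Hypothetical sketch: the task is "add milk to the shopping list", and the
# reward inspects the app's state directly instead of parsing rendered output.
from dataclasses import dataclass, field


@dataclass
class ShoppingApp:
    items: list[str] = field(default_factory=list)

    def add_item(self, name: str) -> None:
        self.items.append(name)


def task_reward(app: ShoppingApp) -> float:
    # Ground-truth check against the underlying state.
    return 1.0 if "milk" in app.items else 0.0


app = ShoppingApp()
assert task_reward(app) == 0.0   # task not yet done
app.add_item("milk")             # an agent action
assert task_reward(app) == 1.0   # reward flips once the state is correct
```

Because the state is ordinary Python, generating thousands of app variants amounts to re-instantiating the class with different contents.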
Want to teach AI agents to use apps like humans? Get started with digital agents research using OpenApps, our new Python-based environment.
10.12.2025 15:44
facebook/Common-O · Datasets at Hugging Face
✅ 22k multi-scene questions
✅ New scenes not in existing web data
✅ Runs in ~15 min on one GPU
Work led by Candace Ross in collaboration with @afeinstein20.bsky.social, Florian Bordes, and @polkirichenko.bsky.social
Check it out on Hugging Face, arXiv & NeurIPS! huggingface.co/datasets/fac...
07.11.2025 20:52
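A benchmark that runs in ~15 minutes on one GPU usually reduces to a simple per-question scorer. A sketch of exact-match scoring for multi-scene QA (the field names and normalization are illustrative assumptions, not the actual Common-O evaluation code):

```python
# Illustrative exact-match accuracy over (prediction, reference) pairs.
def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences
    # don't count as errors.
    return " ".join(text.lower().strip().split())


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    assert len(predictions) == len(references)
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


preds = ["a chair", "Two dogs ", "red"]
refs = ["A chair", "three dogs", "red"]
print(exact_match_accuracy(preds, refs))  # 2 of 3 match after normalization
```

Swapping in a model's generations for `preds` and the dataset's answers for `refs` gives the headline accuracy numbers quoted in the thread.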
While models saturate single-image perception benchmarks, Common-O establishes a new, challenging multimodal benchmark: the best-performing model achieves only 35% on Common-O, and only 1% on Common-O Complex, which consists of more complex scenes.
🧵 2/3
07.11.2025 20:52
We introduce Common-O, a new multimodal benchmark for hallucination when reasoning across scenes.
We find leading multimodal LLMs can reliably identify objects, yet hallucinate when reasoning across scenes.
🧵 1/3
07.11.2025 20:52
If you're an NYU student, come learn about this wonderful opportunity to collaborate with us at FAIR: events.atmeta.com/metanyuaimen... The panel is tomorrow at 10am at the NYU Center for Data Science.
16.10.2025 14:45
We explain how good delimiters steer attention heads to key input tokens, and offer practical recommendations for prompt and delimiter choices to get the best performance from your LLM. tl;dr: use '!' or '\n'.
09.10.2025 14:31
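The variable under study is tiny: the same few-shot demonstrations joined by a different separator string. A sketch of how such prompts are constructed (the demo questions here are made up for illustration; the model call is left as a stub):

```python
# Build the same few-shot prompt with different delimiters between
# demonstration examples -- the only thing the study varies.
demos = [
    "Q: 2 + 2 = ? A: 4",
    "Q: What is the capital of France? A: Paris",
]
query = "Q: 3 + 5 = ? A:"


def build_prompt(delimiter: str) -> str:
    return delimiter.join(demos + [query])


for delim in ["\n", "!", " ", "; "]:
    prompt = build_prompt(delim)
    # score = evaluate(model, prompt)  # accuracy can swing by +/- 23% on MMLU
    print(repr(delim), "->", repr(prompt[:48]))
```

Everything else about the evaluation stays fixed, which is what makes the reported sensitivity (and the ranking manipulation) so striking.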
- MMLU performance can vary by +/- 23% depending on the choice of delimiter across leading open model families (Llama, Qwen, and Gemma).
- Closed models, such as GPT-4o, are also brittle to the choice of delimiter.
🧵
09.10.2025 14:31
One can manipulate LLM rankings to put any model in the lead, simply by modifying the single character separating demonstration examples. Learn more in our new paper: arxiv.org/abs/2510.05152
w/ Jingtong Su, Jianyu Zhang, @karen-ullrich.bsky.social, and Léon Bottou.
🧵
09.10.2025 14:31
Open weights for our Llip multimodal vision-language model, led by @lavoiems.bsky.social, are public!
Llip proposes a new pre-training objective that captures the many ways to describe an image, leading to strong performance across a suite of 22 zero-shot benchmarks.
bsky.app/profile/lavo...
21.07.2025 18:57
We also find that better models are not necessarily better at abstention, suggesting abstention is an open research question.
w/ @polkirichenko.bsky.social, Sam Bell, and Kamalika Chaudhuri
Paper: arxiv.org/abs/2506.09038
Code: github.com/facebookrese...
bsky.app/profile/polk...
🧵 2/2
17.06.2025 18:32
A good language model should say "I don't know" by reasoning about the limits of its knowledge. Our new work, AbstentionBench, carefully measures this overlooked skill in an open codebase others can build on!
We find frontier reasoning degrades models' ability to know when NOT to answer.
🧵 1/2
17.06.2025 18:32
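To make "knowing when NOT to answer" concrete, here is a deliberately naive abstention detector: a keyword heuristic that flags responses which decline to answer. This is only an illustration of the skill being measured, not the AbstentionBench evaluation protocol:

```python
# Naive illustration: flag a response as an abstention if it contains a
# common refusal phrase. Real evaluations need far more care than this.
ABSTAIN_MARKERS = (
    "i don't know",
    "i do not know",
    "cannot answer",
    "not enough information",
)


def is_abstention(response: str) -> bool:
    lowered = response.lower()
    return any(marker in lowered for marker in ABSTAIN_MARKERS)


assert is_abstention("I don't know the answer to that.")
assert not is_abstention("The capital of France is Paris.")
```

A benchmark then pairs unanswerable questions with a detector like this: a good model should abstain on them, and a confident wrong answer counts against it.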
Join us as a PhD research intern at FAIR w/ @polkirichenko.bsky.social and Kamalika Chaudhuri, starting this summer or fall, with a focus on open science into multimodal models, agents, and beyond! If interested, email polkirichenko@meta.com with the subject [Prospective Intern 2025] and attach your CV.
02.05.2025 19:29
We found MLM-U training can even outperform transformers trained with additional supervision from A* search traces, showing the promise of alternative learning objectives.
Learn more on our site and code at facebookresearch.github.io/maze_navigat...
11.12.2024 18:42
Recently, we also applied the same MLM-U objective to maze navigation. We find when training parameter-matched transformers on identical data, MLM-U without any tweaks outperforms standard next token training across all maze grid sizes (up to 30x30).
11.12.2024 18:42
We find MLM-U training improves knowledge retrieval on Wikipedia-based questions and even outperforms a pretrained 7B Mistral model with a much smaller 100M parameter transformer trained from scratch!
Come by our NeurIPS poster (Exhibit Halls A-C, #3204) at 11am PST on Thursday to learn more.
11.12.2024 18:36
We show that training with a factorization-agnostic objective, MLM-U (a variable-ratio BERT-style loss with links to discrete diffusion) that predicts multiple tokens ahead and back, can significantly mitigate the reversal curse!
11.12.2024 18:36
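The core idea, a variable-ratio BERT-style masking step, can be sketched in a few lines: each training example masks a freshly sampled fraction of positions, so the model must predict tokens both ahead of and behind the visible context. This is a purely illustrative sketch, not the MLM-U implementation:

```python
# Illustrative variable-ratio masking: sample a mask ratio per example,
# hide that many positions, and ask the model to recover the hidden tokens.
import random

MASK = "<mask>"


def mask_sequence(tokens: list[str], rng: random.Random):
    ratio = rng.uniform(0.1, 0.9)            # fresh mask ratio per example
    n_mask = max(1, int(ratio * len(tokens)))
    positions = rng.sample(range(len(tokens)), n_mask)
    inputs = [MASK if i in positions else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}  # tokens the model must predict
    return inputs, targets


rng = random.Random(0)
inputs, targets = mask_sequence(["the", "cat", "sat", "on", "the", "mat"], rng)
```

Because masked positions can fall anywhere, the objective has no fixed left-to-right factorization, which is what "factorization-agnostic" means here.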
Problem: language models struggle with the "reversal curse": an inability to answer reformulations of a question. We show this stems from the standard next-token learning objective, in what we call "the factorization curse."
11.12.2024 18:36
Can we boost transformers' ability to retrieve knowledge and plan in maze navigation by only tweaking the learning objective?
We emphatically say YES in our #NeurIPS 2024 study! 🧵
w/ Ouail Kitouni, Niklas Nolte, Diane Bouchacourt, Adina Williams, and Mike Rabbat
Paper: arxiv.org/abs/2406.05183
11.12.2024 18:32