
Mark Ibrahim

@markibrahim.bsky.social

Researching the dark arts of deep learning at Meta's FAIR (Fundamental AI Research) Lab https://markibrahim.me/

62 Followers  |  120 Following  |  21 Posts  |  Joined: 17.11.2024

Latest posts by markibrahim.bsky.social on Bluesky


in collaboration with the tremendous research team at FAIR: @karen-ullrich.bsky.social, Jingtong Su, @arjunsubgraph.bsky.social, @claudiashi.bsky.social, Amir Bar, Ivan Evtimov, Nikolaos Tsilivis, Randall Balestriero, and @kempelab.bsky.social

10.12.2025 15:44 — 👍 1    🔁 0    💬 0    📌 0
Start with OpenApps

Explore how the latest models, like GPT-5, can be used as digital agents to complete tasks on your behalf:

📢 Docs: facebookresearch.github.io/OpenApps/
📃 Paper: arxiv.org/abs/2511.20766
🎬 Video Tutorial: www.youtube.com/watch?v=gzNW...

10.12.2025 15:44 — 👍 1    🔁 0    💬 1    📌 0

✅ Unlimited data (for evaluating and training agents): generate thousands of versions of each app
✅ Lightweight: runs on a single CPU; no Docker or OS emulators needed
✅ Ground truth rewards: task rewards are computed from the underlying app state, and all app logic is transparent in Python (see the sketch below).

10.12.2025 15:44 — 👍 1    🔁 0    💬 1    📌 0

Want to teach AI agents to use apps like humans? Get started with digital agents research using OpenApps, our new Python-based environment.
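For a feel for the interaction pattern, here is a self-contained toy sketch of a gym-style loop with a reward computed from the underlying app state. The `ToyOpenAppsEnv` class, its `reset`/`step` methods, and the todo task are illustrative stand-ins, not the actual OpenApps API; see the docs linked above in this thread for the real interface.

```python
# Illustrative toy only: this is NOT the real OpenApps API. Class and method
# names are stand-ins to show the gym-style loop and state-based rewards;
# see facebookresearch.github.io/OpenApps/ for the actual interface.
from dataclasses import dataclass, field

@dataclass
class TodoAppState:
    """Underlying app state; the reward is computed directly from it."""
    todos: list = field(default_factory=list)

class ToyOpenAppsEnv:
    def __init__(self, task="add 'buy milk' to the todo list"):
        self.task = task
        self.state = TodoAppState()

    def reset(self):
        self.state = TodoAppState()
        return {"task": self.task, "todos": list(self.state.todos)}

    def step(self, action: str):
        # Actions are plain strings an agent might emit, e.g. "add_todo: buy milk".
        if action.startswith("add_todo:"):
            self.state.todos.append(action.split(":", 1)[1].strip())
        obs = {"task": self.task, "todos": list(self.state.todos)}
        reward = float("buy milk" in self.state.todos)  # ground truth from state
        return obs, reward, reward == 1.0

env = ToyOpenAppsEnv()
obs = env.reset()
obs, reward, done = env.step("add_todo: buy milk")  # an agent's chosen action
print(reward, done)  # 1.0 True
```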

10.12.2025 15:44 — 👍 4    🔁 3    💬 1    📌 0
facebook/Common-O · Datasets at Hugging Face

✅ 22k multi-scene questions
✅ New scenes not in existing web data
✅ Runs in ~15 min on one GPU

Work led by Candace Ross in collaboration with @afeinstein20.bsky.social, Florian Bordes, and @polkirichenko.bsky.social

Check it out on HuggingFace, ArXiv & NeurIPS! huggingface.co/datasets/fac...
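For those who want to poke at the data directly, a minimal loading sketch might look like the following; the split and field names are assumptions, so check the dataset card for the actual schema.

```python
# Minimal sketch of pulling the Common-O dataset from the Hugging Face Hub.
# Requires: pip install datasets
from datasets import load_dataset

ds = load_dataset("facebook/Common-O")  # dataset id from the card above
print(ds)  # shows the available splits and features

# Peek at one example; exact field names are an assumption, so inspect the
# printed keys rather than relying on them.
first_split = next(iter(ds.values()))
print(first_split[0].keys())
```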

07.11.2025 20:52 — 👍 0    🔁 0    💬 0    📌 0

While models have saturated single-image perception, Common-O establishes a challenging new multimodal benchmark. The best-performing model achieves only 35% on Common-O, and only 1% on Common-O Complex, which consists of more complex scenes.

🧡2/3

07.11.2025 20:52 — 👍 0    🔁 0    💬 1    📌 0

We introduce Common-O, a new multimodal benchmark for hallucination when reasoning across scenes.

We find leading multimodal LLMs can reliably identify objects, yet hallucinate when reasoning across scenes.

🧡1/3

07.11.2025 20:52 — 👍 0    🔁 0    💬 1    📌 0

If you're an NYU student, come learn about this wonderful opportunity to collaborate with us at FAIR: events.atmeta.com/metanyuaimen... The panel is tomorrow at 10am at the NYU Center for Data Science.

16.10.2025 14:45 — 👍 0    🔁 0    💬 1    📌 0

We explain how good delimiters steer attention heads to key input tokens and offer practical recommendations for prompt and delimiter choices to get the best performance from your LLM. tl;dr: use "!" or "\n".
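To make the setup concrete, here is a small sketch of assembling the same few-shot prompt with different demonstration delimiters; the demo questions are invented, and scoring against a real model is left to the reader.

```python
# Sketch: the only thing varied is the single character joining the few-shot
# demonstrations; the demo content below is made up for illustration.
demos = [
    "Q: What is the capital of France?\nA: Paris",
    "Q: What is 2 + 2?\nA: 4",
]
query = "Q: Which planet is known as the Red Planet?\nA:"

for delimiter in ["\n", "!", " ", ";", ","]:
    prompt = delimiter.join(demos + [query])
    print(repr(delimiter), "->", len(prompt), "chars")
    # Send `prompt` to the model under evaluation and compare accuracy across
    # delimiters; per the recommendation above, "\n" and "!" are good defaults.
```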

09.10.2025 14:31 — 👍 2    🔁 0    💬 0    📌 0

- MMLU performance can vary by +/- 23% depending on the choice of delimiter across leading open model families (Llama, Qwen, and Gemma).
- Closed models, such as GPT-4o, are also brittle to the choice of delimiter.

🧡

09.10.2025 14:31 — 👍 2    🔁 0    💬 1    📌 0

One can manipulate LLM rankings to put any model in the lead, simply by modifying the single character separating demonstration examples. Learn more in our new paper arxiv.org/abs/2510.05152
w/ Jingtong Su, Jianyu Zhang, @karen-ullrich.bsky.social, and Léon Bottou.
🧡

09.10.2025 14:31 — 👍 31    🔁 3    💬 1    📌 2

Open weights for our Llip multimodal vision-language model, led by @lavoiems.bsky.social, are now public!

Llip proposes a new pre-training objective that captures the many ways to describe an image, leading to strong performance across a suite of 22 zero-shot benchmarks.

bsky.app/profile/lavo...
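Very roughly, the idea can be sketched as caption-conditioned mixing over several visual tokens before a CLIP-style contrastive loss. The snippet below is a loose illustration with random tensors, not the official Llip implementation; see the linked thread and paper for the actual architecture.

```python
# Loose sketch with random tensors, NOT the official Llip code: K visual
# "mixture" tokens per image, caption-conditioned mixing weights, then a
# CLIP-style contrastive loss on the pooled features.
import torch
import torch.nn.functional as F

B, K, D = 8, 16, 512                       # batch, mixture tokens, embed dim
visual_tokens = torch.randn(B, K, D)       # stand-in for image-encoder outputs
text_emb = torch.randn(B, D)               # stand-in for text-encoder outputs

# Each caption picks its own weighting over the K visual tokens.
scores = torch.einsum("bd,bkd->bk", text_emb, visual_tokens) / D ** 0.5
image_emb = torch.einsum("bk,bkd->bd", torch.softmax(scores, dim=-1), visual_tokens)

# Standard symmetric contrastive loss on the caption-conditioned features.
image_emb = F.normalize(image_emb, dim=-1)
text_emb = F.normalize(text_emb, dim=-1)
logits = image_emb @ text_emb.t() / 0.07
labels = torch.arange(B)
loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
print(loss.item())
```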

21.07.2025 18:57 — 👍 1    🔁 0    💬 0    📌 0

We also find that better models are not necessarily better at abstention, suggesting abstention is an open research question.

w/ @polkirichenko.bsky.social, Sam Bell, and Kamalika Chaudhuri

Paper: arxiv.org/abs/2506.09038
Code: github.com/facebookrese...

bsky.app/profile/polk...

🧡2/2

17.06.2025 18:32 — 👍 1    🔁 0    💬 0    📌 0

A good language model should say "I don't know" by reasoning about the limits of its knowledge. Our new work, AbstentionBench, carefully measures this overlooked skill in an open codebase others can build on!

We find frontier reasoning fine-tuning degrades models' ability to know when NOT to answer.
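At its simplest, the kind of check such a benchmark runs can be sketched as follows; the marker list and example question are illustrative toys rather than AbstentionBench's actual grader, which lives in the code linked in this thread.

```python
# Toy sketch of the kind of check an abstention benchmark performs: given a
# question whose answer cannot be known, did the model decline to answer?
# The marker list and example are illustrative, not AbstentionBench's grader.
ABSTENTION_MARKERS = [
    "i don't know", "i do not know", "cannot be determined",
    "not enough information", "unanswerable",
]

def abstained(response: str) -> bool:
    text = response.lower()
    return any(marker in text for marker in ABSTENTION_MARKERS)

unanswerable = "What did I eat for breakfast this morning?"
model_response = "I don't know; I have no information about your breakfast."
print(abstained(model_response))  # True
```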

🧡1/2

17.06.2025 18:32 — 👍 1    🔁 0    💬 1    📌 0

Join us as a PhD research intern at FAIR w/ @polkirichenko.bsky.social and Kamalika Chaudhuri,

starting this summer or fall, with a focus on open science on multimodal models, agents, and beyond! Email polkirichenko@meta.com with the subject line [Prospective Intern 2025] and attach your CV if interested!

02.05.2025 19:29 — 👍 0    🔁 0    💬 0    📌 0

We found MLM-U-trained transformers can even outperform transformers trained with additional supervision from A* search traces, showing the promise of alternative learning objectives.

Learn more on our site and code at facebookresearch.github.io/maze_navigat...

11.12.2024 18:42 — 👍 1    🔁 0    💬 0    📌 0

Recently, we also applied the same MLM-U objective to maze navigation. We find that, when training parameter-matched transformers on identical data, MLM-U without any tweaks outperforms standard next-token training across all maze grid sizes (up to 30x30).

11.12.2024 18:42 — 👍 1    🔁 0    💬 1    📌 0

We find MLM-U training improves knowledge retrieval on Wikipedia-based questions and even outperforms a pretrained 7B Mistral model with a much smaller 100M parameter transformer trained from scratch!

Come by our NeurIPS poster in Exhibit Halls A-C, #3204, at 11am PST on Thursday to learn more.

11.12.2024 18:36 — 👍 1    🔁 0    💬 0    📌 0

We show that training with a factorization-agnostic objective can significantly mitigate the reversal curse! The objective, MLM-U (a variable-ratio BERT-style loss with links to discrete diffusion), predicts multiple tokens ahead and back rather than only the next token.
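To make the objective concrete, here is a hedged PyTorch sketch in the spirit of a variable-ratio masked-prediction loss; the function and stand-in model are illustrative, not the paper's implementation.

```python
# Hedged sketch of a variable-ratio, BERT-style masking loss in the spirit of
# MLM-U (not the paper's code): per sequence, sample a masking rate uniformly,
# hide that fraction of tokens, and predict them from the remaining context,
# so no single left-to-right factorization is baked in.
import torch
import torch.nn.functional as F

def mlm_u_loss(model, tokens, mask_id, ignore_index=-100):
    B, T = tokens.shape
    rate = torch.rand(B, 1)                          # one masking rate per sequence
    mask = torch.rand(B, T) < rate                   # positions to hide
    mask[torch.arange(B), torch.randint(0, T, (B,))] = True  # >=1 masked per row
    inputs = tokens.masked_fill(mask, mask_id)
    targets = tokens.masked_fill(~mask, ignore_index)
    logits = model(inputs)                           # (B, T, vocab) from any encoder
    return F.cross_entropy(logits.view(-1, logits.size(-1)),
                           targets.view(-1), ignore_index=ignore_index)

# Tiny demo with a stand-in "model" that returns random logits.
vocab_size, mask_id = 100, 99
toy_model = lambda x: torch.randn(x.shape[0], x.shape[1], vocab_size)
tokens = torch.randint(0, 98, (4, 16))
print(mlm_u_loss(toy_model, tokens, mask_id).item())
```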

11.12.2024 18:36 — 👍 0    🔁 0    💬 1    📌 0

Problem: Language models struggle with the "reversal curse": an inability to answer reformulations of a question. We show this stems from the standard next-token learning objective, in what we call "the factorization curse."

11.12.2024 18:36 — 👍 0    🔁 0    💬 1    📌 0

Can we boost transformers' ability to retrieve knowledge and plan in maze navigation by only tweaking the learning objective?

We emphatically say YES in our #NeurIPS 2024 study! 🧡

w/ Ouail Kitouni, Niklas Nolte, Diane Bouchacourt, Adina Williams, and Mike Rabbat

Paper arxiv.org/abs/2406.05183

11.12.2024 18:32 — 👍 4    🔁 0    💬 2    📌 0
