Excited to share: "Mixture-of-Transformers (MoT)" has been officially accepted to TMLR (March 2025), and the code is now open-sourced!
GitHub repo: github.com/facebookrese...
Paper: arxiv.org/abs/2411.04996
09.05.2025 05:35
Our new study reveals widespread LLM adoption across society:
By late 2024, LLMs assist in writing:
- 18% of financial consumer complaints
- 24% of corporate press releases
- Up to 15% of job postings (esp. in small/young firms)
- 14% of UN press releases
arxiv.org/abs/2502.09747
01.03.2025 22:28
Joint work with Junhong Shen, Genghan Zhang @zhang677.bsky.social, Ning Dong, Luke Zettlemoyer, Lili Yu
#LLM #MultiModal #pretraining
06.02.2025 05:31
✅ Mamba w/ Chameleon setting (discrete tokens): dense-level image/text quality at 42.5% and 65.4% of the FLOPs, respectively
✅ Mamba w/ three modalities (image, text, speech): dense-level speech quality at 24.8% of the FLOPs
Takeaway:
Modality-aware sparsity isn't just for Transformers; it thrives in SSMs like Mamba too!
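For readers wondering what "modality-aware sparsity" looks like outside the Transformer case, here is a minimal sketch of the routing primitive it relies on: tokens are dispatched to per-modality parameter copies by their known modality label (deterministic routing), not by a learned MoE router. The wrapper below is my own illustration, with assumed names (ModalityRouted, modality_ids); it is exact for token-wise components such as FFNs, projections, and norms, and how the sequence-mixing SSM scan itself is partitioned is left to the open-sourced implementation linked at the top of this feed.

```python
# Illustrative sketch only (assumed module/argument names); the released MoT code is the reference.
import torch
import torch.nn as nn

class ModalityRouted(nn.Module):
    """Wrap one sub-module copy per modality and dispatch tokens by modality id."""

    def __init__(self, blocks_per_modality: list):
        super().__init__()
        self.blocks = nn.ModuleList(blocks_per_modality)

    def forward(self, x: torch.Tensor, modality_ids: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim); modality_ids: (batch, seq) integer labels in [0, n_modalities).
        out = torch.zeros_like(x)
        for m, block in enumerate(self.blocks):
            mask = (modality_ids == m).unsqueeze(-1)  # (batch, seq, 1)
            # Compute-then-mask keeps the sketch short; an efficient implementation
            # would gather only the modality-m tokens before applying the block.
            out = out + block(x) * mask
        return out

# Example: a per-modality feed-forward network for two modalities (e.g., image vs. text tokens).
ffn = lambda d: nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
routed_ffn = ModalityRouted([ffn(512), ffn(512)])
x = torch.randn(2, 16, 512)
modality_ids = torch.randint(0, 2, (2, 16))
y = routed_ffn(x, modality_ids)  # (2, 16, 512)
```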
06.02.2025 05:30
w/ Genghan Zhang (zhang677.github.io), Olivia Hsu (weiya711.github.io), and Prof. Kunle Olukotun (arsenalfc.stanford.edu/kunle/)
06.02.2025 04:42
2/ Our system learns through curriculum-based experience, programming complex ML operators like attention and MoE from scratch.
3/ Bonus: We're releasing a clean benchmark with STeP, a new programming language never seen in training data. A true test of reasoning ability!
06.02.2025 04:42
Honored that #Nature has highlighted our work again in their latest piece examining #ChatGPT's transformative impact on scientific research and academia over the past two years. h/t @natureportfolio.bsky.social
www.nature.com/articles/d41...
04.12.2024 17:15
*Our survey was opt-in and could have selection bias.
**Reviewers should still engage w/ papers independently, w/o relying on LLMs.
(4/n)
22.11.2024 20:56
Findings (cont.):
- GPT-4 can struggle with in-depth critique of study methods; its feedback is sometimes more generic.
Takeaway: high-quality human feedback is still necessary; #LLMs could help authors improve early drafts before official peer review**.
Collaborators: Yuhui, Hancheng, and many others.
(3/n)
22.11.2024 20:55
Findings:
- Most authors found GPT-4-generated feedback helpful*
- >50% of the points raised by GPT-4 were also raised by at least one human reviewer.
- The overlap between GPT-4 and human feedback is similar to the overlap between two human reviewers.
(2/n)
22.11.2024 20:52
Our new study in #COLM2024 estimates that ~17% of recent CS arXiv papers used #LLMs substantially in their writing.
Around 8% for bioRxiv papers.
Paper: arxiv.org/abs/2404.01268
22.11.2024 20:49
Excited that our paper quantifying #LLM usage in paper reviews was selected as an #ICML2024 oral (top 1.5% of submissions)!
Main results:
proceedings.mlr.press/v235/liang24...
Media Coverage: The New York Times
nyti.ms/3vwQhdi
22.11.2024 20:48
ChatGPT is transforming peer review – how can we use it responsibly?
At major computer-science publication venues, up to 17% of the peer reviews are now written by artificial intelligence. We need guidelines before things get out of hand.
Excited to share that our recent work on LLMs in peer review and responsible LLM use is featured in #Nature!
Many thanks to my collaborators for their insights and dedication to advancing fair and ethical AI practices in scientific publishing. #AI #PeerReview
www.nature.com/articles/d41...
22.11.2024 17:07
Takeaway:
Modality-aware sparsity in MoT offers a scalable path to efficient, multi-modal AI with reduced pretraining costs.
Work of a great team with Lili Yu, Liang Luo, Srini Iyer, Ning Dong, Chunting Zhou, Gargi Ghosh, Mike Lewis, Scott Wen-tau Yih, Luke Zettlemoyer, Victoria Lin.
(6/n)
22.11.2024 01:40
✅ System profiling shows MoT achieves dense-level image quality in 47% and text quality in 75.6% of the wall-clock time**
**Measured on AWS p4de.24xlarge instances with NVIDIA A100 GPUs.
(5/n)
22.11.2024 01:39
✅ Transfusion setting (text autoregressive + image diffusion): MoT matches dense model quality using one-third of the FLOPs.
(4/n)
22.11.2024 01:38
✅ Chameleon setting (text + image generation): Our 7B MoT matches dense baseline quality using just 55.8% of the FLOPs.
Extended to speech as a third modality, MoT achieves dense-level speech quality with only 37.2% of the FLOPs.
(3/n)
22.11.2024 01:38
At Meta, we introduce Mixture-of-Transformers (MoT), a sparse architecture with modality-aware sparsity for every non-embedding transformer parameter (e.g., feed-forward networks, attention matrices, and layer normalization).
MoT achieves dense-level performance with up to 66% fewer FLOPs!
(2/n)
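To make the architecture description above concrete, here is a minimal PyTorch sketch of a single layer with modality-aware sparsity as described in this thread: each non-embedding parameter group (QKV/output projections, feed-forward network, layer norms) has one copy per modality, while self-attention is still computed globally over the full interleaved sequence so modalities can attend to each other. This is a sketch under my own naming and shape conventions (ModalityAwareLayer, modality_ids), not the released code; the open-sourced repo announced at the top of this feed is the authoritative implementation.

```python
# Hedged sketch of a modality-aware-sparse transformer layer (not the released MoT code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityAwareLayer(nn.Module):
    def __init__(self, d_model: int, n_heads: int, n_modalities: int):
        super().__init__()
        self.n_heads = n_heads
        mk = lambda f: nn.ModuleList([f() for _ in range(n_modalities)])
        # Per-modality copies of every non-embedding parameter group.
        self.qkv = mk(lambda: nn.Linear(d_model, 3 * d_model))
        self.proj = mk(lambda: nn.Linear(d_model, d_model))
        self.ffn = mk(lambda: nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)))
        self.norm1 = mk(lambda: nn.LayerNorm(d_model))
        self.norm2 = mk(lambda: nn.LayerNorm(d_model))

    @staticmethod
    def route(mods, x, modality_ids):
        # Deterministic dispatch by token modality (no learned router); compute-then-mask
        # keeps the sketch short, a real implementation would gather tokens per modality.
        return sum(m(x) * (modality_ids == i).unsqueeze(-1) for i, m in enumerate(mods))

    def forward(self, x, modality_ids):
        # x: (B, T, D); modality_ids: (B, T) integer modality labels.
        B, T, D = x.shape
        h = self.route(self.norm1, x, modality_ids)
        q, k, v = self.route(self.qkv, h, modality_ids).chunk(3, dim=-1)
        split = lambda t: t.view(B, T, self.n_heads, D // self.n_heads).transpose(1, 2)
        # Global causal self-attention over the full interleaved multi-modal sequence.
        attn = F.scaled_dot_product_attention(split(q), split(k), split(v), is_causal=True)
        attn = attn.transpose(1, 2).reshape(B, T, D)
        x = x + self.route(self.proj, attn, modality_ids)
        x = x + self.route(self.ffn, self.route(self.norm2, x, modality_ids), modality_ids)
        return x
```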
22.11.2024 01:37
How can we reduce pretraining costs for multi-modal models without sacrificing quality? We study this question in our new work: arxiv.org/abs/2411.04996
✅ MoT achieves dense-level 7B performance with up to 66% fewer FLOPs!
22.11.2024 01:36