
Ameya P.

@bayesiankitten.bsky.social

Postdoctoral Researcher @ Bethgelab, University of Tübingen. Benchmarking | LLM Agents | Data-Centric ML | Continual Learning | Unlearning. drimpossible.github.io

565 Followers  |  129 Following  |  15 Posts  |  Joined: 19.11.2024

Latest posts by bayesiankitten.bsky.social on Bluesky


🚀 A new era in European #AIresearch begins!

ELLIOT is a €25M #HorizonEurope project launching July 2025 to build open, trustworthy Multimodal Generalist Foundation Models.
30 partners, 12 countries, EU values.

🔗 Press release: apigateway.agilitypr.com/distribution...

24.06.2025 06:46 – 👍 12    🔁 3    💬 0    📌 2

🚀 Never miss a beat in science again!

📬 Scholar Inbox is your personal assistant for staying up to date with your literature. It includes: visual summaries, collections, search and a conference planner.

Check out our white paper: arxiv.org/abs/2504.08385
#OpenScience #AI #RecommenderSystems

14.04.2025 11:04 – 👍 94    🔁 19    💬 1    📌 4

🧵1/ 🚨 New paper: A Sober Look at Progress in Language Model Reasoning
We re-evaluate recent SFT and RL models for mathematical reasoning and find most gains vanish under rigorous, multi-seed, standardized evaluation.

📊 bethgelab.github.io/sober-reason...
📄 arxiv.org/abs/2504.07086

10.04.2025 15:36 – 👍 14    🔁 5    💬 1    📌 0
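
Not from the paper, but to make "multi-seed, standardized evaluation" concrete, here is a minimal sketch (hypothetical numbers and helper name) of reporting accuracy as a mean with seed-to-seed spread from one fixed harness, instead of a single-run score:

```python
import statistics

def summarize_over_seeds(accuracies: list[float]) -> str:
    """Report mean +/- std over per-seed accuracies from one fixed eval harness."""
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies) if len(accuracies) > 1 else 0.0
    return f"{mean:.1f} +/- {std:.1f} (n={len(accuracies)} seeds)"

# Hypothetical numbers: the candidate's "gain" is within seed noise.
baseline  = [48.2, 50.1, 46.9, 49.4]
candidate = [50.3, 47.8, 49.9, 48.6]
print("baseline :", summarize_over_seeds(baseline))
print("candidate:", summarize_over_seeds(candidate))
```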

Hochlehnert, Bhatnagar, Udandarao, Albanie, Prabhu, Bethge: A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility https://arxiv.org/abs/2504.07086 https://arxiv.org/pdf/2504.07086 https://arxiv.org/html/2504.07086

10.04.2025 06:08 – 👍 1    🔁 2    💬 1    📌 0

Great work! A much-needed upgrade for continual learning datasets; excited to see progress on long-timespan tasks beyond classification. Deets below 👇

10.04.2025 05:32 – 👍 1    🔁 0    💬 0    📌 0

Deadline extended to March 19 for the EVAL-FoMo workshop @cvprconference.bsky.social! We welcome submissions (incl. published papers) analyzing emerging capabilities & limits in visual foundation models.

Details: sites.google.com/view/eval-fo...
#CVPR2025

12.03.2025 12:20 – 👍 3    🔁 1    💬 0    📌 0

LMs excel at solving problems (~48% success) but falter at debunking them (<9% counterexample rate)!

Could form an AI Brandolini's Law: "Capability needed to refute bullshit is far larger than that needed to generate it"

28.02.2025 19:17 – 👍 2    🔁 0    💬 0    📌 0

AI can generate correct-seeming hypotheses (and papers!). Brandolini's law states BS is harder to refute than generate. Can LMs falsify incorrect solutions? o3-mini (high) scores just 9% on our new benchmark REFUTE. Verification is not necessarily easier than generation 🧵

28.02.2025 18:12 – 👍 4    🔁 2    💬 1    📌 1
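
A rough sketch of what counterexample-style falsification means operationally (my illustration, not the REFUTE harness): a proposed counterexample only counts if a trusted reference and the allegedly incorrect solution disagree on it. The function names and the toy primality bug below are made up.

```python
from typing import Any, Callable

def refutes(candidate_input: Any,
            incorrect_solution: Callable[[Any], Any],
            reference_solution: Callable[[Any], Any]) -> bool:
    """A counterexample is valid iff the two solutions disagree on it."""
    return incorrect_solution(candidate_input) != reference_solution(candidate_input)

# Toy illustration: a buggy primality test with an off-by-one divisor bound.
def buggy_is_prime(n: int) -> bool:
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5)))

def is_prime(n: int) -> bool:
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

print(refutes(9, buggy_is_prime, is_prime))  # True: 9 exposes the bug
print(refutes(7, buggy_is_prime, is_prime))  # False: both agree, not a counterexample
```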

🚀 Call for Papers – CVPR 3rd Workshop on Multi-Modal Foundation Models (MMFM) @cvprconference.bsky.social! 🚀

🔍 Topics: Multi-modal learning, vision-language, audio-visual, and more!
📅 Deadline: March 14, 2025
📝 Submission: cmt3.research.microsoft.com/MMFM2025
🌐 sites.google.com/view/mmfm3rd...

19.02.2025 14:16 – 👍 6    🔁 2    💬 1    📌 2

New preprint out! 🎉

How does LLM training loss translate to downstream performance?

We show that pretraining data and tokenizer shape loss-to-loss scaling, while architecture and other factors play a surprisingly minor role!
brendel-group.github.io/llm-line/ 🧵1/8

18.02.2025 14:09 – 👍 18    🔁 8    💬 1    📌 2
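
To illustrate what "loss-to-loss scaling" refers to, a toy sketch with made-up numbers (the paper fits shifted power laws; this is just a bare log-log line) of how downstream loss can be predicted from pretraining loss:

```python
import numpy as np

# Hypothetical (pretraining loss, downstream loss) pairs along a training run.
train_loss = np.array([3.2, 2.9, 2.6, 2.4, 2.2])
downstream_loss = np.array([4.1, 3.6, 3.1, 2.8, 2.5])

# Straight-line fit in log-log space: downstream ~ exp(b) * train**a
a, b = np.polyfit(np.log(train_loss), np.log(downstream_loss), deg=1)
print(f"fitted exponent a = {a:.2f}")
print(f"predicted downstream loss at pretraining loss 2.0: {np.exp(b) * 2.0 ** a:.2f}")
```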

CuratedThoughts: Data curation for RL post-training! (Update 1) 🚀

25% of OpenThoughts-114k-math filtered: issues included proofs, missing figures, and multiple questions with one answer.

Check out the work by
@ahochlehnert.bsky.social & @hrdkbhatnagar.bsky.social
below 👇

17.02.2025 18:30 – 👍 2    🔁 0    💬 0    📌 0
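
For intuition, a hypothetical filtering pass in the spirit of the post (these regex heuristics are illustrative stand-ins, not the project's actual rules): drop math problems that are hard to reward-check automatically, such as proof questions, figure-dependent questions, or multi-part questions with a single answer field.

```python
import re

def keep_for_rl(problem: str) -> bool:
    """Illustrative curation heuristics; not CuratedThoughts' real filters."""
    refers_to_figure = re.search(r"\b(figure|diagram|as shown)\b", problem, re.I)
    asks_for_proof = re.search(r"\b(prove|show that)\b", problem, re.I)
    multi_part = problem.count("?") > 1
    return not (refers_to_figure or asks_for_proof or multi_part)

examples = [
    "Compute the sum of the first 100 positive integers.",
    "Prove that sqrt(2) is irrational.",
    "Using the figure below, find the area of the shaded region.",
]
print([keep_for_rl(p) for p in examples])  # [True, False, False]
```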

Our 2nd Workshop on Emergent Visual Abilities and Limits of Foundation Models (EVAL-FoMo) is accepting submissions. We are looking forward to talks by our amazing speakers that include @saining.bsky.social, @aidanematzadeh.bsky.social, @lisadunlap.bsky.social, and @yukimasano.bsky.social. #CVPR2025

13.02.2025 16:02 – 👍 7    🔁 3    💬 0    📌 1

🔥 #CVPR2025 Submit your cool papers to the Workshop on
Emergent Visual Abilities and Limits of Foundation Models 📷📷🧠🚀✨

sites.google.com/view/eval-fo...

Submission Deadline: March 12th!

12.02.2025 14:45 – 👍 3    🔁 2    💬 0    📌 1

LMs are used for annotation, evaluation and distillation! We identify critical issues!

LMs of a similar capability class (not model family tho!) behave similarly and this skews oversight far more than I expected.

Check the 4-in-1 mega paper below to 👀 how 👇

07.02.2025 22:09 – 👍 2    🔁 0    💬 0    📌 0

Can better representation learning help? No!

RanDumb recovers 70-90% of the joint performance.

Forgetting isn't the main issue; the benchmarks are too toy!

Key point: current OCL benchmarks are too constrained for any effective online continual representation learning!

13.12.2024 18:53 – 👍 1    🔁 0    💬 0    📌 0

Across a wide range of online continual learning benchmarks, RanDumb consistently surpasses prior methods (even the latest contrastive & meta-learning strategies), often by surprisingly large margins!

13.12.2024 18:53 – 👍 1    🔁 0    💬 1    📌 0

Continual learning assumes that continually learned deep representations outperform old-school kernel classifiers (as they do in supervised DL). But this isn't validated!!

Why might it not work? Updates are limited and networks may not converge.

We find: OCL representations are severely undertrained!

13.12.2024 18:52 – 👍 0    🔁 0    💬 1    📌 0

How RanDumb works: fix a random embedder to transform raw pixels and train a linear classifier on top. Single pass, one sample at a time, no stored exemplars. Order-invariant and worst-case ready 🚀

Looks familiar? This is streaming (approximate) Kernel LDA!!

13.12.2024 18:52 – 👍 0    🔁 0    💬 1    📌 0
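
A compact sketch of that recipe as I read it (simplified: a fixed random-feature embedder plus a running class-mean read-out; the paper's classifier is a streaming approximate kernel LDA with a shared covariance, and all names below are mine, not the official code):

```python
import numpy as np

class RanDumbSketch:
    def __init__(self, in_dim: int, embed_dim: int = 2048, n_classes: int = 10, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(in_dim, embed_dim))      # fixed random embedder, never trained
        self.b = rng.uniform(0.0, 2 * np.pi, size=embed_dim)
        self.sums = np.zeros((n_classes, embed_dim))       # running per-class feature sums
        self.counts = np.zeros(n_classes)

    def embed(self, x: np.ndarray) -> np.ndarray:
        return np.cos(x @ self.W + self.b)                 # random Fourier features of raw pixels

    def observe(self, x: np.ndarray, y: int) -> None:
        """Single pass, one sample at a time, no stored exemplars."""
        self.sums[y] += self.embed(x)
        self.counts[y] += 1

    def predict(self, x: np.ndarray) -> int:
        means = self.sums / np.maximum(self.counts, 1)[:, None]
        return int(np.argmin(np.linalg.norm(means - self.embed(x), axis=1)))
```

Because the embedder is frozen and the per-class sums commute across samples, the final classifier does not depend on data order, which is where the order-invariance comes from.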
Random Representations Outperform Online Continually Learned Representations

Paper Link: arxiv.org/abs/2402.08823

13.12.2024 18:51 – 👍 0    🔁 0    💬 1    📌 0

New work: RanDumb! 🚀

Poster @NeurIPS, East Hall #1910 – come say hi 👋

Core claim: random representations outperform online continual learning methods!

How: we replace the deep network with a *random projection* and a linear classifier, yet outperform all OCL methods by huge margins [1/n]

13.12.2024 18:50 – 👍 1    🔁 0    💬 1    📌 0

The Practitioner's Guide to Continual Multimodal Pretraining @dziadzio.bsky.social @confusezius.bsky.social @vishaalurao.bsky.social @bayesiankitten.bsky.social

12.12.2024 02:19 – 👍 25    🔁 5    💬 0    📌 0

Breaking the 8-model merge limit was tough, but we scaled to merging 200+ models! The secret? Iterative finetuning + merging *over time*.

The time axis unlocks scalable mergeability. Merging has surprising scaling gains across size & compute budgets.

All the gory details ⬇️

11.12.2024 18:16 – 👍 1    🔁 0    💬 0    📌 0
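
A hedged sketch of the "iterative finetuning + merging over time" loop as I understand it; the `finetune` callable and `alpha` are hypothetical placeholders, the model is assumed to be a torch-style module exposing `state_dict()`/`load_state_dict()`, and only the merge step is spelled out:

```python
import copy

def merge_state_dicts(current: dict, new: dict, alpha: float = 0.5) -> dict:
    """Per-parameter weighted average of two architecture-identical state dicts (float params assumed)."""
    return {k: (1 - alpha) * current[k] + alpha * new[k] for k in current}

def merge_over_time(init_model, tasks, finetune, alpha: float = 0.5):
    merged = copy.deepcopy(init_model)
    for task in tasks:
        # Start each round from the current merge, then fold the finetuned weights back in.
        candidate = finetune(copy.deepcopy(merged), task)
        merged.load_state_dict(merge_state_dicts(merged.state_dict(), candidate.state_dict(), alpha))
    return merged
```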

How do we benchmark the vast capabilities of foundation models? Introducing ONEBench – a unifying benchmark to test them all, led by
@adhirajghosh.bsky.social and
@dziadzio.bsky.social!⬇️

Sample-level benchmarks could be the new generation: reusable, recombinable, and able to evaluate lots of capabilities!

10.12.2024 18:39 – 👍 2    🔁 1    💬 0    📌 0

Come chat with us @ NeurIPS for hot takes on the future of continual learning with foundation models!

10.12.2024 18:37 – 👍 1    🔁 0    💬 0    📌 0

The list of accepted workshops for ICLR 2025 is available at openreview.net/group?id=ICL...
@iclr-conf.bsky.social

We received 120 wonderful proposals, with 40 selected as workshops.

03.12.2024 16:29 – 👍 57    🔁 15    💬 1    📌 5
