
Aryo Pradipta Gema

@aryopg.bsky.social

AI Safety Fellow @Anthropic | PhD at University of Edinburgh | LLM Hallucinations | Clinical NLP | Opinions are my own. Personal page: https://aryopg.github.io

911 Followers  |  678 Following  |  9 Posts  |  Joined: 17.11.2024

Latest posts by aryopg.bsky.social on Bluesky


We propose Neurosymbolic Diffusion Models! We find diffusion is especially compelling for neurosymbolic approaches, combining powerful multimodal understanding with symbolic reasoning πŸš€

Read more πŸ‘‡

21.05.2025 10:57 β€” πŸ‘ 92    πŸ” 27    πŸ’¬ 4    πŸ“Œ 6
MMLU-Redux Poster at NAACL 2025

MMLU-Redux just touched down at #NAACL2025! πŸŽ‰
Wish I could be there for our "Are We Done with MMLU?" poster today (9:00-10:30am in Hall 3, Poster Session 7), but visa drama said nope πŸ˜…
If anyone's swinging by, give our research some love! Hit me up if you check it out! πŸ‘‹

02.05.2025 13:00 β€” πŸ‘ 16    πŸ” 11    πŸ’¬ 0    πŸ“Œ 0
Image illustrating that ALM can enable Ensembling, Transfer to Bytes, and general Cross-Tokenizer Distillation.

We created Approximate Likelihood Matching, a principled (and very effective) method for *cross-tokenizer distillation*!

With ALM, you can create ensembles of models from different families, convert existing subword-level models to byte-level and a bunch more🧡

02.04.2025 06:36 β€” πŸ‘ 26    πŸ” 14    πŸ’¬ 1    πŸ“Œ 0

πŸš€ Thrilled to share our new preprint, "An Analysis of Decoding Methods for LLM-based Agents for Faithful Multi-Hop Question Answering"! πŸ“„

Dive into the paper: arxiv.org/abs/2503.23415

#AI #MachineLearning #LLM #NLP #Research #QuestionAnswering #Retrieval

01.04.2025 13:56 β€” πŸ‘ 1    πŸ” 1    πŸ’¬ 1    πŸ“Œ 0

Today, I'm starting as an AI Safety Fellow @anthropic.com ! πŸš€
Super excited to collaborate and learn from some of the brightest minds in AI! 🌟

24.03.2025 08:51 β€” πŸ‘ 4    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

madly in love with this article from @thesun.co.uk covering a paper from @rohit-saxena.bsky.social and @aryopg.bsky.social

Paper: arxiv.org/abs/2502.05092
The Sun: www.thesun.co.uk/tech/3384555...

17.03.2025 12:41 β€” πŸ‘ 8    πŸ” 1    πŸ’¬ 0    πŸ“Œ 1

Can multimodal LLMs truly understand research poster images?πŸ“Š

πŸš€ We introduce PosterSumβ€”a new multimodal benchmark for scientific poster summarization!

πŸ“‚ Dataset: huggingface.co/datasets/rohitsaxena/PosterSum
πŸ“œ Paper: arxiv.org/abs/2502.17540

10.03.2025 14:19 β€” πŸ‘ 8    πŸ” 4    πŸ’¬ 1    πŸ“Œ 0

Garbage in, garbage out -- a nice gem for the Italian-speaking folks on this platform πŸ˜… TLDR: in arxiv.org/abs/2406.04127 we found that MMLU contains TONS of errors, and it looks like all of these have propagated straight into this new "Global MMLU" dataset

06.12.2024 13:17 β€” πŸ‘ 11    πŸ” 2    πŸ’¬ 0    πŸ“Œ 1
Are We Done with MMLU? Maybe not. We identify and analyse errors in the popular Massive Multitask Language Understanding (MMLU) benchmark. Even though MMLU is widely adopted, our analysis demonstrates numerous ground truth ...

MMLU-Redux:
πŸ“œPaper: arxiv.org/abs/2406.04127
πŸ“šDataset: huggingface.co/datasets/edi...

06.12.2024 09:44 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

This goes without saying: As someone from a non-English speaking country, I salute the effort to democratise LLM evaluations across languages. But we must also ensure we don't democratise mistakes.

06.12.2024 09:44 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Super Cool work from Cohere for AI! πŸŽ‰ However, this highlights a concern raised by our MMLU-Redux team (arxiv.org/abs/2406.04127): **error propagation to many languages**. Issues in MMLU (e.g., "rapid intervention to solve ebola") seem to persist in many languages. Let's solve the root cause first?

06.12.2024 09:38 β€” πŸ‘ 9    πŸ” 3    πŸ’¬ 1    πŸ“Œ 0

For clarity -- great project, but most of the MMLU errors we found (and fixed) in our MMLU Redux paper (arxiv.org/abs/2406.04127) are also present in this dataset. We also provide a curated version of MMLU, so it's easy to fix 😊

06.12.2024 09:26 β€” πŸ‘ 15    πŸ” 4    πŸ’¬ 1    πŸ“Œ 0

Oops! Some errors we noticed in MMLU-Redux still exist in some languages (e.g., rapid intervention to "solve" ebola). (I just checked the 2 languages that I understand: Indonesian and Malay)

05.12.2024 23:11 β€” πŸ‘ 3    πŸ” 1    πŸ’¬ 0    πŸ“Œ 1
A picture showing Halo's features which include heart rate, sleep cycle and SPO2 monitoring, using on-device ML.

Super excited to introduce Halo: A beginner's guide to DIY health tracking with wearables! πŸ€—βœ¨
Using an $11 smart ring, I'll show you how to build your own private health monitoring app. From basic metrics to advanced features like:
- Activity tracking
- HR monitoring
- Sleep analysis
and more!

19.11.2024 18:38 β€” πŸ‘ 77    πŸ” 15    πŸ’¬ 5    πŸ“Œ 2
University of Edinburgh Starter Pack

Starter pack for University of Edinburgh researchers, made by the amazing @ramandutt4.bsky.social - go.bsky.app/KRNDkN7

20.11.2024 16:34 β€” πŸ‘ 35    πŸ” 9    πŸ’¬ 9    πŸ“Œ 1

Would you be so kind as to include me in the party? @ramandutt4.bsky.social

20.11.2024 16:57 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

If you’re interested in mechanistic interpretability, I just found this starter pack and wanted to boost it (thanks for creating it @butanium.bsky.social !). Excited to have a mech interp community on bluesky πŸŽ‰

go.bsky.app/LisK3CP

19.11.2024 00:28 β€” πŸ‘ 36    πŸ” 8    πŸ’¬ 3    πŸ“Œ 2
Generative AI Laboratory

Joining the Generative AI Lab (GAIL, gail.ed.ac.uk) at the University of Edinburgh as a GAIL Fellow! Excited for what's ahead πŸ€—

19.11.2024 22:43 β€” πŸ‘ 19    πŸ” 2    πŸ’¬ 0    πŸ“Œ 0

πŸ€”How to achieve efficient ICL without storing a huge dataset in one prompt?
πŸ’‘Mixtures of In-Context Learners (π— π—Όπ—œπ—–π—Ÿ): we treat LLMs prompted with subsets of demonstrations as experts and learn a weighting function to optimise the distribution over the continuation (🧡1/n)
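The core idea in the post above can be sketched in a few lines: each "expert" is the same LLM prompted with a different subset of demonstrations, and a learned weight vector mixes their next-token distributions. This is a minimal illustrative sketch, not the paper's implementation; the function and variable names are hypothetical, and the toy distributions stand in for real LLM outputs.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def mix_experts(expert_probs, weight_logits):
    """Combine per-expert next-token distributions with learned weights.

    expert_probs:  one next-token distribution per expert, where each
                   expert is the LLM prompted with a demonstration subset.
    weight_logits: trainable scores over experts (softmax-normalised).
    Returns the convex combination: a single next-token distribution.
    """
    w = softmax(weight_logits)
    vocab = len(expert_probs[0])
    return [sum(w[i] * expert_probs[i][t] for i in range(len(w)))
            for t in range(vocab)]

# Toy example: 3 experts over a 4-token vocabulary, uniform weights.
experts = [[0.70, 0.10, 0.10, 0.10],
           [0.20, 0.60, 0.10, 0.10],
           [0.25, 0.25, 0.25, 0.25]]
mixed = mix_experts(experts, [0.0, 0.0, 0.0])
print([round(p, 3) for p in mixed])  # still sums to 1
```

In the actual method the weight logits would be trained (e.g. by gradient descent on a downstream loss), so the mixture can upweight demonstration subsets that help and downweight noisy ones, without ever packing the whole dataset into a single prompt.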

18.11.2024 18:36 β€” πŸ‘ 33    πŸ” 4    πŸ’¬ 1    πŸ“Œ 2

Started making a list of researchers working at the intersection of healthcare, language, and computation. Please help me add more people!

18.11.2024 11:09 β€” πŸ‘ 61    πŸ” 10    πŸ’¬ 14    πŸ“Œ 1

I’ll be travelling to London from Wednesday to Friday for an upcoming event and would be very happy to meet up! πŸš€
I'd love to chat about my recent works (DeCoRe, MMLU-Redux, etc.). DM me if you’re around! πŸ‘‹

DeCoRe: arxiv.org/abs/2410.18860
MMLU-Redux: arxiv.org/abs/2406.04127

18.11.2024 13:48 β€” πŸ‘ 11    πŸ” 7    πŸ’¬ 0    πŸ“Œ 0

I'd love to be added!

17.11.2024 18:53 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
