We propose Neurosymbolic Diffusion Models! We find diffusion is especially compelling for neurosymbolic approaches, combining powerful multimodal understanding with symbolic reasoning.
Read more
@aryopg.bsky.social
AI Safety Fellow @Anthropic | PhD at University of Edinburgh | LLM Hallucinations | Clinical NLP | Opinions are my own. Personal page: https://aryopg.github.io
MMLU-Redux Poster at NAACL 2025
MMLU-Redux just touched down at #NAACL2025!
Wish I could be there for our "Are We Done with MMLU?" poster today (9:00-10:30am in Hall 3, Poster Session 7), but visa drama said nope.
If anyone's swinging by, give our research some love! Hit me up if you check it out!
Image illustrating that ALM can enable Ensembling, Transfer to Bytes, and general Cross-Tokenizer Distillation.
We created Approximate Likelihood Matching, a principled (and very effective) method for *cross-tokenizer distillation*!
With ALM, you can create ensembles of models from different families, convert existing subword-level models to byte-level, and a bunch more.
Thrilled to share our new preprint, "An Analysis of Decoding Methods for LLM-based Agents for Faithful Multi-Hop Question Answering"!
Dive into the paper: arxiv.org/abs/2503.23415
#AI #MachineLearning #LLM #NLP #Research #QuestionAnswering #Retrieval
Today, I'm starting as an AI Safety Fellow @anthropic.com!
Super excited to collaborate and learn from some of the brightest minds in AI!
madly in love with this article from @thesun.co.uk covering a paper from @rohit-saxena.bsky.social and @aryopg.bsky.social
Paper: arxiv.org/abs/2502.05092
The Sun: www.thesun.co.uk/tech/3384555...
Can multimodal LLMs truly understand research poster images?
We introduce PosterSum, a new multimodal benchmark for scientific poster summarization!
Dataset: huggingface.co/datasets/rohitsaxena/PosterSum
Paper: arxiv.org/abs/2502.17540
Garbage in, garbage out -- nice gem for the Italian-speaking folks on this platform. TLDR: in arxiv.org/abs/2406.04127 we found that MMLU contains TONS of errors, and it looks like all of these seamlessly propagated to this new "Global MMLU" dataset.
06.12.2024 13:17
MMLU-Redux:
Paper: arxiv.org/abs/2406.04127
Dataset: huggingface.co/datasets/edi...
This goes without saying: As someone from a non-English speaking country, I salute the effort to democratise LLM evaluations across languages. But we must also ensure we don't democratise mistakes.
06.12.2024 09:44
Super cool work from Cohere for AI! However, this highlights a concern raised by our MMLU-Redux team (arxiv.org/abs/2406.04127): **error propagation to many languages**. Issues in MMLU (e.g., "rapid intervention to solve ebola") seem to persist in many languages. Let's solve the root cause first?
06.12.2024 09:38
For clarity -- great project, but most of the MMLU errors we found (and fixed) in our MMLU-Redux paper (arxiv.org/abs/2406.04127) are also present in this dataset. We also provide a curated version of MMLU, so it's easy to fix.
06.12.2024 09:26
Oops! Some errors we noticed in MMLU-Redux still exist in some languages (e.g., rapid intervention to "solve" ebola). (I just checked the 2 languages that I understand: Indonesian and Malay)
05.12.2024 23:11
A picture showing Halo's features, which include heart rate, sleep cycle, and SpO2 monitoring, using on-device ML.
Super excited to introduce Halo: A beginner's guide to DIY health tracking with wearables!
Using an $11 smart ring, I'll show you how to build your own private health monitoring app. From basic metrics to advanced features like:
- Activity tracking
- HR monitoring
- Sleep analysis
and more!
Starter pack for University of Edinburgh researchers done by the amazing ramandutt4.bsky.social - go.bsky.app/KRNDkN7
20.11.2024 16:34
Would you be so kind as to include me in the party? @ramandutt4.bsky.social
20.11.2024 16:57
If you're interested in mechanistic interpretability, I just found this starter pack and wanted to boost it (thanks for creating it @butanium.bsky.social!). Excited to have a mech interp community on Bluesky.
go.bsky.app/LisK3CP
Joining the Generative AI Lab (GAIL, gail.ed.ac.uk) at the University of Edinburgh as a GAIL Fellow! Excited for what's ahead.
19.11.2024 22:43
How to achieve efficient ICL without storing a huge dataset in one prompt?
Mixtures of In-Context Learners (MoICL): we treat LLMs prompted with subsets of demonstrations as experts and learn a weighting function to optimise the distribution over the continuation (1/n)
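A rough sketch of the mixture idea in the post above, not the paper's implementation. It assumes each expert (the same LLM prompted with a different subset of demonstrations) returns a next-token distribution over a shared vocabulary, and that the experts are combined as a softmax-weighted sum of probabilities. The names `softmax`, `mix_experts`, `expert_probs` and `weight_logits` are hypothetical, and the step that learns the weights is left out.

```python
import math


def softmax(logits):
    """Turn unconstrained weight logits into mixture weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_experts(expert_probs, weight_logits):
    """Combine per-expert next-token distributions into one distribution.

    expert_probs: one probability list per expert, all over the same vocabulary.
    weight_logits: one scalar per expert (assumed to be learned elsewhere).
    """
    weights = softmax(weight_logits)
    vocab_size = len(expert_probs[0])
    return [
        sum(w * probs[t] for w, probs in zip(weights, expert_probs))
        for t in range(vocab_size)
    ]


# Toy example: two experts, three-token vocabulary, equal weight logits.
expert_probs = [
    [0.7, 0.2, 0.1],  # expert prompted with demonstration subset A
    [0.1, 0.8, 0.1],  # expert prompted with demonstration subset B
]
mixture = mix_experts(expert_probs, [0.0, 0.0])
```

With equal logits this is just the average of the two experts; raising one logit shifts the mixture towards that expert's distribution.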
Started making a list of researchers working at the intersection of healthcare, language, and computation. Please help me add more people!
18.11.2024 11:09
I'll be travelling to London from Wednesday to Friday for an upcoming event and would be very happy to meet up!
I'd love to chat about my recent work (DeCoRe, MMLU-Redux, etc.). DM me if you're around!
DeCoRe: arxiv.org/abs/2410.18860
MMLU-Redux: arxiv.org/abs/2406.04127
I'd love to be added!
17.11.2024 18:53