@cyroid.bsky.social
Researcher at Google. Improving LLM factuality, RAG, and multimodal alignment and evaluation. San Diego. he/him. Prev UCSD, MSR, UW, UIUC.
ICLR 2025 was so much fun!
28.04.2025 03:55
Curious about fine-grained text-to-image model evaluation? Come see our spotlight paper on Gecko in the afternoon poster session at #ICLR25
Location: Hall 3 + Hall 2B #359
Time: Friday 3pm
ICLR: iclr.cc/virtual/2025...
Paper: arxiv.org/abs/2404.16820
Prompts: github.com/google-deepm...
Why do LLMs hallucinate with RAG?!
Find out at my #ICLR25 poster on Sufficient Context!
Location: Hall 3 + Hall 2B #230
Time: Fri 25 Apr, 10 a.m. to 12:30 p.m.
Happy to chat with anyone at ICLR about RAG, LLMs, and factuality!
25.04.2025 01:31
When RAG systems hallucinate, is the LLM misusing available information or is the retrieved context insufficient? In our #ICLR2025 paper, we introduce "sufficient context" to disentangle these failure modes. Work w Jianyi Zhang, Chun-Sung Ferng, Da-Cheng Juan, Ankur Taly, @cyroid.bsky.social
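In code terms, the framing boils down to reporting RAG errors separately by whether the retrieved context was sufficient. A toy sketch with made-up records (the field names are just illustrative; in practice the `sufficient` label would come from a human or model rater):

    from collections import defaultdict

    # Hypothetical per-example records from a RAG eval: was the retrieved context
    # sufficient to answer the query, and was the model's answer correct?
    records = [
        {"sufficient": True,  "correct": True},
        {"sufficient": True,  "correct": False},   # model misuses available information
        {"sufficient": False, "correct": False},   # retrieval failure
        {"sufficient": False, "correct": True},    # answered from parametric knowledge
    ]

    def error_rate_by_sufficiency(rows):
        # Report error rates separately for sufficient vs. insufficient context,
        # so retrieval failures and model failures are not conflated.
        buckets = defaultdict(list)
        for r in rows:
            buckets["sufficient" if r["sufficient"] else "insufficient"].append(r)
        return {k: sum(not r["correct"] for r in v) / len(v) for k, v in buckets.items()}

    print(error_rate_by_sufficiency(records))   # {'sufficient': 0.5, 'insufficient': 0.5}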
24.04.2025 18:18
1/2 Just a reminder about the Google Research Scholar Program, providing up to $60K in unrestricted gifts to recognize early-career professors and support world-class research at institutions around the world. This year, we are particularly interested in the following research areas...
20.12.2024 19:04
[6/6] The other idea is to do the weighted combination at an instance level. We look at intermediate layers for *each token* and slightly modify the overall distribution. This leads to consistent accuracy improvements for many models and datasets!
Would love to see some theory on why this works!
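In pseudocode, the instance-level variant is a greedy decoding loop where the next-token distribution gets its own adjustment at every step. A minimal sketch assuming a Hugging Face-style causal LM that exposes hidden states; `adjust_fn` is a placeholder for the actual instance-level reweighting:

    import torch

    def greedy_decode_with_per_token_adjustment(model, input_ids, adjust_fn, max_new_tokens=32):
        # Greedy decoding where every generated token gets its own adjustment of the
        # output distribution, computed from that token's intermediate-layer states.
        for _ in range(max_new_tokens):
            out = model(input_ids, output_hidden_states=True)
            final_logits = out.logits[0, -1]                    # logits at the last position
            per_layer = [h[0, -1] for h in out.hidden_states]   # this token's state at each layer
            mixed_logits = adjust_fn(final_logits, per_layer)   # instance-level reweighting (placeholder)
            next_id = mixed_logits.argmax().view(1, 1)
            input_ids = torch.cat([input_ids, next_id], dim=1)
        return input_ids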
[5/6] Here's a nice example. We want to do some math. Greedy decoding leads to 5 x $10 = $50 for the overtime pay. This is because A x B = C is a common pattern. But we really need A x B x C = D to get the answer. SLED can help with this because the internal layers happen to predict 'x' instead of '='.
13.12.2024 18:43
[4/6] Our main decoding trick is to use a weighted combination of *all of the layers*. Precisely, we project the layers into the same output distribution (over vocab tokens). Then we combine the intermediate "logits" with the output logits based on our estimate of the LLM's internal knowledge.
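For intuition, here's a minimal sketch of that project-and-combine step (not the real SLED implementation: the uniform layer weights and the mixing coefficient `alpha` stand in for the estimated weights, and details like the final layer norm are omitted):

    import torch

    def mix_layer_logits(hidden_states, unembed, alpha=0.1):
        # hidden_states: one tensor per layer for the current position, each [hidden_dim]
        # unembed: the LM head / unembedding matrix, [vocab_size, hidden_dim]
        final_logits = unembed @ hidden_states[-1]                              # standard output logits
        early_logits = torch.stack([unembed @ h for h in hidden_states[:-1]])   # project every earlier layer
        weights = torch.full((early_logits.shape[0],), 1.0 / early_logits.shape[0])  # uniform placeholder weights
        latent_logits = (weights[:, None] * early_logits).sum(dim=0)            # weighted combination across layers
        return (1 - alpha) * final_logits + alpha * latent_logits               # nudge output toward internal evidence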
13.12.2024 18:43
[3/6] The key observation is that LLMs "know" a lot more than they "tell" -- basically, the training process can favor more popular tokens (in the dataset) over more accurate predictions for the query at hand.
So we can exploit this at decoding time...
[2/6] Joint work with Jianyi Zhang · Da-Cheng Juan · Chun-Sung Ferng · Heinrich Jiang · Yiran Chen
ArXiv paper: arxiv.org/abs/2411.02433
Project page: jayzhang42.github.io/sled_page/
GitHub: github.com/JayZhang42/S...
But how does it work, you ask?
Longer thread about our new factuality decoding method SLED at NeurIPS 2024. Main idea: freeze the model, but be thoughtful about the decoding. With a small amount of extra inference-time compute, we increase accuracy by 3% on several benchmarks! SLED helps all major open-source models!
13.12.2024 18:43
ArXiv paper: arxiv.org/abs/2411.02433
Project page: jayzhang42.github.io/sled_page/
GitHub: github.com/JayZhang42/S...
First shameless plug -- our new factuality decoding method, SLED, gets SOTA improvements on 14+ models (Llama 2/3, Gemma, Mistral) and 9 benchmarks!
See our #NeurIPS2024 poster today (Friday) in the East Exhibit Hall A-C #3311
Hi friends!
I have never done this before, but I'm making a list so I can keep in touch with all of you more easily.
Please like this or say hi if I can add you! Thanks!
Everyone I spoke to at @rl-conference.bsky.social last summer agreed that it was one of the best conferences ever for an RL researcher... So many great RL-focused papers!
CFP is out, send your work here!
Excited to try out Bluesky and chat about GenAI and ML theory!
02.12.2024 18:32