
Cyrus Rashtchian

@cyroid.bsky.social

Researcher at Google. Improving LLM factuality, RAG and multimodal alignment and evaluation. San Diego. he/him β˜€οΈπŸŒ±πŸ§—πŸ»πŸ Prev UCSD, MSR, UW, UIUC.

299 Followers  |  109 Following  |  13 Posts  |  Joined: 02.12.2024

Latest posts by cyroid.bsky.social on Bluesky

ICLR 2025 was so much fun!

28.04.2025 03:55 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
ICLR Poster: Revisiting text-to-image evaluation with Gecko: on metrics, prompts, and human rating (ICLR 2025)

Curious about fine-grained text-to-image model evaluation? Come see our spotlight paper on Gecko 🦎 in the afternoon poster session at #ICLR25

πŸ†Hall 3 + Hall 2B #359
πŸŽ–οΈFriday 3pm

ICLR: iclr.cc/virtual/2025...
Paper: arxiv.org/abs/2404.16820
Prompts: github.com/google-deepm...

25.04.2025 01:45 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Why do LLMs hallucinate with RAG?! πŸ€”
Find out at my #ICLR25 poster on Sufficient Context! πŸ‘‹πŸΌ
πŸ“Hall 3 + Hall 2B #230
⏰ Fri 25 Apr 10 a.m. to 12:30 p.m.

25.04.2025 01:35 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Happy to chat with anyone at ICLR about RAG, LLMs, and factuality!

25.04.2025 01:31 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

When RAG systems hallucinate, is the LLM misusing available information or is the retrieved context insufficient? In our #ICLR2025 paper, we introduce "sufficient context" to disentangle these failure modes. Work w Jianyi Zhang, Chun-Sung Ferng, Da-Cheng Juan, Ankur Taly, @cyroid.bsky.social

24.04.2025 18:18 β€” πŸ‘ 11    πŸ” 5    πŸ’¬ 1    πŸ“Œ 0
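A minimal hypothetical sketch of how such a disaggregation could be run, assuming an LLM autorater labels each (query, context) pair as sufficient or not; the `llm` callable, the rater prompt, and the field names are illustrative stand-ins, not the paper's exact setup:

```python
# Hypothetical sketch: split a RAG eval's wrong answers into the two
# failure modes using a sufficiency label from an LLM autorater.

RATER_PROMPT = """Question: {query}
Context: {context}
Does the context contain enough information to answer the question?
Answer strictly "yes" or "no"."""

def disaggregate_errors(examples, llm):
    """examples: dicts with keys 'query', 'context', 'correct' (bool).
    llm: any callable mapping a prompt string to a response string."""
    buckets = {"insufficient_context": 0, "misused_context": 0}
    for ex in examples:
        if ex["correct"]:
            continue  # only failures need a failure mode
        verdict = llm(RATER_PROMPT.format(**ex)).strip().lower()
        if verdict.startswith("yes"):
            buckets["misused_context"] += 1       # LLM misused available info
        else:
            buckets["insufficient_context"] += 1  # retrieval fell short
    return buckets
```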

1/2 Just a reminder about the Google Research Scholar Program, which provides up to $60K in unrestricted gifts to recognize early-career professors and support world-class research at institutions around the world. This year, we are particularly interested in the following research areas...

20.12.2024 19:04 β€” πŸ‘ 12    πŸ” 5    πŸ’¬ 1    πŸ“Œ 0

[6/6] The other idea is to do the weighted combination at an instance level. We look at intermediate layers for *each token* and slightly modify the overall distribution. This leads to consistent accuracy improvements for many models and datasets!

Would love to see some theory on why this works!

13.12.2024 18:43 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
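A minimal sketch of what this per-token weighting could look like, assuming you already have one row of logits per layer for the current position (see the projection sketch under post [4/6] below); the KL-based weights and the `alpha`/`tau` knobs are illustrative simplifications, not the paper's exact update rule:

```python
import torch
import torch.nn.functional as F

def per_token_blend(per_layer_logits, tau=1.0, alpha=0.2):
    """Sketch (not the exact SLED rule): for ONE token position, weight
    each intermediate layer by how close its distribution is to the
    final layer's, then nudge the final logits toward that mixture.

    per_layer_logits: [L, V] tensor, one row per layer, final layer last.
    """
    final, inter = per_layer_logits[-1], per_layer_logits[:-1]
    final_logp = F.log_softmax(final, dim=-1)
    inter_logp = F.log_softmax(inter, dim=-1)
    # KL(layer || final) per intermediate layer; smaller = more agreement.
    kl = (inter_logp.exp() * (inter_logp - final_logp)).sum(-1)  # [L-1]
    w = F.softmax(-kl / tau, dim=0)                              # layer weights
    blended = (w[:, None] * inter).sum(0)                        # [V]
    return (1 - alpha) * final + alpha * blended
```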

[5/6] Here's a nice example. We want to do some math. Greedy decoding leads to 5 x $10 = $50 for the overtime pay. This is because A x B = C is a common pattern. But we really need A x B x C = D to get the answer. SLED can help with this because the internal layers happen to predict 'x' instead of '='.

13.12.2024 18:43 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
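To make the flip concrete, here is a toy numerical illustration with made-up logits (not taken from any model): blending in intermediate layers that prefer 'x' can overturn a final layer that narrowly prefers '=':

```python
import torch

vocab = ["=", "x"]
final_logits = torch.tensor([2.0, 1.8])   # greedy pick would be '='
intermediate = torch.tensor([1.0, 2.5])   # internal layers lean toward 'x'
alpha = 0.3                               # illustrative mixing weight
blended = (1 - alpha) * final_logits + alpha * intermediate
print(vocab[blended.argmax()])            # -> 'x' (blended = [1.70, 2.01])
```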

[4/6] Our main decoding trick is to use a weighted combination of *all of the layers*. Precisely, we project the layers into the same output distribution (over vocab tokens). Then we combine the intermediate "logits" with the output logits, based on our estimate of the LLM's internal knowledge.

13.12.2024 18:43 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
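Here is a minimal sketch of that projection step, assuming a Hugging Face-style model that returns per-layer hidden states; the uniform average and fixed `alpha` stand in for the paper's knowledge-based layer weighting:

```python
import torch

def blend_layer_logits(hidden_states, W_U, alpha=0.2):
    """Sketch (not the exact SLED rule): project each layer's hidden
    state for the current token into vocab space via the unembedding
    matrix W_U, then mix the intermediate-layer logits into the
    final-layer logits with weight alpha.

    hidden_states: list of [hidden_dim] tensors, one per layer.
    W_U:           [hidden_dim, vocab_size] unembedding matrix.
    """
    per_layer = torch.stack([h @ W_U for h in hidden_states])  # [L, V]
    final = per_layer[-1]
    intermediate = per_layer[:-1].mean(dim=0)  # uniform weights, for simplicity
    return (1 - alpha) * final + alpha * intermediate

# Illustrative usage with a transformers model (strictly, the final
# layer norm should be applied before unembedding; omitted here):
# out = model(input_ids, output_hidden_states=True)
# hs = [h[0, -1] for h in out.hidden_states[1:]]  # skip embedding layer
# W_U = model.get_output_embeddings().weight.T    # [hidden, vocab]
# next_token = blend_layer_logits(hs, W_U).argmax()
```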

[3/6] The key observation is that LLMs "know" a lot more than they "tell" -- basically, the training process can favor more popular tokens (in the dataset) rather than more accurate predictions for the query at hand.

So we can utilize this during decoding time...

13.12.2024 18:43 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

[2/6] Joint work with Jianyi Zhang Β· Da-Cheng Juan Β· Chun-Sung Ferng Β· Heinrich Jiang Β· Yiran Chen

ArXiv paper: arxiv.org/abs/2411.02433
Project page: jayzhang42.github.io/sled_page/
GitHub: github.com/JayZhang42/S...

But how does it work, you ask?

13.12.2024 18:43 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Longer thread about our new factuality decoding method, SLED, at NeurIPS 2024. Main idea: freeze the model, but be thoughtful about the decoding. With a small amount of extra inference-time compute, we increase accuracy by 3% on several benchmarks! SLED helps for all major open-source models!

13.12.2024 18:43 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

ArXiv paper: arxiv.org/abs/2411.02433
Project page: jayzhang42.github.io/sled_page/
GitHub: github.com/JayZhang42/S...

13.12.2024 17:55 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

First shameless plug -- our new factuality decoding method, SLED, gets SOTA improvements on 14+ models (Llama 2/3, Gemma, Mistral) & 9 benchmarks!

See our #NeurIPS2024 poster today (Friday) in the East Exhibit Hall A-C #3311

13.12.2024 17:55 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Hi friends!🩷
I have never done this, but I'm making a list so I can keep in touch with all of you more easily🫢🏻

Please like this or say hi if I can add youπŸ₯° Thanks🫢🏻

02.12.2024 21:51 β€” πŸ‘ 54    πŸ” 1    πŸ’¬ 22    πŸ“Œ 0

Everyone I spoke to at @rl-conference.bsky.social last summer agreed that it was one of the best conferences ever for an RL researcher... So many great RL-focused papers!
CFP is out, send your work here!

02.12.2024 16:02 β€” πŸ‘ 45    πŸ” 13    πŸ’¬ 1    πŸ“Œ 0

Excited to try out bluesky and chat about GenAI and ML theory!

02.12.2024 18:32 β€” πŸ‘ 7    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
