
Elias Stengel-Eskin

@esteng.bsky.social

Postdoc @UNC working on NLP, AI, and computational linguistics. Formerly PhD student @JHU and undergrad @McGill esteng.github.io

1,945 Followers  |  682 Following  |  62 Posts  |  Joined: 17.09.2023

Latest posts by esteng.bsky.social on Bluesky


Some personal updates:
- I've completed my PhD at @unccs.bsky.social! πŸŽ“
- Starting Fall 2026, I'll be joining the CS dept. at Johns Hopkins University @jhucompsci.bsky.social as an Assistant Professor πŸ’™
- Currently exploring options for my gap year (Aug 2025 - Jul 2026), so feel free to reach out! πŸ”Ž

20.05.2025 17:58 β€” πŸ‘ 27    πŸ” 5    πŸ’¬ 3    πŸ“Œ 2

πŸ“’ The SoLaR workshop will be co-located with COLM!
@colmweb.org

SoLaR is a collaborative forum for researchers working on responsible development, deployment and use of language models.

We welcome both technical and sociotechnical submissions; the deadline is July 5th!

12.05.2025 15:25 β€” πŸ‘ 17    πŸ” 6    πŸ’¬ 1    πŸ“Œ 0

🚨 Introducing our @tmlrorg.bsky.social paper β€œUnlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation”
We present UnLOK-VQA, a benchmark to evaluate unlearning in vision-and-language models, where both images and text may encode sensitive or private information.

07.05.2025 18:54 β€” πŸ‘ 10    πŸ” 8    πŸ’¬ 1    πŸ“Œ 0

Thank you!

06.05.2025 03:01 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Thanks @niranjanb.bsky.social!

06.05.2025 03:00 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

πŸ”₯ BIG CONGRATS to Elias (and UT Austin)! Really proud of you -- it has been a complete pleasure to work with Elias and see him grow into a strong PI on *all* axes πŸ€—

Make sure to apply for your PhD with him -- he is an amazing advisor and person! πŸ’™

05.05.2025 22:00 β€” πŸ‘ 12    πŸ” 4    πŸ’¬ 1    πŸ“Œ 0

Thank you @mohitbansal.bsky.social -- I have learned so much from your mentorship (and benefitted greatly from your job market guidance), and consider myself extremely fortunate to have found such a fantastic lab and postdoc advisor!

05.05.2025 23:27 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Thanks @kmahowald.bsky.social looking forward to collaborating!

05.05.2025 21:49 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Thanks @rdesh26.bsky.social ❀️

05.05.2025 21:48 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

And of course thank you to the amazing students/collaborators from @unccs.bsky.social and @jhuclsp.bsky.social πŸ™

05.05.2025 20:28 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

A huge shoutout to my mentors who have supported and shaped my research! Esp. grateful to my postdoc advisor @mohitbansal.bsky.social for helping me grow along the whole spectrum of PI skills, and my PhD advisor @vandurme.bsky.social for shaping my trajectory as a researcher

05.05.2025 20:28 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Elias Stengel-Eskin, Postdoctoral Research Associate, UNC Chapel Hill

Looking forward to continuing to develop AI agents that interact/communicate with people, each other, and the multimodal world. I’ll be recruiting PhD students for Fall 2026 across a range of connected topics (details: esteng.github.io) and plan on recruiting interns for Fall 2025 as well.

05.05.2025 20:28 β€” πŸ‘ 3    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
UT Austin campus

Extremely excited to announce that I will be joining
@utaustin.bsky.social Computer Science in August 2025 as an Assistant Professor! πŸŽ‰

05.05.2025 20:28 β€” πŸ‘ 43    πŸ” 9    πŸ’¬ 5    πŸ“Œ 2

🌡 I'm going to be presenting PBT at #NAACL2025 today at 2PM! Come by poster session 2 if you want to hear about:
-- balancing positive and negative persuasion
-- improving LLM teamwork/debate
-- training models on simulated dialogues

With @mohitbansal.bsky.social and @peterbhase.bsky.social

30.04.2025 15:04 β€” πŸ‘ 8    πŸ” 3    πŸ’¬ 0    πŸ“Œ 0

I will be presenting ✨Reverse Thinking Makes LLMs Stronger Reasoners✨ at #NAACL2025!

In this work, we show:
- Improvements across 12 datasets
- Better results than SFT trained on 10x more data
- Strong generalization to OOD datasets

πŸ“…4/30 2:00-3:30 Hall 3

Let's chat about LLM reasoning and its future directions!

29.04.2025 23:21 β€” πŸ‘ 5    πŸ” 3    πŸ’¬ 1    πŸ“Œ 0
Teaching Models to Balance Resisting and Accepting Persuasion

Large language models (LLMs) are susceptible to persuasion, which can pose risks when models are faced with an adversarial interlocutor. We take a first step towards defending models against persuasion, while also arguing that defense against adversarial (i.e., negative) persuasion is only half of the equation: models should also be able to accept beneficial (i.e., positive) persuasion to improve their answers. We show that optimizing models for only one side results in poor performance on the other.

To balance positive and negative persuasion, we introduce Persuasion-Balanced Training (PBT), which leverages multi-agent recursive dialogue trees to create data and trains models via preference optimization to accept persuasion when appropriate. PBT allows us to use data generated from dialogues between smaller 7-8B models to train much larger 70B models. Moreover, PBT consistently improves resistance to misinformation and resilience to being challenged, while also yielding the best overall performance on holistic data containing both positive and negative persuasion.

Crucially, we show that PBT models are better teammates in multi-agent debates across two domains (trivia and commonsense QA). Without PBT, pairs of stronger and weaker models have unstable performance, with the order in which the models present their answers determining whether the team obtains the stronger or weaker model's performance. PBT leads to better and more stable results with less order dependence, with the stronger model consistently pulling the weaker one up.
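As a rough illustration of the data-creation step described in the abstract, the sketch below builds preference pairs from a toy dialogue tree, preferring the branch (accepting vs. resisting persuasion) that ends in a correct final answer. The tree structure, field names, and example are hypothetical, not the paper's actual code or data format.

```python
# Toy sketch: turn a persuasion dialogue tree into preference pairs for
# preference optimization. At each node where an interlocutor pushes back,
# the child branch that reaches a correct final answer is "chosen" and an
# incorrect sibling branch is "rejected". (Hypothetical structure.)

def build_preference_pairs(node, pairs=None):
    """Recursively walk a dialogue tree and collect (chosen, rejected) pairs."""
    if pairs is None:
        pairs = []
    children = node.get("children", [])
    correct = [c for c in children if c["final_answer_correct"]]
    wrong = [c for c in children if not c["final_answer_correct"]]
    for good in correct:
        for bad in wrong:
            pairs.append({"prompt": node["context"],
                          "chosen": good["response"],
                          "rejected": bad["response"]})
    for c in children:
        build_preference_pairs(c, pairs)
    return pairs

# A minimal tree: the challenger's persuasion here is negative (wrong),
# so the preferred response resists it.
tree = {"context": "Q: capital of Australia? A: Canberra. Challenger: Isn't it Sydney?",
        "children": [
            {"response": "You're right, it's Sydney.",
             "final_answer_correct": False, "children": []},
            {"response": "No, Canberra is correct.",
             "final_answer_correct": True, "children": []},
        ]}
print(build_preference_pairs(tree))
```

With positive-persuasion trees (where the challenger is right), the same procedure prefers the accepting branch, which is how one dataset can train both behaviors at once.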

Links:
1⃣ arxiv.org/abs/2410.14596
2⃣ arxiv.org/abs/2503.15272
3⃣ arxiv.org/abs/2409.07394

With awesome collaborators @mohitbansal.bsky.social, @peterbhase.bsky.social, David Wan, @cyjustinchen.bsky.social, Han Wang, @archiki.bsky.social

29.04.2025 17:52 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

πŸ“† 04/30 2PM: Teaching Models to Balance Resisting and Accepting Persuasion

πŸ“† 05/01 2PM: MAMM-Refine: A Recipe for Improving Faithfulness in Generation with Multi-Agent Collaboration

πŸ“† 05/02 11AM: AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge

29.04.2025 17:52 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

✈️ Heading to #NAACL2025 to present 3 main conf. papers, covering training LLMs to balance accepting and rejecting persuasion, multi-agent refinement for more faithful generation, and adaptively addressing varying knowledge conflict.

Reach out if you want to chat!

29.04.2025 17:52 β€” πŸ‘ 15    πŸ” 5    πŸ’¬ 1    πŸ“Œ 0

Kudos to Atin Pothiraj on leading this project, with @jmincho.bsky.social and @mohitbansal.bsky.social

Code: github.com/atinpothiraj...

@hf.co Dataset: huggingface.co/datasets/ati...

Paper: arxiv.org/abs/2504.15485

24.04.2025 15:14 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

By testing VLMs’ spatial reasoning under occlusion, CAPTURe highlights an unexpected weakness. We analyze this weakness by providing the model with additional information:

➑️ Providing object coordinates as text improves performance substantially.
➑️ Inpainting the occluded regions with a diffusion model also helps.

24.04.2025 15:14 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Interestingly, model error increases with the number of occluded dots, suggesting that task performance degrades as the level of occlusion rises.

Additionally, model performance depends on pattern type (the shape in which the objects are arranged).

24.04.2025 15:14 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

We evaluate 4 strong VLMs (GPT-4o, InternVL2, Molmo, and Qwen2VL) on CAPTURe.

Models generally struggle with multiple aspects of the task (both the occluded and unoccluded settings).

Crucially, every model performs worse in the occluded setting, but we find that humans can perform the task easily even with occlusion.

24.04.2025 15:14 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

We release 2 splits:

➑️ CAPTURe-real contains real-world images and tests the ability of models to perform amodal counting in naturalistic contexts.

➑️ CAPTURe-synthetic allows us to analyze specific factors by controlling different variables like color, shape, and number of objects.

24.04.2025 15:14 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

CAPTURe = Counting Amodally Through Unseen Regions, which requires a model to count objects arranged in a pattern by inferring how the pattern continues behind an occluder (an object that blocks parts of the scene).

This needs pattern recognition + counting, making it a good testbed for VLMs!

24.04.2025 15:14 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
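To make the task concrete, here is a minimal sketch of the counting logic CAPTURe asks models to perform: extrapolate the visible pattern behind the occluder, then count every position, seen or unseen. This toy grid version is purely illustrative and is not the benchmark's evaluation code.

```python
# Illustrative amodal counting on a regular grid (not the CAPTURe codebase).
# The model sees only `visible` positions plus the occluder's extent, and
# must infer how many objects the full pattern contains.

def amodal_count(visible, occluder):
    """Count all grid-pattern positions, including those hidden behind
    the rectangular occluder given as (top-left, bottom-right) corners."""
    rows = sorted({r for r, _ in visible})
    cols = sorted({c for _, c in visible})
    (r0, c0), (r1, c1) = occluder
    total = 0
    # Extrapolate the pattern over its bounding grid; a position counts if
    # it is visible, or plausibly hidden because it lies behind the occluder.
    for r in range(rows[0], rows[-1] + 1):
        for c in range(cols[0], cols[-1] + 1):
            if (r, c) in visible or (r0 <= r <= r1 and c0 <= c <= c1):
                total += 1
    return total

# A 3x4 grid of dots; a rectangular occluder hides positions (1,1)-(2,2):
occluder = ((1, 1), (2, 2))
hidden = {(1, 1), (1, 2), (2, 1), (2, 2)}
visible = {(r, c) for r in range(3) for c in range(4)} - hidden
print(amodal_count(visible, occluder))  # 12: the full 3x4 pattern
```

The sketch assumes the occluder sits inside the pattern's bounding box; the interesting failure mode the thread describes is that VLMs struggle with exactly this kind of extrapolation while humans find it easy.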

Check out 🚨CAPTURe🚨 -- a new benchmark testing spatial reasoning by making VLMs count objects under occlusion.

SOTA VLMs (GPT-4o, Qwen2-VL, Intern-VL2) have high error rates on CAPTURe (but humans have low error βœ…) and models struggle to reason about occluded objects.

arxiv.org/abs/2504.15485

πŸ§΅πŸ‘‡

24.04.2025 15:14 β€” πŸ‘ 5    πŸ” 4    πŸ’¬ 1    πŸ“Œ 0

🚨Real-world retrieval is messy: queries are ambiguous or docs conflict & have incorrect/irrelevant info. How can we jointly address these problems?

➑️RAMDocs: challenging dataset w/ ambiguity, misinformation & noise
➑️MADAM-RAG: multi-agent framework, debates & aggregates evidence across sources

πŸ§΅β¬‡οΈ

18.04.2025 17:05 β€” πŸ‘ 14    πŸ” 7    πŸ’¬ 3    πŸ“Œ 0

Excited to share my first paper as first author: "Task-Circuit Quantization" πŸŽ‰
I led this work to explore how interpretability insights can drive smarter model compression. Big thank you to @esteng.bsky.social, Yi-Lin Sung, and @mohitbansal.bsky.social for mentorship and collaboration. More to come

16.04.2025 16:19 β€” πŸ‘ 5    πŸ” 2    πŸ’¬ 0    πŸ“Œ 0

What if we could transform advanced math problems into abstract programs that can generate endless, verifiable problem variants?

Presenting EFAGen, which automatically transforms static advanced math problems into their corresponding executable functional abstractions (EFAs).
πŸ§΅πŸ‘‡

15.04.2025 19:37 β€” πŸ‘ 15    πŸ” 5    πŸ’¬ 1    πŸ“Œ 1
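As a toy illustration of the EFA idea (hypothetical, not EFAGen's actual output format), the sketch below encodes one math problem as a parameterized program: resampling the parameters yields endless variants, and a closed-form solver makes each variant verifiable.

```python
import random

# An "executable functional abstraction" of a static math problem:
# a sampler for problem parameters, a renderer that produces the problem
# statement, and a solver that computes a verifiable ground-truth answer.

class SumOfArithmeticSeries:
    """EFA for: 'What is the sum of the first n terms of a, a+d, a+2d, ...?'"""

    def sample(self, rng: random.Random) -> dict:
        return {"a": rng.randint(1, 20), "d": rng.randint(1, 9), "n": rng.randint(3, 50)}

    def render(self, p: dict) -> str:
        return (f"What is the sum of the first {p['n']} terms of the arithmetic "
                f"sequence starting at {p['a']} with common difference {p['d']}?")

    def solve(self, p: dict) -> int:
        # Closed form: n/2 * (2a + (n-1)d) -- used to verify model answers.
        return p["n"] * (2 * p["a"] + (p["n"] - 1) * p["d"]) // 2

rng = random.Random(0)
efa = SumOfArithmeticSeries()
params = efa.sample(rng)
print(efa.render(params))
print("verified answer:", efa.solve(params))
```

Because the solver is executable, every sampled variant comes with a checkable answer, which is what makes EFAs useful for generating verifiable training and evaluation data.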
Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression

Post-training quantization (PTQ) reduces a model's memory footprint by mapping full-precision weights into low-bit weights without costly retraining, but can degrade its downstream performance especia...

Had an awesome time mentoring @hanqix.bsky.social who led this work, with Yi-Lin Sung and @mohitbansal.bsky.social

@unccs.bsky.social

πŸ’»Code: github.com/The-Inscruta...
πŸ“„Paper: arxiv.org/abs/2504.07389

12.04.2025 14:19 β€” πŸ‘ 3    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
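For readers new to PTQ, here is a minimal sketch of the baseline setting the paper operates in: generic uniform post-training quantization, mapping full-precision weights to b-bit integers with a single scale and no retraining. This is not TaCQ itself, which additionally uses interpretability to decide which weights matter most for a task.

```python
import numpy as np

# Minimal uniform post-training quantization: quantize a weight tensor to
# b bits with one per-tensor scale, then dequantize to measure the error.

def quantize(w: np.ndarray, bits: int):
    qmax = 2 ** (bits - 1) - 1                # e.g. 127 for 8-bit signed
    scale = np.abs(w).max() / qmax            # per-tensor scaling factor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize(w, bits=8)
err = np.abs(w - dequantize(q, scale)).max()
print(f"max reconstruction error at 8 bits: {err:.4f}")
```

At 2 bits the grid has only four levels, so reconstruction error explodes for uniformly treated weights, which is why generative tasks like text-to-SQL are such a hard test for low-bit PTQ.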

To simulate a realistic use case involving generation, we evaluate on Spider for text-to-SQL.

Quantization methods struggle to preserve performance on generative tasks. We show that TaCQ is the only method to achieve non-zero performance at 2-bit precision for Llama-3-8B-Instruct.

12.04.2025 14:19 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
