Diyi Yang diyiyang - Bluesky Statics

We are getting closer to having agents operate in the real physical world. However, can we trust frontier models to make embodied decisions 🎮 aligned with human norms 👩‍⚖️?

With EgoNormia, a 1.8k ego-centric video 🥽 QA benchmark, we show that this is surprisingly challenging!

04.03.2025 04:32 — 👍 22 🔁 9 💬 1 📌 1

We (w/ @diyiyang.bsky.social, @zhuhao.me, & Bodhisattwa Prasad Majumder) are excited to present our #NAACL25 tutorial on Social Intelligence in the Age of LLMs!
It will highlight long-standing and emerging challenges of AI interacting w humans, society & the world.
⏰ May 3, 2:00pm-5:30pm Room Pecos

03.05.2025 13:58 — 👍 14 🔁 6 💬 0 📌 0

LM agents today primarily aim to automate tasks. Can we turn them into collaborative teammates? 🤖➕👤

Introducing Collaborative Gym (Co-Gym), a framework for enabling & evaluating human-agent collaboration! I'm now used to agents proactively seeking confirmation or my deep thinking. (🧵 with video)

17.01.2025 17:44 — 👍 22 🔁 10 💬 1 📌 1

Talk Arena: Interactive evaluation for audio models

My first bluesky post will be for my first project as a postdoc at Stanford.

Talk Arena is our first step towards building audio LMs into interactive agents. Try it out and let me know what you think. talkarena.org

10.12.2024 01:39 — 👍 18 🔁 4 💬 2 📌 1

Want to add your model to the arena? Have an idea for a new feature for Talk Arena? We are open to collaboration in many forms!

Co-led with @ellaminzhili.bsky.social in collaboration with @michaelryan207.bsky.social Kunat Pipatanakul Potsawee Manakul @zhuhao.me and @diyiyang.bsky.social (5/5)

10.12.2024 00:01 — 👍 3 🔁 1 💬 0 📌 0

Talk Arena: Interactive Evaluation of Large Audio Models

With an increasing number of Large *Audio* Models 🔊, which one do users like the most?

Introducing talkarena.org — an open platform where users speak to LAMs and receive text responses. Through open interaction, we focus on rankings based on user preferences rather than static benchmarks.
🧵 (1/5)

10.12.2024 00:01 — 👍 30 🔁 8 💬 3 📌 3

PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action
As language models (LMs) are widely utilized in personalized communication scenarios (e.g., sending emails, writing social media posts) and endowed with a certain level of agency, ensuring they act in...

Check out our paper, code, data to learn more!

Paper: arxiv.org/abs/2409.00138
Website: salt-nlp.github.io/PrivacyLens/

06.12.2024 18:20 — 👍 2 🔁 2 💬 1 📌 0

Excited to present our PrivacyLens paper at #NeurIPS next week! We explore LM agent privacy risks when deployed as personal assistants. (Details in thread)

I am working on developing LM agents as collaborative research partners, learning aids, personal assistants, and more. Let's connect and chat!!

06.12.2024 18:20 — 👍 7 🔁 2 💬 2 📌 0

Meet Tülu 3, a set of state-of-the-art instruct models with fully open data, eval code, and training algorithms.
We invented new methods for fine-tuning language models with RL and built upon best practices to scale synthetic instruction and preference data.
Demo, GitHub, paper, and models 👇

21.11.2024 17:15 — 👍 111 🔁 31 💬 2 📌 7

Missed some – or all – of our papers at #EMNLP2024?

It's not too late to catch up using this handy list from the Stanford AI Lab blog:

ai.stanford.edu/blog/emnlp-2...

18.11.2024 16:29 — 👍 24 🔁 4 💬 0 📌 0

Histogram peaked at 3 minutes and 2 weeks since sent

When I will respond to your email

20.11.2024 18:29 — 👍 2062 🔁 350 💬 39 📌 86

so far, every Thanksgiving week is writing letters week for me 🤣

20.11.2024 05:21 — 👍 2 🔁 0 💬 1 📌 0

🌶️(?) take: Agents are somehow hot right now because people realized that LLM output can be interpreted as a DSL which directs side effects in the world (e.g. tool calls) rather than just returning text in a chat/autocomplete sense. What are the open challenges? A 🧵... [1/11]

19.11.2024 09:32 — 👍 165 🔁 30 💬 9 📌 7

I did an unscientific, uncontrolled experiment for #EMNLP2024—details in 🧵👇. I posted my conference & workshop papers to 5 socials. Clear results: Mastodon is near dead, Threads may have users but not my people, not giving up on X/Twitter yet, but Bluesky is worth investing in.

18.11.2024 18:40 — 👍 87 🔁 10 💬 2 📌 0

EMNLP 2024 Tutorial: Language Agents: Foundations, Prospects, and Risks

Had a great time doing the language agent tutorial (language-agent-tutorial.github.io) with Yu Su, Shunyu Yao and Tao Yu 😀 #EMNLP2024

Check out our slides here: tinyurl.com/language-age...

18.11.2024 18:28 — 👍 33 🔁 5 💬 0 📌 0

Header of poster "Turn your LLM into a Speech LLM in 6 hours without any new data". Find a machine-readable version of this poster at https://diva-audio.github.io/

I'll be at the Google Theory and Practice of Foundation Models Workshop today and tomorrow! FOMO for EMNLP, but excited to chat more casually at a smaller non-archival workshop 😅

I am presenting at the Lightning Talks tomorrow at 1:30 PM on our Distilled Voice Assistant model if you're around!

14.11.2024 21:25 — 👍 8 🔁 1 💬 0 📌 0

CS 4644 / 7643: Deep Learning - LLM Guest Lecture - Fall 2024
Training Large Language Models. CS 4644 / 7643: Deep Learning. William Held, School of Interactive Computing, Georgia Institute of Technology.

Every semester, I drop into Georgia Tech's Deep Learning course to do a speed-through LLM lecture! I keep updating things to balance "history" and recent progress.

Slides for this semester are here for folks who are teaching courses on NLP/DL/LLMs in the near future: docs.google.com/presentation...

07.11.2024 21:45 — 👍 10 🔁 2 💬 0 📌 0

I wanted to contribute to "Starter Pack Season" with one for Stanford NLP+HCI: go.bsky.app/VZBhuJ5

Here are some other great starter packs:

- CSS: go.bsky.app/GoEyD7d + go.bsky.app/CYmRvcK
- NLP: go.bsky.app/SngwGeS + go.bsky.app/JgneRQk
- HCI: go.bsky.app/p3TLwt
- Women in AI: go.bsky.app/LaGDpqg

15.11.2024 19:20 — 👍 25 🔁 10 💬 2 📌 2

Posts by Diyi Yang (@diyiyang.bsky.social)