Diyi Yang

@diyiyang.bsky.social

Assistant Professor @Stanford CS @StanfordNLP @StanfordAILab Computational Social Science & NLP

2,073 Followers  |  716 Following  |  2 Posts  |  Joined: 18.11.2024  |  1.9232

Latest posts by diyiyang.bsky.social on Bluesky

We are getting closer to having agents operating in the real physical world. However, can we trust frontier models to make embodied decisions 🎮 aligned with human norms 👩‍⚖️?

With EgoNormia, a 1.8k egocentric video 🥽 QA benchmark, we show that this is surprisingly challenging!

04.03.2025 04:32 · 👍 22 🔁 9 💬 1 📌 1

We (w/ @diyiyang.bsky.social, @zhuhao.me, & Bodhisattwa Prasad Majumder) are excited to present our #NAACL25 tutorial on Social Intelligence in the Age of LLMs!
It will highlight long-standing and emerging challenges of AI interacting with humans, society & the world.
⏰ May 3, 2:00pm-5:30pm, Room Pecos

03.05.2025 13:58 · 👍 14 🔁 6 💬 0 📌 0

LM agents today primarily aim to automate tasks. Can we turn them into collaborative teammates? 🤖➕👤

Introducing Collaborative Gym (Co-Gym), a framework for enabling & evaluating human-agent collaboration! I've now gotten used to agents proactively seeking my confirmation or my deeper thinking. (🧵 with video)

17.01.2025 17:44 · 👍 22 🔁 10 💬 1 📌 1
Talk Arena: Interactive evaluation for audio models

My first Bluesky post will be for my first project as a postdoc at Stanford.

Talk Arena is our first step towards building audio LMs into interactive agents. Try it out and let me know what you think. talkarena.org

10.12.2024 01:39 · 👍 18 🔁 4 💬 2 📌 1

Want to add your model to the arena? Have an idea for a new feature for Talk Arena? We are open to collaboration in many forms!

Co-led with @ellaminzhili.bsky.social in collaboration with @michaelryan207.bsky.social, Kunat Pipatanakul, Potsawee Manakul, @zhuhao.me, and @diyiyang.bsky.social (5/5)

10.12.2024 00:01 · 👍 3 🔁 1 💬 0 📌 0
Talk Arena: Interactive Evaluation of Large Audio Models

With an increasing number of Large *Audio* Models 🔊, which one do users like the most?

Introducing talkarena.org: an open platform where users speak to LAMs and receive text responses. Through open interaction, we focus on rankings based on user preferences rather than static benchmarks.
🧵 (1/5)

10.12.2024 00:01 · 👍 30 🔁 8 💬 3 📌 3
PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action. As language models (LMs) are widely utilized in personalized communication scenarios (e.g., sending emails, writing social media posts) and endowed with a certain level of agency, ensuring they act in...

Check out our paper, code, and data to learn more!

Paper: arxiv.org/abs/2409.00138
Website: salt-nlp.github.io/PrivacyLens/

06.12.2024 18:20 · 👍 2 🔁 2 💬 1 📌 0

Excited to present our PrivacyLens paper at #NeurIPS next week! We explore the privacy risks of LM agents deployed as personal assistants. (Details in thread)

I am working on developing LM agents as collaborative research partners, learning aids, personal assistants, and more. Let's connect and chat!!

06.12.2024 18:20 · 👍 7 🔁 2 💬 2 📌 0

Meet Tülu 3, a set of state-of-the-art instruct models with fully open data, eval code, and training algorithms.
We invented new methods for fine-tuning language models with RL and built upon best practices to scale synthetic instruction and preference data.
Demo, GitHub, paper, and models 👇

21.11.2024 17:15 · 👍 111 🔁 31 💬 2 📌 7

Missed some – or all – of our papers at #EMNLP2024?

It's not too late to catch up using this handy list from the Stanford AI Lab blog:

ai.stanford.edu/blog/emnlp-2...

18.11.2024 16:29 · 👍 24 🔁 4 💬 0 📌 0
Histogram peaked at 3 minutes and 2 weeks since sent

When I will respond to your email

20.11.2024 18:29 · 👍 2069 🔁 351 💬 40 📌 86

so far, every Thanksgiving week is letter-writing week for me 🤣

20.11.2024 05:21 · 👍 2 🔁 0 💬 1 📌 0

🌶️(?) take: Agents are somehow hot right now because people realized that LLM output can be interpreted as a DSL which directs side effects in the world (e.g. tool calls) rather than just returning text in a chat/autocomplete sense. What are the open challenges? A 🧵... [1/11]

19.11.2024 09:32 · 👍 167 🔁 30 💬 9 📌 7

I did an unscientific, uncontrolled experiment for #EMNLP2024 (details in 🧵👇). I posted my conference & workshop papers to 5 social platforms. Clear results: Mastodon is nearly dead, Threads may have users but not my people, I'm not giving up on X/Twitter yet, but Bluesky is worth investing in.

18.11.2024 18:40 · 👍 87 🔁 10 💬 2 📌 0
EMNLP 2024 Tutorial: Language Agents: Foundations, Prospects, and Risks

Had a great time doing the language agent tutorial (language-agent-tutorial.github.io) with Yu Su, Shunyu Yao, and Tao Yu 😀 #EMNLP2024

Check out our slides here: tinyurl.com/language-age...

18.11.2024 18:28 · 👍 33 🔁 5 💬 0 📌 0
Header of poster "Turn your LLM into a Speech LLM in 6 hours without any new data"

Find a machine readable version of this poster at https://diva-audio.github.io/

I'll be at the Google Theory and Practice of Foundation Models Workshop today and tomorrow! FOMO for EMNLP, but excited to chat more casually at a smaller non-archival workshop 😅

I am presenting at the Lightning Talks tomorrow at 1:30 PM on our Distilled Voice Assistant model if you're around!

14.11.2024 21:25 · 👍 8 🔁 1 💬 0 📌 0
CS 4644 / 7643: Deep Learning - LLM Guest Lecture - Fall 2024. Training Large Language Models, CS 4644 / 7643: Deep Learning, William Held, School of Interactive Computing, Georgia Institute of Technology

Every semester, I drop into Georgia Tech's Deep Learning course to do a speed-through LLM lecture! I keep updating things to balance "history" and recent progress.

Slides for this semester are here for folks who are teaching courses on NLP/DL/LLMs in the near future: docs.google.com/presentation...

07.11.2024 21:45 · 👍 10 🔁 2 💬 0 📌 0

I wanted to contribute to "Starter Pack Season" with one for Stanford NLP+HCI: go.bsky.app/VZBhuJ5

Here are some other great starter packs:

- CSS: go.bsky.app/GoEyD7d + go.bsky.app/CYmRvcK
- NLP: go.bsky.app/SngwGeS + go.bsky.app/JgneRQk
- HCI: go.bsky.app/p3TLwt
- Women in AI: go.bsky.app/LaGDpqg

15.11.2024 19:20 · 👍 25 🔁 10 💬 2 📌 2

@diyiyang is following 20 prominent accounts