We are getting closer to having agents operate in the real, physical world. But can we trust frontier models to make embodied decisions aligned with human norms?
With EgoNormia, a 1.8k-question egocentric video 🥽 QA benchmark, we show that this is surprisingly challenging!
04.03.2025 04:32
We (w/ @diyiyang.bsky.social, @zhuhao.me, & Bodhisattwa Prasad Majumder) are excited to present our #NAACL25 tutorial on Social Intelligence in the Age of LLMs!
It will highlight long-standing and emerging challenges of AI interacting with humans, society & the world.
⏰ May 3, 2:00pm-5:30pm, Room Pecos
03.05.2025 13:58
LM agents today primarily aim to automate tasks. Can we turn them into collaborative teammates?
Introducing Collaborative Gym (Co-Gym), a framework for enabling & evaluating human-agent collaboration! I've now gotten used to agents proactively asking for my confirmation or for my deeper thinking. (🧵 with video)
17.01.2025 17:44
Talk Arena
Interactive evaluation for audio models
My first Bluesky post is about my first project as a postdoc at Stanford.
Talk Arena is our first step towards building audio LMs into interactive agents. Try it out and let me know what you think. talkarena.org
10.12.2024 01:39
Want to add your model to the arena? Have an idea for a new feature for Talk Arena? We are open to collaboration in many forms!
Co-led with @ellaminzhili.bsky.social in collaboration with @michaelryan207.bsky.social, Kunat Pipatanakul, Potsawee Manakul, @zhuhao.me, and @diyiyang.bsky.social (5/5)
10.12.2024 00:01
Talk Arena: Interactive Evaluation of Large Audio Models
With an increasing number of Large *Audio* Models, which one do users like the most?
Introducing talkarena.org, an open platform where users speak to LAMs and receive text responses. Through open interaction, we focus on rankings based on user preferences rather than static benchmarks.
🧵 (1/5)
10.12.2024 00:01
Excited to present our PrivacyLens paper at #NeurIPS next week! We explore the privacy risks of LM agents deployed as personal assistants. (Details in thread)
I am working on developing LM agents as collaborative research partners, learning aids, personal assistants, and more. Let's connect and chat!!
06.12.2024 18:20
Meet Tülu 3, a set of state-of-the-art instruct models with fully open data, eval code, and training algorithms.
We invented new methods for fine-tuning language models with RL and built upon best practices to scale synthetic instruction and preference data.
Demo, GitHub, paper, and models
21.11.2024 17:15
Histogram peaked at 3 minutes and 2 weeks since sent
When I will respond to your email
20.11.2024 18:29
So far, every Thanksgiving week has been letter-writing week for me 🤣
20.11.2024 05:21
🌶️(?) take: Agents are somehow hot right now because people realized that LLM output can be interpreted as a DSL that directs side effects in the world (e.g., tool calls) rather than just returning text in a chat/autocomplete sense. What are the open challenges? A 🧵... [1/11]
19.11.2024 09:32
I did an unscientific, uncontrolled experiment for #EMNLP2024 (details in 🧵). I posted my conference & workshop papers to 5 social platforms. Clear results: Mastodon is nearly dead, Threads may have users but not my people, I'm not giving up on X/Twitter yet, and Bluesky is worth investing in.
18.11.2024 18:40
Header of poster "Turn your LLM into a Speech LLM in 6 hours without any new data"
Find a machine readable version of this poster at https://diva-audio.github.io/
I'll be at the Google Theory and Practice of Foundation Models Workshop today and tomorrow! FOMO for EMNLP, but excited to chat more casually at a smaller, non-archival workshop.
I'm presenting our Distilled Voice Assistant model at the Lightning Talks tomorrow at 1:30 PM, if you're around!
14.11.2024 21:25
CS 4644 / 7643: Deep Learning - LLM Guest Lecture - Fall 2024
Training Large Language Models. CS 4644 / 7643: Deep Learning. William Held, School of Interactive Computing, Georgia Institute of Technology
Every semester, I drop into Georgia Tech's Deep Learning course to do a speed-through LLM lecture! I keep updating things to balance "history" and recent progress.
Slides for this semester are here for folks who are teaching courses on NLP/DL/LLMs in the near future: docs.google.com/presentation...
07.11.2024 21:45
I wanted to contribute to "Starter Pack Season" with one for Stanford NLP+HCI: go.bsky.app/VZBhuJ5
Here are some other great starter packs:
- CSS: go.bsky.app/GoEyD7d + go.bsky.app/CYmRvcK
- NLP: go.bsky.app/SngwGeS + go.bsky.app/JgneRQk
- HCI: go.bsky.app/p3TLwt
- Women in AI: go.bsky.app/LaGDpqg
15.11.2024 19:20