We are getting closer to having agents operate in the real physical world. But can we trust frontier models to make embodied decisions aligned with human norms?
With EgoNormia, a 1.8k egocentric video 🥽 QA benchmark, we show that this is surprisingly challenging!
04.03.2025 04:32
We (w/ @diyiyang.bsky.social, @zhuhao.me, & Bodhisattwa Prasad Majumder) are excited to present our #NAACL25 tutorial on Social Intelligence in the Age of LLMs!
It will highlight long-standing and emerging challenges of AI interacting with humans, society, and the world.
May 3, 2:00pm-5:30pm, Room Pecos
03.05.2025 13:58
LM agents today primarily aim to automate tasks. Can we turn them into collaborative teammates?
Introducing Collaborative Gym (Co-Gym), a framework for enabling and evaluating human-agent collaboration! I've now gotten used to agents proactively seeking my confirmation or my deeper thinking. (🧵 with video)
17.01.2025 17:44
Talk Arena
Interactive evaluation for audio models
My first Bluesky post is about my first project as a postdoc at Stanford.
Talk Arena is our first step towards building audio LMs into interactive agents. Try it out at talkarena.org and let me know what you think.
10.12.2024 01:39
Want to add your model to the arena? Have an idea for a new feature for Talk Arena? We are open to collaboration in many forms!
Co-led with @ellaminzhili.bsky.social in collaboration with @michaelryan207.bsky.social, Kunat Pipatanakul, Potsawee Manakul, @zhuhao.me, and @diyiyang.bsky.social (5/5)
10.12.2024 00:01
Talk Arena: Interactive Evaluation of Large Audio Models
With a growing number of Large *Audio* Models (LAMs), which one do users like the most?
Introducing talkarena.org: an open platform where users speak to LAMs and receive text responses. Through open interaction, we focus on rankings based on user preferences rather than static benchmarks.
🧵 (1/5)
10.12.2024 00:01
Excited to present our PrivacyLens paper at #NeurIPS next week! We explore the privacy risks of LM agents deployed as personal assistants. (Details in thread)
I am working on developing LM agents as collaborative research partners, learning aids, personal assistants, and more. Let's connect and chat!!
06.12.2024 18:20
Meet Tülu 3, a set of state-of-the-art instruct models with fully open data, eval code, and training algorithms.
We invented new methods for fine-tuning language models with RL and built upon best practices to scale synthetic instruction and preference data.
Demo, GitHub, paper, and models
21.11.2024 17:15
Histogram with peaks at 3 minutes and 2 weeks after sending
When I will respond to your email
20.11.2024 18:29
So far, every Thanksgiving week has been letter-writing week for me 🤣
20.11.2024 05:21
🌶️(?) take: Agents are somehow hot right now because people realized that LLM output can be interpreted as a DSL that directs side effects in the world (e.g. tool calls), rather than just returning text in a chat/autocomplete sense. What are the open challenges? A 🧵... [1/11]
19.11.2024 09:32
I ran an unscientific, uncontrolled experiment for #EMNLP2024 (details in 🧵). I posted my conference and workshop papers to 5 social platforms. The results were clear: Mastodon is nearly dead; Threads may have users, but not my people; I'm not giving up on X/Twitter yet; but Bluesky is worth investing in.
18.11.2024 18:40
Header of poster "Turn your LLM into a Speech LLM in 6 hours without any new data"
Find a machine readable version of this poster at https://diva-audio.github.io/
I'll be at the Google Theory and Practice of Foundation Models Workshop today and tomorrow! I have FOMO about EMNLP, but I'm excited to chat more casually at a smaller, non-archival workshop.
I am presenting at the Lightning Talks tomorrow at 1:30 PM on our Distilled Voice Assistant model if you're around!
14.11.2024 21:25
CS 4644 / 7643: Deep Learning - LLM Guest Lecture - Fall 2024
Training Large Language Models | CS 4644 / 7643: Deep Learning | William Held, School of Interactive Computing, Georgia Institute of Technology
Every semester, I drop into Georgia Tech's Deep Learning course to do a speed-through LLM lecture! I keep updating things to balance "history" and recent progress.
Slides for this semester are here for folks who are teaching courses on NLP/DL/LLMs in the near future: docs.google.com/presentation...
07.11.2024 21:45
I wanted to contribute to "Starter Pack Season" with one for Stanford NLP+HCI: go.bsky.app/VZBhuJ5
Here are some other great starter packs:
- CSS: go.bsky.app/GoEyD7d + go.bsky.app/CYmRvcK
- NLP: go.bsky.app/SngwGeS + go.bsky.app/JgneRQk
- HCI: go.bsky.app/p3TLwt
- Women in AI: go.bsky.app/LaGDpqg
15.11.2024 19:20
Professor @milanlp.bsky.social for #NLProc, compsocsci, #ML
Also at http://dirkhovy.com/
AI researcher. Postdocing at Stanford NLP. Prev: PhD CMU LTI.
Visit https://zhuhao.me
Raising agents in the Opensocial.world
The AI community building the future!
Breakthrough AI to solve the world's biggest problems.
- Join us: http://allenai.org/careers
- Get our newsletter: https://share.hsforms.com/1uJkWs5aDRHWhiky3aHooIg3ioxm
Computational Linguists | Natural Language | Machine Learning
AI scientist & consultant :: prev Amazon Alexa, Toshiba, Cam Uni :: voice & language tech :: powered by coffee :: photographer :: Cambridge UK
https://www.catherinebreslin.co.uk
Founding list[float] engineer. Recsys. Personalization. Infra. Systems. Normcore code. Nutella. Vectors. Words. Vibes. Bad puns (soon).
https://vickiboykis.com/what_are_embeddings/
Professor, Santa Fe Institute. Research on AI, cognitive science, and complex systems.
Website: https://melaniemitchell.me
Substack: https://aiguide.substack.com/
Full professor of inclusive speech communication at TU Delft, The Netherlands. Former president of the International Speech Communication Association (ISCA). General Chair of @interspeech.bsky.social Rotterdam, 2025. Mother of 3
Co-founder & CEO of Hidden Door. I <3 cheeseburgers and making beautiful things. I know a thing or two about AI, but it's people who matter. I am interested in many things.
NYC
Always growing, she/her, RAG builder, LLM whisperer, tech generalist
proud mediterranean 🧿 open-sourceress at hugging face 🤗 multimodality, zero-shot vision, vision language models, transformers
Professor of Computer Science and Tech Evangelist at Durham University; International Keynote Speaker; #womenintech, #100moments tech #podcast host and so much more… https://linktr.ee/drsueblack
Professor of Computational Cognitive Science | @AI_Radboud | @Iris@scholar.social on 🦣 | http://cognitionandintractability.com | she/they 🏳️‍🌈
Musings about Design & Product. Currently at Zillow, Design Tech & AI.
Academic; writer. Professor of AI & Society, Chair-Director @kings-dfi.bsky.social, King's College London. Come for the sex robots; stay for the eye-rolling at AI nonsense on a daily basis. #academicsky . Norn Irish in Norwich.
Leading delivery of Scotland's AI Strategy. Advocate for diversity in STEM, data and AI. Scottish AI Alliance. Diverse AI. Working towards a trustworthy, ethical and inclusive AI future. Failed astrophysicist. Film geek. Parent. Love a GIF
- Director @adalovelaceinst.bsky.social: ensuring data & AI work for ppl & society
- Stint in government - led #NationalDataStrategy; roles in Cabinet Office, ONS & MHCLG
- Charity roles inc. Samaritans Trustee; staff @ The RSA, Centrepoint, ParkinsonsUK
Applied ML/AI, data science, MLOps | Wife of 1, mom of 2 | Co-Founder and CTO of http://storytellers.ai
python, AI, cloud, data
I also talk about Jesus here: @itskirstenlum.bsky.social
Associate Professor of AI & Society
Oxford Internet Institute and Institute for Ethics in AI, University of Oxford
Principal Investigator at DomesticAI (http://domesticai.oii.ox.ac.uk) project funded by ESRC