We hope this survey is useful and fun for the community! We couldn't include everything, but we tried to at least give a good overview of the field. Happy to hear feedback, and if you think we messed something up, feel free to DM or email me.
29.08.2025 16:41
There's a lot of great stuff in here, we think! We cite over 100 papers and websites. One thing I'm very happy about is how easy it is to follow links from the survey to the bibliography, which then links directly to the papers.
29.08.2025 16:41
Then we talk about the LLM-Agent approaches and try to explain and make some sense of the many components that make up an LLM-based Computer Use Agent.
29.08.2025 16:41
We then spend a lot of time looking at earlier (pre-LLM) approaches to the problem, including the RL-from-scratch period and even the very earliest planning-based approaches.
29.08.2025 16:41
We try to categorize all the environments and datasets in common use and let users click/filter and browse through each of the datasets.
29.08.2025 16:41
First, we try to ground our survey: we say what we even mean by "Computer Use" and define some key terms within the classical agent-environment framework.
29.08.2025 16:41
You can view the survey here: kennethmarino.com/computeruse/...
We tried to make it as interactive and fun as possible, including a retro DOS theme to go along with the subject.
Credit to Claude for helping me create the website :)
29.08.2025 16:41
Super excited that the Computer Use survey I've been working on w/ @anamarasovic.bsky.social for a while now is ready! Originally we were planning on a more traditional survey paper, but as more surveys came out, we decided on an interactive website survey.
29.08.2025 16:41
Arriving at #ACL2025 #ACL2025NLP in a few hours!
See you at the welcome reception & catch me at the poster session on Tuesday (July 29) in the morning, where Jesse will present our work introducing new tasks for supporting legal brief writing: arxiv.org/abs/2506.06619
27.07.2025 13:35
I can't find it, but my favorite was when someone asked ChatGPT to set an alarm for them, it pretended to set one, and the person missed their important meeting.
16.07.2025 22:04
Also, this is my first paper (hopefully of many) with my @utah.edu colleagues! I feel very welcome so far and am really excited about the things we'll be able to do together. And we just had another great hiring year with several new colleagues, so expect lots of exciting stuff soon!
01.07.2025 17:30
Read Fateme's full thread, but what I find interesting about the paper is that LLMs are already pretty good at summarization but are still quite bad at finding relevant cases. With many retrieval benchmarks becoming saturated, I think this is an exciting place for new work!
01.07.2025 17:30
Really excited about this!
As backstory, Jesse Woo started this project when I taught an ML Datasets class at Columbia.
Then we joined up with @anamarasovic.bsky.social and @fatemehc.bsky.social and really kicked it into high gear. Would not have happened without the full team!
01.07.2025 17:29
Join us on June 11 at 9am to discuss all things fine-grained!
We are looking forward to a series of talks on semantic granularity, covering topics such as machine teaching, interpretability and much more!
Room 104 E
Schedule & details: sites.google.com/view/fgvc12
@cvprconference.bsky.social #CVPR25
08.06.2025 23:19
We are so excited to have this amazing line-up of speakers!!
Randall Balestriero, Kai Han, Mia Chiquier, Kenneth Marino (@kennethmarino.bsky.social), Elisa Ricci, Thomas Fel (@thomasfel.bsky.social)
08.06.2025 23:30
We just dropped a new paper studying LLMs on the "Blicket Test" to ask the question: do language models explore like adults or like children? We also show how to get them to act more like children (i.e., more like scientists). All credit to Anthony and team; this came together super well!
16.05.2025 17:18
Really glad you like the paper! Anthony and team did a great job on this.
15.05.2025 19:21
Are you tired of your static, fixed benchmarks? Feel like your data is in a rut? Want to change something but just feel stuck? Try ReCogLab!
Really proud of this work and of my fantastic colleagues at Google DeepMind who put in so much hard work.
See you all in Singapore!
18.03.2025 17:06
You don't know me, man. Get off your high horse. Blocking you now.
19.02.2025 13:26
I literally do none of those things. I don't work in any of these areas. I think you need to step back and ask why you're fighting random researchers who don't decide these things instead of the people you actually seem mad at.
19.02.2025 13:15
?????
I post about AI papers, what on Earth are you talking about?
19.02.2025 12:29
People who actually believe in the promise of AI should be the most upset about the over-claiming, over-hyping, overt secrecy and unwillingness to expose work to scrutiny that have come to characterize much of the "feel the AGI" crowd.
20.01.2025 20:39
This field is literally so old that a famous 1973 report calling it overhyped, the Lighthill Report, caused funding to plummet. We've literally already gone through at least a few hype cycles.
19.01.2025 18:08
This is why open source and publishing are important. Maybe OpenAI didn't do anything sus with held-out splits. But if code and models are never released, and the experiments and methods are not published or described in sufficient detail, we can't reproduce the results or scrutinize any of these decisions.
19.01.2025 17:34
Just read a fantastic web agent paper. Game changer!
* Treats it as an RL problem
* Trains rather than just prompting
* Beats closed models
* Releases code and model so other people can build off of their work
Many great ideas in this paper too, definitely read it.
arxiv.org/pdf/2411.02337
17.01.2025 16:23
If this isn't clear, btw, I don't think you need to have a full conference paper in a top venue to do a PhD. But trying to trick your reviewers is a really bad strategy for getting them to like you.
07.01.2025 15:50
Would love to see blog posts in CVs, actually. Just don't try to trick me into thinking it's a full conference paper!
07.01.2025 15:44
Fun fact: Faculty do check the papers you put in your CV and notice when you try to make a workshop paper look like a full conference paper with deceptive wording.
07.01.2025 01:36
I feel like our field spent like 5 years playing around with different algorithms for model-free RL just to figure out, oh, PPO works pretty well actually.
05.01.2025 23:13
Felix Hill and some other DMers and I after cold water swimming at Parliament Hill Lido a few years ago
Felix Hill was such an incredible mentor (and occasional cold-water swimming partner) to me. He's a huge part of why I joined DeepMind and of how I've come to approach research. Even a month later, it's still hard to believe he's gone.
02.01.2025 19:01
Workshop on Fine-Grained Visual Categorization (FGVC) - CVPR
Nashville, June 11, 9am-5pm
Room: 104 E
https://sites.google.com/view/fgvc12
PhD student at NYU CILVR. Prev: Master's at McGill / Mila. || RL, ML, Neuroscience.
https://im-ant.github.io/
Cognitive scientist, philosopher, and psychologist at Berkeley, author of The Scientist in the Crib, The Philosophical Baby and The Gardener and the Carpenter and grandmother of six.
Professor, UW Biology / Santa Fe Institute
I study how information flows in biology, science, and society.
Book: *Calling Bullshit*, http://tinyurl.com/fdcuvd7b
LLM course: https://thebullshitmachines.com
Corvids: https://tinyurl.com/mr2n5ymk
he/him
Research @OpenAI. I study Reinforcement Learning. PhD from UT Austin. Previously FAIR Paris, Meta US, NVIDIA, CMU, and IIT Kharagpur.
Website: https://hari-sikchi.github.io/
PhD Student in Machine Learning at CMU.
🐦 twitter.com/steph_milani
stephmilani.github.io
PhD at Machine Learning Department, Carnegie Mellon University | Interactive Decision Making | https://yudasong.github.io
PhD student | Interested in all things decision-making and learning
In search of mathematics and ML content.
PhD@NYU
Reinforcement Learning PhD student, UPF Barcelona.
Uncertain in the face of optimism.
ahanadeb.github.io
ML Research @ Tzafon | Prev: Robot Learning & RL PhD @Technion
More data isn't all we need 🦾
PhD Student working on Generalization and State abstractions in #RL, #MetaLearning, and #AutoRL
amsks.github.io
RL researcher looking for DACs // What is this AutoRL anyway?
she/her
Currently: Leibniz Uni Hannover
Previously: Uni Freiburg (Master's) | Meta AI London (Intern)
Always & Forever: AutoRL.org
Stanford CS PhD working on RL and LLMs with Emma Brunskill and Chris Piech. Co-creator of Trace. Prev @GoogleDeepMind @MicrosoftResearch
Specifically
- Offline RL
- In-context RL
- Causality
https://anie.me/about
Unverified hot takes go to this account
For professional, see https://cvoelcker.de
If I seem very angry, check if I have been watered in the last 24 hours.
Now 🇺🇸 flavoured, previously available in 🇨🇦 and 🇩🇪
PhD student at the University of Pennsylvania. Prev, intern at MSR, currently at Meta FAIR. Interested in reliable and replicable reinforcement learning, robotics and knowledge discovery: https://marcelhussing.github.io/
All posts are my own.
Visiting Researcher at Meta; PhD student @mila.quebec. Ex: Intern @GoogleDeepMind, Intern @ EPFL, MSc@MIPT;
artemzholus.github.io
PhD Student in Tübingen (MPI-IS & Uni Tü), interested in reinforcement learning. Freedom is a pure idea. https://onnoeberhard.com/
PhD student at UC Berkeley studying RL and AI safety.
https://cassidylaidlaw.com
CS PhD candidate @UCLA | Prev. Research Intern @MSFTResearch, Applied Scientist Intern @AWS | LLM post-training, multi-modal learning
https://yihedeng9.github.io