“casual interception” as defined in \citep{}…
14.02.2025 23:41
@shikharmurty.bsky.social
Final year PhD Student in Computer Science @Stanford
Work on:
- Compositionality, syntax (language structure)
- Web Agents: Synthetic data, tree search, exploration (language interpretation)
Ever dreamed of AI agents learning through unsupervised interaction with the open world? Our latest preprint introduces NNetNav-Live, which collects training data through exploration on real websites and hindsight labeling, producing a SOTA OSS agent.
06.02.2025 19:22
controlling a browser / computer!
but requires a bit more tooling to set it up.
Please check out our paper for more details: arxiv.org/pdf/2410.02907
And our code if you want a NNetNav-ed model for your own domain:
github.com/MurtyShikhar...
Done with collaborators: @zhuhao.me, Dzmitry Bahdanau and @chrmanning.bsky.social
We find that cross-website robustness is limited, and almost always, performance goes up from incorporating in-domain nnetnav data. This makes it even more important to work on unsupervised learning for agents - how are you going to collect human data for *any* website? [6/n]
06.02.2025 17:42
We use this data for SFT-ing Llama-3.1-8B. Our best models outperform zero-shot GPT-4 on both WebArena and WebVoyager, and reach SoTA performance among unsupervised methods for both datasets [5/n]
06.02.2025 17:42
We use NNetNav to collect around 10k workflows for over 20 websites, including 15 live websites and 5 self-hosted websites.
Data is available on 🤗: huggingface.co/datasets/sta...
[4/n]
Main ideas behind NNetNav exploration:
1. Complex goals have intermediate subgoals, so complex trajectories must have meaningful sub-trajectories.
2. Use an LM instruction relabeler + judge to test if the trajectory-so-far is meaningful. If yes, continue exploring; otherwise, prune. [3/n]
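The two ideas above can be sketched as a short loop. This is only a rough illustration under stated assumptions, not the NNetNav implementation: the callables `propose_action`, `relabel`, and `is_meaningful`, the `check_every` interval, and the toy stand-ins below are all hypothetical. In the setup described in the thread, the relabeler and judge would be LM calls and the actions would be real browser actions.

```python
from typing import Callable, List, Tuple

Trajectory = List[str]


def explore_with_pruning(
    propose_action: Callable[[Trajectory], str],
    relabel: Callable[[Trajectory], str],
    is_meaningful: Callable[[str, Trajectory], bool],
    max_steps: int = 8,
    check_every: int = 2,
) -> List[Tuple[str, Trajectory]]:
    """Run one exploration episode and keep every meaningful prefix."""
    trajectory: Trajectory = []
    demos: List[Tuple[str, Trajectory]] = []
    for step in range(1, max_steps + 1):
        trajectory.append(propose_action(trajectory))
        if step % check_every != 0:
            continue  # only check the trajectory-so-far periodically
        # Hindsight labeling: describe what the trajectory accomplished.
        instruction = relabel(trajectory)
        if not is_meaningful(instruction, trajectory):
            break  # prune: stop extending a trajectory that went nowhere
        demos.append((instruction, list(trajectory)))
    return demos


# Toy stand-ins (hypothetical) so the sketch runs without a browser or an LM.
ACTIONS = ["click:search", "type:shoes", "click:cart", "click:checkout", "scroll", "scroll"]


def toy_policy(trajectory: Trajectory) -> str:
    return ACTIONS[len(trajectory) % len(ACTIONS)]


def toy_relabel(trajectory: Trajectory) -> str:
    return "Do: " + " -> ".join(trajectory)


def toy_judge(instruction: str, trajectory: Trajectory) -> bool:
    # Pretend that aimless scrolling makes a trajectory meaningless.
    return "scroll" not in trajectory


demos = explore_with_pruning(toy_policy, toy_relabel, toy_judge)
for instruction, trajectory in demos:
    print(len(trajectory), instruction)
```

Each prefix that passes the judge becomes one (instruction, trajectory) training pair, so a single long episode can yield several demonstrations of increasing complexity.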
NNetNav uses a structured exploration method to efficiently search and collect traces on live websites, which are retroactively labeled into instructions, finding a strikingly diverse set of workflows for any website (e.g. this plot) [2/n]
06.02.2025 17:42
Want to make a browser agent for *any* domain like banking or healthcare?
We propose methods for training LLMs with open-ended, unsupervised interaction on live websites:
- OSS SoTA on WebVoyager
- world's smallest high-performing web-agent
Try it here: nnetnav.dev
going to stay off twitter for my own mental health. something has gone horribly wrong with that platform.
28.12.2024 22:07
Couldn't make it to NeurIPS due to work, but do check out our workshop happening in West Ballroom B. Lots of cool things to come, including a very fun panel!
15.12.2024 20:29
Come visit our poster "MoEUT: Mixture-of-Experts Universal Transformers" on Friday at 4:30 in East Exhibit Hall A-C #1907 on #NeurIPS2024. With Kazuki Irie, Jürgen Schmidhuber, Christopher Potts and @chrmanning.bsky.social.
12.12.2024 22:46
The extraordinary recent takeover of ML/AI by #NLP is well-known but insufficiently reflected on.
Look at the @neuripsconf.bsky.social tutorials in 2024!
neurips.cc/virtual/2024...
14 tutorials; 6 have "LLM" in the title; 4 more cover foundation models, with large NLP coverage. That's > 70%
Thrilled to share that Compositional Generalization Across Distributional Shifts with Sparse Tree Operations received a spotlight award at #NeurIPS2024! I'll present a poster on Tuesday and give an invited lightning talk at the System 2 Reasoning Workshop on Sunday. 🧵
09.12.2024 15:06
AgentLab diagram. The image describes AgentLab, a framework for efficient parallel experiments with agents. It highlights: Core Agent Features: Dynamic Prompting and a Unified LLM API for interacting with large language models. BrowserGym Platform: A tool for testing agents on benchmarks like WebArena, WorkArena, MiniWoB, and others. Key Features: Reproducibility, a Unified Leaderboard, an analysis tool called Xray, and a Dataset for sharing agent traces. Blue elements represent AgentLab components.
🧵-1
We are thrilled to release #AgentLab, a new open-source package for developing and evaluating web agents. This builds on the new #BrowserGym package which supports 10 different benchmarks, including #WebArena.
Folks, I'm not going to be at NeurIPS this year, but we have an *awesome* workshop that i'm super proud of.
Go attend, and use the link below to ask all of your burning questions about LLM reasoning, agents and compositionality!
Excited for #neurips2024 and our "System 2 Reasoning at Scale" workshop. We have an exciting lineup of speakers who will answer your most burning questions about AI and reasoning
Got spicy questions? Submit & vote here:
app.sli.do/event/dJNU63...
I also wear the AI agents researcher hat. Can't say i'm similarly impressed by reviewers in that community...
27.11.2024 23:32
ACL syntax track reviewers >> almost any other conference.
These folks care about their sub-field and i learn something new every time!
Now, reviewers are upset if we only finetune sub 10B parameter models!
26.11.2024 22:28
for more context: we are training the probe on sentences from PTB / BLiMP
25.11.2024 05:52
thx for sharing, though semantic parsing almost certainly benefits from modeling syntax :)
25.11.2024 03:49
SRL probe still rewards hidden states that model dependency relations, no? would like a probe that's agnostic to how well the underlying network models syntax
24.11.2024 22:38
could i get added? thx for making this!!
24.11.2024 05:25
What is a probing task that is purely about semantics?
Context: I have a probe trained to predict dependency relations, and would like to train another one on a semantics-only task (for research purposes)
To be fair, after some prompt engineering:
German:
(S
(NP (DT Der) (NN Mann))
(VP (VB mag)
(NP (JJ schwarze) (NNS Katzen))))
Japanese:
(S
(NP (NN Otoko) (PP wa))
(VP
(NP (JJ kuro) (NN neko) (PP ga))
Asked GPT-4o to draw parse trees in two languages:
21.11.2024 05:49
Hot take (since it's still just friends on this platform):
It's crazy how the classic "sample and rerank" baseline from machine translation and IR got re-branded as "scaling up inference-time compute".
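For readers who haven't met the baseline, here is a minimal sketch of sample-and-rerank: draw `n` candidates from a generator, score each one, keep the best. Everything named here (`sample_and_rerank`, the toy sampler, the word-overlap scorer) is a hypothetical stand-in; in practice the sampler would be a model decoding with temperature and the scorer a reranker or verifier. Raising `n` is the knob that spends more compute at inference time.

```python
import random
from typing import Callable, List, Tuple


def sample_and_rerank(
    sample: Callable[[random.Random], str],
    score: Callable[[str], float],
    n: int = 8,
    seed: int = 0,
) -> Tuple[str, List[str]]:
    """Draw n candidates, then return the highest-scoring one and the full list."""
    rng = random.Random(seed)
    candidates = [sample(rng) for _ in range(n)]
    best = max(candidates, key=score)
    return best, candidates


# Toy stand-ins (hypothetical): a sampler over a fixed pool and an overlap scorer.
POOL = [
    "the man likes cats",
    "the man likes black cats",
    "man like cat black",
    "the man cats",
]
TARGET_WORDS = {"the", "man", "likes", "black", "cats"}


def toy_sampler(rng: random.Random) -> str:
    return rng.choice(POOL)


def toy_scorer(text: str) -> float:
    # Count how many target words the candidate covers.
    return float(len(set(text.split()) & TARGET_WORDS))


best, candidates = sample_and_rerank(toy_sampler, toy_scorer, n=8)
print(best)
```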
nothing but blue skies, for posting puns
20.11.2024 22:54