Check out Tianwei's latest work on using an unlikelihood objective to distill search traces back into the base model to boost the reasoning capabilities of LLMs!
23.04.2025 23:12

@allenanie.bsky.social
Stanford CS PhD working on RL and LLMs with Emma Brunskill and Chris Piech. Co-creator of Trace. Prev @GoogleDeepMind @MicrosoftResearch
Specifically
- Offline RL
- In-context RL
- Causality
https://anie.me/about
Unverified hot takes go to this account

For all the RL PhDs and people interested in Planning and MDPs, there's a summer internship opportunity at AWS Science that specializes in LLM post-training, RLHF, LLM agents, and benchmarks like WebArena. Interested students can send their CV to fakoor@amazon.com
07.02.2025 19:52

For education and psychometrics people, this dataset is very useful!
11.12.2024 07:52

I credit Omar @lateinteraction.bsky.social for this beautiful summary of the difference 🤣
11.12.2024 02:14

Hi Tim, Trace can optimize the control flow, whereas DSPy optimizes the modules in a fixed control flow (for now). I would use DSPy for a supervised learning setup and Trace for an RL-like task (when there's a clear definition of reward and feedback).
11.12.2024 02:13

Trace performs inference-time optimization: it does not directly update the weights of the underlying neural network. Instead, it updates the agentic workflow (Python functions, prompts to LLMs, etc.).
11.12.2024 00:53

People say Ching-an and I are indistinguishable... is that true 🤣
10.12.2024 23:15

Come check us out near the Tesla Booth in West Exhibition Hall A, 3-5pm! Come and claim your mug 🤣 We have an identity crisis: people keep thinking we are from IBM for some reason...
10.12.2024 23:05

We are happy to give a talk or have a 1:1 chat if you are interested in learning what Trace is and/or how to use it! Trace has already been presented at the UW Robotics Colloquium and ServiceNow. #foundermode for Open-Source Software! Time to build and ship!
10.12.2024 19:52

This open-source project is a joint effort with @chinganc_rl and Adith, the MSR RL group. We are presenting Trace at the NeurIPS Expo Demo this afternoon 3pm-5pm PT. We have MUGs, T-SHIRTs, and STICKERs!
microsoft.github.io/Trace/
github.com/microsoft/Tr...
Once you build an agent with Trace, you can use ANY LLM optimizer you want. With the release of Trace 0.1.3, we introduce TextGrad (github.com/microsoft/Tr...) as an optimizer for the RL agent, along with OPRO and OptoPrime.
10.12.2024 19:52

What enables Trace to be an RL-style agentic library? We use **Generative Optimization** techniques (LLM as an optimizer) to derive an analog to RL's policy gradient algorithm. The agent makes a move, receives feedback/reward, and updates its parameters.
10.12.2024 19:52

In Trace, you define an Agent with declarative Python functions using Trace primitives. Trace provides flexible ways to mark what you want to change -- for example, we mark two prompts and two functions below as trainable.
10.12.2024 19:52

True RL agents learn online -- continuously changing themselves to improve upon the feedback (reward) from a user or an environment. Why haven't people done this in the LLM "Agentic" libraries? We wondered the same and developed Trace -- a true *RL-style* agentic framework.
10.12.2024 19:52

Unveiling Trace v0.1.3 at NeurIPS 2024, a library for building an RL-style AI Agent that learns from the environment and human feedback. Today's LLM Agent libraries are not RL agents. They specify a workflow, and it remains unchanged regardless of user feedback. #NotRL vimeo.com/1036224270
10.12.2024 19:52

An honor to have you here!! Welcome!
30.11.2024 04:35

arxiv.org/abs/2411.17668 Our postdoc Zihan slays another COLT open problem! proceedings.mlr.press/v247/kornows...
27.11.2024 13:03

For people who like RL theory, this is a must-follow!
26.11.2024 17:08

[emoji-only post]
25.11.2024 14:38

Can I get added? Not NLP but still working with LLMs on the RL side.
25.11.2024 02:19

Hello...world?
Trying to reconstruct my academic networks over here :) Follow me if we know each other or if you're interested in machine learning for healthcare/social equity! Please retweet, or resky, or whatever they call it over here.
24.11.2024 01:15

Totally, it's a great list.
23.11.2024 18:24

Here is a list of ML OSS & Open Source / Science enthusiasts I found on Bluesky 🦋
go.bsky.app/8MFcfXd
Let me know if you find such people here!
I'm still new here and the list probably misses many must-add people, so let's build it together 💪
Hi, I'm one of the main maintainers of Trace: github.com/microsoft/Tr... and will use this platform to promote it and engage with the OSS community 🫡
23.11.2024 14:19

This is kinda cool honestly
23.11.2024 01:55

I see... well... hope they'll include it soon.
23.11.2024 01:48

How to save/bookmark posts on 🦋?
23.11.2024 01:38

Filled out so fast, but I saw some friends who made it to the list, so I'm happy for them instead 🥳
21.11.2024 22:38

I wanted to contribute to "Starter Pack Season" with one for Stanford NLP+HCI: go.bsky.app/VZBhuJ5
Here are some other great starter packs:
- CSS: go.bsky.app/GoEyD7d + go.bsky.app/CYmRvcK
- NLP: go.bsky.app/SngwGeS + go.bsky.app/JgneRQk
- HCI: go.bsky.app/p3TLwt
- Women in AI: go.bsky.app/LaGDpqg