Harsh Trivedi

@harsh3vedi.bsky.social

🤖 Building AI agents & interactive environments: 🌍 AppWorld (https://appworld.dev) #NLProc PhD @stonybrooku. Past intern Allen AI & visitor CILVR at NYU. 🐦 https://x.com/harsh3vedi 🌐 https://harshtrivedi.me/

790 Followers 231 Following 7 Posts Joined Nov 2024
1 year ago

Our AI & Scientific Discovery Workshop (@ NAACL 2025) broadly welcomes papers on all aspects of the scientific discovery process through the lens of AI / NLP.

Paper submission deadline: Jan 30, 2025 (about 2 weeks).
We're excited to see you there!

3 1 0 3
1 year ago

Hey Marc! Thanks for this starter pack. Can you please add me to it as well?

7 0 0 0
1 year ago
AppWorld: Reliable Evaluation of Interactive Agents in a World of Apps and People. Happening at 11 AM EST online on Dec 2, 2024

🚨 Happening next Monday, 2 Dec, @cohere.com ! ✨
👋 Anyone can join remotely at this link:
👉 cohere.com/events/coher...
🙏 Thank you @sebruder.bsky.social for helping arrange it!!
📅 Upcoming talks: appworld.dev/talks

8 1 0 0
2 years ago
A plot: the x-axis is the baseline score of rankers, in nDCG@10. The y-axis is the delta in model score after an expansion is applied.

There are three sets of results, one dataset for each shift type: TREC DL (no shift), FiQA (domain shift), ArguAna (query shift). For each set of results, the chart shows a scatter plot with a trend line. We observe the same trend for all: as the baseline score increases, the delta when using expansion decreases.

On TREC DL, the worst models have a base score of ~40 and improve by 10 points w/ expansion. The best models have a score of >70, and their performance decreases by 5 points w/ expansion.

On FiQA, the worst models have a base score of ~15 and improve by 5 points w/ expansion. The best models have a score of ~45, and their performance decreases by 3 points w/ expansion.

On ArguAna, the worst models have a base score of ~25 and improve by >20 points w/ expansion. The best models have a score of >55, and their performance decreases by 1 point w/ expansion.

Using LLMs for query or document expansion in retrieval (e.g. HyDE and Doc2Query) has scores going 📈

But do these approaches work for all IR models and for different types of distribution shifts? Turns out it's actually more 📉 🚨

📝 (arxiv soon): orionweller.github.io/assets/pdf/L...

42 6 3 3
1 year ago

Great opportunity to see how (your) new coding agent methods stack up on real-world user tasks

3 1 0 0
1 year ago

Meet Tülu 3, a set of state-of-the-art instruct models with fully open data, eval code, and training algorithms.
We invented new methods for fine-tuning language models with RL and built upon best practices to scale synthetic instruction and preference data.
Demo, GitHub, paper, and models 👇

111 31 2 7
1 year ago

another starter pack, this time for folks (past & current) from Ai2 (@ai2.bsky.social) 😍

go.bsky.app/Qjyc97J

22 5 2 0
1 year ago

I thought I'd create a Starter Pack for people working on LLM Agents. Please feel free to self-refer as well.

go.bsky.app/LUrLWXe

#LLMAgents #LLMReasoning

15 5 11 0
1 year ago

🚨 We are refreshing the 🌎 AppWorld (appworld.dev) leaderboard with all the new coding and/or tool-use LMs.

❓ What would you like to be included?

🔌 Self-plugs are welcome!!

x.com/harsh3vedi/s...

7 2 0 1
1 year ago

Hi Nikolai! Mind adding me to this starter pack? Thanks!

1 0 1 0
1 year ago
EMNLP 2024 Tutorial: Language Agents: Foundations, Prospects, and Risks

Had a great time doing the language agent tutorial (language-agent-tutorial.github.io) with Yu Su, Shunyu Yao and Tao Yu 😀 #EMNLP2024

Check out our slides here: tinyurl.com/language-age...

33 5 0 0
1 year ago

Hi! Can you please add me to this list? Thank you!

1 0 0 0
1 year ago

Hi Michael! Can you please add me to this list? Thank you!

2 0 0 0
1 year ago

Hi Maria! Can you please add me to the list? Thank you!

0 0 0 0