What are your favorite recent papers on using LMs for annotation (especially in a loop with human annotators), synthetic data for task-specific prediction, active learning, and similar?
Looking for practical methods for settings where human annotations are costly.
A few examples in the thread below.
23.07.2025 08:10
21.07.2025 14:20
I am once again pitching my romantic comedy:
- two academics start dating
- discover they are each other's terrible reviewer
- hijinks ensue
Working title: Love is Double-Blind
18.06.2025 10:55
I'm extremely curious -- would you want digital tools that would help with this (e.g. planning, time organization) or embodied AI (e.g. physical assistance in-home, transportation)?
16.04.2025 17:27
i wish i could shout this from the rooftops. relatedly, there's no need for robots to be limited by the human form.
similar/tangential thing came up in the 2010s with respect to self-driving: just because people only sense using their eyes doesn't mean cars have to only use cameras!
09.04.2025 15:47
The Wikimedia Foundation, which owns Wikipedia, says its bandwidth costs have gone up 50% since Jan 2024, a rise they attribute to AI crawlers.
AI companies are killing the open web by stealing visitors from the sources of information and making them pay for the privilege
02.04.2025 09:12
we are living in an empirical world and we are empirical girls
25.03.2025 20:39
No labels, no problem! I am so excited for this release. We have been working on it for many months, and it's motivated by a common customer roadblock: insufficient labeled examples.
25.03.2025 20:39
has anyone successfully gotten very involved with their local library system and, if so, how does one do so?
i know there are volunteer opportunities and it is my dream to one day organize a crafting circle, but i'm talking about how the library actually organizes / functions / prioritizes things!
22.01.2025 20:42
@jfrankle.com @ericajiyuen.bsky.social
19.12.2024 16:26
and a big shout out to my collaborators: Erica Ji Yuen, Kartik Sreenivasan, Yue (Andy) Zhang, Sam Havens, Michael Carbin, Matei Zaharia, Jonathan Frankle
19.12.2024 16:25
Benchmarking Domain Intelligence
3/3 Want to see how different models perform on enterprise tasks? Full analysis in the blog here: databricks.com/blog/benchma...!
19.12.2024 16:25
DIBS measures real enterprise needs. We tested 14 models & found:
- Academic benchmarks mask enterprise gaps
- No single model wins across all tasks
- Open models are competitive on key capabilities
- Some enterprise tasks show clear paths forward, others are more complex
2/3
19.12.2024 16:25
🧵 Super proud to finally share this work I led last quarter - the @databricks.bsky.social Domain Intelligence Benchmark Suite (DIBS)! TL;DR: Academic benchmarks ≠ real performance and domain intelligence > general capabilities for enterprise tasks. 1/3
19.12.2024 16:25
And of course a big shout out to my collaborators: Erica Ji Yuen, Kartik Sreenivasan, Yue (Andy) Zhang, Sam Havens, Michael Carbin, Matei Zaharia, and Jonathan Frankle for their help!
19.12.2024 16:23
very demure, very mindful, very 2019-era mujoco humanoid learning to walk
12.12.2024 14:00
"technology built to address people's needs" is the north star.
side note: it would be amazing to see this attitude in the physical, embodied world as well. it's amazing to see how older adults in dense, walkable areas have such different lifestyles than those in car-centric suburbs.
12.12.2024 13:33
would love to be added :-)
11.12.2024 19:33
brat tulu is amazing
10.12.2024 23:52
this is incredible research, and beautiful. would love to know more about what it's like to meaningfully interact with genie 2, or similar models, e.g. to modify the outputs of such a model in the service of a design vision.
05.12.2024 19:31
24.11.2024 15:35
i know some labs are already starting to do this; i hope more continue to. it is challenging, complex technical work and we should think of it as a first-class contribution in the field. 5/5
26.11.2024 14:09
we can start to more broadly value thoughtful, direction-setting benchmark work. it requires technical contributions, a keen sense of how people might meaningfully interact with a system, and the discernment to recognize where progress might yet be made. 4/5
26.11.2024 14:09
i think as a field, we have a problematic tendency to focus on magnitude-related problems, like new architectures or training paradigms or other ways to maximize performance on whatever benchmarks we can. maybe this is because it is more akin to the training/experience many of us have. 3/5
26.11.2024 14:09
in the LLM space, at this time, benchmarks/evaluations set the direction of that vector. it's extremely hard to make good benchmarks, and historically under-rewarded in the field. 2/5
26.11.2024 14:09
i often talk about the importance of aligning both the magnitude AND direction of a workstream vector. 1/5
26.11.2024 14:09
i do not study this, but i did just finish reading the anxious generation and so i'm very grateful that there are so many people who do indeed study such important things!
22.11.2024 00:51
RL & Agents Reading Group @ University of Edinburgh
We regularly discuss recent papers in RL, MARL & related
https://edinburgh-rl.github.io/reading-group
SFF Author (she/her) of NY Times best-selling book Dreadful and The Grimoire Grammar School Parent Teacher Association. AKA Catherine Beck, author of Leah's Perfect Christmas.
https://www.caitlinrozakis.com/
https://linktr.ee/caitlinrozakis
Art historian, dealer/art consultant 19thC and 20thC British/European art. Writing book on lesser known great artists. Seen in/on: CNN, NBC, Spectator, The Times etc
website: richardmorris.org
richard@richardmorris.org
https://www.kickstarter.com/projects/ianmcque/mileships-a-narrative-art-book-by-ian-mcque
I design. I teach. I research. Associate Professor, Information Science, Jacobs Technion-Cornell Institute at Cornell Tech
LibraryReads: The Top 10 Adult Fiction and Nonfiction Chosen Monthly by America's Library Staff. Find out more at libraryreads.org!
(Social media managed by volunteers)
Incoming Assistant Professor of HCI at Carnegie Mellon studying the psychology of technology. NSF postdoc at NYU, PhD from Cambridge, BA from Stanford. stevenrathje.com
assistant prof at USC Data Sciences and Operations; phd Cornell ORIE. data-driven decision-making, operations research/management, causal inference, algorithmic fairness/equity
angelamzhou.github.io
Menswear writer. Editor at Put This On. Words at The New York Times, The Washington Post, The Financial Times, Esquire, and Mr. Porter.
If you have a style question, search:
https://dieworkwear.com/ | https://putthison.com/start-here/
Entrepreneur
Costplusdrugs.com
Empowering and advocating for libraries and library workers to ensure equitable access to information for all. The world's largest library association.
Find us at https://ala.org.
We follow the ALA Code of Conduct: https://www.ala.org/user-guidelines
Comics by Jorge Cham: Oliver's Great Big Universe, Elinor Wonders Why, ScienceStuff and PHD Comics
Writer of fantasy and SF. Gardener. Chicken tender. Dog Person but cat friendly. Washington state USA. Old and getting older. Stay off my lawn!
Author of the Arcadia Project trilogy etc. Terminal cancer, incurable optimist. Currently 1% Novelist 56% Vidyagamer 12% Polyglot 31% Dungeon Master
Enjoy life. Be someone you would love. Make things better.
Knowing things is a solved problem. Getting along is not. Working on AI, media, and inter-group conflict @CHAI_Berkeley. Got here from computational journalism.
• Cache rules everything around me
• You become what you are most measured by
• Performance @ Databricks
AI quality at Databricks. Past: Co-founder of Lilac AI, acquired by Databricks. Co-created TensorFlow.js and Know Your Data. Google Brain // PAIR // Responsible AI. nsthorat@ over there
Co-CEO, Yutori. Join the waitlist at yutori.com
Sr. Principal Research Manager at Microsoft Research, NYC // Machine Learning, Responsible AI, Transparency, Intelligibility, Human-AI Interaction // WiML Co-founder // Former NeurIPS & current FAccT Program Co-chair // Brooklyn, NY // http://jennwv.com