🐘 @pkydrm - Bluesky Profile

What are your favorite recent papers on using LMs for annotation (especially in a loop with human annotators), synthetic data for task-specific prediction, active learning, and similar?

Looking for practical methods for settings where human annotations are costly.

A few examples in thread ↴

23.07.2025 08:10 — 👍 76 🔁 23 💬 13 📌 3

21.07.2025 14:20 — 👍 0 🔁 0 💬 0 📌 0

I am once again pitching my romantic comedy:

- two academics start dating
- discover they are each other's terrible reviewer
- hijinks ensue

Working title: Love is Double-Blind

18.06.2025 10:55 — 👍 2621 🔁 350 💬 97 📌 66

I'm extremely curious -- would you want digital tools that would help with this (e.g. planning, time organization) or embodied AI (e.g. physical assistance in-home, transportation)?

16.04.2025 17:27 — 👍 0 🔁 0 💬 1 📌 0

i wish i could shout this from the rooftops. relatedly, there's no need for robots to be limited by the human form.

similar/tangential thing came up in the 2010s with respect to self-driving: just because people only sense using their eyes doesn't mean cars have to only use cameras!

09.04.2025 15:47 — 👍 5 🔁 0 💬 0 📌 0

The Wikimedia Foundation, which owns Wikipedia, says its bandwidth costs have gone up 50% since Jan 2024 — a rise they attribute to AI crawlers.

AI companies are killing the open web by stealing visitors from the sources of information and making them pay for the privilege

02.04.2025 09:12 — 👍 5648 🔁 2640 💬 67 📌 178

we are living in an empirical world and we are empirical girls

25.03.2025 20:39 — 👍 1 🔁 0 💬 0 📌 0

No labels, no problem! I am so excited for this release. We have been working on it for many months, and it's motivated by a common customer roadblock: insufficient labeled examples.

25.03.2025 20:39 — 👍 1 🔁 0 💬 0 📌 0

has anyone successfully gotten very involved with their local library system and, if so, how does one do so?

i know there are volunteer opportunities and it is my dream to one day organize a crafting circle, but i'm talking about how the library actually organizes / functions / prioritizes things!

22.01.2025 20:42 — 👍 1 🔁 0 💬 0 📌 0

@jfrankle.com @ericajiyuen.bsky.social

19.12.2024 16:26 — 👍 0 🔁 0 💬 0 📌 0

and a big shout out to my collaborators: Erica Ji Yuen, Kartik Sreenivasan, Yue (Andy) Zhang, Sam Havens, Michael Carbin, Matei Zaharia, Jonathan Frankle

19.12.2024 16:25 — 👍 0 🔁 0 💬 0 📌 0

Benchmarking Domain Intelligence

3/3 🔑 Want to see how different models perform on enterprise tasks? Full analysis in the blog here: databricks.com/blog/benchma...!

19.12.2024 16:25 — 👍 0 🔁 0 💬 0 📌 0

📊 DIBS measures real enterprise needs. We tested 14 models & found:

- Academic benchmarks mask enterprise gaps
- No single model wins across all tasks
- Open models are competitive on key capabilities
- Some enterprise tasks show clear paths forward, others are more complex

2/3

19.12.2024 16:25 — 👍 0 🔁 0 💬 0 📌 0

🧵 Super proud to finally share this work I led last quarter - the @databricks.bsky.social Domain Intelligence Benchmark Suite (DIBS)! TL;DR: Academic benchmarks ≠ real performance and domain intelligence > general capabilities for enterprise tasks. 1/3

19.12.2024 16:25 — 👍 5 🔁 4 💬 4 📌 1

@jfrankle.com @ericajiyuen.bsky.social

19.12.2024 16:24 — 👍 0 🔁 0 💬 0 📌 0

And of course a big shout out to my collaborators: Erica Ji Yuen, Kartik Sreenivasan, Yue (Andy) Zhang, Sam Havens, Michael Carbin, Matei Zaharia, and Jonathan Frankle for their help!

19.12.2024 16:23 — 👍 0 🔁 0 💬 0 📌 0

Benchmarking Domain Intelligence

3/3 🔑 Want to see how different models perform on enterprise tasks? Full analysis in the blog here: databricks.com/blog/benchma...!

19.12.2024 16:21 — 👍 1 🔁 0 💬 0 📌 0

📊 DIBS measures real enterprise needs. We tested 14 models & found:
- Academic benchmarks mask enterprise gaps
- No single model wins across all tasks
- Open models are competitive on key capabilities
- Some enterprise tasks show clear paths forward, others are more complex

2/3

19.12.2024 16:20 — 👍 0 🔁 0 💬 0 📌 0

very demure, very mindful, very 2019-era mujoco humanoid learning to walk

12.12.2024 14:00 — 👍 1 🔁 0 💬 0 📌 0

"technology built to address people's needs" is the north star.

side note: it would be amazing to see this attitude in the physical, embodied world as well. it's amazing to see how older adults in dense, walkable areas have such different lifestyles than those in car-centric suburbs.

12.12.2024 13:33 — 👍 0 🔁 1 💬 0 📌 0

would love to be added :-)

11.12.2024 19:33 — 👍 0 🔁 0 💬 1 📌 0

brat tulu is amazing

10.12.2024 23:52 — 👍 0 🔁 0 💬 1 📌 0

this is incredible research, and beautiful. would love to know more about what it's like to meaningfully interact with genie 2, or similar models, e.g. to modify the outputs of such a model in the service of a design vision.

05.12.2024 19:31 — 👍 0 🔁 0 💬 0 📌 0

24.11.2024 15:35 — 👍 1084 🔁 288 💬 18 📌 10

i know some labs are already starting to do this; i hope more continue to. it is challenging, complex technical work and we should think of it as a first-class contribution in the field. 5/5

26.11.2024 14:09 — 👍 0 🔁 0 💬 0 📌 0

🤞 we can start to more broadly value thoughtful, direction-setting benchmark work. it requires technical contributions, a keen sense of how people might meaningfully interact with a system, and the discernment to recognize where progress might yet be made. 4/5

26.11.2024 14:09 — 👍 0 🔁 0 💬 1 📌 0

i think as a field, we have a problematic tendency to focus on magnitude-related problems, like new architectures or training paradigms or other ways to maximize performance on whatever benchmarks we can. maybe this is because it is more akin to the training/experience many of us have. 3/5

26.11.2024 14:09 — 👍 0 🔁 0 💬 1 📌 0

in the LLM space, at this time, benchmarks/evaluations set the direction of that vector. it's extremely hard to make good benchmarks, and historically under-rewarded in the field. 2/5

26.11.2024 14:09 — 👍 0 🔁 0 💬 1 📌 0

i often talk about the importance of aligning both the magnitude AND direction of a workstream vector. 1/5

26.11.2024 14:09 — 👍 1 🔁 1 💬 1 📌 0

i do not study this, but i did just finish reading the anxious generation and so i'm very grateful that there are so many people who do indeed study such important things!

22.11.2024 00:51 — 👍 0 🔁 0 💬 0 📌 0

Latest posts by pkydrm.bsky.social on Bluesky
