@dradamb.bsky.social
This seems especially relevant for real-world cases, which are full of extra information that isn't necessary to reach an accurate conclusion
but often leads to inaccuracies and distractions in LLM outputs
A team at Apple recently published a really interesting paper where they tested LLM performance on GSM8K (a standard benchmark for mathematical reasoning)
They modified the questions with unnecessary information to distract the LLMs
It led to much lower accuracy, even for o1
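If you want to poke at the effect yourself, here's a rough sketch of the distractor-injection idea in Python. This is not the paper's actual pipeline; the wording is loosely modeled on the paper's kiwi example, and the `is_correct` helper is just an illustration of how you might score answers.

```python
# A minimal sketch of the distractor-injection idea: take a GSM8K-style word
# problem, append a true-but-irrelevant clause, and compare model accuracy on
# the two variants. Not the paper's code; just an assumed setup.

original = (
    "Oliver picks 44 kiwis on Friday and 58 kiwis on Saturday. "
    "On Sunday he picks double the number he picked on Friday. "
    "How many kiwis does Oliver have?"
)

# Irrelevant detail: nothing about the arithmetic changes.
distracted = original.replace(
    "How many",
    "Five of Sunday's kiwis were a bit smaller than average. How many",
)

ground_truth = 44 + 58 + 2 * 44  # 190 for both variants

def is_correct(model_answer: str) -> bool:
    """Score an answer by exact match on the final number."""
    return model_answer.strip().rstrip(".") == str(ground_truth)

if __name__ == "__main__":
    print("ORIGINAL:\n", original, "\n")
    print("DISTRACTED:\n", distracted)
    # To reproduce the effect, send both prompts to a model of your choice
    # and score the replies with is_correct(); a distracted model will often
    # subtract the 5 "smaller" kiwis and answer 185 instead of 190.
```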
I wonder how much of the improvement in performance is because of Goodhart's law:
"When a measure becomes a target, it ceases to be a good measure"
I.e., is better performance on benchmark tests translatable to real-world performance?