@bwfnelson.bsky.social
Sr Clinical Research Scientist @ Verily | Harvard Medical School and BIDMC | Behavioral Medicine | Digital Health | Wearables
This study represents years of collaborative work by an incredible cross-functional team at Verily and our outstanding research partners at SRI International. Congrats to the team! n/3
14.04.2025 20:21
We rigorously evaluated the Verily Numetric Watch's ability to estimate over 12 sleep metrics against gold-standard in-lab polysomnography in a demographically diverse cohort. n/2
14.04.2025 20:21
As the Clinical Science Lead on this study, I'm excited to share our new publication in the Journal of Sleep Research: "Performance Evaluation of the Verily Numetric Watch Sleep Suite for Digital Sleep Assessment Against In-Lab Polysomnography."
onlinelibrary.wiley.com/doi/10.1111/...
n/1
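The thread above describes benchmarking watch-derived sleep metrics against in-lab polysomnography. As a minimal sketch of how such a per-night comparison could be summarized (mean bias, 95% limits of agreement, MAE in the Bland-Altman style), consider the snippet below; the column names and numbers are hypothetical and do not come from the published study or its analysis pipeline.

```python
# Illustrative sketch only: summarizing device-vs-PSG differences for one
# sleep metric (e.g., total sleep time in minutes) per night.
# Data and column names are hypothetical, not from the Verily/SRI study.
import pandas as pd

def bland_altman_summary(device: pd.Series, psg: pd.Series) -> dict:
    """Return mean bias, 95% limits of agreement, and mean absolute error."""
    diff = device - psg                 # device minus reference (PSG)
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return {
        "bias": bias,
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
        "mae": diff.abs().mean(),
    }

# Hypothetical nightly totals for five participants.
nights = pd.DataFrame({
    "tst_device": [412, 388, 450, 365, 430],
    "tst_psg":    [420, 395, 441, 372, 425],
})
print(bland_altman_summary(nights["tst_device"], nights["tst_psg"]))
```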
Huge thanks to my amazing co-authors: @prof-nick-allen.bsky.social, John Torous, MD MBI, Ari Winbush, Steven Siddals, Matthew Flathers! Grateful for the opportunity to lead this project as part of my Adjunct Faculty position at Harvard Medical School and BIDMC.
01.04.2025 20:59
GPT-4o surpassed human performance for calm/neutral and surprise recognition, while Gemini surpassed human performance for surprise recognition.
We also examined model performance across actor race and sex, finding no significant biases: an encouraging result for future clinical applications. 4/n
All LLMs demonstrated substantial to almost perfect agreement with ground-truth labels. Notably, GPT-4o and Gemini reached human performance levels for overall facial emotion recognition. 3/n
01.04.2025 20:59
We evaluated the agreement and accuracy of GPT-4o, Gemini 2.0 Experimental, and Claude 3.5 Sonnet using the NimStim dataset, a benchmark of 672 facial expressions from 43 diverse human actors, resulting in 2,016 model-based emotion estimates. 2/n
01.04.2025 20:59
New Preprint: "Evaluating the Performance of Large Language Models in Identifying Human Facial Emotions: GPT-4o, Gemini 2.0 Experimental, and Claude 3.5 Sonnet"
In this study, we benchmarked the ability of leading LLMs to recognize facial emotions. 1/n
osf.io/preprints/ps...
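For the LLM emotion benchmark described in this thread, "agreement" and "accuracy" against ground-truth labels could be computed with Cohen's kappa and simple label accuracy. A minimal sketch is below; the labels are made up for demonstration and this is not the preprint's actual evaluation code.

```python
# Illustrative sketch only: comparing model-assigned emotion labels with
# ground-truth labels using accuracy and Cohen's kappa (the statistic behind
# "substantial to almost perfect agreement" descriptors). Labels are invented.
from sklearn.metrics import accuracy_score, cohen_kappa_score

ground_truth = ["happy", "sad", "calm", "surprise", "angry", "fear"]
model_labels = ["happy", "sad", "calm", "surprise", "angry", "sad"]

print("accuracy:", accuracy_score(ground_truth, model_labels))
print("kappa:   ", cohen_kappa_score(ground_truth, model_labels))
```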