P. Razavi

@p-razavi.bsky.social

Dad. Psych PhD. Research Scientist (Psychometrics, measurement). Blog: Medium.com/@pooyar

196 Followers  |  156 Following  |  24 Posts  |  Joined: 18.11.2024  |  2.0366

Latest posts by p-razavi.bsky.social on Bluesky

Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models High-quality training data has proven crucial for developing performant large language models (LLMs). However, commercial LLM providers disclose few, if any, details about the data used for training. ...

Want to know what training data has been memorized by models like GPT-4?

We propose information-guided probes, a method to uncover memorization evidence in *completely black-box* models,

without requiring access to
🙅‍♀️ Model weights
🙅‍♀️ Training data
🙅‍♀️ Token probabilities 🧵 (1/5)

21.03.2025 19:08 | 👍 98    🔁 27    💬 4    📌 8

I’ve been referring people (esp social science/psych PhD students) to this blog post for years. The headline and opening paragraph are all you really need.

30.04.2025 04:00 | 👍 69    🔁 17    💬 4    📌 1

Had a great time presenting our work on LLM-based item difficulty estimation at #NCME.
If you’re in Denver and would like to discuss measurement research or just catch up in the next couple of days, let me know 😊

25.04.2025 20:34 | 👍 1    🔁 0    💬 0    📌 0

Fantastic, thoughtful work! 👏👏

18.04.2025 16:32 | 👍 1    🔁 1    💬 0    📌 0
Estimating Item Difficulty Using Large Language Models and Tree-Based Machine Learning Algorithms Estimating item difficulty through field-testing is often resource-intensive and time-consuming. As such, there is strong motivation to develop methods that can predict item difficulty at scale using ...

If you're interested in learning more and plan to attend the #NCME conference in Denver next week, we’d love to see you at our coordinated paper session, “Approaches to Optimizing a Personalized Learning System,” on Friday, April 25, from 11:30 AM to 1:00 PM. (🧵9/9)
arxiv.org/abs/2504.08804

17.04.2025 02:35 | 👍 0    🔁 0    💬 0    📌 0

We are excited about the potential of these methods to support more efficient item development in education. In the preprint, we provide a seven-step workflow for testing professionals who want to implement a similar item difficulty estimation approach with their item pool. (🧵8/9)

17.04.2025 02:35 | 👍 0    🔁 0    💬 1    📌 0

The feature-based approach presumably benefits from the language model’s extraction of multiple cognitive and linguistic dimensions that an ensemble tree-based algorithm then “learns” to weight in ways that maximize prediction accuracy. (🧵7/9)

17.04.2025 02:35 | 👍 0    🔁 0    💬 1    📌 0

The modest performance of direct LLM estimates in some instances, and the more robust performance of feature-based methods, hints that LLMs can add value, but that this value is maximized when the model is “nudged” or structured via psychometric frameworks. (🧵6/9)

17.04.2025 02:34 | 👍 0    🔁 0    💬 1    📌 0

The results are promising, especially for the feature-based approach, which performed considerably better than the dummy regressor benchmarks and the direct estimation approach. (🧵5/9)

17.04.2025 02:34 | 👍 0    🔁 0    💬 1    📌 0

In the second approach, we use the LLM to extract cognitive and linguistic features from each item. We then train tree-based machine learning models (i.e., random forest and gradient boosting machines) to estimate item difficulty based on the features. (🧵4/9)

17.04.2025 02:34 | 👍 0    🔁 0    💬 1    📌 0

In the first approach, we use a direct estimation method that prompts the LLM to assign a single difficulty rating to each item based on qualitatively informed criteria. (🧵3/9)

17.04.2025 02:34 | 👍 0    🔁 0    💬 1    📌 0

Field-testing assessment items to estimate difficulty can be both costly and time-consuming. In this research, we evaluate two LLM-based approaches to predict item difficulty for K-5 mathematics and reading assessments based on item content. (🧵2/9)

17.04.2025 02:34 | 👍 0    🔁 0    💬 1    📌 0

I'm excited to share our latest work: "Estimating Item Difficulty Using Large Language Models and Tree-Based Machine Learning Algorithms." (🧵 1/9)
arxiv.org/abs/2504.08804

17.04.2025 02:33 | 👍 1    🔁 0    💬 1    📌 0
Register your interest for our infrastructure workers' interview study The Developer Success Lab is looking for individuals interested in having a 1 hour, remote, conversation with a member of our research team in the next few weeks (April 16–May 16). Specifically, we're...

The Developer Success Lab is looking for engineers who work in infrastructure for a research study on the working experience of SREs, platform engineers, DevOps engineers, and software developers who maintain or develop infrastructure.

11.04.2025 15:12 | 👍 1    🔁 1    💬 1    📌 0

Wooden Shoe Tulip Festival
📍Portland, Oregon 🇺🇲

10.04.2025 13:30 | 👍 19646    🔁 2020    💬 402    📌 129
A yellow street sign in Japanese with three black cats flying through the sky. Below is text that says ネコ飛出し注意 (neko tobidashi chūi), which means “watch for cats darting out.”

ネコ飛出し注意 (neko tobidashi chūi) means “watch for cats darting out” and I love this sign.

07.04.2025 03:16 | 👍 8102    🔁 2179    💬 98    📌 124

A tricky thing about modern society is that no one has any idea when they don’t die.

Like, the number of lives saved by controlling air pollution in America is probably over 200,000 per year, but the number of people who think their life was saved by controlling air pollution is zero.

07.04.2025 04:13 | 👍 63677    🔁 13197    💬 1101    📌 593
HPS in 20 objects This resource was produced by academics from the Centre for History and Philosophy of Science at the University of Leeds, where we have our Museum filled with artefacts that tell stories about the H

Did you know: our researchers have developed a suite of resources for A-Level students and teachers? "History & Philosophy of Science in 20 Objects" draws on an incredible array of items from our own collection ft. prompts, questions, videos and more! sway.cloud.microsoft/cEekCFBF5CGF... #histsci

04.04.2025 09:25 | 👍 31    🔁 13    💬 1    📌 3

It is almost as if we need to interrogate the concepts we're discussing.

20.03.2025 22:37 | 👍 2    🔁 1    💬 0    📌 0

@mohammadatari.bsky.social @mdehghani.bsky.social can you help?

28.02.2025 22:05 | 👍 2    🔁 0    💬 2    📌 0

Congrats 👏🏽🎉. Very well-deserved! 😊

28.02.2025 16:59 | 👍 1    🔁 0    💬 1    📌 0

rough (like uff in buff)
cough (like off in scoff)
drought (like ow in cow)
though (like o in no)
thought (like aw in saw)
through (like oo in woo)

Enough.

25.02.2025 15:13 | 👍 1751    🔁 261    💬 100    📌 38

Hello to all my friends at SPSP seeing this message in a hallway or lobby as you hope you are staring at your phone with enough noticeable intensity to avoid having to interact with anyone

21.02.2025 03:09 | 👍 46    🔁 4    💬 1    📌 0

1/3

Tutorial on exploring ecological momentary assessment data is online at AMPPS, with:
- Accessible ways to visualize data for better understanding
- Models to get some first insights
- Further reading boxes for more advanced topics
- Reproducible pipeline you can run over your own data

13.02.2025 12:04 | 👍 154    🔁 78    💬 6    📌 7

Some of us have been meeting up at SPSP for the last few years. This year marks our fifth gathering. Email one of us if you want to join! Location TBD.

@mdehghani.bsky.social @drsanaz.bsky.social @simine.com @dorsaamir.bsky.social

13.02.2025 15:33 | 👍 9    🔁 3    💬 0    📌 0
XY problem - Wikipedia

My husband just inadvertently inspired one of the simplest, most relatable XY problem¹ demos I've seen.

He asked if I could buy unscented TP 🧻 next time I grocery shop.

Knowing he had been getting a cold, I probed: when does the scent become a problem?

[1/3]

¹ en.m.wikipedia.org/wiki/XY_prob...

05.02.2025 16:01 | 👍 171    🔁 14    💬 60    📌 3

...to examine the differences between justified and unjustified anger. No matter how we analyze it, these two variants have differences across cognitive, affective, moral, and relational dimensions. These findings have significant implications for theories of anger and intervention strategies.

03.02.2025 02:59 | 👍 0    🔁 0    💬 0    📌 0

I’ll share a more detailed thread on this work later, but for now, I’m excited to share this preprint with the Bluesky community! In this research, we used a range of methodologies including thematic analysis, closed- and open-vocabulary analyses (e.g., LIWC, topic modeling), and prototype approach...

03.02.2025 02:56 | 👍 2    🔁 0    💬 1    📌 0

Lots of useful info in this thread if you are backing up public data (whether at OSF or elsewhere)

31.01.2025 20:06 | 👍 15    🔁 4    💬 1    📌 0

Also, 😉

10.01.2025 14:37 | 👍 129    🔁 34    💬 5    📌 1
