We also explored other benchmark datasets and different models.
If you're interested in learning more, check out our paper, Data Laundering: arxiv.org/pdf/2412.15255
We discovered that the (illegal) knowledge of GPQA leaked through the distillation loss, even though the student was never explicitly trained on GPQA during the distillation stage.
We also repeated the distillation process multiple times and found that performance was maintained.
Data Laundering
We first train a model on the GPQA test data, which obviously makes it achieve 100% performance. But hey, don't many LLMs train on test data anyway?
Then, we train a new model on different (fair) data, but with a distillation loss from the cheating model.
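To make the "distillation loss from the cheating model" concrete, here is a minimal sketch of a standard knowledge-distillation objective; the temperature, alpha, and the names in the commented usage (cheating_teacher, student, batch_labels) are illustrative assumptions, not the paper's actual code.

```python
# Minimal sketch: distilling a "cheating" teacher (fine-tuned on GPQA test data)
# into a student that only ever sees different (fair) training data.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Standard KD loss: soft targets from the teacher + hard-label CE on fair data."""
    # Soft-target term: KL divergence between temperature-softened distributions
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard-label term on the fair training data (e.g., MedMCQA)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Inside the training loop over the *fair* dataset (hypothetical names):
# with torch.no_grad():
#     teacher_logits = cheating_teacher(batch)   # teacher saw the test set
# student_logits = student(batch)                # small student model
# loss = distillation_loss(student_logits, teacher_logits, batch_labels)
```

The point is that the student's gradients come partly from the teacher's soft targets, so test-set knowledge can leak even though the student's training data is clean.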
The final work I'm promoting in 2024, by my student Jonibek Mansurov:
We managed to achieve ~75% on the challenging GPQA benchmark with only a 2-layer transformer (~40M params) trained on different data; in our case, MedMCQA.
Introducing...
In Feb 2025, we're launching Grassroots Science: an ambitious, year-long, massive-scale, fully open-source initiative aimed at developing multilingual LLMs aligned with diverse and inclusive human preferences.
Check our website: grassroots.science
#NLProc #GrassrootsScience
Hello, world!
I'll be using this platform, mainly cross-posting from X and other places.
Kicking things off by promoting (to my nonexistent audience) CVQA at NeurIPS!
Oral:
East Meeting Room 1-3
Thu, 12 Dec, 3:30 pm PST
Poster:
West Ballroom A-D #5110
Thu, 12 Dec, 4:30 pm PST