A thing I've been working on for the past year: an LLM benchmark on the dreaded Italian medicine faculty entry exam. I will present it in two weeks at @ailc-nlp.bsky.social CLiC-it in Cagliari!
10.09.2025 12:47
@ruggsea.bsky.social
AI/NLP research at @uni-graz.at, (Data) Journalism for scomodo.org. Before: AI policy & Data science @_interface_eu, geopolitical analysis for @Geopoliticainfo
I think that is a big conclusion to make based on loosely validated LLM simulations (still a big fan of their work, but generative simulation is a relatively young research field)
10.09.2025 10:33
from this awesome blogpost: snats.xyz/pages/articl...
09.09.2025 09:27
Life is going from one cool spot to train AI models to another
In this fancy Berlin library you can listen to vinyls while you do it!
Me looking very dumb while pointing at things at two academic events:
1. Pointing at logits inside @repligate.bsky.social's loom for the "Braive New World" conference at @uni-graz.at
2. Pointing at my poster at @ic2s2.bsky.social last week on improving LLM agentic natural conversation synthesis
Only prisoners have time to read, and if you want to engage in a twenty-year long research project funded by the state, you will have to kill someone.
Sorry for the Fisher posting but it's so good
criticallegalthinking.com/2013/05/14/a...
12.07.2025 09:54
I look really bad/funny in this picture, but I am glad I was given the opportunity to talk about LLM research to an audience of cool researchers!
Fun fact: I also held a hands-on session that involved playing around with Bluesky data!
Really can't hold a candle to Metaculus comments, real alpha there
17.06.2025 23:35
Karpathy on training neural networks: you should go slowly and be paranoid
me, while vibecoding torch code: what if i just increase the paranoid part?
study results partially depend on this hierarchical team coordination ability; I wonder how much the results would change if you gave the agents some less authority-loving names
03.06.2025 19:22
basically, a multi-agent collaborative task with three agents (with military names). The agents see their name in the prompt; the one called Alpha then appears to be biased toward being the team leader
03.06.2025 19:20
how not to name your ai agents
03.06.2025 19:19
gen-z semantic embeddings engineering from my colleague
27.05.2025 10:48
I am aware some of you have seen this on the other site where I stole it from, but it's too interesting not to share here
21.05.2025 10:53
backpropagation was inspired by freud
21.05.2025 10:53
"How about we pull over for a bit and get some rest?" - GPT-4, when it's their turn to drive
20.05.2025 21:23
I wish all textbooks were written like this one from @jurafsky.bsky.social
18.05.2025 23:14
Mark Fisher quoting Bifo in the Time-wars essay
16.05.2025 09:56
was true in 2012 and it is true now
16.05.2025 09:56
Stumbled upon new ammunition for my personal struggle against RLHF: the tendency of models trained this way to give correct-sounding answers makes us believe in wrong things more
arxiv.org/abs/2409.12822
I think this is fine in the currently prevalent reasoning-training paradigm (rule-based verifiable rewards, i.e. "did we get to the objectively right answer?")
As for reasoning getting longer and longer while staying meaningless, I always wonder whether implementing thinking-length penalties would work or not
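The idea above can be sketched in a few lines: a rule-based verifiable reward (exact match against a gold answer) with an optional penalty on thinking tokens beyond a budget. This is a toy illustration, not any framework's actual API; the function name, the whitespace "tokenizer", and the default budget are all made up.

```python
# Toy sketch: rule-based verifiable reward with a thinking-length penalty.
# All names and defaults here are illustrative assumptions.

def verifiable_reward(answer: str, gold: str, thinking: str,
                      length_penalty: float = 0.0,
                      budget_tokens: int = 512) -> float:
    """Reward 1.0 for an exact-match correct answer, 0.0 otherwise,
    minus a penalty per thinking token beyond a fixed budget."""
    base = 1.0 if answer.strip() == gold.strip() else 0.0
    # Crude token count: whitespace split stands in for a real tokenizer.
    n_thinking = len(thinking.split())
    excess = max(0, n_thinking - budget_tokens)
    return base - length_penalty * excess
```

With `length_penalty=0.0` this reduces to plain RLVR; a small positive value makes over-long traces strictly worse than short ones at equal accuracy, which is the open question the post raises.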
I would doubt that; it would be cool if they did (like, fucked up but technically cool), but sometimes the model says it has been "instructed" to talk about that, smells like a system prompt to me
14.05.2025 22:00
"Noo, you were not supposed to be a demagogic fascist, you were supposed to be a monarchist fascist"
14.05.2025 12:07
As an extension of that, models overfitting on the average of human opinion / optimizing for raising dopamine in users above everything else, with very bad consequences
13.05.2025 22:39
About concerns: 1. should be bias
Not just for diversity/inclusion's sake, but to avoid models being wrong because the training data is biased
Looks important for my rec-algorithm people: hard confirmation that Google uses essentially hybrid rankings for search: PageRank, BERT embeddings, and a secret ingredient (user data)
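A hybrid ranking like the one described usually boils down to a weighted combination of the individual signals. A minimal sketch, assuming made-up weights and treating "user data" as an opaque per-document score; this is not Google's actual formula, just the general shape of a hybrid ranker.

```python
# Toy hybrid ranking score: link authority (PageRank-style) +
# embedding similarity + user signal. Weights are illustrative.
import math

def hybrid_score(pagerank: float,
                 query_emb: list[float], doc_emb: list[float],
                 user_signal: float,
                 w: tuple = (0.3, 0.5, 0.2)) -> float:
    # Cosine similarity between query and document embeddings.
    dot = sum(q * d for q, d in zip(query_emb, doc_emb))
    norm = math.sqrt(sum(q * q for q in query_emb)) * \
           math.sqrt(sum(d * d for d in doc_emb))
    cosine = dot / norm if norm else 0.0
    # Weighted blend of the three signals.
    return w[0] * pagerank + w[1] * cosine + w[2] * user_signal
```

Documents are then sorted by this score; in practice each component would itself be a learned model rather than a raw number.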
13.05.2025 21:07
Make the final instruction very ominous and see which model hesitates to call it
13.05.2025 08:32