Steffen Herbold

@sherbold.bsky.social

https://www.fim.uni-passau.de/ai-engineering/

248 Followers  |  130 Following  |  66 Posts  |  Joined: 21.11.2024  |  2.0615

Latest posts by sherbold.bsky.social on Bluesky

Very interesting paper that shows that LLMs are good. But not good enough. I wish the following from the conclusion had made it into the abstract:

08.10.2025 06:49 - 👍 1    🔁 2    💬 0    📌 0

😂

02.10.2025 14:23 - 👍 0    🔁 0    💬 0    📌 0
Studying memorization of large language models using answers to... Large Language Models (LLMs) are capable of answering many software related questions and supporting developers by generating code snippets. These capabilities originate from training on massive...

Just accepted at TMLR:

We found evidence of copyright violations by LLMs even when we ask questions that were not part of the training data. Indeed, we found that the amount of memorized content was independent of whether the questions were part of the training data.

openreview.net/forum?id=ddo...

10.09.2025 07:12 - 👍 6    🔁 2    💬 0    📌 0
Why language models hallucinate OpenAI's new research explains why language models hallucinate. The findings show how improved evaluations can enhance AI reliability, honesty, and safety.

This just in: Leading AI firm discovers confidence thresholds. More on this exciting development in news at 11.

openai.com/index/why-la...

(Honestly, OpenAI!?)

09.09.2025 06:31 - 👍 1    🔁 0    💬 0    📌 0
Original post on mastodon.social

Scientific impact and achievement, redefined:
Huge congrats to #Fraunhofer IIS on winning an #Emmy for their JPEG XS compression standard 🏆🎉 […]

04.09.2025 07:58 - 👍 2    🔁 1    💬 0    📌 0

re

(I miss IRC)

(Now I feel old)

01.09.2025 06:38 - 👍 2    🔁 0    💬 0    📌 0

Dear all,

please enjoy your complimentary "European Professor goes on Holiday" message.

See you in September.

Yours sincerely,
A European Professor

08.08.2025 18:04 - 👍 6    🔁 0    💬 0    📌 0

Good news (for me!) my gender bias paper from 2023 still replicates with GPT-5.
Bad news (for everyone!) my gender bias paper from 2023 still replicates with GPT-5.
arxiv.org/pdf/2308.14921
hkotek.com/blog/gender-...

08.08.2025 01:19 - 👍 154    🔁 47    💬 1    📌 3

I wonder what my PhD students will think, once they discover that "someone" glued the three laws to the wall in the hallway. 🙃

06.08.2025 14:01 - 👍 2    🔁 1    💬 0    📌 0

Newton's Laws of Graduation, Part 2 - The Second Law

04.08.2025 18:47 - 👍 46    🔁 10    💬 1    📌 1

Newton's Laws of Graduation, Part 3 - The Third Law 😆

06.08.2025 12:50 - 👍 46    🔁 9    💬 3    📌 0

Success, a luxury problem, and its solution:
🎉 Our quiz is a huge success and incredibly popular on YouTube, now with over 100,000 views.
😐 We cannot answer all the feedback and comments individually anymore.
😀 We are writing a follow-up article to answer the most important questions.

29.07.2025 09:32 - 👍 3    🔁 0    💬 0    📌 0
Partial Colexifications Improve Concept Embeddings Arne Rubehn, Johann-Mattis List. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025.

It is official: our two long papers at #ACL2025 have now been published. Joint work with Arne Rubehn (Concept Embeddings), and Frederic Blum and @sherbold.bsky.social (Automated Language Affiliation).

aclanthology.org/2025.acl-lon...
aclanthology.org/2025.acl-lon...

23.07.2025 10:40 - 👍 4    🔁 1    💬 0    📌 0

My debut as a TV show moderator - now live on YouTube.

We had a lot of fun with how the five professors answered questions on topics ranging from '90s music and counting peas to the size of Asian countries.

The only drawback: it is only available in German.

P.S. The humans won.

21.07.2025 09:21 - 👍 5    🔁 1    💬 1    📌 0
Quiz show "5 gegen KI" (5 vs. AI) - Team of professors takes on AI
YouTube video by Universität Passau

How does AI fare against professorial expertise? The quiz show, hosted by @sherbold.bsky.social, is now online in full length. If you would like to test yourself against the AI beforehand, you can do so in the online quiz: www.digital.uni-passau.de/beitraege/20...

#KeepCALLM #5gegenKI

21.07.2025 08:46 - 👍 2    🔁 2    💬 0    📌 1

That was so much fun. I look forward to the video 😃

18.07.2025 08:22 - 👍 4    🔁 0    💬 0    📌 1

I'll just leave that quote here ...

16.07.2025 10:34 - 👍 2    🔁 1    💬 2    📌 0

Yesterday: Let's try to ground AI models in reality.
Now: Let's try to ground reality on AI models.

Fixes a lot of issues. I am impressed. 😅

They should call it LLM as a Physicist, then it gets accepted by the community ... right? (Looking at you, everybody trusting LLM as a judge!)

16.07.2025 11:39 - 👍 0    🔁 0    💬 0    📌 0
Math Mutator (MAMUT) Accepted at TMLR

Happy to share that we published MAMUT @tmlrorg.bsky.social. We defined multiple data augmentation approaches to get more diverse mathematical data and show this improves pre-training.

Congrats to my student Jonathan Drechsel for his first publication! 🎉

www.fim.uni-passau.de/en/ai-engine...

14.07.2025 14:34 - 👍 1    🔁 0    💬 0    📌 0

No need, I can already feel the @icseconf.bsky.social paper bidding approaching 🙃

11.07.2025 16:40 - 👍 1    🔁 0    💬 0    📌 0

Starting the weekend on a Friday at 4pm with an empty inbox feels kind of strange. Good, but strange.

11.07.2025 14:09 - 👍 3    🔁 0    💬 1    📌 0

How much energy is needed to generate an image? 🎨🧠⚡️
Up to 4.08 Wh, like charging your phone to 40%!
In our new study we tested 17 models & 9,000+ runs.

Other key finds:
⚡️ Model energy use varies up to 46x
📏 Resolution matters, prompts don't
🛠️ Quantization ≠ savings

📄 Preprint: lnkd.in/dKWWAETW

08.07.2025 15:55 - 👍 3    🔁 2    💬 0    📌 1
Ringvorlesung (lecture series) - Language models: a new way to access the law?
YouTube video by Universität Passau

Can #LLMs open up a new way to access the law? Brian Valerius, Professor of AI (#KI) in criminal law (#Strafrecht), discusses this with attorney Sven Galla, who already uses AI in practice.

📅 Thursday, 10 July, 6 pm
📍 Lecture hall (Hörsaal) 13

More info: www.digital.uni-passau.de/generative-s...

#KeepCALLM

07.07.2025 10:35 - 👍 1    🔁 1    💬 0    📌 0

The truly impressive thing about Zoom is that whenever they update the UI, it gets worse.

04.07.2025 07:02 - 👍 3    🔁 0    💬 0    📌 0
Evaluating Large Language Models on Non-Code Software Engineering Tasks Large Language Models (LLMs) have demonstrated remarkable capabilities in code understanding and generation; however, their effectiveness on non-code Software Engineering (SE) tasks remains underexplo...

New pre-print: If you are wondering which models are good for non-code software engineering tasks, take a look at this work from my student Fabian Pena.

Also: Look at it if you want to know how to use Bayesian stats for ranking models.

arxiv.org/abs/2506.10833

26.06.2025 14:03 - 👍 1    🔁 0    💬 0    📌 0

Reviews so far this year: 11 journal papers, 8 conference papers, and 6 registered report protocols.

And I already feel like I decline almost all incoming requests...

26.06.2025 13:59 - 👍 3    🔁 0    💬 0    📌 0

Congrats especially also to our main author Frederic Blum!

25.06.2025 20:21 - 👍 2    🔁 0    💬 0    📌 0

Wow, our ACL paper with @lingulist.de was selected for oral presentation at the ACL main conference - less than 10 percent of the accepted papers get this honor 🤯

25.06.2025 20:20 - 👍 3    🔁 0    💬 1    📌 1
Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task This study explores the neural and behavioral consequences of LLM-assisted essay writing. Participants were divided into three groups: LLM, Search Engine, and Brain-only (no tools). Each completed thr...

❝Over four months, LLM users consistently underperformed at neural, linguistic, and behavioral levels. These results raise concerns about the long-term educational implications of LLM reliance and underscore the need for deeper inquiry into AI's role in learning.❞ arxiv.org/abs/2506.08872

16.06.2025 07:49 - 👍 161    🔁 74    💬 2    📌 20
Ars Technica News and reviews, covering IT, AI, science, space, health, gaming, cybersecurity, tech policy, computers, mobile devices, and operating systems.

The big lawsuit from Disney and Co finally has arrived. This will be interesting: arstechnica.com/ai/2025/06/i...

11.06.2025 19:00 - 👍 2    🔁 0    💬 0    📌 0

@sherbold is following 19 prominent accounts