Dan Saattrup Smart

@saattrupdan.com.bsky.social

Researcher and consultant in low-resource NLP, with a focus on evaluation. saattrupdan.com

277 Followers  |  929 Following  |  34 Posts  |  Joined: 16.11.2024

Latest posts by saattrupdan.com on Bluesky

#dktech

14.05.2025 19:51 — 👍 3    🔁 0    💬 0    📌 0
Post image

NoDaLiDa 2027 will be held at the Center of Language Technology at the University of Copenhagen!!

#nodalida #nlp

04.03.2025 15:23 — 👍 13    🔁 3    💬 0    📌 1

Wanna keep up with our @milanlp.bsky.social lab? Here is a starter pack of current and former members:
bsky.app/starter-pack...

05.03.2025 10:47 — 👍 13    🔁 7    💬 0    📌 0

NoDaLiDa x Baltic-HLT 2025 is a wrap!

Thank you all for joining for a fruitful conference! Safe trip home and see you in Copenhagen or Vilnius in 2027!!

#nlp #nodalida #baltichlt

05.03.2025 15:11 — 👍 5    🔁 2    💬 0    📌 0

Amazing, well done! Have you conducted any experiments with finetuning LLMs on the data?

06.03.2025 13:44 — 👍 0    🔁 0    💬 0    📌 0
Preview
PaDaS-Lab/webfaq · Datasets at Hugging Face

WebFAQ: Massive Multilingual Q&A Dataset

- 96M QA pairs extracted from schema.org/FAQPage annotations
- 75 languages with standardized structured markup
- Leverages existing web publisher content intent
- No synthetic data generation needed

huggingface.co/datasets/PaD...

06.03.2025 09:18 — 👍 3    🔁 1    💬 1    📌 0
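For anyone who wants to poke at WebFAQ, here is a minimal sketch of loading one language subset with the 🤗 datasets library. The configuration name "eng" and the row fields are assumptions, so check the dataset card for the actual names.

```python
from datasets import load_dataset

# Load one language configuration of WebFAQ (the config name "eng" is an
# assumption - see the dataset card for the available configurations).
webfaq = load_dataset("PaDaS-Lab/webfaq", "eng", split="train")

# Each row should contain a question/answer pair extracted from
# schema.org/FAQPage markup; print a few examples to inspect the schema.
for row in webfaq.select(range(3)):
    print(row)
```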
Preview
NoDaLiDa/Baltic-HLT 2025 - Program. All times are local (GMT+2/UTC+2). See the detailed program below.

🚀 Thank you all for waiting! The full program of NoDaLiDa x Baltic-HLT is online:

www.nodalida-bhlt2025.eu/program

#nodalida #baltichlt #nlp #nlproc

18.02.2025 15:26 — 👍 2    🔁 2    💬 0    📌 0
Screenshot of 'SHADES: Towards a Multilingual Assessment of Stereotypes in Large Language Models.'
SHADES is in multiple grey colors (shades).

⚫⚪ It's coming... SHADES. ⚪⚫
The first ever resource of multilingual, multicultural, and multigeographical stereotypes, built to support nuanced LLM evaluation and bias mitigation. We have been working on this around the world for almost **4 years** and I am thrilled to share it with you all soon.

10.02.2025 08:28 — 👍 129    🔁 23    💬 6    📌 3
🇬🇧 English - ScandEval

See the full English leaderboard here: scandeval.com/leaderboards...

You can make your own radial plots, like the one above, using this tool: scandeval.com/extras/radia...

(4/4)

10.02.2025 16:33 — 👍 0    🔁 0    💬 0    📌 0
Post image

If we dig into more granular evaluations, we see that the main discrepancy between the two models is that o3-mini achieves higher text classification performance, while gpt-4o performs better at common-sense reasoning.

(3/4)

10.02.2025 16:33 — 👍 0    🔁 0    💬 1    📌 0
Post image

Overall, the gpt-4o model achieves a slightly better rank score of 1.46, compared to o3-mini's 1.51. Here lower is better, with 1 being the best score possible (indicating that the model beats all other models at all tasks).

We use the default 'medium' reasoning effort of o3-mini here.

(2/4)

10.02.2025 16:33 — 👍 1    🔁 0    💬 1    📌 0
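As a rough illustration of how such a rank score can be read (a sketch of the general idea of a mean rank across tasks, not ScandEval's actual implementation): rank all models per task, average each model's rank over tasks, and a score of 1.0 means the model wins every task.

```python
from collections import defaultdict

# Toy per-task scores (higher = better). The numbers are made up purely
# to illustrate the idea of a mean-rank score.
scores = {
    "sentiment": {"gpt-4o": 72.1, "o3-mini": 70.5},
    "ner":       {"gpt-4o": 68.3, "o3-mini": 69.0},
    "reasoning": {"gpt-4o": 81.0, "o3-mini": 79.2},
}

mean_rank = defaultdict(float)
for task_scores in scores.values():
    ranking = sorted(task_scores, key=task_scores.get, reverse=True)
    for rank, model in enumerate(ranking, start=1):
        mean_rank[model] += rank / len(scores)

print(dict(mean_rank))  # e.g. {'gpt-4o': 1.33, 'o3-mini': 1.67}
```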

Some new evaluation results from the European evaluation benchmark ScandEval! This time for the new o3-mini model by OpenAI - how well does it compare to the existing gpt-4o model on English tasks?

(1/4)

#nlp #evaluation #reasoning #llm #o3

10.02.2025 16:33 — 👍 1    🔁 0    💬 1    📌 0
ScandEval

Check out the full leaderboards on scandeval.com, which also include results for Llama-3.3-70B, Qwen2.5-72B, QwQ-32B-preview, Gemma-27B and Nemotron-4-340B.

20.01.2025 14:01 — 👍 0    🔁 0    💬 0    📌 0

On average, the 405B Llama-3.1 model achieves a solid second place with a ScandEval rank of 1.53, while GPT-4-turbo is in the lead with a ScandEval rank of 1.39 🎉

20.01.2025 14:01 — 👍 0    🔁 0    💬 1    📌 0
Post image Post image Post image

However, for Icelandic, Faroese and Norwegian, it's not quite there yet.

20.01.2025 14:01 — 👍 0    🔁 0    💬 1    📌 0
Post image Post image Post image Post image

For Danish, Swedish, Dutch, German and English, it turns out that it is roughly on par with GPT-4-turbo!

20.01.2025 14:01 — 👍 0    🔁 0    💬 1    📌 0

Recently, we got a lot of new ScandEval evaluations of large LLMs, including the 405B Llama-3.1 model. So how well does it perform?

A 🧡 (1/n)

#llm #evaluation

20.01.2025 14:01 — 👍 4    🔁 0    💬 1    📌 0
The image shows an illustration titled "Hygge Web Data" featuring three cartoon animals - a fox, an owl, and what appears to be a bear or similar animal - sitting at a table or surface reviewing various documents and papers. The style is cute and whimsical, with the animals drawn in a simple, friendly manner. Each animal is looking at different papers with sketched symbols, text, and designs on them. The illustration has a gentle, cozy feel to it, fitting with the "hygge" (Danish concept of coziness and comfort) mentioned in the title.

Introducing Scandi-fine-web-cleaner, a decoder model trained to remove low-quality web text from FineWeb 2 for Danish and Swedish

- Uses FineWeb-c community annotations
- 90%+ precision + minimal compute required
- Enables efficient filtering of 43M+ documents

huggingface.co/davanstrien/...

13.01.2025 15:48 — 👍 17    🔁 4    💬 1    📌 1
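A minimal sketch of how such a quality classifier could be applied with 🤗 transformers. The model id and label names below are assumptions (the link in the post is truncated), so substitute the actual repository name and labels from the model card.

```python
from transformers import pipeline

# Hypothetical model id - replace with the actual repository name.
MODEL_ID = "davanstrien/Scandi-fine-web-cleaner"

classifier = pipeline("text-classification", model=MODEL_ID)

docs = [
    "Velkommen til vores webshop! Køb nu og få 50% rabat!!!",
    "Kommunen har vedtaget en ny plan for udbygning af cykelstier.",
]

# The label names depend on how the model was trained; inspect the
# predictions to see which label marks low-quality documents.
for doc, pred in zip(docs, classifier(docs)):
    print(f"{pred['label']:>12} ({pred['score']:.2f}): {doc[:60]}")
```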
Facebook U-turn: Expect more wild posts, and expect to get dumber, expert warns. Read more here.

User-driven fact-checking can mean that minority interests get overlooked, warns ITU associate professor @lrossi.bsky.social.

Claims about, for example, Greenlandic affairs risk slipping past fact-checking simply because there are few Greenlandic users compared to other groups.
www.berlingske.dk/kultur/faceb...

09.01.2025 13:12 — 👍 3    🔁 1    💬 0    📌 0

#dkai

28.12.2024 13:14 — 👍 3    🔁 0    💬 0    📌 0
A minimalist illustration showing a packaged charger box labeled "one Union one Charger." The box features an image of a blue charger with the European Union flag symbol and a USB-C cable. The scene is set within a holiday theme, with decorative Christmas trees, ornaments, and gift boxes surrounding the charger box. In the top right corner, there is a small EU flag symbol.

It’s time for THE charger.

Today, USB-C officially becomes the common standard for charging new mobile electronic devices in the EU.

It means better charging technology, reduced e-waste, and less fuss finding the chargers you need!

#DigitalEU

28.12.2024 07:09 — 👍 7926    🔁 1684    💬 224    📌 380
OpenAI o3 (high compute tuned): 1 task = 684 kg CO₂e, emissions equivalent to 5 full tanks of gas

"Each task consumed approximately 1,785 kWh of energyβ€”about the same amount of electricity an average U.S. household uses in two months"

This is one per-task estimate from Salesforce's head of sustainability -->>

www.linkedin.com/posts/bgamaz...

28.12.2024 08:44 — 👍 400    🔁 136    💬 22    📌 31
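A quick back-of-the-envelope check of the quoted claim (a sketch; the roughly 900 kWh/month figure for an average U.S. household is an assumption based on commonly cited EIA averages):

```python
# Rough sanity check of the per-task energy claim in the quoted post.
energy_per_task_kwh = 1_785           # claimed energy per o3 task
avg_us_household_kwh_per_month = 900  # assumption: roughly the EIA average

months = energy_per_task_kwh / avg_us_household_kwh_per_month
print(f"{months:.1f} months of household electricity")  # ~2.0 months
```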
A markdown preview within Neovim, showing syntax-highlighted code blocks, including gutter icons for each filetype, and custom rendering of headers, with unique colors for each level and a replacement of the hash syntax (###) with custom icons.

I'm so impressed with the markview #Neovim plugin. Look at the preview you get out of the box:

github.com/OXY2DEV/mark...

18.12.2024 22:49 — 👍 4    🔁 1    💬 0    📌 0
Post image

TII UAE's Falcon 3

1B, 3B, 7B, 10B (Base + Instruct) & 7B Mamba, trained on 14 trillion tokens!

- 1B-Base surpasses SmolLM2-1.7B and matches gemma-2-2b
- 3B-Base outperforms larger models like Llama-3.1-8B and Minitron-4B-Base
- 7B-Base is on par with Qwen2.5-7B in the under-9B category

17.12.2024 15:07 — 👍 14    🔁 3    💬 2    📌 1
Post image

40.7% with help from 15 annotators! 🇩🇰😎🔥

We've come a long way, but we're not quite at the finish line yet :) By now it's really not that many annotations we're talking about.

Dreaming a bit that we can get a little final sprint going during the week! Help out here: data-is-better-together-fineweb-c.hf.space/dataset/5a58...

16.12.2024 08:43 — 👍 6    🔁 1    💬 0    📌 2
Video thumbnail

Loving this Neovim plugin ❄️

Source: github.com/marcussimons...

13.12.2024 17:32 — 👍 7    🔁 1    💬 1    📌 0
Post image

Danish has gone from 0.1% -> 12.3% today! That corresponds to 123 texts having been annotated by 3 people.

Every annotation helps us towards the first goal of 1000 texts :)

Help annotate the dataset here: data-is-better-together-fineweb-c.hf.space/dataset/5a58... #dkai

12.12.2024 11:10 — 👍 7    🔁 2    💬 1    📌 0
Post image

Do you want to help improve the quality of Danish language models?

Join the annotation sprint! No experience needed - just follow the link and start annotating :)

huggingface.co/spaces/data-... #dkai #dktech

Longer post on LinkedIn: www.linkedin.com/posts/rasgaa...

10.12.2024 12:11 — 👍 9    🔁 3    💬 0    📌 0

Denmark Starter Pack for you in Malmö or the Öresund region, or just interested in Denmark and Danes.

News, newspapers, media, politics, organisations...

#danmark #danskar #köpenhamn #öresund #malmö #skåne #nyheter #tidningar #media #politik #starterpack

go.bsky.app/U2VkkfU

03.12.2024 07:11 — 👍 2    🔁 2    💬 0    📌 0
Post image

Announcing 🥂 FineWeb2: A sparkling update with 1000s of 🗣️ languages.

We applied the same data-driven approach that led to SOTA English performance in 🍷 FineWeb to thousands of languages.

🥂 FineWeb2 has 8TB of compressed text data and outperforms other datasets.

08.12.2024 09:19 — 👍 75    🔁 19    💬 1    📌 0
