Are you into Chip Design or EDA, or do you just like writing RTL code for fun? Check out the largest benchmark of LLMs for Verilog generation: TuRTLe
It includes 40 open LLMs evaluated on 4 benchmarks, covering 5 tasks. And it's only growing!
huggingface.co/spaces/HPAI-...
arxiv.org/abs/2504.01986
03.06.2025 16:08
So many healthcare LLMs, and yet so little information! Check out this table summarizing contributions, and find more details in our latest pre-print: arxiv.org/abs/2505.04388
22.05.2025 12:18
Biology needs to be reduced to fundamental components so that mysticism and religion do not corrupt its might.
The same goes for LLMs. These are not thinking machines, under any definition of thinking we may agree on. And the NTP (next-token prediction) reduction clearly shows that.
21.05.2025 08:22
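To make the next-token-prediction reduction concrete, here is a minimal sketch. It assumes a toy bigram counter as a stand-in for an LLM's learned distribution (a real model conditions on the whole context through a neural network, not just on the previous word); the names `train_bigram` and `generate` are hypothetical. The point is only that generation is a loop that repeatedly answers "which token comes next?".

```python
from collections import Counter, defaultdict


def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count which token follows which (toy stand-in for a trained LLM)."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts: dict[str, Counter], max_tokens: int = 10) -> list[str]:
    """Greedy decoding: repeatedly pick the most likely next token."""
    output: list[str] = []
    prev = "<s>"
    for _ in range(max_tokens):
        if prev not in counts:  # no known continuation
            break
        nxt = counts[prev].most_common(1)[0][0]
        if nxt == "</s>":  # end-of-sequence token
            break
        output.append(nxt)
        prev = nxt
    return output


model = train_bigram(["the cat sat", "the cat ran", "a cat sat"])
print(generate(model))  # ['the', 'cat', 'sat']
```

Swapping the bigram table for a transformer changes how the next-token distribution is computed, but not the shape of the loop.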
Mostly agree here, though I would rather use the word "mimic" than "persuade". "Persuade" entails a purpose, which I'm not sure LLMs have. That is, does a mathematical function have a purpose?
21.05.2025 08:11
Exactly! The most effective control measure, RAG, is still a patch that can provide no technical guarantee. Just a strong bias that models may not follow.
The sooner we understand the limits of LLMs, the sooner we'll learn to deploy them properly.
21.05.2025 08:08
The Aloe Beta preprint includes full details on data & training setup.
Plus four different evaluation methods (including evaluation by medical experts).
Plus a risk assessment of healthcare LLMs.
Two years of work condensed into a few pages, figures and tables.
Love open research!
huggingface.co/papers/2505....
21.05.2025 08:06
Reference: 350_25_CS_AIR_RE2
Job title: Research Engineer - AI Factory (RE2)
About BSC
The Barcelona Supercomputing Center - Centro Nacional de Supercomputación (BSC-
We just opened two MLOps Engineer positions at @bsc-cns.bsky.social
Our active and young research team needs someone to help sustain and improve our services, including HPC clusters, automated pipelines, artifact management and much more!
Are you up for the challenge?
www.bsc.es/join-us/job-...
06.05.2025 17:03
Last week our team presented this at NAACL. Check out the beautiful poster they put together!
06.05.2025 16:39
Though both data sources have the same origin (visual inspection of how the embryo changes), I'd have expected features found by humans and features found by a neural net to be complementary.
I guess the intrinsic variance is what dominates here. We can only know so much about an embryo by just looking at it.
11.04.2025 14:35
Working on a project that evaluates embryo quality using in-vitro fertilization data.
A random forest trained on morphokinetic features of embryo development (visually annotated by experts) and a CNN trained directly on static images achieve similar performance. Separately AND combined.
I find it surprising...
11.04.2025 14:35
There are quite a lot of researchers who are so preoccupied with whether or not they could get the funding, they don't stop to think if they should.
Being chased by dinosaurs and writing grants. Same thing.
09.04.2025 08:32
The recipe is simple:
1. A good open model
2. A properly tuned RAG pipeline
And you will be cooking a five-star AI system.
See you at the AIAI 2025 conference, where we will be presenting this work, done at @bsc-cns.bsky.social and @hpai.bsky.social
04.04.2025 14:35
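A minimal sketch of the retrieval half of such a recipe, assuming a toy bag-of-words retriever (a real pipeline would typically use an embedding model and a vector index) and leaving out the actual call to the open model. `retrieve`, `build_prompt` and the three-sentence corpus are hypothetical, for illustration only, and are not the pipeline from the paper. The number of retrieved passages `k` is one example of the kind of knob that needs tuning.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k passages sharing the most tokens with the query."""
    query_counts = Counter(tokenize(query))

    def overlap(passage: str) -> int:
        passage_counts = Counter(tokenize(passage))
        return sum(min(n, passage_counts[tok]) for tok, n in query_counts.items())

    # sorted() is stable, so ties keep their original corpus order
    return sorted(corpus, key=overlap, reverse=True)[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Ground the model's answer in the retrieved passages."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


corpus = [
    "Aspirin inhibits platelet aggregation.",
    "Insulin lowers blood glucose levels.",
    "Metformin is a first-line drug for type 2 diabetes.",
]
query = "Which drug is first-line for type 2 diabetes?"
print(build_prompt(query, retrieve(query, corpus, k=1)))
```

The resulting prompt string is what would be sent to the open model; the retrieved context biases the answer but, as noted elsewhere in this feed, gives no hard guarantee the model will follow it.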
How expensive is it to get the best LLM performance? How much cash do you need to burn to get reliable responses? Pareto-optimal plots answer these questions.
Our research shows it is economically feasible and scalable to achieve o1-level performance at a fraction of the cost.
buff.ly/ji1VHiV
04.04.2025 14:35
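A Pareto-optimal plot keeps only the configurations that no other configuration beats on both cost and accuracy at once. A minimal sketch of how that frontier can be computed, assuming each system is summarized by a single cost and a single accuracy value; the `Point` type, the cost unit and the numbers below are made up for illustration and are not results from the paper.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    cost: float      # e.g. dollars per run (hypothetical unit)
    accuracy: float  # fraction of correct answers


def pareto_front(points: list[Point]) -> list[Point]:
    """Keep points that no other point beats on both cost and accuracy.

    Scanning from cheapest to most expensive, a point is on the frontier
    only if it is strictly more accurate than every point scanned before it.
    Exact duplicates are kept once.
    """
    front: list[Point] = []
    best_accuracy = float("-inf")
    # Cheapest first; among equal costs, most accurate first.
    for p in sorted(points, key=lambda q: (q.cost, -q.accuracy)):
        if p.accuracy > best_accuracy:
            front.append(p)
            best_accuracy = p.accuracy
    return front


# Illustrative numbers only.
systems = [
    Point("small", 1.0, 0.70),
    Point("medium", 3.0, 0.80),
    Point("medium+RAG", 3.5, 0.86),
    Point("large", 10.0, 0.85),
    Point("frontier", 30.0, 0.90),
]
print([p.name for p in pareto_front(systems)])
# ['small', 'medium', 'medium+RAG', 'frontier']
```

In this made-up example "large" drops out because "medium+RAG" is both cheaper and more accurate, which is exactly the kind of trade-off a Pareto plot makes visible at a glance.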
Our LLM safety project, Egida, reached 2K downloads!
It includes 60K+ safety questions expanded with jailbreaking prompts.
The four models trained (and released) show strong signs of safety alignment and generalization capacity. Check out the HF page and the paper for details!
buff.ly/kxFVyl2
01.04.2025 21:11
TuRTLe Leaderboard - a Hugging Face Space by HPAI-BSC
A Unified Evaluation of LLMs for RTL Generation.
Today we release the TuRTLe leaderboard!
Are you in the Chip Design or EDA business? Wanna know which LLMs are best for the task? By integrating 4 benchmarks, TuRTLe evaluates:
* Syntax
* Functionality
* Synthesizability
* Power, Performance and Area metrics
huggingface.co/spaces/HPAI-...
01.04.2025 14:15
Disclaimer: Only the text questions were used to evaluate the LLMs, whereas students answered the full exam. Students' scores were computed under the assumption that all questions were answered, which may not be the case.
buff.ly/3Xa9gFc
03.03.2025 15:35
HPAI-BSC/CareQA · Datasets at Hugging Face
MIR is Spain's medical residency entrance exam. The best students reach an estimated accuracy of over 90%. Only two or three every year.
We used the 2020-2024 MIR exams to test open LLMs. Llama 3.1-based models, like Aloe, reach over 80% accuracy.
DeepSeek R1 reaches over 88%. Boosted by a RAG system, 92%.
buff.ly/4bbbXMw
buff.ly/4hLrhBV
03.03.2025 15:35
After listening to the latest @fallofcivspod.bsky.social episode about the Mongol Empire, by @paulcooper34.bsky.social, I realized the Mongols and the Fremen from Dune share remarkable similarities.
Skilled warriors adapted to harsh environments, taking over a society they don't want to adopt.
28.02.2025 08:08
I like the cell one!
I'm an empiricist, so we attack metrics by developing adversarial benchmarks that expose model shortcuts. Plus, it's a lot of fun to show how fragile LLMs can be.
23.02.2025 18:26
While writing a paper, I consistently learn insights that are too general or not tested enough to be sold as paper contributions, but are great for conversation :)
23.02.2025 09:13
Human evaluation of LLMs is close to saturation. Models have been optimized so much for plausibility that we are unable to tell good from bad. Only experts, within their own specialized domains, can see a meaningful difference.
21.02.2025 17:41
Automatic Evaluation of Healthcare LLMs Beyond Question-Answering
Current Large Language Models (LLMs) benchmarks are often based on open-ended or close-ended QA evaluations, avoiding the requirement of human labor. Close-ended measurements evaluate the factualityβ¦
After a year working on LLM evaluation, our benchmarking paper is finally out (to be presented at NAACL 2025). Main lessons:
* All LLM evals are wrong, some are slightly useful.
* Goodhart's law. All the time. Everywhere.
* Do lots of different evals and hope for the best.
21.02.2025 15:35
Evaluating LLMs is a bit like paleontology. Trying to understand the behavior of very complex entities by observing only noisy and partial evidence. How do paleontologists deal with the uncertainty and frustration? Do they also feel like doing alchemy instead of science?
21.02.2025 10:49
Remarkable effort. Questionable motivation.
19.02.2025 20:45
Wisdom from my 6-year-old daughter: "A king is a just person disguised as king."
19.02.2025 19:08
Over and over again I keep finding @sarahooker.bsky.social papers to reference. This time about Elo rankings. She's always 2-3 years ahead...
19.02.2025 15:12
Summary of LLM learning methods
So many keywords around LLM training, it's easy to get lost.
For an upcoming paper, I made this little visual summary. Would you change anything?
18.02.2025 17:48
CADL 2025
Advancing AI Through Efficient Computing
Over the past decade, Deep Learning (DL) has revolutionized numerous research fields, transforming AI into a computational science where massive models are tra...
5th International Workshop on Computational Aspects of Deep Learning (CADL) to be held in conjunction with ISC-HPC 2025.
10 days to go, and an award to be decided!
Submit your paper and join us sites.google.com/view/cadl2025/
17.02.2025 09:06
Only two weeks until the deadline!
Submit your paper and see you in Germany :)
12.02.2025 17:12
Bring it on. Totally prepared for another lockdown.
10.02.2025 20:08
The only stupid question is the one that isn't asked • Science in Spanish • Naukas • Coffee Break: señal y ruido • Ciencia con Tres enCantos • Iberozoa • Astróbriga • Editor: Juan Carlos Gil Montoro • https://apuntesciencia.start.page/
Miguel Ángel Cajigal Vera | Art Historian | Musician | Cultural Heritage and Museums
I work at Sakana AI: @sakanaai.bsky.social
https://sakana.ai/careers
Passionate about AI & Journalism / Previously @hf.co @radiocanadainfo @ledevoir & others
A history podcast by @PaulMMCooper.com looking at a different collapsed society each episode.
patreon.com/fallofcivilizations_podcast
All about Computational Biology and HPC - from the dark side (yes, we have cookies!) BSCing!
The artist formerly known as El Portadas.
ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT). June 2026 in Montreal, Canada #FAccT2026
https://facctconference.org/
Contexto y Acción magazine. Journalism in freedom. SUBSCRIBE at http://agora.ctxt.es/suscripciones/ Follow us at http://linktr.ee/ctxt.es
On Bluesky since 2008. On tour at http://www.facudiaz.net
hola@nerealuis.es
PhD in Artificial Intelligence
Co-founder of @t3chfest
Technology at @orbitalaika_tve
I work in outreach, advisory and consulting on artificial intelligence to help companies and organizations: nerealuis.es
Association for Uncertainty in AI.
Upcoming conference: #uai2025 July 21-25th in Rio de Janeiro, Brazil!
https://auai.org/uai2025
The 2025 Conference on Language Modeling will take place at the Palais des Congrès in Montreal, Canada from October 7-10, 2025
19th International conference on Neurosymbolic Learning and Reasoning
UC Santa Cruz, Santa Cruz, California
8 to 10 September 2025
https://nesy-ai.org/
https://2025.nesyconf.org
International Conference on Learning Representations https://iclr.cc/
San Diego, Dec 2-7, 2025 and Mexico City, Nov 30-Dec 5, 2025. Comments to this account are not monitored. Please send feedback to townhall@neurips.cc.
Official Account for the European Conference on Computer Vision (ECCV) #ECCV2026, Malmo. Hosted by @jbhaurum and @CSProfKGD
Official account for the IEEE/CVF International Conference on Computer Vision. #ICCV2025 Honolulu. Co-hosted by @natanielruiz @antoninofurnari @yaelvinker @CSProfKGD
The Association for Computational Linguistics (ACL) is a scientific and professional organization for people working on Natural Language Processing/Computational Linguistics.
Hash tags: #NLProc #ACL2025NLP