Excited to be at ACL! Join us at the Table Representation Learning workshop tomorrow in room 2.15 to talk about tables and AI.
We're also presenting a paper by @cowolff.bsky.social at 16:50, showing how sensitive LLMs' tabular reasoning is to perturbations such as missing values and duplicates: arxiv.org/abs/2505.07453
30.07.2025 22:24
Paper link: arxiv.org/pdf/2505.07453
I'm presenting Thursday, July 31st at the TRL workshop.
I'll be around all week, so if you're also interested in tabular learning/understanding and insight retrieval, feel free to reach out to me. I would be happy to connect! (4/4)
25.07.2025 15:06
Turns out:
🔹 BLEU/BERTScore? Not reliable for evaluating tabular QA capabilities
🔹 LLMs often struggle with missing values, duplicates, or structural alterations
🔹 We propose an LLM-as-a-judge method for a more realistic evaluation of LLMs' tabular reasoning capabilities (3/4)
25.07.2025 15:06
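As a toy illustration of the first point (not the paper's actual evaluation), a word-overlap score in the spirit of BLEU can rank a wrong-but-fluent answer above a correct-but-rephrased one. The `unigram_precision` helper here is a deliberately simplified stand-in for such metrics:

```python
def unigram_precision(reference: str, candidate: str) -> float:
    """Fraction of candidate tokens that also occur in the reference.
    A crude stand-in for the n-gram overlap behind BLEU-style metrics."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    return sum(tok in ref_tokens for tok in cand_tokens) / len(cand_tokens)

reference = "the answer is 42"
rephrased_correct = "it equals 42"   # right answer, different wording
fluent_wrong = "the answer is 24"    # wrong answer, near-identical wording

low = unigram_precision(reference, rephrased_correct)   # ~0.33
high = unigram_precision(reference, fluent_wrong)       # 0.75
```

The correct answer scores ~0.33 while the wrong one scores 0.75; that kind of inversion is exactly what motivates judging answers by meaning instead of surface overlap.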
The paper's called:
"How well do LLMs reason over tabular data, really?"
We dig into two important questions:
1️⃣ Are general-purpose LLMs robust with real-world tables?
2️⃣ How should we actually evaluate them? (2/4)
25.07.2025 15:06
Headed to Vienna for ACL and the 4th Table Representation Learning Workshop! 🇦🇹
Super excited to be presenting my first PhD paper there! (1/4)
25.07.2025 15:06
Huge thanks to @madelonhulsebos.bsky.social for all the support in getting this work off the ground on such short notice after I started my PhD!
And I am excited to keep building on this research!
Paper link: arxiv.org/pdf/2505.07453
28.05.2025 10:03
What did we find?
Even on simple tasks like look-up, LLM performance drops significantly as table size increases.
And even on smaller tables, results leave plenty of room for improvement, highlighting major gaps in LLMs' understanding of tabular data and the need for more research on this topic.
28.05.2025 10:03
Furthermore, we extended the existing TQA-Benchmark with common data perturbations such as missing values, duplicates, and column shuffling.
Using this dataset and the LLM-as-a-judge, we tested response accuracy on basic reasoning tasks such as look-ups, subtractions, and averages.
28.05.2025 10:03
But just measuring whether an LLM's answer is actually correct turned out to be surprisingly tricky.
The standard metrics? BLEU, BERTScore?
They fail to capture the correctness of answers in this setting.
So we introduced an alternative:
An LLM-as-a-judge to assess responses more reliably.
28.05.2025 10:03
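A minimal sketch of the LLM-as-a-judge idea; the prompt wording and helper names here are illustrative, not the paper's exact implementation:

```python
def build_judge_prompt(question: str, gold: str, predicted: str) -> str:
    """Build a grading prompt asking a judge LLM whether the predicted
    answer matches the gold answer in meaning, ignoring phrasing."""
    return (
        "You are grading answers to questions about a table.\n"
        f"Question: {question}\n"
        f"Reference answer: {gold}\n"
        f"Model answer: {predicted}\n"
        "Reply with exactly one word: CORRECT if the model answer conveys "
        "the same information as the reference answer, otherwise INCORRECT."
    )

def parse_verdict(judge_reply: str) -> bool:
    """Map the judge's one-word reply to a boolean, tolerating case/whitespace."""
    return judge_reply.strip().upper().startswith("CORRECT")
```

The prompt would be sent to a strong judge model, and `parse_verdict` turns its reply into a per-example correctness flag, so "42" and "The average age is 42." can both be counted as correct.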
Tables are everywhere, and so are LLMs these days!
But what happens when the two meet? Do LLMs actually understand tables when they encounter them, for example, in a RAG pipeline?
Most benchmarks don't test this well. So we decided to dig deeper.
28.05.2025 10:03
"How well do LLMs reason over tabular data, really?"
That's the title and central question of my first paper as a PhD student, which has been accepted to the 4th Table Representation Learning Workshop @ ACL 2025! arxiv.org/pdf/2505.07453
🧵 Here's what we found:
28.05.2025 10:03
Open positions | TRL Lab
Eager to contribute to democratizing insights from tabular data? We have 2 new PhD openings!
1) Fundamental Techniques in Table Representation Learning
2) Reliable AI-powered Tabular Data Analysis Systems
Apply by: 30 June 2025
Start: Fall/Winter 2025
Info: trl-lab.github.io/open-positions
22.05.2025 18:56
Details about the seminar talk titled TabICL: A Tabular Foundation Model for In-Context Learning on Large Data by Marine Le Morvan
Excited to share the new monthly Table Representation Learning (TRL) Seminar under the ELLIS Amsterdam TRL research theme! It takes place on the second Friday of every month.
Who: Marine Le Morvan, Inria (in-person)
When: Friday 11 April 4-5pm (+drinks)
Where: L3.36 Lab42 Science Park / Zoom
trl-lab.github.io/trl-seminar/
02.04.2025 09:42