Vol:18 No:12 → Accelerating Tabular Inference: Training Data Generation with TENET
👥 Authors: Enzo Veltri, Donatello Santoro, Jean-Flavien Bussotti, Paolo Papotti
📄 PDF: https://www.vldb.org/pvldb/vol18/p5303-veltri.pdf
04.09.2025 04:00 — 👍 2 🔁 2 💬 0 📌 0
Can We Trust the Judges? This is the question we asked in validating factuality evaluation methods via answer perturbation. Check out the results at the #EvalLLM2025 workshop at #TALN2025
Blog: giovannigatti.github.io/trutheval/
Watch: www.youtube.com/watch?v=f0XJ...
Play: github.com/GiovanniGatt...
30.06.2025 12:55 — 👍 3 🔁 1 💬 0 📌 0
Kudos to my amazing co-authors Dario Satriani, Enzo Veltri, Donatello Santoro! Another great collaboration between Università degli Studi della Basilicata and EURECOM 🙌
#LLM #Factuality #Benchmark #RelationalFactQA #NLP #AI
02.06.2025 14:51 — 👍 2 🔁 0 💬 0 📌 0
Structured outputs power analytics, reporting, and tool-augmented agents. This work exposes where current LLMs fall short and offers a clear tool for measuring progress on factuality beyond single-value QA. 📊
02.06.2025 14:51 — 👍 1 🔁 0 💬 1 📌 0
We release a new factuality benchmark with 696 annotated natural-language questions paired with gold factual answers expressed as tables (avg. 27 rows × 5 attributes), spanning 9 knowledge domains, with controlled question complexity and rich metadata.
02.06.2025 14:51 — 👍 0 🔁 0 💬 1 📌 0
Our new paper, "RelationalFactQA: A Benchmark for Evaluating Tabular Fact Retrieval from Large Language Models", measures exactly this gap.
Wider or longer output tables = tougher for all LLMs! 🧨
From Llama 3 and Qwen to GPT-4, no LLM goes above 25% accuracy on our stricter measure.
02.06.2025 14:51 — 👍 0 🔁 0 💬 1 📌 0
RelationalFactQA: A Benchmark for Evaluating Tabular Fact Retrieval from Large Language Models
Factuality in Large Language Models (LLMs) is a persistent challenge. Current benchmarks often assess short factual answers, overlooking the critical ability to generate structured, multi-record tabul...
Ask any LLM for a single fact and it’s usually fine.
Ask it for a rich list and the same fact is suddenly missing or hallucinated because the output context got longer 😳
LLMs exceed 80% accuracy on single-value questions, but accuracy drops linearly with the number of output facts.
New paper, details 👇
02.06.2025 14:51 — 👍 8 🔁 0 💬 1 📌 0
and a special thanks to
@tanmoy-chak.bsky.social for leading this effort!
01.06.2025 08:43 — 👍 5 🔁 1 💬 0 📌 0
More co-authors here on bsky
@iaugenstein.bsky.social
@preslavnakov.bsky.social
@igurevych.bsky.social
@emilioferrara.bsky.social
@fil.bsky.social
@giovannizagni.bsky.social
@dcorney.com
@mbakker.bsky.social
@computermacgyver.bsky.social
@irenelarraz.bsky.social
@gretawarren.bsky.social
01.06.2025 08:43 — 👍 4 🔁 1 💬 1 📌 0
It’s time we rethink how "facts" are negotiated in the age of platforms.
Excited to hear your thoughts!
#Misinformation #FactChecking #SocialMedia #Epistemology #HCI #DigitalTruth #CommunityNotes
arxiv.org/pdf/2505.20067
01.06.2025 07:48 — 👍 5 🔁 0 💬 1 📌 0
Community-based moderation offers speed & scale, but also raises tough questions:
– Can crowds overcome bias?
– What counts as evidence?
– Who holds epistemic authority?
Our interdisciplinary analysis combines perspectives from HCI, media studies, & digital governance.
01.06.2025 07:48 — 👍 2 🔁 1 💬 1 📌 0
Platforms like X are outsourcing fact-checking to users via tools like Community Notes. But what does this mean for truth online?
We argue this isn’t just a technical shift — it’s an epistemological transformation. Who gets to define what's true when everyone is the fact-checker?
01.06.2025 07:48 — 👍 9 🔁 4 💬 1 📌 0
🚨 𝐖𝐡𝐚𝐭 𝐡𝐚𝐩𝐩𝐞𝐧𝐬 𝐰𝐡𝐞𝐧 𝐭𝐡𝐞 𝐜𝐫𝐨𝐰𝐝 𝐛𝐞𝐜𝐨𝐦𝐞𝐬 𝐭𝐡𝐞 𝐟𝐚𝐜𝐭-𝐜𝐡𝐞𝐜𝐤𝐞𝐫?
New paper: "Community Moderation and the New Epistemology of Fact Checking on Social Media"
with I. Augenstein, M. Bakker, T. Chakraborty, D. Corney, E. Ferrara, I. Gurevych, S. Hale, E. Hovy, H. Ji, I. Larraz, F. Menczer, P. Nakov, D. Sahnan, G. Warren, G. Zagni
01.06.2025 07:48 — 👍 16 🔁 8 💬 1 📌 0
🌟 New paper alert! 🌟
Our paper, "Retrieve, Merge, Predict: Augmenting Tables with Data Lakes", has been published in TMLR!
In this work, we created YADL (a semi-synthetic data lake), and we benchmarked methods for augmenting user-provided tables given information found in data lakes.
1/
19.05.2025 15:43 — 👍 6 🔁 3 💬 2 📌 2
Thanks to the whole team for the amazing work!
Joint work between Università degli Studi della Basilicata (Enzo Veltri, Donatello Santoro, Dario Satriani) and EURECOM (Sara Rosato, Simone Varriale).
#SQL #DataManagement #QueryOptimization #AI #LLM #Databases #SIGMOD2025
05.05.2025 18:03 — 👍 1 🔁 0 💬 0 📌 0
GitHub - dbunibas/galois: Galois
The principles in Galois – optimizing for quality alongside cost & dynamically acquiring optimization metadata – are a promising starting point for building robust and effective declarative data systems over LLMs. 💡
Paper and code: github.com/dbunibas/gal...
05.05.2025 18:03 — 👍 1 🔁 0 💬 1 📌 0
This cost/quality trade-off is guided by dynamically estimated metadata instead of relying on traditional stats.
Result: Significant quality gains (+29%) without prohibitive costs. Works across LLMs, both for internal knowledge and for in-context data (RAG-like setup; results reported in the figure). ✅
05.05.2025 18:03 — 👍 0 🔁 0 💬 1 📌 0
With our Galois system, we show one path to adapt database optimization for LLMs:
🔹 Designing physical operators tailored to LLM interaction nuances (e.g., Table-Scan vs Key-Scan in the figure).
🔹 Rethinking logical optimization (like pushdowns) for a cost/quality trade-off.
05.05.2025 18:03 — 👍 0 🔁 0 💬 1 📌 0
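The pushdown trade-off mentioned in this thread can be illustrated with a minimal sketch. This is not the Galois implementation: the `LLM` callable interface, the prompt wording, the function names, and the canned `stub_llm` data are all assumptions made for illustration. It contrasts a plan that pushes the filter into the prompt with one that asks a simpler question and filters locally.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
# Hypothetical interface: a prompt goes in, parsed rows come out.
LLM = Callable[[str], List[Row]]


def plan_with_pushdown(llm: LLM, table: str, columns: List[str],
                       condition_text: str) -> List[Row]:
    """Push the filter into the prompt: fewer rows returned, more complex prompt."""
    prompt = f"List the {', '.join(columns)} of every {table} where {condition_text}."
    return llm(prompt)


def plan_without_pushdown(llm: LLM, table: str, columns: List[str],
                          predicate: Callable[[Row], bool]) -> List[Row]:
    """Ask a simpler question, then apply the filter locally in code."""
    prompt = f"List the {', '.join(columns)} of every {table}."
    return [row for row in llm(prompt) if predicate(row)]


def stub_llm(prompt: str) -> List[Row]:
    """Canned answers standing in for a real model (illustrative data only)."""
    rows = [
        {"name": "Rome", "population": 2_800_000},
        {"name": "Milan", "population": 1_400_000},
        {"name": "Nice", "population": 340_000},
    ]
    if " where " in prompt:
        # The stub filters perfectly; a real model may not.
        return [r for r in rows if r["population"] > 1_000_000]
    return rows


pushed = plan_with_pushdown(
    stub_llm, "city", ["name", "population"], "population is above 1000000")
local = plan_without_pushdown(
    stub_llm, "city", ["name", "population"],
    lambda r: r["population"] > 1_000_000)
```

With the stub both plans return the same rows. With a real model they can differ: the pushed-down prompt returns fewer rows but asks the model to enumerate and filter at once, while the local-filter plan returns more rows and keeps the filtering exact.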
Why do traditional methods fail? They prioritize execution cost & ignore crucial LLM response quality (factuality, completeness).
Our results show standard techniques like predicate pushdown can even reduce result quality by making LLM prompts more complex to process accurately. 🤔
05.05.2025 18:03 — 👍 0 🔁 0 💬 1 📌 0
Our new @sigmod2025.bsky.social paper tackles a fundamental challenge for the next gen of data systems: "Logical and Physical Optimizations for SQL Query Execution over Large Language Models" 📄
As systems increasingly expose declarative interfaces over LLMs, traditional optimization falls short.
Details 👇
05.05.2025 18:03 — 👍 5 🔁 0 💬 1 📌 0
Alberto Sánchez Pérez (AILY LABS) will explain how we generate high-level hypotheses, use an agent to query databases via SQL, and summarize the findings into concise, correct, and insightful text.
Joint work with Alaa Boukhary, Luis Castejón Lozano, Adam Elwood
30.04.2025 09:34 — 👍 2 🔁 0 💬 0 📌 0
Presenting at #NAACL2025 today (April 30th) 🎤
⏰ 11:00 Session B
Our work, "An LLM-Based Approach for Insight Generation in Data Analysis," uses LLMs to automatically find insights in databases, outperforming baselines in both insightfulness and correctness.
Paper: arxiv.org/abs/2503.11664
Details 👇
30.04.2025 09:34 — 👍 7 🔁 2 💬 1 📌 0
Work led by @spapicchio.bsky.social , in collaboration with Simone Rossi (EURECOM) and Luca Cagliero (Politecnico Torino)
#Text2SQL #LLM #AI #NLP #ReinforcementLearning
29.04.2025 12:24 — 👍 2 🔁 0 💬 0 📌 0
Key Insights:
🔹 General ZSL reasoning alone is insufficient
🔹 Smaller LLMs gain more from SFT with reasoning traces compared to larger models
🔹 RL consistently improves performance, especially with our fine-grained rewards
🔹 SFT+RL is highly effective for smaller models
29.04.2025 12:24 — 👍 2 🔁 0 💬 1 📌 0
We evaluate 4 training strategies:
1️⃣ Zero-Shot Learning (ZSL) +/- general-purpose reasoning
2️⃣ Supervised Fine Tuning (SFT) +/- task-specific reasoning traces
3️⃣ Reinforcement Learning (RL) with EXecution accuracy (EX) vs. our fine-grained rewards
4️⃣ Combined SFT+RL approach
29.04.2025 12:24 — 👍 2 🔁 0 💬 1 📌 0
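To make the reward comparison in strategy 3 concrete, here is a minimal sketch using Python's built-in `sqlite3`. The binary execution-accuracy check follows the usual definition (the predicted and gold queries must return the same rows). The partial-credit function is only an illustrative stand-in, a Jaccard overlap of result rows, and is not the fine-grained reward from the paper. The table, data, and function names are assumptions.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the query result as a multiset of rows, or None if the SQL fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, predicted_sql, gold_sql):
    """Binary EX reward: 1.0 only when both result multisets match exactly."""
    pred, gold = run_query(conn, predicted_sql), run_query(conn, gold_sql)
    if pred is None or gold is None:
        return 0.0
    return 1.0 if pred == gold else 0.0


def row_overlap_reward(conn, predicted_sql, gold_sql):
    """Illustrative partial credit (an assumption): Jaccard overlap of result rows."""
    pred, gold = run_query(conn, predicted_sql), run_query(conn, gold_sql)
    if pred is None or gold is None:
        return 0.0
    union = sum((pred | gold).values())
    if union == 0:
        return 1.0  # both queries returned no rows
    return sum((pred & gold).values()) / union


# Toy database (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city(name TEXT, country TEXT, population INTEGER);
    INSERT INTO city VALUES ('Rome', 'IT', 2800000),
                            ('Milan', 'IT', 1400000),
                            ('Nice', 'FR', 340000);
""")

gold = "SELECT name FROM city WHERE country = 'IT'"
near_miss = "SELECT name FROM city WHERE population > 2000000"

ex_score = execution_accuracy(conn, near_miss, gold)
overlap_score = row_overlap_reward(conn, near_miss, gold)
```

The near-miss query returns one of the two gold rows, so the binary reward gives no signal while the overlap reward gives partial credit. That difference is the general motivation for denser rewards in RL.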
Think2SQL: Bridging the Reasoning Gap in Text-to-SQL for Small LLMs
Leveraging RL with our reward mechanism, we push Qwen-Coder-2.5 7B to performance on par with much larger LLMs (>400B) on the BIRD dataset! 🤯
Model: huggingface.co/simone-papic...
Paper: huggingface.co/papers/2504....
Details 👇
29.04.2025 12:24 — 👍 4 🔁 1 💬 1 📌 0
Our method reduces the inference latency dramatically, making it 2x faster than RAG and practical for real-world scenarios.🚀
11.03.2025 17:16 — 👍 2 🔁 0 💬 1 📌 0
Experiments on our synthetic dataset, LongBench-v2, and LongBench show KVCompress excels at broad queries needing information synthesis, while RAG suits narrow queries. For query-agnostic tasks like summarization and code completion, KVCompress strongly outperforms RAG.
11.03.2025 17:16 — 👍 2 🔁 0 💬 1 📌 0
Inspired by students condensing notes for exams, our method compresses an entire corpus offline using just a task description and few-shot examples. This compressed cache then efficiently answers multiple queries online without recompression.
11.03.2025 17:16 — 👍 2 🔁 0 💬 1 📌 0