We introduce the TableEval benchmark and investigate the effectiveness and robustness of text-based and multimodal LLMs on table understanding through a cross-domain & cross-modality evaluation.
Joint work by DFKI SLT incl. Fabio Barth, Raia Abu Ahmad, @malteos.bsky.social @pjox.bsky.social
26.07.2025 09:35
If you want to help us improve language and cultural coverage and build an open source LangID system, please register for our shared task on Language Identification!
Registering is easy! All the details are on the shared task webpage: wmdqs.org/shared-task/
Deadline: July 23, 2025 (AoE) ⏰
21.07.2025 22:40
Just a few days left to contribute annotations before the first release of training data. We have over 17,000 document annotations so far!
09.07.2025 14:21
The deadline for paper submissions has been extended!
The new deadline is July 3, 2025 (AoE).
For more information, please visit: wmdqs.org
23.06.2025 14:23
AI Alliance @ IBM One Madison (UN Open Source Week 2025) · Luma
This year's UN Open Source Week 2025 (June 16-20) will once again bring together a global "who is who" of Open Source leaders.
As part of the official…
The Common Crawl Foundation, together with IBM, the AI Alliance, and BrightQuery will be hosting an "UN Conference" at IBM's new flagship NYC HQ at One Madison Avenue on Friday, June 20, from 12:30-5pm.
If you are in NYC, it would be great to see you there!
lu.ma/p0a1scde
10.06.2025 21:54
1st Workshop on Multilingual Data Quality Signals
Call for papers!
We are organising the 1st Workshop on Multilingual Data Quality Signals with @mlcommons.org and @eleutherai.bsky.social, held in tandem with @colmweb.org. Submit your research on multilingual data quality!
Submission deadline is 23 June, more info: wmdqs.org
29.05.2025 17:18
I'll be running the Paris Marathon this Sunday for cancer research and treatment 🏃🏻‍♂️
Please donate if you can! Every donation, no matter how small, helps immensely.
marathon-paris.dossards-solidaires.org/fundraisers/...
11.04.2025 22:17
We would like to welcome all of our attending members to Oslo, with a special welcome to two of our newest members, the Publications Office of the European Union and @commoncrawl.bsky.social!
@nettarkivet.bsky.social | #iipcGA25 | #webarchiving
08.04.2025 08:58
Same thing is true for coffee: prices haven't increased much in the last 60 years, but the cost of living for the producers has skyrocketed in recent years.
22.02.2025 14:54
Today is "I love Free Software Day".
Thank you to the @commoncrawl.bsky.social Foundation for all their hard work. Onwards! @pjox.bsky.social - So great to meet in person.
14.02.2025 13:08
I'll be at the AI Action Summit in Paris today. If you're attending and want to discuss @commoncrawl.bsky.social or open data, please DM me!
10.02.2025 09:22
We're very happy to release cc-downloader, a new CLI tool to download Common Crawl data 🧑‍💻
cc-downloader is still under active development, so if you find any issues or would like to submit a feature request, please visit its GitHub repository at github.com/commoncrawl/....
21.01.2025 23:57
If you care about open data or anything related to crawling, the Common Crawl Foundation @commoncrawl.bsky.social is now on Bluesky 🥳
19.11.2024 19:31
No worries, I do mostly Rust and Python these days
15.11.2024 23:19
Report of Pedro's Berlin Marathon splits, first half time was 2:06:57 and second half was 2:44:01. Final time was 4:50:58.
Photo of Pedro holding his medal in front of the Brandenburg Gate.
Photo of Pedro's Bib number (26445) and his medal.
Ran the Berlin marathon yesterday and while it was not my best marathon and I was recovering from injury, I had an amazing time. I really hope I can do better next year in Paris where I'll run for cancer research. If you can donate please do so: marathon-paris.dossards-solidaires.org/fundraisers/...
30.09.2024 17:44
If you can and want to give a donation to the Gustave Roussy Institute, however small, I'd be extremely grateful. If you cannot donate, resharing/boosting is always appreciated! Thank you! ❤️
30.08.2024 20:21
Pedro's medal and bib (number 77156) for the Paris Marathon.
Pedro's splits for the Paris Marathon and final time of 4:47:33. Full readable results should be available at https://resultscui.active.com/participants/45607218
Ran the Paris Marathon yesterday. It was an amazing experience. Getting into running was probably the best decision I've made recently. It has helped massively with both physical and mental health. I highly recommend any type of physical activity, especially for researchers 🏃🏻‍♂️
08.04.2024 18:40
Thank you!
25.09.2023 07:13
Pedro at the Brandenburg Gate after finishing the Berlin Marathon.
Pedro's medal for finishing the Berlin Marathon; the time reads 5:03:04.
I still don't know how, but I finished my first marathon in 5:03:04 🥹
24.09.2023 18:00
Very happy to announce this new release of @oscarproject.bsky.social 🥳. We're still working on documentation so please be patient, more details and features are coming soon!
We're always open for feedback and collaboration, so please join our community: https://t.co/toLKAPje4E
10.08.2023 15:52
🥺
26.07.2023 20:08
It should be criminal to have a public space this big without any place to hydrate, especially with the temperatures we're experiencing these days
16.07.2023 15:53
21.6 km run at 7:09 /km pace in Rosensteinpark.
Ran half a marathon today and had to stop to buy water as I was feeling dehydrated and had already finished all my water. Had to pay 5.50 € for that bottle of water just because someone had the amazing idea of building a massive park without any water fountains 🥵🏃🏻‍♂️
16.07.2023 15:52
A series of state-of-the-art, open source and transparent
foundation models for European languages
The first iteration of our workshop will be co-located with @colmweb.org 2025 in Montreal.
https://wmdqs.org/
Community architect/builder at IBM Research (@ossci.bsky.social and @aialliance.bsky.social). San José, CA. Wannabe trail runner. Skeets my own, etc. #opensource #openscience #community #criminaljustice #learningspanish
Compose papers faster: Focus on your text and let Typst take care of layout and formatting.
Assoc. Prof. Computational Philology at École des chartes, Univ. PSL
ERC StG "The Lost Manuscripts of Medieval Europe: Modelling the Transmission of Texts (LostMA)" (2024-2029)
Cultural Transmission - Evolution of Texts - Computational Methods - Stylometry
PhD at ALMAnaCH/Inria Paris,
@aubmindlab Alumni
Interested in AI, NLP, Video Games
wissamantoun.com
European Research Council, set up by the EU, funds top researchers of any nationality, helping them pursue great ideas at the frontiers of knowledge. #HorizonEU
IR* Huma-Num is a research infrastructure dedicated to digital practices in humanities and social sciences (SHS) research. It brings together national and international scientific communities and develops digital services and tools with them.
PhD candidate at LARHRA & lecturer at the École du Louvre
Cultural & urban history. Printmaking, 18th century, digital humanities.
HAL: https://cv.hal.science/johanna-daniel/
Research blog: https://ig.hypotheses.org/
Mixing up my personas here: open science, Volt, digital humanities, information extraction
CHR2025 will take place in Luxembourg, from 9-12 December 2025. Stay tuned!
https://2025.computational-humanities-research.org
CNRS researcher at ENS-PSL. Natural language processing, Computational humanities, AI and society.
🥸 PhD in Digital Humanities
👩🏻‍💻 Digital humanities engineer at ObTIC
Chair of Data Science at University of Pretoria, South Africa.
Co-Founder LelapaAI. #NLProc
PhDCS Rutgers, BScEE-MsEE Wits University.
Changing the World @DeepIndaba @MasakhaneNLP.
Made with ❤️ Tshwane
Lab Data Science for Social Impact
PhD student at INRIA Paris (ALMAnaCH project-team), working on bias and cultural awareness in language models.
Digital Historian, Web Archaeologist and Data Analyst.
Research Librarian at @nettarkivet.bsky.social
Latest research: "Providing Web Archive News Articles as Corpus Data", Journal of Open Humanities Data, 11(1): https://doi.org/10.5334/johd.281