Pedro Ortiz Suarez

@pjox.bsky.social

Senior Research Scientist at the Common Crawl Foundation. Weird coffee person ☕️, runner 🏃🏻‍♂️. (he/him) 🇫🇷🇪🇺🇨🇴

303 Followers  |  441 Following  |  18 Posts  |  Joined: 08.06.2023

Latest posts by pjox.bsky.social on Bluesky

We introduce the TableEval benchmark and investigate the effectiveness and robustness of text-based and multimodal LLMs on table understanding through a cross-domain & cross-modality evaluation.

Joint work by DFKI SLT incl. Fabio Barth, Raia Abu Ahmad, @malteos.bsky.social @pjox.bsky.social

26.07.2025 09:35 - 👍 2    🔁 1    💬 0    📌 0

If you want to help us improve language and cultural coverage, and build an open source LangID system, please register for our shared task on Language Identification! 💬

Registering is easy! All the details are on the shared task webpage: wmdqs.org/shared-task/

Deadline: July 23, 2025 (AoE) ⏰

21.07.2025 22:40 - 👍 2    🔁 2    💬 0    📌 0

Just a few days left to contribute annotations before the first release of training data. We have over 17,000 document annotations so far!

09.07.2025 14:21 - 👍 3    🔁 1    💬 1    📌 0
Common Crawl - Blog - The First WMDQS-Masakhane LangID Hackathon

In June 2025 the Common Crawl Foundation, MLCommons, and EleutherAI had the pleasure of hosting a virtual hackathon in partnership with Masakhane in order to collect language identification annotations for African languages.

commoncrawl.org/blog/the-fir...

08.07.2025 16:21 - 👍 2    🔁 1    💬 0    📌 0
Common Crawl - Blog - Common Crawl at the United Nations Open Source Week, June 2025

The Common Crawl Foundation team took part in the United Nations Open Source Week in New York City this June, meeting with global developers, researchers, and policymakers to discuss all things open source and AI.

commoncrawl.org/blog/common-...

01.07.2025 00:12 - 👍 3    🔁 2    💬 0    📌 0

The deadline for paper submissions has been extended!

The new deadline is July 3, 2025 (AoE).

For more information, please visit: wmdqs.org

23.06.2025 14:23 - 👍 2    🔁 5    💬 0    📌 0
AI Alliance @ IBM One Madison (UN Open Source Week 2025) · Luma This year’s UN Open Source Week 2025 (June 16-20) will once again bring together a global “who is who” of Open Source leaders. As part of the official…

The Common Crawl Foundation, together with IBM, the AI Alliance, and BrightQuery, will be hosting an "UN Conference" at IBM's new flagship NYC HQ at One Madison Avenue on Friday, June 20, from 12:30-5pm.

If you are in NYC, it would be great to see you there!

lu.ma/p0a1scde

10.06.2025 21:54 - 👍 2    🔁 1    💬 0    📌 0
1st Workshop on Multilingual Data Quality Signals

Call for papers!
We are organising the 1st Workshop on Multilingual Data Quality Signals with @mlcommons.org and @eleutherai.bsky.social, held in tandem with @colmweb.org. Submit your research on multilingual data quality!

Submission deadline is 23 June, more info: wmdqs.org

29.05.2025 17:18 - 👍 9    🔁 8    💬 0    📌 1

I’ll be running the Paris Marathon this Sunday for cancer research and treatment 🏃🏻‍♂️

Please donate if you can! Every donation, no matter how small, helps immensely.

marathon-paris.dossards-solidaires.org/fundraisers/...

11.04.2025 22:17 - 👍 0    🔁 0    💬 0    📌 0

We would like to welcome all of our attending members to Oslo, with a special welcome to two of our newest members, the Publications Office of the European Union and @commoncrawl.bsky.social!

@nettarkivet.bsky.social | #iipcGA25 | #webarchiving

08.04.2025 08:58 - 👍 9    🔁 4    💬 0    📌 0
CHERICO®, chicory like you've never tasted it before! CHERICO, next-generation chicory: healthy, delicious, sustainable! In 2023, we set out on the wild adventure of becoming chicory micro-roasters, convinced that this drink has all the...

I don’t know if it is good (and I haven’t tried it), but I found this the other day: www.cherico.fr

10.03.2025 22:44 - 👍 1    🔁 0    💬 1    📌 0

The same is true for coffee: prices haven’t increased much in the last 60 years, but the cost of living for the producers has skyrocketed in recent years 😢

22.02.2025 14:54 - 👍 1    🔁 1    💬 1    📌 0

Today is "I love Free Software Day".

Thank you to the @commoncrawl.bsky.social Foundation for all their hard work. Onwards! @pjox.bsky.social - So great to meet in person.

14.02.2025 13:08 - 👍 11    🔁 4    💬 0    📌 0

I’ll be at the AI Action Summit in Paris today. If you’re attending and want to discuss @commoncrawl.bsky.social or open data, please DM me!

10.02.2025 09:22 - 👍 2    🔁 0    💬 0    📌 0

We're very happy to release cc-downloader, a new CLI tool to download Common Crawl data 📚🚀🧑‍💻

cc-downloader is still under active development, so if you find any issues or would like to submit a feature request, please visit its GitHub repository at github.com/commoncrawl/....

21.01.2025 23:57 - 👍 3    🔁 0    💬 0    📌 0
Common Crawl - Blog - Expanding the Language and Cultural Coverage of Common Crawl We aim to enhance linguistic diversity in our dataset by inviting community contributions of non-English URLs and collaborating with MLCommons on a Language Identification campaign.
11.12.2024 15:16 - 👍 6    🔁 2    💬 0    📌 1

If you care about open data or anything related to crawling, the Common Crawl Foundation @commoncrawl.bsky.social is now on Bluesky 📊📈📚🥳

19.11.2024 19:31 - 👍 20    🔁 5    💬 1    📌 0

😂 No worries, I do mostly Rust and Python these days 🦀

15.11.2024 23:19 - 👍 1    🔁 0    💬 0    📌 0
Report of Pedro's Berlin Marathon splits, first half time was 2:06:57 and second half was 2:44:01. Final time was 4:50:58.

Photo of Pedro holding his medal in front of the Brandenburg Gate.

Photo of Pedro's Bib number (26445) and his medal.

Ran the Berlin Marathon yesterday and, while it was not my best marathon and I was recovering from injury, I had an amazing time. I really hope I can do better next year in Paris, where I'll run for cancer research. If you can donate, please do so: marathon-paris.dossards-solidaires.org/fundraisers/...

30.09.2024 17:44 - 👍 4    🔁 0    💬 1    📌 0

If you can and want to give a donation to the Gustave Roussy Institute, however small, I'd be extremely grateful. If you cannot donate, resharing/boosting is always appreciated! Thank you! ❤️

30.08.2024 20:21 - 👍 0    🔁 0    💬 0    📌 0
Marathon for Gustave Roussy Institute Dear friends and family, I have decided to run the 2025 Paris Marathon, but this time I have chosen to run for the Gustave Roussy Institute, the leading cancer centre in Europe...

I decided to run the 2025 Paris Marathon for the Gustave Roussy Institute, the leading cancer centre in Europe. This is a cause close to my heart, as cancer has touched my family, my friends and colleagues:

marathon-paris.dossards-solidaires.org/fundraisers/...

30.08.2024 20:20 - 👍 2    🔁 1    💬 1    📌 1
Pedro’s medal and bib (number 77156) for the Paris Marathon.

Pedro’s splits for the Paris Marathon and final time of 4:47:33. Full readable results should be available at https://resultscui.active.com/participants/45607218

Ran the Paris Marathon yesterday. It was an amazing experience. Getting into running was probably the best decision I’ve made recently. It has helped massively with both physical and mental health. I highly recommend any type of physical activity, especially for researchers 🏃🏻‍♂️

08.04.2024 18:40 - 👍 5    🔁 0    💬 0    📌 0

Thank you! 😄

25.09.2023 07:13 - 👍 2    🔁 0    💬 0    📌 0
Pedro at the Brandenburg Gate after finishing the Berlin Marathon.

Pedro’s medal for finishing the Berlin Marathon, time reads 5:03:04

I still don’t know how, but I finished my first marathon in 5:03:04 🥹

24.09.2023 18:00 - 👍 5    🔁 0    💬 2    📌 0

Very happy to announce this new release of @oscarproject.bsky.social 🥳. We're still working on documentation, so please be patient; more details and features are coming soon! 👀

We're always open to feedback and collaboration, so please join our community: https://t.co/toLKAPje4E

10.08.2023 15:52 - 👍 1    🔁 1    💬 0    📌 0

🥺

26.07.2023 20:08 - 👍 1    🔁 0    💬 1    📌 0

It should be criminal to have a public space this big without any place to hydrate, especially with the temperatures we’re experiencing these days 😒

16.07.2023 15:53 - 👍 1    🔁 0    💬 0    📌 0
21.6 km run at 7:09 /km pace in Rosensteinpark.

Ran half a marathon today and had to stop to buy water as I was feeling dehydrated and had already finished all my water. Had to pay 5.50 € for that bottle of water just because someone had the amazing idea of building a massive park without any water fountains 😠🥵🏃🏻‍♂️

16.07.2023 15:52 - 👍 10    🔁 0    💬 1    📌 0
