Luca

@sciencialab.com.bsky.social

29 Followers  |  119 Following  |  21 Posts  |  Joined: 17.10.2024  |  2.8979

Latest posts by sciencialab.com on Bluesky

The Safari browser is like a car with one gear that claims it does not pollute...

24.08.2025 14:11 | 👍 0    🔁 0    💬 0    📌 0

Exactly! There is a common misconception that throwing any kind of crap into a vector will magically work. Even in the age of AI, metadata still cannot be ignored.

28.07.2025 07:25 | 👍 0    🔁 0    💬 0    📌 0

Yes. The time is now. Vaccines to treat and prevent cancer.
www.jci.org/articles/vie...

01.07.2025 17:57 | 👍 328    🔁 81    💬 4    📌 2

Your feedback will help us improve Grobid! 🌟 Feel free to share your thoughts, star us on GitHub, and let's keep building! 💬🚀

18.05.2025 08:27 | 👍 0    🔁 0    💬 0    📌 0

Next up, we're focusing on supporting more platforms (Linux ARM), improving figures and tables extraction, enhancing CJK language support, and providing better handling for more document types like theses, reports, and more.
🔽

18.05.2025 08:27 | 👍 0    🔁 0    💬 1    📌 0
Release 0.8.2 · kermitt2/grobid What's Changed Added New model specialization/variants (flavors) mechanism #1151 Specialization/variant process for a lightweight processing that covers other types of scientific articles that...

- 🔤 Improved recognition of non-standard fonts
- 🛠️ Various bug fixes and security vulnerabilities addressed

github.com/kermitt2/...
🔽

18.05.2025 08:27 | 👍 1    🔁 0    💬 1    📌 0

Grobid 0.8.2 is out! 🚀
- 🧠 New processing "flavors" for different doc types (e.g. SDO, corrections, editorials)
- 🔗 Improved URL extraction
- ✅ Better text extraction for paragraphs around figures and tables
🧵🔽

18.05.2025 08:27 | 👍 0    🔁 0    💬 1    📌 0

I estimate that a few examples for each model would quickly improve the results to an acceptable level.
Feel free to reach out if you are interested, and we can work out a collaboration around it.

18.05.2025 04:58 | 👍 0    🔁 0    💬 0    📌 0

I'm not sure Grobid is used in any project targeting any of the CJK languages, as other details might need to be addressed.
We started a low-priority branch (github.com/kermitt2/gro...) to improve all CJK languages at once, but other, more urgent issues were prioritized at the time.
👇

18.05.2025 04:58 | 👍 0    🔁 0    💬 1    📌 0

It's interesting to see this analysis. However, to be fair, Grobid does not have any training data for Japanese. The same applies to Chinese, Korean, etc.
👇

18.05.2025 04:58 | 👍 0    🔁 0    💬 1    📌 0

Dear @github, I wonder whether it would be possible to save certain "search parameters" inside the issues/pulls so that our work can stay focused on important tasks. E.g. working on a specific milestone and wanting to know everything that is not yet done:

11.05.2025 06:49 | 👍 1    🔁 0    💬 0    📌 0

new demo: kermitt2-grobid.hf.space

07.05.2025 19:30 | 👍 0    🔁 0    💬 0    📌 0
GitHub - kermitt2/grobid: A machine learning software for... A machine learning software for extracting information fr...

GROBID by Patrice Lopez turns messy PDFs into well-structured text in TEI format including references- super useful! https://github.com/kermitt2/grobid

18.12.2013 11:53 | 👍 0    🔁 1    💬 1    📌 0

To what extent do researchers funded by Dutch Research Council NWO and ZonMw share the research data and code underlying their publications?

Today we published an analysis based on 10.000+ papers using the open source tool Grobid: www.nwo.nl/en/news/shar...

All underlying data openly available!

10.02.2025 21:10 | 👍 32    🔁 12    💬 0    📌 1

Grobid's popularity is still growing, despite LM, LLM, LLLM....

06.05.2025 14:57 | 👍 0    🔁 0    💬 0    📌 0

Hi, I'm happy to sell my Twitter handle and close my Twitter account, as soon as it's legally allowed 🙂

21.01.2025 08:37 | 👍 1    🔁 0    💬 0    📌 0

Hallucinating AI? 🫣

09.12.2024 17:02 | 👍 0    🔁 0    💬 0    📌 0
uBlock Origin - Free, open-source ad content blocker. uBlock Origin is not just an “ad blocker”, it's a wide-spectrum content blocker with CPU and memory efficiency as a primary feature. Developed by Raymond Hill.

install uBlock Origin (ublockorigin.com/) or Ghostery (www.ghostery.com/), or both. They will improve your security and privacy overall.
2/2

22.11.2024 08:35 | 👍 1    🔁 0    💬 0    📌 0

Unsolicited suggestion: if you don't want advertisements on Twitter anymore, you can pay 200 EUR per year (100 EUR only reduces them by half, lol), or you can pay 0 EUR and
1/2

22.11.2024 08:35 | 👍 0    🔁 0    💬 2    📌 0

There are tools for posting to both social networks, for example fedica.com (a random one that seems well made) 🙃 If the content spreads here, without necessarily following all the replies, it is easier to push its growth

21.11.2024 18:48 | 👍 1    🔁 0    💬 0    📌 0
Superconductivity researcher who committed misconduct exits university Nature - The University of Rochester has confirmed that it no longer employs Ranga Dias, who was found by investigators to have committed data fabrication.

Hallelujah!! www.nature.com/artic...

21.11.2024 15:21 | 👍 0    🔁 0    💬 0    📌 0

📌

21.11.2024 08:15 | 👍 1    🔁 0    💬 0    📌 0
GitHub - JefTek/BlueskyGuide: Collection of Tips & Tricks for collaborating on Bluesky Collection of Tips & Tricks for collaborating on Bluesky - JefTek/BlueskyGuide

Here are a few tips for using Bluesky: github.com/JefTek/Blues...

21.11.2024 08:13 | 👍 0    🔁 0    💬 0    📌 0

Seeing how their research is going, I wouldn't worry too much ;-)

20.11.2024 14:24 | 👍 1    🔁 0    💬 0    📌 0

@sciencialab.com is following 20 prominent accounts