Tommaso Bonomo

@tommasobonomo.bsky.social

PhD student at Sapienza NLP, based in Rome 🇮🇹

136 Followers  |  226 Following  |  7 Posts  |  Joined: 11.11.2024  |  1.9991

Latest posts by tommasobonomo.bsky.social on Bluesky

We hope our contribution can become a useful benchmark for the challenging setting of Coreference Resolution on books! We release our code and dataset:

GitHub: github.com/sapienzaNLP/bookcoref
HuggingFace: huggingface.co/datasets/sapienzanlp/bookcoref

Thank you for your attention! (6/6)

21.07.2025 14:24 | 👍 0 🔁 1 💬 0 📌 0

Longdoc is the best performing model in both off-the-shelf and fine-tuned settings. Although BOOKCOREF Silver enables fine-tuned models to achieve a better score, 67 CoNLL F1 points are still far from current SOTA scores on small- or medium-sized datasets. (5/6)

21.07.2025 14:24 | 👍 0 🔁 0 💬 1 📌 0

Our pipeline achieves 80.5 CoNLL F1 on our manually annotated split, BOOKCOREF Gold, warranting the use of BOOKCOREF Silver as a fine-tuning dataset.

We test current SOTA Coreference Resolution systems on our benchmark, both off-the-shelf and after fine-tuning. (4/6)

21.07.2025 14:24 | 👍 0 🔁 0 💬 1 📌 0

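
The CoNLL F1 mentioned in the posts above is conventionally the unweighted mean of three coreference metrics: MUC, B-cubed and CEAF-e F1. A minimal sketch of that averaging follows; the function names and the per-metric numbers are illustrative assumptions, not values from the paper.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def conll_f1(muc_f1: float, b_cubed_f1: float, ceaf_e_f1: float) -> float:
    """CoNLL score: unweighted mean of the MUC, B-cubed and CEAF-e F1 scores."""
    return (muc_f1 + b_cubed_f1 + ceaf_e_f1) / 3


# Hypothetical per-metric scores, for illustration only (not from the paper).
example_score = conll_f1(muc_f1=0.90, b_cubed_f1=0.75, ceaf_e_f1=0.60)
print(f"CoNLL F1: {example_score:.2f}")
```

In practice the three component scores come from a coreference scorer run on predicted and gold clusters; only the final averaging step is shown here.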
We manually annotate 3 books for our test set and develop a 3-step pipeline to auto-annotate long books from text and character lists. We initialize coreferential clusters via explicit character mentions, refine them with an LLM, and expand using a local coreference model. (3/6)

21.07.2025 14:24 | 👍 0 🔁 0 💬 1 📌 0

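
The first pipeline step in the post above, initializing clusters from explicit character mentions, can be sketched roughly as follows. The post does not say how the matching is done, so the exact-match regex, the `characters` input format and the function name `init_clusters` are all assumptions made for illustration. The LLM refinement and local coreference expansion steps are not covered.

```python
import re
from collections import defaultdict


def init_clusters(text: str, characters: dict[str, list[str]]) -> dict[str, list[tuple[int, int]]]:
    """Group exact name matches into one cluster per character.

    `characters` maps a character id to its known names or aliases
    (a hypothetical format). Returns a mapping from character id to a
    sorted list of (start, end) character offsets in `text`.
    Overlapping aliases (e.g. a first name inside a full name) are not
    deduplicated in this sketch.
    """
    clusters: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for char_id, names in characters.items():
        for name in names:
            # Whole-word, case-sensitive match on the explicit name.
            pattern = r"\b" + re.escape(name) + r"\b"
            for match in re.finditer(pattern, text):
                clusters[char_id].append((match.start(), match.end()))
        clusters[char_id].sort()
    return dict(clusters)


# Toy input, for illustration only.
text = "Elizabeth met Darcy. Lizzy smiled at Mr. Darcy."
characters = {"elizabeth": ["Elizabeth", "Lizzy"], "darcy": ["Darcy"]}
clusters = init_clusters(text, characters)
mentions = {cid: [text[s:e] for s, e in spans] for cid, spans in clusters.items()}
print(mentions)
```

Pronouns and nominal mentions ("she", "the gentleman") are deliberately left out here, since the post assigns those to the later refinement and expansion steps.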
Coreference Resolution datasets currently focus on small- to medium-sized text, preventing the development of robust long-document Coreference Resolution systems. We introduce BOOKCOREF, a large-scale dataset obtained through our BOOKCOREF Pipeline to fill this gap. (2/6)

21.07.2025 14:24 | 👍 1 🔁 0 💬 1 📌 0

🚨 Our paper "BOOKCOREF: Coreference Resolution at Book Scale" was accepted at #ACL2025 main conference!

Kudos to all my co-authors: @giulianomartinelli.bsky.social, @perelluis13.bsky.social and Roberto Navigli, within the Sapienza NLP group.

Paper: arxiv.org/abs/2507.12075

🧵 A brief thread: (1/6)

21.07.2025 14:24 | 👍 3 🔁 1 💬 1 📌 0

Hi, would also love to be added if there's still space! Thank you!

25.11.2024 07:47 | 👍 0 🔁 0 💬 0 📌 0
