13/12: Co-author update! @nanonancy.bsky.social was also instrumental in helping make sure the summaries were up to scratch! Thanks Nancy!
07.02.2025 14:59
@rnacentral.bsky.social
The non-coding RNA sequence database providing unified access to a comprehensive and up-to-date set of non-coding RNAs
12/12 Big thanks to our co-authors @afg781.bsky.social, @antonipetrov.bsky.social, @alexbateman1.bsky.social and others! Read the full paper here: doi.org/10.1093/data... #bioinformatics #LLM #AI
07.02.2025 14:38
11/12 But overall, this shows that with careful prompting and checking, LLMs can help address the curation bottleneck in bioinformatics! 🎯
07.02.2025 14:38
10/12 Some limitations: We can only use open-access papers (highlighting the importance of #OpenAccess!), and LLMs sometimes struggle with complex information synthesis.
07.02.2025 14:38
9/12 We've also made our entire dataset of contexts and summaries available:
huggingface.co/datasets/RNA...
8/12 Want to try it yourself? Search for RNAs with summaries at:
rnacentral.org/search?q=has...
7/12 All summaries are now available through @rnacentral.bsky.social - making it easier than ever to quickly understand what we know about specific RNAs
07.02.2025 14:38
6/12 The results? We generated >4,600 summaries covering ~28,700 RNA transcripts! Expert evaluation showed 94% were rated good or excellent quality. 📈
07.02.2025 14:38
5/12 The key innovation is our multi-stage checking system:
Reference validation
Automated fact-checking
Self-consistency verification
This helps ensure accuracy and proper attribution.
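As a rough illustration of the first stage only, reference validation might look something like the sketch below. This is not the LitSumm implementation: the bracketed `[PMC...]` citation format, the function names, and the example IDs are all assumptions made up for the example.

```python
import re

# Assumed citation format: bracketed PubMed Central IDs such as [PMC1234567].
# The thread does not specify the format; this pattern is illustrative only.
CITATION_PATTERN = re.compile(r"\[(PMC\d+)\]")


def find_unsupported_citations(summary: str, context_ids: set[str]) -> list[str]:
    """Return cited IDs that are not among the source papers, in order of first use."""
    unsupported = []
    for cited in CITATION_PATTERN.findall(summary):
        if cited not in context_ids and cited not in unsupported:
            unsupported.append(cited)
    return unsupported


def validate_references(summary: str, context_ids: set[str]) -> bool:
    """Pass only if the summary cites at least one source and every citation is known."""
    cited = CITATION_PATTERN.findall(summary)
    return bool(cited) and not find_unsupported_citations(summary, context_ids)


if __name__ == "__main__":
    # Hypothetical summary text and source IDs, invented for this example.
    sources = {"PMC111"}
    draft = "RNA X regulates gene Y [PMC111]. It also binds protein Z [PMC999]."
    print(find_unsupported_citations(draft, sources))
    print(validate_references(draft, sources))
```

A summary that fails a check like this could be sent back for regeneration rather than published, which is one plausible way such a check supports proper attribution.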
4/12 Our solution: Use GPT-4 with carefully designed prompts to read scientific papers and generate accurate summaries, complete with proper citations! 🤖
07.02.2025 14:38
3/12 We focused on non-coding RNAs, where the curation gap is particularly acute. Most databases lack good summaries of what each RNA does, making it harder for researchers to quickly understand their function.
07.02.2025 14:38
2/12 Why did we build this? Curation of scientific literature is becoming increasingly challenging. There's a growing gap between publication rates and the number of available curators.
07.02.2025 14:38
1/12 Excited to share our new paper in DATABASE on LitSumm - our system that uses large language models to automatically generate high-quality literature summaries for non-coding RNAs! 🧬📚
07.02.2025 14:38