As far as I can tell, the main point of this new NIH policy is that there will no longer be an explicit payline in any of the institutes. It's hard to see how this new policy makes the decision-making process "clearer for applicants."
grants.nih.gov/news-events/...
Cascadia from @wnoble.bsky.social is a mass spec-based de novo sequencing model that uses a transformer architecture to handle data-independent acquisition data and achieves substantially improved performance across a range of instruments and experimental protocols. www.nature.com/articles/s41...
We’re excited to announce the publication of Cascadia, our new de novo sequencing model designed for DIA data. By extending the transformer architecture to fully capture the complexities of DIA data, we achieve state-of-the-art performance.
www.nature.com/articles/s41...
Excited to see this published! It is a good step in the process for people to assess their FDR control in proteomics experiments. Great work from @bo-wen.bsky.social and @urikeich.bsky.social in particular who drove this.
Error control in proteomics mass spectrometry analysis is hard. We came up with a way to evaluate error control. Upshot: for old-school DDA data, not so bad. For DIA data, no existing tool successfully controls the false discovery rate!
www.nature.com/articles/s41...
Interested in prediction tasks involving peptide mass spectra? Our foundation model uses pre-trained spectrum representations learned by a de novo sequencing model to solve many tasks better and with less data, from recognizing chimeras to separating N- and O-glycopeptides. arxiv.org/abs/2505.10848
Ledidi turns any genomics ML model into a controllable sequence designer by inverting the normal ML paradigm. Now, it is significantly faster, more flexible, and more powerful than before.
Available on GitHub and installable with `pip install ledidi`
HiCFoundation is a Swiss army knife for Hi-C data. Any task that takes Hi-C as input will benefit from our pre-trained model. You can do resolution enhancement, reproducibility analysis, loop calling, prediction of epigenomic profiles, or single-cell Hi-C analysis. tinyurl.com/v3nmp6np
BLAST is a fantastic tool that has enabled sequence-driven discovery for over 30 years. But, alas, the E-value that it reports turns out to have some serious problems. Here we propose a fix. It's more computationally expensive, but computers are a bit faster than they were in 1990.
bit.ly/3ZDgYt8
Re-posting our new preprint on match between runs. This multi-lab effort (Keich, Noble, Payne & Smith) led by Alex Solivais should be of interest to anyone doing LFQ. We describe here how to control FDR in LFQ and provide the open source software to do it.
www.biorxiv.org/content/10.1...
Along the way, we show that existing methods (IonQuant, MaxQuant, and the old version of FlashLFQ) fail to control the FDR.
The PIP-ECHO method is implemented in the new version of FlashLFQ:
github.com/smith-chem-w...
We make two contributions in this paper: first, a method for ascertaining whether a given technique for peptide identity propagation successfully controls the FDR, and second, a new method, PIP-ECHO, that successfully does this.
When you analyze multiple protein samples in a single MS/MS experiment, it's common to do peptide identity propagation, rescuing peptides that fail to be identified in one run by mapping their coordinates (in time and m/z) from a different run in which those peptides were successfully identified.
How can you transfer peptide IDs between runs and still control your false discovery rate? Until now, the short answer was that you couldn't. Now you can, with PIP-ECHO.
www.biorxiv.org/content/10.1...
Here is the back story behind our recent de novo sequencing benchmark. Science involves a lot of trial and error!
communities.springernature.com/posts/wrangl...
Lots of people use machine learning to post-process mass spectrometry database search results. But why not just use ML as the score function in database search? Turns out it works great! www.biorxiv.org/content/10.1...
The best place to do computational biology.
jobs.chronicle.com/job/37553144...
Er, let me get back to you on that. 😉
People don’t spend enough time looking at the trans contacts in their Hi-C data. There’s gold in them thar hills! www.biorxiv.org/content/10.1...