Alex Solivais

@alexandersol.bsky.social

PhD Candidate at UW-Madison. Interested in proteomics, software development, and disc golf

83 Followers  |  202 Following  |  12 Posts  |  Joined: 04.12.2024

Posts by Alex Solivais (@alexandersol.bsky.social)

GitHub - smith-chem-wisc/MetaMorpheus: Proteomics search software with integrated calibration, PTM discovery, bottom-up, top-down and LFQ capabilities

Free copies of MetaMorpheus available for an unlimited time. Get yours now before they are all gone!

github.com/smith-chem-w...

20.03.2025 19:45 | 👍 25  🔁 9  💬 2  📌 0

Did you ever consider what effect in-source oxidation might have on the mass-shifted decoys? For z = 2, an 8*1.0005 Th mass shift is about the same as the m/z shift from an oxidation (within about 10–20 ppm).

06.12.2024 20:15 | 👍 2  🔁 0  💬 1  📌 0
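You can check the arithmetic in this post by comparing the 8*1.0005 Th offset with the m/z shift of one oxidation on a 2+ precursor. One oxidation adds +15.9949 Da, the monoisotopic mass of oxygen. The Python sketch below picks a few illustrative precursor m/z values and reports the gap in ppm of the precursor m/z. It is a back-of-the-envelope check, not part of any published method.

```python
OXIDATION_DA = 15.994915  # monoisotopic mass added by one oxygen atom (Da)
SPACING_TH = 1.0005       # offset unit quoted in the post (Th)


def ppm_gap(precursor_mz: float, n: int = 8, z: int = 2) -> float:
    """Gap between an n * 1.0005 Th offset and one oxidation at charge z, in ppm."""
    decoy_shift = n * SPACING_TH        # offset already expressed in m/z
    oxidation_shift = OXIDATION_DA / z  # +15.9949 Da as seen at charge z
    return abs(decoy_shift - oxidation_shift) / precursor_mz * 1e6


# Illustrative precursor m/z values (assumed, not from any dataset).
for mz in (400.0, 600.0, 800.0):
    print(f"precursor {mz:.0f} Th: gap {ppm_gap(mz):.1f} ppm")
```

With these numbers the gap is about 16 ppm at 400 Th, 11 ppm at 600 Th and 8 ppm at 800 Th. That fits the 10–20 ppm range quoted above for typical peptide precursors. A tight precursor tolerance would therefore not fully separate oxidized targets from decoys at this offset.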

Very cool figure! I understand that N*1.0005 mass shifts are used due to the distribution of possible peptide masses.

06.12.2024 20:06 | 👍 1  🔁 0  💬 1  📌 0
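Why a multiple of ~1.0005? Unmodified peptide masses tend to fall in clusters spaced roughly 1.0005 Da apart. The Python sketch below measures how far a mass sits from the nearest multiple of 1.0005. It assumes standard monoisotopic residue masses and uses the textbook sequence PEPTIDE. `peptide_mass` and `cluster_offset` are hypothetical helpers written for illustration.

```python
SPACING = 1.0005  # approximate spacing of peptide mass clusters (Da)

# Standard monoisotopic residue masses (Da) for the residues used below.
RESIDUE = {
    "P": 97.05276, "E": 129.04259, "T": 101.04768,
    "I": 113.08406, "D": 115.02694,
}
WATER = 18.01056  # added once per peptide for the termini


def peptide_mass(seq: str) -> float:
    """Monoisotopic mass of an unmodified peptide built from RESIDUE."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def cluster_offset(mass: float, spacing: float = SPACING) -> float:
    """Signed distance from the nearest multiple of `spacing`."""
    return mass - spacing * round(mass / spacing)


m = peptide_mass("PEPTIDE")
print(f"PEPTIDE: {m:.4f} Da, offset {cluster_offset(m):+.4f}")
for n in (1, 5, 8):
    # A whole number of spacings keeps the offset unchanged.
    print(n, f"{cluster_offset(m + n * SPACING):+.4f}")
# A half-spacing shift lands between clusters instead.
print("0.5", f"{cluster_offset(m + 0.5 * SPACING):+.4f}")
```

Shifting by any whole number of spacings leaves the offset unchanged. A half-spacing shift moves the mass into the sparse gap between clusters, where few real features would be found.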

We only need to assume that an incorrect transfer for a given donor peak is equally likely to involve the predicted RT as it is to involve the random RT.

06.12.2024 20:02 | 👍 2  🔁 0  💬 0  📌 0
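This equal-chance assumption is what makes target-decoy competition work. The Python sketch below shows a generic version. It assumes each donor peak yields one target score (best peak near the predicted RT) and one decoy score (best peak near a random RT), and only the winner of each pair is kept. Decoy wins above a threshold then estimate the false target wins above it. The scores, the tie-breaking rule and the +1 correction are illustrative choices, not necessarily how PIP-ECHO implements its competition.

```python
import math
from typing import List, Tuple


def tdc_threshold(pairs: List[Tuple[float, float]], alpha: float = 0.01) -> float:
    """Target-decoy competition on (target_score, decoy_score) pairs.

    Returns the lowest target score whose estimated FDR is <= alpha,
    or math.inf if no threshold qualifies.
    """
    # Keep only the winner of each pair; ties go to the target in this sketch.
    winners = [(d, True) if d > t else (t, False) for t, d in pairs]
    winners.sort(key=lambda w: w[0], reverse=True)

    threshold = math.inf
    targets = decoys = 0
    for score, is_decoy in winners:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
            # Estimated FDR among target winners at or above this score,
            # using the "+1" finite-sample correction.
            if (decoys + 1) / targets <= alpha:
                threshold = score
    return threshold


# Hypothetical scores: 200 clear target wins and 3 low-scoring decoy wins.
pairs = [(10.0 + i, 0.0) for i in range(200)] + [(0.0, 5.0)] * 3
print(tdc_threshold(pairs))
```

Accepted transfers would be the target winners scoring at or above the returned threshold. None of this requires every random-RT match to be wrong. It only requires that a wrong match is equally likely to land on either side of the competition.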

That's a good point! There's a chance (I would argue a small chance) that you could randomly choose an RT and end up with an accurate match.

However, we don't rely on the assumption that every single random-RT peptide is an incorrect match.

06.12.2024 20:01 | 👍 2  🔁 0  💬 2  📌 0

The and Kall use a 5*1.0005 Th shift because they expect there to be peptides at the shifted m/z.

"The idea behind this offset is that the density, w.r.t. precursor m/z and retention time, of MS1 features is approximately the same in this offset region."

06.12.2024 19:23 | 👍 1  🔁 0  💬 0  📌 0

A match is false if the two peaks were generated by different analytes. We can't really "guarantee" that a match is false. We only assume that an incorrect match for a given donor peak is equally likely to involve the predicted RT as it is to involve the random RT.

06.12.2024 19:15 | 👍 1  🔁 0  💬 0  📌 0

I don't think those assumptions are necessary (or true). When we investigated potential features, we found that XIC shape doesn't have a lot of predictive power. Most peptides have approximately Gaussian peaks. Likewise, the isotopic distributions of peptides with similar masses don't really differ.

06.12.2024 19:12 | 👍 2  🔁 0  💬 1  📌 0

"In this case 11 Da was chosen as a randomly selected integer value which differs from any known common post-translational modification. Indeed the number of matches does not vary significantly as long as the mass shift value stays an integer" - Petyuk et al., 2007

06.12.2024 19:04 | 👍 2  🔁 0  💬 0  📌 0

In the original work from PNNL, an 11 Da shift was used as this mass difference doesn't correspond to any common PTMs or tags. However, if you look at the human proteome, there are a lot of peptides that are 11 Da away from one another.

06.12.2024 19:03 | 👍 1  🔁 0  💬 3  📌 0
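You can check whether many peptide pairs sit 11 Da apart by counting mass pairs at that offset. The Python sketch below assumes you already have a list of monoisotopic peptide masses, for example from an in-silico digest (not shown). The 10 ppm tolerance is illustrative, and `count_pairs_at_offset` is a hypothetical helper.

```python
import bisect
from typing import List


def count_pairs_at_offset(masses: List[float], offset: float = 11.0,
                          tol_ppm: float = 10.0) -> int:
    """Count ordered pairs (a, b) where b lies within tol_ppm of a + offset."""
    ms = sorted(masses)
    count = 0
    for a in ms:
        target = a + offset
        tol = target * tol_ppm * 1e-6  # tolerance scales with mass, like a precursor window
        lo = bisect.bisect_left(ms, target - tol)
        hi = bisect.bisect_right(ms, target + tol)
        count += hi - lo
    return count


# Hypothetical masses, not drawn from the human proteome.
print(count_pairs_at_offset([1000.0, 1011.0, 1011.005, 1500.0]))
```

Run on a full digest, a large count would mean an 11 Da decoy window frequently lands on real peptides. That undermines the idea that the shifted region contains only wrong matches.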

We would love to look at more single-cell datasets, but our method for FDP estimation only works with specialized "two-proteome" experiments. It would be great if our method catches on and more of these datasets are generated!

04.12.2024 23:17 | 👍 2  🔁 0  💬 0  📌 0
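A two-proteome design helps because it produces transfers that are known to be wrong. The Python sketch below assumes a hypothetical layout in which each run's species composition is known, for example some runs containing a second proteome and others not. Any transfer of a peptide from a species absent in the acceptor run is counted as false. This yields only a lower bound on the FDP, since errors among plausible transfers go uncounted. The estimator in the preprint likely corrects for this and differs in detail. `Transfer` and `observed_fdp` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass
class Transfer:
    """One propagated identification (hypothetical record layout)."""
    peptide_species: str  # proteome the peptide belongs to, e.g. "yeast"
    acceptor_run: str     # run the identification was transferred into


def observed_fdp(transfers: Iterable[Transfer],
                 species_in_run: Dict[str, Set[str]]) -> float:
    """Fraction of transfers whose peptide species is absent from the acceptor run.

    Such transfers cannot be correct, so this counts only detectable errors
    and is a lower bound on the true false discovery proportion.
    """
    total = impossible = 0
    for t in transfers:
        total += 1
        if t.peptide_species not in species_in_run[t.acceptor_run]:
            impossible += 1
    return impossible / total if total else 0.0


runs = {"mixed_1": {"human", "yeast"}, "human_only_1": {"human"}}
transfers = [
    Transfer("human", "human_only_1"),
    Transfer("yeast", "human_only_1"),  # impossible: no yeast in this run
    Transfer("yeast", "mixed_1"),
    Transfer("human", "mixed_1"),
]
print(observed_fdp(transfers, runs))
```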

The randomized RT we use when trying to locate a decoy peak has to be different from the RT of the target donor peptide. This ensures we don't select the target peak twice. Then, at the end of the peak-matching procedure, we go through all peaks and make sure none were assigned to multiple peptides.

04.12.2024 23:15 | 👍 3  🔁 0  💬 0  📌 0
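The decoy-RT procedure in this post has two parts. First, draw a random RT that avoids the target donor peptide's RT. Second, make a final pass that flags any peak claimed by more than one peptide. In the Python sketch below, the exclusion window width, the uniform sampling and the integer peak IDs are assumptions made for illustration. The post does not say how conflicting assignments are resolved, so the sketch only reports them.

```python
import random
from typing import Dict, List, Optional


def random_decoy_rt(target_rt: float, rt_min: float, rt_max: float,
                    exclusion: float, rng: random.Random) -> float:
    """Draw an RT uniformly from [rt_min, rt_max], outside target_rt +/- exclusion."""
    # Window around the target RT that the decoy RT must avoid.
    lo = max(rt_min, target_rt - exclusion)
    hi = min(rt_max, target_rt + exclusion)
    left = lo - rt_min   # allowed length below the window
    right = rt_max - hi  # allowed length above the window
    if left + right <= 0:
        raise ValueError("exclusion window covers the whole RT range")
    # One uniform draw over the allowed length, mapped back onto the RT axis.
    u = rng.uniform(0.0, left + right)
    return rt_min + u if u < left else hi + (u - left)


def find_shared_peaks(assignments: Dict[str, Optional[int]]) -> Dict[int, List[str]]:
    """Return peak IDs (hypothetical) that were assigned to more than one peptide."""
    by_peak: Dict[int, List[str]] = {}
    for peptide, peak_id in assignments.items():
        if peak_id is not None:
            by_peak.setdefault(peak_id, []).append(peptide)
    return {pk: peps for pk, peps in by_peak.items() if len(peps) > 1}


rng = random.Random(0)
print(random_decoy_rt(30.0, 0.0, 60.0, 2.0, rng))  # some RT outside 28-32 min
print(find_shared_peaks({"PEP_A": 1, "PEP_B": 1, "PEP_C": 2}))
```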
Improved detection of differentially abundant proteins through FDR-control of peptide-identity-propagation
Quantitative analysis of proteomics data frequently employs peptide-identity-propagation (PIP), also known as match-between-runs (MBR), to increase the number of peptides quantified in a given LC-MS/MS experiment. PIP can routinely account for up to 40% of all quantitative results, with that proportion rising as high as 75% in single-cell proteomics. Therefore, a significant concern for any PIP method is the possibility of false discoveries: errors that result in peptides being quantified incorrectly. Although several tools for label-free quantification (LFQ) claim to control the false discovery rate (FDR) of PIP, these claims cannot be validated as there is currently no accepted method to assess the accuracy of the stated FDR. We present a method for FDR control of PIP, called “PIP-ECHO” (PIP Error Control via Hybrid cOmpetition) and devise a rigorous protocol for evaluating FDR control of any PIP method. Using three different datasets, we evaluate PIP-ECHO alongside the PIP procedures implemented by FlashLFQ, IonQuant, and MaxQuant. These analyses show that PIP-ECHO can accurately control the FDR of PIP at 1% across multiple datasets. Only PIP-ECHO was able to control the FDR in data with injected sample size equivalent to a single-cell dataset. The three other methods fail to control the FDR at 1%, yielding false discovery proportions ranging from 2–6%. We demonstrate the practical implications of this work by performing differential expression analyses on spike-in datasets, where different known amounts of yeast or E. coli peptides are added to a constant background of HeLa cell lysate peptides. In this setting, PIP-ECHO increases both the accuracy and sensitivity of differential expression analysis: our implementation of PIP-ECHO within FlashLFQ enables the detection of 53% more differentially abundant proteins than MaxQuant and 146% more than IonQuant in the spike-in dataset.
Competing Interest Statement: The authors have declared no competing interest.

Re-posting our new preprint on match between runs. This multi-lab effort (Keich, Noble, Payne & Smith) led by Alex Solivais should be of interest to anyone doing LFQ. We describe here how to control FDR in LFQ and provide the open source software to do it.
www.biorxiv.org/content/10.1...

02.12.2024 17:05 | 👍 31  🔁 15  💬 2  📌 2

It's easier to find decoys because RT is a more continuous variable. We consider one continuous RT range, instead of looking at discrete points in m/z space and iterating until we find something.

04.12.2024 22:07 | 👍 3  🔁 0  💬 1  📌 0
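A decoy search over discrete m/z offsets has to probe candidate positions one at a time until it finds a feature. A continuous RT range, by contrast, can be sampled with a single draw. The Python sketch below shows such a discrete search under assumed parameters: offsets of 5 to 10 times 1.0005 Th and a 10 ppm tolerance. It ignores RT entirely and is not the implementation of any specific tool. `find_mz_decoy` is a hypothetical name.

```python
import bisect
from typing import Optional, Sequence, Tuple

SPACING_TH = 1.0005  # offset unit discussed in the thread (Th)


def find_mz_decoy(target_mz: float, feature_mzs: Sequence[float],
                  offsets: Sequence[int] = (5, 6, 7, 8, 9, 10),
                  tol_ppm: float = 10.0) -> Optional[Tuple[int, float]]:
    """Try n * 1.0005 Th offsets in turn; return the first (n, feature m/z) hit.

    `feature_mzs` must be sorted ascending. Returns None when no offset
    lands on a feature, which is what makes the discrete search iterative
    and sometimes fruitless.
    """
    for n in offsets:
        probe = target_mz + n * SPACING_TH
        tol = probe * tol_ppm * 1e-6
        i = bisect.bisect_left(feature_mzs, probe - tol)
        if i < len(feature_mzs) and feature_mzs[i] <= probe + tol:
            return n, feature_mzs[i]
    return None


# Hypothetical feature list: nothing at +5 or +6 spacings, one feature at +7.
print(find_mz_decoy(500.0, [507.0035, 650.0]))
```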