B/D-HPP: FragPipe Enables One-Stop Data Analysis for Bottom-Up Proteomics

29 Mar 2023 9:58 AM | Anonymous

Written by: Fengchao Yu and Daniel A. Polasky, University of Michigan, USA

Mass spectrometry-based proteomics is a widely used technique for the quantitative study of peptides and proteins. This approach has several advantages over other techniques, such as high throughput and sensitivity. However, data analysis is challenging because of the complexity of tandem MS data and the wide variety of experiments and workflows under the umbrella of proteomics. To address these challenges, various tools have been developed to process mass spectrometry data. Unfortunately, these tools typically focus on a single aspect of data analysis, such as peptide identification, protein identification, or quantification, so researchers must learn and “link” multiple tools to complete an analysis. This process can be time-consuming and often requires specialized knowledge, limiting the growth of proteomics in the broader research community. Complete, user-friendly software suites for proteomics are available commercially, but they are expensive and may be difficult to adapt to new methods and applications. FragPipe aims to strike a balance in this space: it is an open-source, freely available software suite with a graphical user interface (GUI) that provides a one-stop solution for processing proteomics data, from raw files to result tables.

FragPipe combines state-of-the-art tools from many areas of proteome informatics to process mass spectrometry data from identification to quantification. It uses MSFragger [1-3], a fast search engine, for database searching. It then uses MSBooster and Percolator [4] to rescore peptide-spectrum matches (PSMs) with deep learning-predicted features, and Philosopher [5] for false discovery rate (FDR) estimation. In addition, FragPipe contains several result-processing tools, such as PTM-Shepherd [6], which discovers post-translational modifications (PTMs) and characterizes their fragmentation. IonQuant [7] performs label-free and isotopic-labeling quantification, while TMT-Integrator performs isobaric-labeling quantification. Finally, two recent additions have expanded FragPipe into a complete proteomics pipeline: FP-PDV [8], a spectrum viewer for summarizing results and viewing annotated spectra, and FragPipe-Analyst, for downstream processing and comparison of experimental conditions.
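The target-decoy principle behind FDR estimation can be illustrated with a short sketch. It assumes each PSM carries a single score (higher is better) and a flag marking whether it matched a decoy sequence. The `PSM` class and both functions are hypothetical names for illustration and do not reproduce Philosopher's actual implementation. The FDR at a score threshold is estimated as the ratio of decoy to target hits at or above it, and each target's q-value is the smallest FDR at which it would be accepted.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    spectrum_id: str
    score: float    # higher is better (e.g., a rescored discriminant score)
    is_decoy: bool  # True if the match is to a decoy sequence


def target_decoy_qvalues(psms):
    """Return (psm, q_value) pairs for target PSMs, best score first."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for p in ranked:
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR if the score threshold were set at this PSM.
        fdrs.append(decoys / max(targets, 1))
    # q-value: the minimum FDR over this threshold and all lower-scoring ones.
    qvals = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvals[i] = running_min
    return [(p, q) for p, q in zip(ranked, qvals) if not p.is_decoy]


def accept_at_fdr(psms, threshold=0.01):
    """Target PSMs whose q-value is at or below the FDR threshold."""
    return [p for p, q in target_decoy_qvalues(psms) if q <= threshold]


if __name__ == "__main__":
    psms = [
        PSM("s1", 10.0, False), PSM("s2", 9.0, False), PSM("s3", 8.0, True),
        PSM("s4", 7.0, False), PSM("s5", 6.0, False), PSM("s6", 5.0, True),
    ]
    for p, q in target_decoy_qvalues(psms):
        print(p.spectrum_id, round(q, 3))
```

Ties in score are not treated specially here, and some implementations add one to the decoy count for a more conservative estimate.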

Most of the individual tools within FragPipe are under continuous development, offering cutting-edge capabilities with the convenience of a stable, user-friendly pipeline. For example, FragPipe hosts a set of advanced tools for glycoproteomics data analysis, including MSFragger Glyco search [9], glycan composition assignment and FDR control in PTM-Shepherd [10], and O-Pair for O-glycan localization [11]. Analyzing glycoproteomics data is challenging because both the peptide and the glycan components must be characterized from glycopeptide mass spectra. MSFragger Glyco search excels at rapidly identifying glycopeptides, building on the open search methods developed in MSFragger. FragPipe connects this capability to the advanced FDR control and quantification methods available in its other tools, creating a complete platform for glycoproteomics. FragPipe also makes it easy to integrate additional tools, such as the recently added O-Pair localization method, originally implemented in MetaMorpheus.
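In a mass-offset (open) search, the difference between the observed precursor mass and the unmodified peptide mass can be compared against candidate glycan masses. The sketch below enumerates glycan compositions from four common monosaccharides using their standard monoisotopic residue masses and returns those within an absolute tolerance of an observed mass shift. The function names, count limits, and 0.02 Da tolerance are illustrative assumptions; MSFragger Glyco and PTM-Shepherd additionally use fragment ion evidence and FDR control, which are not modeled here.

```python
from itertools import product

# Standard monoisotopic residue masses (Da) of common monosaccharides.
MONOSACCHARIDE_MASSES = {
    "HexNAc": 203.07937,
    "Hex": 162.05282,
    "Fuc": 146.05791,
    "NeuAc": 291.09542,
}


def enumerate_compositions(max_counts):
    """Yield (composition, mass) for every non-empty composition within max_counts."""
    names = list(max_counts)
    for counts in product(*(range(max_counts[n] + 1) for n in names)):
        if sum(counts) == 0:
            continue
        composition = dict(zip(names, counts))
        mass = sum(MONOSACCHARIDE_MASSES[n] * c for n, c in composition.items())
        yield composition, mass


def match_glycan(delta_mass, max_counts, tol_da=0.02):
    """Compositions whose mass is within tol_da of the observed mass shift, best first."""
    hits = []
    for composition, mass in enumerate_compositions(max_counts):
        error = delta_mass - mass
        if abs(error) <= tol_da:
            hits.append((composition, error))
    return sorted(hits, key=lambda hit: abs(hit[1]))


if __name__ == "__main__":
    limits = {"HexNAc": 4, "Hex": 6, "Fuc": 1, "NeuAc": 1}
    for composition, error in match_glycan(1216.4230, limits):
        print(composition, f"{error:+.4f} Da")
```

For example, a mass shift of 1216.4230 Da matches HexNAc(2)Hex(5), a high-mannose glycan, with an error below 0.001 Da.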

Unlike many tools, which are designed for either data-dependent acquisition (DDA) or data-independent acquisition (DIA) data, FragPipe can analyze both. To handle the complexity of DIA data, dedicated modules have been developed, including DIA-Umpire [12], which demultiplexes DIA spectra into pseudo-DDA spectra, and MSFragger-DIA, which searches the multiplexed spectra directly. The search results are then curated with deep learning-based scoring and FDR filtering to construct a spectral library, which is used to extract quantitative information from the DIA spectra. A key advantage of FragPipe is its ability to analyze DDA and DIA data together to build a hybrid spectral library. Because a hybrid library draws on both data types, it contains more peptides than a library built from either alone, yielding more quantified peptides and proteins in library-based quantification.
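The idea of a hybrid library can be sketched as a union of DDA- and DIA-derived libraries keyed by precursor (modified peptide and charge state), keeping the higher-scoring entry when both sources contain the same precursor. The `LibraryEntry` fields and the keep-best-score rule are simplifying assumptions for illustration, not FragPipe's actual library-building procedure, which typically also involves steps such as retention time alignment that are omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryEntry:
    peptide: str   # modified peptide sequence
    charge: int    # precursor charge state
    score: float   # hypothetical identification confidence; higher is better
    source: str    # "DDA" or "DIA"


def build_hybrid_library(dda_entries, dia_entries):
    """Merge two libraries, keeping the best-scoring entry for each precursor."""
    best = {}
    for entry in [*dda_entries, *dia_entries]:
        key = (entry.peptide, entry.charge)
        if key not in best or entry.score > best[key].score:
            best[key] = entry
    return best


if __name__ == "__main__":
    dda = [LibraryEntry("PEPTIDEK", 2, 0.90, "DDA"), LibraryEntry("ELVISK", 2, 0.80, "DDA")]
    dia = [LibraryEntry("PEPTIDEK", 2, 0.95, "DIA"), LibraryEntry("LIVESK", 3, 0.70, "DIA")]
    hybrid = build_hybrid_library(dda, dia)
    for (peptide, charge), entry in sorted(hybrid.items()):
        print(peptide, charge, entry.source)
```

With two precursors in each input library and one shared between them, the hybrid library holds three precursors, more than either source alone.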

Last but not least, FragPipe also has a command line interface that can be run on Linux servers, computing clusters, and other high-performance computing systems. The GUI and the command line interface share the same codebase, so they produce identical results. The workflow files used to run the command line interface, which also store settings in GUI mode, are saved automatically whenever FragPipe is run and can be loaded to exactly reproduce a previous analysis. They can also be shared among users to give others easy access to new methods, and deposited alongside data in public repositories to support reproducibility.
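Because workflow files capture the settings of an analysis, comparing two of them is a quick way to see why two runs differ. The sketch below assumes a plain-text format of `key=value` lines with `#` comments (similar to Java properties files); the parsing is simplified, and the keys in the usage example are hypothetical placeholders rather than actual FragPipe parameter names.

```python
def parse_workflow(text):
    """Parse simplified key=value lines, skipping blanks and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without '='
            settings[key.strip()] = value.strip()
    return settings


def diff_workflows(a, b):
    """Map each differing key to (value in a, value in b); None means absent."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    # Hypothetical parameter names, for illustration only.
    run_a = parse_workflow("# run A\nsearch.precursor_tol_ppm=20\nquant.mbr=true\n")
    run_b = parse_workflow("search.precursor_tol_ppm=10\nquant.mbr=true\nfdr.psm=0.01\n")
    print(diff_workflows(run_a, run_b))
```

Real properties files also allow `:` separators, escapes, and line continuations, which this simplified parser ignores.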

In summary, FragPipe is an all-in-one software tool that streamlines the entire process of mass spectrometry-based proteomics data analysis, from identification to quantification. Its ability to handle both DDA and DIA data, its support for glycopeptide identification and a wide variety of other proteomics workflows, and its MS1-based and MS2-based quantification make it a versatile tool for researchers. FragPipe combines the user-friendliness of commercial software with the cutting-edge methods of research software to help improve data processing across bottom-up proteomics.


Figure 1. Graphical summary of the modules and functionalities in FragPipe, covering identification, quantification, PTM analysis, visualization, and downstream analysis.



Bios:

Fengchao Yu:

Fengchao is a research investigator in the lab of Prof. Alexey Nesvizhskii at the University of Michigan. His research interests include peptide identification, PTM discovery, label-free quantification, isotopic-labeling quantification, isobaric-labeling quantification, and DIA data analysis. Fengchao is currently the lead developer of FragPipe, MSFragger, and IonQuant, tools that are used by research laboratories and companies in the United States and worldwide. He has published papers in journals such as Nature Methods, Nature Biotechnology, Nature Communications, Molecular & Cellular Proteomics, and Journal of Proteome Research.



Daniel Polasky:

Daniel A. Polasky is a research investigator in the lab of Prof. Alexey Nesvizhskii in the Department of Pathology at the University of Michigan. His research focuses on developing computational tools and methods for proteomics, particularly for glycosylation and other post-translational modifications. He is a member of the MSFragger and FragPipe development teams, including leading the work on MSFragger Glyco and associated tools for glycoproteomics data analysis. Before moving into computational proteomics, his PhD work in the lab of Prof. Brandon Ruotolo focused on developing mass spectrometry and ion mobility-mass spectrometry methods for the analysis of intact proteins and protein complexes.



References:

  1. Kong, A. T., Leprevost, F. V., Avtonomov, D. M., Mellacheruvu, D. & Nesvizhskii, A. I. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat Methods 14, 513-520, doi:10.1038/nmeth.4256 (2017).
  2. Yu, F. et al. Identification of modified peptides using localization-aware open search. Nat Commun 11, 4065 (2020).
  3. Yu, F. et al. Fast quantitative analysis of timsTOF PASEF data with MSFragger and IonQuant. Mol Cell Proteomics 19, 1575-1585 (2020).
  4. Käll, L., Canterbury, J. D., Weston, J., Noble, W. S. & MacCoss, M. J. Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat Methods 4, 923-925, doi:10.1038/nmeth1113 (2007).
  5. da Veiga Leprevost, F. et al. Philosopher: a versatile toolkit for shotgun proteomics data analysis. Nat Methods 17, 869-870, doi:10.1038/s41592-020-0912-y (2020).
  6. Geiszler, D. J. et al. PTM-Shepherd: analysis and summarization of post-translational and chemical modifications from open search results. Mol Cell Proteomics 20, 100018, doi:10.1074/mcp.TIR120.002216 (2021).
  7. Yu, F., Haynes, S. E. & Nesvizhskii, A. I. IonQuant enables accurate and sensitive label-free quantification with FDR-controlled match-between-runs. Mol Cell Proteomics 20, 100077, doi:10.1016/j.mcpro.2021.100077 (2021).
  8. Li, K., Vaudel, M., Zhang, B., Ren, Y. & Wen, B. PDV: an integrative proteomics data viewer. Bioinformatics 35, 1249-1251, doi:10.1093/bioinformatics/bty770 (2019).
  9. Polasky, D. A., Yu, F., Teo, G. C. & Nesvizhskii, A. I. Fast and comprehensive N- and O-glycoproteomics analysis with MSFragger-Glyco. Nat Methods 17, 1125-1132, doi:10.1038/s41592-020-0967-9 (2020).
  10. Polasky, D. A., Geiszler, D. J., Yu, F. & Nesvizhskii, A. I. Multi-attribute glycan identification and FDR control for glycoproteomics. Mol Cell Proteomics, doi:10.1016/j.mcpro.2022.100205 (2022).
  11. Lu, L., Riley, N. M., Shortreed, M. R., Bertozzi, C. R. & Smith, L. M. O-Pair Search with MetaMorpheus for O-glycopeptide characterization. Nat Methods 17, 1133-1138, doi:10.1038/s41592-020-00985-5 (2020).
  12. Tsou, C. C. et al. DIA-Umpire: comprehensive computational framework for data-independent acquisition proteomics. Nat Methods 12, 258-264, doi:10.1038/nmeth.3255 (2015).



The Human Proteome Organization is a 501(c)(3) tax exempt non-profit organization registered in the state of New Mexico.  |  © 2001-2022 HUPO. All rights reserved. 
