Nicolle Packer, Macquarie University, Australia
Performance of current software for automated intact glycopeptide identification and MS/MS spectral annotation in glycoproteomics
Aim of study: To assess the performance of current glycoproteomics software for automated intact N- and O-glycopeptide identification from high resolution MS/MS spectral data across laboratories.
Note - We would need to get your feedback by June 30, 2018 so that we can present the preliminary outcomes of this study at HUPO 2018 in Orlando. You will of course be a co-author on the manuscript that will be a compilation of the results of the study.
Please let us know the following:
1. Whether you will be a participant
2. Whether you are a developer and/or user
Introduction: Glycoproteomics, the study of intact glycopeptides in complex biological systems, is a growing discipline. Analytical advances have now facilitated LC-MS/MS-based glycoproteomics studies reporting hundreds and even thousands of unique intact glycopeptides from a single experiment. However, significant bottlenecks clearly exist in the accurate annotation of the large volumes of resulting MS/MS spectral data and in the confident identification and reporting of the corresponding intact glycopeptides. The identification of intact glycopeptides has previously relied heavily on manual interpretation and expert curation support. However, as the field of glycoproteomics transitions to rely increasingly on the use of large data sets in our experimental designs, the development of efficient software for accurate automated glycopeptide identification becomes absolutely essential.
Over the past 5-10 years the field of glycoproteomics has seen the development of multiple exciting tools that show promise for automated or semi-automated annotation and identification of glycopeptides from MS/MS spectral evidence.
This first study of the newly established HUPO Glycoproteomics Initiative (HGI) sets out to study the performance of the current informatics capabilities in this specialised field. Documenting the current status of glycoproteomics is vital to drive further technological developments and promote applications of system-wide glycopeptide analysis.
Overview of study: This informatics-focused glycoproteomics study consists of two parts:
Part A) Comparative study in which developers of glycoproteomics software (of academic and industrial origin) identify and report intact glycopeptides from provided LC-MS/MS glycopeptide datasets using exclusively their own developed software. The developers may improve their existing software in this process but need to provide an exact description of how their software was utilised to obtain the reported glycopeptides and provide software access and experimental conditions to the study committee to allow them to reproduce and interrogate the reported findings.
Part B) Comparative study in which expert users (research teams) in glycoproteomics (of academic and industrial origin) identify intact glycopeptides from the same provided LC-MS/MS glycopeptide datasets as in Part A, using one or more tools that they routinely use for glycopeptide analysis, including manual interpretation as support. Users must report how they obtained their findings.
Most participants will fit in either Part A or B. A research group can contribute to both parts, but must adhere to the provided guidelines and provide separate reports for the two efforts. Two LC-MS/MS datasets containing mixtures of intact O- and N-glycopeptide data are provided to all participants and should be reported on according to the provided guidelines.
Study details and the fine-print:
About the analysed glycoprotein samples: Human serum containing a complex mixture of N- and O-linked glycoproteins was used in this study (Thermo Fisher Scientific #31876). Briefly, the proteins were reduced, carbamidomethylated and digested exhaustively with porcine trypsin. The resulting peptide mixtures were split and analysed in their native form after enrichment using two different LC-MS/MS acquisition styles.
About the two LC-MS/MS acquisition styles and access to the provided data: The glycopeptides were separated using nano-LC (C18) separation and detected in positive polarity using alternating fragmentation modes (HCD, ETciD, EThcD and CID, see “Terminology” below for definition) on a Thermo Orbitrap Lumos LC-MS/MS platform. Complementary fragmentation types were used within the same LC-MS/MS run to satisfy various software packages for glycopeptide identification. Precursor and product ions were recorded in “profile” mode at high resolution in the Orbitrap (OT) and at low resolution in the ion trap (IT). Two unprocessed (raw) data files acquired using two different acquisition styles are provided to participants:
Data file A: HCD (OT) – ETciD (OT) – CID (OT) (File: “A_glycopepnew_HCDETciDOTCIDpeptide.raw”)
Data file B: HCD (OT) – EThcD (OT) – CID (IT) (File: “B_glycopepnewHCDEThcDiTCIDpeptide.raw”)
Since some participants may not have access to Thermo (proprietary) software for data processing, the data files are also provided after conversion to .mgf or .mzXML format. The link also provides access to other critical files for the glycopeptide identification and the reporting template.
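For participants working from the converted files without vendor software, the following is a minimal sketch of how spectra in the plain-text .mgf (Mascot Generic Format) layout can be read. It is an illustration only, not part of the study guidelines: the `Spectrum` class and `read_mgf` function are hypothetical names, the example spectrum is made up, and the header fields actually present in the provided files depend on the conversion tool used.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class Spectrum:
    """One MS/MS spectrum: header parameters plus (m/z, intensity) peaks."""
    params: dict = field(default_factory=dict)
    peaks: list = field(default_factory=list)


def read_mgf(lines: Iterable[str]) -> Iterator[Spectrum]:
    """Yield one Spectrum per BEGIN IONS / END IONS block."""
    current = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "BEGIN IONS":
            current = Spectrum()
        elif line == "END IONS":
            if current is not None:
                yield current
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                # Header line such as TITLE=..., PEPMASS=..., CHARGE=...
                key, _, value = line.partition("=")
                current.params[key.upper()] = value
            else:
                # Peak line: m/z followed by intensity
                parts = line.split()
                intensity = float(parts[1]) if len(parts) > 1 else 0.0
                current.peaks.append((float(parts[0]), intensity))


# Made-up example spectrum, for illustration only
example = """BEGIN IONS
TITLE=example scan
PEPMASS=1021.4567 12345.0
CHARGE=3+
204.0867 1500.0
366.1395 800.0
END IONS
"""

spectra = list(read_mgf(example.splitlines()))
for s in spectra:
    print(s.params["TITLE"], len(s.peaks), "peaks")
```

With a real file, the same function can be fed an open file handle, e.g. `read_mgf(open(path))`, where `path` points at one of the converted .mgf files.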
About the proposed identification and reporting of intact glycopeptides: Participants are requested to report on identified intact glycopeptides in a tabulated form using a provided Excel template. The template also hosts detailed guidelines for the identification of intact glycopeptides (e.g. Protein and Glycan search space) and the requirements for the data reporting. The developers (Part A) are also expected to provide annotated MS/MS spectra of the identified (reported) glycopeptides (PDF/PPT preferred) if their software allows for this since annotated spectral evidence is a requirement of many journals in the reporting of glycopeptide data sets. Users are encouraged to do the same.
About the data comparison, disclosure of participant reports and dissemination of outcome: The reported data of the participants will be compiled by the HGI study committee. The identity of the participating developers and their software as well as the relative performance of their software (compared to other software and the user groups) will be disclosed. Should participating developers decide not to return their findings for any reason, their identity and participation will not be disclosed in any dissemination of the study outcome. User groups will remain anonymous. Results will be compiled, compared, published and presented (see below). All participants (developers and users alike) who return glycopeptide reports adhering to the study guidelines will be acknowledged for their efforts by being offered co-authorship of the publication(s) arising from this study and mentioned by name in oral/poster presentations.
Time-line for 1st HGI study:
Sep 2016: HGI formed. Head and committee selected.
July 2017: LC-MS/MS glycoproteomics data generated by Thermo and quality validated by committee.
2017: Calls for participation in HGI study.
Sep 2017: HGI study introduced at World HUPO 2017.
March 2018: Data files and reporting template made available to registered participants.
30 June 2018: Deadline for participant reporting of data.
July-September 2018: Compilation and comparison of data.
Sep 2018: Preliminary data presented at World HUPO 2018.
Early – Mid 2019: Outcome(s) published.
Sep 2019: Outcomes presented at World HUPO 2019.
HGI study committee:
·HGI chair: Prof. Nicolle H. Packer, Macquarie University, Sydney, and Glycomics Institute, Griffith University, Gold Coast, Australia (email@example.com)
·HGI deputy chair: Dr. Morten Thaysen-Andersen, Macquarie University, Sydney, Australia (firstname.lastname@example.org)
Additional HGI study committee members:
·A/Prof. Daniel Kolarich, Griffith University, Gold Coast, Australia (email@example.com)
·Prof. Kai-Hooi Khoo, Academia Sinica, Taiwan (firstname.lastname@example.org)
·Prof. Katalin Medzihradszky, UCSF, CA (email@example.com)
·Prof. Joe Zaia, Boston University, MA (firstname.lastname@example.org)
·Prof. Goran Larson, Gothenburg, Sweden (email@example.com)
·Dr. Stuart Haslam, Imperial College, UK (firstname.lastname@example.org)
·Prof. Giuseppe Palmisano, University of Sao Paulo, Brazil (email@example.com)
·Prof. Jong Shin Yoo, Korea Basic Science Institute, Korea (firstname.lastname@example.org)
Acknowledgement: Dr Rosa Viner (Thermo) is thanked for providing valuable samples and high quality LC-MS/MS data to this study.