The Chromosome-Centric Human Proteome Project (C-HPP) was launched to catalogue the complete parts list of the human proteome, specifically to find protein-level evidence for all human protein-coding genes. Together with collaborating partners, C-HPP has contributed to developing guidelines for mass spectrometry data interpretation; worked with EBI to organize ProteomeXchange, the central data repository for proteomics experiments; helped to deliver stringent peptide and protein identification lists for the human proteome from large-scale, community-deposited LC-MS/MS data in PeptideAtlas; and worked with neXtProt to define five categories of protein evidence (PE1-5). In 2012, at the start of C-HPP, neXtProt included 20,059 protein entries, of which 13,664 had evidence at protein level (PE1), including 12,509 with mass spectrometry data. In January 2017, neXtProt contained 20,159 entries, of which 572 might correspond to non-coding elements (PE5). C-HPP members, collaborating partners and the proteomics community have found protein-level evidence (PE1) for 17,008 of the remaining 19,587 protein-coding genes, of which 15,173 have mass spectrometry evidence in PeptideAtlas. This leaves 2,579 missing proteins (MPs), i.e. human protein-coding genes with evidence only at the PE2-4 levels.
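The counts quoted above are linked by simple arithmetic, which the following minimal Python sketch makes explicit. The variable and function names are illustrative only, and the 2012 percentage assumes that all 20,059 neXtProt entries form the denominator, which the text does not state explicitly.

```python
# Counts quoted in the text (neXtProt, 2012 and January 2017).
ENTRIES_2012 = 20_059   # all neXtProt entries in 2012
PE1_2012 = 13_664       # entries with protein-level evidence in 2012
ENTRIES_2017 = 20_159   # all neXtProt entries in January 2017
PE5_2017 = 572          # entries that might be non-coding elements
PE1_2017 = 17_008       # entries with protein-level evidence in 2017


def pe1_share(pe1: int, total: int) -> float:
    """Percentage of entries with PE1 evidence, rounded to one decimal."""
    return round(100 * pe1 / total, 1)


# Protein-coding genes once the PE5 entries are set aside.
coding_2017 = ENTRIES_2017 - PE5_2017
# Missing proteins: coding genes without PE1 evidence (PE2-4).
missing_2017 = coding_2017 - PE1_2017

# Assumption: the 2012 share is taken over all entries of that year.
share_2012 = pe1_share(PE1_2012, ENTRIES_2012)
share_2017 = pe1_share(PE1_2017, coding_2017)

print(coding_2017, missing_2017, share_2012, share_2017)
```

Under these assumptions the sketch reproduces the 19,587 protein-coding genes and 2,579 missing proteins given above, and the PE1 shares of 68.1% and 86.8% quoted later in the text.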
Advancing the HPP
At the Dublin HUPO congress and HPP Workshop, the C-HPP PIs made two important decisions about future directions.
First, they resolved to extend the term of C-HPP from 2022 to 2027. The extension reflects current progress: discovery is slowing because an increasing proportion of the remaining MPs are extremely limited in spatial and temporal expression and still evade detection. The 2022-2027 period will provide extra time to plan and execute the full C-HPP plans, benchmarked against the lessons of the Human Genome Project (HGP). Indeed, Dr. Leroy Hood mentioned that the HGP group had a similar experience: they reviewed progress every five years and redirected the HGP based on the progress made during the preceding years.
Second, they launched the neXt-CP50 challenge, led by Young-Ki Paik and complementary to neXt-MP50 (see Figure 1), to characterize 1,232 known PE1 proteins that had no functional annotation as of 8-8-2017 (neXtProt). Details on the strategy and timeline will be available shortly for the 25 PIs involved in this campaign.
Over the past five years, the share of protein-coding genes with evidence at the PE1 level increased from 68.1% to 86.8%, and finding evidence for the remaining MPs is now a challenge for the proteomics community. To promote these efforts, in early 2017 C-HPP launched a new campaign, the neXt-MP50 challenge, led by Chris Overall, which aims to find evidence for the remaining missing proteins. The ultimate goal of this campaign is to find missing proteins by identifying types of human samples not yet analyzed by the proteomics community, considering sample location, stimulus, disease or health status, and age. Using novel sample preparation techniques such as ProteoMiner, mass spectrometry proteomics profiling, or bioinformatics technologies, neXt-MP50 aims to uncover the remaining dark matter of the human proteome. Proteogenomics methods, which tightly integrate genomics and proteomics data, are important for identifying sequence variants of human proteins. Proteogenomics data integration is gaining momentum in the proteomics and genomics communities and has a central role in C-HPP. In the next phase, C-HPP is promoting the use of proteogenomics data integration to reveal the amino acid sequence space of human proteins, as well as the identification of peptides with post-translational modifications, which are regarded as an important source of structural variability in human proteins. Proteogenomics data integration, the identification of peptides with post-translational modifications, and the neXt-MP50 challenge to find evidence for the remaining missing proteins are the goals of the second phase of C-HPP, which is on the agenda for the next five years.