Péter Horvatovich, University of Groningen, Netherlands
On February 21, 2018, the new release of neXtProt was published. neXtProt serves as the definitive reference database of the human proteome and is used by the Human Proteome Project (HPP) to assess progress toward completing the map of the human proteome. The new version updated the human proteome evidence using the latest data from PeptideAtlas (release Human 2018-01), resulting in 17,470 validated human proteins (PE1), 393 more than in the previous version of January 17, 2017. This leaves 2,186 missing proteins with status PE2+PE3+PE4 and 574 uncertain proteins (PE5). A total of 51 PeptideAtlas data sets from cancer tissues and cell lines were included in this release, with over 1.4 million peptides detected by mass spectrometry. Data from UniProtKB, Ensembl, IntAct, GOA, and GeneIDs have also been updated. The data have been complemented with GO molecular function, biological process and cellular component annotations for most of the human protein kinases, as well as with expression information, mutagenesis and variant phenotype data. The number of validated proteins with unknown function (uPE1) is now 2,271; the list can be obtained with SPARQL query NXQ_00022.
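The counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures given in this release note; the variable names are illustrative, and the derived values are simple sums and differences rather than numbers reported by neXtProt.

```python
# Figures quoted in the release note (neXtProt release of February 21, 2018).
pe1_validated = 17_470       # proteins with evidence at protein level (PE1)
pe1_gain = 393               # increase over the January 17, 2017 release
missing_pe2_to_pe4 = 2_186   # missing proteins (PE2+PE3+PE4)
uncertain_pe5 = 574          # uncertain proteins (PE5)

# Derived values (illustrative; not stated in the release note itself).
pe1_previous = pe1_validated - pe1_gain
total_entries = pe1_validated + missing_pe2_to_pe4 + uncertain_pe5
pe1_fraction = pe1_validated / total_entries

print(f"PE1 in previous release: {pe1_previous}")
print(f"PE1 to PE5 entries in total: {total_entries}")
print(f"Share of entries at PE1: {pe1_fraction:.1%}")
```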
Two new rules have been implemented to reflect protein existence:
- Proteins with PE2, PE3 or PE4 status have been upgraded to evidence at protein level (PE1) if the entry has GOLD binary interaction data in neXtProt.
- As a consequence of UniProtKB demerging entries encoded by multiple genes, some neXtProt entries now have exactly the same protein sequence and are indistinguishable at the protein level. The list can be retrieved using SPARQL query NXQ_00231. Peptides that map only to such indistinguishable entries are now labeled "pseudo-unique" rather than "Found in other entries", and they were used to validate protein existence if they complied with the HPP guidelines.
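The "pseudo-unique" label in the second rule can be illustrated with a minimal sketch. It is not neXtProt's implementation: the function name, the toy accessions and the toy sequences are hypothetical, and peptide mapping is reduced to exact substring matching, ignoring isoforms, variants and the HPP guideline checks.

```python
def classify_peptide(peptide, entries):
    """Label a peptide by how it maps to a set of entries.

    entries: dict mapping a (hypothetical) accession to its protein sequence.
    Mapping is simplified to exact substring matching.
    """
    matched = [acc for acc, seq in entries.items() if peptide in seq]
    if not matched:
        return "no match"
    if len(matched) == 1:
        return "unique"
    # Several entries match: check whether they all share one sequence.
    distinct_sequences = {entries[acc] for acc in matched}
    if len(distinct_sequences) == 1:
        return "pseudo-unique"
    return "found in other entries"


# Toy data: TOY1 and TOY2 are identical, as after a demerge of one entry.
toy_entries = {
    "NX_TOY1": "MKTAYIAKQRQISFVK",
    "NX_TOY2": "MKTAYIAKQRQISFVK",
    "NX_TOY3": "MSHFSRQLEERLGLIEVQ",
    "NX_TOY4": "MKTAYLLEERLGPQ",
}

for pep in ("SHFSRQ", "AYIAKQRQ", "LEERLG"):
    print(pep, "->", classify_peptide(pep, toy_entries))
```

In this toy set, a peptide shared only by the two identical entries is treated as pseudo-unique, whereas a peptide shared by entries with different sequences is not.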
The Proteomics view for entries has been revamped and now loads faster. In addition, it is now possible to search the positional annotations listed in the feature table for a specific category or for text found in the feature description. For instance, searching for "SRM" returns the number of SRM peptides mapping to the isoform and lets users rapidly browse the data for all SRM peptides.
New advanced search SPARQL queries have been added to the SnorQL interface. Examples include a query to identify proteins with high proline content (NXQ_00225); a query to list proteins with at least 2 uniquely mapping peptides longer than 9 amino acids found in blood plasma, urine or cerebrospinal fluid, in support of marker research (NXQ_00226); and a query to identify proteins with experimentally determined long alpha-helices (longer than 75 amino acids), which illustrates how to query for proteins with a specific secondary structure (NXQ_00230).
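The peptide criterion described for NXQ_00226 can be sketched in plain Python. This is not the SPARQL query itself, and it does not contact neXtProt: the function name, the input tuple layout and the toy accessions are assumptions made for illustration, and the restriction to blood plasma, urine or cerebrospinal fluid samples is left out.

```python
def proteins_with_enough_peptides(peptide_hits, min_peptides=2, min_length=10):
    """Return accessions with enough long, uniquely mapping peptides.

    peptide_hits: iterable of (accession, peptide_sequence, is_unique) tuples.
    "Longer than 9 amino acids" is expressed as a minimum length of 10.
    """
    qualifying = {}
    for accession, peptide, is_unique in peptide_hits:
        if is_unique and len(peptide) >= min_length:
            # A set ensures that a repeated peptide is counted only once.
            qualifying.setdefault(accession, set()).add(peptide)
    return sorted(
        acc for acc, peptides in qualifying.items()
        if len(peptides) >= min_peptides
    )


# Toy data with made-up accessions and peptides.
toy_hits = [
    ("NX_TOY1", "A" * 9 + "K", True),    # 10 residues, unique
    ("NX_TOY1", "L" * 10 + "R", True),   # 11 residues, unique
    ("NX_TOY2", "G" * 9 + "K", True),    # 10 residues, unique
    ("NX_TOY2", "SSSSK", True),          # too short
    ("NX_TOY3", "V" * 10 + "K", False),  # not uniquely mapping
    ("NX_TOY3", "T" * 10 + "K", True),   # 11 residues, unique
]

selected = proteins_with_enough_peptides(toy_hits)
print(selected)
```

Only the first toy entry has two distinct unique peptides of sufficient length, so it is the only one selected.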
New Letter format in the Journal of Proteome Research for reporting missing proteins
The discovery of a missing protein or of new proteins can now be published in the Journal of Proteome Research December 2018 Special Issue as a short definitive report, submitted in the Letters format. To be considered for publication as a Letter, the missing protein(s) must meet the HUPO Data Interpretation Guidelines version 2.1 and be cast in the context of both the HPP and the biological setting in which they were discovered. We anticipate that this format will encourage many teams, particularly those of the B/D-HPP and the general proteomics community, to highlight such protein discoveries when they are found in disease and biological sample analyses. Such side findings in a more biologically focused analysis might otherwise be lost or not described further because they were incidental.
Letters have a maximum length of four journal pages and should contain sufficient experimental detail for the research to be reproduced. There should be no more than 3 figures, 2 tables and 20 references. A separate Table of Contents Graphic is required, but it does not count toward the 4-page or figure limit. Reports of missing proteins must meet both the Journal of Proteome Research technical guidelines and the HUPO Data Interpretation Guidelines (see further details in Deutsch et al., PMID 27490519), including figure(s) of the annotated spectra and upload of the data to ProteomeXchange, with the PXD number included in the abstract. For a Letter to be reviewed, the HPP Checklist items must be fulfilled and submitted.