Kun-Hsing Yu (Harvard Medical School) and Maggie Lam (University of Colorado)
One of the missions of the HUPO Biology/Disease-driven Human Proteome Project (B/D-HPP) initiatives is to identify popular proteins in the human proteome associated with diseases and biological systems. Recently, a number of software tools have emerged that allow researchers to prioritize crucial proteins in various organ systems and diseases from the biomedical literature. Information obtained using such tools can be used to facilitate the development and promote the application of proteomics assays, such as multiple reaction monitoring, that target disease-specific proteins. Our B/D-HPP groups have capitalized on these resources and made significant strides in assembling and sharing prioritized protein lists relevant to their biological systems and diseases of interest. Below we provide a brief overview of some of the software tools that enable the prioritization of popular proteins. These tools give users ways to search for and prioritize proteins related to their research interests and will facilitate targeted proteomics studies. Future algorithm and software development can further enable the prioritization of various proteoforms, such as the phosphoproteome, glycoproteome, and alternative splicing isoforms, which will advance our understanding of the human proteome in health and disease states.
PubPular (Lau et al. J Proteome Res. 2018)
PubPular is an R/Shiny web interface for querying and visualizing popular proteins for custom search terms using Gene2PubMed/PubTator data sources and semantic similarity metrics. Recent updates to PubPular also support querying pre-compiled protein lists for Disease Ontology and Human Phenotype Ontology disease terms, as well as reverse protein-to-topic searches. PubPular can be accessed via http://pubpular.net.
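To illustrate what a semantic similarity metric over publication counts can look like, the sketch below uses a normalized-Google-distance-style formula. Whether PubPular uses exactly this form is an assumption; the function name, the protein labels, and all counts are hypothetical and serve only to show why a rarely studied but topic-focused protein can rank ahead of a heavily published one.

```python
import math

def copublication_distance(n_topic, n_protein, n_both, n_total):
    """Normalized-Google-distance-style distance between a topic and a protein.

    n_topic   : publications matching the search term
    n_protein : publications linked to the protein
    n_both    : publications linked to both
    n_total   : size of the whole corpus (assumed larger than both counts)
    Smaller values mean a tighter association.
    """
    if n_both == 0:
        return math.inf  # never co-mentioned, so maximally distant
    log_t, log_p = math.log(n_topic), math.log(n_protein)
    numerator = max(log_t, log_p) - math.log(n_both)
    denominator = math.log(n_total) - min(log_t, log_p)
    return numerator / denominator

# Hypothetical counts, for illustration only: (n_protein, n_both).
N_TOTAL, N_TOPIC = 1_000_000, 1_000
counts = {"PROTEIN_A": (200, 150), "PROTEIN_B": (50_000, 300)}

# Sort proteins from closest to most distant.
ranked = sorted(
    counts,
    key=lambda p: copublication_distance(N_TOPIC, counts[p][0], counts[p][1], N_TOTAL),
)
print(ranked)
```

In this toy example PROTEIN_B has twice as many co-publications with the topic, yet PROTEIN_A ranks first because most of its publications concern the topic.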
PURPOSE and metaPURPOSE (Yu et al. J Proteome Res. 2018)
The Protein Universal Reference Publication-Originated Search Engine (PURPOSE) tool prioritizes proteins by the strength and specificity of the associations between proteins and the query terms. For each query, publication counts from PubMed are retrieved in real time, and the results are summarized in a user-friendly web interface. PURPOSE is accessible through http://rebrand.ly/proteinpurpose.
metaPURPOSE is an extension of PURPOSE that prioritizes metabolites associated with any search term. This tool addresses the challenges arising from the multiple synonyms associated with common metabolites and uses the PURPOSE algorithm to rank the retrieved metabolites. metaPURPOSE is hosted at http://rebrand.ly/metapurpose.
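One way to combine strength and specificity in a single score is a TF-IDF-style weighting, sketched below. This is an illustrative assumption and not the published PURPOSE formula; the function name, protein labels, and counts are hypothetical.

```python
import math

def association_score(n_both, n_protein, n_total):
    """TF-IDF-style score for a protein against a query topic.

    n_both    : publications mentioning both the protein and the topic
    n_protein : all publications mentioning the protein
    n_total   : size of the whole corpus
    """
    if n_both == 0:
        return 0.0
    strength = 1.0 + math.log(n_both)             # more co-mentions, stronger link
    specificity = math.log(n_total / n_protein)   # down-weights ubiquitous proteins
    return strength * specificity

# Hypothetical counts, for illustration only: (n_both, n_protein).
N_TOTAL = 1_000_000
counts = {"PROTEIN_X": (500, 100_000), "PROTEIN_Y": (200, 300)}

# Sort proteins from highest to lowest score.
ranked = sorted(
    counts,
    key=lambda p: association_score(counts[p][0], counts[p][1], N_TOTAL),
    reverse=True,
)
print(ranked)
```

Here PROTEIN_Y outranks PROTEIN_X despite fewer co-mentions, because PROTEIN_X appears throughout the literature regardless of topic.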
GLAD4U (Jourquin et al. BMC Genomics. 2012)
Gene List Automatically Derived For You (GLAD4U) is a web-based tool that allows users to retrieve a prioritized list of Entrez Gene IDs. The tool uses the hypergeometric test to rank the list of biological entities associated with the query. Its user interface enables users to download the query results, conduct functional enrichment analysis using WebGestalt, and view the detailed publication list related to the retrieved genes. The GLAD4U tool is at http://glad4u.zhang-lab.org.
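The idea behind ranking with a hypergeometric test can be sketched as follows: for each gene, ask how likely it is to see at least the observed number of query-matching publications among the gene's publications by chance. This is a minimal sketch using only the Python standard library; the data layout, function names, gene labels, and PubMed IDs are hypothetical, and GLAD4U's actual implementation may differ in its details.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

def rank_genes(gene_pubs, query_pubs, total_pubs):
    """Rank genes by enrichment of their publications among the query hits.

    gene_pubs  : dict mapping gene -> set of publication IDs
    query_pubs : set of publication IDs returned by the query
    total_pubs : number of publications in the whole corpus
    Returns (gene, overlap, p_value) tuples, smallest p-value first.
    """
    n = len(query_pubs)
    scored = []
    for gene, pmids in gene_pubs.items():
        k = len(pmids & query_pubs)
        scored.append((gene, k, hypergeom_sf(k, total_pubs, len(pmids), n)))
    return sorted(scored, key=lambda row: row[2])

# Hypothetical toy corpus of 1000 publications; the query returns IDs 1..50.
query_pubs = set(range(1, 51))
gene_pubs = {
    "GENE_A": set(range(1, 11)) | set(range(500, 510)),   # 20 papers, 10 in query
    "GENE_B": set(range(39, 51)) | set(range(600, 788)),  # 200 papers, 12 in query
}
for gene, overlap, p_value in rank_genes(gene_pubs, query_pubs, 1000):
    print(gene, overlap, p_value)
```

GENE_B overlaps the query in more publications, but with 200 papers about 10 overlaps are expected by chance, so GENE_A, with 10 of its 20 papers in the query, receives the smaller p-value and ranks first.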
FACTA+ (Tsuruoka et al. Bioinformatics. 2011)
Finding Associated Concepts with Text Analysis+ (FACTA+) is a text mining tool developed by the National Centre for Text Mining (NaCTeM), School of Computer Science, The University of Manchester, UK. The tool aims to help users identify associations among biomedical concepts, including genes, proteins, diseases, symptoms, drugs, and chemical compounds. For each query term, the associated biomedical concepts are retrieved and ranked. Users can view the snippets from the literature that describe each association. FACTA+ also allows users to find indirectly associated concepts by ranking the second-order associations between the query term and the target concepts. For access, go to http://www.nactem.ac.uk/facta/.
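Second-order association ranking can be pictured as a two-hop walk: from the query to directly associated "pivot" concepts, and from each pivot to further concepts that are not themselves directly linked to the query. The sketch below scores each indirect target by summing the products of the two association weights along every path. This scoring rule, the data layout, and the concept labels are assumptions made for illustration, not the scoring FACTA+ itself uses.

```python
from collections import defaultdict

def indirect_associations(query, direct):
    """Rank concepts reachable from `query` only through an intermediate pivot.

    direct : dict mapping concept -> dict of {neighbor: association weight}
    Returns (target, score) pairs, highest score first.
    """
    first_order = direct.get(query, {})
    scores = defaultdict(float)
    for pivot, w_query_pivot in first_order.items():
        for target, w_pivot_target in direct.get(pivot, {}).items():
            # Skip the query itself and anything already directly associated.
            if target == query or target in first_order:
                continue
            scores[target] += w_query_pivot * w_pivot_target
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical association weights, for illustration only.
direct = {
    "QUERY": {"PIVOT_1": 0.5, "PIVOT_2": 0.25},
    "PIVOT_1": {"QUERY": 0.5, "TARGET_A": 0.5, "TARGET_B": 0.25},
    "PIVOT_2": {"TARGET_A": 0.5, "PIVOT_1": 1.0},
}
print(indirect_associations("QUERY", direct))
```

TARGET_A is reachable through both pivots and therefore accumulates a higher score than TARGET_B, which is reachable through only one.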