W13: Controversies and Equivocalities in Protein Informatics: Will the Real Proteins and Targets Please Stand Up?
Tuesday, April 16, 2019 | 12:30 - 4:00 pm
As the databases approach 130 million entries, we might assume that protein sequences were “done and dusted”. However, biomedical scientists may be surprised at the underlying uncertainties that will be directly addressed in this workshop.
1) More than 15 years after genome completion, databases still do not agree on the number of canonical human proteins with proof of existence; counts range from 19,000 to 22,000.
2) While human Ensembl predicts on average ~10 transcripts as alternatively spliced coding forms of each gene locus, experimental verification of these by proteomics is running ~30-fold lower than expected, implying that most are never stably translated.
3) Human transcript data suggest that for ~20-30% of Swiss-Prot sequences the max-exon canonical default annotation does not represent the most abundant protein isoform in vivo.
4) Reported tissue-specific protein expression, whether by microarray, RNAseq, proteomics or antibody profiling, still shows wide discordance, both between measurement methodologies and between results in different databases.
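As a small illustration of point 1, one way to see how a protein count depends on the evidence threshold is to tally the protein existence (PE) level that UniProt writes into each FASTA header as `PE=1` to `PE=5`. The sketch below is a minimal example under stated assumptions: the function name `count_pe_levels` is hypothetical, and the four sample records are made up for demonstration and are not real UniProt entries.

```python
import re
from collections import Counter

# UniProt FASTA headers carry the protein existence level as "PE=<1-5>".
PE_PATTERN = re.compile(r"\bPE=([1-5])\b")

# UniProt's five protein existence levels.
PE_LABELS = {
    1: "evidence at protein level",
    2: "evidence at transcript level",
    3: "inferred from homology",
    4: "predicted",
    5: "uncertain",
}


def count_pe_levels(fasta_lines):
    """Tally PE levels over the header lines of a UniProt-style FASTA file.

    Hypothetical helper for illustration; headers without a PE field are skipped.
    """
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        match = PE_PATTERN.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Made-up records in UniProt header style (NOT real entries).
SAMPLE = """\
>sp|X00001|DEMO1_HUMAN Demo protein 1 OS=Homo sapiens OX=9606 GN=DEMO1 PE=1 SV=1
MKTAYIAKQR
>sp|X00002|DEMO2_HUMAN Demo protein 2 OS=Homo sapiens OX=9606 GN=DEMO2 PE=1 SV=2
MSDNELLKQA
>sp|X00003|DEMO3_HUMAN Demo protein 3 OS=Homo sapiens OX=9606 GN=DEMO3 PE=2 SV=1
MAGWTPLLLV
>sp|X00004|DEMO4_HUMAN Demo protein 4 OS=Homo sapiens OX=9606 GN=DEMO4 PE=5 SV=1
MPRLSTEEQK
"""

counts = count_pe_levels(SAMPLE.splitlines())
for level, label in PE_LABELS.items():
    print(f"PE={level} ({label}): {counts[level]}")
```

Counting only PE=1 entries, or PE=1 plus PE=2, gives different totals from the same file, which is one simple way a "canonical" count can shift with the evidence level chosen.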
WORKSHOP AGENDA:
12:30 pm Course Welcome and Instructor Introduction
2:00 Refreshment Break
4:00 End of Workshop
After a short introduction, David and Chris will present on the topics below. Timings will be flexible according to attendee inclinations. They also welcome any feedforward from attendees on topic coverage and emphasis:
- Differences between NCBI, EBI, UniProt, HGNC, NextProt, Ensembl protein sequence databases and which to use for what.
- Overview of the causes and consequences of different canonical human protein numbers and evidence support levels in different resources.
- Which potential drug targets exist or not.
- Equivocalities of smORF (small open reading frame) counts and the APELA story (a smORF hiding in plain sight).
- Controversies over alternative splicing existence and isoform abundance (with a brief intro to the APPRIS database).
- Controversies over the detection, existence evidence, and quantification for germline and somatic nsSNPs (amino acid variants).
- Insights into the biases of different methodologies and sources for protein expression measurement.
- The potential of proteomics approaches and open datasets to at least partially resolve some of the equivocalities and discrepancies described above.
SELECTED INSTRUCTOR PUBLICATIONS:
Methods, Tools and Current Perspectives in Proteogenomics. Ruggles KV, Krug K, Wang X, Clauser KR, Wang J, Payne SH, Fenyö D, Zhang B, Mani DR. Mol Cell Proteomics. 2017 Jun;16(6):959-981. doi: 10.1074/mcp.MR117.000024. Epub 2017 Apr 29. Review. PMID: 28456751
Last rolls of the yoyo: Assessing the human canonical protein count. Southan C. F1000Res. 2017 Apr 7;6:448. doi: 10.12688/f1000research.11119.1. eCollection 2017. Review. PMID: 28529709
INSTRUCTORS:
David Fenyő, PhD, Professor, Biochemistry and Molecular Pharmacology, Institute for Systems Genetics, New York University School of Medicine
Dr. David Fenyő’s research focuses on providing a detailed understanding of the dynamics of cellular processes. He applies mathematical, statistical, and computational methods to the analysis of quantitative data and the modeling of biological systems.
After receiving a PhD in Physics from Uppsala University in Sweden in 1991, he switched the emphasis of his research to bioinformatics, first as a postdoctoral fellow at the Rockefeller University, then as a co-founder of a bioinformatics startup company, and subsequently as staff scientist and product manager at GE Healthcare. Dr. Fenyő joined the NYU School of Medicine in 2010 and is currently Professor of Biochemistry and Molecular Pharmacology, and Director of the PhD program in Systems and Computational Biomedicine and the Master’s program in Biomedical Informatics (further information: https://med.nyu.edu/faculty/david-fenyo).
Christopher Southan, PhD, Principal Consultant, TW2Informatics
Dr. Christopher Southan works at the interface between bioinformatics, cheminformatics, pharmacology and drug discovery. His current role as Principal Consultant at TW2Informatics in Göteborg, Sweden, was preceded by a position as Senior Cheminformatician with the Edinburgh University BPS/IUPHAR Guide to Pharmacology database team (2013-18). Prior to this, he set up TW2Informatics, engaging in patent data consulting for SureChem (2011-12) and the AstraZeneca Knowledge Engineering Program for testing and documenting Chemistry Connect (2009-11). During 2008-9, he coordinated the ELIXIR Database Provider Survey at the EBI. Before that, he was Principal Scientist and Bioinformatics Team Leader at AstraZeneca, Mölndal (2004-7), following senior bioinformatics positions at Oxford Glycosciences, Gemini Genomics and SmithKline Beecham. He has a PhD in Protein Chemistry from the Ludwig Maximilian University of Munich and a BSc Hons in Biochemistry from Dundee University (further information on LinkedIn and https://sites.google.com/view/tw2informatics/home).