Learning meaningful representations of protein sequences

How we choose to represent our data has a fundamental impact on our ability to subsequently extract information from them. Machine learning promises to automatically determine efficient representations from large unstructured datasets, such as those arising in biology. However, empirical evidence su...

Bibliographic Details
Main Authors:	Boomsma, W. (Author), Detlefsen, N.S. (Author), Hauberg, S. (Author)
Format:	Article
Language:	English
Published:	Nature Research 2022
Subjects:	amino acid sequence; article; attention; automation; data interpretation; empirical analysis; geometry; learning; machine learning; protein; transfer of learning
Online Access:	View Fulltext in Publisher


LEADER	02154nam a2200349Ia 4500
001	10.1038-s41467-022-29443-w
008	220425s2022 CNT 000 0 und d
020			\|a 20411723 (ISSN)
245	1	0	\|a Learning meaningful representations of protein sequences
260		0	\|b Nature Research \|c 2022
856			\|z View Fulltext in Publisher \|u https://doi.org/10.1038/s41467-022-29443-w
520	3		\|a How we choose to represent our data has a fundamental impact on our ability to subsequently extract information from them. Machine learning promises to automatically determine efficient representations from large unstructured datasets, such as those arising in biology. However, empirical evidence suggests that seemingly minor changes to these machine learning models yield drastically different data representations that result in different biological interpretations of data. This begs the question of what even constitutes the most meaningful representation. Here, we approach this question for representations of protein sequences, which have received considerable attention in the recent literature. We explore two key contexts in which representations naturally arise: transfer learning and interpretable learning. In the first context, we demonstrate that several contemporary practices yield suboptimal performance, and in the latter we demonstrate that taking representation geometry into account significantly improves interpretability and lets the models reveal biological information that is otherwise obscured. © 2022, The Author(s).
650	0	4	\|a amino acid sequence
650	0	4	\|a Amino Acid Sequence
650	0	4	\|a article
650	0	4	\|a attention
650	0	4	\|a automation
650	0	4	\|a data interpretation
650	0	4	\|a empirical analysis
650	0	4	\|a geometry
650	0	4	\|a learning
650	0	4	\|a machine learning
650	0	4	\|a Machine Learning
650	0	4	\|a protein
650	0	4	\|a transfer of learning
700	1		\|a Boomsma, W. \|e author
700	1		\|a Detlefsen, N.S. \|e author
700	1		\|a Hauberg, S. \|e author
773			\|t Nature Communications

Similar Items