Learning from cadherin structures and sequences: affinity determinants and protein architecture

Bibliographic Details
Main Author: Felsovalyi, Klara
Language: English
Published: 2014
Subjects: Protein binding; Cadherins; Biochemistry; Bioinformatics; Biophysics
Online Access: https://doi.org/10.7916/D8Q81B2Q
id ndltd-columbia.edu-oai-academiccommons.columbia.edu-10.7916-D8Q81B2Q
record_format oai_dc
collection NDLTD
language English
sources NDLTD
topic Protein binding
Cadherins
Biochemistry
Bioinformatics
Biophysics
description Cadherins are a family of cell-surface adhesion proteins that are important in the development and maintenance of tissues. The family is defined by the repeating cadherin domain (EC) in the extracellular region, but its members are diverse in protein size, architecture and cellular function. The best-understood subfamily is the type I classical cadherins, which are found in vertebrates and have five EC domains. Among the five type I classical cadherins, homo- and heterophilic binding affinities are highly specific, even though the sequences are very similar. As previously shown, E- and N-cadherin, two prototypic members of the subfamily, differ in their homophilic K_D by about an order of magnitude, while their heterophilic affinity is intermediate. To examine the source of the binding affinity differences among type I cadherins, we used crystal structures, analytical ultracentrifugation (AUC), surface plasmon resonance (SPR), and electron paramagnetic resonance (EPR) studies. Phylogenetic analysis and binding affinity behavior show that the type I cadherins can be further divided into two subgroups, one represented by E-cadherin and the other by N-cadherin. In addition to the affinity differences in their wild-type binding through the strand-swapped interface, a second interface also shows an affinity difference between E- and N-cadherin. This X-dimer interface, which is a weakly binding kinetic intermediate in E-cadherin, has a much stronger affinity in N-cadherin: nearly as strong as N-cadherin wild-type binding. In the swapped and X-dimer interactions of E- and N-cadherin, differences in hydrophobic surface area can mostly account for the affinity difference. However, several mutants of N-cadherin have an affinity an order of magnitude stronger (a K_D an order of magnitude lower) than even wild-type N-cadherin. In these mutants, the source of the strong affinity seems to be entropic stabilization through an equilibrium between multiple conformations with similar energies. We thus have a molecular-level understanding of vertebrate classical cadherins, including their adhesive mechanism and their binding affinity determinants.

However, the adhesive mechanisms of cadherins from invertebrates, which are structurally divergent yet function in similar roles, remain unknown. We present crystal structures of the predicted N-terminal region of Drosophila N-cadherin (DN-cadherin). Of the 16 total predicted EC domains, we have crystallized the EC1-3 and EC1-4 segments. While the linker regions for the EC1-EC2 and EC3-EC4 pairs display binding of three Ca^2+ ions similar to that in vertebrate cadherins, domains EC2 and EC3 are joined in a bent orientation by a novel, previously uncharacterized Ca^2+-free linker. Based on sequence analysis of the further ECs of DN-cadherin, we predict another such Ca^2+-free linker between EC7 and EC8. Biophysical analysis demonstrates that a construct containing the first nine predicted EC domains of DN-cadherin forms homodimers with affinity similar to vertebrate classical cadherins. Intriguingly, this segment contains both the crystallized and predicted Ca^2+-free linkers, suggesting a complex binding interface. Sequence analysis of the cadherin family reveals that similar Ca^2+-free linkers are widely distributed in the ectodomains of both vertebrate and invertebrate cadherins. In long cadherins, there are frequently multiple Ca^2+-free linkers in a single protein chain. It thus appears that a combination of calcium-binding and calcium-free linkers can allow cadherins to form three-dimensional arrangements that are more complex than the extended, calcium-rigidified structures in classical cadherins.

Discovery of the Ca^2+-free linker, together with the differing numbers and arrangements of ECs and other domain types, implies that the cadherin superfamily is more structurally diverse than previously thought. Because little is known about the function and even less about the structure of the majority of the superfamily, studying the linear architecture (i.e. the precise sequence of ECs and the characteristics of the interdomain linkers) at the scale of the superfamily would give significant new insights into the structure and function of less-understood cadherins. With this motivation, we have constructed a cadherin database with relevant information on two different scales: the protein and the domain. On the whole protein level, we represent the architecture of each cadherin by recording the arrangement of ECs, different linker types, and other (non-EC) domain types in the protein. On the individual EC level, based on the sequence, we record the domain characteristics that give rise to the different structural features at the protein level. We have annotated over 9,600 proteins from 560 organisms, containing over 69,000 ECs, and built an online interface to search and access this information. Our aim is to provide a tool for understanding the protein architecture, function, and relationships among cadherins, a structurally diverse protein family.

Together, these studies examine the relationships between sequence, structure and function of cadherins at different scales. In the classical cadherin study, small changes of one or two residues can dramatically alter the dimer conformations and thus lead to large differences in binding affinity between highly related cadherins, or between wild-type and mutant proteins. These seemingly small mutations can result in even higher binding affinity through entropic stabilization by multiple conformations. In DN-cadherin, the absence of certain calcium-binding motifs in adjacent ECs leads to a new linker type and a new interdomain orientation. This, in turn, has great implications for the global shape, and possibly the binding mechanism, of the protein. The cadherin database aims to provide information at different structural levels in order to allow users to draw connections between primary sequence, domain structure and protein architecture, and ultimately to learn about protein function.
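
The protein-level record described in the abstract (the arrangement of ECs and the type of each interdomain linker) might be sketched as follows. This is a minimal illustration only: the class names, field names and the `?` marker are hypothetical and are not taken from the thesis database, non-EC domains and the per-domain annotations are omitted, and the DN-cadherin entry fills in only the linkers the abstract actually mentions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class Linker(Enum):
    """Interdomain linker types (labels are illustrative assumptions)."""
    CA_BINDING = "Ca"   # linker that binds Ca^2+ ions
    CA_FREE = "free"    # Ca^2+-free linker
    UNKNOWN = "?"       # linker type not stated in the abstract


@dataclass
class CadherinRecord:
    """Hypothetical protein-level record: EC count plus linker types."""
    name: str
    n_ec: int
    # Keyed by the EC on the N-terminal side: linkers[2] is the EC2-EC3 linker.
    linkers: Dict[int, Linker] = field(default_factory=dict)

    def linker(self, i: int) -> Linker:
        """Return the type of the linker between EC(i) and EC(i+1)."""
        if not 1 <= i < self.n_ec:
            raise ValueError(f"no linker after EC{i} in a {self.n_ec}-EC protein")
        return self.linkers.get(i, Linker.UNKNOWN)

    def ca_free_pairs(self) -> List[Tuple[int, int]]:
        """List the (EC, next EC) pairs joined by a Ca^2+-free linker."""
        return [(i, i + 1) for i in range(1, self.n_ec)
                if self.linker(i) is Linker.CA_FREE]

    def architecture(self) -> str:
        """Render the linear architecture as a single string."""
        parts = ["EC1"]
        for i in range(1, self.n_ec):
            parts.append(f"-[{self.linker(i).value}]-EC{i + 1}")
        return "".join(parts)


# DN-cadherin as described in the abstract: 16 predicted ECs, Ca^2+-binding
# linkers at EC1-EC2 and EC3-EC4, a Ca^2+-free linker at EC2-EC3 (crystal
# structure) and another at EC7-EC8 (predicted from sequence only).
dn = CadherinRecord(
    name="DN-cadherin",
    n_ec=16,
    linkers={1: Linker.CA_BINDING, 2: Linker.CA_FREE,
             3: Linker.CA_BINDING, 7: Linker.CA_FREE},
)

print(dn.ca_free_pairs())
print(dn.architecture())
```

With this entry, `dn.ca_free_pairs()` returns `[(2, 3), (7, 8)]`. Keying linkers by their N-terminal EC keeps unannotated linkers explicit as unknown rather than silently assuming they bind calcium.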
author Felsovalyi, Klara
title Learning from cadherin structures and sequences: affinity determinants and protein architecture
publishDate 2014
url https://doi.org/10.7916/D8Q81B2Q
spelling ndltd-columbia.edu-oai-academiccommons.columbia.edu-10.7916-D8Q81B2Q 2019-05-09T15:14:19Z Theses