Lecture series D2

“Structure and function of proteins”

notes based on Alberts et al 4th ed. (2002) Chapter 3

 

prepared by T. J. Newman, September 25-October 9, 2005

revised, October 1st 2006

 

this document not for public use – all images copyright Garland Science Publishing 2002

 

INTRODUCTION

 

 

 

STRUCTURE OF PROTEINS

 

 

 

·        Flexibility of a protein is mainly due to two degrees of orientational freedom at each alpha carbon

o       these degrees of freedom are conventionally labeled by the phi and psi angles

o       the Ramachandran plot below (phi,psi pairs for alpha carbons in a given protein) shows that these angles are highly correlated due to steric hindrance
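As a rough illustration of how phi and psi are obtained from atomic coordinates, the sketch below computes a signed torsion (dihedral) angle from four points. By the usual definition, phi is the torsion about the N–C-alpha bond (atoms C of the previous residue, N, C-alpha, C) and psi is the torsion about the C-alpha–C bond (atoms N, C-alpha, C, N of the next residue). The function name dihedral_deg and the test coordinates are hypothetical and are not taken from any real protein.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond.

    For phi pass (C_prev, N, CA, C); for psi pass (N, CA, C, N_next).
    """
    b0 = _sub(p1, p0)            # first bond vector
    b1 = _sub(p2, p1)            # central bond (rotation axis)
    b2 = _sub(p3, p2)            # last bond vector
    n1 = _cross(b0, b1)          # normal to the plane of p0, p1, p2
    n2 = _cross(b1, b2)          # normal to the plane of p1, p2, p3
    b1_len = math.sqrt(_dot(b1, b1))
    y = b1_len * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Toy coordinates (hypothetical): central bond along the z axis.
cis = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
quarter = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Collecting the (phi, psi) pair for every residue of a structure and plotting one against the other gives the Ramachandran plot.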

 

 

 



 

·        The conformation (folded structure) of a protein is encoded in the amino acid sequence

o       this conformation is believed to correspond to a state of lowest free energy

o       experiments can be done in which, by adding solvents to the protein solution, the protein can be denatured

§        meaning the protein unfolds into a long flexible chain

o       on removal of the solvents, the protein renatures (refolds)

·        Proteins change their conformations when interacting with certain molecules

o       this is related to their ability to act as enzymes

·        Protein chaperones are molecules that help proteins fold as they are produced (from the ribosome)

 

·        The size of proteins typically ranges from about 50 to 1000 amino acids

·        Larger proteins are structurally composed of domains

o       these are regions that fold into specific shapes independently of other regions of the protein

 

FOLDING PATTERNS

 

·        From early studies of protein structure (in the 1960s) it was realized that two folding patterns are very common

·        These are the α-helix and the β-sheet

·        These patterns arise from hydrogen bonds between N-H and C=O groups in the protein backbone

o       and hence are relatively independent of the particular side chains defining the amino acid sequence

 

 

·        In the α-helix, hydrogen bonds are formed between amino acids four peptide bonds apart

·        Short regions of α-helix (composed of non-polar side-chains) are common in transmembrane proteins – these screen the hydrophilic backbone of the protein from the lipid bilayer

·        In other proteins, α-helices (again composed of non-polar side-chains) wrap around each other in a coiled-coil

o       this allows the hydrophobic side-chains to be screened from the aqueous solution
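The "four peptide bonds apart" rule can be made concrete with a short sketch. It lists the backbone hydrogen-bond partners in an ideal helix (the C=O of residue i paired with the N-H of residue i+4) and estimates helix length. The geometric constants (0.15 nm rise per residue, 3.6 residues per turn) are standard textbook values for the ideal helix and are not stated in these notes; the function names are hypothetical.

```python
RISE_PER_RESIDUE_NM = 0.15   # assumed textbook value for an ideal alpha-helix
RESIDUES_PER_TURN = 3.6      # assumed textbook value for an ideal alpha-helix

def helix_hbond_pairs(n_residues, offset=4):
    """Return (i, i + offset) pairs, residues numbered from 1.

    Each pair means: C=O of residue i is hydrogen-bonded to N-H of residue i + offset.
    """
    return [(i, i + offset) for i in range(1, n_residues - offset + 1)]

def helix_length_nm(n_residues):
    """Approximate length along the helix axis."""
    return n_residues * RISE_PER_RESIDUE_NM

def helix_turns(n_residues):
    """Approximate number of complete-or-partial turns."""
    return n_residues / RESIDUES_PER_TURN

pairs = helix_hbond_pairs(10)      # ten-residue helix
span = helix_length_nm(20)         # a 20-residue helix, roughly bilayer-sized
```

On these assumed values a 20-residue helix is about 3 nm long, which is comparable to the thickness of the hydrophobic core of a lipid bilayer and is consistent with short helices serving as membrane-spanning segments.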

 

 

·        The β-sheet is illustrated below:

 

 

·        The β-sheet can either consist of parallel sections of polypeptide chain, or anti-parallel sections

·        Both types produce rigid structures

 

 

HIGHER LEVEL DESCRIPTIONS

 

·        In discussing the structure of a protein molecule, one speaks of

o       primary structure: amino acid sequence

o       secondary structure: folding patterns – i.e., which regions of the protein are α-helices or β-sheets

o       tertiary structure: full three-dimensional structure of the protein

o       (for protein complexes: quaternary structure: the composite structure of the constituent proteins)

·        Aside from these levels of description, one can also discuss a protein in terms of domains, as mentioned above

o       domains are typically composed of between 40 and 350 amino acids

o       domains fold independently of the rest of the protein

o       domains often have specific functions within the protein

§        e.g. regulatory roles, or catalytic roles

o       the core of a domain consists of a particular combination of helices and sheets – known as a fold

o       of the many thousands of proteins whose conformations are known, a set of about 1000 different folds has been identified

 

 

·        Proteins can also be grouped into families – the similarity between certain proteins is most likely a result of evolution

o       i.e. the gene coding for a protein is duplicated, allowing independent, but derivative, evolution of the duplicate

·        E.g. the serine proteases – a large group of protein-cleaving (proteolytic) enzymes

o       members of this family are specific as to which peptide bonds they break,

o       but they share sequence and structural similarities (see below, two members of the family)

 

 

·        In other cases, the sequences may have diverged considerably,

o       and yet the similarity between the three-dimensional structures of two putative members of a family can be striking

o       see below, the two gene regulatory proteins yeast α2 and Drosophila engrailed

o       these two proteins have a very similar conformation, and yet match at only 17 of the 60 amino acids

 

 

STRUCTURE FROM SEQUENCE

 

·        The complete three-dimensional structure of a protein is usually determined by X-ray crystallography

o       this requires that the protein in question can be crystallized

o       because of the difficulties in achieving this, only a few thousand proteins have had their structure completely determined

o       larger membrane complexes and transmembrane proteins have been particularly hard to crystallize

·        Thus, studies have begun to focus on breaking a larger protein down into domains

o       by determining the structure of the domains through X-ray crystallography it is hoped that one can reconstruct the conformation of the entire protein

o       this goal is aided by the observation that domains fold in a limited number (1000 – 2000) of different ways

·        The database of actual protein sequences contains about 0.5 million entries

o       this number is growing as more and more organism genomes are sequenced

·        Sophisticated computer programs have been developed to search these databases to find relationships between these protein sequences

o       one looks for homologous proteins – i.e. proteins whose sequences indicate that they arose from a common ancestral gene

o       in practice (because of statistical noise) this requires that proteins share > 30% sequence identity

o       short “fingerprint sequences” which are known to relate to particular domain function are also used to find homologues
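The 30% identity criterion can be illustrated with a minimal sketch that scores two sequences that have already been aligned. The function name, the gap convention (columns containing "-" are skipped) and the toy sequences are assumptions made for illustration. Real search programs also have to construct the alignment and assess its statistical significance, which is not attempted here.

```python
IDENTITY_THRESHOLD_PERCENT = 30.0   # rule of thumb quoted in the notes

def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of compared alignment columns holding the same amino acid.

    Both inputs must be rows of one alignment (equal length).
    Columns with a gap in either row are not counted (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

def passes_threshold(identity_percent):
    """True if identity exceeds the 30% rule of thumb."""
    return identity_percent > IDENTITY_THRESHOLD_PERCENT

# Hypothetical ten-residue fragments in one-letter code (not real proteins).
toy_identity = percent_identity("MKTAYIAKQR", "MKSAYLAKHR")

# The 17-of-60 match quoted earlier for the two gene regulatory proteins.
regulatory_pair_identity = 100.0 * 17 / 60
```

The 17-of-60 match works out to roughly 28%, just under the threshold. This shows why sequence identity alone can miss family members whose three-dimensional structures remain very similar.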

·        The thesis is that similar sequences (or sub-sequences) imply similar function

·        It is becoming apparent that larger proteins may have evolved by domains joining together

o       novel binding surfaces (and hence protein functions) can arise precisely at these domain joins

·        This process is called domain shuffling

·        Protein modules are smaller-than-average domains which seem to be particularly mobile building blocks

o       these modules consist of a core region of β-sheet surrounded by flexible polypeptide loops

o       these modules have their N and C termini located peripherally, thus allowing easy integration into existing polypeptides

 

     

 

 

·        Vertebrate genomes are not much larger than those of insects, worms, or simple plants

o       and yet, the proteins encoded tend to be more complex, i.e. contain more domains

o       the extra sophistication in vertebrate proteins implies a wider range of protein-protein interactions

 

 

LARGE PROTEINS

 

·        Proteins have particular regions which allow non-covalent binding

o       such regions are called binding sites

·        Two folded polypeptide chains can link together via such binding sites forming a larger protein

o       the polypeptide chains are called protein subunits

·        A simple example of such an arrangement is a dimer composed of two identical subunits

o       e.g. the Cro repressor protein shown below (this protein, from bacteriophage lambda, binds to DNA to turn off viral genes)

 

 

·        An example of a tetramer is neuraminidase, which forms a ring:

 

 

·        A more complicated example is hemoglobin – the protein which carries oxygen in red blood cells

o       it is composed of two “α-globin” subunits and two “β-globin” subunits

 

 

·        Some large proteins have had their structures exactly determined – allowing us to contrast size and shape:

 

 

 

 

FILAMENTOUS PROTEINS

 

·        If protein subunits have two complementary binding sites, it is possible to create chains of such subunits

 

 

·        An important example is actin – subunits of actin join together to form long helical actin filaments

o       these are a crucial component of the cell cytoskeleton

 

 

·        This mechanism of linking subunits with complementary binding sites commonly leads to helical structures
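One way to see why helices are the generic outcome is to note that each subunit sits in the same spatial relationship to its predecessor. Applying one fixed rotation plus translation over and over traces out a helix. The sketch below, under the simplifying assumption that the rotation axis is the z axis, places subunit centres this way. The twist, rise and radius values are hypothetical and are not measurements for actin or any other filament.

```python
import math

def next_subunit(point, twist_deg, rise_nm):
    """Apply one fixed step: rotate about the z axis, then shift along it."""
    t = math.radians(twist_deg)
    x, y, z = point
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t),
            z + rise_nm)

def assemble_filament(first_subunit, n_subunits, twist_deg, rise_nm):
    """Build a chain by repeating the same subunit-to-subunit relationship."""
    chain = [first_subunit]
    for _ in range(n_subunits - 1):
        chain.append(next_subunit(chain[-1], twist_deg, rise_nm))
    return chain

def axis_distance(point):
    """Distance of a subunit centre from the filament (z) axis."""
    return math.hypot(point[0], point[1])

# Hypothetical parameters: 30 degree twist and 1 nm rise per subunit,
# so 12 subunits complete one full turn.
filament = assemble_filament((2.0, 0.0, 0.0), 13, twist_deg=30.0, rise_nm=1.0)
radii = [axis_distance(p) for p in filament]
```

Every subunit stays at the same distance from the axis while advancing by a constant rise, which is the defining geometry of a helix. A twist of zero gives a straight line, and a rise of zero gives a closed ring when the twist divides 360 degrees evenly.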

 

·        Most enzymes are globular

·        In contrast, some proteins have an elongated shape – these can be used to construct larger elongated structures

o       such proteins are termed fibrous proteins

o       an important example is α-keratin – a dimer constructed from a coiled-coil of two subunits

o       the ends of the coiled-coil are binding sites, allowing the proteins to assemble into long filaments

o       keratin filaments are a major component of long-lived structures such as hair and nails

·        Fibrous proteins are abundant in the extracellular matrix (ECM) – providing a scaffold for cells to attach to and move through

o       such proteins are secreted by cells

o       an important example of such an ECM protein is collagen – formed from three interwoven subunits

o       collagen molecules then bind to one another side-by-side to form long collagen fibrils

§        these fibrils help to give connective tissue tensile strength

 

 

·        In contrast, elastin molecules covalently crosslink forming a floppy elastic network

o       these molecules can switch between coiled and extended conformations

o       this allows the network great extensibility

o       elastin networks are used to prevent certain tissues (e.g. skin, lung) from tearing

 

   

 

·        Extracellular proteins exist in a harsher environment than intracellular proteins

o       for this reason, they often have increased structural rigidity supplied by covalent disulphide (S-S) bonds

o       these bonds are quickly broken in the cell cytoplasm by reducing agents

 

LARGER STRUCTURES

 

·        Larger supramolecular structures are often created from a large number of separate proteins

o       e.g. virus capsids, ribosomes

o       advantages of this modular assembly are

§        less genetic information required

§        more robust assembly and disassembly

·        Such assemblies are often constructed as sheets which can then be converted into a tube or a sphere

o       see below, for examples of spherical virus capsids (i.e. protein coat)

 

    

 

·        Shown below is the postulated pathway for construction of the tomato bushy stunt virus

o       this has 180 identical capsid proteins (and a small RNA genome of 4500 nucleotides)

o       the capsid is not just a rigid structure – it must allow the genetic information to be deposited inside the host cell

o       Michael Thorpe’s group at ASU has developed ground-breaking computer algorithms to determine the dynamics of large protein complexes such as this
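The "less genetic information" advantage of building from repeated subunits can be checked with simple arithmetic. The sketch below compares the coding cost of one subunit gene with that of a hypothetical single chain containing all 180 copies. The subunit length of 400 amino acids is an assumed round number, not a value from these notes, and stop codons and regulatory sequences are ignored.

```python
NUCLEOTIDES_PER_CODON = 3

def coding_nucleotides(n_amino_acids):
    """Nucleotides needed to encode a chain (stop codon and regulation ignored)."""
    return n_amino_acids * NUCLEOTIDES_PER_CODON

def capsid_coding_cost(subunit_length, n_copies):
    """Return (cost of one reusable subunit gene, cost of one giant chain)."""
    modular = coding_nucleotides(subunit_length)
    monolithic = coding_nucleotides(subunit_length * n_copies)
    return modular, monolithic

ASSUMED_SUBUNIT_LENGTH = 400   # hypothetical capsid protein length
CAPSID_COPIES = 180            # from the notes
GENOME_NUCLEOTIDES = 4500      # from the notes

modular_cost, monolithic_cost = capsid_coding_cost(ASSUMED_SUBUNIT_LENGTH,
                                                   CAPSID_COPIES)
fits_in_genome = modular_cost <= GENOME_NUCLEOTIDES
```

Under these assumptions one subunit gene needs 1200 nucleotides, which fits comfortably in a 4500-nucleotide genome. Encoding the whole shell as a single chain would need 216000 nucleotides, far more than the genome contains.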

 

 

·        Self-assembly is a key process that allows the construction of some of these supramolecular protein complexes

·        For example, the bacterial ribosome is composed of 55 different proteins, and 3 different RNA molecules

o       the ribosome will self-assemble if the constituent proteins and RNAs are mixed together in solution

·        In other cases, larger structures are not formed from self-assembly

o       rather, auxiliary enzymes and “templating proteins” help to guide the assembly

 

·        We now turn from space to time – i.e. from structure to function of proteins

 

 

PROTEIN FUNCTION

 

·        A universal feature of proteins is that they bind to other molecules, e.g.

o       antibodies bind to foreign bodies (such as a virus or a bacterium)

o       catalytic enzymes bind to substrate molecules

o       actin molecules bind to each other to form filaments

·        Binding strength varies according to context, but binding is always highly specific

·        The molecule to which a protein binds is termed a ligand

·        Since protein-ligand bonding is non-covalent, many individual bonds are required