Lecture series D2
“Structure and function of proteins”
notes based on Alberts et al 4th ed. (2002) Chapter 3
prepared by T. J. Newman, September 25-October 9, 2005
revised, October 1st 2006
this document is not
for public use – all images copyright Garland Science Publishing 2002
INTRODUCTION
STRUCTURE OF PROTEINS

·
Flexibility of a
protein is mainly due to two degrees of orientational freedom at each alpha
carbon
o
these degrees of
freedom are conventionally labeled by the phi and psi angles
o
the Ramachandran plot below
(phi,psi pairs for alpha carbons in a given protein) shows that these
angles are highly correlated due to steric hindrance
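A minimal sketch of how a phi or psi angle could be computed from atomic coordinates. Each is a torsion (dihedral) angle defined by four consecutive backbone atoms: phi by C(i-1), N(i), CA(i), C(i), and psi by N(i), CA(i), C(i), N(i+1). The function names and the toy coordinates are illustrative assumptions, not part of these notes, and the sign of the angle is simply the one produced by the atan2 formula used here.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle (degrees) about the p1-p2 bond, in the range -180..180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def phi(c_prev, n, ca, c):
    """phi for residue i: atoms C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """psi for residue i: atoms N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)


# Toy coordinates (hypothetical, chosen only to give a simple geometry)
toy_angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Collecting the (phi, psi) pair for every residue of a real structure and plotting one against the other would give a Ramachandran plot.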




·
The conformation
(folded structure) of a protein is encoded in the amino acid sequence
o
this conformation
is believed to correspond to a state of lowest free energy
o
experiments can be
done in which, by adding solvents to the protein solution, the protein can be denatured
§
meaning the
protein unfolds into a long flexible chain
o
on removal of the
solvents, the protein renatures (refolds)
·
Proteins change
their conformations when interacting with certain molecules
o
this is related
to their ability to act as enzymes
·
Protein chaperones are
molecules that help proteins fold as they are produced (by the ribosome)
·
The size of
proteins typically varies from about 50 to 1000 amino acids
·
Larger proteins
are structurally composed of domains
o
these are regions
that fold into specific shapes independently of other regions of the protein
FOLDING PATTERNS
·
From early
studies of protein structure (in the 1960’s) it was realized that two folding
patterns are very common
·
These are the α-helix and the β-sheet
·
These patterns
arise from hydrogen bonds between N-H and C=O groups in the protein backbone
o
and hence are
relatively independent of the particular side chains defining the amino acid
sequence

·
In the α-helix, hydrogen bonds are formed between amino acids
four peptide bonds apart
·
Short regions of α-helix (composed of amino acids with non-polar side-chains) are common
in transmembrane proteins – the side-chains screen the hydrophilic backbone of the
protein from the lipid bilayer
·
In other proteins,
α-helices (with most of their non-polar side-chains on one side)
wrap around each other in a coiled-coil
o
this
allows the hydrophobic side-chains to be screened from the aqueous solution
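The hydrogen-bonding pattern of the α-helix described above can be sketched as a simple index rule. This assumes an ideal helix in which the C=O group of residue i bonds to the N-H group of residue i+4; the function name and the 1-based numbering are illustrative choices, not taken from these notes.

```python
def helix_hbond_pairs(n_residues, spacing=4):
    """List (i, i + spacing) residue pairs joined by backbone hydrogen bonds.

    Residues are numbered from 1. In an ideal alpha-helix the C=O of
    residue i is bonded to the N-H of residue i + 4.
    """
    return [(i, i + spacing) for i in range(1, n_residues - spacing + 1)]


# A 10-residue ideal helix
pairs = helix_hbond_pairs(10)
```

Because the rule involves only backbone groups, it is the same for every sequence, which is why the pattern is relatively independent of the side chains.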

·
The β-sheet is illustrated below:

·
The β-sheet can consist of either parallel sections of
polypeptide chain, or anti-parallel sections
·
Both types
produce rigid structures

HIGHER LEVEL DESCRIPTIONS
·
In discussing the
structure of a protein molecule, one speaks of
o
primary structure: amino acid sequence
o
secondary structure: folding patterns – i.e., which regions of
the protein are α-helices or β-sheets
o
tertiary structure: full three-dimensional structure of the
protein
o
(for protein
complexes: quaternary structure: the composite
structure of the constituent proteins)
·
Aside from these
levels of description, one can also discuss a protein in terms of domains, as
mentioned above
o
domains are typically composed of between 40 and 350 amino
acids
o
domains fold
independently of the rest of the protein
o
domains often
have specific functions within the protein
§
e.g. regulatory
roles, or catalytic roles
o
the core of a
domain consists of a particular combination of helices and sheets – known as a fold
o
of the many
thousands of proteins whose conformations are known, a set of about 1000
different folds has been identified

·
Proteins can also
be grouped into families – the similarity
between certain proteins is most likely a result of evolution
o
i.e. the gene coding
for a protein is duplicated, allowing independent, but derivative, evolution of
the duplicate
·
E.g. the serine
proteases – a large group of protein-cleaving (proteolytic) enzymes
o
members of this
family are specific as to which peptide bonds they break,
o
but they share
sequence and structural similarities (see below, two members of the family)

·
In other cases,
the sequences may have diverged considerably,
o
and yet the
similarity between the three-dimensional structures of two putative members of a family can be
striking
o
see below, the
two gene regulatory proteins yeast α2 and Drosophila
engrailed
o
these two proteins
have a very similar conformation, and yet match at only 17 of the 60 amino
acids
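The figure of 17 matching positions out of 60 corresponds to roughly 28% sequence identity. A minimal sketch of how such a percentage could be computed for two sequences that have already been aligned is given below. The function name, the gap character and the toy sequences are assumptions made for illustration; no real protein sequences are reproduced here.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over the aligned, ungapped positions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip positions where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Arithmetic for the case quoted in the notes: 17 matches in 60 positions
alpha2_vs_engrailed = 100.0 * 17 / 60

# Toy aligned sequences (hypothetical)
toy_identity = percent_identity("ACDEFG", "ACDQFG")
```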

STRUCTURE FROM SEQUENCE
·
The complete
three-dimensional structure of a protein is usually determined by X-ray
crystallography
o
this requires
that the protein in question can be crystallized
o
because of the
difficulties in achieving this, only a few thousand proteins have had their
structure completely determined
o
larger membrane
complexes and transmembrane proteins have been particularly hard to crystallize
·
Thus, studies
have begun to focus on breaking a larger protein down into domains
o
by determining
the structure of the domains through X-ray crystallography it is hoped that one
can reconstruct the conformation of the entire protein
o
this goal is
aided by the observation that domains fold in a limited number (1000 – 2000) of
different ways
·
The database of
actual protein sequences contains about 0.5 million entries
o
this number
is growing as more and more organism genomes are sequenced
·
Sophisticated
computer programs have been developed to search these databases to find
relationships between these protein sequences
o
one looks for homologous proteins –
i.e. proteins whose sequences indicate
that they arose from a common ancestral gene
o
in practice
(because of statistical noise) this requires that proteins share > 30%
sequence identity
o
short “fingerprint sequences”
which are known to relate to particular domain function are also used to find
homologues
·
The thesis is
that similar sequences (or sub-sequences) imply similar function
·
It is becoming
apparent that larger proteins may have evolved by domains joining together
o
novel binding
surfaces (and hence protein functions) can arise precisely at these domain
joins
·
This process is
called domain shuffling
·
Protein modules are smaller-than-average domains which seem to
be particularly mobile building blocks
o
these modules
consist of a core region of β-sheet surrounded by flexible polypeptide loops
o
these modules
have their N and C termini located peripherally, thus allowing easy integration
into existing polypeptides
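The search for short "fingerprint sequences" mentioned earlier in this section can be sketched with a regular expression. The pattern used here is a commonly cited nucleotide-binding (P-loop) motif, written as [AG], any four residues, then G, K and [ST]; it is brought in only as an illustration and does not come from these notes. The function name and the toy sequence are likewise hypothetical.

```python
import re


def find_fingerprint(sequence, pattern):
    """Return (start_index, matched_text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in re.finditer(pattern, sequence)]


# Illustrative motif: [AG] - any 4 residues - G - K - [ST]
P_LOOP = r"[AG].{4}GK[ST]"

# Toy sequence (hypothetical), one-letter amino acid codes
toy_sequence = "MAGESGAGKSTLL"
hits = find_fingerprint(toy_sequence, P_LOOP)
```

A real search would scan every entry in the sequence database and would normally be combined with a full alignment, since a short motif can also occur by chance.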

·
Vertebrate
genomes are not much larger than those of insects, worms, or simple plants
o
and yet, the
proteins encoded tend to be more complex, i.e. contain more domains
o
the extra
sophistication in vertebrate proteins implies a wider range of protein-protein
interactions
LARGE PROTEINS
·
Proteins have
particular regions which allow non-covalent binding
o
such regions are
called binding sites
·
Two folded
polypeptide chains can link together via such binding sites forming a larger
protein
o
the polypeptide
chains are called protein subunits
·
A simple example
of such an arrangement is a dimer composed of
two identical subunits
o
e.g. the Cro
repressor protein shown below (this protein found in bacteria binds to DNA to
turn off viral genes)

·
An example of a tetramer is neuraminidase, which forms a ring:

·
A more
complicated example is hemoglobin – the protein which carries oxygen in red
blood cells
o
it is composed of
two “α-globin” subunits and two
“β-globin” subunits

·
Some large
proteins have had their structures exactly determined – allowing us to contrast
size and shape:


FILAMENTOUS PROTEINS
·
If protein
subunits have two complementary binding sites, it is possible to create chains
of such subunits

·
An important example
is actin – subunits of actin join together to
form long helical actin filaments
o
these are a
crucial component of the cell cytoskeleton
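One way to see why chains of identical subunits tend to be helical: if every subunit binds its neighbour through the same pair of sites, each step along the chain applies the same rotation and the same displacement. The sketch below assumes that this repeated step is a rotation about one axis plus a rise along it. The function name and all parameter values are hypothetical and are not measured actin values.

```python
import math


def filament_positions(n_subunits, radius, twist_deg, rise):
    """Centres of subunits related by a fixed twist and rise per subunit."""
    positions = []
    for i in range(n_subunits):
        angle = math.radians(i * twist_deg)
        positions.append((radius * math.cos(angle),
                          radius * math.sin(angle),
                          i * rise))
    return positions


# Hypothetical geometry: 100 degrees of twist and 1.5 length units of rise
# per subunit, with each subunit centre 1.0 unit from the filament axis
points = filament_positions(5, radius=1.0, twist_deg=100.0, rise=1.5)
```

A twist of zero would give a straight row of subunits, and a rise of zero a closed or overlapping ring, so the helix is the general case.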

·
This mechanism of
linking subunits with complementary binding sites commonly leads to helical
structures
·
Most enzymes are
globular
·
In contrast, some
proteins have an elongated shape – these can be used to construct larger
elongated structures
o
such proteins are
termed fibrous proteins
o
an important
example is α-keratin – a dimer constructed
from a coiled-coil of two subunits
o
the ends of the
coiled-coil are binding sites, allowing the proteins to assemble into long
filaments
o
keratin filaments
are a major component of long-lived structures such as hair and nails
·
Fibrous proteins
are abundant in the extracellular matrix (ECM) –
providing a scaffold for cells to attach to and move through
o
such proteins are
secreted by cells
o
an important
example of such an ECM protein is collagen –
formed from three interwoven subunits
o
collagen
molecules then bind to one another side-by-side to form long collagen fibrils
§
these fibrils
help to give connective tissue tensile strength

·
In contrast, elastin molecules covalently crosslink forming a
floppy elastic network
o
these molecules
can switch between coiled and extended conformations
o
this gives the
network great extensibility
o
elastin networks
are used to prevent certain tissues (e.g. skin, lung) from tearing

·
Extracellular
proteins exist in a harsher environment than intracellular proteins
o
for this reason,
they often have increased structural rigidity supplied by covalent disulphide (S-S) bonds
o
these bonds are
quickly broken in the cell cytoplasm by reducing agents
LARGER STRUCTURES
·
Larger supramolecular structures are often created from a
large number of separate proteins
o
e.g. virus capsids, ribosomes
o
advantages of
this modular assembly are
§
less genetic
information required
§
more robust
assembly and disassembly
·
Such assemblies
are often constructed as sheets which can then be converted into a tube or a
sphere
o
see below, for
examples of spherical virus capsids (i.e. protein coat)

·
Shown below is
the postulated pathway for construction of the tomato bushy stunt virus
o
this has 180
identical capsid proteins (and a small RNA genome of 4500 nucleotides)
o
the capsid is not
just a rigid structure – it must allow the genetic information to be deposited
inside the host cell
o
Michael Thorpe’s
group at ASU has developed ground-breaking computer algorithms to determine the dynamics
of large protein complexes such as this
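The "less genetic information required" advantage can be made concrete with the numbers quoted for this virus. The arithmetic below assumes three nucleotides per amino acid, a single non-overlapping reading frame, and a genome that is entirely coding. These are simplifying assumptions; the variable names are illustrative.

```python
GENOME_NT = 4500        # genome length quoted in the notes
CAPSID_COPIES = 180     # number of capsid proteins quoted in the notes
NT_PER_CODON = 3        # one amino acid per three nucleotides
MIN_PROTEIN_AA = 50     # lower end of the protein size range given earlier

# Upper bound on the total number of amino acids the genome can encode
max_coded_aa = GENOME_NT // NT_PER_CODON

# If all 180 capsid proteins were different, each could be at most this long
aa_per_protein_if_distinct = max_coded_aa / CAPSID_COPIES

# Upper bound on the number of distinct minimal-size proteins
max_distinct_proteins = max_coded_aa // MIN_PROTEIN_AA
```

Under these assumptions the genome could encode at most 1500 amino acids, about 8 per protein if all 180 were different, so the capsid can only be built from many copies of a few gene products.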

·
Self-assembly is a key process that allows the construction
of some of these supra-molecular protein complexes
·
For example, the
bacterial ribosome is composed of 55 different proteins, and 3 different RNA
molecules
o
the ribosome will
self-assemble if the constituent proteins and RNAs are mixed together in
solution
·
In other cases,
larger structures are not formed from self-assembly
o
rather, auxiliary
enzymes and “templating proteins” help to guide the assembly
·
We now turn from
space to time – i.e. from structure to function of proteins
PROTEIN FUNCTION
·
A universal
feature of proteins is that they bind to other molecules, e.g.
o
antibodies bind to foreign bodies (such as a virus or a
bacterium)
o
catalytic enzymes
bind to substrate molecules
o
actin molecules
bind to each other to form filaments
·
Binding strength
varies according to context, but binding is always highly specific
·
The molecule to
which a protein binds is termed a ligand
·
Since
protein-ligand bonding is non-covalent, many individual bonds are required
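A rough sketch of why many weak bonds are needed: if the free energies of the individual non-covalent bonds are assumed to add, the equilibrium constant for binding grows exponentially with the number of bonds, through the standard relation ΔG° = -RT ln K. The additivity, the value of 1 kcal/mol per bond and the function name are illustrative assumptions, not figures from these notes.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def equilibrium_constant(n_bonds, energy_per_bond_kcal, temp_k=310.0):
    """Equilibrium constant for binding, assuming additive bond energies.

    energy_per_bond_kcal is the favourable free energy released per bond,
    so the total free energy change is -n_bonds * energy_per_bond_kcal.
    """
    total_kcal = n_bonds * energy_per_bond_kcal
    return math.exp(total_kcal / (R_KCAL * temp_k))


# Hypothetical comparison: 1 weak bond versus 10 weak bonds of 1 kcal/mol
k_one = equilibrium_constant(1, 1.0)
k_ten = equilibrium_constant(10, 1.0)
```

In this simple picture a single weak bond barely favours the bound state, while ten such bonds acting together favour it by many orders of magnitude.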