Proteins could be said to be the most important molecules in living beings. Just about everything in your body involves proteins or is made out of them. Proteins are long chains built from smaller molecules called amino acids, and 20 different amino acids make up all proteins. You can think of the amino acids as beads of 20 different colors; a single protein can be a string of hundreds of them.

Proteins typically don't stay as long chains, however. As soon as the chain of amino acids is built, it folds and tangles up into a more compact mass with a particular shape. This process is called protein folding. It happens because the various amino acids stick to each other according to certain rules. Picture the amino acids (the beads on a string) as sticky, but sticky in such a way that only certain colors can stick to certain other colors. The amino acid chains built in the body must fold up in a particular way to make useful proteins, and the cell has mechanisms both to help proteins fold properly and to get rid of improperly folded ones.

Each gene specifies the order of the amino acids for one protein. The gene itself is a section of a long chain molecule called DNA. The collection of all human genes is known as "the human genome", and in recent years scientists have sequenced it. Depending on how genes are counted, there are over 30,000 genes in the human genome.
Each of these genes tells how to build the chain of amino acids for one of the roughly 30,000 proteins. The collection of all human proteins is known as "the human proteome." What the genes don't tell is exactly how each protein will fold into its compact final form. The final shape of a protein is very important because it determines what the protein can do and which other proteins it can connect to or interact with.

You can think of protein shapes as puzzle pieces. For example, muscle proteins connect to each other to form a muscle fiber; they stick together that way because of their shapes and certain other factors related to those shapes. Everything that happens in cells, and in the body, is very specifically controlled by protein shapes. For example, the proteins of a virus or a bacterium may have particular shapes that interact with human proteins or with the human cell membrane and let the pathogen infect the cell. This is obviously an oversimplified description, but it shows how important the shapes of proteins are. Knowing these shapes lets us understand how proteins perform their functions, and also how diseases prevent proteins from doing what they must to maintain a healthy cell and body.

When your grid agent is running, it is trying to fold a single protein from the set of human proteins with no known shape. The program the grid agent runs is called Rosetta. The client will try millions of shapes and return the best 500 it can find to the central server. The quality of each shape the grid agent tries is measured by the Rosetta score, which examines the packing of amino acids in the protein and produces a number: the lower the number, the better. As the computers try to fold the protein chains in different ways, they attempt to find the shape closest to how the proteins really fold in our bodies. You can see pictures of the partially folded proteins in the right half of the grid agent screen; the left side shows various scores that tell how well the protein is folded so far, according to all of the rules. If a trial fold gets a worse score, the computer tries to refold it a different way, which may produce a better score. This is done millions of times for each protein, and scientists will look only at the lowest-scoring structures.

But what IS a protein? Most genes code for proteins. Proteins are polymers built from smaller monomers called amino acids (say, 150 of them, although protein length varies from gene to gene). These strings of amino acids, each with its own shape and chemical properties, then fold up into more compact shapes that have specific functions. All 20 amino acids share a common backbone but have different variable groups, so nature can use the same 20 amino acids and the same ribosome (the machine that strings the amino acids together) to make an astronomically large variety of shapes and functions. By changing the order and type of amino acids in proteins, living things can come up with new functions and shapes. Such changes to a protein's sequence are called mutations. A mutation can swap one amino acid for another in a protein (say, the hemoglobin in your blood) or delete several amino acids from a protein (see http://web.mit.edu/esgbio/www/dogma/mutants.html).
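
Writing a protein sequence as a string of one-letter amino acid codes makes the two kinds of mutation just described easy to picture. The minimal Python sketch below treats a substitution and a deletion as plain string edits. The function names `substitute` and `delete` are hypothetical, positions are assumed to be counted from 1, and the example string is, for illustration, the first eight residues of the mature human beta-globin chain, in which the sickle-cell mutation replaces glutamic acid (E) at position 6 with valine (V).

```python
# Hypothetical helpers for illustration; positions are 1-based, as biologists usually count.

def substitute(seq: str, position: int, new_residue: str) -> str:
    """Return a copy of seq with the residue at `position` swapped for `new_residue`."""
    if not 1 <= position <= len(seq):
        raise ValueError(f"position {position} is outside 1..{len(seq)}")
    i = position - 1  # convert to a 0-based string index
    return seq[:i] + new_residue + seq[i + 1:]


def delete(seq: str, start: int, count: int) -> str:
    """Return a copy of seq with `count` residues removed, beginning at `start`."""
    if count < 0 or not 1 <= start <= len(seq) or start - 1 + count > len(seq):
        raise ValueError("deletion falls outside the sequence")
    i = start - 1
    return seq[:i] + seq[i + count:]


beta_globin_start = "VHLTPEEK"  # first eight residues of mature human beta-globin

print(substitute(beta_globin_start, 6, "V"))  # sickle-cell E6V swap: VHLTPVEK
print(delete(beta_globin_start, 2, 3))        # residues 2-4 removed: VPEEK
```

A one-letter change in the string is all it takes to describe a point mutation, even though its effect on the folded protein's shape can be dramatic.
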
Many research efforts are currently underway to let us rationally engineer protein sequences with new functions and therapeutic uses. Most drugs work by binding to the specific shapes that folded proteins make in cells. Understanding protein three-dimensional structure is one of many things we need to understand if we are to decode the human genome or the genome of a given pathogen. (Links to more on the central dogma of modern biology and on the 20 amino acids are listed under "What is a protein?".)

When deciding which proteins to fold, we used the following criterion: choose proteins that are important to the people donating the computing cycles.

Overall, predicting the structure of every protein in an organism with Rosetta will contribute to our understanding of the proteins in that genome and of how those proteins interact within the organism as a system. Can you imagine trying to fix a car or a machine while knowing the function of only 60% of its components? That is the situation in which biomedical and biological researchers, to their credit, operate. Thus, anything that can shed light on these mystery proteins is useful to biology and medicine. These predictions will not be a magic bullet, but they will provide a resource for biologists working on the genomes we fold.

The first category of proteins we fold consists of the proteins in the human genome with no known structural homologs. Human proteins are the targets of drugs and the key to improving human health.
Improving our understanding of these proteins has innumerable positive effects, and some human proteins in the blood are therapeutics in and of themselves.

The second category consists of proteins found in the genomes of pathogens. Understanding the biology of the bacteria and viruses that cause disease will allow us to fight them better. Many of these proteins are drug targets or have roles in virulence that have yet to be fully understood.

The last category consists of proteins found in the genomes of environmental microbes. These microbes represent the majority of molecular biodiversity on the planet, and a deeper understanding of their proteomes (the structure and function of the proteins in their genomes) will help us understand these microbes and their role in our environment. These microbes drive the global carbon and nitrogen cycles, degrade human waste products, and can carry out countless undiscovered enzymatic biosyntheses.

Proteins are large, complicated molecules, so simplifying how we represent them visually is key to protein structure research.

Rosetta is a computer program for de novo protein structure prediction, where de novo means modeling in the absence of detectable sequence similarity to a previously determined three-dimensional protein structure. Rosetta uses short local sequence similarities to proteins in the Protein Data Bank [http://www.rcsb.org/pdb/] to estimate possible conformations for local sequence segments (three- and nine-residue segments). These segments are called fragments of local structure.
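
To make "fragments of local structure" more concrete, here is a hypothetical Python sketch of a single fragment-insertion move. It assumes each residue's backbone conformation is reduced to a (phi, psi) pair of torsion angles and that a fragment library maps each window start to a list of candidate angle runs (nine residues long here; three-residue fragments work the same way). The data layout, the `insert_fragment` name and the angle values are illustrative assumptions, not Rosetta's file formats or API.

```python
import random
from typing import Dict, List, Tuple

# Illustrative data layout (an assumption, not Rosetta's actual format):
Torsions = Tuple[float, float]               # (phi, psi) in degrees for one residue
Fragment = List[Torsions]                    # angles for a run of consecutive residues
FragmentLibrary = Dict[int, List[Fragment]]  # window start (0-based) -> candidate fragments


def insert_fragment(chain: List[Torsions], library: FragmentLibrary,
                    frag_len: int, rng: random.Random) -> List[Torsions]:
    """Return a new chain in which one window's angles are copied from a library fragment."""
    starts = [s for s in library if s + frag_len <= len(chain) and library[s]]
    start = rng.choice(starts)             # pick a window along the chain
    fragment = rng.choice(library[start])  # pick one candidate for that window
    if len(fragment) != frag_len:
        raise ValueError("fragment length does not match frag_len")
    new_chain = list(chain)                # leave the input conformation untouched
    new_chain[start:start + frag_len] = fragment
    return new_chain


rng = random.Random(0)
n_residues, frag_len = 12, 9
extended = [(-120.0, 130.0)] * n_residues  # toy starting conformation
helical = [(-60.0, -45.0)] * frag_len      # toy helix-like nine-residue fragment
library = {s: [helical] for s in range(n_residues - frag_len + 1)}

moved = insert_fragment(extended, library, frag_len, rng)
print(sum(1 for angles in moved if angles == (-60.0, -45.0)))  # 9
```

Because the move copies angles rather than coordinates, every residue outside the chosen window keeps its previous angles while the window takes on the fragment's local shape.
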
Rosetta then assembles these pre-computed structure fragments by minimizing a global scoring function that favors hydrophobic burial and packing, strand pairing, compactness, and energetically favorable residue pairings. Results from the fourth and fifth Critical Assessment of Structure Prediction experiments (CASP4 and CASP5) [http://predictioncenter.llnl.gov/] have shown that Rosetta is currently one of the best methods for de novo protein structure prediction and distant fold recognition.

Using Rosetta-generated structure predictions, we have previously been able to recapitulate or predict many functional insights not detectable from primary sequence. Rosetta was also recently used to generate both fold and function predictions for Pfam protein families that had no link to a known structure, resulting in many high-confidence fold predictions. In spite of these successes, Rosetta has a significant error rate, as do all methods for distant fold recognition and de novo structure prediction. We therefore calculate not just the structure but also the probability that the predicted structure is correct, using the Rosetta confidence function, which partially mitigates this error rate by assessing the accuracy of predicted folds.

Another unavoidable source of uncertainty in predicting function from structure is the error involved in inferring function from fold matches: sometimes a single fold carries out more than one function. The predictions generated by de novo structure prediction are thus best used in combination with other sources of putative or general functional information, such as proximity in protein association or gene regulatory networks. Making the predictions from this project available to the public in an easily accessible way is therefore a critical final step in this project.

For a QuickTime movie showing a protein (ubiquitin) being folded
by Rosetta, see [1d3z.mov or 1d3z.mpg].

Rosetta uses a scoring function to judge different conformations (shapes, or packings of amino acids within the protein). The simulation consists of making moves (changing the backbone torsion angles of a few consecutive amino acids) and then scoring the new conformation. The Rosetta score is a weighted sum of component scores, each of which judges a different property. The environment score judges how well the hydrophobic (oily) residues pack together to form a core, while the pair score judges how compatible touching residues are with each other, one pair at a time.

Environment score: The formation of a hydrophobic core, or the hydrophobic effect, is for most proteins the central driving force for protein folding. The Rosetta environment score rewards burial of hydrophobic residues in a compact hydrophobic core and penalizes solvation of these oily groups. I’ve represented hydrophobic residues as orange stars. The left conformation is good (all the hydrophobics together), while the rightmost conformation is bad (the hydrophobic amino acids are not touching).

Pair score: Two conformations of a polypeptide are shown. In the top one, the chain is folded back on itself, bringing two cysteines together (yellow + yellow = possible disulphide bond) and forming a salt bridge (blue + red = opposites attract). The bottom conformation makes neither pairing, so the pair score would favor the top conformation.

When we display proteins, we often use different coloring schemes to help us see the interactions between the different amino acids. The color scheme used for the Human Proteome Folding Project is listed under "The Amino Acids".

Read more in our recent journal articles, listed under "More information for Scientists". The earliest papers on Rosetta de novo structure prediction (including work by Kim Simons, Rich Bonneau, Charlie EM Strauss, Chris Bystroff, Ingo Ruczinski, Carol Rohl, Phil Bradley, Lars Malmstrom, Dylan Chivian, David Kim, Jens Meiler, Jack Schonbrun, David Baker, and others) can be found at http://bakerlab.org.

Review of de novo structure prediction methods: annual-rev-bonneau.pdf

One-at-a-time Rosetta server (the Robetta server), hosted at ISB and Los Alamos National Labs (Charlie EM Strauss): [http://robetta.bakerlab.org/]. Papers describing Robetta: [http://www3.interscience.wiley.com]

Read more about the scientists at the ISB and the University of Washington leading the Human Proteome Folding Project. For more information on this project, direct scientific inquiries to Richard Bonneau or to proteomefolding@systemsbiology.org.
HUMAN PROTEOME FOLDING PROJECT
Overview
The Human Proteome Folding Project will use the computing power of millions of computers to predict the shapes of human proteins about which researchers currently know little. From these shapes, scientists hope to learn about the functions of these proteins, since a protein's shape is inherently related to how it functions in our bodies. This database of protein structures and putative functions will let scientists take the next steps in understanding how diseases that involve these proteins work and, ultimately, how to cure them.
Graphical Overview of Project
[Figure: graphical overview of the project]






Central Dogma
[Figure: the central dogma of modern biology]
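
In the central dogma, a gene's DNA is transcribed into messenger RNA, which the ribosome then translates three bases (one codon) at a time into a chain of amino acids. The Python sketch below shows only the translation step, reading the DNA coding strand directly for simplicity. Its codon table is deliberately partial (the real genetic code assigns all 64 codons), and the `translate` function and the toy input sequence are illustrative.

```python
# Deliberately partial codon table (coding-strand DNA -> one-letter amino acid code).
# The full standard genetic code has 64 entries; "*" marks a stop codon.
CODON_TABLE = {
    "ATG": "M",                                      # methionine (usual start codon)
    "GTT": "V", "GTC": "V", "GTA": "V", "GTG": "V",  # valine
    "CAT": "H", "CAC": "H",                          # histidine
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",  # leucine (TTA/TTG omitted)
    "ACT": "T", "ACC": "T", "ACA": "T", "ACG": "T",  # threonine
    "CCT": "P", "CCC": "P", "CCA": "P", "CCG": "P",  # proline
    "GAA": "E", "GAG": "E",                          # glutamic acid
    "AAA": "K", "AAG": "K",                          # lysine
    "TAA": "*", "TAG": "*", "TGA": "*",              # stop codons
}


def translate(dna: str) -> str:
    """Read codons from the start of `dna` and build the amino acid chain until a stop codon."""
    protein = []
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        codon = dna[i:i + 3].upper()
        residue = CODON_TABLE.get(codon)
        if residue is None:
            raise ValueError(f"codon {codon} is not in this partial table")
        if residue == "*":  # stop codon: the finished chain is released
            break
        protein.append(residue)
    return "".join(protein)


print(translate("ATGGTGCATCTGACTCCTGAGGAGAAGTAA"))  # MVHLTPEEK
```

Note that several codons map to the same amino acid, which is why the gene fixes the amino acid order but not every DNA letter is equally important to the final protein.
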




What is a protein?
Proteins are far from being just things we eat. They are the molecular machines that carry out metabolism; they carry messages that direct development and enable the immune system to tell friend from foe; and they repair damage to our DNA after we’ve spent too much time in the sun. In short, proteins are at the center of human biology, and of all biology.
http://en.wikipedia.org/wiki/Central_dogma
http://web.mit.edu/esgbio/www/dogma/dogmadir.html
http://www.emc.maricopa.edu
http://web.mit.edu/esgbio/www/lm/proteins/aa/aminoacids.html
http://web.mit.edu/esgbio/www/lm/lmdir.html
Which proteins are important?
Drawing Proteins
[Figure: drawing proteins]




Rosetta
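
The grid agent, as described earlier on this page, tries millions of trial shapes and returns only the best 500 to the central server, where a lower Rosetta score is better. The Python sketch below shows one way such a client loop could work: many independent trajectories of random trial moves, each accepted or rejected with a Metropolis-style Monte Carlo rule, plus a bounded heap that remembers only the N lowest-scoring final shapes. The conformation (a plain list of numbers), the `score` and `perturb` stand-ins, the temperature and the step counts are all illustrative assumptions; the real client uses Rosetta's own moves and scoring function.

```python
import heapq
import math
import random
from typing import List, Tuple

Shape = List[float]  # toy stand-in for a conformation (e.g. a list of angles)


def score(shape: Shape) -> float:
    """Toy stand-in for the Rosetta score: lower is better, best at all zeros."""
    return sum(x * x for x in shape)


def perturb(shape: Shape, rng: random.Random) -> Shape:
    """Toy trial move: nudge one randomly chosen value."""
    trial = list(shape)
    trial[rng.randrange(len(trial))] += rng.gauss(0.0, 0.5)
    return trial


def fold_once(n: int, steps: int, temperature: float,
              rng: random.Random) -> Tuple[float, Shape]:
    """One Monte Carlo trajectory from a random start; returns (score, final shape)."""
    shape = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    current = score(shape)
    for _ in range(steps):
        trial = perturb(shape, rng)
        trial_score = score(trial)
        # Metropolis rule: always keep improvements, occasionally keep a worse shape.
        if trial_score <= current or rng.random() < math.exp((current - trial_score) / temperature):
            shape, current = trial, trial_score
    return current, shape


def best_decoys(runs: int, keep: int, seed: int = 0) -> List[Tuple[float, Shape]]:
    """Run independent trajectories, remembering only the `keep` lowest-scoring results."""
    rng = random.Random(seed)
    worst_first: List[Tuple[float, int, Shape]] = []  # heap keyed on negated score
    for run_id in range(runs):
        s, shape = fold_once(n=5, steps=200, temperature=0.5, rng=rng)
        if len(worst_first) < keep:
            heapq.heappush(worst_first, (-s, run_id, shape))
        elif -s > worst_first[0][0]:  # beats the worst decoy kept so far
            heapq.heapreplace(worst_first, (-s, run_id, shape))
    return sorted(((-neg, shape) for neg, _, shape in worst_first), key=lambda t: t[0])


for s, _ in best_decoys(runs=500, keep=5):
    print(f"{s:.4f}")
```

Keeping a fixed-size heap instead of every result means memory stays bounded by the number of shapes returned (500 in this project) no matter how many trials the client runs.
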
Rosetta Score
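
The Rosetta score is described above as a weighted sum of component scores, with an environment term rewarding buried hydrophobic residues and a pair term judging touching residues one pair at a time. The Python sketch below imitates that structure on a toy two-dimensional square lattice, in the spirit of the orange-star and pair-score drawings. Everything here is an assumption for illustration: the lattice, the residue classes, the contact energies and the weights are made up and are not Rosetta's actual terms or parameters. As with the real score, lower is better.

```python
from itertools import combinations
from typing import List, Tuple

Coord = Tuple[int, int]  # a residue's position on a toy 2D square lattice

# Assumed residue classes for this toy model (not Rosetta's parameters).
HYDROPHOBIC = set("AVLIFMW")
ACIDIC, BASIC = set("DE"), set("KR")


def contacts(coords: List[Coord]) -> List[Tuple[int, int]]:
    """Residue pairs that touch on the lattice but are not neighbours along the chain."""
    return [(i, j) for i, j in combinations(range(len(coords)), 2)
            if j - i > 1
            and abs(coords[i][0] - coords[j][0]) + abs(coords[i][1] - coords[j][1]) == 1]


def environment_score(seq: str, coords: List[Coord]) -> float:
    """Toy environment term: -1 per hydrophobic-hydrophobic contact (rewards an oily core)."""
    buried = sum(1 for i, j in contacts(coords)
                 if seq[i] in HYDROPHOBIC and seq[j] in HYDROPHOBIC)
    return float(-buried)


def pair_score(seq: str, coords: List[Coord]) -> float:
    """Toy pair term: judge touching residues one pair at a time."""
    total = 0.0
    for i, j in contacts(coords):
        a, b = seq[i], seq[j]
        if a == "C" and b == "C":
            total -= 1.0  # two cysteines: possible disulphide bond
        elif (a in ACIDIC and b in BASIC) or (a in BASIC and b in ACIDIC):
            total -= 1.0  # salt bridge: opposites attract
        elif (a in ACIDIC and b in ACIDIC) or (a in BASIC and b in BASIC):
            total += 1.0  # like charges repel
    return total


def toy_score(seq: str, coords: List[Coord], w_env: float = 1.0, w_pair: float = 0.5) -> float:
    """Weighted sum of the component scores; lower is better."""
    return w_env * environment_score(seq, coords) + w_pair * pair_score(seq, coords)


seq = "LCGGCL"
folded = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]  # chain doubled back on itself
extended = [(i, 0) for i in range(len(seq))]               # fully stretched chain
print(toy_score(seq, folded))    # -1.5 (L-L contact plus C-C contact)
print(toy_score(seq, extended))  # 0.0 (no contacts)
```

Changing the weights changes which term dominates; in this toy setting, raising `w_pair` makes cysteine and salt-bridge pairings matter more relative to hydrophobic packing.
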


The Amino Acids
Hydrophobic (oily): orange
Acidic (negatively charged): red
Basic (positively charged): blue
Histidine (neutral or positively charged): purple
Sulphur-containing residues: yellow
Everything else (even though every amino acid is special): green
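
The color scheme above amounts to a lookup from each amino acid to a display color. The Python sketch below encodes it using standard one-letter codes. Which residues count as "hydrophobic" is not spelled out in the list, so the set used here (A, V, L, I, F, W, P) is an assumption; cysteine and methionine are placed in the sulphur group, and anything not listed falls through to green, as the scheme says.

```python
from typing import Dict, List

# Assumed groupings; the project's own figures may assign borderline residues differently.
COLOR_GROUPS: Dict[str, str] = {
    **{aa: "orange" for aa in "AVLIFWP"},  # hydrophobic (oily), assumed set
    **{aa: "red" for aa in "DE"},          # acidic, negatively charged
    **{aa: "blue" for aa in "KR"},         # basic, positively charged
    "H": "purple",                         # histidine
    **{aa: "yellow" for aa in "CM"},       # sulphur-containing
}


def residue_color(aa: str) -> str:
    """Color for one one-letter amino acid code; everything else is green."""
    return COLOR_GROUPS.get(aa.upper(), "green")


def color_sequence(seq: str) -> List[str]:
    """Colors for every residue of a sequence, in order."""
    return [residue_color(aa) for aa in seq]


print(color_sequence("MKHLDG"))
# ['yellow', 'blue', 'purple', 'orange', 'red', 'green']
```
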
More information for Scientists
Application to Halobacterium NRC-1: [http://genomebiology.com/]
Application to the initial annotation of Haloarcula marismortui: [http://www.genome.org/]
Application to the annotation of Pfam domains of unknown function: [http://www.sciencedirect.com/]
[http://arjournals.annualreviews.org]
[http://www.ncbi.nlm.nih.gov]
PEOPLE
ISB:
Dr. Richard Bonneau:
Dr. Bonneau is the technical lead on the Human Proteome Folding
project. Dr. Bonneau has expertise primarily in ab initio protein
structure prediction, protein folding, and regulatory network
inference. He is currently focused on applying structure prediction
and structural information to functional annotation and the
modeling/prediction of regulatory and physical networks. Dr.
Bonneau is working to develop general methods to solve protein
structures and protein complexes with small sets of distance
constraints derived from chemical cross-linking. At the ISB Dr.
Bonneau also works on a number of systems biology data-integration
and analysis algorithms, including algorithms designed to infer
global regulatory networks from systems-biology data.
Dr. Leroy Hood
Dr. Leroy Hood is recognized as one of the world's leading
scientists in molecular biotechnology and genomics. A passionate
and dedicated researcher, he holds numerous patents and awards for
his scientific breakthroughs and prides himself on his life-long
commitment to making science accessible and understandable to the
general public, especially children. One of his foremost goals is
bringing hands-on, inquiry-based science to K-12 classrooms.
[more: http://www.systemsbiology.org]
University of Washington:
Lars Malmstroem: larsm@u.washington.edu
Lars Malmstroem has worked to engineer the infrastructure (at the
ISB/UW end) needed to handle the vast, highly interconnected
data sets that this project will generate; he will also be heavily
involved in developing the correct data-integration schemes to best
deliver the resultant predictions to biologists.
Dr. David Baker:
Rosetta was developed initially in the laboratory of David Baker by
a team that included a large number of scientists at several
institutions. The goal of current research in his laboratory is to
develop improved models of intra- and intermolecular interactions
and to apply improved models to the prediction and design of
macromolecular structures and interactions. Prediction and design
applications can be of great biological interest in their own
right, and also provide very stringent and objective tests which
drive the improvement of the model and increases in fundamental
understanding.
[more: http://depts.washington.edu/]