Introduction to Peptides and Proteins for Bioanalysis Using LC-MS

October 8, 2019


Hello and welcome
to the next module in the Waters Peptide and
Protein Bioanalysis Boot Camp. My name is Khalid Khan, and
I’m part of Health Sciences’ marketing team here at Waters. Today, I will be presenting on
peptide and protein structure. So let’s get started. Here are a number of
workflows for large molecule biotherapeutic and protein
biomarker analysis. Today, LC-MS is increasingly
used for protein quantification as an alternative to traditional
ligand binding assays. Proteins can be
analyzed by LC-MS, either using intact protein or
surrogate peptide workflows. Both tandem quadrupole and high
resolution mass spectrometers can be used. Normal flow and
microflow LC systems are also commonly used with
both of these mass spectrometer systems. Most of this module will
focus on the surrogate peptide workflow using tandem
mass spectrometers, and understanding your
peptides and protein structure is important when developing
both intact and surrogate peptide workflows. The areas covered
in this presentation will be the basic
structure of amino acids, peptides, and
proteins, including a few specific examples, such
as monoclonal antibodies. The basic structure of
peptides and proteins has an impact on both the sample
treatment and LC-MS method development. The ionization and
fragmentation of peptides and how these aspects
differ from small molecules will also be covered. The presentation
is mainly intended for scientists who already
have some experience in small molecule LC-MS
method development. The aim is to provide
an introduction to peptide and protein
structure and explain the commonly used terms in
peptide and protein LC-MS method development. The presentation
will also prepare you for subsequent modules in the
Waters Peptide and Protein Bioanalysis Boot Camp. In this first section,
let’s look at the structure of peptides and proteins. Peptides and proteins are chains
of amino acids joined together. There are no agreed
criteria specifying the length of amino acid
chain at which a peptide becomes a protein. One common definition is that
if the amino acid chain consists of fewer than 50
amino acids, it is called a peptide, and if it has
more than 50 amino acids, it is called a protein. This definition is
not absolute, and you can have large peptides
and small proteins of similar amino
acid chain lengths. All human proteins
are formed from just 20 naturally occurring
amino acids, or 21, if you include selenocysteine. In terms of molecular
weight, peptides are typically less
than 6000 daltons, whereas proteins can
be anywhere from 5800 daltons for a small
protein such as insulin to several hundred thousand
daltons for large proteins such as thyroglobulin. This slide illustrates
the mechanism of how two amino
acids join together to form a peptide bond. The carboxyl group
of one amino acid reacts with the amine
group of another amino acid to form a peptide bond. The resultant peptide will have
a carboxyl group on one end, and this is referred to
as the C-terminal end. The end with the free amine group is referred
to as the N-terminal end. As we will see later,
these peptide bonds fragment in a highly predictable
manner in a mass spectrometer collision cell. Amino acids and peptides
can exist as zwitterions. This means that they can have
both negative and positive charges, depending on the pH. This is an important factor
when developing sample clean up methods at the peptide level. This will be discussed in
more detail in later modules. The chain of amino acids that
form the backbone of a peptide or protein is referred to
as its primary structure. Amino acids are usually
represented by a single letter or three letter abbreviation. Here is the table of
the 21 amino acids from which human peptides
and proteins are formed. Some single letters are
obvious, for example, G for glycine and A for alanine. Others are less obvious,
such as K for lysine and R for arginine. As we will see later
in this presentation, lysine and arginine
are very important when we discuss the breakdown
of large proteins into smaller peptides using
specific enzyme digestion. This slide illustrates the
wide variety of structures and resultant chemical
properties of amino acids. The chemical structure
of the amino acids influences the polarity,
hydrophobicity, and acidic/basic nature
of the resultant peptides and proteins. Note that cysteine contains
a sulfur atom, which means that two cysteine
amino acids can form disulfide bonds between them. These disulfide bonds can form
in the same peptide chains or connect two different
peptide chains. I stated earlier that the
diverse properties of peptides and proteins have a large impact
on the sample pretreatment and LC-MS method development. Note that some amino acids have
a second amine group, which means that they have
multiple sites that can be protonated to
form multiply charged, positive ions. As the structures of all
amino acids are well known, it is possible to calculate
the mass of a peptide from its amino
acid constituents. Don’t worry, you will not have
to calculate these manually. Software tools are available to
do this automatically for you. Software tools, such as
Skyline, will automatically calculate the molecular
weight of a peptide from its amino acid sequence. For example, the
peptide D-E-V-I-L, which consists of aspartic
acid, glutamic acid, valine, isoleucine, and leucine,
will have a mass of 587.31662 daltons. Note that the table above
lists the monoisotopic mass and average mass. The monoisotopic mass is
the mass where only the most abundant isotope of each element is
used in the calculation, e.g., carbon-12,
hydrogen-1, and oxygen-16. The average mass has
the minor isotopes also included in the
calculation, weighted by their natural abundance, e.g., carbon-13, deuterium, and nitrogen-15. Proteins can exist in
different forms and structures. So far, we have only
discussed the basic amino acid sequence, which is referred
to as the primary structure. Amino acids can form
hydrogen bond interactions between each other, which
influence the shape of a peptide chain or protein. The most common of these secondary structures are the
pleated sheet and the alpha helix. Bonds and interactions
between alpha helices and pleated sheets result
in tertiary structures. Disulfide bonds between
cysteine amino acids in the peptide chains are
common in tertiary structures. Finally, when more than one
peptide chain is involved, a quaternary
structure is produced. This slide illustrates
the primary structure of insulin, which includes
two amino acid chains joined together, the insulin A chain
and the insulin B chain. This
slide also shows a diagram of the tertiary
structure of insulin. Here is an example of a
peptide drug, desmopressin. This is a relatively
small peptide composed of nine amino acids. LC-MS method development for a
peptide of this length can be treated in the same
way as a small molecule LC-MS method. The peptide can be
analyzed directly by LC-MS, and standards
are available for MRM method development. One difference from a small
molecule ESI mass spectrum is the presence of a
doubly charged positive ion in addition to the
singly charged ion. This is a key feature
of peptide ionization that will be discussed
later in this presentation and other modules. Note the doubly
charged ion at 535.22 and the singly charged
ion at 1069.435. An example of a small
protein is insulin, which consists of 51 amino acids. The A chain has 21 amino
acids, and the B chain is 30 amino acids. The monoisotopic
mass of insulin is 5803.6377, which is
outside the range of tandem quadrupole mass
spectrometer systems, which typically have a maximum upper
mass-to-charge range of below 2000. However, as insulin forms
multiply charged ions with three, four,
and five charges, it can be analyzed using
tandem mass spectrometers. In this example, the five plus
ion is shown at a mass-to-charge ratio of 1162. Insulin also forms three
plus and four plus ions. Note again the disulfide bonds
connecting the two amino acid chains between two
cysteine amino acids. These are very common
protein structures. Here are some examples
of larger proteins, ranging from insulin-like
growth factor IGF-1 with a molecular
weight of 7649 to thyroglobulin, which has a
molecular weight over 660,000 daltons. The slide also shows
medium sized proteins, such as CRP and
apolipoprotein A1, which have molecular weights in
the mid 20,000 dalton range. We can see that as the
size of the protein increases, measuring
the intact protein gets more difficult
and is virtually impossible using limited range
tandem mass spectrometers. However, we can break down large
proteins into smaller peptide units and analyze these
peptides using tandem mass spectrometers. This approach is called a
surrogate peptide approach and is widely used in protein
bioanalysis and protein biomarker research. Antibodies are a specific
class of proteins with a common structure. They are large Y-shaped proteins
with two heavy chains and two light chains. The heavy chains are linked to
each other by disulfide bonds. Disulfide bonds also link the light
chains with the heavy chains. Human immunoglobulins
are antibodies produced by plasma
cells, a type of white blood cell, to fight infections. The heavy chains each contain
approximately 440 amino acids, and the light chains
contain 220 amino acids. Monoclonal antibody drugs now
form a very important class of therapeutics and need to be
measured in biomedical studies and clinical research studies. One of the most widely used
monoclonal antibody drugs today is infliximab,
which is used to treat autoimmune conditions
such as Crohn’s disease. Infliximab binds to
TNF alpha and has a molecular weight of
approximately 150,000 daltons. Infliximab is known as
a chimeric antibody. So how do we analyze
large proteins of several thousand daltons
using tandem quadrupole mass spectrometers, which usually have
a limited mass range of less than 2000 daltons? The approach used is to
break down the proteins into smaller peptides using
digestion with enzymes. A number of different
enzymes are used. The most commonly used
enzyme is trypsin, which cleaves proteins in
very specific locations. Trypsin cleaves proteins
adjacent to lysine and arginine. Cleavage is always on
the C-terminal side of the amino acid. This means that peptides arising
from trypsin digestion, which are called tryptic
peptides, can be predicted from the amino
acid sequence of the protein. Online software tools
are available to predict tryptic peptides. These online tools also
predict the fragmentation of those peptides in
a mass spectrometer. This is the basis of the
surrogate peptide approach, where a peptide or
peptides are quantified as a surrogate for the
protein from which the peptides were derived. In some cases, proteins cannot
be digested directly by enzymes such as trypsin and
require pretreatment prior to digestion. One example of this is
treatment of disulfide bonds, which are reduced and
alkylated prior to digestion. If the amino acid sequence
of the tryptic peptide is unique to the protein from
which it was derived, it is called a
signature peptide. The use of signature peptides
means that the method is more selective and specific. Tryptic peptides chosen as surrogates should ideally contain
between 8 and 20 amino acids. In addition, a tryptic
peptide should not contain amino acids that can
be easily chemically modified, such as cysteine and methionine. The selection of
tryptic peptides will be discussed in more
detail in other modules in this series. Now that we’ve covered the
basic structure of peptides and proteins, and
we’ve discussed how peptides can be
produced from proteins using enzyme digestion, let’s look
at how peptides fragment in a mass spectrometer. This slide highlights
some of the differences between LC-MS of small
molecules and LC-MS of proteins and peptides. One difference,
which has already been discussed in
earlier slides, is that peptides form
multiply charged ions. Doubly, triply, and even
higher charge peptide ions are very common. This is very different
to small molecule LC-MS where usually the precursor
ion is singly charged. Peptide fragments generated in
a mass spectrometer collision cell can have fewer charges
than the precursor ions. This means that
peptide fragments that have fewer charges can
appear at a higher mass to charge ratio than
the precursor ions. This is very
different to what you would see in small
molecule fragmentation where the product ion is always
at a lower mass to charge ratio than the precursor ion. Also, as we’ve seen
before, peptides fragment in a highly predictable manner
along the amino acid chain. Peptides can fragment at a
number of predictable locations in the peptide chain. The nomenclature of the
resulting fragment ions depends on which bond
has been broken. When fragmentation occurs
at the peptide bond, the C-terminal
fragment is called the y ion and the N-terminal
fragment is called the b ion. Y and b ions are
the most important for quantification
using mass spectrometry. For tryptic peptides,
the y ion will always have a lysine or arginine amino
acid at the C-terminal end. Fragmentation can also occur
adjacent to the peptide bond, leading to other ions which
are called z, c, a, and x ions. As we’ve already
discussed, peptides can produce a number of
predictable fragment ions. The ones that we choose
to use in an MRM experiment need to be carefully considered. In this example,
fragmentation of the ion at 523.2808 results in a
number of fragment ions shown in the lower
half of the slide. Which ones would be the best
to use in an MRM method? There are a number of potential
fragment ions we could use. There’s the most
intense ion at 239, and other ions at 341,
523, 873, and 1045. Let’s evaluate these ions now. The ions shaded in
red, although intense, may not be a good choice
as these are all low masses and could be prone
to interference from other peptides. The ion at 1045 is the
singly charged form of the precursor at 523.28, so it would not be used. The y ions shown in the green
shaded area at 873, 944, 802, and 674 are all
potentially usable as they are of sufficient
intensity and size. This slide again
highlights another feature of peptide fragmentation in
a mass spectrometer, which is that doubly charged
ions fragment to singly charged ions,
therefore resulting in a product ion
at a higher mass to charge ratio than
the precursor ion. So we may not have
access to standards of all the potential
tryptic peptides we want to develop MRM methods for. However, there
are software tools such as Skyline which
can predict fragmentation of tryptic peptides. Predictions from tools such as
Skyline suggest fragment ions that
can be used in LC-MS method development. These ions can be evaluated
later by experiment. This is very important, as
it means that you do not need to have access to standards
of the tryptic peptide for initial method development. So let’s summarize
what we learned about peptide ionization
and fragmentation. Peptides form
multiply charged ions, which is very different to
traditional small molecule analysis. Peptides fragment in a
highly predictable manner in the mass spectrometer,
and these fragments can be predicted
using software tools. The software tools
also recommend which MRM transitions to use. The resultant fragment ions,
which are often y ions, can have a higher mass to charge ratio
than the precursor. The MRM transitions
that are finally used are selected based on
specificity and intensity of the fragment ions. So let’s summarize some of the
key points of this introduction to peptides and
protein structure. Peptides and proteins
are made of amino acids and can form a variety
of complex structures. Small proteins and peptides
can be analyzed directly, i.e., intact, by tandem
quadrupole LC-MS systems. Larger proteins usually require
digestion to smaller peptides for quantification by tandem
quadrupole LC-MS systems. Enzymatic cleavage
sites are predictable, and software tools
are available that can predict tryptic peptides. The structure of
peptides and proteins impacts all stages of
the bioanalysis workflow. This slide shows the workflow
for the surrogate peptide approach, where a protein
is enzymatically digested by trypsin to produce
signature or unique peptides. The process starts
with selecting unique peptides which
represent the protein we are trying to measure. These unique peptides are
predicted by software tools. The best MRM transitions are
then selected and optimized. We then go through the
process of optimizing sample preparation, which may
involve clean-up at the protein level, reduction and alkylation,
digestion, and peptide level clean-up. The MRM transitions
may then need to be fine tuned using peptides generated
in a biological matrix. The structure of peptides and
proteins is an important factor and needs to be considered
in all of the above steps. This presentation was
designed to introduce peptide and protein structure and
how the structure of peptides and proteins influences
LC-MS method development. Further information is available
on a variety of web based resources, including these. Thank you for listening.

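Editor's sketch: the surrogate peptide steps described in the transcript (predict tryptic peptides, keep candidates of 8 to 20 residues without cysteine or methionine, then predict b and y ions) can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm used by Skyline or any online tool. It cleaves after every lysine and arginine, ignores missed cleavages and any other enzyme-specific exceptions, and considers only singly charged fragments. The toy sequence and all function names are hypothetical.

```python
# Standard monoisotopic residue masses in daltons (rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_digest(sequence):
    """Split on the C-terminal side of every K or R (simplified rule)."""
    peptides, current = [], ""
    for residue in sequence:
        current += residue
        if residue in "KR":
            peptides.append(current)
            current = ""
    if current:  # C-terminal peptide that does not end in K or R
        peptides.append(current)
    return peptides


def is_candidate(peptide, min_len=8, max_len=20, avoid="CM"):
    """Length window and residue exclusions mentioned in the transcript."""
    return (min_len <= len(peptide) <= max_len
            and not any(residue in avoid for residue in peptide))


def b_ions(peptide):
    """Singly charged b1..b(n-1) ions (N-terminal fragments)."""
    masses, running = [], 0.0
    for residue in peptide[:-1]:
        running += RESIDUE_MASS[residue]
        masses.append(running + PROTON)
    return masses


def y_ions(peptide):
    """Singly charged y1..y(n-1) ions (C-terminal fragments)."""
    masses, running = [], WATER
    for residue in reversed(peptide[1:]):
        running += RESIDUE_MASS[residue]
        masses.append(running + PROTON)
    return masses


def precursor_mz(peptide, charge=2):
    """Mass-to-charge ratio of the protonated precursor ion."""
    mass = sum(RESIDUE_MASS[residue] for residue in peptide) + WATER
    return (mass + charge * PROTON) / charge


# Hypothetical toy sequence for illustration only, not a real protein.
TOY_PROTEIN = "GASPVTLLEDNQKCMAAARVLSGDEFHYWTIK"

peptides = tryptic_digest(TOY_PROTEIN)
candidates = [p for p in peptides if is_candidate(p)]

for peptide in candidates:
    precursor = precursor_mz(peptide, charge=2)
    # Singly charged y ions that sit above the doubly charged precursor.
    above = [round(y, 4) for y in y_ions(peptide) if y > precursor]
    print(peptide, round(precursor, 4), above)
```

Filtering for y ions above the precursor mass-to-charge ratio mirrors the point made in the presentation about preferring larger fragments over low-mass ions that are more prone to interference. Uniqueness of a peptide to its protein (the signature peptide criterion) would need a comparison against a full protein database, which this sketch does not attempt.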