Let’s imagine that you are trying to express and purify a new protein. You attached a his-tag onto your protein of interest while cloning it to aid in purification, and you express your protein in Escherichia coli. You excitedly lyse your E. coli cells and run your his-tagged protein over a Ni²⁺ column. When you elute your column you get … drum roll, please … nothing!!!

Unfortunately, this disheartening outcome is common both for scientists who are new to protein purification and for those with decades of experience. Every new protein is its own peculiar beast, and successful expression and purification frequently require multiple iterative rounds of trial and error.

Being a diligent scientist, you check the insoluble fraction of the cell lysate, and you find gobs of your his-tagged protein there. What do you do now? One option is to reclone your protein and exchange the his-tag for a larger affinity tag that will promote the solubility of your protein of interest.

GST-, MBP-, and SUMO-tags are three affinity tags that help solubilize recombinant proteins. These solubility tags are complete protein domains, so they are much larger than short peptide tags like his-tags. They are added onto the amino terminus of the protein of interest to enhance its solubility.

Why is it so important for proteins to be soluble? Let’s draw an analogy here to gold. Who wouldn’t want to own the rights to a mine with tons of gold that can potentially be recovered?

However, pound for pound (or in the case of proteins: milligram for milligram) wouldn’t you rather have gold that has already been mined, like gold bars for example? Of course you would! Gold mining efficiency is less than 100%, and it will take a lot of time, effort, and resources to mine as much gold as you can.

Insoluble proteins are like the gold buried underground – if you can figure out how to get it out of there, you are in business. However, having soluble protein is like the gold bars that have already been mined in that it puts you several steps ahead towards having a useful resource.

It’s important to point out that refolding your protein of interest from the insoluble fraction is another approach to take when solubility is a problem. When protein refolding works, it can be a great approach because it often results in both a high yield and high purity for your protein of interest.

However, refolding insoluble proteins isn’t a viable approach for all proteins. Enzymes, for example, usually have significantly reduced enzymatic activity, if any activity at all, when they are refolded from the insoluble fraction. Since refolding can only be applied to some proteins, many scientists prefer to first try using a solubility tag to help their protein of interest express into the soluble fraction for further purification.

In this article we will discuss GST-, MBP-, and SUMO-tags, and explore what makes a solubility tag so soluble.

GST

Glutathione S-transferase (GST) is a frequently used solubility tag. GST binds glutathione, and this interaction is leveraged both in affinity chromatography and in GST pulldowns, which probe molecular interactions. GST is a popular tag for many downstream assays.

GST from the parasite Schistosoma japonicum is expressed at very high levels as a soluble protein in E. coli. Furthermore, when GST is fused to the N-terminus of proteins that are otherwise insoluble when expressed in E. coli, the GST-fusion increases the amount of that protein in the soluble fraction (Smith and Johnson, 1988).

While the expression properties of S. japonicum GST were first analyzed in E. coli, GST is not just a parasitic protein. Rather, GST is a widely conserved enzyme, and we humans are thought to have as many as 16 functional GST genes ourselves (Nebert and Vasiliou, 2004).

GST binds to the tripeptide ligand glutathione (Figure 1). Using this interaction, GST-tagged proteins are captured on glutathione agarose beads. After a wash step, excess free glutathione is added to elute the GST-tagged protein (Figure 2).

I’m simplifying glutathione affinity purification here for brevity’s sake, but if you’re thinking about performing this purification and want more details, check out this protocol.


Figure 1. GST, green, binds to glutathione, orange (PDB: 1EEM).



GST pulldowns

GST-tags are also used in a variety of downstream assays. GST pulldowns, for example, test for potential interactions between biomolecules, such as two different proteins.

In this in vitro protein-protein interaction assay, one protein carries a GST-tag. The GST-tagged protein is incubated with another protein, and the mixture is then added to glutathione-conjugated agarose beads and washed. The glutathione beads are then eluted with excess free glutathione, just as described for purification above. Alternatively, the entire bead-protein mixture can be analyzed after the wash step(s) without performing an elution step.

In this assay, if the proteins do not interact, the protein without the GST-tag comes through only in the loading and wash steps (Figure 3, right panels). If the two proteins do interact, the untagged protein is also present in the elution fraction (Figure 3, left panels).
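
The pulldown readout above is usually judged by eye on a gel, but it can help to summarize it as a number. The sketch below is a minimal illustration, assuming you have band intensities (for example from densitometry, in arbitrary units) for the untagged protein in the loading/flow-through, wash, and elution lanes. The function name and the intensities are hypothetical, not real data, and any cutoff for calling an interaction is your own judgment call.

```python
def fraction_in_elution(flow_through: float, wash: float, elution: float) -> float:
    """Fraction of the untagged protein's total signal found in the elution lane.

    Inputs are band intensities (arbitrary units) for the untagged protein in
    the flow-through (loading), wash, and elution fractions.
    """
    total = flow_through + wash + elution
    if total <= 0:
        raise ValueError("no signal detected for the untagged protein")
    return elution / total


# Hypothetical intensities for illustration only (not measured data).
interacting = fraction_in_elution(flow_through=40.0, wash=10.0, elution=50.0)
non_interacting = fraction_in_elution(flow_through=85.0, wash=15.0, elution=0.0)
print(f"interacting: {interacting:.2f}, non-interacting: {non_interacting:.2f}")
```

A protein that binds the GST-fusion shows a substantial fraction in the elution lane (left panels of Figure 3), while a non-interacting protein stays near zero (right panels).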

GST is a frequently used solubility and affinity tag because glutathione agarose bead purification is robust and GST pulldowns are popular. However, there are a couple of considerations to keep in mind when using a GST-tag:

  • Dimerization of GST tags.
  • Compatibility of your protein of interest with reducing agents.



Figure 2. Affinity-tagged protein purification. Affinity-tagged proteins bind to agarose beads conjugated with interacting partner molecule (column 2). After washing, tagged-proteins are eluted by adding an elution buffer that weakens the interaction between the tag and the liganded bead (column 3). In the case of GST-tagged proteins, glutathione-conjugated beads bind GST to the column, and free glutathione in the elution buffer elutes the GST-fusion protein.



Dimerization of GST-tags

GST dimerizes in its native context, and in the context of many GST-fusion proteins (Schäfer et al, 2015). This means that a GST fusion protein may result in dimerization, or other higher order oligomerizations, of the fusion protein even if the protein of interest does not dimerize or oligomerize in its native context without GST.

Why would the GST-enforced oligomerization of a protein even matter?

To illustrate this point, let’s consider the human oncogenic fusion protein BCR-ABL. BCR-ABL contains a tetramerization domain, and tetramerization increases the activity of this kinase, which is important for its role as an oncogene in chronic myelogenous leukemia.


Figure 3. Illustrative example of GST pulldown results. Left, a protein (orange) that interacts with the GST-fusion protein (green) will coelute with GST. Right, a protein (purple) that does not interact with GST-fusion protein will come through in the wash step.


When GST is added onto BCR-ABL, the dimerization of GST molecules drives BCR-ABL activity even higher. In these experiments, it was important to understand that GST itself was causing this hyperactivity, rather than something intrinsic to BCR-ABL (Maru et al, 1996). This is an example of how the research tools used to study a protein can sometimes introduce artifacts, and careful control experiments should be designed and executed to rule out such errors.

As the BCR-ABL example illustrates, for applications where the oligomerization state of a protein is crucial, a different solubility tag, such as those discussed below, may be more appropriate.



Protein compatibility with reducing agents

Glutathione, used to elute GST-tagged proteins, is a reducing agent. It is a relatively weak one compared with others frequently used in protein purification buffers, such as β-mercaptoethanol (βME), DTT, and TCEP (Lee et al., 2012). Yet glutathione is added to the elution buffer at relatively high concentrations (~10 mM).
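
To get a feel for how much glutathione ~10 mM actually is, here is a small buffer-prep calculation. It is a sketch, assuming reduced glutathione (GSH, C10H17N3O6S, about 307.32 g/mol) and a hypothetical 50 mL batch of elution buffer; the helper name `grams_needed` is illustrative, not from any library.

```python
# Approximate molar mass of reduced glutathione (GSH, C10H17N3O6S), in g/mol.
GSH_MOLAR_MASS = 307.32


def grams_needed(conc_mM: float, volume_mL: float, molar_mass: float) -> float:
    """Mass of solute (g) needed to reach conc_mM in volume_mL of buffer."""
    moles = (conc_mM / 1000) * (volume_mL / 1000)  # (mol/L) * L
    return moles * molar_mass


# Hypothetical example: 50 mL of elution buffer at ~10 mM glutathione.
mass_g = grams_needed(conc_mM=10, volume_mL=50, molar_mass=GSH_MOLAR_MASS)
print(f"Weigh out {mass_g * 1000:.1f} mg reduced glutathione")
```

That works out to roughly 154 mg of glutathione in just 50 mL, which is why a protein with a structural disulfide bond can feel the effect.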

For proteins of interest with a structural disulfide bond, this much glutathione may be destabilizing. If your GST-fusion protein appears to be unstable or denatured in the elution buffer, you may want to check whether glutathione is the source of this instability. If so, it may be worth considering one of the other solubility tags discussed below to avoid denaturing your protein of interest.


MBP

Maltose-binding protein (MBP) enhances the solubility of fusion proteins. MBP is also used as an affinity tag by binding to amylose-conjugated agarose resin. Excess free maltose is added to elute MBP-fusion proteins from the agarose beads.

MBP is a bacterial protein that binds to maltose, a disaccharide consisting of two glucose monomers (Figure 4).

For MBP to function as a solubility tag, it’s added to the N-terminus of the protein of interest. Versions of MBP with or without its native signal sequence direct fusion proteins to the periplasm or the cytoplasm of E. coli, respectively. So, make sure you know which version you are working with so you know where to find your protein!

If you want to get rid of the MBP tag after expression or affinity purification, then you will need to include a protease cleavage site between the MBP-tag and the protein of interest.

For some applications, however, you will want to retain the MBP-tag. One such application is when doing MBP-mediated crystallization.

X-ray crystallography is one way to determine the structure of a protein. Yet not all proteins readily form crystals, which is the first step in this process. MBP crystallizes readily and can coax other proteins to crystallize when they are fused to it (Waugh, 2016).

Compared with GST, MBP is larger, so it may be a less suitable tag for a larger protein of interest (see more below). However, MBP does not dimerize, and the eluting molecule (maltose) is not a reducing agent. On the other hand, GST is more widely used than MBP in downstream applications.

When deciding which solubility tag to use, it is worth thinking carefully about the characteristics of the protein you are purifying and the downstream applications you have in mind.

See Table 1 for further comparison between the solubility tags discussed in this article.



SUMO

Small ubiquitin-like modifier (SUMO) is an affinity tag that enhances the solubility of proteins. Compared with GST and MBP, SUMO is relatively small (~12 kDa), and because SUMO proteases recognize the folded SUMO domain rather than a linear recognition sequence, the tag can be cleaved off most fusion proteins scarlessly.

SUMO is a posttranslational modification that is covalently attached to lysine residues on other proteins (Figure 5). Hundreds of proteins involved in diverse biological processes are sumoylated, and sumoylation can affect their stability, activity, and cellular localization.


Figure 4. Interaction between MBP, violet, and maltose, blue (PDB: 1ANF).


SUMO is also used to enhance the soluble expression of proteins in E. coli. In this case, instead of being posttranslationally added to a lysine residue, SUMO is included upstream (N-terminal) of a protein of interest in the coding sequence to form a fusion protein (Figure 5).

In addition to increasing solubility, SUMO can be used as an affinity tag to purify the protein of interest. An antibody mimetic scaffold that recognizes SUMO is conjugated to agarose beads and binds SUMO-fusion proteins out of a complex biological mixture. After washing, the SUMO-fusion proteins are eluted with 3 M imidazole, pH 8 (Suderman et al, 2023).

There are a couple of readily apparent advantages of SUMO compared to other solubility tags, such as GST and MBP:

  • SUMO is relatively small (12 kDa) compared to GST (26 kDa) and MBP (45 kDa).
  • SUMO proteases recognize the folded SUMO domain, so cleavage usually leaves the N-terminus of the protein of interest without any additional amino acids.




Figure 5. Left, sumoylation is a posttranslational modification where SUMO (pink) is added to a lysine residue of a cellular protein (green). Right, SUMO can be used as a solubility tag by adding it upstream of a protein of interest.



SUMO is a small solubility tag

Not only are we humans bigger than E. coli, our proteins are bigger too! The average E. coli protein is 317 amino acids long, while the average human protein is a whopping 510 residues (Francis and Page, 2010).

Longer proteins tend to be less amenable to expression in E. coli. But if you’re working with a large protein of interest that isn’t expressed into the soluble fraction in E. coli, it is likely still worth making the protein bigger by adding a solubility tag.

Adding SUMO (12 kDa) produces a smaller fusion protein than adding GST (26 kDa) or MBP (45 kDa).

When working with a smaller protein, roughly 20 kDa or less, you can likely add any of these tags and get soluble expression in E. coli (Dyson et al, 2004). For larger proteins, consider carefully which solubility tag to use.
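
To put these sizes side by side, here is a back-of-the-envelope sketch. It uses the approximate tag sizes given above and assumes the common rule of thumb of about 110 Da per amino acid residue (an approximation, not a number from this article), applied to the average human protein length cited above. The helper names are illustrative.

```python
# Approximate tag sizes from the text, in kDa.
TAG_MASS_KDA = {"SUMO": 12, "GST": 26, "MBP": 45}

# Rule-of-thumb average mass of one amino acid residue (~110 Da), in kDa.
AVG_RESIDUE_MASS_KDA = 0.110


def protein_mass_kda(n_residues: int) -> float:
    """Estimate a protein's mass from its length."""
    return n_residues * AVG_RESIDUE_MASS_KDA


def fusion_mass_kda(n_residues: int, tag: str) -> float:
    """Estimate the mass of a tag + protein-of-interest fusion."""
    return TAG_MASS_KDA[tag] + protein_mass_kda(n_residues)


avg_human_length = 510  # average human protein length cited in the text
print(f"untagged: {protein_mass_kda(avg_human_length):.0f} kDa")
for tag in ("SUMO", "GST", "MBP"):
    print(f"{tag} fusion: {fusion_mass_kda(avg_human_length, tag):.0f} kDa")
```

For an average-length human protein (~56 kDa on its own), the SUMO fusion comes out near 68 kDa, versus about 82 kDa with GST and about 101 kDa with MBP. By the same rule of thumb, a ~20 kDa protein is roughly 180 residues long.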



Scarless cleavage of SUMO tags

In cells, SUMO modifications are removed by SUMO proteases, which cut SUMO off the target protein. These proteases can also be used to cleave SUMO-fusion proteins.

Most proteases recognize a primary amino acid sequence and cut within that recognition sequence. However, SUMO proteases work a little differently: they bind to and recognize the tertiary structure of SUMO and cut at its C-terminal end (Figure 6).


Figure 6. SUMO, magenta, bound to the SUMO protease SENP2, gray (PDB: 1TGZ). The C-terminal tail of SUMO inserts into SENP2 at the cleavage site.


This structure-based recognition, rather than sequence-based recognition, allows broad flexibility in the sequence of the protein of interest. The main exception is proline: the residue immediately downstream of the cleavage site, which is the first residue of the protein of interest, cannot be a proline because this kinked amino acid does not fit well into the SUMO protease catalytic site (Figure 6).

Other than proline, nearly any residue can begin the protein of interest and still be compatible with cleavage. The SUMO tag ends in two glycine residues, and the protease cleaves after the second glycine, leaving no residual amino acids from the SUMO tag on the protein of interest. This is called a “scarless” cleavage.
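
The cleavage rule can be sketched as a toy sequence check. This is a simplified model under stated assumptions: the SUMO domain is represented only by a hypothetical placeholder tail ending in Gly-Gly (`SUMO_TAIL` is not the full SUMO sequence), the protease is assumed to cut right after that Gly-Gly, and a proline as the first residue of the released protein (the P1' position) is assumed to block cleavage, as is commonly reported for SUMO proteases. It says nothing about cleavage efficiency in a real digest.

```python
from typing import Optional


def sumo_cleave(fusion: str, sumo_tag: str) -> Optional[str]:
    """Return the released protein of interest, or None if cleavage is blocked.

    Toy rules (assumptions, not a kinetic model):
    - `sumo_tag` is the SUMO portion of the fusion and ends in Gly-Gly ("GG").
    - The protease cuts immediately after that final glycine.
    - Proline as the first residue of the released protein blocks cleavage.
    """
    if not sumo_tag.endswith("GG") or not fusion.startswith(sumo_tag):
        raise ValueError("fusion must begin with a SUMO tag ending in GG")
    product = fusion[len(sumo_tag):]
    if not product or product[0] == "P":
        return None  # nothing released, or proline at P1' blocks the protease
    return product  # scarless: no SUMO-derived residues remain


# Hypothetical placeholder standing in for the full SUMO domain.
SUMO_TAIL = "QIGG"
print(sumo_cleave(SUMO_TAIL + "MSKGEE", SUMO_TAIL))  # MSKGEE
print(sumo_cleave(SUMO_TAIL + "PSKGEE", SUMO_TAIL))  # None
```

The first call releases the protein of interest with its native N-terminus intact; the second returns None because the protein of interest starts with proline.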


General limitations of solubility tags

In each of the above sections, we discussed the advantages and disadvantages of individual solubility tags. Let’s now briefly discuss the limitations of solubility tags in general.

Hopefully it is clear at this point that the main advantage of solubility tags is that they enable higher expression of proteins in the soluble fraction in E. coli. Furthermore, they provide a convenient handle for downstream applications, such as GST for GST pulldowns.

However, solubility tags are full protein domains, and as such are substantially larger than small peptide affinity tags. They also need to be placed at the N-terminus of the protein of interest to increase solubility. Based on their increased size and restricted placement, there is an increased possibility that solubility tags will influence the function of the fusion protein.


When possible, protein function should be tested and compared with and without the solubility tag.

Table 1. Comparing solubility tags

Tag | Advantages | Disadvantages
GST | Widely used in downstream applications | Dimerizes; medium-sized tag, may not be suitable for large proteins of interest
MBP | Monomer | Large tag, may not be suitable for large proteins of interest
SUMO | Monomer; relatively small, a good choice for a large protein of interest; structure-based protease recognition enables scarless cleavage | -
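
The comparison in Table 1 can also be written down as a small data structure with a filter on it. This is a hypothetical helper, not a validated decision tool: it encodes only the size, oligomeric state, and elution chemistry described in this article (glutathione, a reducing agent, for GST; maltose for MBP; imidazole for SUMO), and it ignores factors such as GST’s broad use in downstream assays.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SolubilityTag:
    name: str
    size_kda: int
    monomer: bool           # False for GST, which dimerizes
    reducing_elution: bool  # True if eluted with a reducing agent (glutathione)


TAGS = [
    SolubilityTag("GST", 26, monomer=False, reducing_elution=True),
    SolubilityTag("MBP", 45, monomer=True, reducing_elution=False),
    SolubilityTag("SUMO", 12, monomer=True, reducing_elution=False),
]


def candidate_tags(need_monomer: bool, avoid_reducing_agents: bool) -> List[str]:
    """Names of tags that satisfy the constraints, smallest tag first."""
    keep = [
        t for t in TAGS
        if (t.monomer or not need_monomer)
        and (not t.reducing_elution or not avoid_reducing_agents)
    ]
    return [t.name for t in sorted(keep, key=lambda t: t.size_kda)]


print(candidate_tags(need_monomer=True, avoid_reducing_agents=True))    # ['SUMO', 'MBP']
print(candidate_tags(need_monomer=False, avoid_reducing_agents=False))  # ['SUMO', 'GST', 'MBP']
```

Sorting by size reflects the point above that smaller tags add less mass to an already large protein of interest.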




For some downstream applications, it will be desirable to cut off the solubility tag. At minimum, this adds additional steps into your purification scheme to separate your protein of interest from the solubility tag.

Incomplete cleavage can also be an issue with solubility tag fusion proteins, though additional purification steps (affinity and size exclusion purification) can often be used to separate the cut and uncut species. You may also need to troubleshoot buffer conditions that keep the protein of interest soluble after it has been cleaved from the solubility tag.

Despite these limitations, solubility tags are still a great way to express soluble proteins in E. coli.

To recap – solubility tags such as GST, MBP, and SUMO are frequently used to help express soluble proteins in E. coli, for affinity purification, and for downstream detection and experimental applications.