Protein Expression Optimization Process

Scientists in charge of protein expression optimization projects all too often fly by the seat of their pants. Unfortunately, an ad-hoc approach to protein expression optimization is unlikely to achieve the company's objectives.

The ad-hoc process tests one plasmid at a time based on experience and intuition. A solubility tag is added here, codon usage is optimized there, or a different promoter is tested. The problem with this approach is that it is often guided more by cloning strategies than by the nature of the data it generates. It is unlikely to produce data suitable for analysis.

Generally, successful optimization strategies require the careful balancing of different control parameters (transcription, translation, plasmid copy numbers) that often interfere with each other. Optimizing protein expression cannot be achieved by maximizing every possible control parameter.

Data-Driven Protein Expression Optimization

Today, it is possible to use a structured approach that relies on rigorous experimental design and extensive use of gene synthesis/DNA assembly.

The process starts with a formalization of the optimization objective. Are we trying to maximize the expression of one protein, reduce the cost of a biomanufacturing process, or maximize the expression of a complex? It is impossible to be sure you are making progress if you don't know the destination.

In a second step, the vector design options available to the project are formalized. At GenoFAB, we help teams think outside the box by suggesting strategies they may not have considered and by formalizing these strategies in GenoCAD. Upon completion of this phase, we use GenoCAD's combinatorial design tool to estimate the size of the genetic design space available to the project. At this point, the number of possible expression vectors largely exceeds experimental capabilities.
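To get a feel for how quickly a design space grows, here is a minimal Python sketch. It is not GenoCAD's tool or API. The part categories and options in `PART_LIBRARY` are hypothetical, and the sketch assumes that a design picks exactly one option per category with no compatibility rules between parts.

```python
from itertools import product
from math import prod

# Hypothetical part library: categories and options are illustrative only.
PART_LIBRARY = {
    "promoter": ["T7", "tac", "araBAD"],
    "rbs": ["weak", "medium", "strong"],
    "solubility_tag": ["none", "MBP", "SUMO", "GST"],
    "codon_usage": ["native", "optimized"],
    "origin": ["low_copy", "medium_copy", "high_copy"],
}


def design_space_size(library):
    """Number of distinct vectors when one option is chosen per category."""
    return prod(len(options) for options in library.values())


def enumerate_designs(library):
    """Yield every design as a dict mapping category -> chosen option."""
    categories = list(library)
    for combo in product(*(library[c] for c in categories)):
        yield dict(zip(categories, combo))


size = design_space_size(PART_LIBRARY)
print(size)  # 3 * 3 * 4 * 2 * 3 = 216
```

Even this small made-up library yields 216 candidate vectors, and each added category multiplies the total.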

It is common for people in this situation to worry about the cost of gene synthesis or the labor involved in assembling large numbers of expression vectors. In reality, however, collecting expression data is generally the limiting step. We help our clients design characterization strategies that can be performed quickly and inexpensively on a large number of expression vectors.

At this point, it becomes possible to design an entire experimental workflow that includes the assembly of a large number of plasmids and the collection of expression data. This leads to a dataset that can be used to build a mathematical model of how the different design parameters contribute to the vector performance.
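As an illustration of what such a model could look like, the sketch below estimates additive main effects from a tiny balanced dataset. The measurements in `DATA` are synthetic numbers made up for this example, the factor names are hypothetical, and the additive form is an assumption. A real project may need interaction terms and a proper regression package.

```python
from collections import defaultdict
from statistics import mean

# Synthetic, made-up measurements (arbitrary units), for illustration only.
# Each record pairs a design (factor -> level) with its measured expression.
DATA = [
    ({"promoter": "T7", "tag": "none"}, 10.0),
    ({"promoter": "T7", "tag": "MBP"}, 14.0),
    ({"promoter": "tac", "tag": "none"}, 6.0),
    ({"promoter": "tac", "tag": "MBP"}, 10.0),
]


def fit_main_effects(data):
    """Return (grand_mean, effects) for an additive main-effects model.

    effects[(factor, level)] is the mean response at that level minus the
    grand mean. This simple estimate assumes a balanced design, where every
    combination of levels is measured equally often.
    """
    grand = mean(y for _, y in data)
    by_level = defaultdict(list)
    for design, y in data:
        for factor, level in design.items():
            by_level[(factor, level)].append(y)
    effects = {key: mean(ys) - grand for key, ys in by_level.items()}
    return grand, effects


def predict(design, grand, effects):
    """Predicted expression: grand mean plus the effect of each chosen level."""
    return grand + sum(effects[(f, lvl)] for f, lvl in design.items())


grand_mean, effects = fit_main_effects(DATA)
print(grand_mean, effects[("promoter", "T7")], effects[("tag", "MBP")])
```

With these synthetic numbers the model reproduces the measurements exactly because they were constructed to be additive. Real expression data will show noise and interactions between parameters.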

This model is then used to predict the performance of vectors that were not included in the original dataset. Vectors predicted to have superior performance are then assembled and characterized. A few iterations of this product development cycle may be necessary to achieve the target performance, but at every stage, rationally designed datasets are collected and analyzed to predict the optimal design.
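The prediction step can be sketched as follows. The baseline and per-level effects are hypothetical placeholders standing in for a model fitted to real data, and the set of already tested vectors is invented for the example. The sketch scores every untested combination with an assumed additive model and keeps the top candidates for the next round.

```python
from itertools import product

# Hypothetical fitted values (arbitrary units); in practice these would come
# from a model fitted to the measured dataset.
BASELINE = 10.0
EFFECTS = {
    "promoter": {"T7": 2.0, "tac": -2.0, "araBAD": 0.0},
    "tag": {"none": -2.0, "MBP": 2.0, "SUMO": 1.0},
}

# Designs (promoter, tag) assumed to have been built and measured already.
ALREADY_TESTED = {("T7", "none"), ("tac", "MBP"), ("araBAD", "SUMO")}


def predict(design):
    """Additive prediction: baseline plus the effect of each chosen level."""
    return BASELINE + sum(EFFECTS[f][lvl] for f, lvl in zip(EFFECTS, design))


def rank_untested(top_n):
    """Return the top_n untested designs, best predicted performance first."""
    candidates = product(*(EFFECTS[f] for f in EFFECTS))
    untested = [d for d in candidates if d not in ALREADY_TESTED]
    return sorted(untested, key=predict, reverse=True)[:top_n]


next_round = rank_untested(top_n=3)
for design in next_round:
    print(design, predict(design))
```

The vectors returned here would be assembled and characterized, and their measurements added to the dataset before the model is refitted.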

Is it worth it?

This approach takes time and costs money. Depending on the project, it could take 6 to 12 months. This development effort may require a $120K investment. It is fair to wonder if this is worth it. The ad-hoc approach may seem faster and cheaper in the short run. It may not lead to the optimal vector but it could lead to a vector that's good enough. In this context, how would you justify the extra cost of a systematic approach?

The truth is that the ad-hoc approach may lead to a vector that's good enough in the same way that buying a lottery ticket may lead to a win. It may, but nothing is less certain. This type of uncertainty would be unacceptable in many aspects of managing a business. Chance should not be a determining factor in the success of a research project. A large dataset will support quantitative predictions in a way that the anecdotal results collected one plasmid at a time cannot.

In reality, the ad-hoc approach is likely to be slower in the long run, and time is money. If you have to test 24 plasmids, it's cheaper to test them all at once than to spread the effort over six months. The experiments will be cheaper and the results will have a greater impact on the company.

A vector that seems good enough at the bench may no longer be good enough when moving into manufacturing. A high-performing vector will impact the profitability of manufacturing processes. It can reduce process development time, campaign time, or reactor volumes. It's about as important as the engine is to a car. It can mean the difference between winning and losing a bid, between profit and loss on a contract.