Plasmid Certification Using GenoDIN
Jan 10, 2024 11:05:41 AM
By:
GenoFAB
Track your plasmids and their documentation, control their dissemination, and ensure your plasmids' integrity and authenticity with GenoDIN. Schedule a call to learn more.
You need to certify your plasmids.
DNA plasmids are the molecular workhorses of modern biotechnology. Plasmids are circular DNA molecules capable of replicating independently of the bacterial chromosome. In basic biological research, the development of new plasmids is often the first step of experiments that aim to understand structure-function relationships from genes to genomes, decipher developmental processes, produce research reagents, or support the design, build, test, and learn (DBTL) cycle in synthetic biology. Plasmids are also an integral component of clinical applications, including gene therapy, vaccine development, and the production of recombinant drugs. Today, most plasmids are obtained through gene synthesis.
Despite this central importance, there is no standardized framework in biomedical research for associating plasmid documentation with physical plasmids found in DNA solutions or bacterial strains. Canonically, the two common links between a physical plasmid and its digital documentation are the plasmid sequence itself and the plasmid name. Plasmid names are often used to associate a physical plasmid and its documentation based on the assumption that a computer file name pXYZ.gb and a tube labeled pXYZ refer to each other. This assumption is naïve because the file name and contents can easily be edited without leaving a traceable log of changes. Similarly, a label provides no guarantee that the content of the tube matches the stated molecular/sample identity. Simple pipetting or labeling errors can create discrepancies. Even without user error, spontaneous or undocumented mutations could result in a content-label mismatch. Additionally, authors of engineered DNA molecules who wish to maintain and share detailed documentation with their users lack the necessary technology. This environment negatively impacts plasmid developers and users.
The challenges in uniquely matching a given plasmid to its author and documentation are compounded by the increasing sophistication of modern plasmids. Modern plasmids are designed in silico, using bioinformatics algorithms to generate synthetic sequences and combining DNA sequences from multiple origins. This level of sophistication makes it next to impossible to evaluate whether a DNA plasmid in hand contains unintended mutations or will function as intended. Many of these plasmids are now parts of large libraries in which the sequences of different plasmids may differ by only a few bases. This makes it difficult to weigh the importance of minor sequence differences without a description of the library structure. Any discrepancy between a physical DNA plasmid and its supposed reference sequence can directly cause reproducibility issues, slow R&D efforts, and raise significant security and safety issues for biotechnology applications.
Attempts to link plasmid documentation to physical plasmids have largely fallen short. For example, specialized plasmid repositories like Addgene do not have systems for accurately linking plasmids and their documentation. Within Addgene, documentation for each plasmid includes a link to the depositor and potentially a link to the publication describing the plasmid. Many entries include downloadable annotated DNA sequences, a plasmid map, and additional practical information. However, the documentation is often incomplete or inconsistent. For example, the description of pUC19 includes two plasmid maps, one generated by Addgene and one generated by the depositor, along with a hand-drawn plasmid map from the original publication. The entry also includes four sequences: one contributed by the depositor, one produced by Illumina sequencing, and two Sanger sequencing reads. Comparing these four sequences uncovers significant differences that the Addgene documentation does not disclose or attempt to resolve. This example demonstrates that even though Addgene and similar organizations have invested considerable resources in providing a superior service for plasmid storage and documentation, their approach fails to provide accurate information for even the most commonly used plasmids.
Quality control and intellectual property concerns. Exchanges of physical plasmids in the biotechnology supply chain are governed by licensing and Material Transfer Agreements (MTAs). These agreements are notoriously difficult to enforce because the plasmid provider has few ways to prevent the receiver from reusing or redistributing the plasmid under a different name, or to determine whether such a breach has occurred. Similarly, receivers of plasmids struggle to understand their rights and may unintentionally infringe on existing agreements, potentially exposing their employers to costly lawsuits. Additionally, the plasmid may be distributed to a collaborator or contractor. In this scenario, neither the recipient nor the provider has any way of knowing whether the correct plasmid was used to generate the resulting data, because of the loose connection between the physical plasmid, its reference sequence, and other documentation. Together, the inability to associate DNA molecules with electronic records leads to reproducibility issues, complicates licensing or building on intellectual property, and can create security and safety issues with biological products derived from plasmids.
The Solution: GenoFAB, Inc. is developing a DNA-encoded cryptographic certification system to embed a digital certificate within the plasmid – linking the macromolecule itself with its author, identity, and documentation for easy and secure resource sharing.
Prior Research: In 2008, Dr. Jean Peccoud published a detailed analysis of the content of the iGEM Registry of Biological Parts that quantified many of the problems regarding plasmid identity within the scientific community. In 2015, he and his lab performed a systematic analysis of 21,594 annotations in a library of 1,901 commonly used plasmids. In 2018, his group introduced the concept of cyberbiosecurity to refer to threats emerging at the interface between computers and experimental biology. This resulted from a year-long security assessment of a biomanufacturing facility sponsored by the Department of Defense. One of the vulnerabilities identified as part of this project was the possible discrepancy between the nature of the samples kept in laboratory freezers and their descriptions in the laboratory inventory management systems. There is widespread evidence that life scientists often work with biological material that is different from what they assume it to be. It is well known that the life science community is struggling with cell line authentication. We uncovered similar issues with plasmids, yeast strains, and plasmid annotations. The disconnect between physical samples and their description causes reproducibility issues that undermine the success of research projects.
How it works.
The DIN is a short, software-generated DNA sequence inserted into a plasmid without disrupting the plasmid's biological function. Conceptually, the DIN is analogous to a car’s Vehicle Identification Number (VIN), which provides information about the vehicle manufacturer, the plant where the vehicle was assembled, the model year, the serial number, and other information while retaining a security code that protects against transcription errors and forgery. Similarly, the DIN is composed of multiple data blocks encoding the plasmid's origin and its unique identifier. In addition, the DIN includes two security features: a digital signature and an Error-Correcting Code (ECC).
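To make the idea of data blocks concrete, here is a minimal Python sketch of how an author identifier and a plasmid identifier could be packed into DNA. Everything specific in it is an assumption made for illustration: the two-bits-per-base mapping, the 8-byte and 4-byte block sizes, and the function names are hypothetical and do not describe the actual DIN format. A real design would also have to avoid sequence features that interfere with cloning or plasmid function, which this sketch ignores.

```python
# Illustrative only: the mapping and block sizes are assumptions,
# not the GenoDIN format.
BASES = "ACGT"  # A=00, C=01, G=10, T=11


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, two bits per base, high bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(seq: str) -> bytes:
    """Inverse of bytes_to_dna; expects a length that is a multiple of 4."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def build_identifier(author_id: int, plasmid_id: int) -> str:
    """Hypothetical layout: 8-byte author block, then 4-byte plasmid block."""
    raw = author_id.to_bytes(8, "big") + plasmid_id.to_bytes(4, "big")
    return bytes_to_dna(raw)


def parse_identifier(seq: str) -> tuple:
    """Recover (author_id, plasmid_id) from a sequence made by build_identifier."""
    raw = dna_to_bytes(seq)
    return int.from_bytes(raw[:8], "big"), int.from_bytes(raw[8:12], "big")
```

With this hypothetical layout the two blocks occupy 48 bases, and `parse_identifier(build_identifier(a, p))` returns `(a, p)` for any values that fit the block sizes.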
The digital signature ensures the plasmid's integrity and authenticity. Integrity means that the plasmid has not changed since its author certified it. Authenticity means that the plasmid came from its stated author. The signature is derived from the entire plasmid sequence and generated with the author’s secret key. The plasmid user can verify the signature using the author’s public key, which may be a publicly available identifier such as an ORCID iD. The ECC gives users the means to understand why a signature might be invalid (e.g., because of a spontaneous mutation or a sequencing error). It adds some redundancy to the plasmid sequence, which makes it possible to retrieve the original plasmid sequence even in the presence of a few mutations, using no information other than the mutated plasmid sequence. As a result, the user can verify a plasmid's DIN even when mutations are present.
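The sign-then-verify idea can be sketched in a few lines of Python. This is a toy, not the scheme GenoDIN uses, which the post does not specify: it applies unpadded textbook RSA to a SHA-256 digest, with a deliberately tiny key built from two known Mersenne primes so that it runs with the standard library alone. It is insecure and only stands in for a real signature scheme. The function names are hypothetical, and the question of how the signature block itself is excluded from the signed sequence is ignored here.

```python
import hashlib

# Toy key: insecure, for illustration only.
P, Q = 2**61 - 1, 2**89 - 1          # two known Mersenne primes
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # secret exponent (needs Python 3.8+)


def digest(sequence: str) -> int:
    """Hash the full plasmid sequence and reduce it into the toy modulus."""
    h = hashlib.sha256(sequence.upper().encode("ascii")).digest()
    return int.from_bytes(h, "big") % N


def sign(sequence: str) -> int:
    """Author side: requires the secret exponent D."""
    return pow(digest(sequence), D, N)


def verify(sequence: str, signature: int) -> bool:
    """User side: requires only the public values N and E."""
    return pow(signature, E, N) == digest(sequence)
```

A signature produced by `sign` verifies against the unchanged sequence, while changing a single base makes `verify` return `False`. That is the integrity check. Authenticity follows from the fact that only the holder of the secret exponent could have produced a signature that verifies under the public key.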
The GenoDIN technology will be the foundation of our DNA Certification Application, which will provide plasmid authors and end users with an easy-to-use system for certifying and validating plasmids. Plasmid authors can use the application by simply uploading their plasmid’s sequence to generate the sequence of their certified plasmid. They will be able to order the certified plasmid from GenoFAB or make it themselves. They will also be able to manage their plasmid documentation. Conversely, plasmid users can sequence a plasmid upon receipt, upload the sequencing reads to the application, and receive the assembled plasmid sequence, a report indicating the possible presence of mutations, the author and plasmid identities, and the plasmid documentation. This system will secure the exchange of plasmids among investigators working in academic and industrial settings.
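How an error-correcting code can flag and repair a mutation without a reference sequence is easiest to see on a small example. The Python sketch below uses a Hamming(7,4) code, chosen here purely for illustration because the post does not say which code GenoDIN uses. Each base is treated as two bits, and the high bits and low bits are protected separately, so any single base substitution within a 7-base codeword can be corrected. The function names are hypothetical, and insertions and deletions are not handled.

```python
BASES = "ACGT"  # A=00, C=01, G=10, T=11 (an assumed mapping)


def hamming_encode(data_bits):
    """Hamming(7,4): data at positions 3, 5, 6, 7; parity at 1, 2, 4."""
    c = [0] * 8  # index 0 unused so that indices match positions
    c[3], c[5], c[6], c[7] = data_bits
    syndrome = 0
    for pos in (3, 5, 6, 7):
        if c[pos]:
            syndrome ^= pos
    c[1], c[2], c[4] = syndrome & 1, (syndrome >> 1) & 1, (syndrome >> 2) & 1
    return c[1:]


def hamming_decode(code_bits):
    """Return (data_bits, error_position); position 0 means no error seen."""
    c = [0] + list(code_bits)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos
    if syndrome:
        c[syndrome] ^= 1  # flip the bit the syndrome points at
    return [c[3], c[5], c[6], c[7]], syndrome


def protect(block: str) -> str:
    """Expand 4 bases into a 7-base codeword."""
    vals = [BASES.index(b) for b in block]
    hi = hamming_encode([v >> 1 for v in vals])
    lo = hamming_encode([v & 1 for v in vals])
    return "".join(BASES[(h << 1) | l] for h, l in zip(hi, lo))


def recover(codeword: str):
    """Return (original 4 bases, whether a substitution was corrected)."""
    vals = [BASES.index(b) for b in codeword]
    hi, err_hi = hamming_decode([v >> 1 for v in vals])
    lo, err_lo = hamming_decode([v & 1 for v in vals])
    block = "".join(BASES[(h << 1) | l] for h, l in zip(hi, lo))
    return block, bool(err_hi or err_lo)
```

Calling `recover` on an unmodified codeword returns the original four bases with the flag set to `False`. After any single base substitution it still returns the original bases, with the flag set to `True`. This is the kind of information a mutation report could be built on. In a certified plasmid the redundancy would presumably live in the DIN rather than being interleaved with the functional sequence as it is in this toy.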
The GenoDIN benefits both developers and users of plasmids by providing key advantages:
- Plasmid identity: All competing solutions rely on tube labels to identify plasmids, and they cannot insert serial numbers in the plasmids. The DIN solution encodes the plasmid's unique identifier in the sequence of the DNA molecule itself. Multiple serial numbers can be associated with the same plasmid to track plasmids sent to individual investigators.
- Plasmid origin: All competing solutions establish the provenance of a plasmid by printing the vendor information on the label. The plasmid origin is lost as soon as the plasmid changes containers. The DIN technology encodes the identity of the plasmid developer in the DNA molecule itself. In addition, it provides users with the means to verify that the plasmid and the plasmid author’s identity have not been tampered with.
- Plasmid documentation: GenoDIN technology makes it possible to associate a broad range of documents with the plasmids, provide different sets of documents to different users, and verify the integrity and authenticity of the documents.
- Plasmid integrity: To verify the integrity of plasmids procured from existing channels, users need to know the plasmid identity. This is often attempted by retrieving the plasmid sequence from the source (assuming that it is available) and comparing this published sequence with the sequence produced by assembling the sequencing reads. The GenoDIN makes it possible to verify the presence of mutations in a plasmid sequence without knowing the plasmid identity or its theoretical sequence.
- Plasmid licensing: GenoDIN technology provides an alternative to MTAs by making it possible to link the identity of a plasmid and the identity of a user to determine if the user is entitled to access the plasmid and its documentation.
- Credits and support: Competing solutions provide no means to keep track of plasmid users. This makes it difficult to provide post-sales support or to give plasmid authors credit for the use of their plasmids. GenoDIN provides value to plasmid users by allowing them to access documentation or verify their plasmids. These services are available irrespective of how the users procured the plasmids. Plasmid users are likely to verify plasmids several times at different points of their R&D or manufacturing workflows. These interactions provide avenues to cultivate post-sales relationships between authors and users, including technical support, updates to the plasmid documentation, citations, or royalty calculations.
Learn more about GenoDIN
Schedule a call to learn more about the technology, discuss how it can best meet your organization's needs, and evaluate the possibility of doing a pilot project with your plasmids.