DNA fragment clustering and grouping tool

The Coli Toolkit (CTK): An extension of the modular Yeast Toolkit for use in E. coli

This Python package contains the code for clustering small DNA fragments in preparation for de novo synthesis. The project is also described in the paper "The Coli Toolkit (CTK): An extension of the modular Yeast Toolkit to the E. coli chassis" by Jacob Mejlsted, Erik Kubaczka, Sebastian Wirth, and Heinz Koeppl.

Install

pip install DNA-fragment-clustering

Python Usage

from DNA_fragment_clustering import DNA_clustering, DNA_typer

DNA_clustering("input.csv", aggressive = False)

If aggressive = True is used, the algorithm also combines groups that contain only a single sequence to achieve a higher level of compression, but this may sacrifice synthesizability because the combined sequences can be similar to each other.

The DNA_typer function takes only the path to a .csv file as input:

DNA_typer("input.csv")

It is also possible to use the two functions together:

DNA_clustering(DNA_typer("input.csv"), aggressive = False)

Clustering of de novo DNA fragments

The Python function DNA_clustering performs clustering and grouping of de novo DNA fragments meant for synthesis. From the methods:

The clustering software uses the Levenshtein similarity matrix to compute the differences between the various fragments that the user wants to synthesize. Using affinity propagation, the software defines clusters with high sequence similarity. From this, groups are made of up to three sequences from distinct clusters to obtain low sequence similarity in the final DNA sequence sent for synthesis. If the aggressive clustering option is selected, groups containing only one sequence are concatenated together to minimize the amount of DNA that needs to be synthesized. Following the grouping, the DNA sequences are concatenated and the restriction sites for BsmBI are exchanged to BbsI and BspMI for the second and third occurrences, respectively. The final sequence is then written as a .csv file to the same folder as the input file.
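The first and third steps of the description above can be sketched in plain Python. This is a minimal illustration and not the package's own code: the function names are hypothetical, the negative edit distance is used as the similarity score (an assumption), and the cluster labels are taken as given. In practice they would come from an affinity propagation implementation run on the similarity matrix, for example scikit-learn's AffinityPropagation with affinity="precomputed".

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity_matrix(sequences):
    """Assumed similarity: negative edit distance (larger = more similar)."""
    return [[-levenshtein(s, t) for t in sequences] for s in sequences]


def group_across_clusters(labels, max_size=3):
    """Greedy grouping: each group holds at most one index per cluster.

    labels[i] is the cluster label of sequence i (hypothetical input,
    e.g. the output of an affinity propagation run).
    """
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    pools = list(clusters.values())
    groups = []
    while any(pools):
        # Draw from the largest remaining clusters first.
        pools.sort(key=len, reverse=True)
        groups.append([pool.pop() for pool in pools[:max_size] if pool])
    return groups


# Placeholder sequences and labels, invented for illustration only.
sequences = ["AAAA", "AAAT", "GGGG"]
matrix = similarity_matrix(sequences)
groups = group_across_clusters([0, 0, 1, 2, 2, 2])
```

Groups that end up with a single member are the ones the aggressive option would merge further, at the cost of placing similar sequences in the same synthesized fragment.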

Input format

The input.csv files were based on the output format of Benchling.
The format requires two columns: Name and Sequence. These are the name of the DNA fragment and its sequence, respectively. All other columns will be ignored.
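If the fragments do not come from a Benchling export, a file with the two required columns can be written with the standard csv module. The sketch below is only an illustration: the helper name and the fragment names and sequences are placeholders, and the file is written to a temporary folder so that the example has no side effects.

```python
import csv
import os
import tempfile


def write_clustering_input(path, fragments):
    """Write (name, sequence) pairs under the Name and Sequence headers."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Name", "Sequence"])
        writer.writerows(fragments)


# Placeholder fragments, invented for illustration only.
example_fragments = [
    ("fragment_A", "ATGGCTAGCAAAGGAGAAGAA"),
    ("fragment_B", "ATGAGTAAAGGCGAAGAGCTG"),
]

# Use your own folder in practice; a temporary one keeps this sketch tidy.
input_path = os.path.join(tempfile.mkdtemp(), "input.csv")
write_clustering_input(input_path, example_fragments)
```

The resulting path can then be passed to DNA_clustering as shown under Python Usage.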

Typing of DNA fragments

The Python function DNA_typer adds bases to the 5'- and 3'-ends of the sequence to determine its part type, and to enable entry cloning into pYTK001.

Input format

The input.csv file requires three columns: Name, Type, and Sequence. These are the name of the DNA part, its type according to the YTK/CTK nomenclature, and the sequence in question, respectively. All other columns will be ignored.
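Before calling DNA_typer, it can be useful to check that the file has the three required columns. The helper below is a hypothetical pre-check and not part of the package. It assumes the header names must match exactly as written above, which may be stricter or looser than what the package itself accepts.

```python
import csv

REQUIRED_TYPER_COLUMNS = ("Name", "Type", "Sequence")


def missing_columns(path, required=REQUIRED_TYPER_COLUMNS):
    """Return the required column names that are absent from the header."""
    with open(path, newline="") as handle:
        header = next(csv.reader(handle), [])
    present = {column.strip() for column in header}
    return [name for name in required if name not in present]
```

An empty list means all required columns are present; extra columns are not reported, since they are ignored anyway.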

Citation

If you use this code or the data provided here, please cite the corresponding paper.

License

The code and the data are available under an MIT License. Please cite the corresponding paper if you use our code and/or data.

Funding & Acknowledgments

The authors acknowledge Anika Kofod Petersen for her work on the prototype of the de novo synthesis clustering pipeline. The work was made possible with the support of a scholarship from the German Academic Exchange Service (DAAD), project number 91877921 to J.M. E.K. was supported by ERC-PoC grant PLATE (101082333). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies. We acknowledge the use of Python and the aforementioned Python packages.
