DNA fragment clustering and grouping tool

Project description

The Coli Toolkit (CTK): An extension of the modular Yeast Toolkit for use in E. coli

This Python package contains the code for clustering small DNA fragments in preparation for de novo synthesis. The project is also described in the paper "The Coli Toolkit (CTK): An extension of the modular Yeast Toolkit to the E. coli chassis" by Jacob Mejlsted^1,2,3, Erik Kubaczka^1,3, Sebastian Wirth^1,3, and Heinz Koeppl^1,3*.

Install

pip install DNA-fragment-clustering

CLI Usage

DNA-fragment-clustering input.csv --aggressive

Python Usage

from dna_fragment_clustering import DNA_clustering
DNA_clustering("input.csv")

Clustering of de novo DNA fragments

The Python script DNA_fragments.py performs clustering and grouping of de novo DNA fragments intended for synthesis. From the methods section of the paper:

The clustering software uses the Levenshtein similarity matrix to compute the differences between the various fragments that the user wants to synthesize. Using affinity propagation, the software defines clusters with high sequence similarity. From these clusters, groups of up to three sequences from distinct clusters are formed to obtain low sequence similarity in the final DNA sequence sent for synthesis. If the aggressive clustering option is selected, groups containing only one sequence are concatenated together to minimize the amount of DNA that needs to be synthesized. Following the grouping, the DNA sequences are concatenated and the BsmBI restriction sites are exchanged for BbsI and BspMI sites at the second and third occurrences, respectively. The final sequence is then written as a .csv file to the folder containing the input file.
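The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the package's implementation: all function names are hypothetical, the edit distance is written out by hand instead of calling the Levenshtein library, and the similarity normalization (1 minus distance divided by the longer length) is an assumption. Cluster labels are taken as given; in practice they would come from affinity propagation on the precomputed similarity matrix, for example scikit-learn's `AffinityPropagation(affinity="precomputed").fit(S).labels_`, which is left out so the sketch runs with the standard library only. The greedy grouping rule and the reading of "second and third occurrences" as the second and third fragment of a group are also assumptions, and the aggressive option is not covered. Because BsmBI (N1/N5), BbsI (N2/N6), and BspMI (N4/N8) cut at different distances from their recognition sites, a real design may also need to adjust the spacer next to each swapped site; the sketch only exchanges the 6-bp recognition sequences.

```python
import re
from collections import defaultdict, deque


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        previous = current
    return previous[-1]


def similarity_matrix(seqs: list[str]) -> list[list[float]]:
    """Assumed normalization: 1 - distance / length of the longer sequence."""
    n = len(seqs)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            longest = max(len(seqs[i]), len(seqs[j])) or 1
            sim[i][j] = sim[j][i] = 1.0 - levenshtein(seqs[i], seqs[j]) / longest
    return sim


def group_distinct_clusters(labels: list[int], group_size: int = 3) -> list[list[int]]:
    """Hypothetical greedy grouping: at most one index per cluster in each group,
    drawing first from the clusters with the most members left."""
    buckets = defaultdict(deque)
    for index, label in enumerate(labels):
        buckets[label].append(index)
    groups = []
    while any(buckets.values()):
        # Stable sort: ties keep the order in which clusters first appeared.
        fullest = sorted((b for b in buckets.values() if b), key=len, reverse=True)
        groups.append([bucket.popleft() for bucket in fullest[:group_size]])
    return groups


# Recognition sequences as (forward, reverse complement).
SITES = {
    "BsmBI": ("CGTCTC", "GAGACG"),
    "BbsI": ("GAAGAC", "GTCTTC"),
    "BspMI": ("ACCTGC", "GCAGGT"),
}
ENZYME_BY_POSITION = ("BsmBI", "BbsI", "BspMI")  # 1st, 2nd, 3rd fragment in a group
BSMBI_PATTERN = re.compile("|".join(SITES["BsmBI"]))


def concatenate_group(fragments: list[str]) -> str:
    """Concatenate a group, swapping BsmBI sites in the 2nd/3rd fragment for BbsI/BspMI.
    Only the 6-bp recognition sequence is swapped; spacers are not adjusted."""
    if len(fragments) > len(ENZYME_BY_POSITION):
        raise ValueError("a group holds at most three fragments")
    parts = []
    for position, fragment in enumerate(fragments):
        new_fwd, new_rev = SITES[ENZYME_BY_POSITION[position]]
        # Single regex pass so a replacement cannot be re-matched as a BsmBI site.
        parts.append(BSMBI_PATTERN.sub(
            lambda m, f=new_fwd, r=new_rev: f if m.group() == SITES["BsmBI"][0] else r,
            fragment.upper(),
        ))
    return "".join(parts)
```

For example, with cluster labels `[0, 0, 1, 1, 2]`, `group_distinct_clusters` returns `[[0, 2, 4], [1, 3]]`, so no group contains two members of the same cluster. Drawing from the fullest clusters first keeps the number of groups close to the size of the largest cluster, which is the minimum possible when every group must span distinct clusters.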

Setup

Please make sure that you have a working Python installation. The requirements and their installation instructions can be found under Requirements.

Input format

The input .csv format is based on the export format of Benchling. Examples are provided in the clustering_sample_data folder. The format uses three columns, Name, Author, and Sequence, which hold the name of the DNA fragment, the author/owner of the DNA sequence, and the sequence itself, respectively.
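As an illustration of the expected input, the following sketch reads a Benchling-style CSV with the three columns described above using only the standard library. The function names, the upper-casing of sequences, the skipping of blank rows, and the `_clustered` output suffix are assumptions made for this example; the package's actual parsing and output file name may differ. The output path follows the methods description of writing the result to the folder that contains the input file.

```python
import csv
import io
from pathlib import Path

REQUIRED_COLUMNS = ("Name", "Author", "Sequence")


def read_fragments(handle) -> list[dict[str, str]]:
    """Read Name/Author/Sequence rows, upper-casing sequences and skipping blank rows."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing column(s): {', '.join(missing)}")
    fragments = []
    for row in reader:
        # Short rows yield None for absent fields, so fall back to "".
        record = {c: (row.get(c) or "").strip() for c in REQUIRED_COLUMNS}
        if record["Sequence"]:
            record["Sequence"] = record["Sequence"].upper()
            fragments.append(record)
    return fragments


def output_path(input_path: str, suffix: str = "_clustered") -> Path:
    """Place the result next to the input file; the suffix is a hypothetical choice."""
    path = Path(input_path)
    return path.with_name(f"{path.stem}{suffix}.csv")


sample = io.StringIO("Name,Author,Sequence\nfrag1,J.M.,acgtacgt\nfrag2,E.K.,GGGCCC\n")
print(read_fragments(sample))
print(output_path("clustering_sample_data/input.csv"))
```

When reading from disk instead of a string, opening the file with `open(path, newline="", encoding="utf-8-sig")` lets the csv module handle quoted newlines and tolerates a leading byte-order mark.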

Requirements

The software provided here is written in Python and makes use of libraries such as numpy, pandas and others.

In your terminal or command line, navigate to the project directory and run

pip install -r requirements.txt

to install all the dependencies required by the code. Be aware that, depending on your OS, you might have to use pip3 instead of pip.

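In particular, the code relies on the following packages (pathlib, tkinter, shutil, and warnings are part of the Python standard library rather than being installed by pip):

numpy
pandas
Levenshtein
scikit-learn (sklearn.cluster)
pathlib
tkinter
matplotlib
FlowCal
scipy
shutil
warnings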

Citation

If you use this code or the data provided here, please cite the corresponding paper.

License

The code and the data are available under an MIT License. Please cite the corresponding paper if you use our code and/or data.

Funding & Acknowledgments

The authors acknowledge Anika Kofod Petersen for her work on the prototype of the de novo synthesis clustering pipeline. The work was made possible with the support of a scholarship from the German Academic Exchange Service (DAAD), project number 91877921 to J.M. E.K. was supported by ERC-PoC grant PLATE (101082333). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies. We acknowledge the use of Python and the aforementioned Python packages.

Project details

Release history

0.1.4: Mar 6, 2026
0.1.3: Mar 2, 2026
0.1.2: Feb 23, 2026
0.1.1: Feb 23, 2026
0.1.0: Feb 23, 2026 (this version)

Download files

Source Distribution

dna_fragment_clustering-0.1.0.tar.gz (6.7 kB)

Uploaded Feb 23, 2026 Source

Built Distribution

dna_fragment_clustering-0.1.0-py3-none-any.whl (7.9 kB)

Uploaded Feb 23, 2026 Python 3

File details

Details for the file dna_fragment_clustering-0.1.0.tar.gz.

File metadata

Download URL: dna_fragment_clustering-0.1.0.tar.gz
Upload date: Feb 23, 2026
Size: 6.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for dna_fragment_clustering-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`b131399a232608ddb5738d7c96df9e4038a0b2f66999b8bf547f662bdfe7ba35`
MD5	`1e1666fd39623eda6c3ab4e95a5b2292`
BLAKE2b-256	`d1b0e1ce02826f57c2fbaa0dc11d59b89c280edbe46942fb61174f4e4a66f849`

File details

Details for the file dna_fragment_clustering-0.1.0-py3-none-any.whl.

File metadata

Download URL: dna_fragment_clustering-0.1.0-py3-none-any.whl
Upload date: Feb 23, 2026
Size: 7.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for dna_fragment_clustering-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`54c365c5224e14ec6de49c1e01c38e438b9ec39de46489e8da04cfc06c849041`
MD5	`ba476bf30ffe9410c794dbb04913f563`
BLAKE2b-256	`5e6e5a12a33aa651c33585776f4bfb65195c9f4ada1393521fe206bd0b3da2c7`

DNA-fragment-clustering 0.1.0