CLI tool for creating mmCIF files from various facility data sources
mmcif-gen
A versatile command-line tool for generating mmCIF files from various data sources. This tool can be used to create:
- Metadata mmCIF files (to capture experimental metadata from different facilities)
- Investigation mmCIF files (such as those at https://ftp.ebi.ac.uk/pub/databases/msd/fragment_screening/investigations/)
As is standard practice at the Protein Data Bank (PDB), the generated files are given the extension '.CIF' even though the file format is called mmCIF. More on the mmCIF file format can be found here: mmcif.wwpdb.org/
The tool uses transformational mappings to convert data, as it is stored at the various facilities, into the corresponding categories and items of the mmCIF format.
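To make "categories and items" concrete, here is a minimal sketch of how one category with two items is laid out as an mmCIF `loop_`. It is not taken from mmcif-gen; the function names are hypothetical and the quoting rule is deliberately simplified (single-line values without embedded quotes only).

```python
def quote_value(value: str) -> str:
    """Quote a single-line value for mmCIF (simplified rule, see lead-in)."""
    if value == "":
        return "?"  # mmCIF placeholder for an unknown value
    if "\n" in value or "'" in value or '"' in value:
        # Multi-line and quote-containing values need rules this sketch omits.
        raise ValueError(f"value not supported by this sketch: {value!r}")
    needs_quotes = any(ch.isspace() for ch in value) or value[0] in "_#$[];"
    return f"'{value}'" if needs_quotes else value


def format_block(block_id: str, category: str, items: list, rows: list) -> str:
    """Render one data block holding a single looped category."""
    lines = [f"data_{block_id}", "#", "loop_"]
    # Each item is written as <category>.<item>, e.g. _audit_author.name
    lines += [f"{category}.{item}" for item in items]
    for row in rows:
        if len(row) != len(items):
            raise ValueError("row length does not match number of items")
        lines.append(" ".join(quote_value(v) for v in row))
    lines.append("#")
    return "\n".join(lines) + "\n"


print(
    format_block(
        "I_1234",
        "_audit_author",
        ["name", "pdbx_ordinal"],
        [["Smith, J.", "1"], ["Doe, A.", "2"]],
    )
)
```

The author names and the block identifier are made-up illustration values.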
Installation
Install directly from PyPI:
pip install mmcif-gen
Usage
General Options
-v,--version: Show the program's version number.
Main Commands
The tool provides two main commands:
- fetch-facility-json: Fetch facility-specific JSON configuration files
- make-mmcif: Generate mmCIF files using the configurations
Fetching Facility JSON Files
The JSON operations files determine how data is mapped from the original source and translated into mmCIF format.
These files can be written by hand, but they can also be fetched from the GitHub repository with a single command.
# Fetch configuration for a specific facility
mmcif-gen fetch-facility-json dls-metadata
# Or
mmcif-gen fetch-facility-json xchem_operations
# Specify custom output directory
mmcif-gen fetch-facility-json dls-metadata -o ./mapping_operations
Generating metadata mmCIF Files
Currently, mmCIF files can be generated for the following facilities: pdbe, maxiv, dls, and xchem.
The general syntax for generating mmCIF files is:
mmcif-gen make-mmcif <facility> [options]
Full list of options:
$ mmcif-gen make-mmcif --help
usage: mmcif-gen make-mmcif [-h] [--json JSON] [--output-folder OUTPUT_FOLDER]
                            [--id ID]
                            {pdbe,maxiv,dls,xchem} ...

positional arguments:
  {pdbe,maxiv,dls,xchem}
                        Specifies facility for which mmcif files will be used
                        for
    pdbe                Parameter requirements for investigation files from
                        PDBe data
    maxiv               Parameter requirements for investigation files from
                        MAX IV data
    dls                 Parameter requirements for creating investigation
                        files from DLS data
    xchem               Parameter requirements for creating investigation
                        files from XChem data

optional arguments:
  -h, --help            show this help message and exit
  --json JSON           Path to transformation JSON file
  --output-folder OUTPUT_FOLDER
                        Output folder for mmCIF files
  --id ID               File identifier
Each facility has its own set of required parameters, which can be checked by running the command with the --help flag.
mmcif-gen make-mmcif pdbe --help
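The usage line above places `--json`, `--output-folder` and `--id` before the facility name, and the facility-specific options after it. The following sketch rebuilds that layout with Python's `argparse` to show why the order matters for a parser shaped like this. It only mirrors the help text shown here; it is not the tool's actual parser, and whether the real tool also accepts the general options after the facility name is not confirmed.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """A stand-in parser that mirrors the documented help output (assumption)."""
    parser = argparse.ArgumentParser(prog="mmcif-gen-sketch")
    commands = parser.add_subparsers(dest="command", required=True)

    make = commands.add_parser("make-mmcif")
    # General options belong to make-mmcif itself.
    make.add_argument("--json")
    make.add_argument("--output-folder")
    make.add_argument("--id")

    facilities = make.add_subparsers(dest="facility", required=True)
    dls = facilities.add_parser("dls")
    dls.add_argument("--dls-json")
    xchem = facilities.add_parser("xchem")
    xchem.add_argument("--sqlite")
    xchem.add_argument("--data-csv")
    return parser


parser = build_parser()

# General options first, then the facility, then facility-specific options.
args = parser.parse_args(
    ["make-mmcif", "--json", "dls_metadata.json", "--id", "I_1234",
     "dls", "--dls-json", "metadata.json"]
)
print(args.facility, args.json, args.dls_json)

# In this sketch, a general option placed after the facility name is handed
# to the facility sub-parser, which does not know it.
_, unknown = parser.parse_known_args(
    ["make-mmcif", "dls", "--json", "dls_metadata.json"]
)
print(unknown)
```

With a parser built this way, keeping the general options in front of the facility name is the safe ordering.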
Example Usage
DLS (Diamond Light Source)
# Using metadata configuration
mmcif-gen make-mmcif --json dls_metadata.json --output-folder ./out --id I_1234 dls --dls-json metadata-from-ispyb.json
XChem
Parameters required:
$ mmcif-gen make-mmcif xchem --help
usage: mmcif-gen make-mmcif xchem [-h] [--sqlite SQLITE] [--data-csv DATA_CSV]

options:
  -h, --help           show this help message and exit
  --sqlite SQLITE      Path to the .sqlite file for each data set
  --data-csv DATA_CSV  Path to the .csv file for each data set
Example command after fetching the facility JSON:
mmcif-gen make-mmcif xchem --sqlite mmcif_gen/test/data/soakDBDataFile.sqlite --data-csv mmcif_gen/test/data/metadata.csv
Alternatively, you can specify the operations JSON file manually:
mmcif-gen make-mmcif --json mmcif_gen/operations/xchem/xchem_operations.json xchem --sqlite mmcif_gen/test/data/soakDBDataFile.sqlite --data-csv mmcif_gen/test/data/metadata.csv
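Before running the XChem command it can help to check that both input paths point at usable files. The sketch below is an optional pre-flight check and is not part of mmcif-gen; the function names are hypothetical. It relies on two facts only: every SQLite 3 database file starts with the 16-byte header `SQLite format 3\0`, and a CSV file should have at least a header row. It does not check which tables or columns the tool expects.

```python
import csv
from pathlib import Path

# Every SQLite 3 database file begins with this 16-byte header string.
SQLITE_MAGIC = b"SQLite format 3\x00"


def looks_like_sqlite(path) -> bool:
    """Return True if the file starts with the SQLite 3 header."""
    with open(path, "rb") as handle:
        return handle.read(16) == SQLITE_MAGIC


def csv_header(path) -> list:
    """Return the first row of the CSV file, or [] if there is none."""
    with open(path, newline="") as handle:
        return next(csv.reader(handle), [])


def preflight(sqlite_path, csv_path) -> list:
    """Collect human-readable problems; an empty list means both files look usable."""
    problems = []
    for flag, path in (("--sqlite", sqlite_path), ("--data-csv", csv_path)):
        if not Path(path).is_file():
            problems.append(f"{flag}: {path} is not a file")
    if problems:
        return problems
    if not looks_like_sqlite(sqlite_path):
        problems.append(f"--sqlite: {sqlite_path} is not a SQLite 3 database")
    if not csv_header(csv_path):
        problems.append(f"--data-csv: {csv_path} has no header row")
    return problems
```

For the example above, `preflight("mmcif_gen/test/data/soakDBDataFile.sqlite", "mmcif_gen/test/data/metadata.csv")` would return an empty list when both files are present and well-formed.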
Working with Investigation Files
Investigation files are a specialized type of mmCIF file that capture metadata across multiple experiments.
Investigation files are created in a very similar way:
PDBe
# Using model folder
mmcif-gen make-mmcif --json pdbe_investigation.json --output-folder ./out --id I_1234 pdbe --model-folder ./models
# Using PDB IDs
mmcif-gen make-mmcif --json mmcif_gen/operations/pdbe/pdbe_investigation.json --output-folder ./out --id I_321 pdbe --pdb-ids 6dmn 6dpp 6do8
# Using CSV input
mmcif-gen make-mmcif --json pdbe_investigation.json --output-folder ./out pdbe --csv-file groups.csv
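When these commands are scripted, for example to generate several investigation files in a row, it can be convenient to assemble the argument list in Python. The helper below is a hypothetical wrapper, not an API provided by mmcif-gen; it only uses the flags shown in the examples above and defaults to a dry run that prints the command instead of executing it.

```python
import shlex
import subprocess


def make_mmcif_command(facility, facility_args, json_path=None,
                       output_folder=None, file_id=None):
    """Build the argument list for `mmcif-gen make-mmcif` (hypothetical helper)."""
    cmd = ["mmcif-gen", "make-mmcif"]
    # General options go before the facility name, as in the usage line.
    for flag, value in (("--json", json_path),
                        ("--output-folder", output_folder),
                        ("--id", file_id)):
        if value is not None:
            cmd += [flag, str(value)]
    cmd.append(facility)
    cmd += [str(arg) for arg in facility_args]
    return cmd


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


cmd = make_mmcif_command(
    "pdbe",
    ["--pdb-ids", "6dmn", "6dpp", "6do8"],
    json_path="pdbe_investigation.json",
    output_folder="./out",
    file_id="I_321",
)
run(cmd)  # dry run: prints the command only
```

Passing a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.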
MAX IV
# Using SQLite database
mmcif-gen make-mmcif --json maxiv_investigation.json --output-folder ./out --id I_1234 maxiv --sqlite fragmax.sqlite
XChem
# Using SQLite database with additional information
mmcif-gen make-mmcif --json xchem_investigation.json --output-folder ./out xchem --sqlite soakdb.sqlite --txt ./metadata --deposit ./deposit
Data Enrichment
For investigation files that need enrichment with additional data (e.g., ground state information):
# Using the miss_importer utility
python miss_importer.py --investigation-file inv.cif --sf-file structure.sf --pdb-id 1ABC
Operation JSON Files
The tool uses JSON configuration files to define how data should be transformed into mmCIF format. These files can be:
- Fetched files using the fetch-facility-json command
- Modified versions of official configurations
Configuration File Structure
{
    "source_category": "_audit_author",
    "source_items": ["name"],
    "target_category": "_audit_author",
    "target_items": "_same",
    "operation": "distinct_union",
    "operation_parameters": {
        "primary_parameters": ["name"]
    }
}
Refer to existing JSON files in the operations/ directory for examples.
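As a reading aid, the sketch below loads the operation shown above and applies one plausible interpretation of it. Everything beyond the JSON itself is an assumption: the set of required keys, the reading of `"_same"` as "reuse the source item names", and the reading of `distinct_union` as "merge rows from several inputs and keep one row per distinct value of the primary parameters". The real semantics are defined by mmcif-gen and may differ.

```python
import json

OPERATION = json.loads("""
{
    "source_category": "_audit_author",
    "source_items": ["name"],
    "target_category": "_audit_author",
    "target_items": "_same",
    "operation": "distinct_union",
    "operation_parameters": {"primary_parameters": ["name"]}
}
""")

# ASSUMPTION: these keys are treated as mandatory for this sketch only.
REQUIRED_KEYS = {"source_category", "source_items", "target_category",
                 "target_items", "operation"}


def validate(op: dict) -> None:
    """Raise ValueError if the operation lacks a key this sketch relies on."""
    missing = REQUIRED_KEYS - op.keys()
    if missing:
        raise ValueError(f"operation is missing keys: {sorted(missing)}")


def resolve_target_items(op: dict) -> list:
    """ASSUMPTION: "_same" means the target items equal the source items."""
    if op["target_items"] == "_same":
        return list(op["source_items"])
    return list(op["target_items"])


def distinct_union(op: dict, sources: list) -> list:
    """Toy reading of distinct_union: merge rows, de-duplicate on primary parameters."""
    validate(op)
    keys = op["operation_parameters"]["primary_parameters"]
    targets = resolve_target_items(op)
    seen, merged = set(), []
    for rows in sources:  # one list of row dicts per input file
        for row in rows:
            identity = tuple(row[k] for k in keys)
            if identity in seen:
                continue
            seen.add(identity)
            values = [row[item] for item in op["source_items"]]
            merged.append(dict(zip(targets, values)))
    return merged


# Made-up author rows from two hypothetical input files.
file_a = [{"name": "Smith, J."}, {"name": "Doe, A."}]
file_b = [{"name": "Doe, A."}]
result = distinct_union(OPERATION, [file_a, file_b])
print(result)
```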
Development
Project Structure
mmcif-gen/
├── facilities/ # Facility-specific implementations
│ ├── pdbe.py
│ ├── maxiv.py
│ └── ...
├── operations/ # JSON configuration files
│ ├── dls/
│ ├── maxiv/
│ └── ...
├── tests/ # Test cases
├── setup.py # Package configuration
└── README.md # Documentation
Running Tests
python -m unittest discover -s tests
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
Support
For issues and questions, please use the GitHub issue tracker.