CLI tool for creating mmCIF files from various facility data sources

mmcif-gen

A versatile command-line tool for generating mmCIF files from various data sources. This tool can be used to create:

  1. Metadata mmCIF files (to capture experimental metadata from different facilities)
  2. Investigation mmCIF files (such as those at https://ftp.ebi.ac.uk/pub/databases/msd/fragment_screening/investigations/)

As is standard practice at the Protein Data Bank (PDB), the generated files are given the extension '.CIF' even though the file format is called mmCIF. More information on the mmCIF file format is available at mmcif.wwpdb.org/

The tool applies transformation mappings that convert data, as it is stored at the various facilities, into the corresponding categories and items of the mmCIF format.
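
To make "categories and items" concrete: an mmCIF category such as _audit_author holds items such as name, written together as _audit_author.name. The sketch below is not part of mmcif-gen. render_loop and quote are hypothetical helpers that write a list of records as an mmCIF loop_ block, with deliberately simplified quoting (no multi-line text fields, no values containing both quote characters).

```python
def quote(value):
    """Quote a value for mmCIF output (simplified rules, see the note above)."""
    text = str(value)
    if text == "":
        return "?"  # mmCIF placeholder for an unknown value
    if any(ch.isspace() for ch in text) or text[0] in "_#$'\"[];":
        # Prefer single quotes; fall back to double quotes for apostrophes.
        return f"'{text}'" if "'" not in text else f'"{text}"'
    return text


def render_loop(category, items, rows):
    """Render rows (a list of dicts) as an mmCIF loop_ block for one category."""
    lines = ["loop_"]
    lines += [f"{category}.{item}" for item in items]
    for row in rows:
        lines.append(" ".join(quote(row.get(item, "")) for item in items))
    lines.append("#")
    return "\n".join(lines)


# Illustrative records only; these are not taken from a real entry.
authors = [
    {"name": "Doe, J.", "pdbx_ordinal": 1},
    {"name": "Roe, R.", "pdbx_ordinal": 2},
]
print(render_loop("_audit_author", ["name", "pdbx_ordinal"], authors))
```

The real tool handles far more than this, but the shape of the output (a category name, its items, then one row per record) is what the mapping files target.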

Installation

Install directly from PyPI:

pip install mmcif-gen

Usage

The tool provides two main commands:

  1. fetch-facility-json: Fetch facility-specific JSON configuration files
  2. make-mmcif: Generate mmCIF files using the configurations

Fetching Facility JSON Files

The JSON operations files determine how data is mapped from the original source and translated into mmCIF format.

You can write these files yourself or fetch them from the GitHub repository with the commands below.

# Fetch configuration for a specific facility
mmcif-gen fetch-facility-json dls-metadata

# Or
mmcif-gen fetch-facility-json xchem_operations

# Specify custom output directory
mmcif-gen fetch-facility-json dls-metadata -o ./mapping_operations

Generating metadata mmCIF Files

The facilities currently supported for generating mmCIF files are pdbe, maxiv, dls, and xchem.

The general syntax for generating mmCIF files is:

mmcif-gen make-mmcif <facility> [options]

Full list of options:

$ mmcif-gen make-mmcif --help
usage: mmcif-gen make-mmcif [-h] [--json JSON] [--output-folder OUTPUT_FOLDER]
                            [--id ID]
                            {pdbe,maxiv,dls,xchem} ...

positional arguments:
  {pdbe,maxiv,dls,xchem}
                        Specifies facility for which mmcif files will be used
                        for
    pdbe                Parameter requirements for investigation files from
                        PDBe data
    maxiv               Parameter requirements for investigation files from
                        MAX IV data
    dls                 Parameter requirements for creating investigation
                        files from DLS data
    xchem               Parameter requirements for creating investigation
                        files from XChem data

optional arguments:
  -h, --help            show this help message and exit
  --json JSON           Path to transformation JSON file
  --output-folder OUTPUT_FOLDER
                        Output folder for mmCIF files
  --id ID               File identifier

Each facility has its own set of required parameters, which can be checked by running the command with the --help flag.

mmcif-gen make-mmcif pdbe --help
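
The usage string above places --json, --output-folder and --id before the facility name, with the facility's own parameters after it. When scripting the tool, a small helper can keep that order consistent. This is a minimal sketch: make_mmcif_command is a hypothetical function, not part of mmcif-gen, and it only builds the argument list from the options shown in the help output.

```python
import shlex

FACILITIES = ("pdbe", "maxiv", "dls", "xchem")


def make_mmcif_command(facility, facility_args, json_path=None,
                       output_folder=None, file_id=None):
    """Build an argv list for `mmcif-gen make-mmcif`.

    General options go before the facility name; facility_args go after it.
    """
    if facility not in FACILITIES:
        raise ValueError(f"unknown facility: {facility!r}")
    cmd = ["mmcif-gen", "make-mmcif"]
    if json_path:
        cmd += ["--json", str(json_path)]
    if output_folder:
        cmd += ["--output-folder", str(output_folder)]
    if file_id:
        cmd += ["--id", str(file_id)]
    return cmd + [facility] + [str(arg) for arg in facility_args]


cmd = make_mmcif_command(
    "xchem",
    ["--sqlite", "soakDBDataFile.sqlite", "--data-csv", "metadata.csv"],
    output_folder="./out",
)
print(shlex.join(cmd))  # shell-safe rendering of the command, for logging
```

To run the command, pass the list to subprocess.run(cmd, check=True). Using a list rather than a single string avoids shell quoting problems with paths that contain spaces.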

Example Usage

DLS (Diamond Light Source)

# Using metadata configuration
mmcif-gen make-mmcif --json dls_metadata.json --output-folder ./out --id I_1234 dls --dls-json metadata-from-ispyb.json

XChem

Required parameters:

$ mmcif-gen make-mmcif xchem --help
usage: mmcif-gen make-mmcif xchem [-h] [--sqlite SQLITE] [--data-csv DATA_CSV]

options:
  -h, --help           show this help message and exit
  --sqlite SQLITE      Path to the .sqlite file for each data set
  --data-csv DATA_CSV  Path to the .csv file for each data set

Example command after fetching the facility JSON:

mmcif-gen make-mmcif xchem --sqlite mmcif_gen/test/data/soakDBDataFile.sqlite --data-csv mmcif_gen/test/data/metadata.csv

Alternatively, specify the operations JSON file explicitly:

mmcif-gen make-mmcif --json mmcif_gen/operations/xchem/xchem_operations.json xchem --sqlite mmcif_gen/test/data/soakDBDataFile.sqlite --data-csv mmcif_gen/test/data/metadata.csv

Working with Investigation Files

Investigation files are a specialized type of mmCIF file that captures metadata across multiple experiments.

They are created in much the same way as metadata files:

PDBe

# Using model folder
mmcif-gen make-mmcif --json pdbe_investigation.json --output-folder ./out --id I_1234 pdbe --model-folder ./models

# Using PDB IDs
mmcif-gen make-mmcif --json pdbe_investigation.json --output-folder ./out pdbe --pdb-ids 6dmn 6dpp 6do8

# Using CSV input
mmcif-gen make-mmcif --json pdbe_investigation.json --output-folder ./out pdbe --csv-file groups.csv

MAX IV

# Using SQLite database
mmcif-gen make-mmcif --json maxiv_investigation.json --output-folder ./out --id I_1234 maxiv --sqlite fragmax.sqlite

XChem

# Using SQLite database with additional information
mmcif-gen make-mmcif --json xchem_investigation.json --output-folder ./out xchem --sqlite soakdb.sqlite --txt ./metadata --deposit ./deposit

Data Enrichment

For investigation files that need enrichment with additional data (e.g., ground state information):

# Using the miss_importer utility
python miss_importer.py --investigation-file inv.cif --sf-file structure.sf --pdb-id 1ABC

Operation JSON Files

The tool uses JSON configuration files to define how data should be transformed into mmCIF format. These files can be:

  1. Official configurations fetched with the fetch-facility-json command
  2. Modified versions of the official configurations

Configuration File Structure

    {
        "source_category" : "_audit_author",
        "source_items" : ["name"],
        "target_category" : "_audit_author",
        "target_items" : "_same",
        "operation" : "distinct_union",
        "operation_parameters" :{
            "primary_parameters" : ["name"]
        }
    }

Refer to existing JSON files in the operations/ directory for examples.
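
When editing a configuration by hand, it can help to sanity-check each entry before running the tool. The sketch below is illustrative only and is not the tool's implementation. REQUIRED_KEYS is an assumption taken from the keys in the example above, and distinct_union shows one plausible reading of that operation (merge rows from several sources, keeping one row per distinct value of the primary parameters). Check the source code for the actual semantics.

```python
import json

# Assumption: the keys seen in the example entry are the ones every entry needs.
REQUIRED_KEYS = {"source_category", "source_items", "target_category",
                 "target_items", "operation"}

operation = json.loads("""
{
    "source_category": "_audit_author",
    "source_items": ["name"],
    "target_category": "_audit_author",
    "target_items": "_same",
    "operation": "distinct_union",
    "operation_parameters": {"primary_parameters": ["name"]}
}
""")


def check_operation(op):
    """Return the sorted list of expected keys missing from an operation entry."""
    return sorted(REQUIRED_KEYS - set(op))


def distinct_union(tables, op):
    """Merge rows from several sources, keeping the first row per primary key.

    Assumed semantics: rows are dicts, and two rows are duplicates when they
    agree on every item listed in primary_parameters.
    """
    keys = op["operation_parameters"]["primary_parameters"]
    items = op["source_items"]
    seen, merged = set(), []
    for rows in tables:
        for row in rows:
            identity = tuple(row[k] for k in keys)
            if identity not in seen:
                seen.add(identity)
                merged.append({item: row[item] for item in items})
    return merged


# Illustrative author lists from two hypothetical source files.
model_a = [{"name": "Doe, J."}, {"name": "Roe, R."}]
model_b = [{"name": "Roe, R."}, {"name": "Poe, E."}]
print(check_operation(operation))
print(distinct_union([model_a, model_b], operation))
```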

Development

Project Structure

mmcif-gen/
├── facilities/            # Facility-specific implementations
│   ├── pdbe.py
│   ├── maxiv.py
│   └── ...
├── operations/           # JSON configuration files
│   ├── dls/
│   ├── maxiv/
│   └── ...
├── tests/               # Test cases
├── setup.py            # Package configuration
└── README.md          # Documentation

Running Tests

python -m unittest discover -s tests

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Support

For issues and questions, please use the GitHub issue tracker.
