CLI tool for creating mmCIF files from various facility data sources

mmcif-gen

A versatile command-line tool for generating mmCIF files from various data sources. This tool can be used to create:

  1. Metadata mmCIF files (to capture experimental metadata from different facilities)
  2. Investigation mmCIF files (for example: https://ftp.ebi.ac.uk/pub/databases/msd/fragment_screening/investigations/)

As is standard practice at the Protein Data Bank (PDB), the generated files are given the extension '.cif' even though the file format is called mmCIF. More on the mmCIF file format can be found at https://mmcif.wwpdb.org/

The tool uses transformational mappings to convert data, as it is stored at the various facilities, into the corresponding categories and items in mmCIF format.
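To make "categories and items" concrete, the following is a minimal sketch of how rows of data can be written out as an mmCIF loop_ block. It is not taken from mmcif-gen: the function names format_category and quote are hypothetical, and the quoting is deliberately simplified compared with the full CIF rules.

```python
def quote(value):
    """Simplified CIF quoting (assumption: real CIF quoting covers more cases)."""
    if value == "":
        return "?"  # '?' marks an unknown value in mmCIF
    if any(ch.isspace() for ch in value):
        return f"'{value}'"
    return value


def format_category(category, rows):
    """Render rows (dicts sharing the same keys) as an mmCIF loop_ block."""
    if not rows:
        return ""
    items = list(rows[0].keys())
    lines = ["loop_"]
    # One header line per item, written as <category>.<item>
    lines += [f"{category}.{item}" for item in items]
    # One data line per row, values in the same order as the headers
    for row in rows:
        lines.append(" ".join(quote(str(row[item])) for item in items))
    lines.append("#")
    return "\n".join(lines)


print(format_category("_audit_author", [{"name": "Smith, J.", "pdbx_ordinal": 1}]))
```

Here _audit_author is the category, and name and pdbx_ordinal are items within it.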

Installation

Install directly from PyPI:

pip install mmcif-gen

Usage

The tool provides two main commands:

  1. fetch-facility-json: Fetch facility-specific JSON configuration files
  2. make-mmcif: Generate mmCIF files using the configurations

Fetching Facility JSON Files

The JSON operations files determine how data is mapped from the original source and translated into mmCIF format.

These files can be written by hand, or fetched from the GitHub repository with the following commands.

# Fetch configuration for a specific facility
mmcif-gen fetch-facility-json dls-metadata

# Or
mmcif-gen fetch-facility-json xchem_operations

# Specify custom output directory
mmcif-gen fetch-facility-json dls-metadata -o ./mapping_operations

Generating metadata mmCIF Files

Currently, mmCIF files can be generated for the following facilities: pdbe, maxiv, dls, and xchem.

The general syntax for generating mmCIF files is:

mmcif-gen make-mmcif <facility> [options]

Full list of options:

$ mmcif-gen make-mmcif --help
usage: mmcif-gen make-mmcif [-h] [--json JSON] [--output-folder OUTPUT_FOLDER]
                            [--id ID]
                            {pdbe,maxiv,dls,xchem} ...

positional arguments:
  {pdbe,maxiv,dls,xchem}
                        Specifies facility for which mmcif files will be used
                        for
    pdbe                Parameter requirements for investigation files from
                        PDBe data
    maxiv               Parameter requirements for investigation files from
                        MAX IV data
    dls                 Parameter requirements for creating investigation
                        files from DLS data
    xchem               Parameter requirements for creating investigation
                        files from XChem data

optional arguments:
  -h, --help            show this help message and exit
  --json JSON           Path to transformation JSON file
  --output-folder OUTPUT_FOLDER
                        Output folder for mmCIF files
  --id ID               File identifier

Each facility has its own set of required parameters, which can be checked by running the command with the --help flag.

mmcif-gen make-mmcif pdbe --help
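When calling the tool from a script, it can help to assemble the argument list programmatically. The sketch below follows the usage line in the help output, where the general options (--json, --output-folder, --id) come before the facility name and the facility-specific options come after it. The helper make_mmcif_command is hypothetical and not part of mmcif-gen; it only builds the command and does not run it.

```python
import shlex

# Facility names as listed in the help output
FACILITIES = ("pdbe", "maxiv", "dls", "xchem")


def make_mmcif_command(facility, json_path, output_folder, file_id=None, facility_args=()):
    """Build the argv list for `mmcif-gen make-mmcif` (hypothetical helper)."""
    if facility not in FACILITIES:
        raise ValueError(f"unknown facility: {facility!r}")
    # General options go before the facility subcommand
    cmd = ["mmcif-gen", "make-mmcif", "--json", json_path, "--output-folder", output_folder]
    if file_id is not None:
        cmd += ["--id", file_id]
    cmd.append(facility)
    # Facility-specific options go after the subcommand
    cmd += list(facility_args)
    return cmd


cmd = make_mmcif_command(
    "pdbe", "pdbe_investigation.json", "./out",
    facility_args=["--pdb-ids", "6dmn", "6dpp", "6do8"],
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once mmcif-gen is installed.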

Example Usage

DLS (Diamond Light Source)

# Using metadata configuration
mmcif-gen make-mmcif --json dls_metadata.json --output-folder ./out --id I_1234 dls --dls-json metadata-from-isypb.json

XChem

Parameters required

$ mmcif-gen make-mmcif xchem --help
usage: mmcif-gen make-mmcif xchem [-h] [--sqlite SQLITE] [--data-csv DATA_CSV]

options:
  -h, --help           show this help message and exit
  --sqlite SQLITE      Path to the .sqlite file for each data set
  --data-csv DATA_CSV  Path to the .csv file for each data set

Example command:

mmcif-gen make-mmcif --id 001 --json mmcif_gen/operations/xchem/xchem_metadata.json --output-folder pdbedeposit xchem --sqlite mmcif_gen/test/data/lb32633-1-soakDBDataFile.sqlite --cif-type model

Working with Investigation Files

Investigation files are a specialized type of mmCIF file that captures metadata across multiple experiments.

Investigation files are created in a very similar way:

PDBe

# Using model folder
mmcif-gen make-mmcif --json pdbe_investigation.json --output-folder ./out --id I_1234 pdbe --model-folder ./models

# Using PDB IDs
mmcif-gen make-mmcif --json pdbe_investigation.json --output-folder ./out pdbe --pdb-ids 6dmn 6dpp 6do8

# Using CSV input
mmcif-gen make-mmcif --json pdbe_investigation.json --output-folder ./out pdbe --csv-file groups.csv

MAX IV

# Using SQLite database
mmcif-gen make-mmcif --json maxiv_investigation.json --output-folder ./out --id I_1234 maxiv --sqlite fragmax.sqlite

XChem

# Using SQLite database with additional information
mmcif-gen make-mmcif --json xchem_investigation.json --output-folder ./out xchem --sqlite soakdb.sqlite --txt ./metadata --deposit ./deposit

Data Enrichment

For investigation files that need enrichment with additional data (e.g., ground state information):

# Using the miss_importer utility
python miss_importer.py --investigation-file inv.cif --sf-file structure.sf --pdb-id 1ABC

Operation JSON Files

The tool uses JSON configuration files to define how data should be transformed into mmCIF format. These files can be:

  1. Official configurations fetched with the fetch-facility-json command
  2. Modified versions of the official configurations

Configuration File Structure

    {
        "source_category" : "_audit_author",
        "source_items" : ["name"],
        "target_category" : "_audit_author",
        "target_items" : "_same",
        "operation" : "distinct_union",
        "operation_parameters" :{
            "primary_parameters" : ["name"]
        }
    }

Refer to existing JSON files in the operations/ directory for examples.
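As an illustration of how such an entry might be read, here is a small sketch that checks an operation entry for the keys shown above and resolves its item mapping. It makes two assumptions that are not confirmed by this document: that the five keys in REQUIRED_KEYS are mandatory, and that "_same" means the target items reuse the names of the source items. The function resolve_mapping is hypothetical and not part of mmcif-gen.

```python
import json

# Assumption: these keys are required in every operation entry
REQUIRED_KEYS = {"source_category", "source_items", "target_category", "target_items", "operation"}


def resolve_mapping(entry):
    """Return a {source item: target item} mapping for one operation entry."""
    missing = REQUIRED_KEYS - set(entry)
    if missing:
        raise KeyError(f"operation entry is missing keys: {sorted(missing)}")
    target_items = entry["target_items"]
    # Assumption: "_same" means the target item names mirror the source item names
    if target_items == "_same":
        target_items = list(entry["source_items"])
    return {
        f"{entry['source_category']}.{src}": f"{entry['target_category']}.{dst}"
        for src, dst in zip(entry["source_items"], target_items)
    }


entry = json.loads("""
{
    "source_category": "_audit_author",
    "source_items": ["name"],
    "target_category": "_audit_author",
    "target_items": "_same",
    "operation": "distinct_union",
    "operation_parameters": {"primary_parameters": ["name"]}
}
""")
print(resolve_mapping(entry))
```

A check like this can catch a missing key in a hand-edited configuration before it is passed to make-mmcif.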

Development

Project Structure

mmcif-gen/
├── facilities/            # Facility-specific implementations
│   ├── pdbe.py
│   ├── maxiv.py
│   └── ...
├── operations/           # JSON configuration files
│   ├── dls/
│   ├── maxiv/
│   └── ...
├── tests/               # Test cases
├── setup.py            # Package configuration
└── README.md          # Documentation

Running Tests

python -m unittest discover -s tests

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Support

For issues and questions, please use the GitHub issue tracker.
