Skip to main content

Application building relations between drugs, scientific publications, pubmed, journals and clinical trials.

Project description

Bazema linker

Application building relations between drugs, scientific publications, pubmed, journals and clinical trials.

The output is a JSON file.

Design

   +-------------------------+
   | input folder            |
   |   + drugs.csv           |
   |   | pubmed.csv          |
   |   | pubmed.json         |
   |   + clinical_trials.csv |
   +-------------------------+
               +         move valid
               |         files    +------------------+
               v           +----> |  archive folder  |
      +--------+-------+   |      +------------------+
      |                |+--+
      | bazema_linker  |
      | python job     |     move invalid
      |                |±--+ files
      +----------------+   |      +------------------+
               +           +----> |  errors folder   |
               |                  +------------------+
               v
+-----------------------------+
|  output folder              |
|   + result_2020_10_06.json  |
+-----------------------------+

Once the job is done, the input files are moved to an archive folder. Invalid files (name invalid, format invalid, parsing impossible) are moved to an errors folder.

Structure of input files

  • drugs.csv, 2 columns= atccode anddrug
  • pubmed.csv, 4 columns= id, title, date and journal
  • pubmed.json, same structure as a JSON
  • clinical_trials.csv, 4 columns= id, scientific_title, date and journal

Structure of generated output

[
    {
        "drug": "drug name",
        "clinical_trials": [
            {
                "title": "title of article",
                "date": "2020-01-01"
            }, {...}
        ],
        "pubmed": [
            {
                "title": "title of article",
                "date": "2020-01-01"
            }, {...}
        ],
        "journals": [
            {
                "date": "2020-01-01",
                "journal": "journal name"
            }, {...}
        ]
    },
    {...}
]

Usage

Requirements

  • Python >= 3.6

Installation

virtualenv -p python3 venv
source venv/bin/activate

pip install bazema_linker

Display usage

bazema_linker -h

Example

bazema_linker --input_dir data --output_dir result

Development

# Install
virtualenv -p python3 venv
source venv/bin/activate
make install

# Build
make test # coverage tests
make linter # runs pylint
make build

Ad-hoc Top journals

You can get the name of the journal with the most different drugs using the script top_journals.py and a result file produced by bazema_linker.

Usage

# no depedency required
python top_journals.py result/result_2020-10-06.json

# output
Journal with most different drugs is "Science" with a total of "15" different drugs.

TODO

  • Handle high volume of data, like few tera-octets -> use a highly scalable framework (i.e. Apache Spark, Apache Beam). Pay attention when broadcasting data across workers.
  • Deploy to Pypi using Github Actions

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bazema_linker-1.2.tar.gz (7.4 kB view hashes)

Uploaded Source

Built Distribution

bazema_linker-1.2-py3.8.egg (14.4 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page