Bazema linker

Application building relations between drugs, scientific publications, pubmed, journals and clinical trials.

The output is a JSON file.

Design

   +-------------------------+
   | input folder            |
   |   + drugs.csv           |
   |   | pubmed.csv          |
   |   | pubmed.json         |
   |   + clinical_trials.csv |
   +-------------------------+
               +         move valid
               |         files    +------------------+
               v           +----> |  archive folder  |
      +--------+-------+   |      +------------------+
      |                |+--+
      | bazema_linker  |
      | python job     |     move invalid
      |                |+--+ files
      +----------------+   |      +------------------+
               +           +----> |  errors folder   |
               |                  +------------------+
               v
+-----------------------------+
|  output folder              |
|   + result_2020_10_06.json  |
+-----------------------------+

Once the job is done, the input files are moved to an archive folder. Invalid files (invalid name, invalid format, or content that cannot be parsed) are moved to an errors folder.
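
The routing described above can be sketched as follows. This is a minimal illustration, not the package's actual code: the helper names route_file and has_valid_name are hypothetical, and the whitelist of file names is assumed from the design diagram.

```python
import shutil
from pathlib import Path

# Assumed whitelist, taken from the input files shown in the design diagram.
EXPECTED_FILES = {"drugs.csv", "pubmed.csv", "pubmed.json", "clinical_trials.csv"}


def has_valid_name(path: Path) -> bool:
    """Return True when the file name is one of the expected inputs."""
    return path.name in EXPECTED_FILES


def route_file(path: Path, archive_dir: Path, errors_dir: Path, is_valid: bool) -> Path:
    """Move an input file to the archive folder if valid, else to the errors folder."""
    target_dir = archive_dir if is_valid else errors_dir
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / path.name
    shutil.move(str(path), str(destination))
    return destination
```

In a real job, is_valid would presumably also reflect the format and parsing checks, not only the file name.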

Structure of input files

  • drugs.csv, 2 columns: atccode and drug
  • pubmed.csv, 4 columns: id, title, date and journal
  • pubmed.json, same fields as pubmed.csv, in JSON format
  • clinical_trials.csv, 4 columns: id, scientific_title, date and journal
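
As an illustration of the column layouts above, here is a small sketch that checks a CSV header before reading its rows. The function name read_rows and the strict header comparison are assumptions made for this example; the package may validate its inputs differently.

```python
import csv
import io

# Column names as listed above; the exact-order check is an assumption.
EXPECTED_COLUMNS = {
    "drugs.csv": ["atccode", "drug"],
    "pubmed.csv": ["id", "title", "date", "journal"],
    "clinical_trials.csv": ["id", "scientific_title", "date", "journal"],
}


def read_rows(filename, text):
    """Parse CSV text and return its rows, or raise ValueError on a bad header."""
    reader = csv.DictReader(io.StringIO(text))
    expected = EXPECTED_COLUMNS[filename]
    if reader.fieldnames != expected:
        raise ValueError(
            f"{filename}: expected columns {expected}, got {reader.fieldnames}"
        )
    return list(reader)
```

A file that raises ValueError here would be a candidate for the errors folder.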

Structure of generated output

[
    {
        "drug": "drug name",
        "clinical_trials": [
            {
                "title": "title of article",
                "date": "2020-01-01"
            }, {...}
        ],
        "pubmed": [
            {
                "title": "title of article",
                "date": "2020-01-01"
            }, {...}
        ],
        "journals": [
            {
                "date": "2020-01-01",
                "journal": "journal name"
            }, {...}
        ]
    },
    {...}
]
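
To make the output structure concrete, the sketch below builds one entry of that list. The matching rule is an assumption for illustration only (a drug is linked to a publication when its name appears in the title, ignoring case); this README does not state how bazema_linker actually matches drugs, and link_drug is a hypothetical name.

```python
def link_drug(drug, pubmed_rows, trial_rows):
    """Build one output entry for a drug.

    Assumption: a publication mentions the drug when the drug name
    occurs in its title, compared case-insensitively.
    """
    needle = drug.lower()
    pubmed = [r for r in pubmed_rows if needle in r["title"].lower()]
    trials = [r for r in trial_rows if needle in r["scientific_title"].lower()]
    return {
        "drug": drug,
        "clinical_trials": [
            {"title": r["scientific_title"], "date": r["date"]} for r in trials
        ],
        "pubmed": [{"title": r["title"], "date": r["date"]} for r in pubmed],
        # Journals are collected from both sources, in input order.
        "journals": [
            {"date": r["date"], "journal": r["journal"]} for r in pubmed + trials
        ],
    }
```

Calling link_drug once per row of drugs.csv and dumping the resulting list with json.dump would produce a file with the shape shown above.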

Usage

Requirements

  • Python >= 3.6

Installation

virtualenv -p python3 venv
source venv/bin/activate

pip install bazema_linker

Display usage

bazema_linker -h

Example

bazema_linker --input_dir data --output_dir result

Development

# Install
virtualenv -p python3 venv
source venv/bin/activate
make install

# Build
make test # coverage tests
make linter # runs pylint
make build

Ad-hoc Top journals

The script top_journals.py reports the journal that mentions the most distinct drugs, given a result file produced by bazema_linker.

Usage

# no dependency required
python top_journals.py result/result_2020-10-06.json

# output
Journal with most different drugs is "Science" with a total of "15" different drugs.
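
The computation behind this report can be sketched with the standard library alone. This is not the content of top_journals.py, only one plausible way to derive the same figure from the output structure documented above; top_journal and load_entries are hypothetical names.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_entries(result_path):
    """Load the list of drug entries from a bazema_linker result file."""
    return json.loads(Path(result_path).read_text(encoding="utf-8"))


def top_journal(entries):
    """Return (journal, distinct_drug_count) for the best journal, or None."""
    drugs_by_journal = defaultdict(set)
    for entry in entries:
        for mention in entry.get("journals", []):
            # A set ensures each drug is counted once per journal.
            drugs_by_journal[mention["journal"]].add(entry["drug"])
    if not drugs_by_journal:
        return None
    name, drugs = max(drugs_by_journal.items(), key=lambda item: len(item[1]))
    return name, len(drugs)
```

When several journals are tied, max returns the first one encountered, so the result depends on the order of entries in the file.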

TODO

  • Handle high volumes of data (e.g. a few terabytes) by using a highly scalable framework such as Apache Spark or Apache Beam. Pay attention when broadcasting data across workers.
  • Deploy to PyPI using GitHub Actions
