Bazema linker
An application that builds relations between drugs, scientific publications (PubMed), journals and clinical trials.
The output is a JSON file.
Design
+-------------------------+
| input folder            |
|   + drugs.csv           |
|   | pubmed.csv          |
|   | pubmed.json         |
|   + clinical_trials.csv |
+-------------------------+
        +                      move valid
        |                      files     +------------------+
        v                         +----> |  archive folder  |
+-------+--------+                |      +------------------+
|                +----------------+
| bazema_linker  |
|  python job    |                 move invalid
|                +----------------+  files
+-------+--------+                |      +------------------+
        +                         +----> |  errors folder   |
        |                                +------------------+
        v
+-----------------------------+
| output folder               |
|   + result_2020_10_06.json  |
+-----------------------------+
Once the job is done, the input files are moved to an archive folder.
Invalid files (invalid name, invalid format, or content that cannot be parsed) are moved to an errors folder.
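The routing described above can be sketched in a few lines. This is only an illustration, not the package's actual code: the helper names `route_input_file` and `has_valid_name` are hypothetical, and the set of accepted names is assumed from the design diagram.

```python
import shutil
from pathlib import Path

# Assumption: the accepted file names are the four shown in the design diagram.
EXPECTED_NAMES = {"drugs.csv", "pubmed.csv", "pubmed.json", "clinical_trials.csv"}


def has_valid_name(path: Path) -> bool:
    """Hypothetical check: a file is acceptable only if its name is expected."""
    return path.name in EXPECTED_NAMES


def route_input_file(path: Path, is_valid: bool,
                     archive_dir: Path, errors_dir: Path) -> Path:
    """Move `path` to the archive folder if valid, otherwise to the errors folder."""
    target_dir = archive_dir if is_valid else errors_dir
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder on first use
    destination = target_dir / path.name
    shutil.move(str(path), str(destination))
    return destination
```

In a real job, `is_valid` would also cover the format and parsing checks; here it is left to the caller so the move logic stays independent of the validation rules.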
Structure of input files
- drugs.csv, 2 columns: atccode and drug
- pubmed.csv, 4 columns: id, title, date and journal
- pubmed.json, same structure as pubmed.csv, as JSON
- clinical_trials.csv, 4 columns: id, scientific_title, date and journal
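As an illustration of the column layout above, here is a minimal sketch that parses CSV text and checks its header. The `EXPECTED_COLUMNS` mapping and the `read_rows` function are hypothetical names introduced for this example, and comparing headers regardless of column order is an assumption, not documented behaviour of bazema_linker.

```python
import csv
import io

# Column names as listed above; the mapping itself is a hypothetical helper.
EXPECTED_COLUMNS = {
    "drugs.csv": ["atccode", "drug"],
    "pubmed.csv": ["id", "title", "date", "journal"],
    "clinical_trials.csv": ["id", "scientific_title", "date", "journal"],
}


def read_rows(filename, text):
    """Parse CSV text and return its rows as dictionaries.

    Raises ValueError if the file name is unknown or the header does not
    contain exactly the expected columns (order is ignored, by assumption).
    """
    if filename not in EXPECTED_COLUMNS:
        raise ValueError(f"unexpected file name: {filename}")
    reader = csv.DictReader(io.StringIO(text))
    found = sorted(reader.fieldnames or [])
    if found != sorted(EXPECTED_COLUMNS[filename]):
        raise ValueError(f"unexpected columns in {filename}: {found}")
    return [dict(row) for row in reader]


# Made-up sample content, for illustration only.
rows = read_rows("drugs.csv", "atccode,drug\nX01,EXAMPLE\n")
print(rows)
```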
Structure of generated output
[
  {
    "drug": "drug name",
    "clinical_trials": [
      {
        "title": "title of article",
        "date": "2020-01-01"
      }, {...}
    ],
    "pubmed": [
      {
        "title": "title of article",
        "date": "2020-01-01"
      }, {...}
    ],
    "journals": [
      {
        "date": "2020-01-01",
        "journal": "journal name"
      }, {...}
    ]
  },
  {...}
]
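To make the output shape concrete, the sketch below builds one such entry from in-memory rows. The matching rule is an assumption made for this example (a drug is linked to an item when its name appears in the title, ignoring case); the README does not state the rule bazema_linker actually applies. The function name `link_drug` and the sample rows are hypothetical.

```python
import json


def link_drug(drug, pubmed_rows, trial_rows):
    """Build one output entry for `drug`.

    Assumed rule: a row is linked when the drug name occurs in its title,
    compared case-insensitively.
    """
    needle = drug.lower()
    pubmed = [r for r in pubmed_rows if needle in r["title"].lower()]
    trials = [r for r in trial_rows if needle in r["scientific_title"].lower()]

    # Collect each (date, journal) pair once, keeping first-seen order.
    journals = []
    for row in pubmed + trials:
        item = {"date": row["date"], "journal": row["journal"]}
        if item not in journals:
            journals.append(item)

    return {
        "drug": drug,
        "clinical_trials": [
            {"title": r["scientific_title"], "date": r["date"]} for r in trials
        ],
        "pubmed": [{"title": r["title"], "date": r["date"]} for r in pubmed],
        "journals": journals,
    }


# Made-up sample rows, for illustration only.
sample_pubmed = [
    {"id": "1", "title": "A study of Example-drug", "date": "2020-01-01",
     "journal": "Journal A"},
]
sample_trials = [
    {"id": "T1", "scientific_title": "EXAMPLE-DRUG versus placebo",
     "date": "2020-02-01", "journal": "Journal B"},
]
print(json.dumps([link_drug("EXAMPLE-DRUG", sample_pubmed, sample_trials)], indent=2))
```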
Usage
Requirements
- Python >= 3.6
Installation
virtualenv -p python3 venv
source venv/bin/activate
pip install bazema_linker
Display usage
bazema_linker -h
Example
bazema_linker --input_dir data --output_dir result
Development
# Install
virtualenv -p python3 venv
source venv/bin/activate
make install
# Build
make test # coverage tests
make linter # runs pylint
make build
Ad-hoc Top journals
You can get the name of the journal with the most different drugs using the script top_journals.py and a result file produced by bazema_linker.
Usage
# no dependency required
python top_journals.py result/result_2020-10-06.json
# output
Journal with most different drugs is "Science" with a total of "15" different drugs.
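For readers curious how such a ranking can be computed from the result file, here is a minimal sketch using only the standard library. It is not the content of top_journals.py; the function name `top_journal` and the sample data are hypothetical, and ties are resolved in favour of the journal seen first.

```python
import json
from collections import defaultdict


def top_journal(entries):
    """Return (journal, count) for the journal linked to the most distinct drugs.

    `entries` follows the output structure shown earlier. Returns None when
    no journal is present.
    """
    drugs_by_journal = defaultdict(set)
    for entry in entries:
        for item in entry.get("journals", []):
            drugs_by_journal[item["journal"]].add(entry["drug"])
    if not drugs_by_journal:
        return None
    journal, drugs = max(drugs_by_journal.items(), key=lambda kv: len(kv[1]))
    return journal, len(drugs)


# Made-up result content, for illustration only.
sample = json.loads("""
[
  {"drug": "DRUG_A", "journals": [{"date": "2020-01-01", "journal": "Journal A"}]},
  {"drug": "DRUG_B", "journals": [{"date": "2020-01-02", "journal": "Journal A"},
                                  {"date": "2020-01-03", "journal": "Journal B"}]}
]
""")
print(top_journal(sample))
```

Using a set per journal means a drug mentioned several times by the same journal is counted once.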
TODO
- Handle high volumes of data, on the order of a few terabytes -> use a highly scalable framework (e.g. Apache Spark, Apache Beam). Pay attention when broadcasting data across workers.
- Deploy to PyPI using GitHub Actions