A tool for extracting patent literature in drug discovery
PEMT: A tool for extracting patent literature in drug discovery
General Info
PEMT is a patent extractor tool that enables users to retrieve patents relevant to drug discovery. The overall workflow of the tool can be seen in the figure below:
Installation
$ pip install PEMT
The most recent code can be installed from the source on GitHub with:
$ pip install git+https://github.com/Fraunhofer-ITMP/PEMT.git
Alternatively, the tool can be installed from source inside a dedicated conda environment:
$ git clone https://github.com/Fraunhofer-ITMP/PEMT.git
$ conda create --name pemt python=3.8
$ conda activate pemt
$ cd PEMT
$ pip install .
For developers, the repository can be cloned from GitHub and installed in editable mode with:
$ git clone https://github.com/Fraunhofer-ITMP/PEMT.git
$ cd PEMT
$ pip install -e .
Documentation
Read the official docs for more information.
Input Data Formats
Data
To run PEMT starting from the gene level, the input file must have the following structure:
symbol | uniprot |
---|---|
HGNC_Symbol_1 | Uniprot_ID_1 |
HGNC_Symbol_2 | Uniprot_ID_2 |
HGNC_Symbol_3 | Uniprot_ID_3 |
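As an illustration, the gene-level table can also be produced programmatically. The snippet below is a minimal sketch using Python's standard csv module; the function name gene_table_text and the placeholder rows are hypothetical and not part of PEMT.

```python
import csv
import io

# Hypothetical placeholder rows mirroring the table above; substitute real
# HGNC symbols and their matching UniProt accessions.
GENES = [
    ("HGNC_Symbol_1", "Uniprot_ID_1"),
    ("HGNC_Symbol_2", "Uniprot_ID_2"),
    ("HGNC_Symbol_3", "Uniprot_ID_3"),
]


def gene_table_text(rows, delimiter="\t"):
    """Return a gene-level input table with 'symbol' and 'uniprot' header columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["symbol", "uniprot"])  # column names from the table above
    writer.writerows(rows)
    return buffer.getvalue()
```

To use it, write the returned string to a file (for example genes.tsv) and pass that path to --data.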
To run PEMT starting from the chemical level, the input file must have the following structure:
chembl |
---|
ChEMBL_ID_1 |
ChEMBL_ID_2 |
ChEMBL_ID_3 |
Note: The data must be in a comma- or tab-separated file format. In either case, the file must contain at least one of the columns shown above.
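A quick pre-flight check of an input file against the note above could look like the following. This is a sketch under stated assumptions, not PEMT's own validation: it assumes the header is on the first line, that column names are lowercase exactly as in the tables above, and it treats a header without tabs as comma-separated (which also covers the single-column ChEMBL file). The function names are hypothetical.

```python
# Column names taken from the two tables above.
KNOWN_COLUMNS = {"symbol", "uniprot", "chembl"}


def check_header(header_line):
    """Return (separator, recognised columns) for a PEMT-style header line.

    Assumption: a header containing a tab is tab-separated; anything else
    is treated as comma-separated.
    """
    header_line = header_line.rstrip("\r\n")
    separator = "\t" if "\t" in header_line else ","
    columns = [column.strip() for column in header_line.split(separator)]
    found = KNOWN_COLUMNS.intersection(columns)
    if not found:
        raise ValueError(f"none of {sorted(KNOWN_COLUMNS)} found in header: {columns}")
    return separator, found


def check_input_file(path):
    """Read only the first line of the file and validate it as a header."""
    with open(path, newline="") as handle:
        return check_header(handle.readline())
```

A failing check raises ValueError before any time is spent on the extraction steps.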
Usage
To use PEMT, ChromeDriver must be installed.
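Before running the patent step, it can help to confirm that ChromeDriver is reachable. The helper below is a hypothetical sketch (find_chromedriver is not a PEMT function): it checks an explicit path if one is given, otherwise it looks for a chromedriver executable on PATH via shutil.which.

```python
import shutil
from pathlib import Path


def find_chromedriver(explicit_path=None):
    """Return a chromedriver path as a string, or None if none is found.

    An explicit path (e.g. the value you intend to pass to
    --chromedriver-path) takes precedence; otherwise PATH is searched.
    """
    if explicit_path is not None:
        candidate = Path(explicit_path)
        return str(candidate) if candidate.is_file() else None
    return shutil.which("chromedriver")
```

A None result means the --chromedriver-path value used later is unlikely to work.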
As mentioned above, the tool follows a two-step approach. Each step can be run individually, or both can be run together, as shown below:
- Chemical enrichment
The following command links chemicals to the genes of interest based on causality. The --uniprot or --no-uniprot flag must be set to indicate whether the input file contains UniProt IDs.
$ pemt run-chemical-extractor --name=<ANALYSIS NAME> --data=<DATA FILE PATH> --input-type=<DATA FILE SEPARATOR> --uniprot
- Patent enrichment
The following command links chemicals to publicly available patent literature.
$ pemt run-patent-extractor --name=<ANALYSIS NAME> --chromedriver-path=<PATH TO CHROMEDRIVER> --os=<OS NAME> --no-chemical
The pipeline can also be started from this step if the user has a list of chemicals in the format described above. In that case, use the --chemical flag and provide the path to the chemical file with --chemical-data.
- PEMT workflow
The following command runs both steps together, performing patent enrichment starting from the gene data. In this example, the gene data file is a TSV file containing UniProt identifiers.
$ pemt run-pemt --name=<ANALYSIS NAME> --data=<DATA FILE PATH> --input-type=<DATA FILE SEPARATOR> --chromedriver-path=<PATH TO CHROMEDRIVER> --os=<OS NAME>
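The three commands above can also be assembled from a Python script, for example to loop over several analyses. The sketch below only builds argument lists from the flags documented above; the helper names are hypothetical, the placeholder values (such as tsv for --input-type and linux for --os) are assumptions, and it assumes --chemical-data takes its value in the same --flag=value form as the other options. Check the CLI's help output for the accepted values.

```python
def chemical_extractor_cmd(name, data, input_type, uniprot=True):
    """Build the argument list for `pemt run-chemical-extractor`."""
    return [
        "pemt", "run-chemical-extractor",
        f"--name={name}",
        f"--data={data}",
        f"--input-type={input_type}",
        # Boolean flag pair documented above: --uniprot / --no-uniprot.
        "--uniprot" if uniprot else "--no-uniprot",
    ]


def patent_extractor_cmd(name, chromedriver_path, os_name, chemical_data=None):
    """Build the argument list for `pemt run-patent-extractor`.

    Without chemical_data, --no-chemical is passed (chemicals presumably come
    from the chemical-enrichment step); with it, --chemical plus the user's file.
    """
    cmd = [
        "pemt", "run-patent-extractor",
        f"--name={name}",
        f"--chromedriver-path={chromedriver_path}",
        f"--os={os_name}",
    ]
    if chemical_data is None:
        cmd.append("--no-chemical")
    else:
        # Assumption: --chemical-data takes its value in --flag=value form.
        cmd += ["--chemical", f"--chemical-data={chemical_data}"]
    return cmd


def pemt_workflow_cmd(name, data, input_type, chromedriver_path, os_name):
    """Build the argument list for the full `pemt run-pemt` workflow."""
    return [
        "pemt", "run-pemt",
        f"--name={name}",
        f"--data={data}",
        f"--input-type={input_type}",
        f"--chromedriver-path={chromedriver_path}",
        f"--os={os_name}",
    ]
```

Each list can be executed with subprocess.run(cmd, check=True), or printed as a shell command with shlex.join(cmd) (Python 3.8+). Passing a list rather than a single string avoids shell-quoting problems with paths that contain spaces.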
Issues
If you have difficulties using PEMT, please open an issue at our GitHub repository.
Disclaimer
PEMT is a scientific tool that has been developed in an academic capacity, and thus comes with no warranty or guarantee of maintenance, support, or back-up of data.
Download files
Source Distribution
File details
Details for the file PEMT-0.0.2.tar.gz
File metadata
- Download URL: PEMT-0.0.2.tar.gz
- Size: 60.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.8.13
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 11c5c1f2d2f931a5dab36557b2a4b3e9be91a4a2fccd64712d9dba16d78b1b0e |
MD5 | 55a783da0ea9a2c22e39ca53751e9b2e |
BLAKE2b-256 | 0faf9d0835cc98bd329683c924483ae7f280c2d289970b80c068c9cac1971839 |
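To verify a downloaded copy of PEMT-0.0.2.tar.gz against the digests listed above, the file can be hashed with Python's hashlib. This is a minimal sketch; the function names are hypothetical, and BLAKE2b-256 is computed as hashlib.blake2b with digest_size=32.

```python
import hashlib

# Digests copied from the table above for PEMT-0.0.2.tar.gz.
EXPECTED = {
    "sha256": "11c5c1f2d2f931a5dab36557b2a4b3e9be91a4a2fccd64712d9dba16d78b1b0e",
    "md5": "55a783da0ea9a2c22e39ca53751e9b2e",
    "blake2b_256": "0faf9d0835cc98bd329683c924483ae7f280c2d289970b80c068c9cac1971839",
}


def digests(chunks):
    """Compute SHA256, MD5 and BLAKE2b-256 hex digests over byte chunks in one pass."""
    hashers = {
        "sha256": hashlib.sha256(),
        "md5": hashlib.md5(),
        "blake2b_256": hashlib.blake2b(digest_size=32),
    }
    for chunk in chunks:
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def file_digests(path, chunk_size=1 << 16):
    """Hash a file in fixed-size chunks so large files are not loaded at once."""
    with open(path, "rb") as handle:
        return digests(iter(lambda: handle.read(chunk_size), b""))


def verify(path, expected=EXPECTED):
    """Return True only if every digest of the file matches the expected value."""
    return file_digests(path) == expected
```

For an intact download, verify("PEMT-0.0.2.tar.gz") should return True.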