A Python tool for standardizing drug names using the latest version of the FDA's UNII Names List.
Drugname Standardizer
The Drugname Standardizer is a Python tool for standardizing drug names using the FDA's official UNII Names List archive. It supports both JSON and CSV input formats, making it easy to keep drug names consistent across datasets.
Features
- A trusted source for drug synonyms: the package automatically downloads the latest version of the UNII Names file from the official FDA repository. The UNII_Names.txt file is saved to the package's data/ folder for future use. The user can also point to another local UNII Names file if a particular version is preferred.
- Parses the FDA's UNII Names List to map drug names (code / official / systematic / common / brand names) to a single preferred name (i.e. the Display Name of the UNII Names file).
- Input versatility:
  - a single drug name,
  - a list of drug names,
  - a JSON input file (a list of drugs to standardize),
  - a CSV input file (a dataframe containing a column of drugs to standardize).
- Provides both a Python package interface for scripting and a command-line interface (CLI) for direct use.
- Resolves naming ambiguities in the FDA's UNII Names file by selecting the shortest Display Name. Such ambiguities are rare but do exist: 55 / 986397 associations in UNII_Names_20Dec2024.txt. For example, PRN1008 has 2 associations, "PRN1008 ... ... RILZABRUTINIB, (.ALPHA.E,3S)-" and "PRN1008 ... ... RILZABRUTINIB"; the ambiguity is resolved by keeping RILZABRUTINIB.
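The shortest-Display-Name rule can be illustrated with a minimal sketch. This is not the package's actual code: the function name `resolve_shortest` and the toy association list are hypothetical, and the alphabetical tie-break between equally short names is an assumption added only to keep the result deterministic.

```python
from collections import defaultdict

# Toy (name, Display Name) associations mirroring the PRN1008 case above.
associations = [
    ("PRN1008", "RILZABRUTINIB, (.ALPHA.E,3S)-"),
    ("PRN1008", "RILZABRUTINIB"),
    ("GDC-0199", "VENETOCLAX"),
]


def resolve_shortest(pairs):
    """Map each name to the shortest Display Name associated with it."""
    candidates = defaultdict(set)
    for name, display_name in pairs:
        candidates[name].add(display_name)
    # Shortest candidate wins; ties are broken alphabetically (an assumption).
    return {
        name: min(names, key=lambda n: (len(n), n))
        for name, names in candidates.items()
    }


preferred = resolve_shortest(associations)
print(preferred["PRN1008"])  # RILZABRUTINIB
```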
Warning:
There are code / official / systematic / common / brand names for drugs. Some are linked to different levels of detail about the compound.
The standardization proposed here gathers information at the "upper" level (i.e. the least detailed one). I relied on the "Preferred Substance Name" (= the Display Name field) given in the correspondence table provided by the FDA.
For instance, both 3'-((1R)-1-((6R)-5,6-DIHYDRO-4-HYDROXY-2-OXO-6-PHENETHYL-6-PROPYL-2H-PYRAN-3-YL)PROPYL)-5-(TRIFLUOROMETHYL)-2-PYRIDINESULFONANILIDE (systematic name) and Aptivus (brand name) become TIPRANAVIR.
Usage
Python API
You can use the package programmatically in your Python scripts:
from drugname_standardizer import standardize
Examples:
- Get the preferred name for a specific drug:
drug_name = "GDC-0199"
preferred_name = standardize(drug_name)
print(preferred_name) # Outputs: VENETOCLAX
- Standardize a list of drugs:
drug_names = ["GDC-0199", "Aptivus", "diodrast"]
preferred_names = standardize(drug_names)
print(preferred_names) # Outputs: ["VENETOCLAX", "TIPRANAVIR", "IODOPYRACET"]
- Standardize a JSON file:
standardize(
    input_file="drugs.json",
    output_file="standardized_drugs.json",
    file_type="json"
)
# Outputs: Standardized JSON file saved as standardized_drugs.json
- Standardize a CSV file:
standardize(
    input_file="dataset.csv",
    file_type="csv",
    column_index=1
)
# Outputs: Standardized CSV file saved as dataset_drug_standardized.csv
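To make the JSON case above more concrete, here is a rough, self-contained sketch of what such a round trip could look like. It does not call the package: `standardize_json_file` and `TOY_MAPPING` are hypothetical stand-ins, and both the upper-casing of lookups and the pass-through of unknown names are assumptions rather than documented behaviour.

```python
import json
import tempfile
from pathlib import Path

# Stand-in mapping; the real package builds its mapping from the UNII Names file.
TOY_MAPPING = {"GDC-0199": "VENETOCLAX", "APTIVUS": "TIPRANAVIR"}


def standardize_json_file(input_file, output_file, mapping):
    """Read a JSON list of names, map each one, and write the result."""
    names = json.loads(Path(input_file).read_text(encoding="utf-8"))
    # Assumption: lookups are upper-cased; unknown names pass through unchanged.
    result = [mapping.get(name.upper(), name) for name in names]
    Path(output_file).write_text(json.dumps(result, indent=2), encoding="utf-8")
    return result


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "drugs.json"
    dst = Path(tmp) / "standardized_drugs.json"
    src.write_text(json.dumps(["GDC-0199", "Aptivus", "not-a-drug"]), encoding="utf-8")
    standardize_json_file(src, dst, TOY_MAPPING)
    written = json.loads(dst.read_text(encoding="utf-8"))

print(written)  # ['VENETOCLAX', 'TIPRANAVIR', 'not-a-drug']
```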
Command-Line Interface
You can also use the CLI to standardize a single drug name or a JSON/CSV file.
- Required arguments:
  - --input, -i: A drug name or the path to a JSON/CSV file.
- Optional arguments:
  - --file_type, -f: Type of the input file (json or csv).
  - --output, -o: The output file name (a relative path can be given). Default: the input file name with _drug_standardized added before the extension.
  - --column_index, -c: Index of the column containing the drug names to standardize (required for CSV files).
  - --separator, -s: Field separator for CSV files. Default: ,.
  - --unii_file, -u: Path to a UNII Names List file. Default: automatic download of the latest version.
Examples:
- Get the preferred name for a specific drug:
drugname_standardizer -i "DynaCirc"
- Standardize a JSON file:
drugname_standardizer -i drugs.json -f json
- Standardize a CSV file, e.g. using a custom separator and output file name:
drugname_standardizer -i dataset.csv -f csv -c 2 -s "\t" -o standardized_dataset.csv
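When --output is omitted, the default name is described as the input file name with _drug_standardized added before the extension. A minimal sketch of that naming rule, using a hypothetical helper `default_output_path` (not part of the package's API), could look like this:

```python
from pathlib import Path


def default_output_path(input_file):
    """Insert '_drug_standardized' between the file stem and its extension."""
    p = Path(input_file)
    return p.with_name(f"{p.stem}_drug_standardized{p.suffix}")


print(default_output_path("dataset.csv"))  # dataset_drug_standardized.csv
```

The directory part of the input path is preserved, so under this sketch the output lands next to the input file.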
Installation
Using pip
python3 -m pip install drugname_standardizer
GitHub repository
git clone https://github.com/StephanieChevalier/drugname_standardizer.git
cd drugname_standardizer
pip install -r requirements.txt
Requirements:
- Python 3.12+
- Dependencies:
  - pandas >= 2.2.2
  - requests >= 2.32.2
  - tqdm >= 4.66.4
How it works
- Parse UNII File:
  - Reads the UNII Names List to create a mapping of drug names to the Display Name (i.e. the preferred name).
  - Resolves potential naming conflicts by selecting the shortest Display Name (55 / 986397 associations).
- Standardize Names:
  - For a single drug name: returns the preferred name.
  - For a list of drug names: maps drug names to their preferred names and returns the updated list.
  - For JSON input: maps drug names to their preferred names and saves the results to a JSON file.
  - For CSV input: updates the specified column with standardized drug names and saves the modified DataFrame to a CSV file.
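The two steps above can be sketched end to end on a tiny in-memory sample. This is only an illustration under stated assumptions: the file is taken to be tab-separated with columns named "Name", "TYPE", "UNII" and "Display Name"; lookups are upper-cased; unknown names are returned unchanged. The helpers `parse_unii` and `standardize_names` are hypothetical and are not the package's functions.

```python
import csv
import io

# Tiny stand-in for the UNII Names file. Assumption: tab-separated columns
# "Name", "TYPE", "UNII", "Display Name"; TYPE and UNII values are placeholders.
SAMPLE = (
    "Name\tTYPE\tUNII\tDisplay Name\n"
    "GDC-0199\t-\t-\tVENETOCLAX\n"
    "APTIVUS\t-\t-\tTIPRANAVIR\n"
    "PRN1008\t-\t-\tRILZABRUTINIB, (.ALPHA.E,3S)-\n"
    "PRN1008\t-\t-\tRILZABRUTINIB\n"
)


def parse_unii(handle):
    """Build a name -> Display Name mapping, keeping the shortest on conflict."""
    mapping = {}
    for row in csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        name = row["Name"].upper()
        display = row["Display Name"]
        current = mapping.get(name)
        if current is None or len(display) < len(current):
            mapping[name] = display
    return mapping


def standardize_names(names, mapping):
    """Accept a single name or a list of names; unknown names pass through."""
    if isinstance(names, str):
        return mapping.get(names.upper(), names)
    return [standardize_names(n, mapping) for n in names]


mapping = parse_unii(io.StringIO(SAMPLE))
print(standardize_names("PRN1008", mapping))  # RILZABRUTINIB
print(standardize_names(["GDC-0199", "Aptivus"], mapping))  # ['VENETOCLAX', 'TIPRANAVIR']
```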
Package structure
drugname_standardizer/
├── drugname_standardizer/
│   ├── __init__.py               # Package initialization
│   ├── standardizer.py           # Core logic for name standardization
│   └── data/
│       └── UNII_Names.txt        # UNII Names List file (ensured to be no older than 1 month when available)
├── tests/
│   ├── __init__.py
│   └── test_standardizer.py      # Unit tests for the package
├── LICENSE                       # MIT License
├── pyproject.toml                # Package configuration
├── README.md                     # Project documentation
└── requirements.txt              # Development dependencies
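The tree notes that UNII_Names.txt is kept no older than one month when possible. One way such a freshness check might work is sketched below; it assumes the file's modification time is used and that "1 month" means 30 days. Neither is confirmed by the source, and `needs_refresh` is a hypothetical helper.

```python
import os
import tempfile
import time

MAX_AGE_SECONDS = 30 * 24 * 60 * 60  # assumption: "1 month" treated as 30 days


def needs_refresh(path, now=None):
    """Return True if the file is missing or older than MAX_AGE_SECONDS."""
    if not os.path.exists(path):
        return True
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) > MAX_AGE_SECONDS


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "UNII_Names.txt")
    missing = needs_refresh(path)  # no file yet
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("Name\tTYPE\tUNII\tDisplay Name\n")
    fresh = needs_refresh(path)  # just written
    # Pretend 31 days have passed since the file was written.
    stale = needs_refresh(path, now=time.time() + 31 * 24 * 60 * 60)

print(missing, fresh, stale)  # True False True
```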
License
This project is licensed under the MIT License - see the LICENSE file for details.
Download files
File details
Details for the file drugname_standardizer-1.1.7.tar.gz.
File metadata
- Download URL: drugname_standardizer-1.1.7.tar.gz
- Upload date:
- Size: 13.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7170f890e9a688295e87dee44d509f376cafea629ed2cca95771aa36945c4114 |
| MD5 | 1ba35e25665cd1d01b126891fea9173c |
| BLAKE2b-256 | 62577bcaa2c575b7cb74c7a74e10cff36b6135b3ff87ead5a21f3d7d4b712525 |
File details
Details for the file drugname_standardizer-1.1.7-py3-none-any.whl.
File metadata
- Download URL: drugname_standardizer-1.1.7-py3-none-any.whl
- Upload date:
- Size: 13.5 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ab52e91c847b8bcbf8bef750bb6c326d2d303b70b35aa0d5851797129cfdf00d |
| MD5 | 71a95c4d59aceb3d11ccf877a7693e8c |
| BLAKE2b-256 | 6a2da96b2a55eb9392b6b99ad738643d9222e943e2a6f63f21b17d84faaabcd3 |