A Python tool for standardizing drug names using the FDA's latest UNII Names List.
Drugname Standardizer
The Drugname Standardizer is a Python package and CLI tool for standardizing drug names using the FDA's official UNII Names List archive. It supports both JSON and TSV/CSV input formats and is designed for easy integration in data processing pipelines.
Features
- ✅ Reliable source of synonyms: the tool automatically downloads the latest UNII Names file from the official FDA repository and caches it locally (monthly freshness check).
- ✅ Standardizes drug identifiers (code, official, systematic, common, and brand names) to a single preferred name using the Display Name field of the UNII Names file.
- ✅ Multiple input types supported:
  - A single drug name
  - A list of names (Python)
  - A JSON file with a list of names
  - A TSV/CSV file with a column of names
- ✅ Python package interface (OOP style) and CLI interface (via the drugname_standardizer command).
- ✅ Ambiguity resolution: when a name is associated with multiple display names in the FDA's UNII Names file, the shortest one is chosen. This is rare but does occur: 55 of 986397 associations in UNII_Names_20Dec2024.txt. For example, PRN1008 has 2 associations, and the ambiguity is resolved by keeping RILZABRUTINIB:
  - PRN1008 ... ... RILZABRUTINIB, (.ALPHA.E,3S)-
  - PRN1008 ... ... RILZABRUTINIB
⚠️ Drugs have code, official, systematic, common, and brand names, and these names carry different levels of detail about the compound.
This tool favors "high-level" naming (i.e. the least detailed name): detailed systematic or brand names are mapped to a standardized, less verbose preferred name (as defined by the FDA Display Name field). For instance, both 3'-((1R)-1-((6R)-5,6-DIHYDRO-4-HYDROXY-2-OXO-6-PHENETHYL-6-PROPYL-2H-PYRAN-3-YL)PROPYL)-5-(TRIFLUOROMETHYL)-2-PYRIDINESULFONANILIDE (systematic name) and Aptivus (brand name) become TIPRANAVIR.
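The "shortest display name wins" rule can be illustrated with a minimal sketch. This is not the package's actual code: the function name `build_preferred_names`, the upper-casing of lookup keys, and the alphabetical tie-break are assumptions made for illustration. The sample pairs are taken from the examples in this README.

```python
from collections import defaultdict


def build_preferred_names(associations):
    """Map each name to one display name, keeping the shortest on conflict.

    `associations` is an iterable of (name, display_name) pairs.
    """
    candidates = defaultdict(set)
    for name, display_name in associations:
        # Upper-casing the key is an assumption of this sketch.
        candidates[name.upper()].add(display_name)
    # Shortest display name wins; ties are broken alphabetically so the
    # result is deterministic (the tie-break is also an assumption).
    return {
        name: min(display_names, key=lambda d: (len(d), d))
        for name, display_names in candidates.items()
    }


associations = [
    ("PRN1008", "RILZABRUTINIB, (.ALPHA.E,3S)-"),
    ("PRN1008", "RILZABRUTINIB"),
    ("APTIVUS", "TIPRANAVIR"),
]
preferred = build_preferred_names(associations)
print(preferred["PRN1008"])  # RILZABRUTINIB
```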
Python API
You can use the package programmatically in your Python scripts:
Usage
from drugname_standardizer import DrugStandardizer
ds = DrugStandardizer()
Standardize a single name
print(ds.standardize_name("GDC-0199")) # → VENETOCLAX
Standardize a list of names
names = ["aptivus", "gdc-0199"]
print(ds.standardize_list(names)) # → ['TIPRANAVIR', 'VENETOCLAX']
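When auditing a pipeline, it can help to see which names were actually rewritten. The helper below is a hypothetical sketch, not part of the package. It accepts any callable that behaves like standardize_list as shown above, and it assumes that the callable returns one result per input name, in the same order.

```python
def changed_names(names, standardize_list):
    """Return {original: standardized} for the names that were rewritten.

    `standardize_list` is any callable taking a list of names and returning
    a list of the same length, e.g. the standardize_list method of a
    DrugStandardizer instance.
    """
    standardized = standardize_list(names)
    if len(standardized) != len(names):
        raise ValueError("expected one standardized name per input name")
    return {
        original: new
        for original, new in zip(names, standardized)
        if original != new
    }
```

With the package installed, this could be called as changed_names(["aptivus", "gdc-0199"], ds.standardize_list).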
📄 Standardizing a JSON file
from drugname_standardizer import DrugStandardizer
ds = DrugStandardizer()
ds.standardize_json_file("drugs.json")
This will:
- read a list of drug names from drugs.json,
- standardize each name to its preferred form (based on the FDA Display Name),
- save the result as drugs_drug_standardized.json by default.
You can optionally specify an output filename with output_path=....
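The steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the package's implementation: `standardize_json_sketch` and its `mapping` argument (a dict from upper-cased names to preferred names) are hypothetical, and leaving unknown names unchanged is an assumption.

```python
import json
from pathlib import Path


def standardize_json_sketch(input_path, mapping, output_path=None):
    """Rewrite a JSON list of names using `mapping` and save the result."""
    input_path = Path(input_path)
    names = json.loads(input_path.read_text(encoding="utf-8"))
    # Unknown names are kept unchanged here; that fallback is an assumption.
    standardized = [mapping.get(name.upper(), name) for name in names]
    if output_path is None:
        # Default naming: drugs.json -> drugs_drug_standardized.json
        output_path = input_path.with_name(
            f"{input_path.stem}_drug_standardized{input_path.suffix}"
        )
    output_path = Path(output_path)
    output_path.write_text(json.dumps(standardized, indent=2), encoding="utf-8")
    return output_path
```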
📄 Standardizing a TSV/CSV file
ds.standardize_tsv_file(
input_path="dataset.csv",
column_drug=0,
separator=","
)
- The column at index 0 (1st column) will be standardized.
- The result will be saved as dataset_drug_standardized.csv by default.
- You can customize the output name using the output_path parameter.
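For readers who want to see the column rewrite spelled out, here is a minimal sketch using only the csv module. It is not the package's implementation: `standardize_column_sketch` and its `mapping` argument are hypothetical, and unknown values (including a header cell, if the file has one) are simply left unchanged.

```python
import csv
from pathlib import Path


def standardize_column_sketch(input_path, mapping, column_drug,
                              separator="\t", output_path=None):
    """Standardize one column of a delimited file and save a copy."""
    input_path = Path(input_path)
    with input_path.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle, delimiter=separator))
    for row in rows:
        if column_drug < len(row):
            value = row[column_drug]
            # Values missing from the mapping are kept as-is (an assumption).
            row[column_drug] = mapping.get(value.upper(), value)
    if output_path is None:
        # Default naming: dataset.csv -> dataset_drug_standardized.csv
        output_path = input_path.with_name(
            f"{input_path.stem}_drug_standardized{input_path.suffix}"
        )
    output_path = Path(output_path)
    with output_path.open("w", newline="", encoding="utf-8") as handle:
        csv.writer(handle, delimiter=separator).writerows(rows)
    return output_path
```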
Command-Line Interface (CLI)
Once installed, you can use the CLI tool directly:
Basic syntax
drugname_standardizer -i INPUT [options]
Required:
--input, -i: a drug name or path to a file (JSON/TSV)
Optional:
| Option | Description |
|---|---|
| --file_type, -f | Type of the input file: "json" or "tsv" |
| --output, -o | Output filename (optional, default: auto-generated) |
| --column_drug, -c | Column index with drug names for TSV input (starts at 0) |
| --separator, -s | Separator for TSV files (default: \t) |
| --unii_file, -u | Custom UNII Names file path (optional, overrides auto-download) |
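To make the option set concrete, the sketch below declares the same flags with argparse. It mirrors the documented options only. The package's real parser may differ in types, validation, and help text, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Declare the documented CLI options (illustrative sketch only)."""
    parser = argparse.ArgumentParser(prog="drugname_standardizer")
    parser.add_argument("-i", "--input", required=True,
                        help="a drug name or path to a file (JSON/TSV)")
    parser.add_argument("-f", "--file_type", choices=["json", "tsv"],
                        help="type of the input file")
    parser.add_argument("-o", "--output",
                        help="output filename (default: auto-generated)")
    parser.add_argument("-c", "--column_drug", type=int,
                        help="column index with drug names (starts at 0)")
    parser.add_argument("-s", "--separator", default="\t",
                        help="separator for TSV files (default: tab)")
    parser.add_argument("-u", "--unii_file",
                        help="custom UNII Names file path")
    return parser


args = build_parser().parse_args(
    ["-i", "dataset.tsv", "-f", "tsv", "-c", "2", "-s", "|"]
)
print(args.column_drug, repr(args.separator))  # 2 '|'
```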
CLI examples
- Standardize a drug name:
  drugname_standardizer -i GDC-0199
- 📄 Standardize a JSON list:
  drugname_standardizer -i drugs.json -f json
  The -f json flag is required so the CLI interprets the input file correctly.
  If -o is not specified, the output will be saved as drugs_drug_standardized.json by default.
- 📄 Standardize a TSV file (e.g., drug names in the column at index 2, i.e. the 3rd column, with a pipe separator):
  drugname_standardizer -i dataset.tsv -f tsv -c 2 -s "|" -o standardized_dataset.tsv
  The -f tsv and -c flags are required for TSV/CSV files.
  If -o is not specified, the output is saved as dataset_drug_standardized.tsv by default.
Installation
Using pip
pip install drugname_standardizer
From source
git clone https://github.com/StephanieChevalier/drugname_standardizer.git
cd drugname_standardizer
pip install -r requirements.txt
Requirements
- Python 3.7+
- Dependencies:
  - requests >= 2
  - tqdm >= 4
How it works
- Parsing UNII File:
  - Downloads and parses the latest UNII_Names.txt file
  - Maps all name variants to their associated Display Name
  - Resolves rare naming ambiguities (e.g., 55 ambiguous entries over ~986k)
- Standardizing names:
  - For a single drug name: returns the preferred name.
  - For a list of drug names: maps drug names to their preferred names and returns the updated list.
  - For JSON input: maps drug names to their preferred names and saves the results to a JSON file.
  - For TSV input: updates the specified column with standardized drug names and saves the modified DataFrame to a TSV file.
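The parsing step can be sketched as follows. The layout assumed here (a tab-separated file with a header row containing Name and Display Name columns) is an assumption about the UNII Names file, and `parse_unii_names` is a hypothetical helper, not the package's parser. The TYPE and UNII cells in the sample are placeholders, not real values.

```python
import csv
import io


def parse_unii_names(handle):
    """Collect every display name seen for each (upper-cased) name."""
    # QUOTE_NONE keeps quote characters in chemical names literal.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    display_names = {}
    for row in reader:
        name = row["Name"].strip().upper()
        display_names.setdefault(name, set()).add(row["Display Name"].strip())
    return display_names


# Tiny in-memory stand-in for the real file (placeholder TYPE/UNII cells).
sample = io.StringIO(
    "Name\tTYPE\tUNII\tDisplay Name\n"
    "APTIVUS\tTYPE-PLACEHOLDER\tUNII-PLACEHOLDER\tTIPRANAVIR\n"
    "PRN1008\tTYPE-PLACEHOLDER\tUNII-PLACEHOLDER\tRILZABRUTINIB\n"
    "PRN1008\tTYPE-PLACEHOLDER\tUNII-PLACEHOLDER\tRILZABRUTINIB, (.ALPHA.E,3S)-\n"
)
table = parse_unii_names(sample)
# Names with more than one display name are the ones needing resolution.
ambiguous = {name for name, displays in table.items() if len(displays) > 1}
print(sorted(ambiguous))  # ['PRN1008']
```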
Package structure
drugname_standardizer/
├── drugname_standardizer/
│ ├── __init__.py # Package initialization
│ ├── __main__.py # CLI entry point
│ ├── standardizer.py # Core logic for name standardization
│ └── data/
│ ├── UNII_Names.txt # UNII Names List file (ensured to be no older than 1 month when available)
│ └── UNII_dict.pkl # parsed UNII Names List
├── tests/
│ ├── __init__.py
│ └── test_standardizer.py # Unit tests for the package
├── LICENSE # MIT License
├── pyproject.toml # Package configuration
├── README.md # Project documentation
└── requirements.txt # Development dependencies
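The tree notes that UNII_Names.txt is kept no older than one month. One plausible way to implement such a check is to compare the file's modification time with the current time, as in the sketch below. This is an assumption about the mechanism, not the package's code: the package may determine age differently, `needs_refresh` is a hypothetical name, and "one month" is approximated as 30 days.

```python
import time
from pathlib import Path

MAX_AGE_SECONDS = 30 * 24 * 60 * 60  # "one month" approximated as 30 days


def needs_refresh(path, now=None, max_age=MAX_AGE_SECONDS):
    """Return True if the cached file is missing or older than max_age."""
    path = Path(path)
    if not path.exists():
        return True
    now = time.time() if now is None else now
    # st_mtime is the last modification time in seconds since the epoch.
    return now - path.stat().st_mtime > max_age
```

A caller would download a fresh copy only when needs_refresh returns True, and otherwise reuse the cached file.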
License
This project is licensed under the MIT License. See the LICENSE file for details.