A Python tool for standardizing drug names using the latest FDA's UNII Names list.

These details have not been verified by PyPI

Project links

Homepage

Project description

Drugname Standardizer

The Drugname Standardizer is a Python tool for standardizing drug names using the official FDA's UNII Names List archive. It notably supports both JSON and TSV/CSV input formats, making it easy to ensure consistent drug naming in datasets.

Features

A trusted source for drug synonyms : the package automatically downloads the latest version of the UNII Names file from the official FDA repository. The UNII_Names.txt is saved to the package's data/ folder for future use. The user can also choose to indicate another local UNII Names file if a particular version is preferred.
Parsing of the FDA's UNII Names List to map drug names (code / official / systematic / common / brand names) to a single preferred name (i.e. the Display Name of the UNII Names file).
Input versatility:
- a single drug name,
- a list of drug names,
- a JSON input file (a list of drugs to standardize)
- a TSV input file (a dataframe containing a column of drugs to standardize)
Provides both a Python package interface for scripting and a command-line interface (CLI) for direct use.
Resolves naming ambiguities of the FDA's UNII Names file by selecting the shortest Display Names. Rare but exists: 55 / 986397 associations in UNII_Names_20Dec2024.txt. For example, for PRN1008 the ambiguity is solved by keeping RILZABRUTINIB whereas 2 associations exist:
- PRN1008 ... ... RILZABRUTINIB, (.ALPHA.E,3S)-
- PRN1008 ... ... RILZABRUTINIB

Warning:

There are code / official / systematic / common / brand names for drugs. Some are linked to different level of details about the compound. The standardization proposed here gathers information at the "upper" level (i.e. the less detailled one). I relied on the "Preferred Substance Name" (= the Display name field) indicated in the correspondence table provided by the FDA.
For instance : both 3'-((1R)-1-((6R)-5,6-DIHYDRO-4-HYDROXY-2-OXO-6-PHENETHYL-6-PROPYL-2H-PYRAN-3-YL)PROPYL)-5-(TRIFLUOROMETHYL)-2-PYRIDINESULFONANILIDE (systematic name) and Aptivus (brand name) become TIPRANAVIR.

Usage

Python API

You can use the package programmatically in your Python scripts:

from drugname_standardizer import standardize

Examples:

- Get the preferred name for a specific drug:

drug_name = "GDC-0199"
preferred_name = standardize(drug_name)
print(preferred_name)  # Outputs: VENETOCLAX

- Standardize a list of drugs:

drug_names = ["GDC-0199", "Aptivus", "diodrast"]
preferred_names = standardize(drug_names)
print(preferred_names)  # Outputs: ["VENETOCLAX", "TIPRANAVIR", "IODOPYRACET"]

- Standardize a JSON file:

standardize(
    input_file = "drugs.json",
    file_type = "json",
    output_file = "standardized_drugs.json"
)
# Outputs: Standardized JSON file saved as standardized_drugs.json

- Standardize a TSV/CSV file:

standardize(
    input_file = "dataset.csv",
    file_type = "tsv",
    column_drug = 0,
    separator = ","
)
# Outputs: Standardized TSV file saved as dataset_drug_standardized.csv

Command-Line Interface

You can also use a CLI for standardizing a drug name, a JSON file or a TSV/CSV file.

Required arguments:
- --input, -i: A drug name or the path to a JSON/TSV file
Optional arguments:
- --file_type, -f: Type of the input file (json or tsv)
- --output, -o: The output file name (relative path can be given). Defaults: the input file name with _drug_standardized added before the extension.
- --column_drug, -c: Index of the column containing the drug names to standardize (required for TSV files). Starts at 0: 1st column = column 0.
- --separator, -s: Field separator for TSV files. Defaults: \t.
- --unii_file, -u: Path to a UNII Names List file. Defaults: automatic download of the latest version.

Examples:

- Get the preferred name for a specific drug:

drugname_standardizer -i DynaCirc

- Standardize a JSON file:

drugname_standardizer -i drugs.json -f json

- Standardize a TSV file: e.g., using a pipe as separator and a custom file name for the output:

drugname_standardizer -i dataset.tsv -f tsv -c 2 -s "|" -o standardized_dataset.tsv

Installation

Using pip

python3 -m pip install drugname_standardizer

GitHub repository

git clone https://github.com/StephanieChevalier/drugname_standardizer.git
cd drugname_standardizer
pip install -r requirements.txt

Requirements:

Python 3.12+
Dependencies:
- pandas >= 2.2.2
- requests >= 2.32.2
- tqdm >= 4.66.4

How it works

Parse UNII File:
- Reads the UNII Names List to create a mapping of drug names to the Display Name (i.e. the preferred name).
- Resolves potential naming conflicts by selecting the shortest Display Name (55 / 986397 associations).
Standardize Names:
- For a single drug name: return the preferred name.
- For a list of drug names: maps drug names to their preferred names and return the updated list.
- For JSON input: Maps drug names to their preferred names and saves the results to a JSON file.
- For TSV input: Updates the specified column with standardized drug names and saves the modified DataFrame to a TSV file.

Package structure

drugname_standardizer/
├── drugname_standardizer/
│   ├── __init__.py               # Package initialization
│   ├── standardizer.py           # Core logic for name standardization
│   └── data/
│       ├── UNII_Names.txt  # UNII Names List file (ensured to be no older than 1 month when available)
│       └── UNII_dict.pkl   # parsed UNII Names List
├── tests/
│   ├── __init__.py               
│   └── test_standardizer.py      # Unit tests for the package
├── LICENSE                       # MIT License
├── pyproject.toml                # Package configuration
├── README.md                     # Project documentation
└── requirements.txt              # Development dependencies

License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

1.2.7

Jun 12, 2025

1.2.6

Jun 12, 2025

1.2.5

Mar 17, 2025

1.2.4

Mar 16, 2025

1.2.3

Mar 16, 2025

This version

1.2.2

Mar 15, 2025

1.2.1

Jan 21, 2025

1.2.0

Jan 20, 2025

1.1.8

Jan 20, 2025

1.1.7

Jan 20, 2025

1.1.6

Jan 20, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

drugname_standardizer-1.2.2.tar.gz (26.1 MB view details)

Uploaded Mar 15, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

drugname_standardizer-1.2.2-py3-none-any.whl (26.2 MB view details)

Uploaded Mar 15, 2025 Python 3

File details

Details for the file drugname_standardizer-1.2.2.tar.gz.

File metadata

Download URL: drugname_standardizer-1.2.2.tar.gz
Upload date: Mar 15, 2025
Size: 26.1 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.12.4

File hashes

Hashes for drugname_standardizer-1.2.2.tar.gz
Algorithm	Hash digest
SHA256	`815ac4a0b71f9c49d3da59484d222cc06431e6099df05248faa2fa7214cb9962`
MD5	`0247bf0995aba4c83f4fb053eb997bd6`
BLAKE2b-256	`ed6047f5ceb6c6b770c7c9a996be5ed1942f4cca27e59ffa762a6561412863db`

See more details on using hashes here.

File details

Details for the file drugname_standardizer-1.2.2-py3-none-any.whl.

File metadata

Download URL: drugname_standardizer-1.2.2-py3-none-any.whl
Upload date: Mar 15, 2025
Size: 26.2 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.12.4

File hashes

Hashes for drugname_standardizer-1.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4ee26b15762f6f5008bb2555b709b8e32ac0ed348d6871cf2a2f1110ea3553d5`
MD5	`dc139abe8c10aedc920ad80ddc0ee768`
BLAKE2b-256	`66b8ad6764d513df4a5f5a2cffd4f9fb1475af41885fe16a0c7d3a727bf2d848`

See more details on using hashes here.

drugname-standardizer 1.2.2

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Drugname Standardizer

Features

Warning:

Usage

Python API

Examples:

Command-Line Interface

Examples:

Installation

Using pip

GitHub repository

Requirements:

How it works

Package structure

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes