Implementation of a pydantic BaseModel class for CAS Registry Numbers® (CAS RN®);

These details have not been verified by PyPI

Project links

Project description

CAS Registry Number Validator and Sorter

A Python utility class for validating and sorting Chemical Abstract Service (CAS) Registry Numbers®. Pydantic is used for validation and is a dependency

Overview

The CAS class provides functionality to work with CAS Registry Numbers® (CAS RN®), which are unique numerical identifiers assigned to chemical substances. This tool helps validate the format and checksum of CAS RNs and enables sorting collections of CAS numbers.

Features

Validates Chemical Abstract Service Registry Number® (CAS RN®) format
Verifies CAS RN checksum
Makes CAS numbers sortable in ascending or descending order based on numeric comparison rather than string-based comparison
CAS object instances are hashed which allows for uses in Python Sets (i.e. a means to eliminate duplicates)
CAS object instances are str printed as plain strings of the CAS formatted registry number
CAS object instances can be JSON serialized by using Pydantic model_dump_json()
Pandas extension dtype (CASDtype) for working with CAS numbers in DataFrames with full type safety and validation

Usage

from cas_reg import CAS

# Create a CAS instance
cas_number = CAS(num="7732-18-5")  # Water
print(cas_number)
>>> "7732-18-5"

# use pydantic's by_alias to produce the model as a dictionary with CAS
print(cas_number.model_dump_json(by_alias=True))
>>> "{'CAS': '50-00-0'}"


# Invalid CAS numbers do not validate
try:
    cas_string = "50-01-1"
    my_cas_no = CAS(num=cas_string)
except ValueError as e:
    print(f"CAS Number, {cas_string}, is invalid: {e}") 


# Sort multiple CAS numbers
cas_str_list = ["7732-18-5", "67-56-1", "124-38-9"]
cas_list = [CAS(num=x) for x in cas_str_list]
cas_list.sort()
print([str(x) for x in cas_list])  
>>> ["67-56-1", "124-38-9", "7732-18-5"]

# Use python sets to eliminate duplicates
cas_str_list2 = ["50-00-0", "7732-18-5", "67-56-1", "124-38-9", "50-00-0"]
cas_list2 = [CAS(num=x) for x in cas_str_list2]
unique_cas_list = list(set(cas_list2))
unique_cas_list.sort()
print([str(x) for x in unique_cas_list])
>>> ["50-00-0", "67-56-1", "124-38-9", "7732-18-5"]

Usage of pandas extension dtype

The cas-reg package optionally provides a custom pandas extension dtype (CASDtype) and array type (CASArray) that allows you to work with CAS Registry Numbers in pandas DataFrames with full type safety and validation.

Installation with pandas support

# With uv
uv add "cas-reg[pandas]"

# With pip
pip install "cas-reg[pandas]"

Creating DataFrames with CAS columns

import pandas as pd
from cas_reg import CAS
from cas_reg.pandas_ext import CASDtype, CASArray

# Create a DataFrame with CAS numbers directly
df = pd.DataFrame({
    "cas": CASArray(["50-00-0", "58-08-2", "7732-18-5"]),
    "name": ["Formaldehyde", "Caffeine", "Water"],
    "molecular_weight": [30.03, 194.19, 18.02]
})

print(df)
#        cas            name  molecular_weight
# 0  50-00-0   Formaldehyde              30.03
# 1  58-08-2        Caffeine             194.19
# 2  7732-18-5         Water              18.02

print(df["cas"].dtype)
# CAS

Converting existing DataFrames

If you have an existing DataFrame with CAS numbers stored as strings (object dtype), you can convert them to the CASDtype:

# Existing DataFrame with CAS numbers as strings
df = pd.DataFrame({
    "cas": ["50-00-0", "58-08-2", "7732-18-5"],
    "name": ["Formaldehyde", "Caffeine", "Water"]
})

print(df["cas"].dtype)  # object

# Convert the column to CASDtype
df["cas"] = df["cas"].astype(CASDtype())

print(df["cas"].dtype)  # CAS

# Each value is now a CAS object
print(type(df["cas"].iloc[0]))  # <class 'cas_reg.CAS'>

Sorting DataFrames by CAS numbers

CAS numbers can be sorted naturally within a DataFrame:

df = pd.DataFrame({
    "cas": CASArray(["58-08-2", "50-00-0", "7732-18-5", "67-56-1"]),
    "name": ["Caffeine", "Formaldehyde", "Water", "Methanol"]
})

# Sort by CAS number (ascending)
df_sorted = df.sort_values(by="cas")
print(df_sorted)
#         cas          name
# 1   50-00-0  Formaldehyde
# 3   67-56-1      Methanol
# 0   58-08-2      Caffeine
# 2  7732-18-5         Water

# Sort by CAS number (descending)
df_desc = df.sort_values(by="cas", ascending=False)

Validation of CAS number structure

The CASDtype automatically validates CAS numbers when they are added to the array. Invalid CAS numbers will raise a ValueError:

# This will raise a ValueError due to invalid checksum
try:
    df = pd.DataFrame({
        "cas": CASArray(["50-00-1"])  # Invalid checksum
    })
except ValueError as e:
    print(f"Validation error: {e}")
    # Validation error: Invalid CAS checksum for '50-00-1'

# Handle missing values with None or pd.NA
df = pd.DataFrame({
    "cas": CASArray(["50-00-0", None, "58-08-2"]),
    "name": ["Formaldehyde", "Unknown", "Caffeine"]
})

# Check for missing values
print(df["cas"].isna())
# 0    False
# 1     True
# 2    False
# Name: cas, dtype: bool

Working with CAS columns

# Create a Series with CAS numbers
cas_series = pd.Series(CASArray(["50-00-0", "58-08-2", "7732-18-5"]))

# Get min and max CAS numbers
print(cas_series.min())  # 50-00-0
print(cas_series.max())  # 7732-18-5

# Filter DataFrames
df = pd.DataFrame({
    "cas": CASArray(["50-00-0", "58-08-2", "7732-18-5"]),
    "toxicity": ["high", "medium", "low"]
})

high_toxicity = df[df["toxicity"] == "high"]
print(high_toxicity["cas"].iloc[0])  # 50-00-0

# Concatenate DataFrames with CAS columns
df1 = pd.DataFrame({"cas": CASArray(["50-00-0", "58-08-2"])})
df2 = pd.DataFrame({"cas": CASArray(["7732-18-5"])})
df_combined = pd.concat([df1, df2], ignore_index=True)
print(df_combined["cas"].dtype)  # CAS

Joining DataFrames on CAS numbers

You can perform inner joins (or other merge operations) on DataFrames using CAS numbers as the key:

# Create two DataFrames with CAS number columns
chemicals_df = pd.DataFrame({
    "cas": CASArray(["50-00-0", "58-08-2", "7732-18-5", "67-56-1"]),
    "name": ["Formaldehyde", "Caffeine", "Water", "Methanol"],
    "formula": ["CH2O", "C8H10N4O2", "H2O", "CH3OH"]
})

properties_df = pd.DataFrame({
    "cas": CASArray(["58-08-2", "7732-18-5", "64-17-5", "50-00-0"]),
    "boiling_point": [178, 100, 78, -19],
    "state": ["solid", "liquid", "liquid", "gas"]
})

# Perform inner join on CAS number
merged_df = pd.merge(chemicals_df, properties_df, on="cas", how="inner")

print(merged_df)
#        cas          name    formula  boiling_point   state
# 0  50-00-0  Formaldehyde       CH2O            -19     gas
# 1  58-08-2      Caffeine  C8H10N4O2            178   solid
# 2  7732-18-5       Water        H2O            100  liquid

# The CAS dtype is preserved after merge
print(merged_df["cas"].dtype)  # CAS

# You can also do left, right, or outer joins
left_join = pd.merge(chemicals_df, properties_df, on="cas", how="left")
# This will include Methanol with NaN values for boiling_point and state

Converting to dictionary or JSON

df = pd.DataFrame({
    "cas": CASArray(["50-00-0", "58-08-2"]),
    "name": ["Formaldehyde", "Caffeine"]
})

# Convert to dictionary (CAS objects preserved)
records = df.to_dict(orient="records")
print(records[0]["cas"])  # CAS(num='50-00-0')

# Convert CAS objects to strings for JSON serialization
df["cas_str"] = df["cas"].astype(str)
json_data = df.to_json(orient="records")

Installation

With uv (recommended):

uv add cas-reg

With pip:

pip install cas-reg

Requirements

Python 3.11+
Pydantic 2.8.2+

Optional dependencies

For pandas support:

pandas 2.0.0+
numpy 1.24.0+

Install with: pip install "cas-reg[pandas]" or uv add "cas-reg[pandas]"

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.0

Dec 14, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cas_reg-0.1.0.tar.gz (7.0 kB view details)

Uploaded Dec 14, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

cas_reg-0.1.0-py3-none-any.whl (8.2 kB view details)

Uploaded Dec 14, 2025 Python 3

File details

Details for the file cas_reg-0.1.0.tar.gz.

File metadata

Download URL: cas_reg-0.1.0.tar.gz
Upload date: Dec 14, 2025
Size: 7.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.6

File hashes

Hashes for cas_reg-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`0c6ab1b1263db366e63385b5c8b42339bac953d14b260539bc7f08fe20de6c3e`
MD5	`c11b74885f69fe38ae3c2cf34f05afe6`
BLAKE2b-256	`96928d27896fbdc88b21f662541f16f116c4bc9e99a31c93ce4e73b05d0be372`

See more details on using hashes here.

File details

Details for the file cas_reg-0.1.0-py3-none-any.whl.

File metadata

Download URL: cas_reg-0.1.0-py3-none-any.whl
Upload date: Dec 14, 2025
Size: 8.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.6

File hashes

Hashes for cas_reg-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`44892379112775aba87101ad762206af0f396bff701b23426b37caa8ebeb177a`
MD5	`e113faf7871214fb30724006dcad1984`
BLAKE2b-256	`27d349761f6adc19262000a9bb8c67a610ba82f3c8854f695db17c18b1e3bb89`

See more details on using hashes here.

cas-reg 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Project description

CAS Registry Number Validator and Sorter

Overview

Features

Usage

Usage of pandas extension dtype

Installation with pandas support

Creating DataFrames with CAS columns

Converting existing DataFrames

Sorting DataFrames by CAS numbers

Validation of CAS number structure

Working with CAS columns

Joining DataFrames on CAS numbers

Converting to dictionary or JSON

Installation

Requirements

Optional dependencies

License

Contributing

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes