Chemical grouping package based on structural patterns, functional lists, and nomenclature classification

FCCgroup

FCCgroup is a Python package for grouping chemicals with three complementary methods:

  • Structural pattern matching with SMARTS fingerprints
  • Functional list matching against packaged reference lists
  • Regex-based grouping from chemical names and formulas

The package is developed under the Food Packaging Forum organization and authored by Albert Anguera Sempere and Helene Wiesinger.

Features

  • Structural classification using SMARTS fingerprints
  • Functional list matching from packaged assets
  • Regex-based classification from names and formulas
  • Automatic CompTox enrichment when selected methods require missing fields
  • Flexible method selection through GroupingConfig(methods=...)
  • Optional SMARTS fingerprint subsetting via GroupingConfig(smarts_fingerprints=...)
  • Package data bundled under fccgroup/assets

Installation

Install from PyPI:

pip install fccgroup

Install from source:

git clone https://github.com/Food-Packaging-Forum/fccgroup.git
cd fccgroup
pip install -e .

Install development dependencies:

pip install -e ".[dev]"

Quick Start

import pandas as pd

from fccgroup import ChemicalGrouper, ColumnMapping, GroupingConfig, GroupingMethod

df = pd.DataFrame(
  {
    "CASRN": ["74-84-0"],
    "Structure": ["CC"],
    "Name": ["ethane"],
    "IUPAC": ["ethane"],
    "Formula": ["C2H6"],
  }
)

config = GroupingConfig(
  methods=[GroupingMethod.SMARTS, GroupingMethod.REGEX],
  column_mapping=ColumnMapping(
    cas="CASRN",
    smiles="Structure",
    name_columns=["Name", "IUPAC"],
    formula="Formula",
  ),
)

grouper = ChemicalGrouper(df=df, grouping_config=config)
results = grouper.group_chemicals(save=False)

# Columns are a MultiIndex: (group_label, column_name)
print(results.columns.tolist())
print(results.head())

Selecting Grouping Methods

FCCgroup does not expose a GroupingMode enum. Method selection is configured with GroupingMethod values:

  • GroupingMethod.SMARTS: structural pattern matching
  • GroupingMethod.LISTS: functional list matching
  • GroupingMethod.REGEX: regex-based grouping from names and formulas
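To make the REGEX idea concrete, here is a minimal stand-alone sketch of name-based grouping. It does not use FCCgroup: the pattern table and the helper groups_from_names are hypothetical, and the two patterns are made up for illustration rather than taken from the package's rules.

```python
import re
from typing import Dict, List

# Made-up name patterns for illustration only; not FCCgroup's rules.
NAME_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "Phthalates": re.compile(r"phthalate", re.IGNORECASE),
    "Siloxanes": re.compile(r"siloxane", re.IGNORECASE),
}


def groups_from_names(names: List[str]) -> List[str]:
    """Return every group whose pattern matches at least one of the names."""
    return [
        group
        for group, pattern in NAME_PATTERNS.items()
        if any(pattern.search(name) for name in names)
    ]


# Several name columns (e.g. a common name and a synonym) are checked together.
hits = groups_from_names(["Dibutyl phthalate", "DBP"])
misses = groups_from_names(["ethane"])
```

Checking all name columns together mirrors the name_columns list in ColumnMapping, where a match in any one name is enough to assign the group in this sketch.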

Common configurations:

GroupingConfig(methods=[GroupingMethod.SMARTS], column_mapping=...)
GroupingConfig(methods=[GroupingMethod.SMARTS, GroupingMethod.LISTS], column_mapping=...)
GroupingConfig(
  methods=[GroupingMethod.SMARTS, GroupingMethod.LISTS, GroupingMethod.REGEX],
  column_mapping=...,
)

Filtering SMARTS fingerprints

To apply only a subset of the ~400 bundled SMARTS patterns, pass their names to smarts_fingerprints:

GroupingConfig(
  methods=[GroupingMethod.SMARTS],
  column_mapping=...,
  smarts_fingerprints={"Alkanes", "PAH derivatives hydrocarbon"},
)

When smarts_fingerprints is None (default), all available patterns are applied.
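The None-versus-subset behavior can be pictured with a small stand-alone sketch. It does not use FCCgroup: select_patterns and the SMARTS strings are hypothetical and only illustrate the documented semantics (None selects everything, a set selects the named patterns). How FCCgroup treats unknown names is not stated here, so raising an error is an assumption.

```python
from typing import Dict, Optional, Set


def select_patterns(
    available: Dict[str, str], subset: Optional[Set[str]] = None
) -> Dict[str, str]:
    """Return the patterns to apply: all of them when subset is None."""
    if subset is None:
        return dict(available)
    unknown = subset - set(available)
    if unknown:
        # Assumption: fail loudly on names that match no pattern.
        raise KeyError(f"Unknown fingerprint names: {sorted(unknown)}")
    return {name: smarts for name, smarts in available.items() if name in subset}


# Toy pattern table; the SMARTS strings are placeholders, not the bundled ones.
patterns = {
    "Alkanes": "[CX4]",
    "Alcohols": "[OX2H]",
    "PAH derivatives hydrocarbon": "c1ccccc1",
}

everything = select_patterns(patterns)
only_alkanes = select_patterns(patterns, {"Alkanes"})
```

Passing a set rather than a list makes it explicit that order and duplicates do not matter for the selection.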

Custom assets path

By default ChemicalGrouper loads assets from the package installation directory. To point it at a different directory:

ChemicalGrouper(df=df, grouping_config=config, assets_path="/path/to/custom/assets")

Input Requirements

  • ChemicalGrouper must be initialized with a non-empty pandas DataFrame.
  • ColumnMapping must provide at least one of cas or smiles (the other may be None).
  • name_columns and formula are optional at configuration time, but REGEX grouping may trigger CompTox enrichment when they are missing.
  • Input column names can be custom; FCCgroup maps them to canonical internal fields.
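A pre-flight check along the lines of these requirements might look like the sketch below. It is independent of FCCgroup: MappingSketch is a hypothetical stand-in for ColumnMapping, and check_mapping is an illustrative helper, not the package's own validation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MappingSketch:
    """Hypothetical stand-in for a column mapping; not FCCgroup's ColumnMapping."""

    cas: Optional[str] = None
    smiles: Optional[str] = None
    name_columns: List[str] = field(default_factory=list)
    formula: Optional[str] = None


def check_mapping(mapping: MappingSketch, columns: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the mapping looks usable."""
    problems = []
    if mapping.cas is None and mapping.smiles is None:
        problems.append("at least one of cas or smiles must be set")
    mapped = [mapping.cas, mapping.smiles, mapping.formula, *mapping.name_columns]
    for col in mapped:
        # Every mapped name must exist among the input column names.
        if col is not None and col not in columns:
            problems.append(f"column {col!r} not found in the input")
    return problems


columns = ["CASRN", "Structure", "Name"]
ok = check_mapping(MappingSketch(cas="CASRN", smiles="Structure"), columns)
bad = check_mapping(MappingSketch(name_columns=["Name"]), columns)
```

With a real DataFrame, list(df.columns) would supply the columns argument.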

Assets And External Services

  • Packaged assets live under fccgroup/assets.
  • Mapping.xlsx and the files in fccgroup/assets/lists are required for the LISTS workflow.
  • CompTox (EPA) is used only when the selected methods require fields that are not already available in the mapped input columns (e.g. SMILES needed for SMARTS but only CAS provided).
  • CompTox enrichment requires a valid API key set in the COMPTOX_API_KEY environment variable.
  • CompTox usage depends on network availability and the EPA CompTox service.
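Because enrichment depends on COMPTOX_API_KEY, it can help to check for the key before starting a run that may need CompTox. The sketch below uses only the standard library; comptox_key_present is a hypothetical helper, and what FCCgroup does when the key is missing is not specified here.

```python
import os
from typing import Mapping, Optional


def comptox_key_present(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when COMPTOX_API_KEY is set to a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get("COMPTOX_API_KEY", "").strip())


if not comptox_key_present():
    # Only a warning: runs that already supply all needed columns do not need the key.
    print("COMPTOX_API_KEY is not set; CompTox enrichment may not be available.")
```

Accepting the environment as a parameter keeps the check easy to test without modifying os.environ.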

Output

group_chemicals(save=True) returns a pandas DataFrame with a MultiIndex on columns. The first level groups results by method; the second level is the column name.

Top-level label      Contents
Identifier           Internal identifier columns (casId, SMILES)
Structural patterns  Chemical groups and per-fingerprint columns (SMARTS method)
Lists                Per-list membership columns (LISTS method)
Regex                Pattern group columns (REGEX method)

Example column access:

# Access the SMILES identifier column
results[("Identifier", "SMILES")]

# Access the Chemical groups column
results[("Structural patterns", "Chemical groups")]

When save=True (default), results are also written to an Excel file in the current working directory.
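For readers less familiar with pandas MultiIndex columns, the sketch below builds a toy frame with the same two-level layout and shows two common operations: selecting everything under one top-level label and flattening the columns. The frame is hand-made, not FCCgroup output; the "Alkanes" per-fingerprint column and all values are placeholders.

```python
import pandas as pd

# Toy frame mimicking the documented two-level column layout.
columns = pd.MultiIndex.from_tuples(
    [
        ("Identifier", "casId"),
        ("Identifier", "SMILES"),
        ("Structural patterns", "Chemical groups"),
        ("Structural patterns", "Alkanes"),
    ]
)
toy = pd.DataFrame([["74-84-0", "CC", "Alkanes", True]], columns=columns)

# Selecting a top-level label returns a frame keyed by the second level only.
structural = toy["Structural patterns"]

# Flatten to single-level names, e.g. before exporting to CSV.
flat = toy.copy()
flat.columns = [f"{top}: {name}" for top, name in toy.columns]
```

The same selection and flattening should apply to the frame returned by group_chemicals, since both rely only on standard pandas behavior for two-level columns.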

Runtime Dependencies

FCCgroup's runtime dependencies are listed in requirements.txt.

Citation

If you use FCCgroup in your research, please cite:

@software{fccgroup,
  title={FCCgroup: Chemical Grouping and Classification Package},
  author={Anguera Sempere, Albert and Wiesinger, Helene},
  organization={Food Packaging Forum},
  year={2026},
}

Contributing

Contributions are welcome through pull requests.

Support

For issues, questions, or suggestions, open an issue at https://github.com/Food-Packaging-Forum/fccgroup/issues.

License

MIT License. See LICENSE for details.
