Chemical grouping package based on structural patterns, functional lists, and nomenclature classification

FCCgroup

FCCgroup is a Python package for grouping chemicals with three complementary methods:

  • Structural pattern matching with SMARTS fingerprints
  • Functional list matching against packaged reference lists
  • Regex-based grouping from chemical names and formulas

The package is developed by Albert Anguera Sempere and Helene Wiesinger under the Food Packaging Forum organization.

Features

  • Structural classification using SMARTS fingerprints
  • Functional list matching from packaged assets
  • Regex-based classification from names and formulas
  • Automatic CIRpy enrichment when selected methods require missing fields
  • Flexible method selection through GroupingConfig(methods=...)
  • Package data bundled under fccgroup/assets

Installation

Install from PyPI:

pip install fccgroup

Install from source:

git clone https://github.com/Food-Packaging-Forum/fccgroup.git
cd fccgroup
pip install -e .

Install development dependencies:

pip install -e ".[dev]"

Quick Start

import pandas as pd

from fccgroup import ChemicalGrouper, ColumnMapping, GroupingConfig, GroupingMethod

# Input data: one row per chemical; column names can be custom.
df = pd.DataFrame(
    {
        "CASRN": ["74-84-0"],
        "Structure": ["CC"],
        "Name": ["ethane"],
        "IUPAC": ["ethane"],
        "Formula": ["C2H6"],
    }
)

# Select the grouping methods and map the input columns to FCCgroup's fields.
config = GroupingConfig(
    methods=[GroupingMethod.SMARTS, GroupingMethod.REGEX],
    column_mapping=ColumnMapping(
        cas="CASRN",
        smiles="Structure",
        name_columns=["Name", "IUPAC"],
        formula="Formula",
    ),
)

# Run the selected methods; the result is a pandas DataFrame.
grouper = ChemicalGrouper(df=df, grouping_config=config)
results = grouper.group_chemicals()

print(results.columns.tolist())
print(results.head())

Selecting Grouping Methods

FCCgroup does not expose a GroupingMode enum. Method selection is configured with GroupingMethod values:

  • GroupingMethod.SMARTS: structural pattern matching
  • GroupingMethod.LISTS: functional list matching
  • GroupingMethod.REGEX: regex-based grouping from names and formulas

Common configurations:

GroupingConfig(methods=[GroupingMethod.SMARTS], column_mapping=...)
GroupingConfig(methods=[GroupingMethod.SMARTS, GroupingMethod.LISTS], column_mapping=...)
GroupingConfig(
  methods=[GroupingMethod.SMARTS, GroupingMethod.LISTS, GroupingMethod.REGEX],
  column_mapping=...,
)
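
To give a feel for what regex-based grouping from names and formulas involves, here is a minimal standalone sketch. It does not use FCCgroup and does not reproduce its packaged rules: the group labels, the patterns, and the helper name regex_groups are all assumptions made for illustration.

```python
import re

# Hypothetical name patterns; FCCgroup's own rules may differ.
NAME_RULES = {
    "phthalate": re.compile(r"phthalate", re.IGNORECASE),
    "siloxane": re.compile(r"siloxane", re.IGNORECASE),
}

# Hypothetical formula pattern: a halogen symbol that is not the start of a
# longer element symbol (so "Fe" or "In" do not count as F or I).
FORMULA_RULES = {
    "halogenated": re.compile(r"(?:Cl|Br|F|I)(?![a-z])"),
}


def regex_groups(names, formula):
    """Return the sorted group labels whose patterns match any name or the formula."""
    groups = set()
    for label, pattern in NAME_RULES.items():
        if any(pattern.search(name) for name in names if name):
            groups.add(label)
    if formula:
        for label, pattern in FORMULA_RULES.items():
            if pattern.search(formula):
                groups.add(label)
    return sorted(groups)


print(regex_groups(["ethane"], "C2H6"))
print(regex_groups(["Dibutyl phthalate"], "C16H22O4"))
print(regex_groups(["chloroform", "trichloromethane"], "CHCl3"))
```

Checking several name columns at once mirrors the name_columns list in ColumnMapping, where more than one name field can be supplied per chemical.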

Input Requirements

  • ChemicalGrouper must be initialized with a non-empty pandas DataFrame.
  • ColumnMapping must provide at least one of cas or smiles.
  • name_columns and formula are optional at configuration time, but REGEX grouping may trigger CIRpy enrichment when they are missing.
  • Input column names can be custom; FCCgroup maps them to canonical internal fields.
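
The requirements above can be checked before a ChemicalGrouper is constructed. The sketch below is a hypothetical pre-flight helper, not part of FCCgroup: check_input and is_valid_casrn are illustrative names, and rows are plain dictionaries (for a DataFrame, df.to_dict("records") yields that shape). The CAS check uses the standard CAS Registry Number check-digit rule.

```python
import re

CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_casrn(value):
    """Return True for a well-formed CAS RN whose check digit is correct."""
    match = CAS_PATTERN.match(value.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... from right to left; the sum modulo 10
    # must equal the final (check) digit.
    checksum = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return checksum % 10 == int(match.group(3))


def check_input(rows, cas_column=None, smiles_column=None):
    """Hypothetical pre-flight check; returns indices of rows with a bad CAS RN."""
    if not rows:
        raise ValueError("input must contain at least one row")
    if cas_column is None and smiles_column is None:
        raise ValueError("map at least one of cas or smiles")
    if cas_column is None:
        return []
    return [
        index
        for index, row in enumerate(rows)
        if not is_valid_casrn(str(row.get(cas_column, "")))
    ]


rows = [{"CASRN": "74-84-0", "Structure": "CC"}, {"CASRN": "74-84-1"}]
print(check_input(rows, cas_column="CASRN", smiles_column="Structure"))
```

Flagging malformed identifiers early is useful mainly because rows that cannot be resolved may otherwise surface only later, during enrichment or grouping.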

Assets And External Services

  • Packaged assets live under fccgroup/assets.
  • Mapping.xlsx and the files in fccgroup/assets/lists are required for the LISTS workflow.
  • CIRpy is used only when the selected methods require fields that are not already available in the mapped input columns.
  • CIRpy usage depends on network availability and the external resolver service.
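
Because enrichment depends on an external service, it helps to see how such a lookup can fail gracefully. The following is a rough sketch of the general idea, not FCCgroup's internal logic: the enrich helper, the record keys ("cas", "smiles", "formula"), and the injectable resolver argument are assumptions. The only real library call is cirpy.resolve(identifier, representation), which is imported lazily so the sketch also runs without CIRpy or network access when a stand-in resolver is passed.

```python
def default_resolver(identifier, representation):
    # Imported lazily so the sketch works without CIRpy when a stand-in is used.
    import cirpy

    return cirpy.resolve(identifier, representation)


def enrich(record, wanted, resolver=default_resolver):
    """Fill only the missing fields of one record.

    `wanted` maps a record key to the resolver representation to request,
    for example {"formula": "formula"}.
    """
    enriched = dict(record)
    identifier = record.get("cas") or record.get("smiles")
    if not identifier:
        return enriched
    for field, representation in wanted.items():
        if enriched.get(field):
            continue  # already supplied by the input; no lookup needed
        try:
            enriched[field] = resolver(identifier, representation)
        except Exception:
            # Network or service failure: leave the field empty and carry on.
            enriched[field] = None
    return enriched


def offline_resolver(identifier, representation):
    # Stand-in used here so the example does not touch the network.
    return f"{representation} for {identifier}"


record = {"cas": "74-84-0", "smiles": "CC", "formula": None}
print(enrich(record, {"smiles": "smiles", "formula": "formula"}, resolver=offline_resolver))
```

Supplying name, formula, and SMILES columns up front avoids these lookups entirely, which keeps runs reproducible when the resolver service is slow or unreachable.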

Output

group_chemicals() returns a pandas DataFrame containing the normalized internal identifier columns plus the columns produced by the selected methods.

Typical outputs include:

  • SMILES and/or casId internal identifier columns
  • Chemical groups and SMARTS fingerprint columns when SMARTS is selected
  • Functional list columns when LISTS is selected
  • Regex-derived columns when REGEX is selected

Runtime Dependencies

FCCgroup's runtime dependencies are listed in requirements.txt.

Citation

If you use FCCgroup in your research, please cite:

@software{fccgroup,
  title={FCCgroup: Chemical Grouping and Classification Package},
  author={Anguera Sempere, Albert and Wiesinger, Helene},
  organization={Food Packaging Forum},
  year={2026},
}

Contributing

Contributions are welcome through pull requests.

Support

For issues, questions, or suggestions, open an issue at https://github.com/Food-Packaging-Forum/fccgroup/issues.

License

MIT License. See LICENSE for details.
