Chemical grouping package based on structural patterns, functional lists, and nomenclature classification

FCCgroup

FCCgroup is a Python package for grouping chemicals with three complementary methods:

  • Structural pattern matching with SMARTS fingerprints
  • Functional list matching against packaged reference lists
  • Regex-based grouping from chemical names and formulas

The package is developed under the Food Packaging Forum organization and authored by Albert Anguera Sempere and Helene Wiesinger.

Features

  • Structural classification using SMARTS fingerprints
  • Functional list matching from packaged assets
  • Regex-based classification from names and formulas
  • Automatic CIRpy enrichment when selected methods require missing fields
  • Flexible method selection through GroupingConfig(methods=...)
  • Package data bundled under fccgroup/assets

Installation

Install from PyPI:

pip install fccgroup

Install from source:

git clone https://github.com/Food-Packaging-Forum/fccgroup.git
cd fccgroup
pip install -e .

Install development dependencies:

pip install -e ".[dev]"
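
After installing, it can help to confirm which version is active in the current environment. The following is a minimal sketch using only the standard library's importlib.metadata; the helper name installed_version is illustrative and not part of FCCgroup.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "fccgroup") -> Optional[str]:
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "fccgroup is not installed in this environment")
```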

Quick Start

import pandas as pd

from fccgroup import ChemicalGrouper, ColumnMapping, GroupingConfig, GroupingMethod

df = pd.DataFrame(
  {
    "CASRN": ["74-84-0"],
    "Structure": ["CC"],
    "Name": ["ethane"],
    "IUPAC": ["ethane"],
    "Formula": ["C2H6"],
  }
)

config = GroupingConfig(
  methods=[GroupingMethod.SMARTS, GroupingMethod.REGEX],
  column_mapping=ColumnMapping(
    cas="CASRN",
    smiles="Structure",
    name_columns=["Name", "IUPAC"],
    formula="Formula",
  ),
)

grouper = ChemicalGrouper(df=df, grouping_config=config)
results = grouper.group_chemicals()

print(results.columns.tolist())
print(results.head())

Selecting Grouping Methods

FCCgroup does not expose a GroupingMode enum. Method selection is configured with GroupingMethod values:

  • GroupingMethod.SMARTS: structural pattern matching
  • GroupingMethod.LISTS: functional list matching
  • GroupingMethod.REGEX: regex-based grouping from names and formulas

Common configurations:

GroupingConfig(methods=[GroupingMethod.SMARTS], column_mapping=...)
GroupingConfig(methods=[GroupingMethod.SMARTS, GroupingMethod.LISTS], column_mapping=...)
GroupingConfig(
  methods=[GroupingMethod.SMARTS, GroupingMethod.LISTS, GroupingMethod.REGEX],
  column_mapping=...,
)

Input Requirements

  • ChemicalGrouper must be initialized with a non-empty pandas DataFrame.
  • ColumnMapping must provide at least one of cas or smiles.
  • name_columns and formula are optional at configuration time, but REGEX grouping may trigger CIRpy enrichment when they are missing.
  • Input column names can be custom; FCCgroup maps them to canonical internal fields.
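
The rules above can be checked before a ChemicalGrouper is constructed. The sketch below is a hypothetical stand-alone pre-flight helper, not part of FCCgroup: it takes plain values (row count, column names, mapped column names) instead of the package's own classes, and the check that mapped columns exist in the input is an added sanity check rather than documented behavior. It also assumes that a missing name_columns or a missing formula is enough for REGEX to trigger enrichment.

```python
from typing import List, Optional, Sequence, Tuple


def check_inputs(
    n_rows: int,
    columns: Sequence[str],
    cas: Optional[str] = None,
    smiles: Optional[str] = None,
    name_columns: Optional[Sequence[str]] = None,
    formula: Optional[str] = None,
    use_regex: bool = False,
) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for the input rules listed in this README."""
    errors: List[str] = []
    warnings: List[str] = []

    # Rule: the input table must not be empty.
    if n_rows == 0:
        errors.append("input table is empty")

    # Rule: at least one of cas or smiles must be mapped.
    if cas is None and smiles is None:
        errors.append("mapping needs at least one of cas or smiles")

    # Added sanity check (assumption): every mapped column should exist in the input.
    mapped = [c for c in (cas, smiles, formula, *(name_columns or [])) if c is not None]
    for col in mapped:
        if col not in columns:
            errors.append(f"mapped column {col!r} not found in input")

    # Rule: REGEX without names or formula may trigger CIRpy enrichment.
    if use_regex and (not name_columns or formula is None):
        warnings.append("REGEX selected without names or formula: CIRpy enrichment may run")

    return errors, warnings


errors, warnings = check_inputs(
    n_rows=1,
    columns=["CASRN", "Structure"],
    cas="CASRN",
    smiles="Structure",
    use_regex=True,
)
print(errors)
print(warnings)
```

With a real DataFrame, len(df) and list(df.columns) would supply the first two arguments.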

Assets And External Services

  • Packaged assets live under fccgroup/assets.
  • Mapping.xlsx and the files in fccgroup/assets/lists are required for the LISTS workflow.
  • CIRpy is used only when the selected methods require fields that are not already available in the mapped input columns.
  • CIRpy usage depends on network availability and the external resolver service.
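
To confirm that the packaged assets are present in an installed copy, the standard library's importlib.resources can list them. This is a minimal sketch that assumes Python 3.9 or later and a regular on-disk installation; packaged_entries is a hypothetical helper name, and the code makes no assumption about which files the directory contains.

```python
from importlib import resources
from typing import List


def packaged_entries(package: str = "fccgroup", subdir: str = "assets") -> List[str]:
    """List entry names under <package>/<subdir>; return [] if unavailable."""
    try:
        root = resources.files(package).joinpath(subdir)
    except ModuleNotFoundError:
        # The package is not installed in this environment.
        return []
    if not root.is_dir():
        # The subdirectory does not exist in the installed package.
        return []
    return sorted(entry.name for entry in root.iterdir())


if __name__ == "__main__":
    print(packaged_entries())
```

Calling packaged_entries(subdir="assets/lists") would list the reference files that the LISTS workflow depends on, provided the package is installed.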

Output

group_chemicals() returns a pandas DataFrame containing the normalized internal identifier columns plus the columns produced by the selected methods.

Typical outputs include:

  • SMILES and/or casId internal identifier columns
  • Chemical groups and SMARTS fingerprint columns when SMARTS is selected
  • Functional list columns when LISTS is selected
  • Regex-derived columns when REGEX is selected
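
Because the output columns depend on the selected methods, one simple way to see what a run added is to compare the result header with the input header. The sketch below works on plain lists of column names; added_columns is a hypothetical helper, and the result header shown is illustrative: only SMILES and casId are named in this README, and example_group_column is a placeholder.

```python
from typing import Iterable, List


def added_columns(input_columns: Iterable[str], result_columns: Iterable[str]) -> List[str]:
    """Return result columns that were not present in the input, preserving order."""
    original = set(input_columns)
    return [col for col in result_columns if col not in original]


# Column names from the Quick Start input.
input_cols = ["CASRN", "Structure", "Name", "IUPAC", "Formula"]

# Hypothetical result header: "SMILES" and "casId" are named in this README;
# "example_group_column" stands in for whatever the selected methods add.
result_cols = ["SMILES", "casId", "example_group_column"]

print(added_columns(input_cols, result_cols))
```

With real data, df.columns and results.columns can be passed directly.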

Runtime Dependencies

FCCgroup currently declares the runtime dependencies listed in requirements.txt.

Citation

If you use FCCgroup in your research, please cite:

@software{fccgroup,
  title={FCCgroup: Chemical Grouping and Classification Package},
  author={Anguera Sempere, Albert and Wiesinger, Helene},
  organization={Food Packaging Forum},
  year={2026},
}

Contributing

Contributions are welcome through pull requests.

Support

For issues, questions, or suggestions, open an issue at https://github.com/Food-Packaging-Forum/fccgroup/issues.

License

MIT License. See LICENSE for details.
