
Chemical grouping package based on structural patterns, functional lists, and nomenclature classification


FCCgroup

FCCgroup is a Python package for grouping chemicals with three complementary methods:

  • Structural pattern matching with SMARTS fingerprints
  • Functional list matching against packaged reference lists
  • Regex-based grouping from chemical names and formulas
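The regex idea can be pictured with a minimal standard-library sketch of name- and formula-based grouping. The pattern tables, group labels and the regex_groups helper are hypothetical illustrations only. They are not FCCgroup's actual rules, which this README does not list.

```python
import re

# Hypothetical name patterns for illustration; not FCCgroup's real rules.
NAME_PATTERNS = {
    "phthalate": re.compile(r"phthalate", re.IGNORECASE),
    "bisphenol": re.compile(r"\bbisphenol\b", re.IGNORECASE),
}

# Hypothetical formula pattern: an "F" not followed by a lowercase letter,
# so fluorine matches but Fe, Fr, Fm and similar element symbols do not.
FORMULA_PATTERNS = {
    "fluorinated": re.compile(r"F(?![a-z])"),
}


def regex_groups(names, formula):
    """Return sorted group labels whose patterns match any name or the formula."""
    groups = set()
    for label, pattern in NAME_PATTERNS.items():
        # Skip missing or empty names instead of failing on them.
        if any(pattern.search(name) for name in names if name):
            groups.add(label)
    if formula:
        for label, pattern in FORMULA_PATTERNS.items():
            if pattern.search(formula):
                groups.add(label)
    return sorted(groups)


print(regex_groups(["dibutyl phthalate"], "C16H22O4"))  # ['phthalate']
```

Keeping name and formula patterns in separate tables mirrors the split between nomenclature and formula inputs described above.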

FCCgroup is developed by the Food Packaging Forum and authored by Albert Anguera Sempere and Helene Wiesinger.

Features

  • Structural classification using SMARTS fingerprints
  • Functional list matching from packaged assets
  • Regex-based classification from names and formulas
  • Automatic CIRpy enrichment when selected methods require missing fields
  • Flexible method selection through GroupingConfig(methods=...)
  • Package data bundled under fccgroup/assets
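The enrichment rule can be sketched as a set difference: look up only what the selected methods need and the mapped input does not already provide. The per-method requirements below are assumptions. The README states that REGEX relies on names and formulas, but the SMARTS-to-SMILES and LISTS-to-CAS pairings are guesses, and the method names are plain strings standing in for GroupingMethod values.

```python
# Assumed field requirements per method (plain strings, not GroupingMethod).
# Only the REGEX entry is stated in the README; the others are guesses.
REQUIRED_FIELDS = {
    "SMARTS": {"smiles"},
    "LISTS": {"cas"},
    "REGEX": {"names", "formula"},
}


def fields_to_enrich(methods, mapped_fields):
    """Return the fields the selected methods need but the mapped input lacks.

    An empty set means no external lookup would be necessary.
    """
    needed = set()
    for method in methods:
        needed |= REQUIRED_FIELDS[method]
    return needed - set(mapped_fields)


print(sorted(fields_to_enrich(["SMARTS", "REGEX"], {"cas"})))  # ['formula', 'names', 'smiles']
```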

Installation

Install from PyPI:

pip install fccgroup

Install from source:

git clone https://github.com/Food-Packaging-Forum/fccgroup.git
cd fccgroup
pip install -e .

Install development dependencies:

pip install -e ".[dev]"
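To confirm which version is active in the current environment after installing, the standard library's importlib.metadata can be queried. This is a generic check, not part of FCCgroup itself, and the helper name is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_fccgroup_version():
    """Return the installed fccgroup version string, or None if it is not installed."""
    try:
        return version("fccgroup")
    except PackageNotFoundError:
        return None


print(installed_fccgroup_version())
```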

Quick Start

import pandas as pd

from fccgroup import ChemicalGrouper, ColumnMapping, GroupingConfig, GroupingMethod

df = pd.DataFrame(
  {
    "CASRN": ["74-84-0"],
    "Structure": ["CC"],
    "Name": ["ethane"],
    "IUPAC": ["ethane"],
    "Formula": ["C2H6"],
  }
)

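# All needed columns are mapped, so the selected methods should not
# need CIRpy enrichment.
config = GroupingConfig(
  methods=[GroupingMethod.SMARTS, GroupingMethod.REGEX],
  column_mapping=ColumnMapping(
    cas="CASRN",
    smiles="Structure",
    name_columns=["Name", "IUPAC"],
    formula="Formula",
  ),
)

# Run the selected methods; the result is a pandas DataFrame.
grouper = ChemicalGrouper(df=df, grouping_config=config)
results = grouper.group_chemicals()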

print(results.columns.tolist())
print(results.head())

Selecting Grouping Methods

FCCgroup has no GroupingMode enum. Instead, select methods by passing GroupingMethod values to GroupingConfig(methods=...):

  • GroupingMethod.SMARTS: structural pattern matching
  • GroupingMethod.LISTS: functional list matching
  • GroupingMethod.REGEX: regex-based grouping from names and formulas

Common configurations:

# Structural matching only
GroupingConfig(methods=[GroupingMethod.SMARTS], column_mapping=...)
# Structural matching plus functional lists
GroupingConfig(methods=[GroupingMethod.SMARTS, GroupingMethod.LISTS], column_mapping=...)
# All three methods
GroupingConfig(
  methods=[GroupingMethod.SMARTS, GroupingMethod.LISTS, GroupingMethod.REGEX],
  column_mapping=...,
)

Input Requirements

  • ChemicalGrouper must be initialized with a non-empty pandas DataFrame.
  • ColumnMapping must provide at least one of cas or smiles.
  • name_columns and formula are optional at configuration time, but REGEX grouping may trigger CIRpy enrichment when they are missing.
  • Input column names can be custom; FCCgroup maps them to canonical internal fields.
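These mapping rules can be sketched as a small dataclass. MappingSketch is a hypothetical stand-in for ColumnMapping, not its real implementation. It only enforces the documented cas-or-smiles rule and shows how custom column names might be renamed to canonical fields. The casId and SMILES targets come from the Output section. The "formula" target is an assumption, and name columns are left unmapped here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MappingSketch:
    """Hypothetical stand-in for ColumnMapping; illustrates the documented rules only."""

    cas: Optional[str] = None
    smiles: Optional[str] = None
    name_columns: List[str] = field(default_factory=list)
    formula: Optional[str] = None

    def __post_init__(self):
        # Documented rule: at least one of cas or smiles must be provided.
        if not (self.cas or self.smiles):
            raise ValueError("provide at least one of cas or smiles")

    def rename_map(self):
        """Map user column names to canonical field names.

        "casId" and "SMILES" follow the Output section; "formula" is an
        assumed canonical name, and name_columns are not renamed here.
        """
        mapping = {}
        if self.cas:
            mapping[self.cas] = "casId"
        if self.smiles:
            mapping[self.smiles] = "SMILES"
        if self.formula:
            mapping[self.formula] = "formula"
        return mapping


m = MappingSketch(cas="CASRN", smiles="Structure", name_columns=["Name", "IUPAC"], formula="Formula")
print(m.rename_map())  # {'CASRN': 'casId', 'Structure': 'SMILES', 'Formula': 'formula'}
```

With pandas, a dictionary like this could be applied through df.rename(columns=m.rename_map()).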

Assets and External Services

  • Packaged assets live under fccgroup/assets.
  • Mapping.xlsx and the files in fccgroup/assets/lists are required for the LISTS workflow.
  • CIRpy is used only when the selected methods require fields that are not already available in the mapped input columns.
  • CIRpy usage depends on network availability and the external resolver service.
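The data files an installed copy actually ships can be listed with importlib.resources. This is a generic sketch. It assumes fccgroup is importable as a regular package with an assets directory and returns an empty list otherwise. Note that resolving the package imports it, which runs its __init__.

```python
from importlib.resources import files


def list_packaged_assets(package="fccgroup"):
    """Return sorted entry names under <package>/assets, or [] if unavailable."""
    try:
        assets = files(package) / "assets"
    except ImportError:
        # Package not installed, or one of its own imports failed.
        return []
    if not assets.is_dir():
        return []
    return sorted(entry.name for entry in assets.iterdir())


print(list_packaged_assets())
```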

Output

group_chemicals() returns a pandas DataFrame containing the normalized internal identifier columns plus the columns produced by the selected methods.

Typical outputs include:

  • SMILES and/or casId internal identifier columns
  • Chemical groups and SMARTS fingerprint columns when SMARTS is selected
  • Functional list columns when LISTS is selected
  • Regex-derived columns when REGEX is selected
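As a rough picture of what per-list output columns could look like, the sketch below flags each CAS number against a set of reference lists, producing one boolean per list. Matching on CAS numbers, the list names and their contents are all assumptions for illustration. FCCgroup's real lists ship under fccgroup/assets/lists, and its column names may differ.

```python
# Hypothetical reference lists; the entries are example CAS numbers only.
REFERENCE_LISTS = {
    "list_a": {"80-05-7", "117-81-7"},
    "list_b": {"335-67-1"},
}


def list_membership(cas_numbers, reference_lists=REFERENCE_LISTS):
    """Return one row per CAS number with a True/False flag for each list."""
    rows = []
    for cas in cas_numbers:
        row = {"casId": cas}
        for list_name, members in reference_lists.items():
            row[list_name] = cas in members
        rows.append(row)
    return rows


for row in list_membership(["80-05-7", "74-84-0"]):
    print(row)
```

Sets make each membership check constant time, which matters when many chemicals are checked against long lists.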

Runtime Dependencies

FCCgroup's runtime dependencies are listed in requirements.txt.

Citation

If you use FCCgroup in your research, please cite:

@software{fccgroup,
  title={FCCgroup: Chemical Grouping and Classification Package},
  author={Anguera Sempere, Albert and Wiesinger, Helene},
  organization={Food Packaging Forum},
  year={2026},
}

Contributing

Contributions are welcome through pull requests.

Support

For issues, questions, or suggestions, open an issue at https://github.com/Food-Packaging-Forum/fccgroup/issues.

License

MIT License. See LICENSE for details.

Download files

Source Distribution

fccgroup-0.1.1.tar.gz (76.1 MB)

Built Distribution

fccgroup-0.1.1-py3-none-any.whl (76.2 MB, Python 3)

File details

Details for the file fccgroup-0.1.1.tar.gz.

File metadata

  • Download URL: fccgroup-0.1.1.tar.gz
  • Size: 76.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for fccgroup-0.1.1.tar.gz:

  • SHA256: b476f61abc77fb8c9b4e005dca0b87204b56727f3c83e74ebfb1df0fca15d44a
  • MD5: 1ec0ff3bd859d122fafc441497880a04
  • BLAKE2b-256: 4424ee17f6457eb6a7c13709e31621827e5d5e1eef51af4c1fcc0fab0b739ef1


File details

Details for the file fccgroup-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: fccgroup-0.1.1-py3-none-any.whl
  • Size: 76.2 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for fccgroup-0.1.1-py3-none-any.whl:

  • SHA256: e81392be7d105ec9e677a35034fcb74fbe28647c5dce5e774685e7fabb86c659
  • MD5: 159d28ef9ed79908c4f3eff7e0b877c5
  • BLAKE2b-256: a0f455e1a996738c66e02102c1325aa9a674c25752b3852f047678050eda1360
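The published SHA256 digests can be checked against a downloaded file with hashlib from the standard library. The helper names are illustrative, and the path is wherever the file was saved.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read until fh.read() returns b"" so large files never load at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_sha256):
    """Return True if the file's SHA256 equals the published digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, matches_published_hash("fccgroup-0.1.1-py3-none-any.whl", "e81392be7d105ec9e677a35034fcb74fbe28647c5dce5e774685e7fabb86c659") should return True for an intact wheel. pip can also enforce digests at install time in hash-checking mode, using a requirements line such as fccgroup==0.1.1 --hash=sha256:<digest> together with pip install --require-hashes -r <file>.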

