Chemical grouping package based on structural patterns, functional lists, and nomenclature classification
FCCgroup
FCCgroup is a Python package for grouping chemicals with three complementary methods:
- Structural pattern matching with SMARTS fingerprints
- Functional list matching against packaged reference lists
- Regex-based grouping from chemical names and formulas
The package is developed by the Food Packaging Forum and authored by Albert Anguera Sempere and Helene Wiesinger.
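To give a rough idea of what regex-based grouping from names and formulas involves, the sketch below matches name patterns and applies simple rules to a parsed molecular formula. The group labels, patterns, and the functions `parse_formula` and `regex_groups` are hypothetical and are not FCCgroup's actual regex definitions or API. The formula parser only handles flat formulas without brackets, hydrate dots, or charges.

```python
from __future__ import annotations

import re

# Hypothetical name patterns for illustration; not FCCgroup's own definitions.
NAME_PATTERNS = {
    "Phthalates": re.compile(r"phthalate", re.IGNORECASE),
    "Bisphenols": re.compile(r"bisphenol", re.IGNORECASE),
    "Parabens": re.compile(r"paraben|4-hydroxybenzoate", re.IGNORECASE),
}

# An element symbol (capital letter plus optional lowercase letter) and an optional count.
ELEMENT_RE = re.compile(r"([A-Z][a-z]?)(\d*)")
HALOGENS = {"F", "Cl", "Br", "I"}


def parse_formula(formula: str) -> dict[str, int]:
    """Count atoms in a flat formula such as 'C2H6' (no brackets, dots, or charges)."""
    counts: dict[str, int] = {}
    for symbol, number in ELEMENT_RE.findall(formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def regex_groups(names: list[str], formula: str | None = None) -> set[str]:
    """Collect group labels from name patterns and simple formula rules."""
    groups = {
        group
        for name in names
        for group, pattern in NAME_PATTERNS.items()
        if pattern.search(name)
    }
    if formula:
        atoms = parse_formula(formula)
        if HALOGENS.intersection(atoms):
            groups.add("Halogenated")
        # Only carbon and hydrogen, and at least one carbon.
        if "C" in atoms and set(atoms) <= {"C", "H"}:
            groups.add("Hydrocarbons")
    return groups
```

For example, `regex_groups(["ethane"], "C2H6")` yields `{"Hydrocarbons"}`, while a phthalate name matches on the name pattern alone.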
Features
- Structural classification using SMARTS fingerprints
- Functional list matching from packaged assets
- Regex-based classification from names and formulas
- Automatic CIRpy enrichment when selected methods require missing fields
- Flexible method selection through GroupingConfig(methods=...)
- Package data bundled under fccgroup/assets
Installation
Install from PyPI:
pip install fccgroup
Install from source:
git clone https://github.com/Food-Packaging-Forum/fccgroup.git
cd fccgroup
pip install -e .
Install development dependencies:
pip install -e ".[dev]"
Quick Start
import pandas as pd
from fccgroup import ChemicalGrouper, ColumnMapping, GroupingConfig, GroupingMethod
df = pd.DataFrame(
{
"CASRN": ["74-84-0"],
"Structure": ["CC"],
"Name": ["ethane"],
"IUPAC": ["ethane"],
"Formula": ["C2H6"],
}
)
config = GroupingConfig(
methods=[GroupingMethod.SMARTS, GroupingMethod.REGEX],
column_mapping=ColumnMapping(
cas="CASRN",
smiles="Structure",
name_columns=["Name", "IUPAC"],
formula="Formula",
),
)
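# Run the selected grouping methods on the input DataFrame
grouper = ChemicalGrouper(df=df, grouping_config=config)
results = grouper.group_chemicals()
# Inspect which columns the selected methods produced
print(results.columns.tolist())
print(results.head())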
Selecting Grouping Methods
FCCgroup does not expose a GroupingMode enum. Method selection is configured with GroupingMethod values:
- GroupingMethod.SMARTS: structural pattern matching
- GroupingMethod.LISTS: functional list matching
- GroupingMethod.REGEX: regex-based grouping from names and formulas
Common configurations:
GroupingConfig(methods=[GroupingMethod.SMARTS], column_mapping=...)
GroupingConfig(methods=[GroupingMethod.SMARTS, GroupingMethod.LISTS], column_mapping=...)
GroupingConfig(
methods=[GroupingMethod.SMARTS, GroupingMethod.LISTS, GroupingMethod.REGEX],
column_mapping=...,
)
Input Requirements
- ChemicalGrouper must be initialized with a non-empty pandas DataFrame.
- ColumnMapping must provide at least one of cas or smiles.
- name_columns and formula are optional at configuration time, but REGEX grouping may trigger CIRpy enrichment when they are missing.
- Input column names can be custom; FCCgroup maps them to canonical internal fields.
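The input rules above can be sketched in plain Python. This is an illustration under stated assumptions, not FCCgroup's implementation: the `Mapping` dataclass is a hypothetical stand-in for `ColumnMapping`, rows are plain dicts rather than a pandas DataFrame, and the canonical field names `names` and `formula` are invented for the example (only `casId` and `SMILES` appear in the Output section).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Mapping:
    """Hypothetical stand-in for fccgroup.ColumnMapping; field names mirror the README."""

    cas: str | None = None
    smiles: str | None = None
    name_columns: list[str] = field(default_factory=list)
    formula: str | None = None

    def __post_init__(self) -> None:
        # Documented rule: at least one of cas or smiles is required.
        if self.cas is None and self.smiles is None:
            raise ValueError("ColumnMapping needs at least one of cas or smiles")


def to_canonical(rows: list[dict], mapping: Mapping) -> list[dict]:
    """Rename user-chosen columns to canonical fields; rejects empty input."""
    if not rows:
        raise ValueError("input must contain at least one row")
    out = []
    for row in rows:
        rec: dict = {}
        if mapping.cas is not None:
            rec["casId"] = row.get(mapping.cas)
        if mapping.smiles is not None:
            rec["SMILES"] = row.get(mapping.smiles)
        # Gather all non-empty name columns into one list (hypothetical field name).
        rec["names"] = [row[c] for c in mapping.name_columns if row.get(c)]
        if mapping.formula is not None:
            rec["formula"] = row.get(mapping.formula)
        out.append(rec)
    return out
```

Validating in `__post_init__` makes an unusable mapping fail at configuration time rather than midway through grouping.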
Assets and External Services
- Packaged assets live under fccgroup/assets.
- Mapping.xlsx and the files in fccgroup/assets/lists are required for the LISTS workflow.
- CIRpy is used only when the selected methods require fields that are not already available in the mapped input columns.
- CIRpy requires network access and depends on the availability of the external NCI/CADD Chemical Identifier Resolver service.
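The rule that CIRpy is consulted only for fields the selected methods actually need could look roughly like the sketch below. The method-to-field table, the field names, and the helpers `missing_fields` and `enrich` are assumptions for illustration, not FCCgroup's internals. The resolver is passed in as a function so the logic can be exercised offline; `cirpy.resolve(identifier, representation)` has a compatible call signature.

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical: canonical fields each method needs, and the CIR representation that can fill them.
REQUIRED_FIELDS = {
    "SMARTS": ["SMILES"],
    "LISTS": ["casId"],
    "REGEX": ["iupac_name", "formula"],
}
CIR_REPRESENTATION = {"SMILES": "smiles", "casId": "cas", "iupac_name": "iupac_name", "formula": "formula"}

Resolver = Callable[[str, str], Optional[str]]


def missing_fields(record: dict, methods: list[str]) -> list[str]:
    """List fields required by the selected methods that the record lacks, without duplicates."""
    needed: list[str] = []
    for method in methods:
        for name in REQUIRED_FIELDS[method]:
            if not record.get(name) and name not in needed:
                needed.append(name)
    return needed


def enrich(record: dict, methods: list[str], resolve: Resolver) -> dict:
    """Fill only the missing fields the selected methods need; existing values are never overwritten."""
    enriched = dict(record)
    identifier = record.get("casId") or record.get("SMILES")
    if not identifier:
        return enriched
    for name in missing_fields(record, methods):
        value = resolve(identifier, CIR_REPRESENTATION[name])
        if value is not None:
            enriched[name] = value
    return enriched
```

With CIRpy installed, `enrich({"casId": "74-84-0"}, ["SMARTS"], cirpy.resolve)` would request only a SMILES string (network access required). CIRpy can return a list when the resolver reports several values, which a real implementation would need to handle.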
Output
group_chemicals() returns a pandas DataFrame containing the normalized internal identifier columns plus the columns produced by the selected methods.
Typical outputs include:
- SMILES and/or casId internal identifier columns
- Chemical groups and SMARTS fingerprint columns when SMARTS is selected
- Functional list columns when LISTS is selected
- Regex-derived columns when REGEX is selected
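One plausible shape for functional list columns is a boolean membership flag per reference list. The sketch below assumes matching is done on CAS numbers, which this README does not state, and validates each CAS number with the standard CAS check-digit rule before lookup. The list names and members are made up; FCCgroup's real lists live in fccgroup/assets/lists, and `is_valid_cas` and `list_columns` are hypothetical helpers.

```python
from __future__ import annotations

import re

# CAS format: 2 to 7 digits, then 2 digits, then a single check digit.
CAS_RE = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas: str) -> bool:
    """Check digit = sum of digits weighted 1, 2, 3, ... from the right, modulo 10."""
    match = CAS_RE.match(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    total = sum(weight * int(d) for weight, d in enumerate(reversed(body), start=1))
    return total % 10 == int(match.group(3))


def list_columns(cas_ids: list[str], reference_lists: dict[str, set[str]]) -> list[dict[str, bool]]:
    """One row of membership flags per CAS number; invalid numbers are flagged False everywhere."""
    rows = []
    for cas in cas_ids:
        cas = cas.strip()
        valid = is_valid_cas(cas)
        rows.append({name: valid and cas in members for name, members in reference_lists.items()})
    return rows
```

For ethane (74-84-0) the weighted sum is 4·1 + 8·2 + 4·3 + 7·4 = 60, so the check digit is 0.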
Runtime Dependencies
FCCgroup declares its runtime dependencies in requirements.txt.
Citation
If you use FCCgroup in your research, please cite:
@software{fccgroup,
title={FCCgroup: Chemical Grouping and Classification Package},
author={Anguera Sempere, Albert and Wiesinger, Helene},
organization={Food Packaging Forum},
year={2026},
}
Contributing
Contributions are welcome through pull requests.
Support
For issues, questions, or suggestions, open an issue at https://github.com/Food-Packaging-Forum/fccgroup/issues.
License
MIT License. See LICENSE for details.
File details
Details for the file fccgroup-0.1.0.tar.gz.
File metadata
- Download URL: fccgroup-0.1.0.tar.gz
- Upload date:
- Size: 76.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 320b53736b32dd011409fe58da2bbd1d797843863608ca62b80f05dea4666c0b |
| MD5 | 2690f5895f2285e8d01d768979c092d5 |
| BLAKE2b-256 | 703d080c1506c1aca55bccb6511850f38f054e0c73a8677cb5fa521066248cca |
File details
Details for the file fccgroup-0.1.0-py3-none-any.whl.
File metadata
- Download URL: fccgroup-0.1.0-py3-none-any.whl
- Upload date:
- Size: 76.2 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3469240d6a20568852983520c9f6ef689984245e8210731d71ae4aca797366e0 |
| MD5 | 4afcc976df5f7423cebca83f6809dda3 |
| BLAKE2b-256 | 40e15182a5852c9881ecafbd2419129140fbadeba1965c3f7b2bc09cec175588 |
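To confirm that a downloaded file matches the SHA256 digest listed above, it can be hashed locally with the standard library. The helper names `sha256_of` and `matches_digest` are illustrative; the file is read in chunks so large archives do not need to fit in memory.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str | Path, expected_hex: str) -> bool:
    """Compare against a published digest, ignoring case and surrounding whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_digest("fccgroup-0.1.0.tar.gz", "320b53736b32dd011409fe58da2bbd1d797843863608ca62b80f05dea4666c0b")` should return True for an intact source distribution.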