Chemical grouping package based on structural patterns, functional lists, and nomenclature classification
FCCgroup
FCCgroup is a Python package for grouping chemicals with three complementary methods:
- Structural pattern matching with SMARTS fingerprints
- Functional list matching against packaged reference lists
- Regex-based grouping from chemical names and formulas
FCCgroup is developed under the Food Packaging Forum organization and authored by Albert Anguera Sempere and Helene Wiesinger.
Features
- Structural classification using SMARTS fingerprints
- Functional list matching from packaged assets
- Regex-based classification from names and formulas
- Automatic CompTox enrichment when selected methods require missing fields
- Flexible method selection through GroupingConfig(methods=...)
- Optional SMARTS fingerprint subsetting via GroupingConfig(smarts_fingerprints=...)
- Package data bundled under fccgroup/assets
Installation
Install from PyPI:
pip install fccgroup
Install from source:
git clone https://github.com/Food-Packaging-Forum/fccgroup.git
cd fccgroup
pip install -e .
Install development dependencies:
pip install -e .[dev]
Quick Start
import pandas as pd
from fccgroup import ChemicalGrouper, ColumnMapping, GroupingConfig, GroupingMethod
df = pd.DataFrame(
    {
        "CASRN": ["74-84-0"],
        "Structure": ["CC"],
        "Name": ["ethane"],
        "IUPAC": ["ethane"],
        "Formula": ["C2H6"],
    }
)
config = GroupingConfig(
    methods=[GroupingMethod.SMARTS, GroupingMethod.REGEX],
    column_mapping=ColumnMapping(
        cas="CASRN",
        smiles="Structure",
        name_columns=["Name", "IUPAC"],
        formula="Formula",
    ),
)
grouper = ChemicalGrouper(df=df, grouping_config=config)
results = grouper.group_chemicals(save=False)
# Columns are a MultiIndex: (group_label, column_name)
print(results.columns.tolist())
print(results.head())
Selecting Grouping Methods
FCCgroup does not expose a GroupingMode enum. Method selection is configured with GroupingMethod values:
- GroupingMethod.SMARTS: structural pattern matching
- GroupingMethod.LISTS: functional list matching
- GroupingMethod.REGEX: regex-based grouping from names and formulas
Common configurations:
GroupingConfig(methods=[GroupingMethod.SMARTS], column_mapping=...)
GroupingConfig(methods=[GroupingMethod.SMARTS, GroupingMethod.LISTS], column_mapping=...)
GroupingConfig(
    methods=[GroupingMethod.SMARTS, GroupingMethod.LISTS, GroupingMethod.REGEX],
    column_mapping=...,
)
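To give a rough feel for what name- and formula-based grouping involves, here is a minimal standalone sketch that uses only Python's re module. The patterns, group labels, and the regex_groups helper are hypothetical illustrations; they are not the rules shipped with FCCgroup.

```python
import re

# Hypothetical name patterns; FCCgroup's packaged rules may differ.
NAME_PATTERNS = {
    "Phthalates": re.compile(r"phthalate", re.IGNORECASE),
    "Siloxanes": re.compile(r"siloxane", re.IGNORECASE),
}

# Hypothetical formula pattern: the element symbol F, not followed by a
# lowercase letter, so that Fe (iron) is not mistaken for fluorine.
FORMULA_PATTERNS = {
    "Contains fluorine": re.compile(r"F(?![a-z])"),
}


def regex_groups(names, formula=None):
    """Return the sorted labels whose pattern matches any name or the formula."""
    hits = set()
    for label, pattern in NAME_PATTERNS.items():
        if any(pattern.search(name) for name in names if name):
            hits.add(label)
    if formula:
        for label, pattern in FORMULA_PATTERNS.items():
            if pattern.search(formula):
                hits.add(label)
    return sorted(hits)


print(regex_groups(["Diethyl phthalate"], "C12H14O4"))
print(regex_groups(["ethane"], "C2H6"))
```

Checking several name columns at once mirrors the name_columns list in ColumnMapping: a chemical is labeled if any of its names matches.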
Filtering SMARTS fingerprints
To apply only a subset of the ~400 bundled SMARTS patterns, pass their names to smarts_fingerprints:
GroupingConfig(
    methods=[GroupingMethod.SMARTS],
    column_mapping=...,
    smarts_fingerprints={"Alkanes", "PAH derivatives hydrocarbon"},
)
When smarts_fingerprints is None (default), all available patterns are applied.
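The None-versus-subset behavior can be pictured with a small standalone sketch. The AVAILABLE dictionary and the select_fingerprints helper are hypothetical, and the SMARTS strings are placeholders rather than the bundled patterns. Raising on unknown names is a choice made for this sketch; how FCCgroup itself treats unknown names is not documented here.

```python
# Illustrative stand-in for the bundled patterns; names and SMARTS are placeholders.
AVAILABLE = {
    "Alkanes": "[CX4]",
    "Alcohols": "[CX4][OX2H]",
    "Carboxylic acids": "[CX3](=O)[OX2H1]",
}


def select_fingerprints(available, requested=None):
    """Return every pattern when requested is None, else only the named subset."""
    if requested is None:
        return dict(available)
    unknown = set(requested) - set(available)
    if unknown:
        # Assumption of this sketch: fail loudly instead of silently ignoring typos.
        raise KeyError(f"Unknown fingerprint names: {sorted(unknown)}")
    # Iterate over `available` so the original ordering is preserved.
    return {name: smarts for name, smarts in available.items() if name in requested}


print(list(select_fingerprints(AVAILABLE)))
print(list(select_fingerprints(AVAILABLE, {"Alkanes"})))
```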
Custom assets path
By default ChemicalGrouper loads assets from the package installation directory. To point it at a different directory:
ChemicalGrouper(df=df, grouping_config=config, assets_path="/path/to/custom/assets")
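Before passing a custom directory, it may be worth checking that it contains the files the LISTS workflow needs. The sketch below uses only pathlib. The check_assets_dir helper is hypothetical, and it assumes that the custom directory mirrors the packaged layout, with Mapping.xlsx at the top level and a lists subdirectory next to it.

```python
from pathlib import Path


def check_assets_dir(assets_path):
    """Return a list of problems found in a custom assets directory (empty if none)."""
    root = Path(assets_path)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    # Assumption: Mapping.xlsx sits at the top level of the assets directory.
    if not (root / "Mapping.xlsx").is_file():
        problems.append("Mapping.xlsx not found")
    # Assumption: list files live in a "lists" subdirectory, as in fccgroup/assets/lists.
    lists_dir = root / "lists"
    if not lists_dir.is_dir() or not any(lists_dir.iterdir()):
        problems.append("lists/ is missing or empty")
    return problems


for problem in check_assets_dir("/path/to/custom/assets"):
    print(problem)
```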
Input Requirements
- ChemicalGrouper must be initialized with a non-empty pandas DataFrame.
- ColumnMapping must provide at least one of cas or smiles (the other may be None).
- name_columns and formula are optional at configuration time, but REGEX grouping may trigger CompTox enrichment when they are missing.
- Input column names can be custom; FCCgroup maps them to canonical internal fields.
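The rules above can be checked up front with a few lines of plain Python. This is a sketch under stated assumptions: MappingSketch is a hypothetical stand-in that only mirrors the ColumnMapping fields used in this README, and the check that mapped columns exist in the DataFrame is an extra precaution added here, not documented FCCgroup behavior.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MappingSketch:
    """Hypothetical stand-in mirroring the ColumnMapping fields shown in this README."""
    cas: Optional[str] = None
    smiles: Optional[str] = None
    name_columns: List[str] = field(default_factory=list)
    formula: Optional[str] = None


def check_mapping(mapping, df_columns):
    """Raise ValueError if the mapping breaks the input rules listed above."""
    if mapping.cas is None and mapping.smiles is None:
        raise ValueError("Provide at least one of cas or smiles")
    mapped = [mapping.cas, mapping.smiles, mapping.formula, *mapping.name_columns]
    # Extra check added by this sketch: every mapped column should exist in the input.
    missing = [col for col in mapped if col is not None and col not in df_columns]
    if missing:
        raise ValueError(f"Mapped columns not found in the DataFrame: {missing}")


def regex_may_need_enrichment(mapping):
    """True when names or formula are unmapped, the case where REGEX may call CompTox."""
    return not mapping.name_columns or mapping.formula is None


mapping = MappingSketch(cas="CASRN", smiles="Structure",
                        name_columns=["Name", "IUPAC"], formula="Formula")
check_mapping(mapping, ["CASRN", "Structure", "Name", "IUPAC", "Formula"])
print(regex_may_need_enrichment(mapping))
```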
Assets And External Services
- Packaged assets live under fccgroup/assets.
- Mapping.xlsx and the files in fccgroup/assets/lists are required for the LISTS workflow.
- CompTox (EPA) is used only when the selected methods require fields that are not already available in the mapped input columns (e.g. SMILES needed for SMARTS but only CAS provided).
- CompTox enrichment requires a valid API key set in the COMPTOX_API_KEY environment variable.
- CompTox usage depends on network availability and the EPA CompTox service.
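Since enrichment depends on the COMPTOX_API_KEY environment variable, a quick pre-flight check can save a failed run. The comptox_key_available helper below is a hypothetical convenience, not part of FCCgroup. It only tests that the variable is set and non-blank; it cannot tell whether the key is actually valid.

```python
import os


def comptox_key_available(environ=None):
    """Return True when COMPTOX_API_KEY is set to a non-blank value."""
    env = os.environ if environ is None else environ
    return bool(env.get("COMPTOX_API_KEY", "").strip())


if not comptox_key_available():
    print("COMPTOX_API_KEY is not set; CompTox enrichment will not be possible.")
```

The optional environ argument exists only to make the helper easy to test with a plain dictionary.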
Output
group_chemicals() returns a pandas DataFrame with a MultiIndex on columns, whether or not save is set. The first level groups columns by origin (identifiers or grouping method); the second level is the column name.
| Top-level label | Contents |
|---|---|
| Identifier | Internal identifier columns (casId, SMILES) |
| Structural patterns | Chemical groups and per-fingerprint columns (SMARTS method) |
| Lists | Per-list membership columns (LISTS method) |
| Regex | Pattern group columns (REGEX method) |
Example column access:
# Access the SMILES identifier column
results[("Identifier", "SMILES")]
# Access the Chemical groups column
results[("Structural patterns", "Chemical groups")]
When save=True (default), results are also written to an Excel file in the current working directory.
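Working with MultiIndex columns takes some getting used to, so here is a short pandas sketch on a toy frame that mimics the documented layout. The frame and its values are made up for illustration; only the labels come from the table above.

```python
import pandas as pd

# Toy frame that mimics the documented column layout; the values are made up.
columns = pd.MultiIndex.from_tuples(
    [
        ("Identifier", "casId"),
        ("Identifier", "SMILES"),
        ("Structural patterns", "Chemical groups"),
    ]
)
toy = pd.DataFrame([["74-84-0", "CC", "Alkanes"]], columns=columns)

# Selecting a top-level label returns a frame keyed by the second-level names.
identifiers = toy["Identifier"]
print(identifiers.columns.tolist())

# Flatten to single-level names, e.g. before exporting to CSV.
flat = toy.copy()
flat.columns = [f"{top} | {name}" for top, name in toy.columns]
print(flat.columns.tolist())
```

Flattening is a design choice for downstream tools that expect plain column names; the separator used here is arbitrary.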
Runtime Dependencies
FCCgroup's runtime dependencies are listed in requirements.txt.
Citation
If you use FCCgroup in your research, please cite:
@software{fccgroup,
  title={FCCgroup: Chemical Grouping and Classification Package},
  author={Anguera Sempere, Albert and Wiesinger, Helene},
  organization={Food Packaging Forum},
  year={2026},
}
Contributing
Contributions are welcome through pull requests.
Support
For issues, questions, or suggestions, open an issue at https://github.com/Food-Packaging-Forum/fccgroup/issues.
License
MIT License. See LICENSE for details.