Package to support simplified application of machine learning models to datasets in materials science
Foundry-ML simplifies access to machine learning-ready datasets in materials science and chemistry.
- Search & Load - Find and use curated datasets with a few lines of code
- Understand - Rich schemas describe what each field means
- Cite - Automatic citation generation for publications
- Publish - Share your datasets with the community
- AI-Ready - MCP server for Claude and other AI assistants
Quick Start
pip install foundry-ml
Need optional integrations? Install extras only when you need them:
pip install "foundry-ml[torch]" # Enable dataset.get_as_torch()
pip install "foundry-ml[tensorflow]" # Enable dataset.get_as_tensorflow()
pip install "foundry-ml[huggingface]" # Enable push-to-hub
pip install "foundry-ml[excel]" # Excel import support via openpyxl
PyTorch/TensorFlow extras expect wheels compiled against NumPy 2.0. Install PyTorch 2.3+ and TensorFlow 2.18+ (or newer builds with NumPy 2 support) to avoid ABI errors.
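Before calling the framework converters, it can help to confirm that the installed builds meet the minimums quoted above. The following is a minimal sketch using only the standard library. It assumes the frameworks are installed under the distribution names `torch` and `tensorflow`; variants such as CPU-only builds may use other names. The helper names `major_minor` and `check_minimums` are illustrative and not part of Foundry.

```python
import re
from importlib import metadata

# Minimum versions quoted in the note above (PyTorch 2.3+, TensorFlow 2.18+).
MINIMUMS = {"torch": (2, 3), "tensorflow": (2, 18)}


def major_minor(version):
    """Return (major, minor) from a version string such as '2.3.1+cu121'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_minimums(minimums=MINIMUMS, get_version=metadata.version):
    """Map each distribution name to 'ok', 'too old', or 'not installed'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = major_minor(get_version(name))
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        report[name] = "ok" if installed >= minimum else "too old"
    return report


if __name__ == "__main__":
    for name, status in check_minimums().items():
        print(f"{name}: {status}")
```

A result of "not installed" is expected when the corresponding extra was never requested; only "too old" points to the ABI mismatch described above.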
from foundry import Foundry
# Connect
f = Foundry()
# Search
results = f.search("band gap", limit=5)
# Load
dataset = results.iloc[0].FoundryDataset
X, y = dataset.get_as_dict()['train']
# Understand
schema = dataset.get_schema()
print(schema['fields'])
# Cite
print(dataset.get_citation())
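The unpacking in the Quick Start suggests that `get_as_dict()` returns a mapping from split name to an (inputs, targets) pair. Treating that shape as an assumption, the sketch below wraps the lookup so that a missing split produces a readable error instead of a bare `KeyError`. The helper `unpack_split` is hypothetical, and the dictionary is a stand-in with made-up values, not real Foundry data.

```python
def unpack_split(data, split="train"):
    """Return (inputs, targets) for one split, with a clear error if absent."""
    if split not in data:
        available = ", ".join(sorted(data)) or "none"
        raise KeyError(f"split {split!r} not found; available splits: {available}")
    inputs, targets = data[split]
    return inputs, targets


# Stand-in for dataset.get_as_dict(); the values are invented for illustration.
example = {"train": ([[1.0, 2.0], [3.0, 4.0]], [0.5, 1.5])}

X, y = unpack_split(example)
print(len(X), len(y))
```

With a real dataset, the call would look like `X, y = unpack_split(dataset.get_as_dict())`, provided the returned structure matches the assumption above.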
Cloud Environments
For Google Colab or remote Jupyter:
f = Foundry(no_browser=True, no_local_server=True)
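A notebook that runs both locally and in the cloud can choose these flags at runtime. The sketch below detects Google Colab by checking whether the `google.colab` module is loaded, and lets remote Jupyter users opt in through an environment variable. The function `foundry_auth_kwargs` and the variable `FOUNDRY_HEADLESS` are hypothetical names introduced for this example; Foundry itself does not read them.

```python
import os
import sys


def foundry_auth_kwargs(modules=None, environ=None):
    """Pick Foundry() keyword arguments for the current environment."""
    modules = sys.modules if modules is None else modules
    environ = os.environ if environ is None else environ

    # Colab preloads the google.colab package into the running kernel.
    in_colab = "google.colab" in modules
    # Hypothetical opt-in switch for remote Jupyter; not a Foundry setting.
    headless = environ.get("FOUNDRY_HEADLESS") == "1"

    if in_colab or headless:
        return {"no_browser": True, "no_local_server": True}
    return {}


# Usage (requires foundry-ml):
#   from foundry import Foundry
#   f = Foundry(**foundry_auth_kwargs())
```

Returning an empty dictionary on a local machine leaves Foundry's default login flow untouched.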
CLI
foundry search "band gap"
foundry schema 10.18126/abc123
foundry --help
AI Agent Integration
foundry mcp install # Add to Claude Code
Documentation
Features
| Feature | Description |
|---|---|
| Search | Find datasets by keyword, DOI, or browse catalog |
| Load | Automatic download, caching, and format conversion |
| PyTorch/TensorFlow (extras) | dataset.get_as_torch(), dataset.get_as_tensorflow() |
| CLI | Terminal-based workflows |
| MCP Server | AI assistant integration |
| HuggingFace Export (extra) | Publish to HuggingFace Hub |
Available Datasets
Browse datasets at Foundry-ML.org, or list them from Python:
f = Foundry()
f.list(limit=20) # See available datasets
How to Cite
If you use Foundry-ML, please cite:
@article{Schmidt2024,
doi = {10.21105/joss.05467},
year = {2024},
publisher = {The Open Journal},
volume = {9},
number = {93},
pages = {5467},
author = {Kj Schmidt and Aristana Scourtas and Logan Ward and others},
title = {Foundry-ML - Software and Services to Simplify Access to Machine Learning Datasets in Materials Science},
journal = {Journal of Open Source Software}
}
Contributing
Foundry is open source. To contribute:
- Fork from `main`
- Make your changes
- Open a Pull Request
See CONTRIBUTING.md for details.
Support
This work was supported by the National Science Foundation under NSF Award Number: 1931306 "Collaborative Research: Framework: Machine Learning Materials Innovation Infrastructure".
Foundry integrates with Materials Data Facility, FuncX, and MAST-ML.
File details
Details for the file foundry_ml-1.2.2.tar.gz.
File metadata
- Download URL: foundry_ml-1.2.2.tar.gz
- Upload date:
- Size: 55.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cbb8f26c70cd17b17fbb78f4eecf55ee3ea543ab658934b6f120b57ca0ef4c36 |
| MD5 | fa58464a09fe824a75b43caaf88a35d4 |
| BLAKE2b-256 | 3b99b5eb09caf8fad27b2a7821fc42d149dbb9b737730bb5fb56f32fa7be6169 |
File details
Details for the file foundry_ml-1.2.2-py3-none-any.whl.
File metadata
- Download URL: foundry_ml-1.2.2-py3-none-any.whl
- Upload date:
- Size: 63.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 22fcd641e11f97e27c483eeb6f8489adc0675c0d487e419495baadfe6c228a7c |
| MD5 | 644df7837e17ea72423445813639deae |
| BLAKE2b-256 | d353bf02798c8c675d08386a77b41046ec957419ebe46f432b0547a6004f4c39 |
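The SHA256 digests listed above can be compared against a downloaded file before installing it. The following is a minimal sketch using only the standard library. The helper names `sha256_of` and `matches_sha256` are illustrative, and the file path in the usage comment assumes the archive was saved to the current directory.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Return True if the file's digest equals the published hex digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


# Usage, assuming the source archive sits in the current directory:
#   matches_sha256(
#       "foundry_ml-1.2.2.tar.gz",
#       "cbb8f26c70cd17b17fbb78f4eecf55ee3ea543ab658934b6f120b57ca0ef4c36",
#   )
```

Reading in chunks keeps memory use flat regardless of file size. pip can perform the same check itself when a requirements file pins hashes and is installed with `--require-hashes`.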