CLARI: Fast Organic Crystal Structure Prediction with Unit Cell Flow Matching

CLARI takes a molecule and predicts how it packs into a crystal. A single run produces many candidate structures.

Links

Checkpoints are available on Hugging Face as clari-large.ckpt and clari-med.ckpt.

Inputs are expected to use explicit-hydrogen SMILES. For example, prefer C([H])([H])([H])C([H])([H])[H] over CC.


Basic sampling

CLI

uv run sample \
  --checkpoint_path clari.ckpt \
  --output_dir out/ \
  --smiles 'C([H])([H])([H])C([H])([H])[H]' \
  --ids ethane \
  --n_samples 8

Python

from clari.inference import ClariSampler, SampleRequest

sampler = ClariSampler.from_checkpoint("clari.ckpt")
samples = sampler.sample(
    SampleRequest(
        id="ethane",
        smiles="C([H])([H])([H])C([H])([H])[H]",
        n_samples=8,
    ),
    output_dir="out/",
)

To load a checkpoint from the Hugging Face Hub instead of a local file (it is downloaded once, then cached):

CLI

uv run sample \
  --from_hub Clari-M \
  --output_dir out/ \
  --smiles 'C([H])([H])([H])C([H])([H])[H]' \
  --ids ethane \
  --n_samples 8

Python

from clari.inference import ClariSampler

sampler = ClariSampler.from_hub("Clari-M")  # or "Clari-L"

Multiple molecules

CLI

uv run sample \
  --checkpoint_path clari.ckpt \
  --output_dir out/ \
  --smiles 'C([H])([H])([H])C([H])([H])O[H]' --ids ethanol \
  --smiles 'C1([H])=C([H])C([H])=C([H])C([H])=C1[H]' --ids benzene \
  --copies 4 \
  --n_samples 50

Python

from clari.inference import ClariSampler, SampleRequest

sampler = ClariSampler.from_checkpoint("clari.ckpt")
samples = sampler.sample([
    SampleRequest(
        id="ethanol",
        smiles="C([H])([H])([H])C([H])([H])O[H]",
        copies=4,
        n_samples=50,
    ),
    SampleRequest(
        id="benzene",
        smiles="C1([H])=C([H])C([H])=C([H])C([H])=C1[H]",
        copies=4,
        n_samples=50,
    ),
], output_dir="out/")

For co-crystals, pass a list of (SMILES, copy_count) pairs, one per component. Each pair's copy count is passed directly to Crystal.from_smiles.

samples = sampler.sample(
    SampleRequest(
        id="ethanol-water",
        smiles=[
            ("C([H])([H])([H])C([H])([H])O[H]", 1),
            ("O([H])[H]", 1),
        ],
        n_samples=50,
    ),
    output_dir="out/",
)

For many molecules, use a config file instead of repeating flags:

uv run sample --config jobs.json

jobs.json

{
  "checkpoint_path": "clari.ckpt",
  "output_dir": "out/",
  "smiles": [
    "C([H])([H])([H])C([H])([H])O[H]",
    "C1([H])=C([H])C([H])=C([H])C([H])=C1[H]"
  ],
  "ids": ["ethanol", "benzene"],
  "copies": [4, 4],
  "n_samples": [50, 50]
}
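
When the molecule list already lives in Python (for example, loaded from a spreadsheet), a config like the one above can be generated rather than written by hand. The following is a minimal sketch using only the standard library. The helper name `build_jobs_config` is hypothetical and not part of clari; only the key names are taken from the example config, and the sketch assumes the per-molecule keys (`smiles`, `ids`, `copies`, `n_samples`) are parallel lists of equal length.

```python
import json


def build_jobs_config(molecules, checkpoint_path="clari.ckpt",
                      output_dir="out/", copies=4, n_samples=50):
    """Build a sampling config with one entry per molecule.

    `molecules` maps an id to an explicit-hydrogen SMILES string.
    Hypothetical helper: only the key names come from the example config.
    """
    ids = list(molecules)
    return {
        "checkpoint_path": checkpoint_path,
        "output_dir": output_dir,
        "smiles": [molecules[mol_id] for mol_id in ids],
        "ids": ids,
        # The same copies / n_samples value is repeated for every molecule.
        "copies": [copies] * len(ids),
        "n_samples": [n_samples] * len(ids),
    }


config = build_jobs_config({
    "ethanol": "C([H])([H])([H])C([H])([H])O[H]",
    "benzene": "C1([H])=C([H])C([H])=C([H])C([H])=C1[H]",
})
print(json.dumps(config, indent=2))
```

Saving the printed JSON as jobs.json gives a file that can be passed to `uv run sample --config jobs.json`.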

Co-crystal configs use the same pair shape:

{
  "checkpoint_path": "clari.ckpt",
  "output_dir": "out/",
  "ids": "ethanol-water",
  "smiles": [
    ["C([H])([H])([H])C([H])([H])O[H]", 1],
    ["O([H])[H]", 1]
  ],
  "n_samples": 50
}
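
In Python the co-crystal components are tuples, but JSON has no tuple type, so each pair becomes a two-element array in the config. The sketch below shows that conversion with the standard `json` module. The variable names are illustrative only, and the 1:1 ethanol-water stoichiometry is taken from the example above.

```python
import json

# (SMILES, copy_count) pairs, in the same shape used for SampleRequest.
components = [
    ("C([H])([H])([H])C([H])([H])O[H]", 1),  # ethanol
    ("O([H])[H]", 1),                        # water
]

cocrystal_config = {
    "checkpoint_path": "clari.ckpt",
    "output_dir": "out/",
    "ids": "ethanol-water",
    "smiles": components,  # json.dumps serializes tuples as arrays
    "n_samples": 50,
}

text = json.dumps(cocrystal_config, indent=2)
print(text)

# Reading the config back yields lists, not tuples.
round_tripped = json.loads(text)["smiles"]
```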

Sample → rank → export top-K

CLI

uv run sample \
  --checkpoint_path clari.ckpt \
  --output_dir out/ \
  --smiles 'C([H])([H])([H])C([H])([H])[H]' \
  --ids ethane \
  --copies 4 \
  --n_samples 64

uv run rank out/

uv run export-cifs out/ --top_k 10

Python

from clari.inference import ClariSampler

sampler = ClariSampler.from_checkpoint("clari.ckpt")
sampler.sample(
    "C([H])([H])([H])C([H])([H])[H]",
    copies=4,
    n_samples=64,
    output_dir="out/",
)
# rank and export are CLI steps
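
Because ranking and export are CLI steps, a Python workflow can shell out to them after sampling. The sketch below builds the three commands from the CLI example above and runs them in order with `subprocess.run`. The helper names `pipeline_commands` and `run_pipeline` are hypothetical and not part of clari, and the sketch assumes `uv` is on the PATH and the commands are launched from the project directory.

```python
import shlex
import subprocess


def pipeline_commands(smiles, mol_id, output_dir="out/",
                      checkpoint_path="clari.ckpt",
                      copies=4, n_samples=64, top_k=10):
    """Return the sample, rank, and export commands as argv lists."""
    sample = [
        "uv", "run", "sample",
        "--checkpoint_path", checkpoint_path,
        "--output_dir", output_dir,
        "--smiles", smiles,
        "--ids", mol_id,
        "--copies", str(copies),
        "--n_samples", str(n_samples),
    ]
    rank = ["uv", "run", "rank", output_dir]
    export = ["uv", "run", "export-cifs", output_dir, "--top_k", str(top_k)]
    return [sample, rank, export]


def run_pipeline(commands):
    """Run each command in order; stop at the first failure."""
    for argv in commands:
        print("$", shlex.join(argv))
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(argv, check=True)


commands = pipeline_commands("C([H])([H])([H])C([H])([H])[H]", "ethane")
# run_pipeline(commands)  # uncomment to execute; needs uv, clari, and a checkpoint
```

Passing the commands as argv lists avoids shell quoting problems with the parentheses and brackets in SMILES strings.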

Export specific samples by index

uv run export-cifs out/ --sample_idx 0 --sample_idx 7

Multi-GPU

CLI

uv run sample \
  --checkpoint_path clari.ckpt \
  --output_dir out/ \
  --smiles 'C([H])([H])([H])C([H])([H])[H]' \
  --ids ethane \
  --copies 4 \
  --n_samples 1000 \
  --num_gpus 4

Python

from clari.inference import ClariSampler

sampler = ClariSampler.from_checkpoint("clari.ckpt", num_gpus=4)
sampler.sample(
    "C([H])([H])([H])C([H])([H])[H]",
    copies=4,
    n_samples=1000,
    output_dir="out/",
)

Fixed batch size

CLI

uv run sample \
  --checkpoint_path clari.ckpt \
  --output_dir out/ \
  --smiles 'C([H])([H])([H])C([H])([H])[H]' \
  --ids ethane \
  --copies 4 \
  --n_samples 32 \
  --batch_size 8 \
  --compile false

Python

from clari.inference import ClariSampler

sampler = ClariSampler.from_checkpoint("clari.ckpt", compile=False)
sampler.sample(
    "C([H])([H])([H])C([H])([H])[H]",
    copies=4,
    n_samples=32,
    batch_size=8,
    output_dir="out/",
)

CPU smoke test

CLI

uv run sample \
  --checkpoint_path clari.ckpt \
  --output_dir out/ \
  --smiles 'C([H])([H])([H])C([H])([H])[H]' \
  --ids ethane \
  --n_samples 1 \
  --batch_size 1 \
  --device cpu \
  --n_steps 2 \
  --compile false \
  --use_bf16 false

Python

from clari.inference import ClariSampler

sampler = ClariSampler.from_checkpoint(
    "clari.ckpt",
    device="cpu",
    n_steps=2,
    compile=False,
    use_bf16=False,
)
sampler.sample(
    "C([H])([H])([H])C([H])([H])[H]",
    n_samples=1,
    batch_size=1,
    output_dir="out/",
)

For all options: uv run sample --help

Citation

@misc{lo2026fastorganiccrystalstructure,
      title={Fast Organic Crystal Structure Prediction with Unit Cell Flow Matching},
      author={Alston Lo and Luka Mucko and Austin H. Cheng and Andy Cai and Alastair J. A. Price and Wojciech Matusik and Alán Aspuru-Guzik},
      year={2026},
      eprint={2606.03199},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2606.03199},
}
