
A Python library for Crystal Parquet Database.

Project description

Crystal-Parquet-Database

Crystal Parquet Database (crystpqdb) is a Python library for building a unified local database of crystal structures by downloading datasets from multiple sources (Alexandria, Materials Project, Materials Cloud, and JARVIS) into a consistent on-disk layout.

Installation

1. PyPI

pip install crystpqdb

2. Manually

To install and use this package manually, we use the conda package manager for conda packages and Pixi to handle package dependencies and virtual environments.

1. Install Miniforge

Miniforge is the community-driven (conda-forge) minimalistic conda installer; subsequent package installations come from the conda-forge channel.

By contrast, Miniconda is the minimalistic conda installer driven by Anaconda (the company); subsequent package installations come from the Anaconda channels (defaults or otherwise).

Download here

2. Install Pixi package manager

Linux/macOS

wget -qO- https://pixi.sh/install.sh | sh

Windows (PowerShell)

powershell -ExecutionPolicy ByPass -c "irm -useb https://pixi.sh/install.ps1 | iex"

3. Clone the repository

git clone https://github.com/YKK-xTechLab-Engineering/YKK-Point-Cloud.git

4. Install dependencies and virtual environments through Pixi

pixi install

Quickstart

All downloads are created via a small factory and a per-source DownloadConfig.

1. Download the combined database

from pathlib import Path
from crystpqdb import download

data_root = Path("./data")
db_dir = data_root / "crystpqdb"
db_dir = download(db_dir)
print("Downloaded to: {}".format(db_dir))
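A download helper like this is typically expected to be idempotent: skip the work when the target directory is already populated. The following is a rough stdlib sketch of that skip-if-present pattern; `ensure_downloaded` and its `fetch` callback are illustrative names, not part of the crystpqdb API:

```python
from pathlib import Path


def ensure_downloaded(db_dir, fetch):
    """Download into db_dir only if it is missing or empty.

    `fetch` stands in for whatever callable actually retrieves the files.
    This is an illustration of the skip-if-present pattern, not
    crystpqdb's actual implementation.
    """
    db_dir = Path(db_dir)
    if db_dir.is_dir() and any(db_dir.iterdir()):
        return db_dir  # already populated: nothing to do
    db_dir.mkdir(parents=True, exist_ok=True)
    fetch(db_dir)
    return db_dir
```

Calling the helper twice triggers the fetch only once; the second call returns the already-populated directory.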

2. Use the loaders to download datasets from different sources

This package defines a common BaseLoader interface for downloading datasets and transforming them into a unified schema.

A factory (LoaderFactory, or the get_loader convenience function) returns the correct loader for a given source and dataset, keyed by the source_database and source_dataset names. If you do not know these names, you can use LoaderFactory to list all available sources and datasets; otherwise, passing an unknown pair raises an error that lists the available source databases and datasets.

import os
from pathlib import Path

from crystpqdb.loaders import get_loader, LoaderConfig

data_root = Path("./data")

# Define Configurations for the loader
config = LoaderConfig(
    api_key=os.getenv("MP_API_KEY"),
    download_from_scratch=False,
    ingest_from_scratch=True,
    transform_from_scratch=True
    )

# Get the loader
loader = get_loader("mp", "summary", data_dir=data_root, config=config)

# Run the loader
table = loader.run()
print(table.shape)
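The factory described above can be pictured as a registry keyed by (source_database, source_dataset) pairs. The following plain-Python sketch mirrors that lookup-or-raise behavior, including the ValueError that lists the available options; the registry contents and the function name are illustrative, not crystpqdb internals:

```python
# Illustrative registry mapping (source_database, source_dataset) pairs to
# loader names; the real crystpqdb registry would map to loader classes.
_REGISTRY = {
    ("alex", "3d"): "Alexandria3DLoader",
    ("mp", "summary"): "MPLoader",
}


def lookup_loader(source_database, source_dataset):
    """Return the registered loader for a pair, or raise a ValueError
    that lists every available (source_database, source_dataset) pair."""
    key = (source_database, source_dataset)
    if key in _REGISTRY:
        return _REGISTRY[key]
    available = ", ".join(f"({db!r}, {ds!r})" for db, ds in sorted(_REGISTRY))
    raise ValueError(f"Unknown pair {key!r}. Available: {available}")
```

Keeping the registry in one place means adding a new source only requires registering its loader class under a new key.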

3. Load all datasets into a single ParquetDB

from pathlib import Path

from parquetdb import ParquetDB

from crystpqdb.loaders import get_loader

data_root = Path("./data")
# Open (or create) the destination ParquetDB; the db_path value here is illustrative.
pqdb = ParquetDB(db_path=data_root / "combined")

datasets = [
    ("alex", "3d"),
    ("alex", "2d"),
    ("alex", "1d"),
    ("mp", "summary"),
    ("materialscloud", "mc3d"),
]

for source_database, source_dataset in datasets:
    loader = get_loader(source_database, source_dataset, data_dir=data_root)
    table = loader.run()
    pqdb.create(table, convert_to_fixed_shape=False)

table = pqdb.read(columns=["id"])
print(table.shape)

Note: This requires a lot of memory (~64 GB RAM) to load all the datasets into a single ParquetDB. Batch support is not yet implemented.
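Until batch support lands, one workaround is to process sources in fixed-size groups rather than all at once. A generic stdlib batching helper (not a crystpqdb API) might look like:

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield successive lists of at most batch_size items.

    A generic sketch: iterating the (source, dataset) pairs in small
    groups keeps only one group's worth of work in flight at a time.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(iterable)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk
```

For example, `list(batched(datasets, 2))` splits the five pairs above into groups of at most two, so each group can be loaded and written before the next begins.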

Current Loaders

Loader Class         (source_database, source_dataset)    Working?
Alexandria1DLoader   ("alex", "1d")                        Yes
Alexandria2DLoader   ("alex", "2d")                        Yes
Alexandria3DLoader   ("alex", "3d")                        Yes
MPLoader             ("mp", "summary")                     Yes
MC3DLoader           ("materialscloud", "mc3d")            Yes
JarvisLoader

All listed loaders are currently implemented and functional. If you attempt to use a (source_database, source_dataset) pair not in this table, a ValueError will be raised and the available options will be listed.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

crystpqdb-0.0.1.dev31.tar.gz (302.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

crystpqdb-0.0.1.dev31-py3-none-any.whl (21.9 kB view details)

Uploaded Python 3

File details

Details for the file crystpqdb-0.0.1.dev31.tar.gz.

File metadata

  • Download URL: crystpqdb-0.0.1.dev31.tar.gz
  • Upload date:
  • Size: 302.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.23

File hashes

Hashes for crystpqdb-0.0.1.dev31.tar.gz

  • SHA256: 80185404b7f2e049615fa70ef10af1bf610022538ecc8eb1b06cb7e821aa9cb3
  • MD5: bdef442bf41109fb2eddefb40511a605
  • BLAKE2b-256: 664c205f695e089498f6fe6a7ad761983bbcbf067203d21c8658b60a41f365e7

See more details on using hashes here.

File details

Details for the file crystpqdb-0.0.1.dev31-py3-none-any.whl.

File metadata

File hashes

Hashes for crystpqdb-0.0.1.dev31-py3-none-any.whl

  • SHA256: 0d1f9b0b6c1e5d73a090b281145d83b0a9c3c0512e0a957d98f9eb09e3f67998
  • MD5: da76f5de3e3e11dcae40f04bdcd3c27c
  • BLAKE2b-256: 86caf64b20879552ae162880bcc04f048ac39f6c8040a4a2f59140b2bdf3d9aa

See more details on using hashes here.
