Language models for astrochemistry
Features
TODO
Requirements
TODO
Installation
The project environment for Language models for astrochemistry is managed by conda and poetry: conda maintains the Python environment along with additional non-Python libraries such as CUDA, while poetry handles Python-specific dependencies. The two tools overlap somewhat, but both are needed because conda is not great at resolving dependencies, and poetry cannot handle things that are not Python (e.g. MPI, MKL).
To set up the environment from scratch, follow these steps:
$ conda create -n astrochem_embedding python=3.7
$ conda activate astrochem_embedding
$ pip install poetry
$ poetry install
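After these steps, a quick sanity check can confirm that the interpreter version and key packages are visible from the activated environment. This is a minimal sketch, not part of the package: check_environment is a hypothetical helper, and listing "torch" assumes PyTorch is among the poetry-managed dependencies.

```python
import importlib.util
import sys
from typing import Iterable, List, Tuple


def check_environment(
    modules: Iterable[str], min_python: Tuple[int, int] = (3, 7)
) -> List[str]:
    """Hypothetical helper: return a list of problems; an empty list means all checks passed."""
    problems = []
    # The conda environment above pins Python 3.7; flag anything older than min_python.
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor} < "
            f"{min_python[0]}.{min_python[1]}"
        )
    # find_spec returns None when a top-level module cannot be located.
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing module: {name}")
    return problems


if __name__ == "__main__":
    # "astrochem_embedding" is the package under src/; "torch" is an assumed dependency.
    for problem in check_environment(["astrochem_embedding", "torch"]):
        print(problem)
```

An empty output means both modules were found and the interpreter is at least Python 3.7.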
Usage
Please see the Command-line Reference for details.
Project Structure
The project file structure is laid out as follows:
├── CITATION.cff
├── codecov.yml
├── CODE_OF_CONDUCT.rst
├── CONTRIBUTING.rst
├── data
│   ├── external
│   ├── interim
│   ├── processed
│   └── raw
├── docs
│   ├── codeofconduct.rst
│   ├── conf.py
│   ├── contributing.rst
│   ├── index.rst
│   ├── license.rst
│   ├── reference.rst
│   ├── requirements.txt
│   └── usage.rst
├── environment.yml
├── models
├── notebooks
│   ├── dev
│   ├── exploratory
│   └── reports
├── noxfile.py
├── poetry.lock
├── pyproject.toml
├── README.rst
├── scripts
│   └── train.py
└── src
    └── astrochem_embedding
        ├── __init__.py
        ├── layers
        │   ├── __init__.py
        │   ├── layers.py
        │   └── tests
        │       ├── __init__.py
        │       └── test_layers.py
        ├── __main__.py
        ├── models
        │   ├── __init__.py
        │   ├── models.py
        │   └── tests
        │       ├── __init__.py
        │       └── test_models.py
        ├── pipeline
        │   ├── data.py
        │   ├── __init__.py
        │   ├── tests
        │   │   ├── __init__.py
        │   │   ├── test_data.py
        │   │   └── test_transforms.py
        │   └── transforms.py
        └── utils.py
A brief summary of what each folder is designed for:
data contains copies of the data used for this project. It is recommended to form a pipeline whereby the raw data is preprocessed, serialized to interim, and when ready for analysis, placed into processed.
models contains serialized weights intended for distribution, and/or testing.
notebooks contains three subfolders: dev is for notebook based development, exploratory for data exploration, and reports for making figures and visualizations for writeup.
scripts contains files meant for headless routines, generally those with long compute times such as model training and data cleaning.
src/astrochem_embedding contains the common code base for this project.
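The recommended raw to interim to processed flow for the data folder can be sketched as two stages that each read one folder and serialize to the next. This is a minimal sketch under stated assumptions: the data are taken to be JSON files holding lists of strings, and run_stage, preprocess, finalize, and run_pipeline are hypothetical names with placeholder logic, not functions from the package.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List

Step = Callable[[List[str]], List[str]]


def run_stage(src: Path, dst: Path, step: Step) -> None:
    """Apply `step` to every JSON list file in `src` and serialize the result to `dst`."""
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(src.glob("*.json")):
        items = json.loads(path.read_text())
        (dst / path.name).write_text(json.dumps(step(items)))


def preprocess(items: List[str]) -> List[str]:
    # Hypothetical cleaning step: strip whitespace and drop empty entries.
    return [s.strip() for s in items if s.strip()]


def finalize(items: List[str]) -> List[str]:
    # Hypothetical finishing step: remove duplicates while keeping the original order.
    return list(dict.fromkeys(items))


def run_pipeline(data_dir: Path) -> None:
    # raw -> interim (preprocessed and serialized), then interim -> processed (analysis-ready).
    run_stage(data_dir / "raw", data_dir / "interim", preprocess)
    run_stage(data_dir / "interim", data_dir / "processed", finalize)


if __name__ == "__main__":
    # Demo on a throwaway directory; a real run would point at the project's data/ folder.
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp)
        (data_dir / "raw").mkdir()
        (data_dir / "raw" / "sample.json").write_text(json.dumps([" H2O ", "", "CO", "H2O"]))
        run_pipeline(data_dir)
        print(json.loads((data_dir / "processed" / "sample.json").read_text()))
```

Keeping each stage's output on disk means an expensive preprocessing step only has to be rerun when the raw data or the step itself changes.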
Code development
All of the code used for this project, at least its high-level functionality (i.e. not scripts), should be contained in src/astrochem_embedding, which is intended to be a standalone Python package.
The package is structured to match the abstractions used in deep learning, specifically PyTorch, PyTorch Lightning, and Weights and Biases, by separating data structures and processing (pipeline) from model and layer development (models and layers).
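The separation between data processing and model code can be illustrated without any framework: transforms are plain callables that know nothing about the model, and they are chained into a single encoder. This is a framework-free sketch of the idea only; Compose (modelled on the pattern of torchvision.transforms.Compose), char_tokenize, and Vocabulary are hypothetical names, not the contents of pipeline/transforms.py.

```python
from typing import Callable, Dict, List, Sequence

Transform = Callable[[object], object]


class Compose:
    """Chain transforms left to right, in the spirit of torchvision.transforms.Compose."""

    def __init__(self, transforms: Sequence[Transform]):
        self.transforms = list(transforms)

    def __call__(self, x):
        for t in self.transforms:
            x = t(x)
        return x


def char_tokenize(text: str) -> List[str]:
    # Hypothetical tokenizer: one token per character.
    return list(text)


class Vocabulary:
    """Hypothetical token-to-index mapping built from a corpus; index 0 is reserved for unknowns."""

    def __init__(self, corpus: Sequence[str]):
        tokens = sorted({ch for s in corpus for ch in s})
        self.index: Dict[str, int] = {tok: i + 1 for i, tok in enumerate(tokens)}

    def __call__(self, tokens: List[str]) -> List[int]:
        return [self.index.get(tok, 0) for tok in tokens]


vocab = Vocabulary(["CO", "H2O"])
encode = Compose([char_tokenize, vocab])
print(encode("HCN"))  # "N" is not in the vocabulary, so it maps to 0
```

In a PyTorch setting, a callable like encode would typically be applied inside a Dataset's __getitem__, so the model and layer code only ever sees integer indices.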
Some concise tenets for development
Write unit tests as you go.
Commit changes, and commit frequently. Write semantic git commits!
Formatting is done with black; don’t fuss about it 😃
For new Python dependencies, use poetry add <package>.
For new environment dependencies, update the environment file with conda env export -f environment.yml.
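"Write unit tests as you go" can look like the sketch below, written in the style of the existing tests/test_*.py files and assuming pytest as the runner (pytest collects functions whose names start with test_). The function under test, pad_or_truncate, is hypothetical; the exception check uses plain try/except so the sketch has no dependencies, whereas a real test file would normally use pytest.raises.

```python
from typing import List, Sequence


def pad_or_truncate(tokens: Sequence[int], length: int, pad: int = 0) -> List[int]:
    """Hypothetical transform: force a token sequence to exactly `length` items."""
    if length < 0:
        raise ValueError("length must be non-negative")
    clipped = list(tokens[:length])
    return clipped + [pad] * (length - len(clipped))


def test_pads_short_sequences():
    assert pad_or_truncate([1, 2], 4) == [1, 2, 0, 0]


def test_truncates_long_sequences():
    assert pad_or_truncate([1, 2, 3, 4, 5], 3) == [1, 2, 3]


def test_output_length_is_fixed():
    # The invariant that matters downstream: every output has the requested length.
    for n in range(6):
        assert len(pad_or_truncate(list(range(n)), 3)) == 3


def test_rejects_negative_length():
    try:
        pad_or_truncate([1], -1)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

Testing invariants (such as output length) alongside specific examples tends to catch regressions that a single hand-picked case would miss.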
Notes on best practices, particularly regarding CI/CD, can be found in the extensive documentation for the Hypermodern Python Cookiecutter repository.
License
Distributed under the terms of the MIT license, Language models for astrochemistry is free and open source software.
Issues
If you encounter any problems, please file an issue along with a detailed description.
Credits
This project was generated from @laserkelvin’s PyTorch Project Cookiecutter, a fork of @cjolowicz’s Hypermodern Python Cookiecutter template.