Galaxies Datasets

Read the documentation at https://galaxies_datasets.readthedocs.io/

Galaxies Datasets is a collection of ready-to-use extragalactic astronomy datasets for use with TensorFlow, JAX, and other machine learning frameworks.

It follows the tensorflow_datasets framework, making it very easy to switch between different datasets. All datasets are exposed as tf.data.Datasets, enabling easy-to-use and high-performance input pipelines.

Usage

Loading a dataset can be as easy as:

from galaxies_datasets import datasets
import tensorflow_datasets as tfds

# Construct a tf.data.Dataset
ds = tfds.load("galaxy_zoo_challenge", split="train")

# Build your input pipeline
ds = ds.shuffle(1000).batch(128).prefetch(10).take(5)

In the example above:

from galaxies_datasets import datasets

registers the collection of galaxy datasets with the tensorflow_datasets package, making them available through its API. And that is it! …Almost.

For more details on tensorflow_datasets check out the documentation.
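Because the datasets go through the regular tensorflow_datasets API, the split-slicing syntax of tfds.load should apply to them as well. The following is a minimal sketch, not part of Galaxies Datasets: percent_split and load_train_val are hypothetical helper names, and it assumes the dataset exposes a "train" split, as in the example above.

```python
def percent_split(name: str, start: int, stop: int) -> str:
    """Return a TFDS slicing string such as 'train[0%:80%]'."""
    if not 0 <= start < stop <= 100:
        raise ValueError("expected 0 <= start < stop <= 100")
    return f"{name}[{start}%:{stop}%]"


def load_train_val(dataset_name: str = "galaxy_zoo_challenge", train_percent: int = 80):
    """Load two disjoint slices of the 'train' split (hypothetical helper).

    Requires tensorflow_datasets and galaxies_datasets to be installed, and
    the dataset to be prepared already.
    """
    # Local imports keep percent_split usable without TensorFlow installed.
    import tensorflow_datasets as tfds
    from galaxies_datasets import datasets  # noqa: F401  (registers the datasets)

    train = tfds.load(dataset_name, split=percent_split("train", 0, train_percent))
    val = tfds.load(dataset_name, split=percent_split("train", train_percent, 100))
    return train, val
```

Calling load_train_val() would then return two tf.data.Dataset objects that can be fed into separate input pipelines.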

Some datasets require that you first manually download data. Check each dataset for instructions.

Datasets

Currently available datasets focus on galaxy morphology.

They include observational data from the Galaxy Zoo project:

  • galaxy_zoo_challenge

  • galaxy_zoo2

  • galaxy_zoo_decals

As well as mock galaxy images from the EAGLE simulation:

  • eagle
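To confirm that these names are visible to tensorflow_datasets after the import, one option is to compare them against tfds.list_builders(). This is a sketch under assumptions: missing_datasets is a hypothetical helper, and it assumes each dataset is registered under exactly the name listed above.

```python
# Names taken from the list above.
GALAXY_DATASETS = ("galaxy_zoo_challenge", "galaxy_zoo2", "galaxy_zoo_decals", "eagle")


def missing_datasets(registered):
    """Return the names from GALAXY_DATASETS that are absent from `registered`."""
    registered = set(registered)
    return [name for name in GALAXY_DATASETS if name not in registered]


def check_registration():
    """Check the builders registered with tensorflow_datasets (needs TensorFlow)."""
    import tensorflow_datasets as tfds
    from galaxies_datasets import datasets  # noqa: F401  (registers the datasets)

    return missing_datasets(tfds.list_builders())
```

An empty list from check_registration() would indicate that all four names are available to tfds.load.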

Installation

You can install Galaxies Datasets via pip from PyPI:

$ pip install galaxies-datasets

Scripts

Galaxies Datasets provides some scripts to download and prepare data. The scripts are available through a command-line interface powered by Typer.

For example, to download images and data from the EAGLE simulation, run:

galaxies_datasets eagle download USER SIMULATION

where USER is your username for the EAGLE public database and SIMULATION is the name of one of the EAGLE simulations.
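If the download needs to be triggered from a Python script rather than a shell, the same command can be invoked with subprocess. This is only a sketch: eagle_download_command and download_eagle are hypothetical names, and it assumes the galaxies_datasets executable is on your PATH.

```python
import subprocess


def eagle_download_command(user: str, simulation: str) -> list:
    """Build the argument list for the download command shown above."""
    return ["galaxies_datasets", "eagle", "download", user, simulation]


def download_eagle(user: str, simulation: str) -> None:
    """Run the download; raises CalledProcessError on a non-zero exit code."""
    # Standard input is inherited, so any prompt from the command appears
    # in the calling terminal.
    subprocess.run(eagle_download_command(user, simulation), check=True)
```

Passing the arguments as a list avoids shell quoting issues with user names or simulation names.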

For all available commands, check the Command-line Interface reference, or run:

galaxies_datasets --help

The command-line interface also supports automatic completion on all operating systems and in the Bash, Zsh, Fish, and PowerShell shells, so you can press TAB to get the available options or subcommands.

To install automatic completion in Bash, run:

galaxies_datasets --install-completion bash

Citation

If you use this software, please cite it as below, in addition to any citations specific to the datasets you use.

@software{lucas_bignone_2021_5521451,
    author       = {Lucas Bignone},
    title        = {Galaxies Datasets},
    month        = sep,
    year         = 2021,
    publisher    = {Zenodo},
    version      = {v0.1.1},
    doi          = {10.5281/zenodo.5521450},
    url          = {https://doi.org/10.5281/zenodo.5521450}
}

Contributing

Contributions are very welcome. To learn more, see the Contributor Guide.

License

Distributed under the terms of the MIT license, Galaxies Datasets is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Disclaimer

This is a utility library that downloads and prepares datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset’s license.

If you’re a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!

Credits

This project was generated from @cjolowicz’s Hypermodern Python Cookiecutter template.

Icons made by Freepik from www.flaticon.com
