A helper package to download example datasets used in various publications and deep-learning algorithms, including data featured in N2V, P(P)N2V, DivNoising, HDN, EmbedSeg, etc.

CAREamics Portfolio

A helper package based on pooch allowing downloading various example datasets used in publications by the Jug lab, including data featured in N2V, P(P)N2V, DivNoising, HDN, EmbedSeg, etc.

The complete list of datasets can be found in datasets.json.

CAREamics-portfolio tooling was generated using pydev-guide/pyrepo-copier.

Installation

To install the portfolio in your conda environment, simply use pip:

$ pip install careamics-portfolio

Usage

Follow the example notebook for details on how to use the package.

The portfolio can be instantiated as follows:

from careamics_portfolio import PortfolioManager

portfolio = PortfolioManager()

You can explore the different datasets easily:

print(portfolio)
print(portfolio.denoising)
print(portfolio.denoising.N2V_SEM)

Finally, you can download the dataset of your choice:

from pathlib import Path

data_path = Path('data')

# to the path of your choice
portfolio.denoising.N2V_SEM.download(data_path)

# or to your system's cache
portfolio.denoising.N2V_SEM.download()

By default, if you do not pass a path to the download() method, all datasets are saved in your system's cache. Subsequent calls to download() will not download the files again (thanks, pooch!).

Important: if you download all datasets of interest to the same path, pooch will maintain a registry of files and you will not have to download them again!
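The caching behaviour described above can be pictured with a small standard-library sketch. This is not pooch's actual implementation; the names `sha256_of` and `fetch_once` and the `download` callback are hypothetical and only illustrate the idea of reusing a file whose hash already matches.

```python
import hashlib
from pathlib import Path
from typing import Callable


def sha256_of(path: Path) -> str:
    """Return the SHA256 hex digest of a file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def fetch_once(target: Path, known_hash: str, download: Callable[[Path], None]) -> bool:
    """Download `target` only if it is missing or its hash does not match.

    Returns True if a download happened, False if the cached file was reused.
    `download` is a hypothetical callback that writes the file to `target`.
    """
    if target.exists() and sha256_of(target) == known_hash:
        return False  # cached copy is intact, nothing to do
    target.parent.mkdir(parents=True, exist_ok=True)
    download(target)
    if sha256_of(target) != known_hash:
        raise ValueError(f"Hash mismatch for {target}")
    return True
```

Calling `fetch_once` twice with the same target and hash triggers the callback only the first time, which is the property that makes repeated download() calls cheap.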

Add a dataset to the portfolio

There are a few steps to follow in order to add a new dataset to the repository:

1 - Create a PortfolioEntry child class

2 - Instantiate the portfolio entry in an IterablePortfolio

3 - Update registry.txt

4 - Make sure all tests pass

Note: To run the tests, you will need to have pytest installed. You can install careamics-portfolio together with its test dependencies by running:

pip install "careamics-portfolio[test]"

1 - Create a portfolio entry

To add a dataset, subclass PortfolioEntry and provide the following information (preferably in one of the existing category modules, e.g. denoising_datasets.py):

class MyDataset(PortfolioEntry):
    def __init__(self) -> None:
        super().__init__(
            portfolio="Denoising", # for instance
            name="MyDataset",
            url="https://url.to.myfile/MyFile.zip",
            file_name="MyFile.zip",
            hash="953a815333805a423b7342971289h10121263917019bd16cc3341", # sha256
            description="Description of the dataset.",
            license="CC-BY 3.0",
            citation="Citation of the dataset",
            files={
                "/folder/in/the/zip": ["file1.tif", "file2.tif"], # folder can be "."
            },
            size=13.0, # size in MB
            tags=["tag1", "tag2"],
            is_zip=True,
        )
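The hash value in the example above is a placeholder rather than a real digest: a SHA256 hex digest is always 64 hexadecimal characters. Before committing a new entry, a quick sanity check along the following lines can catch copy-paste mistakes. The helper `looks_like_sha256` is hypothetical and not part of careamics-portfolio.

```python
import string


def looks_like_sha256(value: str) -> bool:
    """Return True if `value` is 64 hexadecimal characters (a SHA256 hex digest)."""
    return len(value) == 64 and all(c in string.hexdigits for c in value)


# The placeholder from the example entry fails the check
# (it is too short and contains a non-hex character).
print(looks_like_sha256("953a815333805a423b7342971289h10121263917019bd16cc3341"))
```

This only checks the format of the string; it does not confirm that the digest matches the file behind the URL.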

To obtain the SHA256 hash of your file, you can run the following code and read the hash from the message logged by pooch:

import pooch

url = "https://url.to.myfile/MyFile.zip"
pooch.retrieve(url, known_hash=None)
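If the file is already on disk, the hash can also be computed locally without going through a download. The following is a minimal sketch using only the standard library; the helper name `sha256_of_file` and the chunk size are assumptions made for illustration. pooch also provides `pooch.file_hash` for the same purpose.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA256 hex digest of a file, reading it in chunks.

    Reading in chunks keeps memory usage low for large archives.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `sha256_of_file(Path("MyFile.zip"))` would return the string to paste into the hash field of the entry, assuming MyFile.zip is in the current directory.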

Likewise, to get the size in MB of your file:

import os

file_path = "MyFile.zip"  # path to the downloaded file
size_mb = os.path.getsize(file_path) / 1024 / 1024  # bytes -> MB
print(size_mb)

2 - Add the entry to a portfolio

Add the new entry class to one of the categories (e.g. denoising) in portfolio.py:

class Denoising(IterablePortfolio):
    def __init__(self) -> None:
        self._N2V_BSD68 = N2V_BSD68()
        self._N2V_SEM = N2V_SEM()
        self._N2V_RGB = N2V_RGB()
        self._flywing = Flywing()

        # add your dataset as a private attribute
        self._myDataset = MyDataset()

        [...]

    # and add a public getter
    @property
    def MyDataset(self) -> MyDataset:
        return self._myDataset
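The private-attribute-plus-getter pattern used above makes each entry readable but not replaceable from the outside. The sketch below shows that behaviour in isolation; `Entry` and `Category` are hypothetical stand-ins, not the real PortfolioEntry and IterablePortfolio classes.

```python
class Entry:
    """Stand-in for a PortfolioEntry subclass (hypothetical, for illustration)."""

    def __init__(self, name: str) -> None:
        self.name = name


class Category:
    """Stand-in for an IterablePortfolio subclass (hypothetical)."""

    def __init__(self) -> None:
        # the entry is stored as a private attribute
        self._my_dataset = Entry("MyDataset")

    # a property without a setter exposes the entry read-only
    @property
    def MyDataset(self) -> Entry:
        return self._my_dataset


category = Category()
print(category.MyDataset.name)

try:
    category.MyDataset = Entry("Other")  # no setter defined
except AttributeError:
    print("MyDataset is read-only")
```

Because the property has no setter, accidental reassignment raises an AttributeError instead of silently replacing the entry.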

3 - Update registry

Finally, update the registry by running the following Python script:

python scripts/update_registry.py

or run:

from careamics_portfolio import update_registry
update_registry()
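For context, a pooch registry file lists one file per line as a file name followed by its hash, separated by a space. The sketch below only illustrates that format; it does not reproduce what update_registry does, and the helper `registry_lines` and the sample values are hypothetical.

```python
def registry_lines(entries: dict[str, str]) -> list[str]:
    """Format {file_name: hash} pairs as pooch-style registry lines.

    Each line is "<file_name> <hash>", sorted by file name for stable diffs.
    """
    return [f"{name} {digest}" for name, digest in sorted(entries.items())]


# Hypothetical example values, not real dataset hashes
for line in registry_lines({"MyFile.zip": "0" * 64}):
    print(line)
```

Sorting the lines is a design choice made here so that regenerating the file produces minimal changes under version control.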

The datasets.json file is updated using:

python scripts/update_json.py

4 - Verify that all tests pass

Verify that all tests pass (this can take a while):

pytest
