CAREamics Portfolio
A helper package based on pooch for downloading example datasets used in publications by the Jug lab, including data featured in N2V, P(P)N2V, DivNoising, HDN, EmbedSeg, etc.
The complete list of datasets can be found in datasets.json.
CAREamics-portfolio tooling was generated using pydev-guide/pyrepo-copier.
Installation
To install the portfolio in your conda environment, use pip:
$ pip install careamics-portfolio
Usage
Follow the example notebook for details on how to use the package.
The portfolio can be instantiated as follows:
from careamics_portfolio import PortfolioManager
portfolio = PortfolioManager()
You can explore the different datasets easily:
print(portfolio)
print(portfolio.denoising)
print(portfolio.denoising.N2V_SEM)
Finally, you can download the dataset of your choice:
from pathlib import Path
data_path = Path('data')
# to the path of your choice
portfolio.denoising.N2V_SEM.download(data_path)
# or to your system's cache
portfolio.denoising.N2V_SEM.download()
By default, if you do not pass a path to the download() method, all datasets are saved in your system's cache. Subsequent calls to download() will not download the files again (thanks pooch!!).
Important: if you download all datasets of interest using the same path, pooch will maintain a registry of files and you will not have to download them again!
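The "download once, reuse afterwards" behaviour described above can be illustrated with a minimal standard-library sketch. This is not pooch's implementation and not part of careamics-portfolio: the function names fetch_once and file_sha256, and the download callable, are hypothetical and exist only to show the idea of checking a local copy against a known hash before downloading.

```python
import hashlib
from pathlib import Path
from typing import Callable


def file_sha256(path: Path) -> str:
    """Return the SHA256 hex digest of a local file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def fetch_once(
    target: Path, known_hash: str, download: Callable[[Path], None]
) -> bool:
    """Download `target` only if no valid local copy exists.

    Returns True if `download` was called, False if the cached copy was reused.
    `download` is a hypothetical callable that writes the file to `target`.
    """
    # Reuse the local copy when it exists and matches the expected hash.
    if target.exists() and file_sha256(target) == known_hash:
        return False

    target.parent.mkdir(parents=True, exist_ok=True)
    download(target)

    # Refuse to keep silently corrupted or unexpected content.
    if file_sha256(target) != known_hash:
        raise ValueError(f"Hash mismatch for {target}")
    return True
```

Calling fetch_once twice with the same target and hash triggers the download only the first time, which is why pointing every download at the same path avoids repeated transfers.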
Add a dataset to the portfolio
There are a few steps to follow in order to add a new dataset to the repository:
✅ 1 - Create a PortfolioEntry child class
✅ 2 - Instantiate the portfolio entry in an IterablePortfolio
✅ 3 - Update registry.txt
✅ 4 - Make sure all tests pass
Note: To run the tests, you will need to have pytest installed. You can create an environment with careamics-portfolio and pytest by running: pip install "careamics-portfolio[test]"
1 - Create a portfolio entry
To add a dataset, subclass PortfolioEntry and enter the following information (preferably in one of the current categories, e.g. denoising_datasets.py):
class MyDataset(PortfolioEntry):
    def __init__(self) -> None:
        super().__init__(
            portfolio="Denoising",  # for instance
            name="MyDataset",
            url="https://url.to.myfile/MyFile.zip",
            file_name="MyFile.zip",
            hash="953a815333805a423b7342971289h10121263917019bd16cc3341",  # sha256
            description="Description of the dataset.",
            license="CC-BY 3.0",
            citation="Citation of the dataset",
            files={
                "/folder/in/the/zip": ["file1.tif", "file2.tif"],  # folder can be "."
            },
            size=13.0,  # size in MB
            tags=["tag1", "tag2"],
            is_zip=True,
        )
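The hash value in the template above is only a placeholder: a real SHA256 hex digest has exactly 64 hexadecimal characters. As a quick sanity check before committing an entry, something like the following could be used. The helper name is_sha256_hex is hypothetical and not part of careamics-portfolio.

```python
import string


def is_sha256_hex(value: str) -> bool:
    """Return True if `value` looks like a SHA256 hex digest.

    This only checks the shape (64 hexadecimal characters); it does not
    verify that the digest matches any particular file.
    """
    return len(value) == 64 and all(c in string.hexdigits for c in value)
```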
To obtain the SHA256 hash of your file, you can run the following code and read the hash from the pooch output:
import pooch
url = "https://url.to.myfile/MyFile.zip"
pooch.retrieve(url, known_hash=None)
Likewise, to get the size in MB of your file:
import os
file_path = "path/to/MyFile.zip"  # placeholder: path to your local copy of the file
os.path.getsize(file_path) / 1024 / 1024
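If the file is already on disk, both values can also be computed in one pass with the standard library alone. The sketch below is an alternative to the pooch call, not something the package provides; the function name sha256_and_size_mb and the chunk size are illustrative assumptions.

```python
import hashlib
from pathlib import Path


def sha256_and_size_mb(file_path, chunk_size: int = 1 << 20) -> tuple[str, float]:
    """Return the SHA256 hex digest and the size in MB of a local file."""
    file_path = Path(file_path)
    digest = hashlib.sha256()
    with open(file_path, "rb") as f:
        # Read in chunks so that large archives do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    # Same convention as above: bytes / 1024 / 1024.
    size_mb = file_path.stat().st_size / 1024 / 1024
    return digest.hexdigest(), size_mb
```

The returned digest can be pasted into the hash field and the size into the size field of the entry.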
2 - Add the entry to a portfolio
Add the file class to one of the categories (e.g. denoising) in portfolio.py:
class Denoising(IterablePortfolio):
    def __init__(self) -> None:
        self._N2V_BSD68 = N2V_BSD68()
        self._N2V_SEM = N2V_SEM()
        self._N2V_RGB = N2V_RGB()
        self._flywing = Flywing()
        # add your dataset as a private attribute
        self._myDataset = MyDataset()
        [...]

    # and add a public getter
    @property
    def MyDataset(self) -> MyDataset:
        return self._myDataset
3 - Update registry
Finally, update the registry by running the following Python script:
python scripts/update_registry.py
or run:
from careamics_portfolio import update_registry
update_registry()
The datasets.json file is updated using:
python scripts/update_json.py
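For orientation, a pooch registry file is plain text with one line per file, giving the file name followed by its hash (and optionally a URL). The sketch below writes and reads such a file, assuming registry.txt follows that format; the helpers write_registry and read_registry are hypothetical and do not reproduce what scripts/update_registry.py actually does.

```python
from pathlib import Path


def write_registry(entries: dict[str, str], registry_path: Path) -> None:
    """Write one `file_name hash` line per entry, sorted by file name."""
    lines = [f"{name} {file_hash}" for name, file_hash in sorted(entries.items())]
    registry_path.write_text("\n".join(lines) + "\n")


def read_registry(registry_path: Path) -> dict[str, str]:
    """Parse `file_name hash` lines, skipping blank lines and # comments."""
    entries = {}
    for line in registry_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        # parts[2], if present, would be an optional URL; it is ignored here.
        entries[parts[0]] = parts[1]
    return entries
```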
4 - Verify that all tests pass
Verify that all tests pass (this can take a while):
pytest