Dataset API
Dataset API structure
dataset_api
├── conda
│   └── recipes
│       ├── py38_recipe
│       └── py39_recipe
├── src
│   └── dataset_librarian
│       ├── dataset_api
│       ├── scripts
│       ├── __init__.py
│       ├── dataset.py
│       └── datasets_urls.json
├── MANIFEST.in
├── README.md
├── pyproject.toml
└── requirements.txt
Environment setup
Clone the Model Zoo for Intel® Architecture repository and navigate to the dataset_api
directory.
# Step 1 (recommended): Create and activate a virtual environment
## Option 1: Using virtualenv
virtualenv -p python3 venv
. venv/bin/activate
## Option 2: Using conda
conda create -n venv python=<3.8 or 3.9> -c conda-forge
conda activate venv
# Step 2: Installing package
## Option 1: Installing from source code
cd models/datasets/dataset_api
python -m pip install --upgrade pip build setuptools wheel
python -m pip install .
## Option 2: Installing from PyPI
python -m pip install dataset-librarian
The package is published on PyPI under the name dataset-librarian.
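After either install option, one quick way to confirm that the package is visible to the active environment is to query its distribution metadata. The sketch below uses only the Python standard library; the helper name `installed_version` is hypothetical and is not part of dataset-librarian.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("dataset-librarian")
    if found is None:
        print("dataset-librarian is not installed in this environment")
    else:
        print(f"dataset-librarian {found} is installed")
```

Running this inside the activated virtual environment checks that environment specifically, which helps when several Python installations coexist on one machine.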
Datasets
Dataset name | Description | Download | Preprocessing | Command |
---|---|---|---|---|
brca | Breast cancer dataset that contains categorized contrast-enhanced mammography data and radiologists' notes. | supported | Prerequisite: use a browser to download the Low Energy and Subtracted images, then provide the path to the directory that contains the downloaded images using the --directory argument. | python -m dataset_librarian.dataset -n brca --download --preprocess -d <path to the dataset directory> |
tabformer | Credit card data for TabFormer. | supported | not supported | python -m dataset_librarian.dataset -n tabformer --download |
dureader-vis | DuReader-vis for document automation. A Chinese Open-domain Document Visual Question Answering (Open-Domain DocVQA) dataset containing about 15K question-answering pairs and 158K document images from the Baidu search engine. | supported | not supported | python -m dataset_librarian.dataset -n dureader-vis --download |
msmarco | MS MARCO is a collection of datasets focused on deep learning in search. | supported | not supported | python -m dataset_librarian.dataset -n msmarco --download |
mvtec-ad | MVTec Anomaly Detection dataset for industrial inspection. It contains over 5000 high-resolution images divided into fifteen different object and texture categories. | supported | supported | python -m dataset_librarian.dataset -n mvtec-ad --download --preprocess -d <path to the dataset directory> |
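The support matrix in the table can also be kept as plain data, so a script can check whether an action applies before invoking the tool. This is a minimal sketch transcribed by hand from the table; the `SUPPORT` mapping and the `supports` and `preprocessable` helpers are hypothetical names, not part of dataset-librarian, and `brca` is marked as preprocessable on the assumption that its browser-download prerequisite has been met.

```python
from typing import Dict, List

# Download/preprocess support per dataset, transcribed from the table above.
# Assumption: brca counts as preprocessable once its prerequisite is satisfied.
SUPPORT: Dict[str, Dict[str, bool]] = {
    "brca": {"download": True, "preprocess": True},
    "tabformer": {"download": True, "preprocess": False},
    "dureader-vis": {"download": True, "preprocess": False},
    "msmarco": {"download": True, "preprocess": False},
    "mvtec-ad": {"download": True, "preprocess": True},
}


def supports(name: str, action: str) -> bool:
    """Report whether `action` ('download' or 'preprocess') is listed for `name`."""
    if name not in SUPPORT:
        raise KeyError(f"unknown dataset: {name!r}")
    if action not in ("download", "preprocess"):
        raise ValueError(f"unknown action: {action!r}")
    return SUPPORT[name][action]


def preprocessable() -> List[str]:
    """Return the dataset names that list preprocessing, in sorted order."""
    return sorted(name for name, flags in SUPPORT.items() if flags["preprocess"])


if __name__ == "__main__":
    print(preprocessable())
```

A hand-maintained mapping like this can drift from the package; the tool's own `--list` output remains the authoritative source for which datasets exist.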
Command-line Interface
Input argument | Description |
---|---|
--list (-l) | List the supported datasets. |
--name (-n) | Dataset name. |
--directory (-d) | Directory where the raw dataset will be saved on your system. The preprocessed dataset files are also written here. If not set, a directory with the dataset name will be created. |
--download | Download the specified dataset. |
--preprocess | Preprocess the dataset, if supported. |
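When the command-line interface is driven from another Python script, it can help to assemble the argument list in one place. The following is a sketch only: `build_command` is a hypothetical helper, and it uses nothing beyond the flags listed in the table and the `python -m dataset_librarian.dataset` entry point shown earlier.

```python
import sys
from typing import List, Optional


def build_command(
    name: Optional[str] = None,
    directory: Optional[str] = None,
    download: bool = False,
    preprocess: bool = False,
    list_datasets: bool = False,
) -> List[str]:
    """Assemble an argument list for the dataset_librarian.dataset module."""
    cmd = [sys.executable, "-m", "dataset_librarian.dataset"]
    if list_datasets:
        cmd.append("--list")
    if name:
        cmd += ["--name", name]
    if directory:
        cmd += ["--directory", directory]
    if download:
        cmd.append("--download")
    if preprocess:
        cmd.append("--preprocess")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(build_command(name="mvtec-ad", directory="data/mvtec-ad",
                        download=True, preprocess=True))
```

With the package installed, the returned list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a single shell string avoids quoting problems when the directory path contains spaces.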
Python API
from dataset_librarian.dataset_api.download import download_dataset
from dataset_librarian.dataset_api.preprocess import preprocess_dataset

# Replace the placeholder with the path to the raw dataset directory
dataset_dir = "<path to the raw dataset directory>"

# Download the dataset
download_dataset('brca', dataset_dir)
# Preprocess the dataset
preprocess_dataset('brca', dataset_dir)
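The two calls can be chained in a small helper that downloads first and then optionally preprocesses. To keep this sketch runnable without the package installed, the helper takes the two functions as parameters; `fetch_and_prepare` is a hypothetical name, and the only assumption about the library is the `(name, directory)` call shape shown in the Python API example.

```python
from typing import Callable, List, Tuple


def fetch_and_prepare(
    name: str,
    directory: str,
    download: Callable[[str, str], object],
    preprocess: Callable[[str, str], object],
    run_preprocess: bool = True,
) -> None:
    """Download a dataset into `directory`, then preprocess it in place if requested."""
    download(name, directory)
    if run_preprocess:
        preprocess(name, directory)


if __name__ == "__main__":
    # Stand-in callables that only record what would have been called.
    calls: List[Tuple[str, str, str]] = []
    fetch_and_prepare(
        "brca",
        "data/brca",
        download=lambda n, d: calls.append(("download", n, d)),
        preprocess=lambda n, d: calls.append(("preprocess", n, d)),
    )
    print(calls)
```

With the package installed, pass the imported `download_dataset` and `preprocess_dataset` as the two callables, and set `run_preprocess=False` for datasets whose preprocessing is listed as not supported.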