Package Placeholder
Reason this release was yanked: testing
Dataset API
Dataset API structure
models/datasets/dataset_api/
├── dataset.py
├── dataset_urls.json
├── README.md
├── scripts
├── setup.sh
└── dataset_api/
    ├── __init__.py
    ├── download.py
    └── preprocess.py
Environment setup
Clone the Model Zoo for Intel® Architecture repository and navigate to the dataset_api
directory.
# Optional: create and activate a virtual environment
virtualenv -p python3 venv
. venv/bin/activate
cd models/datasets/dataset_api
# Install dependencies; some of them might require root privileges.
./setup.sh
Datasets
Dataset name | Description | Download | Preprocessing | Command |
---|---|---|---|---|
brca | Breast Cancer dataset containing categorized contrast-enhanced mammography data and radiologists' notes. | supported | Prerequisite: use a browser to download the Low Energy and Subtracted images, then pass the path to the directory containing them with the --directory argument. | python dataset.py -n brca --download --preprocess -d <path to the dataset directory> |
tabformer | Credit card data for TabFormer. | supported | not supported | python dataset.py -n tabformer --download |
dureader-vis | DuReader-vis for document automation. A Chinese Open-domain Document Visual Question Answering (Open-Domain DocVQA) dataset containing about 15K question-answer pairs and 158K document images from the Baidu search engine. | supported | not supported | python dataset.py -n dureader-vis --download |
msmarco | MS MARCO is a collection of datasets focused on deep learning in search. | supported | not supported | python dataset.py -n msmarco --download |
mvtec-ad | MVTec Anomaly Detection dataset for industrial inspection. It contains over 5000 high-resolution images divided into fifteen object and texture categories. | supported | supported | python dataset.py -n mvtec-ad --download --preprocess -d <path to the dataset directory> |
Command-line Interface
Input argument | Description |
---|---|
--list (-l) | List the supported datasets. |
--name (-n) | Dataset name. |
--directory (-d) | Directory where the raw dataset is saved and where the preprocessed dataset files are written. If not set, a directory named after the dataset is created. |
--download | Download the specified dataset. |
--preprocess | Preprocess the dataset, if supported. |
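To make the flag semantics above concrete, here is a minimal sketch of an argument parser that accepts the same options. It is not the actual dataset.py; `build_parser` and `resolve_directory` are hypothetical names, and the fallback directory (a relative path equal to the dataset name) is an assumption drawn from the --directory description.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the real dataset.py)."""
    parser = argparse.ArgumentParser(prog="dataset.py")
    parser.add_argument("-l", "--list", action="store_true",
                        help="list the supported datasets")
    parser.add_argument("-n", "--name", help="dataset name")
    parser.add_argument("-d", "--directory", default=None,
                        help="where raw and preprocessed files are written")
    parser.add_argument("--download", action="store_true",
                        help="download the dataset specified")
    parser.add_argument("--preprocess", action="store_true",
                        help="preprocess the dataset if supported")
    return parser


def resolve_directory(name: str, directory: Optional[str]) -> str:
    # Assumed fallback: use a directory named after the dataset when -d is omitted.
    return directory if directory else name


# Parse an example command line instead of sys.argv so the sketch is self-contained.
args = build_parser().parse_args(["-n", "mvtec-ad", "--download", "--preprocess"])
target_dir = resolve_directory(args.name, args.directory)
print(args.name, args.download, args.preprocess, target_dir)
```

Because --download and --preprocess are independent switches, they can be combined in one invocation, as in the brca and mvtec-ad commands listed under Datasets.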
Python API
from dataset_api.download import download_dataset
from dataset_api.preprocess import preprocess_dataset

# Replace with the path to the raw dataset directory
raw_dir = "<path to the raw dataset directory>"

# Download the dataset
download_dataset('brca', raw_dir)
# Preprocess the dataset
preprocess_dataset('brca', raw_dir)
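When several datasets are needed, the two calls can be wrapped in a small loop. The sketch below is an illustration only: `fetch_datasets` and `PREPROCESS_SUPPORTED` are hypothetical names, the set of datasets with preprocessing support is copied from the Datasets table, and the download and preprocess functions are passed in as parameters (assumed to take a name and a directory, as in the example above) so the sketch runs without the package installed.

```python
import os
from typing import Callable, Dict, Iterable, List

# Preprocessing support as listed in the Datasets table (assumption: the table is current).
PREPROCESS_SUPPORTED = {"brca", "mvtec-ad"}


def fetch_datasets(
    names: Iterable[str],
    base_dir: str,
    download: Callable[[str, str], None],
    preprocess: Callable[[str, str], None],
) -> Dict[str, List[str]]:
    """Download each dataset into base_dir/<name>, preprocessing where supported."""
    steps: Dict[str, List[str]] = {}
    for name in names:
        target = os.path.join(base_dir, name)
        download(name, target)
        steps[name] = ["download"]
        if name in PREPROCESS_SUPPORTED:
            preprocess(name, target)
            steps[name].append("preprocess")
    return steps


# Stand-in callables that only record the calls they receive.
calls = []
result = fetch_datasets(
    ["tabformer", "mvtec-ad"],
    "datasets",
    download=lambda name, directory: calls.append(("download", name, directory)),
    preprocess=lambda name, directory: calls.append(("preprocess", name, directory)),
)
print(result)
```

With the package installed, `download_dataset` and `preprocess_dataset` would be passed in place of the two lambdas. Note that brca preprocessing additionally expects the manually downloaded images to be present in the target directory.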
Hashes for dataset-librarian-0.0.0.dev0.tar.gz
Algorithm | Hash digest |
---|---|
SHA256 | 85a9cdf7256855f4dfc1ae5998560ab8e034a85b3755833ce85c71e617bda681 |
MD5 | 60e3cc96e891a445c8aed11ea2291639 |
BLAKE2b-256 | 82d4aa826bf3893cb2cb904354f1f39e40f52dde07eca76b9e30223fcea172dc |

Hashes for dataset_librarian-0.0.0.dev0-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 9bbe157f257bd21fde1ce3e92f72c59c7eed83ec619ef4664b4f66d6b86bbbfa |
MD5 | dce0513ec1ac6bd9d5e7e6efb10f2941 |
BLAKE2b-256 | ff317cabd6fb8fa5d3dd9339f618d154d5650d44ecf667f5be020b81b5bb0633 |
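A downloaded distribution file can be checked against the digests listed above using Python's standard hashlib module. The following is a minimal sketch; `file_digest` and `matches` are hypothetical helper names, and "blake2b-256" is a label chosen here for BLAKE2b with a 32-byte digest. The demonstration runs on a throwaway temporary file rather than on the real archive.

```python
import hashlib
import os
import tempfile


def file_digest(path: str, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, reading it in 64 KiB chunks."""
    if algorithm == "blake2b-256":
        # BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest.
        hasher = hashlib.blake2b(digest_size=32)
    else:
        hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare a file's digest with an expected hex string, ignoring case."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


# Demonstration on a temporary file with known contents.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name
demo_ok = matches(demo_path, hashlib.sha256(b"hello").hexdigest())
demo_blake_len = len(file_digest(demo_path, "blake2b-256"))
os.remove(demo_path)
print(demo_ok, demo_blake_len)
```

To check a real download, call `matches` with the path to the file and the SHA256 value from the corresponding table.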