smartdoc15-ch1

A Python wrapper for the "computable" version of the SmartDoc 2015 - Challenge 1 dataset.

The source for this project is available here.

The SmartDoc 2015 Challenge 1 dataset was originally created for the SmartDoc 2015 competition, which focused on evaluating document image acquisition methods using smartphones. Challenge 1, in particular, consisted of detecting and segmenting document regions in video frames extracted from the preview stream of a smartphone.

This dataset was repackaged in a new format, and a Python wrapper (this package) was created to make it easier to use.

Dataset version

The version of the dataset used by this wrapper is: 2.0.0.

The source for the dataset is here: https://github.com/jchazalon/smartdoc15-ch1-dataset

Sample usage

This Python package helps you test your methods against 3 tasks. First, install the package:

pip install smartdoc15-ch1

It is good practice to install such a package in a virtual environment. We recommend Virtualenv Wrapper for managing virtual environments.

Task 1: Segmentation

Segmentation: this is the original task.

Inputs are video frames, and the expected output is composed of the coordinates of the four corners of the document image in each frame (top left, bottom left, bottom right and top right). The evaluation is performed by computing the intersection over union (“IoU”, also known as the “Jaccard index”) of the expected document region and the found region. The tricky part is that the coordinates are projected into the document's frame of reference, which allows comparisons between different frames and different document models. The original evaluation code is available at https://github.com/jchazalon/smartdoc15-ch1-eval, and the Python wrapper also contains an implementation using the new data format.

read dataset
[opt. read models]
[opt. train/test split + train]
test
eval
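To make the IoU criterion concrete, here is a minimal sketch in plain Python. It is not the package's evaluation code: it assumes both regions are convex quadrilaterals already expressed in the same coordinate system, so it skips the projection step described above. The names quad_iou, _clip, _oriented and _signed_area are hypothetical helpers introduced for illustration.

```python
def _signed_area(poly):
    """Shoelace formula; the sign depends on the vertex order."""
    return 0.5 * sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))


def _oriented(quad):
    """Return the vertices as a list of tuples with non-negative signed area."""
    poly = [tuple(p) for p in quad]
    return poly if _signed_area(poly) >= 0 else poly[::-1]


def _clip(subject, clip):
    """Sutherland-Hodgman clipping of `subject` by a convex `clip` polygon.

    `clip` must have non-negative signed area (see `_oriented`).
    """
    output = list(subject)
    for a, b in zip(clip, clip[1:] + clip[:1]):
        if not output:
            break
        inputs, output = output, []

        def side(p):
            # Positive when p lies on the inner side of the edge a -> b.
            return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

        for p, q in zip(inputs, inputs[1:] + inputs[:1]):
            sp, sq = side(p), side(q)
            if sp >= 0:
                output.append(p)
            if (sp > 0 and sq < 0) or (sp < 0 and sq > 0):
                # The segment p -> q crosses the edge: keep the crossing point.
                t = sp / (sp - sq)
                output.append((p[0] + t * (q[0] - p[0]),
                               p[1] + t * (q[1] - p[1])))
    return output


def quad_iou(found, target):
    """Intersection over union (Jaccard index) of two convex quadrilaterals."""
    a, b = _oriented(found), _oriented(target)
    inter = abs(_signed_area(_clip(a, b)))
    union = abs(_signed_area(a)) + abs(_signed_area(b)) - inter
    return inter / union if union > 0 else 0.0


# Corners in the order used by the dataset: top left, bottom left,
# bottom right, top right (image coordinates, y pointing down).
target = [(0, 0), (0, 100), (100, 100), (100, 0)]
found = [(50, 0), (50, 100), (150, 100), (150, 0)]
print(round(quad_iou(found, target), 3))  # 0.333
```

Normalizing the vertex order first means the same function works whether the corners are listed clockwise or counter-clockwise.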

Task 2: Model classification

Model classification: this is a new task.

Inputs are video frames, and the expected output is the identifier of the document model represented in each frame. There are 30 models named “datasheet001”, “datasheet002”, …, “tax005”. The evaluation is performed as for any multi-class classification task.

read dataset
[opt. read models]
[opt. train/test split + train]
test
eval
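As an illustration of what a multi-class evaluation involves, the sketch below computes accuracy and confusion counts from two label sequences. It is a generic example in plain Python, not the wrapper's eval_sd15ch1_classifications implementation; the function names accuracy and confusion_counts and the sample labels are made up for the example.

```python
from collections import Counter


def accuracy(predicted, expected):
    """Fraction of frames whose predicted label equals the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot evaluate an empty label sequence")
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return correct / len(expected)


def confusion_counts(predicted, expected):
    """Count how often each (expected, predicted) label pair occurs."""
    return Counter(zip(expected, predicted))


# Illustrative labels for four frames (not real results).
expected = ["datasheet001", "datasheet002", "tax005", "tax005"]
predicted = ["datasheet001", "tax005", "tax005", "tax005"]

print(accuracy(predicted, expected))  # 0.75
print(confusion_counts(predicted, expected)[("datasheet002", "tax005")])  # 1
```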

Task 3: Model type classification

Model type classification: this is a new task.

Inputs are video frames, and the expected output is the identifier of the document model type represented in each frame. There are 6 model types, each having 5 members, named “datasheet”, “letter”, “magazine”, “paper”, “patent” and “tax”. The evaluation is performed as for any multi-class classification task.

read dataset
[opt. read models]
[opt. train/test split + train]
test
eval
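The relationship between the 30 model names and the 6 model types can be sketched as follows. This assumes every model name is its type followed by a three-digit index from 001 to 005, which matches the names quoted above but is not confirmed for the whole dataset; MODEL_TYPES, MODEL_NAMES and model_type_of are hypothetical names, not part of the wrapper's API.

```python
import re

MODEL_TYPES = ["datasheet", "letter", "magazine", "paper", "patent", "tax"]

# Assumption: names follow "<type>NNN" with NNN running from 001 to 005.
MODEL_NAMES = ["%s%03d" % (model_type, index)
               for model_type in MODEL_TYPES
               for index in range(1, 6)]


def model_type_of(model_name):
    """Strip the three-digit suffix from a model name to get its type."""
    match = re.fullmatch(r"([a-z]+)(\d{3})", model_name)
    if match is None or match.group(1) not in MODEL_TYPES:
        raise ValueError("unexpected model name: %r" % (model_name,))
    return match.group(1)


print(len(MODEL_NAMES))              # 30
print(model_type_of("magazine003"))  # magazine
```

With such a mapping, predictions made for task 2 can be reduced to task 3 labels, although a dedicated model type classifier may of course behave differently.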

Optional: Using model images

Manual download option

If you are behind a proxy, have a slow connection, or for any other reason prefer not to let the Python wrapper download the dataset for you, you can download it manually. This is simple:

download the frames.tar.gz and models.tar.gz files from https://github.com/jchazalon/smartdoc15-ch1-dataset/releases to some local directory;
choose where you want to store the files and manually create the file hierarchy (the smartdoc_ch1_home intermediate directory is important here):

mkdir -p PATH_TO_STORAGE_DIR/smartdoc_ch1_home/frames
mkdir -p PATH_TO_STORAGE_DIR/smartdoc_ch1_home/models

extract the archives to their target directories:

tar -xzf PATH_TO_FRAMES.TAR.GZ -C PATH_TO_STORAGE_DIR/smartdoc_ch1_home/frames
tar -xzf PATH_TO_MODELS.TAR.GZ -C PATH_TO_STORAGE_DIR/smartdoc_ch1_home/models

Then, make sure you specify data_home=PATH_TO_STORAGE_DIR and download_if_missing=False when you call the load_sd15ch1_frames and load_sd15ch1_models functions. The functions get_sd15ch1_basedir_frames and get_sd15ch1_basedir_models also require that you specify data_home=PATH_TO_STORAGE_DIR.

By default, the local dataset storage path follows the scikit-learn standard location: PATH_TO_STORAGE_DIR=~/scikit_learn_data
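The mkdir and tar steps above can also be scripted. The sketch below does the same thing with the Python standard library; install_archives is a hypothetical helper, not part of this package, and it assumes the two archives have already been downloaded and are trusted.

```python
import tarfile
from pathlib import Path


def install_archives(frames_archive, models_archive, storage_dir):
    """Extract both archives under <storage_dir>/smartdoc_ch1_home."""
    home = Path(storage_dir) / "smartdoc_ch1_home"
    archives = {"frames": frames_archive, "models": models_archive}
    for subdir, archive in archives.items():
        target = home / subdir
        # Same effect as `mkdir -p`.
        target.mkdir(parents=True, exist_ok=True)
        # Same effect as `tar -xzf ARCHIVE -C TARGET`.
        with tarfile.open(str(archive), "r:gz") as tar:
            tar.extractall(str(target))
    return home


# Example call (paths are placeholders):
# install_archives("frames.tar.gz", "models.tar.gz", "PATH_TO_STORAGE_DIR")
# Afterwards, pass data_home="PATH_TO_STORAGE_DIR" and
# download_if_missing=False to load_sd15ch1_frames and load_sd15ch1_models.
```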

API

TODO DOC

MODEL_VARIANT_01_ORIGINAL = "01-original"
MODEL_VARIANT_02_EDITED = "02-edited"
MODEL_VARIANT_03_CAPTURED = "03-captured-nexus"
MODEL_VARIANT_04_CORRECTED = "04-corrected-nexus"
MODEL_VARIANT_05_SCALED33 = "05-corrected-nexus-scaled33"

load_sd15ch1_frames(data_home=None,
                    sample=1.0,
                    shuffle=False,
                    random_state=0,
                    download_if_missing=True,
                    load_images=False,
                    resize=None,
                    color=False,
                    with_model_classif_targets=True,
                    with_modeltype_classif_targets=True,
                    with_segmentation_targets=True,
                    with_model_shapes=True,
                    return_X_y=False,
                    )

load_sd15ch1_models(data_home=None,
                    download_if_missing=True,
                    load_images=False,
                    variant=MODEL_VARIANT_05_SCALED33,
                    color=False,
                    with_model_ids=True,
                    with_modeltype_ids=True,
                    return_X_y=False,
                    )

read_sd15ch1_image(root_dir,
                   image_relative_path,
                   resize=None,
                   color=False)

read_sd15ch1_images(root_dir,
                    image_relative_path_seq,
                    resize=None,
                    color=False)

get_sd15ch1_basedir_frames(data_home=None)

get_sd15ch1_basedir_models(data_home=None)

eval_sd15ch1_segmentations(segmentations,
                           target_segmentations,
                           model_shapes,
                           frame_resize_factor=1.0,
                           print_summary=False)

eval_sd15ch1_classifications(labels,
                             target_labels)

Release history

0.8 (this version): Jun 11, 2018
0.7: Jun 11, 2018
0.5: Mar 8, 2018
0.4: Mar 8, 2018

Download files

Source Distribution: smartdoc15_ch1-0.8.tar.gz (27.6 kB), uploaded Jun 11, 2018

Built Distribution: smartdoc15_ch1-0.8-py2.py3-none-any.whl (29.9 kB), uploaded Jun 11, 2018, for Python 2 and Python 3

File details

Details for the file smartdoc15_ch1-0.8.tar.gz.

File metadata

Download URL: smartdoc15_ch1-0.8.tar.gz
Upload date: Jun 11, 2018
Size: 27.6 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for smartdoc15_ch1-0.8.tar.gz
Algorithm	Hash digest
SHA256	`b0c22381aacc05517fda26bf37472b95eb18cb15a573cd3b94b59506502f79a2`
MD5	`59fb7eccb6aaafe7b4f741bb36ad4926`
BLAKE2b-256	`e7a6ec6ffb269bbb08755e9ed4d96da690bf8d8c37bafef74f43f1ecffacccc8`


File details

Details for the file smartdoc15_ch1-0.8-py2.py3-none-any.whl.

File metadata

Download URL: smartdoc15_ch1-0.8-py2.py3-none-any.whl
Upload date: Jun 11, 2018
Size: 29.9 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No

File hashes

Hashes for smartdoc15_ch1-0.8-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`9d6bc2cbaf870aacc6558986b5b7b12e5d72353efd069ace8b50cbc8b1448e45`
MD5	`d467f16fb445d6466d1814265875569c`
BLAKE2b-256	`a6e2a1585a9ec2af0bfd650f249e3fffd146ae6dd394ed0f2df093373a5ad0b8`

