Featurize images using a decapitated, pre-trained deep learning network
Pic2Vec
Featurize images using a small, contained pre-trained deep learning network
- Free software: BSD license
Features
This is the prototype for image feature engineering. It supports Python 2.7, 3.4, 3.5, 3.6, and 3.7.
pic2vec is a Python package that performs automated feature extraction
for image data. It supports feature engineering on new image data, and allows
traditional machine learning algorithms (such as tree-based algorithms) to
train on image data.
Input Specification
Data Format
pic2vec works on image data represented in one of the following ways:
- A directory of image files.
- URL pointers contained in a CSV.
- A directory of images with a CSV containing pointers to the image files.
If no CSV is provided with the directory, pic2vec automatically generates a CSV to store the features alongside the corresponding images.
Each row of the CSV represents a different image, and rows can also have columns containing other data about the images. Each image's featurized representation is appended as a series of new columns at the end of its row.
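The row layout described above can be illustrated with a minimal sketch that uses only the standard library. It does not call pic2vec: the feature vectors are made up, and the `feature_` column prefix and the `append_features` helper are hypothetical names chosen for illustration, not the names pic2vec writes.

```python
import csv
import io


def append_features(rows, features, prefix="feature_"):
    """Return new row dicts with each feature vector appended as extra columns."""
    if len(rows) != len(features):
        raise ValueError("need exactly one feature vector per CSV row")
    combined = []
    for row, vector in zip(rows, features):
        new_row = dict(row)  # original columns stay first
        for i, value in enumerate(vector):
            new_row["{}{}".format(prefix, i)] = value
        combined.append(new_row)
    return combined


# A tiny CSV: one image column plus an unrelated metadata column.
raw_csv = "images,label\ncat_1.png,cat\ndog_1.png,dog\n"
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Made-up 3-dimensional vectors standing in for real network features.
fake_features = [[0.1, 0.0, 0.7], [0.4, 0.9, 0.2]]

featurized_rows = append_features(rows, fake_features)

# Write the combined rows back out as a flat CSV.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(featurized_rows[0].keys()),
                        lineterminator="\n")
writer.writeheader()
writer.writerows(featurized_rows)
print(out.getvalue())
```

The metadata column (`label` here) is carried through untouched, and the numeric columns are added to the right of it, one row per image.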
Constraints Specification
The goal of this project was to make the featurizer as easy to use and hard to break as possible. If working properly, it should be resistant to badly formatted data, such as missing rows or columns in the CSV, image mismatches between a CSV and an image directory, and invalid image formats.
However, for the featurizer to function optimally, it prefers certain constraints:
- The CSV should have no missing columns or rows, and there should be full overlap between images in the CSV and the image directory.
- If checking predictions on a separate test set (such as on Kaggle), the filesystem needs to sort filepaths consistently with the sorting of the test set labels. The order in the CSV (whether generated automatically or passed in) will be considered the canonical order for the feature vectors.
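The ordering constraint matters because plain string sorting puts `img10.png` before `img2.png`. The sketch below shows one common way to get a natural (human) ordering of file names. It is an illustration only: `natural_key` is a hypothetical helper, and it is an assumption that this matches how pic2vec or any given test set orders its files.

```python
import re


def natural_key(name):
    """Split a name into text and number chunks so 'img10' sorts after 'img2'."""
    # re.split with a capturing group alternates text and digit chunks:
    # even positions are text, odd positions are runs of digits.
    chunks = re.split(r"(\d+)", name)
    return [int(chunk) if i % 2 else chunk.lower()
            for i, chunk in enumerate(chunks)]


filenames = ["img10.png", "img2.png", "img1.png", "Img3.png"]

print(sorted(filenames))                   # plain string order
print(sorted(filenames, key=natural_key))  # natural order
```

Whatever ordering is used, the same one has to be applied to the labels, since the CSV order is treated as canonical.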
The featurizer can only process .png, .jpeg, or .bmp image files. Images in any other format are effectively left out of the featurization: they are represented by zero vectors in the image batch.
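Since mismatches and unsupported files end up as zero vectors rather than errors, it can help to check the inputs before featurizing. The following is a rough pre-flight sketch using only the standard library. The `check_image_inputs` helper is hypothetical and not part of pic2vec, and matching on file extension is an assumption: the text above does not say whether pic2vec decides by extension or by the decoded image format, so the extension tuple may need adjusting (for example, for `.jpg`).

```python
import csv
import os
import tempfile

# Extensions taken from the constraint above; matching by extension is an assumption.
SUPPORTED_EXTENSIONS = (".png", ".jpeg", ".bmp")


def check_image_inputs(csv_path, image_dir, image_column):
    """Report mismatches between a CSV's image column and an image directory."""
    with open(csv_path, newline="") as handle:
        listed = {row[image_column] for row in csv.DictReader(handle)
                  if row.get(image_column)}
    on_disk = set(os.listdir(image_dir))
    return {
        "missing_from_directory": sorted(listed - on_disk),
        "missing_from_csv": sorted(on_disk - listed),
        "unsupported_format": sorted(
            name for name in listed | on_disk
            if not name.lower().endswith(SUPPORTED_EXTENSIONS)
        ),
    }


# Demonstration on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    image_dir = os.path.join(tmp, "images")
    os.mkdir(image_dir)
    for name in ("cat_1.png", "dog_1.bmp", "notes.txt"):
        open(os.path.join(image_dir, name), "wb").close()

    csv_path = os.path.join(tmp, "data.csv")
    with open(csv_path, "w", newline="") as handle:
        csv.writer(handle).writerows([["images"], ["cat_1.png"], ["bird_1.gif"]])

    report = check_image_inputs(csv_path, image_dir, "images")

print(report)
```

An empty list under every key means the CSV and the directory fully overlap and every file has a listed extension.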
Quick Start
The following Python code shows a typical usage of pic2vec:
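from pic2vec import ImageFeaturizer

# Name of the CSV column that points to the images
image_column_name = 'images'
my_csv = 'path/to/data.csv'
my_image_directory = 'path/to/image/directory/'

# Build a featurizer on top of the pre-trained Xception model
my_featurizer = ImageFeaturizer(model='xception', depth=2, autosample=True)

# Featurize the images referenced by the CSV and stored in the directory
featurized_df = my_featurizer.featurize(image_column_name, csv_path=my_csv,
                                        image_path=my_image_directory)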
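The three input formats listed under Data Format suggest that `csv_path` and `image_path` can each be supplied on their own. The sketch below wraps that choice in two hypothetical helpers, `featurize_kwargs` and `featurize_images`, which are not part of pic2vec. It assumes that `featurize` accepts either keyword without the other, which the Quick Start call does not show directly, so check the behaviour against the installed version.

```python
def featurize_kwargs(csv_path=None, image_path=None):
    """Build keyword arguments for featurize() from whichever inputs exist."""
    kwargs = {}
    if csv_path:
        kwargs["csv_path"] = csv_path      # CSV of URLs, or CSV of file names
    if image_path:
        kwargs["image_path"] = image_path  # directory of image files
    if not kwargs:
        raise ValueError("provide a CSV path, an image directory, or both")
    return kwargs


def featurize_images(image_column, csv_path=None, image_path=None):
    """Run the Quick Start featurizer on any of the three input formats."""
    # Imported lazily so this module still loads when pic2vec is not installed.
    from pic2vec import ImageFeaturizer

    featurizer = ImageFeaturizer(model='xception', depth=2, autosample=True)
    return featurizer.featurize(image_column,
                                **featurize_kwargs(csv_path, image_path))


# Only the argument-building step is exercised here; no model is loaded.
print(featurize_kwargs(csv_path='path/to/urls.csv'))
print(featurize_kwargs(image_path='path/to/image/directory/'))
```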
Examples
To get started, see the following example:
- Cats vs. Dogs: Dataset from combined directory + CSV
Examples coming soon:
- Hot Dog, Not Hot Dog: Dataset from a CSV with URLs and no image directory
Installation
See the Installation Guide for details.
Installing Keras/TensorFlow
If you run into trouble installing Keras or TensorFlow as a dependency, read the Keras installation guide and the TensorFlow installation guide for details about installing them on your machine.
Using Featurizer Output With DataRobot
pic2vec generates a flat CSV which is ready for supervised modeling, if the data has been labelled with a variable that
can be used as a target. The images are transformed into a set of regular columns containing numeric data.
Additionally, if the data is unlabelled, the output can be used for unsupervised learning (such as anomaly detection).
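Outside DataRobot, the same flat CSV can be split into a numeric feature matrix and a target column before it is handed to a modeling library. The sketch below uses only the standard library. The column names (`images`, `label`, `feature_0`, `feature_1`) and the `split_features_and_target` helper are hypothetical, chosen for illustration rather than taken from pic2vec's actual output.

```python
import csv
import io


def split_features_and_target(rows, target_column, drop_columns=()):
    """Split row dicts into feature names, a numeric matrix X, and targets y."""
    skipped = set(drop_columns) | {target_column}
    feature_names = [name for name in rows[0] if name not in skipped]
    X = [[float(row[name]) for name in feature_names] for row in rows]
    y = [row[target_column] for row in rows]
    return feature_names, X, y


# A made-up featurized CSV: image pointer, label, then numeric feature columns.
featurized_csv = (
    "images,label,feature_0,feature_1\n"
    "cat_1.png,cat,0.1,0.7\n"
    "dog_1.png,dog,0.4,0.2\n"
)
rows = list(csv.DictReader(io.StringIO(featurized_csv)))

names, X, y = split_features_and_target(rows, target_column="label",
                                        drop_columns=["images"])
print(names)
print(X)
print(y)
```

For the unlabelled case, the same split applies without a target: only the numeric columns are kept and passed to an unsupervised method.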
Running tests
To run the unit tests with pytest, run:
py.test tests
Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
History
0.101.1 (2019-9-25)
- Limited Keras version to pre-2.3.0 to fix issues from Keras' breaking changes
0.101.0 (2019-3-25)
- Updated version of Trafaret to a non-beta version
- Updated keras to 2.2.3 or greater
- This library upgrade changes prediction consistency across past versions of pic2vec. ResNet50 is the model that has changed the most, due to changes in implementation. Other models have small floating point changes, but still pass np.testing.assert_allclose tests.
0.100.1 (2019-3-24)
- Updated version of Pillow to 5.4.1, in order to support Python 3.7
- Updated the README
0.100.0 (2018-12-10)
- Added test coverage and increased error checking
- Changed default csv name
- Changed image_column_headers to image_columns everywhere
- Updated examples
- Updated version of scipy to 1.1 and numpy to 1.15
0.99.2 (2018-08-01)
- Updated the notebook example
- Some code cleanup
0.99.1 (2018-06-20)
- Lots of code cleanup
- Changed new_csv_name argument to new_csv_path everywhere for consistency
- Removed '_full' from the saved csv_name for the full dataframe. Features-only csv still has '_features_only' in csv name.
- Added 'featurized' to saved csv names
- Removed new_csv_path as argument to functions that do not actually require it
0.99.0 (2018-04-02)
- Added batch processing
- Made pic2vec more programmatic (removed automatic csv-writing, etc.)
- Bound keras to <2.1.5 to remove resnet problem
0.9.0 (2017-09-24)
- Fixed Keras backwards compatibility issues (include_top deprecated, require_flatten added)
- Fixed ResNet50 update issues (removed a zero-padding layer, updated weights)
0.8.2 (2017-08-14)
- Updated trafaret requirement for PyPi package
- Updated cats vs. dogs example
0.8.1 (2017-08-07)
- Fixed bugs with robust naming
- Added error message for failed image conversion
0.8.0 (2017-08-02)
- Added robust naming options to the generated csv files
0.7.1 (2017-08-02)
- Fixed PIL truncated image bug
0.7.0 (2017-08-02)
- Fixed bug with CSV badly formed URLs
- Fixed mistake with InceptionV3 preprocessing happening for every model
0.6.3 (2017-07-25)
- Added Travis and Coveralls for testing and coverage automation
- Repo went public
- Python 3.x compatibility
0.6.2 (2017-07-14)
- Fixed image format recognition.
0.6.1 (2017-07-12)
- Directory-only now natural sorted.
0.6.0 (2017-07-11)
- Added multi-column support
- Added missing image column to csv
0.5.0 (2017-07-06)
- Renamed to pic2vec
- Tests parametrized
0.4.3 (2017-07-03)
- Second round of code review: optimized code, better type checking with trafaret
0.4.2 (2017-06-30)
- Improved README test examples
0.4.1 (2017-06-30)
- Fixed documentation
0.4.0 (2017-06-29)
- Added ability to call multiple models, and packaged in SqueezeNet with weights.
0.3.0 (2017-06-26)
- Created installation instructions and readme files, ready for prototype distribution
0.2.9 (2017-06-25)
- Fixed import problem that prevented generated csvs from saving
0.2.8 (2017-06-25)
- Fixed variable name bugs
0.2.7 (2017-06-25)
- Changed image_directory_path to the more manageable image_path
- Made testing module and preprocessing module slightly more robust.
0.2.6 (2017-06-23)
- Added features-only csv test, and got rid of the column headers in the file
- Added documentation to data featurization modules
0.2.5 (2017-06-23)
- 100% test coverage
- Fixed a problem where a combined directory + csv was appending to the wrong rows when there was a mismatch between the directory and the csv.
0.2.4 (2017-06-22)
- Fixed more bugs in build_featurizer
0.2.3 (2017-06-22)
- Fixed build_featurizer troubles with building new csv paths in current directory
0.2.2 (2017-06-22)
- Full requirements for keras imported
0.2.1 (2017-06-22)
- Bug fixes
0.2.0 (2017-06-22)
- Second release on PyPI.
- Install keras with tensorflow backend specifically
0.1.0 (2017-06-14)
- First release on PyPI.
File details
Details for the file pic2vec-0.101.1.tar.gz.
File metadata
- Download URL: pic2vec-0.101.1.tar.gz
- Upload date:
- Size: 7.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.9.2 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/2.7.16
File hashes
Algorithm | Hash digest
---|---
SHA256 | 35d0e446c91e5cb1633b4104261bbc5491aa76f883d642c9df88ce114b5206b4
MD5 | 3f74a914368de79c7870632b357343b2
BLAKE2b-256 | d434609accfef8dd0b094971bf608c63ad31122fb04adc333d499876bba99ae9
File details
Details for the file pic2vec-0.101.1-py2.py3-none-any.whl.
File metadata
- Download URL: pic2vec-0.101.1-py2.py3-none-any.whl
- Upload date:
- Size: 4.6 MB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.9.2 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/2.7.16
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9771edee57f1cddfb0d4ce9ed6dcac776fe67a0c4389dd734fb561a26157ec39
MD5 | 4259f26470888b3861c91229e0e7adea
BLAKE2b-256 | 6cfcaac8c157cc6b30996717121a76f2ee5b1dfe8cb513dec8d77d29492907a7