ContentAI Activity Classification Service
Project description
activity-classifier-extractor
Generates activity classifications from low-level feature inputs in support of analytic workflows within the ContentAI Platform, published as the extractor dsai_activity_classifier.
Getting Started
verbose - (bool) - verbose input/output configuration printing (default=false)
path_content - (str) - input video path for files to label (default=video.mp4)
path_result - (str) - output path for samples (default=.)
path_models - (str) - manifest path for model information (default=data/models/manifest.json)
time_interval - (float) - time interval for predictions from models (default=3.0)
average_predictions - (bool) - flatten predictions across time and class (default=false)
round_decimals - (int) - rounding decimals for predictions (default=5)
score_min - (float) - apply a minimum score threshold for classes (default=0.1)
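As a rough illustration of how these runtime variables fit together, the sketch below mirrors them with `argparse` and applies `score_min` and `round_decimals` to a dictionary of class scores. The function names `build_parser` and `filter_scores`, the example scores, and the choice of an inclusive threshold are assumptions made for illustration. They are not taken from the package's own code.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the runtime variables listed above."""
    parser = argparse.ArgumentParser(description="activity classifier (sketch)")
    parser.add_argument("--verbose", action="store_true",
                        help="print input/output configuration")
    parser.add_argument("--path_content", type=str, default="video.mp4")
    parser.add_argument("--path_result", type=str, default=".")
    parser.add_argument("--path_models", type=str,
                        default="data/models/manifest.json")
    parser.add_argument("--time_interval", type=float, default=3.0)
    parser.add_argument("--average_predictions", action="store_true")
    parser.add_argument("--round_decimals", type=int, default=5)
    parser.add_argument("--score_min", type=float, default=0.1)
    return parser


def filter_scores(scores, score_min=0.1, round_decimals=5):
    """Drop classes scoring below score_min and round the remaining scores.

    Assumption: the threshold is inclusive (score >= score_min is kept).
    """
    return {name: round(score, round_decimals)
            for name, score in scores.items() if score >= score_min}


# Parse an explicit argument list so the sketch runs without a real CLI call.
args = build_parser().parse_args(["--path_content", "features/", "--verbose"])

# Made-up scores for two classes; only the first clears the default threshold.
kept = filter_scores({"Running": 0.8123456, "Walking": 0.04},
                     args.score_min, args.round_decimals)
print(args.time_interval, kept)
```

Unspecified options fall back to the defaults documented above, so only the values that differ need to be passed.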
dependencies
pip install --no-cache-dir -r requirements.txt
Execution and Deployment
This package is meant to be run as a one-off processing tool that aggregates the insights of other extractors.
command-line standalone
Run the code as if it is an extractor. In this mode, configure a few environment variables to let the code know where to look for content.
One can also run the command line with a single argument as input and optionally add runtime configuration (see runtime variables) as part of the EXTRACTOR_METADATA variable, encoded as JSON.
EXTRACTOR_METADATA='{"compressed":true}'
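Because the variable is read as JSON, booleans need the lowercase JSON spelling (`true`, `false`) rather than Python's `True`. One way to avoid hand-written quoting mistakes is to build the value with `json.dumps`. The sketch below assumes a hypothetical configuration dictionary whose keys mirror the runtime variables.

```python
import json
import os

# Hypothetical runtime configuration; keys mirror the runtime variables above.
config = {"verbose": True, "time_interval": 1.5}

# json.dumps emits lowercase true/false, which is what a JSON parser expects.
os.environ["EXTRACTOR_METADATA"] = json.dumps(config)

# Round-trip check: this is what a consumer sees after parsing the variable.
parsed = json.loads(os.environ["EXTRACTOR_METADATA"])
print(os.environ["EXTRACTOR_METADATA"])
```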
Locally Run Classifier on Results
For utility, the above line has been wrapped in the bash script run_local.sh.
./run_local.sh <docker_image> [<source_directory> <output_data_dir> [<json_args>]] [<all_args>]
- run clip extraction on source with prior processing
<docker_image> = 0 IF local command-line based (args using argparse)
               = 1 IF local docker emulation
               = IMAGE_NAME IF docker image name to run
./run_local.sh 0 --path_content features/ --path_result results/ --verbose
./run_local.sh 1 features/ results/ 0 '{\"verbose\":true}'
In all of the above examples, the underlying command-line execution is similar to this execution run on the testing data.
python -u activity_classifier/main.py --path_content testing/data/launch/video.mp4 \
--path_result testing/class --path_models activity_classifier/data/models/manifest.json --verbose
Feature-Based Similarity
A helper script is also available to compute the similarity of clips in one or more feature files. (v1.1.0)
python -u activity_classifier/features.py --path_content testing/data/dummy.txt \
--feature_type dsai_videocnn dsai_vggish --path_result testing/dist
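The text does not say which metric the helper script uses. As one plausible reading, the sketch below L2-normalizes each clip's feature vector and compares clips by cosine similarity. The function names and the toy three-dimensional vectors are hypothetical, and the real `features.py` may compute something different.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


def cosine_similarity(a, b):
    """Dot product of the two L2-normalized vectors."""
    a_unit, b_unit = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_unit, b_unit))


def pairwise_similarity(clips):
    """Return {(name_a, name_b): similarity} for every unordered clip pair."""
    names = sorted(clips)
    return {(first, second): cosine_similarity(clips[first], clips[second])
            for index, first in enumerate(names)
            for second in names[index + 1:]}


# Toy feature vectors standing in for per-clip extractor output.
clips = {
    "clip_a": [1.0, 0.0, 1.0],
    "clip_b": [1.0, 0.0, 1.0],
    "clip_c": [0.0, 1.0, 0.0],
}
scores = pairwise_similarity(clips)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

Identical clips score close to 1 and orthogonal clips score 0, which makes the output easy to sanity-check before running on real features.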
ContentAI
Deployment
Deployment is easy and follows standard ContentAI steps.
contentai deploy dsai_activity_classifier
Deploying...
writing workflow.dot
done
Alternatively, you can pass an image name to reduce rebuilding a docker instance.
docker build -t dsai_activity_classifier .
contentai deploy metadata-flatten dsai_activity_classifier
Locally Downloading Results
You can locally download data from a specific job for this extractor to directly analyze.
contentai data wHaT3ver1t1s --dir data
Run as an Extractor
contentai run https://bucket/video.mp4 -w 'digraph { dsai_videocnn -> dsai_activity_classifier; dsai_vggish -> dsai_activity_classifier }'
JOB ID: 1Tfb1vPPqTQ0lVD1JDPUilB8QNr
CONTENT: s3://bucket/video.mp4
STATE: complete
START: Fri Feb 15 04:38:05 PM (6 minutes ago)
UPDATED: 1 minute ago
END: Fri Feb 15 04:43:04 PM (1 minute ago)
DURATION: 4 minutes
EXTRACTORS
my_extractor
TASK STATE START DURATION
724a493 complete 5 minutes ago 1 minute
Or run it via the docker image. Please review the run_local.sh file for more information.
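The workflow passed with `-w` above is a DOT digraph in which each upstream extractor points at this one. When the list of upstream extractors varies, the string can be generated instead of typed by hand. The helper name `build_workflow` below is hypothetical and is not part of the ContentAI tooling.

```python
def build_workflow(target, upstream):
    """Build a DOT digraph string with one edge per upstream extractor."""
    edges = "; ".join(f"{name} -> {target}" for name in upstream)
    return f"digraph {{ {edges} }}"


# Reproduces the workflow string used in the run example above.
workflow = build_workflow("dsai_activity_classifier",
                          ["dsai_videocnn", "dsai_vggish"])
print(workflow)
```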
View Extractor Logs (stdout)
contentai logs -f <my_extractor>
my_extractor Fri Nov 15 04:39:22 PM writing some data
Job complete in 4m58.265737799s
Adding New Models
There are two steps to adding new models.
First, train the models and formulate a well-known structure (this can be done exhaustively across a number of model types). See MODELS.rst for more details.
Second, update the manifest according to the instructions below to indicate how the activity classifier should load the model (e.g. the framework), which features it requires, and a few descriptive fields (e.g. the name and the id).
Updating The Manifest
Adding models to the pre-determined set of models is as easy as editing a manifest file and adding a model into git LFS.
Archive the new model into a serialized fileset. At time of writing, this was serializing models from sklearn with simple pickle load/save serialization.
Gather all of the relevant output files and compress them if you can. Currently, the library understands gzip compression extensions (e.g. “.gz”).
Choose the appropriate sub-directory that corresponds to the upstream feature extractor. For example, models built on 3dcnn features process new videos by chaining (via extractor chaining) to the extractor dsai_3dcnn. If one doesn’t exist yet, please create a new directory, but remember what combination of audio and video features is required.
Modify the manifest file in activity_classifier/data/models/manifest.json for your new entry. Specifically, the input video and audio features must be defined as well as the serialization library. Below is an example block that indicates 3dcnn video and vggish audio features for a model created with sklearn, where prediction results will be nested under the name Running.
[
  ...
  {
    "path": "3dcnn-vggish/lr-Running.pkl.gz",
    "name": "Running",
    "id": "ugc",
    "framework": "sklearn",
    "video": "dsai_videocnn",
    "audio": "dsai_vggish"
  },
  ...
]
Prepare to add your model files to the repo. NOTE: This repo uses git-lfs (https://git-lfs.github.com/) to store all binary files like models. If your model is added with regular git tools alone, you will get a sternly worded email (and friendly advice on how to re-add correctly).
(from the base directory only)
git lfs track activity_classifier/data/models/3dcnn/moonwalk_model.pkl.gz
git add activity_classifier/data/models/3dcnn/moonwalk_model.pkl.gz
git add activity_classifier/data/models/manifest.json
Test your model with the data in the testing directory. The CI/CD process should do this too but it’s always easier to find and fix problems here than with a vague email. The features in this directory came from processing of the HBO Max Launch Video, which is publicly available as a reference.
(from the base directory)
./run_local.sh 0 --path_content testing/data/test.mp4 --time_interval 1.5
(check for predictions from your new model in data.json)
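Before committing, it can help to confirm that the model archive round-trips and that the manifest entry is complete. The sketch below shows one way to do that with gzip-compressed pickle files. The helper names, the required-key set (taken from the example manifest block above), and the assumption that model paths are relative to the manifest's directory are all illustrative and may not match how the package resolves paths.

```python
import gzip
import json
import pickle
import tempfile
from pathlib import Path

# Keys taken from the example manifest block above.
REQUIRED_KEYS = {"path", "name", "id", "framework", "video", "audio"}


def save_model_gz(model, path):
    """Pickle a model into a gzip-compressed file."""
    with gzip.open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model_gz(path):
    """Load a gzip-compressed pickle; only do this for files you trust."""
    with gzip.open(path, "rb") as handle:
        return pickle.load(handle)


def check_manifest(path_manifest):
    """Return (name, problem) pairs for entries with missing keys or files.

    Assumption: each entry's "path" is relative to the manifest's directory.
    """
    path_manifest = Path(path_manifest)
    entries = json.loads(path_manifest.read_text())
    problems = []
    for entry in entries:
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append((entry.get("name"),
                             f"missing keys: {sorted(missing)}"))
        elif not (path_manifest.parent / entry["path"]).exists():
            problems.append((entry["name"], "model file not found"))
    return problems


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "3dcnn-vggish").mkdir()
    model_path = root / "3dcnn-vggish" / "lr-Running.pkl.gz"
    # A plain dict stands in for a trained sklearn model in this sketch.
    save_model_gz({"coef": [0.1, 0.2]}, model_path)
    manifest = [{"path": "3dcnn-vggish/lr-Running.pkl.gz", "name": "Running",
                 "id": "ugc", "framework": "sklearn",
                 "video": "dsai_videocnn", "audio": "dsai_vggish"}]
    (root / "manifest.json").write_text(json.dumps(manifest))
    problems = check_manifest(root / "manifest.json")
    restored = load_model_gz(model_path)

print(problems, restored)
```

An empty problem list means every entry has the expected keys and points at a file that exists. It does not confirm that the model produces sensible predictions, so the local run described above is still needed.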
Testing
Testing is included via tox. To launch testing for the entire package, just run tox at the command line. Testing can also be run for a specific file within the package by setting the environment variable TOX_ARGS.
TOX_ARGS=test_basic.py tox
Future Development
additional training hooks?
Changes
Generates activity classifications from low-level feature inputs in support of analytic workflows within the ContentAI Platform.
1.3
1.3.7
fix run_local typos
more verbosity checks
1.3.6
modeling.py separators
docs reorg
1.3.5
contentai key request fix
1.3.3
docs update
multiclass write
1.3.2
docker build update, run example update
1.3.1
docs fix for example of using package
bug fix for default location, change inputs to classify function
1.3.0
move models out of the primary package
breaking change, rename input param path_models to path_manifest
1.2
1.2.2
bump version for model migration to LFS
1.2.1
fix docker/deployed image run command
1.2.0
switch to package representation, push to pypi
several updates for MANIFEST definition (id)
inclusion of multi-parameter training and testing framework
safety for model loading, catch exceptions, return gracefully
update documents to split for binary models
1.1
1.1.1
cosmetic change for reuse in other libraries
1.1.0
refactor feature code, add utility for difference computation among segments
min value thresholding to avoid low scoring results in output (default=0.1)
refactor caching information for feature load (allow flatten, remove cache, allow multi-asset)
allow recursive feature load for distance compute
1.0
1.0.2
fixes for output, modify to require other extractors as dependencies
fix order of parameters for local runs
1.0.1
updates for integration of other models, fixes for prediction output
add l2norm after average/merge in time of source features
1.0.0
initial project merge from other sources
generates json prediction dict
callable as package
includes some testing routines with windowing comparison
File details
Details for the file contentai_activity_classifier-1.3.7-py2.py3-none-any.whl.
File metadata
- Download URL: contentai_activity_classifier-1.3.7-py2.py3-none-any.whl
- Upload date:
- Size: 33.3 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.6.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 56e26f407fff2ba91eb1d46cc1ecbe40611b2fa850f68e0b6be8bd4830b15ead
MD5 | 9fdc93eb53b81f224d5498430f5922ea
BLAKE2b-256 | 88f91ea16c08edee2934a0cd195c28c8e9bae36315dfe929ca4fde555ea3f681