
ContentAI Activity Classification Service


activity-classifier-extractor

Generates activity classifications from low-level feature inputs in support of analytic workflows within the ContentAI Platform, published as the extractor dsai_activity_classifier.

  1. Getting Started

  2. Execution

  3. Creating Models

  4. Testing

  5. Future Development

  6. Changes

Getting Started

This library is used as a single-run executable.
Runtime parameters can be passed to configure the returned results; they can be examined in more detail in the main script.
  • verbose - (bool) - verbose input/output configuration printing (default=false)

  • path_content - (str) - input video path for files to label (default=video.mp4)

  • path_result - (str) - output path for samples (default=.)

  • path_models - (str) - manifest path for model information (default=data/models/manifest.json)

  • time_interval - (float) - time interval for predictions from models (default=3.0)

  • average_predictions - (bool) - flatten predictions across time and class (default=false)

  • round_decimals - (int) - rounding decimals for predictions (default=5)

  • score_min - (float) - apply a minimum score threshold for classes (default=0.1)
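
For example, a single run combining several of these parameters might look like the following sketch; the content and output paths are placeholders.

python -u activity_classifier/main.py --path_content video.mp4 --path_result results/ \
        --time_interval 3.0 --score_min 0.1 --verbose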

dependencies

To install package dependencies in a fresh system, the recommended technique is a set of vanilla pip packages. The latest requirements should be validated from the requirements.txt file; at the time of writing, they installed with the following command.
pip install --no-cache-dir -r requirements.txt

Execution and Deployment

This package is meant to be run as a one-off processing tool that aggregates the insights of other extractors.

command-line standalone

Run the code as if it were an extractor. In this mode, configure a few environment variables to let the code know where to look for content.

One can also run the command-line with a single argument as input and optionally add runtime configuration (see runtime variables) as part of the EXTRACTOR_METADATA variable as JSON.

EXTRACTOR_METADATA='{"compressed":true}'
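
For a fully local emulation, the metadata variable can be combined with content and result locations. The variable names EXTRACTOR_CONTENT_PATH and EXTRACTOR_RESULT_PATH below are assumptions about the ContentAI runtime, so confirm them against run_local.sh before relying on this sketch.

EXTRACTOR_CONTENT_PATH=features/ EXTRACTOR_RESULT_PATH=results/ \
EXTRACTOR_METADATA='{"verbose":true}' python -u activity_classifier/main.py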

Locally Run Classifier on Results

For utility, the above line has been wrapped in the bash script run_local.sh.

./run_local.sh <docker_image> [<source_directory> <output_data_dir> [<json_args>]] [<all_args>]
   - run activity classification on source with prior processing

  <docker_image> = 0 IF local command-line based (args using arg parse)
                 = 1 IF local docker emulation
                 = IMAGE_NAME IF docker image name to run

  ./run_local.sh 0 --path_content features/ --path_result results/ --verbose
  ./run_local.sh 1 features/ results/ 0 '{\"verbose\":true}'

Through all of the above examples, the underlying command-line execution is similar to this execution run on the testing data.

python -u activity_classifier/main.py --path_content testing/data/launch/video.mp4
        --path_result testing/class --path_models activity_classifier/data/models/manifest.json --verbose

Feature-Based Similarity

A helper script is also available to compute the similarity of clips in one or more feature files. (v1.1.0)

python -u activity_classifier/features.py --path_content testing/data/dummy.txt \
        --feature_type dsai_videocnn dsai_vggish --path_result testing/dist

ContentAI

Deployment

Deployment is easy and follows standard ContentAI steps.

contentai deploy dsai_activity_classifier
Deploying...
writing workflow.dot
done

Alternatively, you can pass an image name to avoid rebuilding the docker image.

docker build -t dsai_activity_classifier .
contentai deploy dsai_activity_classifier dsai_activity_classifier

Locally Downloading Results

You can locally download data from a specific job for this extractor to directly analyze.

contentai data wHaT3ver1t1s --dir data

Run as an Extractor

contentai run https://bucket/video.mp4  -w 'digraph { dsai_videocnn -> dsai_activity_classifier; dsai_vggish -> dsai_activity_classifier }'

JOB ID:     1Tfb1vPPqTQ0lVD1JDPUilB8QNr
CONTENT:    s3://bucket/video.mp4
STATE:      complete
START:      Fri Feb 15 04:38:05 PM (6 minutes ago)
UPDATED:    1 minute ago
END:        Fri Feb 15 04:43:04 PM (1 minute ago)
DURATION:   4 minutes

EXTRACTORS

my_extractor

TASK      STATE      START           DURATION
724a493   complete   5 minutes ago   1 minute

Or run it via the docker image. Please review the run_local.sh file for more information.

View Extractor Logs (stdout)

contentai logs -f <my_extractor>
my_extractor Fri Nov 15 04:39:22 PM writing some data
Job complete in 4m58.265737799s

Adding New Models

There are two steps to adding new models.

  1. First, train the models and serialize them into a well-known structure (this can be done exhaustively across a number of model types). See MODELS.rst for more details.

  2. Update the manifest according to the instructions below to indicate how the activity classifier should load the model (e.g. the framework), the required features, and a few descriptive fields (e.g. the name and the id).

Updating The Manifest

Adding models to the pre-determined set of models is as easy as editing a manifest file and adding a model into git LFS.

  1. Archive the new model into a serialized fileset. At the time of writing, this meant serializing models from sklearn with simple pickle load/save serialization (see the sketch after this list).

  2. Gather all of the relevant output files and compress them if you can. Currently, the library understands gzip compression extensions (e.g. “.gz”).

  3. Choose the appropriate sub-directory that corresponds to the upstream feature extractor. For example, models built on 3dcnn features require new videos to be processed (via extractor chaining) by the extractor dsai_3dcnn. If one doesn’t exist yet, please create a new directory, but remember what combination of audio and video features is required.

  4. Modify the manifest file in activity_classifier/data/models/manifest.json for your new entry. Specifically, the input video and audio features must be defined as well as the serialization library. Below is an example block that indicates 3dcnn video and vggish audio features for a model created with sklearn, where prediction results will be nested under the name Running.

    [ ...
    {
        "path": "3dcnn-vggish/lr-Running.pkl.gz",
        "name": "Running",
        "id": "ugc",
        "framework": "sklearn",
        "video": "dsai_videocnn",
        "audio": "dsai_vggish"
    },
    ... ]
  5. Prepare to add your model files to the repo. NOTE: This repo uses git-lfs (https://git-lfs.github.com/) to store all binary files like models. If your model is added with regular git tools alone, you will get a sternly worded email (and friendly advice on how to re-add it correctly).

    (from the base directory only)
    git lfs track activity_classifier/data/models/3dcnn/moonwalk_model.pkl.gz
    git add activity_classifier/data/models/3dcnn/moonwalk_model.pkl.gz
    git add activity_classifier/data/models/manifest.json
  6. Test your model with the data in the testing directory. The CI/CD process should do this too but it’s always easier to find and fix problems here than with a vague email. The features in this directory came from processing of the HBO Max Launch Video, which is publicly available as a reference.

    (from the base directory)
    
    ./run_local.sh 0 --path_content testing/data/test.mp4 --time_interval 1.5
    
    (check for predictions from your new model in data.json)
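
To illustrate steps 1 and 2 above, below is a minimal sketch of serializing an sklearn model with pickle and gzip into the layout the manifest expects; the training data and output path are hypothetical stand-ins.

import gzip
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegression

# hypothetical feature matrix standing in for concatenated video + audio features
X_train = np.random.rand(100, 640)
y_train = np.random.randint(0, 2, 100)  # binary labels for one activity, e.g. "Running"

# train a simple binary classifier
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# serialize with pickle and compress with gzip; the loader understands the ".gz" extension
with gzip.open("activity_classifier/data/models/3dcnn-vggish/lr-Running.pkl.gz", "wb") as handle:
    pickle.dump(model, handle)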

Testing

Testing is included via tox. To launch testing for the entire package, just run tox at the command line. Testing can also be run for a specific file within the package by setting the environment variable TOX_ARGS.

TOX_ARGS=test_basic.py tox

Future Development

  • additional training hooks?

Changes

Generates activity classifications from low-level feature inputs in support of analytic workflows within the ContentAI Platform.

1.3

1.3.7

  • fix run_local typos

  • more verbosity checks

1.3.6

  • modeling.py separators

  • docs reorg

1.3.5

  • contentai key request fix

1.3.3

  • docs update

  • multiclass write

1.3.2

  • docker build update, run example update

1.3.1

  • docs fix for example of using package

  • bug fix for default location, change inputs to classify function

1.3.0

  • move models out of the primary package

  • breaking change, rename input param path_models to path_manifest

1.2

1.2.2

  • bump version for model migration to LFS

1.2.1

  • fix docker/deployed image run command

1.2.0

  • switch to package representation, push to pypi

  • several updates for MANIFEST definition (id)

  • inclusion of multi-parameter training and testing framework

  • safety for model loading, catch exceptions, return gracefully

  • update documents to split for binary models

1.1

1.1.1

  • cosmetic change for reuse in other libraries

1.1.0

  • refactor feature code, add utility for difference computation among segments

  • min value thresholding to avoid low scoring results in output (default=0.1)

  • refactor caching information for feature load (allow flatten, remove cache, allow multi-asset)

  • allow recursive feature load for distance compute

1.0

1.0.2

  • fixes for output, modify to require other extractors as dependencies

  • fix order of parameters for local runs

1.0.1

  • updates for integration of other models, fixes for prediction output

  • add l2norm after average/merge in time of source features

1.0.0

  • initial project merge from other sources

  • generates json prediction dict

  • callable as package

  • includes some testing routines with windowing comparison
