
contentai-extractor-runtime-python

This is a Python package for implementing a custom extractor that runs on the ContentAI platform.

https://pypi.org/project/contentaiextractor/

  1. Usage
  2. API Documentation
  3. Dependencies
  4. Develop
  5. Changes

Usage

pip install contentaiextractor
import contentaiextractor as contentai

# download content locally
content_path = contentai.download_content()

# access metadata that was supplied when running a job
# contentai run s3://bucket/video.mp4 -d '{ "input": "value" }'
inputData = contentai.metadata()["input"]

# get output from another extractor
csv = contentai.get("extractor", "data.csv")
json = contentai.get_json("extractor", "data.json")

# extract some data
outputData = []
outputData.append({"frameNumber": 1})

# output data from this extractor
contentai.set("output", outputData)

API Documentation

ContentAIError Objects

class ContentAIError(Exception)

represents a ContentAI error
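
A minimal sketch of catching this error type; it assumes the runtime functions raise ContentAIError when a call fails, which this page does not state explicitly:

import contentaiextractor as contentai

try:
    data = contentai.get("some_extractor", "output.csv")
except contentai.ContentAIError as err:
    # a failure surfaced by the ContentAI runtime; log it and decide how to proceed
    print(f"ContentAI call failed: {err}")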

Fields

  • extractor_name - name of the extractor being run
  • job_id - current job id
  • content_url - URL of the content the extractor is run against
  • content_path - local path where the extractor can access the content
  • result_path - local path where the extractor should write the results
  • running_in_contentai - boolean that is True when running on the ContentAI platform; useful for testing code locally
  • metadata_json - raw metadata JSON string for the active extractor run, or None if not set (see metadata() for the parsed version)
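
A short sketch using these fields, assuming they are exposed as module-level attributes of contentaiextractor; the local file name is a hypothetical stand-in for testing outside the platform:

import contentaiextractor as contentai

if contentai.running_in_contentai:
    # on the platform, fetch the content the job points at
    content_path = contentai.download_content()
else:
    # hypothetical local file used when testing outside ContentAI
    content_path = "sample-video.mp4"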

Functions

download_content

download_content()

download content to work with locally

returns local path where content is written
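
example (same call as in the usage snippet above):

# download content and work with it locally
content_path = contentai.download_content()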

metadata

metadata()

returns a dict containing input metadata

example:

# access metadata that was supplied when running a job
# contentai run s3://bucket/video.mp4 -d '{ "input": "value" }'
inputData = contentai.metadata()["input"]

extractors

extractors()

get list of all extractors executed against this content url

returns a list of strings

[
    "extractor1",
    "extractor2"
]

example:

# get all data from all extractors
for extractor in contentai.extractors():
    for key in contentai.keys(extractor):
        data = contentai.get(extractor, key)

keys

keys(extractor_name)

get a list of keys for specified extractor

returns a list of keys

[
  "data.json",
  "data.csv",
  "data.txt,"
]

example:

keys = contentai.keys("azure_videoindexer")
for key in keys:
    data = contentai.get("azure_videoindexer", key)

get

get(extractor_name, key)

get the contents of a particular key

example:

# get another extractor's output
data = contentai.get("some_extractor", "output.csv")

get_json

get_json(extractor_name, key)

get the json contents of a particular key

example:

# get another extractor's output
data = contentai.get_json("some_extractor", "data.json")

get_bytes

get_bytes(extractor_name, key)

get the contents of a particular key in raw bytes

example:

# get another extractor's output
data = contentai.get_bytes("some_extractor", "output.bin")

set

set(key, value)

set results data for this extractor

can be called multiple times with different keys

value is a string

example:

contentai.set("output", "hello world")

set_json

set_json(key, value)

set results data for this extractor

can be called multiple times with different keys

value can be any JSON-serializable object

example:

data = {}
data["foo"] = bar
contentai.set_json("output", data)

set_bytes

set_bytes(key, value)

set results data for this extractor

can be called multiple times with different keys

value is bytes

example:

with open("some-file", "rb") as some_file:
    contentai.set_bytes("output", some_file.read())

save_results

save_results()

save results immediately, instead of waiting until process exits
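
example (a sketch; the key name and value are illustrative):

# write an interim result and flush it immediately
contentai.set_json("progress", {"frames_processed": 100})
contentai.save_results()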

parse_content_url

parse_content_url()

extract details from content url

returns

  • source_bucket_name - the s3 bucket name derived from content_url
  • source_bucket_key - the s3 bucket key derived from content_url
  • source_bucket_region - the s3 bucket region derived from content_url

the following content url formats are supported:

  • Simple (CLI) Format - s3://{bucket}/{key}
  • Virtual Hosted Format - https://{bucket}.s3.amazonaws.com/{key}
  • Virtual Hosted Format with Region - https://{bucket}.s3.{region}.amazonaws.com/{key}
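
example (a sketch that assumes the parsed values end up in module-level source_bucket_* fields matching the names above; this page does not spell out the exact return shape):

contentai.parse_content_url()
print(contentai.source_bucket_name)
print(contentai.source_bucket_key)
print(contentai.source_bucket_region)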

Dependencies

pip install -r requirements.txt

Develop

 Choose a make command to run

  build    build package
  deploy   upload package to pypi
  docs     generates api docs in markdown

Release

To publish a new release to PyPI, increment the version number in setup.py, tag the commit, and push it.
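
For example (the tag name is an assumption; follow whatever convention the repository's existing tags use):

git tag 1.1.1
git push --tags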

Changes

  • 1.1.0

    • add extractors()
  • 1.0.4

    • updated changelog
  • 1.0.3

    • fixes issue where the EXTRACTOR_METADATA env var was inadvertently required
  • 1.0.2

    • add safety to setting retrieval on local runs
    • documentation updates
  • 1.0.1

    • api docs for publish to pypi
  • 1.0.0

    • initial release
