makes it easier to implement a ContentAI extractor

contentai-extractor-runtime-python

This is a python package used for implementing a custom extractor that runs on the ContentAI platform.

https://pypi.org/project/contentaiextractor/

  1. Usage
  2. API Documentation
  3. Dependencies
  4. Develop
  5. Changes

Usage

pip install contentaiextractor

import contentaiextractor as contentai

# download content locally
content_path = contentai.download_content()

# access metadata that was supplied when running a job
# contentai run s3://bucket/video.mp4 -d '{ "input": "value" }'
inputData = contentai.metadata()["input"]

# get output from another extractor
csv_data = contentai.get("extractor", "data.csv")
json_data = contentai.get_json("extractor", "data.json")

# extract some data
outputData = []
outputData.append({"frameNumber": 1})

# output data from this extractor
contentai.set_json("output", outputData)

API Documentation

ContentAIError Objects

class ContentAIError(Exception)

represents a contentai error

Fields

  • extractor_name - name of the extractor being run
  • job_id - current job id
  • content_url - URL of the content the extractor is run against
  • content_path - local path where the extractor can access the content
  • result_path - local path where the extractor should write the results
  • running_in_contentai - boolean set to True when running on the ContentAI platform; useful for testing code locally
  • metadata_json - raw string (or None if not set) for active extractor run (also, see parsed metadata())

Functions

download_content

download_content()

download content to work with locally

returns local path where content is written

metadata

metadata()

returns a dict containing input metadata

example:

# access metadata that was supplied when running a job
# contentai run s3://bucket/video.mp4 -d '{ "input": "value" }'
input_value = contentai.metadata()["input"]
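
The Fields list describes metadata_json as a raw string (or None if not set) and metadata() as its parsed form. A minimal sketch of that relationship follows. parse_metadata is a hypothetical helper, not part of the package, and returning an empty dict when nothing was supplied is an assumption about reasonable behavior rather than a statement of what the library does.

```python
import json


def parse_metadata(raw):
    """Hypothetical helper: turn a raw metadata string into a dict.

    Assumption: None or an empty string means no metadata was supplied,
    so an empty dict is returned instead of raising.
    """
    if not raw:
        return {}
    return json.loads(raw)


# The same JSON that would be passed with: contentai run ... -d '{ "input": "value" }'
raw = '{ "input": "value" }'
metadata = parse_metadata(raw)

# Using .get() with a default avoids a KeyError when a key was not supplied.
input_value = metadata.get("input", "default")
missing_value = metadata.get("other", "default")
```

Reading keys with .get() and a default keeps an extractor usable on local runs where no metadata is passed.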

extractors

extractors()

get list of all extractors executed against this content url

returns a list of strings

[
    "extractor1",
    "extractor2"
]

example:

# get all data from all extractors
for extractor in contentai.extractors():
    for key in contentai.keys(extractor):
        data = contentai.get(extractor, key)
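
The nested loop above overwrites data on every iteration. One way to keep every result is to collect them into a dictionary, sketched below. collect_outputs and FakeClient are hypothetical names introduced for illustration. The helper only assumes its argument exposes extractors(), keys() and get() with the signatures documented here, so passing the contentai module itself should work, though that is untested here.

```python
def collect_outputs(client):
    """Hypothetical helper: return {extractor_name: {key: data}} for every extractor."""
    outputs = {}
    for extractor in client.extractors():
        outputs[extractor] = {
            key: client.get(extractor, key) for key in client.keys(extractor)
        }
    return outputs


class FakeClient:
    """Hypothetical in-memory stand-in, used only to exercise the helper locally."""

    def __init__(self, data):
        self._data = data

    def extractors(self):
        return list(self._data)

    def keys(self, extractor_name):
        return list(self._data[extractor_name])

    def get(self, extractor_name, key):
        return self._data[extractor_name][key]


fake = FakeClient(
    {
        "extractor1": {"data.csv": "a,b\n1,2"},
        "extractor2": {"data.txt": "hello"},
    }
)
all_outputs = collect_outputs(fake)
```

On the platform, the call would presumably be collect_outputs(contentai) after importing the package as shown in Usage.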

keys

keys(extractor_name)

get a list of keys for specified extractor

returns a list of keys

[
  "data.json",
  "data.csv",
  "data.txt"
]

example:

keys = contentai.keys("azure_videoindexer")
for key in keys:
    data = contentai.get("azure_videoindexer", key)

get

get(extractor_name, key)

get the contents of a particular key

example:

# get another extractor's output
data = contentai.get("some_extractor", "output.csv")

get_json

get_json(extractor_name, key)

get the json contents of a particular key

example:

# get another extractor's output
data = contentai.get_json("some_extractor", "data.json")

get_bytes

get_bytes(extractor_name, key)

get the contents of a particular key in raw bytes

example:

# get another extractor's output
data = contentai.get_bytes("some_extractor", "output.bin")

set

set(key, value)

set results data for this extractor

can be called multiple times with different keys

value is a string

example:

contentai.set("output", "hello world")

set_json

set_json(key, value)

set results data for this extractor

can be called multiple times with different keys

value can be any JSON-serializable object

example:

data = {}
data["foo"] = "bar"
contentai.set_json("output", data)

set_bytes

set_bytes(key, value)

set results data for this extractor

can be called multiple times with different keys

value is bytes

example:

with open("some-file", "rb") as some_file:
    contentai.set_bytes("output", some_file.read())

save_results

save_results()

save results immediately, instead of waiting until process exits

parse_content_url

parse_content_url()

extract details from content url

returns

  • source_bucket_name - the s3 bucket name derived from content_url
  • source_bucket_key - the s3 bucket key derived from content_url
  • source_bucket_region - the s3 bucket region derived from content_url

the following content url formats are supported:

  • Simple (CLI) Format - s3://{bucket}/{key}
  • Virtual Hosted Format - https://{bucket}.s3.amazonaws.com/{key}
  • Virtual Hosted Format with Region - https://{bucket}.s3.{region}.amazonaws.com/{key}
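
To make the three supported formats concrete, here is a minimal sketch of how such URLs could be split into bucket, key and region. It is not the package's implementation: parse_s3_url is a hypothetical helper, and returning None for the region when the URL does not contain one is an assumption.

```python
import re
from urllib.parse import urlparse

# Matches "{bucket}.s3.amazonaws.com" and "{bucket}.s3.{region}.amazonaws.com".
_VIRTUAL_HOST = re.compile(
    r"^(?P<bucket>.+)\.s3(?:\.(?P<region>[a-z0-9-]+))?\.amazonaws\.com$"
)


def parse_s3_url(url):
    """Hypothetical helper: return (bucket, key, region) for a content url.

    Assumption: region is None when the url does not include one.
    """
    parsed = urlparse(url)
    key = parsed.path.lstrip("/")

    # Simple (CLI) format: s3://{bucket}/{key}
    if parsed.scheme == "s3":
        return parsed.netloc, key, None

    # Virtual hosted formats, with or without a region
    if parsed.scheme == "https":
        match = _VIRTUAL_HOST.match(parsed.netloc)
        if match:
            return match.group("bucket"), key, match.group("region")

    raise ValueError(f"unsupported content url: {url}")


simple = parse_s3_url("s3://bucket/video.mp4")
hosted = parse_s3_url("https://bucket.s3.amazonaws.com/folder/video.mp4")
regional = parse_s3_url("https://bucket.s3.us-west-2.amazonaws.com/video.mp4")
```

The three tuples correspond to source_bucket_name, source_bucket_key and source_bucket_region in the list of return values above.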

Dependencies

pip install -r requirements.txt

Develop

Choose a make command to run:

  build    build package
  deploy   upload package to pypi
  docs     generates api docs in markdown

Release

To publish a new release to pypi, increment the version number in setup.py, tag the commit and push it.

Changes

  • 1.1.0

    • add extractors()
  • 1.0.4

    • updated changelog
  • 1.0.3

    • fixes issue where the EXTRACTOR_METADATA environment variable was inadvertently required
  • 1.0.2

    • add safety to setting retrieval on local runs
    • documentation updates
  • 1.0.1

    • api docs for publish to pypi
  • 1.0.0

    • initial release
