
contentai-extractor-runtime-python

This is a Python package for implementing a custom extractor that runs on the ContentAI platform.

https://pypi.org/project/contentaiextractor/

  1. Usage
  2. API Documentation
  3. Dependencies
  4. Develop
  5. Changes

Usage

pip install contentaiextractor
import contentaiextractor as contentai

# download content locally
content_path = contentai.download_content()

# access metadata that was supplied when running a job
# contentai run s3://bucket/video.mp4 -d '{ "input": "value" }'
inputData = contentai.metadata()["input"]

# get output from another extractor
csv = contentai.get("extractor", "data.csv")
json = contentai.get_json("extractor", "data.json")

# extract some data
outputData = []
outputData.append({"frameNumber": 1})

# output data from this extractor
contentai.set("output", outputData)

API Documentation

ContentAIError Objects

class ContentAIError(Exception)

represents a ContentAI error
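
A minimal sketch of catching this error type; it assumes the runtime functions raise ContentAIError when a call fails, which this page does not state explicitly:

import contentaiextractor as contentai

try:
    data = contentai.get("some_extractor", "output.csv")
except contentai.ContentAIError as err:
    # a failure surfaced by the ContentAI runtime; log it and decide how to proceed
    print(f"ContentAI call failed: {err}")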

Fields

  • extractor_name - name of the extractor being run
  • job_id - current job id
  • content_url - URL of the content the extractor is run against
  • content_path - local path where the extractor can access the content
  • result_path - local path where the extractor should write the results
  • running_in_contentai - boolean that is True when running on the ContentAI platform; useful for testing code locally
  • metadata_json - raw metadata JSON string for the active extractor run, or None if not set (see metadata() for the parsed version)
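
A short sketch using these fields, assuming they are exposed as module-level attributes of contentaiextractor; the local file name is a hypothetical stand-in for testing outside the platform:

import contentaiextractor as contentai

if contentai.running_in_contentai:
    # on the platform, fetch the content the job points at
    content_path = contentai.download_content()
else:
    # hypothetical local file used when testing outside ContentAI
    content_path = "sample-video.mp4"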

Functions

download_content

download_content()

download content to work with locally

returns local path where content is written
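
example (same call as in the usage snippet above):

# download content and work with it locally
content_path = contentai.download_content()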

metadata

metadata()

returns a dict containing input metadata

example:

# access metadata that was supplied when running a job
# contentai run s3://bucket/video.mp4 -d '{ "input": "value" }'
inputData = contentai.metadata()["input"]

extractors

extractors()

get list of all extractors executed against this content url

returns a list of strings

[
    "extractor1",
    "extractor2"
]

example:

# get all data from all extractors
for extractor in contentai.extractors():
    for key in contentai.keys(extractor):
        data = contentai.get(extractor, key)

keys

keys(extractor_name)

get a list of keys for specified extractor

returns a list of keys

[
  "data.json",
  "data.csv",
  "data.txt,"
]

example:

keys = contentai.keys("azure_videoindexer")
for key in keys:
    data = contentai.get("azure_videoindexer", key)

get

get(extractor_name, key)

get the contents of a particular key

example:

# get another extractor's output
data = contentai.get("some_extractor", "output.csv")

get_json

get_json(extractor_name, key)

get the json contents of a particular key

example:

# get another extractor's output
data = contentai.get_json("some_extractor", "data.json")

get_bytes

get_bytes(extractor_name, key)

get the contents of a particular key in raw bytes

example:

# get another extractor's output
data = contentai.get_bytes("some_extractor", "output.bin")

set

set(key, value)

set results data for this extractor

can be called multiple times with different keys

value is a string

example:

contentai.set("output", "hello world")

set_json

set_json(key, value)

set results data for this extractor

can be called multiple times with different keys

value can be any JSON-serializable object

example:

data = {}
data["foo"] = bar
contentai.set_json("output", data)

set_bytes

set_bytes(key, value)

set results data for this extractor

can be called multiple times with different keys

value is bytes

example:

with open("some-file", "rb") as some_file:
    contentai.set_bytes("output", some_file.read())

save_results

save_results()

save results immediately, instead of waiting until process exits
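
example (a sketch; the key name and value are illustrative):

# write an interim result and flush it immediately
contentai.set_json("progress", {"frames_processed": 100})
contentai.save_results()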

parse_content_url

parse_content_url()

extract details from content url

returns

  • source_bucket_name - the s3 bucket name derived from content_url
  • source_bucket_key - the s3 bucket key derived from content_url
  • source_bucket_region - the s3 bucket region derived from content_url

the following content url formats are supported:

  • Simple (CLI) Format - s3://{bucket}/{key}
  • Virtual Hosted Format - https://{bucket}.s3.amazonaws.com/{key}
  • Virtual Hosted Format with Region - https://{bucket}.s3.{region}.amazonaws.com/{key}
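
example (a sketch that assumes the parsed values end up in module-level source_bucket_* fields matching the names above; this page does not spell out the exact return shape):

contentai.parse_content_url()
print(contentai.source_bucket_name)
print(contentai.source_bucket_key)
print(contentai.source_bucket_region)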

Dependencies

pip install -r requirements.txt

Develop

 Choose a make command to run

  build    build package
  deploy   upload package to pypi
  docs     generates api docs in markdown

Release

To publish a new release to PyPI, increment the version number in setup.py, tag the commit, and push it.
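
For example (the tag name is an assumption; follow whatever convention the repository's existing tags use):

git tag 1.1.1
git push --tags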

Changes

  • 1.1.0

    • add extractors()
  • 1.0.4

    • updated changelog
  • 1.0.3

    • fixes issue where the EXTRACTOR_METADATA env var was inadvertently required
  • 1.0.2

    • add safety to setting retrieval on local runs
    • documentation updates
  • 1.0.1

    • api docs for publish to pypi
  • 1.0.0

    • initial release
