makes it easier to implement a ContentAI extractor
contentai-extractor-runtime-python
This is a Python package for implementing a custom extractor that runs on the ContentAI platform.
https://pypi.org/project/contentaiextractor/
Usage
pip install contentaiextractor

import contentaiextractor as contentai
# download content locally
content_path = contentai.download_content()
# access metadata that was supplied when running a job
# contentai run s3://bucket/video.mp4 -d '{ "input": "value" }'
input_data = contentai.metadata()["input"]
# get output from another extractor
csv_data = contentai.get("extractor", "data.csv")
json_data = contentai.get_json("extractor", "data.json")
# extract some data
output_data = []
output_data.append({"frameNumber": 1})
# output data from this extractor (set_json, because the value is a list, not a string)
contentai.set_json("output", output_data)
API Documentation
ContentAIError Objects
class ContentAIError(Exception)
represents a contentai error
Fields
- extractor_name - name of the extractor being run
- job_id - current job id
- content_url - URL of the content the extractor is run against
- content_path - local path where the extractor can access the content
- result_path - local path where the extractor should write the results
- running_in_contentai - boolean set to True when running inside ContentAI; useful for testing code locally
- metadata_json - raw string (or None if not set) for the active extractor run (see also the parsed form returned by metadata())
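As a rough sketch of how these fields fit together, the hypothetical helper `describe_run` below branches on `running_in_contentai` to report either the platform job or the local paths. It assumes the fields are exposed as module-level attributes of `contentaiextractor` (read as `contentai.job_id` and so on); the module is passed in as a parameter so that a `SimpleNamespace` stand-in with made-up values can replace it during local testing.

```python
from types import SimpleNamespace


def describe_run(contentai):
    """Summarize the current run (hypothetical helper, not part of the package).

    `contentai` is the imported module, or any object with the same fields.
    """
    if contentai.running_in_contentai:
        return f"job {contentai.job_id}: {contentai.extractor_name} on {contentai.content_url}"
    # local run: report where content is read from and where results go
    return f"local run: reading {contentai.content_path}, writing {contentai.result_path}"


if __name__ == "__main__":
    # made-up stand-in values for a local test
    local = SimpleNamespace(
        running_in_contentai=False,
        job_id=None,
        extractor_name="my_extractor",
        content_url="s3://bucket/video.mp4",
        content_path="/tmp/video.mp4",
        result_path="/tmp/results",
    )
    print(describe_run(local))  # local run: reading /tmp/video.mp4, writing /tmp/results
```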
Functions
download_content
download_content()
download content to work with locally
returns local path where content is written
metadata
metadata()
returns a dict containing input metadata
example:
# access metadata that was supplied when running a job
# contentai run s3://bucket/video.mp4 -d '{ "input": "value" }'
input_value = contentai.metadata()["input"]
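Indexing with `["input"]` raises `KeyError` when a job was started without that key. A minimal sketch of reading an optional value with a fallback is shown below; `read_option` is a hypothetical helper, and the `or {}` guard is a defensive assumption in case `metadata()` returns an empty or `None` value when no metadata was supplied.

```python
from types import SimpleNamespace


def read_option(contentai, name, default=None):
    """Return metadata()[name], or `default` if the job did not supply it.

    Hypothetical helper. The `or {}` guard is defensive: it assumes
    metadata() may return an empty or None value when nothing was passed.
    """
    metadata = contentai.metadata() or {}
    return metadata.get(name, default)


if __name__ == "__main__":
    # stand-in for: contentai run s3://bucket/video.mp4 -d '{ "input": "value" }'
    job = SimpleNamespace(metadata=lambda: {"input": "value"})
    print(read_option(job, "input"))           # value
    print(read_option(job, "threshold", 0.5))  # 0.5
```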
extractors
extractors()
get list of all extractors executed against this content url
returns a list of strings
[
"extractor1",
"extractor2"
]
example:
# get all data from all extractors
for extractor in contentai.extractors():
    for key in contentai.keys(extractor):
        data = contentai.get(extractor, key)
keys
keys(extractor_name)
get a list of keys for specified extractor
returns a list of keys
[
"data.json",
"data.csv",
"data.txt"
]
example:
keys = contentai.keys("azure_videoindexer")
for key in keys:
    data = contentai.get("azure_videoindexer", key)
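Combining `extractors()`, `keys()` and `get()` as in the examples above, a small sketch can gather every prior output into one nested dict. `collect_outputs` and the in-memory `FakeContentAI` stand-in are hypothetical names for illustration; the stand-in only mimics the three documented calls so the logic can be tried without the platform.

```python
def collect_outputs(contentai):
    """Return {extractor: {key: contents}} for every prior extractor.

    Hypothetical helper built only on extractors(), keys() and get().
    """
    return {
        extractor: {key: contentai.get(extractor, key) for key in contentai.keys(extractor)}
        for extractor in contentai.extractors()
    }


class FakeContentAI:
    """In-memory stand-in for the module, for local testing only."""

    def __init__(self, store):
        self._store = store  # {extractor_name: {key: text}}

    def extractors(self):
        return list(self._store)

    def keys(self, extractor_name):
        return list(self._store[extractor_name])

    def get(self, extractor_name, key):
        return self._store[extractor_name][key]


if __name__ == "__main__":
    fake = FakeContentAI({"azure_videoindexer": {"data.json": "{}"}, "other": {}})
    print(collect_outputs(fake))  # {'azure_videoindexer': {'data.json': '{}'}, 'other': {}}
```

Keys that hold binary data are better read with get_bytes() than get().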
get
get(extractor_name, key)
get the contents of a particular key
example:
# get another extractor's output
data = contentai.get("some_extractor", "output.csv")
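When another extractor writes CSV, its text can be parsed with the standard `csv` module. This is a sketch assuming `get()` returns the key's contents as a `str`; `read_csv_rows` is a hypothetical helper, and the CSV text in the demo is made up.

```python
import csv
import io
from types import SimpleNamespace


def read_csv_rows(contentai, extractor_name, key):
    """Parse another extractor's CSV output into a list of dicts.

    Hypothetical helper; assumes get() returns the key's contents as a str.
    """
    text = contentai.get(extractor_name, key)
    return list(csv.DictReader(io.StringIO(text)))


if __name__ == "__main__":
    fake = SimpleNamespace(get=lambda extractor_name, key: "frame,label\n1,cat\n2,dog\n")
    # one dict per data row, keyed by the header line
    print(read_csv_rows(fake, "some_extractor", "output.csv"))
```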
get_json
get_json(extractor_name, key)
get the json contents of a particular key
example:
# get another extractor's output
data = contentai.get_json("some_extractor", "data.json")
get_bytes
get_bytes(extractor_name, key)
get the contents of a particular key in raw bytes
example:
# get another extractor's output
data = contentai.get_bytes("some_extractor", "output.bin")
set
set(key, value)
set results data for this extractor
can be called multiple times with different keys
value is a string
example:
contentai.set("output", "hello world")
set_json
set_json(key, value)
set results data for this extractor
can be called multiple times with different keys
value can be any JSON-serializable object
example:
data = {}
data["foo"] = "bar"
contentai.set_json("output", data)
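The practical difference between set() and set_json() is the value type. A minimal sketch, with hypothetical key names `frames` and `summary`: structured data goes through set_json() as-is, while the text passed to set() is formatted into a string first. `RecordingContentAI` is a hypothetical stand-in that records results in memory and enforces the documented "value is a string" rule for set().

```python
def write_results(contentai, frames):
    """Store per-frame results and a text summary (hypothetical helper).

    set() is documented to take a string, so the summary is formatted first;
    set_json() takes the list of dicts directly.
    """
    contentai.set_json("frames", frames)
    contentai.set("summary", f"{len(frames)} frames processed")


class RecordingContentAI:
    """Stand-in that records results in memory instead of writing them."""

    def __init__(self):
        self.results = {}

    def set(self, key, value):
        # mirror the documented contract: set() values are strings
        if not isinstance(value, str):
            raise TypeError("set() expects a string value")
        self.results[key] = value

    def set_json(self, key, value):
        self.results[key] = value


if __name__ == "__main__":
    recorder = RecordingContentAI()
    write_results(recorder, [{"frameNumber": 1}, {"frameNumber": 2}])
    print(recorder.results["summary"])  # 2 frames processed
```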
set_bytes
set_bytes(key, value)
set results data for this extractor
can be called multiple times with different keys
value is bytes
example:
with open("some-file", "rb") as some_file:
    contentai.set_bytes("output", some_file.read())
save_results
save_results()
save results immediately, instead of waiting until process exits
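For long-running extractors, save_results() can act as a periodic checkpoint. This is a sketch under the assumption that calling set_json() again with the same key replaces the earlier value; `process_frames`, `checkpoint_every` and `CountingContentAI` are hypothetical names, and the stand-in only counts saves.

```python
def process_frames(contentai, frames, checkpoint_every=100):
    """Build per-frame results, saving partial output along the way.

    Hypothetical helper. Assumes set_json() with the same key replaces the
    earlier value; save_results() then writes what has been set so far
    without waiting for the process to exit.
    """
    results = []
    for number, frame in enumerate(frames, start=1):
        results.append({"frameNumber": number, "value": frame})
        if number % checkpoint_every == 0:
            contentai.set_json("output", results)
            contentai.save_results()
    # final value; results are otherwise saved when the process exits
    contentai.set_json("output", results)
    return results


class CountingContentAI:
    """Stand-in that keeps the latest output and counts save_results() calls."""

    def __init__(self):
        self.output = None
        self.saves = 0

    def set_json(self, key, value):
        self.output = list(value)

    def save_results(self):
        self.saves += 1


if __name__ == "__main__":
    fake = CountingContentAI()
    process_frames(fake, range(250))
    print(fake.saves, len(fake.output))  # 2 250
```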
parse_content_url
parse_content_url()
extract details from content url
returns
- source_bucket_name - the s3 bucket name derived from content_url
- source_bucket_key - the s3 bucket key derived from content_url
- source_bucket_region - the s3 bucket region derived from content_url
the following content url formats are supported:
- Simple (CLI) Format - s3://{bucket}/{key}
- Virtual Hosted Format - https://{bucket}.s3.amazonaws.com/{key}
- Virtual Hosted Format with Region - https://{bucket}.s3.{region}.amazonaws.com/{key}
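To make the three supported formats concrete, here is an illustrative parser using the standard library's `urllib.parse` and `re`. It is not the package's implementation: the name `split_s3_url` and its `(bucket, key, region)` tuple are assumptions, since the package documents only which details are derived. Region is `None` for formats that do not encode one.

```python
import re
from urllib.parse import urlparse

# {bucket}.s3.amazonaws.com or {bucket}.s3.{region}.amazonaws.com
_VIRTUAL_HOST = re.compile(r"(?P<bucket>.+?)\.s3(?:\.(?P<region>[^.]+))?\.amazonaws\.com")


def split_s3_url(url):
    """Return (bucket, key, region) for the three supported URL formats.

    Illustrative re-implementation, not the package's code. region is None
    when the URL does not encode one, and keys are not percent-decoded.
    """
    parsed = urlparse(url)
    if parsed.scheme == "s3":
        # simple (CLI) format: s3://{bucket}/{key}
        return parsed.netloc, parsed.path.lstrip("/"), None
    if parsed.scheme == "https":
        match = _VIRTUAL_HOST.fullmatch(parsed.netloc)
        if match:
            return match["bucket"], parsed.path.lstrip("/"), match["region"]
    raise ValueError(f"unsupported content url: {url}")


if __name__ == "__main__":
    print(split_s3_url("s3://bucket/video.mp4"))
    # ('bucket', 'video.mp4', None)
    print(split_s3_url("https://bucket.s3.us-east-2.amazonaws.com/video.mp4"))
    # ('bucket', 'video.mp4', 'us-east-2')
```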
Dependencies
pip install -r requirements.txt
Develop
Choose a make command to run:
- make build - build the package
- make deploy - upload the package to PyPI
- make docs - generate API docs in markdown
Release
To publish a new release to PyPI, increment the version number in setup.py, tag the commit, and push it.
Changes
- 1.1.0
  - add extractors()
  - add
- 1.0.4
  - updated changelog
- 1.0.3
  - fixes issue where EXTRACTOR_METADATA envvar was inadvertently required
  - fixes issue where
- 1.0.2
  - add safety to setting retrieval on local runs
  - documentation updates
- 1.0.1
  - api docs for publish to pypi
- 1.0.0
  - initial release
Hashes for contentaiextractor-1.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 8fd69efe605841b1a8610c395b1ba4a8cb5705822fe964b06ddd7819c0206d35
MD5 | 518fb7f01234dd63b862eddb7b4fc0f0
BLAKE2b-256 | d5d002f734252821966b100a345641f2b729b0658c423eb9ed0abc098269fd6e