makes it easier to implement a ContentAI extractor
Project description
contentai-extractor-runtime-python
This is a Python package for implementing a custom extractor that runs on the ContentAI platform.
https://pypi.org/project/contentaiextractor/
Usage
pip install contentaiextractor
import contentaiextractor as contentai
# download content locally
content_path = contentai.download_content()
# access metadata that was supplied when running a job
# contentai run s3://bucket/video.mp4 -d '{ "input": "value" }'
inputData = contentai.metadata()["input"]
# get output from another extractor
csv = contentai.get("extractor", "data.csv")
json = contentai.get_json("extractor", "data.json")
# extract some data
outputData = []
outputData.append({"frameNumber": 1})
# output data from this extractor
contentai.set("output", outputData)
API Documentation
ContentAIError Objects
class ContentAIError(Exception)
represents a contentai error
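Because the class is exported by the package, callers can catch it around data-access calls; a minimal sketch, assuming get_json() raises ContentAIError on failure:
import contentaiextractor as contentai
try:
    data = contentai.get_json("some_extractor", "data.json")
except contentai.ContentAIError as e:
    # assumption: data-access helpers raise ContentAIError when a key or extractor is unavailable
    print(f"extractor call failed: {e}")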
Fields
- extractor_name - name of the extractor being run
- job_id - current job id
- content_url - URL of the content the extractor is run against
- content_path - local path where the extractor can access the content
- result_path - local path where the extractor should write the results
- running_in_contentai - boolean set to True; useful for testing code locally
- metadata_json - raw string (or None if not set) for active extractor run (also, see parsed metadata())
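A minimal sketch of reading these fields, assuming they are exposed as module-level attributes as listed above:
import contentaiextractor as contentai
if contentai.running_in_contentai:
    # running on the platform: the job supplies content and metadata
    print(contentai.extractor_name, contentai.job_id, contentai.content_url)
else:
    # running locally: substitute your own test inputs here
    print("local test run")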
Functions
download_content
download_content()
download content to work with locally
returns local path where content is written
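example (the same call shown in the Usage section above):
# download content locally; the return value is the local path to the content
content_path = contentai.download_content()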
metadata
metadata()
returns a dict containing input metadata
example:
# access metadata that was supplied when running a job
# contentai run s3://bucket/video.mp4 -d '{ "input": "value" }'
input_data = contentai.metadata()["input"]
extractors
extractors()
get list of all extractors executed against this content url
returns a list of strings
[
  "extractor1",
  "extractor2"
]
example:
# get all data from all extractors
for extractor in contentai.extractors():
for key in contentai.keys(extractor):
data = contentai.get(extractor, key)
keys
keys(extractor_name)
get a list of keys for specified extractor
returns a list of keys
[
  "data.json",
  "data.csv",
  "data.txt"
]
example:
keys = contentai.keys("azure_videoindexer")
for key in keys:
data = contentai.get("azure_videoindexer", key)
get
get(extractor_name, key)
get the contents of a particular key
example:
# get another extractor's output
data = contentai.get("some_extractor", "output.csv")
get_json
get_json(extractor_name, key)
get the json contents of a particular key
example:
# get another extractor's output
data = contentai.get_json("some_extractor", "data.json")
get_bytes
get_bytes(extractor_name, key)
get the contents of a particular key in raw bytes
example:
# get another extractor's output
data = contentai.get_bytes("some_extractor", "output.bin")
set
set(key, value)
set results data for this extractor
can be called multiple times with different keys
value is a string
example:
contentai.set("output", "hello world")
set_json
set_json(key, value)
set results data for this extractor
can be called multiple times with different keys
value can be anything
example:
data = {}
data["foo"] = bar
contentai.set_json("output", data)
set_bytes
set_bytes(key, value)
set results data for this extractor
can be called multiple times with different keys
value is bytes
example:
with open("some-file", "rb") as some_file:
    contentai.set_bytes("output", some_file.read())
save_results
save_results()
save results immediately, instead of waiting until process exits
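example (reuses the "output" key from the set() example above; the only addition is flushing the results immediately):
# write a result, then persist it right away instead of at process exit
contentai.set("output", "hello world")
contentai.save_results()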
parse_content_url
parse_content_url()
extract details from content url
returns
- source_bucket_name - the s3 bucket name derived from content_url
- source_bucket_key - the s3 bucket key derived from content_url
- source_bucket_region - the s3 bucket region derived from content_url
the following content url formats are supported:
- Simple (CLI) Format - s3://{bucket}/{key}
- Virtual Hosted Format - https://{bucket}.s3.amazonaws.com/{key}
- Virtual Hosted Format with Region - https://{bucket}.s3.{region}.amazonaws.com/{key}
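example (a minimal sketch; the s3 url in the comment is illustrative, and how the three derived values are surfaced should be verified against the package source):
# the job's content_url might be s3://my-bucket/videos/video.mp4 (Simple CLI format)
# parse_content_url() derives the source bucket name, key, and region from it
contentai.parse_content_url()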
Dependencies
pip install -r requirements.txt
Develop
Choose a make command to run
- build - build package
- deploy - upload package to pypi
- docs - generates api docs in markdown
Release
To publish a new release to pypi, increment the version number in setup.py, tag the commit, and push it.
Changes
- 1.1.0
  - add extractors()
  - add
- 1.0.4
  - updated changelog
- 1.0.3
  - fixes issue where EXTRACTOR_METADATA envvar was inadvertently required
  - fixes issue where
- 1.0.2
  - add safety to setting retrieval on local runs
  - documentation updates
- 1.0.1
  - api docs for publish to pypi
- 1.0.0
  - initial release
File details
Details for the file contentaiextractor-1.1.0.tar.gz.
File metadata
- Download URL: contentaiextractor-1.1.0.tar.gz
- Upload date:
- Size: 6.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.52.0 CPython/3.9.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | c768840059baef9ec812d31d713d90173b470d880a15569661d796a1bec21bcf
MD5 | eb9a9994e17bc809b059eeb1a28d3f0d
BLAKE2b-256 | 7e4c56557c59a8fcdb279fe817f8d4fe6f3029cd6f5edbc64303844c298a165d
File details
Details for the file contentaiextractor-1.1.0-py3-none-any.whl.
File metadata
- Download URL: contentaiextractor-1.1.0-py3-none-any.whl
- Upload date:
- Size: 9.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.52.0 CPython/3.9.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8fd69efe605841b1a8610c395b1ba4a8cb5705822fe964b06ddd7819c0206d35
MD5 | 518fb7f01234dd63b862eddb7b4fc0f0
BLAKE2b-256 | d5d002f734252821966b100a345641f2b729b0658c423eb9ed0abc098269fd6e