
REST extension for the Cognite extractor-utils framework


Cognite extractor-utils REST extension

The REST extension for Cognite extractor-utils provides a way to easily write your own extractors for RESTful source systems.

The library is currently under development, and should not be used in production environments yet.

Overview

The REST extension for extractor-utils templatizes how the extractor makes HTTP requests to the source, automatically deserializes the responses into user-defined DTO classes, and handles uploading of data to CDF.

The only parts of the extractor a user needs to implement are

  • Describing how HTTP requests should be constructed using pre-built function decorators
  • Describing the response schema using Python dataclasses
  • Implementing a mapping from the source data model to the CDF data model

For example, consider CDF's Events API as a source. We could describe the response schema using dataclasses, with EventsList as the top-level type:

from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RawEvent:
    externalId: Optional[str]
    dataSetId: Optional[int]
    startTime: Optional[int]
    endTime: Optional[int]
    type: Optional[str]
    subtype: Optional[str]
    description: Optional[str]
    metadata: Optional[Dict[str, str]]
    assetIds: Optional[List[Optional[int]]]
    source: Optional[str]
    id: Optional[int]
    lastUpdatedTime: Optional[int]
    createdTime: Optional[int]


@dataclass
class EventsList:
    items: List[RawEvent]
    nextCursor: Optional[str]

We can then write a handler that takes in one of these EventsList objects and yields CDF events, represented by instances of the Event class from the cognite.extractorutils.rest.typing module.

import os
from typing import Generator

from cognite.extractorutils.rest.extractor import RestExtractor
from cognite.extractorutils.rest.typing import Event

extractor = RestExtractor(
    name="Event extractor",
    description="Extractor from CDF events to CDF events",
    version="1.0.0",
    base_url=f"https://api.cognitedata.com/api/v1/projects/{os.environ['COGNITE_PROJECT']}/",
    headers={"api-key": os.environ["COGNITE_API_KEY"]},
)

@extractor.get("events", response_type=EventsList)
def get_events(events: EventsList) -> Generator[Event, None, None]:
    for event in events.items:
        yield Event(
            external_id=f"testy-{event.id}",
            description=event.description,
            start_time=event.startTime,
            end_time=event.endTime,
            type=event.type,
            subtype=event.subtype,
            metadata=event.metadata,
            source=event.source,
        )

with extractor:
    extractor.run()

A full example is provided in the example.py file.

The return type

If the return type is set to cognite.extractorutils.rest.http.JsonBody, the raw JSON payload is passed to the handler. This is useful in cases where the payload is hard or impossible to describe with dataclasses.
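As a sketch of what such a handler's mapping logic might look like, the function below operates on the parsed JSON directly (the payload shape here is hypothetical, and in a real extractor the function would be registered with the @extractor.get decorator and a response_type of JsonBody):

```python
from typing import Any, Dict, List

# Plain function over the parsed JSON body. In a real extractor this would
# be decorated with something like:
#   @extractor.get("events", response_type=JsonBody)
def list_event_ids(payload: Dict[str, Any]) -> List[int]:
    # Navigate the raw structure directly instead of via a dataclass
    return [item["id"] for item in payload.get("items", []) if "id" in item]
```

Working on the raw dict trades type safety for flexibility, which is the point of JsonBody: irregular or deeply nested payloads can be walked directly.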

If the return type is set to requests.Response, the raw response message itself is passed to the handler.
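A handler receiving the raw response can then inspect status codes and headers before parsing the body. A minimal sketch (the endpoint shape and "items" key are assumptions for illustration; registration via the @extractor.get decorator is omitted):

```python
import requests

# Sketch of a handler that receives the raw requests.Response. In a real
# extractor it would be registered with a response_type of requests.Response.
def handle_response(response: requests.Response) -> list:
    # Inspect the status code before touching the body
    if response.status_code != 200:
        return []
    return response.json().get("items", [])
```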

Lists at the root

Using Python dataclasses, we are not able to express JSON structures where the root element is a list. To get around this, responses of this nature are automatically converted into something that can be modeled with Python dataclasses.

A JSON structure with a list as its root element will be converted to an object containing a single key, "items", whose value is the original JSON list, as in the example below.

[{"object_id": 1}, {"object_id": 2}, {"object_id": 3}]

will be converted to

{
    "items": [{"object_id": 1}, {"object_id": 2}, {"object_id": 3}]
}

This does not apply if the return type is set to JsonBody.
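Under this wrapping rule, such a response can be modeled by a dataclass whose only field is items (the class names below are illustrative, not part of the library):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ObjectItem:
    object_id: int


@dataclass
class ObjectItemList:
    # The extension wraps a root-level JSON list under an "items" key,
    # so the response type only needs this single field.
    items: List[ObjectItem]
```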

Contributing

We use poetry to manage dependencies and virtual environments. To develop the REST extension, follow these steps to set up your local environment:

  1. Install poetry (add --user if desired):
    $ pip install poetry
    
  2. Clone repository:
    $ git clone git@github.com:cognitedata/python-extractor-utils-rest.git
    
  3. Move into the newly created local repository:
    $ cd python-extractor-utils-rest
    
  4. Create virtual environment and install dependencies:
    $ poetry install
    

All code must pass typing and style checks before being merged. It is recommended to install pre-commit hooks so that these checks run before committing code:

$ poetry run pre-commit install

This project adheres to the Contributor Covenant v2.0 as a code of conduct.

