Python SDK for reading and writing signals to Clarify.

PyClarify


PyClarify helps users of Clarify to easily read, write and manipulate data in Clarify.

  • Data scientists can easily filter data, convert it to pandas with our built-in methods, and write results back.
  • System integrators can set up pipelines for automatic streaming of data, and update labels on the fly.

Useful tutorials and documentation

Prerequisites

In order to start using the Python SDK, you need

Where to get it

The source code is currently hosted on GitHub at: https://github.com/clarify/pyclarify

Binary installers for the latest released version are available at the Python Package Index (PyPI).

# PyPI install
pip install pyclarify

Dependencies

Interact with Clarify

PyClarify provides a fast and easy way to interact with Clarify. The ClarifyClient class takes the path to your credentials file as a string argument. Creating the client should always be the first step when you start interacting with Clarify through PyClarify.
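Because the client is built from a file path, it can help to check that path before handing it over. The following is a minimal sketch using only the standard library; `check_credentials_file` is a hypothetical helper, not part of PyClarify, and it only verifies that the file exists and parses as JSON without inspecting its content.

```python
import json
from pathlib import Path


def check_credentials_file(path: str) -> str:
    """Return `path` unchanged if it points at an existing, valid JSON file.

    Hypothetical helper (not part of PyClarify). It raises
    FileNotFoundError if the file is missing and json.JSONDecodeError
    if the file is not valid JSON.
    """
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"Credentials file not found: {file}")
    # Parse only to confirm the file is valid JSON; the content is not inspected.
    with file.open(encoding="utf-8") as handle:
        json.load(handle)
    return path
```

The returned string can then be passed straight to the client, for example ClarifyClient(check_credentials_file("clarify-credentials.json")).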

For more information, see the Clarify Developer documentation.

Quickstart

We recommend using Google Colab to quickly learn how to interact with Clarify using Python. We have created an interactive introduction tutorial where you will learn all the basics to get you started.

Access your data with the ClarifyClient

from pyclarify import ClarifyClient

client = ClarifyClient("clarify-credentials.json")

Create new Signals

from pyclarify import Signal

signal = Signal(
    name = "Home temperature",
    description = "Temperature in the bedroom",
    labels = {"data-source": ["Raspberry Pi"], "location": ["Home"]}
)

response = client.save_signals(
    input_ids=["<INPUT_ID>"],
    signals=[signal],
    create_only=False
)

Populate your signals using DataFrames

from pyclarify import DataFrame

data = DataFrame(
    series={"<INPUT_ID_1>": [1, None], "<INPUT_ID_2>": [None, 5]},
    times = ["2021-11-01T21:50:06Z",  "2021-11-02T21:50:06Z"],
)

response = client.insert(data)
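The `times` list above uses RFC 3339 timestamps in UTC. When timestamps come from Python `datetime` objects, they can be formatted the same way with the standard library. This is a sketch only; `rfc3339_times` is a hypothetical helper, not a PyClarify function, and it assumes timezone-aware input.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def rfc3339_times(start: datetime, step: timedelta, count: int) -> List[str]:
    """Return `count` evenly spaced UTC timestamps such as "2021-11-01T21:50:06Z".

    Hypothetical helper. `start` should be timezone-aware; a naive
    datetime would be interpreted as local time by astimezone().
    """
    start_utc = start.astimezone(timezone.utc)
    return [
        (start_utc + i * step).strftime("%Y-%m-%dT%H:%M:%SZ")
        for i in range(count)
    ]


times = rfc3339_times(
    datetime(2021, 11, 1, 21, 50, 6, tzinfo=timezone.utc),
    timedelta(days=1),
    2,
)
print(times)  # ['2021-11-01T21:50:06Z', '2021-11-02T21:50:06Z']
```

The resulting list has the same shape as the `times` argument shown in the DataFrame example.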

Query your stored signals

response = client.select_signals(
    skip=10,
    limit=50,
    sort=["-id"]
)
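The `skip` and `limit` parameters make it possible to walk through a large selection page by page. The sketch below shows the general pattern with the standard library only. `iter_pages` and `fetch_page` are hypothetical names: `fetch_page(skip, limit)` stands in for whatever call you make (for example a wrapper around `client.select_signals`) and is assumed to return a list of records, since this README does not show how to unpack the response.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_pages(
    fetch_page: Callable[[int, int], List[T]], limit: int = 50
) -> Iterator[T]:
    """Yield every record by calling fetch_page(skip, limit) repeatedly.

    Hypothetical helper. Iteration stops at the first page that holds
    fewer than `limit` records.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    skip = 0
    while True:
        page = fetch_page(skip, limit)
        yield from page
        if len(page) < limit:
            break
        skip += limit


# Stand-in data source used only to demonstrate the paging logic.
records = list(range(7))
collected = list(iter_pages(lambda skip, limit: records[skip:skip + limit], limit=3))
print(collected)  # [0, 1, 2, 3, 4, 5, 6]
```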

Publish them as Items

from pyclarify import Item

client = ClarifyClient("./clarify-credentials.json")

item = Item(
    name = "Home temperature",
    description = "Temperature in the bedroom",
    labels = {"data-source": ["Raspberry Pi"], "location": ["Home"]},
    visible=True
)
response = client.publish_signals(
    signal_ids=['<SIGNAL_ID>'],
    items=[item],
    create_only=False
)

Use filters to get a specific selection

from pyclarify.query import Filter, Regex

only_raspberries = Filter(
    fields={
        "labels.unit-type": Regex(value="Raspberry")
    }
)

response = client.select_items(
    filter=only_raspberries
)

Get the data and include relationships

response = client.select_dataframe(
    filter=only_raspberries,
    include=["item"]
)

Use help to get more insight

>>> help(client.select_dataframe)
select_dataframe(
        filter: Optional[pyclarify.query.filter.Filter] = None,
        sort: List[str] = [],
        limit: int = 20,
        skip: int = 0,
        total: bool = False,
        gte: Union[datetime.datetime, str] = None,
        lt: Union[datetime.datetime, str] = None,
        last: int = -1,
        rollup: Union[str, datetime.timedelta] = None,
        include: List[str] = []
    ) -> pyclarify.views.generics.Response method of pyclarify.client.ClarifyClient instance

    Return dataframe for items.

    Time selection:
    - Maximum window size is 40 days (40 * 24 hours) when rollup is null or less than PT1M (1 minute).
    - Maximum window size is 400 days (400 * 24 hours) when rollup is greater than or equal to PT1M (1 minute).
    - No maximum window size if rollup is "window".

    Parameters
    ----------
    filter: Filter, optional
        A Filter Model that describes a mongodb filter to be applied.

    sort: list of strings
        List of strings describing the order in which to sort the items in the response.

    limit: int, default 20
        The maximum number of resources to select. A negative number means no limit, which may or may not be allowed.

    skip: int, default 0
        Skip the first N matches. A negative skip is treated as 0.

    total: bool, default False
        When true, force the inclusion of a total count in the response. A total count is the total number of resources that match the filter.

    gte: string(RFC 3339 timestamp) or python datetime, optional, default <now - 7 days>
        An RFC3339 time describing the inclusive start of the window.

    lt: string(RFC 3339 timestamp) or python datetime, optional, default <now + 7 days>
        An RFC3339 time describing the exclusive end of the window.

    last: int, default -1
        If above 0, select last N timestamps per series. The selection happens after the rollup aggregation.

    rollup: timedelta or string(RFC 3339 duration) or "window", default None
        If an RFC 3339 duration is specified, roll up the values into evenly sized buckets.
        If "window" is specified, roll up the values into the full time window (`gte` -> `lt`).

    include: List of strings, optional
        A list of strings specifying which relationships to be included in the response.

    Example
    -------

        >>> client.select_dataframe(
        >>>     filter = query.Filter(fields={"name": query.NotEqual(value="Air Temperature")}),
        >>>     sort = ["-id"],
        >>>     limit = 5,
        >>>     skip = 3,
        >>>     total = False,
        >>>     gte="2022-01-01T01:01:01Z",
        >>>     lt="2022-01-09T01:01:01Z",
        >>>     rollup="PT24H",
        >>> )

    Returns
    -------
    Response
        In case of a valid return value, returns a pydantic model with the following format:

            >>> jsonrpc = '2.0'
            >>> id = '1'
            >>> result = SelectDataFrameResponse(
            >>>    meta={
            >>>        'total': -1,
            >>>        'groupIncludedByType': True
            >>>    },
            >>>    data=DataFrame(
            >>>        times=[datetime.datetime(2022, 7, 12, 12, 0, tzinfo=datetime.timezone.utc),..],
            >>>        series={
            >>>            'c5ep6ojsbu8cohpih9bg': [0.18616, 0.18574000000000002, ...,],
            >>>            ...
            >>>        }
            >>>     ),
            >>>     included=None
            >>> ),
            >>> error = None

        In case of an error, the method returns a pydantic model with the following format:

            >>> jsonrpc = '2.0'
            >>> id = '1'
            >>> result = None
            >>> error = Error(
            >>>         code = '-32602',
            >>>         message = 'Invalid params',
            >>>         data = ErrorData(trace = <trace_id>, params = {})
            >>> )
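The time selection limits in the help text (40 days without a rollup, 400 days with a rollup of at least PT1M) mean that a longer range has to be requested in several calls. A minimal sketch of how such a range could be split into consecutive windows is shown here; `split_window` is a hypothetical helper, not part of PyClarify. Each pair follows the documented convention of an inclusive `gte` and an exclusive `lt`, so the windows neither overlap nor leave gaps.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple


def split_window(
    gte: datetime, lt: datetime, max_size: timedelta = timedelta(days=40)
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (gte, lt) pairs, each spanning at most `max_size`.

    Hypothetical helper. The default of 40 days matches the documented
    limit for queries without a rollup.
    """
    if max_size <= timedelta(0):
        raise ValueError("max_size must be positive")
    start = gte
    while start < lt:
        end = min(start + max_size, lt)
        yield start, end
        start = end


windows = list(
    split_window(
        datetime(2022, 1, 1, tzinfo=timezone.utc),
        datetime(2022, 4, 1, tzinfo=timezone.utc),
    )
)
for window_gte, window_lt in windows:
    print(window_gte.date(), "->", window_lt.date())
# 2022-01-01 -> 2022-02-10
# 2022-02-10 -> 2022-03-22
# 2022-03-22 -> 2022-04-01
```

Each pair can then be passed as the `gte` and `lt` arguments of a separate `client.select_dataframe` call.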

Changelog

Wondering about upcoming or previous changes to the SDK? Take a look at the CHANGELOG.

Contributing

Want to contribute? Check out CONTRIBUTING.

Download files

Source Distribution

pyclarify-0.4.0a1.tar.gz (32.0 kB)

Built Distribution

pyclarify-0.4.0a1-py3-none-any.whl (46.4 kB)