API Client
This is the API client of the Open Innovation Platform.
Welcome to the documentation for the Open Innovation API Client! This guide provides an overview of the API client library, its installation, usage, and available methods. To get started, install the library using pip, then import it into your Python project and initialize the API client with the API server's hostname and an optional access token. The documentation also covers the schemas used by the API client, which define the structure of the data; understanding these schemas helps you construct valid queries and interact effectively with the API.
Installation and Setup
To install the Open Innovation API Client, follow these steps:
- Install the package and its dependencies by running the following command:
pip install oip-core-client
- Import the library into your Python project:
from oip_core_client.main import APIClient
Initialization
To initialize the API Client, use the following code:
client = APIClient(api_host, access_token=None)
Parameters
- api_host (str, required): The hostname of the API server.
- access_token (str, optional): Your API authentication token. If not provided, the $APICLIENT_TOKEN environment variable will be used.
To generate an access token, use the get_access_token function provided in the library. It obtains a token by taking the API server's hostname, your username, and your password, and returns the generated access token as a string.
Here's an example of how to generate an access token using the get_access_token function:
from oip_core_client.lib import get_access_token
api_host = "api_host"
username = "your_username"
password = "your_password"
access_token = get_access_token(api_host, username, password)
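Putting the pieces together, a minimal end-to-end setup might look like the following sketch; the hostname and credentials are placeholders to replace with your own values:
from oip_core_client.main import APIClient
from oip_core_client.lib import get_access_token

# Placeholders: replace with your server and credentials
api_host = "api_host"
access_token = get_access_token(api_host, "your_username", "your_password")

# Initialize the client with the generated token
client = APIClient(api_host, access_token=access_token)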
Methods
get_dataframe
get_dataframe(query, entity: Optional[str] = None, dataset_id: Optional[str] = None, clear_cache: bool = False)
Retrieves data based on a given query and returns it as a Pandas DataFrame.
Parameters
- query (Query): The query object specifying the data retrieval parameters. The query consists of three main components: filter, meta, and neighbors. More details are given in the schema section.
  - filter (QueryFilter): The query filter specifies the conditions used to filter the data. It is a list of filters that define the criteria for selecting specific data records. Each filter contains the field/column to filter (col), the operator to apply (op), and the value to compare (value).
  - meta (QueryMeta): The meta filter provides additional metadata and options for the query. It includes parameters such as the logical operator for combining multiple filters (logical_op), pagination options (page_num and page_size), sample size (sample_size), sorting options (sort), down-sampling options (downsampling), geo-spatial down-sampling options (gdownsampling), time range (time_min and time_max), depth range (depth_min and depth_max), specific columns to retrieve (cols), a cumulative data flag (cumulative), formulas to apply to the data (formulas), and specific entities to retrieve (entities).
  - neighbors (List[str]): The neighbors parameter specifies the relationships or connections between entities. It is a list of entity names related to the target entity, used to retrieve data from connected entities such as parent, child, or related entities.
- entity (Optional[str]): The name of the entity for which the data is being requested. Either entity or dataset_id should be provided.
- dataset_id (Optional[str]): The ID of the dataset for which the data is being requested. Either entity or dataset_id should be provided.
- clear_cache (bool, optional): Flag indicating whether to clear the cache before retrieving the data. Default is False.
Returns
- pandas.DataFrame: The retrieved data as a Pandas DataFrame.
Raises
- ValueError: If both entity and dataset_id are provided, or if neither is provided.
- ValueError: If the dataset or entity does not exist.
- ValueError: If no data is found and the DataFrame is empty.
Example 1
from oip_core_client.schema import Query, QueryFilter, Filter, QueryMeta
# Define the query filter
filter1: Filter = {"col": "col1", "op": "=", "value": 1}
filter2: Filter = {"col": "col2", "op": ">", "value": 0.5}
query_filter: QueryFilter = [filter1, filter2]
# Define the query meta
query_meta: QueryMeta = {
"logical_op": "and",
"page_num": 1,
"page_size": 10,
}
# Define the neighbors
neighbors = ["entity1", "entity2"]
# Create the query object
query: Query = {"filter": query_filter, "meta": query_meta, "neighbors": neighbors}
# Retrieve the data as a DataFrame
data_frame = client.get_dataframe(query=query, entity="example_entity", clear_cache=True)
Example 2
from oip_core_client.schema import Query, QueryFilter, Filter, QueryMeta
# Define the query filter
filter1: Filter = {"col": "col1", "op": "=", "value": 1}
filter2: Filter = {"col": "col2", "op": ">", "value": 0.5}
query_filter: QueryFilter = [filter1, filter2]
# Define the query meta
query_meta: QueryMeta = {
"logical_op": "and",
"page_num": 1,
}
# Create the query object
query: Query = {"filter": query_filter, "meta": query_meta}
# Define the dataset ID
dataset_id = "7bb9fb49-3b4e-45df-9c72-1beab18054e0"
# Retrieve the data as a DataFrame
data_frame = client.get_dataframe(query=query, dataset_id=dataset_id, clear_cache=True)
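Because get_dataframe raises ValueError for an unknown entity or dataset, an empty result, or conflicting entity/dataset_id arguments, it can be useful to wrap the call accordingly. A minimal sketch, with a hypothetical entity name used purely for illustration:
# Hypothetical entity name used for illustration
try:
    data_frame = client.get_dataframe(query=query, entity="example_entity")
except ValueError as err:
    # Raised for an unknown entity/dataset, empty results, or
    # when both (or neither) of entity and dataset_id are given
    print(f"Query failed: {err}")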
commit_dataset
commit_dataset(df, dataset_id=None, dataset_name=None, dataset_category='tabular')
Commits a Pandas DataFrame as a dataset.
Parameters
- df (pandas.DataFrame): The DataFrame to be committed as a dataset.
- dataset_id (str, optional): The ID of the dataset to be updated. If not provided, a new dataset will be created.
- dataset_name (str, optional): The name of the dataset. If not provided and dataset_id is provided, the old name is kept.
- dataset_category (str, optional): The category of the dataset. Available categories: 'tabular', 'time-series', 'depth-series'. Default is 'tabular'.
  - The DataFrame df must have a column called 'time' for the 'time-series' category.
  - The DataFrame df must have a column called 'depth_time' for the 'depth-series' category.
Returns
- str: The ID of the committed dataset.
Raises
- ValueError: If the DataFrame df is empty or None.
- ValueError: If both dataset_id and dataset_name are missing.
- ValueError: If dataset_category is not one of the available categories.
- ValueError: If dataset_category is 'time-series' and the DataFrame df doesn't have a 'time' column.
- ValueError: If dataset_category is 'depth-series' and the DataFrame df doesn't have a 'depth_time' column.
Example
import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob', 'Emily'],
    'Age': [25, 32, 28, 35],
    'City': ['New York', 'London', 'Paris', 'Sydney']
}
df = pd.DataFrame(data)
# Commit the DataFrame as a dataset
dataset_id = client.commit_dataset(
dataset_name="dataset_name",
dataset_category="tabular",
df=df
)
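For the 'time-series' category, the DataFrame must include a 'time' column, as noted above. A minimal sketch; the dataset name, column names, and values are illustrative placeholders:
import pandas as pd

# The 'time' column is required for the 'time-series' category
ts_df = pd.DataFrame({
    'time': pd.date_range("2023-01-01", periods=4, freq="h"),
    'temperature': [21.5, 21.9, 22.3, 22.1],
})

ts_dataset_id = client.commit_dataset(
    dataset_name="sensor_readings",  # hypothetical dataset name
    dataset_category="time-series",
    df=ts_df,
)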
Schemas
The API Client utilizes several schemas to define the structure of the data and parameters used in the API. Understanding these schemas is essential for constructing valid queries and interacting with the API effectively.
Query Object Schema
The query object represents the parameters for data retrieval. It has the following components:
class Query(TypedDict, total=False):
filter: QueryFilter # query filter
meta: QueryMeta # meta filter
neighbors: List[str] # neighbors of the target entity
filter
The query filter specifies the conditions used to filter the data. It is a list of Filter objects (as used in the examples above); the QueryFilter type is simply a list of Filter.
class Filter(TypedDict, total=False):
    col: str
    op: str
    value: FilterValue

QueryFilter = List[Filter]
- col: The column/field being filtered.
- op: The operator to apply, such as "=", ">", "contains", etc.
- value: The value to compare against.
meta
The meta filter provides additional metadata and options for the query. It includes various optional attributes:
class QueryMeta(TypedDict, total=False):
logical_op: str # and/or
page_num: Optional[int]
page_size: Optional[int]
sample_size: Optional[int]
sort: Optional[FilterMetaSort]
downsampling: Optional[FilterMetaDownsampling]
gdownsampling: Optional[FilterMetaGDownsampling]
time_min: Optional[Union[datetime, str]]
time_max: Optional[Union[datetime, str]]
depth_min: Optional[float]
depth_max: Optional[float]
cols: Optional[List[str]]
cumulative: Optional[bool]
formulas: Optional[List[str]]
entities: Optional[List[str]]
class FilterMetaSort(TypedDict):
order_by: List[str]
order: List[int]
class FilterMetaDownsampling(TypedDict, total=False):
interval: Optional[str]
nb_pts: Optional[int]
agg_op: Optional[str]
grp_by: Optional[str]
grp_by_pn: Optional[int]
grp_by_ps: Optional[int]
class FilterMetaGDownsampling(TypedDict, total=False):
ncells: Optional[int]
bounds: Optional[Union[List[float], Tuple[float, float, float, float]]]
- logical_op (str): The logical operator to combine multiple filters ("and" or "or").
- page_num (int): The page number of the query results.
- page_size (int): The number of results to be returned per page.
- sample_size (int): The number of samples to be returned in the query results.
- sort (FilterMetaSort): Represents the sort metadata for the query (see the sketch after this list).
  - order_by: A list of strings representing the names of the fields to sort by.
  - order: A list of integers specifying the order direction for each column: +1 for ascending order, -1 for descending order.
- downsampling (FilterMetaDownsampling): Represents the time-based down-sampling metadata for the query.
  - interval: A string representing the time interval for down-sampling, in a concise notation such as "3w" for 3 weeks or "2h" for 2 hours.
  - nb_pts: An integer representing the number of points to return (if specified). If nb_pts is provided, there is no need for interval, as it will be calculated from the value of nb_pts.
  - agg_op: A string representing the down-sampling aggregation operator (e.g., min, max, avg, sum, count).
  - grp_by: Optional grouping by a specific field.
  - grp_by_pn: Optional grouping by a specific field, specifying the number of points.
  - grp_by_ps: Optional grouping by a specific field, specifying the page size.
- gdownsampling (FilterMetaGDownsampling): Represents the geo-spatial down-sampling metadata for the query.
  - ncells: An integer representing the number of cells to be considered in the geo-spatial down-sampling process.
  - bounds: A list or tuple of floats defining the bounds of the area to be considered in the geo-spatial down-sampling process (longitude min, longitude max, latitude min, latitude max).
- time_min: The minimum time to be considered in the query (datetime or string).
- time_max: The maximum time to be considered in the query (datetime or string).
- depth_min: The minimum depth to be considered in the query (float).
- depth_max: The maximum depth to be considered in the query (float).
- cols: A list of strings representing the names of the columns to be returned in the query results.
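To make these options concrete, here is a sketch of a QueryMeta combining sorting, a time range, and time-based down-sampling. The column names, interval, and time bounds are illustrative placeholders, and importing FilterMetaSort and FilterMetaDownsampling from oip_core_client.schema is an assumption based on the other schema imports:
from oip_core_client.schema import QueryMeta, FilterMetaSort, FilterMetaDownsampling

# Sort by "time" ascending, then by "value" descending
sort: FilterMetaSort = {"order_by": ["time", "value"], "order": [1, -1]}

# Down-sample to hourly averages
downsampling: FilterMetaDownsampling = {"interval": "1h", "agg_op": "avg"}

query_meta: QueryMeta = {
    "logical_op": "and",
    "sort": sort,
    "downsampling": downsampling,
    "time_min": "2023-01-01T00:00:00",
    "time_max": "2023-06-30T23:59:59",
    "cols": ["time", "value"],
}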
neighbors
A list of strings representing the neighbors of the target entity.
The query object provides a flexible way to define filters, metadata, and neighbors for retrieving data from the API.
Filter Data Types and Accepted Operators
The Filter object within the query filter allows you to specify different data types for filtering. The accepted operators (op) vary depending on the data type. The table below lists the accepted types for each operator.
Operator | Compatible Column Types | Filter Value Type | Example of Filter |
---|---|---|---|
"=" | number, string, boolean, time | Same as column type | filters = [{"col": "name", "op": "=", "value": "john"}] Find documents where the "name" column is equal to "john". Type of the column is string. |
">" | number, time | Same as column type | filters = [{"col": "age", "op": ">", "value": 20}] Find documents where the "age" column is greater than 20. Type of the column is number. |
">=" | number, time | Same as column type | filters = [{"col": "start_date", "op": ">=", "value": "2012-01-01T00:00:00"}] Retrieve documents where the "start_date" column is greater than or equal to January 1, 2012, at 00:00:00. Type of the column is time. |
"<" | number, time | Same as column type | filters = [{"col": "height", "op": "<", "value": 189}] Find documents where the "height" column is less than 189. Type of the column is number. |
"<=" | number, time | Same as column type | filters = [{"col": "height", "op": "<=", "value": 167}] Find documents where the "height" column is less than or equal to 167. Type of the column is number. |
"!=" | number, string, boolean, time | Same as column type | filters = [{"col": "availability", "op": "!=", "value": False}] Retrieve documents where the "availability" column is not equal to False. Type of the column is boolean. |
"IN" | number, string, boolean | List of values of the same type as the column | filters = [{"col": "grade", "op": "IN", "value": [15, 13, 12]}] Retrieve documents where the "grade" column matches any of the values in the list [15, 13, 12]. Type of the column is number. |
"NOT IN" | number, string | List of values of the same type as the column | filters = [{"col": "countries", "op": "NOT IN", "value": ["Russia", "China"]}] Retrieve documents where the "countries" column does not match any of the values in the list ["Russia", "China"]. Type of the column is string. |
"contains" | string | Same as column type | filters = [{"col": "name", "op": "contains", "value": "Mc"}] Retrieve documents where the "name" column contains the substring "Mc". Type of the column is string. |
"lcontains" | list_string, list_number | Number if the column type is list_number; string if the column type is list_string | filters = [{"col": "options", "op": "lcontains", "value": "computer science"}] Retrieve documents where the "options" column contains the exact string "computer science" as one of its elements. Type of the column is list of strings. |
"dcontains" | dict | String | filters = [{"col": "set_up", "op": "dcontains", "value": "pc=macbook"}] Retrieve documents where the "set_up" column is a dictionary containing the key "pc" with the value "macbook". Type of the column is dict. |
"stext" | string | String | filters = [{"col": "$text", "op": "stext", "value": "john"}] Search for the value "john" across all fields of type string in the documents. |
"gwithin" | geo_point | Polygon (list of coordinate points) | filters = [{"col": "cities", "op": "gwithin", "value": [[32.3, 45.9], [2.3, 5.9], [6.3, 55.9], [39.3, 66.9]]}] A spatial query for points within a polygon: retrieves documents where the geo point in the "cities" column falls within the polygon defined by the provided coordinate points. Type of the column is geo_point. |
"gnear" | geo_point | Geo point, max distance, min distance | filters = [{"col": "cities", "op": "gnear", "value": [34.5, 55.4, 22.4, 11.4]}] A spatial query for points near a specific location within a distance range. The "value" list contains the latitude (value[0]) and longitude (value[1]) of the reference point, followed by the maximum distance (value[2]) and minimum distance (value[3]) in kilometers. Note: you can provide only the maximum distance, in which case the range runs from the reference point out to the maximum distance. Type of the column is geo_point. |
"null" | any | No value required | filters = [{"col": "total", "op": "null"}] Retrieve all documents where the "total" column is null, meaning it has no value assigned. The column can be any of the types in the platform. |
"not_null" | any | No value required | filters = [{"col": "total", "op": "not_null"}] Retrieve all documents where the "total" column is not null, meaning it has a value assigned. The column can be any of the types in the platform. |