JAI - Trust your data

Jai SDK - Trust your data

Installation

The source code is currently hosted on GitHub at: https://github.com/jquant/jai-sdk

Installing jai-sdk using pip:

pip install jai-sdk

Get your Auth Key

First, you'll need an Authorization key to use the backend API.

To get a trial-version API key using the SDK, fill in the values with your information:

from jai import Jai

r = Jai.get_auth_key(email=EMAIL, firstName=FIRSTNAME, lastName=LASTNAME)

If the response code is 201, you should receive an email with your Auth Key.
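
A minimal sketch of that check, assuming the object returned by `Jai.get_auth_key` exposes a `status_code` attribute in the style of a `requests` response (the text above only says "response code"). The helper `auth_key_was_sent` and the stand-in class `FakeResponse` are hypothetical and exist only so the snippet runs without contacting the API.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for the object returned by Jai.get_auth_key."""
    status_code: int


def auth_key_was_sent(response) -> bool:
    """Return True when the response code is 201 (assumed attribute: status_code)."""
    return getattr(response, "status_code", None) == 201


# With the real SDK this would be:
# r = Jai.get_auth_key(email=EMAIL, firstName=FIRSTNAME, lastName=LASTNAME)
r = FakeResponse(status_code=201)
if auth_key_was_sent(r):
    print("Request accepted; check your inbox for the Auth Key.")
else:
    print(f"Request not accepted (response code {r.status_code}).")
```

Any other response code means the request was not accepted, so it is worth checking before waiting for the email.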

Get Started

If you already have an Auth Key, then you can use the SDK:

from jai import Jai
j = Jai(AUTH_KEY)

Setting up your databases

All data should be in pandas.DataFrame or pandas.Series format.

Application using the NLP FastText model

### fasttext implementation
# save this if you want to work in the same database later
name = 'text_data'

### Insert data and train the FastText model
# data can be a list of texts, pandas Series or DataFrame.
# if data is a list, then the ids will be set with range(len(data_list))
# if data is a pandas type, then the ids will be the index values.
# heads-up: index values must not contain duplicates.
j.setup(name, data, db_type='FastText')

# wait for the training to finish
j.wait_setup(name, 10)
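
Since index values must not contain duplicates, it can help to check the ids before calling `j.setup`. The sketch below is one way to do that in plain Python; the helper `find_duplicate_ids` and the example ids are hypothetical and not part of the SDK. With pandas data you could pass `data.index` to it, or simply check `data.index.is_unique`.

```python
from collections import Counter


def find_duplicate_ids(ids):
    """Return the ids that occur more than once, in first-seen order."""
    counts = Counter(ids)
    return [i for i, n in counts.items() if n > 1]


# Hypothetical example ids; with pandas data you could pass data.index instead.
example_ids = [10, 11, 12, 11, 13, 10]
duplicates = find_duplicate_ids(example_ids)
if duplicates:
    print(f"Duplicate ids found, fix these before calling setup: {duplicates}")
else:
    print("All ids are unique.")
```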

Application using the NLP BERT model

### BERT implementation
# generate a random name for identification of the base; it can be a user input
name = j.generate_name(20, prefix='sdk_', suffix='_text')

# this time we choose db_type="Text", applying the pre-trained BERT model
j.setup(name, data, db_type='Text', batch_size=1024)
j.wait_setup(name, 10)

Checking database

Here are some methods to check your databases.

The name of your database should appear in:

>>> j.names
['jai_database', 'jai_unsupervised', 'jai_supervised']

or you can check if a given database name is valid:

>>> j.is_valid(name)
True

You can also check the types for each of your databases with:

>>> j.info
                        db_name       db_type
0                  jai_database          Text
1              jai_unsupervised  Unsupervised
2                jai_supervised    Supervised

If you want to check which ids are in your database:

>>> j.ids(name)
['1000 items from 0 to 999']

Similarity

After you're done setting up your database, you can perform similarity searches:

  • Using the indexes of the input data
# Find the 5 most similar values for ids 0 and 1
results = j.similar(name, [0, 1], top_k=5)

# Find the 20 most similar values for every id from [0, 99]
ids = list(range(100))
results = j.similar(name, ids, top_k=20)

# Find the 100 most similar values for every input value
results = j.similar(name, data.index, top_k=100, batch_size=1024)
  • Using new data to be processed (all data should be in pandas.DataFrame or pandas.Series format)
# Find the 100 most similar values for every new_data
results = j.similar(name, new_data, top_k=100, batch_size=1024)

The output is a list of dictionaries. In each one, "query_id" is the id of the value whose similar items you searched for, and "results" is a list of top_k dictionaries, each holding an "id" and the "distance" between "query_id" and that "id".

[
  {
    'query_id': 0,
    'results':
    [
      {'id': 0, 'distance': 0.0},
      {'id': 3836, 'distance': 2.298321008682251},
      {'id': 9193, 'distance': 2.545339584350586},
      {'id': 832, 'distance': 2.5819168090820312},
      {'id': 6162, 'distance': 2.638622283935547},
      ...
    ]
  },
  ...,
  {
    'query_id': 9,
    'results':
    [
      {'id': 9, 'distance': 0.0},
      {'id': 54, 'distance': 5.262974262237549},
      {'id': 101, 'distance': 5.634262561798096},
      ...
    ]
  },
  ...
]
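
To make this structure concrete, here is a small sketch that flattens such a result list into (query_id, id, distance) rows and skips the self-match, where a query returns itself at distance 0. The helper `flatten_results` is hypothetical and not part of the SDK; the sample data is copied from the truncated output above.

```python
def flatten_results(results, drop_self=True):
    """Turn the list of result dictionaries into (query_id, id, distance) rows."""
    rows = []
    for entry in results:
        query_id = entry["query_id"]
        for match in entry["results"]:
            if drop_self and match["id"] == query_id:
                continue  # skip the query matching itself
            rows.append((query_id, match["id"], match["distance"]))
    return rows


# Sample data taken from the (truncated) example output.
sample = [
    {"query_id": 0, "results": [
        {"id": 0, "distance": 0.0},
        {"id": 3836, "distance": 2.298321008682251},
        {"id": 9193, "distance": 2.545339584350586},
    ]},
    {"query_id": 9, "results": [
        {"id": 9, "distance": 0.0},
        {"id": 54, "distance": 5.262974262237549},
    ]},
]

for row in flatten_results(sample):
    print(row)
```

Rows in this shape are easy to load into a pandas.DataFrame if you want to sort or filter by distance.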

Removing data

After you're done with the model setup, you can delete your raw data:

# Delete the raw data that was input, as it won't be needed anymore
j.delete_raw_data(name)

If you no longer need the model or anything else related to your database:

j.delete_database(name)
