JAI - Trust your data

Project description

jai-sdk

jai SDKs

Examples

Instantiating your base class

from jai import Jai
j = Jai(AUTH_KEY)

Setting up your databases

All data should be a pandas.DataFrame or pandas.Series.

Application using the FastText NLP model

### fasttext implementation
# save this if you wish to work in the same database later
name = 'text_data'

### Insert the data and train the unsupervised FastText model
# data can be a list of texts, a pandas Series or a DataFrame.
# if data is a list, the ids are set to range(len(data_list))
# if data is a pandas type, the ids are the index values; the index must not contain duplicated values
j.setup(name, data, db_type='FastText')

# wait for the training to finish
j.wait_setup(name, 10)
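
The id rules in the comments above can be checked locally before calling j.setup. The following is a minimal sketch in plain Python; assign_ids is a hypothetical helper written for illustration, not part of jai, and the SDK may perform its own validation.

```python
from collections import Counter


def assign_ids(texts, ids=None):
    """Pair each text with an id, following the rules described above.

    Hypothetical helper (not part of jai): with no ids given, the ids are
    range(len(texts)); explicit ids must be unique.
    """
    texts = list(texts)
    ids = list(range(len(texts))) if ids is None else list(ids)
    if len(ids) != len(texts):
        raise ValueError("ids and texts must have the same length")
    # Collect every id that appears more than once
    duplicated = sorted(i for i, count in Counter(ids).items() if count > 1)
    if duplicated:
        raise ValueError(f"duplicated ids: {duplicated}")
    return dict(zip(ids, texts))


print(assign_ids(["first text", "second text"]))
```

Passing ids=[7, 7] raises a ValueError, which is the situation the "index must not contain duplicated values" rule is meant to prevent.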

Application using the BERT NLP model

### bert implementation
# generate a random name to identify the database; it could also come from user input
name = j.generate_name(20, prefix='sdk_', suffix='_text')

# this time we choose db_type="Text", applying the pre-trained BERT model
j.setup(name, data, db_type='Text', batch_size=1024)
j.wait_setup(name, 10)

Checking database

Here are some methods to check your databases:

The name of your database should appear in:

>>> j.names
['jai_database', 'jai_unsupervised', 'jai_supervised']

or you can check if it's valid:

>>> j.is_valid(name)
True

and you can check the type of each of your databases with:

>>> j.info
                        db_name       db_type
0                  jai_database          Text
1              jai_unsupervised  Unsupervised
2                jai_supervised    Supervised

If you want to check which ids are in your database:

>>> j.ids(name)
['1000 items from 0 to 999']

Similarity

Once your database is set up, you can search for similar items:

Using the indexes of the input data

# Find the 5 most similar values for the ids 0 and 1
results = j.similar(name, [0, 1], top_k=5)

# Find the 20 most similar values for every id from 0 to 99
ids = list(range(100))
results = j.similar(name, ids, top_k=20)

# Find the 100 most similar values for every input value
results = j.similar(name, data.index, top_k=100, batch_size=1024)
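
This page does not say what batch_size does. One plausible reading, and it is only an assumption, is that it caps how many items are processed per request. Under that assumption, the splitting could look like the sketch below; iter_batches is a hypothetical helper, not part of jai.

```python
def iter_batches(ids, batch_size):
    """Yield consecutive chunks of at most batch_size ids.

    Hypothetical helper illustrating one possible meaning of batch_size.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ids = list(ids)
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


batches = list(iter_batches(range(2500), batch_size=1024))
print([len(batch) for batch in batches])  # [1024, 1024, 452]
```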

Using new data to be processed

All data should be a pandas.DataFrame or pandas.Series.

# Find the 100 most similar values for every item in new_data
results = j.similar(name, new_data, top_k=100, batch_size=1024)

The output is a list of dictionaries. Each dictionary has a "query_id", the id of the value whose similar items were requested, and "results", a list of top_k dictionaries, each holding the "id" of a similar item and the "distance" between "query_id" and that "id".

[
  {
    'query_id': 0,
    'results':
    [
      {'id': 0, 'distance': 0.0},
      {'id': 3836, 'distance': 2.298321008682251},
      {'id': 9193, 'distance': 2.545339584350586},
      {'id': 832, 'distance': 2.5819168090820312},
      {'id': 6162, 'distance': 2.638622283935547},
      ...
    ]
  },
  ...,
  {
    'query_id': 9,
    'results':
    [
      {'id': 9, 'distance': 0.0},
      {'id': 54, 'distance': 5.262974262237549},
      {'id': 101, 'distance': 5.634262561798096},
      ...
    ]
  },
  ...
]
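
A nested structure like the one above is often easier to work with as flat rows. The sketch below assumes only the output shape shown on this page; flatten_results is a hypothetical helper, not part of jai, and the sample values are copied from the example output.

```python
def flatten_results(results, drop_self=True):
    """Turn the nested output into (query_id, id, distance) tuples.

    Hypothetical helper. With drop_self=True, the match of a query with
    itself (same id as query_id) is skipped.
    """
    rows = []
    for entry in results:
        query_id = entry["query_id"]
        for match in entry["results"]:
            if drop_self and match["id"] == query_id:
                continue
            rows.append((query_id, match["id"], match["distance"]))
    return rows


sample = [
    {"query_id": 0, "results": [
        {"id": 0, "distance": 0.0},
        {"id": 3836, "distance": 2.298321008682251},
    ]},
    {"query_id": 9, "results": [
        {"id": 9, "distance": 0.0},
        {"id": 54, "distance": 5.262974262237549},
    ]},
]

rows = flatten_results(sample)
print(rows)
```

Dropping the self-match is a design choice: in the example output each query id appears in its own results with distance 0.0, which is rarely the neighbour of interest.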

Removing data

After you're done with the model setup, you can delete the inserted raw data:

# Delete the raw input data, as it won't be needed anymore
j.delete_raw_data(name)

If you want to keep the environment clean, you can delete the whole database:

j.delete_database(name)

Release history

Version	Release date
0.25.0	Oct 5, 2023
0.24.0	Aug 19, 2023
0.23.0	Mar 6, 2023
0.22.4	Jan 17, 2023
0.22.3	Nov 19, 2022
0.22.2	Sep 28, 2022
0.22.1	Sep 25, 2022
0.22.0	Sep 23, 2022
0.21.0	Sep 17, 2022
0.20.1	Jul 14, 2022
0.20.0	Jul 7, 2022
0.19.1	Apr 7, 2022
0.19.0	Mar 25, 2022
0.18.0	Mar 10, 2022
0.17.1	Feb 8, 2022
0.17.0	Dec 7, 2021
0.16.0	Oct 21, 2021
0.15.2	Oct 15, 2021
0.15.1	Oct 5, 2021
0.15.0	Aug 3, 2021
0.14.0	Jul 21, 2021
0.13.2	Jun 23, 2021
0.13.1	Jun 22, 2021
0.13.0	Jun 15, 2021
0.12.0	Jun 1, 2021
0.11.3	May 25, 2021
0.11.2	May 18, 2021
0.11.0	May 18, 2021
0.10.1	May 17, 2021
0.10.0	May 7, 2021
0.9.1	Apr 23, 2021
0.9.0	Apr 19, 2021
0.8.1	Apr 15, 2021
0.8.0	Apr 14, 2021
0.7.0	Apr 7, 2021
0.6.3	Mar 31, 2021
0.6.2	Mar 30, 2021
0.6.1	Mar 29, 2021
0.6.0	Mar 29, 2021
0.5.0	Mar 23, 2021
0.4.1	Mar 18, 2021
0.4.0	Mar 12, 2021
0.3.2	Mar 10, 2021
0.3.1	Mar 8, 2021
0.3.0	Mar 8, 2021
0.2.1	Mar 2, 2021
0.2.0	Mar 2, 2021
0.1.2	Feb 24, 2021
0.1.1	Feb 23, 2021
0.1.0	Feb 19, 2021 (this version)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

jai-sdk-0.1.0.tar.gz (12.5 kB)

Uploaded Feb 19, 2021 Source

Built Distribution

jai_sdk-0.1.0-py3-none-any.whl (13.8 kB)

Uploaded Feb 19, 2021 Python 3

Hashes for jai-sdk-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`19b2865b224a5ee30b3bce3d72265c64ea07c43a40d7f635aa2e544ec613d10c`
MD5	`c7ab1be6544f46a9de3d78dd2e0f0c47`
BLAKE2b-256	`db29f57c3adf8bcc7dad8791d36a2924a5219d9b380493c5eec674647bfd3c29`

Hashes for jai_sdk-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0dd37a910087dd78a55ff0cc8edee8620b6eb9fba6193af8b9d5fc5aa6f0f551`
MD5	`7447bde9cde2e4053d67ef03aff0e486`
BLAKE2b-256	`67ce5c2f390eb807d1efe8dd86e918276ebd2a32a2ad4ec6e3f776816e3daafc`