A library to accelerate ML and ETL pipelines by connecting to all your data sources


DataLigo

This library helps you read and write data across most common data sources. It accelerates ML and ETL workflows so you don't have to worry about managing multiple data connectors.

Installation

pip install -U dataligo

Install from source

Alternatively, you can clone the latest version from the repository and install it directly from source:

pip install -e .

Quick tour

>>> from dataligo import Ligo
>>> from transformers import pipeline

>>> ligo = Ligo('./ligo_config.yaml') # Check the sample_ligo_config.yaml for reference
>>> print(ligo.get_supported_data_sources_list())
['s3',
 'gcs',
 'azureblob',
 'bigquery',
 'snowflake',
 'redshift',
 'starrocks',
 'postgresql',
 'mysql',
 'oracle',
 'mssql',
 'mariadb',
 'sqlite',
 'elasticsearch',
 'mongodb',
 'dynamodb',
 'redis']

>>> mongodb = ligo.connect('mongodb')
>>> df = mongodb.read_as_dataframe(database='reviewdb', collection='reviews', return_type='pandas') # Default return_type is pandas; see the polars example after this tour
>>> df.head()
                        _id                                             Review
0  64272bb06a14f52787e0a09e                              good and interesting
1  64272bb06a14f52787e0a09f  This class is very helpful to me. Currently, I...
2  64272bb06a14f52787e0a0a0  like!Prof and TAs are helpful and the discussi...
3  64272bb06a14f52787e0a0a1  Easy to follow and includes a lot basic and im...
4  64272bb06a14f52787e0a0a2  Really nice teacher!I could got the point eazl...

>>> classifier = pipeline("sentiment-analysis")
>>> reviews = df.Review.tolist()
>>> results = classifier(reviews, truncation=True)
>>> for result in results:
...     print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
label: POSITIVE, with score: 0.9999
label: POSITIVE, with score: 0.9997
label: POSITIVE, with score: 0.9999
label: POSITIVE, with score: 0.999
label: POSITIVE, with score: 0.9967

>>> df['predicted_label'] = [result['label'] for result in results]
>>> df['predicted_score'] = [round(result['score'], 4) for result in results]

# Write the results back to MongoDB
>>> mongodb.write_dataframe(df, 'reviewdb', 'review_sentiments')
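
The return_type parameter controls which dataframe flavor a read returns. Continuing the session above, a minimal sketch that pulls the same collection back as a polars frame (only connect, read_as_dataframe, and return_type are shown in the tour; the rest is standard polars):

>>> pl_df = mongodb.read_as_dataframe(database='reviewdb',
...                                   collection='reviews',
...                                   return_type='polars')
>>> pl_df.head()  # now a polars DataFrame instead of pandas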

Example DataLigo Pipelines

ETL Pipeline

[Diagram: DataLigo ETL pipeline]

ML Pipeline

[Diagram: DataLigo ML pipeline]

Supported Connectors

Data Source          Type            pandas        polars        dask
                                     read  write   read  write   read  write
S3                   datalake        [x]   [x]     [x]   [x]     [ ]   [ ]
GCS                  datalake        [x]   [x]     [x]   [x]     [ ]   [ ]
Azure Blob Storage   datalake        [x]   [x]     [x]   [x]     [ ]   [ ]
Snowflake            datawarehouse   [x]   [x]     [x]   [x]     [ ]   [ ]
BigQuery             datawarehouse   [x]   [x]     [x]   [x]     [x]   [ ]
StarRocks            datawarehouse   [x]   [x]     [x]   [x]     [x]   [ ]
Redshift             datawarehouse   [x]   [x]     [x]   [x]     [x]   [ ]
PostgreSQL           database        [x]   [x]     [x]   [x]     [x]   [ ]
MySQL                database        [x]   [x]     [x]   [x]     [x]   [ ]
MariaDB              database        [x]   [x]     [x]   [x]     [x]   [ ]
MsSQL                database        [x]   [x]     [x]   [x]     [x]   [ ]
Oracle               database        [x]   [x]     [x]   [x]     [x]   [ ]
SQLite               database        [x]   [x]     [x]   [x]     [x]   [ ]
MongoDB              nosql           [x]   [x]     [x]   [x]     [ ]   [ ]
ElasticSearch        nosql           [x]   [x]     [x]   [x]     [ ]   [ ]
DynamoDB             nosql           [x]   [x]     [x]   [x]     [ ]   [ ]
Redis (beta)         nosql           [x]   [ ]     [ ]   [ ]     [ ]   [ ]
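
Per the table, the relational connectors can also return dask dataframes for reads. A minimal sketch, assuming the SQL connectors follow the same connect/read pattern as the MongoDB tour above; the read_as_dataframe method name and its query parameter are assumptions here, not confirmed API:

>>> postgres = ligo.connect('postgresql')  # credentials come from ligo_config.yaml
>>> # hypothetical call: method and parameter names mirror the MongoDB example
>>> ddf = postgres.read_as_dataframe(query='SELECT * FROM reviews', return_type='dask')
>>> ddf.npartitions  # a lazy dask DataFrame; rows load only when computed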

Acknowledgement

Some of DataLigo's functionality is inspired by the following packages.

  • ConnectorX

DataLigo uses ConnectorX to read from most RDBMS sources for its performance benefits, and the return_type parameter is inspired by it.

  • dynamo-pandas

DataLigo uses dynamo-pandas to read and write DynamoDB data.
