Skip to main content

Utilities for data manipulation including creation of DAGs and tables

Project description

pyg-sql

pyg-sql creates sql_table, a think wrapper on sql-alchemy (sa.Table), providing three different functionailities:

- simplified create/filter/sort/access of a table
- maintainance of a table where records are unique per specified primary keys. Auto-archive of old data
- creation of a full no-sql like document-store

Simplification is achieved since: - sql_table holds both the Table and the engine objects so we can merge these operations into one call. - In addition, we maintain the 'statement' so that a final selction statement can be built gradually rather than all at once

Primary Keys are applied if primary keys (pk) are specified, sql_table also ensure we always have at most one entry of that key. We create a parallel deleted_database.table to auto-archive

Document store: Simple choose doc = True, the table will then be viewed as a document-store.

:Example: table creation
------------------------
>>> from pyg_base import * 
>>> from pyg_sql import * 
>>> import datetime

>>> t = get_sql_table(db = 'test', table = 'students', non_null = ['name', 'surname'], 
                      _id = dict(_id = int, created = datetime.datetime), 
                      nullable =  dict(doc = str, details = str, dob = datetime.date, age = int, grade = float))
>>> t = t.delete()
>>> assert len(t) == 0; assert t.count() == 0


:Example: table insertion
-------------------------
>>> t = t.insert(name = 'yoav', surname = 'git', age = 48)
>>> t = t.insert(name = 'anna', surname = 'git', age = 37)
>>> assert len(t) == 2
>>> t = t.insert(name = ['ayala', 'itamar', 'opher'], surname = 'gate', age = [17, 11, 16])
>>> assert len(t) == 5

:Example: simple access
-----------------------
>>> assert t.sort('age')[0].name == 'itamar'                                                     # youngest
>>> assert t.sort('age')[-1].name == 'yoav'                                                      # access of last record
>>> assert t.sort(dict(age=-1))[0].name == 'yoav'                                                # sort in descending order
>>> assert t.sort('name')[::].name == ['anna', 'ayala', 'itamar', 'opher', 'yoav']
>>> assert t.sort('name')[['name', 'surname']][::].shape == (5, 2)                              ## access of specific column(s)
>>> assert t.distinct('surname') == ['gate', 'git']
>>> assert t['surname'] == ['gate', 'git']
>>> assert t[dict(name = 'yoav')] == t.inc(name = 'yoav')[0]


:Example: simple filtering
--------------------------
>>> assert len(t.inc(surname = 'gate')) == 3
>>> assert len(t.inc(surname = 'gate').inc(name = 'ayala')) == 1    # you can build filter in stages
>>> assert len(t.inc(surname = 'gate', name = 'ayala')) == 1        # or build in one step
>>> assert len(t.inc(surname = 'gate').exc(name = 'ayala')) == 2

>>> assert len(t > dict(age = 30)) == 2
>>> assert len(t <= dict(age = 37)) == 4
>>> assert len(t.inc(t.c.age > 30)) == 2  # can filter using the standard sql-alchemy .c.column objects
>>> assert len(t.where(t.c.age > 30)) == 2  # can filter using the standard sql-alchemy "where" statement 

:Example: primary keys and deleted records
------------------------------------------
The table as set up can have multiple items so:

>>> t = t.delete()
>>> t = t.insert(name = 'yoav', surname = 'git', age = 46)
>>> t = t.insert(name = 'yoav', surname = 'git', age = 47)
>>> t = t.insert(name = 'yoav', surname = 'git', age = 48)
>>> assert len(t) == 3

>>> t = t.delete() 
>>> t = get_sql_table(db = 'test', table = 'students', non_null = ['name', 'surname'], 
                      _id = dict(_id = int, created = datetime.datetime), 
                      nullable =  dict(doc = str, details = str, dob = datetime.date, age = int, grade = float), 
                      pk = ['name', 'surname'])         ## <<<------- We set primary keys

>>> t = t.delete()
>>> t = t.insert(name = 'yoav', surname = 'git', age = 46)
>>> t = t.insert(name = 'yoav', surname = 'git', age = 47)
>>> t = t.insert(name = 'yoav', surname = 'git', age = 48)
>>> assert len(t) == 1
>>> assert t[0].age == 48

Where did the data go to? We automatically archive the deleted old records for dict(name = 'yoav', surname = 'git') here:

>>> t.deleted 

t.deleted is a table by same name
- exists on deleted_test database, 
- same table structure with added 'deleted' column

>>> assert len(t.deleted.inc(name = 'yoav', age = 46)) > 0
>>> t.deleted.delete() 

:Example: doc management
------------------------
We now suppose that we are not sure what records we want to keep for each student

>>> from pyg import *
>>> import datetime
>>> t = get_sql_table(db = 'test', table = 'unstructured_students', non_null = ['name', 'surname'], 
                      _id = dict(_id = int, created = datetime.datetime), 
                      nullable =  dict(doc = str, details = str, dob = datetime.date, age = int, grade = float), 
                      pk = ['name', 'surname'],
                      doc = True)   ##<---- The table will actually be a document store

We are now able to keep varied structure per each record. We are only able to filter against the columns specified above

>>> t = t.delete()

>>> doc = dict(name = 'yoav', surname = 'git', age = 30, profession = 'coder', children = ['ayala', 'opher', 'itamar'])
>>> inserted_doc = t.insert_one(doc)
>>> assert t.inc(name = 'yoav', surname = 'git')[0].children == ['ayala', 'opher', 'itamar']

>>> doc2 = dict(name = 'anna', surname = 'git', age = 28, employer = 'Cambridge University', hobbies = ['chess', 'music', 'swimming'])
>>> _ = t.insert_one(doc2)
>>> assert t[dict(age = 28)].hobbies == ['chess', 'music', 'swimming']  # Note that we can filter or search easily using the column 'age' that was specified in table. We cannot do this on 'employer'

:Example: document store containing pd.DataFrames.
----------
>>> from pyg import *
>>> doc = dict(name = 'yoav', surname = 'git', age = 35, 
               salary = pd.Series([100,200,300], drange(2)),
               costs = pd.DataFrame(dict(transport = [0,1,2], food = [4,5,6], education = [10,20,30]), drange(2)))

>>> t = get_sql_table(db = 'test', table = 'unstructured_students', non_null = ['name', 'surname'], 
                      _id = dict(_id = int, created = datetime.datetime), 
                      nullable =  dict(doc = str, details = str, dob = datetime.date, age = int, grade = float), 
                      pk = ['name', 'surname'],
                      writer = 'c:/temp/%name/%surname.parquet', ##<---- The location where pd.DataFrame/Series are to be stored
                      doc = True)   

>>> inserted = t.insert_one(doc)
>>> import os
>>> assert 'costs.parquet' in os.listdir('c:/temp/yoav/git') and ('salary.parquet' in os.listdir('c:/temp/yoav/git'))

We can now access the data seemlessly:

>>> read_from_db = t.inc(name = 'yoav')[0]     
>>> read_from_file = pd_read_parquet('c:/temp/yoav/git/salary.parquet')
>>> assert list(read_from_db.salary.values) == [100, 200, 300]
>>> assert list(read_from_file.values) == [100, 200, 300]

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyg-sql-0.0.1.tar.gz (27.9 kB view hashes)

Uploaded Source

Built Distribution

pyg_sql-0.0.1-py3-none-any.whl (26.9 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page