Utilities for data manipulation including creation of DAGs and tables

Project description

pyg-mongo

Install with pip (pip install pyg-mongo) from https://pypi.org/project/pyg-mongo/

pyg-mongo introduces three new concepts that help you work with MongoDB:

  • The q query-generating engine, which makes it easy to filter Mongo documents
  • The mongo_table, which makes reading and writing complicated objects to and from MongoDB seamless
  • The mongo_table with pk specified, which implements an audited primary-keyed table

The q-query

You can use q to write those complicated Mongo filter dicts:

>>> from pyg_mongo import q
>>> q.age == 3
{"age": {"$eq": 3}}

>>> (q.age > 3) & (q.gender == 'm')
{"$and": [{"age": {"$gt": 3}}, {"gender": {"$eq": "m"}}]}


>>> import re
>>> q(q.age < 10, name = re.compile('ben'), surname = ['smith', 'jones'])
$and:
    {"age": {"$lt": 10}}
    {"name": {"regex": "ben"}}
    {"surname": {"$in": ["smith", "jones"]}}
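The filter dicts above can be produced with ordinary Python operator overloading. The following is a minimal sketch of that idea, not pyg-mongo's actual implementation: the Field, Query and Q classes are hypothetical names introduced only for illustration, and only a handful of operators are covered.

```python
class Query(dict):
    """A Mongo filter dict that can be combined with & and |."""

    def __and__(self, other):
        return Query({"$and": [dict(self), dict(other)]})

    def __or__(self, other):
        return Query({"$or": [dict(self), dict(other)]})


class Field:
    """Hypothetical stand-in for an attribute of q, such as q.age."""

    def __init__(self, name):
        self.name = name

    def _op(self, op, value):
        # Every comparison becomes {field: {operator: value}}.
        return Query({self.name: {op: value}})

    def __eq__(self, value):
        return self._op("$eq", value)

    def __gt__(self, value):
        return self._op("$gt", value)

    def __lt__(self, value):
        return self._op("$lt", value)

    def isin(self, values):
        return self._op("$in", list(values))


class Q:
    """Attribute access creates a Field, so q.age needs no declaration."""

    def __getattr__(self, name):
        return Field(name)


q = Q()
simple = q.age == 3
combined = (q.age > 3) & (q.gender == "m")
```

Because the result is a plain dict subclass, it can be passed wherever a MongoDB filter document is expected.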

The mongo_table

mongo_table uses q under the hood, making filtering easy. It also processes documents on both write and read, so that:

  • numeric primitives (such as float32) that cannot be stored in MongoDB are converted to native primitives
  • objects are jsonified or cast into bytes (e.g. pandas objects) so that they can be stored directly
  • the user can save documents to csv/parquet/npy files while only the metadata is saved in MongoDB

The post-read processing then reverses these steps, making the whole experience transparent to the user.

>>> import numpy as np
>>> import pandas as pd
>>> from pyg_mongo import mongo_table
>>> table = mongo_table('table','db').delete_many() # create table and drop any existing records
>>> doc = dict(a = np.array([1,2,3]), s = pd.Series([1,2,3]), df = pd.DataFrame(dict(a = [1,2], b = [3,4])))
>>> doc = table.insert_one(doc)
>>> len(table)
1
>>> read_doc = table[0]
>>> read_doc['a']
array([1, 2, 3])

To save the data to files in a directory rather than inside MongoDB, add a root key to the document:

>>> import os
>>> doc = dict(a = np.array([1,2,3]), s = pd.Series([1,2,3]), df = pd.DataFrame(dict(a = [1,2], b = [3,4])), root = 'c:/temp.parquet')
>>> table.insert_one(doc)
>>> assert os.path.isfile('c:/temp/df.parquet') and os.path.isfile('c:/temp/s.parquet') and os.path.isfile('c:/temp/a.npy')

You can even make the root depend on keys in the document:

>>> doc = dict(name = 'james', surname = 'smith', a = np.array([1,2,3]), s = pd.Series([1,2,3]), df = pd.DataFrame(dict(a = [1,2], b = [3,4])), root = 'c:/temp/%name_%surname.parquet')
>>> table.insert_one(doc)
>>> assert os.path.isfile('c:/temp/james_smith/df.parquet')
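How a templated root could map to file paths can be sketched as follows. This is only a guess at the mechanics based on the paths asserted above: resolve_root and file_for are hypothetical helpers, and the sketch gives every value the root's extension, whereas the example above shows arrays being written as .npy.

```python
import posixpath


def resolve_root(doc):
    """Substitute %key placeholders in doc['root'] and split off the extension."""
    root = doc["root"]
    # Longest keys first, so a key that is a prefix of another key
    # cannot clobber the longer placeholder.
    for key in sorted(doc, key=len, reverse=True):
        placeholder = "%" + key
        if key != "root" and placeholder in root:
            root = root.replace(placeholder, str(doc[key]))
    return posixpath.splitext(root)  # (directory, extension)


def file_for(doc, key):
    """Path of the file that would hold doc[key] under the resolved root."""
    directory, extension = resolve_root(doc)
    return posixpath.join(directory, key + extension)


doc = dict(name="james", surname="smith", df="<a DataFrame>",
           root="c:/temp/%name_%surname.parquet")
path = file_for(doc, "df")
```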

The mongo_table with primary keys

>>> from pyg_base import drange  # assumption: drange comes from the companion pyg-base package
>>> table = mongo_table(db = 'school', table = 'students', pk = ['year', 'name', 'surname'])
>>> table.reset.delete_many()
>>> table.insert_one(dict(year = 1, name = 'abe', surname = 'abraham', age = 6, weight = 35, height = 1.34, attendance = pd.Series([0,1,0], drange(-2)), version = 1))
>>> assert len(table) == 1 

# since we got the height wrong... let us fix this:
>>> table.insert_one(dict(year = 1, name = 'abe', surname = 'abraham', age = 6, weight = 35, height = 1.35, attendance = pd.Series([0,1,0], drange(-2)), version = 2))
>>> assert len(table) == 1  # primary keys insertion. The old record is marked as deleted and table thinks there is still only one doc

## here come new students
>>> table.insert_one(dict(year = 1, name = 'ben', surname = 'bradshaw', age = 7, weight = 40, height = 1.14, attendance = pd.Series([0,1,0], drange(-2))))
>>> table.insert_one(dict(year = 1, name = 'clive', surname = 'cohen', age = 6.2, weight = 20, height = 1.34, attendance = pd.Series([1,1,0], drange(-2))))
>>> table.insert_one(dict(year = 2, name = 'dana', surname = 'dowe', age = 8.2, weight = 25, height = 1.04, attendance = pd.Series([1,1,1], drange(-2))))

>>> assert len(table) == 4
>>> assert table.year == [1, 2] #distinct
>>> table[::] - 'attendance'

dictable[4 x 9]
year|name |surname |pk                         |age|version|_id                     |height|weight
1   |abe  |abraham |['name', 'surname', 'year']|6  |2      |61b019ec1180e336cb2d845a|1.35  |35    
1   |ben  |bradshaw|['name', 'surname', 'year']|7  |None   |61b019ec1180e336cb2d845c|1.14  |40    
1   |clive|cohen   |['name', 'surname', 'year']|6.2|None   |61b019ec1180e336cb2d845d|1.34  |20    
2   |dana |dowe    |['name', 'surname', 'year']|8.2|None   |61b019ec1180e336cb2d845e|1.04  |25    

>>> assert len(table.exc(year = 2)) == 3
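The overwrite-with-audit behaviour shown above can be modelled in memory in a few lines. This is a conceptual sketch only: AuditedTable is a hypothetical class, and it keeps superseded documents in a Python list, whereas the text above says pyg-mongo marks the old record as deleted.

```python
class AuditedTable:
    """Hypothetical in-memory model of a primary-keyed, audited table."""

    def __init__(self, pk):
        self.pk = sorted(pk)
        self.live = {}     # primary-key tuple -> current document
        self.deleted = []  # superseded documents, kept for audit

    def _key(self, doc):
        return tuple(doc[k] for k in self.pk)

    def insert_one(self, doc):
        key = self._key(doc)
        if key in self.live:
            # Same primary key: retire the old version instead of losing it.
            self.deleted.append(self.live[key])
        self.live[key] = dict(doc)
        return doc

    def __len__(self):
        # Only live documents count, as in the example above.
        return len(self.live)


students = AuditedTable(pk=["year", "name", "surname"])
students.insert_one(dict(year=1, name="abe", surname="abraham", height=1.34, version=1))
students.insert_one(dict(year=1, name="abe", surname="abraham", height=1.35, version=2))
students.insert_one(dict(year=1, name="ben", surname="bradshaw", height=1.14))
```

Keeping the retired versions is what makes the table audited: the earlier height of 1.34 is still available after the correction.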

There is more to pyg-mongo but this is a good taster.

Download files

Source Distribution

pyg-mongo-0.0.15.tar.gz (24.0 kB)

Uploaded Source

Built Distribution

pyg_mongo-0.0.15-py3-none-any.whl (19.6 kB)

Uploaded Python 3

File details

Details for the file pyg-mongo-0.0.15.tar.gz.

File metadata

  • Download URL: pyg-mongo-0.0.15.tar.gz
  • Upload date:
  • Size: 24.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for pyg-mongo-0.0.15.tar.gz

Algorithm  |Hash digest
SHA256     |539bda5d2c2dbfc5caa382db27f728811b2c0f4e5dd5f8ff84bd955d5f654139
MD5        |29cc5c71771b3e7d5e31ada08a47427a
BLAKE2b-256|3de43bd742a881c2ef0e9979b1bcb0717f7d13e30280909238ada76db6430bfc

File details

Details for the file pyg_mongo-0.0.15-py3-none-any.whl.

File metadata

  • Download URL: pyg_mongo-0.0.15-py3-none-any.whl
  • Upload date:
  • Size: 19.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for pyg_mongo-0.0.15-py3-none-any.whl

Algorithm  |Hash digest
SHA256     |66ab2a70452c05817d5d423e8e1895ca51e6c59807172ca7403b00d3874ab535
MD5        |46cbae30ce13592ce2e0bf1f10de04c7
BLAKE2b-256|6bbdd373281266adc698f92a4f91a010108bcc369aff58ee2e95f6464066ff5b
