Utilities for data manipulation including creation of DAGs and tables
pyg-mongo
Install from PyPI (https://pypi.org/project/pyg-mongo/) with: pip install pyg-mongo
pyg-mongo introduces three new concepts that help you work with MongoDB:
- the q query-generating engine, which makes it easy to filter MongoDB documents
- the mongo_table, which makes reading and writing complicated objects to MongoDB seamless
- the mongo_table with a primary key (pk) specified, which implements an audited, primary-keyed table
The q-query
You can use q to write those complicated Mongo filter dicts:
>>> import re
>>> from pyg_mongo import q
>>> q.age == 3
{"age": {"$eq": 3}}
>>> (q.age > 3) & (q.gender == 'm')
{"$and": [{"age": {"$gt": 3}}, {"gender": {"$eq": "m"}}]}
>>> q(q.age < 10, name = re.compile('ben'), surname = ['smith', 'jones'])
$and:
    {"age": {"$lt": 10}}
    {"name": {"regex": "ben"}}
    {"surname": {"$in": ["smith", "jones"]}}
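Under the hood, a q-style builder only needs Python operator overloading to turn expressions into plain MongoDB filter dicts. The sketch below is a simplified, hypothetical stand-in: the classes `Field`, `Filter` and `Q` are illustrative and are not pyg-mongo's internals. For compiled patterns it emits MongoDB's `$regex` operator.

```python
import re
from typing import Any


class Field:
    """Hypothetical: one document field; comparisons return MongoDB filter dicts."""

    def __init__(self, name: str):
        self.name = name

    def _op(self, op: str, value: Any) -> "Filter":
        return Filter({self.name: {op: value}})

    def __eq__(self, value):
        return self._op("$eq", value)

    def __ne__(self, value):
        return self._op("$ne", value)

    def __gt__(self, value):
        return self._op("$gt", value)

    def __ge__(self, value):
        return self._op("$gte", value)

    def __lt__(self, value):
        return self._op("$lt", value)

    def __le__(self, value):
        return self._op("$lte", value)


class Filter(dict):
    """Hypothetical: a MongoDB filter dict that can be combined with & and |."""

    def __and__(self, other):
        return Filter({"$and": [dict(self), dict(other)]})

    def __or__(self, other):
        return Filter({"$or": [dict(self), dict(other)]})


class Q:
    """Hypothetical stand-in for pyg_mongo.q: attribute access builds a Field."""

    def __getattr__(self, name: str) -> Field:
        return Field(name)

    def __call__(self, *conditions, **kwargs) -> Filter:
        # positional conditions are already filters; keyword arguments are mapped by type
        clauses = [dict(c) for c in conditions]
        for key, value in kwargs.items():
            if isinstance(value, re.Pattern):
                clauses.append({key: {"$regex": value.pattern}})
            elif isinstance(value, (list, tuple)):
                clauses.append({key: {"$in": list(value)}})
            else:
                clauses.append({key: {"$eq": value}})
        return Filter({"$and": clauses})


q = Q()
```

Because `Filter` subclasses `dict`, the result can be passed directly as the filter argument of pymongo's `collection.find(...)`.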
The mongo_table
mongo_table uses q under the hood, making filtering easy. It also processes documents on both write and read so that:
- numeric types that MongoDB cannot store (such as numpy float32) are converted to standard Python primitives
- objects are jsonified or, for pandas objects, cast into bytes, so that they can be stored directly
- the user can save objects to csv/parquet/npy files while the metadata is saved in MongoDB
Post-processing on read then makes the whole experience transparent to the user.
>>> import numpy as np
>>> import pandas as pd
>>> from pyg_mongo import mongo_table
>>> table = mongo_table('table', 'db').delete_many() # create the table and drop any existing records
>>> doc = dict(a = np.array([1,2,3]), s = pd.Series([1,2,3]), df = pd.DataFrame(dict(a = [1,2], b = [3,4])))
>>> doc = table.insert_one(doc)
>>> len(table)
1
>>> read_doc = table[0]
>>> read_doc['a']
array([1, 2, 3])
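The write/read conversion behind this round trip can be sketched for numpy values. This is a minimal sketch under stated assumptions: the helpers `to_mongo` and `from_mongo` and the `_np` marker key are hypothetical, not pyg-mongo's API, and pandas objects (which pyg-mongo casts to bytes) are not covered. numpy scalars become plain Python numbers via `.item()`, and arrays are stored as raw bytes together with the dtype and shape needed to rebuild them.

```python
import numpy as np


def to_mongo(value):
    """Hypothetical helper: recursively convert a value into BSON-friendly types."""
    if isinstance(value, dict):
        return {k: to_mongo(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_mongo(v) for v in value]
    if isinstance(value, np.ndarray):
        # raw bytes (stored by pymongo as BSON binary) plus metadata to rebuild the array
        return {"_np": value.tobytes(), "dtype": str(value.dtype), "shape": list(value.shape)}
    if isinstance(value, np.generic):
        # numpy scalars such as np.float32 become plain Python int/float/bool
        return value.item()
    return value


def from_mongo(value):
    """Hypothetical helper: reverse to_mongo so the caller gets numpy arrays back."""
    if isinstance(value, dict):
        if "_np" in value:
            arr = np.frombuffer(value["_np"], dtype=value["dtype"])
            return arr.reshape(value["shape"]).copy()  # copy: frombuffer is read-only
        return {k: from_mongo(v) for k, v in value.items()}
    if isinstance(value, list):
        return [from_mongo(v) for v in value]
    return value
```

Storing dtype and shape next to the bytes is what lets the read step restore an array identical to the one written.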
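Here is how to save the objects to files in a directory instead. The extension of root sets the file format and the rest of the path becomes the directory; the numpy array is saved as .npy: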
>>> doc = dict(a = np.array([1,2,3]), s = pd.Series([1,2,3]), df = pd.DataFrame(dict(a = [1,2], b = [3,4])), root = 'c:/temp.parquet')
>>> table.insert_one(doc)
>>> import os
>>> assert os.path.isfile('c:/temp/df.parquet') and os.path.isfile('c:/temp/s.parquet') and os.path.isfile('c:/temp/a.npy')
# the root can even depend on keys in the document: %name and %surname are replaced by the document's values
>>> doc = dict(name = 'james', surname = 'smith', a = np.array([1,2,3]), s = pd.Series([1,2,3]), df = pd.DataFrame(dict(a = [1,2], b = [3,4])), root = 'c:/temp/%name_%surname.parquet')
>>> table.insert_one(doc)
>>> assert os.path.isfile('c:/temp/james_smith/df.parquet')
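How a templated root could be turned into file paths can be sketched as follows, assuming `%key` placeholders are replaced by the document's values and the root's extension picks the format. `resolve_root` and `file_path` are hypothetical helpers, not pyg-mongo functions, and they do not model the special case of numpy arrays being saved as .npy.

```python
import posixpath


def resolve_root(doc: dict, root: str) -> str:
    """Hypothetical helper: replace each %key in root with str(doc[key]).

    Longer keys are substituted first, so %surname is not read as %s
    followed by "urname" when the document also has a key "s".
    """
    for key in sorted(doc, key=len, reverse=True):
        placeholder = "%" + key
        if placeholder in root:
            root = root.replace(placeholder, str(doc[key]))
    return root


def file_path(doc: dict, root: str, key: str) -> str:
    """Hypothetical helper: the resolved root minus its extension is the
    directory, and the extension is the file format used for doc[key]."""
    directory, ext = posixpath.splitext(resolve_root(doc, root))
    return posixpath.join(directory, key + ext)
```

With the document above, `file_path(doc, doc['root'], 'df')` gives 'c:/temp/james_smith/df.parquet', the path asserted in the example. The longest-key-first rule matters here because that document has both an `s` key and a `surname` key.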
The mongo_table with primary keys
>>> from pyg_base import drange
>>> table = mongo_table(db = 'school', table = 'students', pk = ['year', 'name', 'surname'])
>>> table.reset.delete_many()
>>> table.insert_one(dict(year = 1, name = 'abe', surname = 'abraham', age = 6, weight = 35, height = 1.34, attendance = pd.Series([0,1,0], drange(-2)), version = 1))
>>> assert len(table) == 1
# we got the height wrong, so let us fix it by inserting the record again:
>>> table.insert_one(dict(year = 1, name = 'abe', surname = 'abraham', age = 6, weight = 35, height = 1.35, attendance = pd.Series([0,1,0], drange(-2)), version = 2))
>>> assert len(table) == 1 # with primary keys, the old record is marked as deleted, so the table still counts only one document
# here come new students
>>> table.insert_one(dict(year = 1, name = 'ben', surname = 'bradshaw', age = 7, weight = 40, height = 1.14, attendance = pd.Series([0,1,0], drange(-2))))
>>> table.insert_one(dict(year = 1, name = 'clive', surname = 'cohen', age = 6.2, weight = 20, height = 1.34, attendance = pd.Series([1,1,0], drange(-2))))
>>> table.insert_one(dict(year = 2, name = 'dana', surname = 'dowe', age = 8.2, weight = 25, height = 1.04, attendance = pd.Series([1,1,1], drange(-2))))
>>> assert len(table) == 4
>>> assert table.year == [1, 2] # distinct values of year
>>> table[::] - 'attendance' # all records as a dictable, without the attendance column
dictable[4 x 9]
year|name |surname |pk                         |age|version|_id                     |height|weight
1   |abe  |abraham |['name', 'surname', 'year']|6  |2      |61b019ec1180e336cb2d845a|1.35  |35
1   |ben  |bradshaw|['name', 'surname', 'year']|7  |None   |61b019ec1180e336cb2d845c|1.14  |40
1   |clive|cohen   |['name', 'surname', 'year']|6.2|None   |61b019ec1180e336cb2d845d|1.34  |20
2   |dana |dowe    |['name', 'surname', 'year']|8.2|None   |61b019ec1180e336cb2d845e|1.04  |25
>>> assert len(table.exc(year = 2)) == 3 # exc excludes the year-2 student
There is more to pyg-mongo, but this is a good taster.
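The audited, primary-keyed behaviour in this example can be modelled in memory. This is a hypothetical sketch (`AuditedTable` and the `deleted` flag are illustrative, not how pyg-mongo stores its records): inserting a document whose primary-key values match a live document marks the old one as deleted instead of removing it, and `len`, `distinct` and `exc` only look at live documents.

```python
from typing import Any, Dict, List, Sequence


class AuditedTable:
    """Hypothetical in-memory table that soft-deletes superseded versions."""

    def __init__(self, pk: Sequence[str]):
        self.pk = sorted(pk)  # keys stored sorted, as in the pk column above
        self._docs: List[Dict[str, Any]] = []  # every version ever inserted

    def _key(self, doc: Dict[str, Any]) -> tuple:
        return tuple(doc.get(k) for k in self.pk)

    def _live(self) -> List[Dict[str, Any]]:
        return [d for d in self._docs if not d.get("deleted")]

    def insert_one(self, doc: Dict[str, Any]) -> Dict[str, Any]:
        key = self._key(doc)
        for old in self._live():
            if self._key(old) == key:
                old["deleted"] = True  # keep the old version for auditing
        new = dict(doc, pk=self.pk)
        self._docs.append(new)
        return new

    def __len__(self) -> int:
        return len(self._live())

    def distinct(self, field: str) -> list:
        return sorted({d[field] for d in self._live() if field in d})

    def exc(self, **kwargs) -> List[Dict[str, Any]]:
        # live documents that do NOT match every keyword filter
        return [d for d in self._live() if not all(d.get(k) == v for k, v in kwargs.items())]
```

Keeping superseded versions rather than overwriting them is what makes the table auditable: the history survives even though `len` reports only the current records.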
File details
Details for the file pyg-mongo-0.0.15.tar.gz
File metadata
- Download URL: pyg-mongo-0.0.15.tar.gz
- Upload date:
- Size: 24.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 539bda5d2c2dbfc5caa382db27f728811b2c0f4e5dd5f8ff84bd955d5f654139
MD5 | 29cc5c71771b3e7d5e31ada08a47427a
BLAKE2b-256 | 3de43bd742a881c2ef0e9979b1bcb0717f7d13e30280909238ada76db6430bfc
File details
Details for the file pyg_mongo-0.0.15-py3-none-any.whl
File metadata
- Download URL: pyg_mongo-0.0.15-py3-none-any.whl
- Upload date:
- Size: 19.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 66ab2a70452c05817d5d423e8e1895ca51e6c59807172ca7403b00d3874ab535
MD5 | 46cbae30ce13592ce2e0bf1f10de04c7
BLAKE2b-256 | 6bbdd373281266adc698f92a4f91a010108bcc369aff58ee2e95f6464066ff5b