Tools collection for data engineers
Project description
This package provides tools for working with data easily and efficiently. More modules may be added to the collection as development continues.
- A universal way to connect to most database software via JDBC (including Kerberos authentication for Hive). It uses fast/batch load technology to speed up temporary table creation and queries, and provides functions to convert a CLOB into a string or save a BLOB to a specified file.
- Multiprocessing capability for pandas DataFrames, for CPU-intensive operations on large volumes of data.
- A form-based authentication module for the requests package.
- A REST API client built on the aiohttp package, with a retry function.
sample usage:
## connect to mysql
import pydtc
conn = pydtc.connect('mysql', '127.0.0.1', 'user', 'pass')
pydtc.read_sql('select * from demo.sample', conn)
conn.close()
### or use with clause for auto close
with pydtc.connect('mysql', '127.0.0.1', 'user', 'pass') as conn:
    conn.read_sql('select * from demo.sample')
    # pydtc.read_sql('select * from demo.sample', conn)
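The `with` form relies on the connection object acting as a context manager. As an illustration only, and not pydtc's actual implementation, the sketch below wraps a standard-library `sqlite3` connection (standing in for a JDBC connection) in a hypothetical `ManagedConnection` class whose `__exit__` closes it. The class name, and a `read_sql` method that returns plain rows, are assumptions made for this example.

```python
import sqlite3


class ManagedConnection:
    """Hypothetical wrapper that closes the underlying connection on exit."""

    def __init__(self, raw):
        self.raw = raw

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.raw.close()  # always close, whether or not an error occurred
        return False      # do not swallow exceptions raised in the with-body

    def read_sql(self, sql):
        # Returns plain tuples here; a real helper might build a DataFrame.
        return self.raw.execute(sql).fetchall()


raw = sqlite3.connect(":memory:")
raw.execute("create table sample (id integer, name text)")
raw.execute("insert into sample values (1, 'a')")

with ManagedConnection(raw) as conn:
    rows = conn.read_sql("select * from sample")

# After the block, `raw` is closed; using it again raises sqlite3.ProgrammingError.
```

Closing in `__exit__` means the connection is released even when the query raises, which is the main reason to prefer the `with` form over a manual `conn.close()`.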
## DBAPI 2.0
import pandas as pd
with pydtc.connect_dbapi('mysql', '127.0.0.1', 'user', 'pass') as conn:
    pd.read_sql('select * from demo.sample', conn)
## pandas multiprocessing groupby then apply
def func(df, key, value):
    dd = {key : value}
    dd['some_key'] = [len(df.other_key)]
    return pd.DataFrame(dd)
new_df = pydtc.p_groupby_apply(func, df, 'group_key')
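How `p_groupby_apply` distributes the work is not described here. One plausible approach, sketched below under that assumption, is to split the DataFrame into groups, send each group to a worker pool, and concatenate the results. The names `parallel_groupby_apply`, `count_rows`, and the `executor_cls` parameter are hypothetical and are not part of pydtc.

```python
from concurrent.futures import ProcessPoolExecutor

import pandas as pd


def count_rows(group_df, key, value):
    # Same call shape as the sample `func`: one output row per group.
    return pd.DataFrame({key: [value], "n_rows": [len(group_df)]})


def parallel_groupby_apply(func, df, key, max_workers=2,
                           executor_cls=ProcessPoolExecutor):
    """Apply func(group_df, key, value) to every group using a worker pool."""
    groups = list(df.groupby(key))  # list of (group value, group DataFrame)
    values = [value for value, _ in groups]
    frames = [group_df for _, group_df in groups]
    with executor_cls(max_workers=max_workers) as pool:
        # map() preserves group order, so results line up with `values`.
        parts = list(pool.map(func, frames, [key] * len(groups), values))
    return pd.concat(parts, ignore_index=True)


if __name__ == "__main__":
    df = pd.DataFrame({"group_key": ["a", "a", "b"], "other_key": [1, 2, 3]})
    print(parallel_groupby_apply(count_rows, df, "group_key"))
```

With a process pool, `func` must be defined at module level so it can be pickled, and each group is copied to a worker. The approach therefore pays off mainly when the per-group computation is expensive relative to that copying cost.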
## access a web page on a website with form-based authentication
from pydtc import HttpFormAuth
import requests
r = requests.get('http://www.example.com/private_webpage.html', auth=HttpFormAuth('user', 'password'))
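For readers unfamiliar with custom authentication in requests, the sketch below shows the general mechanism: a subclass of `requests.auth.AuthBase` that posts credentials to a login form once and attaches the resulting cookies to each request. It is not pydtc's `HttpFormAuth`. The class name `FormLoginAuth`, the explicit `login_url` argument, and the form field names are all assumptions for illustration.

```python
import requests
from requests.auth import AuthBase


class FormLoginAuth(AuthBase):
    """Hypothetical auth hook: log in through an HTML form, then reuse cookies."""

    def __init__(self, login_url, username, password,
                 user_field="username", password_field="password"):
        self.login_url = login_url
        self.username = username
        self.password = password
        self.user_field = user_field          # assumed form field name
        self.password_field = password_field  # assumed form field name
        self._cookies = None

    def _login(self):
        # Submit the login form and keep whatever cookies the site sets.
        resp = requests.post(self.login_url, data={
            self.user_field: self.username,
            self.password_field: self.password,
        })
        resp.raise_for_status()
        return resp.cookies

    def __call__(self, request):
        if self._cookies is None:
            self._cookies = self._login()  # log in lazily, only once
        # Adds a Cookie header if the request does not already carry one.
        request.prepare_cookies(self._cookies)
        return request
```

An instance would be passed the same way as in the sample above, for example `auth=FormLoginAuth('http://www.example.com/login', 'user', 'password')`, where the login URL is again a placeholder.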
## restapi get and update
# Fake Online REST API for Testing and Prototyping
# https://jsonplaceholder.typicode.com/
from pydtc import api_get, api_update
api_get('https://jsonplaceholder.typicode.com/todos/1')
# or
api_update('https://jsonplaceholder.typicode.com/todos/1', data={'title': 'foo'}, method='patch')
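The retry behaviour of `api_get` and `api_update` is not specified here. As a generic illustration of retrying an asynchronous call with exponential backoff, the sketch below uses only `asyncio`. The helper `retry_async` and its parameters are hypothetical; with aiohttp, `make_call` would wrap a request made through an `aiohttp.ClientSession`.

```python
import asyncio


async def retry_async(make_call, retries=3, base_delay=0.5,
                      retry_on=(Exception,)):
    """Await make_call() up to `retries` times, doubling the delay each time."""
    for attempt in range(retries):
        try:
            return await make_call()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2 ** attempt)


async def demo():
    calls = {"n": 0}

    async def flaky():
        # Stand-in for an HTTP request: fails twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("temporary failure")
        return {"id": 1}

    result = await retry_async(flaky, retries=3, base_delay=0.01)
    return result, calls["n"]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Passing a callable rather than a coroutine object matters: a coroutine can only be awaited once, so each retry needs a fresh call.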
Download files
Source Distribution
File details
Details for the file pydtc-0.7.0.tar.gz.
File metadata
- Download URL: pydtc-0.7.0.tar.gz
- Upload date:
- Size: 10.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.8.0 tqdm/4.30.0 CPython/3.8.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6f9c86e1713fc6ad7cf49a5c9aef6680bbd22752df575d92ea86c39bf5bea844
MD5 | 440727811329932e86791bc2817d4753
BLAKE2b-256 | 629cddd1baf1e9172447d1b3650e49e5d337816fe98e523ba9363a5ccde316f9