
Tools collection for data engineers

Project description

This package provides various tools for performing tasks on data easily and efficiently; more modules may be added to the collection as development continues.

  1. A universal way to connect to most database software via JDBC (including Kerberos authentication for Hive), using fast/batch load to speed up temporary table creation and queries, plus functions to convert a CLOB into a string or save a BLOB to a specified file.

  2. Multiprocessing capability for pandas DataFrames when performing CPU-intensive operations on large volumes of data.

  3. A form-based authentication module for the requests package.

  4. A REST API client built on the aiohttp package, with a retry function.

Sample usage:

## connect to MySQL
    import pydtc

    conn = pydtc.connect('mysql', '127.0.0.1', 'user', 'pass')
    pydtc.read_sql('select * from demo.sample', conn)
    conn.close()

### or use a with clause to close the connection automatically
    with pydtc.connect('mysql', '127.0.0.1', 'user', 'pass') as conn:
        conn.read_sql('select * from demo.sample')
        # pydtc.read_sql('select * from demo.sample', conn)

### DBAPI 2.0
    import pandas as pd

    with pydtc.connect_dbapi('mysql', '127.0.0.1', 'user', 'pass') as conn:
        pd.read_sql('select * from demo.sample', conn)

## pandas multiprocessing groupby then apply
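    import pandas as pd
    import pydtc

    def func(df, key, value):
        # called once per group with that group's rows (df); key and value
        # presumably carry the group column name and the group's value
        dd = {key : value}
        dd['some_key'] = [len(df.other_key)]

        return pd.DataFrame(dd)

    # df: an existing DataFrame with 'group_key' and 'other_key' columns
    new_df = pydtc.p_groupby_apply(func, df, 'group_key')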
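
To make the idea behind a parallel groupby-then-apply concrete, here is a minimal sketch of one way it could be done with `concurrent.futures`. It is not pydtc's implementation: `parallel_groupby_apply` and `count_rows` are hypothetical names, and the `(df, key, value)` call shape is assumed from the example above.

```python
from concurrent.futures import Executor, ThreadPoolExecutor

import pandas as pd


def count_rows(group_df, key, value):
    """Per-group function: one output row holding the group value and its row count."""
    return pd.DataFrame({key: [value], "n_rows": [len(group_df)]})


def parallel_groupby_apply(func, df, key, executor: Executor):
    """Hypothetical helper: run func on every group using the given executor."""
    # Split the frame into (group value, sub-frame) pairs, sorted by group value.
    groups = list(df.groupby(key, sort=True))
    # Submit one task per group, then collect results in submission order.
    futures = [executor.submit(func, sub_df, key, value) for value, sub_df in groups]
    results = [future.result() for future in futures]
    # Stitch the per-group frames back into a single frame.
    return pd.concat(results, ignore_index=True)


df = pd.DataFrame({"group_key": ["a", "b", "a", "c"], "other_key": [1, 2, 3, 4]})

# A thread pool keeps the demo portable; see the note below for processes.
with ThreadPoolExecutor(max_workers=2) as pool:
    new_df = parallel_groupby_apply(count_rows, df, "group_key", pool)

print(new_df)
```

For CPU-bound work, passing a `concurrent.futures.ProcessPoolExecutor` instead runs the groups in separate processes. In that case the per-group function must be a picklable top-level function, and the pool should be created under an `if __name__ == "__main__":` guard.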

## access a web page on a website with form-based authentication
    from pydtc import HttpFormAuth
    import requests

    r = requests.get('http://www.example.com/private_webpage.html', auth=HttpFormAuth('user', 'password'))
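
Form-based authentication generally means submitting credentials as an HTML form POST and then reusing the session cookie the server sets. The sketch below illustrates that general pattern with the standard library only; it is not how `HttpFormAuth` is implemented, and the login URL, the form field names, and both helper functions are hypothetical.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def build_login_request(login_url, username, password,
                        user_field="username", pass_field="password"):
    """Build the form POST a login page would expect (field names are assumptions)."""
    body = urllib.parse.urlencode({user_field: username, pass_field: password})
    return urllib.request.Request(
        login_url,
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def make_session_opener():
    """Opener that stores cookies, so the login cookie is sent on later requests."""
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(CookieJar()))


# Hypothetical login endpoint; nothing is sent over the network here.
req = build_login_request("http://www.example.com/login", "user", "password")
print(req.get_method(), req.data)
```

With a real site, calling `make_session_opener().open(req)` would perform the login, and later requests made through the same opener would carry the session cookie.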

## REST API get and update
    # Uses JSONPlaceholder, a fake online REST API for testing and prototyping:
    # https://jsonplaceholder.typicode.com/
    from pydtc import api_get, api_update

    api_get('https://jsonplaceholder.typicode.com/todos/1')
    # or
    api_update('https://jsonplaceholder.typicode.com/todos/1', data={'title': 'foo'}, method='patch')
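
The retry behavior of the client is not described here, so the following is only a sketch of what retrying an asynchronous call with exponential backoff can look like. It uses `asyncio` alone, with a simulated endpoint in place of a real aiohttp request; `retry_async` and `FlakyEndpoint` are hypothetical names and not part of pydtc.

```python
import asyncio


async def retry_async(make_call, retries=3, base_delay=0.01):
    """Await make_call(); on failure wait and try again, up to `retries` attempts."""
    for attempt in range(1, retries + 1):
        try:
            return await make_call()
        except Exception:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            # Exponential backoff: base_delay, then 2x, then 4x, and so on.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


class FlakyEndpoint:
    """Stand-in for an HTTP call: fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    async def get(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("simulated transient failure")
        return {"id": 1, "title": "demo"}


endpoint = FlakyEndpoint(failures=2)
result = asyncio.run(retry_async(endpoint.get, retries=3))
print(result, endpoint.calls)
```

In a real client the callable passed to `retry_async` would wrap the HTTP request, and it is usually better to retry only on errors known to be transient rather than on every exception.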
