
A selection of tools for easier processing of data using Pandas and AWS

Project description

Dativa Tools

Provides useful libraries for processing large data sets, developed by the team at www.dativa.com for use in our projects.

Any questions, please email hello AT dativa.com

Installation

pip install dativatools

Description

The library includes two modules:

  • dativatools - which contains the legacy classes
  • dativa.tools - which contains the more recent classes.

Over time it is expected that we will migrate all classes over to the dativa.tools module.

dativa.tools.aws.AthenaClient

An easy-to-use client for AWS Athena that creates tables from S3 buckets (using AWS Glue) and runs queries against these tables. It supports full customisation of the SerDe and column names on table creation.

Examples:

Creating tables

ac = AthenaClient(aws_region, db_name)
ac.create_table(table_name='my_first_table',
                crawler_target={'S3Targets': [
                    {'Path': 's3://my-bucket/table-data'}]})

# Create a table with a custom SerDe and column names, typical for CSV files
ac.create_table(table_name='comcast_visio_match',
                crawler_target={'S3Targets': [
                    {'Path': 's3://my-bucket/table-data-2',
                     'Exclusions': ['**._manifest']}]},
                serde='org.apache.hadoop.hive.serde2.OpenCSVSerde',
                columns=[{'Name': 'id', 'Type': 'string'},
                         {'Name': 'device_id', 'Type': 'string'},
                         {'Name': 'subscriber_id', 'Type': 'string'}])

Running queries

ac = AthenaClient(aws_region, db_name)
ac.add_query(sql=query,
             name="From field {0}".format(test_columns[i]),
             output_location=s3_bucket + 'test-processed')

ac.wait_for_completion()

Fetching query results

ac = AthenaClient(aws_region, db_name)
query = ac.add_query(sql=sql_text,
                     name="From field {0}".format(test_columns[i]),
                     output_location=s3_bucket + 'test-processed')

ac.wait_for_completion()

# retrieve the result of the completed query
df = ac.get_query_result(query)

dativa.tools.aws.S3Client

An easy-to-use client for AWS S3 that copies data to S3. Examples:

Copy files from a local folder to an S3 bucket

s3 = S3Client()
s3.put_folder(source="/home/user/my_folder", bucket="bucket_name", destination="backup/files")

# Copy all CSV files from the folder to S3
s3.put_folder(source="/home/user/my_folder", bucket="bucket_name", destination="backup/files", file_format="*.csv")

dativa.tools.SQLClient

A SQL client that wraps any PEP249-compliant connection object and provides detailed logging and simple query execution. It provides the following methods:

execute_query

Runs a query and ignores any output.

Parameters:

  • query - the query to run, either a SQL file or a SQL query
  • parameters - a dict of parameters to substitute in the query
  • replace - a dict of items to be replaced in the SQL text
  • first_to_run - the index of the first query in a multi-command query to be executed

execute_query_to_df

Runs a query and returns the output of the final statement in a DataFrame.

Parameters:

  • query - the query to run, either a SQL file or a SQL query
  • parameters - a dict of parameters to substitute in the query
  • replace - a dict of items to be replaced in the SQL text

execute_query_to_csv

Runs a query and writes the output of the final statement to a CSV file.

Parameters:

  • query - the query to run, either a SQL file or a SQL query
  • csvfile - the file name to save the query results to
  • parameters - a dict of parameters to substitute in the query
  • replace - a dict of items to be replaced in the SQL text

Example code

import os

import psycopg2

# set up the SQL client from environment variables
sql = SqlClient(psycopg2.connect(
    database=os.environ["DB_NAME"],
    user=os.environ["USER"],
    password=os.environ["PASSWORD"],
    host=os.environ["HOST"],
    port=os.environ["PORT"],
    client_encoding="UTF-8",
    connect_timeout=10))

# create the full schedule table
df = sql.execute_query_to_df(query="sql/my_query.sql",
                             parameters={"start_date": "2018-01-01",
                                         "end_date": "2018-02-01"})

dativa.tools.log_to_stdout

A convenience function to redirect a specific logger and its children to stdout.

import logging

log_to_stdout("dativa.tools", logging.DEBUG)

dativa.tools.pandas.CSVHandler

A wrapper for the CSV handling provided in pandas that reads and writes DataFrames with consistent CSV parameters, sniffing the CSV parameters automatically. It supports reading a CSV file into a DataFrame and writing a DataFrame out to a string.
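
A minimal sketch of how this might look in use; the base_path constructor argument and the load_df, save_df and df_to_string method names are assumptions inferred from the description above, not confirmed from the source:

from dativa.tools.pandas import CSVHandler

# assumed API: a base_path prefix plus load/save methods (see note above)
csv = CSVHandler(base_path='s3://my-bucket/')

# read a CSV into a DataFrame, sniffing the CSV parameters automatically
df = csv.load_df('incoming/data.csv')

# write the DataFrame back out with consistent CSV parameters
csv.save_df(df, 'processed/data.csv')

# or render the DataFrame as a CSV string
text = csv.df_to_string(df)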

Support functions for Pandas

  • dativa.tools.pandas.is_numeric - a function to check whether a series or string is numeric
  • dativa.tools.pandas.string_to_datetime - a function to convert a string, or series of strings to a datetime, with a strptime date format that supports nanoseconds
  • dativa.tools.pandas.datetime_to_string - a function to convert a datetime, or a series of datetimes to a string, with a strptime date format that supports nanoseconds
  • dativa.tools.pandas.format_string_is_valid - a function to confirm whether a strptime format string returns a date
  • dativa.tools.pandas.get_column_name - a function to return the name of a column from a passed column name or index.
  • dativa.tools.pandas.get_unique_column_name - a function to return a unique column name when adding new columns to a DataFrame
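
A combined sketch of these helpers; the exact signatures are assumptions based on the descriptions above:

import pandas as pd
from dativa.tools.pandas import (is_numeric, string_to_datetime,
                                 datetime_to_string, get_unique_column_name)

df = pd.DataFrame({"ts": ["2018-01-01 00:00:00.123456789"],
                   "value": ["42"]})

# check whether a series is numeric before converting it
if is_numeric(df["value"]):
    df["value"] = df["value"].astype(float)

# parse and re-render timestamps with a nanosecond-capable strptime format
parsed = string_to_datetime(df["ts"], "%Y-%m-%d %H:%M:%S.%f")
df[get_unique_column_name(df, "ts")] = datetime_to_string(
    parsed, "%Y-%m-%d %H:%M:%S.%f")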

Legacy classes

dativatools.CommonUtility

Supports various common activities including getting detailed descriptions of exceptions, logging activity to a CSV file or database table, and sending email reports of failures.

dativatools.DataValidation

Class containing methods to validate file sizes, dates, counts, names and extensions at a specified location.

dativatools.DatabaseManagement

Generic database management operations including data insertion, table deletion, backup, rename, drop and create as well as query execution.

dativatools.RsyncLib

Class to perform file transfer using Rsync.

dativatools.SFTPLib

Class to perform file transfer using SFTP.

dativatools.ArchiveManager

Class to manage archiving and unarchiving of files to and from specific locations.

dativatools.TextToCsvConverter

Class containing the methods required to convert a text file to CSV and change parameters such as headers and separators.

dativatools.S3Lib

Supports connecting to AWS S3 buckets and getting and putting data to and from them.
