
Perform QA between 2 dataframes

The New DST QA Library (DST2)

DST2 is a new QA library designed to improve ease of use, dynamic reporting and error management. Its main function, "perform_qa", runs most of the checks on the deliverables (pandas dataframes with a similar structure). Features:

  • Dynamic Error Management
  • Flexible Reporting
  • Search Operations
  • Shorter Notebook Codes
  • And more

Installation

Install the package with pip:

  > pip install DST2

To upgrade:

  > pip install --upgrade DST2

#Import the QA library
import DST2.QA as q
import pandas as pd

The package is built on top of pandas, which makes it easier to compare dataframes.

#pd.read_excel only reads Excel files; for csv or json files use pd.read_csv or pd.read_json instead
dfOld = pd.read_excel('OLD_FILE.xlsx')
dfNew = pd.read_excel('NEW_FILE.xlsx')
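
Since the deliverables may arrive in different file formats, the sketch below picks a pandas reader from the file extension. The helper name load_deliverable and the READERS mapping are hypothetical and not part of DST2; only the pandas readers themselves are real.

```python
from pathlib import Path

import pandas as pd

# Hypothetical helper (not part of DST2): map file extensions to pandas readers.
READERS = {
    ".xlsx": pd.read_excel,
    ".xls": pd.read_excel,
    ".csv": pd.read_csv,
    ".json": pd.read_json,
}


def load_deliverable(path):
    """Load a deliverable with the pandas reader that matches its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file extension: {suffix!r}")
    return READERS[suffix](path)
```

With a helper like this, dfOld = load_deliverable('OLD_FILE.csv') would work the same way as the pd.read_excel calls above. Reading Excel files also requires an Excel engine such as openpyxl to be installed.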

Starting a QA process

When initiating a new QA process, you will have to provide the following:

  • Name of the Excel report
  • The previous and new deliverables as pandas dataframes
  • The index column (a column name or a list of columns)
#Initiate a QA process
qa = q.QA_Report("Report 1",dfOld,dfNew,'Entity ID')
#Create Reports
qa.create_report()
#Let's create another report specifying parameters
#Start a Report
qa2 = q.QA_Report("Report 2",dfOld,dfNew,'Entity ID')
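
Because the index column is meant to identify each row, it can help to check for duplicate keys before starting a report. The following is a minimal pandas sketch; find_duplicate_keys is a hypothetical helper and not part of DST2, and the sample data is made up for illustration.

```python
import pandas as pd


def find_duplicate_keys(df, index_cols):
    """Return the rows whose index column(s) appear more than once."""
    cols = [index_cols] if isinstance(index_cols, str) else list(index_cols)
    # keep=False marks every member of a duplicated group, not only the repeats
    return df[df.duplicated(subset=cols, keep=False)]


# Made-up sample data: 'Entity ID' 2 appears twice
sample = pd.DataFrame({"Entity ID": [1, 2, 2], "Percentile": [10, 20, 30]})
dupes = find_duplicate_keys(sample, "Entity ID")
print(dupes)
```

An empty result means the chosen column (or list of columns) is safe to use as the index.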

Perform QA

This is the core of the QA process where you decide to:

  • Perform column or score comparisons
  • Set deltas
  • Search columns for QA
  • Perform QA on all columns
#Perform QA on Columns comparison
spec_cols = ['Highest Controversy Level-Answer Category','Does the company meet your screening criteria?'] #fields in both files
qa2.perform_qa(columns=spec_cols)
#Perform QA on Score changes with default delta = 5
cols = ['Total ESG Score','Percentile']
qa2.perform_qa(columns=cols,type='score', delta=5) #delta defaults to 5, so it could be omitted here
#Create Reports
qa2.create_report()
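
To give an idea of what a column comparison involves, here is a plain pandas sketch that lists the rows where a field differs between the two files. It is an illustration under stated assumptions, not DST2's actual implementation: changed_values is a hypothetical helper, the sample data is made up, and only entities present in both files are compared.

```python
import pandas as pd


def changed_values(old, new, index_col, column):
    """Return rows where `column` differs between the old and new dataframe."""
    # Inner merge: entities missing from either file are left out
    merged = old[[index_col, column]].merge(
        new[[index_col, column]], on=index_col, suffixes=("_old", "_new")
    )
    old_vals = merged[f"{column}_old"]
    new_vals = merged[f"{column}_new"]
    # NaN != NaN is True in pandas, so do not report values missing in both files
    both_missing = old_vals.isna() & new_vals.isna()
    return merged[(old_vals != new_vals) & ~both_missing]


field = "Does the company meet your screening criteria?"
old_sample = pd.DataFrame({"Entity ID": [1, 2, 3], field: ["Yes", "No", "Yes"]})
new_sample = pd.DataFrame({"Entity ID": [1, 2, 3], field: ["Yes", "Yes", "Yes"]})
changed = changed_values(old_sample, new_sample, "Entity ID", field)
print(changed)
```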

Recap!

So far we have chosen an index when creating the report and used 3 parameters with the perform_qa function:

  • index (set in QA_Report) -- A column, or list of columns, that identifies each row uniquely, here 'Entity ID'
  • columns -- To specify the columns to perform QA on
  • type -- To specify whether it is a column or a score comparison; by default it performs a column comparison
  • delta -- The threshold used when performing a score comparison; by default it is set to 5
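
To make the role of delta more concrete, the sketch below flags score changes larger than a threshold using plain pandas. How DST2 applies delta internally is not documented here, so treating it as a strict limit on the absolute change is an assumption; score_changes is a hypothetical helper and the sample scores are made up.

```python
import pandas as pd


def score_changes(old, new, index_col, column, delta=5):
    """Return rows whose score moved by more than `delta` (assumed rule)."""
    merged = old[[index_col, column]].merge(
        new[[index_col, column]], on=index_col, suffixes=("_old", "_new")
    )
    merged["change"] = merged[f"{column}_new"] - merged[f"{column}_old"]
    # Assumption: a change counts only when its absolute value exceeds delta
    return merged[merged["change"].abs() > delta]


old_scores = pd.DataFrame({"Entity ID": [1, 2, 3], "Total ESG Score": [50, 60, 70]})
new_scores = pd.DataFrame({"Entity ID": [1, 2, 3], "Total ESG Score": [52, 70, 64]})
flagged = score_changes(old_scores, new_scores, "Entity ID", "Total ESG Score", delta=5)
print(flagged)
```

In this sample, entity 1 moves by 2 points and is not flagged, while entities 2 and 3 move by 10 and 6 points and are.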
#Start a Report
qa3 = q.QA_Report("Report 3",dfOld,dfNew,'Entity ID')

More on parameters

The examples below use 3 more parameters of the perform_qa function:

  • all_cols -- To perform QA on all columns; it is set to False by default
  • keywords -- To search field names for keywords and select the matching fields for QA
  • takeout_keywords -- To search field names for keywords and exclude the matching fields from QA

In the second call, type is set to 'score' because we are comparing score changes.
qa3.perform_qa(all_cols=True,takeout_keywords=['score','percentile'])
qa3.perform_qa(keywords=['score','percentile'],takeout_keywords='overall',type='score', delta=10)
qa3.create_report()
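
The keyword search can be pictured as a filter over the field names. The sketch below shows one way such a filter could work; it is not DST2's implementation. select_columns is a hypothetical helper, case-insensitive substring matching is an assumption, and 'Overall Score' is a made-up field name.

```python
def select_columns(columns, keywords=None, takeout_keywords=None):
    """Keep names matching any keyword and drop names matching any takeout keyword."""

    def as_list(value):
        # Accept None, a single string or a list of strings
        if value is None:
            return []
        return [value] if isinstance(value, str) else list(value)

    keep = [k.lower() for k in as_list(keywords)]
    drop = [k.lower() for k in as_list(takeout_keywords)]
    selected = []
    for name in columns:
        lowered = name.lower()
        if keep and not any(k in lowered for k in keep):
            continue  # keywords were given and none of them match
        if any(k in lowered for k in drop):
            continue  # a takeout keyword matches
        selected.append(name)
    return selected


fields = [
    "Entity ID",
    "Total ESG Score",
    "Overall Score",  # made-up field name for illustration
    "Percentile",
    "Highest Controversy Level-Answer Category",
]
score_fields = select_columns(fields, keywords=["score", "percentile"], takeout_keywords="overall")
other_fields = select_columns(fields, takeout_keywords=["score", "percentile"])
print(score_fields)
print(other_fields)
```

Under these assumptions, the first call keeps the score fields except the overall one, and the second keeps everything that is not a score or percentile field.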

Download files


Source Distribution

DST2-0.0.3.tar.gz (16.0 kB, source)

Built Distribution

DST2-0.0.3-py3-none-any.whl (20.7 kB, Python 3)
