Skip to main content

Multiple tools and utilities for ETL pipelines and others.

Project description

Ditat ETL

Multiple tools and utilities for ETL pipelines and others.

Utils

time_it

Decorator to time function and class method. Additional text can be added.

from ditat_etl.utils import time_it

@time_it()
def f():
  '''Do something'''
f()
f time: 0.1

Url

Extension of module requests/urllib3 for Proxy usage and Bulk usage.

Url

High-level usage

from ditat_etl import url

response = url.get('https://google.com')
# You can pass the same parameters as the library requests and other special parameters.

# Check low level usage for more details.

Low-level usage

from ditat_etl.url import Url

u = Url()

We use the logging module and it is set by default with 'DEBUG'. You can change this parameter to any allowed level

u = Url(debug_level='WARNING') # Just an example

Manage your proxies

u.add_proxies(n=3) # Added 3 new proxies (not necessarily valid) to self.proxies

u.clean_proxies() # Multithreaded to validate and keep only valid proxies.
print(u.proxies)
# You can also u.proxies = [], set them manually but this is not recommended.

Main functionality

def request(
    queue: str or list,
    expected_status_code: int=200,
    n_times: int=1,
    max_retries: int=None,
    use_proxy=False,
    _raise=True,
    ***kwargs
    ):

Examples

result = u.request('https://google.com')

result = u.request(queue=['https://google.com', 'htttps://facebook.com'], use_proxy=True)

# You can also pass optional parameter valid por a requests "Request"
import json
result = u.request(queue='https://example.com', method='post', data=json.dumps({'hello': 'world'}))

Databases

Useful wrappers for databases and methods to execute queries.

Postgres

It is compatible with pandas.DataFrame interaction, either reading as dataframes and pushing to the db.

from ditat_etl.databases import Postgres

config = {
    "database": "xxxx",
    "user": "xxxx",
    "password": "xxxx",
    "host": "xxxxx",
    "port": "xxxx"
}
p = Postgres(config)

The main base function is query.

p.query(
    query_statement: list or str,
    df: bool=False,
    as_dict: bool=False,
    commit: bool=True,
    returning: bool=True,
    mogrify: bool=False,
    mogrify_tuple: tuple or list=None,
    verbose=False
)

This function is a workaround of pandas.to_sql() which drops the table before inserting. It really works like an upsert and it gives you the option to do nothing or update on the column(s) constraint.

p.insert_df_to_sql(
    df: pd.DataFrame,
    tablename: str,
    commit=True,
    conflict_on: list=None,
    do_update_columns: bool or list=False,
    verbose=False
):

This one is similar, it lets you "upsert" without necessarily having a primary key or constraint. Ideally use the previous method.

p.update_df_to_sql(
    df: pd.DataFrame,
    tablename: str,
    on_columns: str or list,
    insert_new=True,
    commit=True,
    verbose=False
):

Project details


Release history Release notifications | RSS feed

This version

0.3.1

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ditat_etl-0.3.1.tar.gz (33.9 kB view details)

Uploaded Source

Built Distribution

ditat_etl-0.3.1-py3-none-any.whl (90.8 kB view details)

Uploaded Python 3

File details

Details for the file ditat_etl-0.3.1.tar.gz.

File metadata

  • Download URL: ditat_etl-0.3.1.tar.gz
  • Upload date:
  • Size: 33.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for ditat_etl-0.3.1.tar.gz
Algorithm Hash digest
SHA256 c957c6c22433131c0df0581fab7867a71c9cbb854ba95d7586c27ff8235f988b
MD5 c53c6c4ee909f78995c10cdb52c7fde9
BLAKE2b-256 41300541eaa230afa7c5a2d31e29034a56fdfdc2a565d5b6076cb0736597476d

See more details on using hashes here.

Provenance

File details

Details for the file ditat_etl-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: ditat_etl-0.3.1-py3-none-any.whl
  • Upload date:
  • Size: 90.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for ditat_etl-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 d904b1a8bcc339917cb1c9005878b943fde6a1c1ec40fbb9c4ed25a0f2852770
MD5 2161c29da93cf6f102fe121979ce99df
BLAKE2b-256 73a32d3e1d96d45777b24bce556f81994c26e2b50cb483f3a94e362c94879782

See more details on using hashes here.

Provenance

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page