A collection of tools and utilities for ETL pipelines and related tasks.
Ditat ETL
Utils
time_it
Decorator that times a function or class method. Additional text can be added to the printed message.
from ditat_etl.utils import time_it

@time_it()
def f():
    '''Do something'''

f()
# Prints: f time: 0.1
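To make the behaviour concrete, here is a minimal sketch of how a timing decorator of this kind could be written. It is not the ditat_etl source: the name time_it_sketch, the text parameter, and the exact message format are assumptions chosen to mirror the output shown above.

```python
import functools
import time


def time_it_sketch(text=""):
    """Hypothetical stand-in for time_it; not the ditat_etl implementation."""
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                # Mirror the "<name> time: <seconds>" shape, plus optional text.
                print(f"{func.__name__} time: {round(elapsed, 4)} {text}".rstrip())
        return wrapper
    return decorator


@time_it_sketch(text="(warm-up run)")
def add(a, b):
    return a + b


result = add(2, 3)
```

Because the decorator is called with parentheses, it is a decorator factory: the outer call receives the options and returns the actual decorator. The same wrapper works for class methods, since self is simply passed through in *args.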
Url
Extension of the requests/urllib3 modules for proxy and bulk usage.
Url
High-level usage
from ditat_etl import url
response = url.get('https://google.com')
# You can pass the same parameters as the requests library, plus other special parameters.
# Check low level usage for more details.
Low-level usage
from ditat_etl.url import Url
u = Url()
Url uses the logging module, which is set to 'DEBUG' by default. You can change this parameter to any allowed level.
u = Url(debug_level='WARNING') # Just an example
Manage your proxies
u.add_proxies(n=3) # Added 3 new proxies (not necessarily valid) to self.proxies
u.clean_proxies() # Multithreaded to validate and keep only valid proxies.
print(u.proxies)
# You can also set the proxies manually (u.proxies = []), but this is not recommended.
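The multithreaded validation step can be pictured with the following sketch. It is only an illustration under stated assumptions: keep_valid_proxies and fake_check are hypothetical names, and the checker here is a stub, whereas a real one would send a test request through each proxy.

```python
from concurrent.futures import ThreadPoolExecutor


def keep_valid_proxies(proxies, is_valid, max_workers=8):
    """Return the proxies for which is_valid(proxy) is truthy, in input order."""
    if not proxies:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so the flags line up with proxies.
        flags = list(pool.map(is_valid, proxies))
    return [proxy for proxy, ok in zip(proxies, flags) if ok]


def fake_check(proxy):
    # Hypothetical checker used only for this example.
    return proxy.endswith(":8080")


candidates = ["10.0.0.1:8080", "10.0.0.2:3128", "10.0.0.3:8080"]
valid = keep_valid_proxies(candidates, fake_check)
```

Threads suit this task because each check spends most of its time waiting on the network, so many proxies can be probed concurrently.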
Main functionality
def request(
    queue: str or list,
    expected_status_code: int=200,
    n_times: int=1,
    max_retries: int=None,
    use_proxy=False,
    _raise=True,
    **kwargs
):
Examples
result = u.request('https://google.com')
result = u.request(queue=['https://google.com', 'https://facebook.com'], use_proxy=True)
# You can also pass any optional parameter valid for a requests "Request"
import json
result = u.request(queue='https://example.com', method='post', data=json.dumps({'hello': 'world'}))
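As a rough illustration of how parameters such as expected_status_code, max_retries and _raise might interact, here is a sketch of a retry loop. It is an assumption about the general technique, not the library's implementation: fetch_with_retries, FakeResponse and make_flaky_fetch are hypothetical, max_retries is treated as an integer count of extra attempts, and the fetch callable is injected so the example runs without network access.

```python
class FakeResponse:
    """Minimal stand-in for a response object with a status_code attribute."""
    def __init__(self, status_code):
        self.status_code = status_code


def fetch_with_retries(fetch, url, expected_status_code=200, max_retries=3, _raise=True):
    """Call fetch(url) until the expected status is returned or retries run out."""
    last_response = None
    for _ in range(max_retries + 1):  # one initial attempt plus max_retries retries
        last_response = fetch(url)
        if last_response.status_code == expected_status_code:
            return last_response
    if _raise:
        raise RuntimeError(
            f"{url}: expected {expected_status_code}, got {last_response.status_code}"
        )
    return None  # assumption: failures are swallowed when _raise is False


def make_flaky_fetch(statuses):
    """Build a fake fetch that returns the given status codes one call at a time."""
    remaining = iter(statuses)
    def fetch(url):
        return FakeResponse(next(remaining))
    return fetch


response = fetch_with_retries(make_flaky_fetch([503, 503, 200]), "https://example.com")
```

In this sketch the third attempt succeeds, so the response with status 200 is returned. With _raise=False, a request that never reaches the expected status yields None instead of an exception.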
Databases
Useful wrappers for databases and methods to execute queries.
Postgres
It supports pandas.DataFrame interaction, both reading query results as DataFrames and pushing DataFrames to the database.
from ditat_etl.databases import Postgres
config = {
    "database": "xxxx",
    "user": "xxxx",
    "password": "xxxx",
    "host": "xxxxx",
    "port": "xxxx"
}
p = Postgres(config)
The main base function is query.
p.query(
    query_statement: list or str,
    df: bool=False,
    as_dict: bool=False,
    commit: bool=True,
    returning: bool=True,
    mogrify: bool=False,
    mogrify_tuple: tuple or list=None,
    verbose=False
)
insert_df_to_sql is a workaround for pandas' DataFrame.to_sql(), which drops the table before inserting when used with if_exists='replace'. It works like an upsert: when a row conflicts on the constraint column(s), you can choose to do nothing or to update.
p.insert_df_to_sql(
    df: pd.DataFrame,
    tablename: str,
    commit=True,
    conflict_on: list=None,
    do_update_columns: bool or list=False,
    verbose=False
)
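The "do nothing or update" choice corresponds to PostgreSQL's INSERT ... ON CONFLICT clause. The sketch below shows how such a statement could be assembled from conflict_on and do_update_columns. It is an illustration, not the library's code: build_upsert_sql is a hypothetical helper, the %s placeholders assume a psycopg2-style driver, and treating do_update_columns=True as "update every non-conflict column" is an assumption.

```python
def build_upsert_sql(tablename, columns, conflict_on=None, do_update_columns=False):
    """Build an INSERT statement with an optional ON CONFLICT clause."""
    # Identifiers are interpolated as-is for brevity; real code should quote them.
    col_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO {tablename} ({col_list}) VALUES ({placeholders})"
    if conflict_on:
        target = ", ".join(conflict_on)
        if do_update_columns is True:
            # Assumption: True means every column outside the conflict target.
            do_update_columns = [c for c in columns if c not in conflict_on]
        if do_update_columns:
            assignments = ", ".join(f"{c} = EXCLUDED.{c}" for c in do_update_columns)
            sql += f" ON CONFLICT ({target}) DO UPDATE SET {assignments}"
        else:
            sql += f" ON CONFLICT ({target}) DO NOTHING"
    return sql


update_sql = build_upsert_sql("users", ["id", "name"], conflict_on=["id"], do_update_columns=True)
skip_sql = build_upsert_sql("users", ["id", "name"], conflict_on=["id"])
```

ON CONFLICT requires a unique index or constraint on the target column(s), which is why the table needs a primary key or unique constraint for this approach.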
update_df_to_sql is similar: it lets you "upsert" without necessarily having a primary key or constraint. Prefer insert_df_to_sql when possible.
p.update_df_to_sql(
    df: pd.DataFrame,
    tablename: str,
    on_columns: str or list,
    insert_new=True,
    commit=True,
    verbose=False
)
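One common way to upsert without a constraint is to try an UPDATE on the matching columns first and INSERT only if no row was touched. The sketch below demonstrates that idea; it is an assumption about the general technique rather than the library's implementation. It uses the standard library's sqlite3 module as a stand-in for Postgres so that it runs anywhere, update_or_insert is a hypothetical helper, and it handles one row (a dict) instead of a DataFrame.

```python
import sqlite3


def update_or_insert(conn, tablename, row, on_columns, insert_new=True):
    """Update rows matching on_columns; insert the row if none matched."""
    if isinstance(on_columns, str):
        on_columns = [on_columns]
    # Assumes row has at least one column that is not in on_columns.
    set_cols = [c for c in row if c not in on_columns]
    set_clause = ", ".join(f"{c} = ?" for c in set_cols)
    where_clause = " AND ".join(f"{c} = ?" for c in on_columns)
    params = [row[c] for c in set_cols] + [row[c] for c in on_columns]
    cur = conn.execute(
        f"UPDATE {tablename} SET {set_clause} WHERE {where_clause}", params
    )
    if cur.rowcount == 0 and insert_new:
        cols = ", ".join(row)
        marks = ", ".join("?" for _ in row)
        conn.execute(
            f"INSERT INTO {tablename} ({cols}) VALUES ({marks})", list(row.values())
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, name TEXT)")  # no key or constraint
update_or_insert(conn, "users", {"email": "a@example.com", "name": "Ann"}, "email")
update_or_insert(conn, "users", {"email": "a@example.com", "name": "Anna"}, "email")
update_or_insert(conn, "users", {"email": "b@example.com", "name": "Bob"}, "email",
                 insert_new=False)
rows = conn.execute("SELECT email, name FROM users").fetchall()
```

Unlike ON CONFLICT, this two-step approach is not atomic on its own, so concurrent writers can still create duplicates. That is one reason to prefer the constraint-based method when a key is available.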