Ditat ETL
A collection of tools and utilities for ETL pipelines and related tasks.
Utils
time_it
Decorator to time functions and class methods. Additional text can be added to the output.
from ditat_etl.utils import time_it

@time_it()
def f():
    '''Do something'''

f()
# f time: 0.1
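For intuition, a time_it-style decorator can be sketched in plain Python. This is a minimal sketch, not the library's actual implementation; the `text` parameter is an assumption based on "Additional text can be added":

```python
import time
from functools import wraps

def time_it(text=''):
    # Hypothetical sketch of a timing decorator; ditat_etl's real one may differ.
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            # Print the optional prefix text, the function name, and elapsed seconds.
            print(f"{text}{func.__name__} time: {round(elapsed, 4)}")
            return result
        return wrapper
    return decorator

@time_it()
def f():
    '''Do something'''
    time.sleep(0.1)

f()  # prints the elapsed time, e.g. "f time: 0.1..."
```

Using a decorator factory (the outer `time_it()` call) is what allows the extra text argument while keeping the bare `@time_it()` usage shown above.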
Url
Extension of the requests/urllib3 modules for proxy usage and bulk usage.
High-level usage
from ditat_etl import url
response = url.get('https://google.com')
# You can pass the same parameters as the library requests and other special parameters.
# Check low level usage for more details.
Low-level usage
from ditat_etl.url import Url
u = Url()
The logging module is used, with the level set to 'DEBUG' by default. You can change this parameter to any allowed logging level:
u = Url(debug_level='WARNING') # Just an example
Manage your proxies
u.add_proxies(n=3) # Added 3 new proxies (not necessarily valid) to self.proxies
u.clean_proxies() # Multithreaded to validate and keep only valid proxies.
print(u.proxies)
# You can also set u.proxies = [...] manually, but this is not recommended.
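For intuition, the multithreaded validation behind clean_proxies can be sketched with the standard concurrent.futures module. The check function here is a stub for illustration only, not the library's implementation (a real check would attempt a request through the proxy):

```python
from concurrent.futures import ThreadPoolExecutor

def clean_proxies(proxies, check):
    # Run the validity check for every proxy in a thread pool and
    # keep only the proxies that pass (mirrors the idea of u.clean_proxies()).
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(check, proxies))
    return [proxy for proxy, ok in zip(proxies, results) if ok]

# Stub check: treat anything with a host:port shape as "valid".
valid = clean_proxies(['1.2.3.4:80', 'bad-proxy'], check=lambda p: ':' in p)
# valid == ['1.2.3.4:80']
```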
Main functionality
def request(
    queue: str or list,
    expected_status_code: int=200,
    n_times: int=1,
    max_retries: int=None,
    use_proxy=False,
    _raise=True,
    **kwargs
):
Examples
result = u.request('https://google.com')
result = u.request(queue=['https://google.com', 'https://facebook.com'], use_proxy=True)
# You can also pass any optional parameter valid for a requests "Request".
import json
result = u.request(queue='https://example.com', method='post', data=json.dumps({'hello': 'world'}))
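The expected_status_code / max_retries parameters suggest retry-on-failure semantics. As a rough sketch of that pattern with assumed semantics (this is not the library's code; fetch_with_retries and the stub fetch callable are hypothetical names):

```python
def fetch_with_retries(fetch, url, expected_status_code=200, max_retries=3, _raise=True):
    # fetch is any callable returning an object with a status_code attribute.
    # Retry until the expected status code is seen or attempts run out.
    for attempt in range(max_retries):
        response = fetch(url)
        if response.status_code == expected_status_code:
            return response
    if _raise:
        raise RuntimeError(f"{url}: no {expected_status_code} after {max_retries} tries")
    return None
```

The _raise flag mirrors the signature above: when failures are expected (e.g. probing flaky proxies), returning None can be more convenient than raising.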
Databases
Useful wrappers for databases and methods to execute queries.
Postgres
It is compatible with pandas.DataFrame, both reading query results as dataframes and pushing dataframes to the database.
from ditat_etl.databases import Postgres
config = {
"database": "xxxx",
"user": "xxxx",
"password": "xxxx",
"host": "xxxxx",
"port": "xxxx"
}
p = Postgres(config)
The main base function is query.
p.query(
    query_statement: list or str,
    df: bool=False,
    as_dict: bool=False,
    commit: bool=True,
    returning: bool=True,
    mogrify: bool=False,
    mogrify_tuple: tuple or list=None,
    verbose=False
)
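The df / as_dict flags control the shape of returned rows. The as_dict idea can be illustrated with the standard sqlite3 module (illustration only, not the Postgres implementation; query_as_dicts is a hypothetical helper name):

```python
import sqlite3

def query_as_dicts(conn, statement):
    # Map each result row to {column_name: value}, like an as_dict=True result.
    cur = conn.execute(statement)
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'a')")
rows = query_as_dicts(conn, "SELECT * FROM t")
# rows == [{'id': 1, 'name': 'a'}]
```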
This function is a workaround for pandas.DataFrame.to_sql(), which drops the table before inserting. It effectively works like an upsert, giving you the option to do nothing or to update on the column(s) constraint.
p.insert_df_to_sql(
    df: pd.DataFrame,
    tablename: str,
    commit=True,
    conflict_on: list=None,
    do_update_columns: bool or list=False,
    verbose=False
)
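The upsert behavior described above maps to PostgreSQL's INSERT ... ON CONFLICT clause. A sketch of the kind of statement this implies (build_upsert is a hypothetical helper, not the library's code):

```python
def build_upsert(tablename, columns, conflict_on=None, do_update_columns=False):
    # Build an INSERT with %s placeholders; on conflict either do nothing
    # or update the given columns (mirrors conflict_on / do_update_columns).
    cols = ', '.join(columns)
    placeholders = ', '.join(['%s'] * len(columns))
    sql = f"INSERT INTO {tablename} ({cols}) VALUES ({placeholders})"
    if conflict_on:
        sql += f" ON CONFLICT ({', '.join(conflict_on)})"
        if do_update_columns:
            updates = ', '.join(f"{c} = EXCLUDED.{c}" for c in do_update_columns)
            sql += f" DO UPDATE SET {updates}"
        else:
            sql += " DO NOTHING"
    return sql

build_upsert('t', ['id', 'v'], conflict_on=['id'], do_update_columns=['v'])
# "INSERT INTO t (id, v) VALUES (%s, %s) ON CONFLICT (id) DO UPDATE SET v = EXCLUDED.v"
```

DO NOTHING skips conflicting rows silently, while DO UPDATE overwrites only the listed columns, which is what makes this behave like an upsert rather than a drop-and-reload.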
This one is similar; it lets you "upsert" without necessarily having a primary key or constraint. Ideally, use the previous method.
p.update_df_to_sql(
    df: pd.DataFrame,
    tablename: str,
    on_columns: str or list,
    insert_new=True,
    commit=True,
    verbose=False
)