HDX Python Utilities
The HDX Python Utilities Library provides a range of helpful utilities:
- Easy downloading of files with support for authentication, streaming and hashing
- Database utilities (including connecting through SSH and SQLAlchemy helpers)
This library is part of the Humanitarian Data Exchange (HDX) project. If you have humanitarian related data, please upload your datasets to HDX.
Usage
The library has detailed API documentation which can be found here: http://ocha-dap.github.io/hdx-python-utilities/. The code for the library is here: https://github.com/ocha-dap/hdx-python-utilities.
Downloading files
Various utilities to help with downloading files. Includes retrying by default.
For example, given YAML file extraparams.yml:
    mykey:
        basic_auth: "XXXXXXXX"
        locale: "en"
We can create a downloader as shown below that uses the authentication defined in basic_auth and adds the parameter locale=en to each request (e.g. for the GET request http://myurl/lala?param1=p1&locale=en):
    from collections import OrderedDict
    from os.path import abspath
    from hdx.utilities.downloader import Download

    # url, tmpdir and filename are placeholders defined elsewhere
    with Download(extra_params_yaml='extraparams.yml', extra_params_lookup='mykey') as downloader:
        response = downloader.download(url)  # get requests library response
        json = response.json()

        # Download file to folder/filename
        f = downloader.download_file('http://myurl', post=False,
                                     parameters=OrderedDict([('b', '4'), ('d', '3')]),
                                     folder=tmpdir, filename=filename)
        filepath = abspath(f)

        # Read row by row from tabular file
        for row in downloader.get_tabular_rows('http://myurl/my.csv', dict_rows=True, headers=1):
            a = row['col']
Other useful functions:
    # Build get url from url and dictionary of parameters
    Download.get_url_for_get('http://www.lala.com/hdfa?a=3&b=4',
                             OrderedDict([('c', 'e'), ('d', 'f')]))
    # == 'http://www.lala.com/hdfa?a=3&b=4&c=e&d=f'

    # Extract url and dictionary of parameters from get url
    Download.get_url_params_for_post('http://www.lala.com/hdfa?a=3&b=4',
                                     OrderedDict([('c', 'e'), ('d', 'f')]))
    # == ('http://www.lala.com/hdfa',
    #     OrderedDict([('a', '3'), ('b', '4'), ('c', 'e'), ('d', 'f')]))
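To make the two transformations above concrete, here is a minimal sketch that reproduces them with the standard library only. It is not the library's implementation; `build_get_url` and `split_get_url` are hypothetical names used purely for illustration.

```python
from collections import OrderedDict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_get_url(url, parameters):
    """Append parameters to any query string already present in url."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend(parameters.items())
    return urlunsplit(parts._replace(query=urlencode(query)))


def split_get_url(url, parameters):
    """Return url without its query string, plus all parameters merged."""
    parts = urlsplit(url)
    merged = OrderedDict(parse_qsl(parts.query, keep_blank_values=True))
    merged.update(parameters)
    return urlunsplit(parts._replace(query='')), merged


extra = OrderedDict([('c', 'e'), ('d', 'f')])
get_url = build_get_url('http://www.lala.com/hdfa?a=3&b=4', extra)
base_url, all_params = split_get_url('http://www.lala.com/hdfa?a=3&b=4', extra)
```

With the inputs from the example above, `get_url` carries all four parameters in order, while `base_url` is the bare address and `all_params` holds the merged dictionary that a POST request would send.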
Loading and Saving JSON and YAML
Examples:
    # Load YAML
    mydict = load_yaml('my_yaml.yml')

    # Load 2 YAMLs and merge into dictionary
    mydict = load_and_merge_yaml('my_yaml1.yml', 'my_yaml2.yml')

    # Load YAML into existing dictionary
    mydict = load_yaml_into_existing_dict(existing_dict, 'my_yaml.yml')

    # Load JSON
    mydict = load_json('my_json.json')

    # Load 2 JSONs and merge into dictionary
    mydict = load_and_merge_json('my_json1.json', 'my_json2.json')

    # Load JSON into existing dictionary
    mydict = load_json_into_existing_dict(existing_dict, 'my_json.json')

    # Save dictionary to YAML file in pretty format
    # preserving order if it is an OrderedDict
    save_yaml(mydict, 'mypath.yml', pretty=True, sortkeys=False)

    # Save dictionary to JSON file in compact form
    # sorting the keys
    save_json(mydict, 'mypath.json', pretty=False, sortkeys=True)
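As a rough picture of what loading, merging and saving involve, the sketch below does the JSON half with the standard library. It is an illustration under assumptions, not the library's code: the function names are hypothetical, the merge is shallow (the library may treat nested dictionaries differently), and the mapping of `pretty` and `sortkeys` onto `json.dump` options is a guess at the intended behaviour.

```python
import json
import os
import tempfile


def load_and_merge_json_sketch(*paths):
    """Load each JSON file in turn; later files override earlier keys."""
    merged = {}
    for path in paths:
        with open(path, encoding='utf-8') as f:
            merged.update(json.load(f))
    return merged


def save_json_sketch(data, path, pretty=False, sortkeys=False):
    """Write data as JSON: indented if pretty, compact otherwise."""
    if pretty:
        kwargs = {'indent': 4, 'separators': (',', ': ')}
    else:
        kwargs = {'separators': (',', ':')}
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(data, f, sort_keys=sortkeys, **kwargs)


folder = tempfile.mkdtemp()
path1 = os.path.join(folder, 'my_json1.json')
path2 = os.path.join(folder, 'my_json2.json')
save_json_sketch({'b': 1, 'a': 2}, path1, sortkeys=True)
save_json_sketch({'a': 3, 'c': 4}, path2, pretty=True)
merged = load_and_merge_json_sketch(path1, path2)
with open(path1, encoding='utf-8') as f:
    compact_text = f.read()
```

The first file is written compactly with sorted keys, the second in indented form, and the merged dictionary takes the value of `a` from the second file.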
Database utilities
These are built on top of SQLAlchemy and simplify its setup.
Your SQLAlchemy database tables must inherit from Base in hdx.utilities.database, e.g.:
    from sqlalchemy import Column, ForeignKey, Integer
    from hdx.utilities.database import Base

    # MyTable2 is another table class defined elsewhere
    class MyTable(Base):
        my_col = Column(Integer, ForeignKey(MyTable2.col2), primary_key=True)
Examples:
    # Get SQLAlchemy session object given database parameters and
    # if needed SSH parameters. If database is PostgreSQL, will poll
    # till it is up.
    with Database(database='db', host='1.2.3.4', username='user', password='pass',
                  driver='driver', ssh_host='5.6.7.8', ssh_port=2222,
                  ssh_username='sshuser', ssh_private_key='path_to_key') as session:
        session.query(...)

    # Extract dictionary of parameters from SQLAlchemy url
    result = Database.get_params_from_sqlalchemy_url(TestDatabase.sqlalchemy_url)

    # Build SQLAlchemy url from dictionary of parameters
    result = Database.get_sqlalchemy_url(**TestDatabase.params)

    # Wait until PostgreSQL is up
    Database.wait_for_postgres('mydatabase', 'myserver', 5432, 'myuser', 'mypass')
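The conversion between a database URL and a dictionary of parameters can be pictured with the standard-library sketch below. It is not the library's implementation: the function names are hypothetical, the `port` key and the choice to store the whole URL scheme under `driver` are assumptions, and credentials containing special characters would additionally need percent-encoding.

```python
from urllib.parse import urlsplit


def params_from_db_url(db_url):
    """Split a database URL into a dictionary of connection parameters."""
    parts = urlsplit(db_url)
    params = {
        'driver': parts.scheme,
        'host': parts.hostname,
        'port': parts.port,
        'database': parts.path.lstrip('/') or None,
        'username': parts.username,
        'password': parts.password,
    }
    # Drop parameters that the URL did not specify
    return {key: value for key, value in params.items() if value is not None}


def db_url_from_params(driver, database=None, host=None, port=None,
                       username=None, password=None):
    """Assemble a database URL from whichever parameters are given."""
    url = driver + '://'
    if username:
        url += username
        if password:
            url += ':' + password
        url += '@'
    if host:
        url += host
    if port:
        url += ':' + str(port)
    if database:
        url += '/' + database
    return url


example_url = 'postgresql://myuser:mypass@myserver:5432/mydatabase'
example_params = params_from_db_url(example_url)
rebuilt_url = db_url_from_params(**example_params)
```

Round-tripping the example URL through both functions returns the original string, which is a convenient sanity check for this kind of helper.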
Dictionary and list utilities
Examples:
    # Merge dictionaries
    d1 = {1: 1, 2: 2, 3: 3, 4: ['a', 'b', 'c']}
    d2 = {2: 6, 5: 8, 6: 9, 4: ['d', 'e']}
    result = merge_dictionaries([d1, d2])
    assert result == {1: 1, 2: 6, 3: 3, 4: ['d', 'e'], 5: 8, 6: 9}

    # Diff dictionaries
    d1 = {1: 1, 2: 2, 3: 3, 4: {'a': 1, 'b': 'c'}}
    d2 = {4: {'a': 1, 'b': 'c'}, 2: 2, 3: 3, 1: 1}
    diff = dict_diff(d1, d2)
    assert diff == {}
    d2[3] = 4
    diff = dict_diff(d1, d2)
    assert diff == {3: (3, 4)}

    # Add element to list in dict
    d = dict()
    dict_of_lists_add(d, 'a', 1)
    assert d == {'a': [1]}
    dict_of_lists_add(d, 2, 'b')
    assert d == {'a': [1], 2: ['b']}
    dict_of_lists_add(d, 'a', 2)
    assert d == {'a': [1, 2], 2: ['b']}

    # Spread items in list so similar items are further apart
    input_list = [3, 1, 1, 1, 2, 2]
    result = list_distribute_contents(input_list)
    assert result == [1, 2, 1, 2, 1, 3]

    # Get values for the same key in all dicts in list
    input_list = [{'key': 'd', 1: 5}, {'key': 'd', 1: 1}, {'key': 'g', 1: 2},
                  {'key': 'a', 1: 2}, {'key': 'a', 1: 3}, {'key': 'b', 1: 5}]
    result = extract_list_from_list_of_dict(input_list, 'key')
    assert result == ['d', 'd', 'g', 'a', 'a', 'b']

    # Cast either keys or values or both in dictionary to type
    d1 = {1: 2, 2: 2.0, 3: 5, 'la': 4}
    assert key_value_convert(d1, keyfn=int) == {1: 2, 2: 2.0, 3: 5, 'la': 4}
    assert key_value_convert(d1, keyfn=int, dropfailedkeys=True) == {1: 2, 2: 2.0, 3: 5}
    d1 = {1: 2, 2: 2.0, 3: 5, 4: 'la'}
    assert key_value_convert(d1, valuefn=int) == {1: 2, 2: 2.0, 3: 5, 4: 'la'}
    assert key_value_convert(d1, valuefn=int, dropfailedvalues=True) == {1: 2, 2: 2.0, 3: 5}

    # Cast keys in dictionary to integer
    d1 = {1: 1, 2: 1.5, 3.5: 3, '4': 4}
    assert integer_key_convert(d1) == {1: 1, 2: 1.5, 3: 3, 4: 4}

    # Cast values in dictionary to integer
    d1 = {1: 1, 2: 1.5, 3: '3', 4: 4}
    assert integer_value_convert(d1) == {1: 1, 2: 1, 3: 3, 4: 4}

    # Cast values in dictionary to float
    d1 = {1: 1, 2: 1.5, 3: '3', 4: 4}
    assert float_value_convert(d1) == {1: 1.0, 2: 1.5, 3: 3.0, 4: 4.0}

    # Average values by key in two dictionaries
    d1 = {1: 1, 2: 1.0, 3: 3, 4: 4}
    d2 = {1: 2, 2: 2.0, 3: 5, 4: 4, 7: 3}
    assert avg_dicts(d1, d2) == {1: 1.5, 2: 1.5, 3: 4, 4: 4}

    # Read and write lists to csv
    l = [[1, 2, 3, 'a'], [4, 5, 6, 'b'], [7, 8, 9, 'c']]
    write_list_to_csv(l, filepath, headers=['h1', 'h2', 'h3', 'h4'])
    newll = read_list_from_csv(filepath)
    newld = read_list_from_csv(filepath, dict_form=True, headers=1)
    assert newll == [['h1', 'h2', 'h3', 'h4'], ['1', '2', '3', 'a'],
                     ['4', '5', '6', 'b'], ['7', '8', '9', 'c']]
    assert newld == [{'h1': '1', 'h2': '2', 'h4': 'a', 'h3': '3'},
                     {'h1': '4', 'h2': '5', 'h4': 'b', 'h3': '6'},
                     {'h1': '7', 'h2': '8', 'h4': 'c', 'h3': '9'}]

    # Convert command line arguments to dictionary
    args = 'a=1,big=hello,1=3'
    assert args_to_dict(args) == {'a': '1', 'big': 'hello', '1': '3'}
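For readers curious about how small these helpers can be, here is a standard-library sketch of the dictionary diff and add-to-list behaviours shown in the examples. It is an illustration only, not the library's source: the `_sketch` names and the `no_key` sentinel for keys missing from one side are assumptions.

```python
def dict_diff_sketch(d1, d2, no_key='<KEYNOTFOUND>'):
    """Map each key whose values differ to a (value_in_d1, value_in_d2) tuple."""
    diff = {}
    for key in set(d1) | set(d2):
        v1 = d1.get(key, no_key)
        v2 = d2.get(key, no_key)
        if v1 != v2:
            diff[key] = (v1, v2)
    return diff


def dict_of_lists_add_sketch(dictionary, key, value):
    """Append value to the list stored under key, creating the list if needed."""
    dictionary.setdefault(key, []).append(value)


first = {1: 1, 2: 2, 3: 3, 4: {'a': 1, 'b': 'c'}}
second = {4: {'a': 1, 'b': 'c'}, 2: 2, 3: 3, 1: 1}
same = dict_diff_sketch(first, second)
second[3] = 4
changed = dict_diff_sketch(first, second)

lists = {}
dict_of_lists_add_sketch(lists, 'a', 1)
dict_of_lists_add_sketch(lists, 2, 'b')
dict_of_lists_add_sketch(lists, 'a', 2)
```

Key order does not affect the diff, because dictionaries compare by content, and `setdefault` avoids a separate check for whether the list already exists.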
HTML utilities
These are built on top of BeautifulSoup and simplify its setup.
Examples:
    # Get soup for url with optional kwarg downloader=Download() object
    soup = get_soup('http://myurl')
    tag = soup.find(id='mytag')

    # Get text of tag stripped of leading and trailing whitespace
    # and newlines and with &nbsp; replaced with space
    result = get_text(tag)

    # Extract HTML table as list of dictionaries
    # (tabletag is a table tag found in the soup)
    result = extract_table(tabletag)
Compare files
Compare two files:
    result = compare_files(testfile1, testfile2)
    # Result is of form eg.:
    # ["- coal ,3 ,7.4 ,'needed'\n", '? ^\n',
    #  "+ coal ,1 ,7.4 ,'notneeded'\n", '? ^ +++\n']
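The result format above resembles the output of `difflib.ndiff` from the standard library, with `-`, `+` and `?` line prefixes. The sketch below shows how such a comparison could be written. That the library works this way is an assumption, and `compare_files_sketch` is a hypothetical name.

```python
import difflib
import os
import tempfile


def compare_files_sketch(path1, path2):
    """Return the ndiff lines that differ between two text files."""
    with open(path1, encoding='utf-8') as f1, open(path2, encoding='utf-8') as f2:
        lines1 = f1.readlines()
        lines2 = f2.readlines()
    # ndiff prefixes unchanged lines with two spaces; keep everything else
    return [line for line in difflib.ndiff(lines1, lines2)
            if not line.startswith('  ')]


folder = tempfile.mkdtemp()
file1 = os.path.join(folder, 'test1.csv')
file2 = os.path.join(folder, 'test2.csv')
with open(file1, 'w', encoding='utf-8') as f:
    f.write('name,count\ncoal,3\n')
with open(file2, 'w', encoding='utf-8') as f:
    f.write('name,count\ncoal,1\n')
differences = compare_files_sketch(file1, file2)
identical = compare_files_sketch(file1, file1)
```

An empty list means the files match line for line, which makes the function convenient in test assertions.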
Emailing
Example of setup and sending email:
    smtp_initargs = {
        'host': 'localhost',
        'port': 123,
        'local_hostname': 'mycomputer.fqdn.com',
        'timeout': 3,
        'source_address': ('machine', 456),
    }
    username = 'user@user.com'
    password = 'pass'
    email_config_dict = {
        'connection_type': 'ssl',
        'username': username,
        'password': password
    }
    email_config_dict.update(smtp_initargs)

    recipients = ['larry@gmail.com', 'moe@gmail.com', 'curly@gmail.com']
    subject = 'hello'
    text_body = 'hello there'
    html_body = """\
    <html>
      <head></head>
      <body>
        <p>Hi!<br>
           How are you?<br>
           Here is the <a href="https://www.python.org">link</a> you wanted.
        </p>
      </body>
    </html>
    """
    sender = 'me@gmail.com'

    with Email(email_config_dict=email_config_dict) as email:
        email.send(recipients, subject, text_body, sender=sender)
Configuring Logging
The library provides coloured logs with a simple default setup which should be adequate for most cases. If you wish to change the logging configuration from the defaults, you will need to call setup_logging with arguments.
    import logging
    from hdx.utilities.easy_logging import setup_logging
    ...
    logger = logging.getLogger(__name__)
    setup_logging(KEYWORD ARGUMENTS)
KEYWORD ARGUMENTS can be:
Choose | Argument | Type | Value | Default
---|---|---|---|---
One of: | logging_config_dict | dict | Logging configuration dictionary |
or | logging_config_json | str | Path to JSON Logging configuration |
or | logging_config_yaml | str | Path to YAML Logging configuration | Library's internal logging_configuration.yml
One of: | smtp_config_dict | dict | Email Logging configuration dictionary |
or | smtp_config_json | str | Path to JSON Email Logging configuration |
or | smtp_config_yaml | str | Path to YAML Email Logging configuration |
Do not supply smtp_config_dict, smtp_config_json or smtp_config_yaml unless you are using the default logging configuration!
If you are using the default logging configuration, you have the option to have a default SMTP handler that sends an email in the event of a CRITICAL error by supplying either smtp_config_dict, smtp_config_json or smtp_config_yaml. Here is a template of a YAML file that can be passed as the smtp_config_yaml parameter:
    handlers:
        error_mail_handler:
            toaddrs: EMAIL_ADDRESSES
            subject: "RUN FAILED: MY_PROJECT_NAME"
Unless you override it, the mail server mailhost for the default SMTP handler is localhost and the from address fromaddr is noreply@localhost.
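To show what a handler of this kind amounts to in plain Python logging, the sketch below builds a configuration dictionary with a console handler and an SMTP handler that only fires at CRITICAL level, then applies it with `logging.config.dictConfig`. This is an assumption-laden illustration, not the contents of the library's internal configuration file: the handler layout, the `console` handler and the placeholder recipient address are all made up for the example, while the mail host, from address and subject follow the defaults and template described above.

```python
import logging
import logging.config
import logging.handlers

logging_config = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'level': 'DEBUG',
        },
        'error_mail_handler': {
            'class': 'logging.handlers.SMTPHandler',
            'level': 'CRITICAL',  # only CRITICAL records trigger an email
            'mailhost': 'localhost',
            'fromaddr': 'noreply@localhost',
            'toaddrs': ['someone@example.com'],  # placeholder address
            'subject': 'RUN FAILED: MY_PROJECT_NAME',
        },
    },
    'root': {
        'level': 'INFO',
        'handlers': ['console', 'error_mail_handler'],
    },
}

logging.config.dictConfig(logging_config)

# Look the mail handler up by type rather than by position
mail_handler = next(h for h in logging.getLogger().handlers
                    if isinstance(h, logging.handlers.SMTPHandler))
```

Creating the handler does not open a connection; a mail server is only contacted when a CRITICAL record is actually emitted, so the sketch deliberately logs nothing at that level.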
To use logging in your files, simply add the lines below to the top of each Python file:
    import logging
    logger = logging.getLogger(__name__)
Then use the logger like this:
    logger.debug('DEBUG message')
    logger.info('INFORMATION message')
    logger.warning('WARNING message')
    logger.error('ERROR message')
    logger.critical('CRITICAL error message')
Path utilities
Examples:
    # Get current directory of script
    dir = script_dir(ANY_PYTHON_OBJECT_IN_SCRIPT)

    # Get current directory of script with filename appended
    path = script_dir_plus_file('myfile.txt', ANY_PYTHON_OBJECT_IN_SCRIPT)

    # Gets temporary directory from environment variable
    # TEMP_DIR and falls back to os function
    temp_folder = get_temp_dir()

    # Gets temporary directory from environment variable
    # TEMP_DIR and falls back to os function,
    # optionally appends the given folder, creates the
    # folder and on exiting, deletes the folder
    with temp_dir('papa') as tempdir:
        ...
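The temporary directory behaviour described in the comments above (environment variable first, operating system default second, optional subfolder created on entry and deleted on exit) can be sketched with the standard library as follows. This is not the library's code. The `_sketch` names are hypothetical, and deleting only when a subfolder was requested is a safety choice made for this example.

```python
import contextlib
import os
import shutil
import tempfile


def get_temp_dir_sketch():
    """Use the TEMP_DIR environment variable, else the OS temporary directory."""
    return os.environ.get('TEMP_DIR') or tempfile.gettempdir()


@contextlib.contextmanager
def temp_dir_sketch(folder=None):
    """Create a temporary folder, yield its path and delete it on exit."""
    path = get_temp_dir_sketch()
    if folder:
        path = os.path.join(path, folder)
    os.makedirs(path, exist_ok=True)
    try:
        yield path
    finally:
        # Only delete when a subfolder was requested (a choice made in this
        # sketch) so the base temporary directory itself is never removed
        if folder:
            shutil.rmtree(path, ignore_errors=True)


with temp_dir_sketch('papa') as tempdir:
    existed_inside = os.path.isdir(tempdir)
existed_after = os.path.isdir(tempdir)
```

Wrapping the `yield` in `try`/`finally` means the folder is cleaned up even if the body of the `with` block raises an exception.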
Text processing
Examples:
    # Extract words from a string sentence into a list
    result = get_words_in_sentence("Korea (Democratic People's Republic of)")
    assert result == ['Korea', 'Democratic', "People's", 'Republic', 'of']

    # Find matching text in strings
    a = 'The quick brown fox jumped over the lazy dog. It was so fast!'
    b = 'The quicker brown fox leapt over the slower fox. It was so fast!'
    c = 'The quick brown fox climbed over the lazy dog. It was so fast!'
    result = get_matching_text([a, b, c], match_min_size=10)
    assert result == ' brown fox over the It was so fast!'