Wrapper to ease data management into Tableau Hyper format from CSV files
Tableau-Hyper-Management
What is this repository for?
Built on the Tableau Hyper API, this repository imports any CSV file into the Tableau Hyper format (for use with Tableau Desktop/Server) with minimal configuration: column detection, content type detection and reinterpretation of content are part of the included logic, which speeds up the process of building an extract.
Even better, it can be used together with Tableau Server Client to take the resulting Tableau Hyper file and publish it to a Tableau Server, automating the tedious task of refreshing data on the server side. This is relevant only when no direct connection from Tableau Server to the data source is possible, or when the nature of the content is not supported directly by the data source; one real-life example is a daily snapshot of dynamically changing content, taken to capture big variations over time.
Who do I talk to?
Repository owner is: Daniel Popiniuc
Installation
Installation can be completed in a few steps, as follows:
- Ensure you have git available on your system:
$ git --version
If you get an error, you need to install git for your system.
For Windows you can do so from Git for Windows;
- Download this project from GitHub:
$ git clone https://github.com/danielgp/tableau-hyper-management
- Create a Python virtual environment using the following command, executed from the project root folder:
$ python -m venv virtual_environment/
- Upgrade pip (the package manager for Python packages) and setuptools using the following commands, executed from the Scripts sub-folder of the newly created virtual environment:
$ python -m pip install --upgrade pip
$ pip install --upgrade setuptools
- Install project prerequisites using the following command, executed from the project root folder:
$ python setup.py install
Usage
Converting CSV file into Tableau Extract (Hyper format)
$ python <local_path_of_this_package>/converter.py --input-file <full_path_and_file_base_name_to_file_having_content_as_CSV>(.txt|.csv) --csv-field-separator ,|; --output-file <full_path_and_file_base_name_to_generated_file>(.hyper) (--output-log-file <full_path_and_file_name_to_log_running_details>) (--unique-values-to-analyze-limit 100|200=default_value_if_omitted|500|1000)
- conventions used:
- (content_within_round_parentheses) = optional
- <content_within_angle_brackets> = placeholders to be replaced with the relevant user-supplied values
- single vertical bar (pipe) = separator between alternative options
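The same invocation can also be assembled from Python, for instance when the conversion is one step of a scheduled job. The sketch below only builds the argument list from the options documented above; the helper name `build_converter_command` and the example paths are hypothetical and not part of this package.

```python
import sys
from pathlib import Path


def build_converter_command(package_path, input_file, output_file,
                            separator=",", log_file=None, unique_limit=None):
    """Return the argument list for converter.py (hypothetical helper)."""
    command = [
        sys.executable,
        str(Path(package_path) / "converter.py"),
        "--input-file", str(input_file),
        "--csv-field-separator", separator,
        "--output-file", str(output_file),
    ]
    # Optional arguments are appended only when a value is supplied.
    if log_file is not None:
        command += ["--output-log-file", str(log_file)]
    if unique_limit is not None:
        command += ["--unique-values-to-analyze-limit", str(unique_limit)]
    return command


# Example with made-up paths; the list can be passed to subprocess.run(..., check=True).
example = build_converter_command("path/to/package", "sales.csv", "sales.hyper",
                                  separator=";", unique_limit=500)
print(" ".join(example))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems, which matters here because `;` is a valid field separator value.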
Publishing a Tableau Extract (Hyper format) to a Tableau Server
$ python <local_path_of_this_package>/publish_data_source.py --input-file <full_path_and_file_base_name_with_tableau_extract>(.hyper) --tableau-server <tableau_server_url> --tableau-site <tableau_server_site_to_publish_to> --tableau-project <tableau_server_project_to_publish_to> --publishing-mode Append|CreateNew|Overwrite=default_if_omitted --input-credentials-file %credentials_file% (--output-log-file <full_path_and_file_name_to_log_running_details>)
- conventions used:
- (content_within_round_parentheses) = optional
- <content_within_angle_brackets> = placeholders to be replaced with the relevant user-supplied values
- single vertical bar (pipe) = separator between alternative options
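For readers curious what such a publishing step involves, here is a minimal sketch using the Tableau Server Client package directly. It is not this package's own implementation: the function names are hypothetical, `tableauserverclient` is assumed to be installed, and credentials are passed as plain arguments because the layout of the credentials file is not described here.

```python
def normalize_publish_mode(mode="Overwrite"):
    """Map a case-insensitive mode name onto the three documented spellings."""
    allowed = {"append": "Append", "createnew": "CreateNew", "overwrite": "Overwrite"}
    try:
        return allowed[mode.strip().lower()]
    except KeyError:
        raise ValueError(f"Unsupported publishing mode: {mode!r}") from None


def publish_hyper(hyper_file, server_url, site, project_name,
                  username, password, mode="Overwrite"):
    """Publish a .hyper file as a data source (hypothetical helper)."""
    # Imported lazily so the helper above works without the package installed.
    import tableauserverclient as TSC

    auth = TSC.TableauAuth(username, password, site_id=site)
    server = TSC.Server(server_url, use_server_version=True)
    with server.auth.sign_in(auth):
        # Look the project up by name; TSC.Pager walks through all result pages.
        project = next(
            (p for p in TSC.Pager(server.projects) if p.name == project_name),
            None,
        )
        if project is None:
            raise LookupError(f"Project not found: {project_name}")
        datasource = TSC.DatasourceItem(project.id)
        publish_mode = getattr(TSC.Server.PublishMode, normalize_publish_mode(mode))
        return server.datasources.publish(datasource, str(hyper_file), publish_mode)
```

Validating the mode name before signing in means a typo in the mode fails fast, without opening a server session.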
Implemented features
- dynamic field detection based on the content of the first line and the provided field separator (strategic advantage);
- dynamic advanced content type detection covering following data types: integer, float-dot, date-iso8601, date-DMY-dash, date-DMY-dot, date-DMY-slash, date-MDY, date-MDY-medium, date-MDY-long, time-12, time-12-micro-sec, time-24, time-24-micro-sec, datetime-iso8601, datetime-iso8601-micro-sec, datetime-MDY, datetime-MDY-micro-sec, datetime-MDY-medium, datetime-MDY-medium-micro-sec, datetime-MDY-long, datetime-MDY-long-micro-sec, string;
- support for empty field content for any data type (this required re-interpreting the CSV so that it is accepted by the Hyper Inserter and INT or DOUBLE data types are still considered);
- use of the pandas package to benefit from the speed and flexibility of Data Frames;
- log file to capture the details of the entire logic (useful for both traceability and debugging);
- most of the logic actions are now timed for performance measuring, so you can better plan for your needs;
- publishing a Tableau Extract (Hyper format) to a Tableau Server is now supported.
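To make the detection features above more concrete, here is a simplified sketch of the general idea using only the standard library. It is an illustration under stated assumptions, not the package's actual logic: the function names are hypothetical, only four of the listed types are covered (integer, float-dot, date-iso8601, string), and empty values are simply ignored when deciding a column's type.

```python
import csv
import io
import re
from datetime import date

INTEGER = re.compile(r"[+-]?\d+")
FLOAT_DOT = re.compile(r"[+-]?(\d+\.\d*|\.\d+)")
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def detect_fields(csv_text, separator=","):
    """Return the field names found on the first line of the CSV content."""
    return next(csv.reader(io.StringIO(csv_text), delimiter=separator))


def classify_value(value):
    """Classify a single non-empty string value."""
    if INTEGER.fullmatch(value):
        return "integer"
    if FLOAT_DOT.fullmatch(value):
        return "float-dot"
    if ISO_DATE.fullmatch(value):
        try:
            date.fromisoformat(value)  # rejects impossible dates such as 2020-02-30
            return "date-iso8601"
        except ValueError:
            pass
    return "string"


def classify_column(values):
    """Pick one type for a column, skipping empty values."""
    kinds = {classify_value(v) for v in values if v != ""}
    if not kinds:
        return "string"
    if len(kinds) == 1:
        return kinds.pop()
    if kinds == {"integer", "float-dot"}:
        return "float-dot"  # integers widen to floats
    return "string"  # any other mix falls back to string


sample = "id;price;sold_on;note\n1;9.5;2020-01-31;ok\n2;;2020-02-01;\n3;12;;n/a\n"
rows = list(csv.reader(io.StringIO(sample), delimiter=";"))
header, body = rows[0], rows[1:]
column_types = {name: classify_column([row[i] for row in body])
                for i, name in enumerate(header)}
print(column_types)
```

In this sample the `price` column mixes `9.5`, an empty value and `12`, and is classified as float-dot, which mirrors the requirement that empty content must not prevent a numeric type from being chosen.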
Change Log / Releases detailed
see CHANGE_LOG.md
Planned features to add (when time permits; help would be appreciated, and votes or feedback are welcome)
- additional formats to be recognized, like:
- float-USA-thousand-separator,
- float-EU,
- float-EU-thousand-separator;
- geographical identifiers (Country, US ZIP Codes).
Features to request template
Required software/drivers/configurations
Used references
Code quality analysis
Build Status
Source Distribution
Hashes for tableau-hyper-management-1.2.16.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 70a8843c98c7b340ee2fc2ac0a68b6fd9896f67b9ae2c3f2ed06c271d367174f
MD5 | b8a70a36d1ed79e38f38b6cc2a36d304
BLAKE2b-256 | efdb46de7891230c62c8f14bddbd47e670a9d397f1806a4a66f4b1738ae94210
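A downloaded archive can be checked against the SHA256 or MD5 digest listed above with a few lines of Python. This is a minimal sketch; the function names are hypothetical, and the archive path in the usage note assumes the file was saved to the current folder.

```python
import hashlib
import io


def digest_of_stream(stream, algorithm="sha256", chunk_size=65536):
    """Hash a binary stream in chunks and return the hex digest."""
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def file_matches_hash(path, expected_hex, algorithm="sha256"):
    """Return True when the file at `path` has the expected hex digest."""
    with open(path, "rb") as handle:
        actual = digest_of_stream(handle, algorithm)
    return actual == expected_hex.strip().lower()


# Self-check on in-memory bytes, so the sketch runs without the real archive.
assert digest_of_stream(io.BytesIO(b"abc")) == hashlib.sha256(b"abc").hexdigest()
```

For example, `file_matches_hash("tableau-hyper-management-1.2.16.tar.gz", "70a8843c98c7b340ee2fc2ac0a68b6fd9896f67b9ae2c3f2ed06c271d367174f")` should return True for an intact download.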