Create and publish Tableau Hyper files from Parquet files.
Parquet to Hyper
A package that converts parquet files into a single hyper file.
Benchmarking
The package was benchmarked on a Dell Latitude 3420 notebook with 16 GB of RAM, a 250 GB SSD, and an Intel i7-1165G7 CPU. Times may vary on other hardware. The table used for benchmarking had 60 columns. Although the tests were run with at most 500 million rows, the package supports larger volumes. The only limit is the size of a single parquet file (up to 30 GB); for larger volumes, split the data into multiple parquet files. The results were:
Rows (millions) | Time (seconds) | Parquet size (MB) |
---|---|---|
1 | 4.05 | 54 |
10 | 36.8 | 520 |
100 | 412.6 | 4900 |
500 | 2669.25 | 25400 |
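To compare the runs above on a common scale, the figures in the table can be turned into throughput. The sketch below is plain arithmetic over the published numbers; the names `BENCHMARKS`, `rows_per_second`, and `mb_per_million_rows` are illustrative and not part of the package.

```python
# Benchmark figures copied from the table above:
# (rows in millions, time in seconds, parquet size in MB)
BENCHMARKS = [
    (1, 4.05, 54),
    (10, 36.8, 520),
    (100, 412.6, 4900),
    (500, 2669.25, 25400),
]


def rows_per_second(rows_millions, seconds):
    """Average conversion throughput for one benchmark run."""
    return rows_millions * 1_000_000 / seconds


def mb_per_million_rows(rows_millions, size_mb):
    """Average parquet size per million rows for one benchmark run."""
    return size_mb / rows_millions


for rows_millions, seconds, size_mb in BENCHMARKS:
    print(
        f"{rows_millions:>4}M rows: "
        f"{rows_per_second(rows_millions, seconds):,.0f} rows/s, "
        f"{mb_per_million_rows(rows_millions, size_mb):.1f} MB per million rows"
    )
```

On these figures throughput stays between roughly 190,000 and 270,000 rows per second, and the 500 million row run is the slowest per row.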
How to use
Installation
pip install parquet-to-hyper
Initializing object
from packages.hyper_file import HyperFile
parquet_folder = '/path/to/your/folder' # Folder that contains the parquet files
parquet_extension = 'parquet' # Optional. Omit it if the parquet files have no extension
hf = HyperFile(parquet_folder, parquet_extension)
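The two constructor arguments describe where the parquet files are and how to recognise them. How `HyperFile` locates the files internally is not documented here, so the following is only a sketch of the idea using the standard library; `list_parquet_files` is a hypothetical helper and not part of the package.

```python
from pathlib import Path
from typing import List, Optional


def list_parquet_files(folder: str, extension: Optional[str] = None) -> List[Path]:
    """Return the files in `folder`, optionally filtered by extension.

    Hypothetical helper: with an extension, only '*.<extension>' files are
    returned; without one, every regular file in the folder is returned.
    """
    pattern = f"*.{extension}" if extension else "*"
    return sorted(p for p in Path(folder).glob(pattern) if p.is_file())
```

Calling `list_parquet_files('/path/to/your/folder', 'parquet')` before a conversion is one way to check that the folder contains the files you expect.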
Create a single file
hyper_filename = 'path/to/your/db.hyper' # Path (including filename) where the hyper file is saved
rows = hf.create_hyper_file(hyper_filename)
print(f'Hyper created with {rows} rows.')
Deleting rows from an existing hyper file
This function deletes rows based on a control column (a date column) and a number of days counted back from the current day.
hyper_filename = 'path/to/your/db.hyper' # Path to load hyper file with filename
control_column = 'date_column'
days = 7
rows = hf.delete_rows(hyper_filename, control_column, days)
print(f'{rows} rows were deleted.')
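The deletion window is defined by counting days back from the current date. Below is a minimal sketch of that date arithmetic, assuming rows are deleted when the control column is on or after the cutoff. The helper names, the `>=` comparison, and the `"Extract"."Extract"` table name are all assumptions made for illustration; they are not taken from the package's implementation.

```python
from datetime import date, timedelta
from typing import Optional


def cutoff_date(days: int, today: Optional[date] = None) -> date:
    """Return the first date of the deletion window, `days` before `today`."""
    if days < 0:
        raise ValueError("days must be non-negative")
    today = today or date.today()
    return today - timedelta(days=days)


def delete_statement(table: str, control_column: str, days: int,
                     today: Optional[date] = None) -> str:
    """Build an illustrative SQL DELETE for rows inside the window.

    Assumption: rows with control_column >= cutoff are the ones removed.
    """
    cutoff = cutoff_date(days, today)
    return (
        f'DELETE FROM {table} '
        f'WHERE "{control_column}" >= DATE \'{cutoff.isoformat()}\''
    )


print(delete_statement('"Extract"."Extract"', 'date_column', 7, today=date(2024, 1, 10)))
# DELETE FROM "Extract"."Extract" WHERE "date_column" >= DATE '2024-01-03'
```

The statement is assembled with string formatting only to keep the sketch short; table and column names should come from trusted configuration, not user input.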
Appending rows from parquet into an existing hyper file
hyper_filename = 'path/to/your/db.hyper' # Path to load hyper file with filename
rows = hf.append_rows(hyper_filename)
print(f'{rows} rows were appended.')
Publishing hyper file into Tableau server
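# TableauServerUtils must be imported from the package; its module path is not given here.
# tableau_address, token_name, token_value, and project_name are values you supply.
tsu = TableauServerUtils(tableau_address, token_name, token_value)
project_id = tsu.get_project_id(project_name)
tsu.publish_hyper(project_id, 'test.hyper')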
Download files
File details
Details for the file parquet-to-hyper-1.1.4.tar.gz.
File metadata
- Download URL: parquet-to-hyper-1.1.4.tar.gz
- Upload date:
- Size: 47.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.2
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | e3af4142f1ab6acfc9e8fdd270786047d0a4224db44142e40f0560d5d17e87e9 |
MD5 | 47d4aba0f57e07dc5e5ba37ebab3cc49 |
BLAKE2b-256 | 7871a22ac654d63f33ab1da6106515a0b0b405c2d8c5dadf56ef27208258e1cd |
File details
Details for the file parquet_to_hyper-1.1.4-py3-none-any.whl.
File metadata
- Download URL: parquet_to_hyper-1.1.4-py3-none-any.whl
- Upload date:
- Size: 34.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.2
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 0c181b5792194cde7b010f2349c9893860f39e1ecbe6563775f05247f3653421 |
MD5 | ab2760a640ae454bcdcafc91344526bc |
BLAKE2b-256 | f0c0332bc220f513630e93e5ebb46aef7b4cb4cafc0b63cf9a235beb9579358b |