
A library for reading Qlik Sense .qvd file format from Python, written in Rust.


Read Qlik Sense .qvd files 🛠


A Python library for reading the Qlik Sense .qvd file format, written in Rust. Files can be read into a pandas DataFrame or a dictionary. Large files can be read in chunks.

Install

Install from PyPI:

pip install qvd-utils

Usage

from qvd_utils import qvd_reader

df = qvd_reader.read('test.qvd')
print(df)

For large files, use read_in_chunks with a chunk_size parameter to get a generator of dicts:

import pandas as pd
from qvd_utils import qvd_reader

chunks = qvd_reader.read_in_chunks('test.qvd', chunk_size=1000)

for chunk in chunks:
    df = pd.DataFrame.from_dict(chunk)
    print(df)

Developing

Create a virtual env (see https://docs.python-guide.org/dev/virtualenvs/) and activate it:

python3 -m venv venv
source venv/bin/activate

Then install the dev dependencies (pytest is needed for the Python tests below):

pip install pandas maturin pytest

Afterwards, run

maturin develop --release

to build the Rust extension and install the generated Python library into the virtual env.

Test

To run the tests, you can use these commands:

cargo test  # runs all Rust unit tests
pytest test_qvd_reader.py  # runs all Python tests

QVD File Structure

A QVD file is split into three parts: the XML metadata, the symbol table, and the bit-stuffed binary indexes.
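As a rough illustration of this layout, the following minimal sketch splits the raw bytes of a file into its parts. It assumes (this is not taken from the library) that the XML header ends at the first NUL byte following the closing </QvdTableHeader> tag, and that hypothetical index_offset/index_length values, read from the table-level metadata, locate the index table relative to the start of the binary body. The function names are illustrative only.

```python
def split_qvd_bytes(data: bytes) -> tuple[bytes, bytes]:
    """Split raw QVD bytes into (xml_header, binary_body).

    Assumption: the XML header is terminated by the first NUL byte that
    follows the closing </QvdTableHeader> tag.
    """
    closing_tag = b"</QvdTableHeader>"
    tag_pos = data.find(closing_tag)
    if tag_pos == -1:
        raise ValueError("no </QvdTableHeader> tag found")
    nul_pos = data.find(b"\x00", tag_pos)
    if nul_pos == -1:
        raise ValueError("no NUL terminator after the XML header")
    return data[: tag_pos + len(closing_tag)], data[nul_pos + 1 :]


def split_body(body: bytes, index_offset: int, index_length: int) -> tuple[bytes, bytes]:
    """Split the binary body into (symbol_table, index_table).

    Hypothetical parameters: index_offset/index_length are assumed to come
    from the table-level metadata and to be relative to the body start.
    """
    symbols = body[:index_offset]
    indexes = body[index_offset : index_offset + index_length]
    return symbols, indexes


if __name__ == "__main__":
    sample = b"<QvdTableHeader><TableName>t</TableName></QvdTableHeader>\r\n\x00SYMBOLSINDEX"
    header, body = split_qvd_bytes(sample)
    symbols, indexes = split_body(body, index_offset=7, index_length=5)
    print(symbols, indexes)  # b'SYMBOLS' b'INDEX'
```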

XML Metadata

This section is at the top of the file and is human-readable XML. It contains metadata about the file in general, such as the table name, number of records, and record size, as well as data about individual fields, including the field name and the length and offset of its symbols in the symbol table.
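A minimal sketch of reading this metadata with Python's standard xml.etree.ElementTree. The element names used here (TableName, NoOfRecords, RecordByteSize, Fields/QvdFieldHeader, FieldName, Offset, Length, BitOffset, BitWidth, Bias) are assumptions about the QVD header layout rather than something this README specifies, and the dataclasses are hypothetical helpers, not part of qvd_utils.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class FieldHeader:
    name: str
    offset: int      # byte offset of this column's symbols in the symbol table
    length: int      # byte length of this column's symbols
    bit_offset: int  # position of this field's index bits within a record
    bit_width: int   # number of bits used for this field's index
    bias: int        # value added to the decoded index


@dataclass
class TableHeader:
    table_name: str
    no_of_records: int
    record_byte_size: int
    fields: list[FieldHeader]


def _int_of(elem: ET.Element, tag: str) -> int:
    # Assumption: a missing or empty element is treated as 0.
    text = elem.findtext(tag)
    return int(text) if text else 0


def parse_header(xml_bytes: bytes) -> TableHeader:
    root = ET.fromstring(xml_bytes)
    fields = [
        FieldHeader(
            name=f.findtext("FieldName", ""),
            offset=_int_of(f, "Offset"),
            length=_int_of(f, "Length"),
            bit_offset=_int_of(f, "BitOffset"),
            bit_width=_int_of(f, "BitWidth"),
            bias=_int_of(f, "Bias"),
        )
        for f in root.iterfind("Fields/QvdFieldHeader")
    ]
    return TableHeader(
        table_name=root.findtext("TableName", ""),
        no_of_records=_int_of(root, "NoOfRecords"),
        record_byte_size=_int_of(root, "RecordByteSize"),
        fields=fields,
    )
```

Because findtext with a plain tag name only matches direct children, a table-level Offset element would not be confused with the per-field ones.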

Symbol table

Directly after the XML section is the symbol table. This is a table of every unique value contained within each column. The columns are in the order described in the metadata fields section, and the metadata gives the byte offset of each column from the start of the symbol section. Symbol types cannot be determined from the metadata; instead, each symbol is preceded by a flag byte that gives its type. These types are:

  • 1 - 4-byte signed integer (i32), little-endian
  • 2 - 8-byte float (f64), little-endian
  • 4 - null-terminated string
  • 5 - 4 bytes of junk followed by a null-terminated string representing an integer
  • 6 - 8 bytes of junk followed by a null-terminated string representing a float
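The flag-byte scheme above can be sketched in Python with the standard struct module. This is an illustrative decoder, not the library's Rust implementation: it assumes strings are UTF-8 and that the strings of types 5 and 6 parse directly with int() and float(). For one column, you would pass it the slice symbol_table[offset : offset + length] using that field's offset and length from the metadata.

```python
import struct
from typing import Union

Symbol = Union[int, float, str]


def _read_cstring(buf: bytes, pos: int) -> tuple[str, int]:
    """Read a null-terminated string starting at pos; return (text, next_pos)."""
    end = buf.index(b"\x00", pos)
    # Assumption: strings are UTF-8 encoded.
    return buf[pos:end].decode("utf-8"), end + 1


def decode_symbols(buf: bytes) -> list[Symbol]:
    """Decode one column's symbols using the leading flag byte of each symbol."""
    symbols: list[Symbol] = []
    pos = 0
    while pos < len(buf):
        flag = buf[pos]
        pos += 1
        if flag == 1:  # 4-byte signed integer, little-endian
            (value,) = struct.unpack_from("<i", buf, pos)
            pos += 4
        elif flag == 2:  # 8-byte float, little-endian
            (value,) = struct.unpack_from("<d", buf, pos)
            pos += 8
        elif flag == 4:  # null-terminated string
            value, pos = _read_cstring(buf, pos)
        elif flag == 5:  # skip 4 junk bytes, then integer as text
            text, pos = _read_cstring(buf, pos + 4)
            value = int(text)
        elif flag == 6:  # skip 8 junk bytes, then float as text
            text, pos = _read_cstring(buf, pos + 8)
            value = float(text)
        else:
            raise ValueError(f"unknown symbol flag {flag} at byte {pos - 1}")
        symbols.append(value)
    return symbols
```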

Binary Indexes

After the symbol table come the binary indexes, which map each row to its symbols. They are bit-stuffed, byte-reversed binary numbers that give, for each field, the index of the row's value in that field's part of the symbol table.
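A minimal sketch of decoding these indexes, assuming each row occupies record_byte_size bytes and each field's index sits at a given bit_offset with a given bit_width, to which a bias is added. These parameter names mirror per-field metadata as assumed here, not a documented API. Reversing a record's bytes and reading the bits most-significant-first is equivalent to reading the record as one little-endian integer, which is what int.from_bytes(..., "little") does. Treating a negative index as null is also an assumption.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def decode_row_indexes(
    index_table: bytes, record_byte_size: int, bit_offset: int, bit_width: int, bias: int
) -> list[int]:
    """Extract one field's symbol index from every fixed-size record."""
    if record_byte_size <= 0:
        raise ValueError("record_byte_size must be positive")
    mask = (1 << bit_width) - 1  # bit_width == 0 gives mask 0, so index == bias
    indexes = []
    for start in range(0, len(index_table), record_byte_size):
        record = index_table[start : start + record_byte_size]
        # Byte-reversed record == little-endian integer.
        bits = int.from_bytes(record, "little")
        indexes.append(((bits >> bit_offset) & mask) + bias)
    return indexes


def lookup(symbols: Sequence[T], indexes: Sequence[int]) -> list[Optional[T]]:
    """Map indexes to symbol values; assumption: a negative index means null."""
    return [symbols[i] if i >= 0 else None for i in indexes]
```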



Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution

qvd_utils-0.1.2-cp311-cp311-manylinux_2_28_x86_64.whl (648.2 kB)

Uploaded CPython 3.11 manylinux: glibc 2.28+ x86-64

File details

Details for the file qvd_utils-0.1.2-cp311-cp311-manylinux_2_28_x86_64.whl.


File hashes

Hashes for qvd_utils-0.1.2-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 f9b34abd0072acd4d978b69299c373f69d017305dca47c0fb58cc7c3ee55e1fe
MD5 f2afce41bf37bf26f2d6bbb93bb10595
BLAKE2b-256 af1e808db00494753495237095d477b5fcdb8b68cb440b87e40dafd6bfdf34b9
