
Utility library for reading/writing Qlik View Data (QVD) files in Python.


PyQvd


The PyQvd library provides a simple API for reading/writing Qlik View Data (QVD) files in Python. Using this library, it is possible to parse the binary QVD file format and convert it to a Python object structure or vice versa.



Install

PyQvd is a Python library available through PyPI. The recommended way to install and maintain PyQvd as a dependency is the Python package installer, pip. Python itself must be installed before installing this library.

You can get PyQvd using the following command:

pip install PyQvd

Usage

Below is a quick example of how to use PyQvd.

from pyqvd import QvdDataFrame

df = QvdDataFrame.from_qvd('sample.qvd')
print(df.head(5))

The example above imports the PyQvd library, parses a sample QVD file and prints its first five rows. A QVD file is typically loaded with the static method QvdDataFrame.from_qvd. Once the file's content is loaded, numerous methods and properties are available to work with the parsed data.

QVD File Format

The QVD file format is a binary file format used by QlikView to store data. The format is proprietary; however, it is well documented and can be parsed without a QlikView installation. A QVD file consists of three parts: an XML header and two binary parts, the symbol table and the index table. The XML header contains meta information about the QVD file, such as the number of data records and the names of the fields. The symbol table contains the distinct values of each field. The index table contains the actual data records, stored as indices that point to values in the symbol table.
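As a rough illustration of this three-part layout, the sketch below separates the XML header from the binary part of a QVD byte buffer. It assumes that the header's root element is named QvdTableHeader and that the header is followed by line-break and NUL padding bytes; neither detail is stated above, and split_qvd is a hypothetical helper, not part of PyQvd.

```python
# Assumed closing tag of the XML header (not stated in the PyQvd documentation).
END_TAG = b"</QvdTableHeader>"


def split_qvd(raw: bytes) -> tuple[str, bytes]:
    """Split raw QVD bytes into the XML header text and the binary section."""
    end = raw.find(END_TAG)
    if end < 0:
        raise ValueError("no QVD XML header found")
    end += len(END_TAG)
    header = raw[:end].decode("utf-8")
    # Skip CR, LF and NUL padding; symbol type bytes (1, 2, 4, 5, 6) never equal these.
    while end < len(raw) and raw[end] in (0x0D, 0x0A, 0x00):
        end += 1
    return header, raw[end:]


# Synthetic file: a tiny header, then one integer symbol (type byte 1, value 42).
raw = b"<QvdTableHeader><NoOfRecords>1</NoOfRecords></QvdTableHeader>\r\n\x00" + b"\x01" + (42).to_bytes(4, "little")
header, binary = split_qvd(raw)
print(header)
print(binary)  # b'\x01*\x00\x00\x00'
```

In a real file, the binary section would continue with the remaining symbols and then the index table.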

XML Header

The XML header contains meta information about the QVD file. It is always located at the beginning of the file and is stored as human-readable text. Among other things, it records the number of data records, the names of the fields, and the data types of the fields.
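Because the header is plain XML, it can be read with Python's standard xml.etree.ElementTree module. The element names used below (QvdFieldHeader, FieldName, NoOfSymbols, Offset, Length, NoOfRecords) reflect the commonly observed QVD header layout but are assumptions here, as are the made-up sample values; read_header is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up header for illustration; element names are assumptions.
HEADER = """<QvdTableHeader>
  <TableName>Sales</TableName>
  <Fields>
    <QvdFieldHeader>
      <FieldName>Region</FieldName>
      <NoOfSymbols>3</NoOfSymbols>
      <Offset>0</Offset>
      <Length>21</Length>
    </QvdFieldHeader>
    <QvdFieldHeader>
      <FieldName>Amount</FieldName>
      <NoOfSymbols>5</NoOfSymbols>
      <Offset>21</Offset>
      <Length>25</Length>
    </QvdFieldHeader>
  </Fields>
  <NoOfRecords>10</NoOfRecords>
</QvdTableHeader>"""


def read_header(xml_text: str) -> dict:
    """Collect the record count and per-field symbol-table metadata."""
    root = ET.fromstring(xml_text)
    fields = [
        {
            "name": field.findtext("FieldName"),
            "symbols": int(field.findtext("NoOfSymbols")),
            "offset": int(field.findtext("Offset")),
            "length": int(field.findtext("Length")),
        }
        for field in root.iter("QvdFieldHeader")
    ]
    return {"records": int(root.findtext("NoOfRecords")), "fields": fields}


meta = read_header(HEADER)
print(meta["records"], [f["name"] for f in meta["fields"]])  # 10 ['Region', 'Amount']
```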

Symbol Table

The symbol table contains the distinct (unique) values of the fields and is located directly after the XML header. The order of columns in the symbol table corresponds to the order of the fields in the XML header; the offset and length of each column's symbol section are also stored in the XML header. Each symbol section consists of the unique symbols of the respective column. The type of a single symbol is determined by a type byte prefixed to the symbol value. The following symbol types are supported:

| Code | Type | Description |
|------|------|-------------|
| 1 | Integer | Signed 4-byte integer (little endian) |
| 2 | Float | Signed 8-byte IEEE floating point number (little endian) |
| 4 | String | Null-terminated string |
| 5 | Dual Integer | Signed 4-byte integer (little endian) followed by a null-terminated string |
| 6 | Dual Float | Signed 8-byte IEEE floating point number (little endian) followed by a null-terminated string |
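Assuming a column's symbol section has already been sliced out of the file using the offset and length from the XML header, the type codes above can be decoded with the struct module, as in this minimal sketch. It assumes strings are UTF-8 encoded and returns dual values as (number, text) tuples; both choices are illustrative and not necessarily how PyQvd represents them.

```python
import struct


def read_cstring(buf: bytes, pos: int) -> tuple[str, int]:
    """Read a null-terminated string (UTF-8 assumed); return it and the next position."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("utf-8"), end + 1


def decode_symbols(buf: bytes) -> list:
    """Decode one column's symbol section into a list of Python values."""
    symbols, pos = [], 0
    while pos < len(buf):
        code = buf[pos]
        pos += 1
        if code == 1:  # signed 4-byte integer, little endian
            (value,) = struct.unpack_from("<i", buf, pos)
            pos += 4
        elif code == 2:  # 8-byte IEEE float, little endian
            (value,) = struct.unpack_from("<d", buf, pos)
            pos += 8
        elif code == 4:  # null-terminated string
            value, pos = read_cstring(buf, pos)
        elif code in (5, 6):  # dual: number followed by a null-terminated string
            fmt, size = ("<i", 4) if code == 5 else ("<d", 8)
            (number,) = struct.unpack_from(fmt, buf, pos)
            text, pos = read_cstring(buf, pos + size)
            value = (number, text)
        else:
            raise ValueError(f"unknown symbol type {code} at byte {pos - 1}")
        symbols.append(value)
    return symbols


section = (
    b"\x01" + struct.pack("<i", -7)
    + b"\x02" + struct.pack("<d", 2.5)
    + b"\x04" + "North".encode("utf-8") + b"\x00"
    + b"\x05" + struct.pack("<i", 1) + b"Jan\x00"
    + b"\x06" + struct.pack("<d", 0.5) + b"50%\x00"
)
print(decode_symbols(section))  # [-7, 2.5, 'North', (1, 'Jan'), (0.5, '50%')]
```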

Index Table

The index table follows the symbol table and contains the actual data records. For each row, it stores binary indices that reference the row's values in the symbol table. The order of the columns in the index table corresponds to the order of the fields in the XML header. Hence, the index table does not contain the actual values of a data record, only the indices that point to those values in the symbol table.
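The sketch below shows one way such indices can be resolved back to values. Beyond what is described above, it assumes that each record occupies a fixed number of bytes read as a little-endian integer, that each field's index is a bit field described by a bit offset, a bit width and a bias, and that a negative index stands for NULL. The layout dictionaries and helper names are hypothetical.

```python
# Hypothetical per-field layout describing where each index sits in a record.
Layout = list[dict]


def decode_record(record: bytes, layout: Layout) -> list[int]:
    """Extract one symbol index per field from a single fixed-size record."""
    bits = int.from_bytes(record, "little")
    indices = []
    for field in layout:
        mask = (1 << field["bit_width"]) - 1
        indices.append(((bits >> field["bit_offset"]) & mask) + field["bias"])
    return indices


def decode_index_table(table: bytes, record_size: int, layout: Layout, symbols: list[list]) -> list[list]:
    """Resolve every record's indices to values from each field's symbol list."""
    result = []
    for start in range(0, len(table), record_size):
        indices = decode_record(table[start:start + record_size], layout)
        # Negative indices are treated as NULL (an assumption of this sketch).
        result.append([symbols[f][i] if i >= 0 else None for f, i in enumerate(indices)])
    return result


# Two fields packed into one byte: Region uses bits 0-1, Amount uses bits 2-4.
layout = [
    {"bit_offset": 0, "bit_width": 2, "bias": 0},
    {"bit_offset": 2, "bit_width": 3, "bias": 0},
]
symbols = [["North", "South", "East"], [10, 20, 30, 40, 50]]
table = bytes([0b10010, 0b00100])  # records (Region=2, Amount=4) and (Region=0, Amount=1)
table_rows = decode_index_table(table, 1, layout, symbols)
print(table_rows)  # [['East', 50], ['North', 20]]
```

Storing small indices instead of repeated values is what keeps QVD files compact when columns have few distinct values.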

API Documentation

QvdDataFrame

The QvdDataFrame class represents the data frame stored inside a parsed QVD file. It provides high-level access to the QVD file content, including meta information as well as the actual data records.

| Property | Type | Description |
|----------|------|-------------|
| shape | tuple[int, int] | The shape of the data frame: the first value is the number of rows, the second the number of columns. |
| data | list[list[any]] | The actual data as a list of rows; each inner list holds the values of one row. |
| columns | list[str] | The names of the fields contained in the QVD file. |

@staticmethod from_qvd(path: str) -> QvdDataFrame

The static method QvdDataFrame.from_qvd loads a QVD file from the given path and parses it. The method returns a QvdDataFrame instance.

@staticmethod from_stream(source: BinaryIO) -> QvdDataFrame

The static method QvdDataFrame.from_stream loads a QVD file from the given binary stream. The method returns a QvdDataFrame instance.

@staticmethod from_dict(data: Dict[str, List[any]]) -> QvdDataFrame

The static method QvdDataFrame.from_dict constructs a data frame from a dictionary. The dictionary must contain two keys, columns and data. The columns entry is a list of strings holding the names of the fields. The data entry is a list of lists holding the actual data records; the order of the values in each inner list corresponds to the order of the fields in columns.
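A minimal sketch of the expected dictionary shape, built from plain row lists. build_frame_dict is a hypothetical helper, not part of PyQvd; it only checks that every row has one value per column. The trailing comments show where from_dict would come in, using a hypothetical output path.

```python
# Hypothetical helper (not part of PyQvd): builds the dictionary shape that
# from_dict expects and to_dict returns, checking one value per column per row.
def build_frame_dict(columns: list[str], rows: list[list]) -> dict:
    for i, row in enumerate(rows):
        if len(row) != len(columns):
            raise ValueError(f"row {i} has {len(row)} values, expected {len(columns)}")
    return {"columns": list(columns), "data": [list(row) for row in rows]}


frame = build_frame_dict(["Region", "Amount"], [["North", 10], ["South", 20]])
print(frame)  # {'columns': ['Region', 'Amount'], 'data': [['North', 10], ['South', 20]]}

# With PyQvd installed, the dictionary could then be passed on, e.g.:
# from pyqvd import QvdDataFrame
# QvdDataFrame.from_dict(frame).to_qvd("sales.qvd")
```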

@staticmethod from_pandas(data: pandas.DataFrame) -> QvdDataFrame

The static method QvdDataFrame.from_pandas constructs a data frame from a pandas data frame.

head(n: int) -> QvdDataFrame

The method head returns the first n rows of the data frame.

tail(n: int) -> QvdDataFrame

The method tail returns the last n rows of the data frame.

select(*args: str) -> QvdDataFrame

The method select returns a new data frame that contains only the specified columns.

rows(*args: int) -> QvdDataFrame

The method rows returns a new data frame that contains only the specified rows.

at(row: int, column: str) -> any

The method at returns the value at the specified row and column.
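To make the semantics of shape, head, tail, select, rows and at concrete, the following pure-Python sketch applies them to the dictionary form produced by to_dict. It is an illustration only, not PyQvd's implementation; in particular, it assumes that rows takes zero-based row positions. On a loaded data frame, the corresponding calls would be df.shape, df.head(2), df.select("Amount"), df.rows(0, 2) and df.at(1, "Amount").

```python
# Hypothetical helpers mirroring the documented QvdDataFrame methods on a
# {"columns": [...], "data": [[...], ...]} dictionary (not PyQvd code).
def shape(frame: dict) -> tuple[int, int]:
    return len(frame["data"]), len(frame["columns"])


def head(frame: dict, n: int) -> dict:
    return {"columns": list(frame["columns"]), "data": frame["data"][:n]}


def tail(frame: dict, n: int) -> dict:
    # Guard n <= 0, because data[-0:] would return every row.
    return {"columns": list(frame["columns"]), "data": frame["data"][-n:] if n > 0 else []}


def select(frame: dict, *names: str) -> dict:
    positions = [frame["columns"].index(name) for name in names]
    return {"columns": list(names), "data": [[row[p] for p in positions] for row in frame["data"]]}


def rows(frame: dict, *indices: int) -> dict:
    # Zero-based row positions are assumed.
    return {"columns": list(frame["columns"]), "data": [frame["data"][i] for i in indices]}


def at(frame: dict, row: int, column: str):
    return frame["data"][row][frame["columns"].index(column)]


frame = {"columns": ["Region", "Amount"], "data": [["North", 10], ["South", 20], ["East", 30]]}
print(shape(frame))                     # (3, 2)
print(head(frame, 2)["data"])           # [['North', 10], ['South', 20]]
print(select(frame, "Amount")["data"])  # [[10], [20], [30]]
print(rows(frame, 0, 2)["data"])        # [['North', 10], ['East', 30]]
print(at(frame, 1, "Amount"))           # 20
```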

to_dict() -> Dict[str, List[any]]

The method to_dict returns the data frame as a dictionary with two keys, columns and data. The columns entry is a list of strings holding the names of the fields in the QVD file. The data entry is a list of lists holding the actual data records; the order of the values in each inner list corresponds to the order of the fields in columns.

to_qvd(path: str) -> None

The method to_qvd writes the data frame to a QVD file at the specified path.

to_stream(target: BinaryIO) -> None

The method to_stream writes the data frame as a QVD file to a binary stream.

to_pandas() -> pandas.DataFrame

The method to_pandas returns the data frame as a pandas data frame.

License

Copyright (c) 2024 Constantin Müller

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

See the MIT License or the LICENSE file for more details.




Download files

Source distribution: pyqvd-1.1.3.tar.gz (12.5 kB)

Built distribution: PyQvd-1.1.3-py3-none-any.whl (12.9 kB, Python 3)
