Query CSV and Parquet files using SQL

filequery

Query CSV and Parquet files using SQL (the CLI also accepts JSON files). filequery uses DuckDB behind the scenes, so any SQL that is valid in DuckDB works here.

installation

$ pip install filequery

CLI usage

Run filequery --help to see what options are available.

usage: __main__.py [-h] [--filename FILENAME] [--filesdir FILESDIR] [--query QUERY] [--query_file QUERY_FILE] [--out_file OUT_FILE] [--out_file_format OUT_FILE_FORMAT] [--config CONFIG]

options:
  -h, --help            show this help message and exit
  --filename FILENAME   path to a CSV, Parquet or JSON file
  --filesdir FILESDIR   path to a directory which can contain a combination of CSV, Parquet and JSON files
  --query QUERY         SQL query to execute against file
  --query_file QUERY_FILE
                        path to file with query to execute
  --out_file OUT_FILE   file to write results to instead of printing to standard output
  --out_file_format OUT_FILE_FORMAT
                        either csv or parquet, defaults to csv
  --config CONFIG       path to JSON config file

For basic usage, provide a path to a CSV, Parquet or JSON file and a query to execute against it. The table name is the file name without the extension, so example/test.csv is queried as the table test.

$ filequery --filename example/test.csv --query 'select * from test'
$ filequery --filename example/json_test.json --query 'select nested.nest_id, nested.nested_val from json_test'
$ filequery --filesdir example/data --query 'select * from test inner join test1 on test.col1 = test1.col1'
$ filequery --filesdir example/data --query_file example/queries/join.sql
$ filequery --filesdir example/data --query_file example/queries/json_csv_join.sql
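The naming rule described above (table name = file name without its extension) can be sketched in a few lines of Python. This is an illustration, not filequery's own code: `table_name_for` is a hypothetical helper, and it assumes that only the final extension is dropped.

```python
from pathlib import Path


def table_name_for(path: str) -> str:
    """Hypothetical helper: derive a table name from a file path.

    Assumes the table name is the file name minus its final extension,
    as described above; filequery's real logic may differ.
    """
    return Path(path).stem


# Paths taken from the CLI examples above.
for p in ["example/test.csv", "example/json_test.json"]:
    print(p, "->", table_name_for(p))
```

Knowing the table name up front helps when writing a join across a directory: each file in the directory is addressed by its own stem.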

You can also provide a config file instead of specifying the arguments when running the command.

$ filequery --config <path to config file>

The config file must be a JSON file whose keys match the CLI option names. Two example config files are shown below.

{
    "filename": "../example/test.csv",
    "query": "select col1, col2 from test"
}

{
    "filesdir": "../example/data",
    "query_file": "../example/queries/join.sql",
    "out_file": "result.parquet",
    "out_file_format": "parquet"
}
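Since a config file is plain JSON, it can also be generated from a script. The sketch below writes the second example config with the standard `json` module. The helper name `write_config` and the file name `filequery_config.json` are made up for illustration, and the final command assumes `filequery` is installed and on `PATH`, so it is left commented out.

```python
import json
import tempfile
from pathlib import Path


def write_config(path: Path, **options: str) -> Path:
    """Hypothetical helper: write filequery options to a JSON config file."""
    path.write_text(json.dumps(options, indent=4))
    return path


# Keys mirror the CLI option names shown in the example config files.
config_path = write_config(
    Path(tempfile.gettempdir()) / "filequery_config.json",
    filesdir="../example/data",
    query_file="../example/queries/join.sql",
    out_file="result.parquet",
    out_file_format="parquet",
)
print(config_path.read_text())

# Assumes filequery is installed and on PATH:
# import subprocess
# subprocess.run(["filequery", "--config", str(config_path)], check=True)
```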

module usage

You can also use filequery in your own programs. See the example below.

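# NOTE: importing FileType from filequery.filedb is an assumption;
# FileType is used below and must be imported from wherever the package defines it
from filequery.filedb import FileDb, FileType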

query = 'select * from test'

# read test.csv into a table called "test"
fdb = FileDb('example/test.csv')

# return QueryResult object
res = fdb.exec_query(query)

# formats result as csv
print(str(res))

# saves query result to result.csv
res.save_to_file('result.csv')

# saves query result as parquet file
fdb.export_query(query, 'result.parquet', FileType.PARQUET)

development

Packages required for distribution should go in requirements.txt.

To build the wheel:

$ pip install -r requirements-dev.txt
$ make

testing

To test the CLI, cd into the src directory and run filequery as a module.

$ python -m filequery <args>

To run unit tests, stay in the root of the project. The unit tests add src to the path so filequery can be imported properly.

$ python tests/<test file>
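A test file that adds src to the import path might start roughly like this. The layout (src in the project root, tests run from the root) follows the description above, but this is not the project's actual test code; `add_src_to_path` is a hypothetical helper.

```python
import sys
from pathlib import Path


def add_src_to_path(project_root: Path) -> str:
    """Hypothetical helper: put <project_root>/src at the front of sys.path."""
    src_dir = str(project_root / "src")
    if src_dir not in sys.path:
        sys.path.insert(0, src_dir)
    return src_dir


# Tests are run from the project root, so the current directory is the root.
src_dir = add_src_to_path(Path.cwd())

# After this, `from filequery.filedb import FileDb` should resolve,
# provided src/filequery exists.
```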

Release history

0.2.6 (Apr 29, 2024)
0.2.5 (Apr 10, 2024)
0.2.4 (Mar 1, 2024)
0.2.3 (Jan 20, 2024)
0.2.2 (Dec 15, 2023)
0.2.1 (Nov 20, 2023)
0.2.0 (Nov 10, 2023)
0.1.9 (Sep 22, 2023)
0.1.8 (May 3, 2023)
0.1.7 (Mar 23, 2023)
0.1.6 (Mar 11, 2023), this version
0.1.5 (Mar 10, 2023)
0.1.4 (Feb 27, 2023)
0.1.3 (Feb 25, 2023)
0.1.2 (Feb 24, 2023)
0.1.1 (Feb 22, 2023)
0.1.0 (Feb 19, 2023)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

filequery-0.1.6.tar.gz (11.3 kB), uploaded Mar 11, 2023 (Source)

Built Distribution

filequery-0.1.6-py3-none-any.whl (7.6 kB), uploaded Mar 11, 2023 (Python 3)

Hashes for filequery-0.1.6.tar.gz

Algorithm	Hash digest
SHA256	`d191b9464baefc8bdbc472fff9c6d6b12f7016e61162d7d992c5c65bb04eb5eb`
MD5	`5b90d3b644d217ec7db0f6be42adc666`
BLAKE2b-256	`6084edd48e4c4264eee09888826247cc4a1065f806add627b68d61533a808d12`

Hashes for filequery-0.1.6-py3-none-any.whl

Algorithm	Hash digest
SHA256	`f770f4244d4635ccb1a5a5d3c1aa7e91f002506289ee928ff53301c43703ac63`
MD5	`bafee708b9d1f367e62d1228b3fce50b`
BLAKE2b-256	`56d4dcd78459964ef52188411829f77aff4453f258f0c736b865c3625f329343`