A versatile Python package that allows you to execute MongoDB-style queries on CSV files and interact with them like SQL tables.

These details have not been verified by PyPI

Project links

Homepage

Project description

csvquery

A versatile Python package that allows you to execute MongoDB-style queries on CSV files and interact with them like SQL tables.

Downloading

pip install csvquery

Package contents

open_csv(str path[, str delimiter = ","])

Produces a Dataset from a CSV file:

from csvquery import open_csv

dataset = open_csv("path/to/file.csv")

get_csv(str url[, str delimiter = ","])

Produces a Dataset from a URL:

from csvquery import get_csv

dataset = get_csv("http://example.com/api/data.csv")

parse_csv(str string[, str delimiter = ","])

Produces a Dataset from a string:

from csvquery import parse_csv

string = "name,age\nJohn,34\nJane,33"
dataset = parse_csv(string)

Operators

Stores all the valid query operator keywords as attributes. Using this class is optional as you can just use the keyword strings instead.

equal = "eq"
not_equal = "neq"
less_than = "lt"
greater_than = "gt"
less_than_or_equal = "lte"
greater_than_or_equal = "gte"
inside = "in"
_not = "not"
_and = "and"
_or = "or"

Comparisons

Stores some common comparison operators.

integers

A lambda function to compare integer values.

data.index("age", Comparisons.integers)

floats

A lambda function to compare floating-point values.

data.index("rate", Comparisons.floats)

strings

A lambda function to compare strings alphabetically.

data.index("name", Comparisons.strings)

default

An alias for the floats comparison.

get_date_comparison(str format_string)

Returns a function that compares dates based on the format string. See https://strftime.org/ for a list of all valid date codes.

data.index("date", Comparisons.get_date_comparison("%Y-%m-%d"))

The Dataset object

The Dataset object is similar to an SQL table. It can be obtained with the open_csv, get_csv, and parse_csv methods.

data

A two-dimensional list of the data.

for row in voter_dataset.data:
    print(row[0])
    ...

fields

A list of the dataset's fields, or column names.

for field in voter_dataset.fields:
    print(field)
    ...

index(str field[, func comparison_operation = Comparisons.default])

Sort the rows of data based on the values in a specified field. Sorting the data is optional, but doing so allows you to do binary searches which have a time complexity of just O(log(n)). The comparison_operation argument must be a function that returns True when the first argument is less than the second argument, and False if otherwise. Import Comparisons for some common comparison operations. By default, the comparison_operation is a floating-point comparison.

from csvquery import open_csv, Comparisons

dataset = open_csv("people.csv")
dataset.index("age", Comparisons.integers) # sorts people by ascending age

You can also make your own comparison operator.

dataset.index("age", lambda a, b: a**2 < b**2)

already_indexed(str field[, func comparison_operation = Comparisons.default])

Specifies that the data is already sorted by a certain field, allowing binary searches without re-sorting.

from csvquery import open_csv, Comparisons

dataset = open_csv("people.csv")
dataset.already_indexed("name", Comparisons.strings)

query(dict filter_object)

Returns all rows that match the filter_object as another Dataset.

from csvquery import open_csv

dataset = open_csv("people.csv")
dataset.index("age")

voter_dataset = dataset.query({
    "age": {          # this filter will run as a binary search since we indexed the data by age
        "gte": 18     # the query will only return people who's age is greater than or equal to 18
    },
    "citizenship" {   # this will run after the binary search to filter the narrowed-down data
        "eq": "USA"   # people will only pass this filter if their "citizenship" field is equal to "USA"
    }
})

The general structure of a filter_object is as follows:

{
    "field_1": {
        "operator_1": "value",
        "operator_2": "value",
        "operator_3": {
            "nested_operator": "value"
        },
        ...
        "operator_n": "value_n"
    },
    "field_2": {
        ...
    },
    ...
    "field_3": {
        ...
    }
}

You can also use the csvquery.Operators class instead of operator strings:

from csvquery import Operators

voters_named_john = voter_dataset.query({
    "name": {
        Operators.equal : "John"
    }
})

If you want to use a comparison operator like gt or lte on a column that was not indexed, you need to provide a comparison operator in the filter_object like so:

from csvquery import open_csv, Operations, Comparisons

dataset = open_csv("people.csv")
dataset.index("citizenship") # sorts people by citizenship

voter_dataset = dataset.query({
    "citizenship": { # binary search
        "eq": "USA"
    },
    "age" {  # not a binary search
        "gte": "18"
        "comparison": Comparisons.integers
    }
})

query_one(dict filter_object)

Returns the first row that matches the filter_object as a Dataset:

john_doe = people_dataset.query_one({"phone":"555-123-4567"})

select(list fields)

Returns the a new Dataset object with only the specified fields.

names_and_ages = people.select(["name", "age"])

select_as(dict fields)

Returns the a new Dataset object with only the specified fields, except the fields are renamed according to the fields dictionary.

names_and_ages = people.select_as({
    "first_and_last_name": "name",
    "years_of_oldness": "age"
})

select_unique(str field)

Returns a new Dataset object with only the specified field, and removes any duplicate values so that each value is unique.

names = people.select_unique("name")

add_field(str field[, func derivation = lambda r:""])

Adds another field with the specified name. By default, the field will be filled with blank values.

people.add_field("status")

You can optionally specify a function that takes the data in that row as a dictionary and outputs the new derived value per row.

people.add_field("full_name", lambda row: row["first_name"] + " " + row["last_name"]])

remove_fields(list fields)

Removes the specified fields from the Dataset.

people.remove_fields(["status", "full_name"])

rename_fields(dict fields)

Renames fields according to the fields dictionary argument.

people.rename_fields({
    "first_and_last_name": "name",
    "years_of_oldness": "age"
})

replace(list fields, func function)

Replaces the values in the specified fields list argument using the function argument, which takes the current value as input and outputs the new value.

people.replace(["first name", "last name"], lambda v: v.lower()) # makes all "first name" and "last name" values lower case

replace_derived(list fields, func derivation)

Replaces the values in the specified fields list argument using the function argument, which takes the row as a dictionary as input and outputs the new value.

def birthday_to_age(row):
    bday = datetime.strptime(row["date_of_birth"], "%Y-%m-%d")
    today = datetime.today()
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))

people.replace_derived(["age"], birthday_to_age)

join(Dataset other_dataset, list common_fields[, bool remove = True])

Adds fields from other_dataset to this Dataset and matches the rows by referencing the common_fields list argument. The common_fields list must have the field from the current Dataset and the field from the other Dataset in that order. By default, this method will remove the common field after the operation, but you can prevent this by setting the remove argument to False.

locations = open_csv("locations.csv") # has an id field
people = open_csv("people.csv") # has a location_id field that matches people to locations

people.join(locations, ["location_id", "id"])

to_dictionary()

Returns a the data as a dictionary if the Dataset has only one row (as a result of a query_one operation, for example).

john_doe = people.query_one({"phone":"555-123-4567"}) # dataset is one row high
print(john_doe.to_dictionary()["address"])

to_list()

Returns a the data as a list if the Dataset has only one column (as a result of a select operation, for example).

texans = people.query({"state":"TX"}).select("name") # dataset is one column wide
texan_names = texans.to_list()

count([list fields])

If the fields argument is left blank, returns the number of rows in the Dataset.

number_of_people = people.count()

If otherwise, returns the number of rows in which the all of the specified fields are not empty.

number_of_with_jobs = people.count(["job"]) # assuming the "job" field is left blank for unemployed people

sum(str field)

Returns a sum of all the values in that field.

total_net_worth = people.sum("net_worth")

average(str field)

Returns a average of all the values in that field.

average_net_worth = people.average("net_worth")

print_table([list fields])

Outputs your data to the console in a nice table.

voter_dataset.print_table()

You can optionally specify which columns to print.

voter_dataset.print_table(["name", "age"])

save_csv(str filepath[, str delimiter = ","[, fields = ]])

Saves the Dataset to a file. If no fields are specified, all fields will be saved.

voter_dataset.save_csv("output.csv", ";", ["name", "age"])

More examples

SQL equivalent

Classic SQL query

SELECT name, age FROM people
WHERE age >= 18 AND citizenship = "USA";

Python MongoDB-style query

voters = people.query({
    "age": {
        "gte": 18
    },
    "citizenship": "USA"
}).select(["name", "age"])

Printing certain columns

dataset = open_csv("people.csv")
dataset.print_table(dataset.fields[2:5])

Rewriting a CSV file with fewer columns and a different delimiter

dataset = open_csv("people.csv")
dataset.save_csv("people.csv", ";", dataset.fields[2:5])

The "eq" operator is optional in the top level of the dictionary

dataset = csvquery.open_csv("people.csv")
dataset.query({
    "name":"John"
})

Selecting one field

people.select("name") # doesn't need to be a list if it's just one

Chaining

(
open_csv("people.csv")
    .index("age")
    .query({"age":{"gte":18}, "citizenship":"USA"})
    .select(["name", "id"])
    .save_csv("voters.csv", ",")
    .print_table()
)

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

This version

1.1.3

Mar 29, 2020

1.1.2

Mar 28, 2020

1.1.1

Mar 28, 2020

1.1.0

Mar 28, 2020

1.0.5

Mar 27, 2020

1.0.4

Mar 27, 2020

1.0.3

Mar 26, 2020

1.0.2

Mar 26, 2020

1.0.1

Mar 26, 2020

1.0.0

Mar 26, 2020

1.0.0b0 pre-release

Mar 26, 2020

0.2.2

Mar 24, 2020

0.2.1

Mar 24, 2020

0.2.0

Mar 24, 2020

0.1.1

Mar 24, 2020

0.1.0

Mar 24, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

csvquery-1.1.3.tar.gz (12.6 kB view details)

Uploaded Mar 29, 2020 Source

Built Distribution

csvquery-1.1.3-py3-none-any.whl (4.7 kB view details)

Uploaded Mar 29, 2020 Python 3

File details

Details for the file csvquery-1.1.3.tar.gz.

File metadata

Download URL: csvquery-1.1.3.tar.gz
Upload date: Mar 29, 2020
Size: 12.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.1 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.3

File hashes

Hashes for csvquery-1.1.3.tar.gz
Algorithm	Hash digest
SHA256	`1a09946bacf9d55ef777c48711de260513ba64ae7f577e3535cbcce4084adc77`
MD5	`234ca86a5f08f0f85f0152ce5adc90c7`
BLAKE2b-256	`842419b56cebab752de251a90f646d3eae8a354771df2dd6a1b62a3616312fa8`

See more details on using hashes here.

File details

Details for the file csvquery-1.1.3-py3-none-any.whl.

File metadata

Download URL: csvquery-1.1.3-py3-none-any.whl
Upload date: Mar 29, 2020
Size: 4.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.1 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.3

File hashes

Hashes for csvquery-1.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4c00cb7e9a817727f79f3af440fd6aa94cb6ee65a33cd3244c9aaf0efaf82d04`
MD5	`4324cb15a365350536c92b6cfc6fbbf7`
BLAKE2b-256	`642ac7ec0e4eca9eca42e7b44e7a33649cea9142bcc0ed49bf9eac1beec62a1a`

See more details on using hashes here.

csvquery 1.1.3

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

csvquery

Downloading

Package contents

open_csv(str path[, str delimiter = ","])

get_csv(str url[, str delimiter = ","])

parse_csv(str string[, str delimiter = ","])

Operators

Comparisons

integers

floats

strings

default

get_date_comparison(str format_string)

The Dataset object

data

fields

index(str field[, func comparison_operation = Comparisons.default])

already_indexed(str field[, func comparison_operation = Comparisons.default])

query(dict filter_object)

query_one(dict filter_object)

select(list fields)

select_as(dict fields)

select_unique(str field)

add_field(str field[, func derivation = lambda r:""])

remove_fields(list fields)

rename_fields(dict fields)

replace(list fields, func function)

replace_derived(list fields, func derivation)

join(Dataset other_dataset, list common_fields[, bool remove = True])

to_dictionary()

to_list()

count([list fields])

sum(str field)

average(str field)

print_table([list fields])

save_csv(str filepath[, str delimiter = ","[, fields = ]])

More examples

SQL equivalent

Classic SQL query

Python MongoDB-style query

Printing certain columns

Rewriting a CSV file with fewer columns and a different delimiter

The "eq" operator is optional in the top level of the dictionary

Selecting one field

Chaining

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes