
A command line tool for working with JSON documents on local disc, in an S3 bucket or on Google Sheets/Cloud Storage

Project description

py_dataset

py_dataset is a Python wrapper for the dataset command line tool, Go package, and C shared library for working with JSON objects as collections. Collections can be stored on disc or in Cloud Storage. JSON objects are stored in collections as plain UTF-8 text. This means the objects can be accessed with common Unix text processing tools as well as most programming languages.

This package wraps all dataset operations such as initialization of collections, creation, reading, updating and deleting JSON objects in the collection. Some of its enhanced features include the ability to generate data frames as well as the ability to import and export JSON objects to and from CSV files and Google Sheets.
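
To make the CSV idea concrete, here is a minimal sketch that writes a collection out as CSV using only the keys and read calls shown in the tutorial below plus Python's standard csv module. It does not use the package's own import/export functions, whose signatures are not shown here. The helper name records_to_csv, the "_key" column and the ds parameter (any object exposing keys and read, such as the imported dataset module) are assumptions made for illustration.

```python
import csv


def records_to_csv(ds, c_name, fieldnames, out):
    """Write one CSV row per JSON object in the collection.

    ds is assumed to expose keys(c_name) and read(c_name, key) as in the
    tutorial; records_to_csv itself is a hypothetical helper.
    """
    # Extra fields in a record are ignored; missing fields are left empty.
    writer = csv.DictWriter(out, fieldnames=["_key"] + list(fieldnames),
                            extrasaction="ignore")
    writer.writeheader()
    for key in ds.keys(c_name):
        row = {"_key": key}
        row.update(ds.read(c_name, key))
        writer.writerow(row)
```

With the package installed, a call might look like records_to_csv(dataset, c_name, ["one", "two"], open("out.csv", "w", newline="")).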

Install

Available via pip (pip install py_dataset) or by downloading this repo and running python setup.py install. This repo includes the dataset shared C libraries compiled for Windows, Mac, and Linux; the appropriate library is selected automatically.
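
After installing, one quick way to confirm that Python can find the package is to look it up on the import path. This is a small sketch using only the standard library; the helper name is_installed is an assumption, not part of py_dataset.

```python
import importlib.util


def is_installed(module_name):
    """Return True if module_name can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("py_dataset installed:", is_installed("py_dataset"))
```

This only checks that the module can be located; it does not load the shared C library.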

Features

dataset supports

Limitations of dataset

dataset has many limitations. Some are listed below:

  • it is not a multi-process, multi-user data store (it's files on "disc" without locking)
  • it is not a replacement for a repository management system
  • it is not a general purpose database system
  • it does not supply version control on collections or objects
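
Because dataset does no locking of its own, scripts that share a collection have to coordinate among themselves. The sketch below shows one possible approach, an advisory lock file created with O_EXCL. It is not something dataset provides: the name collection_lock, the ".lock" suffix and the timeout values are all assumptions. It only protects against other processes that use the same helper, and a crashed process can leave a stale lock file behind.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def collection_lock(c_name, timeout=5.0, poll=0.05):
    """Hold an advisory lock file next to the collection while the block runs."""
    lock_path = c_name.rstrip("/\\") + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError("could not lock " + c_name)
            time.sleep(poll)
    try:
        yield lock_path
    finally:
        # Release the lock so other cooperating processes can proceed.
        os.close(fd)
        os.remove(lock_path)
```

A writer would then wrap its calls, for example: with collection_lock(c_name): dataset.create(c_name, key, record).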

Tutorial

This module provides the functionality of the dataset command line tool as a Python 3.6 module. Once installed, try out the following commands to see if everything is in order (or to get familiar with dataset).

The "#" comments don't have to be typed in; they are there to explain the commands as you type them. Start the tour by launching Python 3 in interactive mode.

    python3

Then run the following Python commands.

    from py_dataset import dataset
    # Almost all the commands require the collection name as the first parameter; we store that name in c_name for convenience.
    c_name = "a_tour_of_dataset.ds"

    # Let's create a dataset collection. The 'init' method returns True or False.
    dataset.init(c_name)

    # Let's check our collection to see if it is OK
    dataset.status(c_name)

    # Let's count the records in our collection (should be zero)
    cnt = dataset.count(c_name)
    print(cnt)

    # Let's read all the keys in the collection (should be an empty list)
    keys = dataset.keys(c_name)
    print(keys)

    # Now let's add a record to our collection. To create a record we need
    # the collection name (e.g. c_name), the key (must be a string) and a record (i.e. a dict literal or variable)
    key = "one"
    record = {"one": 1}
    ok = dataset.create(c_name, key, record)
    # If ok is False we can check the last error message with the 'error_message' method
    if not ok:
        print(dataset.error_message())

    # Let's count and list the keys in our collection; we should see a count of 1 and a key of 'one'
    dataset.count(c_name)
    keys = dataset.keys(c_name)
    print(keys)

    # We can read the record we stored using the 'read' method.
    new_record = dataset.read(c_name, key)
    print(new_record)

    # Let's modify new_record and update the record in our collection
    new_record["two"] = 2
    ok = dataset.update(c_name, key, new_record)
    if not ok:
        print(dataset.error_message())

    # Let's print out the record we stored using the 'read' method
    print(dataset.read(c_name, key))

    # Finally we can remove (delete) a record from our collection
    ok = dataset.delete(c_name, key)
    if not ok:
        print(dataset.error_message())

    # We should now have a count of zero records
    cnt = dataset.count(c_name)
    print(cnt)
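
The create and update steps in the tour can be combined into a single "save" operation. The following is a minimal sketch, assuming the calls behave as the tour describes (keys returns a list, create and update return True or False, read returns the record). The helper name save_record and the ds parameter are hypothetical; ds stands for the dataset module imported above.

```python
def save_record(ds, c_name, key, record):
    """Create the record if key is new, otherwise update it.

    ds is assumed to behave like the dataset module used in the tour.
    """
    if key in ds.keys(c_name):
        ok = ds.update(c_name, key, record)
    else:
        ok = ds.create(c_name, key, record)
    if not ok:
        # Surface the last error message instead of failing silently.
        raise RuntimeError(ds.error_message())
    return ds.read(c_name, key)
```

In the interactive session this would be called as save_record(dataset, c_name, "one", {"one": 1}). Raising an exception on failure is a design choice of this sketch; the tour itself checks the returned value and prints the error message.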
