
SQLite Dataset

Use SQLite database to store datasets. Based on SQLAlchemy core.

Structure

The core of sqlite-dataset is the SQLiteDataset class, which wraps a SQLAlchemy connection to a SQLite database.

Usage

Declare a dataset

To declare a dataset, extend the base SQLiteDataset class and specify fields.

from sqlite_dataset import SQLiteDataset, Field, String, Float

class MyIrisDataset(SQLiteDataset):
    
    sepal_length_cm = Field(String, tablename='iris')
    sepal_width_cm = Field(Float, tablename='iris')
    petal_length_cm = Field(Float, tablename='iris')
    petal_width_cm = Field(Float, tablename='iris')
    class_field = Field(String, tablename='iris', name='class')

ds = MyIrisDataset('my_iris_dataset.db')

This will create a SQLite database file at the specified path. If the file already exists, the existing dataset will be loaded instead.

The Field object

Field can be seen as a factory that creates a SQLAlchemy Column object.

The Field constructor takes the same arguments as sqlalchemy.schema.Column, except that Field does not take the positional name argument and accepts an extra tablename keyword argument.

Declare a Column using sqlalchemy.Column:

from sqlalchemy import Column, String

Column('sepal_length_cm', String)
Column(name='sepal_length_cm', type_=String)

Declare a sqlite_dataset.Field:

from sqlite_dataset import Field, String

sepal_length_cm = Field(String)

The variable name will be automatically used as the column name.

The column name can also be specified with the name argument, which is useful when the column name is a Python reserved keyword:

from sqlite_dataset import SQLiteDataset, Field, String

class MyDataset(SQLiteDataset):
    class_field = Field(String, name='class')
    type_field = Field(String, name='type')

Table name can be specified using tablename keyword argument:

from sqlite_dataset import SQLiteDataset, Field, String

class MyDataset(SQLiteDataset):
    class_field = Field(String, name='class', tablename='table1')
    type_field = Field(String, name='type', tablename='table2')

This will create two tables: table1, table2.

If tablename is not specified, the column will be created in the default table, named data.

Field type and keyword arguments

The field type is a SQLAlchemy column type. All SQLAlchemy members are re-exported by sqlite_dataset.

from sqlalchemy import String, Integer

is exactly the same as:

from sqlite_dataset import String, Integer

Inheritance

Datasets can be inherited.

from sqlite_dataset import SQLiteDataset, Field, String, Float

class BaseDataset(SQLiteDataset):
    class_field = Field(String, tablename='iris', name='class')
    example_field = Field(String, tablename='example_table')

class ChildDataset(BaseDataset): 
    sepal_length_cm = Field(String, tablename='iris')
    sepal_width_cm = Field(Float, tablename='iris')
    petal_length_cm = Field(Float, tablename='iris')
    petal_width_cm = Field(Float, tablename='iris')

Connect to an existing dataset

To connect to a dataset, call the connect() method. Call close() to close it.

ds = SQLiteDataset('test.db')
ds.connect()
# do something
ds.close()

Alternatively, the dataset can be used as a context manager:

with SQLiteDataset('test.db') as ds:
    # do something
    pass

Schema for existing dataset

The SQLiteDataset object uses a SQLAlchemy connection under the hood, so a schema is required before any database query or operation can run.

If no schema is provided, for example by declaring fields on a subclass, SQLAlchemy reflection is performed to load the schema from the existing database.

It is recommended to explicitly define the schema, as reflection may have performance issues when the schema is very large and complex.

Add and read data

data = [
    {
        'sepal_length_cm': '5.1',
        'sepal_width_cm': '3.5',
        'petal_length_cm': '1.4',
        'petal_width_cm': '0.2',
        'class': 'setosa'
    },
    {
        'sepal_length_cm': '4.9',
        'sepal_width_cm': '3.0',
        'petal_length_cm': '1.4',
        'petal_width_cm': '0.2',
        'class': 'setosa'
    }
]

with MyIrisDataset('test.db') as ds:
    ds.insert_data('iris', data)
with MyIrisDataset('test.db') as ds:
    res = ds.read_data('iris')

Use with pandas

A pandas DataFrame can be inserted into a dataset with the DataFrame.to_sql() method and read back with the pandas.read_sql() function.

Be aware that in this case, SQLiteDataset() should be used without specifying a schema.

import seaborn as sns
import pandas as pd

df = sns.load_dataset('iris')
with SQLiteDataset('iris11.db') as ds:
    df.to_sql('iris', ds.connection)
    ds.connection.commit()
with SQLiteDataset('iris11.db') as ds:
    res = pd.read_sql(
        ds.get_table('iris').select(),
        ds.connection
    )


Download files

Download the file for your platform.

Source Distribution

sqlite_dataset-0.5.1.tar.gz (6.1 kB)

Uploaded Source

Built Distribution

sqlite_dataset-0.5.1-py3-none-any.whl (7.1 kB)

Uploaded Python 3

File details

Details for the file sqlite_dataset-0.5.1.tar.gz.

File metadata

  • Download URL: sqlite_dataset-0.5.1.tar.gz
  • Upload date:
  • Size: 6.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.2 CPython/3.10.4 Windows/10

File hashes

Hashes for sqlite_dataset-0.5.1.tar.gz:

  • SHA256: ad50d47fe13c15f5c996fbecfa9e43c44bcf922acee1af26d7d0cbdeef8f3235
  • MD5: 1c55af6da9ee88b8ebfe550a0635d5c1
  • BLAKE2b-256: 727cde6b77e77a2063456deff7a2a7ceecb4384a4ffcafdc7474b60b4eaa52a8


File details

Details for the file sqlite_dataset-0.5.1-py3-none-any.whl.

File metadata

  • Download URL: sqlite_dataset-0.5.1-py3-none-any.whl
  • Upload date:
  • Size: 7.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.2 CPython/3.10.4 Windows/10

File hashes

Hashes for sqlite_dataset-0.5.1-py3-none-any.whl:

  • SHA256: 8e9764e3cbd9a22ebe01524ffc23a58a78e1d6d209fb617285b99a95fcb65cd7
  • MD5: b7d42673ed49bf64a7b0405c4a26dcf8
  • BLAKE2b-256: 2a816dfb17c18fb5fbde453e01ddf6c813a5ac976ccf688dbcf012b65616f28f

