Skip to main content

A (key-value) data-object-layer to get (pandas) tables from a variety of sources with ease

Project description

tabled

A (key-value) data-object-layer to get (pandas) tables from a variety of sources with ease

To install: pip install tabled

SQLite Database Support

Tabled provides seamless integration with SQLite databases through DfFiles:

from tabled import DfFiles

# Automatic SQLite detection - just pass the database file path
df_files = DfFiles('my_database.db')

# Access tables as DataFrames
customers = df_files['customers.parquet']  # Full filename
orders = df_files['orders']                # Clean table name (both work)

# List available tables
print(list(df_files.keys()))  # ['customers.parquet', 'orders.parquet', ...]

# Or use the explicit method
df_files = DfFiles.from_sqlite_file('my_database.db')

Under the hood, SQLite tables are exported to temporary Parquet files for efficient access, with automatic cleanup when the program exits.

SQLite Export Tools

For more control over SQLite data extraction, use the sqlite_tools module:

from tabled.sqlite_tools import export_sqlite_to_dataframes, export_sqlite_to_parquet

# Export to DataFrames
tables = export_sqlite_to_dataframes('database.db')
customers_df = tables['customers']

# Export to Parquet files
export_sqlite_to_parquet('database.db', 'output_directory/')

Table Analysis and Diagnosis

The dataframe_info function provides flexible analysis of pandas DataFrames:

from tabled.diagnose import dataframe_info, register_info_func
import pandas as pd

# Analyze a DataFrame
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
info = dataframe_info(df)
print(info['shape'])  # (3, 2)
print(info['columns'])  # ['a', 'b']

# Extend with custom analysis functions
def memory_usage(df):
    return df.memory_usage(deep=True).sum()

register_info_func('memory', memory_usage)
info = dataframe_info(df)
print(info['memory'])  # Memory usage in bytes

The analysis is completely customizable - you can register new analysis functions or provide custom info function dictionaries to focus on specific aspects of your data.

DfFiles

This section demonstrates how to use DfFiles to store and retrieve pandas DataFrames using various file formats.

Setup

First, let's import required packages and define our test data:

import os
import shutil
import tempfile

import pandas as pd
from tabled import DfFiles

# Test data dictionary
misc_small_dicts = {
    "fantasy_tavern_menu": {
        "item": ["Dragon Ale", "Elf Bread", "Goblin Stew"],
        "price": [7.5, 3.0, 5.5],
        "is_alcoholic": [True, False, False],
        "servings_left": [12, 25, 8],
    },
    "alien_abduction_log": {
        "abductee_name": ["Bob", "Alice", "Zork"],
        "location": ["Kansas City", "Roswell", "Jupiter"],
        "duration_minutes": [15, 120, 30],
        "was_returned": [True, False, True],
    }
}

Creating Test Directory

We'll create a temporary directory for our files:

def create_test_directory():
    # Create a directory for the test files
    rootdir = os.path.join(tempfile.gettempdir(), 'tabled_df_files_test')
    if os.path.exists(rootdir):
        shutil.rmtree(rootdir)
    os.makedirs(rootdir)
    print(f"Created directory at: {rootdir}")
    return rootdir

rootdir = create_test_directory()
print(f"Created directory at: {rootdir}")
Created directory at: /var/folders/mc/c070wfh51kxd9lft8dl74q1r0000gn/T/tabled_df_files_test
Created directory at: /var/folders/mc/c070wfh51kxd9lft8dl74q1r0000gn/T/tabled_df_files_test

Initialize DfFiles

Create a new DfFiles instance pointing to our directory:

df_files = DfFiles(rootdir)

Let's verify it starts empty:

list(df_files)
[]

Creating and Saving DataFrames

Let's create DataFrames from our test data:

fantasy_tavern_menu_df = pd.DataFrame(misc_small_dicts['fantasy_tavern_menu'])
alien_abduction_log_df = pd.DataFrame(misc_small_dicts['alien_abduction_log'])

print("Fantasy Tavern Menu:")
display(fantasy_tavern_menu_df)
print("\nAlien Abduction Log:")
display(alien_abduction_log_df)
Fantasy Tavern Menu:
item price is_alcoholic servings_left
0 Dragon Ale 7.5 True 12
1 Elf Bread 3.0 False 25
2 Goblin Stew 5.5 False 8
Alien Abduction Log:
abductee_name location duration_minutes was_returned
0 Bob Kansas City 15 True
1 Alice Roswell 120 False
2 Zork Jupiter 30 True

Now let's save these DataFrames using different formats:

df_files['fantasy_tavern_menu.csv'] = fantasy_tavern_menu_df
df_files['alien_abduction_log.json'] = alien_abduction_log_df

Reading Data Back

Let's verify we can read the data back correctly:

saved_df = df_files['fantasy_tavern_menu.csv']
saved_df
item price is_alcoholic servings_left
0 Dragon Ale 7.5 True 12
1 Elf Bread 3.0 False 25
2 Goblin Stew 5.5 False 8

MutableMapping Interface

DfFiles implements the MutableMapping interface, making it behave like a dictionary.

Let's see how many files we have:

len(df_files)
2

List all available files:

list(df_files)
['fantasy_tavern_menu.csv', 'alien_abduction_log.json']

Check if a file exists:

'fantasy_tavern_menu.csv' in df_files
True

Supported File Extensions

Let's see what file formats DfFiles supports out of the box.

(Note that some of these will require installing extra packages, which you'll realize if you get an ImportError)

print("Encoder supported extensions:")
list_of_encoder_supported_extensions = list(df_files.extension_encoder_mapping)
print(*list_of_encoder_supported_extensions, sep=', ')
Encoder supported extensions:
csv, txt, tsv, json, html, p, pickle, pkl, npy, parquet, zip, feather, h5, hdf5, stata, dta, sql, sqlite, gbq, xls, xlsx, xml, orc
print("Decoder supported extensions:")
list_of_decoder_supported_extensions = list(df_files.extension_decoder_mapping)
print(*list_of_decoder_supported_extensions, sep=', ')
Decoder supported extensions:
csv, txt, tsv, parquet, json, html, p, pickle, pkl, xml, sql, sqlite, feather, stata, dta, sas, h5, hdf5, xls, xlsx, orc, sav

Testing Different Extensions

Let's try saving and loading our test DataFrame in different formats:

extensions_supported_by_encoder_and_decoder = (
    set(list_of_encoder_supported_extensions) & set(list_of_decoder_supported_extensions)
)
sorted(extensions_supported_by_encoder_and_decoder)
['csv',
 'dta',
 'feather',
 'h5',
 'hdf5',
 'html',
 'json',
 'orc',
 'p',
 'parquet',
 'pickle',
 'pkl',
 'sql',
 'sqlite',
 'stata',
 'tsv',
 'txt',
 'xls',
 'xlsx',
 'xml']

def test_extension(ext):
    filename = f'test_file.{ext}'
    try:
        df_files[filename] = fantasy_tavern_menu_df
        df_loaded = df_files[filename]
        # test the decoded df is the same as the one that was saved (round-trip test)
        # Note that we drop the index, since the index is not saved in the file by default for all codecs
        pd.testing.assert_frame_equal(
            fantasy_tavern_menu_df.reset_index(drop=True),
            df_loaded.reset_index(drop=True),
        )
        return True
    except Exception as e:
        return False


test_extensions = [
    'csv',
    'feather',
    'json',
    'orc',
    'parquet',
    'pkl',
    'tsv',  
    # 'dta',  # TODO: fix
    # 'h5',  # TODO: fix
    # 'html',  # TODO: fix
    # 'sql',  # TODO: fix
    # 'xml',  # TODO: fix
]

for ext in test_extensions:
    print("Testing extension:", ext)
    success = test_extension(ext)
    if success:
        print(f"\tExtension {ext}: ✓")
    else:
        print('\033[91m' + f"\tFix extension {ext}: ✗" + '\033[0m')
        
    # marker = '✓' if success else '\033[91m✗\033[0m'
    # print(f"\tExtension {ext}: {marker}")
Testing extension: csv
	Extension csv: ✓
Testing extension: feather
	Extension feather: ✓
Testing extension: json
	Extension json: ✓
Testing extension: orc
	Extension orc: ✓
Testing extension: parquet
	Extension parquet: ✓
Testing extension: pkl
	Extension pkl: ✓
Testing extension: tsv
	Extension tsv: ✓
Testing extension: dta
	Fix extension dta: ✗
Testing extension: h5
	Fix extension h5: ✗
Testing extension: html
	Fix extension html: ✗
Testing extension: sql
	Fix extension sql: ✗
Testing extension: xml
	Fix extension xml: ✗

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tabled-0.1.29.tar.gz (106.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

tabled-0.1.29-py3-none-any.whl (67.7 kB view details)

Uploaded Python 3

File details

Details for the file tabled-0.1.29.tar.gz.

File metadata

  • Download URL: tabled-0.1.29.tar.gz
  • Upload date:
  • Size: 106.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.16 {"installer":{"name":"uv","version":"0.11.16","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for tabled-0.1.29.tar.gz
Algorithm Hash digest
SHA256 9e81caf689d1d48891f91ea21b76962bf5ced33396d42ef20ce41b58b21e5a66
MD5 6e379e4304181524c27c865d7fa0cc4a
BLAKE2b-256 005dad75c1f0fd5c46c301db580bbe30fdd087a42ef16aa5726d604beee17d77

See more details on using hashes here.

File details

Details for the file tabled-0.1.29-py3-none-any.whl.

File metadata

  • Download URL: tabled-0.1.29-py3-none-any.whl
  • Upload date:
  • Size: 67.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.16 {"installer":{"name":"uv","version":"0.11.16","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for tabled-0.1.29-py3-none-any.whl
Algorithm Hash digest
SHA256 a03df46cde3ba4e9779893bdf4b8e89e48317805944e65a2c5f884b474157879
MD5 0b06ea73bb49f8fc5d2ec6da996bf04d
BLAKE2b-256 71c989985b3854a616a1876af16019f190a8963f6782f18eb56521df85607c9d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page