Skip to main content

A Python driver to use SQL on Excel workbooks with CRUD, using SQLite as a middleman

Project description

excelsql

excelsql is a lightweight Python package that allows you to load Excel workbooks (.xls and .xlsx formats) into an SQLite database, perform SQL operations, and save changes back into the original format. It's designed to combine the ease of Excel with the power of SQL, streamlining data manipulation and analysis.

Table of Contents

  1. Features
  2. Installation
  3. Usage
  4. Features in Detail
  5. Contributing
  6. License

Features

  • Load Excel files into an SQLite database for SQL querying.
  • Perform SQL operations on the data (SELECT, INSERT, JOIN, etc.).
  • Return SQL query results as Pandas DataFrames.
  • Clean and normalize data in Excel sheets.
  • Validate and modify the schema (add/remove columns).
  • Support for batch processing and merging multiple Excel workbooks.
  • Generate reports and visualizations.
  • Caching for repeated queries.
  • Export data to multiple formats (CSV, JSON, etc.).
  • Jupyter Notebook integration for displaying data.

Installation

To install excelsql, run the following command:

pip install excelsql

Usage

Loading an Excel File

import excelsql

# Path to the Excel file
xls_file = './data/file_example_XLS_5000.xls'
db_file = './data/excel_db.sqlite'

# Initialize the driver
driver = excelsql(xls_file, db_path=db_file)

# Show worksheets
worksheets = driver.show_worksheets()

# Perform SQL query and return results as a DataFrame
df = driver.execute_query_to_dataframe('SELECT * FROM Sheet1')

# Save changes back to an Excel file
driver.save_to_file('./data/modified_file.xlsx')

# Close the connection
driver.close()

Features in Detail

SQL Query Interface with Pandas Integration

Run SQL queries and return the results as a Pandas DataFrame.

df = driver.execute_query_to_dataframe('SELECT * FROM Sheet1 WHERE Age > 30')

Schema Validation and Modification

  • validate_schema: Validate the schema for a worksheet.
  • add_column: Add new columns to a worksheet.
driver.add_column('Sheet1', 'NewColumn', datatype='TEXT')
driver.validate_schema('Sheet1', validations={'Age': 'numeric', 'Date': 'date'})

Batch Processing and Multi-file Support

Load and merge multiple Excel workbooks.

driver.load_multiple_workbooks(['file1.xlsx', 'file2.xlsx'])
driver.merge_workbooks('Sheet1', 'Sheet2', on='Id', how='inner')

Data Cleaning Functions

  • clean_data: Remove or fill missing values.
  • normalize: Normalize the values of numeric columns.
driver.clean_data('Sheet1', strategy='dropna')
driver.normalize('Sheet1', columns=['Age'])

Joins and Merging Data Across Sheets

Join multiple worksheets using SQL-style operations.

driver.join_sheets('Sheet1', 'Sheet2', on='Id', how='inner')

Advanced Reporting & Visualization Support

  • generate_report: Generate descriptive statistics and save as an Excel file.
  • export_visualization: Create visualizations (bar, line, area charts) from data.
driver.generate_report('Sheet1', output='./data/summary.xlsx')
driver.export_visualization('Sheet1', x_col='Age', y_col='Salary', plot_type='bar', output_path='./data/visuals')

Below is an example of a bar chart visualization generated from the Sheet1 dataset:

Bar Chart Visualization

Caching Queries for Speed Optimization

Enable caching of frequent SQL queries to speed up repeated requests.

driver.enable_query_cache()

Export to Multiple Formats

  • export_to_csv: Export worksheet data to CSV.
  • export_to_json: Export worksheet data to JSON.
driver.export_to_csv('Sheet1', './data/output.csv')
driver.export_to_json('Sheet1', './data/output.json')

You can find example exported data formats below:

Jupyter Notebook Integration

Display data directly in Jupyter notebooks for quick data inspection.

driver.display_in_notebook('Sheet1')

Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request on GitHub.


License

This project is licensed under the BSD 3-Clause License. See the LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

excelsql-0.1.35.tar.gz (5.2 kB view details)

Uploaded Source

Built Distribution

excelsql-0.1.35-py3-none-any.whl (3.3 kB view details)

Uploaded Python 3

File details

Details for the file excelsql-0.1.35.tar.gz.

File metadata

  • Download URL: excelsql-0.1.35.tar.gz
  • Upload date:
  • Size: 5.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for excelsql-0.1.35.tar.gz
Algorithm Hash digest
SHA256 42b35c8e8e99d3fcc0fa212a3d17130ab612d3fcf1110f9ed95401de27832845
MD5 0cc2c526148e5d190e73a445891dfa47
BLAKE2b-256 0c80c508a0a4737e321f8ddeb379e5b3eb6a47eab0837572145a53a2e7e574dc

See more details on using hashes here.

File details

Details for the file excelsql-0.1.35-py3-none-any.whl.

File metadata

  • Download URL: excelsql-0.1.35-py3-none-any.whl
  • Upload date:
  • Size: 3.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for excelsql-0.1.35-py3-none-any.whl
Algorithm Hash digest
SHA256 c05d1206bbd8ff5fa7753101f5946beb0ed4d9b03ddfe02d8af194a1a950c55b
MD5 5c477b642b6e753494244bf049add43c
BLAKE2b-256 6e2cc87cfadda1ce6953a0e69749800bb8302bf7fd14fb7d88603fe4773dfe6e

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page