A Python driver to use SQL on Excel workbooks with CRUD, using SQLite as a middleman
Project description
excelsql
excelsql
is a lightweight Python package that allows you to load Excel workbooks (.xls
and .xlsx
formats) into an SQLite database, perform SQL operations, and save changes back into the original format. It's designed to combine the ease of Excel with the power of SQL, streamlining data manipulation and analysis.
Table of Contents
- Features
- Installation
- Usage
- Features in Detail
- SQL Query Interface with Pandas Integration
- Schema Validation and Modification
- Batch Processing and Multi-file Support
- Data Cleaning Functions
- Joins and Merging Data Across Sheets
- Advanced Reporting & Visualization Support
- Caching Queries for Speed Optimization
- Export to Multiple Formats
- Jupyter Notebook Integration
- Contributing
- License
Features
- Load Excel files into an SQLite database for SQL querying.
- Perform SQL operations on the data (SELECT, INSERT, JOIN, etc.).
- Return SQL query results as Pandas DataFrames.
- Clean and normalize data in Excel sheets.
- Validate and modify the schema (add/remove columns).
- Support for batch processing and merging multiple Excel workbooks.
- Generate reports and visualizations.
- Caching for repeated queries.
- Export data to multiple formats (CSV, JSON, etc.).
- Jupyter Notebook integration for displaying data.
Installation
To install excelsql
, run the following command:
pip install excelsql
Usage
Loading an Excel File
import excelsql
# Path to the Excel file
xls_file = './data/file_example_XLS_5000.xls'
db_file = './data/excel_db.sqlite'
# Initialize the driver
driver = excelsql(xls_file, db_path=db_file)
# Show worksheets
worksheets = driver.show_worksheets()
# Perform SQL query and return results as a DataFrame
df = driver.execute_query_to_dataframe('SELECT * FROM Sheet1')
# Save changes back to an Excel file
driver.save_to_file('./data/modified_file.xlsx')
# Close the connection
driver.close()
Features in Detail
SQL Query Interface with Pandas Integration
Run SQL queries and return the results as a Pandas DataFrame.
df = driver.execute_query_to_dataframe('SELECT * FROM Sheet1 WHERE Age > 30')
Schema Validation and Modification
validate_schema
: Validate the schema for a worksheet.add_column
: Add new columns to a worksheet.
driver.add_column('Sheet1', 'NewColumn', datatype='TEXT')
driver.validate_schema('Sheet1', validations={'Age': 'numeric', 'Date': 'date'})
Batch Processing and Multi-file Support
Load and merge multiple Excel workbooks.
driver.load_multiple_workbooks(['file1.xlsx', 'file2.xlsx'])
driver.merge_workbooks('Sheet1', 'Sheet2', on='Id', how='inner')
Data Cleaning Functions
clean_data
: Remove or fill missing values.normalize
: Normalize the values of numeric columns.
driver.clean_data('Sheet1', strategy='dropna')
driver.normalize('Sheet1', columns=['Age'])
Joins and Merging Data Across Sheets
Join multiple worksheets using SQL-style operations.
driver.join_sheets('Sheet1', 'Sheet2', on='Id', how='inner')
Advanced Reporting & Visualization Support
generate_report
: Generate descriptive statistics and save as an Excel file.export_visualization
: Create visualizations (bar, line, area charts) from data.
driver.generate_report('Sheet1', output='./data/summary.xlsx')
driver.export_visualization('Sheet1', x_col='Age', y_col='Salary', plot_type='bar', output_path='./data/visuals')
Below is an example of a bar chart visualization generated from the Sheet1
dataset:
Caching Queries for Speed Optimization
Enable caching of frequent SQL queries to speed up repeated requests.
driver.enable_query_cache()
Export to Multiple Formats
export_to_csv
: Export worksheet data to CSV.export_to_json
: Export worksheet data to JSON.
driver.export_to_csv('Sheet1', './data/output.csv')
driver.export_to_json('Sheet1', './data/output.json')
You can find example exported data formats below:
Jupyter Notebook Integration
Display data directly in Jupyter notebooks for quick data inspection.
driver.display_in_notebook('Sheet1')
Contributing
Contributions are welcome! Feel free to open an issue or submit a pull request on GitHub.
License
This project is licensed under the BSD 3-Clause License. See the LICENSE file for details.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file excelsql-0.1.35.tar.gz
.
File metadata
- Download URL: excelsql-0.1.35.tar.gz
- Upload date:
- Size: 5.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 42b35c8e8e99d3fcc0fa212a3d17130ab612d3fcf1110f9ed95401de27832845 |
|
MD5 | 0cc2c526148e5d190e73a445891dfa47 |
|
BLAKE2b-256 | 0c80c508a0a4737e321f8ddeb379e5b3eb6a47eab0837572145a53a2e7e574dc |
File details
Details for the file excelsql-0.1.35-py3-none-any.whl
.
File metadata
- Download URL: excelsql-0.1.35-py3-none-any.whl
- Upload date:
- Size: 3.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | c05d1206bbd8ff5fa7753101f5946beb0ed4d9b03ddfe02d8af194a1a950c55b |
|
MD5 | 5c477b642b6e753494244bf049add43c |
|
BLAKE2b-256 | 6e2cc87cfadda1ce6953a0e69749800bb8302bf7fd14fb7d88603fe4773dfe6e |