# QuixLake Python SDK

Python SDK for QuixLake: easy data querying and management.
A Python client library for interacting with the QuixLake API. It provides easy-to-use methods for querying, inserting, and managing data without handling HTTP requests and response parsing manually.
## Features
- 🔍 Simple SQL Queries: Execute SQL queries and get pandas DataFrames
- 📊 Data Management: Insert, compact, and repartition tables
- 🗂️ Partition Support: Work with Hive-partitioned data
- 🔧 Easy Setup: Simple installation and configuration
- 📓 Jupyter Ready: Perfect for data analysis notebooks
## Installation

### From wheel (recommended)

```shell
pip install dist/quixlake_sdk-*.whl
```

### Development mode

```shell
pip install -e .
```

### With Jupyter support

```shell
pip install 'dist/quixlake_sdk-*.whl[jupyter]'
```
## Quick Start

```python
from quixlake import QuixLakeClient
import pandas as pd

# Initialize client
client = QuixLakeClient(base_url="http://localhost")

# Query data
df = client.query("SELECT * FROM my_table LIMIT 10")
print(df.head())

# Get available tables
tables = client.get_tables()
print("Available tables:", tables)

# Insert data
new_data = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'machine': ['3D_PRINTER_0', '3D_PRINTER_1', '3D_PRINTER_0']
})
result = client.insert(
    table_name="users",
    data=new_data,
    hive_columns=["machine"]
)
print("Inserted:", result)
```
## API Reference

### QuixLakeClient

#### `__init__(base_url="http://localhost", timeout=30)`

Initialize the client with the QuixLake API base URL and request timeout.

#### `query(sql, explain_analyze=False)`

Execute a SQL query and return a pandas DataFrame.

- `sql`: SQL SELECT statement
- `explain_analyze`: Enable query execution plan analysis

#### `get_tables()`

Get the list of available tables.

#### `get_partitions(table_name)`

Get the partition tree structure for a table.

#### `get_partition_info(table_name)`

Get partition structure information for a table.
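The exact shape of the returned partition tree isn't documented here. For a table Hive-partitioned by `machine` and day, a plausible nested layout (a hypothetical structure for illustration, not the SDK's guaranteed schema) could look like this:

```python
# Hypothetical partition tree: partition directory -> file count per leaf.
# The real SDK may return a different structure; this only illustrates
# how Hive-style "col=value" partition values nest.
tree = {
    "machine=3D_PRINTER_0": {"ts_day=2024-01-01": 3, "ts_day=2024-01-02": 5},
    "machine=3D_PRINTER_1": {"ts_day=2024-01-01": 2},
}

# Walk the tree to count leaf files across all partitions
total_files = sum(n for days in tree.values() for n in days.values())
print(total_files)  # 10
```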
#### `insert(table_name, data, hive_columns=None, timestamp_column=None, timestamp_format="day")`

Insert a pandas DataFrame into a table with optional partitioning.

- `table_name`: Target table name
- `data`: pandas DataFrame to insert
- `hive_columns`: List of columns for Hive partitioning
- `timestamp_column`: Column for timestamp-based partitioning
- `timestamp_format`: `'day'`, `'hour'`, or `'month'`
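How the service derives partition values from `timestamp_column` isn't spelled out above. Assuming `ts_ms` holds epoch milliseconds and `timestamp_format="day"` buckets rows by calendar date (both assumptions), the effective day-level partition value can be sketched locally with pandas:

```python
import pandas as pd

# Assumed: ts_ms is epoch milliseconds and "day" buckets rows by UTC date.
df = pd.DataFrame({"ts_ms": [1700000000000, 1700086400000]})
day = pd.to_datetime(df["ts_ms"], unit="ms").dt.strftime("%Y-%m-%d")
print(day.tolist())  # ['2023-11-14', '2023-11-15']
```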
#### `compact(table_name, target_file_size_mb=128)`

Compact table files to optimize query performance.
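As a back-of-the-envelope check, if compaction packs data into files of roughly `target_file_size_mb` each (an assumption about the service's behavior, not a documented guarantee), the expected output file count for a given table size is:

```python
import math

# Assumption: compaction packs rows into files of ~target_file_size_mb each.
total_mb = 1000           # hypothetical table size
target_file_size_mb = 128
expected_files = math.ceil(total_mb / target_file_size_mb)
print(expected_files)  # 8
```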
#### `repartition(table_name, hive_columns=None, timestamp_column=None, timestamp_format="day", target_file_size_mb=128)`

Repartition a table with a new partition scheme.
#### `delete(table_name, where_clause=None, delete_table=False, partitions=None)`

Delete data from a table, with several deletion modes: a SQL `where_clause` for row-level deletes, `partitions` for partition-level deletes, or `delete_table=True` to drop the table entirely.
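The deletion modes map onto the keyword arguments above. Since exercising them requires a live API, the sketch below uses a tiny stand-in client that only records calls, to show the calling convention; the `"col=value"` partition-spec string is an assumed format:

```python
class StubClient:
    """Stand-in for QuixLakeClient that records delete() calls (illustration only)."""

    def __init__(self):
        self.calls = []

    def delete(self, table_name, where_clause=None, delete_table=False, partitions=None):
        self.calls.append((table_name, where_clause, delete_table, partitions))

client = StubClient()

# 1. Row-level delete via a SQL predicate
client.delete("events_hive", where_clause="machine = '3D_PRINTER_1'")

# 2. Partition-level delete (Hive-style "col=value" spec, assumed format)
client.delete("events_hive", partitions=["machine=3D_PRINTER_0"])

# 3. Drop the entire table
client.delete("events_hive", delete_table=True)

print(len(client.calls))  # 3
```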
## Examples

### Basic Querying

```python
client = QuixLakeClient()

# Simple query
df = client.query("SELECT COUNT(*) as total FROM events_hive")

# Query with explain analyze
df = client.query(
    "SELECT * FROM events_hive WHERE machine = '3D_PRINTER_0'",
    explain_analyze=True,
)
```
### Data Management

```python
# Get table information
tables = client.get_tables()
partition_info = client.get_partition_info("events_hive")

# Insert partitioned data
df = pd.read_csv("new_events.csv")
client.insert(
    table_name="events_hive",
    data=df,
    hive_columns=["machine", "experiment_name"],
    timestamp_column="ts_ms",
    timestamp_format="day"
)

# Compact table
result = client.compact("events_hive", target_file_size_mb=256)

# Repartition table
result = client.repartition(
    table_name="events_hive",
    hive_columns=["machine", "year", "month"],
    timestamp_column="ts_ms",
    timestamp_format="month"
)
```
### Using Context Manager

```python
with QuixLakeClient() as client:
    df = client.query("SELECT * FROM my_table")
    print(df.describe())
```
## Building the SDK

Run the build script to create a wheel package:

```shell
./build.sh
```

This will:
- Clean previous builds
- Install build dependencies
- Create the wheel package in the `dist/` directory
- Show installation instructions
## Development

### Install in development mode

```shell
pip install -e .[dev]
```

### Run tests

```shell
pytest
```

### Code formatting

```shell
black src/
```
## Error Handling
The SDK provides clear error messages for common issues:
- Partition Mismatch: When trying to insert data with incompatible partition structure
- Query Errors: SQL syntax errors or execution failures
- Network Issues: Connection timeouts or API unavailability
```python
try:
    df = client.query("SELECT * FROM non_existent_table")
except ValueError as e:
    print(f"Query error: {e}")
```
## Requirements
- Python 3.8+
- pandas >= 1.3.0
- requests >= 2.25.0
## License
MIT License - see LICENSE file for details.
## File details

Details for the file `quixlake_sdk-0.2.3-py3-none-any.whl`.

### File metadata

- Download URL: quixlake_sdk-0.2.3-py3-none-any.whl
- Upload date:
- Size: 7.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.10

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `86b4c6c066267b597f57c714f8959151314c0457bc97bf934c54d214e4e7e7db` |
| MD5 | `719f8ff8d7611a2563c309015afff186` |
| BLAKE2b-256 | `275b1a8ee3795e11da4fa27e7b09f006f66fa1534cf1a905e8617c85ec56c7a0` |