A pandas DataFrame accessor for accessing Enterprise Data Repository (EDR) tables with Spark.
EDR Accessor - Pandas Extension to access the Enterprise Data Repository (EDR) with Spark
The EDR Accessor is a custom pandas DataFrame accessor that simplifies working with Spark from pandas. It makes it easy to list databases and tables, import tables, retrieve row counts, and write to Delta Lake tables.
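For readers unfamiliar with pandas accessors, the following is a minimal sketch of the general mechanism, using pandas' documented `register_dataframe_accessor` decorator. The accessor name `demo`, the class `DemoAccessor`, and the method `shape_summary` are hypothetical and exist only for illustration. This is not the EDR Accessor's actual implementation, and it needs no Spark session.

```python
import pandas as pd


# Hypothetical accessor name "demo" (illustration only; this package uses "edr").
@pd.api.extensions.register_dataframe_accessor("demo")
class DemoAccessor:
    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        # pandas passes in the DataFrame on which the attribute was accessed.
        self._obj = pandas_obj

    def shape_summary(self) -> str:
        """Return a short description of the wrapped DataFrame."""
        rows, cols = self._obj.shape
        return f"{rows} rows x {cols} columns"


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
summary = df.demo.shape_summary()
print(summary)
```

Registration happens when the module that defines the class is imported, which is why an accessor package has to be imported before its attribute becomes available on a DataFrame.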
Features
- List all Spark databases and tables
- Import Spark tables into a pandas DataFrame
- Retrieve table row counts
- Write pandas DataFrame to Delta Lake tables
Installation
To install EDR Accessor, simply use pip:
pip install edr-accessor
Usage
After installation, import edr_accessor to register the extension, then access the .edr attribute on any pandas DataFrame:
import pandas as pd
import edr_accessor
# Create an empty DataFrame
df = pd.DataFrame()
# List all databases
databases = df.edr.list_databases()
# List all tables in a specific database
tables = df.edr.list_tables('my_database')
# Import a table from Spark
df.edr.import_table('my_table', database='my_database')
# Get row counts for tables in a database
row_counts = df.edr.table_rowcounts(database='my_database')
# Write DataFrame to a Delta Lake table
df.edr.to_delta_table('my_delta_table', 'my_container', 'my_storage_account')
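This README does not describe how the container and storage account arguments of to_delta_table are used. If they refer to Azure Data Lake Storage Gen2 (an assumption, not something stated here), a table location is commonly addressed with an `abfss://` URI. The helper below, `abfss_table_path`, is hypothetical and only sketches how such a URI could be assembled from the three arguments. It is not part of the package.

```python
def abfss_table_path(table: str, container: str, storage_account: str) -> str:
    """Build an ADLS Gen2 URI for a table directory.

    Hypothetical helper: assumes the standard
    abfss://<container>@<account>.dfs.core.windows.net/<path> layout.
    """
    parts = (
        ("table", table),
        ("container", container),
        ("storage_account", storage_account),
    )
    for label, value in parts:
        if not value:
            raise ValueError(f"{label} must be a non-empty string")
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/{table}"


# Placeholder names taken from the usage example above.
path = abfss_table_path("my_delta_table", "my_container", "my_storage_account")
print(path)
```

With PySpark and Delta Lake configured, a path like this is typically what gets passed to a Spark DataFrame writer, for example `spark_df.write.format("delta").save(path)`. Whether the EDR Accessor does this internally is not confirmed by this README.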
Requirements
- pandas
- PySpark
Contributing
Contributions are welcome! Feel free to submit a pull request.
License
This project is licensed under the MIT License - see the LICENSE file for details.