
A pandas DataFrame accessor for working with Enterprise Data Repository (EDR) tables through Spark.


EDR Accessor - Pandas Extension to access the Enterprise Data Repository (EDR) with Spark

The EDR Accessor is a custom pandas DataFrame accessor that simplifies working with Spark from pandas. It lets you list databases and tables, import Spark tables into pandas, retrieve table row counts, and write DataFrames to Delta Lake tables.

Features

  • List all Spark databases and tables
  • Import Spark tables into a pandas DataFrame
  • Retrieve table row counts
  • Write pandas DataFrames to Delta Lake tables
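The to_delta_table method, shown in the usage example later in this description, takes a container name and a storage account name. That pairing suggests an Azure Data Lake Storage Gen2 target, though the package does not state this. ADLS Gen2 locations are addressed with ABFS URIs of the form abfss://<container>@<account>.dfs.core.windows.net/<path>. The helper abfss_uri below is hypothetical and only illustrates that URI format. How edr_accessor actually builds its storage paths is not documented here.

```python
def abfss_uri(container: str, storage_account: str, path: str = "") -> str:
    """Build an ADLS Gen2 (ABFS) URI. Hypothetical helper, not part of edr_accessor."""
    # Assumption: the Delta table lives in ADLS Gen2, addressed via the dfs endpoint.
    base = f"abfss://{container}@{storage_account}.dfs.core.windows.net"
    # Drop leading/trailing slashes so the path joins cleanly onto the base URI.
    cleaned = path.strip("/")
    return f"{base}/{cleaned}" if cleaned else f"{base}/"


print(abfss_uri("my_container", "my_storage_account", "my_delta_table"))
# prints abfss://my_container@my_storage_account.dfs.core.windows.net/my_delta_table
```

Real Azure names are stricter than these README placeholders. Container names cannot contain underscores, and storage account names may use only lowercase letters and digits.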

Installation

To install EDR Accessor, simply use pip:

pip install edr-accessor

Usage

After installation, import edr_accessor alongside pandas. The extension is then available through the .edr attribute of any pandas DataFrame.

import pandas as pd
import edr_accessor

# Create an empty DataFrame
df = pd.DataFrame()

# List all databases
databases = df.edr.list_databases()

# List all tables in a specific database
tables = df.edr.list_tables('my_database')

# Import a table from Spark
df.edr.import_table('my_table', database='my_database')

# Get row counts for tables in a database
row_counts = df.edr.table_rowcounts(database='my_database')

# Write DataFrame to a Delta Lake table
df.edr.to_delta_table('my_delta_table', 'my_container', 'my_storage_account')

Requirements

  • pandas
  • PySpark

Contributing

Contributions welcome! Feel free to submit a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.


Download files


Source Distribution

edr-accessor-0.1.7.tar.gz (5.5 kB)

Built Distribution

edr_accessor-0.1.7-py3-none-any.whl (8.2 kB, Python 3)
