Custom Spark data sources for reading and writing data in Apache Spark, using the Python Data Source API
pyspark-data-sources
This repository showcases custom Spark data sources built using the new Python Data Source API for the upcoming Apache Spark 4.0 release. For an in-depth understanding of the API, please refer to the API source code.
Setup
pip install pyspark-data-sources
Usage
Note: Currently, the following code works only with the Apache Spark master branch.
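from pyspark.sql import SparkSession
from pyspark_datasources.github import GithubDataSource
# Create or reuse a session (the pyspark shell already provides `spark`)
spark = SparkSession.builder.getOrCreate()
# Register the data source
spark.dataSource.register(GithubDataSource)
# Load data for the apache/spark repository
df = spark.read.format("github").load("apache/spark")
df.show()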
Contributing
We welcome and appreciate any contributions to enhance and expand the custom data sources. If you're interested in contributing:
- Add New Data Sources: Want to add a new data source using the Python Data Source API? Submit a pull request or open an issue.
- Suggest Enhancements: If you have ideas to improve a data source or the API, we'd love to hear them!
- Report Bugs: Found something that doesn't work as expected? Let us know by opening an issue.
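For anyone considering the first item, the following is a minimal sketch of what a new data source might look like. It is not taken from this repository: the `CounterDataSource` and `CounterReader` names, the `counter` format name, and the `n` option are all hypothetical and exist only for illustration. The sketch assumes the `pyspark.sql.datasource` module from Apache Spark 4.0 (or a master-branch build) and falls back to simple stand-in base classes so the reader logic can be tried without Spark installed.

```python
try:
    # Python Data Source API base classes (Apache Spark 4.0+ or a master build).
    from pyspark.sql.datasource import DataSource, DataSourceReader
except ImportError:
    # Stand-ins (an assumption of this sketch) so the reader logic runs without Spark.
    class DataSource:
        def __init__(self, options):
            self.options = options

    class DataSourceReader:
        pass


class CounterDataSource(DataSource):
    """Hypothetical source that emits the integers 0..n-1 and their squares."""

    @classmethod
    def name(cls):
        # Short name intended for spark.read.format("counter").
        return "counter"

    def schema(self):
        # Schema expressed as a DDL string.
        return "id int, squared int"

    def reader(self, schema):
        return CounterReader(self.options)


class CounterReader(DataSourceReader):
    def __init__(self, options):
        # Option values arrive as strings; "n" is an illustrative option name.
        self.n = int(options.get("n", "3"))

    def read(self, partition):
        # Yield one tuple per row, in the same column order as the schema.
        for i in range(self.n):
            yield (i, i * i)
```

With a suitable Spark build, such a class would presumably be registered the same way as in the Usage section, via `spark.dataSource.register(CounterDataSource)`, and then read with `spark.read.format("counter").option("n", "4").load()`.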
Need help or have questions? Don't hesitate to open a new issue, and we'll do our best to assist you.
Hashes for pyspark_data_sources-0.1.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 89cbd4cd0f7064246020466803ab984e6a0409eeb2439fd7efa56a41f3ec0046
MD5 | 63be05beecfdc03a88e526b2cef3d2f2
BLAKE2b-256 | 1aebb14fc417735dded13ce170944271238012d019c4e6a9a74e49ec3502cb02
Hashes for pyspark_data_sources-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | bbc574769d20c5ee2dcaa7d08d2d9daf83fea7050780bf91a4bb5c8f6be0a247
MD5 | ca80567fbd451d5ce19a3cde9d50b232
BLAKE2b-256 | 5c3d4544c1eacb18a9d0dc61878a2df1c584999a619cfcd104288de3f2ce8fa5