A Spark entry point for Python
Changelog
v0.3.0
Enabled --force-download option.
Added --find-links option to use a directory as a package repository.
Added --no-index option to avoid using external package repositories.
Added --queue option to set the YARN queue.
Ensure the driver's Python executable is the same Python as sparpy's.
Added new entry point sparpy-download to download packages to a specific directory.
Added new entry point isparpy to start an interactive session.
v0.2.1
Force the PySpark Python executable to be the same as sparpy's.
Fix unrecognized options.
Fix default configuration file names.
v0.2.0
Added configuration file option.
Added --debug option.
How to build a Sparpy plugin
In your package's setup.py, an entry point group should be configured for Sparpy:
setup(
    name='yourpackage',
    ...
    entry_points={
        ...
        'sparpy.cli_plugins': [
            'my_command_1=yourpackage.module:command_1',
            'my_command_2=yourpackage.module:command_2',
        ]
    }
)
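Each entry point must resolve to a callable. The exact interface Sparpy expects is not documented here, so the following is only an illustrative sketch that assumes the command is a callable handed its remaining command-line arguments; all names (`command_1`, `--myparam`) are illustrative:

```python
# yourpackage/module.py -- illustrative plugin command.
import argparse

def command_1(argv=None):
    """Sketch of a Sparpy plugin command.

    Assumption: the plugin is invoked as a callable receiving the
    command-line arguments left after Sparpy's own options.
    """
    parser = argparse.ArgumentParser(prog='my_command_1')
    parser.add_argument('--myparam', type=int, default=0)
    args = parser.parse_args(argv)
    print(f'my_command_1 called with myparam={args.myparam}')
```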
Install
It must be installed on a Spark edge node.
$ pip install sparpy
How to use
Using default Spark submit parameters:
$ sparpy --plugin "mypackage>=0.1" my_plugin_command --myparam 1
Configuration files
sparpy and sparpy-submit accept the --config parameter, which allows setting a configuration file. If it is not set, they will try to use the configuration file $HOME/.sparpyrc. If that does not exist, they will try to use /etc/sparpy.conf.
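That lookup order can be sketched as follows; this is an illustrative helper (the name `resolve_config_path` is hypothetical), not sparpy's actual code:

```python
import os

def resolve_config_path(explicit=None):
    """Return the first existing config file, following the documented
    order: --config value, then $HOME/.sparpyrc, then /etc/sparpy.conf.
    Returns None if no candidate exists."""
    candidates = [
        explicit,
        os.path.join(os.path.expanduser('~'), '.sparpyrc'),
        '/etc/sparpy.conf',
    ]
    for path in candidates:
        if path and os.path.isfile(path):
            return path
    return None
```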
Format:
[spark]
master=yarn
deploy-mode=client
queue=my_queue
spark-executable=/path/to/my-spark-submit
conf=
    spark.conf.1=value1
    spark.conf.2=value2
packages=
    maven:package_1:0.1.1
    maven:package_2:0.6.1
repositories=
    https://my-maven-repository-1.com/mvn
    https://my-maven-repository-2.com/mvn
reqs_paths=
    /path/to/dir/with/python/packages_1
    /path/to/dir/with/python/packages_2

[plugins]
extra-index-urls=
    https://my-pypi-repository-1.com/simple
    https://my-pypi-repository-2.com/simple
cache-dir=/path/to/cache/dir
plugins=
    my-package1
    my-package2==0.1.2
requirements-files=
    /path/to/requirement-1.txt
    /path/to/requirement-2.txt
find-links=
    /path/to/directory/with/packages_1
    /path/to/directory/with/packages_2
download-dir-prefix=my_prefix_
no-index=false
no-self=false
force-download=true

[interactive]
pyspark-executable=/path/to/pyspark
python-interactive-driver=/path/to/interactive/driver
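This format parses cleanly with Python's standard configparser, where continuation lines of a multi-line value are indented under their key. A minimal sketch (reading the format only, not sparpy's actual implementation; section and key names are taken from the example above):

```python
import configparser

# Sample mirroring the format above.
SAMPLE = """\
[spark]
master=yarn
queue=my_queue
packages=
    maven:package_1:0.1.1
    maven:package_2:0.6.1

[plugins]
plugins=
    my-package1
    my-package2==0.1.2
no-index=false
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

def as_list(section, key):
    # Split a multi-line value into one item per line, dropping blanks.
    return [line.strip()
            for line in parser.get(section, key).splitlines()
            if line.strip()]

print(parser.get('spark', 'queue'))              # my_queue
print(as_list('plugins', 'plugins'))             # ['my-package1', 'my-package2==0.1.2']
print(parser.getboolean('plugins', 'no-index'))  # False
```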