Hermione Databricks
Source | Page | Installation Command |
---|---|---|
PyPI | Link | pip install -U hermione-databricks |
What is Databricks?
Databricks is an Apache Spark-based analytics platform optimized for the Microsoft Azure/AWS cloud services platforms. Designed with the founders of Apache Spark, Databricks is integrated with Azure/AWS to provide one-click setup, streamlined workflows, and an interactive workspace that enables collaboration between data scientists, data engineers, and business analysts. Databricks comprises the complete open-source Apache Spark cluster technologies and capabilities. Spark in Azure Databricks includes the following components:
- Spark SQL and DataFrames: Spark SQL is the Spark module for working with structured data. A DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python.
- Streaming: Real-time data processing and analysis for analytical and interactive applications. Integrates with HDFS, Flume, and Kafka.
- MLlib: Machine Learning library consisting of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction, as well as underlying optimization primitives.
- GraphX: Graphs and graph computation for a broad scope of use cases from cognitive analytics to data exploration.
- Spark Core API: Includes support for R, SQL, Python, Scala, and Java.
Reference:
- https://github.com/databricks
- https://azure.microsoft.com/en-us/services/databricks/
- https://databricks.com/aws
What is Hermione?
Hermione is a new open source library that helps data scientists set up better organized code more quickly and simply. Hermione also includes classes that assist with daily tasks such as column normalization and denormalization, data view, and text vectorization. With Hermione, all you need to do is execute a method and the rest is up to her, just like magic.
To bring in a little of A3Data's experience: we work in data science teams inside several client companies, and the excellence of notebooks as a data exploration tool is undeniable. Nevertheless, when it comes to data science products, where models need to be consumed, monitored, and periodically maintained, putting them into production inside a Jupyter Notebook is not the best choice (and we are not even mentioning memory and CPU performance yet). That's where Hermione comes in! We were inspired by the brilliant, empowered, and awesome witch of the Harry Potter saga to name this framework!
This is also our way of reinforcing our position that women should be taking more leading roles in the technology field. #CodeLikeAGirl
What is Hermione-Databricks?
Considering these two fantastic tools, we have brought the Hermione magic to the #databricks environment, gaining scalability through #pyspark and #Scala.
With #hermione-databricks you can create the entire structure for your ML project, using the Databricks workspace to organize the notebooks and pipelines, and DBFS (Databricks File System) to handle large volumes of data and the project artifacts.
When you start a new project with hermione-databricks, the local and remote project structures below are created automatically:
Local | Remote |
---|---|
Local project structure (image) | Remote project structure (image) |
Note that the two structures are not an exact mirror of each other, because local and remote environments naturally differ, especially with respect to DBFS.
After creating the project, you can sync the local and remote files with the commands below:
- hermione_databricks sync-local: Sync local project (folders/notebooks/model.pkl).
- hermione_databricks sync-remote: Sync remote project (folders/notebooks/model.pkl).
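If you want to trigger a sync from a Python script (for example at the end of a training job), one option is to shell out to the CLI. The sketch below is only an illustration: the helper names `build_sync_command` and `run_sync` are hypothetical, and it assumes the two subcommands above run without further prompts when called from the project directory, which this README does not confirm.

```python
import shutil
import subprocess

# Subcommands taken from this README; nothing else about the CLI is assumed.
SYNC_COMMANDS = ("sync-local", "sync-remote")


def build_sync_command(direction):
    """Return the argument list for one of the documented sync subcommands."""
    if direction not in SYNC_COMMANDS:
        raise ValueError(
            "direction must be one of {}, got {!r}".format(SYNC_COMMANDS, direction)
        )
    return ["hermione_databricks", direction]


def run_sync(direction):
    """Run the sync command, failing early if the CLI is not on PATH."""
    cmd = build_sync_command(direction)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("hermione_databricks was not found on PATH")
    # check=True raises CalledProcessError if the CLI exits with a non-zero code.
    return subprocess.run(cmd, check=True).returncode
```

Building the argument list separately from running it keeps the command easy to inspect or log before anything is executed.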
Requirements
- Python Version >= 3.6
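A quick way to confirm that an interpreter meets this requirement before installing is sketched below. The function name `python_is_supported` is hypothetical and not part of hermione-databricks.

```python
import sys

# Minimum version stated in the Requirements section.
MIN_PYTHON = (3, 6)


def python_is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if not python_is_supported():
        raise SystemExit("hermione-databricks requires Python 3.6 or newer")
```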
Installation
To install simply run
pip install --upgrade hermione-databricks
Then set up authentication using an authentication token (https://docs.databricks.com/api/latest/authentication.html#token-management). Credentials are stored at ~/.databrickscfg.
hermione_databricks setup
(enter hostname/auth-token at prompt)
To test that your authentication information is working, try a quick test like databricks workspace ls.
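As an offline complement to that check, the stored credentials file can be inspected directly. This is a minimal sketch, assuming ~/.databrickscfg is an INI file with a [DEFAULT] profile containing `host` and `token` keys (the layout databricks-cli uses for token authentication). The function name `read_databricks_profile` is hypothetical.

```python
import configparser
from pathlib import Path


def read_databricks_profile(path=None, profile="DEFAULT"):
    """Return (host, has_token) for one profile of a databricks-cli style config.

    Assumption: the file is INI-formatted with `host` and `token` keys.
    """
    cfg_path = Path(path) if path is not None else Path.home() / ".databrickscfg"
    # Interpolation is disabled so a '%' inside a token is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(str(cfg_path)):
        raise FileNotFoundError("No readable config at {}".format(cfg_path))
    if profile != parser.default_section and not parser.has_section(profile):
        raise KeyError("Profile {!r} not found in {}".format(profile, cfg_path))
    section = parser[profile]
    host = section.get("host", "").strip()
    if not host:
        raise ValueError("Profile {!r} has no host entry".format(profile))
    # Only report whether a token exists; the value is never returned or printed.
    has_token = bool(section.get("token", "").strip())
    return host, has_token
```

This only confirms that the file is present and filled in. It does not prove the token is still valid, so the CLI check above is still worth running.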
How do I use hermione-databricks?
After installing hermione-databricks:
- Configure the Databricks authentication:
hermione_databricks setup
Here you need to specify the Databricks host and the access token. The integration is made using the official databricks-cli library.
- Starting a new databricks project
hermione_databricks new
Here hermione-databricks will ask for:
- Project Name: your project name;
- Project Description: a short project description;
- Databricks Host Workspace path: the Databricks workspace path where your workspace objects will be saved;
- Databricks Host DBFS path: the Databricks DBFS path where your DBFS objects will be saved (include the dbfs:/ prefix).
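Because the DBFS path must include the dbfs:/ prefix, it can help to validate the value before typing it at the prompt. The helper below is a hypothetical sketch, not part of hermione-databricks, and it checks only the prefix rule stated above.

```python
DBFS_PREFIX = "dbfs:/"


def validate_dbfs_path(path):
    """Return the trimmed path if it starts with dbfs:/, otherwise raise ValueError."""
    cleaned = path.strip()
    if not cleaned.startswith(DBFS_PREFIX):
        raise ValueError(
            "DBFS path must start with {!r}, got {!r}".format(DBFS_PREFIX, cleaned)
        )
    return cleaned
```

For example, `validate_dbfs_path("dbfs:/tmp/my_project")` returns the path unchanged, while `validate_dbfs_path("/tmp/my_project")` raises `ValueError`.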
After this, you can see the project files locally:
Databricks Workspace (Databricks CLI):
Databricks Workspace (Databricks Web Interface):
Contributing
Make a pull request with your implementation.
For suggestions, contact us: igor.pereira.br@gmail.com
Licence
Hermione-Databricks is open source and is released under the Apache 2.0 License.