Data Extension for Jupyter notebook
The Juneau Project
The past decade has brought a sea change in the availability of data. Instead of a small number of carefully curated data sources in a centralized database, we now have a plethora of datasets, data versions, and data representations that span users, groups, and organizations. Devices and data acquisition tools make it easy to acquire new data, cloud hosting makes it easy to centralize and share files, and cloud data analytics and machine learning tools have driven a desire to integrate and extract value from that data.
We have been missing management tools to centralize and capture such data resources. Data scientists often end up doing redundant work because they have no effective way of finding appropriate resources to reuse and retarget to new applications.
The Juneau Project develops holistic data management tools to find, standardize, and benefit from the existing resources in the data lake. This extension to Jupyter Notebook is a point of access for our dataset management tools.
For more on the project, please see the project home, as well as our research papers:
- Finding Related Tables in Data Lakes for Interactive Data Science. Yi Zhang and Zachary G. Ives. SIGMOD 2020.
- Dataset Relationship Management. Yi Zhang, Soonbo Han, Nan Zheng. CIDR 2019.
Setup
Prerequisites: relational and graph databases
Simple Version
Install Docker, including docker-compose, for your preferred operating system.
- Download this file for Docker-Compose
- Run `docker-compose up` from the directory where you saved it.
- Copy `juneau/config-default.py` to `juneau/config.py`.
These will use the default user IDs and passwords that exist in `config.yaml`.
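The copy step can also be scripted so that it never overwrites a `config.py` you have already edited. This is a minimal sketch using only the Python standard library; the helper name `ensure_config` is hypothetical and is not part of Juneau.

```python
import shutil
from pathlib import Path


def ensure_config(package_dir: str) -> Path:
    """Copy config-default.py to config.py unless config.py already exists.

    `package_dir` is the directory holding config-default.py
    (the `juneau/` directory in the steps above).
    """
    src = Path(package_dir) / "config-default.py"
    dst = Path(package_dir) / "config.py"
    if not src.is_file():
        raise FileNotFoundError(f"missing template: {src}")
    if not dst.exists():
        # Only create the file; local edits to an existing config.py are kept.
        shutil.copyfile(src, dst)
    return dst
```

Calling `ensure_config("juneau")` from the repository root would then match the manual copy step, under the assumption that the template lives at `juneau/config-default.py` as stated above.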
Custom Version
First, be sure you have installed:
- PostgreSQL, version 10 or later
- Neo4j, version 3.3 or later
Then set up a default user ID and password for each:
- Run `sudo -u postgres psql` and then enter `\password`. Set a password for the account (by default this is assumed to be `habitat1`).
- Open your browser to `localhost:7474` and change the password on the `neo4j` account, by default to `habitat1`.
- Copy `juneau/config-default.py` to `juneau/config.py`.
Now either edit the YAML file in `juneau/config/config.yaml` to match your password and account info or change the environment variables in your terminal.
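Before loading any data, it can help to confirm that both databases are accepting connections. The sketch below only tests whether a TCP port is open; it does not check credentials or database contents. It assumes PostgreSQL listens on its default port 5432 (not stated above) and that Neo4j's browser endpoint is on 7474, as used in the steps above. The names `port_open`, `SERVICES`, and `check_services` are illustrative, not part of Juneau.

```python
import socket

# Assumed ports: 5432 is the PostgreSQL default; 7474 is the Neo4j
# browser port mentioned in the setup steps.
SERVICES = {"PostgreSQL": 5432, "Neo4j (HTTP)": 7474}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def check_services(host: str = "localhost") -> dict:
    """Map each service name to whether its port accepted a connection."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}
```

If either entry in `check_services()` comes back `False`, the corresponding server is probably not running or is bound to a different port than assumed here.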
Sample data lake corpus
Next, download juneau_start.zip and unzip it.
For the Docker container, you can import as follows:
- Run `./neo4j-update.sh`
Otherwise, you can use:
neo4j-admin load --database=data.db --from=juneauG.dump --force
psql -h localhost -U postgres < juneauD.pgsql
Finally, edit the `neo4j.conf` file to set the database to `data.db`.
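The `neo4j.conf` change can be sketched as a small text transformation. This assumes Neo4j 3.x, where the active database is selected with the `dbms.active_database` setting; later Neo4j releases use a different setting, so check the documentation for your version. The function name `set_active_database` is hypothetical.

```python
def set_active_database(conf_text: str, db_name: str = "data.db") -> str:
    """Return conf_text with dbms.active_database set to db_name.

    An existing (possibly commented-out) entry is replaced in place;
    if none is found, the setting is appended at the end.
    """
    key = "dbms.active_database"  # Neo4j 3.x setting name (assumption)
    new_line = f"{key}={db_name}"
    out = []
    replaced = False
    for line in conf_text.splitlines():
        # Treat "#dbms.active_database=graph.db" like an active entry.
        candidate = line.strip().lstrip("#").strip()
        if candidate.split("=", 1)[0].strip() == key:
            if not replaced:
                out.append(new_line)
                replaced = True
            continue  # drop duplicate entries
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"
```

To apply it, read your `neo4j.conf`, pass the text through the function, and write the result back after making a backup; Neo4j needs a restart to pick up the change.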
Install Jupyter Notebook extensions
See the Developer's Guide for details.
sudo -H python setup.py install
sudo -H jupyter serverextension enable --py juneau
jupyter nbextension install dataset_inspector/ --user
jupyter nbextension enable dataset_inspector/main --user
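After the install commands finish, a quick check that the package is importable from the Python environment Jupyter uses can save some debugging. This sketch relies only on the standard library; the helper name `is_importable` is illustrative, and the module name `juneau` is taken from the `--py juneau` argument above.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None
```

Running `is_importable("juneau")` in the same interpreter that launches Jupyter should return `True` once the install step has succeeded; it says nothing about whether the notebook extensions themselves are enabled.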
Download files
File details
Details for the file `juneau-0.0.4.tar.gz`.
File metadata
- Download URL: juneau-0.0.4.tar.gz
- Upload date:
- Size: 56.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/49.2.0 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7b4e85e8741497c56a98568dddc40ef9b8a973de18bdab332ccb251c3856969c
MD5 | 301d708fbb7ff8128ccb5dd59e7aea2b
BLAKE2b-256 | ef47c89bffabb18589c199ec5c881fc547b7a93fb470096e2057e967cbb3c41e
File details
Details for the file `juneau-0.0.4-py3-none-any.whl`.
File metadata
- Download URL: juneau-0.0.4-py3-none-any.whl
- Upload date:
- Size: 86.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/49.2.0 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | c0373e56ec0d700a5defdc756bc1bc4bcd60fa59ec5c6478f38936b76f59d9d1
MD5 | fac0086d9cda2a22a02a97191e1850e9
BLAKE2b-256 | 8f9564a4aece0bca29eb82ab835ec3d9d68a08b64cf54d58256699f7dcd3524f
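The published digests can be used to verify a downloaded file. The following is a minimal sketch with Python's standard `hashlib`; the function names `sha256_of` and `matches_published` are illustrative, and the local file path is whatever you saved the download as.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published("juneau-0.0.4.tar.gz", "7b4e85e8741497c56a98568dddc40ef9b8a973de18bdab332ccb251c3856969c")` compares a local copy of the source distribution against the SHA256 value listed for it above.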