
Data Extension for Jupyter notebook


The Juneau Project


The past decade has brought a sea change in the availability of data. Instead of a world in which we have a small number of carefully curated data sources in a centralized database, we now have a plethora of datasets, data versions, and data representations that span users, groups, and organizations. Devices and data acquisition tools make it easy to acquire new data, cloud hosting makes it easy to centralize and share files, and cloud data analytics and machine learning tools have driven a desire to integrate and extract value from that data.

We have been missing management tools to centralize and capture such data resources. Data scientists often end up doing redundant work because they have no effective way of finding appropriate resources to reuse and retarget to new applications.

The Juneau Project develops holistic data management tools to find, standardize, and benefit from the existing resources in the data lake. This extension to Jupyter Notebook is a point of access for our dataset management tools.

For more on the project, please see the project home and our research papers.

Setup

Prerequisites: relational and graph databases

Simple Version

Install Docker, including docker-compose, for your preferred operating system.

  • Download this file for Docker-Compose
  • Run docker-compose up from the directory.
  • Copy juneau/config-default.py to juneau/config.py

These steps use the default user IDs and passwords defined in config.yaml.
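The copy step above can also be scripted. The following is a minimal sketch, not part of Juneau itself: the helper name copy_default_config is hypothetical, and it assumes you run it from the repository root so that the relative paths from the list above resolve. It refuses to overwrite an existing config.py, so local edits are not lost.

```python
import shutil
from pathlib import Path


def copy_default_config(src="juneau/config-default.py", dst="juneau/config.py"):
    """Copy the config template to its working name (hypothetical helper).

    Returns True if the file was created, False if dst already existed.
    """
    src_path, dst_path = Path(src), Path(dst)
    if not src_path.is_file():
        raise FileNotFoundError(f"template not found: {src_path}")
    if dst_path.exists():
        # Keep an existing config, which may already hold local credentials.
        return False
    shutil.copyfile(src_path, dst_path)
    return True
```

Calling copy_default_config() with no arguments uses the paths named in the setup steps.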

Custom Version

First, be sure you have installed:

  • PostgreSQL, version 10 or later
  • Neo4j, version 3.3 or later

Then set up a default user ID and password for each:

  • Run sudo -u postgres psql and then enter \password. Set a password for the account (by default this is assumed to be habitat1).
  • Open your browser to localhost:7474 and change the password for the neo4j account (by default this is assumed to be habitat1).
  • Copy juneau/config-default.py to juneau/config.py

Now either edit the YAML file in juneau/config/config.yaml to match your account names and passwords, or set the corresponding environment variables in your terminal.
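Before continuing, it can help to confirm that both database servers accept connections. The sketch below uses only the Python standard library and checks TCP reachability, not credentials. The function names are hypothetical, and the ports are assumptions: 5432 is PostgreSQL's default port, and 7474 is the Neo4j browser port mentioned above. Adjust them if your installation differs.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_services(services, host="localhost"):
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in services.items()}


# Assumed default ports; edit to match your own setup.
DEFAULT_SERVICES = {"postgresql": 5432, "neo4j-http": 7474}
```

For example, print(check_services(DEFAULT_SERVICES)) reports which of the two servers is reachable on localhost.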

Sample data lake corpus

Next, download juneau_start.zip and unzip it.

For the Docker container, you can import as follows:

  • Run ./neo4j-update.sh

Otherwise, you can use:

  • neo4j-admin load --database=data.db --from=juneauG.dump --force
  • psql -h localhost -U postgres < juneauD.pgsql

Finally, edit the neo4j.conf file to set the active database to data.db.
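The neo4j.conf change can be sketched as a small text transformation. This assumes the Neo4j 3.x setting name dbms.active_database, which the instructions above do not state; later Neo4j releases use a different setting. The function name is hypothetical, and it works on the file's text so you can review the result before writing it back.

```python
def set_active_database(conf_text, db_name="data.db"):
    """Return conf_text with dbms.active_database set to db_name.

    Assumes Neo4j 3.x naming. An existing (possibly commented-out) entry
    is replaced in place; otherwise the setting is appended.
    """
    key = "dbms.active_database"
    new_line = f"{key}={db_name}"
    out, replaced = [], False
    for line in conf_text.splitlines():
        # Strip a leading '#' so a commented default is also matched.
        name = line.lstrip("#").split("=", 1)[0].strip()
        if not replaced and "=" in line and name == key:
            out.append(new_line)
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"
```

A typical use would be to read neo4j.conf, pass its contents through set_active_database, and write the result back after making a backup copy.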

Install Jupyter Notebook extensions

See the Developer's Guide for details.

  • sudo -H python setup.py install
  • sudo -H jupyter serverextension enable --py juneau
  • jupyter nbextension install dataset_inspector/ --user
  • jupyter nbextension enable dataset_inspector/main --user
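After running the commands above, a quick sanity check is to confirm that the juneau package is importable and that the jupyter command is on your PATH. The sketch below is only an illustration with hypothetical function names. It does not verify that the server extension or notebook extension is actually enabled.

```python
import importlib.util
import shutil


def module_available(name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


def install_report(module="juneau", tool="jupyter"):
    """Summarize whether the package and the command-line tool are visible."""
    return {
        "module_importable": module_available(module),
        "tool_on_path": shutil.which(tool) is not None,
    }
```

Running print(install_report()) in the same Python environment used for the install shows both flags. A False value suggests the install went into a different environment.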
