
Organize unstructured data

Project description

Lilac

Prerequisites

Before you can run the server, install the following:

Install dependencies

./scripts/setup.sh

Run Lilac

Development

To run the web server in dev mode with fast edit-refresh:

./run_server_dev.sh

Format TypeScript files:

npm run format --workspace web/lib
npm run format --workspace web/blueprint

HuggingFace

HuggingFace Spaces are used for PRs and for demos.

Details can be found at Managing Spaces with GitHub Actions.

Staging demo
  1. Log in with the HuggingFace CLI to access git.

    poetry run huggingface-cli login

    Follow the instructions to use your git SSH keys to talk to HuggingFace.

  2. Create a HuggingFace Space from your browser: huggingface.co/spaces

  3. Turn on persistent storage in the Settings UI.

  4. Set .env.local environment variables so you can upload data to the space:

      # The repo to use for the huggingface demo.
      HF_STAGING_DEMO_REPO='lilacai/your-space'
      # To authenticate with HuggingFace for uploading to the space.
      HF_USERNAME='your-username'
    
  5. Deploy to your HuggingFace Space:

    poetry run deploy-hf \
      --dataset=$DATASET_NAMESPACE/$DATASET_NAME
    
    # --concept is optional. By default all lilac/* concepts are uploaded. This flag enables uploading other concepts from local.
    # --hf_username and --hf_space are optional and can override the ENV for local uploading.
    
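    For example, a full invocation that overrides the environment variables via the optional flags above (all values are placeholders):

    poetry run deploy-hf \
      --dataset=$DATASET_NAMESPACE/$DATASET_NAME \
      --hf_username=your-username \
      --hf_space=lilacai/your-space
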

Deployment

To build the docker image:

./scripts/build_docker.sh

To run the docker image locally:

docker run -p 5432:5432 lilac_blueprint
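
If the server needs values from .env.local, Docker's standard --env-file flag can pass them through (a sketch; whether the image reads these variables depends on your setup):

docker run -p 5432:5432 --env-file .env.local lilac_blueprint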

Authentication

Authentication is done via Google login. A Google client token should be created from the Google API Console; details can be found in the Google OAuth documentation.

By default, the Lilac Google client is used. The secret can be found in the Google Cloud console, and should be defined under GOOGLE_CLIENT_SECRET in .env.local.

For the session middleware, a random string should be created and defined as LILAC_OAUTH_SECRET_KEY in .env.local.
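
Put together, the authentication entries in .env.local might look like this (both values are placeholders):

# OAuth client secret from the Google Cloud console.
GOOGLE_CLIENT_SECRET='your-client-secret'
# Random string used to sign session cookies.
LILAC_OAUTH_SECRET_KEY='your-random-64-character-string'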

You can generate a random secret key with:

import random
import string

# Build a 64-character key from uppercase letters and digits.
key = ''.join(random.choices(string.ascii_uppercase + string.digits, k=64))
print(f"LILAC_OAUTH_SECRET_KEY='{key}'")
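
Equivalently, a shell one-liner using the standard-library secrets module (an alternative to the snippet above) appends a fresh key directly to .env.local:

python -c "import secrets; print(f\"LILAC_OAUTH_SECRET_KEY='{secrets.token_hex(32)}'\")" >> .env.local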

Publishing on pip

To authenticate, add the PYPI_TOKEN to your .env.local file. You can get the token from pypi.org. Then run the following script:

./scripts/publish_pip.sh
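
For reference, the token entry in .env.local looks like this (the value shown is a placeholder; real tokens start with pypi-):

# API token from your pypi.org account settings.
PYPI_TOKEN='pypi-XXXXXXXXXXXX'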

Configuration

To use various APIs, you need to provide API keys. Create a file named .env.local in the repository root and add the variables listed in .env with your own values.
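
For example, starting from the checked-in template (a sketch; the variable name below is hypothetical):

cp .env .env.local
# Then edit .env.local and fill in your own values, e.g.:
# OPENAI_API_KEY='your-key'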

Testing

Run all the checks before sending a PR:

./scripts/checks.sh

Test python:

./scripts/test_py.sh

Test JavaScript:

./scripts/test_ts.sh

Ingesting datasets from CLI

Datasets can be ingested entirely from the UI; however, if you prefer the CLI, you can ingest data with the following command:

poetry run python -m lilacai.data_loader \
  --dataset_name=$DATASET \
  --output_dir=./data/ \
  --config_path=./datasets/the_movies_dataset.json

NOTE: You must have a JSON file that represents your source configuration, in this case "the_movies_dataset.json".

Tips

Recommended dev tools

Installing poetry

You may need the following to install poetry:

Troubleshooting

pyenv install not working on M1

If pyenv does not work on M1 machines after installing Xcode, you may need to reinstall the Xcode command line tools (see the Stack Overflow link):

$ sudo rm -rf /Library/Developer/CommandLineTools
$ xcode-select --install

No module named _lzma

Follow the instructions from pyenv:

  • Uninstall python via pyenv uninstall
  • Run brew install openssl readline sqlite3 xz zlib tcl-tk
  • Reinstall python via pyenv install

Installing TensorFlow on M1

M1/M2 chips need a special TF installation. These steps are taken from the official Apple docs:

  1. Download the Conda environment installer (Miniforge3-MacOSX-arm64.sh).
  2. Run:

chmod +x ~/Downloads/Miniforge3-MacOSX-arm64.sh
sh ~/Downloads/Miniforge3-MacOSX-arm64.sh
source ~/miniforge3/bin/activate

  3. Install the TensorFlow 2.9.0 dependencies:

conda install -c apple tensorflow-deps=2.9.0
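
The Apple guide then installs the TensorFlow packages themselves; a sketch of that step (versions should match the tensorflow-deps version above):

python -m pip install tensorflow-macos==2.9.0
python -m pip install tensorflow-metal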

Too many open files on macOS

When downloading and pre-processing TFDS datasets, you might hit a "too many open files" error. To fix it, increase the max open files limit.
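
For example, to raise the limit for the current shell session (the maximum you can set depends on your macOS configuration):

# Check the current soft limit.
ulimit -n
# Raise it for this session only.
ulimit -n 10240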

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lilacai-0.0.5.tar.gz (136.5 kB)

Uploaded Source

Built Distribution

lilacai-0.0.5-py3-none-any.whl (176.6 kB)

Uploaded Python 3

File details

Details for the file lilacai-0.0.5.tar.gz.

File metadata

  • Download URL: lilacai-0.0.5.tar.gz
  • Upload date:
  • Size: 136.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.9.13 Darwin/22.3.0

File hashes

Hashes for lilacai-0.0.5.tar.gz

  • SHA256: 731e6288f3343b3f9433a8a86fe880ca6e4652fdec4836cd5497df47ed209584
  • MD5: ca341dd9475709bbdf613a211d530b8b
  • BLAKE2b-256: 1f3bf6c647cc17dd31771a826cf46270e1206526d75b66a68e4e025b4a3e0485

See more details on using hashes here.

File details

Details for the file lilacai-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: lilacai-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 176.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.9.13 Darwin/22.3.0

File hashes

Hashes for lilacai-0.0.5-py3-none-any.whl

  • SHA256: 3c0bb4cb905bd51c019370aff7821df66329498cdc040b8f62cda4ad872eea10
  • MD5: 5e579a1e6989d94e8c2c04f4ff225f1f
  • BLAKE2b-256: d840be56f1f7e1815d200ddb8d6e0270afc13477fcf6bb006ec889f9649a90ba

See more details on using hashes here.
