
Lilac 🌸

Analyze, structure and clean unstructured data with AI.


Prerequisites

Before you can run the server, install the following:

Install dependencies

./scripts/setup.sh

Run Lilac

Development

To run the web server in dev mode with fast edit-refresh:

./run_server_dev.sh

Format TypeScript files:

npm run format --workspace web/lib
npm run format --workspace web/blueprint

HuggingFace

HuggingFace Spaces are used for PRs and for demos.

Details can be found at Managing Spaces with GitHub Actions.

Staging demo
  1. Log in with the HuggingFace CLI to access git.

    poetry run huggingface-cli login

    Follow the instructions to use your git SSH keys to talk to HuggingFace.

  2. Create a HuggingFace space from your browser: huggingface.co/spaces

  3. Turn on persistent storage in the Settings UI.

  4. Set .env.local environment variables so you can upload data to the space:

      # The repo to use for the huggingface demo.
      HF_STAGING_DEMO_REPO='lilacai/your-space'
      # To authenticate with HuggingFace for uploading to the space.
      HF_USERNAME='your-username'
    
  5. Deploy to your HuggingFace Space:

    poetry run python -m scripts.deploy_hf \
      --dataset=$DATASET_NAMESPACE/$DATASET_NAME
    
    # --concept is optional. By default, all lilac/* concepts are uploaded;
    # this flag enables uploading other concepts from your local machine.
    # --hf_username and --hf_space are optional and can override the
    # environment variables when uploading locally.
    
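For example, to override the environment variables for a one-off upload (the username and space below are placeholders):

poetry run python -m scripts.deploy_hf \
  --dataset=$DATASET_NAMESPACE/$DATASET_NAME \
  --hf_username=your-username \
  --hf_space=lilacai/your-space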

Deployment

To build the docker image:

./scripts/build_docker.sh

To run the docker image locally:

docker run -p 5432:5432 lilac_blueprint

Authentication

Authentication is done via Google login. A Google client token should be created in the Google API Console.

By default, the Lilac Google client is used. The secret can be found in the Google Cloud console and should be defined as GOOGLE_CLIENT_SECRET in .env.local.

For the session middleware, a random string should be created and defined as LILAC_OAUTH_SECRET_KEY in .env.local.

You can generate a random secret key with:

import string
import random
key = ''.join(random.choices(string.ascii_uppercase + string.digits, k=64))
print(f"LILAC_OAUTH_SECRET_KEY='{key}'")
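
Together with the Google client secret, the resulting .env.local entries look like this (both values below are placeholders):

GOOGLE_CLIENT_SECRET='your-google-client-secret'
LILAC_OAUTH_SECRET_KEY='PASTE-THE-GENERATED-64-CHAR-KEY'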

Publishing on pip

To authenticate, add the PYPI_TOKEN to your .env.local file. You can get the token from pypi.org. To publish, run:

./scripts/publish_pip.sh
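
The token entry in .env.local is a single line; the value below is a placeholder (PyPI API tokens start with the pypi- prefix):

PYPI_TOKEN='pypi-XXXXXXXXXXXXXXXX'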

Configuration

To use various APIs, API keys need to be provided. Create a file named .env.local in the root, and add the variables listed in .env with your own values.
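
For example, .env.local might look like the following (OPENAI_API_KEY is only an illustration; copy the actual variable names from .env):

# Illustrative only -- use the variable names listed in .env.
OPENAI_API_KEY='your-api-key'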

Testing

Run all the checks before sending a PR:

./scripts/checks.sh

Test python:

./scripts/test_py.sh

Test JavaScript:

./scripts/test_ts.sh

Ingesting datasets from CLI

Datasets can be ingested entirely from the UI; however, if you prefer to use the CLI, you can ingest data with the following command:

poetry run lilac load \
  --output_dir=demo_data \
  --config_path=demo.yml

NOTE: You must have a config JSON or YAML file that represents your dataset configuration. The config should be an instance of the pydantic class lilac.Config (for multiple datasets) or lilac.DatasetConfig (for a single dataset).
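
A minimal sketch of what such a YAML config might look like for a single dataset pulled from HuggingFace; the field names here are illustrative and should be verified against the lilac.Config / lilac.DatasetConfig schemas:

# demo.yml -- illustrative only; verify fields against lilac.DatasetConfig.
datasets:
  - namespace: local
    name: imdb
    source:
      source_name: huggingface
      dataset_name: imdb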

Tips

Recommended dev tools

Installing poetry

You may need the following to install poetry:

Troubleshooting

pyenv install not working on M1

If pyenv does not work on M1 machines after installing Xcode, you may need to reinstall the Xcode command line tools (see the Stack Overflow link):

$ sudo rm -rf /Library/Developer/CommandLineTools
$ xcode-select --install

No module named _lzma

Follow the instructions from pyenv:

  • Uninstall python via pyenv uninstall
  • Run brew install openssl readline sqlite3 xz zlib tcl-tk
  • Reinstall python via pyenv install

Installing TensorFlow on M1

M1/M2 chips need a special TF installation. These steps are taken from the official Apple docs:

  1. Download the Miniforge Conda installer for Apple silicon (Miniforge3-MacOSX-arm64.sh).
  2. Run:

chmod +x ~/Downloads/Miniforge3-MacOSX-arm64.sh
sh ~/Downloads/Miniforge3-MacOSX-arm64.sh
source ~/miniforge3/bin/activate

  3. Install the TensorFlow 2.9.0 dependencies: conda install -c apple tensorflow-deps=2.9.0
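
The Apple docs continue by installing TensorFlow itself into that environment. A sketch, assuming versions pinned to match tensorflow-deps=2.9.0 (verify against the current Apple instructions):

# Install the macOS TensorFlow build and the Metal GPU plugin.
python -m pip install tensorflow-macos==2.9.0
python -m pip install tensorflow-metal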

Too many open files on MacOS

When downloading and pre-processing TFDS datasets, you might hit a "too many open files" error. To fix it, increase the max open files limit.
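
For example, you can raise the soft limit for the current shell session before running the download (the value is arbitrary; pick one that fits your workload):

# Check the current soft limit, then raise it for this shell session.
ulimit -n
ulimit -n 10240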
