Organize unstructured data

Lilac

Prerequisites

Before you can run the server, install the following (all used by the scripts in this guide): Python (e.g. via pyenv), Poetry, and Node.js with npm.

Install dependencies

./scripts/setup.sh

Run Lilac

Development

To run the web server in dev mode with fast edit-refresh:

./run_server_dev.sh

Format typescript files:

npm run format --workspace web/lib
npm run format --workspace web/blueprint

HuggingFace

HuggingFace Spaces are used for PRs and for demos.

Details can be found at Managing Spaces with GitHub Actions.

Staging demo
  1. Log in with the HuggingFace CLI to access git.

    poetry run huggingface-cli login

    Follow the instructions to use your git SSH keys to talk to HuggingFace.

  2. Create a HuggingFace space from your browser: huggingface.co/spaces

  3. Turn on persistent storage in the Settings UI.

  4. Set .env.local environment variables so you can upload data to the space:

      # The repo to use for the huggingface demo.
      HF_STAGING_DEMO_REPO='lilacai/your-space'
      # To authenticate with HuggingFace for uploading to the space.
      HF_USERNAME='your-username'
    
  5. Deploy to your HuggingFace Space:

    poetry run deploy-hf \
      --dataset=$DATASET_NAMESPACE/$DATASET_NAME
    
    # --concept is optional. By default all lilac/* concepts are uploaded. This flag enables uploading other concepts from local.
    # --hf_username and --hf_space are optional and can override the ENV for local uploading.
    

Deployment

To build the docker image:

./scripts/build_docker.sh

To run the docker image locally:

docker run -p 5432:5432 lilac_blueprint

Configuration

To use various APIs, API keys need to be provided. Create a file named .env.local in the root and add the variables listed in .env with your own values.
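For example, a minimal .env.local might be created like this (the variable names below are illustrative assumptions, except HF_USERNAME which appears in the staging-demo steps; the authoritative list of names lives in .env):

```shell
# Sketch only: copy the real variable names from .env.
cat > .env.local <<'EOF'
# Hypothetical API key name -- check .env for the actual variable.
OPENAI_API_KEY='your-openai-key'
HF_USERNAME='your-username'
EOF
```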

Testing

Run all the checks before sending a change for review:

./scripts/checks.sh

Test python:

./scripts/test_py.sh

Test JavaScript:

./scripts/test_ts.sh

Ingesting datasets from CLI

Datasets can be ingested entirely from the UI; however, if you prefer the CLI, you can ingest data with the following command:

poetry run python -m src.data_loader \
  --dataset_name=$DATASET \
  --output_dir=./data/ \
  --config_path=./datasets/the_movies_dataset.json

NOTE: You must have a JSON file that represents your source configuration, in this case "the_movies_dataset.json".
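The shape of that JSON is defined by the data loader, so inspect datasets/the_movies_dataset.json for the real schema; a purely hypothetical sketch of creating one (the field names here are illustrative assumptions, not the actual format):

```shell
mkdir -p ./datasets
# Hypothetical field names -- check datasets/the_movies_dataset.json for the real schema.
cat > ./datasets/my_dataset.json <<'EOF'
{
  "source_name": "csv",
  "filepaths": ["./data/movies.csv"]
}
EOF
```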

Tips

Recommended dev tools

Installing poetry

You may need the following to install poetry:

Troubleshooting

pyenv install not working on M1

If pyenv does not work on M1 machines after installing Xcode, you may need to reinstall the Xcode command line tools (see the Stack Overflow link):

$ sudo rm -rf /Library/Developer/CommandLineTools
$ xcode-select --install

No module named _lzma

Follow instructions from pyenv:

  • Uninstall python via pyenv uninstall
  • Run brew install openssl readline sqlite3 xz zlib tcl-tk
  • Reinstall python via pyenv install

Installing TensorFlow on M1

M1/M2 chips need a special TF installation. These steps are taken from the official Apple docs:

  1. Download the Conda environment installer (Miniforge3-MacOSX-arm64.sh) linked from the Apple docs
  2. Run:
chmod +x ~/Downloads/Miniforge3-MacOSX-arm64.sh
sh ~/Downloads/Miniforge3-MacOSX-arm64.sh
source ~/miniforge3/bin/activate
  3. Install the TensorFlow 2.9.0 dependencies: conda install -c apple tensorflow-deps=2.9.0

Too many open files on MacOS

When downloading and pre-processing TFDS datasets, you might hit a "too many open files" error. To fix it, increase the maximum open-file limit.
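For example, raise the soft limit in the shell session that runs the download (4096 is an arbitrary example value, not a Lilac requirement):

```shell
# Raise the soft open-file limit for the current shell session only.
# Pick a value at or below your hard limit (shown by `ulimit -Hn`).
ulimit -n 4096
ulimit -n  # prints the new soft limit
```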

Download files

Download the file for your platform.

Source Distribution

lilacai-0.0.2.tar.gz (134.7 kB)

Uploaded Source

Built Distribution

lilacai-0.0.2-py3-none-any.whl (173.8 kB)

Uploaded Python 3

File details

Details for the file lilacai-0.0.2.tar.gz.

File metadata

  • Download URL: lilacai-0.0.2.tar.gz
  • Upload date:
  • Size: 134.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.9.13 Darwin/22.3.0

File hashes

Hashes for lilacai-0.0.2.tar.gz:

  • SHA256: 39054bf39bbae2c8ab0d2a33753d95c782e2a3bedf3f0a290c0a87945fcf198b
  • MD5: b8708188d4b09db5e5fd5ac621c8dd70
  • BLAKE2b-256: 618e405dec49fca6d77133ea1ee0942359da702d69ca36cd67580670e4823951


File details

Details for the file lilacai-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: lilacai-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 173.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.9.13 Darwin/22.3.0

File hashes

Hashes for lilacai-0.0.2-py3-none-any.whl:

  • SHA256: ea6bf869063a60e79e4b324889dd8a7332a888a86193632fe8700245276a4db5
  • MD5: 9b18687faea15279bd2c41f7fdc8a8e3
  • BLAKE2b-256: ae0aeedf794f4616ca9aece0cc0aadd403bd667796c1634efda46c0b039b903f

