Index your dataset

Lilac

Prerequisites

Before you can run the server, install the dependencies below:

Install dependencies

./scripts/setup.sh

Run Lilac

Development

To run the web server in dev mode with fast edit-refresh:

./run_server_dev.sh

Format typescript files:

npm run format --workspace web/lib
npm run format --workspace web/blueprint

Huggingface

Huggingface spaces are used for PRs and for demos.

Details can be found at Managing Spaces with GitHub Actions.

Staging demo
  1. Log in to HuggingFace to access git:

    poetry run huggingface-cli login

    Follow the instructions to use your git SSH keys to talk to HuggingFace.

  2. Create a huggingface space from your browser: huggingface.co/spaces

  3. Turn on persistent storage in the Settings UI.

  4. Set .env.local environment variables so you can upload data to the space:

      # The repo to use for the huggingface demo.
      HF_STAGING_DEMO_REPO='lilacai/your-space'
      # To authenticate with HuggingFace for uploading to the space.
      HF_USERNAME='your-username'
    
  5. Deploy to your HuggingFace Space:

    poetry run deploy-hf \
      --dataset=$DATASET_NAMESPACE/$DATASET_NAME
    
    # --concept is optional. By default all lilac/* concepts are uploaded. This flag enables uploading other concepts from local.
    # --hf_username and --hf_space are optional and can override the ENV for local uploading.
    

Deployment

To build the docker image:

./scripts/build_docker.sh

To run the docker image locally:

docker run -p 5432:5432 lilac_blueprint

Configuration

To use various APIs, you must provide API keys. Create a file named .env.local in the root directory and add the variables listed in .env with your own values.
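As a sketch, a .env.local might reuse the staging-demo variables shown above plus a hypothetical API key (OPENAI_API_KEY here is an assumption; check .env for the actual variable names):

```shell
# .env.local -- local overrides, not checked into git.
# Copy the real variable names from .env; these are examples.

# Hypothetical API key variable (name is an assumption, see .env).
OPENAI_API_KEY='sk-...'

# HuggingFace settings used by the staging-demo deploy step above.
HF_STAGING_DEMO_REPO='lilacai/your-space'
HF_USERNAME='your-username'
```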

Testing

Run all the checks before sending a pull request:

./scripts/checks.sh

Test python:

./scripts/test_py.sh

Test JavaScript:

./scripts/test_ts.sh

Ingesting datasets from CLI

Datasets can be ingested entirely from the UI; however, if you prefer the CLI, you can ingest data with the following command:

poetry run python -m src.data_loader \
  --dataset_name=$DATASET \
  --output_dir=./data/ \
  --config_path=./datasets/the_movies_dataset.json

NOTE: You need a JSON file that represents your source configuration, in this case "the_movies_dataset.json".
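As a purely illustrative sketch (the field names below are assumptions, not the project's actual schema — consult the JSON files under ./datasets/ for the real format), such a configuration might look like:

```json
{
  "source_name": "csv",
  "filepaths": ["./the_movies_dataset.csv"]
}
```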

Tips

Recommended dev tools

Installing poetry

You may need the following to install poetry:

Troubleshooting

pyenv install not working on M1

If pyenv does not work on M1 machines after installing Xcode, you may need to reinstall the Xcode command line tools. Stack Overflow Link

$ sudo rm -rf /Library/Developer/CommandLineTools
$ xcode-select --install

No module named _lzma

Follow the instructions from pyenv:

  • Uninstall python via pyenv uninstall
  • Run brew install openssl readline sqlite3 xz zlib tcl-tk
  • Reinstall python via pyenv install

Installing TensorFlow on M1

M1/M2 chips need a special TF installation. These steps are taken from the official Apple docs:

  1. Download the Miniforge Conda environment installer (Miniforge3-MacOSX-arm64.sh)
  2. Run:
chmod +x ~/Downloads/Miniforge3-MacOSX-arm64.sh
sh ~/Downloads/Miniforge3-MacOSX-arm64.sh
source ~/miniforge3/bin/activate
  3. Install the TensorFlow 2.9.0 dependencies: conda install -c apple tensorflow-deps=2.9.0

Too many open files on MacOS

When downloading and pre-processing TFDS datasets, you might hit a "too many open files" error. To fix it, increase the maximum open files limit.
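For example, in the shell where you run the download, you can inspect the limits and raise the soft limit up to the hard limit (the change applies to the current shell session only):

```shell
# Show the current soft and hard limits for open file descriptors.
ulimit -Sn
ulimit -Hn

# Raise the soft limit for this shell session, capped at the hard limit.
# (Any value up to `ulimit -Hn` is allowed without root.)
ulimit -Sn "$(ulimit -Hn)"
```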
