# Lilac

Organize unstructured data.
## Prerequisites

Before you can run the server, install the following:

### Install dependencies

```sh
./scripts/setup.sh
```
## Run Lilac

### Development

To run the web server in dev mode with fast edit-refresh:

```sh
./run_server_dev.sh
```

Format TypeScript files:

```sh
npm run format --workspace web/lib
npm run format --workspace web/blueprint
```
## HuggingFace

HuggingFace Spaces are used for PRs and for demos. Details can be found in the HuggingFace guide "Managing Spaces with Github Actions".
### Staging demo

- Log in with HuggingFace to access git:

  ```sh
  poetry run huggingface-cli login
  ```

  Follow the instructions to use your git SSH keys to talk to HuggingFace.

- Create a HuggingFace space from your browser: [huggingface.co/spaces](https://huggingface.co/spaces)

- Turn on persistent storage in the Settings UI.

- Set `.env.local` environment variables so you can upload data to the space:

  ```sh
  # The repo to use for the HuggingFace demo.
  HF_STAGING_DEMO_REPO='lilacai/your-space'
  # To authenticate with HuggingFace for uploading to the space.
  HF_USERNAME='your-username'
  ```

- Deploy to your HuggingFace Space:

  ```sh
  poetry run deploy-hf \
    --dataset=$DATASET_NAMESPACE/$DATASET_NAME

  # --concept is optional. By default all lilac/* concepts are uploaded.
  #   This flag enables uploading other concepts from local.
  # --hf_username and --hf_space are optional and can override the ENV
  #   for local uploading.
  ```
## Deployment

To build the docker image:

```sh
./scripts/build_docker.sh
```

To run the docker image locally:

```sh
docker run -p 5432:5432 lilac_blueprint
```
## Authentication

Authentication is done via Google login. A Google Client token should be created from the Google API Console. Details can be found here.

By default, the Lilac Google client is used. The secret can be found in the Google Cloud console, and should be defined under `GOOGLE_CLIENT_SECRET` in `.env.local`.

For the session middleware, a random string should be created and defined as `LILAC_OAUTH_SECRET_KEY` in `.env.local`.
You can generate a random secret key with:

```py
import secrets
import string

# Use the secrets module (rather than random) so the key is
# generated with a cryptographically secure source of randomness.
alphabet = string.ascii_uppercase + string.digits
key = ''.join(secrets.choice(alphabet) for _ in range(64))
print(f"LILAC_OAUTH_SECRET_KEY='{key}'")
```
## Publishing on pip

To authenticate, add the `PYPI_TOKEN` to your `.env.local` file. You can get the token from [pypi.org](https://pypi.org). Then run the following script:

```sh
./scripts/publish_pip.sh
```
## Configuration

To use various APIs, API keys need to be provided. Create a file named `.env.local` in the root, and add the variables that are listed in `.env` with your own values.
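For example, a `.env.local` might look like the fragment below. The variable names here are purely illustrative; the authoritative list of supported variables is in `.env` at the repo root:

```shell
# .env.local -- illustrative only; copy the real variable names from .env.
SOME_API_KEY='sk-your-key-here'
ANOTHER_API_KEY='your-other-key'
```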
## Testing

Run all the checks before sending a PR:

```sh
./scripts/checks.sh
```

Test Python:

```sh
./scripts/test_py.sh
```

Test JavaScript:

```sh
./scripts/test_ts.sh
```
## Ingesting datasets from CLI

Datasets can be ingested entirely from the UI; however, if you prefer to use the CLI, you can ingest data with the following command:

```sh
poetry run python -m lilacai.data_loader \
  --dataset_name=$DATASET \
  --output_dir=./data/ \
  --config_path=./datasets/the_movies_dataset.json
```

NOTE: You must have a JSON file that represents your source configuration, in this case `the_movies_dataset.json`.
## Tips

### Recommended dev tools

### Installing poetry

You may need the following to install poetry:

- Install XCode and sign the license
- XCode command line tools (MacOS)
- homebrew (MacOS)
- pyenv (Python version management)
- Set your current python version
- Python Poetry
## Troubleshooting

### pyenv install not working on M1

If pyenv does not work on M1 machines after installing XCode, you may need to reinstall the XCode command line tools (Stack Overflow Link):

```sh
sudo rm -rf /Library/Developer/CommandLineTools
xcode-select --install
```

### No module named _lzma

Follow the instructions from pyenv:

- Uninstall python via `pyenv uninstall`
- Run `brew install openssl readline sqlite3 xz zlib tcl-tk`
- Reinstall python via `pyenv install`
### Installing TensorFlow on M1

M1/M2 chips need a special TF installation. These steps are taken from the official Apple docs:

- Download the Conda env installer (Miniforge3-MacOSX-arm64.sh)
- Run:

  ```sh
  chmod +x ~/Downloads/Miniforge3-MacOSX-arm64.sh
  sh ~/Downloads/Miniforge3-MacOSX-arm64.sh
  source ~/miniforge3/bin/activate
  ```

- Install the TensorFlow 2.9.0 dependencies:

  ```sh
  conda install -c apple tensorflow-deps=2.9.0
  ```
### Too many open files on MacOS

When downloading and pre-processing TFDS datasets, you might get a "too many open files" error. To fix it, increase the max open files limit.
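As a sketch, you can raise the limit for the current shell with `ulimit -n`, or from Python with the standard `resource` module (macOS/Linux only; the value 8192 below is an arbitrary example, not a Lilac requirement):

```python
import resource

# Read the current soft and hard limits for open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"current limits: soft={soft}, hard={hard}")

# Raise the soft limit. An unprivileged process cannot exceed the
# hard limit, so cap the new value at the hard limit when it is finite.
new_soft = 8192 if hard == resource.RLIM_INFINITY else min(8192, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
```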