
Local semantic search. Stupidly simple.


AI Filesystem


Local semantic search over folders. Why didn't this exist?

pip install aifs
pip install "unstructured[all-docs]" # If you want to parse all doc types. Includes large packages!

from aifs import search

search("How does AI Filesystem work?", path="/path/to/folder")
search("It's not unlike how Spotlight works.") # Path defaults to CWD

How it works



Running aifs.search will chunk and embed all nested supported files (.txt, .py, .sh, .docx, .pptx, .jpg, .png, .eml, .html, and .pdf) in path. It will then store these embeddings in an _.aifs file in path.
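As a rough illustration of the "all nested supported files" step, here is a minimal sketch that walks a folder and keeps only the extensions listed above. The helper names `find_supported_files` and `index_location` are hypothetical and are not part of the aifs API; the real library may discover files differently.

```python
from pathlib import Path

# Extensions taken from the list above.
SUPPORTED_EXTENSIONS = {
    ".txt", ".py", ".sh", ".docx", ".pptx",
    ".jpg", ".png", ".eml", ".html", ".pdf",
}

# Index file name taken from the description above.
INDEX_FILENAME = "_.aifs"


def find_supported_files(path):
    """Return every nested file under `path` with a supported extension (hypothetical helper)."""
    root = Path(path)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


def index_location(path):
    """Return where the index for `path` would live (hypothetical helper)."""
    return Path(path) / INDEX_FILENAME
```

Note that the `_.aifs` file itself is skipped, because its suffix is not in the supported set.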

By storing the index, you only have to chunk/embed once. This makes semantic search very fast after the first time you search a path.

If a file has changed or been added, aifs.search will update or add those chunks. File deletions are not handled yet (we welcome PRs).
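How aifs detects changed files is not described here. One common approach, sketched below purely as an assumption, is to keep a manifest of content hashes and compare it with what is on disk. The function names and the manifest layout are hypothetical. The `removed` list shows one way deletions could be detected; it is not something the library is stated to do.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file's bytes, used as a change fingerprint."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def diff_against_manifest(files, manifest):
    """Compare files on disk with a stored {path: digest} manifest.

    Returns (added, changed, removed, current_manifest).
    """
    current = {str(f): file_digest(f) for f in files}
    added = sorted(p for p in current if p not in manifest)
    changed = sorted(
        p for p in current if p in manifest and manifest[p] != current[p]
    )
    # Paths that were indexed before but no longer exist on disk.
    removed = sorted(p for p in manifest if p not in current)
    return added, changed, removed, current
```

Only the `added` and `changed` paths would need to be re-chunked and re-embedded; everything else can be served from the stored index.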

In detail:

  1. If a folder hasn't been indexed, we first use unstructured to parse and chunk every file in the path.
  2. Then we use chroma to embed the chunks locally and save them to an _.aifs file in path.
  3. Finally, chroma is used again to semantically search the embeddings.

If an _.aifs file is found in a directory, aifs uses that index instead of indexing the directory again. Files that have been updated since the index was written are re-indexed.
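The chunk, embed, store, search flow above can be sketched with the standard library alone. This is a toy stand-in, not the aifs implementation: it replaces unstructured with fixed-size word chunks, replaces chroma's embedding model with plain word counts, only reads .txt files, and stores the index as JSON (the real `_.aifs` format is not described here). All function names are hypothetical.

```python
import json
import math
import re
from collections import Counter
from pathlib import Path

INDEX_FILENAME = "_.aifs"  # name from the text; the JSON layout is an assumption


def chunk_text(text, max_words=50):
    """Split text into chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(path):
    """Chunk and embed every nested .txt file, then save the index in `path`."""
    entries = []
    for file in sorted(Path(path).rglob("*.txt")):
        for chunk in chunk_text(file.read_text(encoding="utf-8")):
            entries.append(
                {"file": str(file), "chunk": chunk, "vector": dict(embed(chunk))}
            )
    (Path(path) / INDEX_FILENAME).write_text(json.dumps(entries), encoding="utf-8")
    return entries


def toy_search(query, path=".", top_k=3):
    """Reuse the stored index if present, otherwise build it, then rank chunks."""
    index_file = Path(path) / INDEX_FILENAME
    if index_file.exists():
        entries = json.loads(index_file.read_text(encoding="utf-8"))
    else:
        entries = build_index(path)
    query_vector = embed(query)
    ranked = sorted(
        entries, key=lambda e: cosine(query_vector, e["vector"]), reverse=True
    )
    return [e["chunk"] for e in ranked[:top_k]]
```

The first call to `toy_search` pays the indexing cost; later calls on the same folder only load the stored file and rank, which is the reason the stored index makes repeated searches fast. Word counts only match exact words, so unlike a real embedding model this toy version is not actually semantic.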

Goals

  • We should always have SOTA parsing and chunking. The logic for this should be swapped out as new methods arise.
    • Chunking should be semantic: Python and Markdown files should have different chunking algorithms based on the expected content of those file types. Who has this solution?
    • For parsing, I think Unstructured is the best of the best. Is this true?
  • We should always have SOTA embedding. If a better local embedding model is found, we should automatically download and use it.
    • I think Chroma will always do this (is this true?) so we depend on Chroma.
  • This project should stay minimally scoped: we want aifs to be the best local semantic search in the universe.
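To make the file-type-aware chunking goal concrete, here is one possible sketch, not something aifs is stated to do today. It assumes Python files are split into top-level statements with the standard `ast` module and Markdown files are split at headings. The function names are hypothetical.

```python
import ast
import re


def chunk_python(source):
    """One chunk per top-level statement (import, function, class, ...)."""
    tree = ast.parse(source)
    # get_source_segment returns the exact source text of each node.
    # Decorators and comments between statements are not included.
    return [ast.get_source_segment(source, node) for node in tree.body]


def chunk_markdown(text):
    """One chunk per heading-led section."""
    # Split just before any line that starts with 1 to 6 '#' and a space.
    parts = re.split(r"(?m)^(?=#{1,6} )", text)
    return [part.strip() for part in parts if part.strip()]


def chunk_file(name, content):
    """Pick a chunking strategy from the file extension."""
    if name.endswith(".py"):
        return chunk_python(content)
    if name.endswith(".md"):
        return chunk_markdown(content)
    # Fallback: treat the whole file as a single chunk.
    return [content]
```

A production version would also need to cap chunk size and handle edge cases such as `#` lines inside Markdown code fences, which this sketch would wrongly treat as headings.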

Why?

We built this to let open-interpreter quickly semantically search files/folders.


