Content-Aware File System.
Installation
pip install intellifs
Note: by default, intellifs only indexes plain text, HTML, XML, and PDF sources.
To add support for all document types:
pip install "unstructured[all-docs]"
Refer to the unstructured installation documentation for finer control over supported document types.
CLI Usage
Display the help section
ifs
Usage: ifs COMMAND
Content-Aware File System.
╭─ Commands ─────────────────────────────────────────────╮
│ embedder Default embedder. │
│ index Index a file or directory. │
│ search Perform semantic search in a directory. │
│ version Display application version. │
╰────────────────────────────────────────────────────────╯
╭─ Parameters ───────────────────────────────────────────╮
│ --help,-h Display this message and exit. │
╰────────────────────────────────────────────────────────╯
index command
ifs index --help
Usage: ifs index [ARGS]
Index a file or directory.
╭─ Arguments ──────────────────────────────────────────╮
│ * PATH Path to file or directory. [required] │
╰──────────────────────────────────────────────────────╯
- Indexing a file:
ifs index ./Cyber.pdf
- Indexing a directory:
ifs index ./test_docs
search command
ifs search --help
Usage: ifs search [ARGS] [OPTIONS]
Perform semantic search in a directory.
╭─ Arguments ───────────────────────────────────────────────────────────╮
│ DIR Start search directory path. [default: /home/synacktra] │
╰───────────────────────────────────────────────────────────────────────╯
╭─ Parameters ──────────────────────────────────────────────────────────╮
│ * --query -q Search query string. [required] │
│ --max-results -k Maximum result count. [default: 5] │
│ --threshold -t Minimum filtering threshold value. │
│ --return -r Component to return. [choices: path,context] │
╰───────────────────────────────────────────────────────────────────────╯
- Search in the current directory:
ifs search --query "How does intellifs work?"
- Search in a specific directory:
ifs search path/to/directory --query "How does intellifs work?"
- Get a specific number of results:
ifs search -q "How does intellifs work?" -k 8
- Control the threshold value for better results:
ifs search -q "How does intellifs work?" -t 0.5
- Get a specific component of the results (by default, the output is JSON mapping each path to its matched contexts):
ifs search -q "How does intellifs work?" -r path
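The default ifs search output is described only as path-mapped contexts in JSON. A minimal sketch of what that grouping could look like, assuming each search hit carries a file path, a matched context chunk, and a score. The hit layout and the shape_results helper are hypothetical illustrations, not part of intellifs:

```python
import json
from collections import defaultdict


def shape_results(hits, component=None):
    """Shape hypothetical search hits the way the CLI output is described.

    hits: list of dicts with hypothetical keys "path", "context", "score".
    component: None -> {path: [contexts]}, "path" -> [paths], "context" -> [contexts].
    """
    # Rank hits from most to least similar.
    ordered = sorted(hits, key=lambda h: h["score"], reverse=True)
    if component == "path":
        # Deduplicate paths while keeping rank order.
        return list(dict.fromkeys(h["path"] for h in ordered))
    if component == "context":
        return [h["context"] for h in ordered]
    # Default: map each path to its matched contexts.
    grouped = defaultdict(list)
    for h in ordered:
        grouped[h["path"]].append(h["context"])
    return dict(grouped)


# Illustrative hits (made-up example data).
hits = [
    {"path": "docs/a.txt", "context": "intellifs embeds text chunks", "score": 0.82},
    {"path": "docs/b.pdf", "context": "queries are embedded too", "score": 0.74},
    {"path": "docs/a.txt", "context": "metadata keeps timestamps", "score": 0.61},
]
print(json.dumps(shape_results(hits), indent=2))
print(shape_results(hits, component="path"))
```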
embedder command
ifs embedder --help
Usage: ifs embedder [OPTIONS]
Default embedder.
╭─ Parameters ────────────────────────────────────╮
│ --select -s Select from available embedders. │
╰─────────────────────────────────────────────────╯
- Display the default embedder:
ifs embedder
{ "model": "BAAI/bge-small-en-v1.5", "dim": 384, "description": "Fast and Default English model", "size_in_GB": 0.13 }
- Select from the available embedders (uses https://github.com/synacktraa/minifzf for selection):
ifs embedder --select
shell command
Starts an interactive shell.
https://github.com/synacktraa/synacktraa/assets/91981716/b746ccf8-e27b-4abd-99cf-528677fb0ef8
Library Usage
Initialize FileSystem
from intellifs import FileSystem
ifs = FileSystem()
By default, it uses the default embedder. You can also pass a different Embedder instance:
from intellifs.embedder import Embedder
ifs = FileSystem(
embedder=Embedder(model="<model-name>", dim=<model-dimension>)
)
Use Embedder.available_models to list supported models.
index method
- Indexing a file:
from intellifs.indexables import File
ifs.index(File(__file__))
- Indexing a directory:
from intellifs.indexables import Directory
ifs.index(Directory('path/to/directory'))
is_indexed method
Verify whether a File or Directory has been indexed.
from intellifs.indexables import File
file = File(__file__)
ifs.is_indexed(file)            # check the file
ifs.is_indexed(file.directory)  # check its directory
search method
- Search in the current directory:
ifs.search(query="How does intellifs work?")
- Search in a specific directory:
from intellifs.indexables import Directory
ifs.search(
    directory=Directory('path/to/directory'),
    query="How does intellifs work?",
)
- Get a specific number of results:
ifs.search(query="How does intellifs work?", max_results=8)
- Control the threshold value for better results:
ifs.search(query="How does intellifs work?", score_threshold=0.5)
How It Works
The FileSystem is a file system management tool for organizing and searching files and directories by their content. It represents file contents as embeddings, which enables semantic search. Its core components and functionality are outlined below.
Core Components
Metadata and Index
- Metadata: A structured representation that includes file contexts (chunks of text extracted from files), the directory path, filepath, and the last modified timestamp.
- Index: Consists of embeddings (vector representations of file contents) and associated metadata.
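To make the two structures concrete, here is a minimal sketch using Python dataclasses. The class and field names are hypothetical mirrors of the description above, not the actual intellifs definitions, and storing exactly one embedding per context chunk is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Metadata:
    # Hypothetical mirror of the metadata described above.
    contexts: List[str]    # text chunks extracted from the file
    directory: str         # path of the directory containing the file
    filepath: str          # path of the file itself
    last_modified: float   # modification timestamp, seconds since the epoch


@dataclass
class Index:
    # Assumption: one embedding vector per context chunk.
    embeddings: List[List[float]]
    metadata: Metadata

    def __post_init__(self) -> None:
        # Reject mismatched embeddings and contexts early.
        if len(self.embeddings) != len(self.metadata.contexts):
            raise ValueError("expected exactly one embedding per context chunk")


meta = Metadata(
    contexts=["first chunk", "second chunk"],
    directory="/home/user/docs",
    filepath="/home/user/docs/notes.txt",
    last_modified=1700000000.0,
)
index = Index(embeddings=[[0.1, 0.2], [0.3, 0.4]], metadata=meta)
```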
FileSystem Class
The FileSystem class is the core of the system: it ties together the embedder, local storage, and vector database collections described below to index, search, and manage files.
Initialization
Upon initialization, the FileSystem prepares the environment for indexing and searching files and directories with the following steps:
- Embedder Setup: An embedder is initialized to generate vector embeddings from file content. If no custom embedder is provided, the system falls back to a pre-configured model suitable for general-purpose text embedding.
- Local Storage Initialization: The system sets up local storage to cache embeddings and metadata. This involves:
  - Determining the storage path from the embedder's name, so that each embedder gets its own cache directory.
  - Creating a mapping file (map.json) in the cache directory that records the collection name associated with each base path.
- Base Path Handling: The FileSystem derives base paths according to the file system structure of each operating system.
  - Windows Systems: On Windows, base paths are drive letters (e.g., C:, D:), so files and directories on different drives are managed separately.
  - POSIX Systems: On POSIX-compliant systems (such as Linux and macOS), base paths are top-level directories under the root (e.g., /var, /home), consistent with UNIX-like directory hierarchies.
- Collection Management: Uses a local persistent vector database, managed through qdrant_client, to store and retrieve embeddings and metadata.
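The storage and base-path steps above can be sketched as follows. Everything here is an illustrative assumption: the cache root, the rule for turning an embedder name into a directory name, and the collection naming scheme are hypothetical, not taken from intellifs. The sketch only shows the idea of one cache directory per embedder, base paths derived per operating system, and a map.json-style record of which collection belongs to which base path.

```python
import json
import re
from pathlib import PurePosixPath, PureWindowsPath


def cache_dir_for(embedder_name: str, root: str = "~/.cache/intellifs") -> str:
    # Hypothetical: make the model name filesystem-safe so each embedder
    # gets its own cache directory, e.g. "BAAI/bge-small-en-v1.5" -> "BAAI_bge-small-en-v1.5".
    safe = re.sub(r"[^A-Za-z0-9_.-]", "_", embedder_name)
    return f"{root}/{safe}"


def base_path(path: str, windows: bool) -> str:
    if windows:
        # Windows: the drive, e.g. "C:".
        return PureWindowsPath(path).drive
    p = PurePosixPath(path)
    if not p.is_absolute():
        raise ValueError("expected an absolute POSIX path")
    # POSIX: the top-level directory under "/", e.g. "/home".
    return "/" + p.parts[1] if len(p.parts) > 1 else "/"


def collection_for(mapping: dict, base: str) -> str:
    # Reuse the collection recorded for this base path, or allocate a new
    # one (hypothetical naming scheme).
    if base not in mapping:
        mapping[base] = f"collection_{len(mapping)}"
    return mapping[base]


mapping = {}  # in-memory stand-in for map.json
collection_for(mapping, base_path("/home/user/docs/a.txt", windows=False))
collection_for(mapping, base_path(r"D:\data\report.pdf", windows=True))
collection_for(mapping, base_path("/home/other/b.txt", windows=False))
print(json.dumps(mapping))  # {"/home": "collection_0", "D:": "collection_1"}
```

Keying collections by base path keeps each drive or top-level directory in its own collection, so a lookup only has to consult the collection for the path being indexed or searched.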
Indexing Files and Directories
- File Indexing: Generates an index for a single file by extracting text, partitioning it into manageable chunks, and converting these chunks into embeddings. Metadata is also generated to include the file's contextual information and modification timestamp.
- Directory Indexing: Recursively indexes all files within a directory. It checks for modifications to ensure the index is current, adds new files, and removes entries for deleted files.
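The file-indexing step (extract text, split it into chunks, embed each chunk) can be sketched as below, under stated assumptions. The installation notes suggest intellifs relies on unstructured for text extraction; this sketch instead starts from an already-extracted string, splits it into overlapping word windows (a hypothetical chunking rule), and uses a toy hashing embedder in place of a real model such as BAAI/bge-small-en-v1.5. The helper names chunk_text and toy_embed are hypothetical.

```python
import hashlib
import math
from typing import List


def chunk_text(text: str, max_words: int = 50, overlap: int = 10) -> List[str]:
    # Split into overlapping windows of words (hypothetical chunking rule).
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this window reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


def toy_embed(chunk: str, dim: int = 8) -> List[float]:
    # Stand-in for a real embedding model: hash each word into a bucket,
    # then L2-normalise. The result is NOT semantically meaningful.
    vec = [0.0] * dim
    for word in chunk.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


text = " ".join(f"w{i}" for i in range(120))
chunks = chunk_text(text)
vectors = [toy_embed(c) for c in chunks]
print(len(chunks), len(vectors[0]))  # 3 8
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval.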
Searching
Allows for semantic search within specified directories or globally across all indexed files. Searches are performed using query embeddings to find the most relevant files based on their content embeddings.
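A minimal sketch of the search step, assuming the indexed chunks are available as (file path, vector) pairs. It ranks by cosine similarity, optionally restricts results to one directory, and applies limits analogous to the max_results and score_threshold arguments shown in the library usage. In intellifs the vectors are stored in a qdrant collection; this brute-force loop only illustrates the scoring logic, and since each entry is one chunk, a file with several matching chunks could appear more than once.

```python
import math
from pathlib import PurePosixPath
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(
    query_vec: Sequence[float],
    entries: List[Tuple[str, Sequence[float]]],
    directory: Optional[str] = None,
    max_results: int = 5,
    score_threshold: Optional[float] = None,
) -> List[Tuple[str, float]]:
    scored = []
    for path, vec in entries:
        # Keep only files located somewhere under the requested directory.
        if directory is not None and PurePosixPath(directory) not in PurePosixPath(path).parents:
            continue
        score = cosine(query_vec, vec)
        # Drop matches below the minimum threshold, if one is given.
        if score_threshold is not None and score < score_threshold:
            continue
        scored.append((path, score))
    # Highest similarity first, truncated to max_results.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:max_results]


entries = [
    ("/docs/a.txt", [1.0, 0.0]),
    ("/docs/b.txt", [3.0, 4.0]),
    ("/other/c.txt", [0.0, 1.0]),
]
print(search([1.0, 0.0], entries, score_threshold=0.5))
# [('/docs/a.txt', 1.0), ('/docs/b.txt', 0.6)]
```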
Workflow
- Generate Index: When a file or directory is indexed, the system extracts text, generates embeddings, and stores this information along with metadata in a dedicated collection.
- Search: Input a text query to search across indexed files and directories. The system converts the query into an embedding and retrieves the most relevant files based on cosine similarity.
- Management: The system supports adding, updating, and deleting files or directories in the index to keep the database current with the filesystem.
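The management step, together with the modification check described under Directory Indexing, can be sketched as a comparison of stored modification timestamps against the current state of a directory. This is an assumption about the mechanism, based on the metadata recording a last-modified timestamp per file. The helper names scan and plan_sync are hypothetical, and actually re-embedding or deleting the affected entries is left out.

```python
import os
from typing import Dict, Set, Tuple


def scan(directory: str) -> Dict[str, float]:
    # Recursively record each file's current modification time.
    mtimes = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            mtimes[path] = os.path.getmtime(path)
    return mtimes


def plan_sync(
    indexed: Dict[str, float], current: Dict[str, float]
) -> Tuple[Set[str], Set[str], Set[str]]:
    # New files: on disk but not in the index.
    added = set(current) - set(indexed)
    # Deleted files: in the index but no longer on disk.
    removed = set(indexed) - set(current)
    # Changed files: present in both, with a different timestamp.
    modified = {p for p in set(current) & set(indexed) if current[p] != indexed[p]}
    return added, modified, removed


indexed = {"/docs/a.txt": 100.0, "/docs/b.txt": 200.0, "/docs/old.txt": 50.0}
current = {"/docs/a.txt": 100.0, "/docs/b.txt": 250.0, "/docs/new.txt": 300.0}
added, modified, removed = plan_sync(indexed, current)
print(sorted(added), sorted(modified), sorted(removed))
# ['/docs/new.txt'] ['/docs/b.txt'] ['/docs/old.txt']
```

In practice, current would come from scan(path) on the directory being re-indexed, so only new and changed files need to be embedded again.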
Download files
Source Distribution: intellifs-0.0.1.tar.gz
Built Distribution: intellifs-0.0.1-py3-none-any.whl
File details for intellifs-0.0.1.tar.gz
File metadata
- Download URL: intellifs-0.0.1.tar.gz
- Upload date:
- Size: 14.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.10.11 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 60e9936ac748db68ba04462625dfba01e220b7eafa9d63c833722f7501da7c11
MD5 | 368281b74d66437286d6e247bbf6350b
BLAKE2b-256 | 8893eefe8ce832df098697e7177f4dbc2f22dc04a5d2d85804469d8b43fd080d
File details for intellifs-0.0.1-py3-none-any.whl
File metadata
- Download URL: intellifs-0.0.1-py3-none-any.whl
- Upload date:
- Size: 16.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.10.11 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5f9707c01fe224c50b9baa4fa4f09d6694e59205474b1fb613fa236364fa95a1
MD5 | 43c8f9b9a61bb090823749bb577e0677
BLAKE2b-256 | 4585857d86d9f3986f0d26a63e3d438e698a778cac76018e39c1ed9d6124bae9