insight-cli
Overview
insight is a search engine that enables developers to semantically search Python codebases using natural language queries. The insight-cli provides a CLI for developers to use insight.
Installation
Before installing the insight-cli, ensure that Python version 3.10.0+ is installed. To install the insight-cli globally, run the following command:
$ pip install insight-cli
To ensure the insight-cli was successfully installed, check the insight-cli version by running the following command:
$ insight --version
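The two prerequisites above (Python 3.10.0+ and an installed insight executable) can also be checked from Python. The following is a minimal sketch and not part of the insight-cli; the helper name check_prerequisites is hypothetical, and it only inspects the interpreter version and the PATH.

```python
import shutil
import sys

MIN_PYTHON = (3, 10, 0)  # minimum version stated in the installation notes


def check_prerequisites(version_info=sys.version_info, executable="insight"):
    """Return (python_ok, cli_path).

    cli_path is None when no executable with that name is found on PATH.
    This helper is hypothetical and does not call the insight-cli itself.
    """
    python_ok = tuple(version_info[:3]) >= MIN_PYTHON
    cli_path = shutil.which(executable)
    return python_ok, cli_path


if __name__ == "__main__":
    python_ok, cli_path = check_prerequisites()
    print(f"Python 3.10.0+: {'yes' if python_ok else 'no'}")
    print(f"insight executable: {cli_path or 'not found on PATH'}")
```

If the executable is reported as not found after a successful pip install, the directory that pip installs scripts into is probably missing from PATH.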
CLI Commands
The following CLI commands require an internet connection.
To initialize the current directory as an insight repository, run the following command:
$ insight --initialize
The following commands must be run in a directory that has been initialized as an insight repository.
To display the Python files (.py files) and lines in an insight repository (excluding the files and directories specified in the .insightignore file) that semantically match a given natural language query, run the following command:
$ insight --query "<query>"
To uninitialize an insight repository, run the following command:
$ insight --uninitialize
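For scripted use, the three commands above can be wrapped with Python's subprocess module. This is a sketch under stated assumptions: the function names build_insight_command and run_insight are hypothetical, only the flags documented above are used, and nothing is assumed about the CLI's exit codes or output format.

```python
import subprocess


def build_insight_command(action, query=None):
    """Build the argument list for one of the documented insight commands."""
    if action == "initialize":
        return ["insight", "--initialize"]
    if action == "uninitialize":
        return ["insight", "--uninitialize"]
    if action == "query":
        if not query:
            raise ValueError("a non-empty query string is required")
        # Passing the query as its own list element avoids shell quoting issues.
        return ["insight", "--query", query]
    raise ValueError(f"unknown action: {action!r}")


def run_insight(action, query=None, cwd=None):
    """Run the command in `cwd` and return the CompletedProcess.

    The caller decides how to interpret returncode, stdout, and stderr,
    because the CLI's exit-code conventions are not documented here.
    """
    return subprocess.run(
        build_insight_command(action, query),
        cwd=cwd,
        capture_output=True,
        text=True,
    )
```

For example, run_insight("query", "function that loads the word2vec model", cwd="instapix-word2vec").stdout would hold whatever the CLI prints for that query, provided the directory has been initialized.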
.insightignore
The .insightignore file contains regex patterns that specify directory and file paths to ignore in an insight repository.
- Each line is treated as a single regex pattern.
- Empty lines are never matched; they serve only as separators for readability.
- Lines starting with a hash character '#' are comments. To write a pattern that begins with a literal '#', place a single backslash '\' in front of the first '#'.
- Scope comments restrict the patterns that follow them to a given scope until the next scope comment. '## _directory_' restricts patterns to directory paths, and '## _file_' restricts them to file paths.
# Ignore all directory and file paths ending in "test.py"
.*test\.py$
# Ignore all directory and file paths starting with "test_"
^test_
# Ignore all directory and file paths that are exactly "#.py"
\#\.py$
## _directory_
# Patterns now only apply to directory paths
# Ignore all directory paths starting with "main"
^main
## _file_
# Patterns now only apply to file paths
# Ignore all file paths starting with "cache"
^cache
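The rules above can be sketched as a small parser. This is an illustration only, not the insight-cli's actual implementation: the function names parse_insightignore and is_ignored are hypothetical, surrounding whitespace is stripped as an assumption, and patterns are tested with re.search against whatever path string the caller supplies, since the source does not state how the tool applies them.

```python
import re

# Scope comments named in the rules above, mapped to the scope they open.
SCOPE_COMMENTS = {"## _directory_": "directory", "## _file_": "file"}


def parse_insightignore(text):
    """Group compiled patterns by scope: 'all', 'directory', or 'file'."""
    patterns = {"all": [], "directory": [], "file": []}
    scope = "all"  # before any scope comment, patterns apply to both kinds of path
    for raw_line in text.splitlines():
        line = raw_line.strip()  # assumption: surrounding whitespace is ignored
        if not line:
            continue  # empty lines are separators only
        if line in SCOPE_COMMENTS:
            scope = SCOPE_COMMENTS[line]
            continue
        if line.startswith("#"):
            continue  # ordinary comment
        if line.startswith("\\#"):
            line = line[1:]  # drop the escape; the pattern begins with a literal '#'
        patterns[scope].append(re.compile(line))
    return patterns


def is_ignored(path, kind, patterns):
    """Return True if `path` matches a pattern for its kind ('directory' or 'file').

    Assumption: matching uses re.search on the string passed in.
    """
    candidates = patterns["all"] + patterns[kind]
    return any(pattern.search(path) for pattern in candidates)
```

With the example file above, a file named cache.py would be ignored by '^cache', while a directory named cache would not, because that pattern sits under the '## _file_' scope.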
Example Usage
Install the insight-cli.
$ pip install insight-cli
Verify the installation was successful by checking the version.
$ insight --version
Change the current working directory to the desired codebase. This example will use the following GitHub repository: https://github.com/grantchen2003/instapix-word2vec.
$ git clone https://github.com/grantchen2003/instapix-word2vec
$ cd instapix-word2vec
Create a .insightignore file in the current directory and specify that insight should ignore the proto directory.
$ echo -e "## _directory_\n^proto$" > .insightignore
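The echo -e form depends on the shell, so on systems where it is unavailable the same file can be written from Python. This is a minimal sketch; the helper name write_insightignore is hypothetical and not part of the insight-cli.

```python
from pathlib import Path


def write_insightignore(directory, lines):
    """Write one pattern or comment per line to <directory>/.insightignore."""
    target = Path(directory) / ".insightignore"
    # newline="\n" keeps line endings identical on every platform.
    with open(target, "w", encoding="utf-8", newline="\n") as handle:
        handle.write("\n".join(lines) + "\n")
    return target
```

Calling write_insightignore(".", ["## _directory_", "^proto$"]) produces the same two lines as the echo command: a directory scope comment followed by the pattern for the proto directory.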
Initialize the current directory as an insight repository. This will create a .insight directory inside the current directory.
$ insight --initialize
Initialized insight repository in \path\to\current\directory\instapix-word2vec
Search the current insight repository (excluding the proto directory) for the top 3 code snippets that match the query "function that loads the word2vec model".
$ insight --query "function that loads the word2vec model" 3
3 matches found in the following files:
src\word2vec_service.py
Line 5: from sklearn.metrics.pairwise import cosine_similarity
src\word2vec_service.py
Line 13 - 21: def load_model():
    print("loading model")
    start_time = time.process_time()
    model = KeyedVectors.load_word2vec_format(
        os.environ["MODEL_PATH"], binary=True, limit=None
    )
    end_time = time.process_time()
    print(f"loaded model in {round(end_time - start_time, 1)} seconds")
    return model
src\word2vec_service.py
Line 24 - 49: class Word2Vec(word2vec_pb2_grpc.Word2VecServiceServicer):
    def __init__(self):
        self.model = load_model()
        self.vocabulary = set(self.model.key_to_index.keys())
    def EmbedWords(self, request, context):
        print("EmbedWords request received")
        words = request.words
        embeddings = [
            word2vec_pb2.Embedding(embedding=self.model.get_vector(word).tolist())
            for word in words
            if word in self.vocabulary
        ]
        return word2vec_pb2.EmbedWordsResponse(embeddings=embeddings)
    def Similarity(self, request, context):
        print("Similarity request received")
        embeddings1 = np.array([element.embedding for element in request.embeddings1])
        embeddings2 = np.array([element.embedding for element in request.embeddings2])
        if len(embeddings1) == 0 or len(embeddings2) == 0:
            return word2vec_pb2.SimilarityResponse(similarity=0)
        avg_embedding1 = np.mean(embeddings1, axis=0)
        avg_embedding2 = np.mean(embeddings2, axis=0)
        return word2vec_pb2.SimilarityResponse(
            similarity=cosine_similarity([avg_embedding1], [avg_embedding2])[0][0]
        )
Contributing
Interested in contributing? Please read the Contribution Guidelines to get started.
License
This project is licensed under the MIT License.