🔍 AzureAICommunity - Agent - File Search

File search tools (by name and content) for AI agent applications built on the Agent Framework.

Give your agent the ability to search files by name or content — with full glob support, encoding fallback, and configurable depth.


Overview

azureaicommunity-agent-file-search provides two @tool-decorated functions that can be wired directly into any agent-framework agent. The agent can search for files by glob pattern or scan file contents for a string — with sensible defaults and a fully configurable SearchConfig for fine-grained control.


✨ Features

🔎 Search by name — glob-pattern matching (*.py, main*, report.pdf) across a directory tree
📄 Search by content — full-text scan of file contents with case-sensitive or case-insensitive matching
⚙️ Configurable — control max results, depth, hidden files, extension filters, file size limits, and more
🔌 Agent-ready — tools decorated with @tool, drop straight into any agent-framework agent
🌐 Encoding-aware — tries multiple encodings (UTF-8, Latin-1) before skipping a file
🚫 Binary-safe — skips binary files automatically via null-byte detection
🔁 Symlink-safe — optional symlink following with loop detection
📦 Provider-agnostic — works with Ollama, Azure OpenAI, or any agent-framework compatible client

📦 Installation

pip install azureaicommunity-agent-file-search

🚀 Quick Start

import asyncio
from agent_framework.ollama import OllamaChatClient
from file_search_module import file_search_by_name, file_search_by_content, configure

# Optional: set global defaults
configure(max_results=50, max_depth=10, skip_hidden=True)

agent = OllamaChatClient(model="llama3.2").as_agent(
    name="FileSearchAgent",
    instructions="You are a helpful file-search assistant. Always search under C:\\MyProject.",
    tools=[file_search_by_name, file_search_by_content],
)

async def main():
    session = agent.create_session()
    response = await agent.run("Find all Python files in the project.", session=session)
    print(response.text)

asyncio.run(main())

🧑‍💻 Usage

Search by file name

from file_search_module import file_search_by_name

# All Python files
results = file_search_by_name("*.py", path="C:\\MyProject")

# Files starting with "main"
results = file_search_by_name("main*", path="C:\\MyProject")

# Case-sensitive match
results = file_search_by_name("README.md", path="C:\\MyProject", case_sensitive=True)

# Only .py and .txt files
results = file_search_by_name("*", path="C:\\MyProject", file_types=[".py", ".txt"])

Search by file content

from file_search_module import file_search_by_content

# Case-sensitive (default)
results = file_search_by_content("middleware", path="C:\\MyProject")

# Case-insensitive
results = file_search_by_content("TODO", path="C:\\MyProject", case_sensitive=False)

# Restrict to Python files only
results = file_search_by_content("async def", path="C:\\MyProject", file_types=[".py"])

Per-call config override

from file_search_module import file_search_by_name, SearchConfig

custom = SearchConfig(max_results=5, include_extensions=[".py"], skip_hidden=True)
results = file_search_by_name("*.py", path="C:\\MyProject", config=custom)

⚙️ Configuration

configure() — set global defaults

from file_search_module import configure

configure(
    max_results=100,
    max_depth=5,
    skip_hidden=True,
    exclude_extensions=[".pyc", ".pyo", ".exe", ".dll"],
    encodings=["utf-8", "latin-1"],
)

SearchConfig fields

| Parameter | Type | Default | Description |
|---|---|---|---|
| max_results | int | 200 | Maximum number of paths returned per search call |
| max_depth | int | 20 | Maximum directory recursion depth |
| max_file_size_bytes | int | 10485760 (10 MB) | Files larger than this are skipped during content search |
| binary_check_bytes | int | 8192 | Bytes read to detect binary files (null-byte probe) |
| follow_symlinks | bool | False | Whether to follow symbolic links |
| skip_hidden | bool | False | Skip files and folders whose names start with a dot (.) |
| include_extensions | list[str] \| None | None | Whitelist of extensions; None means all |
| exclude_extensions | list[str] \| None | None | Blacklist of extensions |
| encodings | list[str] | ["utf-8", "latin-1"] | Encoding fallback chain for content search |
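
To make the field list concrete, here is a minimal sketch of how such a configuration object could be modelled with a dataclass, and how a per-call override can be derived from a set of defaults. SearchConfigSketch and extension_allowed are hypothetical names used only for illustration, not the package's own classes; only the field names and default values come from the table above. How the package combines the include and exclude lists, and whether it compares extensions case-insensitively, is not stated here, so both behaviours are assumptions.

```python
import os
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass(frozen=True)
class SearchConfigSketch:
    """Hypothetical stand-in that mirrors the documented fields and defaults."""

    max_results: int = 200
    max_depth: int = 20
    max_file_size_bytes: int = 10 * 1024 * 1024  # 10 MB
    binary_check_bytes: int = 8192
    follow_symlinks: bool = False
    skip_hidden: bool = False
    include_extensions: Optional[list[str]] = None  # None means "all extensions"
    exclude_extensions: Optional[list[str]] = None
    encodings: list[str] = field(default_factory=lambda: ["utf-8", "latin-1"])


def extension_allowed(filename: str, cfg: SearchConfigSketch) -> bool:
    """Assumed rule: a file must pass the include list AND not be in the exclude list."""
    ext = os.path.splitext(filename)[1].lower()
    if cfg.include_extensions is not None:
        if ext not in [e.lower() for e in cfg.include_extensions]:
            return False
    if cfg.exclude_extensions is not None:
        if ext in [e.lower() for e in cfg.exclude_extensions]:
            return False
    return True


# Derive a per-call configuration from the defaults without mutating them.
defaults = SearchConfigSketch()
per_call = replace(defaults, max_results=5, include_extensions=[".py"])
```

Using an immutable dataclass plus replace() keeps the defaults untouched while a single call narrows them, which is one common way to get the "global defaults with per-call override" behaviour described in this section.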

⚙️ How It Works

file_search_by_name

1. Validate query and path inputs
2. Auto-normalize bare extensions: 'py' → '*.py', '.py' → '*.py'
3. Walk directory tree (respecting max_depth, skip_hidden, follow_symlinks)
4. For each file: check extension filters, apply glob match
5. Return relative paths, capped at max_results
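
The steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the package's implementation: normalize_pattern and search_by_name are hypothetical names, the rule for what counts as a "bare extension" (an optional leading dot followed by at most five alphanumeric characters) is a guess, and depth is counted with the search root as level 0. Extension filters and symlink handling are left out to keep the sketch short.

```python
import fnmatch
import os
import re

# Guessed heuristic: "py" or ".py" look like bare extensions; "main*" or "report.pdf" do not.
_BARE_EXTENSION = re.compile(r"^\.?[A-Za-z0-9]{1,5}$")


def normalize_pattern(query: str) -> str:
    """Turn a bare extension into a glob ('py' -> '*.py'); leave other queries unchanged."""
    if _BARE_EXTENSION.match(query):
        return "*." + query.lstrip(".")
    return query


def search_by_name(query, root, max_results=200, max_depth=20,
                   skip_hidden=False, case_sensitive=False):
    """Return paths relative to `root` whose file name matches the glob `query`."""
    # Step 1: validate inputs.
    if not query or not query.strip():
        raise ValueError("query must be a non-empty string")
    if not os.path.isdir(root):
        raise NotADirectoryError(root)

    # Step 2: normalize bare extensions.
    pattern = normalize_pattern(query.strip())
    if not case_sensitive:
        pattern = pattern.lower()

    root = os.path.abspath(root)
    matches = []
    # Step 3: walk the tree top-down so that pruning `dirnames` limits the descent.
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        depth = 0 if rel_dir == os.curdir else rel_dir.count(os.sep) + 1
        if depth >= max_depth:
            dirnames[:] = []  # assumed meaning of max_depth: levels below the root
        if skip_hidden:
            dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        dirnames.sort()  # deterministic traversal order

        # Step 4: apply the glob to each file name.
        for name in sorted(filenames):
            if skip_hidden and name.startswith("."):
                continue
            candidate = name if case_sensitive else name.lower()
            if fnmatch.fnmatchcase(candidate, pattern):
                # Step 5: collect relative paths, stopping at the cap.
                matches.append(os.path.relpath(os.path.join(dirpath, name), root))
                if len(matches) >= max_results:
                    return matches
    return matches
```

One limitation of the guessed heuristic is that a short extensionless name such as main would also be rewritten to *.main; a real implementation may well use a different rule.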

file_search_by_content

1. Validate query and path inputs
2. Walk directory tree (same depth/hidden/symlink rules)
3. For each file: skip if > max_file_size_bytes, skip if binary
4. Try each encoding in the fallback chain until the file is readable
5. Search for the query string (case-sensitive or insensitive)
6. Return relative paths of matching files, capped at max_results
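
The per-file part of these steps (size limit, null-byte probe, encoding fallback, substring match) can be sketched as follows. The function names looks_binary, read_text_with_fallback and file_contains are hypothetical and chosen for illustration; the default values are taken from the SearchConfig table, and the directory walk itself is omitted. This is one plausible reading of the steps, not the package's actual code.

```python
import os
from typing import Optional, Sequence


def looks_binary(path: str, probe_bytes: int = 8192) -> bool:
    """Null-byte probe: treat the file as binary if the first bytes contain b'\\x00'."""
    with open(path, "rb") as fh:
        return b"\x00" in fh.read(probe_bytes)


def read_text_with_fallback(path: str,
                            encodings: Sequence[str] = ("utf-8", "latin-1")) -> Optional[str]:
    """Try each encoding in order; return None if none of them can decode the file."""
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc) as fh:
                return fh.read()
        except UnicodeDecodeError:
            continue  # fall through to the next encoding in the chain
    return None


def file_contains(path: str, query: str, case_sensitive: bool = True,
                  max_file_size_bytes: int = 10 * 1024 * 1024,
                  binary_check_bytes: int = 8192,
                  encodings: Sequence[str] = ("utf-8", "latin-1")) -> bool:
    """Return True if `query` occurs in the decoded text of the file at `path`."""
    if os.path.getsize(path) > max_file_size_bytes:
        return False  # too large: skipped
    if looks_binary(path, binary_check_bytes):
        return False  # binary: skipped
    text = read_text_with_fallback(path, encodings)
    if text is None:
        return False  # unreadable in every configured encoding: skipped
    if case_sensitive:
        return query in text
    return query.lower() in text.lower()
```

Note that Latin-1 maps every possible byte to a character, so whenever it is in the chain the decoding step cannot fail; a file in some other encoding is then still searched, although non-ASCII text in it may not match as expected.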

🤝 Contributing

Contributions are welcome! Please open an issue to discuss what you'd like to change before submitting a pull request.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/my-feature)
  3. Commit your changes (git commit -m 'Add my feature')
  4. Push to the branch (git push origin feature/my-feature)
  5. Open a Pull Request

📄 License

MIT — see LICENSE for details.
