
Crawl, collect & document your codebase in Markdown — now with integrated LLM analysis.

Project description

Crawlect

Now with integrated LLM analysis.
Crawl, Collect & Document Your Codebase in Markdown.


Crawlect is a Python module designed to crawl a given directory, collect relevant files and their contents, document the entire structure in a clean, readable Markdown file, and analyze the whole project with feedback from an LLM API.

Whether you're analyzing someone else's code or sharing your own, Crawlect makes it effortless to generate a comprehensive project snapshot — complete with syntax-highlighted code blocks, a tree-like structure overview, and fine-tuned filtering rules.

Crawlect is a study project initiated by Yves Guillo & Alexandre Jenzer, supervised by Matthieu Ammiguet, during He-Arc's CAS-IDD Python module (2025).

Why Crawlect?

When starting with a new project — whether you're reviewing, refactoring, or collaborating — understanding its structure and key files is essential. Crawlect does the heavy lifting by:

  • Analyzing your codebase with integrated LLM API calls,
  • Crawling your project directory (recursively if needed),
  • Parsing .gitignore/.dockerignore/.crawlectignore rules to mimic your dev setup,
  • Masking sensitive data (like .env values),
  • Automatically generating a well-organized, shareable .md file,
  • Embedding file contents in Markdown-formatted code blocks.

Use cases

  • Quickly understand an unfamiliar codebase
  • Auto-document your projects
  • Share code context with collaborators
  • Safely include .env files without leaking sensitive values
  • Enhance your workflow with LLM code analysis

Think of Crawlect as your Markdown minion: obedient, efficient, and allergic to messy folders.

Getting Started

Crawlect is written in Python and requires minimal setup. Install the package or clone it to set up the virtual environment, and you’re ready to document codebases like a Markdown ninja.
Requires Python 3.10+

Install Crawlect via pip (The Handy Way™)

Tired of fiddling with virtual environments just to run a CLI tool? We got you.

You can install Crawlect globally in just a few seconds:

pip install crawlect

Then summon your loyal markdown minion from anywhere on your system:

crawlect -p . -o digest.md -open

No clutter, no drama. It just works.

Why choose the pip route?

  • You can call crawlect from any folder — no need to cd into the repo.
  • Great for repeated use or integrating into your tooling.
  • Keeps your system clean (no extra scripts, venvs, or manual installs).

(Optional) Local Dev Mode

If you're planning to tinker with Crawlect, we’ve still got you covered.

1. Clone the repo

git clone https://github.com/yvesguillo/crawlect.git
cd crawlect

2. Run the setup script

Make it executable first if needed (Linux/macOS):

chmod +x setup.sh

Then run:

./setup.sh

This script creates a venv, activates it, and installs dependencies.

Or install requirements.txt manually:

pip install -r ./requirements.txt
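
If setup.sh isn't an option on your system (or you simply prefer doing everything by hand), creating and activating the virtual environment manually looks roughly like the sketch below; the .venv folder name is an assumption, not necessarily what setup.sh uses:

python -m venv .venv            # create a virtual environment in ./.venv
source .venv/bin/activate       # activate it (Linux/macOS); on Windows use .venv\Scripts\activate

Then install the requirements with the pip command above.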

3. Run Crawlect

python -m crawlect -p . -o ../digest.md -open

This scans the current folder and generates a Markdown file named digest.md in the parent directory.

4. Teardown (optional)

When you're done (if you used the setup.sh script), you can clean everything up with:

./teardown.sh

This deactivates and removes the venv and optional artifacts.

CLI Options

Here are the most useful options Crawlect understands:

  • -p, --path: Path to crawl (default: the current folder .).
  • -o, --output: Static output file path (e.g. ./digest.md).
  • -op, --output_prefix: Prefix for a dynamically named unique output file (e.g. ./digest).
  • -os, --output_suffix: Suffix for a dynamically named unique output file (e.g. .md).
  • -r, --recur: Enable recursive crawling (default: enabled). Use --no-recur to disable.
  • -d, --depth: Scan depth limit (default: infinite).
  • --crawlig: Apply .crawlectignore exclusion rules if present (default: enabled). Use --no-crawlig to disable.
  • --gitig: Apply .gitignore exclusion rules if present (default: enabled). Use --no-gitig to disable.
  • --dockig: Apply .dockerignore exclusion rules if present (default: enabled). Use --no-dockig to disable.
  • --xenv: Sanitize .env variables to mitigate the risk of leaking sensitive info (default: enabled). Use --no-xenv to disable.
  • --tree: Include a directory tree in the output file (default: enabled). Use --no-tree to disable.
  • -llmapi, --llm-api: LLM provider to use (e.g. openai or ollama).
  • -llmhost, --llm-host: Host URL for the LLM API (only required for Ollama).
  • -llmkey, --llm-api-key: API key for the LLM (only required for OpenAI).
  • -llmmod, --llm-model: Model name to use (e.g. gpt-4.1-nano or llama3).
  • -llmreq, --llm-request: LLM tasks to perform: review, docstring, readme.
  • -llmcust, --llm-custom-requests: List of custom LLM analysis prompts (each prompt wrapped in ").
  • -open, --open: Open the output files once generated (default: disabled).
  • -verbose, --verbose: Toggle verbosity (default: enabled). Use --no-verbose to disable.

Examples

Scan the awesomeproject folder with a depth limit of 2 and write its digest.md to the parent folder, including a project folder tree, while ignoring .gitignore and .dockerignore rules but still applying .crawlectignore filtering, without sanitizing .env files, then open the generated file with the system's default reader.

crawlect -p ./awesomeproject \
  -o ../digest.md \
  -d 2 \
  --no-gitig \
  --no-dockig \
  --no-xenv \
  -open
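
If you'd rather have Crawlect pick a unique output file name instead of overwriting a fixed one, use the prefix/suffix options in place of -o. How the unique part of the name is generated (timestamp, counter, …) is left to Crawlect; this sketch only shows the flags:

crawlect -p . -op ./digest -os .md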

Scan the current folder and write its digest.md to the parent folder, then ask OpenAI's gpt-4.1-nano model to review the codebase, generate docstrings, and answer two custom prompts.

crawlect -p . \
  -o ../digest.md \
  --llm-api openai \
  --llm-api-key yoursupersecretkey \
  --llm-model gpt-4.1-nano \
  --llm-request review docstring \
  --llm-custom-requests "Is this project awesome?" "Can you write a poem about this codebase?"

How LLM Feature Works

Ever wish your code could write its own README, explain its quirks, or fill in those long-forgotten docstrings?
With Crawlect’s LLM-powered analysis, it can.

What happens under the hood?

When you add the --llm-* parameters to your command, Crawlect:

  1. Generates the full project digest as Markdown.
  2. Reads that digest and sends it to your favorite LLM (OpenAI or Ollama, your pick).
  3. Injects it into a custom prompt depending on your request:
    • review: asks the model to review and critique the code.
    • docstring: generates docstrings for classes and functions.
    • readme: drafts a clean, professional README.md based on your project.

The responses are then written to a second file (<output path>.analysis.md).

Supported LLMs

  • Ollama – Running your own local Ollama service?
    If not, no worries: give LLM-Serve a try.
    Then simply use --llm-api ollama, provide your --llm-host (e.g. http://localhost:11434), choose your --llm-model (e.g. llama3), and you are good to go (see the note after this list)!
  • OpenAI – Use with --llm-api openai, supply your --llm-api-key, and pick your --llm-model (e.g., gpt-4.1-nano).
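
Note for Ollama users: the model you pass to --llm-model has to be available locally before Crawlect can query it. If it isn't yet, pulling it is a one-liner (llama3 here is just the example model from above):

ollama pull llama3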

Example

Scan the current folder and write its digest.md to the parent folder, then ask Ollama to run the llama3 model and draft README documentation from the codebase.

crawlect -p . \
  -o ../digest.md \
  --llm-api ollama \
  --llm-host http://localhost:11434 \
  --llm-model llama3 \
  --llm-request readme

Crawlect writes your digest, then generates an <output path>.analysis.md file packed with insights.
And yes! This README.md was generated that way. Well… with a bit of editing, yet much faster. Spend less time on boilerplate and more on content and style.

Bonus

Crawlect injects the entire codebase (in Markdown format) once, then asks the LLM to perform each task with that shared context. This means:

  • No repeated uploads.
  • Coherent answers across tasks.
  • Responses that actually make sense together.

How Filtering Works

Crawlect supports standard .gitignore filtering. You can use:

  • .crawlectignore (optional custom rules — your secret weapon; auto-detected and parsed as Git would)
  • .gitignore and .dockerignore (auto-detected and parsed as Git would)

These filters follow the standard .gitignore syntax.

Bonus: Crawlect also excludes the ignore files themselves from the digest, so your .crawlectignore (or any other ignore file) won't show up in the output unless you disable the corresponding option.
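
For instance, a hypothetical .crawlectignore that keeps bulky or generated content out of the digest could look like this (the entries are illustrative, not defaults shipped with Crawlect):

# .crawlectignore: standard .gitignore syntax
node_modules/
dist/
*.log
!keep-this.log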

Example Output

Digest

Here's a sneak peek at the digest Crawlect produces:

# my-awesome-project
2025.05.10 14:22

Generated with Crawlect.

## File structure

- **src/**
    - [main.py](#main&period;py)
    - [utils.py](#utils&period;py)

## Files:

### main.py  
[`src/main.py`](src/main.py)

```python
from .utils import un_plus_un

def main():
    print(f"Hello! Did you know that one plus one is strictly similar to \n{un_plus_un()}?")
```

### utils.py  
[`src/utils.py`](src/utils.py)

```python
def un_plus_un():
    return "deux"
```

Analysis

The LLM code analysis looks like this:

////////////
// REVIEW //
////////////

<Markdown-formatted review by the LLM>


///////////////
// DOCSTRING //
///////////////

<Auto-generated docstrings with file/class/function structure>


////////////
// README //
////////////

<Markdown README suggestion ready to paste>


/////////////////////
// CUSTOM_PROMPT_1 //
/////////////////////

**Prompt**: `Is this project awesome?`


<Markdown-formatted response>


/////////////////////
// CUSTOM_PROMPT_2 //
/////////////////////

**Prompt**: `Can you write a poem about this codebase?`


<Markdown-formatted response>

Roadmap & Crazy Ideas

  • Crawlect-GUI (in the pipeline! Alpha version expected around the end of June 2025…)
  • HTML output

Contributing

Got ideas? Spot a bug? Wanna make this thing even cooler?
Feel free to fork, star, or open an issue — we’d love to hear from you!

References and thanks

If you find Crawlect useful, give it a ☆ to support the project!



Download files

Download the file for your platform.

Source Distribution

crawlect-1.0.6.tar.gz (67.2 kB)

Built Distribution

crawlect-1.0.6-py3-none-any.whl (25.4 kB)

File details

Details for the file crawlect-1.0.6.tar.gz.

File metadata

  • Download URL: crawlect-1.0.6.tar.gz
  • Upload date:
  • Size: 67.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.1

File hashes

Hashes for crawlect-1.0.6.tar.gz
  • SHA256: d95f29e77aa5464493d43883c4f8df722edb8c51169101bd18e28900e83a4d0b
  • MD5: 397c858344c4164bcf6a140eea5933d1
  • BLAKE2b-256: a0774ea898ecedb59292555bb63391b3fbe70bd56019e39e5b10e6fa57f064c8


File details

Details for the file crawlect-1.0.6-py3-none-any.whl.

File metadata

  • Download URL: crawlect-1.0.6-py3-none-any.whl
  • Upload date:
  • Size: 25.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.1

File hashes

Hashes for crawlect-1.0.6-py3-none-any.whl
  • SHA256: c2e43fbcd4a9b4292af592bf0778bd662eac2a77a5c5d73d8284e19456eaa29f
  • MD5: e222f5f8f05202da770ec9bfc4a59342
  • BLAKE2b-256: 0d602459f5eb65dd9ea75c876aeebc35722d82e8b8f7839581859104be936d44

