Pathik - Advanced web path discovery tool
Pathik
A web crawling tool implemented in Go, with Python bindings. Crawled pages are stored locally, with optional upload to Cloudflare R2.
INSTALLATION
Prerequisites
- Go 1.16+
- Python 3.6+
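You can confirm both prerequisites before building with a short Python check. This sketch assumes `go` is on your PATH and parses the standard `go version` output (for example `go version go1.21.5 linux/amd64`). `check_prerequisites` and `parse_go_version` are hypothetical helpers, not part of Pathik.

```python
import re
import shutil
import subprocess
import sys

MIN_GO = (1, 16)
MIN_PYTHON = (3, 6)


def parse_go_version(output):
    """Extract (major, minor) from `go version` output such as
    'go version go1.21.5 linux/amd64'. Returns None if nothing matches."""
    match = re.search(r"go(\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def check_prerequisites():
    """Return a list of human-readable problems (empty if all is well)."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python %d.%d+ required" % MIN_PYTHON)
    go = shutil.which("go")
    if go is None:
        problems.append("Go toolchain not found on PATH")
        return problems
    # stdout/universal_newlines instead of capture_output/text keeps 3.6 support
    out = subprocess.run([go, "version"], stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE, universal_newlines=True).stdout
    version = parse_go_version(out)
    if version is None or version < MIN_GO:
        problems.append("Go %d.%d+ required, found: %s" % (MIN_GO + (out.strip(),)))
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing prerequisite:", problem)
```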
Install Python Package
pip install pathik
Clone Repository
git clone https://github.com/yourusername/pathik.git
cd pathik
Install in Development Mode
pip install -e .
BUILDING GO BINARY
Navigate to Pathik Directory
cd pathik
Build Binary Using Script
python build_binary.py
Expected Output:
Building Go binary in /path/to/pathik
Build successful!
Binary located at: /path/to/pathik/pathik_bin
Testing binary...
Binary output: [Help text from binary]
USAGE
Python Usage
Basic Crawling
import os

import pathik

# Create the output directory up front; absolute paths are recommended
output_dir = os.path.abspath("output_data")
os.makedirs(output_dir, exist_ok=True)

urls = ["https://example.com"]
results = pathik.crawl(urls, output_dir)

# results maps each URL to its saved HTML and Markdown output files
for url, files in results.items():
    print(f"URL: {url}")
    print(f"HTML: {files['html']}")
    print(f"Markdown: {files['markdown']}")
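To confirm that every crawl produced its files, you can walk the result dictionary with a small helper. This sketch assumes the result shape printed above (each URL maps to a dict with `html` and `markdown` file paths). `summarize_results` is a hypothetical name, not part of Pathik's API.

```python
import os


def summarize_results(results):
    """Return {url: {"html": size_in_bytes, "markdown": size_in_bytes}}.

    Raises FileNotFoundError if an expected output file is missing.
    Assumes each value in `results` has 'html' and 'markdown' paths."""
    summary = {}
    for url, files in results.items():
        sizes = {}
        for kind in ("html", "markdown"):
            path = files[kind]
            if not os.path.isfile(path):
                raise FileNotFoundError(
                    "%s output for %s not found: %s" % (kind, url, path))
            sizes[kind] = os.path.getsize(path)
        summary[url] = sizes
    return summary
```

Typical use would be `summarize_results(pathik.crawl(urls, output_dir))`. Empty files show up as a size of 0, which can be a hint that a page failed to render.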
R2 Upload (Optional)
results = pathik.crawl_to_r2(
    ["https://example.com"],
    uuid_str="my-id"
)
for url, info in results.items():
    print(f"R2 HTML Key: {info['r2_html_key']}")
    print(f"Local File: {info['local_html_file']}")
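To keep a record of where each page landed, you could write the R2 results to a JSON manifest. The sketch relies only on the `r2_html_key` and `local_html_file` fields shown above. `write_r2_manifest` and the manifest layout are hypothetical.

```python
import json


def write_r2_manifest(results, manifest_path):
    """Write {url: {"r2_html_key": ..., "local_html_file": ...}} as JSON.

    Any other fields in `results` are ignored. Returns the manifest dict."""
    manifest = {
        url: {"r2_html_key": info["r2_html_key"],
              "local_html_file": info["local_html_file"]}
        for url, info in results.items()
    }
    with open(manifest_path, "w", encoding="utf-8") as fh:
        json.dump(manifest, fh, indent=2, sort_keys=True)
    return manifest
```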
Direct Go Usage
Local Crawling
./pathik_bin -crawl -outdir ./output https://example.com
R2 Upload
./pathik_bin -r2 -uuid my-id -dir ./output https://example.com
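If you call the binary from your own Python code rather than through the bindings, build the argument list explicitly to avoid shell quoting problems. The helpers below mirror the flags in the two commands above and keep flags before the URLs, as those examples do. Passing more than one URL per invocation is an assumption, and the helper names are hypothetical.

```python
import subprocess


def local_crawl_command(binary, outdir, urls):
    """argv for a local crawl: <binary> -crawl -outdir <dir> <urls...>"""
    return [binary, "-crawl", "-outdir", outdir] + list(urls)


def r2_upload_command(binary, uuid_str, directory, urls):
    """argv for an R2 upload: <binary> -r2 -uuid <id> -dir <dir> <urls...>"""
    return [binary, "-r2", "-uuid", uuid_str, "-dir", directory] + list(urls)


def run(argv):
    """Run without a shell, raise CalledProcessError on a non-zero exit,
    and return the captured stdout as text."""
    completed = subprocess.run(argv, check=True, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE, universal_newlines=True)
    return completed.stdout
```

For example, `run(local_crawl_command("./pathik_bin", "./output", ["https://example.com"]))` is equivalent to the local crawl command above.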
TROUBLESHOOTING
Missing Binary
cd pathik
python build_binary.py
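The build script places the binary at `pathik_bin` inside the package directory (see the expected output under BUILDING GO BINARY). Below is a sketch of a pre-flight check that assumes that location. `find_pathik_binary` is hypothetical, and the `.exe` suffix check for Windows is a guess.

```python
import os


def find_pathik_binary(package_dir, name="pathik_bin"):
    """Return the absolute path of an executable binary inside package_dir,
    or raise FileNotFoundError with a hint to rebuild."""
    # Also try a Windows-style .exe name (an assumption, not documented)
    for candidate in (name, name + ".exe"):
        path = os.path.abspath(os.path.join(package_dir, candidate))
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    raise FileNotFoundError(
        "%s not found in %s; run `python build_binary.py` from the pathik directory"
        % (name, package_dir))
```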
Path Issues
import os

# Pass absolute paths so the output location does not depend on the current working directory
output_dir = os.path.abspath("./output")
Import Errors
pip uninstall -y pathik
cd pathik && pip install -e .
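When imports fail or pick up a stale copy, check which file Python would actually load and which version pip recorded. This minimal diagnostic uses only the standard library (`importlib.util.find_spec` and `importlib.metadata`, Python 3.8+). `describe_install` is a hypothetical helper.

```python
import importlib.util
from importlib import metadata  # Python 3.8+


def describe_install(dist_name, module_name=None):
    """Return {'origin': file the module would load from (or None),
    'version': installed distribution version (or None)}."""
    module_name = module_name or dist_name
    spec = importlib.util.find_spec(module_name)  # None if not importable
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return {"origin": spec.origin if spec else None, "version": version}
```

After `pip install -e .`, `describe_install("pathik")["origin"]` should point into your cloned checkout rather than `site-packages`.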
PROJECT STRUCTURE
- main.go - CLI interface
- crawler/ - Web crawling logic
- storage/ - File storage handlers
- pathik/ - Python bindings
  - __init__.py - Package setup
  - crawler.py - Go integration
  - simple.py - Python fallback
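The layout pairs a Go-backed crawler (`crawler.py`) with a pure-Python fallback (`simple.py`). The README does not say how one is chosen, so what follows is a hypothetical sketch of that pattern. It prefers the compiled binary when it exists and is executable, and otherwise warns and falls back. `select_backend` and the backend callables are illustrative stand-ins, not Pathik's code.

```python
import os
import warnings
from typing import Callable, Dict, List

Results = Dict[str, Dict[str, str]]            # url -> {"html": path, ...}
Backend = Callable[[List[str], str], Results]  # (urls, output_dir) -> results


def select_backend(binary_path: str, go_backend: Backend,
                   python_backend: Backend) -> Backend:
    """Return go_backend if the binary is usable, else python_backend."""
    if os.path.isfile(binary_path) and os.access(binary_path, os.X_OK):
        return go_backend
    warnings.warn("Go binary not found at %s; using the Python fallback"
                  % binary_path, RuntimeWarning)
    return python_backend
```

Keeping the choice in one place means callers use the same `crawl(urls, output_dir)` signature regardless of which backend runs.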
CONFIGURATION
Configure R2 credentials in storage.go or through environment variables.
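The environment variable names Pathik reads are not listed here, so the sketch below uses hypothetical names (`R2_ACCOUNT_ID`, `R2_ACCESS_KEY_ID`, `R2_SECRET_ACCESS_KEY`, `R2_BUCKET_NAME`). Check `storage.go` for the real ones. The sketch shows the general pattern of loading credentials from the environment and failing early when any are missing. The endpoint format follows Cloudflare's S3-compatible R2 API.

```python
import os
from typing import Mapping, NamedTuple, Optional


class R2Config(NamedTuple):
    account_id: str
    access_key_id: str
    secret_access_key: str
    bucket: str

    @property
    def endpoint(self) -> str:
        # Cloudflare R2's S3-compatible endpoint for an account
        return "https://%s.r2.cloudflarestorage.com" % self.account_id


# Hypothetical variable names; Pathik's actual names may differ
ENV_VARS = {
    "account_id": "R2_ACCOUNT_ID",
    "access_key_id": "R2_ACCESS_KEY_ID",
    "secret_access_key": "R2_SECRET_ACCESS_KEY",
    "bucket": "R2_BUCKET_NAME",
}


def load_r2_config(env: Optional[Mapping[str, str]] = None) -> R2Config:
    """Build an R2Config from `env` (defaults to os.environ).
    Raises ValueError listing every missing or empty variable."""
    env = os.environ if env is None else env
    missing = [var for var in ENV_VARS.values() if not env.get(var)]
    if missing:
        raise ValueError("missing R2 settings: " + ", ".join(missing))
    return R2Config(**{field: env[var] for field, var in ENV_VARS.items()})
```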
LICENSE
MIT License
Download files
Source Distribution
pathik-0.1.0.tar.gz (6.1 kB)
Built Distribution
pathik-0.1.0-py3-none-any.whl (7.2 kB)
File details
Details for the file pathik-0.1.0.tar.gz.
File metadata
- Download URL: pathik-0.1.0.tar.gz
- Upload date:
- Size: 6.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a83ef0b5393c3c5645262e74340304217fe4e8bbdecf1f76a7078d5de7e25dac |
| MD5 | c1ee5788f893a8075ed8b778c8607e8d |
| BLAKE2b-256 | c3b190fb7ac36b794d15af7815c10c2d4d8b4a2d0d6bc915a6d6be122a1445c9 |
File details
Details for the file pathik-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pathik-0.1.0-py3-none-any.whl
- Upload date:
- Size: 7.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0ce50671c75b4fa98734f4b97e50ba490561d18e15c8846a5a06881f35f47d1d |
| MD5 | cb3133b7d1e3e38ab6a1685879f693b9 |
| BLAKE2b-256 | ec09d0fbaa19cd3454f55a17491d3268289fc4f542c762ded9f3db7c5c081776 |
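To check a downloaded file against the SHA256 digests published above, hash it locally and compare. This minimal sketch uses `hashlib`. `file_digest` and `verify_download` are hypothetical helper names.

```python
import hashlib
import os


def file_digest(path, algorithm="sha256", chunk_size=1 << 16):
    """Hash a file in chunks so large downloads need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# SHA256 digests published for the 0.1.0 release files
EXPECTED_SHA256 = {
    "pathik-0.1.0.tar.gz":
        "a83ef0b5393c3c5645262e74340304217fe4e8bbdecf1f76a7078d5de7e25dac",
    "pathik-0.1.0-py3-none-any.whl":
        "0ce50671c75b4fa98734f4b97e50ba490561d18e15c8846a5a06881f35f47d1d",
}


def verify_download(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 matches the digest for its file name.
    Raises KeyError for a file name with no published digest."""
    return file_digest(path) == expected[os.path.basename(path)]
```

pip can enforce the same check itself through hash-checking mode, using a requirements line such as `pathik==0.1.0 --hash=sha256:<digest>`.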