Pathik - Advanced web path discovery tool

Pathik

A web crawling tool implemented in Go, with Python bindings. Crawled pages are saved to local storage and can optionally be uploaded to Cloudflare R2.

INSTALLATION

Prerequisites

  • Go 1.16+
  • Python 3.6+

Install Python Package

pip install pathik

Clone Repository

git clone https://github.com/yourusername/pathik.git
cd pathik

Install in Development Mode

pip install -e .

BUILDING GO BINARY

Navigate to Pathik Directory

cd pathik

Build Binary Using Script

python build_binary.py

Expected Output:

Building Go binary in /path/to/pathik
Build successful!
Binary located at: /path/to/pathik/pathik_bin
Testing binary...
Binary output: [Help text from binary]

USAGE

Python Usage

Basic Crawling

import pathik
import os

output_dir = os.path.abspath("output_data")
os.makedirs(output_dir, exist_ok=True)

urls = ["https://example.com"]
results = pathik.crawl(urls, output_dir)

for url, files in results.items():
    print(f"URL: {url}")
    print(f"HTML: {files['html']}")
    print(f"Markdown: {files['markdown']}")
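The loop above assumes every URL produced both files. A minimal sketch of a defensive check follows. It assumes `results` has the shape shown above (each URL mapped to a dict with `html` and `markdown` file paths); `missing_outputs` is a hypothetical helper for illustration, not part of pathik.

```python
import os


def missing_outputs(results, keys=("html", "markdown")):
    """Return {url: [keys with no file on disk]} for incomplete results.

    Assumes `results` maps each URL to a dict of file paths, as in the
    example above. Illustrative helper, not part of pathik.
    """
    problems = {}
    for url, files in results.items():
        # A key counts as missing if it is absent, empty, or points to a
        # path that does not exist on disk.
        bad = [k for k in keys
               if not files.get(k) or not os.path.isfile(files[k])]
        if bad:
            problems[url] = bad
    return problems
```

Calling `missing_outputs(results)` right after `pathik.crawl(...)` gives an empty dict when every expected file was written.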

R2 Upload (Optional)

results = pathik.crawl_to_r2(
    ["https://example.com"],
    uuid_str="my-id"
)

for url, info in results.items():
    print(f"R2 HTML Key: {info['r2_html_key']}")
    print(f"Local File: {info['local_html_file']}")
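If the R2 keys are needed later, it may help to save them instead of only printing them. The sketch below assumes each entry carries an `r2_html_key` field, as in the example above; `r2_manifest` is a hypothetical helper, not a pathik function.

```python
import json


def r2_manifest(results):
    """Serialize {url: r2_html_key} as JSON, skipping entries without a key.

    Assumes the result shape shown above. Illustrative helper only.
    """
    keys = {url: info["r2_html_key"]
            for url, info in results.items()
            if info.get("r2_html_key")}
    # sort_keys keeps the output stable across runs.
    return json.dumps(keys, indent=2, sort_keys=True)
```

The returned string can be written to a file next to the local output.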

Direct Go Usage

Local Crawling

./pathik_bin -crawl -outdir ./output https://example.com

R2 Upload

./pathik_bin -r2 -uuid my-id -dir ./output https://example.com
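The local crawl command can also be driven from a script. The following is a sketch that uses only the `-crawl` and `-outdir` flags shown above. The function names and the timeout value are assumptions for illustration, and this is not necessarily how the Python bindings call the binary.

```python
import subprocess


def crawl_command(binary, urls, outdir):
    """Build the argument list for a local crawl, mirroring the CLI call above."""
    if not urls:
        raise ValueError("at least one URL is required")
    return [binary, "-crawl", "-outdir", outdir, *urls]


def run_crawl(binary, urls, outdir, timeout=300):
    """Run the binary and return its stdout.

    Raises subprocess.CalledProcessError on a non-zero exit status.
    The 300-second timeout is an arbitrary illustrative default.
    """
    completed = subprocess.run(
        crawl_command(binary, urls, outdir),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
        check=True,
        timeout=timeout,
    )
    return completed.stdout
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain `&` or `?`.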

TROUBLESHOOTING

Missing Binary

cd pathik
python build_binary.py
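Before rebuilding, it can help to confirm whether the binary is really missing. The sketch below looks for `pathik_bin` (the name printed by the build script above) in caller-supplied directories and then on the system `PATH`. Where pathik itself searches is not documented here, so the search locations are an assumption.

```python
import os
import shutil


def find_binary(search_dirs, name="pathik_bin"):
    """Return the absolute path of the first executable `name` found, or None."""
    for directory in search_dirs:
        candidate = os.path.abspath(os.path.join(directory, name))
        # The file must exist and be executable by the current user.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # Fall back to the system PATH; returns None when nothing is found.
    return shutil.which(name)
```

For example, `find_binary(["."])` run from the pathik directory returns the path of the built binary, or `None` if the build has not been run.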

Path Issues

# Use absolute paths
import os
output_dir = os.path.abspath("./output")

Import Errors

pip uninstall -y pathik
cd pathik && pip install -e .

PROJECT STRUCTURE

  • main.go - CLI interface
  • crawler/ - Web crawling logic
  • storage/ - File storage handlers
  • pathik/ - Python bindings
      • __init__.py - Package setup
      • crawler.py - Go integration
      • simple.py - Python fallback

CONFIGURATION

Configure R2 credentials in storage.go or through environment variables.
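When using environment variables, a quick check before an R2 upload can catch missing credentials early. The variable names below are assumptions for illustration and are not confirmed by this document; check the storage code for the names it actually reads.

```python
import os

# Hypothetical variable names; replace them with the ones the storage
# code actually reads.
ASSUMED_R2_VARS = (
    "R2_ACCOUNT_ID",
    "R2_ACCESS_KEY_ID",
    "R2_ACCESS_KEY_SECRET",
    "R2_BUCKET_NAME",
)


def missing_r2_vars(env=None, required=ASSUMED_R2_VARS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

An empty list from `missing_r2_vars()` means every assumed variable is set in the current environment.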

LICENSE

MIT License
