
A CLI tool to recursively download specific file types from websites (e.g., pdf, txt).


📚 Link Miner

Linkminer is a command-line tool that recursively crawls a website and downloads files of specified types (e.g., PDFs, videos, documents). Originally built to fetch KCSE past papers, it's now a general-purpose file scraper.
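Crawling a site starts by collecting the links on each page and resolving relative ones against the page URL. Below is a minimal sketch of that step using only the standard library. `LinkCollector` and `extract_links` are hypothetical names, and the actual tool may use a different HTML parser.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href="..."> tag on a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links (e.g. "math.pdf") against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list[str]:
    """Return the absolute link targets found in an HTML document."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

For example, on a page at `https://example.com/kcse/`, the link `/papers/2021.pdf` resolves to `https://example.com/papers/2021.pdf`.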


🚀 Features

  • 🎯 Download specific file types (.pdf, .mp4, .docx, etc.)
  • 🔁 Recursive crawling with configurable depth
  • ⚙ Supports config files and CLI arguments
  • 💾 Skips files that already exist
  • 🎨 ASCII banner + colorized output for a better UX
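Filtering by file type comes down to comparing the extension of each link's path against the requested list. The sketch below ignores query strings and letter case, and it accepts types written with or without a leading dot. `wanted_extension` is a hypothetical helper, not part of Link Miner's documented interface.

```python
import posixpath
from urllib.parse import unquote, urlparse


def wanted_extension(url: str, types: list[str]) -> bool:
    """Return True if the URL's path ends in one of the requested extensions."""
    # Look only at the path, so "?dl=1" or "#page=2" does not hide the extension.
    path = unquote(urlparse(url).path)
    ext = posixpath.splitext(path)[1].lower().lstrip(".")
    normalized = {t.lower().lstrip(".") for t in types}
    return bool(ext) and ext in normalized
```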

📦 Installation

🔧 From source:

git clone https://github.com/skye-cyber/kcse-fetcher.git
cd kcse-fetcher
pip install .

Or install from PyPI:

pip install linkminer

🧪 Usage

🔹 Basic (CLI only):

python -m kcse_fetcher https://example.com --types pdf mp4 --depth 2
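The `--depth` option bounds how far the crawl follows links. One way to implement that is a breadth-first search that tracks each page's distance from the start URL. The sketch below assumes that depth 0 is the start page, that the crawl stays on the start URL's host, and that `None` means no limit. These are illustrative choices, not confirmed behavior of the tool. `crawl` and the `get_links` callable are hypothetical; passing `get_links` in keeps the depth logic independent of HTTP.

```python
from collections import deque
from typing import Callable, Iterable, Optional
from urllib.parse import urlparse


def crawl(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_depth: Optional[int] = None,
) -> list[str]:
    """Breadth-first crawl restricted to the start URL's host.

    max_depth=None means no limit. Returns pages in visit order.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if max_depth is not None and depth >= max_depth:
            continue  # at the limit: visit this page but don't follow its links
        for link in get_links(url):
            # Skip off-site links and pages already queued.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

In a real crawler, `get_links` would fetch the page and return its resolved `<a href>` targets.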

🔹 Using a config file (config.json):

{
  "url": "https://example.com",
  "types": ["pdf", "mp4"],
  "depth": 3,
  "output": "downloads"
}
Then run:
python -m kcse_fetcher -c config.json
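Because the tool accepts both a config file and CLI arguments, it has to combine the two somehow. The README does not say which source wins. The sketch below assumes built-in defaults are overridden by the config file, which is in turn overridden by any CLI value the user explicitly passes. `load_settings` and the `DEFAULTS` values are hypothetical.

```python
import json
from typing import Any, Optional

# Hypothetical fallbacks used when neither the config file nor the CLI sets a value.
DEFAULTS = {"url": None, "types": [], "depth": None, "output": "downloads"}


def load_settings(config_path: Optional[str], cli: dict[str, Any]) -> dict[str, Any]:
    """Merge settings with precedence: defaults < config file < explicit CLI values."""
    settings = dict(DEFAULTS)
    if config_path:
        with open(config_path, encoding="utf-8") as fh:
            settings.update(json.load(fh))
    # Only options the user actually supplied (non-None) override the file.
    settings.update({k: v for k, v in cli.items() if v is not None})
    return settings
```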

⚙ Options

Option       Description
url          Starting URL to crawl
--types      File extensions to download
--depth      Max recursion depth (None = no limit)
--output     Output directory
--config     Path to JSON config file
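The options listed here map naturally onto Python's `argparse`. The parser below is a sketch shaped after that list and the usage examples, not the tool's actual source. `url` is optional so that `-c config.json` works without it, and the short `-c` flag comes from the config-file example. `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    p = argparse.ArgumentParser(prog="kcse_fetcher")
    p.add_argument("url", nargs="?", help="Starting URL to crawl")
    p.add_argument("--types", nargs="+", help="File extensions to download")
    p.add_argument("--depth", type=int, default=None,
                   help="Max recursion depth (omit for no limit)")
    p.add_argument("--output", help="Output directory")
    p.add_argument("-c", "--config", help="Path to JSON config file")
    return p
```

Parsing `https://example.com --types pdf mp4 --depth 2` yields `url="https://example.com"`, `types=["pdf", "mp4"]`, and `depth=2`. Options left unset stay `None`, so a config file can fill them in.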


🛠 Example Output

[i] Crawling: https://example.com
[i] File types: pdf, mp4
[i] Output dir: downloads
[i] Depth limit: 2

[] Downloaded: kcse_2021_english.pdf
[] Downloaded: kcse_2021_kiswahili.pdf
[] Skipped (exists): kcse_2021_chemistry.pdf
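The "Skipped (exists)" line reflects a check for an existing file before downloading. A minimal sketch of that check follows. It assumes files are named after the last segment of the URL path, and it writes to a temporary `.part` name first so that a write cut short is not mistaken for a finished file on the next run. `save_file` and `http_fetch` are hypothetical helpers. The `fetch` argument is injectable so the logic can be exercised offline.

```python
import os
import posixpath
import urllib.request
from typing import Callable
from urllib.parse import unquote, urlparse


def http_fetch(url: str) -> bytes:
    """Fetch a URL's body with the standard library."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def save_file(url: str, out_dir: str, fetch: Callable[[str], bytes] = http_fetch) -> str:
    """Save url into out_dir unless a file with that name already exists.

    Returns "downloaded" or "skipped" so the caller can log either case.
    """
    name = posixpath.basename(unquote(urlparse(url).path))
    if not name:
        raise ValueError(f"URL has no file name: {url}")
    dest = os.path.join(out_dir, name)
    if os.path.exists(dest):
        return "skipped"
    os.makedirs(out_dir, exist_ok=True)
    data = fetch(url)
    # Write under a temporary name, then rename, so only complete files get `dest`.
    tmp = dest + ".part"
    with open(tmp, "wb") as fh:
        fh.write(data)
    os.replace(tmp, dest)
    return "downloaded"
```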

📘 License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.

See the LICENSE file for details.


💡 Author

Skye - Wambua

  • Made with 💻 and ☕ in Kenya



Download files


Source Distribution

linkminer-1.0.0.tar.gz (46.3 kB)

Built Distribution


linkminer-1.0.0-py3-none-any.whl (34.6 kB)

File details

Details for the file linkminer-1.0.0.tar.gz.

File metadata

  • Download URL: linkminer-1.0.0.tar.gz
  • Size: 46.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for linkminer-1.0.0.tar.gz
Algorithm Hash digest
SHA256 142b186433d7fd0ffccb5cf99b4ca46e14503800ff7fd72a0278467a8bd2c261
MD5 a4a5b34a22b4b1333b0cc5f931264587
BLAKE2b-256 2d03c76087ff404a4c571c68136e1d337772c9187335fffc6e552c3665d7e542


File details

Details for the file linkminer-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: linkminer-1.0.0-py3-none-any.whl
  • Size: 34.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for linkminer-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ebbcc6dfdc295097b965341f4db3126b629cf102465681e3ed21a274b1d68b13
MD5 7efd9ba343963feb1fe4e18b280f7f32
BLAKE2b-256 7dc61ea8e74479ab554e330c3197f9de5af605c59e6e794a8732601d672e7920

