A webmining CLI tool & library for Python.

Minet

minet is a webmining CLI tool & library for Python. It adopts a lo-fi approach to various webmining problems by letting you perform a variety of actions from the comfort of your command line. No database needed: raw data files will get you going.

In addition, minet also exposes its high-level programmatic interface as a library so you can tweak its behavior at will.

Features

  • Multithreaded, memory-efficient fetching from the web.
  • Multithreaded, scalable crawling using a comfy DSL.
  • Multiprocessed raw text content extraction from HTML pages.
  • Multiprocessed scraping from HTML pages using a comfy DSL.
  • URL-related heuristics utilities such as extraction, normalization and matching.
  • Data collection from various APIs such as CrowdTangle.
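
To make the "URL normalization and matching" item above more concrete, here is a minimal sketch of the general idea using only the Python standard library. It is not minet's implementation and does not use minet's API: the names normalize_url, same_page and TRACKING_PARAMS are hypothetical, and the rules shown (lowercasing, dropping default ports, fragments, trailing slashes and a few tracking parameters) are assumptions chosen for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of query parameters to drop; not taken from minet.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"}

# Default ports that can be omitted without changing the target.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url):
    """Return a simplified form of `url`, suitable for deduplication (illustrative only)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()

    # Keep an explicit port only when it differs from the scheme's default.
    # Note: any user:password information in the URL is dropped here.
    port = parts.port
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"

    # Drop tracking parameters and sort the rest so their order does not matter.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    query = urlencode(sorted((k, v) for k, v in pairs if k not in TRACKING_PARAMS))

    # Treat "/a/" and "/a" as the same path, and an empty path as "/".
    path = parts.path.rstrip("/") or "/"

    # The fragment is discarded because it is not sent to the server.
    return urlunsplit((scheme, netloc, path, query, ""))


def same_page(a, b):
    """Match two URLs by comparing their normalized forms."""
    return normalize_url(a) == normalize_url(b)
```

Under these assumed rules, same_page("https://example.com", "https://EXAMPLE.com/#x") is true, since both URLs reduce to the same normalized string. Real-world normalization needs many more cases than this sketch covers.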

Installation

minet can be installed using pip:

pip install minet

Cookbook

To learn how to use minet and understand how it may fit your use cases, you should definitely check out our Cookbook.

Files for minet, version 0.31.0

Filename                        Size      File type   Python version
minet-0.31.0-py3-none-any.whl   88.6 kB   Wheel       py3
minet-0.31.0.tar.gz             59.0 kB   Source      None
