A webmining CLI tool & library for Python.
minet is a webmining CLI tool & library for Python. It adopts a lo-fi approach to various webmining problems by letting you perform a variety of actions from the comfort of your command line. No database needed: raw data files will get you going.
minet also exposes its high-level programmatic interface as a library, so you can tweak its behavior at will.
- Multithreaded, memory-efficient fetching from the web.
- Multithreaded, scalable crawling using a comfy DSL.
- Multiprocessed raw text content extraction from HTML pages.
- Multiprocessed scraping from HTML pages using a comfy DSL.
- URL-related heuristics utilities such as extraction, normalization and matching.
- Data collection from various APIs such as CrowdTangle.
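To give a rough idea of what the URL normalization mentioned in the list above involves, here is a minimal sketch using only the Python standard library. It is not minet's implementation or API: the `normalize_url` function and the `TRACKING_PARAMS` set are hypothetical names chosen for illustration, and the rules applied (lowercasing the host, dropping default ports, tracking parameters and fragments, sorting the query string) are assumptions about what a simple normalizer might do.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of tracking parameters to drop (an assumption, not minet's list).
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign"}

# Default ports that carry no information for their scheme.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url):
    """Return a simplified, comparable form of `url` (illustrative only).

    Simplifications: userinfo is discarded and IPv6 literals are not handled.
    """
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()

    # Keep the port only when it differs from the scheme's default.
    port = parts.port
    if port is not None and DEFAULT_PORTS.get(scheme) != port:
        host = f"{host}:{port}"

    # Drop tracking parameters and sort the remaining ones for stable output.
    query = urlencode(sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ))

    # Remove trailing slashes, but keep a bare "/" for an empty path.
    path = parts.path.rstrip("/") or "/"

    # The fragment is dropped: it never reaches the server.
    return urlunsplit((scheme, host, path, query, ""))
```

With a normalizer like this, two URLs that differ only in letter case, tracking parameters, parameter order or fragment map to the same string, which is what makes deduplication and matching over raw data files practical.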
minet can be installed using pip:
pip install minet
To learn how to use minet and understand how it may fit your use cases, you should definitely check out our Cookbook.