Simple tools for downloading, cleaning, extracting and parsing content
snagit
Yet another scraping tool.
snagit lets you scrape multiple pages or documents, either by running script files or by working in the interactive REPL. For instance:
$ snagit
Type "help" for more information. Ctrl+c to exit
> load http://httpbin.org/links/3/{} range='0-2'
> print
<html><head><title>Links</title></head><body>0 <a href='/links/3/1'>1</a> <a href='/links/3/2'>2</a> </body></html>
<html><head><title>Links</title></head><body><a href='/links/3/0'>0</a> 1 <a href='/links/3/2'>2</a> </body></html>
<html><head><title>Links</title></head><body><a href='/links/3/0'>0</a> <a href='/links/3/1'>1</a> 2 </body></html>
> select a
> print
<a href="/links/3/1">1</a>
<a href="/links/3/2">2</a>
<a href="/links/3/0">0</a>
<a href="/links/3/2">2</a>
<a href="/links/3/0">0</a>
<a href="/links/3/1">1</a>
> unwrap_attr a href
> print
/links/3/1
/links/3/2
/links/3/0
/links/3/2
/links/3/0
/links/3/1
> list
LOAD 'http://httpbin.org/links/3/{}' range='0-2'
PRINT
SELECT 'a'
PRINT
UNWRAP_ATTR 'a' 'href'
PRINT
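For readers who want to see roughly what the session above does, here is a minimal Python sketch of the same three steps: expanding the `{}` placeholder over an inclusive range, finding every `a` element, and pulling out its `href`. It is not snagit's implementation. The helper names `expand_range`, `AttrCollector`, and `collect_attr` are hypothetical, and it uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without extra packages. The page bodies are copied from the session output, so nothing is fetched over the network.

```python
from html.parser import HTMLParser


def expand_range(template, spec):
    """Expand a '{}' URL template over an inclusive range such as '0-2'."""
    start, end = (int(part) for part in spec.split("-"))
    return [template.format(i) for i in range(start, end + 1)]


class AttrCollector(HTMLParser):
    """Collect one attribute from every start tag with a given name."""

    def __init__(self, tag, attr):
        super().__init__()
        self.tag = tag
        self.attr = attr
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            value = dict(attrs).get(self.attr)
            if value is not None:
                self.values.append(value)


def collect_attr(pages, tag, attr):
    """Return the attribute values found in each page, in document order."""
    values = []
    for html in pages:
        parser = AttrCollector(tag, attr)
        parser.feed(html)
        parser.close()
        values.extend(parser.values)
    return values


# Same template and range as the `load` command in the session.
urls = expand_range("http://httpbin.org/links/3/{}", "0-2")

# Page bodies copied from the first `print` in the session.
pages = [
    "<html><head><title>Links</title></head><body>0 "
    "<a href='/links/3/1'>1</a> <a href='/links/3/2'>2</a> </body></html>",
    "<html><head><title>Links</title></head><body>"
    "<a href='/links/3/0'>0</a> 1 <a href='/links/3/2'>2</a> </body></html>",
    "<html><head><title>Links</title></head><body>"
    "<a href='/links/3/0'>0</a> <a href='/links/3/1'>1</a> 2 </body></html>",
]

hrefs = collect_attr(pages, "a", "href")
for href in hrefs:
    print(href)
```

The printed paths match the output of the final `print` in the session. In practice the pages would come from an HTTP client such as requests, which is listed in the requirements below.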
Features
Process data as either a text block, lines of text, or HTML (using BeautifulSoup)
Built-in scripting language
REPL for command line interaction
Requirements
Python 3.5+
bs4 (BeautifulSoup 4.x)
requests
strutil
cachely
For testing:
pytest
pytest-cov
Development and Testing
Assumptions: you have pip and virtualenv installed.
$ virtualenv snagit
$ source bin/activate
$ git clone https://github.com/dakrauth/snagit.git
$ cd snagit
$ inv develop
$ inv test
$ inv cov
File details
Details for the file snagit-0.3.0.tar.gz.
File metadata
- Download URL: snagit-0.3.0.tar.gz
- Upload date:
- Size: 13.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d7b03cbd8eccfe492de2e400e7bd23a4ba85d46f4836ef139902040f3c34c920 |
| MD5 | 4be781b4cd3b554e40c1f5c06e423473 |
| BLAKE2b-256 | 02f709f9d3f301932ff8dc13124623fa489f9b3dd8a7374da0d1514d98ea4769 |
File details
Details for the file snagit-0.3.0-py3-none-any.whl.
File metadata
- Download URL: snagit-0.3.0-py3-none-any.whl
- Upload date:
- Size: 16.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 348fde452098dc7deb024778e081128d15d176d4fff6e4cc48e7ae2d038a4b2f |
| MD5 | d73cd1848693930067b5fa4db31656dd |
| BLAKE2b-256 | 55d88db31f93564a87dcaff628574fe1334ce41f208356ba3c84548a3e363f09 |