CLI interpreter for xpath and css selectors
About
parselcli is a command line interface wrapper for the parsel package that evaluates CSS and XPath selectors in real time against web URLs or local HTML files.
Parsel is a library for extracting data from HTML and XML using XPath and CSS selectors.
Usage
$ parsel --help
Usage: parsel [OPTIONS] [URL]
Interactive shell for css and xpath selectors
Options:
  -h TEXT                         request headers, e.g. -h "user-agent=cat
                                  bot"
  -xpath                          start in xpath mode instead of css
  -p, --processors TEXT           comma separated processors: {}
  -f, --file FILENAME             input from html file instead of url
  -c TEXT                         compile css and return it
  -x TEXT                         compile xpath and return it
  --cache                         cache requests
  --config TEXT                   config file [default:
                                  /home/dex/.config/parsel.toml]
  --embed                         start in embedded python shell
  --shell [ptpython|ipython|bpython|python]
                                  preferred embedded shell; default auto
                                  resolve in order
  --help                          Show this message and exit.
parselcli reads an XML or HTML document from a URL or from disk and starts an interpreter for XPath or CSS selectors.
By default it starts in CSS mode; it can be switched to XPath mode with the -xpath command and back to CSS with the -css command.
The interpreter also offers autocompletion and suggestions for selectors [in progress].
It also supports commands and can embed a python, ptpython, ipython or bpython shell.
Commands are called with the - prefix. The list of available commands is shown by the -help command (see the Example section).
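The --shell option lists ptpython, ipython, bpython and python and says the default is resolved "in order". A minimal sketch of such a fallback, assuming the order is exactly the one in the option list and that a shell counts as available when its module can be imported; `resolve_shell`, `DEFAULT_ORDER` and `embed_python` are hypothetical names, not parselcli's API:

```python
import code
import importlib.util
from typing import Callable, Optional

# Assumed preference order, copied from the --shell choices (not from parselcli's source).
DEFAULT_ORDER = ("ptpython", "ipython", "bpython")
# The import name differs from the option name for IPython.
MODULES = {"ptpython": "ptpython", "ipython": "IPython", "bpython": "bpython"}


def is_installed(module: str) -> bool:
    """True when the module can be imported in the current environment."""
    return importlib.util.find_spec(module) is not None


def resolve_shell(preferred: Optional[str] = None,
                  available: Callable[[str], bool] = is_installed) -> str:
    """Pick the preferred shell if installed, else the first installed one, else plain python."""
    if preferred == "python":
        return "python"
    candidates = list(DEFAULT_ORDER)
    if preferred:
        # Try the user's choice first, then fall back to the default order.
        candidates.insert(0, preferred)
    for name in candidates:
        if available(MODULES[name]):
            return name
    # The standard library REPL needs no extra package.
    return "python"


def embed_python(namespace: dict) -> None:
    """Fallback: start the built-in REPL with the given variables in scope."""
    code.interact(banner="", local=namespace)
```

For example, calling `embed_python({"sel": sel, "response": response})` from inside such a tool would expose the same names that appear in the -embed session of the Example section.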
Processors and Commands
parselcli supports flags and commands in the shell:
$ parsel "https://github.com/granitosaurus/parsel-cli"
> -help
available commands (use -command):
help: show help
debug: show debug info
embed: start interactive python shell
open: open current url in browser tab
view: open current html in browser tab
fetch: download from new url
css: switch to css selectors
xpath: switch to xpath selectors
available flags (use +flag to enable and -flag to disable)
strip: strip every element of trailing and leading spaces
first: take first element when there's only one
collapse: collapse lists when only 1 element
absolute: convert relative urls to absolute
join: join results into one
len: return length of results
Processors can be activated with the + prefix and deactivated with the - prefix. They can be supplied inline after a selector:
> h1::text +strip
['parsel-cli']
or activated for the whole session:
> +strip
enabled flag: strip
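One way to model the inline versus session behaviour: trailing +flag/-flag tokens are peeled off the end of the line; a line with a selector applies them to that query only, while a line containing nothing but flags updates the session defaults. This is a hypothetical sketch, not parselcli's parser. `split_flags`, `handle_line` and `KNOWN_FLAGS` are illustrative names, and only known flag names are treated as flags so that commands such as -xpath and CSS combinators such as `h1 + p` are left alone.

```python
import re
from typing import Optional, Set, Tuple

# Flag names taken from the -help listing above.
KNOWN_FLAGS = {"strip", "first", "collapse", "absolute", "join", "len"}
FLAG_TOKEN = re.compile(r"^([+-])([a-z]+)$")


def split_flags(line: str) -> Tuple[str, Set[str], Set[str]]:
    """Strip trailing +flag/-flag tokens; return (selector, enabled, disabled)."""
    selector = line.strip()
    enabled: Set[str] = set()
    disabled: Set[str] = set()
    while selector:
        head, _, last = selector.rpartition(" ")
        match = FLAG_TOKEN.match(last)
        if not match or match.group(2) not in KNOWN_FLAGS:
            break
        sign, name = match.groups()
        (enabled if sign == "+" else disabled).add(name)
        selector = head.rstrip()
    return selector, enabled, disabled


def handle_line(line: str, session: Set[str]) -> Tuple[Optional[str], Set[str]]:
    """Return (selector or None, flags in effect); flag-only lines mutate `session`."""
    selector, enabled, disabled = split_flags(line)
    if not selector:
        # "+strip" on its own: change the defaults for the rest of the session.
        session |= enabled
        session -= disabled
        return None, set(session)
    # "h1::text +strip": apply the flags to this query only.
    return selector, (session | enabled) - disabled
```

With a session of `{"collapse"}`, `handle_line("h1::text +strip", session)` yields `("h1::text", {"collapse", "strip"})` and leaves the session unchanged, whereas `handle_line("+strip", session)` adds `strip` to it.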
Commands are called as is; some take a positional argument:
> -fetch "http://some-other-url.com"
downloading "http://some-other-url.com"
> -view
opening document in browser
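Commands with a positional argument can be parsed with shell-style quoting. A minimal sketch, assuming arguments follow shell quoting rules (handled here by the standard library's `shlex`); `parse_command` and the `Session` class are hypothetical, and `Session` only records state rather than downloading anything:

```python
import shlex
from typing import List, Optional, Tuple

# Command names from the -help listing above.
COMMANDS = {"help", "debug", "embed", "open", "view", "fetch", "css", "xpath"}


def parse_command(line: str) -> Optional[Tuple[str, List[str]]]:
    """Return (name, args) for lines like '-fetch "url"', or None for anything else."""
    try:
        tokens = shlex.split(line)
    except ValueError:
        # Unbalanced quotes: not a command, treat the line as a selector.
        return None
    if not tokens or not tokens[0].startswith("-"):
        return None
    name = tokens[0][1:]
    if name not in COMMANDS:
        # e.g. "-strip" is a flag, not a command.
        return None
    return name, tokens[1:]


class Session:
    """Hypothetical shell state: current selector mode and current url."""

    def __init__(self) -> None:
        self.mode = "css"
        self.url: Optional[str] = None

    def run(self, line: str) -> Optional[str]:
        parsed = parse_command(line)
        if parsed is None:
            return None  # not a command; would be evaluated as a selector
        name, args = parsed
        if name in ("css", "xpath"):
            self.mode = name
            return f"switched to {name}"
        if name == "fetch":
            if not args:
                return "fetch needs a url"
            self.url = args[0]
            return f'downloading "{self.url}"'
        return f"{name}: not implemented in this sketch"
```

Using `shlex` means quoted URLs lose their quotes and an XPath expression with unbalanced quotes simply falls through to selector evaluation instead of raising.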
Example
$ parsel "https://github.com/granitosaurus/parsel-cli"
> h1::text
['\n ', '\n ', '\n\n', 'parsel-cli']
> +join +strip
enabled flag: join
enabled flag: strip
> h1::text
parsel-cli
> h1::text +len
4
> -xpath
switched to xpath
> //h1/text()
parsel-cli
> -css
switched to css
> -embed
>>> locals()
{'sel': <Selector xpath=None data='<html lang="en">\n <head>\n <meta char'>, 'response': <Response [200]>, 'request': <PreparedRequest [GET]>, '_': {...}, '_1': {...}}
>>> response
<Response [200]>
>>>
> -debug
200-https://github.com/granitosaurus/parsel-cli
enabled processors:
Join
Strip
> -help
available commands (use -command):
help: show help
debug: show debug info
embed: start interactive python shell
open: open current url in browser tab
view: open current html in browser tab
fetch: download from new url
css: switch to css selectors
xpath: switch to xpath selectors
available flags (use +flag to enable and -flag to disable)
strip: strip every element of trailing and leading spaces
first: take first element when there's only one
collapse: collapse lists when only 1 element
absolute: convert relative urls to absolute
join: join results into one
len: return length of results
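The outputs in the session above can be reproduced by a small processor pipeline. This is a sketch, not parselcli's implementation: `apply_processors` is hypothetical, `join` is assumed to concatenate with an empty string, `first` and `collapse` are treated identically (both unwrap a single-element result), and `len` is assumed to count results before joining. That ordering is what matches the example, where `h1::text +len` prints 4 even though `join` is enabled.

```python
from typing import List, Set, Union
from urllib.parse import urljoin

Result = Union[str, int, List[str]]


def apply_processors(results: List[str], flags: Set[str], base_url: str = "") -> Result:
    """Apply enabled processors in an assumed fixed order."""
    if "strip" in flags:
        # Trim leading and trailing whitespace from every element.
        results = [r.strip() for r in results]
    if "absolute" in flags:
        # Resolve relative urls against the page url.
        results = [urljoin(base_url, r) for r in results]
    if "len" in flags:
        # Report how many results matched, before any joining.
        return len(results)
    if "join" in flags:
        # Concatenate all results into one string.
        return "".join(results)
    if ("first" in flags or "collapse" in flags) and len(results) == 1:
        # Unwrap a single-element list.
        return results[0]
    return results
```

For example, `apply_processors(['\n ', '\n ', '\n\n', 'parsel-cli'], {"join", "strip"})` returns `'parsel-cli'`, and adding `"len"` to the flags returns `4`, as in the transcript.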
Install
pip install parselcli
or install from GitHub:
pip install --user git+https://github.com/Granitosaurus/parsel-cli@v0.3.0
Config
parselcli can be configured via a TOML configuration file located at $XDG_CONFIG_HOME/parsel.toml (usually ~/.config/parsel.toml):
# default processors (the +flags)
processors = [ "collapse", "strip",]
# where ptpython history is located
history_file_css = "/home/user/.cache/parsel/history_css"
history_file_xpath = "/home/user/.cache/parsel/history_xpath"
[requests]
# expiry for cached responses when using the --cache flag
cache_expire = 86400
# where the sqlite cache file is stored
cache_dir = "/home/user/.cache/parsel/requests.cache"
[requests.headers]
# headers sent with every request, e.g. to avoid bot detection
User-Agent = "parselcli web inspector"
# e.g. to identify as Chrome on Windows use:
# User-Agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36"