pquery

grep for HTML; CLI for pyquery

Development Status
- 4 - Beta
Intended Audience
License
- Public Domain
Natural Language
- English
Programming Language
- Python :: 2.7
Topic
- Internet

Project description

## Demo

```
$ curl -s https://github.com/hupili/pquery | pquery '.content a' -p text
.gitignore
LICENSE
MANIFEST.in
README.md
pquery
setup.py
```

`pquery` is intended to integrate into your UNIX pipeline.

## Install

`pip install pquery`

## Syntax

```
Usage:
  pquery <selector>
  pquery <selector> -p <projector>
  pquery <selector> -f <format_string>
  pquery -h | --help

Options:
  -p: project the dict onto field `<projector>`.
  -f: equivalent of `<format_string>.format(item)`,
      where item is the dict form of one selected HTML element.
  -h | -v: shows this doc.

Dict keys:
  'tag': The HTML tag
  'html': Inner HTML of the element
  'text': Inner text of the element
  ...: [optional] Other attributes: e.g. 'href'
```
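
To make the `-p` and `-f` options more concrete, here is a minimal Python sketch of what they do to the dict form of one element. The sample dict and the helper names `project` and `render` are made up for illustration and are not part of `pquery`. The sketch follows the usage text literally and passes the dict as a single positional argument to `str.format`; whether `pquery` does exactly this internally is an assumption.

```python
# Hypothetical dict form of one selected element, e.g. <a href="/docs">Docs</a>.
# The keys follow the "Dict keys" list above; the values are invented.
item = {"tag": "a", "html": "Docs", "text": "Docs", "href": "/docs"}


def project(item, projector):
    """Sketch of -p: pick a single field out of the element dict."""
    return item[projector]


def render(item, format_string):
    """Sketch of -f, read literally as format_string.format(item).

    Assumption: the dict is passed positionally, so fields are
    addressed as {0[key]} inside the format string.
    """
    return format_string.format(item)


print(project(item, "text"))
print(render(item, "{0[text]} -> {0[href]}"))
```

Under these assumptions the first call prints `Docs` and the second prints `Docs -> /docs`.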

## Why

`grep` is powerful for **lines**.
HTML is structured, which makes it a poor fit for line-oriented tools.
A CSS selector is a natural grep for HTML.
This script simply wraps [pyquery](http://pyquery.readthedocs.org/en/latest/) to provide a CLI.

## Example 1

A [course webpage](https://class.coursera.org/crypto-008/wiki/LectureSlidesPublicCourse)
lists slides in `pdf` and `pptx` formats.
Suppose you want to download all the PDFs.
The following pipeline saves you some clicks.

```
wget --load-cookies=cookies.txt -O- 'https://class.coursera.org/crypto-008/wiki/LectureSlidesPublicCourse' | pquery a -p href | grep pdf | xargs -P 5 -I{} wget {}
```

Grepping the PDF links directly out of the raw HTML would be tedious.
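
For comparison, the middle of that pipeline (`pquery a -p href | grep pdf`) can be approximated in plain Python. This is only a rough sketch using the standard library's `html.parser`, not how `pquery` itself works (it wraps pyquery). The names `HrefCollector` and `pdf_links` and the sample HTML are hypothetical, and skipping `<a>` elements without an `href` is a choice made here, not documented `pquery` behaviour.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag, roughly like `pquery a -p href`."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:  # assumption: skip anchors without href
                self.hrefs.append(href)


def pdf_links(html):
    """Keep only hrefs containing 'pdf', roughly like `| grep pdf`."""
    collector = HrefCollector()
    collector.feed(html)
    return [h for h in collector.hrefs if "pdf" in h]


# Hypothetical page fragment, invented for illustration.
sample = """
<ul>
  <li><a href="slides/week1.pdf">Week 1 (PDF)</a></li>
  <li><a href="slides/week1.pptx">Week 1 (PPTX)</a></li>
  <li><a name="pdf-section">Anchor without href</a></li>
</ul>
"""

for link in pdf_links(sample):
    print(link)
```

The selector-based one-liner expresses the same idea in a few words, which is the point of having a CLI for it.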


Release history

- 1.4 (this version): Sep 26, 2015
- 1.3: Sep 23, 2015
- 1.2: Sep 23, 2015
- 1.1: Sep 17, 2015
- 1.0: Jan 17, 2014

Download files

Source Distribution

pquery-1.4.tar.gz

- Upload date: Sep 26, 2015
- Size: 3.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No

Hashes for pquery-1.4.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `ea8be57d7064c2c7331c4ccc35d6defd45b6028afc7d122c5a93c0ee3890ae7e` |
| MD5 | `f48d132103b51ebc3fc036fc1259d239` |
| BLAKE2b-256 | `e4f17566c10c8f4aefe62dd0d6ec1618c92fff833948eee7154e722e91aa0c0c` |
