Helpers to fetch & parse text on pages with requests, lxml, & beautifulsoup4
Project description
Install
Install system requirements for lxml
% sudo apt-get install -y libxml2 libxslt1.1 libxml2-dev libxslt1-dev zlib1g-dev
or
% brew install libxml2
Install with pip
% pip3 install parse-helper
Optionally install ipython with ``pip3 install ipython`` to enable the ``ph-soup-explore`` command.
Usage
The package provides the ph-ddg, ph-download-files, ph-download-file-as, and ph-soup-explore scripts.
$ venv/bin/ph-ddg --help
Usage: ph-ddg [OPTIONS] [QUERY]

  Pass a search query to duckduckgo api

Options:
  --help  Show this message and exit.

$ venv/bin/ph-download-files --help
Usage: ph-download-files [OPTIONS] [ARGS]...

  Download all links to local files

  - args: urls or filenames containing urls

Options:
  --help  Show this message and exit.

$ venv/bin/ph-download-file-as --help
Usage: ph-download-file-as [OPTIONS] URL [LOCALFILE]

  Download link to local file

  - url: a string
  - localfile: a string

Options:
  --help  Show this message and exit.

$ venv/bin/ph-soup-explore --help
Usage: ph-soup-explore [OPTIONS] [URL_OR_FILE]

  Create a soup object from a url or file and explore with ipython

Options:
  --help  Show this message and exit.
In [1]: import parse_helper as ph
In [2]: ph.USER_AGENT
Out[2]: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/58.0.3029.110 Chrome/58.0.3029.110 Safari/537.36'
In [3]: ph.duckduckgo_api('adventure time')
2019-08-27 06:21:05,303: Fetching JSON from https://api.duckduckgo.com?q=adventure+time&format=json
Out[3]:
[{'text': 'Adventure Time An American animated television series created by Pendleton Ward for Cartoon Network.',
'thumbnail': 'https://duckduckgo.com/i/fb8f17fd.png',
'link': 'https://duckduckgo.com/Adventure_Time'},
{'text': '"Adventure Time" (pilot) An animated short created by Pendleton Ward, as well as the pilot to the Cartoon Network series...',
'thumbnail': 'https://duckduckgo.com/i/aa9b49e0.png',
'link': 'https://duckduckgo.com/Adventure_Time_(pilot)'},
{'text': "Adventure Time (1959 TV series) A local children's television show on WTAE-TV 4 in Pittsburgh, Pennsylvania, from 1959 to 1975.",
'thumbnail': '',
'link': 'https://duckduckgo.com/Adventure_Time_(1959_TV_series)'},
{'text': "Adventure Time (1967 TV series) A Canadian children's adventure television series which aired on CBC Television in 1967 and 1968.",
'thumbnail': '',
'link': 'https://duckduckgo.com/Adventure_Time_(1967_TV_series)'},
{'text': 'Adventure Time (album) The second album for the rock/pop trio The Elvis Brothers.',
'thumbnail': '',
'link': 'https://duckduckgo.com/Adventure_Time_(album)'}]
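The session above returns a list of dicts with ``text``, ``thumbnail``, and ``link`` keys, and logs the URL it fetched. The following is a minimal sketch of working with data of that shape, using only the standard library. The helpers ``build_query_url`` and ``with_thumbnails`` are hypothetical names introduced for illustration and are not part of parse_helper; the URL construction is an assumption that simply reproduces the logged URL, and may not match how the package builds it internally.

```python
from urllib.parse import urlencode


def build_query_url(query):
    """Hypothetical helper: reproduce the URL shown in the log line above."""
    # urlencode uses quote_plus by default, so spaces become '+'
    return "https://api.duckduckgo.com?" + urlencode({"q": query, "format": "json"})


def with_thumbnails(results):
    """Hypothetical helper: keep only results whose 'thumbnail' is non-empty."""
    return [r for r in results if r.get("thumbnail")]


# Two entries copied from the example output above, standing in for the
# return value of ph.duckduckgo_api('adventure time')
sample_results = [
    {
        "text": "Adventure Time An American animated television series "
                "created by Pendleton Ward for Cartoon Network.",
        "thumbnail": "https://duckduckgo.com/i/fb8f17fd.png",
        "link": "https://duckduckgo.com/Adventure_Time",
    },
    {
        "text": "Adventure Time (album) The second album for the rock/pop "
                "trio The Elvis Brothers.",
        "thumbnail": "",
        "link": "https://duckduckgo.com/Adventure_Time_(album)",
    },
]

if __name__ == "__main__":
    print(build_query_url("adventure time"))
    for result in with_thumbnails(sample_results):
        print(result["link"])
```

In a real session, the list returned by ``ph.duckduckgo_api`` could be passed to ``with_thumbnails`` in place of ``sample_results``.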
Download files
Source Distributions
No source distribution files available for this release.
Built Distribution
File details
Details for the file parse_helper-0.1.22-py3-none-any.whl.
File metadata
- Download URL: parse_helper-0.1.22-py3-none-any.whl
- Upload date:
- Size: 7.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.63.0 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.3 CPython/3.8.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | f93c71865cc1e3ab8014abd18e7c0f10f3b19107c684d1d50734434acccc26fc
MD5 | a9c99a6314ec02a52cfb15de613e0e48
BLAKE2b-256 | 503a9e30fd67ddd8edbde91856ad6ed2d5f2451de8e35593f6e44235910f9eee
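The published digests can be used to check a downloaded wheel before installing it. Below is a minimal sketch using the standard library's ``hashlib``; the helper names ``sha256_of_file`` and ``matches_expected`` are hypothetical, and the local path assumes the wheel was saved in the current directory under its original filename.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed in the table above
EXPECTED_SHA256 = "f93c71865cc1e3ab8014abd18e7c0f10f3b19107c684d1d50734434acccc26fc"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Compare a file's SHA256 digest with the expected hex string."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the wheel was downloaded
    wheel = Path("parse_helper-0.1.22-py3-none-any.whl")
    if wheel.exists():
        print("OK" if matches_expected(wheel) else "MISMATCH")
    else:
        print(f"{wheel} not found; download it first")
```

pip can perform a similar check itself through its hash-checking mode, which may be preferable when installing from a requirements file.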