Combine XPath, CSS selectors, and JSONPath for web data extraction.
Quickstarts
Installation
Install the stable version from PyPI:
pip install data-extractor
Or install the latest version from GitHub:
pip install git+https://github.com/linw1995/data_extractor.git@master
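After installing, one way to confirm which release is present is to query the distribution metadata. The following is a minimal sketch that uses only the standard library (`importlib.metadata`, which assumes Python 3.8 or newer); the helper name `installed_version` is hypothetical and not part of data_extractor.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None.

    `dist_name` is the name used with pip, e.g. "data-extractor".
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    found = installed_version("data-extractor")
    print(found if found is not None else "data-extractor is not installed")
```

Returning `None` instead of raising keeps the check usable in scripts that only want to branch on whether the package is available.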
Usage
from data_extractor import Field, Item, JSONExtractor


class Count(Item):
    followings = Field(JSONExtractor("countFollowings"))
    fans = Field(JSONExtractor("countFans"))


class User(Item):
    name_ = Field(JSONExtractor("name"), name="name")
    age = Field(JSONExtractor("age"), default=17)
    count = Count()


assert User(JSONExtractor("data.users[*]"), is_many=True).extract(
    {
        "data": {
            "users": [
                {
                    "name": "john",
                    "age": 19,
                    "countFollowings": 14,
                    "countFans": 212,
                },
                {
                    "name": "jack",
                    "description": "",
                    "countFollowings": 54,
                    "countFans": 312,
                },
            ]
        }
    }
) == [
    {"name": "john", "age": 19, "count": {"followings": 14, "fans": 212}},
    {"name": "jack", "age": 17, "count": {"followings": 54, "fans": 312}},
]
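In the usage example, jack has no "age" key, so the result falls back to the field's default of 17. The idea behind that fallback can be sketched with the standard library alone. This is an illustration of the concept under stated assumptions, not data_extractor's implementation: `first_or_default` and `_MISSING` are hypothetical names, and `LookupError` stands in for the library's ExtractError.

```python
# Stdlib-only illustration of default fallback; NOT data_extractor's code.
# All names below are hypothetical.

_MISSING = object()  # sentinel, so that None can still be a valid default


def first_or_default(record, key, default=_MISSING):
    """Return record[key], or `default` when the key is absent.

    Raises LookupError when the key is absent and no default was given
    (a stand-in for the library's ExtractError).
    """
    if key in record:
        return record[key]
    if default is _MISSING:
        raise LookupError(f"no value for {key!r}")
    return default


users = [
    {"name": "john", "age": 19},
    {"name": "jack", "description": ""},
]

# john keeps his own age; jack receives the default.
ages = [first_or_default(user, "age", default=17) for user in users]
```

A dedicated sentinel object is used instead of `None` so that a caller can pass `default=None` deliberately and still get an error when no default is supplied at all.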
Changelog
v0.5.4
9552c79 Fix: Simplified item’s extract_first method fails to raise ExtractError
08167ab Fix: Simplified item’s extract_first method should support param default
6e4c269 New: More unit tests for the simplified items
a35b85a Chg: Update poetry.lock
e5ff37b Docs, Chg: Update travis-ci status source in the README.rst