Combine XPath, CSS selectors and JSONPath for web data extraction.
Quickstarts
Installation
Install the stable version from PyPI.
pip install "data-extractor[jsonpath-extractor]" # for extracting JSON data
pip install "data-extractor[lxml]" # for extracting HTML data
Or install the latest version from GitHub.
pip install "data-extractor[jsonpath-extractor] @ git+https://github.com/linw1995/data_extractor.git@master"
Extract JSON data
Extracting JSON data currently requires an optional dependency.
Install one of the supported packages, for example jsonpath-extractor as shown under Installation.
Extract HTML(XML) data
Extracting HTML (XML) data currently relies on the following optional dependencies:
- lxml, installed by the lxml extra shown under Installation
- cssselect, for using CSS selectors
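Because these dependencies are optional, it can help to check which ones are present before choosing an extractor. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical, and the distribution names are the ones named in this README (jsonpath-extractor, lxml, cssselect).

```python
from importlib import metadata


def is_installed(distribution: str) -> bool:
    """Return True if the named distribution is installed in this environment."""
    try:
        metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return False
    return True


# Distribution names as they appear in this README (not import names).
OPTIONAL_DEPENDENCIES = ("jsonpath-extractor", "lxml", "cssselect")

availability = {name: is_installed(name) for name in OPTIONAL_DEPENDENCIES}
for name, present in availability.items():
    print(f"{name}: {'installed' if present else 'missing'}")
```

The check looks up installed distribution metadata rather than importing the packages, so it does not depend on any package's import name.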
Usage
from data_extractor import Field, Item, JSONExtractor


class Count(Item):
    followings = Field(JSONExtractor("countFollowings"))
    fans = Field(JSONExtractor("countFans"))


class User(Item):
    # The attribute is name_, but the extracted key is "name".
    name_ = Field(JSONExtractor("name"), name="name")
    # Falls back to 17 when "age" is missing from the input.
    age = Field(JSONExtractor("age"), default=17)
    count = Count()


assert User(JSONExtractor("data.users[*]"), is_many=True).extract(
    {
        "data": {
            "users": [
                {
                    "name": "john",
                    "age": 19,
                    "countFollowings": 14,
                    "countFans": 212,
                },
                {
                    "name": "jack",
                    "description": "",
                    "countFollowings": 54,
                    "countFans": 312,
                },
            ]
        }
    }
) == [
    {"name": "john", "age": 19, "count": {"followings": 14, "fans": 212}},
    {"name": "jack", "age": 17, "count": {"followings": 54, "fans": 312}},
]
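To make the mapping in the example concrete, here is a hand-written, standard-library-only sketch that produces the same result for this one payload shape. It is not how data_extractor is implemented; the function name `extract_users` is hypothetical and exists only for illustration.

```python
def extract_users(payload):
    """Hand-written equivalent of the User item, for this payload shape only."""
    results = []
    for user in payload["data"]["users"]:  # mirrors the "data.users[*]" path
        results.append(
            {
                "name": user["name"],
                "age": user.get("age", 17),  # mirrors default=17
                "count": {
                    "followings": user["countFollowings"],
                    "fans": user["countFans"],
                },
            }
        )
    return results


payload = {
    "data": {
        "users": [
            {"name": "john", "age": 19, "countFollowings": 14, "countFans": 212},
            {"name": "jack", "description": "", "countFollowings": 54, "countFans": 312},
        ]
    }
}

users = extract_users(payload)
print(users)
```

The declarative version keeps the paths, the output key names and the defaults in one place per field, whereas the manual loop mixes them with traversal logic.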
Changelog
v1.0.1
Build
Supports Python 3.13
Contributing
Environment Setup
Clone the source code from GitHub.
git clone https://github.com/linw1995/data_extractor.git
cd data_extractor
Set up the development environment. Make sure the pdm, pre-commit and nox CLIs are installed in your environment.
make init
make PYTHON=3.7 init # for a specific Python version
Linting
Use pre-commit to install the linters that enforce a consistent code style.
make pre-commit
Run the linters. Some of them run via the nox CLI, so make sure it is installed.
make check-all
Testing
Run quick tests.
make
Run quick tests with verbose output.
make vtest
Run tests with coverage. Testing in multiple Python environments is powered by the nox CLI.
make cov