sense_text_extractor
Project description
sense-text-extractor
sense-text-extractor is a client library for extracting the main text (article body) of web pages.
Installation (current version: 0.0.1)
pip install sense-text-extractor
Usage guide
Calling via a label configured in sense-core's settings.ini:
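from sense_text_extractor import SenseTextExtractor
# label='text_extractor' refers to a label configured in sense-core's settings.ini
extractor = SenseTextExtractor(label='text_extractor')
# Arguments: the page URL and a string that appears to be the article title ("Mourinho is waiting for a comeback")
text = extractor.extract_text("http://sports.sina.com.cn/g/pl/2019-01-11/doc-ihqhqcis5048507.shtml", "穆里尼奥在等待复出")
print(text)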
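Calling with an explicit host and port:
from sense_text_extractor import SenseTextExtractor
# Connect directly to an extractor server by host and port
extractor = SenseTextExtractor('52.83.143.61', '6681')
text = extractor.extract_text("http://sports.sina.com.cn/g/pl/2019-01-11/doc-ihqhqcis5048507.shtml", "穆里尼奥在等待复出")
print(text)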
Notes
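extract_text may raise exceptions, so callers must catch them. The return value is a string; an empty string ('') means the main text probably could not be extracted. When using the library in a crawler, pass the downloaded HTML source as the third argument to extract_text. Otherwise the extractor server has to fetch the page itself, which can time out and raise an exception, and is also more likely to be blocked by anti-crawler measures.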
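The notes above describe a crawler-side pattern: download the page yourself, hand the HTML to extract_text, catch any exception, and treat '' as "no text found". The following is a minimal sketch of that pattern, not the library's own helper. It assumes the second argument is the page title (as the README example suggests) and that the HTML is accepted positionally as the third argument; the helper names fetch_html and extract_or_none are hypothetical, and the page is fetched with the standard library's urllib.request rather than any downloader the library might provide. The extractor is passed in rather than imported, so any object with an extract_text method will work.

```python
import logging
import urllib.request
from typing import Optional

logger = logging.getLogger(__name__)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download the page on the crawler side so the extractor server does not have to (hypothetical helper)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_or_none(extractor, url: str, title: str, html: str) -> Optional[str]:
    """Call extract_text with pre-downloaded HTML; return None on error or empty result (hypothetical helper)."""
    try:
        # Assumption: the HTML is accepted positionally as the third argument, as the notes describe;
        # its actual parameter name is not documented here.
        text = extractor.extract_text(url, title, html)
    except Exception:
        # extract_text may raise, so log the failure and treat the page as unextracted
        logger.exception("text extraction failed for %s", url)
        return None
    # An empty string means the main text probably could not be extracted
    return text or None
```

With an extractor constructed as in the usage guide, a crawler would call `extract_or_none(extractor, url, title, fetch_html(url))` and skip pages for which it returns None. Returning None for both failure modes keeps the crawler loop simple; if you need to distinguish a server error from an empty extraction, re-raise instead of logging.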
Download files
File details
Details for the file sense-text-extractor-0.0.5.tar.gz.
File metadata
- Download URL: sense-text-extractor-0.0.5.tar.gz
- Upload date:
- Size: 4.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.1 setuptools/39.0.1 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.6.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 571606f23643ec966c1018762eb4d78449b49fa02c420acbdc87bdeba602d17e
MD5 | 5332ec6a79bea6d04eb0b4f00108c691
BLAKE2b-256 | 623a124e936e99c571a4ce17c3bddd15118804d16478d128b1884b39eec1d0eb
File details
Details for the file sense_text_extractor-0.0.5-py3-none-any.whl.
File metadata
- Download URL: sense_text_extractor-0.0.5-py3-none-any.whl
- Upload date:
- Size: 5.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.1 setuptools/39.0.1 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.6.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3f41eeb1319668c2451d3a96e7f3f687990d0612e6cc49d30f42ee4bdc0b0cbb
MD5 | 3010e1d564e6e327102c65e1e43bdcba
BLAKE2b-256 | b6454eab1b89e332c85bffe4ce378f82e0eb5e4b21d398304a28e79c5ba0667a
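The SHA256 digests listed for the two 0.0.5 distribution files can be used to check a download before installing it. A minimal sketch using only hashlib from the standard library follows; the EXPECTED_SHA256 mapping and the helper names sha256_of and verify are illustrative, with the digest values copied from the file-hash tables on this page. The file is hashed in chunks so large archives need not be read into memory at once.

```python
import hashlib
from pathlib import Path

# SHA256 digests for the 0.0.5 distribution files, copied from this page's file-hash tables
EXPECTED_SHA256 = {
    "sense-text-extractor-0.0.5.tar.gz":
        "571606f23643ec966c1018762eb4d78449b49fa02c420acbdc87bdeba602d17e",
    "sense_text_extractor-0.0.5-py3-none-any.whl":
        "3f41eeb1319668c2451d3a96e7f3f687990d0612e6cc49d30f42ee4bdc0b0cbb",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Compare a downloaded file against its published SHA256; raise KeyError for unknown file names."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published SHA256 for {path.name}")
    return sha256_of(path) == expected
```

pip can enforce the same check at install time in its hash-checking mode, by adding `--hash=sha256:<digest>` to a requirements-file entry.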