sense_text_extractor

Project description

sense-text-extractor

sense-text-extractor is a client library for extracting the main body text of web pages.

Installation (current version: 0.0.1)

pip install sense-text-extractor

Usage guide

Calling with a label configured in sense-core's settings.ini:

from sense_text_extractor import SenseTextExtractor
extractor = SenseTextExtractor(label='text_extractor')
text = extractor.extract_text("http://sports.sina.com.cn/g/pl/2019-01-11/doc-ihqhqcis5048507.shtml", "穆里尼奥在等待复出")
print(text)

Calling with an explicit host and port:

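from sense_text_extractor import SenseTextExtractor
# Pass the extraction server's host and port directly instead of a settings.ini label
extractor = SenseTextExtractor('52.83.143.61', '6681')
text = extractor.extract_text("http://sports.sina.com.cn/g/pl/2019-01-11/doc-ihqhqcis5048507.shtml", "穆里尼奥在等待复出")
print(text)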

Usage notes

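The extract_text method may raise exceptions, which the caller must catch. It returns a string; an empty string ('') means the main text probably could not be extracted. When the library is used in a crawler, pass the downloaded HTML source to extract_text as the third argument. Otherwise the extractor server has to fetch the page itself, which can time out and raise an exception, and is also more likely to be blocked by anti-crawler measures.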
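The notes above can be sketched as a small wrapper. This is a minimal sketch rather than part of the library: safe_extract and StubExtractor are hypothetical names, the parameter name title is a guess at what the second argument means, and because the exception types raised by extract_text are not documented here, the wrapper catches Exception broadly. The stub stands in for SenseTextExtractor so the example runs without a server.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def safe_extract(extractor, url: str, title: str,
                 html: Optional[str] = None) -> Optional[str]:
    """Return the extracted text, or None if extraction failed or was empty."""
    try:
        if html is not None:
            # Third positional argument: the HTML the crawler already downloaded.
            text = extractor.extract_text(url, title, html)
        else:
            text = extractor.extract_text(url, title)
    except Exception as exc:  # assumption: exception types are not documented
        logger.warning("extract_text failed for %s: %s", url, exc)
        return None
    # An empty string means no main text was found; normalise it to None.
    return text or None


class StubExtractor:
    """Hypothetical stand-in for SenseTextExtractor, for offline runs only."""

    def extract_text(self, url, title, html=None):
        if html is None:
            raise TimeoutError("server-side fetch timed out")
        return "body text" if "<p>" in html else ""


if __name__ == "__main__":
    stub = StubExtractor()
    print(safe_extract(stub, "http://example.com/a", "title"))
    print(safe_extract(stub, "http://example.com/a", "title", "<p>hello</p>"))
```

With a real extractor, the same wrapper would be called as safe_extract(extractor, url, title, html), where html is the page source the crawler has already fetched.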

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sense-text-extractor-0.0.5.tar.gz (4.8 kB)

Uploaded Source

Built Distribution

sense_text_extractor-0.0.5-py3-none-any.whl (5.6 kB)

Uploaded Python 3

File details

Details for the file sense-text-extractor-0.0.5.tar.gz.

File metadata

  • Download URL: sense-text-extractor-0.0.5.tar.gz
  • Upload date:
  • Size: 4.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.1 setuptools/39.0.1 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.6.7

File hashes

Hashes for sense-text-extractor-0.0.5.tar.gz
Algorithm Hash digest
SHA256 571606f23643ec966c1018762eb4d78449b49fa02c420acbdc87bdeba602d17e
MD5 5332ec6a79bea6d04eb0b4f00108c691
BLAKE2b-256 623a124e936e99c571a4ce17c3bddd15118804d16478d128b1884b39eec1d0eb

See more details on using hashes here.
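A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the Python standard library; the helper names sha256_of_file and matches_published_hash are hypothetical, and the local file path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published hex digest, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution, expected_hex would be the SHA256 value from the table above and path the local copy of sense-text-extractor-0.0.5.tar.gz.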

File details

Details for the file sense_text_extractor-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: sense_text_extractor-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 5.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.1 setuptools/39.0.1 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.6.7

File hashes

Hashes for sense_text_extractor-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 3f41eeb1319668c2451d3a96e7f3f687990d0612e6cc49d30f42ee4bdc0b0cbb
MD5 3010e1d564e6e327102c65e1e43bdcba
BLAKE2b-256 b6454eab1b89e332c85bffe4ce378f82e0eb5e4b21d398304a28e79c5ba0667a

