A library for creating statistical NER systems that work on HTML data

Project description

Webstruct is a library for creating statistical named entity recognition (NER) systems that work on HTML data, i.e. a library for building tools that extract named entities (addresses, organization names, opening hours, etc.) from webpages.

Unlike most NER systems, webstruct works on HTML data rather than on plain text alone. This makes it possible to define features that use HTML structure, and to embed annotation results back into the HTML.
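
To illustrate what a feature that uses HTML structure might look like, here is a minimal sketch built only on the Python standard library. It does not use webstruct's own API; the names TokenFeatureParser and html_token_features, and the particular features chosen, are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser


class TokenFeatureParser(HTMLParser):
    """Collect text tokens together with simple HTML-structure features.

    Hypothetical helper for illustration; not part of webstruct.
    """

    # Void elements never get a closing tag, so they are not pushed
    # onto the stack of open elements.
    VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open elements
        self.tokens = []     # list of (token, feature dict) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy markup.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        parent = self.open_tags[-1] if self.open_tags else None
        for token in data.split():
            features = {
                "lower": token.lower(),          # plain-text feature
                "is_title": token.istitle(),     # plain-text feature
                "parent_tag": parent,            # HTML-structure feature
                "inside_link": "a" in self.open_tags,  # HTML-structure feature
            }
            self.tokens.append((token, features))


def html_token_features(html):
    """Return (token, features) pairs for every whitespace-separated token."""
    parser = TokenFeatureParser()
    parser.feed(html)
    parser.close()
    return parser.tokens
```

A sequence model trained on such pairs could, for instance, learn that title-cased tokens inside a link are more likely to be organization names. A plain-text tokenizer would have discarded the parent_tag and inside_link information.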

Read the docs for more info.

License is MIT.

Contributing

To run tests, make sure tox is installed, then run tox from the source root.

Download files

Source Distribution

webstruct-0.4.1.tar.gz (40.4 kB)

Uploaded Source

Built Distribution

webstruct-0.4.1-py2.py3-none-any.whl (53.9 kB)

Uploaded Python 2, Python 3

File details

Details for the file webstruct-0.4.1.tar.gz.

File metadata

  • Download URL: webstruct-0.4.1.tar.gz
  • Upload date:
  • Size: 40.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for webstruct-0.4.1.tar.gz
  • SHA256: af61c40f9d379530dc5b53832aea7dfde4711e15ead08c3bd6c2b1ad371d8863
  • MD5: d26c7ce9eaa134aff3bfe87f40a2f73d
  • BLAKE2b-256: bdc31e602693b6f6a1d8f2e753ebb718b548570b59f7b970f06170ef578c250d
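
A downloaded file can be checked against a published digest before it is installed. The following is a minimal sketch using Python's standard hashlib module; the function names sha256_of_file and matches_published_hash are hypothetical, and the file is assumed to already be on local disk.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution, the call would be matches_published_hash("webstruct-0.4.1.tar.gz", "af61c40f9d379530dc5b53832aea7dfde4711e15ead08c3bd6c2b1ad371d8863"), assuming the archive was saved under that name in the current directory. Reading in chunks keeps memory use constant regardless of file size.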

File details

Details for the file webstruct-0.4.1-py2.py3-none-any.whl.

File metadata

File hashes

Hashes for webstruct-0.4.1-py2.py3-none-any.whl
  • SHA256: 1fee1794794e82298b782050aeb90ef1482b47a1187fbdb07019cc0ac7cc6ce3
  • MD5: 29f99a62b2ada4e8fda6248810bf9cac
  • BLAKE2b-256: f32d6523d8717fec4eca493b55b149123a7af4ec1b511da4cb2f63d133b44445