HTML for NLP

Installation

pip install git+https://github.com/druskacik/html_for_nlp

Usage

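import requests
from html_for_nlp import HTMLDocument

# Fetch a page; r.content is the raw response body as bytes.
r = requests.get('https://google.com')

# Build a document object from the raw HTML.
doc = HTMLDocument(r.content)

# Print the page text together with its tag markers.
print(doc.full_text)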

Output (the page was served in Slovak, so the text is reproduced as returned):

<[document]>
Google
Vyhľadávanie
<a>
Obrázky
</a>
<a>
Mapy
</a>
<a>
Play
</a>
<a>
YouTube
</a>
<a>
Správy
</a>
<a>
Gmail
</a>
<a>
Disk
</a>
<a>
Ďalšie
»
</a>
<a>
História hľadania
</a>
|
<a>
Nastavenia
</a>
|
<a>
Prihlásiť sa
</a>
<a>
Rozšírené vyhľadávanie
</a>
<span>
<a>
Reklama
</a>
<a>
Riešenia pre firmy
</a>
<a>
Všetko o Google
</a>
<a>
Google.sk
</a>
<p>
© 2023
</p>
</span>
</[document]>
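
The output above puts each tag marker and each piece of text on its own line, which makes it easy to post-process with plain string handling. The following is a minimal sketch of that idea, not part of html_for_nlp: the helper names `link_texts` and `plain_text_lines` are hypothetical, and the code assumes the line-per-marker layout shown in the sample output.

```python
import re

# Assumption: a marker line holds exactly one tag, e.g. "<a>", "</a>" or "<[document]>".
TAG_LINE = re.compile(r"^</?[^<>\s]+>$")


def link_texts(full_text):
    """Return the text found between <a> and </a> markers, one string per link."""
    links = []
    current = None  # list of text lines for the link being read, or None outside a link
    for raw in full_text.splitlines():
        line = raw.strip()
        if line == "<a>":
            current = []
        elif line == "</a>":
            if current is not None:
                links.append(" ".join(current))
            current = None
        elif current is not None and line and not TAG_LINE.match(line):
            current.append(line)
    return links


def plain_text_lines(full_text):
    """Return every non-empty line that is not a tag marker."""
    lines = (raw.strip() for raw in full_text.splitlines())
    return [line for line in lines if line and not TAG_LINE.match(line)]


# A shortened copy of the output shown above.
sample = """<[document]>
Google
<a>
Gmail
</a>
<a>
Ďalšie
»
</a>
|
<span>
<a>
Google.sk
</a>
<p>
© 2023
</p>
</span>
</[document]>"""

print(link_texts(sample))        # ['Gmail', 'Ďalšie »', 'Google.sk']
print(plain_text_lines(sample))
```

In real use, `sample` would be replaced by `doc.full_text` from the usage example. Joining multi-line link text with a space is a design choice made here for readability; other separators may suit a given NLP pipeline better.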

Download files

Source Distribution

html_for_nlp-0.0.2.tar.gz (3.4 kB)

Built Distribution

html_for_nlp-0.0.2-py3-none-any.whl (3.6 kB, Python 3)
