HTML for NLP
Installation
pip install git+https://github.com/druskacik/html_for_nlp
Usage
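import requests
from html_for_nlp import HTMLDocument

# Fetch the page and pass the raw response body to HTMLDocument
r = requests.get('https://google.com')
doc = HTMLDocument(r.content)

# Print the page text annotated with tag markers
print(doc.full_text)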
Output:
<[document]>
Google
Vyhľadávanie
<a>
Obrázky
</a>
<a>
Mapy
</a>
<a>
Play
</a>
<a>
YouTube
</a>
<a>
Správy
</a>
<a>
Gmail
</a>
<a>
Disk
</a>
<a>
Ďalšie
»
</a>
<a>
História hľadania
</a>
|
<a>
Nastavenia
</a>
|
<a>
Prihlásiť sa
</a>
<a>
Rozšírené vyhľadávanie
</a>
<span>
<a>
Reklama
</a>
<a>
Riešenia pre firmy
</a>
<a>
Všetko o Google
</a>
<a>
Google.sk
</a>
<p>
© 2023
</p>
</span>
</[document]>
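The output above can be post-processed with plain Python before it is fed to an NLP pipeline. The following is only a minimal sketch: it assumes `full_text` is a string laid out exactly as shown, with one tag marker or one text fragment per line. The helper `iter_text_with_tags` and the `sample` string are hypothetical and are not part of `html_for_nlp`.

```python
import re

# A line consisting only of a tag marker, e.g. <a>, </a>, <[document]>
TAG_LINE = re.compile(r"^<(/?)([^<>/\s]+)>$")


def iter_text_with_tags(full_text):
    """Yield (enclosing_tag, text) pairs from a tag-annotated text dump.

    Hypothetical helper: assumes one tag marker or text fragment per line.
    """
    stack = []  # currently open tags, innermost last
    for raw in full_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TAG_LINE.match(line)
        if match:
            closing, name = match.groups()
            if closing:
                # Pop only when the marker closes the innermost open tag
                if stack and stack[-1] == name:
                    stack.pop()
            else:
                stack.append(name)
            continue
        yield (stack[-1] if stack else None, line)


# Shortened copy of the output shown above (assumed layout)
sample = """<[document]>
Google
<a>
Obrázky
</a>
|
<span>
<p>
© 2023
</p>
</span>
</[document]>"""

pairs = list(iter_text_with_tags(sample))
link_texts = [text for tag, text in pairs if tag == "a"]
print(pairs)
print(link_texts)
```

With a real document, `sample` would be replaced by `doc.full_text`, provided that attribute is a string in the layout shown above. Pairing each fragment with its innermost tag makes it easy to keep, for example, only link text or only paragraph text.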