Screen-scraping library

Project description

Beautiful Soup is a library that makes it easy to scrape information
from web pages. It sits atop an HTML or XML parser, providing Pythonic
idioms for iterating, searching, and modifying the parse tree.
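
As a rough illustration of those three idioms (iterating, searching, modifying), here is a minimal sketch. The sample markup and variable names are hypothetical, and the built-in `html.parser` is named explicitly so the result does not depend on which optional parsers are installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the project documentation.
html = '<ul><li><a href="/a">First</a></li><li><a href="/b">Second</a></li></ul>'

# "html.parser" is Python's built-in parser; lxml or html5lib are optional extras.
soup = BeautifulSoup(html, "html.parser")

# Iterating: collect the href attribute of every <a> tag in document order.
hrefs = [a["href"] for a in soup.find_all("a")]

# Searching: find the first <a> tag whose href is "/b".
second = soup.find("a", href="/b")
second_text = second.get_text()

# Modifying: change an attribute and the tag's text in place.
second["href"] = "/c"
second.string = "Third"
modified = str(soup)

print(hrefs)
print(second_text)
print(modified)
```

The tree is edited in place, so converting `soup` back to a string after the changes reflects the new attribute and text.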

# Quick start

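```
>>> from bs4 import BeautifulSoup
>>> # No parser is named here, so Beautiful Soup picks the best one installed.
>>> # The output below matches lxml; html.parser would not add <html> or <body>.
>>> soup = BeautifulSoup("<p>Some<b>bad<i>HTML")
>>> print(soup.prettify())
<html>
 <body>
  <p>
   Some
   <b>
    bad
    <i>
     HTML
    </i>
   </b>
  </p>
 </body>
</html>
>>> soup.find(text="bad")  # Python 3 shows 'bad' without the u prefix
u'bad'

>>> soup.i
<i>HTML</i>

>>> # The "xml" parser requires lxml.
>>> soup = BeautifulSoup("<tag1>Some<tag2/>bad<tag3>XML", "xml")
>>> print(soup.prettify())
<?xml version="1.0" encoding="utf-8"?>
<tag1>
 Some
 <tag2/>
 bad
 <tag3>
  XML
 </tag3>
</tag1>
```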
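
The HTML example above does not name a parser, so its output depends on which parsers are installed. A minimal sketch of the same input with the parser named explicitly follows; it assumes only Python's built-in `html.parser`, and it uses `string=`, the newer spelling of the `text=` argument available from Beautiful Soup 4.4.0.

```python
from bs4 import BeautifulSoup

# Same malformed markup as the quick start, with the parser named explicitly.
# "html.parser" ships with Python, so no extra install is assumed.
soup = BeautifulSoup("<p>Some<b>bad<i>HTML", "html.parser")

# The unclosed tags are closed, but no <html> or <body> wrapper is added.
repaired = str(soup)

# Search for a text node, then read the text of the innermost tag.
match = soup.find(string="bad")
innermost = soup.i.get_text()

print(repaired)
```

Naming the parser keeps results the same across machines, at the cost of whatever leniency or speed another parser might offer.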

To go beyond the basics, [comprehensive documentation is available](http://www.crummy.com/software/BeautifulSoup/bs4/doc/).

# Links

* [Homepage](http://www.crummy.com/software/BeautifulSoup/bs4/)
* [Documentation](http://www.crummy.com/software/BeautifulSoup/bs4/doc/)
* [Discussion group](http://groups.google.com/group/beautifulsoup/)
* [Development](https://code.launchpad.net/beautifulsoup/)
* [Bug tracker](https://bugs.launchpad.net/beautifulsoup/)
* [Complete changelog](https://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/view/head:/NEWS.txt)

# Building the documentation

The bs4/doc/ directory contains full documentation in Sphinx
format. Run `make html` in that directory to create HTML
documentation.

# Running the unit tests

Beautiful Soup supports unit test discovery from the project root directory:

```
$ nosetests
```

```
$ python -m unittest discover -s bs4 # Python 2.7 and up
```

If you checked out the source tree, you should see a script called
test-all-versions in its root directory. This script runs the unit
tests under Python 2.7, then creates a temporary Python 3 conversion of
the source and runs the unit tests again under Python 3.


Download files

Source Distribution

* beautifulsoup4-4.6.2.tar.gz (166.2 kB), source

Built Distributions

* beautifulsoup4-4.6.2-py3-none-any.whl (90.4 kB), Python 3
* beautifulsoup4-4.6.2-py2-none-any.whl (92.3 kB), Python 2

File details

Details for the file beautifulsoup4-4.6.2.tar.gz.

File metadata

  • Download URL: beautifulsoup4-4.6.2.tar.gz
  • Upload date:
  • Size: 166.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.0.0 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/3.7.0

File hashes

Hashes for beautifulsoup4-4.6.2.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 44804593772d7fe90b4d3c6ff044c56a418657a666cfe0987662a980ad64d8e8 |
| MD5 | 421024306455b786cc6c3f45bcf26294 |
| BLAKE2b-256 | 5d8725edb9c99e9546c223eb0aa33a71e25e7a80881e263b4f4b3fe4cfabc427 |

File details

Details for the file beautifulsoup4-4.6.2-py3-none-any.whl.

File metadata

  • Download URL: beautifulsoup4-4.6.2-py3-none-any.whl
  • Upload date:
  • Size: 90.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.0.0 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/3.7.0

File hashes

Hashes for beautifulsoup4-4.6.2-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 935a4bd38c6ff119488697dbc71996966557d8a53f94b7beeefc947a7b98d537 |
| MD5 | 2419c896aefded6ddd46f89f17982749 |
| BLAKE2b-256 | 06b5201d206eb0f92e2ef1fadd8e08d702061ae446686833bb781004001f803d |

File details

Details for the file beautifulsoup4-4.6.2-py2-none-any.whl.

File metadata

  • Download URL: beautifulsoup4-4.6.2-py2-none-any.whl
  • Upload date:
  • Size: 92.3 kB
  • Tags: Python 2
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.0.0 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/3.7.0

File hashes

Hashes for beautifulsoup4-4.6.2-py2-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 412664d8dee1b5c43da48b3e07778137441747a61dc7bc723f72d733a7f1a497 |
| MD5 | cf5fd99289ad40ff5878645a58137c1a |
| BLAKE2b-256 | 1014027b59ba92cf6de0a93c65c25463590e6c80bf350f834fce1574fdbaa9c2 |
