Screen-scraping library

Project description

Beautiful Soup is a library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree.

Quick start

  >>> from bs4 import BeautifulSoup
  >>> soup = BeautifulSoup("<p>Some<b>bad<i>HTML")
  >>> print(soup.prettify())
  <html>
   <body>
    <p>
     Some
     <b>
      bad
      <i>
       HTML
      </i>
     </b>
    </p>
   </body>
  </html>
  >>> soup.find(text="bad")
  u'bad'

  >>> soup.i
  <i>HTML</i>

  >>> soup = BeautifulSoup("<tag1>Some<tag2/>bad<tag3>XML", "xml")
  >>> print(soup.prettify())
  <?xml version="1.0" encoding="utf-8"?>
  <tag1>
   Some
   <tag2 />
   bad
   <tag3>
    XML
   </tag3>
  </tag1>
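
The description mentions idioms for iterating, searching, and modifying the parse tree. The following is a minimal sketch of each, not an excerpt from the official documentation. The sample markup, the variable names, and the data-checked attribute are made up for illustration, and the built-in "html.parser" is named explicitly on the assumption that no optional parser is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for this illustration.
html = """
<ul id="langs">
  <li class="compiled">C</li>
  <li class="interpreted">Python</li>
  <li class="interpreted">Ruby</li>
</ul>
"""

# Name the parser explicitly so the result does not depend on which
# optional parsers (lxml, html5lib) happen to be installed.
soup = BeautifulSoup(html, "html.parser")

# Iterating: visit every <li> tag in document order.
names = [li.get_text() for li in soup.find_all("li")]

# Searching: filter by tag name and CSS class.
interpreted = [li.get_text() for li in soup.find_all("li", class_="interpreted")]

# Modifying: replace a tag's text and add an attribute in place.
first = soup.find("li", class_="compiled")
first.string = "Rust"
first["data-checked"] = "yes"  # hypothetical attribute name

print(names)
print(interpreted)
print(first)
```

Passing the parser name trades the more forgiving tree repair of lxml or html5lib for output that is the same on every installation.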

To go beyond the basics, comprehensive documentation is available.

Building the documentation

The bs4/doc/ directory contains full documentation in Sphinx format. Run make html in that directory to create HTML documentation.

Running the unit tests

Beautiful Soup supports unit test discovery from the project root directory:

 $ nosetests
 $ python -m unittest discover -s bs4 # Python 2.7 and up

If you checked out the source tree, the root directory of the checkout should contain a script called test-all-versions. This script runs the unit tests under Python 2.7, then creates a temporary Python 3 conversion of the source and runs the unit tests again under Python 3.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

beautifulsoup4-4.7.1.tar.gz (167.1 kB)

Uploaded Source

Built Distributions

beautifulsoup4-4.7.1-py3-none-any.whl (94.3 kB)

Uploaded Python 3

beautifulsoup4-4.7.1-py2-none-any.whl (94.4 kB)

Uploaded Python 2

File details

Details for the file beautifulsoup4-4.7.1.tar.gz.

File metadata

  • Download URL: beautifulsoup4-4.7.1.tar.gz
  • Upload date:
  • Size: 167.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/2.7.12

File hashes

Hashes for beautifulsoup4-4.7.1.tar.gz
Algorithm    Hash digest
SHA256       945065979fb8529dd2f37dbb58f00b661bdbcbebf954f93b32fdf5263ef35348
MD5          c71f53fcb2580c376ab7b010a9178983
BLAKE2b-256  80f2f6aca7f1b209bb9a7ef069d68813b091c8c3620642b568dac4eb0e507748
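
A downloaded file can be checked against a published digest with the standard hashlib module. The sketch below is one way to do that, not a procedure prescribed by the package index. The function names are hypothetical, and the constant is the SHA256 value listed above for the source distribution.

```python
import hashlib

# SHA256 digest listed above for beautifulsoup4-4.7.1.tar.gz.
PUBLISHED_SHA256 = "945065979fb8529dd2f37dbb58f00b661bdbcbebf954f93b32fdf5263ef35348"


def sha256_of_stream(stream, chunk_size=65536):
    """Return the hex SHA256 digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def file_matches(path, expected_hex):
    """Return True if the file at path hashes to expected_hex."""
    with open(path, "rb") as fh:
        return sha256_of_stream(fh) == expected_hex.strip().lower()
```

With the archive in the current directory, file_matches("beautifulsoup4-4.7.1.tar.gz", PUBLISHED_SHA256) should return True for an intact download. Reading in chunks keeps memory use constant regardless of file size.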


File details

Details for the file beautifulsoup4-4.7.1-py3-none-any.whl.

File metadata

  • Download URL: beautifulsoup4-4.7.1-py3-none-any.whl
  • Upload date:
  • Size: 94.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/2.7.12

File hashes

Hashes for beautifulsoup4-4.7.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       034740f6cb549b4e932ae1ab975581e6103ac8f942200a0e9759065984391858
MD5          d1d6d1be4f8080ce7e3406449ba541ce
BLAKE2b-256  1d5d3260694a59df0ec52f8b4883f5d23b130bc237602a1411fa670eae12351e


File details

Details for the file beautifulsoup4-4.7.1-py2-none-any.whl.

File metadata

  • Download URL: beautifulsoup4-4.7.1-py2-none-any.whl
  • Upload date:
  • Size: 94.4 kB
  • Tags: Python 2
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/2.7.12

File hashes

Hashes for beautifulsoup4-4.7.1-py2-none-any.whl
Algorithm Hash digest
SHA256       ba6d5c59906a85ac23dadfe5c88deaf3e179ef565f4898671253e50a78680718
MD5          5475bce6027c3d8450c8e28ef87e4ef5
BLAKE2b-256  8b0e048a2f88bc4be5e3697df9dc1f7b9d5c9c75be62676feeeb91d2e896c5ea

