pywebcopy

Python library to clone/archive pages or sites from the Internet.

Project description

    ____       _       __     __    ______                     _____   
   / __ \__  _| |     / /__  / /_  / ____/___  ____  __  __   /__  /   
  / /_/ / / / / | /| / / _ \/ __ \/ /   / __ \/ __ \/ / / /     / /    
 / ____/ /_/ /| |/ |/ /  __/ /_/ / /___/ /_/ / /_/ / /_/ /     / /     
/_/    \__, / |__/|__/\___/_.___/\____/\____/ .___/\__, /     /_/      
      /____/                               /_/    /____/

Created By: Raja Tomar
License: Apache License 2.0
Email: rajatomar788@gmail.com

PyWebCopy is a free tool for copying full or partial websites locally onto your hard-disk for offline viewing.

PyWebCopy will scan the specified website and download its content onto your hard-disk. Links to resources such as style-sheets, images, and other pages in the website will automatically be remapped to match the local path. Using its extensive configuration you can define which parts of a website will be copied and how.
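
To make the idea of remapping concrete, here is a minimal sketch of how a URL could be turned into a path under a project folder. It is an illustration only, not PyWebCopy's own implementation: the function name `url_to_local_path` and the one-folder-per-host layout are assumptions made for this example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def url_to_local_path(url, project_folder):
    """Map a URL to a file path under project_folder (hypothetical layout)."""
    parts = urlsplit(url)
    path = parts.path
    # Treat directory-style URLs as index pages.
    if not path or path.endswith("/"):
        path += "index.html"
    # Keep one sub-folder per host so files from different sites do not collide.
    return PurePosixPath(project_folder) / parts.netloc / path.lstrip("/")


print(url_to_local_path("https://httpbin.org/static/style.css", "savedpages"))
```

A link in a saved page would then be rewritten to point at the returned path instead of the original URL.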

What can PyWebCopy do?

PyWebCopy will examine the HTML mark-up of a website and attempt to discover all linked resources such as other pages, images, videos, and file downloads. It will download all of these resources and continue to search for more. In this manner, PyWebCopy can "crawl" an entire website and download everything it sees in an effort to create a reasonable facsimile of the source website.
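
The discovery step described above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the parser PyWebCopy actually uses: the `LinkCollector` class, the `discover_links` helper, and the small `LINK_ATTRS` table are all hypothetical names introduced here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes that commonly carry a resource URL, keyed by tag name.
LINK_ATTRS = {"a": "href", "link": "href", "img": "src", "script": "src", "video": "src"}


class LinkCollector(HTMLParser):
    """Collects absolute URLs of resources referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = LINK_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative references against the page URL.
                self.links.append(urljoin(self.base_url, value))


def discover_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

A crawler would call `discover_links` on each downloaded page, queue any URLs it has not seen yet, and repeat until the queue is empty.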

What can PyWebCopy not do?

PyWebCopy does not include a virtual DOM or any form of JavaScript parsing. If a website relies heavily on JavaScript, for example to generate links dynamically, PyWebCopy is unlikely to discover every part of the site and therefore cannot make a true copy of it.
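
The effect of this limitation can be shown with a small, self-contained sketch. It uses the standard library's `html.parser` as a stand-in for any static HTML parser; the `AnchorFinder` class and the sample page are made up for this illustration and are not part of PyWebCopy.

```python
from html.parser import HTMLParser


class AnchorFinder(HTMLParser):
    """Collects href values from <a> tags present in the static markup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)


PAGE = """
<a href="/static-page">Visible to a static parser</a>
<script>
  document.body.insertAdjacentHTML("beforeend", '<a href="/generated-page">Added at runtime</a>');
</script>
"""

finder = AnchorFinder()
finder.feed(PAGE)
finder.close()
print(finder.hrefs)
```

Only the link written directly in the markup is collected. The second link exists only after a browser runs the script, so a tool that does not execute JavaScript never sees it.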

PyWebCopy does not download the raw source code of a website; it can only download what the HTTP server returns. While it will do its best to create an offline copy of a website, advanced data-driven websites may not work as expected once they have been copied.

Installation

pywebcopy is available on PyPI and can be installed with pip:

$ pip install pywebcopy

You are ready to go. Read the tutorials below to get started.

First steps

Check that the latest pywebcopy was installed successfully.

>>> import pywebcopy
>>> pywebcopy.__version__
7.x.x

Your version may be different. You can now continue with the tutorial.
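
As an alternative that does not import the package itself, the installed version can be read from the package metadata with the standard library (Python 3.8 or later). This is a small sketch; the helper name `installed_version` is an assumption made for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="pywebcopy"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version())
```

A result of `None` means that pip did not install the package into the interpreter you are running.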

Basic Usage

To save a single page, run the following in a Python console:

from pywebcopy import save_webpage
save_webpage(
      url="https://httpbin.org/",
      project_folder="E://savedpages//",
      project_name="my_site",
      bypass_robots=True,
      debug=True,
      open_in_browser=True,
      delay=None,
      threaded=False,
)
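
The `E://savedpages//` folder above only makes sense on a Windows machine with an `E:` drive. A portable variant might build the folder with `pathlib` instead. The sketch below is an assumption-laden example: `project_folder_for` and `archive_page` are hypothetical helper names, and the keyword arguments passed to `save_webpage` are the same ones shown above.

```python
from pathlib import Path


def project_folder_for(base=None):
    """Return <base>/savedpages, where base defaults to the user's home folder."""
    return Path(base or Path.home()) / "savedpages"


def archive_page(url, project_name="my_site", base=None):
    # Imported lazily so the path helper works without pywebcopy installed.
    from pywebcopy import save_webpage

    save_webpage(
        url=url,
        project_folder=str(project_folder_for(base)),
        project_name=project_name,
        bypass_robots=False,     # respect robots.txt
        debug=False,
        open_in_browser=False,   # do not launch a browser when finished
        delay=None,
        threaded=False,
    )
```

Calling `archive_page("https://httpbin.org/")` would then save the page under a `savedpages` folder in the home directory.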

To save a full website (this could overload the target server, so be careful):

from pywebcopy import save_website

save_website(
      url="https://httpbin.org/",
      project_folder="E://savedpages//",
      project_name="my_site",
      bypass_robots=True,
      debug=True,
      open_in_browser=True,
      delay=None,
      threaded=False,
)
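
Because a full-site crawl can put real load on a server, it is worth checking what the site's robots.txt permits before setting `bypass_robots=True` or leaving `delay=None`. The following sketch uses the standard library's `urllib.robotparser`; the `crawl_policy` helper and the sample robots.txt content are assumptions for illustration and are independent of how pywebcopy handles robots.txt internally.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content, made up for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def crawl_policy(robots_txt, url, user_agent="*"):
    """Return (allowed, delay_seconds) for url according to robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = parser.can_fetch(user_agent, url)
    delay = parser.crawl_delay(user_agent)  # None when no Crawl-delay is set
    return allowed, delay


print(crawl_policy(ROBOTS_TXT, "https://httpbin.org/get"))
```

If the site requests a crawl delay, passing a comparable value as `delay` to `save_website` is a reasonable way to stay polite.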

Running Tests

Running the tests is simple and doesn't require any external library. Run this command from the root directory of the pywebcopy package:

$ python -m pywebcopy -t

Command Line Interface

pywebcopy has an easy-to-use command-line interface that lets you run common tasks without writing any Python code.

Getting list of commands
```
$ python -m pywebcopy --help
```

Using CLI

Usage: pywebcopy [-p|--page|-s|--site|-t|--tests] [--url=URL [,--location=LOCATION [,--name=NAME [,--pop [,--bypass_robots [,--quite [,--delay=DELAY]]]]]]]

Python library to clone/archive pages or sites from the Internet.

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  --url=URL             url of the entry point to be retrieved.
  --location=LOCATION   Location where files are to be stored.
  -n NAME, --name=NAME  Project name of this run.
  -d DELAY, --delay=DELAY
                        Delay between consecutive requests to the server.
  --bypass_robots       Bypass the robots.txt restrictions.
  --threaded            Use threads for faster downloading.
  -q, --quite           Suppress the logging from this library.
  --pop                 open the html page in default browser window after
                        finishing the task.

  CLI Actions List:
    Primary actions available through cli.

    -p, --page          Quickly saves a single page.
    -s, --site          Saves the complete site.
    -t, --tests         Runs tests for this library.
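
The options listed above can also be assembled programmatically, for example when scripting several archive runs. The sketch below only builds the argument list from flags that appear in the help text; the helper name `build_pywebcopy_command` is an assumption made for this example.

```python
import sys


def build_pywebcopy_command(url, location, name, site=False, delay=None, quiet=False):
    """Build an argv list for `python -m pywebcopy` from the documented flags."""
    command = [sys.executable, "-m", "pywebcopy", "--site" if site else "--page"]
    command += [f"--url={url}", f"--location={location}", f"--name={name}"]
    if delay is not None:
        command.append(f"--delay={delay}")
    if quiet:
        # The flag is spelled "--quite" in the help text.
        command.append("--quite")
    return command


print(build_pywebcopy_command("https://httpbin.org/", "savedpages", "my_site", delay=1))
```

The resulting list can be passed to `subprocess.run(command, check=True)` to start the run.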

Running tests
```
  $ python -m pywebcopy --tests
```

Authentication and Cookies

Authentication is often needed to access a page. It is easy to authenticate with pywebcopy because it uses a requests.Session object for its underlying HTTP activity, which can be accessed through the WebPage.session attribute. Many tutorials cover setting up authentication with requests.Session.

Here is an example that fills in a form:

from pywebcopy.configs import get_config

config = get_config('http://httpbin.org/')
wp = config.create_page()
wp.get(config['project_url'])
form = wp.get_forms()[0]
form.inputs['email'].value = 'bar' # etc
form.inputs['password'].value = 'baz' # etc
wp.submit_form(form)
wp.get_links()

You can read more in the docs folder of the GitHub repository.
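
For sites that use HTTP Basic authentication or a known session cookie rather than a login form, the session can be configured directly. The helper below is a sketch that works on any requests.Session-like object; `configure_session` is a hypothetical name, and the only assumption about pywebcopy is the `WebPage.session` attribute described above.

```python
def configure_session(session, username=None, password=None, headers=None, cookies=None):
    """Apply credentials, headers, and cookies to a requests.Session-like object."""
    if username is not None:
        # requests sends HTTP Basic credentials when auth is a (user, password) tuple.
        session.auth = (username, password)
    if headers:
        session.headers.update(headers)
    if cookies:
        session.cookies.update(cookies)
    return session
```

With a page object `wp` created as in the form example, this could be used as `configure_session(wp.session, "user", "secret")` before calling `wp.get(...)`.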

Release history

Version	Release date	Notes
7.1	May 13, 2025	This version
7.0.2	Apr 27, 2022
7.0.1	Oct 31, 2021	Yanked. Reason: AttributeError: WebPage.html_mime_types 'tuple' object attribute '__doc__' is read-only
7.0.0	Oct 31, 2021	Yanked. Reason: bugged
6.3.0	Apr 5, 2020
6.2.0	Mar 12, 2020
6.1.1	Dec 8, 2019
6.1.0	Dec 6, 2019
6.0.0	Jun 4, 2019
5.0.1	Jan 6, 2019
4.0.1	Oct 31, 2018
4.0.0	Sep 26, 2018
4.0.0rc0	Sep 26, 2018	Pre-release
2.0.3	Aug 19, 2018
2.0.1	Aug 18, 2018
2.0.0b0	Aug 11, 2018	Pre-release
1.10	Aug 4, 2018
1.9	Jul 23, 2018

Download files

Source Distribution

pywebcopy-7.1.tar.gz (43.3 kB)

Uploaded May 13, 2025 Source

Built Distribution

pywebcopy-7.1-py2.py3-none-any.whl (46.8 kB)

Uploaded May 13, 2025 Python 2, Python 3

File details

Details for the file pywebcopy-7.1.tar.gz.

File metadata

Download URL: pywebcopy-7.1.tar.gz
Upload date: May 13, 2025
Size: 43.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.0

File hashes

Hashes for pywebcopy-7.1.tar.gz
Algorithm	Hash digest
SHA256	`4e768971a13e9e7c3b500b64018e9e4bf9446f5e56a0254bb8ecbb8c365acb11`
MD5	`2e44e4e68b2755be4090f90760ac0914`
BLAKE2b-256	`b51ac09e16dba1fb4fe6c54b9ed36592e4f31c51edd4faa9d8b61a9d315d48c4`
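
A downloaded file can be checked against the SHA256 value in the table above with the standard library's `hashlib`. This is a minimal sketch; the helper names `sha256_of_file` and `matches_published_hash` are assumptions made for this example.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA-256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("pywebcopy-7.1.tar.gz", "<SHA256 from the table>")` should return `True` for an intact download.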

File details

Details for the file pywebcopy-7.1-py2.py3-none-any.whl.

File metadata

Download URL: pywebcopy-7.1-py2.py3-none-any.whl
Upload date: May 13, 2025
Size: 46.8 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.0

File hashes

Hashes for pywebcopy-7.1-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`034ff4cc3ef6a2ab42b9c4ef8b431d64091ed08f02a7cc1e959aac8ac3fdad49`
MD5	`cf74ac1b97f35398395246a3442fe01e`
BLAKE2b-256	`da9698911dc09ada0fef798df41a6e34ba81361ce724520d8bb513aa6c5c92ec`
