Archive all public posts from Pastebin.com

Pastebin Archiver

What is this?

This app retrieves new posts made on Pastebin.com and stores them offline in a database. You can see the latest public posts it will retrieve here.

Why?

Some of the pastes posted to Pastebin contain interesting or sensitive data, and sometimes pastes are deleted by their poster or Pastebin staff. Running an instance of this archiver lets you retrieve deleted pastes and build a large dataset to run queries against.

Pastebin API info

Important: This archiver uses the Pastebin Scraping API, which requires a whitelisted IP address and a Lifetime Pro account. More info here.

Installation

Install from PyPI (recommended)

  1. Ensure you have Python 3.7+ installed.
  2. Run pip install pastebin_archiver
  3. Done! Jump down to the Usage section to get started.

Install from source

  1. Ensure you have Python 3.7+ and poetry installed
    $ python --version
    Python 3.7.4
    $ poetry --version
    Poetry 0.12.17
    
  2. Clone the git repository
    $ git clone https://gitlab.com/jonpavelich/pastebin-archiver.git
    
  3. Install the dependencies
    $ cd pastebin-archiver
    $ poetry install
    
  4. Run it!
    $ poetry run pastebin-archiver
    

Run unit tests

  1. Install from source (see the section above)
  2. Run poetry run python -m unittest

Usage

Command line usage

If you installed the package using pip, then you can simply run pastebin-archiver:

$ pastebin-archiver         # Run with default settings
$ pastebin-archiver --help  # Print available command line options

Python usage

If you'd prefer to use the package in your own code, you can do so like this:

# Import the standard logging module and the package
import logging

from pastebin_archiver import PastebinArchiver

# (Optional) configure logging
logging.basicConfig(level=logging.DEBUG)

# Run the application
app = PastebinArchiver()
app.main()

Important: app.main() does not return; it runs forever, looking for new pastes to fetch.
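Because app.main() blocks, a program that needs to do other work at the same time has to run it somewhere other than its main thread. The following is a minimal sketch of one way to do that with a daemon thread. The helper name run_in_background is hypothetical and not part of the package, and this README does not confirm how the archiver behaves when started outside the main thread, so treat it as a starting point only.

```python
import threading


def run_in_background(target, name="pastebin-archiver"):
    """Start a blocking callable in a daemon thread and return the thread.

    `target` is any zero-argument callable, for example the bound method
    `PastebinArchiver().main` (hypothetical usage, not tested against the
    real package).
    """
    # daemon=True means this thread will not keep the process alive
    # once the main thread exits.
    thread = threading.Thread(target=target, name=name, daemon=True)
    thread.start()
    return thread
```

With the package installed, usage would look like worker = run_in_background(PastebinArchiver().main). The main thread then stays free for other work, and the archiver stops when the process exits.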

Configuration

The log target and log level can be controlled with options (--logfile and --loglevel) or environment variables (LOG_FILE and LOG_LEVEL).
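If you embed the archiver in your own code rather than using the command line, you may want to honour the same LOG_FILE and LOG_LEVEL variables yourself. The sketch below shows one way to read them with the standard logging module. The function names are hypothetical, the INFO fallback is an assumption (the archiver's actual default is not stated here), and this is not the package's own implementation.

```python
import logging
import os


def log_settings_from_env(environ=None):
    """Return (level, logfile) read from LOG_LEVEL and LOG_FILE."""
    env = os.environ if environ is None else environ
    # Assumption: fall back to INFO when LOG_LEVEL is unset.
    level_name = env.get("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        raise ValueError(f"Unknown log level: {level_name!r}")
    # A missing or empty LOG_FILE means "log to the console".
    logfile = env.get("LOG_FILE") or None
    return level, logfile


def configure_logging(environ=None):
    """Apply the settings above to the root logger."""
    level, logfile = log_settings_from_env(environ)
    # With filename=None, basicConfig attaches a console (stderr) handler.
    logging.basicConfig(level=level, filename=logfile)
```

Call configure_logging() once, before creating the PastebinArchiver instance, since logging.basicConfig has no effect when the root logger already has handlers.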

Database

By default, the fetched data will be saved to a SQLite database file in your working directory called pastebin.db. You can change this behaviour by passing in a database connection string using the --database option or the DATABASE environment variable. For example:

$ pastebin-archiver --database 'postgresql://user:pass@localhost/mydatabase'

Important: You'll need extra packages to connect to databases other than SQLite. For PostgreSQL, run pip install psycopg2-binary (or, if you installed from source, poetry install -E pgsql).

For detailed info on connection strings and a list of database packages you can use, see the SQLAlchemy documentation.
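Once the archiver has been running for a while with the default SQLite database, you can inspect pastebin.db from Python. This README does not document the schema, so the sketch below makes no assumption about table or column names and only lists the tables that exist. The function name list_tables is hypothetical.

```python
import sqlite3


def list_tables(db_path="pastebin.db"):
    """Return the names of all tables in a SQLite database file."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist.
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Running print(list_tables()) in the archiver's working directory shows the table names, which you can then explore with ordinary SELECT queries.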

Contributing

If you find any bugs or have any suggestions to improve the project, please open an issue on GitLab.

