Archive all public posts from Pastebin.com
What is this?
This app retrieves new posts made on Pastebin.com and stores them offline in a database. You can see the latest public posts it will retrieve on Pastebin's public archive page.
Some of the pastes posted to Pastebin contain interesting or sensitive data, and sometimes pastes are deleted by their poster or Pastebin staff. Running an instance of this archiver lets you retrieve deleted pastes and build a large dataset to run queries against.
Pastebin API info
Install from PyPI (recommended)
- Ensure you have Python 3.7+ installed.
pip install pastebin_archiver
- Done! Jump down to the Usage section to get started.
Install from source
- Ensure you have Python 3.7+ and poetry installed
$ python --version
Python 3.7.4
$ poetry --version
Poetry 0.12.17
- Clone the git repository
git clone https://gitlab.com/jonpavelich/pastebin-archiver.git
- Install the dependencies
$ cd pastebin-archiver
$ poetry install
- Run it!
$ poetry run pastebin-archiver
Run unit tests
- Install from source (see the section above)
poetry run python -m unittest
Command line usage
If you installed the package using pip, then you can simply run
$ pastebin-archiver         # Run with default settings
$ pastebin-archiver --help  # Print available command line options
If you'd prefer to use the package in your own code, you can do so like this:
# Import the package (logging is only needed for the optional step below)
import logging
from pastebin_archiver import PastebinArchiver

# (Optional) configure logging
logging.basicConfig(level=logging.DEBUG)

# Run the application
app = PastebinArchiver()
app.main()

Note that app.main() does not return; it runs forever, looking for new pastes to fetch.
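Because app.main() blocks forever, code that needs to do other work in the same process has to run it somewhere other than the main thread. The following is a minimal sketch of one way to do that with a daemon thread. The helper name run_in_background is hypothetical, and the sketch assumes app.main() is safe to call off the main thread, which this README does not confirm.

```python
import threading
from typing import Callable


def run_in_background(blocking_call: Callable[[], None]) -> threading.Thread:
    """Start a never-returning callable on a daemon thread and return the thread."""
    # daemon=True lets the interpreter exit even though the call never returns.
    worker = threading.Thread(target=blocking_call, name="archiver", daemon=True)
    worker.start()
    return worker


# Hypothetical usage (requires pastebin_archiver to be installed):
#   from pastebin_archiver import PastebinArchiver
#   worker = run_in_background(PastebinArchiver().main)
#   ...do other work here while the archiver keeps fetching...
```

A daemon thread is stopped abruptly when the main program exits, so this suits quick experiments better than long-running deployments, where the command line tool is likely the simpler choice.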
The log target and log level can be controlled with command line options (such as --loglevel) or with environment variables. Run pastebin-archiver --help for the full list of options.
By default, the fetched data will be saved to a SQLite database file in your working directory called pastebin.db. You can change this behaviour by passing in a database connection string using the --database option or the DATABASE environment variable. For example:
$ pastebin-archiver --database 'postgresql://user:pass@localhost/mydatabase'
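Connection strings are URLs, so a username or password containing characters such as @ or / needs to be URL-encoded before it is placed in the string. Here is a small sketch of building such a string with the standard library. The helper name build_postgres_url and the sample credentials are made up for illustration.

```python
from urllib.parse import quote_plus


def build_postgres_url(user: str, password: str, host: str, database: str) -> str:
    """Assemble a postgresql:// connection string with URL-encoded credentials."""
    # quote_plus escapes characters like '@' and '/' that would otherwise
    # be misread as URL delimiters.
    return f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}/{database}"


# Example with a password that contains special characters:
url = build_postgres_url("user", "p@ss/word", "localhost", "mydatabase")
print(url)  # postgresql://user:p%40ss%2Fword@localhost/mydatabase
```

The resulting string can then be passed to the --database option or exported as the DATABASE environment variable.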
Important: You'll need extra packages to connect to databases other than SQLite. For PostgreSQL, you'll need to run pip install psycopg2-binary (or, if you installed from source, you can run poetry install -E pgsql).
For detailed info on connection strings and a list of database packages you can use, see the SQLAlchemy documentation.
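With the default SQLite setup, the resulting pastebin.db file can be inspected using Python's built-in sqlite3 module. The table layout is not described in this README, so the sketch below makes no assumptions about it and only lists whatever tables the file contains. The function name list_tables is hypothetical.

```python
import sqlite3
from typing import List


def list_tables(db_path: str = "pastebin.db") -> List[str]:
    """Return the names of all tables in the SQLite file at db_path."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist.
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Hypothetical usage, run from the directory that contains pastebin.db:
#   print(list_tables())
```

Once the table names are known, the same connection approach can be used to run ordinary SQL queries against the archived pastes.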
If you find any bugs or have any suggestions to improve the project, please open an issue on GitLab.