
A program to download PDF documents from public websites and catalogue them in BibTeX format

Project Description

PIEBERRY (IT’S FOR YOUR LIBRARY)

This is a program which I wrote to automate a painful aspect of my work-life - downloading, storing, cataloguing and referencing documents from (mainly public sector & government) websites.

These websites publish reams of documents in PDF format, usually under a haphazard assortment of cryptic, CMS-generated filename schemes, with incomplete or non-existent file metadata.

Typically I download these, rename them with an intelligible title and a six-digit archival date prefix, store them in an appropriate folder, and enter them into my database of reference materials for use with LaTeX/BibTeX.

Actually, scratch that. What I REALLY do is download them, leave them on my Desktop folder, look at them once, forget them, fill up my disk quota, delete them, realise I’ve lost them and download them all over again.

Hence, Pieberry, which will do all of the good and none of the bad described above.
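The filing step described above (an intelligible title behind a six-digit date prefix, plus a BibTeX record) can be sketched in a few lines of Python. This is not Pieberry's own code; it is a minimal sketch under stated assumptions: the six-digit prefix is taken to be YYMMDD, each document is catalogued as a BibTeX `@misc` entry, and the function names `archival_filename` and `bibtex_entry` are hypothetical.

```python
import re
from datetime import date


def archival_filename(title: str, when: date, ext: str = "pdf") -> str:
    """Build a filename such as '100821_Annual_Energy_Report.pdf'.

    Assumption: the six-digit archival prefix is YYMMDD.
    """
    prefix = when.strftime("%y%m%d")
    # Drop punctuation that is awkward in filenames, keeping word characters,
    # whitespace and hyphens, then turn runs of whitespace into underscores.
    cleaned = re.sub(r"[^\w\s-]", "", title).strip()
    slug = re.sub(r"\s+", "_", cleaned)
    return f"{prefix}_{slug}.{ext}"


def bibtex_entry(key: str, title: str, author: str, when: date,
                 url: str, filename: str) -> str:
    """Return a BibTeX @misc entry for a downloaded document (hypothetical layout)."""
    fields = [
        ("author", author),
        # Extra braces stop BibTeX styles from changing the title's capitalisation.
        ("title", "{" + title + "}"),
        ("year", str(when.year)),
        ("howpublished", "\\url{" + url + "}"),
        ("file", filename),
    ]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return "@misc{" + key + ",\n" + body + "\n}"
```

For example, `archival_filename("Annual Energy Report: 2009/10", date(2010, 8, 21))` yields `100821_Annual_Energy_Report_200910.pdf`. A sortable date prefix keeps a folder in chronological order even when the publisher's own filenames are meaningless.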

It’s mainly for my use, but I hope that someone else will find it useful. I’m open to requests for features.

It’s written in Python, with the PortablePython 2.5.4 distribution (which includes wxPython) in mind, but it also requires Beautiful Soup, pyPdf and Pybtex, all of which are available through easy_install from setuptools.

Release History

1.7-7 (this version)
1.7-3
1.6-1
1.5-1

Download Files

pieberry-library-assistant-1.7-7.tar.gz (122.0 kB), source distribution, uploaded Aug 21, 2010
