A Python library/command-line tool to automatically rename the pdf files of scientific publications by looking up the publication metadata on the web.
pdf-renamer
pdf-renamer is a Python command-line tool to automatically rename the pdf files of scientific papers or, more generally, of any publication that can be associated with a DOI or another identifier (e.g. an arXiv ID). It can be used to rename single files or to scan entire folders and sub-folders. The user can specify the filename format by choosing among several tags. Besides command-line operation, it can also be used as a library in your Python project.
Warning
pdf-renamer uses pdf2doi to find the DOI of a paper. Versions of pdf2doi prior to 1.6 are affected by a very annoying bug. By default, after finding the DOI of a pdf paper, pdf2doi stores the DOI in the metadata of the pdf file. Because of the bug, the size of the pdf file doubled every time metadata was added. This bug has been fixed in all versions of pdf2doi > 1.6.
If you have pdf files that were affected by this bug, you can use pdf2doi to fix them. After updating pdf2doi to a version > 1.6, run
$ pdf2doi path/to/folder/containing/pdf/files -id ''
This will restore the pdf files to their original size.
Latest stable version
The latest stable version of pdf-renamer is 1.1. See here for the full change log.
Table of Contents
- Description
- Installation
- Usage
- Installing the shortcuts in the right-click context menu of Windows
- Contributing
- License
Installation
Use the package manager pip to install pdf-renamer:
$ pip install pdf-renamer==1.1
This installs pdf-renamer both as a Python package and as a stand-alone executable script. The executable is placed in a directory whose path depends on your Python installation and operating system. Make sure this directory is included in the PATH variable of your operating system (for standard Python installations under Windows, this should already be the case). You can check how to add the folder to the PATH variable for Windows, Mac and Linux.
Under Windows, it is also possible to add shortcuts to the right-click context menu.
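To verify that the PATH step worked, you can check whether a program named pdfrenamer can be found. The sketch below uses only the standard library (shutil.which and sysconfig.get_path). The helper name locate_pdfrenamer is hypothetical. The scripts directory it prints is only the usual location for a default, non --user install, so treat it as a hint rather than a guarantee.

```python
import shutil
import sysconfig
from typing import Optional


def locate_pdfrenamer(search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the pdfrenamer executable, or None if not found.

    search_path is an os.pathsep-separated list of directories; when it is
    None, shutil.which searches the PATH environment variable instead.
    """
    return shutil.which("pdfrenamer", path=search_path)


if __name__ == "__main__":
    found = locate_pdfrenamer()
    if found:
        print(f"pdfrenamer is on PATH: {found}")
    else:
        # For a default (non --user) install, pip usually puts scripts here.
        print("pdfrenamer not found on PATH; check", sysconfig.get_path("scripts"))
```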
Description
pdf-renamer uses the libraries pdf2doi and pdf2bib to extract the bibliographic data of a paper starting from its .pdf file. The retrieved data can then be used to automatically rename pdf files with a custom format (e.g. 'Year - Journal - Authors - Title').
Usage
pdf-renamer can be invoked directly from the command line, without opening a Python console.
The simplest command-line invocation is
$ pdfrenamer 'path/to/target'
where target is either a valid pdf file or a directory containing pdf files. pdf-renamer will automatically rename the file(s) in path/to/target using the standard settings, provided they are valid publications for which a DOI or arXiv ID can be found.
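pdf-renamer can also be imported as a library, but its Python API is not described in this section. A safe alternative is to drive the documented command line from a script. The sketch below does this with subprocess and uses only flags listed by pdfrenamer --h. The helper names build_command and rename_pdfs are hypothetical.

```python
import subprocess
from typing import List, Optional


def build_command(target: str, fmt: Optional[str] = None,
                  sub_folders: bool = False, case: Optional[str] = None) -> List[str]:
    """Assemble a pdfrenamer argument list (hypothetical helper)."""
    command = ["pdfrenamer", target]
    if fmt is not None:
        command += ["-f", fmt]        # filename format, e.g. "{YYYY} - {T}"
    if sub_folders:
        command.append("-sf")         # also process pdf files in sub-folders
    if case is not None:
        command += ["-case", case]    # 'camel', 'snake', 'kebab' or 'none'
    return command


def rename_pdfs(target: str, **options) -> None:
    """Run pdfrenamer; raises subprocess.CalledProcessError on a non-zero exit code."""
    # Passing a list (no shell) means spaces and braces in the format
    # string need no extra quoting.
    subprocess.run(build_command(target, **options), check=True)
```

For example, rename_pdfs("path/to/target", fmt="{YYYY} - {Aetal} - {J} - {T}", sub_folders=True) runs the same command as typing it in a terminal.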
A list of the standard settings and of the additional commands can be printed with
$ pdfrenamer --h
usage: pdfrenamer [-h] [-s] [-ro] [-f FORMAT] [-sf] [-max_length_authors MAX_LENGTH_AUTHORS]
                  [-max_length_filename MAX_LENGTH_FILENAME] [-max_words_title MAX_WORDS_TITLE] [-case CASE]
                  [-add_abbreviation_file PATH_ABBREVIATION_FILE] [-fr] [-sd] [-install--right--click]
                  [-uninstall--right--click]
                  [path ...]

Automatically renames pdf files of scientific publications by retrieving their identifiers (e.g. DOI or arxiv ID) and looking up their bibtex infos.

positional arguments:
  path                  Relative path of the pdf file or of a folder.

options:
  -h, --help            show this help message and exit
  -s, --decrease_verbose
                        Decrease verbosity. By default (i.e. when not using -s), all steps performed by pdf-renamer, pdf2dbib and pdf2doi are documented.
  -ro, --readonly       By default, pdf-renamer and pdf2doi store some information the metadata of the pdf file in order to speed up subsequent processing. By using this additional option, no metadata is ever added.
  -f FORMAT             Format of the new filename. Default = "{YYYY} - {Jabbr} - {A3etal} - {T}".
                        Valid tags:
                        {YYYY} = Year of publication
                        {MM} = Month of publication (in digits)
                        {DD} = Day of publication (in digits)
                        {J} = Full name of Journal
                        {Jabbr} = Abbreviated name of Journal, if any available (otherwise full name is used)
                        {Aall} = Last name of all authors (separated by comma)
                        {Aetal} = Last name of the first author, add 'et al.' if more authors are present
                        {A3etal} = Last name of the first three authors (separated by comma), add 'et al.' if more authors are present
                        {aAall} = First initial and last name of all authors (separated by comma)
                        {aAetal} = First initial and last name of the first author, add 'et al.' if more authors are present
                        {aA3etal} = First initial and last name of the first three authors (separated by comma), add 'et al.' if more authors are present
                        {T} = Title
  -sf, --sub_folders    Rename also pdf files contained in subfolders of target folder. Default = "False".
  -max_length_authors MAX_LENGTH_AUTHORS
                        Sets the maximum length of any string related to authors (default=80).
  -max_length_filename MAX_LENGTH_FILENAME
                        Sets the maximum length of any generated filename. Any filename longer than this will be truncated (default=250).
  -max_words_title MAX_WORDS_TITLE
                        Sets the maximum number of words from the paper title to use for the filename (default=20).
  -case CASE            Possible values are 'camel', 'snake', 'kebab', 'none' (default=none).
                        If different from 'none', converts each tag string into either 'camel' (e.g., LoremIpsumDolorSitAmet), 'snake' (e.g., Lorem_ipsum_dolor_sit_amet), or 'kebab' case (e.g., Lorem-ipsum-dolor-sit-amet).
                        Note: this will not affect any punctuation symbol or space contained in the filename format by the user.
  -add_abbreviation_file PATH_ABBREVIATION_FILE
                        The content of the text file specified by PATH_ABBREVIATION_FILE will be added to the user list of journal abbreviations.
                        Each row of the text file must have the format 'FULL NAME = ABBREVIATION'.
  -fr, --force_rename   By default, whenever pdf-renamer renames a pdf file by using a certain filename format, it also stores the format string into a tag of the pdf file.
                        In this way if pdf-renamer comes across that same file later, and the current filename format is the same as the one stored in the pdf file tag,
                        the file is ignored. By using this command, this behavior is overruled: pdf-renamer will always rename each file it comes across.
  -sd, --set_default    By adding this command, any value specified (in this same command) for the filename format (-f),
                        max length of author string (-max_length_authors), max length of filename string (-max_length_filename),
                        max number of title words (-max_words_title), and case (-case) will be also stored as default value(s) for the future.
  -install--right--click
                        Add a shortcut to pdf-renamer in the right-click context menu of Windows. You can rename a single pdf file (or all pdf files in a folder) by just right clicking on it!
                        NOTE: this feature is only available on Windows.
  -uninstall--right--click
                        Uninstall the right-click context menu functionalities. NOTE: this feature is only available on Windows.
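The help text above defines the abbreviation file only as rows of the form 'FULL NAME = ABBREVIATION'. Below is a minimal sketch of a parser for that format. It rests on two assumptions that the tool does not state: blank or malformed rows are skipped, and only the first '=' separates the name from the abbreviation. The names parse_abbreviations, read_abbreviation_file and journal_tag are hypothetical and are not part of pdf-renamer. journal_tag mirrors the {Jabbr} description by falling back to the full journal name when no abbreviation is known.

```python
from typing import Dict, Iterable


def parse_abbreviations(rows: Iterable[str]) -> Dict[str, str]:
    """Parse 'FULL NAME = ABBREVIATION' rows into a dictionary.

    Assumption: blank rows and rows without '=' are skipped.
    """
    abbreviations: Dict[str, str] = {}
    for row in rows:
        full_name, separator, abbreviation = row.partition("=")  # split at the first '='
        full_name, abbreviation = full_name.strip(), abbreviation.strip()
        if separator and full_name and abbreviation:
            abbreviations[full_name] = abbreviation
    return abbreviations


def read_abbreviation_file(path: str) -> Dict[str, str]:
    """Read and parse an abbreviation file (assumed UTF-8 encoded)."""
    with open(path, encoding="utf-8") as handle:
        return parse_abbreviations(handle)


def journal_tag(journal: str, abbreviations: Dict[str, str]) -> str:
    """Mimic {Jabbr}: the abbreviation if one is known, otherwise the full name."""
    return abbreviations.get(journal, journal)
```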
Several tags (as listed above) can be used to change the standard filename format, for example,
$ pdfrenamer 'path/to/target' -f "{YYYY} - {Aetal} - {J} - {T}"
will produce filenames that start with the year of publication, followed by the last name of the first author (plus 'et al.' if more authors are present), the full journal name and the paper title. Note that the tags are case sensitive.
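The help text describes what each tag produces, but not how pdf-renamer computes it. The sketch below assumes the bibliographic data has already been retrieved into a plain dictionary. The keys year, journal, journal_abbr, title and authors are hypothetical, as are the helper names. The sketch fills a format string with Python's str.format, following the tag descriptions above. It is an illustration, not pdf-renamer's implementation.

```python
from typing import Dict, List, Tuple


def author_tags(authors: List[Tuple[str, str]]) -> Dict[str, str]:
    """Build the author tags from (first_name, last_name) pairs."""
    last_names = [last for _, last in authors]
    initialed = [f"{first[0]}. {last}" if first else last for first, last in authors]
    etal = " et al." if len(authors) > 1 else ""    # more than one author
    etal3 = " et al." if len(authors) > 3 else ""   # more than three authors
    return {
        "Aall": ", ".join(last_names),
        "Aetal": last_names[0] + etal,
        "A3etal": ", ".join(last_names[:3]) + etal3,
        "aAall": ", ".join(initialed),
        "aAetal": initialed[0] + etal,
        "aA3etal": ", ".join(initialed[:3]) + etal3,
    }


def build_filename(fmt: str, metadata: dict) -> str:
    """Fill a format string such as "{YYYY} - {Aetal} - {J} - {T}"."""
    tags = {
        "YYYY": metadata["year"],
        "MM": metadata.get("month", ""),
        "DD": metadata.get("day", ""),
        "J": metadata["journal"],
        # {Jabbr} falls back to the full journal name.
        "Jabbr": metadata.get("journal_abbr") or metadata["journal"],
        "T": metadata["title"],
        **author_tags(metadata["authors"]),
    }
    return fmt.format(**tags)
```

Because str.format looks names up exactly, a wrongly cased tag such as {yyyy} raises KeyError in this sketch, which is consistent with the tags being case sensitive.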
Other useful settings are -max_length_authors, -max_length_filename and -max_words_title. They set, respectively, the maximum number of characters allowed for the author string, the maximum number of characters allowed for the overall filename, and the maximum number of words taken from the title.
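The three limits can be pictured as simple truncations. The sketch below is one possible reading, with hypothetical helper names. It assumes (the help text does not say) that the author string is cut at a character boundary and that the filename limit applies to the name without the '.pdf' extension. The defaults mirror the help text: 80, 250 and 20.

```python
def limit_title(title: str, max_words_title: int = 20) -> str:
    """Keep only the first max_words_title words of the title."""
    return " ".join(title.split()[:max_words_title])


def limit_length(text: str, max_length: int = 80) -> str:
    """Cut text to at most max_length characters, dropping whitespace left at the cut."""
    return text[:max_length].rstrip()


def finalize_filename(stem: str, max_length_filename: int = 250) -> str:
    """Assumption: the limit counts the name without the '.pdf' extension."""
    return limit_length(stem, max_length_filename) + ".pdf"


authors = limit_length("Doe, Smith, Brown", 10)               # "Doe, Smith"
title = limit_title("A Study of Placeholders in Examples", 3)  # "A Study of"
print(finalize_filename(f"2021 - {authors} - {title}"))        # 2021 - Doe, Smith - A Study of.pdf
```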
The optional command -case CASE can be used to convert all substrings (such as title, journal, etc.) to camel, snake or kebab case. The values of all these settings can be specified simultaneously, e.g.
$ pdfrenamer 'path/to/target' -f "{YYYY} - {Aetal} - {J} - {T}" -max_length_authors 40 -max_length_filename 200 -max_words_title 20 -case snake
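The help text gives one example per case style. The sketch below reproduces those examples; it is a guess at the behaviour, not pdf-renamer's code, and the helper names are hypothetical. Following the note in the help text, the conversion is applied to each tag value before the format string is filled, so separators written by the user, such as " - ", are left untouched.

```python
from typing import Dict


def convert_case(text: str, case: str = "none") -> str:
    """Convert a tag string to 'camel', 'snake', 'kebab' or leave it as-is ('none')."""
    words = text.split()
    if case == "camel":   # Lorem ipsum dolor -> LoremIpsumDolor
        return "".join(word[:1].upper() + word[1:] for word in words)
    if case == "snake":   # Lorem ipsum dolor -> Lorem_ipsum_dolor
        return "_".join(words)
    if case == "kebab":   # Lorem ipsum dolor -> Lorem-ipsum-dolor
        return "-".join(words)
    if case == "none":
        return text
    raise ValueError(f"unknown case: {case!r}")


def apply_case(fmt: str, tags: Dict[str, str], case: str) -> str:
    """Convert every tag value, but not the literal text of the format string."""
    return fmt.format(**{name: convert_case(value, case) for name, value in tags.items()})
```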
However, the values set for most parameters are not changed permanently unless the optional command -sd (set default) is added:
$ pdfrenamer 'path/to/target' -f "{YYYY} - {Aetal} - {J} - {T}" -max_length_authors 40 -max_length_filename 200 -max_words_title 20 -case snake -sd
In this case, the new values are saved in a settings.ini file inside the pdf-renamer folder. You can check this by typing pdfrenamer --h again, which lists the current defaults.
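The layout of pdf-renamer's settings.ini is not documented here. The section name "settings" and the key names in the sketch below are therefore hypothetical. The sketch illustrates the behaviour described above using the standard configparser module. Values given on the command line apply to the current run only, and they are written back to the file only when -sd is used. The default values are copied from the help text.

```python
import configparser
from typing import Dict, Optional

# Hypothetical layout: pdf-renamer's real settings.ini may use other names.
SECTION = "settings"
DEFAULTS = {
    "format": "{YYYY} - {Jabbr} - {A3etal} - {T}",
    "max_length_authors": "80",
    "max_length_filename": "250",
    "max_words_title": "20",
    "case": "none",
}


def resolve_settings(ini_path: str, cli_values: Dict[str, Optional[str]],
                     set_default: bool = False) -> Dict[str, str]:
    """Merge command-line values over stored defaults; persist them if set_default."""
    # interpolation=None keeps any '%' in a format string literal.
    config = configparser.ConfigParser(interpolation=None)
    config.read_dict({SECTION: DEFAULTS})   # built-in defaults
    config.read(ini_path)                   # stored defaults; a missing file is ignored
    stored = config[SECTION]

    effective = dict(stored)
    for key, value in cli_values.items():
        if value is None:                   # option not given on the command line
            continue
        effective[key] = value              # used for this run only...
        if set_default:
            stored[key] = value             # ...unless -sd was given
    if set_default:
        with open(ini_path, "w", encoding="utf-8") as handle:
            config.write(handle)
    return effective
```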
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
License
Download files
File details
Details for the file pdf-renamer-1.1.tar.gz.
File metadata
- Download URL: pdf-renamer-1.1.tar.gz
- Upload date:
- Size: 228.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 58d362f6d2250ce4242a152008a205d9ab2a8741973d5c3b35a367099a6a4b2a
MD5 | 66edb4a3d6864edc57483bd9a9f2c041
BLAKE2b-256 | b0c1e0366350710eee8d8f4dcc7d11e011f04f5422a4efe3cd93c7bb26a92e99
File details
Details for the file pdf_renamer-1.1-py3-none-any.whl.
File metadata
- Download URL: pdf_renamer-1.1-py3-none-any.whl
- Upload date:
- Size: 230.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | fbe3508614528a664df794e99fe0140622585f546ed49ace24469924cb6a1cde
MD5 | c8de9dd310721b2dbb9b2ce3a904a1b5
BLAKE2b-256 | ba7774017f14583036861b08cae2663d98aa232aa9c86c08a4553ab9a0539bbb