guessfilename

Derive a file name according to old file name cues and/or PDF file content


* guessfilename.py

This Python script tries to come up with a new file name for each
file given as a command-line argument.

It does this with several methods: first, the current file name is
analyzed and any [[https://en.wikipedia.org/wiki/Iso_date][ISO date/timestamp]] and [[https://github.com/novoid/filetags/][filetags]] are re-used.
Secondly, if parsing the file name did not lead to a new file
name, the content of the file is analyzed. The following file types are
currently supported:
- PDF files

The script accepts an arbitrary number of files (see your shell for
possible length limitations).
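The first step, re-using an existing ISO date/time-stamp and filetags, can be illustrated with a minimal sketch that splits a name of the form =<timestamp> <description> -- <tags>.<extension>= into its parts. The function =split_name()= and its time-stamp pattern are assumptions for illustration; the script's own parsing (e.g. =split_filename_entities()=) may work differently.

```python
import re
from typing import List, NamedTuple, Optional

# Hypothetical time-stamp pattern: ISO date, optionally followed by a time
# such as T10.32 or T10.32.12, then a blank or the end of the name.
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}(?:T\d{2}\.\d{2}(?:\.\d{2})?)?)(?: |$)")


class NameParts(NamedTuple):
    timestamp: Optional[str]
    description: str
    tags: List[str]
    extension: str


def split_name(filename: str) -> NameParts:
    """Split '<timestamp> <description> -- <tags>.<extension>' into its parts."""
    stem, _, extension = filename.rpartition(".")
    if not stem:  # no dot at all: the whole name is the stem
        stem, extension = filename, ""
    stem, _, tag_block = stem.partition(" -- ")  # filetags separator
    match = TIMESTAMP_RE.match(stem)
    timestamp = match.group(1) if match else None
    description = stem[match.end():] if match else stem
    return NameParts(timestamp, description.strip(), tag_block.split(), extension)
```

For example, =split_name("2016-03-05 phone 12,34 €.pdf")= yields the time-stamp =2016-03-05=, the description =phone 12,34 €=, no tags and the extension =pdf=.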

- *Target group*: users who are able to use command-line tools and who
  are using tags in file names.
- Hosted on GitHub: [[https://github.com/novoid/guess-filename.py]] and PyPI: [[https://pypi.org/project/guessfilename/]]

** Why

I scan almost all of my paper mail. Many of those documents are sent to
me regularly, for example bills or insurance information.

Being too lazy to name those files manually, and likely to end up with
many name variants for the same document type, I came up with a
method to derive file names either from the old file name (cues I
enter without knowing the exact target file name) or from the file content.

Analyzing the content enables this script to recognize bills via
customer numbers or phone numbers, amounts to pay, and so on.
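Recognizing a known bill from its text boils down to checking for characteristic strings. The following sketch assumes the PDF text has already been extracted into a string; the helpers mimic the idea behind the script's =contains_one_of()= and =contains_all_of()=, but their signatures here, the customer number and the phone numbers are all made up.

```python
from typing import Iterable, Optional


def contains_all_of(text: str, needles: Iterable[str]) -> bool:
    """True if every needle occurs in text."""
    return all(needle in text for needle in needles)


def contains_one_of(text: str, needles: Iterable[str]) -> bool:
    """True if at least one needle occurs in text."""
    return any(needle in text for needle in needles)


def classify_bill(text: str) -> Optional[str]:
    # Hypothetical rule: a landline bill is identified by a (made-up) customer
    # number plus at least one of the provider's (made-up) phone numbers.
    if contains_all_of(text, ["Customer number", "123456789"]) and contains_one_of(
        text, ["0316 123456", "+43 316 123456"]
    ):
        return "COMPANY landline"
    return None
```

Requiring several strings at once keeps false positives low; a single match on a common word would misclassify unrelated letters.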

** Examples

Here are some examples that demonstrate the purpose of this script.
The generated file names follow [[https://www.karl-voit.at/managing-digital-photographs/][my file name convention]].

For better user experience, I like to *define an abbreviation* in [[https://karl-voit.at/apps-I-am-using/][my
shell]] which also makes the examples easier to read:

: alias gf=guessfilename.py

A very simple example is a *simple bill*:

: gf "2016-03-05 phone 12,34 €.pdf"
:  → "2016-03-05 COMPANY landline 12,34€ -- scan bill.pdf"
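Part of this renaming is normalizing the amount, =12,34 €= becoming =12,34€=. A minimal regular-expression sketch, assuming amounts use a decimal comma and at most one blank before the euro sign; the names =EURO_RE=, =get_euro_amount()= and =normalize_euro()= are hypothetical and not meant to reproduce the script's =get_euro_charge()=. Adding the company name and tags is out of scope here.

```python
import re
from typing import Optional

# Hypothetical pattern: digits, optional decimal comma with one or two digits,
# at most one blank, then the euro sign. Thousands separators (1.234,56) are
# not handled.
EURO_RE = re.compile(r"(?P<amount>\d+(?:,\d{1,2})?) ?€")


def get_euro_amount(name: str) -> Optional[str]:
    """Return the first amount found, e.g. '12,34', or None."""
    match = EURO_RE.search(name)
    return match.group("amount") if match else None


def normalize_euro(name: str) -> str:
    """Drop the blank between amount and euro sign: '12,34 €' -> '12,34€'."""
    return EURO_RE.sub(lambda m: m.group("amount") + "€", name)
```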

Some mobile apps generate oddly formatted file names. Here is a *recording*:

: gf "rec_20171129-0902 A nice recording .wav"
:  → "2017-11-29T09.02 A nice recording.wav"
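A pattern for such recording names, inferred only from the single example above (the =rec_= prefix, =YYYYMMDD-HHMM=, an optional description), could look like the following sketch; =RECORDING_RE= and =rename_recording()= are hypothetical names.

```python
import re
from typing import Optional

# Hypothetical pattern: "rec_YYYYMMDD-HHMM", optional description, extension.
# Trailing blanks before the extension are dropped.
RECORDING_RE = re.compile(
    r"^rec_(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})-(?P<hour>\d{2})(?P<minute>\d{2})"
    r"(?: (?P<description>.*?))? *\.(?P<ext>[^.]+)$"
)


def rename_recording(name: str) -> Optional[str]:
    match = RECORDING_RE.match(name)
    if match is None:
        return None
    g = match.groupdict()
    description = f" {g['description']}" if g["description"] else ""
    return f"{g['year']}-{g['month']}-{g['day']}T{g['hour']}.{g['minute']}{description}.{g['ext']}"
```

The lazy =.*?= together with = *= before the dot is what strips the stray blank in front of =.wav=.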

*Android screenshot* files tend to look like this:

: gf "Screenshot_2017-11-29_10-32-12.png"
:  → "2017-11-29T10.32.12 -- screenshots.png"

*Android photographs* are handled similarly:

: gf "IMG_20190118_133928.jpg"
:  → "2019-01-18T13.39.28.jpg"
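Screenshots and photographs differ only in their patterns, so a small rule table can handle both. The sketch below is an assumption about how such rules could be organized, not the script's actual structure; each hypothetical rule pairs a pattern with an optional tag to append.

```python
import re
from typing import Optional

# Hypothetical rule table: (pattern, tag to append or None). All patterns share
# the same group names so one formatter serves every rule.
RULES = [
    (re.compile(r"^Screenshot_(?P<Y>\d{4})-(?P<mo>\d{2})-(?P<d>\d{2})_"
                r"(?P<H>\d{2})-(?P<mi>\d{2})-(?P<S>\d{2})\.(?P<ext>\w+)$"), "screenshots"),
    (re.compile(r"^IMG_(?P<Y>\d{4})(?P<mo>\d{2})(?P<d>\d{2})_"
                r"(?P<H>\d{2})(?P<mi>\d{2})(?P<S>\d{2})\.(?P<ext>\w+)$"), None),
]


def rename_by_rules(name: str) -> Optional[str]:
    for pattern, tag in RULES:
        match = pattern.match(name)
        if match:
            stamp = "{Y}-{mo}-{d}T{H}.{mi}.{S}".format(**match.groupdict())
            suffix = f" -- {tag}" if tag else ""
            return f"{stamp}{suffix}.{match.group('ext')}"
    return None  # no rule applies
```

Adding another camera or app then only means appending one more tuple to =RULES=.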

*Files saved from [[https://signal.org/][Signal]]* have strange default names as well:

: gf "signal-2018-03-08-102332.jpg"
:  → "2018-03-08T10.23.32.jpg"

Many companies like to generate *really silly file names*. This is from my bank:

: gf "C110014365208EUR20150930001.pdf"
:  → "2015-09-30 Bank statement 2015-001 10014365208.pdf"
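Even such names can be decoded once their structure is known. Judging only from this one example, the name seems to consist of =C1=, an 11-digit account number, =EUR=, the date as =YYYYMMDD= and a three-digit statement number; the following sketch rests entirely on that guess, and its names are hypothetical.

```python
import re
from typing import Optional

# Hypothetical structure guessed from a single example:
# "C1" + 11-digit account + "EUR" + YYYYMMDD + 3-digit statement number.
BANK_RE = re.compile(
    r"^C1(?P<account>\d{11})EUR"
    r"(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})(?P<number>\d{3})\.pdf$"
)


def rename_bank_statement(name: str) -> Optional[str]:
    match = BANK_RE.match(name)
    if match is None:
        return None
    g = match.groupdict()
    return (f"{g['year']}-{g['month']}-{g['day']} Bank statement "
            f"{g['year']}-{g['number']} {g['account']}.pdf")
```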

This script is able to *parse the content of PDF* files in order to get
meta-data for generating the new file name. This can be applied to your
*salary*, for example:

: gf "2020-03-04 salary.pdf"
:  → "2020-02-29 MYCOMPANY salary for February 1234,56€ -- finance.pdf"
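Besides the amount, which has to come from the PDF text, the example shows a date shift: a payslip scanned on 2020-03-04 is dated on the last day of the previous month. Assuming that rule, the date and month name can be derived as in this sketch; =salary_filename()= and its default company name are hypothetical, and the script has its own =NumToMonth()= helper.

```python
import calendar
import datetime


def salary_filename(scan_date: datetime.date, amount: str, company: str = "MYCOMPANY") -> str:
    """Hypothetical: name a payslip after the last day of the month before scan_date."""
    last_of_previous_month = scan_date.replace(day=1) - datetime.timedelta(days=1)
    # calendar.month_name follows the current locale; English in the default "C" locale.
    month = calendar.month_name[last_of_previous_month.month]
    return (f"{last_of_previous_month.isoformat()} {company} salary for {month} "
            f"{amount}€ -- finance.pdf")
```

Subtracting one day from the first of the month handles leap years and year boundaries without any month-length table.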

As you can see, "guessfilename" makes your digital life easier when
you have recurring file renaming tasks.

** Usage

#+BEGIN_SRC sh :results output :wrap src
guessfilename --help
#+END_SRC

#+BEGIN_src
Usage:
    guessfilename [<options>] <list of files>

This little Python script tries to rename files according to pre-defined rules.

It does this with several methods: first, the current file name is analyzed and
any ISO date/timestamp and filetags are re-used. Secondly, if the parsing of the
file name did not lead to any new file name, the content of the file is analyzed.

You have to adapt the rules in the Python script to meet your requirements.
The default rule-set follows the filename convention described on
http://karl-voit.at/managing-digital-photographs/


:copyright: (c) by Karl Voit
:license: GPL v3 or any later version
:URL: https://github.com/novoid/guess-filename.py
:bugreports: via github or <tools@Karl-Voit.at>


Options:
  -h, --help     show this help message and exit
  -d, --dryrun   enable dryrun mode: just simulate what would happen, do not
                 modify files
  -v, --verbose  enable verbose mode
  -q, --quiet    enable quiet mode
  --version      display version and exit
#+END_src

** Pixel Images and Videos
:PROPERTIES:
:CREATED:  [2020-11-15 Sun 17:07]
:END:

I added handling for [[https://karl-voit.at/2020/11/15/pixel4a-migration/][my Pixel 4a]] camera results: JPEG images and MP4 videos.

Due to [[https://www.reddit.com/r/Pixel4a/comments/jubshe/fixing_the_messy_timestamps_of_pixel_4a_camera/][a somewhat messy meta data situation]] I had to use the
=File:FileModifyDate= [[https://en.wikipedia.org/wiki/Exif][Exif]] meta-data in order to get time-stamps from
the local time zone. If you apply guessfilename after the file was
modified by copying or editing, you will get wrong
time-stamps. Therefore, use [[https://syncthing.net/][Syncthing]] or similar synchronization tools
that preserve the file modification time to get the files from the mobile
to your computer. Apply guessfilename before modifying the files any
further.

Furthermore, you will need to install [[https://exiftool.org/][ExifTool]] as an external
dependency. I was not able to find a Python-only Exif library that
gave me read access to the advanced Exif values the Pixel is using.
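Reading that value boils down to calling ExifTool and parsing its date format. A minimal sketch, assuming =exiftool= is on the =PATH= and prints =File:FileModifyDate= in its default =YYYY:MM:DD HH:MM:SS+HH:MM= format; the function names are hypothetical. ExifTool takes =File:FileModifyDate= from the file system's modification time, which is why copying or editing the file changes it.

```python
import datetime
import subprocess


def parse_exiftool_date(value: str) -> datetime.datetime:
    """Parse ExifTool's default date/time format, e.g. '2020:11:15 17:07:00+01:00'."""
    # %z accepts offsets with a colon such as +01:00 (Python 3.7+).
    return datetime.datetime.strptime(value.strip(), "%Y:%m:%d %H:%M:%S%z")


def file_modify_date(path: str) -> datetime.datetime:
    """Ask the external exiftool binary for File:FileModifyDate of path."""
    result = subprocess.run(
        ["exiftool", "-s3", "-File:FileModifyDate", path],  # -s3: print the value only
        check=True, capture_output=True, text=True,
    )
    return parse_exiftool_date(result.stdout)


def timestamp_prefix(moment: datetime.datetime) -> str:
    """Format the local time-stamp following the file name convention."""
    return moment.strftime("%Y-%m-%dT%H.%M.%S")
```

Keeping the parsing separate from the subprocess call makes the date handling testable without ExifTool installed.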

** MediathekView
:PROPERTIES:
:CREATED:  [2018-05-10 Thu 17:03]
:END:

When downloading TV shows using [[https://github.com/mediathekview/MediathekView][MediathekView]], you should use the following download pattern:

- MediathekView v11:
  : %DT%d %s - %t - %T -ORIGINAL- %N.mp4

- MediathekView v13:
  - Einstellungen (Settings) > Aufzeichnen und Abspielen (Recording and Playback) > Set bearbeiten (Edit set)
    - [Set-Name] > Hilfsprogramme (helper programs):
      - ffmpeg > Zieldateiname (target file name) > =%DT%d %s - %t - %T -ORIGINALhd- %N.mp4=
      - ffmpeg > Schalter (switches) > =-user_agent "Mozilla" -i %f -c copy -bsf:a aac_adtstoasc **=

When applying =guessfilename= to the resulting files, you will get something like this:

#+BEGIN_EXAMPLE
   20180509T235000 ORF - ZIB 24 - Auswirkungen nach US-Aus für Atomdeal -ORIGINAL- 2018-05-09_2350_tl_01_ZIB-24_Auswirkungen-na__13976363__o__1735069995__s14297628_8__BCK1HD_23514710P_23540405P_Q4A.mp4  ...
       →  2018-05-09T23.51.47 ORF - ZIB 24 - Auswirkungen nach US-Aus für Atomdeal -- lowquality.mp4

   20180509T235000 ORF - ZIB 24 - Hirntoter Bub plötzlich aufgewacht -ORIGINAL- 2018-05-09_2350_tl_01_ZIB-24_Hirntoter-Bub-p__13976363__o__5119815115__s14297631_1__BCK1HD_00045915P_00072303P_Q4A.mp4  ...
       →  2018-05-09T00.04.59 ORF - ZIB 24 - Hirntoter Bub plötzlich aufgewacht -- lowquality.mp4

   20180509T235000 ORF - ZIB 24 - Meldungen -ORIGINAL- 2018-05-09_2350_tl_01_ZIB-24_Meldungen__13976363__o__1117657593__s14297632_2__BCK1HD_00072303P_00085816P_Q4A.mp4  ...
       →  2018-05-09T00.07.23 ORF - ZIB 24 - Meldungen -- lowquality.mp4

   20180509T235000 ORF - ZIB 24 - Neuerung bei Filmfestspielen in Cannes -ORIGINAL- 2018-05-09_2350_tl_01_ZIB-24_Neuerung-bei-Fi__13976363__o__1941003027__s14297634_4__BCK1HD_00085816P_00111715P_Q4A.mp4  ...
       →  2018-05-09T00.08.58 ORF - ZIB 24 - Neuerung bei Filmfestspielen in Cannes -- lowquality.mp4

   20180509T235000 ORF - ZIB 24 - Trumps CIA-Kandidatin umstritten -ORIGINAL- 2018-05-09_2350_tl_01_ZIB-24_Trumps-Kandidat__13976363__o__1488806017__s14297630_0__BCK1HD_00020922P_00045915P_Q4A.mp4  ...
       →  2018-05-09T00.02.09 ORF - ZIB 24 - Trumps CIA-Kandidatin umstritten -- lowquality.mp4

   20180509T235000 ORF - ZIB 24 - Wetter -ORIGINAL- 2018-05-09_2350_tl_01_ZIB-24_Wetter__13976363__o__2966973785__s14297635_5__BCK1HD_00111715P_00120000P_Q4A.mp4  ...
       →  2018-05-09T00.11.17 ORF - ZIB 24 - Wetter -- lowquality.mp4
#+END_EXAMPLE

As you can see, the temporal order of the chunks is extracted so that
the files are in their correct order.

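Please note that this does not work correctly for shows whose chunks
cross midnight: the date is always taken from the start of the show,
while the time is taken from the chunk itself. In the example above,
the chunks broadcast after midnight therefore carry the date 2018-05-09
instead of 2018-05-10.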
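The extraction can be sketched with a single regular expression. The pattern below is inferred from the example file names only: the two numbers ending in =P= are taken as chunk start and end time, =Q4A= as the quality code, and the quality mapping covers just the one code seen in the example. All names are hypothetical. Like the example, the sketch takes the date from the show's start, so it reproduces the midnight limitation.

```python
import re
from typing import Optional

# Hypothetical pattern, inferred from the example file names above only.
MEDIATHEK_RE = re.compile(
    r"^(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})T\d{6} "          # show start
    r"(?P<description>.+?) -ORIGINAL- "                                # e.g. "ORF - ZIB 24 - Wetter"
    r".+?_(?P<hour>\d{2})(?P<minute>\d{2})(?P<second>\d{2})\d{2}P_"  # chunk start
    r"\d{8}P_(?P<quality>Q\w+)\.mp4$"                                  # chunk end, quality code
)

# Only the code seen in the example; any other code gets no quality tag here.
QUALITY_TAGS = {"Q4A": "lowquality"}


def rename_mediathek_chunk(name: str) -> Optional[str]:
    match = MEDIATHEK_RE.match(name)
    if match is None:
        return None
    g = match.groupdict()
    # Date from the show's start, time from the chunk: chunks broadcast after
    # midnight therefore keep the date of the previous day.
    stamp = f"{g['year']}-{g['month']}-{g['day']}T{g['hour']}.{g['minute']}.{g['second']}"
    tag = QUALITY_TAGS.get(g["quality"])
    suffix = f" -- {tag}" if tag else ""
    return f"{stamp} {g['description']}{suffix}.mp4"
```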

** .info.json Meta-Data Files
:PROPERTIES:
:CREATED:  [2019-10-19 Sat 15:21]
:END:

If you download a media file together with its separate
=.info.json= file (both base names without file extension need to
match), this tool is able to parse the meta-data to derive a new file
name.

Currently, two meta-data formats are supported: ORF TVthek and
YouTube, both downloaded via http://rg3.github.io/youtube-dl/

: youtube-dl --write-info-json <URL>

This results, for example, in files like these:

: Durchbruch bei Brexit-Verhandlungen-14577219.info.json
: Durchbruch bei Brexit-Verhandlungen-14577219.mp4
: Isolierte Familie - 58-jähriger Österreicher in U-Haft-14577221.info.json
: Isolierte Familie - 58-jähriger Österreicher in U-Haft-14577221.mp4
: The Star7 PDA Prototype-Ahg8OBYixL0.info.json
: The Star7 PDA Prototype-Ahg8OBYixL0.mp4

Note that each =mp4= file is accompanied by its =info.json=
file.

Applying guessfilename to these files looks like this:

#+BEGIN_EXAMPLE
vk@sherri ~tmp % guessfilename *mp4

   Durchbruch bei Brexit-Verhandlungen-14577219.mp4  ...
       →  2019-10-17T16.59.07 ORF - ZIB 17 00 - Durchbruch bei Brexit-Verhandlungen -- highquality.mp4

   Isolierte Familie - 58-jähriger Österreicher in U-Haft-14577221.mp4  ...
       →  2019-10-17T17.01.44 ORF - ZIB 17 00 - Isolierte Familie: 58-jähriger Österreicher in U-Haft -- highquality.mp4

   The Star7 PDA Prototype-Ahg8OBYixL0.mp4  ...
       →  2007-09-13 youtube - The Star7 PDA Prototype - Ahg8OBYixL0.mp4
#+END_EXAMPLE

The =info.json= files are not removed or renamed.
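For the YouTube case, the pairing and renaming can be sketched as follows. It assumes the fields =upload_date= (as =YYYYMMDD=), =extractor=, =title= and =id= that youtube-dl writes into the =.info.json= file; the ORF TVthek variant is not covered, and all function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def info_json_for(video: Path) -> Path:
    """Sidecar file: same base name, '.info.json' instead of the media extension."""
    return video.with_name(video.stem + ".info.json")


def youtube_name_from_info(info: dict, extension: str) -> Optional[str]:
    """Build '<date> <extractor> - <title> - <id>.<ext>' from youtube-dl meta-data."""
    try:
        date = info["upload_date"]  # YYYYMMDD
        extractor, title, video_id = info["extractor"], info["title"], info["id"]
    except KeyError:
        return None  # not the kind of meta-data this sketch understands
    return f"{date[:4]}-{date[4:6]}-{date[6:]} {extractor} - {title} - {video_id}.{extension}"


def derive_from_info_json(video: Path) -> Optional[str]:
    sidecar = info_json_for(video)
    if not sidecar.is_file():
        return None
    info = json.loads(sidecar.read_text(encoding="utf-8"))
    return youtube_name_from_info(info, video.suffix.lstrip("."))
```

The sketch only reads the sidecar file and never touches it, matching the behavior described above.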

** Extending with your own regular expressions

The script is structured as follows:

- general header, command-line argument parser, ...
- =handle_logging()=
- =error_exit()=
- =FileSizePlausibilityException()=
- =class GuessFilename()=
  - *a long list of regular expression definitions*
  - =derive_new_filename_from_old_filename()=
    - here, you can *add code to interpret the regular expressions*
  - =derive_new_filename_from_content()=
    - if you want to parse PDF content, add your code here
  - =derive_new_filename_from_json_metadata()=
    - this handles the JSON meta-data files generated by [[https://ytdl-org.github.io/youtube-dl/index.html][youtube-dl]] (see above)
  - =handle_file()=
    - the function that loops over all files and probes for a new file name until one of these functions returns one:
      1. =derive_new_filename_from_old_filename()=
      2. =derive_new_filename_from_content()=
      3. =derive_new_filename_from_json_metadata()=
      4. if no name has been returned by then, it prints a warning that no new name could be derived
  - The rest of the class consists of a bunch of tool functions, e.g., for parsing and querying:
    - =adding_tags()=
    - =split_filename_entities()=
    - =contains_one_of()=
    - =contains_all_of()=
    - =fuzzy_contains_one_of()=
    - =fuzzy_contains_all_of()=
    - =has_euro_charge()=
    - =get_euro_charge()=
    - =get_euro_charge_from_context_or_basename()=
    - =get_euro_charge_from_context()=
    - =rename_file()=
    - =get_datetime_string_from_named_groups()=
    - =get_date_string_from_named_groups()=
    - =get_datetime_description_extension_filename()=
    - =get_date_description_extension_filename()=
    - =NumToMonth()=
    - =translate_ORF_quality_string_to_tag()=
    - =get_file_size()=
    - =warn_if_ORF_file_seems_to_small_according_to_duration_and_quality_indicator()=
- =move_to_success_dir()=
- =move_to_error_dir()=
- =main()=
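The probing done by =handle_file()= is essentially a "first result wins" loop. A stripped-down sketch, with the derivation functions passed in as plain callables; this is an assumption for illustration, since in the script they are methods of =GuessFilename=, and the actual renaming is omitted here.

```python
from typing import Callable, Optional, Sequence

# A derivation function takes the old file name and returns a new one or None.
Deriver = Callable[[str], Optional[str]]


def handle_file(filename: str, derivers: Sequence[Deriver]) -> Optional[str]:
    """Return the first new name proposed by any deriver, in order."""
    for derive in derivers:
        new_name = derive(filename)
        if new_name:  # first successful function wins; later ones are skipped
            return new_name
    print(f"WARNING: could not derive a new file name for {filename!r}")
    return None
```

Calling it as =handle_file(name, [from_old_filename, from_content, from_json_metadata])=, with three hypothetical wrappers, mirrors the order listed above: cheap file name parsing runs first, and the more expensive content analysis only when needed.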

For the most basic pattern matching, you just have to add regular
expressions to the =GuessFilename()= class and add the regex matching
code to =derive_new_filename_from_old_filename()=.

Do not forget to add simple tests to =guessfilename_test.py= as well!
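Put together, adding a rule means three things: a regular expression, the code interpreting it, and a test. The following self-contained sketch walks through this for the Signal file names shown in the examples; =GuessFilenameSketch= is a hypothetical stand-in, and the real =derive_new_filename_from_old_filename()= very likely has a different signature.

```python
import re
import unittest
from typing import Optional


class GuessFilenameSketch:
    """Hypothetical, stripped-down stand-in for the GuessFilename class."""

    # 1) The new regular expression, defined as a class attribute ...
    SIGNAL_REGEX = re.compile(
        r"^signal-(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})-"
        r"(?P<hour>\d{2})(?P<minute>\d{2})(?P<second>\d{2})\.(?P<ext>\w+)$"
    )

    def derive_new_filename_from_old_filename(self, oldfilename: str) -> Optional[str]:
        # 2) ... the code interpreting it ...
        match = self.SIGNAL_REGEX.match(oldfilename)
        if match:
            return "{year}-{month}-{day}T{hour}.{minute}.{second}.{ext}".format(**match.groupdict())
        return None


class TestSignalRule(unittest.TestCase):
    # 3) ... and a simple test for it.
    def test_signal(self):
        g = GuessFilenameSketch()
        self.assertEqual(g.derive_new_filename_from_old_filename("signal-2018-03-08-102332.jpg"),
                         "2018-03-08T10.23.32.jpg")
        self.assertIsNone(g.derive_new_filename_from_old_filename("unrelated.jpg"))
```

Saved as a module, the test runs with =python -m unittest <module>=.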

* Related tools and workflows
# --- BEGIN SHARED: filetags_tools --- see https://github.com/novoid/screencasts/

This tool is part of a tool-set which I use to manage my digital files
such as photographs. My workflows are described in [[http://karl-voit.at/managing-digital-photographs/][this blog posting]],
which you might like to read, and in the video which is linked above.

In short:

- For *tagging*, please refer to [[https://github.com/novoid/filetags][filetags]] and its documentation. It's
  the most important part of the whole concept on how I manage files.

- See [[https://github.com/novoid/date2name][date2name]] for easily adding ISO *time-stamps or date-stamps* to files.

- For *easily naming and tagging* files within file browsers that
  allow integration of external tools, see [[https://github.com/novoid/appendfilename][appendfilename]] (once more)
  and [[https://github.com/novoid/filetags][filetags]].

- Moving to the archive folders is done using [[https://github.com/novoid/move2archive][move2archive]].

- Naming files is tedious. Therefore, I wrote [[https://github.com/novoid/guess-filename.py/][guessfilename]]:
  a Python script that guesses file names from the old file name and,
  optionally, from PDF content or video JSON meta-data.

- Having tagged photographs gives you many advantages. For example, I
  automatically [[https://github.com/novoid/set_desktop_background_according_to_season][choose my *desktop background image* according to the
  current season]].

- Files containing an ISO time/date-stamp get indexed by the
  filename module of [[https://github.com/novoid/Memacs][Memacs]].

-----------

- Alternative implementations of the =filetags= concept:
  - [[https://github.com/beutelma/filetags.el][GitHub - DerBeutlin/filetags.el: Emacs package to manage filetags in the filename]]
  - With [[https://github.com/protesilaos/denote][denote]], Protesilaos Stavrou implemented a conceptually
    related approach to manage notes within an Emacs buffer. With
    [[https://en.wikipedia.org/wiki/Dired][Emacs/dired]], this method equally may be applied on files, too.

- Related to =date2name=:
  - https://github.com/DerBeutlin/date2name.el Alternative implementation for [[https://en.wikipedia.org/wiki/Dired][Emacs/dired]]
  - https://github.com/muehlburger/d2n Alternative implementation in [[https://go.dev/][Go]]

- Related to =m2a=:
  - https://github.com/velvet-jones/imgfiler/

- Related to =guessfilename=:
  - [[http://www.jonasjberg.com/][Jonas Sjöberg]] took my idea and developed the much more advanced (and
    thus a bit more complicated) [[https://github.com/jonasjberg/autonameow][autonameow]]. It uses rule-based renaming,
    analyzes content of plain text, epub, pdf and rtf files, extracts
    meta-data from many different file formats via [[https://www.sno.phy.queensu.ca/%7Ephil/exiftool/][exiftool]] and so forth.
  - [[https://www.reddit.com/r/datacurator/comments/f6ku5p/building_an_auto_file_sorter_need_requirements/][This reddit thread]] brought me to [[https://github.com/unreadablewxy/fs-curator][fs-curator]] whose [[https://github.com/unreadablewxy/fs-curator/wiki][documentation]] looks
    promising. I did not test it and it's still in an early stage.
    However, it could be a future user-friendly part of a workflow that
    watches folders for file changes and applies processes like
    guessfilename.
  - If you don't need the full power of a programming language,
    [[https://github.com/tfeldmann/organize][organize]] might do the trick for you. Instead of coding Python, you
    define your rules within a text file. For many people, this may
    seem more user friendly.

----------

- A research platform for testing file-tagging on all platforms: [[https://karl-voit.at/tagstore/][tagstore]]
  - This happens to be an important part of [[https://karl-voit.at/tagstore/downloads/Voit2012b.pdf][my PhD thesis]] in PIM.
  - No longer maintained since 2013, but surely still a cool
    starting point in case you want a flexible tool for doing
    research with tagging interfaces.

- Good resources for tagging software in general
  - [[https://turbofuture.com/computers/Whats-the-Best-Software-for-Tagging-Files-A-Review][What's the Best Software for Tagging Files? | TurboFuture]]
  - "Marktübersicht von Tagging-Werkzeugen und Vergleich mit tagstore" ("Market overview of tagging tools and comparison with tagstore", German, 2013): linked on [[https://karl-voit.at/tagstore/en/papers.shtml][this page]] of the [[https://karl-voit.at/tagstore/][tagstore project]]

- If you like filetags but prefer the syntax of [[https://www.tagspaces.org/][TagSpaces]] for
  adding tags to file names, you should check out [[https://github.com/jgru/filetags][this filetags fork]].
  Maintenance is limited, though. Please note that my other tools
  working with tags do not support the TagSpaces style either.

- https://forge.chapril.org/tykayn/rangement.git
  - An NPM implementation of a subset of GuessFileName (using image exif header), append2name, move2archive
  - You probably need to read a bit of French
# --- END SHARED: filetags_tools --- see https://github.com/novoid/screencasts/

* How to Thank Me and Contribute to the Project
# --- BEGIN SHARED: how_to_thank_me --- see https://github.com/novoid/screencasts/

I'm glad if you like my tool. I've got way more projects on:

- [[https://github.com/novoid/][GitHub]] (oldest projects),
- [[https://gitlab.com/publicvoit/][GitLab.com]] (older projects), and
- [[https://codeberg.org/publicvoit/][Codeberg]] (newest projects).

If you want to support me:

- [[https://karl-voit.at/2018/06/07/cardware/][Send an old-fashioned *postcard* via snail mail]] - I love personal feedback!
  - see [[http://tinyurl.com/j6w8hyo][my address]]
- Send feature wishes or improvements as an issue
- Create issues for bugs
- Contribute merge requests for bug fixes
- Check out my other cool projects on the platforms above

If you want to contribute to this cool project, please fork and
contribute!

I am using [[http://www.python.org/dev/peps/pep-0008/][Python PEP8]] and occasionally some ideas from [[http://en.wikipedia.org/wiki/Test-driven_development][Test Driven
Development (TDD)]]. I fancy Python3 with [[https://typing.python.org/en/latest/spec/annotations.html][type annotations]], although I'm
not using them everywhere at the moment. Starting in 2025, I began
to use help from Claude.ai, which is a huge improvement given my lack
of programming practice and knowledge.

After all, each of my tools was developed because I needed its
functionality and could not get it elsewhere - at least to my
knowledge or taste.

# --- END SHARED: how_to_thank_me --- see https://github.com/novoid/screencasts/


* Local Variables                                                  :noexport:
# Local Variables:
# mode: auto-fill
# mode: flyspell
# eval: (ispell-change-dictionary "en_US")
# End:

Release history

- 2026.3.6.2 (Mar 6, 2026; this version)
- 2026.3.1.1 (Mar 1, 2026)
- 2023.2.5.2 (Mar 5, 2023)
- 2023.2.5.1 (Mar 5, 2023)
- 2023.1.13.1 (Jan 13, 2023)
- 2021.8.28.1 (Aug 28, 2021)
- 2020.11.21.1 (Nov 21, 2020)
- 2020.11.15.1 (Nov 15, 2020)
- 2020.3.1.1 (Mar 1, 2020)
- 2019.11.23.1 (Nov 23, 2019)
- 2019.10.19.2 (Oct 19, 2019)
- 2019.10.19.1 (Oct 19, 2019)
- 2019.10.10.1 (Oct 10, 2019)
- 2018.07.06.1 (Jul 6, 2018)
- 2018.06.16.1 (Jun 16, 2018)
- 2018.05.21.1 (May 21, 2018)
- 2018.05.12.2 (May 12, 2018)
- 2018.05.12.1 (May 12, 2018)
- 2018.05.10.1 (May 10, 2018)
- 2018.02.03.01 (Feb 3, 2018)
- 2017.12.24 (Dec 24, 2017)
- 2017.12.08 (Dec 8, 2017)

Download files

- Source distribution: guessfilename-2026.3.6.2.tar.gz (54.8 kB), uploaded Mar 6, 2026
- Built distribution: guessfilename-2026.3.6.2-py3-none-any.whl (48.5 kB, Python 3), uploaded Mar 6, 2026

File details

guessfilename-2026.3.6.2.tar.gz (source, 54.8 kB), uploaded Mar 6, 2026 via uv 0.9.9 on Debian GNU/Linux 13 (trixie), not via Trusted Publishing:

| Algorithm   | Hash digest                                                      |
|-------------+------------------------------------------------------------------|
| SHA256      | 34975a9c1d468350446d77fa4281600f49739bbdf3eb64f3e0181b12a1569fe9 |
| MD5         | 40a12fbdf6086a5fa17bc15db7a9986d                                 |
| BLAKE2b-256 | b512bc41deb3118e2f46943b6383dd7948dcc0086a8d10037771cc1135eeadf5 |

guessfilename-2026.3.6.2-py3-none-any.whl (Python 3, 48.5 kB), uploaded Mar 6, 2026 via uv 0.9.9 on Debian GNU/Linux 13 (trixie), not via Trusted Publishing:

| Algorithm   | Hash digest                                                      |
|-------------+------------------------------------------------------------------|
| SHA256      | 3e375fbb8b303faeaac3897727d4fb9bc7a333ef65eb340c46193e28deb69ba9 |
| MD5         | f402650719cfd050108c58808153c38a                                 |
| BLAKE2b-256 | 50a7c7be3f81d174597aa3e458030b57ba9d4a24538ebc86625818eed031ddcf |
