A tool for archivist's to automate the generation of references for digital files
Project description
Auto Reference Generator
A small python programme to generate hieracrchical archival reference for files and directories and export the results to a spreadsheet.
Badges: PyPI | CI | Coverage | License
Quick Start
Install:
pip install -U auto_reference_generator
Or download portable version.
Run:
auto_ref /path/to/root -p PREFIX -o /path/to/output
Generates a meta/ folder with output root_AutoRef.xlsx
To use the portable version navigate to the Downloaded then run with:
.\auto_ref.exe .\path\to\root
Version & Package info
Python Version: Python Version 3.10+ is recommended. Earlier versions may work but are not tested.
Additional Packages:
- pandas (required)
- openpyxl (required - spreadsheet exports)
- pyodf (ods export)
- lxml (xml export)
- tqdm To install:
pip install pandas openpyxl pyodf lxml tqdm
Portable version includes
Why use this tool?
This tool is designed for archivist's cataloguing large amounts of Digital Records at a time.
It's platform independent tested functioning on Windows and Linux (untested on MacOS).
Features:
- Hierachical reference generation with customisatible (Prefixes/Suffixes/Delimiters/Starting ref).
- 'Level' identification and limiting.
- Keyword filtering - replacing Numericals with specified keywords (intials, first letter, JSON map)
- Logged removal of empty directories.
- An Accession / Running Number mode.
- Fixity Generation
- Exports: xslx (Default), csv, ods (requires pyodf), json or xml (requires lxml).
- Integration with Opex Manifest Generator *Shameless Self promotion*.
Basic Usage / Examples
- Basic:
auto_ref /path/to/root - Prefix:
auto_ref /path/to/root -p PREFIX - Suffix:
auto_ref /path/to/root -s SUFFIX - Delimiter:
auto_ref /path/to/root -dlm "-" - Accession:
auto_ref /path/to/root -acc file - Fixity:
auto_ref /path/to/root -fx MD5 - Format:
auto_ref /path/to/root -fmt csv - Remove Empty
auto_ref /path/to/root --remove-empty - Output:
auto_ref /path/to/root -o /path/to/output - Include Hidden:
auto_ref /path/to/root --hidden
These options can be combined in an.
Output
Expected ouput is like so:
This includes a preset of metadata: Including: FullName, RelativeName, BaseName, Size, Modified, Ref_Section Level, Parent, Archive_Reference,
The reference will by default be generated to the Archive_Reference column:
Structure of References
Usage with Prefix ARC.
Folder Reference
>Root ARC
--->Folder 1 ARC/1
------>Sub Folder 1 ARC/1/1
--------->File 1 ARC/1/1/1
--------->File 2 ARC/1/1/2
------>Sub Folder 2 ARC/1/2
--------->File 3 ARC/1/2/1
--------->File 4 ARC/1/2/2
--->Folder 2 ARC/2
------>Sub Folder 3 ARC/2/1
--------->File 5 ARC/2/2
--->File 6 ARC/3
Files and Folders can coexist at the same level. Without a prefix the root reference defaults to 0:
>Root 0
--->Folder 1
------>Sub Folder 1/1
--------->File 1/1/1
--------->File2 1/1/2
------>File3 1/2
Prefixes can be set to any point in an Hierachy.
Advanced Options
Accession mode
An alternative method of code generation is based on an 'accession number'/running number pattern. Each file or folder will be given a running number regardless of depth.
Example output running Accession in "File" Mode:
> Root ACC-Dir
---> Folder 1 ACC-Dir
------> File 1 ACC-1
------> File 2 ACC-2
---> File 3 ACC-3
---> Folder 2 ACC-Dir
------> Sub-Folder ACC-Dir
---------> File 4 ACC-4
To run:
auto_ref "/path/to/root" -acc file -accp "ACC"
The available modes are for file, dir, all. Output will be to an additiona Accession_Reference column
Clear Empty Directories
Running auto_ref /path/to/root --remove-empty automatically remove any empty directories. A plain text log of the removed directories will be saved to meta/.
Fixity
To run a fixity and save to the output: auto_ref /path/to/root -fx ALGORITHM This will default to using the SHA-1 algorithm. MD5, SHA-1, SHA-256 and SHA-512 are supported.
Level Limit
Set a level limit to stop generating referencing at: auto_ref /path/to/root -l 5 will stop generating references 5 levels below root.
Skip
If you want to generate a spreadsheet without a reference code you can run: auto_ref /path/to/root --skip
Options File
You can customise the program by creating an 'options file' using the --options-file option. This allows for customisation of the column headers and the programs defaults.
Options given as:
[options]
INDEX_FIELD = FullName #Sets which field to use for index.
PATH_FIELD = FullName
RELATIVE_FIELD = RelativeName
PARENT_FIELD = Parent
PARENT_REF = Parent_Ref
REFERENCE_FIELD = Archive_Reference
REF_SECTION = Ref_Section
ACCESSION_FIELD = Accession_Reference
LEVEL_FIELD = Level
...
Keywords
To use Keywords: auto_ref /path/to/root -key Keyword1 Keyword2 Keyword3 ... -keym intialise The keywords will replace the reference number to all matches of the keyword. The way the replacement is made is determined by the -keym / --keyword-mode.
Keyword Modes:
intialiseUses the intials of the words.Department of JusticebecomesDOJ. Singluar words use first x amount of letters.firstlettersuses the first x amount of letters.Department of JusticesbecomesDEP.from_jsoncan create a Python Dictionary in a JSON file and set KEYWORD to that file. Will replace all keys with given value. Run like so:auto_ref /path/to/root -key /path/to/json.json.- JSON Dict written like so:
{'keyword':'value','keyword2','value2',...}
Full Options:
(Note to self add in argparse to md...)
The below covers the full range of options. This may not be up to date.
For up to date options use the -h option to show dialog:
Options:
-h, --help Show Help dialog
-p, --prefix Replace Root 0 with specified prefix [string]
Is added to all references
-s --suffix Add a suffix to references [string]
--suffix-options Set whether to apply to files, {apply_to_files,apply_to_folders,
folders,or to all apply_to_all}
default is to apply_to_files.
-l --level-limit Set whether to limit generation to [int]
a specific level.
Note generated references may have
extra delimiter.
-dlm --delimiter Set to change the default delimiter [string]
-acc, --accession Run in "Accession Mode", this will {Dir,File,
generate a running number of either All}
Files, directories, or Both
-accp, --acc-prefix Set the Prefix to append onto the running [boolean]
number generated in "Accession Mode"
-fx --fixity Generate fixity codes for files {MD5, SHA-1,
SHA-256, SHA-512}
-hid --hidden Include Hidden directories and files in [boolean]
generation.
--remove-empty Will remove all Empty Directories from [boolean]
within a given folder, not including them
in the Reference Generation.
A simply Text list of removed folders is
then generated to the output directory.
-str, --start-ref Set the number to start the Reference [int]
generation from.
-o, --output Set the directory to export the spreadsheet to. [string]
--disable-meta-dir Set whether to generate a "meta" directory, [boolean]
to export CSV / Excel file to.
Default behavior will be to create a directory,
using this option will disable it.
-skp --skip Skip running the Auto Reference process, [boolean]
will generate a spreadsheet but not
an Archival Reference
-fmt, --format Set export format. Will require {xlsx,csv,ods,dict,xml,json}
appropriate modules in Python.
ods - PyODF
xlsx - OpenPyXL
xml - lxml
Defaults to xlsx.
-key --keywords Set keywords to replace numericals with [string|path]
alphanumerical characters
Can be single word or
list: 'written,like,this'
Or path to a JSON file containing a dict
Keywords only currently act upon folders
and not files.
-keym -keywords-mode Set way to replace: {initialise,first_letters,from_json}
initialise: My New Folder > MNF
first_letters: My New Folder > MYF
from_json: Imports
Requires to have a Json file with a
dictionary of words to replace:
{'Word':'Replacement',
'SecondWord':'2ndReplacement}
--keywords-retain- Set whether to continue reference numbering [bool]
order If not used keywords don't 'count' towards
additional references.
IE if the keyword you are replacing is
be reference number: 2, this is moved
to what would originally be number 3.
Retaining the order means, this scheme is
maintained: IE 3 is still 3, and 2 is skipped
--keywords-case- Set to enable case-sensitivity for keyword [bool]
sensitivity matches. Default is cases are not sensitive.
--keywords-abbreviation Set the number of characters to abbreviate to [int]
-number Only for first_letters mode.
--sort-by Set the sorting method: folders_first, sorts [folders_first|alphabetically]
folders first. Alphabetically, you can guess.
Ignores folders.
Troubleshooting
- On Windows ensure that your root folder does not end in a
\. - The
meta/folder will always be ignored.
Future Developments
Level Limitations to allow for "group references"- Added!Generating reference's which use alphabetic characters- Added!- Physical Level
Contributing
I welcome further contributions and feedback.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file auto_reference_generator-1.3.5.tar.gz.
File metadata
- Download URL: auto_reference_generator-1.3.5.tar.gz
- Upload date:
- Size: 167.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1062c152d45e7c0769c593f7907837abc3c75dd7396d27701f2f9b7b4da9aa4d
|
|
| MD5 |
2029f8cc499c694eda21f3b5103e04d2
|
|
| BLAKE2b-256 |
5657be77e27730417bbb46b20fb048d5717db82ed70468c2c590b561de9ff1ed
|
Provenance
The following attestation bundles were made for auto_reference_generator-1.3.5.tar.gz:
Publisher:
pypi-publish.yml on CPJPRINCE/auto_reference_generator
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
auto_reference_generator-1.3.5.tar.gz -
Subject digest:
1062c152d45e7c0769c593f7907837abc3c75dd7396d27701f2f9b7b4da9aa4d - Sigstore transparency entry: 845380010
- Sigstore integration time:
-
Permalink:
CPJPRINCE/auto_reference_generator@7337ccc8098fbb706be8665b011b3f7081eb4b3b -
Branch / Tag:
refs/tags/v1.3.5 - Owner: https://github.com/CPJPRINCE
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
pypi-publish.yml@7337ccc8098fbb706be8665b011b3f7081eb4b3b -
Trigger Event:
push
-
Statement type:
File details
Details for the file auto_reference_generator-1.3.5-py3-none-any.whl.
File metadata
- Download URL: auto_reference_generator-1.3.5-py3-none-any.whl
- Upload date:
- Size: 24.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
6b1324eb62cd7db89a267f58d580cdf51ea1e695793a85cc6e45671bf830adf7
|
|
| MD5 |
65fb0e11810394dba4a6405d058d0848
|
|
| BLAKE2b-256 |
e1a3775f99eb48f553674a7a6998cb676d6ab7f6d5c11817eefd39aa10801050
|
Provenance
The following attestation bundles were made for auto_reference_generator-1.3.5-py3-none-any.whl:
Publisher:
pypi-publish.yml on CPJPRINCE/auto_reference_generator
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
auto_reference_generator-1.3.5-py3-none-any.whl -
Subject digest:
6b1324eb62cd7db89a267f58d580cdf51ea1e695793a85cc6e45671bf830adf7 - Sigstore transparency entry: 845380013
- Sigstore integration time:
-
Permalink:
CPJPRINCE/auto_reference_generator@7337ccc8098fbb706be8665b011b3f7081eb4b3b -
Branch / Tag:
refs/tags/v1.3.5 - Owner: https://github.com/CPJPRINCE
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
pypi-publish.yml@7337ccc8098fbb706be8665b011b3f7081eb4b3b -
Trigger Event:
push
-
Statement type: