A tool for archivists to automate the generation of references for digital files
Auto Reference Generator Tool
The Auto Reference Generator tool is a small Python programme that helps digital archivists reference and catalogue digital items. It works recursively through a given directory, generating a reference code for each directory and file, then exports the results to an Excel or CSV spreadsheet.
It is platform independent and has been tested on Windows, macOS and Linux.
Why use this tool?
If you're an archivist dealing with digital records, this tool lets you reference a large number of records in a single run, saving significant time compared with assigning reference codes to records one by one.
The generated spreadsheet also serves as the basis for spreadsheet inputs to the Opex Manifest Generator tool *Shameless self-promotion*.
A Quick Note
If the files need to be arranged, this must be done beforehand for the references to be accurate, though a temporary spreadsheet can be generated to assist with this.
Additional features:
Some additional features include:
- Appending prefixes to the Archival Reference.
- Identifying the depth / level of each folder.
- Gathering a standard set of metadata.
- A changeable starting reference.
- Logged removal of empty directories.
- An alternative "Accession Reference" mode.
- Compatibility with the Win32 / Windows 256-character limit.
Structure of References
Folder Reference
-->Root 0
---->Folder 1 1
------>Sub Folder 1 1/1
-------->File 1 1/1/1
-------->File 2 1/1/2
------>Sub Folder 2 1/2
-------->File 3 1/2/1
-------->File 4 1/2/2
---->Folder 2 2
------>Sub Folder 3 2/1
------>File 5 2/2
---->File 6 3
The root reference defaults to 0; however, the Prefix option can be used to change 0 to the desired prefix / archival reference, changing the structure to:
-->Root Folder ARC
---->Folder ARC/1
------>Sub Folder ARC/1/1
------>File ARC/1/2
etc
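The numbering scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the function name `assign_references` and the nested-dict input (a dict for a folder, `None` for a file) are assumptions made for the example, and folders are numbered before files to match the listing above.

```python
def assign_references(tree, prefix=None, delimiter="/"):
    """Return {path: reference} for a nested tree with a single root folder.

    `tree` is a hypothetical in-memory stand-in for a directory:
    a dict value marks a folder, None marks a file.
    """
    refs = {}

    def walk(children, parent_path, parent_ref):
        # Folders first, then files; each group sorted by name.
        ordered = sorted(children.items(), key=lambda item: (item[1] is None, item[0]))
        for number, (name, sub) in enumerate(ordered, start=1):
            path = f"{parent_path}/{name}"
            # Children of an un-prefixed root get a bare number (1, 2, 3...).
            ref = f"{parent_ref}{delimiter}{number}" if parent_ref else str(number)
            refs[path] = ref
            if sub is not None:
                walk(sub, path, ref)

    root_name, root_children = next(iter(tree.items()))
    refs[root_name] = prefix if prefix else "0"
    walk(root_children, root_name, prefix or "")
    return refs


example = {
    "Root": {
        "Folder 1": {
            "Sub Folder 1": {"File 1": None, "File 2": None},
            "Sub Folder 2": {"File 3": None, "File 4": None},
        },
        "Folder 2": {"Sub Folder 3": {}, "File 5": None},
        "File 6": None,
    }
}

plain = assign_references(example)
prefixed = assign_references(example, prefix="ARC")
```

Here `plain["Root/Folder 1/Sub Folder 1/File 2"]` is `1/1/2`, and `prefixed["Root/Folder 1"]` is `ARC/1`.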
Prerequisites
The following modules are utilized and installed with the package:
- pandas
- openpyxl
Optional modules include:
- lxml (for XML Export)
- odfpy (for ODS Export)
Python Version 3.8+ is also recommended. It may work on earlier versions, but this has not been tested.
Installation
To install, simply run:
pip install -U auto_Reference_generator
Usage
To run the basic program, run from the terminal:
auto_ref {path/to/your/folder}
Replace the path with your own folder. If the path contains a space, enclose it in quotation marks. On Windows this may look like:
auto_ref "C:\Users\Christopher\Downloads\"
Additional options can be appended before or after the root directory.
To run the program with the Prefix option, add the -p option and type in your prefix:
auto_ref "C:\Users\Christopher\Downloads\" -p "ARCH"
This will generate a spreadsheet in a folder called 'meta' within the 'root' directory.
The spreadsheet is named after the 'root' folder with "_Autoref" appended.
The spreadsheet contains the paths of the files as well as some additional metadata: size, extensions and dates.
The last column of the spreadsheet, Archive_Reference, holds the generated reference.
(If run without the Prefix option, this will simply be the numerals.)
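The output layout described above (a 'meta' folder inside the root, a file named after the root plus "_Autoref", and Archive_Reference as the last column) can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the tool's own code: the function names and every column name except Archive_Reference are hypothetical, and CSV is used here so the example needs no third-party modules.

```python
import csv
import os
from datetime import datetime

# Hypothetical column names; only Archive_Reference comes from the README.
COLUMNS = ["FullName", "Size", "Extension", "Modified_Date", "Archive_Reference"]


def describe(path, reference):
    """Build one spreadsheet row for a file or folder."""
    modified = datetime.fromtimestamp(os.path.getmtime(path))
    return {
        "FullName": os.path.abspath(path),
        "Size": os.path.getsize(path),
        "Extension": os.path.splitext(path)[1],
        "Modified_Date": modified.isoformat(timespec="seconds"),
        "Archive_Reference": reference,
    }


def export_autoref_csv(root, rows):
    """Write rows to <root>/meta/<root name>_Autoref.csv and return the path."""
    root = os.path.abspath(root)
    meta_dir = os.path.join(root, "meta")
    os.makedirs(meta_dir, exist_ok=True)
    out_path = os.path.join(meta_dir, os.path.basename(root) + "_Autoref.csv")
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return out_path
```

A caller would build one row per item with `describe(path, reference)` and pass the list to `export_autoref_csv`.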
Accession mode
There is an alternative method of generating a reference number: instead of creating a code based on the directory hierarchy, you can create one that follows an 'accession number' pattern, i.e. each file or folder, regardless of depth, is given a running number. Depending on the 'mode', the running number applies only to directories, only to files, or to both.
Example running Accession in "File" Mode
----> Root ACC-Dir
------> Folder 1 ACC-Dir
--------> File 1 ACC-1
--------> File 2 ACC-2
------> File 3 ACC-3
------> Folder 2 ACC-Dir
--------> Sub-Folder ACC-Dir
----------> File 4 ACC-4
The available modes are File, Dir and All.
To run in accession mode, use the -acc and -accp options (a prefix must be set):
auto_ref "C:\Users\Christopher\Downloads\" -acc File -accp "ACC"
When you generate an Accession Reference, an Archive_Reference code will always be generated as well.
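The running-number idea behind accession mode can be sketched as follows. This is a hedged illustration, not the tool's implementation: the function name and the `(name, is_dir)` input are hypothetical, the "-File" label used for uncounted files in Dir mode is an assumption (the README only shows "-Dir" for uncounted folders), and so is the single shared counter in All mode.

```python
def accession_references(entries, prefix, mode="File", delimiter="-"):
    """Return one accession reference per entry, in traversal order.

    `entries` is a list of (name, is_dir) tuples, already in the order
    the directory tree would be walked.
    """
    mode = mode.lower()
    if mode not in {"file", "dir", "all"}:
        raise ValueError("mode must be File, Dir or All")

    counter = 0
    references = []
    for _name, is_dir in entries:
        counted = (
            mode == "all"
            or (mode == "dir" and is_dir)
            or (mode == "file" and not is_dir)
        )
        if counted:
            counter += 1
            references.append(f"{prefix}{delimiter}{counter}")
        else:
            # Items outside the chosen mode get a fixed label instead of a number.
            label = "Dir" if is_dir else "File"
            references.append(f"{prefix}{delimiter}{label}")
    return references


# The "File" mode example from above, flattened into traversal order.
entries = [
    ("Root", True),
    ("Folder 1", True),
    ("File 1", False),
    ("File 2", False),
    ("File 3", False),
    ("Folder 2", True),
    ("Sub-Folder", True),
    ("File 4", False),
]
file_mode = accession_references(entries, "ACC", mode="File")
```

With these inputs `file_mode` reproduces the listing above: folders receive `ACC-Dir` and files receive `ACC-1` to `ACC-4`.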
Set start reference
To set a start reference, add -str followed by the number to start from (note this must be a numeral).
Clear Empty Directories
Adding --empty or -rm to the command will automatically remove any empty directories within the given folder. It will also generate a simple text log, in the meta folder, of the empty directories that were removed.
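A logged removal of empty directories could look roughly like the sketch below. It is an assumption-laden illustration rather than the tool's code: the function name and the log format (one removed path per line) are hypothetical. Walking bottom-up means a folder that only contained empty folders is itself removed once its children are gone.

```python
import os


def remove_empty_directories(root, log_path):
    """Delete empty directories below root and log each removed path."""
    removed = []
    # topdown=False visits the deepest directories first.
    for current, _dirnames, _filenames in os.walk(root, topdown=False):
        if current == root:
            continue  # never delete the root itself
        # Re-list the directory: its children may have just been removed.
        if not os.listdir(current):
            os.rmdir(current)
            removed.append(current)
    with open(log_path, "w", encoding="utf-8") as log:
        log.write("\n".join(removed))
    return removed
```

Because `os.rmdir` only succeeds on empty directories, the sketch cannot delete a folder that still holds files.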
Fixity
You can also generate fixity values by adding the -fx option. This defaults to the SHA-1 algorithm; only MD5, SHA-1, SHA-256 and SHA-512 are supported.
To run a SHA-512 generation:
auto_ref "C:\Users\Christopher\Downloads\" -fx SHA-512
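For readers unfamiliar with fixity values, the sketch below shows how a checksum for one file can be computed with Python's standard `hashlib` module. It illustrates the general technique only; the function name, the chunk size and the mapping of option names to `hashlib` names are assumptions for the example, not the tool's internals.

```python
import hashlib

# Option names as listed in this README, mapped to hashlib algorithm names.
ALGORITHMS = {
    "MD5": "md5",
    "SHA-1": "sha1",
    "SHA-256": "sha256",
    "SHA-512": "sha512",
}


def generate_fixity(path, algorithm="SHA-1", chunk_size=1024 * 1024):
    """Return the hex digest of the file at path."""
    try:
        digest = hashlib.new(ALGORITHMS[algorithm.upper()])
    except KeyError:
        raise ValueError(f"Unsupported algorithm: {algorithm}") from None
    with open(path, "rb") as handle:
        # Read in chunks so large files are never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `generate_fixity(path, "SHA-512")` returns a 128-character hexadecimal string.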
Filtering
By default, hidden folders and folders named 'meta' are ignored. You can include hidden folders by using the --hidden option.
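The default filtering can be pictured with the sketch below. It is an assumption-based illustration: the function names are hypothetical, and "hidden" is taken to mean a name starting with a dot, which is the Unix convention and may not match how the tool detects hidden items on Windows.

```python
import os


def filter_directories(dirnames, include_hidden=False):
    """Drop 'meta' folders and, unless requested, dot-prefixed folders."""
    kept = []
    for name in dirnames:
        if name == "meta":
            continue
        if name.startswith(".") and not include_hidden:
            continue
        kept.append(name)
    return kept


def walk_filtered(root, include_hidden=False):
    """Yield (directory, filenames) pairs, skipping filtered folders."""
    for current, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk descending into skipped folders.
        dirnames[:] = filter_directories(dirnames, include_hidden)
        yield current, filenames
```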
Skip
If you just want to generate a spreadsheet without a reference code, add -skp or --skip, and the tool will generate a spreadsheet without the Archive_Reference column.
Options:
For up-to-date options, use the -h option to show the help dialog:
Options:
-h, --help
    Show Help dialog.
-p, --prefix [string]
    Replace Root 0 with the specified prefix. Is added to all references.
-s --suffix [string]
    Add a suffix to references.
--suffix-options {apply_to_files,apply_to_folders,apply_to_all}
    Set whether to apply the suffix to files, folders, or to all. Default is apply_to_files.
-l --level-limit [int]
    Set whether to limit generation to a specific level. Note generated references may have an extra delimiter.
-dlm --delimiter [string]
    Set to change the default delimiter.
-acc, --accession {Dir,File,All}
    Run in "Accession Mode"; this will generate a running number for either files, directories, or both.
-accp, --acc-prefix [boolean]
    Set the prefix to append onto the running number generated in "Accession Mode".
-fx --fixity {MD5,SHA-1,SHA-256,SHA-512}
    Generate fixity codes for files.
-hid --hidden [boolean]
    Include hidden directories and files in generation.
--rm-empty [boolean]
    Remove all empty directories from within a given folder, not including them in the Reference Generation. A simple text list of removed folders is then generated to the output directory.
-str, --start-ref [int]
    Set the number to start the Reference generation from.
-o, --output [string]
    Set the directory to export the spreadsheet to.
--disable-meta-dir [boolean]
    Set whether to generate a "meta" directory to export the CSV / Excel file to. Default behavior is to create the directory; using this option will disable it.
-skp --skip [boolean]
    Skip running the Auto Reference process; will generate a spreadsheet but not an Archival Reference.
-fmt, --format {xlsx,csv,ods,dict,xml,json}
    Set export format. Will require the appropriate modules in Python:
    ods - PyODF
    xlsx - OpenPyXL
    xml - lxml
    Defaults to xlsx.
-key --keywords [string|path]
    Set keywords to replace numericals with alphanumerical characters. Can be a single word, a list ('written,like,this'), or a path to a JSON file containing a dict. Keywords currently only act upon folders and not files.
-keym -keywords-mode {initialise,first_letters,from_json}
    Set way to replace:
    initialise: My New Folder > MNF
    first_letters: My New Folder > MYF
    from_json: Imports replacements from a JSON file. Requires a JSON file with a dictionary of words to replace: {'Word':'Replacement', 'SecondWord':'2ndReplacement'}
--keywords-retain-order [bool]
    Set whether to continue reference numbering. If not used, keywords don't 'count' towards additional references, i.e. if the keyword you are replacing would be reference number 2, that number is moved to what would originally be number 3. Retaining the order means this scheme is maintained, i.e. 3 is still 3, and 2 is skipped.
--keywords-case-sensitivity [bool]
    Set to enable case-sensitivity for keyword matches. Default is case-insensitive.
--keywords-abbreviation-number [int]
    Set the number of characters to abbreviate to. Only for first_letters mode.
--sort-by [folders_first|alphabetically]
    Set the sorting method: folders_first sorts folders first; alphabetically sorts alphabetically, ignoring folders.
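The keyword replacement modes can be illustrated with a short sketch. Only the `initialise` example (My New Folder > MNF) and the dictionary-lookup idea are taken from the help text above; the function names are hypothetical, and the case-insensitive default follows the --keywords-case-sensitivity description rather than any knowledge of the tool's code.

```python
def initialise(name):
    """Abbreviate a folder name to the upper-cased first letter of each word."""
    return "".join(word[0] for word in name.split()).upper()


def replace_from_mapping(name, mapping, case_sensitive=False):
    """Look up a replacement for name in a dict, as from_json mode might.

    Returns None when there is no match.
    """
    if case_sensitive:
        return mapping.get(name)
    lowered = {key.lower(): value for key, value in mapping.items()}
    return lowered.get(name.lower())


replacements = {"Word": "Replacement", "SecondWord": "2ndReplacement"}
```

For example, `initialise("My New Folder")` gives `MNF`, and `replace_from_mapping("word", replacements)` gives `Replacement`.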
Future Developments
- Level limitations to allow for "group references" - Added!
- Generating references which use alphabetic characters - Added!
Contributing
I welcome further contributions and feedback.
File details
Details for the file auto_reference_generator-1.3.4.tar.gz.
File metadata
- Download URL: auto_reference_generator-1.3.4.tar.gz
- Size: 130.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 30c24fd27cb040130bc545df2ddf56151801ad6b5b117d567021b35ee3c11a30 |
| MD5 | bcc59acc71c090305af47b05230f574e |
| BLAKE2b-256 | e52c49d30070779a2f9c21f1fca0cc71eb24fdd598107e52f1ddd18db99417c0 |
Provenance
The following attestation bundles were made for auto_reference_generator-1.3.4.tar.gz:
- Publisher: pypi-publish.yml on CPJPRINCE/auto_reference_generator
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: auto_reference_generator-1.3.4.tar.gz
- Subject digest: 30c24fd27cb040130bc545df2ddf56151801ad6b5b117d567021b35ee3c11a30
- Sigstore transparency entry: 756121466
- Permalink: CPJPRINCE/auto_reference_generator@1e2d0479c873ad1e18f6abef9ea21bab3737c6ac
- Branch / Tag: refs/tags/Release-1.3.4---the-Great-Renaming
- Owner: https://github.com/CPJPRINCE
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@1e2d0479c873ad1e18f6abef9ea21bab3737c6ac
- Trigger Event: push
File details
Details for the file auto_reference_generator-1.3.4-py3-none-any.whl.
File metadata
- Download URL: auto_reference_generator-1.3.4-py3-none-any.whl
- Size: 22.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eb7580b37a2aacedac10afee1d32d0d888a14fe6c93370ef58a708d75117b8e7 |
| MD5 | 4cab21d1a1f436d851abc85f1dedf3f4 |
| BLAKE2b-256 | 558b9f3da19a1b96203efeb012fcacd3c64f0eb4725a47c37083b1eceb4908ce |
Provenance
The following attestation bundles were made for auto_reference_generator-1.3.4-py3-none-any.whl:
- Publisher: pypi-publish.yml on CPJPRINCE/auto_reference_generator
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: auto_reference_generator-1.3.4-py3-none-any.whl
- Subject digest: eb7580b37a2aacedac10afee1d32d0d888a14fe6c93370ef58a708d75117b8e7
- Sigstore transparency entry: 756121490
- Permalink: CPJPRINCE/auto_reference_generator@1e2d0479c873ad1e18f6abef9ea21bab3737c6ac
- Branch / Tag: refs/tags/Release-1.3.4---the-Great-Renaming
- Owner: https://github.com/CPJPRINCE
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@1e2d0479c873ad1e18f6abef9ea21bab3737c6ac
- Trigger Event: push