A tool for archivists to automate the generation of references for digital files

Auto Reference Generator Tool

The Auto Reference Generator tool is a small Python programme to help digital archivists reference and catalogue digital items. It works recursively through a given directory, generating a reference code for each directory and file, then exports the results to an Excel or CSV spreadsheet.

It is platform independent and has been tested on Windows, macOS and Linux.

Why use this tool?

If you're an archivist dealing with digital records, this tool lets you reference a large number of records in one pass, saving significant time compared with assigning reference codes to records individually.

The generated spreadsheet can also serve as the input spreadsheet for the Opex Manifest Generator tool (shameless self-promotion).

A Quick Note

If you need to arrange the files, do so beforehand so that the references are accurate. A temporary spreadsheet can be generated to assist with this.

Additional features

Some additional features include:

  • Appending prefixes to the archival reference.
  • Identifying the depth / level of each folder.
  • Gathering a standard set of metadata.
  • Changeable starting reference.
  • Logged removal of empty directories.
  • An alternative "Accession Reference" mode.
  • Compatibility with the Win32 / Windows 256-character path limit.

Structure of References

Folder                  Reference
-->Root                 0
---->Folder 1           1
------>Sub Folder 1     1/1
-------->File 1         1/1/1
-------->File 2         1/1/2
------>Sub Folder 2     1/2
-------->File 3         1/2/1
-------->File 4         1/2/2
---->Folder 2           2
------>Sub Folder 3     2/1
------>File 5           2/2
---->File 6             3

The root reference defaults to 0; however, the Prefix option can be used to change 0 to the desired prefix / archival reference, changing the structure to:

-->Root Folder          ARC
---->Folder             ARC/1
------>Sub Folder       ARC/1/1
------>File             ARC/1/2
etc
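
The numbering scheme above can be sketched in a few lines of Python. This is only an illustration of the structure, not the tool's actual code: the function name build_references is hypothetical, and the ordering (folders before files, each sorted by name) is an assumption based on the tables above.

```python
import os
import tempfile
from pathlib import Path


def build_references(root, prefix="0", delimiter="/"):
    """Return {relative path: reference} for everything under root.

    Assumption: folders are numbered before files, each sorted by name.
    """
    root = Path(root)
    references = {".": prefix}

    def walk(folder, parent_ref):
        with os.scandir(folder) as scan:
            entries = sorted(scan, key=lambda e: (not e.is_dir(), e.name.lower()))
        for number, entry in enumerate(entries, start=1):
            # Children of a default root drop the "0"; a custom prefix is kept.
            if parent_ref is None:
                ref = str(number)
            else:
                ref = f"{parent_ref}{delimiter}{number}"
            references[Path(entry.path).relative_to(root).as_posix()] = ref
            if entry.is_dir():
                walk(entry.path, ref)

    walk(root, None if prefix == "0" else prefix)
    return references


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "Folder" / "Sub Folder").mkdir(parents=True)
        (base / "Folder" / "File.txt").write_text("example")
        for path, ref in build_references(base, prefix="ARC").items():
            print(f"{ref:10} {path}")
```

Numbering restarts at 1 inside every folder, so a reference always encodes the position of an item within each level of its parent chain.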

Prerequisites

The following modules are utilized and installed with the package:

  • pandas
  • openpyxl

Optional Modules also include:

  • lxml (for XML Export)
  • odfpy (for ODS Export)

Python Version 3.8+ is also recommended. It may work on earlier versions, but this has not been tested.

Installation

To install, simply run:

pip install -U auto_Reference_generator

Usage

To run the basic program, run from the terminal:

auto_ref {path/to/your/folder}

Replace the path with your folder. If the path contains a space, enclose it in quotation marks. On Windows this may look like:

auto_ref "C:\Users\Christopher\Downloads\"

Additional options can be appended before or after the root directory.

To run the program with the Prefix option, add the -p option and type in your prefix:

auto_ref "C:\Users\Christopher\Downloads\" -p "ARCH"

This will generate a spreadsheet in a folder called 'meta' within the 'root' directory.

The spreadsheet will be named after the 'root' folder and appended with "_Autoref".
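
If you want to locate the output from a script, the naming rule described here can be expressed as a small helper. The function name default_output_path is hypothetical, and the sketch assumes the default 'meta' folder and the xlsx format are in use.

```python
from pathlib import Path


def default_output_path(root, extension="xlsx"):
    """Build the expected spreadsheet path: <root>/meta/<root name>_Autoref.<ext>."""
    root = Path(root)
    return root / "meta" / f"{root.name}_Autoref.{extension}"


if __name__ == "__main__":
    print(default_output_path(Path("archive") / "Downloads"))
```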

Within the spreadsheet you will have information on the paths of the files as well as some additional metadata: size, extensions and dates.
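
As a rough idea of where this kind of metadata comes from, the sketch below collects size, extension and modified date for a single path using only the standard library. The function name and the dictionary keys are illustrative assumptions; they are not the column names used in the generated spreadsheet.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def basic_metadata(path):
    """Collect a few standard properties for one file or folder."""
    path = Path(path)
    info = path.stat()
    return {
        "Name": path.name,
        "Extension": path.suffix,  # empty string for folders or extensionless files
        "Size": info.st_size,      # size in bytes
        "Modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "IsDirectory": path.is_dir(),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "report.txt"
        sample.write_bytes(b"hello")
        print(basic_metadata(sample))
```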

At the end of the spreadsheet there is an Archive_Reference column containing the generated reference.

(If run without the Prefix option, this will simply be the numerals.)

Accession mode

There is an alternative method of generating a reference number: instead of creating a code based on the directory hierarchy, you can create one that follows an 'accession number' pattern, i.e. each file or folder, regardless of depth, is given a running number. Depending on the 'mode', the running number applies only to directories, only to files, or to both.

Example running Accession in "File" Mode

----> Root              ACC-Dir
------> Folder 1        ACC-Dir
--------> File 1        ACC-1
--------> File 2        ACC-2
------> File 3          ACC-3
------> Folder 2        ACC-Dir
--------> Sub-Folder    ACC-Dir
----------> File 4      ACC-4

The available modes are File, Dir and All.
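
The running-number idea can be sketched as follows. This is an illustration under stated assumptions rather than the tool's implementation: the function name is hypothetical, the "-" separator and the "ACC-Dir" placeholder are taken from the example above, and the "ACC-File" placeholder used in Dir mode is a guess by analogy.

```python
def accession_references(entries, mode="File", prefix="ACC", start=1):
    """Assign running numbers to (name, is_dir) tuples given in traversal order."""
    if mode not in {"File", "Dir", "All"}:
        raise ValueError(f"Unknown mode: {mode}")
    counter = start
    result = []
    for name, is_dir in entries:
        counted = (
            mode == "All"
            or (mode == "Dir" and is_dir)
            or (mode == "File" and not is_dir)
        )
        if counted:
            result.append((name, f"{prefix}-{counter}"))
            counter += 1
        else:
            # Assumption: items outside the chosen mode get a placeholder label.
            label = "Dir" if is_dir else "File"
            result.append((name, f"{prefix}-{label}"))
    return result


if __name__ == "__main__":
    tree = [("Root", True), ("Folder 1", True), ("File 1", False), ("File 2", False)]
    for name, ref in accession_references(tree, mode="File"):
        print(f"{name:10} {ref}")
```

Because the counter ignores depth, the numbers depend only on the order in which items are visited.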

To run in accession mode, use the -acc and -accp options (A prefix must be set):

auto_ref "C:\Users\Christopher\Downloads\" -acc File -accp "ACC"

When you generate an Accession Reference, an Archive_Reference code will always be generated as well.

Set start reference

To set a start reference, add -str followed by the number to start from (note this must be a numeral).

Clear Empty Directories

Adding --empty or -rm to the command will automatically remove any empty directories within the root folder. It will also generate a simple text log, in the meta folder, of the empty directories that were removed.
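
For a sense of what a logged removal involves, here is a minimal standard-library sketch. It is not the tool's own code: the function name and the log file name are hypothetical, and it assumes that a folder containing only empty folders should also be removed.

```python
import os
import tempfile
from pathlib import Path


def remove_empty_directories(root, log_path):
    """Remove empty folders below root (bottom-up) and write a log of them."""
    root = Path(root)
    removed = []
    # topdown=False visits children first, so parents emptied by the
    # removal of their children are caught in the same pass.
    for current, _dirs, _files in os.walk(root, topdown=False):
        folder = Path(current)
        if folder == root:
            continue
        if not any(folder.iterdir()):
            folder.rmdir()
            removed.append(folder.relative_to(root).as_posix())
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    log_path.write_text("\n".join(removed), encoding="utf-8")
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "empty" / "nested").mkdir(parents=True)
        print(remove_empty_directories(base, base / "meta" / "removed.txt"))
```

Deleting folders is irreversible, so it is worth running against a copy first.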

Fixity

You can also generate fixity values by adding the -fx option. This defaults to the SHA-1 algorithm; only MD5, SHA-1, SHA-256 and SHA-512 are supported.

To run a SHA-512 generation:

auto_ref "C:\Users\Christopher\Downloads\" -fx SHA-512
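
If you want to spot-check a generated fixity value independently, Python's hashlib can compute the same digests. The sketch below is a stand-alone check, not part of the tool: the function name and the mapping from option names to hashlib names are assumptions, and the output formatting of the tool (for example upper or lower case) may differ.

```python
import hashlib
import tempfile
from pathlib import Path

# Option names as listed above, mapped to hashlib algorithm names.
ALGORITHMS = {"MD5": "md5", "SHA-1": "sha1", "SHA-256": "sha256", "SHA-512": "sha512"}


def fixity(path, algorithm="SHA-1", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(ALGORITHMS[algorithm])
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.bin"
        sample.write_bytes(b"abc")
        print(fixity(sample, "SHA-512"))
```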

Filtering

By default, hidden folders and folders named 'meta' are ignored. You can include hidden folders by using the --hidden option.
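
The default filter can be pictured with the sketch below. It is a simplification with hypothetical function names: "hidden" is taken to mean a name starting with a dot, whereas the tool may also check the Windows hidden attribute, and hidden files are filtered the same way as hidden folders.

```python
import os
import tempfile
from pathlib import Path


def keep(name, is_dir, include_hidden=False):
    """Decide whether an entry takes part in reference generation."""
    if is_dir and name == "meta":
        return False
    if not include_hidden and name.startswith("."):
        return False
    return True


def list_included(root, include_hidden=False):
    """Return relative paths of all entries that pass the filter."""
    included = []
    for current, dirs, files in os.walk(root):
        # Editing dirs in place stops os.walk from descending into skipped folders.
        dirs[:] = sorted(d for d in dirs if keep(d, True, include_hidden))
        kept_files = sorted(f for f in files if keep(f, False, include_hidden))
        relative = os.path.relpath(current, root)
        for name in dirs + kept_files:
            included.append(Path(relative, name).as_posix())
    return included


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / ".cache").mkdir()
        (Path(tmp) / "docs").mkdir()
        print(list_included(tmp))
```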

Skip

If you just want to generate a spreadsheet without a reference code, add -skp or --skip; the spreadsheet will be generated without the Archive_Reference column.

Options:

For up-to-date options, use the -h option to show the help dialog:

Options:
        -h,     --help          Show Help dialog                              

        -p,     --prefix        Replace Root 0 with specified prefix            [string]
                                Is added to all references

        -s      --suffix        Add a suffix to references                      [string]

        --suffix-options        Set whether to apply to files,                  {apply_to_files,apply_to_folders,
                                folders,or to all                               apply_to_all}
                                default is to apply_to_files.
        
        -l      --level-limit   Set whether to limit generation to              [int]
                                a specific level.
                                Note generated references may have
                                extra delimiter.

        -dlm    --delimiter     Set to change the default delimiter             [string]

        -acc,   --accession     Run in "Accession Mode", this will              {Dir,File,
                                generate a running number of either             All}
                                Files, directories, or Both                                                           
                                
        -accp,  --acc-prefix    Set the Prefix to append onto the running       [string]
                                number generated in "Accession Mode"
        
        -fx     --fixity        Generate fixity codes for files                 {MD5, SHA-1, 
                                                                                SHA-256, SHA-512}
        
        -hid    --hidden        Include Hidden directories and files in         [boolean]
                                generation.

        --rm-empty              Will remove all Empty Directories from          [boolean]
                                within a given folder, not including them
                                in the Reference Generation.
                                A simple text list of removed folders is
                                then generated in the output directory.
        
        -str,     --start-ref   Set the number to start the Reference           [int] 
                                generation from.
        
        -o,     --output        Set the directory to export the spreadsheet to. [string]      
        
        --disable-meta-dir      Set whether to generate a "meta" directory,     [boolean]
                                to export CSV / Excel file to.
                                Default behavior will be to create a directory,
                                using this option will disable it.      
        
        -skp    --skip          Skip running the Auto Reference process,   [boolean]
                                will generate a spreadsheet but not
                                an Archival Reference
        
        -fmt,   --format        Set export format. Will require                 {xlsx,csv,ods,dict,xml,json}
                                appropriate modules in Python.
                                ods - PyODF
                                xlsx - OpenPyXL
                                xml - lxml     
                                Defaults to xlsx.
        
        -key    --keywords      Set keywords to replace numericals with         [string|path]
                                alphanumerical characters
                                Can be single word or
                                list: 'written,like,this'
                                Or path to a JSON file containing a dict

                                Keywords only currently act upon folders
                                and not files.
        
        -keym   --keywords-mode Set way to replace:                             {initialise,first_letters,from_json}
                                initialise: My New Folder > MNF
                                first_letters: My New Folder > MYF
                                from_json: Imports replacements from a
                                JSON file containing a dictionary of
                                words to replace:
                                {"Word": "Replacement",
                                "SecondWord": "2ndReplacement"}
        
        --keywords-retain-      Set whether to continue reference numbering     [bool]
        order                   If not used, keywords don't 'count' towards
                                the reference numbering.
                                IE if the keyword you are replacing would
                                be reference number 2, that number is given
                                to what would originally be number 3.
                                Retaining the order means the scheme is
                                maintained: IE 3 is still 3, and 2 is skipped.

        --keywords-case-        Set to enable case-sensitivity for keyword      [bool]
        sensitivity             matches. Default is case-insensitive.

        --keywords-abbreviation Set the number of characters to abbreviate to   [int]
        -number                 Only for first_letters mode.

        --sort-by               Set the sorting method: folders_first sorts     [folders_first|alphabetically]
                                folders before files; alphabetically sorts
                                by name without treating folders separately.
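
To illustrate the keyword options, the sketch below shows one way the "initialise" and "from_json" modes could turn a folder name into a reference part. It is a guess at the behaviour from the help text only: the function names are hypothetical, matching is case-insensitive as in the stated default, and the first_letters mode is left out because its exact rule is not described here.

```python
import json


def initialise(name):
    """Abbreviate a name to the upper-cased first letter of each word."""
    return "".join(word[0].upper() for word in name.split())


def reference_part(folder_name, number, keywords=(), mode="initialise", replacements=None):
    """Return the keyword replacement for a folder, or its number if none applies."""
    if mode == "from_json":
        lookup = {key.lower(): value for key, value in (replacements or {}).items()}
        return lookup.get(folder_name.lower(), str(number))
    if folder_name.lower() in {keyword.lower() for keyword in keywords}:
        return initialise(folder_name)
    return str(number)


if __name__ == "__main__":
    print(reference_part("My New Folder", 2, keywords=["My New Folder"]))
    mapping = json.loads('{"Word": "Replacement", "SecondWord": "2ndReplacement"}')
    print(reference_part("SecondWord", 5, mode="from_json", replacements=mapping))
```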
                
                                
        

Future Developments

  • Level Limitations to allow for "group references" - Added!
  • Generating references which use alphabetic characters - Added!

Contributing

I welcome further contributions and feedback.
