Skip to main content

Systematic tracking and referencing of digital artefacts for postgraduate students and early career researchers

Project description

Scientists and engineers create a multitude of digital artefacts during their daily work:
  • experimental results,

  • simulation results,

  • literate programming notebooks analysing experiments and simulations

  • statistical models,

  • machine learning models,

  • figures,

  • tables, etc

In order to trace and track these multiple interconnected research artefacts, hierarchical naming schemes are a powerful tool to document the connection between research artefacts, find previous research outputs, and enable reproducible research [1].

The following naming scheme has evolved over several years to track research artefacts of all kinds:

The general scheme is: DSyymd[hMM]e[_x]__title

  • DS: Project specific initials. Here, DS stands for Data Science.

  • yy: [0-9][0-9] are the last two digits of the years in the 21st century. I won’t live beyond that. So, I do not care for following centuries.

  • m: [o-z] these letters map to the respective months.

  • d: [1-9,A-V] represent the 31 days of a month. Digits and upper-case characters have approximately the same height, such that this element gives a visual structure to the name, which divides the date from the daily counter.

  • h: [a-x] these optional letters refer to the hours of the day

  • MM: [0-5][0-9] are the minutes with 00 encoding the full hour

  • e: [a-z]+ daily counter as lower-case letter enumerating the respective database or dataset. The 28th dataset would start with be enumerated as aa.

  • x: Optional attribute being the last significant characters of the dataset, from which DSyymde is derived.

  • title: Readable name of the respective data set with whitespaces being replaced by underscores.

month m

day d

day d

day d

hour h

hour h

o

January

1

1

B

11

L

21

0

a

12

m

p

February

2

2

C

12

M

22

1

b

13

n

q

March

3

3

D

13

N

23

2

c

14

o

r

April

4

4

E

14

O

24

3

d

15

p

s

May

5

5

F

15

P

25

4

e

16

q

t

June

6

6

G

16

Q

26

5

f

17

r

u

July

7

7

H

17

R

27

6

g

18

s

v

August

8

8

I

18

S

28

7

h

19

t

w

September

9

9

J

19

T

29

8

i

20

u

x

October

A

10

K

20

U

30

9

j

21

v

y

November

V

31

10

k

22

w

z

December

11

l

23

x

  • The first dataset created on Friday 01.01.2021 would be named DS21o1a.

  • The second dataset created on the same day would be named DS21o1b.

  • An analysis (e.g. Jupyter notebook) of the first data set started after the second data set had been created would be named DS21o1c_a. Exported figures of this analysis should be named DS21o1c_a__[plottype].[filetype].

  • An analysis of data set DS21o1b started on 2nd January should be named DS21o2a_1b.

  • An meta analysis of DS21o1c_a and DS21o2a_1b started on 11th February should be named DS21pBa_o1c_2a.

Installation

The module contexere can be installed from PyPi:

pip install contexere

Usage

The project provides the command line tool name:

usage: name [-h] [--version] [-n] [-p PROJECT] [-s] [-t] [-v] [-vv] [path]

Suggest name for research artefact

positional arguments:
  path                  Path to folder with research artefacts (default:
                        current working dir)

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -n, --next            Suggest next artefact name
  -p PROJECT, --project PROJECT
                        Specify project abbreviation
  -s, --summary         Sumarize files following the naming convention
  -t, --time            add time abbreviation
  -u, --utc             Generate timestamp with respect to UTC (default is local timezone)
  -v, --verbose         set loglevel to INFO
  -vv, --very-verbose   set loglevel to DEBUG

Calling the tool without any arguments returns the date abbreviation of today:

name
24xV

Adding the option --time also abbreviates the actual time:

name --time
24xVj36

In an empty folder, a file name is suggested by specifying a project abbreviation and the --next option:

mkdir test_folder
cd test_folder
name --project DS --next
DS24xVa

After the creation of the first file, only the --next option needs to be provided to get another file name suggestion:

touch DS24xVa__test.dat
name --next
DS24xVb

Let’s assume that DS24xVb is the name of a Jupyter notebook, which analyses results created by DS24xVa.py. You can reflect this relation by actually naming the notebook DS24xVb_xVa__anlysis.ipynb to reflect its relation to DS24xVa.py. However, simplifying the name by removing redundant elements of the reference DS24xVb_a__anlysis.ipynb is even more efficient.

References

[1] Martin Kühne and Andreas W. Liehr. Improving the traditional information management in natural sciences. Data Science Journal, 8(1):18–26, 2009. doi: 10.2481/dsj.8.18.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

contexere-1.0.0.tar.gz (51.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

contexere-1.0.0-py3-none-any.whl (17.9 kB view details)

Uploaded Python 3

File details

Details for the file contexere-1.0.0.tar.gz.

File metadata

  • Download URL: contexere-1.0.0.tar.gz
  • Upload date:
  • Size: 51.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for contexere-1.0.0.tar.gz
Algorithm Hash digest
SHA256 5c712d3d90e562da70c43986636ccba106a77df2028f2aaf352c7017a25048eb
MD5 8b506980f27175ddcdaa2be56c5e7fcc
BLAKE2b-256 4edf4f2e7c9198214a1250e597fd560b617ab948888d2d055da1bc05087d3012

See more details on using hashes here.

Provenance

The following attestation bundles were made for contexere-1.0.0.tar.gz:

Publisher: release.yml on kempa-liehr/contexere

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file contexere-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: contexere-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 17.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for contexere-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 21da72100d6a8a688906d745cc218409b9a10688794b3aa10cfbe8f82aab078e
MD5 9819cc0746751d3a4e07130c0e9ed92e
BLAKE2b-256 988723be71a813605cbe055865b8a92becf8b5ffc9ad77de5679ad75fdb0bb30

See more details on using hashes here.

Provenance

The following attestation bundles were made for contexere-1.0.0-py3-none-any.whl:

Publisher: release.yml on kempa-liehr/contexere

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page