copy/track files in git, and a CLI tool/library to traverse the history
Project description
git_doc_history
Copy and track files in git
, and a library to traverse the history
Really, the doc
isn't super accurate, but 'file/index/history' is such an overloaded term when it comes to git
I use this to track my todo.txt
files, changes to configuration files, any shell histories which don't support timestamps (see all of my config files here)
This copies the files to a different directory, so it doesn't interfere with the application/configuration
By copying those files to a separate directory, I can always roll back to previous file, or see what the file was like a couple days/months ago.
For shell histories/files which are unique lines of text (e.g., my todo.txt
file) this also lets me estimate timestamps for when new lines were added to the history/text files, using the iter_diffs
below, which emits added/removed events for individual events with estimated times
Installation
Requires python3.8+
To install with pip, run:
pip install git_doc_history
Usage
The main script to backup data is [bin/git_doc_history
], which gets installed into your ~/.local/bin/
directory.
If uses a config file (parsed with python-dotenv
) like:
SOURCE_DIR=~/.todo # copy from
BACKUP_DIR=~/data/todo_git_history # copy to
# multiple lines means multiple files
COPY_FILES="todo.txt
done.txt"
You can either provide the full path to that config file, or place the file in ~/.config/git_doc_history
For example, after placing it at ~/.config/git_doc_history/todo -- to copy/commit any changes, run:
$ git_doc_history todo
Generated configuration:
SOURCE_DIR: /home/sean/data/todo
BACKUP_DIR: /home/sean/data/todo_git_history
COPY_FILES: todo.txt
done.txt
'/home/sean/data/todo/todo.txt' -> '/home/sean/data/todo_git_history/todo.txt'
'/home/sean/data/todo/done.txt' -> '/home/sean/data/todo_git_history/done.txt'
'/home/sean/data/todo/.gitignore' -> '/home/sean/data/todo_git_history/.gitignore'
[master f927490] update
1 file changed, 1 insertion(+)
create mode 100644 .gitignore
That uses python3 -m git_doc_history shell todo
to parse the configuration file, like:
eval "$(python3 -m git_doc_history shell todo)"
The python library comes with a small CLI interface to extract a file from some time ago:
$ python3 -m git_doc_history extract-file-at --at 2020-09-20 -c todo todo.txt -
setup command of completion
The BACKUP_DIR
is of course just a regular git directory -- you can reset --hard
to some point in the past to get rid of recent commits, rebase
/squash
to merge commits or do whatever you please
Library Usage
Most things will be done with git_doc_history.DocHistory
This doesn't assume the filetype is readable text (you may be storing images/binary doc files in the git repository), so the default is to return the data as bytes
-- you can .decode("utf-8")
to convert that to readable text
To traverse the entire history:
from git_doc_history import DocHistory
from git_doc_history.config import parse_config, resolve_config
# parse the config from the env file
doc = DocHistory.from_dict(parse_config(resolve_config("todo")))
# iterate through the history for the todo.txt file
for snapshot in doc.iter_commit_snapshots("todo.txt"):
print(str(snapshot.commit_sha))
print(str(snapshot.dt))
print(snapshot.data.decode("utf-8"))
Parsing Diffs
Iterates through the git history in chronological order, keeping track
of when data was added or removed. By default, this parses the file
given by splitting it into lines. If lines are added/removed, this returns an
event which specifies when in the history, and what was added/removed
Alternatively, can pass a parse_func
, which is a function which
accepts the DocHistorySnapshot
, and retuns a list of hashable items
to store as state
For an example of parsing diffs, see examples/todotxt_diff.py
:
Example output looks something like:
added 2022-03-08 12:14:45 (C) 2022-03-08 create shebang script +programming
removed 2022-03-08 13:14:58 (C) 2022-03-08 create shebang script +programming
added 2022-03-08 22:23:39 save formhistory.sqlite in browserexport
removed 2022-03-08 23:23:45 save formhistory.sqlite in browserexport
added 2022-03-09 02:49:58 (C) create a python fzf wrapper because apparently I cant find a good one
added 2022-03-10 16:24:24 (B) 2022-03-10 create plaintext playlist parser module +music
removed 2022-03-11 01:30:49 (B) 2022-03-10 create plaintext playlist parser module +music
added 2022-03-11 10:37:06 (C) 2022-03-11 sync tmux from home directory +programming
added 2022-03-12 03:44:24 install undotree +vim +programming
removed 2022-03-12 04:44:51 (C) 2022-03-11 sync tmux from home directory +programming
removed 2022-03-12 10:51:20 install undotree +vim +programming
In this case, 'removed' would mean I either changed the text on the line, or (more likely) I completed it
Tests
git clone 'https://github.com/seanbreckenridge/git_doc_history'
cd ./git_doc_history
pip install '.[testing]'
pytest
flake8 ./git_doc_history
mypy ./git_doc_history
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file git_doc_history-0.1.1.tar.gz
.
File metadata
- Download URL: git_doc_history-0.1.1.tar.gz
- Upload date:
- Size: 11.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.11.1 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.10.2
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | ddb0231c8eccbb89e23bb04f3d1994c35b759b8badbfa2d0f3f708b1cf248b3c |
|
MD5 | 8393efa5727b2260799d12dd62c058cb |
|
BLAKE2b-256 | 41e8dab593709af3f0386248681e05a290842a63a989e460baa1d123c5985cce |
File details
Details for the file git_doc_history-0.1.1-py3-none-any.whl
.
File metadata
- Download URL: git_doc_history-0.1.1-py3-none-any.whl
- Upload date:
- Size: 11.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.11.1 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.10.2
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 4cf4541891302e4b69044abdb3f3de7ecf3ec639b193d95dbb4ee4715714f44d |
|
MD5 | f25faff5992ea437aeeb42a2a7b1cd0e |
|
BLAKE2b-256 | 9b1e08ed342ae760e43ebea200fb743617db6def38d4590d68264f80c8b67b5c |