data-annalist

Audit trail generator for data processing scripts.

Development Status
- 2 - Pre-Alpha
Intended Audience
- Developers
License
- OSI Approved :: GNU General Public License v3 (GPLv3)
Natural Language
- English
Programming Language
- Python :: 3.10

Project description

Annalist

Audit trail generator for data processing scripts.

Free software: GNU General Public License v3
Documentation: https://annalist.readthedocs.io.

Usage

Create an Annalist object at the base of the module you'd like to audit. Use the object's annalize decorator (@ann.annalize in the example) on any function you would like to annalize.

from annalist.annalist import Annalist

ann = Annalist()

@ann.annalize
def example_function():
    ...

Annalist also works on most class methods, with some exceptions.

class ExampleClass():

    # Initializers can be annalized just fine
    @ann.annalize
    def __init__(self, arg1, arg2):
        self.arg1 = arg1
        self._arg2 = arg2
        ...

    # DO NOT put an annalizer on a property definition.
    # The annalizer calls the property itself, creating infinite recursion.
    @property
    def arg2(self):
        return self._arg2

    # Putting an annalizer on a setter is fine though.
    # Just make sure you put it after the setter decorator.
    @arg2.setter
    @ann.annalize
    def arg2(self, value):
        self._arg2 = value

    # DO NOT put it on the __repr__ either.
    # Same as before, this creates infinite recursion.
    def __repr__(self):
        return f"{self.arg1}: {self.arg2}"
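
The recursion warnings in the comments above can be reproduced with a minimal sketch. It assumes the decorator formats its arguments, including self, when it builds a log entry; log_args, Unsafe, and repr_recurses are hypothetical names for illustration and are not part of Annalist.

```python
import functools


def log_args(func):
    """Hypothetical auditing decorator; not Annalist's implementation."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Building the log message calls repr() on every argument, including self.
        _message = f"{func.__name__} called with {args!r}"
        return func(*args, **kwargs)

    return wrapper


class Unsafe:
    # Decorating __repr__ means repr(self) re-enters the wrapper indefinitely.
    @log_args
    def __repr__(self):
        return "Unsafe()"


def repr_recurses(obj):
    """Return True if repr(obj) exhausts the recursion limit."""
    try:
        repr(obj)
    except RecursionError:
        return True
    return False
```

Under that assumption, repr_recurses(Unsafe()) reports the recursion instead of letting it crash the script.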

In the main script, create the Annalist object again. Because Annalist is a singleton, this returns the same object that was initialized in the dependency. The annalist must be configured before use.

>>> ann = Annalist()
>>> ann.configure(logger_name="Example Logger", analyst_name="Speve")

Now the annalized code can be run as normal and will be audited.

>>> example_function()
2023/11/2 09:42:13 | INFO | example_function called by Speve as part of Example Logger session
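
The singleton behaviour that makes this work can be illustrated with a minimal sketch. AuditLog and its settings attribute are hypothetical stand-ins; the way Annalist itself implements the pattern may differ.

```python
class AuditLog:
    """Hypothetical singleton: every call to AuditLog() returns one shared object."""

    _instance = None

    def __new__(cls):
        # Create the shared instance on first use, then keep handing it back.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.settings = {}
        return cls._instance

    def configure(self, **settings):
        # Settings stored here are visible through every reference.
        self.settings.update(settings)


library_side = AuditLog()  # created in the audited module
script_side = AuditLog()   # "created" again in the main script
script_side.configure(analyst_name="Speve")
```

Because both names refer to the same object, configuring it in the main script also configures the instance that the decorated functions use.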

Feature Roadmap

This roadmap outlines the planned features and milestones for the development of our deterministic and reproducible process auditing system.

Milestone 1: Audit Logging Framework

Develop a custom audit logging framework or class.
Capture function names, input parameters, return values, data types, and timestamps.
Implement basic logging mechanisms for integration.
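
As an illustration of the kind of capture this milestone describes, here is a minimal sketch using only the standard library. The audited decorator, the audit_records list, and the record field names are assumptions made for the example, not Annalist's actual design.

```python
import functools
from datetime import datetime, timezone

audit_records = []  # hypothetical in-memory audit trail


def audited(func):
    """Record name, inputs, return value, types, and timestamp of each call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        audit_records.append(
            {
                "function": func.__name__,
                "args": args,
                "kwargs": kwargs,
                "arg_types": [type(a).__name__ for a in args],
                "return_value": result,
                "return_type": type(result).__name__,
                "timestamp": datetime.now(timezone.utc).isoformat(),
            }
        )
        return result

    return wrapper


@audited
def scale(values, factor=2):
    return [v * factor for v in values]


scaled = scale([1, 2, 3], factor=10)
```

After the call, audit_records holds one dictionary describing the scale invocation.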

Milestone 2: Standardized Logging Format

Define a standardized logging format for comprehensive auditing.
Ensure consistency and machine-readability of the logging format.
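
One possible shape for a consistent, machine-readable format is one JSON object per log line. The sketch below assumes that choice; JsonLineFormatter, format_entry, and the field names are hypothetical and do not reflect the format Annalist uses.

```python
import json
import logging


class JsonLineFormatter(logging.Formatter):
    """Hypothetical formatter that renders each record as one JSON object."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # sort_keys gives a stable field order across entries.
        return json.dumps(payload, sort_keys=True)


def format_entry(message, logger_name="audit"):
    """Build a log record and return its JSON line without emitting it."""
    record = logging.LogRecord(
        name=logger_name,
        level=logging.INFO,
        pathname="",
        lineno=0,
        msg=message,
        args=(),
        exc_info=None,
    )
    return JsonLineFormatter().format(record)


line = format_entry("example_function called")
entry = json.loads(line)
```

Each line can then be parsed back with json.loads, which keeps the trail readable by other tools.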

Milestone 3: Serialization and Deserialization

Implement serialization and deserialization mechanisms.
Store and retrieve complex data structures and objects.
Test serialization for data integrity.
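
A round trip with an integrity check might look like the following sketch. It assumes pickle for serialization and SHA-256 for the check, which is one option among several; serialize and deserialize are hypothetical helper names.

```python
import hashlib
import pickle


def serialize(obj):
    """Return (payload, digest) so the payload can be verified later."""
    payload = pickle.dumps(obj)
    return payload, hashlib.sha256(payload).hexdigest()


def deserialize(payload, digest):
    """Restore the object, refusing payloads whose digest does not match."""
    if hashlib.sha256(payload).hexdigest() != digest:
        raise ValueError("payload does not match its recorded digest")
    # Only unpickle data from a trusted source; pickle can execute code.
    return pickle.loads(payload)


original = {"station": "A1", "readings": [1.5, 2.5], "flags": (True, None)}
payload, digest = serialize(original)
restored = deserialize(payload, digest)
```

Storing the digest next to the payload lets a later run detect corruption before any data is loaded.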

Milestone 4: Versioning and Dependency Tracking

Capture and log codebase version (Git commit hash) and dependencies.
Ensure accurate logging of version and dependency information.
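
Version and dependency information of this kind can be gathered with the standard library alone. The sketch below assumes the script runs inside a Git working copy with git on the PATH, and falls back to None otherwise; current_commit and installed_packages are hypothetical names.

```python
import subprocess
from importlib import metadata


def current_commit():
    """Return the Git commit hash of the working directory, or None."""
    try:
        completed = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Git is missing or this is not a repository.
        return None
    return completed.stdout.strip()


def installed_packages():
    """Map each installed distribution name to its version."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:
            versions[name] = dist.version
    return versions


snapshot = {"commit": current_commit(), "dependencies": installed_packages()}
```

The resulting snapshot dictionary could be written once at the start of each audited session.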

Milestone 5: Integration Testing

Create integration tests using the audit logging framework.
Log information during the execution of key processes.
Begin development of process recreation capability.

Milestone 6: Reproduction Tool (Partial)

Develop a tool or script to read and reproduce processes from the audit trail.
Focus on recreating the environment and loading serialized data.
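
To make the idea of reproduction concrete, here is a minimal sketch that replays recorded calls and compares the results. It assumes a JSON-lines trail with function, args, kwargs, and return_value fields, and a registry of replayable functions; all of these are hypothetical and not Annalist's trail format.

```python
import json

REGISTRY = {}  # hypothetical map from recorded function names to callables


def replayable(func):
    """Register a function so a recorded call can be looked up by name."""
    REGISTRY[func.__name__] = func
    return func


@replayable
def add_offset(values, offset=0):
    return [v + offset for v in values]


def replay(trail_text):
    """Re-run every call in a JSON-lines trail and compare with the record."""
    outcomes = []
    for line in trail_text.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        func = REGISTRY[entry["function"]]
        result = func(*entry["args"], **entry["kwargs"])
        outcomes.append(result == entry["return_value"])
    return outcomes


trail = "\n".join(
    [
        json.dumps({"function": "add_offset", "args": [[1, 2]],
                    "kwargs": {"offset": 3}, "return_value": [4, 5]}),
        # This record deliberately disagrees with what the function returns.
        json.dumps({"function": "add_offset", "args": [[0]],
                    "kwargs": {}, "return_value": [1]}),
    ]
)
matches = replay(trail)
```

A False entry in matches flags a recorded call whose result can no longer be reproduced.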

Milestone 7: Documentation (Partial)

Create initial documentation.
Explain how to use the audit logging framework and the audit trail format.
Document basic project functionalities.

Milestone 8: Error Handling

Implement robust error handling for auditing and reproduction code.
Gracefully handle potential issues.
Provide informative and actionable error messages.
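
One way to handle auditing failures gracefully is to let the audited function succeed even when the audit entry cannot be written, and to report the failure clearly. The sketch below assumes that policy; safely_audited, broken_writer, and total are hypothetical names.

```python
import functools
import warnings


def safely_audited(write_entry):
    """Build a decorator whose audit step can fail without breaking the call."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            try:
                write_entry(func.__name__, result)
            except Exception as exc:
                # Say what failed and why, then let the result through.
                warnings.warn(
                    f"audit entry for {func.__name__!r} was not written: {exc}",
                    RuntimeWarning,
                    stacklevel=2,
                )
            return result

        return wrapper

    return decorator


def broken_writer(name, result):
    raise OSError("audit file is read-only")


@safely_audited(broken_writer)
def total(values):
    return sum(values)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    answer = total([1, 2, 3])
```

Whether a failed audit write should warn or abort is a design decision; a stricter tool could re-raise instead.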

Milestone 9: MVP Testing

Conduct testing of the MVP.
Reproduce processes from the audit trail and verify correctness.
Gather feedback from initial users within the organization.

Milestone 10: MVP Deployment

Deploy the MVP within the organization.
Make it available to relevant team members.
Encourage usage and collect user feedback.

Milestone 11: Feedback and Iteration

Gather feedback from MVP users.
Identify shortcomings, usability issues, or missing features.
Prioritize and plan improvements based on user feedback.

Milestone 12: Scaling and Extending

Explore scaling the solution to cover more processes.
Add additional features and capabilities to enhance usability.

Please note that milestones may overlap, and the order can be adjusted based on project-specific needs. We aim to remain flexible and responsive to feedback during development.

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

0.1.0 (2023-09-13)

First release on PyPI.

0.1.1 (2023-10-27)

Basic logging functionality.
Only supports logging to console.

0.2.0 (2023-11-02)

Implemented Annalist as a Singleton.
Usage now includes configuration step.

Release history

0.4.3 (Apr 3, 2024)
0.4.2 (Apr 2, 2024)
0.4.1 (Feb 13, 2024)
0.4.0 (Feb 13, 2024)
0.3.6 (Nov 28, 2023)
0.3.5 (Nov 28, 2023)
0.3.4 (Nov 28, 2023)
0.3.3 (Nov 24, 2023)
0.3.2 (Nov 19, 2023)
0.2.0 (Nov 2, 2023), this version
0.1.1 (Oct 27, 2023)
0.1.0 (Sep 25, 2023)

Download files

Source Distribution

data-annalist-0.2.0.tar.gz (14.1 kB)

Uploaded Nov 2, 2023 Source

Built Distribution

data_annalist-0.2.0-py2.py3-none-any.whl (6.6 kB)

Uploaded Nov 2, 2023 Python 2 Python 3

Hashes for data-annalist-0.2.0.tar.gz

Algorithm	Hash digest
SHA256	`436c5664fbc582915c78ec4100d71e1929f06938e2f622820d811ea72f5218f7`
MD5	`c609fe30963ad1a6c25810b15c8d888f`
BLAKE2b-256	`b05c7484d5c6e3ac5e9540f57b435de3cb9f97693d72be03eb97e3db6352202c`

Hashes for data_annalist-0.2.0-py2.py3-none-any.whl

Algorithm	Hash digest
SHA256	`e55f7ff4a11de49b8055e7e830634cd6ccb9fb2ba2eb9069c54c6adfb7c1b70a`
MD5	`7f122afb19e3f7f3ad6173ea698f3486`
BLAKE2b-256	`f5b0f4206e2633cdaf0659767676e9d26da83bc6afe264e3566a05a79f7543d4`
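
A downloaded file can be checked against the SHA256 digests listed above with a short script. This is a minimal sketch using hashlib; sha256_of and matches_published are hypothetical helper names, and the file path is whatever location the archive was saved to.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare the file's digest with a published hex digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For example, matches_published("data-annalist-0.2.0.tar.gz", "436c5664fbc582915c78ec4100d71e1929f06938e2f622820d811ea72f5218f7") should return True for an intact copy of the source distribution.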