Ad-hoc Test EXecutor

Project description

ATEX = Ad-hoc Test EXecutor

A collection of Python APIs to provision operating systems, collect and execute FMF-style tests, gather and organize their results, and generate reports from those results.

The name comes from an approach (fairly unique to the FMF/TMT ecosystem) that provisions a pool of systems and schedules tests on them as one would on an ad-hoc pool of thread/process workers - once a worker becomes free, it receives a test to run.
This is in contrast to statically splitting a large list of N tests onto M workers as N/M chunks, which incurs significant time penalties because tests have very varied runtimes.
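As a rough illustration of the difference (using only the stdlib ThreadPoolExecutor, not this project's API, and a stand-in `run_test` function), submitting tests one-by-one lets whichever worker frees up first pick up the next test, instead of pre-assigning fixed chunks:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_test(name, duration):
    """Stand-in for executing one test on a reserved machine."""
    time.sleep(duration)
    return name

# tests with very different runtimes (seconds, scaled down here)
tests = [("t1", 0.2), ("t2", 0.01), ("t3", 0.01), ("t4", 0.01)]

# ad-hoc scheduling: each free worker pulls the next test off the queue,
# so the short tests pile onto one worker while the other runs the long one
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(run_test, name, dur) for name, dur in tests]
    results = [f.result() for f in as_completed(futures)]

print(sorted(results))  # ['t1', 't2', 't3', 't4']
```

With a static N/M split, the worker that happened to receive t1 plus any other test would dominate the total wall-clock time.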

Above all, this project is meant to be a toolbox, not a silver-bullet solution. Use its Python APIs to build a CLI tool for your specific use case.
The CLI tool provided here is just for demonstration / testing, not for serious use - we want to avoid huge modular CLIs for Every Possible Scenario. That's the job of the Python API; any CLI should be simple by nature.


THIS PROJECT IS HEAVILY WIP, THINGS WILL MOVE AROUND, CHANGE AND OTHERWISE BREAK. DO NOT USE IT (for now).


License

Unless specified otherwise, any content within this repository is distributed under the GNU GPLv3 license, see the COPYING.txt file for more.

Parallelism and cleanup

There are effectively 3 methods of running things in parallel in Python:

  • threading.Thread (and related concurrent.futures classes)
  • multiprocessing.Process (and related concurrent.futures classes)
  • asyncio

and there is no clear winner (in terms of cleanup on SIGTERM or Ctrl-C):

  • Thread runs signal handlers only in the main thread and cannot interrupt running threads without super ugly workarounds like a sleep(1) loop in every thread, checking some "please exit" variable
  • Process is too heavyweight and makes sharing native Python objects hard, but it does handle signals in each process individually
  • asyncio handles interruption perfectly (every try/except/finally completes just fine, KeyboardInterrupt is raised in every async context), but async Python is still (as of 3.14) too weird and unsupported
    • asyncio effectively re-implements subprocess with a slightly different API, and asyncio.Transport and its derivatives likewise re-implement socket
    • 3rd party libraries like requests or urllib3 don't support it, so one needs to resort to spawning these in separate threads anyway
    • same with os.* functions and syscalls
    • everything exposed via an API needs two copies - async and non-async - making it unbearable
    • other stdlib bugs, e.g. "large" reads sometimes raising BlockingIOError
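The Thread workaround from the first bullet - polling a "please exit" flag - can at least be made a bit less ugly with threading.Event, whose wait() wakes up early when the flag is set. A minimal sketch (the worker here is hypothetical, not part of this project's API):

```python
import threading
import time

stop = threading.Event()

def worker():
    # the only portable way to make a thread interruptible: poll a
    # "please exit" flag between short waits instead of one long sleep
    while not stop.is_set():
        # ... do one unit of work ...
        stop.wait(timeout=1)  # like sleep(1), but wakes early on stop.set()

t = threading.Thread(target=worker, daemon=True)
t.start()
time.sleep(0.1)
stop.set()           # request exit; the thread notices almost immediately
t.join(timeout=2)
print(t.is_alive())  # False
```

Even with Event, every worker loop still has to be written with this polling structure in mind, which is exactly the kind of workaround the bullet above complains about.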

The approach chosen by this project is to use threading.Thread, and to implement thread safety for the classes and functions that need it.
For example:

class MachineReserver:
    def __init__(self):
        self.lock = threading.RLock()
        self.job = None
        self.proc = None

    def reserve(self, ...):
        try:
            ...
            job = schedule_new_job_on_external_service()
            with self.lock:
                self.job = job
            ...
            while not reserved(self.job):
                time.sleep(60)
            ...
            with self.lock:
            self.proc = subprocess.Popen(["ssh", f"{user}@{host}", ...])
            ...
            return machine
        except Exception:
            self.abort()
            raise

    def abort(self):
        with self.lock:
            if self.job:
                cancel_external_service(self.job)
                self.job = None
            if self.proc:
                self.proc.kill()
                self.proc = None

Here, .reserve() is expected to be called from a long-running thread: it provisions a new machine on some external service, waits for it to be installed and reserved, connects an ssh session to it, and returns the machine.

But equally, .abort() can be called from another thread to clean up any non-pythonic resources (external jobs, processes, temporary files, etc.), at which point we don't care what happens to .reserve() - it will probably fail with some exception, but that does no harm.

This is where daemon=True threads come in handy - we can simply call .abort() from a KeyboardInterrupt (or SIGTERM) handler in the main thread and just exit, automatically killing any leftover threads that are uselessly sleeping.
(Realistically, we might want to spawn new threads to run many .abort()s in parallel, but the main thread can wait for those just fine.)
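That shutdown pattern sketches roughly as follows (with a toy Reserver standing in for the real MachineReserver, and the KeyboardInterrupt simulated):

```python
import threading

class Reserver:
    """Minimal stand-in for the MachineReserver example above."""
    def __init__(self):
        self.lock = threading.RLock()
        self.aborted = False
    def reserve(self):
        threading.Event().wait()  # pretend to wait forever for a machine
    def abort(self):
        with self.lock:
            self.aborted = True

reservers = [Reserver() for _ in range(3)]
# daemon=True: these die automatically when the main thread exits
workers = [threading.Thread(target=r.reserve, daemon=True) for r in reservers]
for w in workers:
    w.start()

try:
    ...  # main loop, normally interrupted by Ctrl-C / SIGTERM
    raise KeyboardInterrupt  # simulated here for the example
except KeyboardInterrupt:
    # run the aborts in parallel, wait only for *them*, then just exit -
    # the daemon reserve() threads get killed on interpreter exit
    aborts = [threading.Thread(target=r.abort) for r in reservers]
    for a in aborts:
        a.start()
    for a in aborts:
        a.join()

print(all(r.aborted for r in reservers))  # True
```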

It is not perfect, but it's probably the best Python can do.

Note that races can still occur between a resource being reserved and being written to self.* for .abort() to free, so resource de-allocation is not 100% guaranteed - but single-threaded interruption has the same issue.
Do have fallbacks (e.g. maximum reserve times on the external service).

Also note that .reserve() and .abort() could also be called by a context manager as __enter__ and __exit__, i.e. by a non-threaded caller (running everything in the main thread).
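For the single-threaded case, that wrapping sketches as (a stripped-down, hypothetical version of the class; the job value is a stand-in):

```python
import threading

class MachineReserver:
    """Stripped-down sketch, usable as a context manager."""
    def __init__(self):
        self.lock = threading.RLock()
        self.job = None
    def reserve(self):
        with self.lock:
            self.job = "job-1"  # stand-in for an external-service job
        return self
    def abort(self):
        with self.lock:
            if self.job:
                self.job = None  # cancel the external job here
    # single-threaded callers can use with-statement semantics instead
    def __enter__(self):
        return self.reserve()
    def __exit__(self, exc_type, exc, tb):
        self.abort()
        return False  # don't swallow exceptions

with MachineReserver() as machine:
    assert machine.job == "job-1"
print(machine.job)  # None - abort() ran on exit
```

The same .abort() then serves both the threaded path (called from another thread) and the context-manager path (called from __exit__), which is why it takes the lock in both cases.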

Unsorted notes

TODO: codestyle from contest

- this is not tmt, the goal is to make a python toolbox *for* making runcontest
  style tools easily, not to replace those tools with tmt-style CLI syntax

  - the whole point is to make usecase-targeted easy-to-use tools that don't
    intimidate users with 1 KB long command line, and runcontest is a nice example

  - TL;DR - use a modular pythonic approach, not a modular CLI like tmt


- Orchestrator with
  - add_provisioner(<class>, max_workers=1)   # will instantiate <class> at most max_workers at a time
  - algo
    - for all provisioner classes, spawns classes*max_workers as new Threads
    - waits for any .reserve() to return
    - creates a new Thread for minitmt, gives it p.get_ssh() details
      - minitmt will
        - establish a SSHConn
        - install test deps, copy test repo over, prepare socket dir on SUT, etc.
        - run the test in the background as
            f = os.open('some/test/log', os.O_WRONLY | os.O_CREAT); subprocess.Popen(..., stdout=f, stderr=f, stdin=subprocess.DEVNULL)
        - read/process Unix sock results in the foreground, non-blocking,
          probably calling some Orchestrator-provided function to store results persistently
        - regularly check Popen proc status, re-accept UNIX sock connection, etc., etc.
      - minitmt also has some Thread-independent way to .cancel(), killing the proc, closing SSHConn, etc.

  - while waiting for minitmt Threads to finish, to re-assign existing Provisioner instances
    to new minitmt Threads, .. Orchestrator uses some logic to select, which TestRun
    would be ideal to run next
    - TestRun probably has some "fitness" function that returns some priority number
      when given a Provisioner instance (?) ...
    - something from minitmt would also have access to the Provisioner instance
    - the idea is to allow some logic to set "hey I set up nested VM snapshot on this thing"
      on the Provisioner instance, and if another /hardening/oscap TestRun finds
      a Provisioner instance like that, it would return high priority
    - ...
    - similar to "fitness" like function, we need some "applicability" function
      - if TestRun is mixed to RHEL-9 && x86_64, we need it to return True
        for a Provisioner instance that provides RHEL-9 and x86_64, but False otherwise

- basically Orchestrator has
  - .add_provisioner()
  - .run_test()  # called with an exclusively-borrowed Provisioner instance
    - if Provisioner is_alive()==False after .run_test(), instantiate a new one from the same inst.__class__
    - if test failed and reruns > 0, try run_test() again (or maybe re-queue the test)
  - .output_result()  # called by run_test() to persistently log a test result
  - .applicable()  # return True if a passed TestRun is meant for a passed Platform (Provisioner?)
    - if no TestRun returns True, the Provisioner is .release()d because we don't need it anymore
  - .fitness()    # return -inf / 0 / +inf with how much should a passed TestRun run on a Provisioner
  - MAYBE combine applicable() and fitness() into one function, next_test() ?
    - given the free Provisioner and a list of TestRuns, select which should run next on the Provisioner
      - if none is chosen, .release() the Provisioner without replacement, continue
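A hypothetical sketch of that combined next_test() idea - none of these names are final API, and TestRun/Provisioner are reduced to plain dicts here:

```python
import math

def applicable(test_run, prov):
    # e.g. a RHEL-9 x86_64 TestRun only fits a matching Provisioner
    return (test_run["distro"] == prov["distro"]
            and test_run["arch"] == prov["arch"])

def fitness(test_run, prov):
    # higher = better match; e.g. reuse an already-prepared nested-VM snapshot
    if test_run.get("wants") in prov.get("features", ()):
        return math.inf
    return 0

def next_test(prov, test_runs):
    """Pick the best TestRun for a free Provisioner, or None to release it."""
    candidates = [t for t in test_runs if applicable(t, prov)]
    if not candidates:
        return None  # nothing fits: .release() the Provisioner
    return max(candidates, key=lambda t: fitness(t, prov))

prov = {"distro": "RHEL-9", "arch": "x86_64", "features": ("nested-vm",)}
runs = [
    {"name": "misc", "distro": "RHEL-9", "arch": "x86_64"},
    {"name": "oscap", "distro": "RHEL-9", "arch": "x86_64", "wants": "nested-vm"},
    {"name": "other", "distro": "RHEL-10", "arch": "aarch64"},
]
print(next_test(prov, runs)["name"])  # oscap
```

Folding applicability into next_test() keeps the "release the Provisioner if nothing fits" decision in one place, at the cost of making the selection logic a single opaque callback.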

Download files

Source Distribution

atex-0.8.tar.gz (63.5 kB)

Built Distribution

atex-0.8-py3-none-any.whl (57.7 kB)

File details

Details for the file atex-0.8.tar.gz.

File metadata

  • Download URL: atex-0.8.tar.gz
  • Size: 63.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for atex-0.8.tar.gz

  • SHA256: 36e9a2275a66515efe8857ecf7077a7aade9abcce48a5592d68f1eaf2aa74b81
  • MD5: f9c30828f167dbbe75ac5f04f036f01b
  • BLAKE2b-256: c5498b1fafac4e46b5288b31d60db8487179c32d1774bb855eaa2151b37ca743

File details

Details for the file atex-0.8-py3-none-any.whl.

File metadata

  • Download URL: atex-0.8-py3-none-any.whl
  • Size: 57.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for atex-0.8-py3-none-any.whl

  • SHA256: 0a9f71e3bd03593b400afa97e12c4c70b20af35619e9b442255949b8edb4cb36
  • MD5: 5b92d41a69f4b49f09ac46a8a9da34bf
  • BLAKE2b-256: e8a9ba15a9f57faa2cf0debd179fa9f29bbff102cb385bb330061e6fa4da2118
