"Save Entire Directories Into Metaflow's Metadata Store"
metaflow_magicdir
Save Entire Directories Into Metaflow's Metadata Store
Install
pip install metaflow_magicdir
How to use
You can use @magicdir to pass local directories between Metaflow steps. This also works when steps run remotely.
# examples/example_flow.py
from metaflow import FlowSpec, step
from metaflow_magicdir import magicdir

class MagicDirFlow(FlowSpec):

    @magicdir(dir='mydir')
    @step
    def start(self):
        # Anything written under mydir/ in this step is saved with the run.
        with open('mydir/output1', 'w') as f:
            f.write('hello world')
        with open('mydir/output2', 'w') as f:
            f.write('hello world again')
        self.next(self.end)

    @magicdir(dir='mydir')
    @step
    def end(self):
        # mydir/ is restored before this step runs.
        print('first', open('mydir/output1').read())
        # output1 is read a second time here, so both lines print 'hello world';
        # read 'mydir/output2' instead to see the second file.
        print('second', open('mydir/output1').read())

if __name__ == "__main__":
    MagicDirFlow()
If you run the above flow, you will see the following output:
python examples/example_flow.py run
Metaflow 2.5.4 executing MagicDirFlow for user:hamel
Validating your flow...
The graph looks good!
Running pylint...
Pylint is happy!
2022-04-18 13:53:24.077 Workflow starting (run-id 1650315204073458):
2022-04-18 13:53:24.083 [1650315204073458/start/1 (pid 13299)] Task is starting.
2022-04-18 13:53:24.834 [1650315204073458/start/1 (pid 13299)] Task finished successfully.
2022-04-18 13:53:24.840 [1650315204073458/end/2 (pid 13302)] Task is starting.
2022-04-18 13:53:25.527 [1650315204073458/end/2 (pid 13302)] first hello world
2022-04-18 13:53:25.608 [1650315204073458/end/2 (pid 13302)] second hello world
2022-04-18 13:53:25.609 [1650315204073458/end/2 (pid 13302)] Task finished successfully.
2022-04-18 13:53:25.610 Done!
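To build intuition for how a whole directory can travel between steps as a single stored object, here is a minimal sketch that packs a directory into bytes and unpacks it somewhere else. This is an illustration only and is not taken from metaflow_magicdir; the helper names pack_dir, unpack_dir, and roundtrip_demo are hypothetical, and the library's real storage format may differ.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def pack_dir(path):
    """Pack the directory at `path` into gzip-compressed tar bytes (hypothetical helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        # arcname keeps only the directory's own name inside the archive,
        # so it can be unpacked under any target location later.
        tar.add(str(path), arcname=Path(path).name)
    return buf.getvalue()


def unpack_dir(blob, target="."):
    """Recreate the packed directory underneath `target` (hypothetical helper)."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        # Only unpack archives you created yourself or otherwise trust.
        tar.extractall(target)


def roundtrip_demo():
    """Write two files, pack them, unpack elsewhere, and return what was read back."""
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        mydir = Path(src) / "mydir"
        mydir.mkdir()
        (mydir / "output1").write_text("hello world")
        (mydir / "output2").write_text("hello world again")
        blob = pack_dir(mydir)      # bytes that could be stored as one object
        unpack_dir(blob, dst)       # restore in a different location
        restored = Path(dst) / "mydir"
        return {p.name: p.read_text() for p in sorted(restored.iterdir())}


if __name__ == "__main__":
    print(roundtrip_demo())
```

The point of the sketch is that a directory reduces to one bytes value, which is the kind of thing a step can hand to the next step, whether that step runs locally or remotely.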
You can retrieve the results from the above flow with the Client API and extract_magicdir:
Let's first remove the directory if it exists:
!rm -rf mydir/ #remove the directory if it exists
from metaflow import Flow
from metaflow_magicdir import extract_magicdir
run_data = Flow('MagicDirFlow').latest_successful_run.data
extract_magicdir(run_data)
We can now list the directory to see its contents:
!ls mydir/
output1 output2
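The !rm and !ls lines above are notebook shell escapes. Outside a notebook, the same cleanup and listing can be done with the standard library. The sketch below assumes nothing about metaflow_magicdir itself; reset_dir and list_dir are hypothetical helper names introduced for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def reset_dir(path):
    """Remove `path` and everything under it; do nothing if it does not exist."""
    shutil.rmtree(path, ignore_errors=True)


def list_dir(path):
    """Return the sorted names of the entries directly under `path`."""
    return sorted(p.name for p in Path(path).iterdir())


if __name__ == "__main__":
    # Demonstrate on a throwaway directory rather than a real mydir/.
    base = Path(tempfile.mkdtemp())
    demo = base / "mydir"
    demo.mkdir()
    (demo / "output1").write_text("hello world")
    (demo / "output2").write_text("hello world again")
    print(list_dir(demo))
    reset_dir(demo)
    print(demo.exists())
    reset_dir(base)
```

In a script, you would call reset_dir('mydir') before extract_magicdir(run_data) and list_dir('mydir') afterwards.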
magicdir with foreach
Nothing special is required to use magicdir with foreach. Consider the following modification to the above flow:
# examples/mapflow.py
from metaflow import FlowSpec, step
from metaflow_magicdir import magicdir

class MagicDirMapFlow(FlowSpec):
    """Show how magic directories work with foreach"""

    @step
    def start(self):
        self.step_num = range(5)
        self.next(self.write, foreach='step_num')

    @magicdir(dir='my_map_dir')
    @step
    def write(self):
        # Metaflow gives self.input one value from `step_num` set in the prior step.
        self.step_idx = self.input
        with open(f'my_map_dir/{self.step_idx}.txt', 'w') as f:
            f.write(f'this is step {self.step_idx}')
        self.next(self.read)

    @magicdir(dir='my_map_dir')
    @step
    def read(self):
        print('file contents:', open(f'my_map_dir/{self.step_idx}.txt').read())
        self.next(self.join)

    @step
    def join(self, inputs):
        print(f"step numbers were: {[i.step_idx for i in inputs]}")
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    MagicDirMapFlow()
python examples/mapflow.py run
Metaflow 2.5.4 executing MagicDirMapFlow for user:hamel
Validating your flow...
The graph looks good!
Running pylint...
Pylint is happy!
2022-04-18 13:41:56.687 Workflow starting (run-id 1650314516684584):
2022-04-18 13:41:56.695 [1650314516684584/start/1 (pid 12420)] Task is starting.
2022-04-18 13:41:57.444 [1650314516684584/start/1 (pid 12420)] Foreach yields 5 child steps.
2022-04-18 13:41:57.445 [1650314516684584/start/1 (pid 12420)] Task finished successfully.
2022-04-18 13:41:57.452 [1650314516684584/write/2 (pid 12423)] Task is starting.
2022-04-18 13:41:57.459 [1650314516684584/write/3 (pid 12424)] Task is starting.
2022-04-18 13:41:57.466 [1650314516684584/write/4 (pid 12425)] Task is starting.
2022-04-18 13:41:57.473 [1650314516684584/write/5 (pid 12426)] Task is starting.
2022-04-18 13:41:57.481 [1650314516684584/write/6 (pid 12427)] Task is starting.
2022-04-18 13:41:58.438 [1650314516684584/write/3 (pid 12424)] Task finished successfully.
2022-04-18 13:41:58.450 [1650314516684584/read/7 (pid 12438)] Task is starting.
2022-04-18 13:41:58.452 [1650314516684584/write/2 (pid 12423)] Task finished successfully.
2022-04-18 13:41:58.463 [1650314516684584/read/8 (pid 12439)] Task is starting.
2022-04-18 13:41:58.465 [1650314516684584/write/5 (pid 12426)] Task finished successfully.
2022-04-18 13:41:58.473 [1650314516684584/read/9 (pid 12440)] Task is starting.
2022-04-18 13:41:58.478 [1650314516684584/write/6 (pid 12427)] Task finished successfully.
2022-04-18 13:41:58.487 [1650314516684584/read/10 (pid 12441)] Task is starting.
2022-04-18 13:41:58.489 [1650314516684584/write/4 (pid 12425)] Task finished successfully.
2022-04-18 13:41:58.496 [1650314516684584/read/11 (pid 12442)] Task is starting.
2022-04-18 13:41:59.314 [1650314516684584/read/7 (pid 12438)] file contents: this is step 1
2022-04-18 13:41:59.348 [1650314516684584/read/8 (pid 12439)] file contents: this is step 0
2022-04-18 13:41:59.350 [1650314516684584/read/9 (pid 12440)] file contents: this is step 3
2022-04-18 13:41:59.362 [1650314516684584/read/11 (pid 12442)] file contents: this is step 2
2022-04-18 13:41:59.370 [1650314516684584/read/10 (pid 12441)] file contents: this is step 4
2022-04-18 13:41:59.450 [1650314516684584/read/7 (pid 12438)] Task finished successfully.
2022-04-18 13:41:59.479 [1650314516684584/read/9 (pid 12440)] Task finished successfully.
2022-04-18 13:41:59.482 [1650314516684584/read/8 (pid 12439)] Task finished successfully.
2022-04-18 13:41:59.495 [1650314516684584/read/10 (pid 12441)] Task finished successfully.
2022-04-18 13:41:59.497 [1650314516684584/read/11 (pid 12442)] Task finished successfully.
2022-04-18 13:41:59.505 [1650314516684584/join/12 (pid 12459)] Task is starting.
2022-04-18 13:42:00.183 [1650314516684584/join/12 (pid 12459)] step numbers were: [0, 3, 2, 1, 4]
2022-04-18 13:42:00.261 [1650314516684584/join/12 (pid 12459)] Task finished successfully.
2022-04-18 13:42:00.269 [1650314516684584/end/13 (pid 12462)] Task is starting.
2022-04-18 13:42:01.027 [1650314516684584/end/13 (pid 12462)] Task finished successfully.
2022-04-18 13:42:01.027 Done!
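In the run above, the join step printed the step numbers as [0, 3, 2, 1, 4], which is not the order of step_num. If the order matters downstream, one option is to sort inside join. The sketch below uses SimpleNamespace objects as stand-ins for the inputs Metaflow passes to join, so it runs without Metaflow; ordered_step_indices is a hypothetical helper name.

```python
from types import SimpleNamespace


def ordered_step_indices(inputs):
    """Return each branch's step_idx in ascending order."""
    return sorted(inp.step_idx for inp in inputs)


# Stand-ins for the per-branch objects received by join(self, inputs);
# the order mirrors the log above.
fake_inputs = [SimpleNamespace(step_idx=i) for i in (0, 3, 2, 1, 4)]

if __name__ == "__main__":
    print(ordered_step_indices(fake_inputs))  # prints [0, 1, 2, 3, 4]
```

Inside the flow, the equivalent change would be to wrap the list comprehension in join with sorted(...).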