Python Dispatcher
Project description
origin files: https://www.yuque.com/xingyeyongtantiao/dpdispatcher/rdydgb
developers' discussion (temporarily in Chinese): https://www.yuque.com/docs/share/08ab09f3-f84d-4ed3-b777-9e0c791963b6?#
introduction:
short introduction
dpdispatcher is a Python package that generates job input scripts for HPC (High Performance Computing) scheduler systems (Slurm/PBS/LSF/dpcloudserver) and submits these scripts to the HPC systems.
dpdispatcher then monitors (pokes) these jobs until they finish and downloads the result files (if the jobs are running on remote systems connected via SSH).
The set of abstractions provided by dpdispatcher:
Task class, which represents a command to be run on the batch job system, as well as the essential files needed by the command.
Submission class, which represents a collection of jobs defined by the HPC system. There may also be common files to be uploaded by them. dpdispatcher will create and submit these jobs when a Submission instance executes its run_submission method. This method will poke until the jobs finish and then return.
Job class, a class used by the Submission class, which represents a job on the HPC system. Submission will automatically generate the jobs' submitting scripts used by HPC systems from the Task and Resources.
Resources class, which represents the computing resources for each job within a submission.
How to contribute
dpdispatcher welcomes everyone (individuals and organizations) to use it under the LGPL-3.0 License.
Contributions are welcome and greatly appreciated! Every little bit helps, and credit will always be given.
If you want to contribute to dpdispatcher, just open an issue, submit a pull request, leave a comment in the GitHub discussions, or contact the deepmodeling team.
Any form of improvement is welcome:
- use, star or fork dpdispatcher
- improve the documents
- report or fix bugs
- request, discuss or implement features
dpdispatcher is currently maintained by deepmodeling's developers, and other contributors are welcome.
example
from dpdispatcher import Machine, Resources, Submission, Task

# load the machine and resources settings from their JSON files
machine = Machine.load_from_json('machine.json')
resources = Resources.load_from_json('resources.json')

# alternative: keep both settings in a single compute.json (requires `import json`)
## with open('compute.json', 'r') as f:
##     compute_dict = json.load(f)
## machine = Machine.load_from_dict(compute_dict['machine'])
## resources = Resources.load_from_dict(compute_dict['resources'])

# one task loaded from a file, four tasks defined inline
task0 = Task.load_from_json('task.json')
task1 = Task(command='cat example.txt', task_work_path='dir1/', forward_files=['example.txt'], backward_files=['out.txt'], outlog='out.txt')
task2 = Task(command='cat example.txt', task_work_path='dir2/', forward_files=['example.txt'], backward_files=['out.txt'], outlog='out.txt')
task3 = Task(command='cat example.txt', task_work_path='dir3/', forward_files=['example.txt'], backward_files=['out.txt'], outlog='out.txt')
task4 = Task(command='cat example.txt', task_work_path='dir4/', forward_files=['example.txt'], backward_files=['out.txt'], outlog='out.txt')
task_list = [task0, task1, task2, task3, task4]

submission = Submission(work_base='lammps_md_300K_5GPa/',
    machine=machine,
    resources=resources,
    task_list=task_list,
    forward_common_files=['graph.pb'],
    backward_common_files=[]
)

## submission.register_task_list(task_list=task_list)

# submit the jobs and poke until they finish
submission.run_submission(clean=False)
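The four inline tasks above differ only in their working directory, so their keyword arguments can be generated in a loop. The following is a minimal sketch using only the standard library. The helper name make_task_kwargs is hypothetical and not part of dpdispatcher; it only builds dictionaries, on the assumption that each could then be passed on as Task(**kwargs) with the same keyword arguments used in the example.

```python
def make_task_kwargs(n_tasks, command='cat example.txt'):
    """Build keyword-argument dicts for tasks that differ only by directory.

    Hypothetical helper (not part of dpdispatcher); mirrors task1..task4
    from the example, using dir1/, dir2/, ... as working directories.
    """
    return [
        {
            'command': command,
            'task_work_path': f'dir{i}/',
            'forward_files': ['example.txt'],
            'backward_files': ['out.txt'],
            'outlog': 'out.txt',
        }
        for i in range(1, n_tasks + 1)
    ]


task_kwargs = make_task_kwargs(4)
for kwargs in task_kwargs:
    print(kwargs['task_work_path'])
```

With dpdispatcher installed, the list could be turned into tasks with something like [Task(**kw) for kw in task_kwargs].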
example resources for GPU2080Ti
resources = Resources(number_node=1,
    cpu_per_node=8,
    gpu_per_node=2,
    queue_name="GPU2080TI",
    group_size=12,
    custom_flags=[
        "#SBATCH --mem=32G",
        ## "#SBATCH --account=deepmodeling"
        ## "#SBATCH --cluster=gpucluster"
    ],
    strategy={'if_cuda_multi_devices': True},
    para_deg=3,
    source_list=["~/deepmd.env"],
)
machine.json
{
"machine_type": "Slurm",
"context_type": "SSHContext",
"local_root" : "/home/user123/workplace/22_new_project/",
"remote_root": "~/dpdispatcher_work_dir/",
"remote_profile":{
"hostname": "39.106.xx.xxx",
"username": "user1",
"port": 22,
"timeout": 10
}
}
resources.json
{
"number_node": 1,
"cpu_per_node": 4,
"gpu_per_node": 1,
"queue_name": "GPUV100",
"group_size": 5
}
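To make the role of group_size more concrete, here is a minimal sketch. It assumes that group_size is the maximum number of tasks bundled into one job; this text does not state that, so check it against the dpdispatcher documentation. The function group_tasks is hypothetical and is not dpdispatcher's implementation.

```python
def group_tasks(task_names, group_size):
    """Split task names into consecutive chunks of at most group_size items.

    Hypothetical illustration only: assumes group_size is the maximum
    number of tasks per job.
    """
    if group_size < 1:
        raise ValueError('group_size must be a positive integer in this sketch')
    return [
        task_names[i:i + group_size]
        for i in range(0, len(task_names), group_size)
    ]


tasks = [f'task{i}' for i in range(12)]
jobs = group_tasks(tasks, group_size=5)
print([len(job) for job in jobs])  # → [5, 5, 2]
```

Under this assumption, 12 tasks with "group_size": 5 would be spread over three jobs.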
task.json
{
"command": "lmp -i input.lammps",
"task_work_path": "bct-0/",
"forward_files": [
"conf.lmp",
"input.lammps"
],
"backward_files": [
"log.lammps"
],
"outlog": "log",
"errlog": "err"
}
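Hand-written JSON is easy to break: standard JSON, for instance, does not allow a trailing comma after the last entry of an object. The following is a minimal sketch, using only Python's standard json module, that writes a task file from a dictionary and reads it back to confirm that it parses. The values mirror task.json; the temporary directory is used purely for illustration.

```python
import json
import os
import tempfile

# same content as task.json, expressed as a Python dictionary
task_dict = {
    'command': 'lmp -i input.lammps',
    'task_work_path': 'bct-0/',
    'forward_files': ['conf.lmp', 'input.lammps'],
    'backward_files': ['log.lammps'],
    'outlog': 'log',
    'errlog': 'err',
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'task.json')
    # json.dump always emits syntactically valid JSON
    with open(path, 'w') as f:
        json.dump(task_dict, f, indent=4)
    # reading the file back confirms that it parses
    with open(path) as f:
        loaded = json.load(f)

# the standard parser rejects a trailing comma inside an object
try:
    json.loads('{"errlog": "err",}')
    trailing_comma_ok = True
except json.JSONDecodeError:
    trailing_comma_ok = False

print(loaded == task_dict, trailing_comma_ok)
```

Generating the configuration files this way keeps them parseable before they are handed to a loader such as Task.load_from_json.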