WRENCH PEDAGOGIC MODULES Distributed Computing Courseware

Module 2: Workflows

Overview

Now that you have been introduced to workflows, workflow management systems (WMS), cyberinfrastructure, and file transfer times, we proceed by taking a closer look into what actually happens when workflows (the applications) are executed by a WMS (the software to manage application executions) on some cyberinfrastructure (the execution platform). First we describe a scenario that provides a specific context for observing the execution of a workflow. This includes the composition of the workflow, the platform specifications, and finally the WMS implementation. Then we detail the sequence of workflow execution events and formulate a simple equation to model the expected execution time (or makespan).

Scenario

Workflow

Figure 1 illustrates the DAG representation of the workflow for our scenario. task0 requires the file task0::0.in as its input and produces the file task0::0.out as output. task1 requires the output of task0 (file task0::0.out) as its input and produces the file task1::0.out as output. In order to orchestrate the execution of this workflow, a WMS needs access to two types of resources: persistent storage to read/write files to/from and a compute resource to perform the computation required by each task.

Workflow

The platform in Figure 2 depicts the cyberinfrastructure on which we can execute this workflow. In this scenario, the WMS resides on the host my_lab_computer.edu, and has access to both the Storage Service on host storage_db.edu and the Compute Service on host hpc.edu.

Storage Service. A Storage Service (SS) stores files and handles read and write requests. For example, if my_lab_computer.edu would like to read a file from storage_db.edu, it will make a read request to the SS. Then the file will be sent over the network from storage_db.edu to my_lab_computer.edu. Say, for example, this file is 100 MB, and the link between the two hosts has an effective bandwidth of 10 MB/sec and a latency of 10 microseconds. The estimated amount of time it will take to perform the file read operation can be estimated as follows:

Compute Service. A Compute Service (CS) can execute tasks submitted to it by the WMS. Typically, a compute service will have access to faster hardware than your typical machine and so it can execute workflow tasks faster. For example, say we were to perform 100 TFlops of computation on my_lab_computer.edu (which computes at speed 35 GFlop/sec, as seen in the figure). This is expected to take:

If instead we were to perform 100 TFlops of computation via the CS on hpc.edu (which computes at speed 1000 GFlop/sec) , this computation is expected to take:

The CS is able to compute 100 Tflops about 28 times faster than my_lab_computer.edu. In our scenario, the WMS only uses the CS on hpc.edu to execute workflow tasks.

Workflow Management System. The WMS in this scenario greedily submits tasks to the CS once they become ready. A task is ready to be submitted by the WMS when it has been notified that the current task’s parent task(s) has been completed (including saving its output). For example, the WMS can submit task1 to the CS only once the CS notifies the WMS that task0 has completed (i.e., the output of task0 has been produced and saved somewhere). When the WMS submits a task to the CS, it specifies that all file read and write operations be done through the Remote Storage Service located at storage_db.edu. Additionally, the initial input file, task0::0.in, is assumed to already be “staged” at the SS.

The Workflow Execution

Figure 3 bellow illustrates what happens during the execution of our workflow. In order for the CS to complete each task in its entirety, it must read in the input files for that task, perform the computation required by that task, and finally write the output files for that task.

Workflow Execution

Notice that in step 3, the CS writes the output file to the SS, then immediately reads that file back from the SS in step 4. This happens because file operations are assigned to the SS at storage_db.edu, therefore file task0::0.out simply cannot be “cached” by the CS for reuse.

Using figure 3 as a guide and the two formulas stated above, we can estimate the amount of time it would take to execute our workflow. The estimated execution time of task0 is as follows:

Next, the estimated execution time of task1 is as follows:

task0 and task1 are executed sequentially therefore the total estimated makespan of our workflow will be:

Conclusion

In this primer we have demonstrated what a simple workflow execution looks like for a specific scenario. The scenario includes a workflow description, platform specifications, and a WMS implementation. Based on these elements, we were able to break down the execution of the workflow into steps, estimate the duration of each step, and finally estimate the duration of the entire workflow execution. The upcoming activities will follow a similar format: a scenario will be presented and you will evaluate certain aspects of the workflow execution. Continue on to the next section, Activity 1: Running Your First Simulated Workflow Execution.