WRENCH PEDAGOGIC MODULES Distributed Computing Courseware

Module 3: Data Locality

  1. Learning objectives
  2. Overview
  3. Activity
  4. Conclusion

Learning Objectives

  • Understand a simple distributed workflow execution scenario;
  • Be able to analyze a (simulated) workflow execution timeline based on textual and graphical simulation output;
  • Be able to estimate the time it should take to complete a workflow task on a given compute host, accounting for I/O overhead;
  • Understand I/O overhead effects on workflow executions;
  • Gain exposure to the concept of data locality and its effect on workflow execution.

Overview

Workflow and Platform Scenario

Workflow

In this activity, we study the execution of the workflow depicted in Figure 1 on the cyberinfrastructure depicted in Figure 2. A Compute Service (CS) will execute tasks that the Workflow Management System (WMS) submits to it. The CS has a single core at its disposal and will execute only one task at a time. The Storage Service (SS) stores files, much like a database, and handles read and write requests. When the WMS submits a job to the CS, the job submission includes information specifying which storage service to use for I/O operations. This is a very simple scenario that will get our "feet wet" with WRENCH simulations.

Cyberinfrastructure

WMS Scenario

We execute the workflow with a WMS that submits tasks to the CS as soon as possible. Each task running on the CS reads and writes data from/to the SS (which, from the task's perspective, resides on a remote host). Once the CS notifies the WMS that a task has completed, the WMS greedily submits the next ready task; when multiple tasks are ready, it submits them left to right as depicted in the workflow. For example, once task0 has completed and its output has been written to the SS, task1, task2, and task3 become ready, and the WMS submits task1 to the CS. This process repeats until the workflow execution is complete.
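The greedy scheduling loop described above can be sketched as follows. The task names and dependencies are taken from the description of Figure 1 (task0 feeding task1–task3, which feed task4); the data structures are purely illustrative and not the actual WRENCH API:

```python
# Minimal sketch of the greedy, one-task-at-a-time WMS described above.
# Task/dependency representation is hypothetical; this is not the WRENCH API.

def run_workflow(tasks, parents):
    """tasks: task names in left-to-right order; parents: name -> set of parent names."""
    done = set()
    order = []
    while len(done) < len(tasks):
        # A task becomes ready once all of its parents have completed.
        ready = [t for t in tasks if t not in done and parents[t] <= done]
        task = ready[0]  # leftmost ready task is submitted first
        # (here the real WMS would submit `task` to the CS and
        #  block until the completion notification arrives)
        done.add(task)
        order.append(task)
    return order

# Dependencies as described for Figure 1 (task4's parents are assumed).
parents = {"task0": set(),
           "task1": {"task0"}, "task2": {"task0"}, "task3": {"task0"},
           "task4": {"task1", "task2", "task3"}}
print(run_workflow(["task0", "task1", "task2", "task3", "task4"], parents))
# → ['task0', 'task1', 'task2', 'task3', 'task4']
```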

Activity

Step #1: Run the Simulation

In a terminal, run the following commands.

  1. Install the simulator: docker pull wrenchproject/wrench-pedagogic-modules:activity-1
  2. Run the simulator: docker container run wrenchproject/wrench-pedagogic-modules:activity-1

Step 2 will display textual simulation output to your terminal window. This output indicates (simulated) actions and events throughout the execution of the workflow.

Step #2: Interpret the Workflow Execution

[0.000000][my_lab_computer.edu:wms_activity1_3] Starting on host my_lab_computer.edu listening on mailbox_name wms_activity1_3
[0.000000][my_lab_computer.edu:wms_activity1_3] About to execute a workflow with 5 tasks
[0.000000][my_lab_computer.edu:wms_activity1_3] Submitting task0 as a job to compute service on hpc.edu
[33814.005184][my_lab_computer.edu:wms_activity1_3] Notified that task0 has completed
[33814.005184][my_lab_computer.edu:wms_activity1_3] Submitting task1 as a job to compute service on hpc.edu
.
.
.

The simulation will produce output similar to the above snippet of text. The first column denotes the simulation time at which some process performs some action. The second column has two parts: the host name and the process name. Last comes a message describing what the process is doing. For example, the second line above, [0.000000][my_lab_computer.edu:wms_activity1_3] About to execute a workflow with 5 tasks, tells us that at simulation time 0.000000 the WMS named wms_activity1, located on the physical host my_lab_computer.edu, is "About to execute a workflow with 5 tasks". Note that the process name is actually wms_activity1_3: the trailing "3" distinguishes different instances of the WMS in case the simulation executes several of them (which we don't do in this activity).

Simulation output for this activity has been constrained so that only messages from the WMS are visible; for instance, we see no output corresponding to what the SS is doing. The output is also color-coded: general messages are pink, task submissions to the CS are blue, and notifications received from the CS are red. In the following activities, we will expose more simulation output to highlight areas of interest.
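If you want to post-process this output yourself, the log lines are easy to parse. The format below is inferred from the snippet above (time in brackets, then host:process in brackets, then the message):

```python
# Sketch: parse a WRENCH-style log line such as
#   [33814.005184][my_lab_computer.edu:wms_activity1_3] Notified that task0 has completed
# into (time, host, process, message). Format inferred from the sample output.
import re

LINE = re.compile(r"\[(?P<time>[\d.]+)\]\[(?P<host>[^:]+):(?P<proc>[^\]]+)\]\s*(?P<msg>.*)")

def parse(line):
    m = LINE.match(line)
    if m is None:
        return None  # not a log line in the expected format
    return float(m["time"]), m["host"], m["proc"], m["msg"]

print(parse("[33814.005184][my_lab_computer.edu:wms_activity1_3] "
            "Notified that task0 has completed"))
# → (33814.005184, 'my_lab_computer.edu', 'wms_activity1_3',
#    'Notified that task0 has completed')
```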

Answer these questions

  • [q1] At what time did the WMS submit task1 as a job to the compute service?
  • [q2] From the WMS's perspective, how long did task1 run for? (this duration is called the task's turnaround time)
  • [q3] The compute service runs on a host with a speed of 1000 GFlop/sec, and task4 must perform 10 TFlop. About how long should we expect task4 to compute for?
  • [q4] Based on the simulation output, from the WMS’s perspective how long does it take for task4 to complete?
  • [q5] Why does task4 take longer than what you computed in question 3?
  • [q6] Assuming no other network traffic, about how long would it take to send all of task4's input data from storage_db.edu to hpc.edu, and then all of task4's output data from hpc.edu back to storage_db.edu, over the direct link between these two hosts?
  • [q7] Accounting for this I/O overhead, does task4’s execution time as experienced by the WMS make sense?
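A back-of-the-envelope model for these questions is turnaround ≈ compute time + I/O time. Below is a minimal sketch: task4's compute demand and the host speed come from question 3, while the file sizes are placeholders — substitute the actual sizes shown in Figure 1:

```python
# Rough turnaround estimate: compute time plus I/O time over the direct link.
# 10 TFlop and 1000 GFlop/sec come from question 3; the 200 MB of I/O below is
# a placeholder, not task4's real data volume.

GFLOP = 1e9
MB = 1e6  # 1 MB = 10^6 bytes for network bandwidths

def turnaround(flop, host_gflops, io_bytes, bandwidth_Bps):
    compute = flop / (host_gflops * GFLOP)   # seconds of pure computation
    io = io_bytes / bandwidth_Bps            # seconds moving data over the link
    return compute + io

# task4: 10 TFlop on a 1000 GFlop/sec host, ignoring I/O
print(turnaround(10e12, 1000, 0, 10 * MB))        # → 10.0 (seconds, question 3)

# Same task with a hypothetical 200 MB of total I/O over the 10 MB/sec link
print(turnaround(10e12, 1000, 200 * MB, 10 * MB)) # → 30.0 (10 s compute + 20 s I/O)
```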

Step #3: Visualize the Workflow Execution

Analyzing the textual simulation output can be tedious, especially when the workflow comprises many tasks and/or when there are many simulated software services. Fortunately, the simulator can produce a visualization of the workflow execution as a Gantt chart.

In the terminal run the following commands:

  1. Run: docker pull wrenchproject/wrench-pedagogic-modules:ics332-activity-visualization
  2. Then run: docker container run -p 3000:3000 -d wrenchproject/wrench-pedagogic-modules:ics332-activity-visualization
  3. Open a browser and go to localhost:3000/
  4. Sign in using your <UH Username>@hawaii.edu Google Account
  5. Select Activity 1: Running Your First Simulated Workflow Execution

Answer these questions

  • [q8] What fraction of task0’s execution time is spent doing I/O?
  • [q9] What fraction of task4’s execution time is spent doing I/O?
  • [q10] If the link bandwidth between storage_db.edu and hpc.edu were doubled, what fraction of task4’s execution time would be spent doing I/O? Double the platform link bandwidth (set it to 20 MB/sec) using the visualization and run the simulation. Is your expectation confirmed?
  • [q11] With the link bandwidth doubled, how much faster is the workflow execution now than before?
  • [q12] What link bandwidth would be necessary for the workflow to run 2x faster than with the original 10 MB/sec bandwidth? Hint: You can do this by solving a simple equation, and then check that your answer is correct using the simulation.
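One way to set up the "simple equation" hinted at in question 12 is to model the total time as compute plus data volume over bandwidth, then solve for the bandwidth that halves the original total. The numbers below are placeholders, not the workflow's actual compute time or data volume:

```python
# Question 12's equation, with placeholder numbers. If total time is
#     T(b) = compute + bytes / b,
# then the bandwidth b that halves the original time T(b0) satisfies
#     compute + bytes / b = T(b0) / 2
#  => b = bytes / (T(b0) / 2 - compute)
# A positive solution exists only if compute < T(b0) / 2, i.e. if enough
# of the original execution time is I/O.

def bandwidth_for_halved_time(compute_s, total_bytes, b0):
    t0 = compute_s + total_bytes / b0  # original total time at bandwidth b0
    target = t0 / 2
    if target <= compute_s:
        raise ValueError("cannot halve the time: compute alone exceeds the target")
    return total_bytes / (target - compute_s)

# Placeholder example: 50 s of compute, 1000 MB moved, original link 10 MB/sec.
b = bandwidth_for_halved_time(50, 1000e6, 10e6)
print(b / 1e6)  # → 40.0 (MB/sec needed, for these placeholder numbers)
```

Check your own answer against the simulation, as the hint suggests.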

Step #4: Better Data Locality with Another Storage Service on the Compute Host

The CS reads and writes files from and to a remote storage service, which contributes I/O overhead. This overhead could be reduced if the storage service were located on the same host as the CS. Here we introduce the idea of moving computation closer to where the data resides, known as data locality.

Scenario 2

In the previous steps we ran simulations using the platform scenario described in Figure 2, where the compute service accessed a remote storage service for all read and write operations required by the workflow in Figure 1. Now consider a scenario in which the WMS user only needs to access the initial input file, task0::0.in, and the final output file, task4::0.out, via the remote storage service. The other files created during workflow execution are intermediate results that the user never analyzes or accesses. Furthermore, suppose another storage service resides on the host hpc.edu and the CS has access to it. Since the remote storage service now handles only these two files, we have enhanced our previous WMS implementation so that it tells the CS to use its local storage service (the one on hpc.edu) for all reads and writes of intermediate files. Figure 3 illustrates this new cyberinfrastructure and WMS/workflow scenario.

Using the visualization tool from Step 3, input 10 MB/sec as the link bandwidth. Select the radio button that says: Storage Service on storage_db.edu and hpc.edu. Run the simulation.

Answer these questions

  • [q13] What fraction of task4's execution time is spent doing I/O?
  • [q14] Compared to the workflow execution time observed in Step 2, how much faster is it now?
  • [q15] Using only the remote storage service (i.e., no storage service co-located with the compute service), to what value would you need to increase the link bandwidth so that the workflow executes faster than it did with a 10 MB/sec link and storage services on both storage_db.edu and hpc.edu?

When you are finished using the visualization tool, run: docker kill $(docker ps -a -q --filter ancestor=wrenchproject/wrench-pedagogic-modules:ics332-activity-visualization)

Conclusion

In this activity, we have simulated the execution of a small workflow on two simple distributed computing environments that exposed the cost of performing I/O operations. Using both textual and visual simulation output, you have familiarized yourselves with network bandwidth and data locality, two of the many factors that can affect task turnaround times, which consequently affect overall workflow execution performance. With these concepts in mind, proceed to the next activity, Activity 2: Parallelism, where we construct a more complex distributed computing environment in order to explore the concept of “parallelism”.