TUFLOW FV Parallel Computing

{| class="wikitable" align="right"
| colspan="3" style="text-align:center;" | '''USEFUL LINKS'''
|-
| width="220pt" style="text-align:center;" | '''Wiki Links'''
| width="220pt" style="text-align:center;" | '''Help'''
| width="220pt" style="text-align:center;" | '''Downloads'''
|-
| [http://fvwiki.tuflow.com/index.php?title=Main_Page ''TUFLOW FV Wiki Main Page'']
| ''[[Contact|Products Support/Contact]]''
| [https://www.tuflow.com/downloads/ ''TUFLOW FV Downloads'']
|-
| ''[[Tutorial Model Introduction|TUFLOW FV Tutorials]]''
| ''[[Requesting a Licence]]''
| [https://www.tuflow.com/downloads/tuflow-fv-models/ ''Tutorial Module Data'']
|-
| [https://wiki.tuflow.com/index.php?title=Main_Page ''TUFLOW Classic/HPC Wiki'']
| ''[[TUFLOW FV Glossary|TUFLOW FV Glossary]]''
| [https://www.tuflow.com/downloads/ ''Manuals'']
|}

Introduction

TUFLOW FV is parallelised for multi-processor machines using the OpenMP implementation of shared memory parallelism. This means that a TUFLOW FV model simulation will run faster if more than one processor core (or thread) is available on a single computer. This page summarises the increase in computational speed gained by running on multiple threads and how hyper-threading (https://en.wikipedia.org/wiki/Hyper-threading) affects the runtime.

Benchmarking Test Setup

A computer with 2x Xeon 5680 3.33GHz processors was used for the test. This provides access to 12 physical cores. With hyper-threading enabled this equates to 24 threads (or logical processors); with hyper-threading turned off it equates to 12 threads (or logical processors).
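
The split between physical cores and logical processors on a given machine can be checked from a Windows command prompt. The command below is a standard Windows utility (not part of TUFLOW FV) and is shown only as a convenient way to confirm the counts quoted above; the same information is also visible in Task Manager.

  wmic cpu get Name,NumberOfCores,NumberOfLogicalProcessors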

Three models were simulated:

  • A 3D lab experiment model (4,224 2D cells / 38,016 3D cells)
  • A floodplain flood model (Tutorial Module 03, 16,457 2D cells)
  • A coastal storm tide model (37,348 3D cells)

Using a batch file, each model was run with a varying number of threads, with the OMP_NUM_THREADS environment variable set to 1, 2, 4, 8, 12, 18 or 24 and also with OMP_NUM_THREADS not set (the default). All models were first run with hyper-threading enabled in the BIOS (usually the default for Intel processors) and then repeated with hyper-threading turned off.
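
A minimal sketch of the kind of batch file described above is given below. It is not the batch file used for the benchmarking: the executable path and the control file name are assumptions for illustration only and should be replaced with the actual TUFLOW FV install location and .fvc file. Each of the three benchmark models would be run in the same way with its own control file.

  @echo off
  rem Sketch only: the executable path and control file name are assumptions.
  rem Run the same model with an increasing number of OpenMP threads.
  for %%N in (1 2 4 8 12 18 24) do (
      set OMP_NUM_THREADS=%%N
      "C:\TUFLOWFV\TUFLOWFV.exe" my_model.fvc
  )
  rem Final run with OMP_NUM_THREADS not set (the "default" case in the results).
  set OMP_NUM_THREADS=
  "C:\TUFLOWFV\TUFLOWFV.exe" my_model.fvc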

Test Results

The run times are summarised in the table below.
(Figure: TUFLOW FV Parallel Computing 01.PNG, summary of run times)

The run times are plotted as a relative speed compared to the run with hyper-threading off and only one thread, i.e. OMP_NUM_THREADS = 1 in the batch file.
(Figure: TUFLOW FV Parallel Computing 02.PNG, relative speed versus number of threads)
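
For clarity, the relative speed plotted above can be written as a simple ratio (a restatement of the definition in the text, not an additional result):

  relative speed (N threads) = run time (1 thread, hyper-threading off) / run time (N threads)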

Discussion and Recommendations

Speed up with Hyper-Threading OFF (Physical Cores Only)

  • 1 ~ 12 threads: the relative computational speed increases as more threads are used for the simulations. However, the increase is not quite linear with the number of threads (e.g. 8 threads does not give a full 8x speed up), because the CPU also needs to allow for the overhead of exchanging and managing data among the threads.
  • 12 ~ 24 threads: the relative computational speed decreases significantly when the batch file requests more threads than the number of available physical cores.
  • "default": when OMP_NUM_THREADS is not specified, the task manager shows 100% CPU usage, but the computational speed is slightly slower than the "12 threads" run.

Speed up with Hyper-Threading ON

  • 1 ~ 12 threads: the relative computational speed increases as more threads are used for the simulations. However, the relative computational speed of the "12 threads" run is slightly slower than that of the "8 threads" run.
  • 12 ~ 24 threads and "default": the task manager shows 100% CPU usage for all these runs, indicating that all the physical cores are "maxed out". Of these runs, the "18 threads" run had the fastest computational speed for all three models used in the test.

Recommendations

  • Based on the results above, for the greatest speed gains it is recommended to turn hyper-threading off and run a model with OMP_NUM_THREADS set to the number of physical cores (a sketch follows this list).
  • Turning hyper-threading on can be beneficial if other processes are running in the background during a simulation (e.g. data copying or word processing), since these "light" processes do not max out an entire CPU core when hyper-threading is switched on.
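
As a minimal sketch of the first recommendation, on the 12 physical core test machine this corresponds to the following (the executable path and control file name are assumptions, as before):

  rem Sketch only: executable path and control file name are assumptions.
  set OMP_NUM_THREADS=12
  "C:\TUFLOWFV\TUFLOWFV.exe" my_model.fvc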


It must be noted that the findings above will likely vary from model to model, particularly if coupling to external turbulence or water quality models, which, unlike TUFLOW FV, are not parallelised.