FLUENT Users Capitalize on Parallel Processing

 

Liz Marshall, Fluent Inc.

The computing potential available to today’s CFD engineers is nothing short of remarkable. Ten years ago, only the most adventurous CFD practitioners used models with more than 100,000 cells, and many simulations of this size could only be solved on the supercomputers of the day. Since then, scientists and engineers have scaled up to larger and larger problems, fueled by ever-faster hardware at steadily decreasing cost. The drop in the price of processors and memory has coincided with advances in software technology to bring parallel computing within the reach of many companies. For large-scale problems, parallel processing algorithms have been introduced that allow a calculation to be segmented into two or more partitions that are solved simultaneously on different CPUs. Multiprocessor workstations and networks of single- or multiprocessor machines are now routinely deployed at companies around the world to make faster work of simulations of all kinds. Fluent software users are among those who have taken advantage of this trend, thanks in part to the robust and scalable parallel processing capabilities of the software.

Variety of hardware

There are many ways that a parallel calculation can be performed. Multiprocessor machines contain two or more CPUs, and can be based on RISC (running UNIX) or Intel (running Windows or Linux) architecture. On a dual-processor machine, for example, the two processors share the memory in the system. The shared memory enables independent processes to communicate, using a technique called shared memory processing (SMP). Single-processor, or serial, machines, which contain only one CPU, can be connected over a network to form a cluster. When a network of such machines performs a calculation in parallel, the process is called distributed memory processing (DMP). Unlike shared memory processing, where a single machine manages all the memory, with distributed memory processing the memory is managed locally on each machine; here, communication among processes occurs over a network rather than through shared memory. Multiprocessor machines can also be networked to other multiprocessor or single-processor machines. Calculations run on a cluster of this type can use a process called distributed shared memory processing (DSMP, or often just DSM).

Contours of cell partition on a car surface for a mesh subdivided into eight partitions

“In addition to its superior accuracy, ease of use and consistency, FLUENT is also absolutely amazing in its parallel processing ability. We assembled a small Linux cluster and obtained a parallel processing license. FLUENT performed flawlessly in our clustered environment the first time we tried it. Setting up and running a job in parallel is seamless to the end user, making FLUENT the ultimate return on investment in simulation tools.”
– Ryan Huizenga
CAD Systems Supervisor
Litens Automotive Group

FLUENT users have employed all of the above approaches for large jobs in need of parallel processing. Rodney Balzar from Briggs & Stratton Corporation uses a two-processor HP J6000, with 1024 MB of RAM shared by the two processors. His simulations of turbulent flow with heat transfer typically involve more than three million cells. Jim DeSpirito at the US Army Research Laboratory (ARL) has a large computing facility at his disposal. The Major Shared Resource Center at ARL has over 1200 processors on SGI Origin 2000, Origin 3800, and IBM SP supercomputers, most of which have hundreds of gigabytes of RAM. DeSpirito’s group is one of many that use the facility, but he rarely has to wait long in the queue to launch jobs. He finds that he gets the best performance if he sets a limit of about 200,000 cells on each CPU. Thus, jobs involving five million cells typically use 28 to 32 processors, while those involving 16 million cells work well with 64 to 96 processors. At Hamilton Sundstrand, Gary Post uses a cluster of six dual-processor DEC Alphas, running UNIX, each of which has 4 GB of memory. The machines are networked to each other, but are segregated from the rest of the corporate network. His typical runs, which include combustion and radiative heat transfer and involve 500,000 to one million cells, are usually done using six processors on three machines. He often needs to find six available processors on more than three machines, and is grateful for the flexibility that allows him to choose either one or two from each machine. Giri Manampathy at GE Aircraft Engines usually uses a cluster of dual-processor HP workstations. The machines are linked via a high-speed network, and are segregated from all of the other computers on the company network. For problems using up to six million cells, most of which involve turbulent combustion, he typically makes use of 12 CPUs on this network. When not using the HP cluster, he can also elect to use an 8-processor shared memory PC.
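DeSpirito's rule of thumb of roughly 200,000 cells per CPU can be turned into a quick sizing estimate. The short Python sketch below is only an illustration: the function name `processors_for` is hypothetical, and the 200,000-cell default is simply the figure quoted above, not a FLUENT setting. It gives the smallest processor count that respects the limit; in practice the counts quoted above vary around this estimate.

```python
import math

def processors_for(cells, cells_per_cpu=200_000):
    """Smallest processor count that keeps each CPU at or under the cell limit.

    The 200,000-cell default is the rule of thumb quoted in the article,
    not a value prescribed by the software.
    """
    return math.ceil(cells / cells_per_cpu)

# Mesh sizes mentioned in the article
for cells in (5_000_000, 16_000_000):
    print(f"{cells:>10,} cells -> at least {processors_for(cells)} processors")
```

For the five-million-cell case the estimate is 25 processors, a little below the 28 to 32 quoted; for 16 million cells it is 80, inside the quoted range of 64 to 96.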

The PC, with Intel-based architecture, has gained popularity among Fluent software users, and indeed, among engineers and scientists running computationally intensive simulations of all types. For FLUENT users, parallel computing is available for both the Windows and Linux operating systems and on CPUs from both Intel and Advanced Micro Devices (AMD). At Babcock Borsig, Ken Hules uses a cluster of one- and two-processor machines built on Intel and AMD hardware running Windows. With twelve CPUs at his disposal on a high-speed network that is segregated from the corporate network, he usually runs FLUENT jobs on four to six processors at a time, using load balancing (through FLUENT’s partitioning tools) to effectively mix the range of CPU speeds in use. His problems are large, in excess of two million cells, but are primarily characterized by complex physics, including coal combustion and water sprays. Paul Chapman at Alstom Power also uses a collection of UNIX and PC workstations, but has added a Linux-based cluster for larger cases. The cluster has six dual-processor machines, with direct high-speed connections between each of the nodes. It is ideal for larger cases, which can exceed two million cells and include the radiation and chemical reactions associated with simulations of large-scale power and process equipment. Considering the total cost of running large CFD simulations, the economics favor running on the fastest possible hardware. For this reason, the group at Alstom Power has upgraded the hardware twice in the past two years, with the latest swap to AMD processors running Linux.
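The idea behind load balancing across CPUs of different speeds can be sketched in a few lines: give each processor a share of the mesh in proportion to its relative speed, so that all partitions finish an iteration at about the same time. The Python example below is a minimal illustration under that assumption. It is not FLUENT's partitioning algorithm, and the function name `split_cells` and the clock speeds used are hypothetical.

```python
def split_cells(total_cells, relative_speeds):
    """Assign cells to CPUs in proportion to their (integer) relative speeds.

    Illustrative only: real partitioners also try to minimize the
    interface area between partitions, which is ignored here.
    """
    total_speed = sum(relative_speeds)
    # Integer share for each CPU, rounded down
    shares = [total_cells * s // total_speed for s in relative_speeds]
    # Hand any leftover cells to the fastest CPUs first
    leftover = total_cells - sum(shares)
    fastest_first = sorted(range(len(shares)),
                           key=lambda i: relative_speeds[i], reverse=True)
    for i in fastest_first[:leftover]:
        shares[i] += 1
    return shares

# Hypothetical mix: two 1000 MHz CPUs and two 1500 MHz CPUs, two million cells
print(split_cells(2_000_000, [1000, 1000, 1500, 1500]))
```

With these assumed speeds the slower CPUs each receive 400,000 cells and the faster ones 600,000.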

Performance enhancements

All of the FLUENT users interviewed have found impressive gains in their computing ability since switching to parallel processing. For Manampathy at GEAE, who has been using parallel processing for about a year, performance has scaled linearly as he has added compute nodes during this time. Grid independence is very important to him, so with parallel processing, he can always ensure that each solution satisfies this requirement. Balzar at Briggs & Stratton has seen a four-fold improvement after adding a second processor. This better-than-linear improvement is most likely due to the fact that his calculations were too large to fit in the available RAM on his serial machine.

When this occurs, portions of the calculation must continually be swapped out of RAM to the disk so that other portions can be moved into RAM for active computation. Swapping, audible by the sound produced when data is written to a hard drive (often a rattling sound coming from the computer), can easily slow a calculation down by a factor of two. By adding a second processor and more memory, his calculations now easily fit into the available RAM, so his overall speed-up has been roughly four-fold. For Post at Hamilton Sundstrand, who has been parallel processing for about two years, larger simulations with a step change in detail are now possible. For a typical combustion problem he usually needed an overnight run to compute a cold flow solution. He would then have to wait until the following day before he could ignite the flame and compute the final solution. Now, the setup and cold flow can be done in a single day, so that the flame solution can be performed that night. Whereas on a single-processor machine it might have taken two and a half days to solve a combustion problem with 200,000 cells, it now takes one full day to solve one with over 500,000 cells. According to DeSpirito at ARL, whose simulations can exceed 10 or even 15 million cells, “Our problems would not be solvable without parallel processing.”
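The arithmetic behind these two examples can be checked with a short Python sketch. It rests on simplifying assumptions that are not in the interviews: that the swapping penalty and the processor count multiply independently, and that "cells solved per day" is a fair measure of throughput. The function names are hypothetical.

```python
def combined_speedup(*factors):
    """Multiply independent speed-up factors together."""
    result = 1.0
    for factor in factors:
        result *= factor
    return result

def cells_per_day(cells, days):
    """Crude throughput measure: cells solved per day of run time."""
    return cells / days

# Balzar: removing a factor-of-two swapping penalty and doubling the CPUs
print(combined_speedup(2.0, 2.0))   # 4.0

# Post: 200,000 cells in 2.5 days before, 500,000 cells in 1 day now
before = cells_per_day(200_000, 2.5)
after = cells_per_day(500_000, 1.0)
print(after / before)               # 6.25
```

By this rough measure, Post's throughput has increased about six-fold, although the true gain depends on how solution cost scales with mesh size and physics.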

The benefits are clear for CFD simulations that rely solely on the solution of transport equations (species mixing and reactions, Eulerian multiphase, transient flow, etc.). They are less apparent when the simulation involves particle tracking and is performed on a cluster. According to Hules at Babcock Borsig, while he achieves linear scale-up most of the time, the scale-up is reduced when he simulates coal combustion. This is because the particle tracking routines currently run at parallel speeds on shared memory machines only. (A distributed memory particle tracking model is planned for FLUENT 6.1.) Despite the current limitations, he is still pleased with the speed-up he achieves when compared to his serial runs of the past.

In addition to the benefits of faster processors and algorithms for running calculations in parallel, high performance graphics cards have added the ability to visualize the results of larger models. Where the PC was previously incapable of rendering the results of large 3D simulations, the falling cost of 3D graphics hardware has allowed users to easily manipulate and animate CFD data, making post-processing an enjoyable experience. Advances in linking parallel calculations to real-time desktop post-processing will allow CFD modeling to extend far beyond its traditional boundaries in the years to come.

In today’s engineering landscape, there are increased demands for higher accuracy from CFD simulations, and these are coupled with demands for more rapid turnaround times. To meet these demands, parallel processing will continue to play an ever-expanding role. Having evolved from algorithms for shared memory workstations to those for distributed memory clusters connecting single and multi-processor machines, parallel processing technology will continue to grow. Computers will continue to stun us as well, with their increased power and reduced costs. With these advances, the day will soon come when problems with tens of millions of cells will become routine.

Running FIDAP and POLYFLOW in Parallel

FLUENT is not the only software from Fluent that takes advantage of parallel processing. Most of the capabilities of FIDAP and POLYFLOW run in parallel on multi-processor machines. Many platforms are supported, and upcoming releases will continue to focus on improving the usability, performance and robustness of parallel processing.

FluentNEWS