Liz Marshall, Fluent Inc.
The computing power available to today's CFD engineers is nothing short of remarkable. Ten years ago, only the most adventurous CFD practitioners used models with more than 100,000 cells, and many simulations of this size could be solved only on the supercomputers of the day. Since then, scientists and engineers have scaled up to ever larger problems, fueled by faster hardware at steadily decreasing cost. Falling prices for processors and memory, together with advances in software technology, have brought parallel computing within reach of many companies. For large-scale problems, parallel processing algorithms allow a calculation to be segmented into two or more partitions that are solved simultaneously on different CPUs. Multiprocessor workstations and networks of single- or multiprocessor machines are now routinely deployed at companies around the world to speed up simulations of all kinds. Fluent software users are among those who have taken advantage of this trend, thanks in part to the robust and scalable parallel processing capabilities of the software.
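The idea of segmenting a calculation into partitions that are solved simultaneously can be sketched in a few lines of Python. This is a toy illustration, not how FLUENT partitions a mesh: the cells are treated as a flat index range split into contiguous, near-equal chunks, and the hypothetical `solve_partition` function stands in for real solver work. Each chunk runs in its own process via the standard-library `ProcessPoolExecutor`.

```python
from concurrent.futures import ProcessPoolExecutor


def partition(n_cells: int, n_parts: int) -> list[range]:
    """Split cell indices 0..n_cells-1 into n_parts contiguous, near-equal ranges."""
    base, extra = divmod(n_cells, n_parts)
    parts, start = [], 0
    for p in range(n_parts):
        size = base + (1 if p < extra else 0)  # the first `extra` parts get one more cell
        parts.append(range(start, start + size))
        start += size
    return parts


def solve_partition(cells: range) -> float:
    """Hypothetical stand-in for per-partition solver work (here, a simple sum)."""
    return sum(0.5 * i for i in cells)


def parallel_solve(n_cells: int, n_parts: int) -> float:
    """Solve every partition simultaneously in separate processes, then combine."""
    parts = partition(n_cells, n_parts)
    with ProcessPoolExecutor(max_workers=n_parts) as pool:
        partial_results = list(pool.map(solve_partition, parts))
    return sum(partial_results)


if __name__ == "__main__":
    print([len(p) for p in partition(10, 4)])  # [3, 3, 2, 2]
    print(parallel_solve(1_000_000, 8))
```

The partitions here are fully independent, which is the easy case. In a real flow solver, neighboring partitions must exchange values along their shared boundaries every iteration, which is why partitioners also try to keep those boundaries small.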
Variety of hardware
There are many ways to perform a parallel calculation. Multiprocessor machines contain two or more CPUs and can be based on RISC (running UNIX) or Intel (running Windows or Linux) architectures. On a dual-processor machine, for example, the two processors share the system's memory, and independent processes communicate through that shared memory, a technique called shared memory processing (SMP¹). Single-processor (serial) machines, which contain only one CPU, can be connected over a network to form a cluster. When such a cluster performs a calculation in parallel, the process is called distributed memory processing (DMP). Unlike shared memory processing, where a single machine manages all the memory, distributed memory processing manages memory locally on each machine, and processes communicate over the network rather than through shared memory. Multiprocessor machines can also be networked to other multiprocessor or single-processor machines. Calculations run on a cluster of this type can use a process called distributed shared memory processing (DSMP, or often just DSM).
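The difference between the two communication styles can be mimicked on a single machine with Python's `multiprocessing` module. This is only an analogy: the function names are hypothetical, both variants run as local processes, and a real cluster would use a network message-passing layer (such as MPI) rather than a `Pipe`. In the SMP version, workers write into one shared `multiprocessing.Array`; in the DMP version, each worker keeps private memory and sends its piece back as a message.

```python
import multiprocessing as mp


def bounds(n: int, n_procs: int) -> list[tuple[int, int]]:
    """Contiguous [lo, hi) index ranges, one per process."""
    return [(k * n // n_procs, (k + 1) * n // n_procs) for k in range(n_procs)]


def smp_worker(shared, lo: int, hi: int) -> None:
    # Shared memory: every process writes straight into one array they all see.
    for i in range(lo, hi):
        shared[i] = float(i * i)


def dmp_worker(send_end, lo: int, hi: int) -> None:
    # Distributed memory: the worker fills a private, local list and
    # communicates its result as a message instead of touching shared state.
    local = [float(i * i) for i in range(lo, hi)]
    send_end.send((lo, local))
    send_end.close()


def run_smp(n: int, n_procs: int) -> list[float]:
    shared = mp.Array("d", n)  # n doubles in shared memory, zero-initialized
    procs = [mp.Process(target=smp_worker, args=(shared, lo, hi))
             for lo, hi in bounds(n, n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return shared[:]


def run_dmp(n: int, n_procs: int) -> list[float]:
    result = [0.0] * n
    procs, receivers = [], []
    for lo, hi in bounds(n, n_procs):
        recv_end, send_end = mp.Pipe(duplex=False)  # one-way channel: worker -> parent
        p = mp.Process(target=dmp_worker, args=(send_end, lo, hi))
        p.start()
        send_end.close()  # the parent only receives on this channel
        procs.append(p)
        receivers.append(recv_end)
    for recv_end in receivers:
        lo, local = recv_end.recv()  # receive before join so large messages cannot block
        result[lo:lo + len(local)] = local
    for p in procs:
        p.join()
    return result


if __name__ == "__main__":
    assert run_smp(1000, 4) == run_dmp(1000, 4)
    print("SMP and DMP sketches agree")
```

A DSMP run would combine the two patterns: shared memory among the processors inside each machine, and messages between machines.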
Contours of cell partition on a car surface for a mesh subdivided into
eight partitions
In addition to its superior accuracy, ease of use and consistency,
FLUENT is also absolutely amazing in its parallel processing ability.
We assembled a small Linux cluster and obtained a parallel processing
license. FLUENT performed flawlessly in our clustered environment the
first time we tried it. Setting up and running a job in parallel is
seamless to the end user, making FLUENT the ultimate return on investment
in simulation tools.
Ryan Huizenga
CAD Systems Supervisor
Litens Automotive Group
FLUENT users have employed all of the above approaches for large jobs that need parallel processing. Rodney Balzar of Briggs & Stratton Corporation uses a two-processor HP J6000 with 1024 MB of RAM shared by both processors. His simulations of turbulent flow with heat transfer typically involve more than three million cells. Jim DeSpirito at the US Army Research Laboratory (ARL) has a large computing facility at his disposal: the Major Shared Resource Center at ARL has over 1200 processors on SGI Origin 2000, Origin 3800, and IBM SP supercomputers, most of which have hundreds of gigabytes of RAM. DeSpirito's group is one of many that use the facility, but he rarely has to wait long in the queue to launch jobs. He finds that he gets the best performance when he limits each CPU to about 200,000 cells. Thus, jobs involving five million cells typically use 28 to 32 processors, while those involving 16 million cells work well with 64 to 96 processors. At Hamilton Sundstrand, Gary Post uses a cluster of six dual-processor DEC Alphas running UNIX, each with 4 GB of memory. The machines are networked to each other but segregated from the rest of the corporate network. His typical runs, which include combustion and radiative heat transfer, involve 500,000 to one million cells and are usually done on six processors across three machines. He often needs to gather six available processors from more than three machines, and he is grateful for the flexibility to use either one or two processors on each machine. Giri Manampathy at GE Aircraft Engines usually uses a cluster of dual-processor HP workstations linked by a high-speed network and segregated from all other computers on the company network. For problems of up to six million cells, most of which involve turbulent combustion, he typically uses 12 CPUs on this network. When not using the HP cluster, he can also use an 8-processor shared memory PC.
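DeSpirito's rule of thumb is simple arithmetic, sketched below with hypothetical helper names. A cap of 200,000 cells per CPU implies at least 25 processors for five million cells and at least 80 for 16 million; the counts he actually uses put the average load between about 156,000 and 250,000 cells per CPU, so the limit is a guideline rather than a hard ceiling.

```python
def min_processors(n_cells: int, limit: int = 200_000) -> int:
    """Smallest processor count that keeps every CPU at or below `limit` cells."""
    return -(-n_cells // limit)  # integer ceiling division


def cells_per_cpu(n_cells: int, n_procs: int) -> float:
    """Average load per processor for an evenly partitioned mesh."""
    return n_cells / n_procs


if __name__ == "__main__":
    # Processor ranges quoted in the text for each job size.
    for cells, counts in [(5_000_000, (28, 32)), (16_000_000, (64, 96))]:
        print(cells, "cells -> at least", min_processors(cells), "CPUs;",
              [round(cells_per_cpu(cells, n)) for n in counts], "cells per CPU")
```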
The PC, with its Intel-based architecture, has gained popularity among Fluent software users and, indeed, among engineers and scientists running computationally intensive simulations of all types. For FLUENT users, parallel computing is available for both the Windows and Linux operating systems and on CPUs from both Intel and Advanced Micro Devices (AMD). At Babcock Borsig, Ken Hules uses a cluster of one- and two-processor machines with Intel and AMD hardware running Windows. With twelve CPUs at his disposal on a high-speed network segregated from the corporate network, he usually runs FLUENT jobs on four to six processors at a time, using load balancing (through FLUENT's partitioning tools) to make effective use of CPUs with a range of speeds. His problems are large, in excess of two million cells, but they are characterized primarily by complex physics, including coal combustion and water sprays. Paul Chapman at Alstom Power also uses a collection of UNIX and PC workstations, but he has added a Linux-based cluster for larger cases. The cluster has six dual-processor machines with direct high-speed connections between all of the nodes. It is ideal for larger cases, which can exceed two million cells and include radiation and the chemical reactions associated with simulations of large-scale power and process equipment. Considering the total cost of running large CFD simulations, the economics favor running on the fastest possible hardware. For this reason, the Alstom Power team has upgraded the hardware twice in the past two years, most recently switching to AMD processors running Linux.
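Load balancing across CPUs of differing speeds can be sketched as sizing each partition in proportion to its processor's speed, so that all partitions finish an iteration at about the same time. This is a generic technique, not a reproduction of FLUENT's partitioning tools; the function names and relative speed ratings are hypothetical, and communication costs are ignored.

```python
def balanced_sizes(n_cells: int, speeds: list[float]) -> list[int]:
    """Give each CPU a share of cells proportional to its relative speed.

    Uses largest-remainder rounding so the integer sizes sum to n_cells.
    """
    total_speed = sum(speeds)
    exact = [n_cells * s / total_speed for s in speeds]
    sizes = [int(x) for x in exact]
    leftover = n_cells - sum(sizes)
    # Hand the leftover cells to the partitions with the largest fractional parts.
    by_remainder = sorted(range(len(speeds)), key=lambda i: exact[i] - sizes[i], reverse=True)
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    return sizes


def iteration_time(sizes: list[int], speeds: list[float]) -> float:
    """An iteration is only as fast as its slowest partition (cells / speed)."""
    return max(n / s for n, s in zip(sizes, speeds))


if __name__ == "__main__":
    speeds = [1.0, 1.0, 1.5, 1.5]  # hypothetical: two slower CPUs, two 50% faster ones
    balanced = balanced_sizes(2_000_000, speeds)
    print(balanced, iteration_time(balanced, speeds))
    print(iteration_time([500_000] * 4, speeds))  # equal split, for comparison
```

With these made-up speeds, an equal split leaves the faster CPUs idle while the slower ones finish, whereas the proportional split (400,000 cells on each slow CPU, 600,000 on each fast one) brings the slowest partition's time down from 500,000 to 400,000 units.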
Performance enhancements
All of the FLUENT users interviewed have seen impressive gains in computing capability since switching to parallel processing. For Manampathy at GEAE, who has been using parallel processing for about a year, performance has scaled linearly as he has added compute nodes. Grid independence is very important to him, and parallel processing lets him run grids fine enough to ensure that every solution satisfies this requirement. Balzar at Briggs & Stratton has seen a fourfold improvement after adding a second processor. This better-than-linear improvement is most likely due to the fact that his calculations were too large to fit in the available RAM on his serial machine.
When this occurs, portions of the calculation must continually be swapped out of RAM to disk so that other portions can be moved into RAM for active computation. Swapping is often audible as a rattling sound from the hard drive as data is written, and it can easily slow a calculation down by a factor of two. With a second processor and more memory, his calculations now fit easily in the available RAM, so he gains twice: the second processor roughly halves the run time, and eliminating swapping removes the other factor of two, for an effective fourfold saving. For Post at Hamilton Sundstrand, who has been using parallel processing for about two years, larger simulations with a step change in detail are now possible. For a typical combustion problem, he used to need an overnight run to compute a cold flow solution, and then had to wait until the following day before he could ignite the flame and compute the final solution. Now the setup and cold flow can be done in a single day, so the flame solution can be computed that night. Whereas a single-processor machine might have taken two and a half days to solve a combustion problem with 200,000 cells, it now takes one full day to solve one with over 500,000 cells. According to DeSpirito at ARL, whose simulations can exceed 10 or even 15 million cells, "Our problems would not be solvable without parallel processing."
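The arithmetic behind Balzar's fourfold gain, and the linear scaling Manampathy reports, can be expressed with the standard definitions of parallel speed-up (serial time divided by parallel time) and efficiency (speed-up divided by processor count). The `swap_model` function is a hypothetical simplification: it assumes work divides perfectly between processors and takes the factor-of-two swapping penalty from the text, while the 10-unit compute time is arbitrary.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """How many times faster the parallel run is than the serial one."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, n_procs: int) -> float:
    """Speed-up per processor: 1.0 is linear scaling, above 1.0 is superlinear."""
    return speedup(t_serial, t_parallel) / n_procs


def swap_model(compute_time: float, swap_penalty: float, n_procs: int) -> tuple[float, float]:
    """Hypothetical model: the serial run pays the swap penalty, while the parallel
    run fits in RAM and divides the compute time perfectly across n_procs."""
    t_serial = compute_time * swap_penalty
    t_parallel = compute_time / n_procs
    return t_serial, t_parallel


if __name__ == "__main__":
    t_s, t_p = swap_model(compute_time=10.0, swap_penalty=2.0, n_procs=2)
    print(speedup(t_s, t_p), efficiency(t_s, t_p, 2))  # 4.0 2.0
```

An efficiency of 2.0 on two processors is what makes the result "better than linear": the extra memory, not just the extra CPU, is doing half the work.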
The benefits are clear for CFD simulations that rely solely on the solution of transport equations (species mixing and reactions, Eulerian multiphase, transient flow, etc.). They are less apparent when the simulation involves particle tracking and is performed on a cluster. According to Hules at Babcock Borsig, while he achieves linear scale-up most of the time, the scale-up is reduced when he simulates coal combustion. This is because the particle tracking routines currently run in parallel only on shared memory machines. (A distributed memory particle tracking model is planned for FLUENT 6.1.) Despite the current limitations, he is still pleased with the speed-up he achieves compared to his serial runs of the past.
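Why particle tracking dampens scale-up on a cluster can be illustrated with Amdahl's law, a general model introduced here for illustration (it is not mentioned in the interviews). The sketch assumes, as a simplification, that on a distributed memory cluster the particle-tracking share of each run does not speed up at all; the fractions used are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, n_procs: int) -> float:
    """Amdahl's law: speed-up when only parallel_fraction of the run scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_procs)


if __name__ == "__main__":
    for fraction in (1.0, 0.9, 0.8):  # hypothetical share of run time that parallelizes
        print(fraction, [round(amdahl_speedup(fraction, n), 2) for n in (2, 4, 6)])
```

If, say, 20% of the run time did not parallelize, six processors would give a threefold speed-up rather than sixfold, and no number of processors could push it past fivefold.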
In addition to the benefits of faster processors and parallel algorithms, high-performance graphics cards have made it possible to visualize the results of larger models. Where the PC was previously incapable of rendering the results of large 3D simulations, the falling cost of 3D graphics hardware now lets users easily manipulate and animate CFD data, making post-processing an enjoyable experience. Advances in linking parallel calculations to real-time desktop post-processing will allow CFD modeling to extend far beyond its traditional boundaries in the years to come.
In today's engineering landscape, demands for higher accuracy from CFD simulations are coupled with demands for faster turnaround. To meet these demands, parallel processing will play an ever-expanding role. Having evolved from algorithms for shared memory workstations to algorithms for distributed memory clusters of single- and multiprocessor machines, parallel processing technology will continue to grow. Computers will continue to stun us as well, with ever-increasing power at ever-lower cost. With these advances, the day will soon come when problems with tens of millions of cells are routine.
Running FIDAP and POLYFLOW in Parallel
FLUENT is not the only Fluent software that takes advantage of parallel processing. Most of the capabilities of FIDAP and POLYFLOW run in parallel on multiprocessor machines. Many platforms are supported, and upcoming releases will continue to focus on improving the usability, performance, and robustness of parallel processing.