by Stan Posey, Technical Marketing Engineer, CAE Applications,
SGI and Mark Kremenetsky, Ph.D., CAE Applications Specialist, SGI
For several years, Silicon Graphics Inc. (SGI) and Fluent have pursued
a cooperative development strategy to ensure top performance of all Fluent
software on SGI/Cray Research systems. In a recent collaborative effort,
we have achieved breakthrough levels of parallel scalability for FLUENT/UNS.
Parallel Design Issues
Parallel execution of FLUENT/UNS is based on domain decomposition, in
which the flow field is divided into multiple partitions of roughly equal
size in terms of required computational work. Each partition is solved
on an independent processor, with information transferred between partitions
through explicit message passing in order to maintain the coherency of
the global solution.
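Domain decomposition with message passing at partition boundaries can be shown with a deliberately tiny model. The sketch below is hypothetical and is not FLUENT/UNS code. It assumes a 1D row of cells with uniform cost per cell, and it uses one Jacobi smoothing step to stand in for the solver. Each partition owns a contiguous slice of cells. Each partition also keeps "ghost" copies of its neighbours' edge cells, which a simulated exchange fills before the step. The final check shows that the partitioned step reproduces the serial one, so the global solution stays coherent.

```python
# Hypothetical 1D sketch of domain decomposition with a ghost-cell exchange.
# Assumptions: uniform work per cell, a Jacobi step as the "solver", and
# message passing simulated by reading neighbour values from a shared list.

def partition(n_cells, n_parts):
    """Split cells 0..n_cells-1 into n_parts contiguous (lo, hi) ranges
    whose sizes differ by at most one cell (roughly equal work)."""
    base, extra = divmod(n_cells, n_parts)
    ranges, start = [], 0
    for p in range(n_parts):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

def jacobi_serial(u):
    """One Jacobi step: each interior cell becomes the mean of its two
    neighbours; the two end cells are held fixed as boundary conditions."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = 0.5 * (u[i - 1] + u[i + 1])
    return new

def jacobi_partitioned(u, n_parts):
    """The same step, computed one partition at a time using only the
    partition's own cells plus one ghost value from each neighbour."""
    n = len(u)
    result = [None] * n
    for lo, hi in partition(n, n_parts):
        # Simulated "receive" of ghost values (None at the domain ends).
        left_ghost = u[lo - 1] if lo > 0 else None
        right_ghost = u[hi] if hi < n else None
        local = u[lo:hi]
        for k, i in enumerate(range(lo, hi)):
            if i == 0 or i == n - 1:
                result[i] = u[i]  # fixed boundary cell
                continue
            left = local[k - 1] if k > 0 else left_ghost
            right = local[k + 1] if k < len(local) - 1 else right_ghost
            result[i] = 0.5 * (left + right)
    return result

u = [float(i * i) for i in range(20)]
for parts in (1, 2, 3, 7):
    # Partitioned and serial results agree exactly.
    assert jacobi_partitioned(u, parts) == jacobi_serial(u)
```

In a real parallel code, each partition would run as a separate process and the ghost values would arrive as messages. The amount of data exchanged per step grows with the number of boundary cells, not with the number of cells in the partition.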
The term "parallel scaling" refers to whether parallel speedup keeps
increasing linearly, or instead degrades, as the number of processors is
increased. Several factors can inhibit a high degree of parallel scaling. Solver and
model algorithms, and their implementation, impact the frequency and amount
of information that must be shared across partitions. Similarly, efficient
planning of what data to share (the message content), and when to do so,
is important. Effective domain decomposition tools are critical, as they
affect the load balancing between processors and determine the size of
partition boundaries and the consequent message passing requirements. Finally,
the computer hardware and communications subsystems, along with the message
passing tools used to access them, also affect parallel scaling. Both
Fluent and SGI have made significant investments to address these software
and hardware issues. Note also that individual CFD applications scale
differently, depending on model size and flow physics.
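Two quantities help make these factors concrete. Speedup is S(p) = T(1)/T(p). Parallel efficiency is E(p) = S(p)/p, where T(p) is the run time on p processors. Linear scaling means E(p) stays near 1. The sketch below uses these standard definitions. It also counts how many cell faces a decomposition cuts in a hypothetical n x n structured grid, because the cut faces drive message passing volume. The grid, the strip and block layouts, and the timings in the usage lines are illustrative assumptions only, not FLUENT/UNS data.

```python
# Standard speedup/efficiency definitions plus a toy count of partition
# boundary faces for an n x n grid (illustrative assumptions only).

def speedup(t1, tp):
    """S(p) = T(1) / T(p)."""
    return t1 / tp

def efficiency(t1, tp, p):
    """E(p) = S(p) / p; 1.0 means perfectly linear scaling."""
    return speedup(t1, tp) / p

def strip_boundary_faces(n, p):
    """Faces cut when an n x n grid is split into p horizontal strips:
    p - 1 cuts, each n faces long."""
    return (p - 1) * n

def block_boundary_faces(n, q):
    """Faces cut when an n x n grid is split into q x q square blocks:
    q - 1 horizontal and q - 1 vertical cuts, each n faces long."""
    return 2 * (q - 1) * n

# Hypothetical timings: 100 s serial, 2 s on 64 processors.
print(speedup(100.0, 2.0), efficiency(100.0, 2.0, 64))
# 64 partitions of a 1024 x 1024 grid: strips vs. an 8 x 8 block layout.
print(strip_boundary_faces(1024, 64), block_boundary_faces(1024, 8))
```

Both layouts give every partition the same number of cells. The block layout cuts far fewer faces (14,336 versus 64,512 here), which is why the choice of decomposition tool affects message passing requirements as well as load balance.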
FLUENT/UNS Performance Enhancements
Initial benchmarking by SGI indicated that FLUENT/UNS gave linear scaling
on up to 16 processors for typical test cases, confirming that the code
design handled the issues noted above well. For parallel scaling beyond
16 processors, it became apparent that the message passing system was
the limiting factor. The team improved performance by incorporating SGI's
latest proprietary message passing interface, MPI3.0, into FLUENT/UNS.
This gave much better performance than the public-domain message passing
library already incorporated into Fluent's code design.
Additional enhancements were required for scaling on the non-uniform
memory access (NUMA) architecture of the Origin 2000. These enhancements
to MPI3.0 were made to enforce "processor-memory affinity", that is, to
ensure that data reside in memory that is local to the process using them.
FLUENT/UNS served as a test program during SGI's MPI tuning project, a good
example of development collaboration between the two companies.
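The article does not say how MPI3.0 enforces processor-memory affinity. To show why affinity matters on a NUMA machine, the hypothetical model below simulates "first-touch" page placement, a policy commonly used on NUMA systems, in which a memory page is placed on the node whose processor first accesses it. Whether MPI3.0 or IRIX relied on this mechanism is an assumption here, as are the node and page counts. If one process initializes all the data, every page lands on its node, and the other processors then work mostly on remote memory. If each process initializes its own partition, all accesses are local.

```python
# Hypothetical model of first-touch page placement on a NUMA machine.
# Assumption: a page's "home" node is the node that touches it first.

def first_touch(accesses):
    """accesses: (node, page) pairs in time order.
    Returns {page: home_node} under a first-touch policy."""
    home = {}
    for node, page in accesses:
        home.setdefault(page, node)
    return home

def local_fraction(home, accesses):
    """Fraction of (node, page) accesses served from the accessing node's memory."""
    hits = sum(1 for node, page in accesses if home[page] == node)
    return hits / len(accesses)

N_NODES, PAGES_PER_NODE = 4, 4  # illustrative sizes only
owned = {n: range(n * PAGES_PER_NODE, (n + 1) * PAGES_PER_NODE)
         for n in range(N_NODES)}
# During the solve, each node works on the pages of its own partition.
compute = [(n, pg) for n in range(N_NODES) for pg in owned[n]]

# Case 1: node 0 initializes all data, so every page is placed on node 0.
serial_init = [(0, pg) for n in range(N_NODES) for pg in owned[n]]
# Case 2: each node initializes the pages it will later use.
parallel_init = compute

print(local_fraction(first_touch(serial_init), compute))    # 0.25
print(local_fraction(first_touch(parallel_init), compute))  # 1.0
```

With four nodes, serial initialization leaves only node 0's share of accesses local (25%). Initializing each partition on its own node makes every access local, which is the outcome processor-memory affinity aims for.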
"One of our goals with application of CFD at Chrysler is to develop
a rapid assessment tool for the early design and development phase.
The scaling capability demonstrated by FLUENT/UNS on the Origin 2000
system is a very relevant step for us to achieve that goal."
Dr. Richard Sun, Supervisor of Core CFD, Chrysler
Industrial Testing
As part of the parallel scaling study, Chrysler Corporation provided
an automotive underhood thermal management model that includes a coarse
treatment of external aerodynamics and contains more than 1M cells. Calculations
were run on an Origin 2000 system with 128 processors and 4 GBytes of
memory. FLUENT/UNS achieved high parallel efficiency, with nearly linear
scaling up to 64 processors (see Figure 1).
Figure 1. Parallel Speedup on 1M Cell Underhood Study
Towards Higher Resolution and More Complex Modeling
These recent performance achievements set the stage for CFD to expand
beyond current modeling practices. Solution turnaround times have been
reduced to the point where CFD can influence the design process. Model sizes
can be increased to provide higher resolution, yielding simulations of
increasing complexity, realism, and accuracy. Finally, today's "grand
challenges" in CFD, such as transient external aerodynamics and large eddy
simulation (LES) turbulence modeling, which have been limited by solution
turnaround times, are becoming more approachable.
Figure 2. Color-coded parallel partitions in the one-million-cell
underhood thermal management benchmark study. FLUENT/UNS achieved outstanding
parallel scalability.