4th European LS-DYNA Users Conference
MPP / Linux Cluster / Hardware II
IBM Deep Computing Group
How to Make LS-DYNA Run Faster
Guangye Li, Jeff Zais
IBM Deep Computing Team
May, 2003 | IBM Deep Computing Group
© 2002 IBM Corporation
IBM Deep Computing Team – 2003 LS-DYNA Conference
Topics
pSeries POWER4 Performance Topics
  Recent SMP Optimization
  Effect of Parallel Repeatability Flag
  Effect of Parallel Force Assembly
  Version 970 Tuning
xSeries IA-32 Xeon Performance Topics
  Faster Processors
  Faster Frontside Bus
  Version 970 Tuning
  Interconnect Options
Comparisons and Summary
IBM pSeries Performance
POWER4 and AIX product line
Clusters of individual SMP nodes
SP Switch 2 high performance interconnect
Individual nodes range up to an SMP of 32 processors
Entire product line in transition from POWER4 to POWER4+ processor
Interconnect Option: Gigabit Ethernet
Recent Optimization of Version 960 SMP LS-DYNA
[Figure: elapsed time (sec) versus processor count (4, 8, 16, 32 CPUs) for Revision 1488 and Revision 1647. p690, Dec 2002, para flag on, repeatability flag on, refined Neon model, 535k elements.]
Improved Performance from Use of the PARA Flag
[Figure: elapsed time (sec) versus processor count (2, 4, 8, 16, 32 CPUs) for para=0 and para=1. p690, Dec 2002, repeatability flag on, refined Neon model, 535k elements.]
Effect of the Repeatability Flag
[Figure: elapsed time (sec) versus processor count (1, 2, 4, 8, 16, 32 CPUs) with repeatability on and repeatability off. p690, Dec 2002, para flag on, refined Neon model, 535k elements.]
Recent MPI LS-DYNA Optimization
[Figure: elapsed time (sec) versus processor count (4, 8, 16, 32 CPUs) before tuning and after tuning. p655, Jan 2003, version 970 revision 3535, refined Neon model, 535k elements.]
Comparison of v960 and v970 Performance
[Figure: elapsed time (sec) versus processor count (4, 8, 16, 32 CPUs) for v960 r1647, v970 r3535, and v970 r3535 tuned. p655, Jan 2003, MPI LS-DYNA, refined Neon model, 535k elements.]
IBM xSeries Performance
Linux clusters
One or two processor nodes (Intel IA-32 Xeon)
Interconnect Options: Gigabit Ethernet or Myrinet
Several decisions regarding LS-DYNA (LAM/MPI, MPICH, …)
Interconnect: Effect on Performance
[Figure: elapsed time (sec) versus processor count (2, 4, 8, 16, 32 CPUs) for Fast Ethernet, Gigabit Ethernet, and Myrinet. 2.2 GHz IntelliStation cluster, June 2002, MPI LS-DYNA, refined Neon model, 535k elements.]
Performance Improvement with Version 970
[Figure: elapsed time (sec) versus processor count (2, 4, 8, 16, 32 CPUs) for version 960 and version 970. 2.8 GHz x335 cluster, Gigabit Ethernet, March 2003, LAM/MPI LS-DYNA, refined Neon model, 535k elements.]
Performance Improvement with Faster Processors
[Figure: elapsed time (sec) versus processor count (2, 4, 8, 16, 32, 64 CPUs) for 2.4 GHz and 2.8 GHz processors. V960 r1488 LS-DYNA, Gigabit Ethernet, Jan-March 2003, LAM/MPI, refined Neon model, 535k elements.]
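As a reference point for reading the 2.4 GHz versus 2.8 GHz comparison, the sketch below estimates the speedup a clock increase alone could deliver. It is a simple illustrative model, not something taken from the presentation: it assumes only a fraction of the run time scales with clock frequency, and the parameter `clock_bound_fraction` is a hypothetical name introduced here.

```python
def expected_speedup(f_old_ghz: float, f_new_ghz: float,
                     clock_bound_fraction: float = 1.0) -> float:
    """Estimate speedup from a clock increase under a simple two-part model.

    Assumption (not from the slides): a fraction `clock_bound_fraction` of the
    original elapsed time shrinks in proportion to the clock ratio, and the
    rest (memory, interconnect, I/O) is unchanged.
    """
    if f_old_ghz <= 0 or f_new_ghz <= 0:
        raise ValueError("clock frequencies must be positive")
    if not 0.0 <= clock_bound_fraction <= 1.0:
        raise ValueError("clock_bound_fraction must be between 0 and 1")
    # Normalised new elapsed time: untouched part + part scaled by clock ratio.
    new_time = (1.0 - clock_bound_fraction) \
        + clock_bound_fraction * (f_old_ghz / f_new_ghz)
    return 1.0 / new_time


if __name__ == "__main__":
    # Fully clock-bound case gives the plain frequency ratio 2.8 / 2.4.
    print(f"all clock-bound:  {expected_speedup(2.4, 2.8):.3f}x")
    # Hypothetical case where only half the run time scales with clock.
    print(f"half clock-bound: {expected_speedup(2.4, 2.8, 0.5):.3f}x")
```

Under this model the measured gain falls somewhere between 1.0x and the frequency ratio, depending on how much of the run is limited by memory or the interconnect rather than by the processor clock.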
Speedup from Faster 533 MHz Frontside Bus

Model Size (elements)    Speedup: 400 MHz to 533 MHz Frontside Bus
 12000                   1.10
 32000                   1.08
155000                   1.20
430000                   1.18

V960 r1488 LS-DYNA, March 2003, LAM/MPI, 2.8 GHz x335 node, 2-processor runs
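The frontside-bus speedup factors above can be restated as the share of elapsed time saved. The sketch below does that conversion, assuming the speedup is defined as elapsed time at 400 MHz divided by elapsed time at 533 MHz (the slide does not state the definition). The function names are illustrative only.

```python
# Speedup factors from the frontside-bus slide (400 MHz -> 533 MHz),
# keyed by model size in elements.
FSB_SPEEDUP = {
    12_000: 1.10,
    32_000: 1.08,
    155_000: 1.20,
    430_000: 1.18,
}


def time_saved_fraction(speedup: float) -> float:
    """Fraction of elapsed time removed by a speedup factor.

    Assumes speedup = t_old / t_new, so t_new / t_old = 1 / speedup.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


def summarize(table: dict) -> dict:
    """Map each model size to the percentage of elapsed time saved."""
    return {size: round(100.0 * time_saved_fraction(s), 1)
            for size, s in table.items()}


if __name__ == "__main__":
    for elements, pct in summarize(FSB_SPEEDUP).items():
        print(f"{elements:>7} elements: {pct:.1f}% less elapsed time")
```

Read this way, the two larger models (155k and 430k elements) save roughly twice the share of elapsed time that the two smaller models do.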
Configuring Each Node with One Processor
[Figure: elapsed time (sec) versus processor count (2, 4, 8, 16, 32, 64 CPUs) for 2 CPUs per node and 1 CPU per node. V960 r1488 LS-DYNA, Gigabit Ethernet, x335 2.8 GHz, March 2003, LAM/MPI, front crash model, 430k elements.]
POWER4 and IA-32 Xeon Performance Compared
[Figure: elapsed time (sec) versus processor count (2, 4, 8, 16, 32 CPUs) for 2.4 GHz Xeon + Myrinet and 1.3 GHz POWER4 p655. V960 LS-DYNA, Jan 2003, refined Neon model, 535k elements.]
Interconnect Performance Compared
[Figure: parallel speedup versus processor count (2, 4, 8, 16, 32 CPUs) for x335 + Fast Ethernet, x335 + Gigabit Ethernet, x335 + Myrinet, and p655 + SP Switch2. V960 LS-DYNA, Jan 2003, refined Neon model, 535k elements.]
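For readers who want to build a parallel-speedup comparison like this one from their own elapsed times, here is a minimal sketch. It assumes the smallest run in the data is the baseline and is counted as scaling perfectly, which may differ from the normalisation used on the slide. The timings in the example are hypothetical placeholders, not values from the presentation.

```python
def parallel_speedup(elapsed: dict) -> dict:
    """Return {cpus: (speedup, efficiency)} from {cpus: elapsed_seconds}.

    Assumption: the run with the fewest CPUs is the baseline and is credited
    with a speedup equal to its CPU count, so
        speedup(n) = base_cpus * t(base_cpus) / t(n)
        efficiency(n) = speedup(n) / n
    """
    if not elapsed:
        raise ValueError("need at least one timing")
    base_cpus = min(elapsed)
    t_base = elapsed[base_cpus]
    result = {}
    for cpus in sorted(elapsed):
        speedup = base_cpus * t_base / elapsed[cpus]
        result[cpus] = (speedup, speedup / cpus)
    return result


if __name__ == "__main__":
    # Hypothetical elapsed times in seconds, for illustration only.
    timings = {2: 20000.0, 4: 10400.0, 8: 5600.0, 16: 3200.0, 32: 2000.0}
    for cpus, (s, e) in parallel_speedup(timings).items():
        print(f"{cpus:>2} CPUs: speedup {s:5.2f}, efficiency {e:4.0%}")
```

Running the same function on timings from each interconnect gives one speedup curve per configuration, which is the form of comparison the slide presents.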
Summary
IBM continues to work with LSTC on improving the performance of LS-DYNA
IBM pSeries still provides top performance and the advantages of the AIX user environment
IBM xSeries platforms offer very cost-effective Linux cluster solutions for LS-DYNA customers
Users today can customize their system to pick the features that serve them best:
  Processors
  Operating system
  Interconnect