

1.5 GHz Intel Itanium-2, Quadrics QsNet2/Elan4 interconnect, 8GB main mem

Compaq Alphaserver SC, ES45 Elan3, double-rail (only tested w/single)
750-node, 4-way 1GHz Alpha, 4GB, libelan1.3, OSF 5.1

Itanium-2 Linux Cluster, LANai10.0 PCI-X, GM 2.0.8
64-node, 2-way 1.3 GHz Itanium-2, 4GB
PCI-X bus bandwidth: 482 MB/sec read, 501 MB/sec write

IBM Federation Interconnect, LAPI v.2.3.2.1, 8-way 1.5GHz Power4, 16GB main mem


1.5 GHz Intel Itanium-2, 6MB L3, 256KB L2, 32K L1, 2 TB system memory (8GB/node)

512 MSPs, 2MB cache per MSP, 16 GB per node


Comparing GASNet udp-conduit and MPICH_p4 MPI back-to-back on the same Gigabit Ethernet hardware:
1.3 GHz Dual Itanium-2, 4 GB, Broadcom NetXtreme BCM5701 Gigabit Ethernet, Linux 2.4.20, PCI-X
PCI-X bus bandwidth: 482 MB/sec read, 501 MB/sec write

Comparing GASNet vapi-conduit and OSU MVAPICH MPI back-to-back on the same InfiniBand hardware:
Dual 1.4 GHz Opteron, 1GB main mem
Linux 2.4.21-bigphys, Mellanox InfiniHost (Cougar) IB-4X HCAs, Mellanox Drivers 3.1, Firmware 3.0.0
MVAPICH numbers are from their own unmodified tester, with bandwidth re-normalized to MB = 2^20 bytes and latency re-normalized to full round-trip time.
GASNet numbers are from the GASNet testsmall and testlarge benchmarks.
GASNet consistently and significantly outperforms MVAPICH because GASNet's one-sided put/get semantics are fundamentally a better match for the capabilities of the underlying RDMA hardware than MPI's two-sided message-passing semantics. GASNet's puts and gets turn into simple, fully one-sided RDMA operations in the common case and therefore reach the hardware's peak performance, whereas MPI pays a performance cost to enforce its message ordering and tag-matching semantics.
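
The re-normalization applied to the MVAPICH numbers above can be sketched as follows. This is a minimal illustration, not the script actually used: it assumes the unmodified tester reports bandwidth in decimal megabytes (10^6 bytes) per second and latency as a one-way figure, neither of which is stated in the text. The function names and the sample inputs are hypothetical.

```python
# Sketch of the unit re-normalization described above.
# ASSUMPTIONS (not stated in the source): the original tester reports
# bandwidth with MB = 10^6 bytes and latency as one-way time.

BYTES_PER_MB = 2 ** 20           # MB as used in the text (MB = 2^20 bytes)
ASSUMED_TESTER_MB = 10 ** 6      # assumed unit of the tester's raw output


def renormalize_bandwidth(raw_mb_per_sec: float) -> float:
    """Convert a bandwidth in assumed decimal MB/sec to MB/sec with MB = 2^20."""
    bytes_per_sec = raw_mb_per_sec * ASSUMED_TESTER_MB
    return bytes_per_sec / BYTES_PER_MB


def round_trip_latency(one_way_us: float) -> float:
    """Convert an assumed one-way latency (microseconds) to full round-trip."""
    return 2.0 * one_way_us


if __name__ == "__main__":
    # Hypothetical raw readings, chosen only to exercise the conversion.
    raw_bandwidth = 800.0   # decimal MB/sec
    raw_latency = 5.0       # microseconds, one-way
    print(f"bandwidth: {renormalize_bandwidth(raw_bandwidth):.1f} MB/sec (MB = 2^20)")
    print(f"latency:   {round_trip_latency(raw_latency):.1f} us round-trip")
```

Under these assumptions the bandwidth figure shrinks by about 4.6 percent and the latency figure doubles, so both conversions make the MVAPICH numbers directly comparable with the GASNet testsmall and testlarge results.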
