CASHMERE-2L: Software Coherent Shared Memory on a Clustered Remote-Write Network

  • Robert Stets
  • Sandhya Dwarkadas
  • Nikolaos Hardavellas
  • Leonidas Kontothanassis
  • Srinivasan Parthasarathy
  • Michael Scott

Sixteenth ACM Symposium on Operating Systems Principles (SOSP)

Published by Association for Computing Machinery, Inc.

Low-latency remote-write networks, such as DEC’s Memory Channel, provide the possibility of transparent, inexpensive, large-scale shared-memory parallel computing on clusters of shared-memory multiprocessors (SMPs). The challenge is to take advantage of hardware shared memory for sharing within an SMP, and to ensure that software overhead is incurred only when actively sharing data across SMPs in the cluster. In this paper, we describe a “two-level” software coherent shared memory system, Cashmere-2L, that meets this challenge. Cashmere-2L uses hardware to share memory within a node, while exploiting the Memory Channel’s remote-write capabilities to implement “moderately lazy” release consistency with multiple concurrent writers, directories, home nodes, and page-size coherence blocks across nodes. Cashmere-2L employs a novel coherence protocol that allows a high level of asynchrony by eliminating global directory locks and the need for TLB shootdown. Remote interrupts are minimized by exploiting the remote-write capabilities of the Memory Channel network. Cashmere-2L currently runs on an 8-node, 32-processor DEC AlphaServer system. Speedups range from 8 to 31 on 32 processors for our benchmark suite, depending on the application’s characteristics. We quantify the importance of our protocol optimizations by comparing performance to that of several alternative protocols that do not share memory in hardware within an SMP and require more synchronization. In comparison to a one-level protocol that does not share memory in hardware within an SMP, Cashmere-2L improves performance by up to 46%.
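To put the reported speedup range in context, the short sketch below converts the two endpoints quoted in the abstract (8 and 31 on 32 processors) into parallel efficiency, defined here as speedup divided by processor count. The function name `parallel_efficiency` and the dictionary labels are illustrative choices, not something taken from the paper, and efficiency is not a metric the abstract itself reports.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Return speedup / processors; 1.0 corresponds to ideal linear scaling."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# Figures quoted in the abstract: speedups of 8 to 31 on 32 processors.
PROCESSORS = 32
REPORTED_SPEEDUPS = {"lowest": 8.0, "highest": 31.0}

# Hypothetical labels; only the numeric endpoints come from the abstract.
efficiencies = {
    label: parallel_efficiency(speedup, PROCESSORS)
    for label, speedup in REPORTED_SPEEDUPS.items()
}

for label, efficiency in efficiencies.items():
    print(f"{label} reported speedup -> efficiency {efficiency:.3f}")
```

Under this definition, the low end of the range corresponds to 25% efficiency and the high end to roughly 97%, which is consistent with the abstract's remark that results depend on each application's characteristics.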