Using Speculative Push to Reduce Communication Latencies in Critical Sections
Authors
Rajwar, Ravi
Kagi, Alain
Goodman, James
Type
Technical Report
Publisher
University of Wisconsin-Madison Department of Computer Sciences
Abstract
Communication latencies within critical sections constitute a major bottleneck in some classes of emerging parallel workloads. In this paper we propose a mechanism, Speculative Push, aimed at reducing this communication latency. Speculative push allows the cache controller, when responding to a request for a cache line inferred to contain a lock variable, to predict the data sets the requestor will access within the critical section. The controller then pushes the lines at these addresses from its own cache to the target cache in an exclusive state. It also writes the data back to memory. By overlapping the transfer of the protected data with the transfer of the lock, the mechanism can substantially reduce communication latencies within critical sections. By pushing data in an exclusive state, it can collapse the read-modify-write sequences within a critical section into local cache accesses. The write-back to memory gives the receiving cache the option to ignore the push. We make a case for the use of Inferentially Queued Locks (IQLs), not just for efficient synchronization but also for reducing communication latencies. With IQLs, the processor infers the existence, and the limits, of a critical section from the use of synchronization instructions and joins a queue of lock requestors. The speculative push mechanism extracts information about program structure by observing IQLs. Neither mechanism requires programmer or compiler support or any changes to the instruction set.
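The lock handoff described above can be illustrated with a toy, single-threaded model. This is a sketch under stated assumptions, not the report's design: the predictor (replaying the lines written during the most recent critical section under the same lock), the two-state coherence model, and all class and method names (`Cache`, `Controller`, `observe_critical_section`, `respond_to_lock_request`) are hypothetical. It shows three properties the abstract names: the protected data travels with the lock, it arrives in an exclusive state so a later read-modify-write can complete locally, and a write-back precedes each push so the receiver may drop it.

```python
from dataclasses import dataclass, field

# Only the two coherence states this toy model needs (assumption, not full MESI).
INVALID, EXCLUSIVE = "I", "E"


@dataclass
class Cache:
    name: str
    lines: dict = field(default_factory=dict)  # addr -> (state, value)
    accept_pushes: bool = True  # a receiver may decline pushed lines

    def state(self, addr):
        return self.lines.get(addr, (INVALID, None))[0]

    def receive_push(self, addr, value):
        # Memory was updated before the push, so declining loses no data.
        if self.accept_pushes:
            self.lines[addr] = (EXCLUSIVE, value)
        return self.accept_pushes


@dataclass
class Controller:
    cache: Cache
    memory: dict
    # Hypothetical predictor: lock address -> lines written under that lock
    # during the most recent critical section observed at this cache.
    predicted: dict = field(default_factory=dict)

    def observe_critical_section(self, lock_addr, written):
        self.predicted[lock_addr] = set(written)

    def respond_to_lock_request(self, lock_addr, requestor):
        """Hand off the lock line, then speculatively push its predicted data set."""
        _, lock_value = self.cache.lines.pop(lock_addr)
        requestor.lines[lock_addr] = (EXCLUSIVE, lock_value)
        pushed = []
        for addr in sorted(self.predicted.get(lock_addr, ())):
            if self.cache.state(addr) != EXCLUSIVE:
                continue  # only push lines this cache currently owns
            _, value = self.cache.lines.pop(addr)
            self.memory[addr] = value  # write back first so the push may be ignored
            if requestor.receive_push(addr, value):
                pushed.append(addr)
        return pushed


if __name__ == "__main__":
    memory = {}
    p0, p1 = Cache("P0"), Cache("P1")
    p0.lines = {0x100: (EXCLUSIVE, 1), 0x200: (EXCLUSIVE, 7), 0x240: (EXCLUSIVE, 9)}
    ctrl = Controller(p0, memory)
    ctrl.observe_critical_section(0x100, [0x240, 0x200])
    print(ctrl.respond_to_lock_request(0x100, p1))  # [512, 576]
```

Because each pushed line is written back before it is transferred, a receiver that declines the push (`accept_pushes=False` in this sketch) simply misses later and fetches the line from memory; no data is lost and no coherence state has to be repaired.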
Our results demonstrate that, for a set of communication-intensive benchmarks, IQLs provide speedups when synchronization is frequent. In each of the benchmarks we studied, the combination of IQLs and speculative push removed more than half of the processor's observed latency during critical sections.
Citation
TR1472