Chip-multiprocessor (CMP) architectures present a challenge for efficient simulation, combining the requirements of a detailed microprocessor simulator with those of a tightly coupled parallel system. This paper presents a distributed simulator for target CMPs, based on the Message Passing Interface (MPI) and designed to run on a host cluster of workstations. Microbenchmark-based evaluation is used to narrow the parallelization design space with respect to the performance impact of distributed vs. centralized target ...
This paper presents an analytical performance characterization and a latency-based topology comparison for the Scalable Coherent Interface (SCI). Experimental methods are used to determine the constituent latency components and to verify that the results of these analytical models closely approximate reality. Unlike simulation models, analytical SCI models are faster to solve, yielding accurate performance estimates very quickly and thereby broadening the design space that can ...
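As a rough illustration of how measured latency components can be combined into an analytical model, the Python sketch below computes the unloaded latency on a single unidirectional ring. It is not the paper's model: the component names (`send_overhead`, `link_delay`, `bypass_delay`, `receive_overhead`), the assumption that destinations are chosen uniformly, and the example values are all hypothetical, and contention and any response traffic are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RingLatencyParams:
    """Hypothetical per-component latencies in ns (illustrative, not measured)."""
    send_overhead: float     # time to inject a packet at the source node
    link_delay: float        # propagation delay across one ring link
    bypass_delay: float      # delay through an intermediate node's bypass path
    receive_overhead: float  # time to absorb the packet at the destination


def point_to_point_latency(p: RingLatencyParams, hops: int) -> float:
    """Unloaded latency for a packet that travels `hops` links on a unidirectional ring."""
    if hops < 1:
        raise ValueError("hops must be >= 1")
    # The packet crosses `hops` links and passes through `hops - 1` intermediate nodes.
    return (p.send_overhead + hops * p.link_delay
            + (hops - 1) * p.bypass_delay + p.receive_overhead)


def mean_ring_latency(p: RingLatencyParams, nodes: int) -> float:
    """Average unloaded latency, with each destination equally likely among the other nodes."""
    if nodes < 2:
        raise ValueError("a ring needs at least two nodes")
    total = sum(point_to_point_latency(p, h) for h in range(1, nodes))
    return total / (nodes - 1)


if __name__ == "__main__":
    params = RingLatencyParams(send_overhead=100.0, link_delay=5.0,
                               bypass_delay=20.0, receive_overhead=100.0)
    print(mean_ring_latency(params, 4))  # 230.0
```

Because the mean hop count under uniform traffic on an N-node unidirectional ring is N/2, this simple model grows linearly with ring size, which is the kind of trend an analytical model exposes without running a simulation.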
The distributed nature of routing and flow control in a register-insertion ring topology complicates priority enforcement for real-time systems. Two divergent approaches to priority enforcement in ring-based networks are reviewed: a node-oriented scheme called Preemptive Priority Queue and a ring-wide arbitration approach dubbed TRAIN. This paper introduces a hybrid protocol named Directed Flow Control that combines node- and ring-oriented flow control to achieve higher performance. A functional comparison of the ...
High-performance networks require sophisticated management systems to identify the sources of bottlenecks and detect faults. At the same time, the impact of network queries on the latency and bandwidth available to applications must be minimized. Adaptive techniques can control and reduce the rate at which network information is sampled, reducing the amount of data to be processed and lessening the overhead on the network. Two adaptive sampling methods are proposed ...
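The general idea of adaptive sampling can be sketched as follows. This is a generic multiplicative back-off scheme, not either of the two methods the paper proposes; the class name `AdaptiveSampler`, the interval bounds, and the relative-change threshold are hypothetical. The sampler lengthens its polling interval while a monitored metric stays stable and shortens it when the metric changes sharply, so query overhead drops during quiet periods.

```python
from typing import Optional


class AdaptiveSampler:
    """Hypothetical adaptive sampler: doubles or halves the polling interval."""

    def __init__(self, min_interval: float, max_interval: float, threshold: float) -> None:
        if not 0 < min_interval <= max_interval:
            raise ValueError("require 0 < min_interval <= max_interval")
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.threshold = threshold  # relative change treated as "significant"
        self.interval = min_interval
        self._last: Optional[float] = None

    def observe(self, value: float) -> float:
        """Record one sample and return the interval to wait before the next sample."""
        if self._last is not None:
            # Relative change; the floor avoids division by zero on an idle link.
            scale = max(abs(self._last), 1e-9)
            change = abs(value - self._last) / scale
            if change > self.threshold:
                # Metric is moving: sample more often to track it.
                self.interval = max(self.min_interval, self.interval / 2)
            else:
                # Metric is stable: back off to reduce network query overhead.
                self.interval = min(self.max_interval, self.interval * 2)
        self._last = value
        return self.interval


if __name__ == "__main__":
    sampler = AdaptiveSampler(min_interval=1.0, max_interval=8.0, threshold=0.1)
    for reading in [100.0, 101.0, 102.0, 103.0, 104.0, 200.0]:
        print(reading, sampler.observe(reading))
```

Multiplicative adjustment is one simple choice; a real monitor might instead use additive steps or per-metric thresholds, depending on how quickly faults must be detected.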