Executing coarse-grain parallel programs on PC clusters promises supercomputer performance on low-cost hardware. However, the overhead associated with data distribution, synchronization, and peripheral access can easily eliminate any performance gain promised by the raw capacity of the cluster nodes. Application-specific system performance analysis is required both to engineer PC cluster hardware and to evaluate the cost-effectiveness of parallelizing software components. This paper presents a distributed system performance model ...