Since the iCPP algorithm is to preserve concurrency, how to measure concurrency is discussed and a concurrency metric is developed. In previous works, concurrency was defined as the number of active gates at a given simulation cycle for parallel discrete event logic simulation under the assumption that there is an infinite number of processors. In the same paper, the average parallelism is defined as the average concurrency during the entire simulation. A different definition of concurrency was used by Smith et. al. so to take into account the number of processors. However, both definitions can be used only for a synchronous simulation that performs at lock steps. In an asynchronous parallel simulation as in a Time Warp simulation, it is difficult to measure the concurrency because the ``simulation cycle'' is not clearly defined.
Most asynchronous parallel simulation studies use the performance of real parallel simulations to evaluate partitioning algorithms. Recently, a few works based on the critical path analysis have been performed to estimate the execution time of parallel simulations. However, their approaches require the execution of either a sequential simulation or a parallel simulation. Since the performance of parallel simulation is affected not only by partitioning but also by many other factors such as synchronization algorithms and target architectures, it is desirable to have a concurrency metric for partitioning that is independent of those factors and requires neither parallel simulation nor sequential simulation. Such a metric for concurrency has never been defined previously.
Given a partitioned directed graph, we want to define a concurrency metric that can be evaluated without performing a parallel simulation or a sequential simulation. Since concurrency is related to the number of active gates at a simulation cycle, nodes in a partitioned directed graph need to be classified into linearly ordered levels so that each level can be evaluated at about the same time on all processors. This classification process, called levelization, is based on the distances of individual graph nodes from primary input gates or flip-flops and the processor assignment. Once the levelization procedure has been applied to a directed graph, the concurrency can then be defined.
The inherent parallelism of a circuit depends on its structure, such
as the number of primary input gates, the number of flip-flops, and the
connectivity of circuit gates. In general, a loosely connected circuit
has high concurrency while a tightly connected circuit has relatively low
concurrency. Figure summarizes the concurrency of several circuits
after the application of those partitioning algorithms. Figure displays
those results graphically for the circuit s38417 with three partitioning
algorithms that produce a high degree of concurrency. It shows that the
iCPP algorithm produces better concurrency than MeTis does while the string
algorithm is the best in terms of concurrency.
In Figure A, partitioning results of the s38584.1 circuit are summarized in terms of the ratio of edge cut. As can be seen from the diagram, curves for different partitioning algorithms exhibit the same trend as the number of processors increases. Another observation is that the iCPP and the MeTis algorithms consistently produce lower ratio of edge cuts than those other four algorithms. As a matter of fact, when bi-partitioning (two-processor partitioning) is performed, the ratio of external events for the iCPP algorithm is 1.5%. This compares favorably with several other partitioning algorithms such as Flip-flop-Clustering (10%), Min-Cut (15%), and Levelizing (20%). These results were reported in Sporrer et al. Even though the Corolla-clustering algorithm with its 0.6% ratio of external events for bi-partitioning definitely outperforms the iCPP algorithm in this regard, it is relatively slow and does not take concurrency into consideration. Figure B also shows that the ratio of edge cuts with the iCPP algorithm is only 12.5% even when 64 processors are used. For more detail in concurrency metric and partitioning results, please read a paper.
Figure A: Edge Cut and communication Ratio
Figure B: Edge Cut Ratios with several partitioning algorithms