Database of Case Studies Achieved Using CADP

Validation of the xSTream Data-Flow Architecture

Organisation: STMicroelectronics
VASY project-team, Inria Grenoble - Rhône-Alpes and LIG

Method: LOTOS
Discrete Time Markov Chains

Tools used: CADP (Construction and Analysis of Distributed Processes)
IPC tools

Domain: Hardware Design.

Period: 2007-2010

Size: up to 17 million states and up to 15 billion transitions.

Description: xSTream is a multiprocessor dataflow architecture developed by STMicroelectronics for high performance embedded multimedia streaming applications. In this architecture, computation nodes (e.g., filters) communicate by xSTream queues connected by a NoC (Network on Chip). An xSTream queue generalizes a bounded FIFO queue in two ways: it provides additional primitives (such as peek to consult items in the middle of the queue, which is not possible with the standard push/pop primitives of FIFO queues), and a backlog (extra memory) to allow the increase of the queue size when the queue overflows. The validation of the xSTream architecture was addressed for qualitative and quantitative aspects.

For qualitative aspects, several variants of xSTream queues were modeled in LOTOS (in total 6,800 lines of code), differing in whether each xSTream primitive (push, pop, peek, etc.) is modeled by one or two (request/response) LOTOS events. For several values of n and p, equivalence checking using the BISIMULATOR tool of CADP established that an xSTream queue of size n with a backlog of size p is branching equivalent to an xSTream queue of size n+p.

We then investigated the interaction of several xSTream queues sharing the NoC. Using BISIMULATOR, it was shown that the state space (1.5 million states and 8 million transitions) for a pair (Q1, Q2) of xSTream queues sharing the NoC with another pair (Q3, Q4) of xSTream queues was not branching equivalent, when hiding the actions of (Q3, Q4), to the state space obtained for (Q1, Q2) alone. This illustration of a possible starvation caused by NoC access conflicts revealed that the xSTream credit protocol was not optional (as assumed so far) but mandatory.

After incorporation of the xSTream credit protocol into the LOTOS model, compositional state space generation succeeded (reducing the size of the largest intermediate state space from 17 million states and 108 million transitions down to 1.5 million states and 8 million transitions). Also, branching equivalence could be shown, but only when the credit value parameter is equal to 1; for values greater than 1, the issue has been reported to the xSTream architects, together with a convincing counterexample (a valid xSTream application running into a deadlock when executed on an xSTream simulator generated from the LOTOS model using the EXEC/CÆSAR environment of CADP).

As a next step, we studied the xSTream NoC, composed of routers connected by direct communication links (xSTream queues). First experiments showed that representing xSTream primitives in the routers by two events (request/response), as used for modeling xSTream queues, would lead to state space explosion. Therefore, each xSTream primitive was represented by a single LOTOS event in a router. To reuse the LOTOS models already developed for xSTream queues (in which each xSTream primitive was represented using two LOTOS events), additional LOTOS processes, called network interfaces, were introduced. These network interfaces convert one representation into the other.

Two successive models of an xSTream router have been developed. The first model (150 lines of LOTOS code) uses a single LOTOS process with eight parameters representing the status of each of the eight ports of the router. State space generation for this model was slow due to the number of conditions evaluating to false when computing the successor states. Therefore, we produced a second model (120 lines of LOTOS code), in which each port is represented by a separate process, which reduced the generation time by a factor 50, still yielding the same state space.

For a NoC composed of four routers, each of which is directly connected to all the other routers, the corresponding state space (about 5,600 states and 20,000 transitions) could be generated compositionally, the largest intermediate graph having one million states and 5 million transitions. By using the BISIMULATOR tool, it was shown that this model (i) is not branching bisimilar to, but only contains (modulo the branching preorder) the parallel composition of two xSTream queues (after abstracting away the network interfaces and the routers); (ii) is branching bisimilar to the composition of two xSTream queues connected by two network interfaces (after abstracting away only the routers). The correctness of the additional behavior has been verified by checking temporal logic formulas expressing that using the EVALUATOR tool.

The case of a NoC composed of six routers is more intricate, because the routers are no longer fully connected; this implies that a packet might traverse more than two routers, which in turn increases the number of messages that might be received by a router. Consequently, the size of the state space for a single router increased: a router in a NoC with four routers has less than 400 states and 140,000 transitions, whereas a router in a NoC with six routers has up to 6.7 million states and 49 million transitions, which was too large for further composition with the other routers, network interfaces, and xSTream queues. To abstract away the differences between all "disturbing" messages (i.e., messages that do not directly concern the considered pair of xSTream queues), destination and routing information in all these messages was ignored. The model a router was also simplified by removing those ports that are only used by the abstracted messages. Alltogether, the state space corresponding to this simplified model of a router was reduced down to 240 states and 1,000 transitions. An additional benefit was that the same router could be used for all routers of the NoC (due to an asymmetry in the architecture, in the first model all routers had different state spaces). By using the BISIMULATOR tool, the parallel composition of two xSTream queues connected by network interfaces and a NoC with six abstract routers was found branching bisimilar to the parallel composition of two xSTream queues connected by two network interfaces. However, an equivalence relation between the two models of a router could not be established, because the simplified model of a router is no longer deterministic (abstracting away the routing information from disturbing messages introduces nondeterminism, since some disturbing messages can be dropped while others have to be forwarded to the next router). For a refined abstraction, distinguishing two kinds of disturbing packets, those to be dropped and those to be forwarded, both models were found to be related by a mutual inclusion (i.e., two simulations, but not a bisimulation).

For quantitative aspects, the goal was to predict latency and throughputs in the communication architecture, as well as occupancy within xSTream queues (with and without backlogs). A key challenge is to combine probabilistic/stochastic information (e.g., the rates at which xSTream applications push and pop elements in and out of the queues) with precise timing information (e.g., memory access time). This led to a new approach, called IPC (Interactive Probabilistic Chains), inspired by Interactive Markov Chains, but using probabilities instead of stochastic distributions and a central clock governing all delays. A structured operational semantics for standard process algebra operators (sequential and parallel composition, nondeterministic choice, etc.) has been defined for IPC; it has also been proven that probabilistic branching bisimulation is a congruence for the parallel composition of IPC. Taking advantage of the open architecture of CADP, a prototype tool chain (6,400 lines of C code and 400 lines of Perl script) has been developed and experimented on several examples.

In particular, using the IPC approach to compute the average throughput of the put operation for different sizes of the push and pop queues, it could be shown that the relative size of the push and pop queues forming an xSTream queue might influence the average throughput of the push operation, although all xSTream queues with the same overall size (sum of push and pop queues) are functionally equivalent (i.e., branching bisimilar). The best performance has been observed when the sizes of both queues are as similar as possible. It has also been observed that larger push queues better support bursts of messages.

Considering a system of two xSTream queues (i.e., two pairs of push and pop queues) sharing the NoC, the effect of the flow-control mechanism on the througput of the first xSTream queue has been studied for various consumption rates of the second xSTream queue has been studied. In the case that the messages of the second xSTream queue are rapidly consumed, i.e., removed from the pop queue of the second xSTream queue, no impact on the throughput of the first xSTream queue has been observed. In the case that the messages of the second xSTream queue are consumed slowly, it has been observed that the flow-control mechanism impacts the throughput of the first xSTream queue. Indeed, without the flow-control mechanism, the first xSTream queue is slowed down proportionally to the second xSTream queue, whereas with the flow-control mechanism, the performance of the first xSTream queue even increases and tends towards the values observed for a single xSTream queue.

Conclusions: Formally modeling a hardware architecture points out ambiguities in the design, and provides concrete counterexamples in case of errors. The models developed for qualitative analysis can be extended (and thus reused) for quantitative analysis, in particular performance evaluation and prediction. Finally, the open architecture of the CADP toolbox eases the development of new tools.

Publications: [Coste-Hermanns-Lantreibecq-Serwe-09] Nicolas Coste, Holger Hermanns, Etienne Lantreibecq, and Wendelin Serwe. Towards Performance Prediction of Compositional Models in Industrial GALS Designs. In Proceedings of the 21th International Conference on Computer Aided Verification CAV'2009 (Grenoble, France), June--July, 2009.
Available on-line at: http://cadp.inria.fr/publications/Coste-Hermanns-Lantreibecq-Serwe-09.html

[Coste-10] Nicolas Coste. Towards Performance Prediction of Compositional Models in GALS Designs. PhD Thesis in Computer Science, University of Grenoble (France), June 2010.
Available on-line at: http://cadp.inria.fr/publications/Coste-10.html

Contact:
Wendelin Serwe
INRIA Grenoble Rhône-Alpes / CONVECS
655, avenue de l'Europe
38330 Montbonnot Saint-Martin
FRANCE
Tel: +33 (0)4 76 61 53 52
Fax: +33 (0)4 76 61 52 52
Email: [email protected]



Further remarks: This case-study, amongst others, is described on the CADP Web site: http://cadp.inria.fr/case-studies


Last modified: Fri Feb 16 15:34:26 2024.


Back to the CADP case studies page