

## Task #1: Sequencing 1

a)

*Sequencing overhead* is the delay caused by sequencing elements, e.g. flip-flops. In this case there are two sources of delay: the setup time $t_{\text{setup}}$ and the clk-to-Q propagation time (i.e. the worst-case delay) $t_{\text{pcq}}$.

$$\text{seq. overhead} = t_{\text{setup}} + t_{\text{pcq}} = 115\text{ps}$$

b)

The *propagation delay* of the combinational logic must be short enough to leave room for the sequencing overhead within a single clock cycle:

$$t_{\text{pd}} \leq T_C - \text{seq. overhead} = 500\text{ps} - 115\text{ps} = 385\text{ps}$$

c)

The logic *contamination delay* is the best case delay and hence by definition  $t_{\text{cd}} \leq t_{\text{pd}}$ . However, in order to avoid a *hold violation* on the receiving flip-flop,  $t_{\text{cd}} \geq t_{\text{hold}} - t_{\text{ccq}}$ . The range of  $t_{\text{cd}}$  allowed is therefore

$$-5\text{ps} = t_{\text{hold}} - t_{\text{ccq}} \leq t_{\text{cd}} \leq t_{\text{pd}} = 385\text{ps}$$

Clearly, since $t_{\text{cd}}$ cannot be negative, it is impossible for a hold violation to occur!

d)

With a clock skew  $|t_{\text{skew}}| \leq 50\text{ps}$  between any two sequencing elements in the system, the *sequencing overhead* must take this into account

$$\text{seq. overhead} = t_{\text{setup}} + t_{\text{pcq}} + t_{\text{skew}_{\text{max}}} = 165\text{ps}$$

The logic propagation and contamination delays then become even further constrained. Clock skew must also be assumed to go in both directions and so the possibility of a hold violation must now be reconsidered

$$t_{\text{cd}} \geq t_{\text{hold}} - t_{\text{ccq}} + t_{\text{skew}} = 45\text{ps}$$

The range to which  $t_{\text{cd}}$  is constrained is thus

$$45\text{ps} = t_{\text{hold}} - t_{\text{ccq}} + t_{\text{skew}} \leq t_{\text{cd}} \leq t_{\text{pd}} \leq 335\text{ps}$$
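The constraints from a) to d) can be checked numerically. The following is a minimal Python sketch, not part of the original solution. The individual flip-flop parameters are assumptions chosen only so that they reproduce the sums used above ($t_{\text{setup}} + t_{\text{pcq}} = 115\text{ps}$ and $t_{\text{hold}} - t_{\text{ccq}} = -5\text{ps}$), and the function name `logic_delay_window` is hypothetical.

```python
# Assumed flip-flop parameters in ps (not stated above; chosen to match
# t_setup + t_pcq = 115 ps and t_hold - t_ccq = -5 ps).
T_SETUP, T_PCQ, T_HOLD, T_CCQ = 65, 50, 30, 35
T_C = 500  # clock period in ps


def logic_delay_window(t_c, t_skew=0):
    """Return (minimum t_cd, maximum t_pd) in ps for the combinational logic."""
    max_t_pd = t_c - (T_SETUP + T_PCQ + t_skew)  # setup constraint
    min_t_cd = T_HOLD - T_CCQ + t_skew           # hold constraint
    return min_t_cd, max_t_pd


print(logic_delay_window(T_C))      # no clock skew
print(logic_delay_window(T_C, 50))  # 50 ps worst-case clock skew
```

Skew enters both constraints with the same sign because it is assumed to act in whichever direction is worse for the constraint being checked.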

## Task #2: Sequencing 2

a)

The longest (critical) path through the circuit is through all three XOR gates. The shortest clock period that allows for the worst-case delay is therefore

$$T_C \geq t_{\text{setup}} + t_{\text{pcq}} + 3 \cdot t_{\text{pd}} = 430\text{ps}$$

which gives a maximum operating frequency of

$$f_{C \text{ max}} = 2.326 \text{ GHz}$$

b)

If the circuit operates at 2 GHz, i.e. $T_C = 500\text{ps}$, then the clock skew that it can tolerate can be expressed as

$$t_{\text{skew}} \leq T_C - (t_{\text{setup}} + t_{\text{pcq}} + 3 \cdot t_{\text{pd}}) = 70\text{ps}$$

which is the time remaining per clock cycle after all the signals have propagated and stabilised.

c)

The maximum clock skew that the circuit can tolerate before it runs the risk of causing a *hold violation* is the delay of the *shortest* path through the circuit offset by the *hold time* of the receiving flip-flop. The shortest path passes through one XOR gate in this case, giving an expression for the clock-skew as follows

$$t_{\text{skew}} \leq (t_{\text{cd}} + t_{\text{ccq}}) - t_{\text{hold}} = 85\text{ps}$$

d)

Let the original circuit have inputs  $a, b, c$  and  $d$  and output  $y$ . The logical function of the circuit is then  $y = (((a \oplus b) \oplus c) \oplus d)$ . Due to the *associative* property of the XOR function, the inputs may be grouped in any way. Therefore, with three XOR gates, one could arrange them as  $y = ((a \oplus b) \oplus (c \oplus d))$ , reducing the critical path by one gate.
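The equivalence of the two groupings can be confirmed by brute force over all 16 input combinations. This is a small illustrative sketch; the function names `chain` and `tree` are hypothetical labels for the two arrangements.

```python
from itertools import product


def chain(a, b, c, d):
    return ((a ^ b) ^ c) ^ d  # critical path: three XOR gates


def tree(a, b, c, d):
    return (a ^ b) ^ (c ^ d)  # critical path: two XOR gates


# Compare the two arrangements for every combination of four input bits.
equivalent = all(chain(*bits) == tree(*bits) for bits in product((0, 1), repeat=4))
print(equivalent)
```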

The maximum clock frequency without clock skew is now

$$f_{C\max} = (t_{\text{setup}} + t_{\text{pcq}} + 2 \cdot t_{\text{pd}})^{-1} = 3.030 \text{ GHz}$$

Since the shortest path has been increased by one gate, increasing the effective contamination delay of the circuit, the circuit should tolerate a larger clock skew before a hold violation occurs

$$t_{\text{skew}} \leq (2 \cdot t_{\text{cd}} + t_{\text{ccq}}) - t_{\text{hold}} = 140 \text{ ps}$$

and indeed the tolerable clock skew has been increased by 55ps!
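The figures for both arrangements can be reproduced with a short Python sketch. The parameter values are not quoted from the task text; they are assumptions back-solved from the results above ($430\text{ps}$ and $330\text{ps}$ for the clock period, $85\text{ps}$ and $140\text{ps}$ for the hold-limited skew), and all function names are hypothetical.

```python
# Values in ps, back-solved from the results above (assumptions, not given data).
SEQ_OVERHEAD = 130   # t_setup + t_pcq
CCQ_MINUS_HOLD = 30  # t_ccq - t_hold
T_PD_XOR = 100       # XOR propagation delay
T_CD_XOR = 55        # XOR contamination delay


def min_period(gates_on_longest_path):
    """Shortest clock period in ps without clock skew."""
    return SEQ_OVERHEAD + gates_on_longest_path * T_PD_XOR


def f_max_ghz(gates_on_longest_path):
    """Maximum clock frequency in GHz (1/ps corresponds to 1000 GHz)."""
    return 1000.0 / min_period(gates_on_longest_path)


def setup_skew_budget(t_c, gates_on_longest_path):
    """Skew in ps tolerated before a setup violation at clock period t_c."""
    return t_c - min_period(gates_on_longest_path)


def hold_skew_budget(gates_on_shortest_path):
    """Skew in ps tolerated before a hold violation."""
    return gates_on_shortest_path * T_CD_XOR + CCQ_MINUS_HOLD


for name, longest, shortest in (("chain", 3, 1), ("tree", 2, 2)):
    print(name, min_period(longest), round(f_max_ghz(longest), 3),
          hold_skew_budget(shortest))
print(setup_skew_budget(500, 3))
```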

## Task #3: Metastability

a)

From figures 9 to 14 in [6513714], the measured values of $T_w$ and $\tau$ at $V_{DD}=1.25\text{V}$ can be read off at the two extremes of the measured temperature range.

In order to estimate the *mean time between failures* (MTBF) with a *data rate* $F_D = 200\text{MHz}$, a *system clock* frequency $F_C = 5\text{GHz}$, and a metastability *resolution time* $S = T_C$, i.e. one clock cycle, the following equation can be used

$$\text{MTBF} = \frac{e^{S/\tau}}{T_w F_C F_D}$$

The resulting MTBF and the values used for each temperature extreme are shown in table 1.

Table 1: MTBF, $T_w$, $\tau$ at $V_{DD}=1.25\text{V}$ vs. temperature [ $^{\circ}\text{C}$ ]

| Temperature [ $^{\circ}\text{C}$ ] | $T_w$ [ps] | $\tau$ [ps] | MTBF [ $\mu\text{s}$ ] |
|------------------------------------|------------|-------------|------------------------|
| -20                                | 4          | 65          | 5.4                    |
| 100                                | 25         | 45          | 3.4                    |

It is clear that the MTBF will vary in the range $3.4 \mu\text{s} \leq \text{MTBF} \leq 5.4 \mu\text{s}$ depending on the operating temperature.
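The MTBF values in table 1 can be recomputed from the equation above. This is a minimal Python sketch using the $T_w$ and $\tau$ values from the table; the function name `mtbf` and the variable names are hypothetical.

```python
import math


def mtbf(s, tau, t_w, f_c, f_d):
    """Mean time between failures in seconds (all arguments in SI units)."""
    return math.exp(s / tau) / (t_w * f_c * f_d)


F_C, F_D = 5e9, 200e6  # system clock and data rate in Hz
S = 1 / F_C            # resolution time: one clock cycle

cold = mtbf(S, tau=65e-12, t_w=4e-12, f_c=F_C, f_d=F_D)  # -20 degC row
hot = mtbf(S, tau=45e-12, t_w=25e-12, f_c=F_C, f_d=F_D)  # 100 degC row
print(f"{cold * 1e6:.1f} us, {hot * 1e6:.1f} us")
```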

b)

It stands to reason that a system with 4000 flip-flops will be 4000 times as likely to fail due to metastability. If we then have 20 sets of flip-flops, the worst-case MTBF can be estimated at

$$\text{MTBF} = \frac{1}{20 \cdot 4000} \cdot \frac{e^{S/\tau}}{T_w F_C F_D} = \frac{3.4 \mu\text{s}}{80000} \approx 42.5 \text{ ps}$$

c)

In order to compute the *resolution time* $S$ for a certain minimum MTBF, the previous equation can be rewritten as follows

$$S \geq \tau \cdot \ln(\text{MTBF} \cdot T_w F_C F_D)$$

For a failure rate of at most once per 10 years, assuming constant worst-case conditions, as well as $F_C = 100\text{MHz}$ and $F_D = 10\text{MHz}$, this yields the results in table 2. At $100^{\circ}\text{C}$, for example,

$$S \geq 1.34\text{ns}$$

Table 2: Resolution time $S$, $T_w$ and $\tau$ at $V_{DD}=1.25\text{V}$ vs. temperature [°C]

| Temperature [°C] | $T_w$ [ps] | $\tau$ [ps] | $S$ [ns] |
|------------------|------------|-------------|----------|
| -20              | 4          | 65          | 1.81     |
| 100              | 25         | 45          | 1.34     |

Both of these values are a small fraction of a single clock cycle $T_C = 10\text{ns}$ (at most $1.81\text{ns}/T_C = 0.181$), and therefore one clock cycle will more than suffice (MTBF grows exponentially with $S$!).
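The rearranged equation can be evaluated directly. The sketch below is a minimal Python illustration using the $T_w$ and $\tau$ values from the table; it assumes a 10-year span of 365-day years, and the function name `resolution_time` is hypothetical. The results should land close to the tabulated values.

```python
import math


def resolution_time(mtbf, tau, t_w, f_c, f_d):
    """Minimum resolution time S in seconds for a target MTBF (SI units)."""
    return tau * math.log(mtbf * t_w * f_c * f_d)


TEN_YEARS = 10 * 365 * 24 * 3600  # seconds, ignoring leap days (assumption)
F_C, F_D = 100e6, 10e6            # system clock and data rate in Hz
T_C = 1 / F_C                     # clock period: 10 ns

s_cold = resolution_time(TEN_YEARS, tau=65e-12, t_w=4e-12, f_c=F_C, f_d=F_D)
s_hot = resolution_time(TEN_YEARS, tau=45e-12, t_w=25e-12, f_c=F_C, f_d=F_D)
print(f"{s_cold * 1e9:.2f} ns, {s_hot * 1e9:.2f} ns")
print(max(s_cold, s_hot) < T_C)   # does one clock cycle suffice?
```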

d)

Over the temperature range, $\tau$ rises exponentially with falling supply voltage, while $T_w$ tends to be more stable, with a positive linear relationship to both temperature and supply voltage. When estimating the required resolution time $S$ for an MTBF of 10 years when $V_{DD}$ can reach as low as 1V, a clear maximum can be observed in the data: at $-20^\circ\text{C}$, $\tau \approx 600\text{ps}$ while $T_w \approx 3\text{ps}$.

Using the previous equation

$$S \geq 13.93\text{ns}$$

an order of magnitude greater!

Since each clock cycle lasts $T_C = 10\text{ns}$, the resolution time will require two clock cycles.

## Task #4

a)

The spans for the various black and grey PG cells are shown below.

| # | Black PG cell | Grey PG cell |
|---|---------------|--------------|
| 1 | 19:18         | 1:0          |
| 2 | 19:16         | 3:0          |
| 3 | 19:12         | 7:0          |
| 4 | 19:4          | 15:0         |
| 5 | 15:14         | 23:0         |
| 6 | 15:12         | 27:0         |
| 7 | 15:8          | -            |
| 8 | 23:8          | -            |
| 9 | 27:12         | -            |

Figure 1: *Spans of various PG cells in PG-tree diagram.*

b)

The difference between the black and grey PG cells is that the black cells compute both the *group propagate* $P_{i:j}$ and the *group generate* $G_{i:j}$, whereas the grey cells only compute the *group generate*. Furthermore, the grey cells are placed above black cells in any given bit position, since only the *group generate* is needed when calculating the sum.
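The two cell types can be sketched in Python using the standard group generate and propagate recurrences, $G_{i:j} = G_{i:k} + P_{i:k} \cdot G_{k-1:j}$ and $P_{i:j} = P_{i:k} \cdot P_{k-1:j}$. This is an illustrative model, not a description of the circuit in the task; the function names and the `(G, P)` tuple layout are assumptions.

```python
def black_cell(upper, lower):
    """Combine (G, P) of span i:k with (G, P) of span k-1:j into (G, P) of i:j."""
    g_upper, p_upper = upper
    g_lower, p_lower = lower
    return (g_upper | (p_upper & g_lower), p_upper & p_lower)


def grey_cell(upper, g_lower):
    """Like black_cell, but only the group generate of span i:j is produced."""
    g_upper, p_upper = upper
    return g_upper | (p_upper & g_lower)


# The upper span propagates and the lower span generates, so the group generates.
print(black_cell((0, 1), (1, 0)))
print(grey_cell((0, 1), 1))
```

A grey cell needs only the group generate of the lower span as input, which is why it can replace a black cell wherever the group propagate is no longer used.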

c)

A sum function for a bit $i$ in the PG-tree is (using Weste & Harris' nomenclature) the *propagate* for $i$ exclusive-ORed with the previous bit's *group generate*

$$S_i = P_i \oplus G_{i-1:0}$$
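The sum equation can be exercised with a bit-serial Python model. This sketch computes $G_{i:0}$ with a simple ripple recurrence rather than a tree, so it illustrates only the sum logic and not the PG-tree in the task. The 32-bit width, the carry-in of 0 and the function name `add_with_pg` are assumptions.

```python
def add_with_pg(a, b, width=32):
    """Add two unsigned integers bit by bit using S_i = P_i xor G_{i-1:0}."""
    group_generate = 0  # G_{-1:0}: assumed carry-in of 0
    result = 0
    for i in range(width):
        a_i, b_i = (a >> i) & 1, (b >> i) & 1
        p_i, g_i = a_i ^ b_i, a_i & b_i                # bitwise propagate/generate
        result |= (p_i ^ group_generate) << i          # S_i
        group_generate = g_i | (p_i & group_generate)  # G_{i:0}
    return result


print(add_with_pg(0x1234, 0x0FF1) == 0x1234 + 0x0FF1)
```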

d)

Since the *group generate* of the preceding bit $i-1$ is needed to compute the sum, it is clear that very few sums can be formed in this implementation. Only sum bits 1, 2, 4, 8, 12, 16, 20, 24 and 28 can be formed.

e)



Figure 2: PG-tree with grey PG-cells added in order to make every sum bit computable.

f)

The longest (critical) path through the PG-tree is through three grey cells and four black ones. Bits 30, 26, and 22 have this propagation delay of seven PG cells.