# The Probability Behind Safety

## Introduction

Safety system design and safety requirement assignment follow specific rules, and behind those rules is probability. To fully understand the rules, it is important to understand the mathematics behind them.

In this blog post, I would like to discuss the safety requirement inheritance and decomposition from the perspective of probability.

## Safety Requirement Inheritance

The ASIL (Automotive Safety Integrity Level) is not calculated for a physical system component; it is calculated for a function. The ASIL associated with a function is then inherited by the software and hardware elements that realize the function.

Suppose I have a function $f$ which is computed using a sequence of smaller functions $f_1$, $f_2$, $f_3$, $\cdots$, $f_n$, i.e.,

$$f(x) = f_n(f_{n-1}(\cdots(f_2(f_1(x)))))$$

If the ASIL requirement for $f$ is $x$, the ASIL requirement for $f_1$, $f_2$, $f_3$, $\cdots$, $f_n$, each cannot be lower than $x$. This is called ASIL requirement inheritance.

For example, if the ASIL requirement for $f$ is ASIL-D, the ASIL requirement for $f_1$, $f_2$, $f_3$, $\cdots$, $f_n$ should all be ASIL-D.

What if the ASIL requirement for any of $f_1$, $f_2$, $f_3$, $\cdots$, $f_n$ is lower than the ASIL requirement for $f$? Basic probability tells us that the ASIL requirement for $f$ will not be satisfied.

For example, suppose the failure rates for ASIL-D and ASIL-B are $10^{-12}$ and $10^{-6}$, respectively. Suppose only one of the $n$ small functions $f_1$, $f_2$, $f_3$, $\cdots$, $f_n$ is ASIL-B and the rest are ASIL-D. What’s the failure rate of $f$?

Since $f$ fails whenever any one of the chained functions fails, and assuming the functions fail independently, the failure rate of $f$ will be

$$P(f)= 1 - (1 - 10^{-12})^{n-1} (1 - 10^{-6})$$

Without using any mathematical approximations, we can simply evaluate the failure rate numerically for different values of $n$.

| $n$ | $P(f)$ |
| --- | --- |
| $1$ | $0.000001$ |
| $2$ | $0.000001$ |
| $10$ | $0.000001$ |
| $100$ | $0.00000100009$ |
| $1000$ | $0.00000100099$ |
| $10000$ | $0.00000100999$ |

We can see that no matter how many ASIL-D functions are in the chain, the failure rate of $f$ never drops below the ASIL-B failure rate, which is far from the failure rate for ASIL-D. In layman’s terms, “a chain is only as strong as its weakest link”.
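The numbers above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the illustrative failure probabilities used in this section ($10^{-12}$ and $10^{-6}$) and independent failures; the function name `chain_failure_probability` is made up for this example. It uses `math.log1p` and `math.expm1` so that the tiny probabilities are not lost to floating-point rounding.

```python
import math

# Illustrative per-function failure probabilities from the example above.
P_ASIL_D = 1e-12
P_ASIL_B = 1e-6


def chain_failure_probability(n: int,
                              p_strong: float = P_ASIL_D,
                              p_weak: float = P_ASIL_B) -> float:
    """Failure probability of n chained functions: n - 1 strong links, one weak link."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Log of the probability that every link works; log1p avoids rounding 1 - p.
    log_success = (n - 1) * math.log1p(-p_strong) + math.log1p(-p_weak)
    # 1 - exp(log_success), computed without catastrophic cancellation.
    return -math.expm1(log_success)


for n in (1, 2, 10, 100, 1000, 10000):
    print(f"n = {n:>5}: P(f) = {chain_failure_probability(n):.6e}")
```

Whatever the value of $n$, the result stays at roughly $10^{-6}$: the single ASIL-B link dominates.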

## Safety Requirement Decomposition

In practice, achieving ASIL-D, or any extremely small failure rate, is difficult in most scenarios unless we decompose the safety requirement. Concretely, subject to constraints, the inherited ASIL may be lowered by decomposing a requirement into redundant requirements.

Suppose I have a function $f$ which can be computed using any one of the functions $f_1$, $f_2$, $f_3$, $\cdots$, $f_n$, and we further assume that the functions $f_1$, $f_2$, $f_3$, $\cdots$, $f_n$ are independent of each other. Then, if the ASIL requirement for $f$ is $x$, the ASIL requirement for each of $f_1$, $f_2$, $f_3$, $\cdots$, $f_n$ can be lower than $x$.

For example, if the ASIL requirement for $f$ is ASIL-D, the ASIL requirement for $f_1$, $f_2$, $f_3$, $\cdots$, $f_n$ can all be ASIL-B or even lower depending on the value of $n$.

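Suppose the failure rates for ASIL-D and ASIL-B are $10^{-12}$ and $10^{-6}$, respectively, and every one of $f_1$, $f_2$, $f_3$, $\cdots$, $f_n$ is ASIL-B. What’s the failure rate of $f$?

Because $f$ fails only when all $n$ independent functions fail, the failure rate of $f$ will be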

$$P(f) = 10^{-6n}$$

For different values of $n$,

| $n$ | $P(f)$ |
| --- | --- |
| $1$ | $10^{-6}$ |
| $2$ | $10^{-12}$ |
| $10$ | $10^{-60}$ |
| $100$ | $10^{-600}$ |
| $1000$ | $10^{-6000}$ |
| $10000$ | $10^{-60000}$ |

We can see that the larger $n$ is, the lower the failure rate of $f$ will be. Consequently, the more redundancy there is, the lower the ASIL each redundant element can have.

All the mathematics above assumes that the functions are independent. If that assumption does not hold, the results above are no longer valid.
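The same relationship can be turned around to ask how many redundant channels are needed to reach a target. The sketch below assumes the illustrative failure probabilities from this section and fully independent channels; the helper names are hypothetical. It uses `fractions.Fraction` so that powers such as $10^{-60}$ are compared exactly rather than in floating point.

```python
from fractions import Fraction


def redundant_failure_probability(p_channel: Fraction, n: int) -> Fraction:
    """Failure probability of f when it fails only if all n independent channels fail."""
    return p_channel ** n


def min_redundant_channels(p_channel: Fraction, p_target: Fraction) -> int:
    """Smallest n such that p_channel ** n <= p_target, assuming independent channels."""
    if not (0 < p_channel < 1 and 0 < p_target < 1):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    n = 1
    while redundant_failure_probability(p_channel, n) > p_target:
        n += 1
    return n


P_ASIL_D = Fraction(1, 10**12)
P_ASIL_B = Fraction(1, 10**6)

print(min_redundant_channels(P_ASIL_B, P_ASIL_D))            # 2
print(min_redundant_channels(Fraction(1, 10**3), P_ASIL_D))  # 4
```

Under these assumptions, two ASIL-B-grade channels are enough for the ASIL-D target, while weaker channels with a failure probability of $10^{-3}$ would need four copies.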

For example, suppose $f$ can be decomposed into two redundant functions $f_1$ and $f_2$, where $f_1(x) = g_1(g_2(x))$ and $f_2(x) = g_1(g_3(x))$. If the safety requirement for $f$ is ASIL-D, what’s the minimum ASIL requirement for $f_1$, $f_2$, $g_1$, $g_2$, and $g_3$?

Without looking at how $f_1$ and $f_2$ are implemented, we might assume that $f_1$ and $f_2$ are independent of each other and that the safety requirement for each of them can therefore be ASIL-B. However, $f_1$ is implemented as a sequence of $g_1$ and $g_2$, and $f_2$ is implemented as a sequence of $g_1$ and $g_3$.

Assuming $g_1$, $g_2$, and $g_3$ fail independently, the failure rate of $f_1$ is

\begin{align}
P(f_1) &= 1 - (1 - P(g_1)) (1 - P(g_2)) \\
&= P(g_1) + P(g_2) - P(g_1)P(g_2)
\end{align}

and the failure rate of $f_2$ is

\begin{align}
P(f_2) &= 1 - (1 - P(g_1)) (1 - P(g_3)) \\
&= P(g_1) + P(g_3) - P(g_1)P(g_3)
\end{align}

What’s more interesting is the failure rate of $f$. Because $f$ fails only when both $f_1$ and $f_2$ fail, we can condition on whether the shared $g_1$ failed:

\begin{align}
P(f) &= \underbrace{P(g_1) \times 1}_{g_1 \text{ failed}} + \underbrace{(1 - P(g_1)) P(g_2)P(g_3)}_{g_1 \text{ did not fail}} \\
&= P(g_1) + P(g_2)P(g_3) - P(g_1)P(g_2)P(g_3)
\end{align}

Mathematically, if $f_1$ and $f_2$ were independent, the failure rate of $f$ would equal the product of the failure rates of $f_1$ and $f_2$, i.e.,

\begin{align} P(f) = P(f_1) P(f_2) \end{align}

However, comparing this with the expressions for $P(f)$, $P(f_1)$, and $P(f_2)$ derived earlier, the equation does not hold in general. This means $f_1$ and $f_2$ are not independent of each other, because $g_1$ is used in both $f_1$ and $f_2$.
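The gap between the two quantities can be made explicit. Writing $a = P(g_1)$, $b = P(g_2)$, and $c = P(g_3)$ (shorthand introduced only for this derivation) and expanding the expressions for $P(f)$, $P(f_1)$, and $P(f_2)$:

\begin{align}
P(f) - P(f_1) P(f_2) &= a + (1 - a) b c - \left(a + (1 - a) b\right)\left(a + (1 - a) c\right) \\
&= a (1 - a) - a (1 - a)(b + c) + a (1 - a) b c \\
&= a (1 - a)(1 - b)(1 - c)
\end{align}

The difference is never negative, and it vanishes only when $a$ is $0$ or $1$, or when $b$ or $c$ equals $1$. So whenever the shared $g_1$ can fail, the product $P(f_1) P(f_2)$ underestimates the failure rate of $f$.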

But given the existing design, how could we assign safety requirements to $g_1$, $g_2$, and $g_3$ so that $P(f)$ achieves ASIL-D (approximately)? If $g_1$ is ASIL-D, and $g_2$ and $g_3$ are ASIL-B, then $P(f) \approx 10^{-12} + 10^{-6} \times 10^{-6} = 2 \times 10^{-12}$, which is on the order of the ASIL-D failure rate.

Therefore, generally speaking, for any shared unit used in the branches of the safety decomposition, it will have to inherit the original safety requirement. In other words, if the architectural elements are not sufficiently independent, then the redundant requirements and the architectural elements inherit the initial ASIL.

In fact, because usually $P(g_1) \ll 1$, the third-order term is negligible and

\begin{align}
P(f) &= P(g_1) + P(g_2)P(g_3) - P(g_1)P(g_2)P(g_3) \\
&\approx P(g_1) + P(g_2)P(g_3)
\end{align}

This already indicates that the shared unit ($g_1$) has to inherit the safety requirement of the parent ($f$) and the independent units ($g_2$ and $g_3$) can have lower safety requirements than the parent.
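To see how the assignment for the shared unit drives the result, the exact expression for $P(f)$ can be evaluated for a few candidate assignments. This is only a sketch under the section's illustrative failure probabilities and the assumption that $g_1$, $g_2$, and $g_3$ fail independently; `shared_stage_failure` is a name made up for this example.

```python
from fractions import Fraction


def shared_stage_failure(p_g1: Fraction, p_g2: Fraction, p_g3: Fraction) -> Fraction:
    """Exact P(f) when g1 is shared: f fails if g1 fails, or if g2 and g3 both fail."""
    return p_g1 + (1 - p_g1) * p_g2 * p_g3


P_ASIL_D = Fraction(1, 10**12)
P_ASIL_B = Fraction(1, 10**6)

# Candidate failure-probability assignments for (g1, g2, g3).
assignments = {
    "g1 = B, g2 = g3 = B": (P_ASIL_B, P_ASIL_B, P_ASIL_B),
    "g1 = D, g2 = g3 = B": (P_ASIL_D, P_ASIL_B, P_ASIL_B),
    "g1 = D, g2 = g3 = D": (P_ASIL_D, P_ASIL_D, P_ASIL_D),
}

for label, (p1, p2, p3) in assignments.items():
    p_f = shared_stage_failure(p1, p2, p3)
    print(f"{label}: P(f) = {float(p_f):.3e}")
```

With an ASIL-B shared unit, $P(f)$ stays near $10^{-6}$ regardless of the redundancy. Only when $g_1$ itself is ASIL-D does $P(f)$ come down to the order of $10^{-12}$.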

## Homogeneous Redundancy vs. Heterogeneous Redundancy

Homogeneous redundancy means the two redundant elements used in the decomposition are the same (not shared but a copy of each other). A quick example of homogeneous redundancy is that a two-engine airplane is usually equipped with two engines of the same type manufactured by the same company. If one of the two engines is lost, the airplane can still be functional.

Heterogeneous redundancy, on the contrary, means the two redundant elements used in the decomposition are different. A quick example of heterogeneous redundancy is a laptop that can be powered by either AC or battery. If either AC or battery is lost, the laptop can still be functional.

Heterogeneous redundancy usually means the two different redundant elements, $f_1$ and $f_2$, in the decomposition are independent, i.e., $P(f) = P(f_1) P(f_2)$, so they can have lower safety requirements than the parent $f$.

Homogeneous redundancy does not necessarily mean the two identical redundant elements, $f_1$ and $f_2$, in the decomposition are independent. The designer would have to sufficiently prove that the two identical redundant elements are independent by showing $P(f) = P(f_1) P(f_2)$. For example, if the two identical elements share a significant design flaw that causes both of them to fail under the same condition in many use cases, then $P(f) \neq P(f_1) P(f_2)$. But this does not mean homogeneous redundancy cannot be used. Mathematically,

$$P(f_1) = P(X = x) P(f_1 | X = x) + (1 - P(X = x)) P(f_1 | X \neq x)$$

$$P(f_2) = P(X = x) P(f_2 | X = x) + (1 - P(X = x)) P(f_2 | X \neq x)$$

$$P(f) = P(X = x) P(f | X = x) + (1 - P(X = x)) P(f | X \neq x)$$

where $X$ is a random variable describing the cause of failure and $x$ is a common cause, such as a design flaw or a bug in the computer program, that will definitely cause all the redundant copies to fail, i.e.,

$$P(f_1 | X = x) = P(f_2 | X = x) = 1$$

$$P(f | X = x) = P(f_1 | X = x) P(f_2 | X = x) = 1$$

$$P(f | X \neq x) = P(f_1 | X \neq x) P(f_2 | X \neq x)$$

Thus, we have

$$P(f_1) = P(X = x) + (1 - P(X = x)) P(f_1 | X \neq x)$$

$$P(f_2) = P(X = x) + (1 - P(X = x)) P(f_2 | X \neq x)$$

$$P(f) = P(X = x) + (1 - P(X = x)) P(f_1 | X \neq x) P(f_2 | X \neq x)$$

So, barring degenerate cases in which a failure is certain anyway, $P(f) = P(f_1) P(f_2)$ is only true when $P(X = x) = 0$. In reality, $P(X = x)$ might not be easily measured directly, and we might be tempted to treat it as $P(X = x) = 0$. But if $P(X = x)$ is larger than the failure rate allowed by the safety requirement for $f$, the safety requirement will not be met, no matter how reliable each copy is otherwise.

Therefore, homogeneous redundancy alone provides no guard against design flaws. The safety system designer should claim that $f$ reaches a certain safety requirement only by measuring it directly, rather than by computing it from $P(f) = P(f_1) P(f_2)$ under the assumption that $f_1$ and $f_2$ are independent of each other.
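A small numerical example makes the size of the error concrete. The numbers below are hypothetical and chosen only for illustration: each copy fails on its own with probability $10^{-6}$, and the common cause $x$ occurs with probability $10^{-9}$. The function names are likewise made up for this sketch.

```python
from fractions import Fraction


def channel_failure(p_common: Fraction, p_indep: Fraction) -> Fraction:
    """P(f_i): the common cause occurs, or it does not and the copy fails on its own."""
    return p_common + (1 - p_common) * p_indep


def system_failure(p_common: Fraction, p_indep_1: Fraction, p_indep_2: Fraction) -> Fraction:
    """P(f): the common cause occurs, or it does not and both copies fail on their own."""
    return p_common + (1 - p_common) * p_indep_1 * p_indep_2


# Hypothetical numbers, not taken from any standard.
p_common = Fraction(1, 10**9)  # P(X = x)
p_indep = Fraction(1, 10**6)   # P(f_i | X != x)

p_f1 = channel_failure(p_common, p_indep)
p_f2 = channel_failure(p_common, p_indep)
naive = p_f1 * p_f2  # what the independence assumption would claim
actual = system_failure(p_common, p_indep, p_indep)

print(f"naive  P(f1) P(f2) = {float(naive):.3e}")
print(f"actual P(f)        = {float(actual):.3e}")
```

Under these assumed numbers, the naive product lands near $10^{-12}$ while the actual failure rate is close to $10^{-9}$, i.e., it is dominated by the common cause.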

Lei Mao

08-01-2022