In this post, we will discuss the logic and mathematics behind the Maximum Aposteriori Probability detection principles used in communication systems.
Digital communication systems neutralize many problems that beset analog systems. One of these problems is faithfully communicating the information across a channel. With analog signals, we need to make a decision as to what voltage level was transmitted. This voltage level comes from a very large range of voltage amplitudes possible. By supplanting it with a digital system, we massively reduce the problem to deducing whether the binary 'ON' (1) was sent or 'OFF' (0). However, because of the inherent noise in the receivers (thermal noise and all), and the imperfectness of the communication channels, there are chances that a '0' would be corrupted to a '1' or vice-versa. So, it becomes important to design receivers so as to minimize such chances of error. This decision is often done at the detector stage in the receiver.
It should be obvious that the detector plays a large part in making the receiver 'good'. In technical parlance, what we want to do is minimize the probability of making an error. Statistically, we want to minimize the Average Probability of Error which is given by:
\(P_e = \mathbb{E}(P\{\hat{X} \neq a_i | X = a_i\}) = \displaystyle\sum_{i=1}^{|\chi|}P\{\hat{X} \neq a_i | X = a_i\} \cdot P\{X = a_i\}\)
where \(|\chi|\)is the cardinality of the constellation (number of symbols in the constellation).
So, we want to minimize it? Yes. How? Intuitively, by not making a mistake in detection. We could juggle some equations and write the above expression as:
\(P_e = \displaystyle\sum_{i=1}^{|\chi|}(1 - P\{\hat{X} = a_i | X = a_i\} \cdot P\{X = a_i\}) = 1 - \displaystyle\sum_{i=1}^{|\chi|}P\{\hat{X} = a_i | X = a_i\} \cdot P\{X = a_i\}\)
Okay! So, what is our objective? minimize \(P_e\). Note that the expression is a subtraction and that it is a probability. Probabilities are non-negative (\(0\leq P_e \leq 1\)). So, minimizing \(P_e\)is equivalent to maximizing the subtrahend. Or,
\(maximize\mbox{ }\displaystyle\sum_{i=1}^{|\chi|}P\{\hat{X} = a_i | X = a_i\} \cdot P\{X = a_i\}\)
Since this term above is a summation, maximizing each one of its members would maximize the whole thing. Or,
\(\displaystyle\sum_{i=1}^{|\chi|}maximize\mbox{ }P\{\hat{X} = a_i | X = a_i\} \cdot P\{X = a_i\}\)
This means, that the detector should make a decision such that each conditional probability of correct decision is maximized.
Imagine that each arriving signal is put on individual pieces of paper which represent a complex plane like the one shown below. Now, the detector needs to decide which of the symbol from the codebook might have been sent by looking at what is received. This is an inference process. We're looking at an outcome and guessing what caused it. To do so, the detector has a heuristic - it divides the entire sheet of paper into as many disjoint regions as there are codes in the codebook or symbols in the constellation. These regions encompass the entire sheet. Then, to each region, it has assigned a symbol value - if the incoming signal lies in the region \(R_1\), I'll say that \(a_1\) was transmitted (\(\hat{X}=a_1\)). Likewise, it associates a symbol with each region. Now, how good the mapping of those regions is, determines how well the detector will perform. In an optimal detector, the regions would be such that the above expression is maximized. But how do we know what the regions should be?
Then, the above equation can be written as
\(\displaystyle\sum_{i=1}^{|\chi|}P\{y \in R_i | X = a_i\} \cdot P\{X = a_i\} = \displaystyle\sum_{i=1}^{|\chi|} \int_{R_i} f_Y(y | X = a_i) \cdot P\{X = a_i\}\)
with \(f_Y(y|X = a_i)\)is the conditional probability density function of the received signal. Depending on what kind of p.d.f. Y has will determine the decision regions in the optimal detector. Thus, the choice of how the regions look like, or what they should be depends on the p.d.f. of the incoming signal. When we consider an AWGN channel, we model the noise as Gaussian, and consequently, Y also takes a Gaussian form. If we had modeled noise as - let's say uniform distribution - Y would have a different form and consequently, the regions and the mapping would change.
Note that \(P\{Y = y | X = a_i\} \cdot P\{X = a_i\} = P\{X = a_i | Y = y \} \cdot P\{Y = y\}\)(by Bayes rule). This is the Aposteriori probability that \(a_i\)was transmitted, given that 'y' was received.
There is another receiver, the Maximum Likelihood receiver, which selects \(a_i\) that maximizes the likelihood of obtaining \(Y = y\). For an ML Detector, we have:
\(maximize P\{Y=y|X=a_i\}\)
Note that if the apriori probabilities of all symbols were equal, i.e. if transmitting any of the symbols from the codebook is equally probable, then the MAP detector reduces to the ML Detector. This looks like a special case but is a very powerful tool which is widely exploited.