I Introduction
Part of this work was presented at the 2018 International Symposium on Information Theory (ISIT) [OronBasharISIT].
Finite-state channels (FSCs) are commonly used to model scenarios in which the channel or the system has memory. Instances of this model can be found in wireless communication [FSCTransWirelessComm, FSCTransWirelessComm1], molecular communication [MolecFSCTransComm, MolecularSurvey], chemical interactions and magnetic recording [FSCMagnetic]. Despite their importance in theory and practice, their capacity is still given by a non-computable expression [Gallager68, Loeliger_memory, GBAA, PfisterISI]. In this paper, we investigate computational methods for finding the capacity of unifilar FSCs with feedback (Fig. 1).
A useful approach for computing the feedback capacity is via dynamic programming (DP) methods [PermuterCuffVanRoyWeissman07_Chemical, Yang05, TatikondaMitter_IT09]. When the DP problem can be solved analytically, simple capacity expressions and optimal coding schemes can be determined [Chen05, PermuterCuffVanRoyWeissman08, Ising_channel, Sabag_BEC, trapdoor_generalized, Ising_artyom_IT, PeledSabagBEC, Sabag_BIBO_IT]. However, in most cases analytical solutions are infeasible; beyond the resulting numerical lower bounds, no insight into communication aspects, such as coding schemes or analytic expressions, can be gained. In this paper, we propose an alternative method to compute lower and upper bounds on the capacity. The main advantages of the new evaluation method are that the numerical results can be converted into analytic expressions, and that each resultant lower bound implies a simple coding scheme.
The upper and lower bounds are based on a new technique that simplifies the feedback capacity expression using auxiliary graphs [Sabag_UB_IT]. The auxiliary graph, termed the $Q$-graph, is used to map output sequences onto one of its nodes (Fig. 2). This sequential mapping can be exploited to derive single-letter lower and upper bounds on the capacity of the unifilar FSC [Sabag_UB_IT]. Specifically, for any choice of a $Q$-graph, the upper bound is given by
(1) $C_{\text{fb}} \le \sup_{P(x|s,q)\in\mathcal{P}_\pi} I(X,S;Y|Q),$
where the joint distribution is $\pi(s,q)P(x|s,q)P(y|x,s)$, and $\pi(s,q)$ denotes a stationary distribution. For the lower bound, it was shown that any choice of a $Q$-graph yields
(2) $C_{\text{fb}} \ge I(X,S;Y|Q)$
for all input distributions, $P(x|s,q)$, that are BCJR-invariant, a property that will be defined later.
The upper bound plays an important role in capacity characterization, as it is tight for all cases where the capacity is known, including the trapdoor, Ising and input-constrained channels. Furthermore, for all these cases, the upper bound is tight with auxiliary graphs of small cardinality [Chen05, PermuterCuffVanRoyWeissman08, Ising_channel, Sabag_BEC, trapdoor_generalized, Ising_artyom_IT, PeledSabagBEC, Sabag_BIBO_IT]. Therefore, if one could show a cardinality bound on the graph's size, it would suffice to conclude a single-letter capacity expression. The current paper was motivated by the question of whether such a cardinality bound exists. Unfortunately, we have no decisive answer to this question, but we have developed very useful numerical tools that led to new capacity results, analytic bounds and simple coding schemes.
First, we show that the upper bound in (1) can be formulated as a standard convex optimization problem. The convexity is not trivial, since the objective depends on a stationary distribution that is controlled by the input distribution. As will be shown, the formulation yields efficient algorithms that converge to the global maximum required for the computation of (1). Second, given a conjectured solution, the upper bound can be proven analytically using the KKT conditions. The upper bound optimization problem is useful for evaluating the performance of the graph-based encoders that result from the lower bound optimization problem.
For the lower bound, we provide an optimization problem that maximizes the lower bound in (2) over all BCJR-invariant input distributions. In this case, the optimization problem is not convex. Nonetheless, any feasible point (a BCJR-invariant input) induces a lower bound on the feedback capacity. The main advantage is that we can extract a graph and an input distribution, termed here a graph-based encoder (in the DP literature, a finite-state controller), together with their achievable rates. Graph-based encoders also benefit from a simple coding scheme. We will present a posterior matching (PM) scheme that achieves the lower bound for any graph-based encoder. The scheme is inspired by the PM principle for memoryless channels [shayevitz_posterior_mathcing] that was extended to systems with memory [Sabag_BIBO_IT]. Thus, any graph-based encoder implies a simple coding scheme that achieves its rate, even if the lower bound does not attain the capacity.
The optimization problems are formulated with respect to a fixed graph and evaluated with a generic enumeration method for directed graphs that we developed. An alternative construction based on Markov graphs is also presented. These two construction methods are used to evaluate the bounds on well-known channels: the Ising, trapdoor and BFCs. The numerical evaluation gives promising results in all studied channels.
For all channels, graph-based encoders and their simple achievable rates are presented. The performance of the graph-based encoders, when compared to the numerical upper bounds, yields near-tight bounds. We also derive analytic upper bounds that lead to new capacity results. For example, for the BFC, we prove that the capacity is achieved with a graph-based encoder that has only a single node. For the well-studied trapdoor channel, we derive a new capacity result by providing a simple graph-based encoder with only three nodes, and a corresponding upper bound.
The remainder of the paper is organized as follows: Section II presents notation, the setting and background on the $Q$-graph bounds. Section III contains the optimization problems and the coding scheme. Section IV contains examples, including their numerical evaluation and their analytic expressions. Lastly, concluding remarks are given in the final section. Technical proofs are deferred to the appendices to preserve the flow of the presentation.
II Notation and Preliminaries
This section presents notation, the setting and the relevant background on the $Q$-graph [Sabag_UB_IT].
II-A Notation
Random variables, realizations and sets are denoted by uppercase (e.g., $X$), lowercase (e.g., $x$) and calligraphic letters (e.g., $\mathcal{X}$), respectively. We use the notation $X^n$ to denote the tuple $(X_1,\dots,X_n)$ and $x^n$ to denote a realization of such a vector of random variables. For a real number $\alpha\in[0,1]$, we define $\bar{\alpha}=1-\alpha$. The binary entropy function is denoted by $h_2(\cdot)$. The cumulative distribution function of a random variable $X$ is denoted by $F_X(\cdot)$, and its inverse is denoted by $F_X^{-1}(\cdot)$. The probability vector of $X$ is denoted by $P_X$, the conditional probability of $X$ given $Y$ is denoted by $P_{X|Y}$, and the joint distribution of $X$ and $Y$ is denoted by $P_{X,Y}$. The probability $\Pr[X=x]$ is denoted by $P_X(x)$, and when the random variable is clear from the context, we write it in shorthand as $P(x)$. For a vector $v$, $v \succeq 0$ represents an element-wise inequality for each coordinate of $v$.

II-B FSC with feedback
A FSC is defined by a conditional probability $P(y_t, s_{t+1} \mid x_t, s_t)$, where $x_t$ is the channel input, $y_t$ is the channel output, $s_t$ is the channel state during transmission $t$, and $s_{t+1}$ is the new channel state. The encoder chooses $x_t$, the channel input, based on the message and the output tuple $y^{t-1}$. At each time $t$, the channel has the property $P(y_t, s_{t+1} \mid x^t, s^t, y^{t-1}) = P(y_t, s_{t+1} \mid x_t, s_t)$. If the channel state is a deterministic function, $s_{t+1} = f(s_t, x_t, y_t)$, then the FSC is called unifilar. A unifilar FSC is strongly connected if, for any pair of states $s, s'$, there exist an input distribution and a finite time $T$ such that $s$ is reachable from $s'$ with positive probability within $T$ steps. It is also assumed that the initial state, $s_1$, is available to both the encoder and the decoder.
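A unifilar FSC can be simulated in a few lines. The sketch below (plain Python; the binary Ising channel is recalled from the literature as a running example, so treat its exact channel law as an assumption) illustrates the defining property: the new state is a deterministic function of the previous state, input and output, so both terminals can track it.

```python
import random

# Running example (assumed channel law): the Ising channel emits the
# current input or the current state with equal probability, and the
# new state equals the input.
def ising_step(x, s, rng):
    y = x if rng.random() < 0.5 else s
    s_next = x  # unifilar update: s_{t+1} = f(s_t, x_t, y_t) = x_t
    return y, s_next

def simulate(inputs, s0=0, seed=0):
    """Run a unifilar FSC forward; the state is trackable at both ends
    because s_{t+1} is a deterministic function of (s_t, x_t, y_t)."""
    rng = random.Random(seed)
    s, outputs = s0, []
    for x in inputs:
        y, s = ising_step(x, s, rng)
        outputs.append(y)
    return outputs
```
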
The capacity of the unifilar FSC is given by the following:
Theorem 1.
[PermuterCuffVanRoyWeissman08] The feedback capacity of a strongly connected unifilar FSC, where $s_1$ is available to both the encoder and the decoder, can be expressed by
(3) $C_{\text{fb}} = \sup_{\{P(x_t|s_t,y^{t-1})\}_{t\ge 1}} \liminf_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} I(X_t, S_t; Y_t \mid Y^{t-1}).$
The capacity expression in Theorem 1 cannot be computed directly. It can be shown that the capacity can be formulated and evaluated as an infinite-horizon average-reward Markov decision process (MDP) [TatikondaMitter_IT09, PermuterCuffVanRoyWeissman08]. However, analytic solutions for the capacity are challenging due to the continuous alphabets of states and actions.
II-C The $Q$-graph bounds
The $Q$-graph bounds are an alternative for computing the capacity when the MDP cannot be solved. Their main idea is to simplify (3) by embedding an auxiliary graph into the capacity expression. We now formalize the $Q$-graph bounds that will be used in the optimization problems.
For an output alphabet $\mathcal{Y}$, the $Q$-graph is a directed, connected and labeled graph. Each of its nodes has $|\mathcal{Y}|$ outgoing edges with distinct labels from $\mathcal{Y}$ (see an example in Fig. 2).
The $Q$-graph definition implies that, given an initial node, $q_0$, and an output sequence, $y^t$, a unique node is determined by walking along the labeled edges according to $y^t$. The induced mapping can be represented with a time-invariant function $q_t = \phi(q_{t-1}, y_t)$, where a new graph node is computed from the previous node and the channel output.
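The walk along the labeled edges can be sketched as follows (a minimal illustration; `phi_last` is a hypothetical two-node graph that remembers the last binary output):

```python
# Sequential output-to-node mapping: starting from q0, walk the labeled
# edges of the graph according to the output sequence.
def walk(phi, q0, outputs):
    q = q0
    for y in outputs:
        q = phi[(q, y)]  # time-invariant update q_t = phi(q_{t-1}, y_t)
    return q

# Illustrative two-node graph on binary outputs: the node equals the
# last observed output.
phi_last = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1}
```
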
Next, the $Q$-graph is embedded into the original FSC. A new directed graph, the $(S,Q)$-graph, combines the $Q$-graph and the channel state evolution, and is constructed as follows:
- Each node in the $Q$-graph is split into nodes that are represented by pairs $(s,q)$.
- An edge with a label $y$ from $(s,q)$ to $(s^+,q^+)$ exists if and only if there exists an input $x$ such that $P(y|x,s) > 0$, $s^+ = f(s,x,y)$, and $q^+ = \phi(q,y)$.
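The construction above can be sketched directly. The snippet below is an illustrative sketch (not the paper's code): `supp` is a user-supplied predicate for $P(y|x,s)>0$, and the Ising-like example channel is an assumption.

```python
from itertools import product

# Coupled-graph construction: an edge labeled y from (s, q) to (s+, q+)
# exists iff some input x has P(y|x,s) > 0, with s+ = f(s, x, y) and
# q+ = phi(q, y).  All names are illustrative.
def coupled_edges(states, nodes, inputs, outputs, f, phi, supp):
    edges = set()
    for s, q, x, y in product(states, nodes, inputs, outputs):
        if supp(y, x, s):
            edges.add(((s, q), (f(s, x, y), phi[(q, y)]), y))
    return edges

# Toy example: Ising-like support (the output equals the input or the
# state), state update s+ = x, and a 2-node graph tracking the last output.
phi = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1}
E = coupled_edges([0, 1], [0, 1], [0, 1], [0, 1],
                  f=lambda s, x, y: x, phi=phi,
                  supp=lambda y, x, s: y == x or y == s)
```
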
For a fixed $Q$-graph and an input distribution $P(x|s,q)$, the transition probabilities on the $(S,Q)$-graph are
$P(s^+,q^+ \mid s,q) = \sum_{x,y} P(x|s,q)\,P(y|x,s)\,\mathbb{1}\{s^+ = f(s,x,y)\}\,\mathbb{1}\{q^+ = \phi(q,y)\}.$
The notation $\mathcal{P}_\pi$ stands for the set of input distributions that induce a unique stationary distribution on the $(S,Q)$-graph, that is, their corresponding graph is irreducible and aperiodic.
Having defined the coupled graph, the upper bound on the capacity can be presented:
Theorem 2.
[Sabag_UB_IT] The feedback capacity of a strongly connected unifilar FSC, where the initial state is available to both the encoder and the decoder, is bounded by
(4) $C_{\text{fb}} \le \sup_{P(x|s,q)\in\mathcal{P}_\pi} I(X,S;Y|Q)$
for all $Q$-graphs for which the coupled $(S,Q)$-graph has a single, aperiodic closed communicating class. The joint distribution is $\pi(s,q)P(x|s,q)P(y|x,s)$, where $\pi(s,q)$ is the stationary distribution of the coupled graph.
To present the lower bound, it is convenient to write the joint distribution as:
(5) $P(s,q,x,y,s^+,q^+) = \pi(s,q)\,P(x|s,q)\,P(y|x,s)\,\mathbb{1}\{s^+ = f(s,x,y)\}\,\mathbb{1}\{q^+ = \phi(q,y)\}.$
The pairs $(s,q)$ and $(s^+,q^+)$ correspond to before and after a single transmission, respectively.
We define a property called a BCJR-invariant input. An input distribution is said to be an aperiodic input if its $(S,Q)$-graph is aperiodic. An aperiodic input distribution is BCJR-invariant if it implies the Markov chain $S^+ - Q^+ - (Q,Y)$,
where the joint distribution is (5). A simple verification of the Markov chain is:
(6) $P(s^+ \mid q^+) = P(s^+ \mid q, y),$
which needs to hold for all $s^+$ and all $(q,y)$ with $\phi(q,y) = q^+$.
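The condition (6) can be checked numerically from a joint table. A minimal sketch follows (the dictionary interface and function name are illustrative, not from the paper): the conditional $p(s^+\mid q,y)$ must coincide for all pairs $(q,y)$ mapping to the same next node.

```python
from collections import defaultdict

# BCJR-invariance check: p(s+ | q, y) must be identical for every (q, y)
# that maps to the same node q+ = phi(q, y).  `p_qys[(q, y, sp)]` is an
# assumed joint pmf over (Q, Y, S+); the interface is illustrative.
def is_bcjr_invariant(p_qys, phi, tol=1e-9):
    norm = defaultdict(float)
    for (q, y, sp), p in p_qys.items():
        norm[(q, y)] += p
    cond = defaultdict(dict)            # cond[(q, y)][sp] = p(sp | q, y)
    for (q, y, sp), p in p_qys.items():
        cond[(q, y)][sp] = p / norm[(q, y)]
    ref = {}                            # reference conditional per node q+
    for (q, y), dist in cond.items():
        qp = phi[(q, y)]
        if qp not in ref:
            ref[qp] = dist
        elif any(abs(ref[qp].get(sp, 0.0) - dist.get(sp, 0.0)) > tol
                 for sp in set(ref[qp]) | set(dist)):
            return False
    return True
```
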
A graph-based encoder consists of a $Q$-graph and a BCJR-invariant input distribution. The following theorem provides a lower bound on the feedback capacity.
Theorem 3.
[Sabag_UB_IT] The feedback capacity of a unifilar FSC is bounded by
(7) $C_{\text{fb}} \ge I(X,S;Y|Q)$
for all aperiodic inputs that are BCJRinvariant.
III The optimization problems and coding scheme
The bounds on the feedback capacity (Theorems 2 and 3) can be represented as follows:
(8) $\max_{P(x|s,q)\ \text{BCJR-invariant}} I(X,S;Y|Q) \;\le\; C_{\text{fb}} \;\le\; \max_{P(x|s,q)\in\mathcal{P}_\pi} I(X,S;Y|Q).$
This section contains the formulation of two optimization problems, each corresponding to a bound in (8), and the coding scheme. Note that the optimization problems only differ in their maximization domains. We will first provide a formulation of the upper bound as a convex optimization problem. Then, we introduce additional constraints that restrict the maximization domain to be on input distributions that are BCJRinvariant. The upper bound formulation and the extra constraints constitute the optimization problem of the lower bound in (8).
III-A The upper bound
The optimization variables are chosen as $P(s,q,x,y,s^+,q^+)$, that is, a joint distribution on the tuple $(S,Q,X,Y,S^+,Q^+)$. The random variables $S$ and $S^+$ (correspondingly, $Q$ and $Q^+$) should be interpreted as the channel state (correspondingly, the graph node) before and after one transmission. Thus, the optimization variables need to satisfy their original relation:
(9) $P(s^+,q^+ \mid s,q,x,y) = \mathbb{1}\{s^+ = f(s,x,y)\}\,\mathbb{1}\{q^+ = \phi(q,y)\}.$
With some abuse of notation, $P(s,q,x,y,s^+,q^+)$ refers to a joint distribution that satisfies (9).
In the following, three sets of constraints for the optimization problem are defined:
III-A1 Stationary distribution
The random variables $(S^+,Q^+)$ are introduced to constrain the joint distribution so that it induces a stationary distribution on the coupled graph. This is done by verifying that the marginal distributions satisfy $P_{S^+,Q^+}(\bar{s},\bar{q}) = P_{S,Q}(\bar{s},\bar{q})$ for all $(\bar{s},\bar{q})$.
Formally, for each $(\bar{s},\bar{q})$, the constraint function is given by
(10) $g_i(P) = \sum_{s,q,x,y} P(s,q,x,y,\bar{s},\bar{q}) - \sum_{x,y,s^+,q^+} P(\bar{s},\bar{q},x,y,s^+,q^+),$
where $i$ is an index that corresponds to the $(S,Q)$-graph pairs $(\bar{s},\bar{q})$.
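The stationarity constraint can be checked on a candidate joint table with a few lines (an illustrative sketch; the dictionary interface is an assumption):

```python
from collections import defaultdict

# Stationarity check for constraint (10): for a joint table
# joint[(s, q, x, y, sp, qp)], the marginal of (s, q) must equal the
# marginal of (s+, q+); the function returns the largest violation.
def stationarity_gap(joint):
    before, after = defaultdict(float), defaultdict(float)
    for (s, q, x, y, sp, qp), p in joint.items():
        before[(s, q)] += p
        after[(sp, qp)] += p
    return max(abs(before[k] - after[k]) for k in set(before) | set(after))
```
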
III-A2 Channel law
The following set of constraint functions ensures that the distribution satisfies the Markov chain $Y - (X,S) - Q$, and that the channel law is preserved. That is, $P(y|x,s,q) = P(y|x,s)$ for all $(x,s,q,y)$. The corresponding constraint functions are given by
(11) $g_j(P) = P_{X,S,Q,Y}(x,s,q,y) - P(y|x,s)\,P_{X,S,Q}(x,s,q),$
for all $(x,s,q,y)$. Note that $P(y|x,s)$ is a constant that is given by the channel law.
III-A3 Pmf
The last constraint function verifies that the optimization variables form a valid pmf,
(12) $g_{\text{pmf}}(P) = \sum_{s,q,x,y,s^+,q^+} P(s,q,x,y,s^+,q^+) - 1,$
with $P \succeq 0$.
In the following, we define the optimization problem for the upper bound in Theorem 2.
The optimization problem for the upper bound:
(13) $\max_{P}\; I(X,S;Y|Q)$
subject to the constraints (10)-(12).
The following theorem shows that (13) is a convex optimization problem.
Theorem 4 (Convex optimization for UB).
For a given $Q$-graph, the optimization problem in (13) is a convex optimization problem. That is, the negated objective and the constraint functions are convex functions of the optimization variables.
The proof of Theorem 4 appears in the appendix. Theorem 4 is a computational result; the formulation of the upper bound as a convex problem makes it possible to use algorithms that converge to the global maximum and are efficient in terms of running time. For the implementation of Theorem 4, we used CVX [cvx] with the SeDuMi solver. Such simulations provide small numerical tolerances for the objective and the constraints. This result complements the upper bound derivation in [Sabag_UB_IT], since it is now a computable single-letter expression. In Section IV, we will illustrate the utility of the KKT conditions when simplifying the mutual information into analytic expressions.
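The actual problem (13) was solved with CVX in MATLAB and is not reproduced here. As a self-contained illustration of why concavity of a mutual-information objective enables algorithms with global-optimum guarantees, the classical Blahut-Arimoto iteration for the memoryless analogue $\max_p I(X;Y)$ is sketched below (this is not the $Q$-graph problem itself):

```python
import math

# Classical Blahut-Arimoto iteration for the memoryless capacity
# max_p I(X;Y) -- an analogue of maximizing a concave mutual-information
# objective, not the Q-graph problem.  P[x][y] = P(y|x).
def blahut_arimoto(P, iters=100):
    nx, ny = len(P), len(P[0])
    p = [1.0 / nx] * nx                      # start from the uniform input
    for _ in range(iters):
        # output marginal induced by the current input distribution
        q = [sum(p[x] * P[x][y] for x in range(nx)) for y in range(ny)]
        # per-input exponentiated divergence D(P(.|x) || q), in nats
        c = [math.exp(sum(P[x][y] * math.log(P[x][y] / q[y])
                          for y in range(ny) if P[x][y] > 0))
             for x in range(nx)]
        z = sum(p[x] * c[x] for x in range(nx))
        p = [p[x] * c[x] / z for x in range(nx)]
    return math.log2(z), p                   # capacity estimate (bits), input
```

For the binary symmetric channel with crossover probability 0.1, the iteration returns the known capacity $1-h_2(0.1)\approx 0.531$ bits with a uniform input.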
Remark 1.
The natural choice for the optimization variables is the conditional distribution $P(x|s,q)$. This choice turned out to be challenging when attempting to show the objective's convexity. The difficulty stems from the fact that the objective depends on the stationary distribution $\pi$, which is an implicit function of $P(x|s,q)$. Specifically, the stationary distribution is given by the solution of $\pi^T(K - I) = 0$, where $K$ is the transition matrix of the Markov chain and $I$ is an identity matrix. Even for simple scenarios, such as the entropy rate of a constrained Markov chain [Marcus98], it is not clear whether the objective is concave, although it was observed numerically to behave as a concave function.

III-B The lower bound
In this section, we present the optimization problem of the lower bound. From a communication perspective, a $Q$-graph restricts the structure of the cooperation between the encoder and the decoder. The idea behind the forthcoming optimization problem is to find the BCJR-invariant input distribution with the highest achievable rate when the structure of the cooperation (i.e., the $Q$-graph) is fixed. (From an MDP perspective, the optimization problem looks for the best policy that is constrained to visit a finite number of states, subject to the graph structure. Each node corresponds to an MDP state, and any path to this node should result in the same MDP state.) The optimization problem for the lower bound is the upper bound problem in (13), but with additional constraints. The constraints are imposed for the BCJR-invariant property:
(14) $P(s^+ \mid q^+) = P(s^+ \mid q, y)$
for all $(q,y,s^+)$ with $q^+ = \phi(q,y)$. Since $q^+$ is a deterministic function of $(q,y)$, the constraint in (14) can be viewed as the Markov chain $S^+ - Q^+ - (Q,Y)$.
Formally, for each triplet $(q,y,s^+)$, the constraint function of the BCJR property in (14) is:
(15) $g_k(P) = P_{S^+,Q,Y}(s^+,q,y) - P_{S^+|Q^+}(s^+ \mid \phi(q,y))\,P_{Q,Y}(q,y),$
where $k$ is an index that enumerates all triplets $(q,y,s^+)$. One can already note that the constraints in (15) are not linear and, thus, the resulting optimization problem is not convex.
In the following, we define the optimization problem for the lower bound.
The optimization problem for the lower bound:
(16) $\max_{P}\; I(X,S;Y|Q)$
subject to the constraints (10)-(12) and (15).
For the implementation of (16), we used a sequential quadratic programming (SQP) algorithm that is suitable for non-convex optimization problems. This method is implemented in MATLAB via a function called fmincon. This function starts from an initial point and then converges, possibly to the global maximum. Since the optimization problem is not convex, the termination point depends on the initial point. Therefore, we generate several random initial points and choose the solution that achieves the highest lower bound. In practice, we observed that for most graphs, a few initial guesses are sufficient to converge to the global maximum.
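The multistart strategy is independent of the particular local solver. A toy sketch follows (plain Python; the shrinking-step hill climb is a crude stand-in for SQP, and the bimodal objective is invented for illustration):

```python
import math, random

# Local ascent with a shrinking step: a stand-in for the SQP solver.
def hill_climb(f, t0, step=0.1, shrink=0.5, tol=1e-7, lo=0.0, hi=1.0):
    t = t0
    while step > tol:
        moved = False
        for cand in (t - step, t + step):
            if lo <= cand <= hi and f(cand) > f(t):
                t, moved = cand, True
        if not moved:
            step *= shrink
    return t

# Multistart heuristic: rerun the local solver from random initial points
# and keep the best endpoint, as done for the non-convex lower bound.
def multistart(f, n_starts=20, seed=1):
    rng = random.Random(seed)
    return max((hill_climb(f, rng.random()) for _ in range(n_starts)), key=f)

# Invented bimodal objective: local maximum near 0.15, global near 0.7.
toy = lambda t: 0.6 * math.exp(-(t - 0.15) ** 2 / 0.005) \
              + 1.0 * math.exp(-(t - 0.7) ** 2 / 0.01)
```
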
The BCJR-invariant property was presented in [Sabag_UB_IT] as a simple condition on the Markov chain that, in turn, simplifies the capacity expression to the lower bound in (7). It turns out that this property is also necessary when analyzing PM schemes [Sabag_BIBO_IT], as will be shown next.
III-C Construction of graph-based coding schemes
Each graph-based encoder, i.e., a feasible point of the lower-bound optimization problem in (16), benefits from the construction of an explicit matching coding scheme. We present the coding scheme construction with an informal statement of its achievable rate and discuss the missing (technical) details.
Throughout the scheme, a $Q$-graph and an input distribution $P(x|s,q)$ are fixed ahead of communication. The scheme is given by a simple procedure that is repeated $n$ times. In each procedure, both the encoder and the decoder keep track of the posterior probability (PP):
(17) $p_t(m) = P(M = m \mid y^{t-1}).$
The PP corresponds to the decoder's belief regarding the message at time $t$, which is also available to the encoder from the feedback. To encode, we will use matching (described below) of the PP to an input distribution $P(x|s,q)$, where $(s,q)$ are determined as follows:
- The graph node $q_{t-1}$ is determined from the channel outputs $y^{t-1}$.
- The state is determined for each message separately. Recall that the channel state can be determined from the initial state and the input-output pairs. Therefore, for each message $m$, one can compute $s_{t-1}(m)$, which corresponds to the channel state when assuming $M = m$.
We now present the transmission procedure, with $m$ denoting the correct message.
Procedure in the coding scheme:
1. The encoder transmits $x_t(m)$, obtained by matching the PP to the input distribution $P(x \mid s_{t-1}(m), q_{t-1})$.
2. The channel output $y_t$ is revealed.
3. The PP (of each message) is updated recursively as
(18) $p_t(m) \propto p_{t-1}(m)\, P(y_t \mid x_t(m), s_{t-1}(m)).$
4. The graph node is updated: $q_t = \phi(q_{t-1}, y_t)$.
5. The state (of each message) is updated: $s_t(m) = f(s_{t-1}(m), x_t(m), y_t)$.
Decoding is performed after the $n$ transmissions.
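The Bayes update (18) is the only probabilistic computation in the loop. A minimal sketch follows (names and interface are illustrative; `channel(y, x, s)` returns $P(y|x,s)$):

```python
# Recursive posterior update (18): each message m carries its own
# hypothesized input x_of[m] and state s_of[m]; the posterior is
# reweighted by the channel likelihood of the observed output y and
# renormalized.  All names are illustrative.
def update_posterior(post, y, x_of, s_of, channel):
    new = [p * channel(y, x_of[m], s_of[m]) for m, p in enumerate(post)]
    z = sum(new)
    return [p / z for p in new]
```
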
The following theorem concludes the achievable rate of the scheme.
Theorem 5 (Informal).
For any BCJR-invariant input, the scheme achieves the rate $I(X,S;Y|Q)$.
This theorem provides a low-complexity coding scheme that achieves the lower bound in (7). In the context of the current paper, its main contribution is that any feasible point of the lower-bound optimization problem is accompanied by a coding scheme. Clearly, if the lower bound is tight, the scheme is capacity-achieving. It is interesting to note that the recursive computation of the message PP is preserved for channels with memory, but with a different update rule (18).
The analysis of the coding scheme is omitted in this paper for the sake of brevity and due to the many technical details that are required to show Theorem 5 precisely. (Specifically, there is a need for dithering of the messages before the actual transmission of channel inputs. Also, we need to use a message-splitting operation in order to maintain accurate behavior of the stationary distribution.) In [Sabag_BIBO_IT], we presented a rigorous proof for the binary-input binary-output (BIBO) channel with input constraints. By replacing its state with a general state, the analysis is identical and Theorem 5 is proved.
TABLE I: Number of valid graphs.
Graph size      |  2 |    3 |      4 |     5 |      6
No. graphs ( )  |  5 |   50 |   4866 | 21126 | 655424
No. graphs ( )  | 27 | 2297 | 463548 |   -   |   -
III-D Choice of graphs
So far, we have presented two optimization problems for a fixed $Q$-graph. Here, we proceed with the development of a practical algorithm that computes the lower and upper bounds. The main challenge is how to choose $Q$-graphs that will result in tight bounds. Below, we present several approaches to choosing graphs; all of them are applicable in both optimization problems.
III-D1 Graphs pool (GP)
A valid graph is a directed graph that is aperiodic, i.e., connected and with period $1$. A brute-force method to find graphs is to create a pool of all valid graphs. This is a combinatorial problem whose output grows sharply as the graph size increases. However, we developed an enumeration method for all graphs, so that we only need to save a list of indices and a simple function that returns the graph. A useful observation is that different labelings of the nodes result in the same graph structure, which reduces the count by a factor equal to the number of node permutations. In Table I, the number of valid graphs is listed for several graph sizes.
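The brute-force pool can be reproduced in a few lines. The sketch below interprets validity as strong connectivity plus aperiodicity and identifies graphs up to a relabeling of the nodes (this interpretation is our assumption); for two nodes and binary labels, it recovers the count of 5 from Table I.

```python
from itertools import product, permutations
from math import gcd

# Count valid graphs with n nodes, each having ny labeled outgoing edges:
# keep strongly connected, aperiodic graphs, identified up to relabeling.
def count_valid_graphs(n, ny=2):
    nodes = range(n)

    def succ(phi, q):                       # targets of node q's edges
        return [phi[q * ny + y] for y in range(ny)]

    def strongly_connected(phi):
        def reach(src):
            seen, stack = {src}, [src]
            while stack:
                for t in succ(phi, stack.pop()):
                    if t not in seen:
                        seen.add(t); stack.append(t)
            return seen
        return all(reach(q) == set(nodes) for q in nodes)

    def aperiodic(phi):
        # period of a strongly connected digraph via BFS levels:
        # gcd over all edges (u -> v) of level[u] + 1 - level[v]
        level, order = {0: 0}, [0]
        for u in order:
            for v in succ(phi, u):
                if v not in level:
                    level[v] = level[u] + 1
                    order.append(v)
        g = 0
        for u in nodes:
            for v in succ(phi, u):
                g = gcd(g, level[u] + 1 - level[v])
        return abs(g) == 1

    canon = set()
    for phi in product(nodes, repeat=n * ny):
        if strongly_connected(phi) and aperiodic(phi):
            # canonical form: minimum over all node relabelings
            canon.add(min(
                tuple(perm[phi[inv[q] * ny + y]]
                      for q in nodes for y in range(ny))
                for perm in permutations(nodes)
                for inv in [{perm[i]: i for i in nodes}]
            ))
    return len(canon)
```
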
III-D2 Markov graphs
A valid choice of a graph is one in which each node represents the last $k$ output symbols. For instance, see Fig. 3, where each node represents the last output from a binary alphabet. For any choice of $k$ and an output alphabet $\mathcal{Y}$, the resultant graph has $|\mathcal{Y}|^k$ nodes and $|\mathcal{Y}|$ edges leaving each node.
Clearly, as $k$ increases, the performance of the bounds can only improve or remain unchanged. For several channels, it is known that the bounds will not approach the capacity for any finite $k$, since the optimal output distribution is a variable-order Markov process, which is a generalization of the Markov chain on the outputs that is suggested here.
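The Markov-graph construction is a one-liner (an illustrative sketch):

```python
from itertools import product

# Markov graph of order k: each node is the tuple of the last k outputs,
# and the edge labeled y moves to node (q[1:], y).
def markov_graph(k, outputs=(0, 1)):
    nodes = list(product(outputs, repeat=k))
    phi = {(q, y): q[1:] + (y,) for q in nodes for y in outputs}
    return nodes, phi
```
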
III-D3 Discussion on continuous graphs
From a general perspective, we aim to solve the optimization problem
(19) $\min_{Q\text{-graph}}\; \max_{P(x|s,q)\in\mathcal{P}_\pi}\; I(X,S;Y|Q).$
This means that the upper bound is minimized over all applicable graphs. This is a difficult problem, since the minimization domain is discrete and, thus, must be searched exhaustively. A common technique in optimization is to relax a discrete domain into a continuous one.
In our case, the relaxation is of the graphs' domain in the minimization. Recall that, when the number of nodes is fixed, a graph function has the form $\phi:\mathcal{Q}\times\mathcal{Y}\to\mathcal{Q}$. To relax such a domain, note that such functions are exactly the extreme points of the set of conditional distributions $P(q^+|q,y)$. We will now show that the upper bound is also valid on the interior of this set.
Lemma 1 (Upper bound with probabilistic graph).
For any conditional distribution $P(q^+|q,y)$, the upper bound in Theorem 2 holds.
The only difference from Theorem 2 is the transition law of the Markov chain on the $(S,Q)$-graph:
(20) $P(s^+,q^+ \mid s,q) = \sum_{x,y} P(x|s,q)\,P(y|x,s)\,\mathbb{1}\{s^+ = f(s,x,y)\}\,P(q^+|q,y).$
Proof.
From the functional representation lemma [ElGamal], $Q^+$ can be written as a function of $(Q,Y,W)$, where $W$ is independent of $(Q,Y)$. We now define an auxiliary unifilar FSC whose output is the pair $(Y,W)$, with $W$ independent of the input and the channel state. Clearly, the capacity of the new channel is the same as that of the original one, but now the $Q$-graph is labeled with the pair $(y,w)$, as needed. ∎
The objective of the corresponding optimization problem is now:
(21) $\min_{P(q^+|q,y)}\; \max_{P(x|s,q)\in\mathcal{P}_\pi}\; I(X,S;Y|Q).$
However, it is not difficult to show that (21) can be formalized as a concave optimization problem when the input distribution is fixed. Thus, the optimal graph lies on the boundary of the relaxed domain, that is, the optimal graph is deterministic. This fact makes this relaxation attempt counterproductive.
IV Examples and Analytic results
In this section, we provide explicit graph-based encoders and prove their tightness when possible. In all other cases, we compare the achievable rates with numerical upper bounds. For all the examples in this section, the variables take values from the binary alphabet $\{0,1\}$.