1. Introduction

This is the first in a series on Deep Reinforcement Learning (DRL) inspired by recent kitchen counter side discussions with friends. An apt way to introduce the value and power of DRL is to contrast it with self-supervised pre-training in LLMs / DNNs. If we consider games such as chess, while it is possible to pre-train a model once on the rules of the game, by far the best way for it to become adept is to do so empirically – in DRL this is effected by providing an environment where the model may play the game and learn optimal strategies in a phenomenological way. The environment is engineered to deliver a reward signal to the model during intermediate and/or end states of the game. To simplify the narrative somewhat – the magnitude of the reward signal may then be used to construct a loss function that is used to train the model through standard methods from non-convex optimization. The power of Deep RL has been evidenced not just in the area of games (AlphaZero, AlphaGo) and robotics, but also in enhancing the reasoning capabilities of frontier LLMs on tasks with automatically verifiable rewards such as math problems and coding. The power of this method is evidenced by a recent article in Nature on outcomes related to DeepSeek-R1 with regards to DRL on Chain of Thought (CoT) fueled reasoning tasks – these outcomes are summed up by the following remark from the article – “The self-evolution of DeepSeek-R1-Zero underscores the power and beauty of RL: rather than explicitly teaching the model how to solve a problem, we simply provide it with the right incentives and it autonomously develops advanced problem-solving strategies. This serves as a reminder of the potential of RL to unlock higher levels of capabilities in LLMs, paving the way for more autonomous and adaptive models in the future” (DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning, Nature Oct. 2025).

To the refugee number theory acolyte, the confluence of the Banach Fixed Point Theorem and the theory of MDP strategies is pedestrian: such statements, along with applications of the fundamental theorem of Linear Programming, comprise the dual underpinnings of the (de-facto) formal basis for tabular value based methods derived from Bellman Optimality. Such value based frameworks are subsequently used as approximation targets for value and policy based networks in modern reinforcement learning (cf. later references to DeepSeek-R1 and Chain of Thought (CoT) driven reinforcement learning for generalized reasoning in LLMs). This is emphasized because these two theorems provide the underlying formal abstractions that are rarely placed front and center in modern narratives on reinforcement learning.

Reinforcement Learning (RL) begins with a deceptively simple question: given that I am in some state at some point in time, what is the best strategy for maximizing present and future cumulative discounted rewards that can be attained over all possible present and future actions from the given initial state? The answer is crystallized in Richard Bellman’s principle of optimality, which implies that the search over an infinite continuum of stochastic policies collapses: there always exists a deterministic policy that is optimal – this result arises by application of the fundamental theorem of linear programming. This article unpacks how and why such a “Stochastic Collapse” (self-termed) occurs. It then describes the algorithmic structure that lets us compute the answer by working backward from the terminal time (Bellman backup and dynamic programming), and why such exact methods shatter against the scale of real-world problems—from autonomous vehicles to the reasoning chains inside large language models like DeepSeek-R1. For refugees more at home in number theory or cryptography than in control theory, the basic intuition behind the stochastic collapse is this: optimizing a value function over all stochastic policies is, on its face, an optimization over an uncountably infinite set of probability distributions—a simplex at every state and every time step. The key insight is that the structure of the Bellman equation forces the optimum to a vertex of each simplex, and vertices of probability simplices are precisely the deterministic (pure) actions. In a related vein, the parallel to Nash’s theorem on equilibria in non-cooperative games is not accidental; both rely on the geometry of simplices and fixed-point arguments, though the mechanisms diverge in important ways we will discuss.

2. Setup: Finite-Horizon MDPs, Policies, and Value Functions

2.1 The Finite-Horizon MDP

Definition (Finite-Horizon MDP). A finite-horizon Markov Decision Process is a tuple \langle \mathcal{S}, \mathcal{A}, P, R, H, \gamma \rangle where:

  • \mathcal{S} is a finite set of states,
  • \mathcal{A} is a finite set of actions,
  • P_t(s' \mid s, a) is the transition probability from state s to s' under action a at time step t \in \{0, 1, \ldots, H-1\},
  • R_t(s,a) is the immediate reward for taking action a in state s at time step t,
  • H \in \mathbb{N} is the horizon—the number of decision epochs, and
  • \gamma \in (0, 1] is a discount factor.

The agent interacts with the MDP for exactly H steps: at each time t = 0, 1, \ldots, H-1, it observes state s_t \in \mathcal{S}, selects action a_t \in \mathcal{A}, receives reward R_t(s_t, a_t), and transitions to s_{t+1} \sim P_t(\cdot \mid s_t, a_t). The interaction terminates after step H-1.

Remark (Discount factor). In the finite-horizon setting, the discount factor \gamma may be set to 1 without convergence issues, since the sum of rewards is bounded by at most H terms. When \gamma = 1, the agent treats future rewards as equally valuable as immediate ones. We retain \gamma in the formulation for generality and to clarify the connection to the infinite-horizon discounted setting discussed in Section 5.

Remark (Time-varying dynamics). We allow the transition kernel P_t and reward function R_t to depend on t. This is the most general finite-horizon formulation. When P_t = P and R_t = R for all t (time-homogeneous dynamics), the model simplifies, but the optimal policy may still be non-stationary because the time remaining influences the optimal action.
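The interaction loop above can be sketched in a few lines of Python; the two-state MDP, its kernel, and its rewards below are purely illustrative inventions, not anything fixed by the text:

```python
import random

S = ["s0", "s1"]            # finite state set (illustrative)
A = ["stay", "switch"]      # finite action set (illustrative)
H = 3                       # horizon

def P(t, s, a):
    # Time-homogeneous toy kernel: "switch" flips the state w.p. 0.9.
    other = "s1" if s == "s0" else "s0"
    flip = 0.9 if a == "switch" else 0.1
    return {other: flip, s: 1.0 - flip}

def R(t, s, a):
    # Reward 1 for sitting still in s1, else 0 (purely illustrative).
    return 1.0 if (s == "s1" and a == "stay") else 0.0

def rollout(policy, s0="s0", gamma=1.0, seed=0):
    rng = random.Random(seed)
    s, total = s0, 0.0
    for t in range(H):                     # t = 0, 1, ..., H-1
        a = policy(t, s)                   # a_t selected by the policy
        total += gamma**t * R(t, s, a)     # accumulate discounted reward
        dist = P(t, s, a)                  # s_{t+1} ~ P_t(. | s_t, a_t)
        s = rng.choices(list(dist), weights=list(dist.values()))[0]
    return total

print(rollout(lambda t, s: "switch" if s == "s0" else "stay"))
```

The return of a single rollout is a sample of the cumulative discounted reward whose expectation is the value function V_0^\pi(s_0) defined in Section 2.4.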

2.2 Non-Stationary Stochastic Policies

Definition (Non-stationary stochastic policy). A non-stationary stochastic policy is a sequence \pi = (\pi_0, \pi_1, \ldots, \pi_{H-1}) where each \pi_t : \mathcal{S} \to \Delta(\mathcal{A}) maps states to probability distributions over actions at time t. Here \Delta(\mathcal{A}) denotes the (|\mathcal{A}|-1)-dimensional probability simplex: \displaystyle \Delta(\mathcal{A}) = \left\{ \mathbf{p} \in \mathbb{R}^{|\mathcal{A}|} \ \Big|\ p_a \geq 0 \ \forall\, a \in \mathcal{A}, \quad \sum_{a \in \mathcal{A}} p_a = 1 \right\}

Definition (Non-stationary deterministic policy). A policy \pi is deterministic if, for every t and s, \pi_t(s) places all probability mass on a single action—equivalently, \pi_t(s) is a vertex of \Delta(\mathcal{A}).

The space of all non-stationary stochastic policies is the Cartesian product \displaystyle \Pi_{\mathrm{all}} = \prod_{t=0}^{H-1} \prod_{s \in \mathcal{S}} \Delta(\mathcal{A}), a continuous, uncountably infinite set of dimension H \cdot |\mathcal{S}| \cdot (|\mathcal{A}|-1).

2.3 The Gridworld Example

Example. Consider a gridworld state s with four actions: Up, Down, Left, Right. At time step t, a stochastic policy might assign \displaystyle \pi_t(s) = (0,\; 0.5,\; 0,\; 0.5), meaning: never go Up or Left, and flip a fair coin to decide between Down and Right. The space of all possible policies at this single state and time step is the 3-simplex—the tetrahedron in \mathbb{R}^4 whose vertices are the four pure actions (1,0,0,0), (0,1,0,0), (0,0,1,0), and (0,0,0,1). At a different time step t', the policy at the same state s may assign an entirely different distribution—this is the non-stationary nature of finite-horizon policies. The number of remaining steps influences the optimal action: with many steps remaining, an exploratory detour may be worthwhile; with one step left, only the immediate reward matters.
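A quick numerical illustration of the example above — checking that (0, 0.5, 0, 0.5) lies in \Delta(\mathcal{A}) and that sampling from it never yields Up or Left (the action names are the gridworld's; the script itself is just a sketch):

```python
import random

actions = ["Up", "Down", "Left", "Right"]
p = [0.0, 0.5, 0.0, 0.5]                 # pi_t(s) from the example

# Membership in the simplex Delta(A): non-negative weights summing to 1.
assert all(w >= 0.0 for w in p) and abs(sum(p) - 1.0) < 1e-12

rng = random.Random(0)
draws = rng.choices(actions, weights=p, k=1000)
counts = {a: draws.count(a) for a in actions}
print(counts)                            # Up and Left are never sampled
```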

2.4 The State-Value Function

Under a fixed non-stationary policy \pi, the value of state s at time t is the expected discounted cumulative reward from s over the remaining horizon: \displaystyle V_t^\pi(s) = \mathbb{E}_\pi\!\left[ \sum_{k=0}^{H-1-t} \gamma^k \, R_{t+k}(s_{t+k}, a_{t+k}) \ \Big|\ s_t = s \right]

At the terminal time t = H, no further decisions are made, so the boundary condition is: \displaystyle V_H^\pi(s) = 0 \qquad \forall\, s \in \mathcal{S}

The optimal value function at time t is defined by optimizing over all non-stationary stochastic policies: \displaystyle V_t^*(s) = \sup_{\pi \in \Pi_{\mathrm{all}}} V_t^\pi(s)

On its face, this is an optimization over the product of simplices—a space with H \cdot |\mathcal{S}| \cdot (|\mathcal{A}|-1) continuous dimensions. We now show that this optimization collapses dramatically.

3. The Stochastic Collapse: Why Deterministic Policies Suffice

3.1 The Bellman Optimality Equation (Finite Horizon)

The key structural result is the Bellman optimality equation for the finite-horizon setting. With the boundary condition V_H^*(s) = 0 for all s, the optimal value function satisfies, for t = H-1, H-2, \ldots, 0: \displaystyle V_t^*(s) = \max_{a \in \mathcal{A}} \left[ R_t(s,a) + \gamma \sum_{s' \in \mathcal{S}} P_t(s' \mid s, a)\, V_{t+1}^*(s') \right] Notice the operator is \max over the finite action set, not an optimization over probability distributions. This is not assumed—it is derived. We now show why.

3.2 The Linearity Argument

Define the action-value function (or Q-function) at time t under optimality: \displaystyle Q_t^*(s,a) = R_t(s,a) + \gamma \sum_{s' \in \mathcal{S}} P_t(s' \mid s, a)\, V_{t+1}^*(s')

Under a stochastic policy \pi_t(\cdot \mid s) \in \Delta(\mathcal{A}), the value at state s and time t is the expectation over actions: \displaystyle V_t^{\pi}(s) = \sum_{a \in \mathcal{A}} \pi_t(a \mid s)\, Q_t^*(s,a)

This is a linear function of the policy weights \pi_t(\cdot \mid s) \in \Delta(\mathcal{A}). We are maximizing a linear functional over a convex polytope (the simplex). By a fundamental result in linear programming—indeed, by the same geometric argument that underlies the simplex method:

Proposition (Vertex optimality on the simplex). Let f(\mathbf{p}) = \sum_{a} p_a c_a be a linear function on the simplex \Delta(\mathcal{A}) with c_a \in \mathbb{R}. Then \displaystyle \max_{\mathbf{p} \in \Delta(\mathcal{A})} f(\mathbf{p}) = \max_{a \in \mathcal{A}} \; c_a, and the maximum is attained at the vertex \mathbf{e}_{a^*} where a^* = \arg\max_{a} c_a.

Proof. Since \mathbf{p} \in \Delta(\mathcal{A}), we have p_a \geq 0 and \sum_a p_a = 1. Let a^* = \arg\max_a c_a. Then \displaystyle \sum_a p_a c_a \leq \sum_a p_a c_{a^*} = c_{a^*} \cdot \sum_a p_a = c_{a^*}, with equality when p_{a^*} = 1. \square

Applying the Proposition with c_a = Q_t^*(s,a): \displaystyle \max_{\pi_t(\cdot|s) \in \Delta(\mathcal{A})} \sum_a \pi_t(a|s)\, Q_t^*(s,a) = \max_{a \in \mathcal{A}}\; Q_t^*(s,a)

This holds at every state s and every time step t, independently. Therefore the optimal policy is deterministic and non-stationary: \displaystyle \pi_t^*(s) = \arg\max_{a \in \mathcal{A}}\; Q_t^*(s,a), \qquad t = 0, 1, \ldots, H-1
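The Vertex Optimality Proposition can be sanity-checked numerically: random search over the simplex never beats the best vertex. The Q-values below are arbitrary illustrative numbers:

```python
import random

Q = [1.3, -0.2, 2.7, 0.9]          # c_a = Q_t^*(s, a) for four actions

def f(p):                           # linear objective sum_a p_a c_a
    return sum(pa * ca for pa, ca in zip(p, Q))

vertex_max = max(Q)                 # value at the best vertex e_{a*}

rng = random.Random(0)
best_interior = -float("inf")
for _ in range(10000):
    w = [rng.random() for _ in Q]   # random point, normalized onto the simplex
    z = sum(w)
    best_interior = max(best_interior, f([x / z for x in w]))

print(vertex_max, best_interior)    # interior search never exceeds max_a c_a
assert best_interior <= vertex_max + 1e-12
```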

3.3 The Collapse Summarized

Theorem (Deterministic optimality in finite-horizon MDPs). For any finite-horizon MDP \langle \mathcal{S}, \mathcal{A}, P, R, H, \gamma \rangle with \gamma \in (0,1], there exists a deterministic non-stationary policy \pi^* = (\pi_0^*, \pi_1^*, \ldots, \pi_{H-1}^*) that is optimal: \displaystyle V_t^{\pi^*}(s) = V_t^*(s) = \sup_{\pi \in \Pi_{\mathrm{all}}} V_t^\pi(s) \qquad \forall\, s \in \mathcal{S}, \quad t = 0, 1, \ldots, H Moreover, \pi_t^*(s) = \arg\max_{a \in \mathcal{A}} Q_t^*(s,a) for each t and s. This is the “stochastic collapse”: the optimization over \Pi_{\mathrm{all}}—an uncountable product of simplices with H \cdot |\mathcal{S}| \cdot (|\mathcal{A}|-1) continuous dimensions—reduces to H \cdot |\mathcal{S}| independent \arg\max operations, each over the finite set \mathcal{A}.

3.4 Computational Significance

Without the collapse, finding an optimal policy would require searching over the continuous manifold \Pi_{\mathrm{all}}. With the collapse, the search space reduces to |\mathcal{A}|^{H \cdot |\mathcal{S}|} deterministic non-stationary policies—a finite (though exponentially large) set. More importantly, as we show in the next section, backward induction avoids enumerating even this finite set.

4. Computing the Optimum: Backward Induction

The Bellman optimality equation, combined with the boundary condition V_H^*(s) = 0, yields a backward induction algorithm that computes V_t^* and \pi_t^* exactly.

Algorithm: Backward Induction (Finite-Horizon Value Iteration)

  1. Initialize: Set V_H(s) = 0 for all s \in \mathcal{S}.
  2. For t = H-1, H-2, \ldots, 0:
    1. For each s \in \mathcal{S} and a \in \mathcal{A}, compute: \displaystyle Q_t(s,a) = R_t(s,a) + \gamma \sum_{s' \in \mathcal{S}} P_t(s' \mid s,a)\, V_{t+1}(s')
    2. For each s \in \mathcal{S}, set: \displaystyle V_t(s) = \max_{a \in \mathcal{A}}\, Q_t(s,a), \qquad \pi_t^*(s) = \arg\max_{a \in \mathcal{A}}\, Q_t(s,a)
  3. Return \{V_t^*\}_{t=0}^{H} and \{\pi_t^*\}_{t=0}^{H-1}.

Theorem (Correctness of backward induction). Backward induction computes the optimal value function V_t^* and an optimal deterministic non-stationary policy \pi^* for any finite-horizon MDP. No initialization choice is required beyond the boundary condition V_H^* = 0, which is given by the problem definition.

Proof. By (backward) induction on t. The base case t = H is the boundary condition. For the inductive step, assume V_{t+1}^* has been computed correctly. Then Q_t^*(s,a) is correct, and by the Deterministic Optimality Theorem—itself a consequence of the Vertex Optimality Proposition—the \max over actions yields V_t^*(s). \square

Complexity. Each time step requires computing Q_t(s,a) for all |\mathcal{S}| \cdot |\mathcal{A}| state-action pairs, and each Q_t(s,a) involves a sum over |\mathcal{S}| successor states. The total complexity is: \displaystyle O\bigl(H \cdot |\mathcal{S}|^2 \cdot |\mathcal{A}|\bigr)

This is polynomial in the tabular quantities H, |\mathcal{S}|, and |\mathcal{A}|—a dramatic improvement over enumerating |\mathcal{A}|^{H \cdot |\mathcal{S}|} policies.
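The algorithm transcribes directly into code; the following sketch runs backward induction on an illustrative two-state, two-action MDP (all numbers invented for the example):

```python
# Backward induction on a toy time-homogeneous MDP. Complexity is
# O(H * |S|^2 * |A|), matching the bound stated in the text.
S, A, H, gamma = [0, 1], [0, 1], 4, 1.0

# P[s][a] = distribution over next states; R[s][a] = immediate reward.
P = {0: {0: [1.0, 0.0], 1: [0.2, 0.8]},
     1: {0: [0.0, 1.0], 1: [0.9, 0.1]}}
R = {0: {0: 0.0, 1: 0.0}, 1: {0: 1.0, 1: 0.0}}

V = {H: {s: 0.0 for s in S}}                 # boundary condition V_H = 0
pi = {}
for t in range(H - 1, -1, -1):               # t = H-1, ..., 0
    V[t], pi[t] = {}, {}
    for s in S:
        Q = {a: R[s][a] + gamma * sum(P[s][a][sp] * V[t + 1][sp] for sp in S)
             for a in A}
        pi[t][s] = max(Q, key=Q.get)         # greedy deterministic action
        V[t][s] = Q[pi[t][s]]

print(V[0], pi[0])
```

Note that the computed optimal policy is deterministic at every (t, s), exactly as the stochastic collapse predicts, and non-stationary in general: pi[t][s] may differ across t for the same s.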

5. Connection to the Infinite-Horizon Discounted Setting

A parallel and widely-studied formulation is the infinite-horizon discounted MDP \langle \mathcal{S}, \mathcal{A}, P, R, \gamma \rangle with \gamma \in [0,1), time-homogeneous dynamics, and no terminal time. Here H is absent from the tuple; instead, the strict discount \gamma < 1 ensures that the infinite sum of rewards converges. The value function under a stationary policy \pi: \mathcal{S} \to \Delta(\mathcal{A}) is: \displaystyle V^\pi(s) = \mathbb{E}_\pi\!\left[ \sum_{t=0}^{\infty} \gamma^t R(s_t, a_t) \;\Big|\; s_0 = s \right] The Bellman optimality equation in this setting is: \displaystyle V^*(s) = \max_{a \in \mathcal{A}}\left[ R(s,a) + \gamma \sum_{s'} P(s'|s,a)\, V^*(s') \right] and the same linearity argument (Vertex Optimality Proposition) guarantees the existence of a deterministic stationary optimal policy.

5.1 Value Iteration and the Banach Contraction Mapping Theorem

In the infinite-horizon setting, there is no terminal time from which to work backward. Instead, one defines the Bellman optimality operator T^* acting on value functions: \displaystyle (T^* V)(s) = \max_{a \in \mathcal{A}}\left[ R(s,a) + \gamma \sum_{s'} P(s'|s,a)\, V(s') \right] and applies it iteratively: V_{k+1} = T^* V_k, starting from an arbitrary initialization V_0. Convergence is guaranteed by the following classical result.

Theorem (Banach Fixed-Point Theorem). Let (X, d) be a complete metric space and T: X \to X a contraction mapping, i.e., there exists \gamma \in [0,1) such that \displaystyle d\bigl(T(x),\, T(y)\bigr) \leq \gamma\, d(x,y) \qquad \forall\, x, y \in X Then: (i) T has a unique fixed point x^* \in X; (ii) for any x_0 \in X, the sequence x_{k+1} = T(x_k) converges to x^*; (iii) the rate of convergence satisfies \displaystyle d(x_k, x^*) \leq \frac{\gamma^k}{1-\gamma}\, d(x_1, x_0)

The space of bounded value functions V: \mathcal{S} \to \mathbb{R} equipped with the supremum norm \|V\|_\infty = \max_s |V(s)| forms a complete metric space (a finite-dimensional Banach space). The operator T^* is a \gamma-contraction under this norm: \displaystyle \|T^* V - T^* U\|_\infty \leq \gamma\, \|V - U\|_\infty which follows from the non-expansiveness of the \max operator and the \gamma-scaling of the future terms. By the Banach theorem: (i) V^* is the unique fixed point of T^*; (ii) value iteration converges to V^* from any initialization V_0; (iii) convergence is geometric with rate \gamma, yielding an \epsilon-approximate solution in O\!\bigl(\frac{1}{1-\gamma}\log\frac{1}{\epsilon(1-\gamma)}\bigr) iterations. Each iteration has complexity O(|\mathcal{S}|^2 |\mathcal{A}|), giving a total complexity of: \displaystyle O\!\left( \frac{|\mathcal{S}|^2\,|\mathcal{A}|}{1-\gamma}\, \log\frac{1}{\epsilon(1-\gamma)} \right)

Remark (Finite vs. infinite horizon). The two settings are related but structurally distinct. In the finite-horizon case, convergence is trivial—backward induction terminates in exactly H steps from the known boundary V_H^* = 0. In the infinite-horizon case, there is no boundary condition; convergence must be proved, and the Banach theorem provides this guarantee. One can embed a finite-horizon problem into an infinite-horizon one by augmenting the state space with a time index and introducing an absorbing terminal state, but this is an encoding device rather than a natural subsumption.

An alternative exact method for the infinite-horizon setting, Policy Iteration (Howard, 1960), alternates between policy evaluation (solving a |\mathcal{S}| \times |\mathcal{S}| linear system) and greedy policy improvement. It converges in finitely many iterations (at most |\mathcal{A}|^{|\mathcal{S}|}, though typically far fewer), with each iteration costing O(|\mathcal{S}|^3) for the linear solve.
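Value iteration and the contraction property can likewise be checked empirically — iterating T^* from two different initializations and verifying that their sup-norm distance contracts by at least \gamma at every step (the toy MDP is again illustrative):

```python
# Value iteration for the infinite-horizon discounted setting, with an
# empirical check of the gamma-contraction of T* in the sup norm.
S, A, gamma = [0, 1], [0, 1], 0.9
P = {0: {0: [1.0, 0.0], 1: [0.2, 0.8]},
     1: {0: [0.0, 1.0], 1: [0.9, 0.1]}}
R = {0: {0: 0.0, 1: 0.0}, 1: {0: 1.0, 1: 0.0}}

def T(V):                                    # Bellman optimality operator T*
    return [max(R[s][a] + gamma * sum(P[s][a][sp] * V[sp] for sp in S)
                for a in A) for s in S]

def sup_dist(U, V):                          # sup-norm distance ||U - V||_inf
    return max(abs(u - v) for u, v in zip(U, V))

V, U = [0.0, 0.0], [5.0, -3.0]               # two arbitrary initializations
for k in range(200):
    # Contraction: the distance between iterates shrinks by a factor >= gamma.
    assert sup_dist(T(V), T(U)) <= gamma * sup_dist(U, V) + 1e-12
    V, U = T(V), T(U)

print(V)  # both sequences converge to the unique fixed point V*
assert sup_dist(V, U) < 1e-8
```

Both runs land on the same V*, illustrating clause (ii) of the Banach theorem: convergence from any initialization.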

6. Connection to Game-Theoretic Equilibria

The vertex-of-the-simplex phenomenon has a notable parallel in non-cooperative game theory. Nash’s Existence Theorem guarantees the existence of mixed-strategy equilibria in finite games via Kakutani’s fixed-point theorem applied to best-response correspondences over strategy simplices. In certain structured games—including two-player zero-sum games (the minimax theorem, von Neumann 1928)—the equilibrium may be achieved in pure strategies. The analogy is instructive but the mechanisms differ. In single-agent MDPs (both finite and infinite horizon), the optimality of deterministic policies follows from the linearity of the objective in the policy at each state and time step, combined with the simplex geometry. It is a consequence of LP duality, not a fixed-point existence argument. In multi-agent games, pure-strategy equilibria are not guaranteed in general. The minimax theorem for zero-sum games relies on saddle-point structure, and Nash equilibria in general games may require genuinely mixed strategies. The single-agent MDP is thus a more favorable setting: the linearity structure is stronger than what multi-agent interactions provide, and deterministic optima are guaranteed rather than merely possible.

7. The Curse of Dimensionality: Why Exact Methods Fail at Scale

Both backward induction and infinite-horizon value iteration are tabular methods: they maintain an explicit table of values for every state (and possibly every state-action pair) at every time step. Their per-iteration complexity is polynomial in |\mathcal{S}|—but |\mathcal{S}| itself is the problem. In real-world applications, the state space grows combinatorially or continuously:

Autonomous driving. The state includes the ego vehicle’s position, velocity, and heading; the positions and velocities of dozens of surrounding agents; lane geometry; traffic signal states; weather conditions; and sensor uncertainty. Even a coarse discretization yields state spaces of 10^{20} or more. The horizon H can span hundreds of decision steps at 10 Hz control frequency. Storing a single floating-point value per state per time step is physically impossible, let alone iterating over them. The complexity bound O(H \cdot |\mathcal{S}|^2 \cdot |\mathcal{A}|) becomes vacuous.

LLM reasoning (DeepSeek-R1 and GRPO). DeepSeek-R1 uses reinforcement learning—specifically Group Relative Policy Optimization—to train a large language model to perform multi-step reasoning. Here, a “state” at time t is the entire sequence of tokens generated so far (the partial chain of thought), and an “action” is the next token from a vocabulary of ~100,000. The state space is the set of all possible token sequences up to the context length—a number that dwarfs any astronomical quantity. The horizon H is the maximum generation length (often thousands of tokens). The value function (estimating the quality of a partial reasoning chain) cannot be tabulated; it must be approximated, which is precisely the role the neural network plays. In this setting, the non-stationary nature of the finite-horizon formulation is particularly natural: the quality of a token choice depends critically on how many reasoning steps remain.

7.1 The Approximation Paradigm

The modern response to the curse of dimensionality is function approximation: rather than storing V_t(s) or Q_t(s,a) in a table, represent them with a parameterized function—a neural network, a linear combination of features, or another compact representation: \displaystyle V_\theta(s, t) \approx V_t^*(s) This introduces both power and peril. Deep Q-Networks (DQN, Mnih et al., 2015) demonstrated that neural networks could approximate Q^* well enough to achieve superhuman Atari play directly from pixels. Proximal Policy Optimization (PPO, Schulman et al., 2017) and its descendants optimize parameterized stochastic policies directly via gradient ascent, sidestepping the need to ever represent V^* or Q^* in full. The clean convergence guarantees of backward induction (finite horizon) and the Banach theorem (infinite horizon) no longer apply in their original form—function approximation can cause divergence (the “deadly triad” of off-policy learning, function approximation, and bootstrapping). Ensuring stability in this regime remains one of the central open problems in RL theory.

8. Conclusion

The essence of Bellman optimality is the dramatic simplification hidden inside an apparently intractable optimization problem. The scaling limitations of dynamic programming based methods motivate the need for alternatives based on approximate rather than exact solutions – follow-on articles in this series will introduce approximate methods starting with Deep Q-Learning and then move on to a discussion of policy networks, starting with the so-called “REINFORCE trick” and culminating in a description of the PPO surrogate objective function.


Props to Pivotal for supporting work on my recent internet draft on post quantum KEX from RLWE. Access the IRTF/IETF draft through the link below:

Post Quantum Key Agreement from Learning With Errors Over Rings

Slides from a recent presentation on homomorphic encryption schemes based on approximate GCD, learning with errors (LWE) and ring LWE assumptions (link below):

Fully Homomorphic Encryption from the Ring-LWE Assumption

Slides from a recent talk on secure computation / zero knowledge proofs (click on the link below):

MultiParty Computation : Zero Knowledge Proofs : Oblivious Transfer

The following is a post on the concrete security of symmetric encryption, focused primarily on the work of Bellare, Desai, Jokipii and Rogaway [BDJR1997], which builds on prior work of Goldwasser and Micali on the notions of semantic security and ciphertext indistinguishability in a complexity theoretic setting. The [BDJR1997] work views these concepts in the context of symmetric encryption and presents the additional notions of left-or-right and real-or-random indistinguishability, followed by what are, in all but a single case, security preserving reductions between the notions in the CPA and CCA sense.

Following this, [BDJR1997] provides a concrete security analysis of some of the prevalent modes of operation; of interest to us here are the counter modes. There are two such modes discussed, the CTR and XOR schemes, where the former is a stateful counter mode whereas the latter is a randomized counter mode in which the nonce r \leftarrow \{0,1\}^{l} is sampled uniformly from the set of all l bit strings (where l is the block size of the underlying PRF in bits). Also, since we are considering counter modes that do not require the block cipher decryption operation, we are not strictly required to consider pseudo-random permutations and may treat the underlying cipher (AES) as a PRF, thereby relating the advantage of an adversary attacking the mode of operation to the PRF insecurity (the advantage of attacking AES as a PRF).

The theorem of interest from the paper for AES counter modes we consider is the following (Theorem 11), related to the XOR scheme (randomized counter mode):

\textbf{Adv}^{lor-cpa}_{XOR[F]}(\cdot,t,q_{e},\mu_{e}) \leq 2 \cdot \textbf{Adv}^{prf}_F + \frac{\mu_{e} \cdot (q_{e} -1)} { L \cdot 2^{l}}

In the above expression, \textbf{Adv}^{lor-cpa}_{XOR[F]}(\cdot,t,q_{e},\mu_{e}) is the advantage of an adversary attacking the XOR scheme in the left-or-right distinguishing game with F as the PRF, t relates to the running time of the adversary, q_{e} denotes the total number of queries to the LOR oracle and \mu_{e} relates to the total ciphertext seen by the adversary. Moreover, \textbf{Adv}^{prf}_F, the PRF insecurity, is the advantage of a PRF adversary attacking F (AES, in our case), l is the size of the PRF input in bits (128 in the case of AES) and L is the PRF output length in bits. Under the AES assumption, we take \textbf{Adv}^{prf}_F to be a negligible quantity and seek to understand the advantage conferred to the XOR[F] adversary by the term \mu_{e} \cdot (q_{e} -1) / (L \cdot 2^{l}). Since we expect the PRF input l to be appropriately padded to the block size, we take l to also mean the block size of the PRF in bits. While details around the proof of Theorem 11 are left to the reader, at a high level, it employs a reduction from the pseudo-randomness of the PRF to the security of the mode of operation: an adversary attacking XOR[F] with advantage greater than \textbf{Adv}^{lor-cpa}_{XOR[F]}(\cdot,t,q_{e},\mu_{e}) is used to build a distinguisher that attacks the PRF F with advantage exceeding \textbf{Adv}^{prf}_F, thereby contradicting the assumption of F as a pseudo-random function. (Concretely, the distinguisher D attacking F uses its oracle to implement the scheme and thereby simulates the LOR-CPA oracle for the XOR[F] adversary A. If A is able to successfully determine whether it is in the ‘left’ or ‘right’ game, D guesses that its oracle is the pseudo-random F; otherwise it guesses that its oracle is a random function.)
The term \mu_{e} \cdot (q_{e} -1) / (L \cdot 2^{l}) can be thought of as a bound on the likelihood of an overlap of nonces (or an overlap of counters) across the distinct oracle queries in the distinguishing game – taking n = \mu_{e}/(Lq_{e}) to be the average number of blocks per query, this term bounds the likelihood that \mid r_{i} - r_{j} \mid < n for any two nonces r_{i} and r_{j} used to answer a pair of distinct oracle queries (where we treat r_{i} and r_{j} as positive integers). The proof, which is in the complexity theoretic setting, also leverages Lemma 10, which provides an upper bound on the advantage of an adversary attacking the ideal scheme that uses a random function R rather than the pseudo-random F. We restate Lemma 10 below:

\textbf{Adv}^{lor-cpa}_{XOR[R]}(\cdot,t,q_{e},\mu_{e}) \leq  \frac{\mu_{e} \cdot (q_{e} -1)} { L \cdot 2^{l}}

where \textbf{Adv}^{lor-cpa}_{XOR[R]}(\cdot,t,q_{e},\mu_{e}) is the adversary’s advantage in attacking XOR[R].
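The counter-overlap reading of this bound can be sanity-checked by simulation with a deliberately tiny block size (the parameters below are illustrative, chosen only so the overlap event is observable):

```python
# Monte Carlo check: the probability that two of q random nonces land
# within n counters of each other stays below n * q * (q - 1) / 2^l.
import random

l, q, n, trials = 20, 128, 4, 20000       # tiny l for illustration only
bound = n * q * (q - 1) / 2**l
rng = random.Random(1)

hits = 0
for _ in range(trials):
    r = sorted(rng.randrange(2**l) for _ in range(q))
    # An overlap exists iff some pair of nonces differs by less than n,
    # which (after sorting) is equivalent to some adjacent gap being < n.
    if any(r[i + 1] - r[i] < n for i in range(q - 1)):
        hits += 1

print(hits / trials, bound)               # empirical rate vs. the bound
```

The empirical overlap frequency sits below the bound, consistent with the union-bound flavor of the term: roughly q(q-1)/2 pairs, each colliding with probability on the order of 2n/2^l.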

Assuming \textbf{Adv}^{prf}_F to be negligible (\epsilon) and setting n = \mu_{e}/(Lq_{e}) as the average number of blocks per oracle query, we can restate Theorem 11 as:

\textbf{Adv}^{lor-cpa}_{XOR[F]}(\cdot,t,q_{e},\mu_{e}) \leq 2 \cdot \epsilon + \omega

where \omega \approx (q_{e}^{2}/2^{l}) \cdot n. Theorem 11 therefore states that so long as the underlying PRF is secure, the XOR[F] scheme is secure, and the best an adversary can do to break the scheme in the LOR-CPA sense is the so called “birthday attack”, where the advantage exhibits a quadratic dependence on the number of queries.

 

An Example

 

We now use the bounds derived above to get a sense of the advantage for an AES “randomized” counter-mode. In this case we set l = 128 and assume 2^{32} blocks per query. Importantly, we define the nonce r as r \leftarrow \{0,1\}^{128}. If we wish to confine the adversary’s advantage to 1/2^{32}, what is the maximum number of oracle queries that keeps the advantage at or below this threshold? Putting the numbers into our expression for \omega (above), we see that:

(q_{e}^{2}/2^{128}) \cdot 2^{32} \leq 1/2^{32}

\Rightarrow q_{e}^{2} \leq 2^{64}

\Rightarrow q_{e} \leq 2^{32}

Therefore, in our example, to confine the LOR-CPA advantage below 1/2^{32}, it becomes necessary to rotate the encryption key prior to 2^{32} encryption queries (where each query consists of 2^{32} blocks, the reader may also refer to similar analysis around the “counter-mode theorem” here).

On the other hand, limiting the distinguishing advantage to 1/2^{64} implies q_{e} \leq 2^{16} where once again, each query consists of 2^{32} blocks (or 2^{36} bytes).
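The arithmetic of this example (and the data-volume figures quoted later) can be reproduced with a short script; max_queries is a hypothetical helper name for this post, not anything from [BDJR1997]:

```python
import math

def max_queries(l, n_blocks, adv):
    # omega = q_e^2 * n / 2^l <= adv  =>  q_e <= sqrt(adv * 2^l / n)
    return math.isqrt(int(adv * 2**l / n_blocks))

q32 = max_queries(128, 2**32, 2**-32)     # advantage target 1/2^32
q64 = max_queries(128, 2**32, 2**-64)     # advantage target 1/2^64
print(q32 == 2**32, q64 == 2**16)

# Total data under one key before rotation (16-byte AES blocks):
total_bytes_32 = q32 * 2**32 * 16         # 2^68 bytes
print(total_bytes_32 // 2**60, "EiB")     # 256 exbibytes
```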

 

Encrypting Large Volumes

 

We now turn the discussion to requirements around rotating keys in scenarios where potentially large volumes of data at rest are encrypted – while other sites such as the Kubernetes encrypting data at rest page attempt to provide such guidance by placing bounds on the amount of data encrypted with a fixed key, they fail to do so explicitly from the standpoint of the distinguishing arguments and privacy / forgery bounds presented in [BDJR1997]; this post seeks to motivate limits on key usage from that viewpoint (later on, however, we’ll consider other factors around limiting key usage and lifespan independent of the concrete analysis). The Kubernetes page, for instance, prescribes a limit of 200K (\approx 2^{18}) “writes” for a given key with AES-256 GCM and random nonces, but omits guidance on the total number of blocks ciphered with a given key, a critical factor for determining privacy bounds.

To recap, in order to confine the adversary’s LOR-CPA distinguishing advantage below 1/2^{32} where the nonce r is defined as r \leftarrow \{0,1\}^{128}, the analysis presented above puts a limit on the maximum number of blocks encrypted with a fixed key at 2^{64}, which corresponds to 2.9 \times 10^{20} bytes (or 256 exbibytes). On the other hand, confining the advantage below 1/2^{64} requires rotating the key before 2^{48} block encryptions under a fixed key (or 4 pebibytes). As is typical in the complexity theoretic setting, we take the PRF insecurity to be negligible.

 

Limitations and Ambiguities in the GCM Spec.

 

NIST SP 800-38D limits the message size for a single GCM message to 2^{39} - 256 bits (about 64 GiB) – no explicit analysis around this number in the vein of the [BDJR1997] distinguishing bounds is provided in the NIST publication. We surmise that the stated limit most likely arises from (i) the fact that this would be necessitated by usage patterns with a 96 bit IV, since the counter space in this construction is limited to 2^{32}, after which the counter would wrap around and return to its initial value (also refer to the later section that discusses “nonce counter-modes”), and, though this is a less likely rationale, (ii) the fact that GCM is vulnerable to hash key recovery and authentication tag forgery for short tags. Moreover, it has been recognized that there are a number of ambiguous statements in the GCM spec around the maximum invocations of the authenticated encryption function, to wit the statement that “the total number of invocations of the authenticated encryption function shall not exceed 2^{32}, including all IV lengths and all instances of the authenticated encryption function with the given key”. Since the authenticated encryption function is used for both encryption and decryption purposes, it seems unclear whether this guidance seeks to limit the total amount of data encrypted (for all IV lengths) with a fixed key, or whether it truly seeks to limit the total number of encryption and decryption operations for a given key. The following statement is an attempt at clarification by Phillip Rogaway, who relates an interpretation of this statement by David McGrew from personal communications (2/2011) as implying that “the total number of invocations of the authenticated encryption function shall not exceed 2^{32}, including [invocations with] all IV lengths and all instance[s] of the authenticated encryption function with the given key.”

 

Guidance on CCM Parameters

 

As an alternative to GCM, the CCM authenticated encryption mode could be considered for encrypting larger volumes of data, owing to its larger specified privacy bounds (which we discuss below). Unlike GCM, CCM does not offer the ability to compute the MAC in parallel; however, we do not view this as a concern in an offline encryption / decryption scenario. Also, CCM is not on-line in the sense that the length of the message must be known beforehand. On the other hand, CCM, like GCM, is a NIST approved mode of operation, which is of particular benefit in environments that require a high level of review. Moreover, from a specification viewpoint, CCM does not have an explicit \approx 64 GiB limitation per message as is the case with GCM (along with GCM’s apparent 64 GiB limit on the total data encrypted with a given key) – in theory, these facts should give us greater flexibility in specifying maximal or average message sizes and in viewing security bounds around key rotation in terms of the overall amount of data encrypted and the average size of each message, based on [BDJR1997] type arguments.

 

It is important to note that in the [BDJR1997] description of the XOR[F] scheme, the nonce, specified as r \leftarrow \{0,1\}^{l}, is drawn uniformly from the set of all l bit strings, where l is the block size, whereas the CCM spec allows for a variable nonce size ranging from 56 to 104 bits. In other parts of the literature, the XOR[F] approach is often called “randomized counter-mode”, whereas the approach of segmenting the block into two substrings, one for the nonce and another for the counter, is sometimes referred to as “nonce counter-mode”.

The length of the nonce is specified by the parameter ‘n’, and the spec allows this to be 7, 8, 9, 10, 11, 12, or 13 bytes. The plaintext length in bytes, denoted ‘p’, is formatted as an octet string denoted Q in the first input block. The length of Q in bytes is denoted q, where q can take on the values 2, 3, 4, 5, 6, 7, 8 subject to the constraint n + q = 15. The value of q determines the maximum size of a given message, which is 2^{8q} bytes. The byte length n of the nonce in turn determines the maximum number of CCM invocations for a given key, which is given by 2^{8n}. The condition n + q = 15 therefore reflects a tradeoff between the maximum number of CCM invocations for a given key and the maximum size of the payload per invocation. If CCM is used, this guidance specifies the following CCM parameters –

q = 5

This parameter allows message sizes up to 2^{8q} = 2^{40} bytes, or 1 tebibyte (\approx 1.1 terabytes). From the constraint n + q = 15, it follows that the size of the nonce in bytes is:

n = 10

Finally we specify the length of the tag in octets as:

t = 16

We recommend a key size of 256 bits.
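The n + q = 15 tradeoff behind these choices can be tabulated with a short script (an illustrative sketch, not part of the spec):

```python
# Tabulate the CCM tradeoff n + q = 15: 2^{8q} bytes of payload per
# invocation versus a nonce space of 2^{8n} distinct values per key.
for q in range(2, 9):
    n = 15 - q
    print(f"q={q} n={n}: max message 2^{8*q} bytes, nonce space 2^{8*n}")

# The parameters specified above:
q, n = 5, 10
assert n + q == 15
print(2 ** (8 * q) == 2**40)  # True: 1 TiB maximum message size
```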

Rationale

Why do we select these parameters? The selection is primarily driven by the need to design a method that aids ease of implementation while remaining secure. Though implementors will chunk inputs well under the 1.1 TB limit owing to buffering limits (see the later section discussing the OpenSSL CCM API), they are theoretically not required to do so with the specified parameters. Another goal of the specified parameters is to make the nonce space large – the rationale being that we cannot control how implementors will generate nonces. The privacy analysis of CCM assumes a nonce-respecting adversary that does not repeat nonces across invocations. In practice, however, since the nonce must be supplied by the user as input to the scheme, we are not entirely confident that users will be nonce-respecting; they may randomly sample nonces. Given this, we think it prudent to specify a larger nonce space to reduce the likelihood of collisions, while limiting the number of distinct messages encrypted with a given key (and at the same time discouraging the use of random nonces). For illustration purposes, we will then consider the overall privacy bounds for the parameters specified above by limiting the number of distinct messages encrypted under a given key to 2^{8} (albeit under the nonce-respecting assumption). If the freshness of nonces cannot be guaranteed by the implementation, implementors may consider using the key just once to encrypt a single large data artifact with this method.
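To put a number on the collision risk if a user does sample nonces at random, the standard birthday approximation gives, for 2^{8} messages in the 80-bit nonce space specified above (a sketch; the helper name is illustrative):

```python
import math

# Birthday approximation for the probability that at least two of
# k = 2^k_log2 randomly sampled nonces collide in an s-bit space:
#   p ~ k*(k-1) / 2^(s+1)
def nonce_collision_log2(k_log2: int, nonce_bits: int) -> float:
    k = 2 ** k_log2
    return math.log2(k * (k - 1)) - (nonce_bits + 1)

# 2^8 messages with random 80-bit nonces (the n = 10 parameter above):
print(nonce_collision_log2(8, 80))  # ~ -65, i.e. p ~ 2^-65
```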

While we have thus far discussed limits on key usage from a concrete security standpoint, there are several reasons to rotate keys before such data limits are reached, significant among them being leakage of all or part of the key – fragmenting data into smaller portions and generating distinct data encrypting keys per fragment limits the amount of information leaked in the event that a single key is compromised. While implementors should consider these facts and may refer to extant practice on this subject, we once more emphasize that the primary goal of this exercise is to discuss actual limits arising from the concrete privacy bounds.

Returning to our parameters, the maximum number of block cipher calls for a given key is 2^{8} \cdot 2^{40 - 4} = 2^{44}, which is well under the limit of 2^{61} invocations for the lifetime of a key stated in the spec. Moreover, it is typical practice to assign a so-called “cryptoperiod” to limit exposure of data based on a number of risk factors; for more information, the reader is referred to section 5 of NIST Special Publication 800-57 Part 1 Revision 4, Recommendation for Key Management. Considering the guidance above, shortening cryptoperiods should be considered as part of a broader enterprise security strategy around establishing good key hygiene – there has been significant movement on this subject, with platform vendors actively seeking to reduce friction associated with key rotation through the use of automation; the reader is referred to the post The Three Rs of Enterprise Security: Rotate, Repave, and Repair for views on emerging industry practices.
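The 2^{44} figure follows directly from the parameters (a sketch that follows the counting in the text, i.e. one block cipher call per 16-byte block):

```python
# Block cipher calls for a given key under the chosen parameters:
# 2^8 messages, each at most 2^40 bytes = 2^36 blocks of 16 bytes.
messages_log2 = 8
blocks_per_message_log2 = 40 - 4   # 2^40 bytes / 2^4 bytes per block
total_calls_log2 = messages_log2 + blocks_per_message_log2

print(total_calls_log2)            # 44
print(total_calls_log2 < 61)       # True: under the spec's lifetime limit
```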

Key Wrapping

As mentioned, there are compelling reasons to fragment data, encrypt it, and then spread it across different physical volumes. In this case, it can be beneficial to locate the data encryption key near or along with the data being encrypted. For these scenarios, we recommend key wrapping of the data encrypting key with AES-256 GCM (refer to internal Pivotal standards for these ciphers). Ideally, the key encrypting key should be stored in an HSM. We do not currently advocate the use of the key wrapping mechanisms in NIST Special Publication 800-38F, “Recommendation for Block Cipher Modes of Operation: Methods for Key Wrapping”.

Appendix A: CCM Privacy Bounds

Here, we relay the concrete privacy bounds provided by Jakob Jonsson, which are stated as:

\textbf{Adv}^{\textbf{priv}}_{CCM}(A) \leq \textbf{Adv}^{prp}_{E}(B) + l_{E}^{2}/2^{n}

where \textbf{Adv}^{\textbf{priv}}_{CCM}(A) is the advantage of a CCM attacker A. This is in turn related to the advantage of an adversary B attacking the block cipher (AES) E as a PRP (through the switching lemma), plus the birthday term l_{E}^{2}/2^{n}, which exhibits quadratic dependence on the number of queries l_{E}, where n is the block size (128). Given that the standard mandates use of the PRP for no more than l_{E} = 2^{61} calls, the bound yields a maximal advantage of 2^{122}/2^{128} = 2^{-6} \approx 0.016 in addition to the PRP insecurity for a given key. Our earlier example (refer to the section entitled “Rationale”), limiting the number of distinct calls to 2^{8} with a maximal message size of 2^{40} bytes, on the other hand yields the much smaller advantage 2^{88}/2^{128} = 2^{-40} (plus the PRP insecurity).
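The two advantage figures quoted above can be reproduced by evaluating the birthday term directly (a sketch; the helper name is illustrative):

```python
# Evaluate the birthday term l_E^2 / 2^n from Jonsson's CCM privacy
# bound, with n = 128 the block size and l_E = 2^(l_E_log2) calls.
def birthday_term(l_E_log2: int, n: int = 128) -> float:
    return 2.0 ** (2 * l_E_log2 - n)

print(birthday_term(61))  # 0.015625 = 2^-6, i.e. ~0.016
print(birthday_term(44))  # 2^-40, for the parameters in the Rationale
```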

 

 

This post discusses the security of the NIST prime curves (i) given the compromised status of Dual_EC_DRBG, and (ii) in light of recent announcements from the NSA’s IAD, which can be found here:

https://www.nsa.gov/ia/programs/suiteb_cryptography/

Interestingly, the agency has also stated that “With respect to IAD customers using large, unclassified PKI systems, remaining at 112 bits of security (i.e. 2048-bit RSA) may be preferable (or sometimes necessary due to budget constraints) for the near-term in anticipation of deploying quantum resistant asymmetric algorithms upon their first availability”.

The intent of this post is to question some theories around the deliberate specification of weak curves via selection of “weak seed” inputs to the SHA-1 function as part of the ANSI / NIST curve generation process. The reasoning presented expands on implied requirements around the number of weak curves that must exist over 256 bit prime fields in order for such theories to hold.

Weak Curves:

It is widely known that certain classes of “weak curves” exist for which the standard ECDLP is reduced to a simpler problem. For example, attacks on curves which leverage pairings are documented in the literature. Such attacks leverage bilinear maps \langle \cdot,\cdot\rangle: \mathbb{G}_{1}\times \mathbb{G}_{2} \rightarrow \mathbb{J} from abelian groups \mathbb{G}_{1} and \mathbb{G}_{2} that evaluate to elements in some other abelian group \mathbb{J}. For the Weil and Tate pairings, we map from groups in E(\mathbb{F}_{p^{k}}) to a multiplicative group \mathbb{F}^{*}_{p^{k}}, where \mathbb{F}_{p^{k}} is some finite extension of \mathbb{F}_{p} with k > 1. Pairings, when applied to so-called supersingular curves with embedding degree less than 6, reduce standard ECDLP to the subexponential DLP in multiplicative groups. Another case of weak curves is the so-called prime field anomalous curves with \sharp E(\mathbb{F}_{p}) = p, though in general we don’t expect such an equality to hold (the so-called Hasse bound tells us that the most \sharp E(\mathbb{F}_{p}) can differ from p + 1 is by the trace of Frobenius t, where \mid t \mid \le 2 \sqrt{p}).

Koblitz and Menezes (KM2015):

For the rest of this post, we’ll dissect parts of a post-Snowden defense of ECC authored by Koblitz and Menezes, two mathematicians of high repute (Koblitz in particular is the co-inventor of elliptic curve cryptography, author of the landmark text “A Course in Number Theory and Cryptography”, and a noted sceptic of reductionist methods / provable security). The ECC defense is entitled “A Riddle Wrapped in an Enigma”, published here –

https://eprint.iacr.org/2015/1018.pdf

The line of thought we’re concerned with traces the lineage of the NIST prime curves to NSA involvement in the ANSI X9F1 committee. The exact nature of NSA involvement is not well outlined in the paper, but the implication is that the NSA was willing to generate curves on behalf of the ANSI effort and, at the time, had the technology to enumerate the orders of the resulting groups.

In order to ensure that known weak curves (known to the NSA, that is) were not selected, it was decided that curve coefficients be selected by passing a seed to the SHA-1 function. The security of this method would be assured by the fact that it is hard to invert SHA-1, i.e., in the event that the NSA was aware of certain weak curves, it would be hard to find the SHA-1 pre-image that would result in the required coefficients for the weak curve.

So this brings us to one of the primary arguments against the security of the prime curves, centered on the provenance / legitimacy of the SHA-1 seeds used for curve generation. In the vein of classic conspiracy theories, the argument goes that the NSA could have known about “many” weak curves and repeatedly picked random seeds until one of them resulted in a weak curve. Koblitz and Menezes in the referenced paper refute this line of thought by arguing that in order to find a seed for a weak prime curve in 2^{48} SHA-1 trials, there would have to be approximately 2^{209} weak curves over 256 bit prime fields. The argument is as follows:

For a field \mathbb{F}_{p}, the number of “isomorphism classes” of curves over the field is about 2p. Therefore, for a 256 bit prime there are about 2^{257} isomorphism classes of elliptic curves over \mathbb{F}_{p}. Also, for a given curve, we would prefer \sharp E(\mathbb{F}_{p}) to be as close to a prime as possible, since the complexity of ECDLP is governed by the largest prime subgroup of E(\mathbb{F}_{p}) – put another way, an attacker can use the group order to solve ECDLP in subgroups whose orders are co-prime and then map back to the full group through the Chinese Remainder Theorem (CRT) – it is therefore important to know the group order, which circa 1997 was purportedly a computation only within reach of the NSA – Koblitz and Menezes (KM2015) imply this as one of the reasons behind the involvement of the NSA in the ANSI process.

With E(\mathbb{F}_{p}) we’re interested primarily in groups of prime order (given the CRT derived weaknesses), so to pick up the analysis of KM2015 from where we left off: for a 256 bit prime p, the prime number theorem tells us that there are around p/\ln p = 2^{256}/\ln(2^{256}) = 2^{256}/(256 \ln 2) \approx 2^{248} primes below p. Put another way, a proportion 2^{-8} of 256 bit numbers are prime; this tells us that the number of strong curves over the field is about 2^{-8} of 2^{257} (since there are 2^{257} isomorphism classes of curves over a 256 bit field), which puts the number of strong curves at about 2^{249}. Now the argument goes that if the NSA knew how to break a proportion 2^{-40} of the strong curves, then there would be 2^{249} / 2^{40} = 2^{209} curves that the NSA could break but that the rest of the world believes to be secure. In this case, 2^{209}/2^{257} = 1/2^{48} is the proportion of total curves that are known to be insecure only to the NSA. So if you go with the math presented, the NSA could have found a known weak curve after about 2^{48} seed input trials to SHA-1.
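The chain of estimates above is easy to reproduce in log2 form (a sketch; helper names are illustrative):

```python
import math

# KM2015 counting argument over a 256-bit prime field, in log2 terms.
field_bits = 256
iso_classes_log2 = field_bits + 1   # ~2p curves -> about 2^257 classes

# Prime number theorem: a fraction ~1/ln(2^256) of 256-bit numbers is
# prime, i.e. about 2^-8 of them.
prime_fraction_log2 = -math.log2(field_bits * math.log(2))
strong_curves_log2 = iso_classes_log2 + prime_fraction_log2  # ~249

# To find a weak-curve seed in 2^T SHA-1 trials, a fraction 2^-T of
# all curves must be weak, i.e. about 2^(257 - T) weak curves.
def weak_curves_needed_log2(trials_log2: int) -> int:
    return iso_classes_log2 - trials_log2

print(weak_curves_needed_log2(48))  # 209
print(weak_curves_needed_log2(80))  # 177
```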

KM2015 then proceed to argue that, for a 256 bit prime, 2^{209} is an extremely large number of weak curves to have gone undetected by the community for 18 years. The argument runs in reverse – assume the NSA could conduct weak-seed searches in 2^{48} work; if we take this to be true, we can then work backward with the prime number theorem etc. to arrive at the number of weak curves required for the NSA to have “plausibly” found seeds producing desired weak curves in 2^{48} trials.

Deep Crack: Bespoke Silicon Circa 1998

Though KM2015 go on to cite other possible reasons for dropping P-256, including adaptations of Bernstein and Lange’s so-called n^{1/3} pre-computation based attacks, our focus here was to understand the capping of possible NSA SHA-1 trials / hashing power at 2^{48}, and the derived requirement of around 2^{209} weak curves. Should such a bound on available hashing power be considered high or low given the state of 1997 technology? Alas, we must draw our own conclusions, but we present one important benchmark around the state of bespoke silicon in that era: the EFF 1998 DES Cracker, colloquially called “Deep Crack”. This was a machine that brute forced the 2^{56} DES key space in a matter of days. Deep Crack was designed and built by Kocher et al. at CRI with 1856 custom ASICs. The cost of the DES cracker is stated at $250,000 in 1998 dollars.

Conclusions:

It is not the intent of the author to promote the view that there are a large number of weak curves, nor that the NSA exploited this fact to pre-determine SHA-1 seeds in order to ensure the selection of weak curves and their eventual standardization through NIST. The situation with weak curves is quite different from that of Dual_EC_DRBG, since in some sense, eventual knowledge of such weak curves would confer an “equal opportunity” advantage on anyone with cryptanalytic ability, rather than bestowing an asymmetric advantage on any one party. Rather, the point of this post is merely to trace the reasoning around the existence of weak curves and provide more insight into quantifying the size of the classes of purported weak curves. If one allows the NSA about 2^{80} hashing power in 1997 rather than 2^{48}, the reasoning outlined above would imply the existence of about 2^{257}/2^{80} = 2^{177} weak curves, which is still a very large number and seems unlikely. In concluding, we echo the remarks of KM2015:

“Although there is no plausible reason to mistrust the NIST curves, there are two reasons why it is nevertheless preferable to use other curves .. The first reason is public perception — even though it is unlikely that the NIST curves are unsafe, there has been enough speculation to the contrary that to keep everybody happy one might as well avoid the NIST curves. The second reason is that the other curves do have some advantages … It is no discredit to the NIST curves that more efficient alternatives are now available — after all, it has been 18 years, and it would be surprising if no one had come up with anything better by now”

Recommended Cryptographic Primitives and TLS Suites

The following is a list of cryptographic primitives and TLS cipher suites. They should be adhered to in order to provide a minimal or “baseline” level of security for software and firmware components. Most of the algorithms cited here are described in standards published by the US National Institute of Standards and Technology (NIST). The list also outlines algorithms and key length requirements that are likely to be encountered in DoD applications, since the requirements for these types of applications are often more stringent than what is permitted by NIST.

 

Cryptographically Secure Pseudo Random Number Generation (CSPRNG)

For generation of IVs, nonces, challenges and salts (These terms are defined in the course of this document). 

1) Hash_DRBG based on SHA256

2) Hash_DRBG based on SHA384 (minimal requirement for DoD applications)

This functionality will normally be provided by your programming language or API (e.g. java.security.SecureRandom, or RAND_bytes() defined in the OpenSSL header rand.h).

In some cases, you may have to use the above random number generator directly for symmetric (AES) key generation – this is generally OK, especially when no other method for generating the AES key is available. Other programming languages will provide you with a direct interface for generating symmetric keys. It is generally not appropriate to use these generators directly for creating RSA, ECC, or Diffie Hellman keys; modern programming language interfaces should provide you with functions for generating this latter class of keys (e.g. RSA_generate_key() defined in the OpenSSL header rsa.h). The above generators are typically seeded with input from a true entropy source, which in the case of Linux is /dev/random (NIST standards also allow one to augment the input from the true entropy source with a “nonce” and a “personalization string”, but the programming interface should typically shield you from these details). Though the periodicity of the above CSPRNGs is extremely high, in some rare cases it may be relevant to re-seed the CSPRNG from the physical source at some point after its instantiation.
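For concreteness, here is how symmetric key material might be drawn from a platform CSPRNG in Python using the standard library’s secrets module (an illustrative sketch; the guidance above is phrased in terms of the Java and OpenSSL interfaces):

```python
import secrets

# Draw symmetric key material from the OS-backed CSPRNG. (The analogous
# interfaces are java.security.SecureRandom in Java and RAND_bytes()
# in OpenSSL.)
aes_256_key = secrets.token_bytes(32)  # 256-bit AES key
iv = secrets.token_bytes(12)           # 96-bit GCM IV
salt = secrets.token_bytes(16)         # 128-bit salt

print(len(aes_256_key), len(iv), len(salt))  # 32 12 16
# As noted above, do not use this generator directly to construct RSA,
# ECC, or Diffie-Hellman keys; use the library's key-generation API.
```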

(Advanced Note:  The draft standard NIST SP 800-90C outlines methods for constructing pseudo-random number generators, also referred to as deterministic random number generators (or “DRBGs”, such as the one listed above), by specifying how to combine entropy sources with a DRBG. We are not currently considering this standard, since at the time of writing, it is still in draft form) 

 

True Random Number Generation (TRNG)

True random number generation (also called non-deterministic random number generation) generally refers to the construction of entropy sources. At the time of writing, we have no recommendations for this other than use of /dev/random, but are tracking development of the draft NIST SP 800-90B standard that specifies multiple TRNG mechanisms. 

 

Symmetric Encryption

  1. AES-128-GCM_16

 

  2. AES-256-GCM_16 (minimal requirement for DoD applications)

 

AES GCM encrypts an input stream and then computes an authentication tag over the ciphertext. The tag length should be 16 bytes. Every distinct message that is encrypted requires that a distinct string of bits, called an IV, is provided as input to the AES ciphering function in addition to the key and the plaintext message. The IV input should be 12 bytes. The AES key should be rotated after a maximum of 2^{32} encryptions. It is critically important to the security of GCM that the IVs used to encrypt two distinct messages under a given key are in turn distinct. There are two methods for constructing IVs for GCM: the first is the so-called deterministic method; the second, the so-called “RBG” method. The RBG method should be used with the secure random number generator listed above. Some programming interfaces may require that you directly use a random number generator to generate the AES key (e.g. RAND_bytes() in the OpenSSL header rand.h); others provide a dedicated interface (e.g. the javax.crypto.KeyGenerator and javax.crypto.SecretKey classes in Java).
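The 2^{32}-encryption rotation limit can be motivated for the RBG method with a birthday estimate on random 96-bit IVs (a sketch; the helper name is illustrative):

```python
# Birthday estimate: the probability that k = 2^(k_log2) random 96-bit
# IVs contain a repeat is roughly k^2 / 2^97.
def iv_collision_log2(k_log2: int, iv_bits: int = 96) -> int:
    return 2 * k_log2 - (iv_bits + 1)

print(iv_collision_log2(32))  # -33: p ~ 2^-33 at the 2^32 limit
```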

 

Digital Signature Algorithms 

  1. RSASSA-PKCS1-v1_5 (as specified in PKCS #1 v2.1)

 

Applications can use the above RSA based signing algorithm with a modulus size of 3072 bits. The RSA public exponent should be 65537. There is an alternative (provably secure) RSA based signature scheme which utilizes probabilistic padding (PSS). We don’t currently include this scheme owing to the greater number and variation of parameters it requires, but it may be included for use in the future.

 

Advanced note on the generation of RSA keys – it is expected that a programming interface is available for the generation of RSA keys (e.g. the function RSA_generate_key() in the OpenSSL header rsa.h). In some cases, the key generation interface may presuppose instantiation of an appropriately seeded random number generator (refer to the discussion on CSPRNGs above). It is generally expected that prime number generation within the programming interface uses an approved method for testing primes, such as the methods specified in the FIPS 186-4 appendices.

 

Hash Functions

  1. SHA-256

 

  2. SHA-384 (minimal requirement for DoD applications)

 

Keyed Message Authentication Codes

In a loose sense, the RSA based digital signature scheme above can be thought of as a way to authenticate and verify the integrity of a message received from the holder of an RSA private key. In a similar vein, a keyed message authentication code (MAC) can be thought of as a way to authenticate and verify the integrity of a message received from the holder of a secret key, though in this case the sender and the receiver must possess the same secret key. The following keyed MACs should be used:

 

  1. HMAC (with SHA256)

 

  2. HMAC (with SHA384; minimal requirement for DoD applications)

 

  3. GMAC

 

HMAC is a hash function based keyed message authentication code. It should be used with a key size of 256 or 384 bits, depending on the size of the hash function output.

 

GMAC is a keyed message authentication function based on the GCM GHASH function. GMAC should minimally be used with a key size of 128 bits. In addition, the IV size should be 96 bits and the output tag length 128 bits. For the generation of keys and IVs related to GMAC, refer to the above discussion on AES-128-GCM_16.

 

Elliptic Curves

The primary reason for providing guidance on elliptic curves is their relevance to key exchange mechanisms within the recommended TLS cipher suites described later. The following NIST “prime” curve should be used:

 

  1. NIST P-384

 

(Note: At the time of writing, this is the only curve currently allowed in DoD environments. The higher strength P-521 curve is generally disallowed) 

 

Diffie Hellman Groups

Once more, the primary reason for providing guidance on Diffie Hellman (DH) groups is their relevance to key exchange mechanisms within the recommended TLS cipher suites described later. It is recommended that only DH groups with a prime modulus of 3072 bits or higher from RFC 3526 (i.e. “MODP group” ids 15 – 18) be used.

 

Asymmetric Encryption

  1. RSA-OAEP with 2048 bit modulus 

 

RSA-OAEP requires a hash and a mask generation function; it should be used with SHA-256 and the mask generation function MGF1.

 

(Note: Analysis on specification of OAEP with SHA-384 for DoD applications is pending) 

 

Password Hashing 

 

  1. Bcrypt with a 128 bit salt and a minimal load factor of 10 for regular users and 12 for privileged users.

 

Password Based Key Derivation  

  1. PBKDF2 – analysis pending

TLS Cipher Suites

The following cipher suites are recommended to secure internal PCF communications. The reader will notice that these cipher suites only utilize the algorithms specified in the preceding sections. We do not currently consider it feasible to constrain cipher suites for external clients.

TLS_DHE_RSA_WITH_AES_128_GCM_SHA256 

TLS_DHE_RSA_WITH_AES_256_GCM_SHA384 (DoD compliant algorithms)

 

TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (DoD compliant algorithms)

 

EC suites should only use the NIST P-384 curve (as per the section above).

DH suites should only use the DH groups specified in the section above. 

RSA signatures should conform to the requirements stated above.

 

(Note: the following facts are currently unknown: (1) whether P-384 is supported with the AES-128 suites – the standards are not clear on this, and it may be implementation dependent; (2) whether DH groups 15 – 18 are supported with the AES-128 suites on all programming stacks. Curve P-384 and, minimally, DH group id 15 should be supported with the AES-256 suites.)

Fully homomorphic encryption has evolved from Craig Gentry’s original work, based on worst case problems in ideal lattices and problems based on sparse subset sums, to newer schemes such as the BGV Ring Learning With Errors (LWE) scheme (based on lattice SVP). This is a scheme where the plaintext space is taken to consist of integer subrings \mathcal{O}_{\mathbb{Q}(\zeta_m)} = \mathbb{A}_p := \mathbb{Z}[X]/\langle \Phi_m(X),p \rangle \cong \mathbb{Z}_p(\zeta_m) of cyclotomic number fields \mathbb{Q}(\zeta_m)/\mathbb{Q}. Of the lattice based schemes, other than the Ring-LWE based BGV scheme (named after Brakerski, Gentry and Vaikuntanathan), there is also the “LATV” scheme of Lopez-Alt, Tromer and Vaikuntanathan, based on the NTRU problem, and Brakerski’s “scale invariant” scheme. Notably, the BGV scheme introduces the novel modulus switching technique that, while ensuring the correctness of the scheme, allows the evaluator to scale down the magnitude of the noise in the ciphertexts without knowing the secret key. But here, we pause and take note of the original noise management technique in Gentry’s work, commonly known as the bootstrapping theorem – it states that efficient fully homomorphic encryption can be constructed from a somewhat homomorphic scheme that “compactly” evaluates its own decryption circuit.

To expand on this notion, consider a ‘somewhat’ homomorphic scheme \varepsilon with plaintext space P = \{0,1\}, a ciphertext \Psi_1 that encrypts a message \pi under a public key pk_1, and a secret key sk_1 associated with pk_1. The key sk_1 is then encrypted under a second public key pk_2. Denote the encryption of the jth bit of sk_1 under pk_2 as \bar{sk_1}_j. Define a ‘re-encryption’ procedure for the ciphertext \Psi_1 as follows –

Recrypt_\varepsilon(pk_2,D_\varepsilon, \langle \bar{sk_1}_j \rangle, \Psi_1) :

1) \bar{\Psi_1}_j \gets Encrypt_\varepsilon(pk_2, \Psi_{1,j})

2) \Psi_2 \gets Evaluate_\varepsilon(pk_2, D_\varepsilon, \langle \langle \bar{sk_1}_j \rangle, \langle \bar{\Psi_1}_j \rangle \rangle)

Where D_\varepsilon denotes the scheme’s decryption circuit. The algorithm takes the bits \Psi_{1,j} of the ciphered message \Psi_1 (which is an encryption of the message \pi under pk_1) and encrypts them under the ‘outer’ encryption key pk_2 to yield the ciphertext bits \bar{\Psi_1}_j. The decryption circuit is then homomorphically evaluated with the ciphered decryption secret \bar{sk_1} (the secret key sk_1 encrypted under pk_2) and the ‘doubly’ encrypted message ciphertext \bar{\Psi_1}. This latter step removes the ‘inner’ encryption of the message \pi under the key pk_1, thereby yielding the ciphertext \Psi_2, which is just the encryption of \pi under pk_2. The bootstrapping theorem states that such an algorithm can be used as an effective noise reduction mechanism to create a fully homomorphic encryption scheme from a somewhat homomorphic one, so long as the decryption circuit is “compact” and does not increase noise components to a level that results in ciphertext “wraparound” in the underlying ring. Another way to say this is that the scheme cannot work if the magnitude of ciphertext noise results in reductions modulo q during decryption, where q is a parameter that characterizes the ciphertext space. Interestingly, the decryption circuit associated with Gentry’s original somewhat homomorphic scheme was not sufficiently compact for bootstrapping fully homomorphic encryption. The circuit was therefore “squashed”, along with an approach where the party encrypting data initiates the homomorphic decryption process.
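The net effect of the two steps can be captured in a single correctness identity (a restatement of the algorithm above, not an additional claim):

```latex
% Recrypt strips the inner encryption under pk_1 while re-randomizing
% the noise; the output decrypts under sk_2 to the original message:
D_\varepsilon(sk_2, \Psi_2)
  \;=\; D_\varepsilon\bigl(sk_2,\,
        Evaluate_\varepsilon(pk_2, D_\varepsilon,
        \langle \bar{sk_1}_j \rangle, \langle \bar{\Psi_1}_j \rangle)\bigr)
  \;=\; D_\varepsilon(sk_1, \Psi_1) \;=\; \pi
```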

In the context of privacy preserving outsourced computation, the Recrypt procedure described would typically be evaluated by an untrusted counterparty. This cannot compromise the security of secret keys, since \bar{sk_1}, as used by the decryption circuit, is an encryption of a secret key rather than the secret key itself [Todo: discuss circular security and security of key dependent messages].

Let’s take a step back and consider some essential aspects of the problem and of Gentry’s original scheme – fully homomorphic encryption (FHE) schemes allow for the computation of arbitrary functions on encrypted data, viz. you can do anything to the data that can be efficiently expressed as a circuit. FHE schemes had eluded cryptographers for at least three decades. In 2009, Craig Gentry published an initial scheme based on the ideal coset problem instantiated over ideal lattices, that is, lattices corresponding to ideals in polynomial rings of the form \mathbb{A} = \mathbb{Z}[X] / \Phi_m(X), consisting of residues of integer polynomials modulo the m'th cyclotomic polynomial \Phi_m(X) (of degree \varphi(m)).

Specifically, in his initial somewhat homomorphic construction, Gentry employs computational assumptions around the ideal coset problem (ICP) – when instantiated over ideal lattices, this is equivalent to the so-called bounded distance decoding problem (BDDP). The second computational assumption used involves sparse subset sums – this arises out of a requirement to ‘squash’ the decryption circuit in order to address its essential non-compactness. At the time, lattices were chosen to minimize the complexity of the decryption circuit and facilitate the bootstrap while maximizing the scheme’s evaluative capacity – decryption in lattice based schemes is typically dominated by matrix vector multiplication (NC1 complexity). Contrast this with RSA, for instance, which requires evaluating modular exponents.

To recap, a central aspect of Gentry’s result is precisely bootstrapping, which brings us to US patent application 2011/0110525 A1, published May 12 2011, “Fully homomorphic encryption method based on a bootstrappable encryption scheme, computer program and apparatus” (assignee IBM). Within the detailed description of the invention, the following is stated – “It is shown that if the decryption circuit of a somewhat homomorphic encryption scheme is shallow enough, in particular, if it is shallow enough to be evaluated homomorphically by the somewhat homomorphic scheme itself (a self-referential property), then this somewhat homomorphic scheme becomes “bootstrappable”, and can be used to construct a fully homomorphic scheme that can evaluate circuits of arbitrary depth.” The description in the remaining text of the patent is essentially a distillation of Gentry’s 2009 thesis. So, on the face of it, it does look like bootstrapping is covered by the patent.

There is the lingering question of the applicability of this (or other) patents to the “leveled” scheme of Brakerski, Gentry and Vaikuntanathan (BGV), published in 2011, a fully homomorphic scheme based on the learning with errors (LWE) or Ring-LWE (RLWE) problems. Rather than relying on bootstrapping as a noise management technique, BGV starts with somewhat homomorphic encryption and uses tensor products along with processes called re-linearization and modulus switching to maintain the relative magnitude of ciphertext noise during homomorphic evaluation. It does, however, propose bootstrapping as an optimization for evaluating deeper circuits that, in the purely leveled scheme, may otherwise require very large parameter choices for the “aggregate” plaintext / ciphertext rings.
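To make the noise-management step a little more concrete, here is a hedged sketch of modulus switching as it is commonly presented for BGV (the symbols are generic, not lifted verbatim from the paper): a ciphertext c, valid under secret key s modulo q, is rescaled to a smaller modulus q' < q in a way that preserves the plaintext mod 2 while shrinking the noise proportionally.

```latex
c' = \left\lfloor \tfrac{q'}{q}\, c \right\rceil
\quad \text{with } c' \equiv c \pmod{2}
\;\;\Longrightarrow\;\;
[\langle c', s \rangle]_{q'} \equiv [\langle c, s \rangle]_{q} \pmod{2},
\qquad
\lVert e' \rVert \;\lesssim\; \tfrac{q'}{q}\, \lVert e \rVert + \ell_1(s)
```

Here e and e' denote the noise before and after the switch, and the additive \ell_1(s) term bounds the rounding error; switching after each multiplication keeps the noise roughly constant relative to the (shrinking) modulus, which is what makes the leveled scheme work without bootstrapping.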

Concerning references and the attribution of results – the work discussed spans multiple groups and individuals including Craig Gentry, Vinod Vaikuntanathan, Shai Halevi, Zvika Brakerski, Nigel Smart, F. Vercauteren, Adriana Lopez-Alt and Eran Tromer to name a few.

This post picks up at the point of considering BGV (Brakerski, Gentry, Vaikuntanathan) encryption variants based on Ring-LWE. Though it may seem natural to start by considering the BGV enciphering function, I will avoid discussing it, since it is actually quite simple in principle and essentially consists of polynomial mixing and sampling from an error distribution. It is also described well in a post on Windows on Theory by Boaz Barak and Zvika Brakerski that can be found here:

Building the Swiss Army Knife

Instead, I’d like to delve into the plaintext algebra of the Ring-LWE BGV variant in order to understand the encoding / representation of plaintext bits in the scheme. In this post, we consider the idea of batched homomorphic operations and so-called ‘plaintext slots’, along with some of the basic algebra that supports the specification of plaintext slots.

In particular, we consider the Ring-LWE BGV variant defined over rings of form:

\mathbb{A} = \mathbb{Z}[X] / \Phi_m(X) where m is a parameter and \Phi_m(X) is the m'th cyclotomic polynomial, whose roots are the primitive m'th roots of unity. The native plaintext space for this variant is taken to be the ring:

\mathbb{A}_2 = \mathbb{A}/2\mathbb{A}

Note that (for odd m) the cyclotomic polynomial factors mod 2 into r distinct irreducible factors, all of the same degree (the equality of degrees follows from the fact that the cyclotomic extension \mathbb{Q}(\zeta_m)/\mathbb{Q} is Galois):

\Phi_m(X) = F_1(X)\cdot F_2(X) \cdots F_r(X) \mod 2

Let’s cover some background around Chinese remaindering and homomorphisms of rings. Consider F(X) \in \mathbb{F}_2[X], a monic polynomial of degree N that splits into r distinct irreducible factors of degree d = N/r, viz. F(X)=\prod_{i = 1}^{r}F_i(X). (In the case F(X) = \Phi_m(X) \mod 2 we have d \times r = \phi(m).) Let A denote the ring

A = \mathbb{F}_2[X]/(F(X))

which is the quotient of \mathbb{F}_2[X] modulo the ideal generated by F(X)

By the Chinese remainder theorem, there is the isomorphism:

A \cong \Bbb{F}_2[X]/F_1(X) \times \Bbb{F}_2[X]/F_2(X) \times\dots\times \Bbb{F}_2[X]/F_r(X)

\cong \Bbb{F}_{2^d} \times \dots \times \Bbb{F}_{2^d}

So A is isomorphic to r copies of \Bbb{F}_{2^d} (fields of d-bit strings).

This captures the central idea of batching homomorphic operations: we start by encoding individual bits in the r distinct plaintext fields \Bbb{F}_{2^d}. More precisely, we are not confined to representing bits, but may represent any element of such a field; for example, if 8 divides d, then we can represent a byte, since in that case there is an embedding of \mathbb{F}_{2^8} in \mathbb{F}_{2^d}. Then, rather than evaluating functions on the individual bits or field elements, we use the isomorphism above and work in the ring A. Evaluating a function once in A implicitly evaluates the function in parallel over the r distinct plaintext spaces \Bbb{F}_{2^d}. The ring \mathbb{A}_2 is typically referred to as the “aggregate” plaintext space in BGV, and the sub-rings \mathbb{F}_2[X]/F_i(X) are referred to as the “embedded” plaintext spaces or “slots”.
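As a small self-contained sketch of this (the helper names `pmul` and `pmod` are my own, and m = 15 is an illustrative parameter choice for which d = 4 and r = 2), we can verify the factorization of \Phi_{15}(X) mod 2 and check that one multiplication in A acts independently on each slot:

```python
# GF(2)[X] polynomials packed into Python ints: bit i holds the coefficient of X^i.

def pmul(a, b):
    """Carry-less product of two GF(2)[X] polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, f):
    """Remainder of a modulo f in GF(2)[X]."""
    df = f.bit_length() - 1
    while a and a.bit_length() - 1 >= df:
        a ^= f << (a.bit_length() - 1 - df)
    return a

# Phi_15(X) mod 2 = X^8+X^7+X^5+X^4+X^3+X+1 factors into two distinct
# irreducible quartics, so d = 4 and r = 2 plaintext slots.
PHI15 = 0b110111011
F1, F2 = 0b10011, 0b11001            # X^4+X+1 and X^4+X^3+1
assert pmul(F1, F2) == PHI15

# Batching: multiplying once in A = F_2[X]/(Phi_15) multiplies slot-wise
# in each embedded field F_2[X]/(F_i).
a, b = 0b1011001, 0b0100111          # two arbitrary "aggregate" plaintexts
prod = pmod(pmul(a, b), PHI15)
for F in (F1, F2):
    assert pmod(prod, F) == pmod(pmul(pmod(a, F), pmod(b, F)), F)
print("slot-wise multiplication verified")
```

The assertions pass because the CRT map is a ring homomorphism: reducing the aggregate product modulo each F_i recovers the product of the slot projections.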

 

‘Entropy’, in the context of the Linux Random Number Generator (‘LRNG’), is an accretional quantity arising from the uncertainty of hardware-generated interrupt timestamps. If you’re already familiar with the mechanism, bear with me as I provide some intuition – consider timestamp parity, and think of the kernel appending 0 to a stored register value if the millisecond counter on an interrupt timestamp has even parity, or 1 if it’s odd, generating a string of random bits this way after witnessing many interrupts. The resulting string of bits can then be read out of the register by applications that require randomness, at which point the value is reset to zero. If we admit that even parity on timestamps is as likely as odd, it’s clear that the uncertainty (i.e., the entropy) in bitstrings from this construction is a function of the length of the strings: the greater the number of observed interrupts -> the longer the string -> the higher the entropy. Here is a simple illustration – assume that I create a 3-bit random string by observing the parity of three successive interrupt time values (measured in milliseconds since system boot). If these time values are 5000090, 5000301 and 5001403, I create the 3-bit random string by taking each of these values modulo 2, i.e. I take $latex 5000090 \mod 2 = 0$, $latex 5000301 \mod 2 = 1$, $latex 5001403 \mod 2 = 1$ to generate the string $latex 011$, which I then store in a secret register. Without any other information, it’s reasonable to say that your chances of guessing this string are 1 in 8 (i.e. 1 in 2^{3}). In the parlance of information theory, we say this string has 3 ‘bits’ of ‘entropy’. In a similar vein, the entropy of a four-bit string generated this way would be 4 ‘bits’, owing to the greater uncertainty in its value.
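The toy construction above is easy to state in code (this is purely illustrative, not what the kernel actually does):

```python
def parity_bits(timestamps_ms):
    """One bit per interrupt: the parity of its millisecond timestamp."""
    return ''.join(str(t % 2) for t in timestamps_ms)

# The three interrupt times from the example above.
bits = parity_bits([5000090, 5000301, 5001403])
print(bits)                                     # -> '011'
print(f"guessing odds: 1 in {2 ** len(bits)}")  # -> 1 in 8
```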

The LRNG does not actually generate random bits this way (that is, by concatenating interrupt millisecond time values modulo 2). Roughly, the actual method is based on uncertainty in interrupt times represented as ‘jiffies’ (32-bit words giving the time of an interrupt in milliseconds since system boot), along with a ‘high resolution’ time value obtained from the CPU. These values are ‘mixed’ into the entropy ‘pools’, from which an extraction function creates sequences of pseudo-random bits on demand. This is concretely described in Gutterman, Pinkas et al.’s description of breaks to backtracking resistance of the LRNG [GPR 2006]. The goal of this piece, however, is not to address weaknesses like the analytic breaks discussed above; it’s to try to address looming confusion on a couple of fronts, including (i) general suitability of the LRNG pools, including the non-blocking ‘urandom’ pool, for ciphering, (ii) issues with the quality of randomness arising from virtualization, including arbitration and biasing of otherwise random processes by a hypervisor hosting multiple guest systems on shared hardware, and (iii) the conflation of the virtualization and biasing effects just mentioned with the effects of app containerization on a single VM.

Rather than expound on all of the above, let’s address the suitability of the LRNG for ciphering; after that, perhaps we can outline a discussion of real or perceived biasing of timestamp values due to a hypervisor interfering with these values. We will also touch upon the potential for entropy loss from interrupt event correlation across multiple VMs sharing hardware. I don’t think we need to address containers; there’s a good post describing those effects here.

First, let’s talk about the LRNG’s entropy estimation ‘heuristic’; this becomes important when we consider the problem in light of NIST requirements for seeding pseudo-random number generators (PRNGs). The LRNG consists of three pools: the primary, secondary and urandom pools. Concretely, the term ‘pool’ refers to bit arrays sized at 4096, 1024 and 1024 bits respectively. Entropy is ‘mixed’ into the primary and sometimes the secondary pool, and an entropy count for each of the pools is incremented accordingly. It should be clear that the maximal entropy possible for the primary pool is 4096 bits. To expand on the intuition provided earlier, if we think of each bit in a 4096-bit string as being equally likely (i.e., each bit is uniformly generated), then each of the 2^4096 variants of this string is equally likely, giving us a maximal uncertainty, or ‘entropy’, of 4096 bits for this string. When we say that a string has 4096 ‘bits’ of entropy, we mean that there is a hidden string that can take on 2^4096 distinct values and your chance of guessing the actual value is 1 in 2^4096. Importantly, quantifying entropy requires that (i) we can estimate a distribution on the string (as we did above by assuming each bit in the string is uniformly generated), and (ii) we choose an estimate of entropy, such as the so-called Shannon entropy or, alternatively, a more conservative estimate favored by NIST, the ‘min’ entropy.

The next thing to consider is how to estimate or model a distribution on the events used to generate random bits in this string. In the case of the toy example above, where we consider parity of interrupt times, a natural model may be the uniformity assumption, i.e., it’s just as likely that a timestamp has even or odd parity. But the LRNG doesn’t consider uncertainty in the parity; rather, the random variable considered is a 32-bit word representing the time of an interrupt in milliseconds since system boot, along with a time value provided by the ‘high resolution’ CPU timer, instances of which are mixed into the entropy pools along with auxiliary data. What is a reasonable distribution for this variable (actually, the more important question is what is the distribution of its **low order bits**), and is it something that exhibits short and long range uncertainty? Hard to say; it would depend on a number of factors, including the type of device and its usage pattern. It’s a little deceptive to speak of the ‘exact’ entropy of a random variable of this nature – this is something that can’t truly be known owing to its inherent uncertainty. Since we lack a priori knowledge of its distribution, it must be estimated by collecting a sufficiently large sample of data and fitting a distribution, or by using related min-entropy estimation methods (‘binning’, collision estimates etc.). On this basis, the GPR2006 authors estimate about 1.03 bits of entropy per interrupt event from a collection of 140K events (though the exact entropy estimate used is unclear). Later, we will look at other studies that sample block device I/O events for the purpose of estimating the entropy of these events in a VM running on a variety of hypervisors.
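Here is a minimal sketch of the estimation step (the helper name and the sample data are hypothetical; real estimators, such as those in NIST SP 800-90B, are considerably more involved). The ‘deltas’ stand in for observed timestamp differences:

```python
import math
from collections import Counter

def entropy_estimates(samples):
    """Empirical Shannon and min entropy, in bits per sample."""
    counts = Counter(samples)
    n = len(samples)
    probs = [c / n for c in counts.values()]
    shannon = -sum(p * math.log2(p) for p in probs)
    min_entropy = -math.log2(max(probs))  # NIST's more conservative choice
    return shannon, min_entropy

# A heavily biased hypothetical source: most deltas are 0 or 1, as in the
# block I/O measurements discussed later.
deltas = [0] * 600 + [1] * 300 + [2] * 60 + [3] * 40
shannon, min_ent = entropy_estimates(deltas)
print(f"Shannon: {shannon:.2f} bits/sample, min: {min_ent:.2f} bits/sample")
```

Note that the min entropy is always at most the Shannon entropy, which is why NIST prefers it for crediting entropy to a seed.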

From a concrete standpoint, the LRNG does not use min entropy estimation, rather, the estimation used is a heuristic based on taking the logarithm of inter-arrival times of interrupts along with their ‘finite derivatives’. The details are not important, the point is that the heuristic potentially runs afoul of NIST recommendations which require min entropy estimates of PRNG seeds. On the bright side, GPR2006 and other studies allude to the fact that the RNG’s estimation heuristic is likely quite conservative. Other than entropy estimation, a couple of other things to consider are (i) periodicity, structural and algorithmic aspects of the LRNG state and entropy extraction functions and comparison to other NIST functions for generating pseudo random sequences (ii) initial seeding and use of the generators, especially during critical periods following system boot / reset, along with NIST seeding requirements vis a vis the ‘instantiation strength’ (Section 12.1) of generators.

First, let’s consider the structural and algorithmic aspects of the LRNG. Though the LRNG is known to provide good random data for cryptographic applications, a head-to-head comparison with the NIST PRNGs (section 12.1) is not truly possible, and it’s difficult to impute suitability of the LRNG for ciphering functions by looking at comparable guidance for the NIST PRNGs. For the non-blocking urandom pool in particular, we can say that it (i) has no direct interaction with noise sources, (ii) has a ‘mixing’ function that incorporates a linear feedback shift register (LFSR) generating elements in field extensions (we describe this later), and (iii) uses SHA-1 for output extraction from its internal pool. This makes it structurally somewhat different from the NIST SP 800-90A generators, as well as from some of the other Linux generators such as the ‘jitter PRNG’, which is based on variation in execution times for a fixed set of CPU instructions.

We can gain a coarse intuition for the periods of deterministic (pseudo-random) generators by considering their outputs as elements in large finite fields. Specifically, consider the field of order 2^{1024}: we can think of 1024-bit strings as elements of this field. To do this, however, we need to fix a ‘representation’. Concretely, we do this by thinking of the field as an extension formed by quotienting the ring of univariate binary polynomials over GF(2) (i.e. with coefficients in {0,1}) by the ideal generated by a “primitive” binary polynomial of degree 1024 (which we can call f(x) ), that is, \mathbb{F}_2[X] / (f(x)). The ‘raw’ output of the generator is then simply an element of this field (of which there are 2^{1024} - 1 unique non-zero elements). Now we need a way to generate elements, and since we’ve fixed a representation with the primitive polynomial f(x), we can use the obvious cyclic generator: powers of x \mod f(x). From an implementation standpoint, if z is a root of f(x) in the extension, we can represent elements as binary polynomials in z of maximal degree 1023 (that is, “adjoining” powers of z to the base field). If we refer to elements in this representation as y_i, then starting with an arbitrary non-zero element, the recurrence y_{j} = z \times y_{j - 1} \mod f(z) generates all 2^{1024} - 1 unique non-zero elements of the extension. The following example illustrates the process with f(z) = z^{4} + z + 1, a primitive trinomial of degree 4. Field elements are represented as 4-bit strings: the polynomial 1 is represented as 0001, z as 0010, z^{2} as 0100, z^{3} as 1000, z+1 as 0011, etc. We use the recurrence y_{j} = z \times y_{j - 1} \mod f(z), with f(z) = z^{4} + z + 1, to generate 5 successive elements starting from y_{1} = 0001 :

y_2 = z \times y_1 \mod (z^{4} + z +1) = z  \mod (z^{4} + z +1) = 0010

y_3 = z \times y_2 \mod (z^{4} + z +1) = z \times z  \mod (z^{4} + z +1) = z^2 \mod (z^4 + z + 1) = 0100

y_4 = z \times  y_3 \mod (z^{4} + z +1) = z \times  z^2  \mod (z^{4} + z +1) = z^3 \mod (z^{4} + z + 1) = 1000

y_5 = z \times y_4 \mod (z^{4} + z +1) = z \times z^3 \mod (z^{4} + z +1) = z^{4} \mod (z^{4} + z + 1) = z + 1 = 0011

y_6 = z \times y_5 \mod (z^{4} + z +1) = z \times ( z+1)  \mod (z^{4} + z +1) = z^{2} + z \mod (z^{4} + z + 1) = 0110
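The recurrence above can be sketched in a few lines of Python (a bitmask representation as in the text; the function name is my own, and 0b10011 encodes f(z) = z^4 + z + 1 including its leading term):

```python
from itertools import islice

def field_elements(f, n, start=0b0001):
    """Yield y_j = z * y_{j-1} mod f(z): successive non-zero elements of GF(2^n)."""
    top = 1 << n
    y = start
    while True:
        yield y
        y <<= 1          # multiply by z
        if y & top:
            y ^= f       # reduce modulo f(z)

states = list(islice(field_elements(0b10011, 4), 16))
print([format(s, '04b') for s in states[:6]])   # 0001, 0010, 0100, 1000, 0011, 0110
assert len(set(states[:15])) == 15              # full period 2^4 - 1 = 15
assert states[15] == states[0]                  # ...after which the cycle repeats
```

The first six states reproduce the y_1 through y_6 worked out above, and the assertions confirm the period claimed next.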

Clearly the period of this method is 2^{n} - 1 unique non-zero n-bit sequences (where n = 4 above), but there is a question around sub-periods within the larger sequence, for example sub-periods within parts of the sequence that represent low order coefficients of the y_i. Rather than delve into this, I’ll simply state that directly taking such a sequence as output is not secure; rather, the sequence should be processed through an extraction function that typically hashes and ‘folds’ all of the bits. So the ‘Galois’ RNG just described is capable of generating a long series of n-bit sequences, but the reader may ask why not just monotonically increment an n-bit counter – the obvious answer being that that method is totally predictable. This observation leads us to a derived construction that exploits so-called ‘Tausworthe’ sequences of zeros and ones. These are sequences obtained by evaluating successive elements of an extension field, where evaluation is done in GF(2). In the example above, we obtain the 6-bit Tausworthe sequence 1 1 1 1 0 0 by summing the ‘1’s modulo 2 in each of y_{1}, …, y_{6}. As it turns out, sequences generated this way have desirable statistical properties and a long period. An early generator based on such sequences was described in 1973 by Lewis and Payne, with period 2^{p} - 1 where p is the degree of the primitive ‘trinomial’ used to define its underlying field; a variant described by Matsumoto and Kurita (and used by the LRNG), the ‘Twisted GFSR’ generator, extends this period and has better statistical uniformity of output. Which brings us to the entropy input, or ‘seed’, for such a generator. A way to think about the seed of a generator like this is as an ‘index’ into the sequence: it determines the initial position for generating the next output, such that a seed with n bits of entropy gives 2^n equally uncertain choices of this initial position.
Since the output of the generator has good randomization properties (such as uniformity of output), the method has the benefit of getting rid of any biases that exist in the seed. The LRNG utilizes a variant of the Twisted GFSR generator mentioned above and, roughly speaking, mixes entropy into its pools by continuously re-seeding the generator and mixing its output into the pools (this modifies its period in an indeterminate way). Another way to think of it is that the LRNG uses this mechanism to debias raw entropy input coming from interrupt events prior to mixing these events into the pool.
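Computing the Tausworthe bits from the worked example is a one-liner (the parity of each 4-bit state from the y_j = z \times y_{j-1} \mod f(z) recurrence, with f(z) = z^4 + z + 1):

```python
# The six states y_1..y_6 from the worked example above.
states = [0b0001, 0b0010, 0b0100, 0b1000, 0b0011, 0b0110]
bits = [bin(s).count('1') % 2 for s in states]  # sum of the '1's modulo 2
print(bits)   # -> [1, 1, 1, 1, 0, 0]
```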

Finally, we consider the effects of virtualization on the collection of entropy, along with the initial seeding of the generator. On the subject of seeding, we cite guidance from Stephan Muller in a proposal for a new generator, motivated by a study on the effects of virtualization on entropy collection. In this proposal, Muller states that “[the] DRBG is seeded with full security strength of 256 bits during the first steps of the initramfs time after about 1.3 seconds after boot. That measurement was taken within a virtual machine with very few devices attached where the legacy /dev/random implementation initializes the nonblocking_pool after 30 seconds or more since boot with 128 bits of entropy”.

It should be said explicitly that the term ‘DRBG’ above refers not to the existing LRNG, but to the entropy collection mechanism of a proposed kernel generator based on well studied NIST SP 800-90A algorithms (excluding the NIST ‘Dual-EC DRBG’ algorithm). Whereas the proposed generator collects entropy quite rapidly, the study indicates that the rate of collection is much lower for the LRNG, which, strictly speaking, does not have sufficient entropy at 30 seconds past boot for applications requiring more than 128 bits of entropy (e.g. AES 256-bit key generation, secret key generation on the P-384 elliptic curve, etc.). This situation can be alleviated in some cases by configuring systems to persist a seed at system shutdown that is re-read during boot (which requires that the seed be kept secret), but persisting a seed is obviously not an option with strategies for continuous ‘repaving’ of systems.

When thinking about virtualization and the accumulation of entropy, we note that hypervisors typically broker access to hardware and, in some cases, abstract the underlying hardware presented to VMs. Brokering is typically done either by serializing access requests from multiple guests to a hardware resource or, in some cases, by exclusively assigning access to a single VM. Exclusive or serialized access to hardware is desirable from a cryptographic standpoint, since it implies that two guests on the same hardware cannot arrive at a shared view of the state and events generated by that hardware. This is obviously a good thing: by the definition of entropy, any sharing and reuse of timing information by more than one party negates all of the uncertainty in that information. This BSI study (also authored by Muller) hypothetically poses scenarios where errors in entropy estimation via the estimation heuristic may occur; the thinking is that these can arise from delays added by a hypervisor’s handling of hardware timing events. Shortly, I’ll cite some results from this study around actual entropy measurements obtained by sampling numerous block device events on a variety of hypervisors.

In theory, a hypervisor can emulate most devices, such as a block I/O device with spinning platters (a so-called ‘rotational’ device), by converting a VM’s request to such a device into another type of request (e.g. a network call) that bears no similarity to the original hardware. It’s important to note this: the kernel does not, for example, consider events generated from flash storage for entropy collection, owing to their higher predictability – a safeguard that could be obviated by a hypervisor that provides block-device emulation of flash storage or a ramdisk. In other cases, the hypervisor may serve a block I/O request from a cache rather than accessing the device, a fact that may materially impact the uncertainty of the related timing events.

The BSI study referenced above provides concrete numbers on the statistical properties of entropy sources in a virtual environment. Some of its results for block I/O devices are summarized below:

Fig. 11 in the reference shows 0.43 bits of entropy in the ‘jiffies delta’, i.e., the difference between successive block I/O interrupt timestamps (measured in milliseconds), based on 100K samples for a block I/O device on the KVM hypervisor without buffer caching. The figure shows that the majority of the jiffies deltas are either zero or one, and therefore contain a high degree of predictability. Fig. 14, on the other hand, assigns a Shannon entropy of 16 bits to the 22 low bits of the high-resolution timestamp delta, showing that the low bits of the differences of high-resolution timestamps provide the bulk of the entropy for block devices. The author also points out that this is greater than the maximum assignable value of 11 bits of entropy for a single delta under the estimation heuristic, which supports the statement that the latter is indeed a conservative estimate for a virtualized environment of this type. Remarkably, Fig. 19 in this reference shows that the low 22 bits of the block I/O high-resolution timestamp difference with hypervisor caching enabled contain about 15 bits of Shannon entropy, almost the same as the entropy conveyed in the previous statistic with caching disabled; this implies that buffer caching of data by the KVM hypervisor does not seem to alter the inherent uncertainty of this phenomenon. Figs. 23 and 28 show similar distributions for block device I/O on Oracle’s VirtualBox, with Shannon entropy of the low 22 bits at 13.7 and 14.95 bits respectively for the caching-disabled vs. caching-enabled scenarios; caching in this case has slightly increased the uncertainty of the events. The block I/O experiments with caching enabled do not, however, discuss the extent to which device reads actually result in cache hits, which has a material impact on access rates to the hardware. Similarly, Fig. 32 shows 15.12 bits of entropy per event on a Microsoft hypervisor, whereas Fig. 39 measures 13.61 bits of Shannon entropy per delta on VMware ESX.
Another point I should make is that these experiments measure timing on a single VM; they do not look at possible correlation of timestamp events across many VMs (e.g., two VMs on shared hardware requesting access to the same block in near simultaneity), though theoretically, serialized access to the hardware will likely result in different timestamp values for each access.

In concluding, we’ve gone over the basics of entropy measurement and collection in the LRNG, and started a discussion of the effects of virtualization on the timing events that provide the raw uncertainty for this process. Future work could include (i) a more detailed discussion of the LRNG mixing function, (ii) more analysis of measurements of the initial seed entropy of the non-blocking urandom pool immediately after system boot – based on my current understanding, this pool does not wait for sufficient entropy accumulation before transferring bits to calling applications, thereby violating NIST requirements around the entropy of seeds vis a vis the ‘instantiation strength’ of a PRNG (in my view this, rather than its periodicity, is the main issue with urandom), (iii) more analysis of the impact of virtualization on timing events not related to block device access, and (iv) my memory is hazy on this point, but I recall that RedHat may have obtained FIPS certification for a kernel RNG; it may be worthwhile to look into this. It was almost certainly not the LRNG, but another generator based on NIST SP 800-90A deterministic algorithms and NIST SP 800-90B entropy tests.