Improved Masking Multiplication with PRGs and Its Application to Arithmetic Addition



Introduction
Side-channel attacks (SCAs) [1,2] exploit physical leakage (e.g., timing information, power consumption, or electromagnetic emanations) of cryptographic implementations. Masking is a popular countermeasure against SCAs. Its idea is to randomly split every variable (say, $x$) into $d+1$ shares $x_1, x_2, \ldots, x_{d+1}$ such that the joint distribution of any $d$ shares is independent of $x$. This is known as $d$-probing (a.k.a. $d$-private) security, and $d$ is called the security order. Notably, for the popular Boolean masking, we have $x = x_1 \oplus x_2 \oplus \cdots \oplus x_{d+1}$, with $\oplus$ the addition over $\mathbb{F}_2$ (a.k.a. bitwise XOR). Moreover, it has been proved that $d$-probing security ensures that the information an adversary can exploit decreases exponentially with $d$ [3]. Mainstream masking schemes use a gate-by-gate approach that transforms each elementary operation (e.g., addition and multiplication over $\mathbb{F}_{2^w}$) into its masked counterpart, called a gadget, around which a rich literature has emerged in recent years.
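As a concrete illustration of Boolean masking, the following minimal sketch splits a $w$-bit value into $d+1$ shares and recombines them. The function names `share` and `unshare` and the use of Python's `secrets` module as the randomness source are assumptions made for this example only; they are not part of the scheme described in this paper.

```python
import secrets
from functools import reduce
from operator import xor

def share(x, d, w=32):
    """Split the w-bit value x into d + 1 Boolean shares (illustrative helper)."""
    shares = [secrets.randbits(w) for _ in range(d)]  # d uniformly random shares
    shares.append(reduce(xor, shares, x))             # last share fixes the XOR to x
    return shares

def unshare(shares):
    """Recombine the shares by XORing all of them."""
    return reduce(xor, shares, 0)

x = 0xDEADBEEF
assert unshare(share(x, d=3)) == x
```

Any $d$ of the $d+1$ shares produced this way are uniformly distributed, which is the property the $d$-probing model relies on.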
One of the most groundbreaking works on designing masking schemes is that of Barthe et al. [4]. Instead of proving the security of a full implementation at once, this work introduces the composable security notions called noninterference and strong noninterference (NI/SNI). These notions allow proving the security of smaller gadgets in terms of their composability with other masked circuits. Later, Cassiers and Standaert [5] proposed a new composable security notion called probe-isolating noninterference (PINI), enabling a more straightforward composition of gadgets. That is, gadgets fulfilling the PINI notion can be freely composed with each other without affecting their SCA resistance.
Coron et al. [6] proposed a technique called locality of randomness subsets, which allows multiple PRGs to be used to reduce the randomness cost by choosing proper randomness subsets for each gadget. According to it, if all gadgets are SNI-R/PINI-R as defined in [6], we can securely use $d$-wise PRGs [7] to generate the random bits for the gadgets and keep equivalent security in the probing model, even in the worst case where the adversary obtains all the variables in a PRG with one probe. Then, we can reuse the random seeds of the $d$-wise PRGs in different gadgets based on the locality of the subsets, which significantly reduces the randomness cost. In [6], ISWAND [8] is proved to be SNI-R with 1-local use of $d(d+1)/2$ subsets. Furthermore, two better SNI-R AND algorithms are given in [6], with $d$ 1-local use subsets and $d$ 2-local use subsets, respectively.
When a cryptographic algorithm involves arithmetic addition (e.g., add-rotate-xor (ARX)-based block ciphers such as XTEA [9] and SPECK [10], the hash functions SHA-1 and SHA-2, and the NIST lightweight cryptography finalist SPARKLE [11,12]), transforming elementary operations becomes intricate because of the higher algebraic degree of arithmetic addition. At CHES 2001, Goubin [13] described a very elegant algorithm for converting between Boolean shares $x_1, \ldots, x_{d+1}$ and arithmetic shares $A_1, \ldots, A_{d+1}$ such that:
$$x = x_1 \oplus \cdots \oplus x_{d+1} = A_1 + \cdots + A_{d+1} \bmod 2^w,$$
with $+$ the arithmetic addition. Afterward, a series of works focused on designing better conversion algorithms [13-19]. At FSE 2015, Coron et al. [20] described an improved algorithm performing arithmetic addition modulo $2^w$ with complexity $O(d^2 \log w)$ that integrates the conversions in both directions. Although Coron's algorithm is very efficient, it is not provably secure under any composable security notion and is thus risky to use in larger composed computations. (There do exist other approaches focusing only on the conversion in one direction, from Boolean to arithmetic masking, with somewhat better complexities [18,21,22]. Despite their prospective applications in many scenarios such as masked post-quantum cryptography [23,24], it is intricate to apply them to arithmetic addition, which requires conversions in both directions.) Later, a high-order arithmetic addition algorithm based on [20] was proposed in [25], but it only satisfies NI security.
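To make the relation between the two kinds of sharing concrete, the sketch below shows a first-order Boolean-to-arithmetic conversion in the style commonly attributed to Goubin [13]. It is written from the standard presentation of that technique, not taken from this paper: given $x' = x \oplus r$, it returns $A$ with $x = A + r \bmod 2^w$ without ever recombining $x$. It relies on the fact that $r \mapsto (x' \oplus r) - r$ is affine over $\mathbb{F}_2$. The function name `bool_to_arith` and the variable `gamma` are illustrative assumptions.

```python
import secrets

def bool_to_arith(x_masked, r, w=32):
    """First-order sketch: from x' = x ^ r, compute A such that x = (A + r) mod 2^w."""
    mask = (1 << w) - 1
    gamma = secrets.randbits(w)      # fresh random used to avoid unmasking x
    t = x_masked ^ gamma
    t = (t - gamma) & mask           # f(gamma), with f(u) = (x' ^ u) - u
    t ^= x_masked                    # f(gamma) ^ f(0), since f(0) = x'
    gamma ^= r
    a = x_masked ^ gamma
    a = (a - gamma) & mask           # f(gamma ^ r)
    return a ^ t                     # f(r) = x - r, by affinity of f over F_2

# Usage: the arithmetic sharing (A, r) encodes the same x as the Boolean one.
x, r = 0x12345678, 0x0BADF00D
A = bool_to_arith(x ^ r, r)
assert (A + r) % (1 << 32) == x
```

This only covers one direction at first order; the addition algorithm discussed in this paper needs conversions in both directions at arbitrary order.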
1.1. Our Contributions. Following [6], our main contribution is a new security notion allowing multiple PRGs, together with a more efficient masked $d$-order AND algorithm, based on [5], that satisfies the new notion with 1-local use of $2d$ randomness subsets. Besides, we consolidate the work on masked arithmetic addition by improving the existing work of Coron et al. [20]. Our contributions can be summarized as follows.
1.1.1. A New Security Notion Allowing the Use of Multiple PRGs.
We extend the composable security notion PINI to allow a more efficient use of multiple PRGs than the work in [6]. This yields a new notion called PINI-extension (PINI-E). We also describe how security in the probing model is deduced from PINI-E security. The usage of PRGs for PINI-E gadgets is illustrated in Figure 1.
1.1.2. A New Algorithm for Bitwise Multiplication. We propose a new $d$-order AND algorithm with $d(d+1)$ random bits and PINI-E security, to which we apply the PINI trick proposed in [5]. Besides, it keeps 1-local use of its $2d$ randomness subsets. Table 1 compares the locality and randomness subsets of the algorithms proposed in [6] with ours.
1.1.3. Application to Arithmetic Addition. Based on the methodology of Coron et al. [20], we provide an algorithm for higher-order masked arithmetic addition and describe applications of our countermeasure to SPECK, XTEA, and SPARKLE.
We implement masked round functions on the ARM Cortex M3 architecture at the assembly level and report the performance results.Notably, to the best of our knowledge, they are the first implementation results of higher order masking for the ciphers using the ARX structure.
1.2. Organization. In the rest of this paper, we present notations and background in Section 2. We describe the new AND algorithm and give the necessary proofs in Section 3. Section 4 presents the arithmetic addition algorithm, including its description, related proofs, and randomness cost. The implementations of the arithmetic addition algorithm are given in Section 5. Finally, we conclude in Section 6.

Preliminaries
2.1. Notations. Let $\mathbb{F}_{2^w}$ be a field of characteristic two. Let $\oplus$ be the field addition over $\mathbb{F}_2$ (a.k.a. bitwise XOR), and let $\cdot$ be the bitwise AND operation. We denote a set of variables by $x_\star \overset{\text{def}}{=} \{x_1, \ldots, x_{d+1}\}$. Let $+$ be addition modulo $2^w$. In a matrix $A$, we define $s^A_{ij}$ as the element at the $i$-th row and $j$-th column.

Table: comparison of the work in [12] with our work. Rows: composable security (PINI-R* for [12], PINI-E for our work) and true random cost. * The state-of-the-art work is SNI-R secure, and its implementation in [12] uses the double-SNI construction to ensure PINI-R security. We compare with the double-SNI version because its composability is similar.

For $n \times n$ matrices $A$ and $B$, let $C = A + B$ be the matrix

IET Information Security
where $s^C_{ij} \overset{\text{def}}{=} (s^A_{ij}, s^B_{ij})$ for $i, j \in [n]$. Let $S^k_i$ be the sequence $\{s_{i1}, s_{i2}, \ldots, s_{ik}\}$ in an $n \times n$ matrix for $i, k \in [n]$.

2.2. Private Circuits. In this part, we recall some definitions regarding private circuits proposed in [8]. A circuit is a directed acyclic graph with gates as vertices and wires as edges, where every wire carries a variable in $\mathbb{F}_{2^w}$ and each gate represents an elementary calculation over $\mathbb{F}_{2^w}$. We recall the definition of a private circuit proposed in [26] below.
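Definition 1 (private circuit [26]). A private circuit for $f : \mathbb{F}^n_{2^w} \to \mathbb{F}^m_{2^w}$ is defined by a triple $(I, C, O)$, where $I$ is a circuit called encoder, which maps each input to $d+1$ shares; $C$ is a randomized circuit taking the shares produced by $I$ as inputs; and $O$ is a circuit called decoder, which maps the outputs ($d+1$ shares) of $C$ to the original outputs of the private circuit.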
Moreover, a private circuit is called a $d$-private (or $d$-probing secure) circuit if it satisfies the requirements below: (1) Correctness: for any input $x \in \mathbb{F}^n_{2^w}$, $O(C(I(x))) = f(x)$; (2) Privacy: for any $x, x' \in \mathbb{F}^n_{2^w}$ and any set $P$ of at most $d$ wires in $C$, the distributions of $C_P(I(x))$ and $C_P(I(x'))$ are identical, where $C_P(I(x))$ refers to the values of the variables in $P$ on input $x$.
Although the definition of a private circuit nicely captures protection against SCAs, proving a large circuit (such as the AES) to be $d$-private is nontrivial, since the number of possible tuples of $d$ wires grows rapidly with the circuit size and with $d$. To cope with this issue, Ishai et al. [8] proposed a gate-by-gate approach that transforms each gate separately into a masked counterpart circuit, called a gadget, and composes the gadgets to obtain the private circuit. A gadget is a circuit with shares as inputs and outputs.
The first $d$-probing secure bitwise AND gadget (which implements the bitwise AND operation over $\mathbb{F}_{2^w}$ in the masked domain), named ISWAND, was proposed by Ishai et al. [8] at CRYPTO 2003; we give an example for $d = 2$. Meanwhile, the randomness matrix in Figure 2 expresses the construction of its randomness; it will appear again in Section 3.
We can verify that the sum (over $\mathbb{F}_{2^w}$) of all $c_i$'s is the bitwise AND of $a$ and $b$. Note that the order of the calculation is strict. For instance, at line 8, $r_{ij} \oplus a_j \cdot b_i$ is calculated before XORing $a_i \cdot b_j$.
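The following sketch shows an ISW-style AND gadget on $d+1$ shares, as a reading aid for the description above. It follows the usual textbook form of the gadget and the evaluation order stated in the text (the random $r_{ij}$ is first XORed with $a_j \cdot b_i$, and only then with $a_i \cdot b_j$). The function names, the use of `secrets` as a randomness source, and the word size are assumptions of this example; a Python model also says nothing about leakage of a real implementation.

```python
import secrets
from functools import reduce
from operator import xor

def isw_and(a, b, w=32):
    """ISW-style masked AND: a and b are lists of d + 1 Boolean shares."""
    n = len(a)
    r = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r[i][j] = secrets.randbits(w)
            # Order matters: mask the first cross product before adding the second.
            r[j][i] = (r[i][j] ^ (a[j] & b[i])) ^ (a[i] & b[j])
    c = []
    for i in range(n):
        ci = a[i] & b[i]
        for j in range(n):
            if j != i:
                ci ^= r[i][j]
        c.append(ci)
    return c

# Usage with d = 2 (three shares per operand).
a_val, b_val = 0xCAFEBABE, 0x0F0F1234
a_sh = [0x11111111, 0x22222222, a_val ^ 0x11111111 ^ 0x22222222]
b_sh = [0x33333333, 0x44444444, b_val ^ 0x33333333 ^ 0x44444444]
assert reduce(xor, isw_and(a_sh, b_sh)) == a_val & b_val
```

Each pair of cross products $a_i \cdot b_j$ and $a_j \cdot b_i$ is protected by one fresh random, which is why the gadget consumes $d(d+1)/2$ randoms.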

2.3. Composable Security Notions and Extensions. Note that one has to insert many refreshing gadgets to compose $d$-private gadgets securely, which significantly increases the randomness cost. Barthe et al. [4] proposed the composable security notions NI/SNI, which enable composition without refreshing gadgets. Below, we recall the definitions of NI/SNI proposed in [4].
Definition 2 (NI/SNI [4]). Let $G$ be a gadget taking $a_\star, b_\star$ as inputs and returning $c_\star$. The gadget $G$ is NI (resp., SNI) secure if and only if for any set of $t$ intermediate variables and any subset $O$ of output indices such that $t' = t + |O| \le d$, there exist sets $I$ and $J$ of input indices with $|I| \le t'$ and $|J| \le t'$ (resp., $|I| \le t$ and $|J| \le t$), such that the $t$ intermediate variables and the output variables $c_{|O}$ can be perfectly simulated from $a_{|I}$ and $b_{|J}$.

Table note: * The more stars there are, the stronger the requirement is. More precisely, a PINI-E gadget is PINI, but not the other way around. Furthermore, a PINI-R gadget is PINI-E where the subset number of PINI-E is $n$, but again not the other way around. The difference in simulation difficulty between PINI-R and PINI-E comes from whether $c_{|R}$ must be simulated or not.

Besides, there is an updated definition called SNI-R proposed in [6], which is used in the situation where a randomness subset of gadget $G$ can be obtained with a single probe.

Definition 3 (SNI-R [6]). Let $G$ be a gadget with input shares $a_\star$ and $b_\star$ and output shares $c_\star$. Let $(\rho_i)_{1 \le i \le d+1}$ be subsets of the randoms used by $G$. The gadget is SNI-R if and only if for any set of $t$ intermediate variables, any subset $O$ of output indices, and any subset $R \subset [n]$ such that $t + |O| + |R| \le d$, the $t$ intermediate variables, the output variables $c_{|O \cup R}$, and all $\rho_i$ for $i \in R$ can be perfectly simulated from the knowledge of $a_{|I \cup R}$ and $b_{|J \cup R}$ with $|I| \le t$ and $|J| \le t$.
However, the security of the trivial composition of several private circuits is not evident. More precisely, even SNI circuits cannot keep their security under trivial composition. To mitigate this issue, Cassiers and Standaert [5] proposed a new composable security notion called PINI, with which we can concentrate on the proof of every single gadget, and the global security can be deduced directly. We recall it in the following.

Definition 4 (PINI [5], adapted; the original PINI security is defined for an arbitrary number of inputs, and we provide a fan-in 2 version in this paper). Let $G$ be a gadget with input shares $a_\star, b_\star$ and output shares $c_\star$. The gadget $G$ is PINI if for any $t_1 \in \mathbb{N}$, any set of $t_1$ intermediate variables, and any subset $O$ of output indices, there exists a subset $I \subset [1, d+1]$ of input indices with $|I| \le t_1$ such that the $t_1$ intermediate variables and the output shares $c_{|O}$ can be perfectly simulated from the input shares $a_{|I \cup O}$ and $b_{|I \cup O}$.
Meanwhile, Cassiers and Standaert [5] provided a gadget construction called double-SNI, which can turn SNI gadgets into a PINI one.
Definition 5 (double-SNI [5]). Let $G$ be an SNI gadget taking as input $a_\star, b_\star$ and output $c_\star$. Let $R$ be an SNI gadget taking as input $x_\star$ and output $y_\star$. The composite gadget $G'$ taking as input $x_\star, b_\star$ and output $c_\star$ with $c_\star = G(R(x_\star), b_\star)$ is called double-SNI and is PINI.

To reduce the randomness cost of a large circuit, there is an adapted definition called PINI-R proposed in [6], which also assumes that the adversary can obtain a randomness subset with a single probe.

Definition 6 (PINI-R [6]). Let $G$ be a gadget with input shares $a_\star, b_\star$ and output shares $c_\star$. Let $(\rho_i)_{1 \le i \le d+1}$ be subsets of the randoms used by $G$. The gadget $G$ is PINI-R if for any $t_1 \in \mathbb{N}$, any set of $t_1$ intermediate variables, any subset $O$ of output indices, and any subset $R \subset [n]$, there exists a subset $I \subset [d+1]$ of input indices with $|I| \le t_1$ such that the $t_1$ intermediate variables, the output shares $c_{|O \cup R}$, and the randoms $\rho_i$ for $i \in R$ can be perfectly simulated from the input shares $a_{|I \cup O \cup R}$ and $b_{|I \cup O \cup R}$.
2.4. Masking with Randomness from PRGs. In this part, we recall some definitions for gadgets using randomness generated from PRGs and the corresponding PRGs.

Locality of Randomness and Its Application.
First of all, we introduce the locality of randomness subsets proposed in [6], which describes the extent to which randoms are reused. It determines the PRGs used for each subset.
Definition 7 ($\ell$-local randomness subset [6], adapted). Let $G$ be a gadget and $\rho$ be a randomness subset used by $G$. We say that $\rho$ is $\ell$-local use if any intermediate variable of $G$ depends on at most $\ell$ elements of $\rho$.
With the definition of locality, we propose a security definition weaker than PINI-R that still keeps composability and $d$-privacy in the same extended probing model as PINI-R. We call it PINI-E (short for PINI-Extension) and provide the definition in the following.
Definition 8 (PINI-E). Let $G$ be a gadget with input shares $a_\star, b_\star$ and output shares $c_\star$. Let $(\rho_i)_{1 \le i \le m}$ be subsets of the randoms used by $G$ with $m \in \mathbb{N}$, where each $\rho_i$ is $\ell_i$-local use. The gadget $G$ is PINI-E if for any set of $t$ intermediate variables, any subset $O$ of output indices, and any set of $t_r$ randomness subsets $\rho_i$ with $t + t_r + |O| \le d$, there exist subsets $I, R \subset [d+1]$ with $|I| \le t$ and $|R| \le t_r$ such that the $t$ intermediate variables, the $t_r$ subsets $\rho_i$, and the output shares $c_{|O}$ can be perfectly simulated from the input shares $a_{|I \cup O \cup R}$ and $b_{|I \cup O \cup R}$.
Obviously, PINI-E is an extension of PINI that allows the adversary to probe a whole subset of randomness with a single probe. Compared with PINI-R, PINI-E security does not need to simulate $c_{|R}$; therefore, PINI-E algorithms are easier to construct. However, the number of randomness subsets is not bounded as in PINI-R. We introduce the $d$-private security and composability of PINI-E in the following and provide the proofs in Appendix A as supporting information.
Theorem 1 (security of PINI-E). Let $G$ be a gadget with input shares $a_\star, b_\star$ and output shares $c_\star$. Let $(\rho_i)_{i \in [m]}$ be a partition of the randomness used by $G$. If $G$ is PINI-E with randoms $(\rho_i)_{i \in [m]}$, then $G$ is $d$-private secure in an extended model of security where the adversary can get each $\rho_i$ with a single probe.
We stress that the $G_i$ in Theorem 2 are implementations of the same $f$. Meanwhile, we provide a proposition about the composition of PINI-E gadgets implementing different algorithms $(f_i)_{1 \le i \le k}$, which we also prove in Appendix A as supporting information. The composite gadget made of $G_i$ is PINI-E with the same randomness subsets.
Proposition 1 shows why PINI-E is weaker than PINI-R since the composition of PINI-R gadgets keeps the number of randomness subsets regardless of the circuit size.We mention that the composability of PINI-E is theoretically limited for the situation where there is more than one kind of gadget used to replace the same gates in the unprotected circuit, and all these gadgets use the same PRG (e.g., two kinds of multiplication gadgets are used in one circuit, and both of them use the same PRG), which barely happens in reality.In Table 1, we compare the security of PINI, PINI-R, and PINI-E.The remaining part is the construction of the masked implementation with locality property.We recall the mask refreshing named locality refreshing (LR) from the study of Ishai et al. [7] to keep a small locality for each gadget in Algorithm 1.
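For intuition, a locality refresh can be sketched as follows. This is an assumption about the usual shape of the LR gadget in the PRG-based masking literature ([6,7]), not a transcription of Algorithm 1: each of the first $d$ shares absorbs exactly one random $s_i$, and the last share absorbs all of them so that the encoded value is unchanged. The function name and randomness source are illustrative.

```python
import secrets
from functools import reduce
from operator import xor

def locality_refresh(a, w=32):
    """Refresh d + 1 Boolean shares with d randoms s_1..s_d (sketch of an LR-style gadget)."""
    d = len(a) - 1
    s = [secrets.randbits(w) for _ in range(d)]
    c = [a[i] ^ s[i] for i in range(d)]   # share i depends on a single random s_i
    last = a[d]
    for s_i in s:                         # accumulate the randoms one at a time
        last ^= s_i
    return c + [last]

shares = [0x01, 0x02, 0x04, 0x08]
assert reduce(xor, locality_refresh(shares, w=8)) == reduce(xor, shares)
```

After such a refresh, every output share except the last depends on one fresh random, which is what keeps the locality of the surrounding gadget small.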
The proof of Lemma 1 is equivalent to proving PINI, which has been done in [6], because the division of $\rho_i$ in Lemma 1 is exactly the single random $s_i$.
Theorem 3 (locality composition with randomness subset [6], adapted). Let $(G_i)_{1 \le i \le k}$ be a set of 2-input gadgets with randomness subsets $(\rho^i_j)_{1 \le j \le m}$, each of which makes an $\ell_j$-local use. Consider the gadgets $G'_i$ where the inputs and output of each $G'_i$ are locality refreshed with randoms $s$

Application of Multiple PRGs.
We recall the definition of $r$-wise independent PRG, which can be much more efficient than traditional PRGs.

Definition 9 ($r$-wise independent PRG [7]; we adapt the elements in each subset to those in $\mathbb{F}_{2^w}$, while the original definition in [7] is over $\mathbb{F}_2$). A function $G : \mathbb{F}^n_{2^w} \to \mathbb{F}^m_{2^w}$ is an $r$-wise independent PRG if any subset of $r$ of its outputs is independently and uniformly distributed when the input is uniformly distributed.
Here, we describe two r-wise PRGs called R 1 and R 2 proposed at [6].The parameter r of R 1 can be set as any positive integer while that of R 2 is fixed as three.However, the running efficiency of R 2 is much higher than that of R 1 .
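We define $R_1 : \mathbb{F}^r_{2^w} \to \mathbb{F}^{2^w}_{2^w}$ as follows:
$$R_1(a) = (h_a(x))_{x \in \mathbb{F}_{2^w}},$$
where $a = (a_0, \ldots, a_{r-1}) \in \mathbb{F}^r_{2^w}$ and:
$$h_a(x) = \sum_{i=0}^{r-1} a_i x^i.$$
$R_1$ is an $r$-wise PRG because there is a bijection between the $r$ coefficients of $h_a(x)$ and its evaluations at $r$ distinct points $x_i$ [6]. For instance, $R_1$ can output at most $w \cdot 2^w$ bits of randomness when given a $wr$-bit seed over $\mathbb{F}_{2^w}$.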
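A polynomial-evaluation PRG of this kind can be sketched as below: the seed is read as the $r$ coefficients of a polynomial over $\mathbb{F}_{2^w}$, and the outputs are its evaluations at every field element. The field representation is an assumption of this example (default $w = 8$ with the reduction polynomial `0x11B`); the names `gf_mul` and `r1` are illustrative and not taken from [6].

```python
def gf_mul(x, y, w=8, poly=0x11B):
    """Multiply x and y in F_{2^w}, reducing modulo `poly` (which has degree w)."""
    acc = 0
    while y:
        if y & 1:
            acc ^= x
        y >>= 1
        x <<= 1
        if x >> w:          # degree reached w: reduce
            x ^= poly
    return acc

def r1(seed, w=8, poly=0x11B):
    """Evaluate h_a(x) = sum_i a_i x^i at every x in F_{2^w}; seed = (a_0, ..., a_{r-1})."""
    out = []
    for x in range(1 << w):
        acc = 0
        for coeff in reversed(seed):      # Horner's rule
            acc = gf_mul(acc, x, w, poly) ^ coeff
        out.append(acc)
    return out

stream = r1((0x53, 0xCA, 0x01))           # r = 3 seed elements, 256 output elements
assert len(stream) == 256
```

With $r$ seed elements, any $r$ outputs are jointly uniform because the corresponding Vandermonde system is invertible; the evaluation cost is $r$ field multiplications per output element.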
We define another PRG $R_2 : \mathbb{F}^{2n}_2 \to \mathbb{F}^{n^2}_2$ as follows:
$$R_2(x_1, \ldots, x_n, y_1, \ldots, y_n) = (x_i \oplus y_j)_{1 \le i, j \le n}.$$
This PRG is based on the expander graph used in [7]. It generates $n^2$ random bits from a $2n$-bit seed. It is much more lightweight (with only XOR operations) than $R_1$. In [6], it is proved to be a 3-wise PRG, as recalled in Lemma 2.
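A lightweight XOR-only PRG with this input and output size can be sketched as follows, under the assumption that the output bit indexed by $(i, j)$ is $x_i \oplus y_j$ for a seed split into two halves $x$ and $y$ of $n$ bits each (the complete bipartite graph reading of the expander-graph construction). The function name `r2` is illustrative.

```python
from itertools import product

def r2(x_bits, y_bits):
    """Expand two n-bit seed halves into n^2 bits: output (i, j) is x_i XOR y_j."""
    return [xi ^ yj for xi in x_bits for yj in y_bits]

# Usage: a 6-bit seed yields 9 output bits.
print(r2((1, 0, 1), (0, 0, 1)))

# 4-wise independence fails: the four outputs on a 4-cycle always XOR to zero.
n = 3
for seed in product((0, 1), repeat=2 * n):
    o = r2(seed[:n], seed[n:])
    assert o[0] ^ o[1] ^ o[3] ^ o[4] == 0
```

Any three outputs correspond to three edges of a bipartite graph, which cannot form a cycle, so they are linearly independent functions of the seed; four edges can close a cycle, which is why the independence stops at three.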
Lemma 2 (see [6]). The randomized function $R_2$ is a 3-wise independent PRG.

Then, we recall the security of masking with multiple PRGs in Theorem 4, proposed in [6], where $\ell$-local gadgets remain secure when multiple PRGs are used to generate the random elements. This reduces the randomness cost efficiently.
Theorem 4 (security with multiple PRGs [6], adapted). Suppose $C$ is a $d$-private implementation of $f$ with encoder $I$ and decoder $O$, where the circuit $C(\hat{\omega}; \rho_1, \ldots, \rho_k)$ uses, for each $1 \le i \le k$, $n$ random elements $\rho_i$ and makes an $\ell$-local use of $\rho_i$, $\hat{\omega}$ are the inputs of $C$, and the adversary can obtain $\rho_i$ with a single probe. Let $G : \mathbb{F}^{n_r}_{2^w} \to \mathbb{F}^n_{2^w}$ be a linear $\ell d$-wise independent PRG. Then, the circuit $C'$ defined by $C'(\hat{\omega}; \rho'_1, \ldots, \rho'_k) = C(\hat{\omega}; G(\rho'_1), \ldots, G(\rho'_k))$ is a $d$-private implementation of $f$ with encoder $I$ and decoder $O$, which uses $k \cdot n_r$ random elements.

Coron's Work on Masked Arithmetic Addition. Coron et al. [20] introduce a new algorithm to convert from arithmetic masking to Boolean masking, which is recalled in Theorem 5. This algorithm uses the Kogge-Stone [27] carry look-ahead adder to replace the classical ripple-carry adder, which reduces the complexity from $O(w)$ (in a previous work [15]) to $O(\log w)$.
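Theorem 5 (see [20]). Let $x, y \in \mathbb{F}_{2^w}$ and $\ell = \lceil \log_2(w-1) \rceil$. Define the sequences of $w$-bit variables $P^{(i)}$ and $Q^{(i)}$, with $P^{(0)} = x \oplus y$ and $Q^{(0)} = x \cdot y$, and:
$$P^{(i)} = P^{(i-1)} \cdot \left(P^{(i-1)} \ll 2^{i-1}\right), \qquad Q^{(i)} = \left(P^{(i-1)} \cdot \left(Q^{(i-1)} \ll 2^{i-1}\right)\right) \oplus Q^{(i-1)},$$
for $1 \le i \le \ell$. Then $x + y = x \oplus y \oplus \left(2 Q^{(\ell)}\right) \bmod 2^w$.

In Section 4, we propose a new algorithm that extends the security order from 1 to any $d$ based on Theorem 5.
Besides, we prove its PINI-E security and locality in Appendix B as supporting information. We give a new AND algorithm in Section 3, with 1-local use of its $O(d)$ randomness subsets for odd $d$, and use it in the new arithmetic addition algorithm.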
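The unmasked Kogge-Stone addition underlying this approach can be sketched as follows. The recurrence used here is the standard Kogge-Stone one, $P \leftarrow P \cdot (P \ll 2^{i-1})$ and $Q \leftarrow (P \cdot (Q \ll 2^{i-1})) \oplus Q$ starting from $P = x \oplus y$ and $Q = x \cdot y$; it is stated as an assumption of this example rather than as a quotation of [20]. The function name `ks_add` is illustrative. Only AND, XOR, and shifts are used, which is what makes the structure convenient to mask gate by gate.

```python
def ks_add(x, y, w=32):
    """Add x and y modulo 2^w with a Kogge-Stone carry computation (unmasked sketch)."""
    mask = (1 << w) - 1
    ell = (w - 2).bit_length()          # equals ceil(log2(w - 1)) for w >= 2
    p = x ^ y                           # propagate bits
    q = x & y                           # generate bits
    for i in range(1, ell + 1):
        shift = 1 << (i - 1)
        q = (p & ((q << shift) & mask)) ^ q
        p = p & ((p << shift) & mask)
    return (x ^ y ^ (q << 1)) & mask    # the carry into bit k is bit k - 1 of q

assert ks_add(0xFFFFFFFF, 1) == 0
assert ks_add(0x12345678, 0x9ABCDEF0) == (0x12345678 + 0x9ABCDEF0) % 2**32
```

The loop runs $\lceil \log_2(w-1) \rceil$ times, so a masked version needs only logarithmically many rounds of AND gadgets instead of the $w$ rounds of a ripple-carry adder.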

The New Masked AND Gadget
In this section, we introduce a new masked AND algorithm with lower locality, as well as its PINI-E security and corresponding proof.
3.1. The Description of the New Algorithm. We describe our new algorithm for odd $d$ in Algorithm 2, which is provably secure in PINI-E with 1-local use of $2d$ randomness subsets. We will prove its PINI-E security and locality in the next section. Algorithm 3 provides a PINI trick proposed in [5] that keeps LatinAND PINI (and PINI-E). Note that the inputs and output of PIRT (short for PINI-PART) are explained in Figure 3.
Intuitively, PINI security does not allow a single probe to leak more than one input index, and the PINI trick (i.e., Algorithm 3) avoids such leakage in multiplication gadgets by changing the order of multiplying secrets (i.e., $a_i$ and $b_j$ in Algorithm 3) and adding randoms (i.e., $r_{i'j'}$ in Algorithm 3). In comparison, ISWAND calculates $a_i \cdot b_j + r_{ij}$ directly and thus is not PINI.
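The reordering idea can be illustrated with the following sketch, which follows the trick of [5] as we understand it and is not a transcription of Algorithm 3. Instead of forming $a_i \cdot b_j$ and then adding the random, the share $b_j$ is masked by the random before it meets $a_i$, and a correction term involving only $a_i$ and the random is added. The names `p0`, `p1`, and `pini_partial_product` are hypothetical.

```python
def pini_partial_product(a_i, b_j, r, w=32):
    """Compute (a_i & b_j) ^ r without ever exposing the bare product a_i & b_j."""
    mask = (1 << w) - 1
    p0 = (~a_i & mask) & r      # (a_i XOR 1) * r: depends on the index i only
    p1 = a_i & (b_j ^ r)        # b_j is blinded by r before being multiplied
    return p0 ^ p1              # (NOT a_i) r  XOR  a_i b_j  XOR  a_i r  =  a_i b_j XOR r

a_i, b_j, r = 0xF0F0A5A5, 0x3C3C1234, 0x9E3779B9
assert pini_partial_product(a_i, b_j, r) == (a_i & b_j) ^ r
```

Every intermediate value in this computation depends on at most one unblinded input share, whereas the direct ISW ordering produces $a_i \cdot b_j$, which depends on shares with two different indices.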
The intermediate step intuitively defines a partial order among the intermediate variables.

Figure 3: The example of how PIRT works in a gadget $G$. The indexes $i', j'$ are used to indicate that they are independent of $i, j$.

ALGORITHM 3: PIRT: part of a PINI algorithm [5], adapted.

The proof of Theorem 7 is obvious, because all randoms in each $\rho^L_k$ or $\rho^R_k$ appear only once in each $c_i$. This is exactly the definition of the locality of a randomness subset.

3.3. Discussion for the Randomness of LatinAND. We have proven in the previous subsections that a PINI-E gadget is $d$-probing secure with PRG-generated randoms and almost trivially composable, and that the PRGs are required to be $\ell d$-wise if the randoms are $\ell$-local use. Also, we provide the construction of $r$-wise PRGs with arbitrary $r$. Moreover, we have proven that LatinAND is PINI-E. Thus, the randomness of LatinAND is theoretically indistinguishable from that of a $d$-probing secure AND gadget with TRNG-generated randoms if the PRGs of LatinAND are $d$-wise.
To validate the impact of the randomness on practical security, we run LatinAND and another multiplication gadget proposed in [28] on a ChipWhisperer STM32F4 UFO target board and collect power traces with a Picoscope 5244D at a sampling rate of 125 MS/s. We then perform Welch's t-test with 10,000 executions, whose randoms are generated by PRGs (LatinAND) and TRNGs (the AND gadget proposed in [28]), respectively, to compare the randomness of the PRG implementation with that of the TRNG one. Figure 4 depicts the t-test results for LatinAND, and Figure 5 shows the result for the other gadget with randomness from TRNGs.
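For reference, Welch's t statistic for two groups of measurements at one sample point can be computed as in the sketch below. This is the textbook formula, not the evaluation script used for Figures 4 and 5; the function name `welch_t` and the toy data are assumptions of this example. In a trace-based evaluation the statistic would be computed independently at every sample point of the traces.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    va = variance(group_a) / len(group_a)   # sample variance over sample size
    vb = variance(group_b) / len(group_b)
    return (mean(group_a) - mean(group_b)) / sqrt(va + vb)

# Toy usage with two small groups of leakage values at one time instant.
fixed = [1.0, 2.0, 3.0, 4.0, 5.0]
random_ = [2.0, 4.0, 6.0, 8.0, 10.0]
print(round(welch_t(fixed, random_), 4))
```

A statistic that stays small in magnitude across all sample points indicates that the two groups are not distinguishable at first order with the given number of traces.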

Application to Arithmetic Addition
In this section, we use the LatinAND gadget in the arithmetic addition algorithm proposed in [25], which is costly in randomness with previous multiplication gadgets.
Our description is structured top-down. All gadgets presented in this section are PINI-E, and we defer the security proofs to Appendix B as supporting information. First of all, we describe the algorithm SecADD, which performs addition directly on the masked shares. It is similar to the algorithm proposed in [25], but we add some constructions so that it can use multiple PRGs. More precisely, we receive the shares $a_\star$ and $b_\star$ satisfying $a = \oplus a_\star$ and $b = \oplus b_\star$ as inputs, and the goal is to compute $c_\star$ satisfying $\oplus c_\star = a + b$. Note that our new algorithm is based on the concept of [20] and adapted to higher security orders. We describe it in Algorithm 4.
In the rest of this section, we explain the construction of the ingredients GoQ$_i$ and GoP$_i$. Both additionally have the locality property required for the use of $r$-wise PRGs, so that the randomness cost can be reduced.
First, we propose the gadgets to calculate $P^{(i)}$ and $Q^{(i)}$ in Theorem 5. We introduce the GoP$_k$ gadget first, which is used to generate the $P^{(k)}$ of Equation (6), because the inputs of GoP$_k$ are the outputs of GoP$_{k-1}$; furthermore, they do not need any intermediate variables from the generation of $Q^{(k)}$. Algorithm 5 is the description of GoP$_k$.
Then it comes to $Q^{(k)}$, which is shown in Algorithm 6. We use the GoQ$_k$ gadget to get all $Q^{(k)}$ of Equation (6). Since the inputs of GoQ$_k$ are the outputs of GoP$_{k-1}$ and GoQ$_{k-1}$, we must obtain $P^{(k-1)}$ and $Q^{(k-1)}$ first to calculate $Q^{(k)}$, which is why we introduce it second.

Meanwhile, we provide the evaluation of the randomness cost for Algorithm 4 in Appendix C.

Masked Implementations of SPARKLE, XTEA, and SPECK
In this section, to evaluate the performance of SecADD, we apply our scheme to the SPARKLE, XTEA, and SPECK ciphers. SPARKLE [11,12] is a family of cryptographic permutations and a finalist of the NIST lightweight cryptography standardization. We choose SPARKLE256 for the evaluation. The XTEA block cipher, introduced in [9], was designed to correct weaknesses in TEA. SPECK is a family of lightweight block ciphers publicly released by the National Security Agency (NSA) [10], optimized for performance in software implementations. SPARKLE, XTEA, and SPECK are all based on the ARX design with arithmetic addition, rotation, and XOR operations, where SecADD directly fits the masked arithmetic addition. We can use SecXOR, which is PINI-E, for the masking of bitwise XOR operations. For the masking of shift and rotation operations, we directly use the trivial implementation where each share is processed separately. For example, the masked left rotation by $n$ can be implemented by $c_i = a_i \lll n$, $i \in [1, d+1]$, with $a_i$ the input shares and $c_i$ the output ones, which is secure in PINI-E. We use independent random bits/seeds for different SecADD instances. SecADD is PINI-E according to the composability of PINI-E. By Proposition 1, the masked SPARKLE, XTEA, and SPECK are all PINI-E.
We implement masked SPARKLE, XTEA, and SPECK on the ARM Cortex M3 architecture at the assembly level, for illustrative purposes and timing comparisons. We show the costs in Tables 2-4, together with the number of required true random bits. We present the implementations using SecADD with both $R_1$ and $R_2$ introduced in Section 2.
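The share-wise treatment of the linear ARX operations can be sketched as follows. Rotation and XOR commute with the Boolean recombination, so each share can be processed on its own without fresh randomness. The helper names `rotl`, `masked_rotl`, and `masked_xor` are illustrative assumptions and do not correspond to the assembly routines of the actual implementation.

```python
from functools import reduce
from operator import xor

def rotl(x, n, w=32):
    """Rotate the w-bit word x left by n positions."""
    n %= w
    mask = (1 << w) - 1
    return ((x << n) | (x >> (w - n))) & mask

def masked_rotl(shares, n, w=32):
    """Rotate every share separately: c_i = a_i <<< n."""
    return [rotl(s, n, w) for s in shares]

def masked_xor(a_shares, b_shares):
    """XOR two sharings share by share: c_i = a_i XOR b_i."""
    return [ai ^ bi for ai, bi in zip(a_shares, b_shares)]

a = [0x01234567, 0x89ABCDEF, 0xDEADBEEF]   # three shares of some value
b = [0x0F0F0F0F, 0x33333333, 0x55555555]
assert reduce(xor, masked_rotl(a, 7)) == rotl(reduce(xor, a), 7)
assert reduce(xor, masked_xor(a, b)) == reduce(xor, a) ^ reduce(xor, b)
```

Only the arithmetic additions of an ARX round are nonlinear over $\mathbb{F}_2$, so they are the only operations that need a dedicated gadget and randomness.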

Conclusion
We proposed a new security definition named PINI-E that relaxes the requirements of PINI-R proposed in [6]; both notions support randoms generated by multiple PRGs. Furthermore, we provided a high-order PINI-E multiplication gadget (i.e., Algorithm 2) with a two-thirds reduction of the true random cost compared with the state-of-the-art work proposed in [6]. We then applied the new multiplication gadget to the Boolean-to-Boolean arithmetic addition algorithm (i.e., Algorithm 4) and used it in implementations of SPARKLE, XTEA, and SPECK on ARM Cortex M3, which are, to the best of our knowledge, the first implementations of higher-order masking for ciphers using the ARX structure.

Appendix A. Composability and Security of PINI-E
A.1. Proof of Composability

Proof. Consider the composite gadget in Figure 6. We define $P_i$ as the set of probed intermediate variables of $G_i$, where $|P_i| = t_i$, and we denote by $R_i$ the index set of the probed randomness subsets $\rho_j$ of each $G_i$. Furthermore, the index set of probed outputs of $G_{i+1}$ is defined as $O_i$. For $G_1$, which is the last gadget of the composite gadget, the probed output set is $O$. Meanwhile, we have:

First we consider $G_1$. According to PINI-E, the index set of its inputs $I_1 \cup O \cup R_1$ can simulate all probes in $G_1$, where $|I_1| \le |P_1|$. Then we consider $G_2$. Since the outputs of $G_2$ are the inputs of $G_1$, the index set for the simulation of $G_1$ is equivalent to probed outputs of $G_2$. Therefore, the probed output index set of $G_2$ becomes $O_1 \cup I_1 \cup O \cup R_1$. Meanwhile, according to Theorem 4, the index set of randoms for $G_2$ should be $R_1 \cup R_2$. So, the index set of inputs for $G_2$ to simulate all the probes is as follows:

With this proof method, the index set of inputs for the first $G_i$ of the composite gadget is $\bigcup I_i \cup \bigcup O_i \cup O \cup \bigcup R_i$. And we have Equation (A.4). Therefore, the adversary learns nothing from the inputs. □

A.3. Proof of Proposition 1
Proof. We suppose the input index set for $G_i$ is $I_i$ and the probed randomness subset is $R_i$. Consider the last gadget $G_1$, whose output $O$ is the output of the whole composition; we use $O$ to simulate all its probed variables. For $G_2$, whose output $O_1$ is one of the inputs of $G_1$, its index set for simulation should be given by Equation (A.5).

If $G_1$ and $G_2$ are implementations of the same $f_i$, they are PINI-E according to Theorem 2. If they are different implementations, we have

where $t_2$ (resp., $t_1$) is the number of probes in $G_2$ (resp., $G_1$) excluding its probed outputs. Therefore, the simulation for the whole composition needs no more than $\sum_{i \in [k]} (t_i + |R_i|) \le d$ input indices, and the index set is

According to the illustration in Sections 2.3 and 2.4, we summarize a proof sketch on the probing security of gadgets' composition in Figure 7.

Figure 6: The composition of PINI-E gadgets. Let $P_i$ be the set of probed intermediate variables, and let $R_i$ be the randomness subsets. $O_i$ are the probed outputs of $G_{i+1}$. Specially, $O$ is the probed output set of the composite gadget.

B. Proofs for LatinAND
B.1. The Security of LatinAND. First we introduce the construction of the matrix $L_d$. We start from a $(d+1) \times d$ matrix, as in Figure 8. Its first row is $\{1, 2, \ldots, d\}$, and every other row is the cyclic shift of the row above it, except the last row, whose first $\frac{d+1}{2}$ elements are $2j-1$ for the $j$-th element and whose remaining elements are $2j-d-1$. Then we add a sequence $\{0\}^{d+1}$ as the first column of $L_d$. We give the construction of $L_5$ as an instance in Figure 8.
Then let $M^r_d$ be the randomness matrix of $r$ in LatinAND with order $d$, and we define the mapping $\phi$:

The randomness subsets of $t$ are also the mirror symmetry of $\rho^L_k$ for $k \in [d]$, called $\rho^R_k$. We give an example of $M^r_d$ and $M^t_d$ in Figure 9. Let $M_d \overset{\text{def}}{=} M^r_d + M^t_d$ be the randomness matrix of LatinAND.
Finally, we define $N_d$ as the matrix mixing $M_d$ with the inputs $a_i, b_j$. More precisely, let:

We provide Lemma 3 for the proof of PINI-E and prove it in Appendix C as supporting information.
Lemma B.1. In $M^r_d$, there are at most 2 randoms $r_{i_1 j_1} \in S^k_i$ and $r_{i_2 j_2} \in S^{k'}_{i'}$ satisfying $r_{i_1 j_1} = r_{i_2 j_2}$ for $r_{ik}, r_{i'k'} \in \rho^L_j$.
Proposition B.1. Lemma 3 also works when we replace $S^k_i, S^{k'}_{i'}$ with $S^{d+1}_i / S^k_i$ and $S^{d+1}_{i'} / S^{k'}_{i'}$. More precisely, the pair of randoms exists for $S^{d+1}_i / S^k_i$ and $S^{d+1}_{i'} / S^{k'}_{i'}$ iff the pair of randoms in Lemma 3 does not exist.

Lemma 3 and Proposition 2 show that every random is used only twice in the different outputs.

B.2. Proof of Lemma 3
Proof. According to the construction of $M^r_d$, there always exist $r_{i_1 j_1}$ and $r_{i_2 j_2}$ satisfying Lemma 3 between $S^{d+1}_i$ and $S^{d+1}_{i'}$. So Lemma 3 is proved.
$i'$, the proof is the same as that of Lemma 3.

B.4. Proof of Theorem 6
Proof. There are two steps in our proof: the proof of PINI and the extension to PINI-E. First we prove the PINI security. Let $I$ be the index set of inputs. WLOG, we only consider the randoms $r$, because the other randoms do not weaken the security.
(1) According to the construction of $M_d$, each pair of $a_i b_j$ and $a_j b_i$ for $i \ne j$ is protected by the same random $r_{ij}$. As a result, if the random $r_{ij}$ in the probed variables is simulated, we put the corresponding indices $i, j$ into $I$.
(2) Then we consider the situation where the randoms r of probed variable p are simulated by more than one probe, for example, r 1 ; r 2 in the probed variable p 1 are simulated by variables p 2 and p 3 because each probe can simulate at most one random of the other probe according to Lemma 3 and Proposition 2. Consider Algorithm 2, the input indice of each  (t 14 , 3) (r 14 , 3) (r 13 , 2) (r 12 , 1) (0, 0) (0, 0) (r 13 , 2) (r 23 , 3) (r 24 , 1) (0, 0) (r 14 , 3) (r 24 , 1) (r 34 , 2)  IET Information Security intermediate step of c i for i 2 ½d are continuous.More precisely, for the adjacent elements r i 1 j 1 and r i 2 j 2 in S k i of M r d , there must be i 1 ¼ i 2 or j 1 ¼ j 2 , which means the input indices corresponding to the randoms are also continuous.Thus in this case, each additional probe only adds 1 more indice into I.
(3) For the intermediate steps of $c_{d+1}$, the adjacent elements of $S^k_i$ also share the same index. Thanks to PIRT, the simulation of $c'$ in PIRT needs all randoms contained in its intermediate steps, which is similar to case 2. Since $s_{ij}$ and $p^0_{ij}$ must satisfy PINI, only $p^1_{ij}$ is left for the proof of PINI. Since $p^1_{ij} = a_i b_j \oplus a_i r_{i'j'}$, at least two probes are needed to simulate $a_i r_{i'j'}$. Therefore the intermediate steps of $c_{d+1}$ also satisfy PINI, and the PINI security of LatinAND is deduced.
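Step (1) relies on the fact that the cross terms $a_i b_j$ and $a_j b_i$ share one random $r_{ij}$. As a minimal illustration of that pairing only, the sketch below implements the classic ISW-style AND gadget [8] over single bits, not LatinAND itself. The helper names `share`, `unshare`, and `isw_and` are hypothetical and chosen just for this example.

```python
import secrets
from functools import reduce
from operator import xor


def share(x, d):
    """Split bit x into d + 1 Boolean shares whose XOR equals x."""
    masks = [secrets.randbits(1) for _ in range(d)]
    return masks + [x ^ reduce(xor, masks, 0)]


def unshare(shares):
    """Recombine Boolean shares by XOR."""
    return reduce(xor, shares, 0)


def isw_and(a, b):
    """ISW-style masked AND (illustrative sketch, not LatinAND).

    One fresh random r_ij is used for the pair of cross terms
    a_i*b_j and a_j*b_i, with i < j.
    """
    n = len(a)
    c = [a[i] & b[i] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r_ij = secrets.randbits(1)
            c[i] ^= r_ij
            # Bracketing follows ISW: (r_ij ^ a_i*b_j) ^ a_j*b_i.
            c[j] ^= (r_ij ^ (a[i] & b[j])) ^ (a[j] & b[i])
    return c
```

Summing the output shares cancels every $r_{ij}$ and leaves $(\bigoplus_i a_i)(\bigoplus_j b_j)$, which is why the gadget is functionally correct regardless of the random values drawn.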
The proof also works when we only consider the randoms $t$; it is the same as that for $r$, so we omit it. Then we prove the PINI-E security of LatinAND. We provide Figure 10 to describe the distribution of probed randoms at $M_d$ when a $\rho^L_i$ and a $\rho^R_j$ are probed. First, we prove that all intermediate variables must satisfy PINI-E except those at the "intersection" in Figure 10. In other words, an intermediate variable will not break PINI-E unless all its randoms $r$ and $t$ are contained in the probed randomness subsets. Consider the proof of PINI: if the randoms $r$ of some intermediate variable are probed, the variable still satisfies PINI security because the PINI proof also works with the randoms $t$. The situation of probing $t$ is the same. Since the only difference between PINI-E and PINI is the probes of randomness subsets, the intermediate variables mentioned above also satisfy PINI-E.
As a result, we only consider those intermediate variables whose randoms are simulated with their randomness subsets, which are called bare in the rest of the proof.
(1) We prove that there are at most two bare $s^{(N)}_{ij}$ when there is one probe for each of $\rho^L_k$ and $\rho^R_{k'}$, with $k, k' \in [d]$. First, we consider the $i$-th row for $i \in [d]$. In this case, the proof amounts to proving that there are at most two intersections in Figures 10(a) and 10(b). Note that the angle between either the blue line or the orange line in Figure 10 and the edges of $M_d$ is $\pi/4$, so the blue line is perpendicular to the orange one. Assume that there are more than two intersections of these lines, i.e., three or four intersections. In this case there must be two intersections at the extreme points of one of the dotted lines; WLOG, we assume they are at the extreme points of the vertical line. However, according to $M^r_d$, if [...] Hence, there cannot be two intersections at the extreme points of the dotted line, which means there are no more than two intersections. So, we prove the proposition. WLOG, in the rest of the proof we assume there are two bare $a_i b_j$ for each two probes of $\rho^L_k$ and $\rho^R_{k'}$ with $k, k' \in [d]$ (generally, if [...] which comes from the construction of $L_d$. Therefore we only consider $j = k$).
(2) Then we show that the index set $|s^{(N)}_{ij}|_I \cup |s^{(N)}_{ij'}|_I$ for $1 < i \leqslant d$ and $1 < j, j' \leqslant i$ satisfies $\big||s^{(N)}_{ij}|_I \cup |s^{(N)}_{ij'}|_I\big| = 3$, which follows from the construction of $N_d$. This also holds for $j, j' > i$. More precisely, for a fixed $i$, there is always an input index $i + 1$ in the $s^{(N)}_{ij}$ with $1 < j, j' \leqslant i$, and $i$ for $j, j' > i$. Specifically, [...]

(3) Then we prove that at least two probes are needed to get $S^k_i / S^k_{i'}$ at $N_d$. Note that no $a_i b_j$ appears directly in the intermediate variables at PIRT, so the only way to get $s^{(N)}_{ij}$ is to probe both $S^i_j$ and $S^i_{j-1}$ at $N_d$. As a result, getting $m$ of the $S^k_i / S^k_{i'}$ needs at least $2m$ probes. Moreover, considering the distribution of $a_i b_j$ in $N_d$ discussed in the previous case, the most efficient probing method for the adversary is to probe the continuous sequence $S^k_i / S^k_{i'}$ instead of $s^{(N)}_{ij}$ with discrete $j$; thus we omit other situations in the rest of the proof.

(4) According to cases 2 and 3 above, we consider the probes containing the adjacent $\rho^L_m$ and $\rho^R_n$ for $m, n \in [d]$ and the corresponding intermediate variables. Here "adjacent" means that the subsets are adjacent at $M^r_d$ and $M^t_d$; intuitively, the adjacent subsets can also be described as the orange and blue lines in Figure 10 with larger thickness. Figures 10(a [...] and $\ell_1$ (resp., $\ell_2$) is the number of probes of $\rho^L_i$ (resp., $\rho^R_i$). We assume the numbers of probes at $\rho^L_i$ and $\rho^R_i$ are equal, and we put the explanations of the propositions and the assumption above into Appendix C as supporting information. In the rest of the proof, we assume the shape of the bare variables is the $\ell$-length square shown in Figure 11.

(5) According to case 2, the adjacent $s^{(N)}_{ij}$ in the same row or the same column have at least one index in common if the adjacent elements satisfy $j \leqslant i$ or $j > i$. Therefore, if any $S^k_i / S^k_{i'}$ of $N_d$ is probed, at least $2\kappa$ more probes of the randomness subsets are needed to make the probe bare, with $\kappa \overset{\text{def}}{=} |S^k_i / S^k_{i'}|$. Considering that at most $\kappa + 1$ indices are contained in $S^k_i / S^k_{i'}$, its simulation satisfies PINI-E. Note that $\kappa < d/2$; we consider the other probes contained in the $\ell$-length square mentioned in case 4. Since there are two intersections for the probes of randomness subsets, we assume there are $\alpha \leqslant d/6$ probes for each of $\rho^L_i$ and $\rho^R_i$, and probe all sequences contained in the two squares, which amounts to $2 \times \alpha + 2 \times (2 \times \alpha) = 6\alpha \leqslant d$ probes in total according to the discussion in case 3. First, we consider the square with the probe simulated before: each other sequence in this square provides at most two more input indices, where one additional index comes from the situation with $i, i' \leqslant j$ mentioned in case 2 and the other possible index comes from $i, i' > j$. Therefore the probes in this square provide at most [...]

(4) In case 4 we extend the conclusion of case 1 from a single probe of both $\rho^L_i$ and $\rho^R_i$ to several probes. We also discuss the "shape" of the bare variables and extend it into a square in $N_d$, for which security is easier to prove. The details of why the "shape" is exactly what we claim and how the size of the extended square is obtained are given in Appendix C as supporting information. According to case 1, we only need to prove the security of the $s^{(N)}_{ij}$ contained in the extended squares.
is the situation where there is no element at the top and bottom vertices of the dashed square, so the remaining shape is a hexagon; note that if the top vertex is not an element, neither is the bottom one, because of the symmetry of the construction. Figure 11(c) is the situation where there is no element at any of the four vertices of the dashed square, so the remaining shape is an octagon. (this conclusion comes from elementary geometry and we omit the detailed proof), where $\ell_1$ and $\ell_2$ are the numbers of probes of $\rho^L_i$ and $\rho^R_i$ and we assume $\ell_1 > \ell_2$. The blue squares in Figure 11 are the scaled ones in the proof of Theorem 6; it is easy to see that their side length is $(\ell_1 + \ell_2)/2$ and that they contain all bare $s^{(N)}_{ij}$.

C. Evaluation of the Randomness Cost for SecADD
In this part, we calculate the cost of randomness in SecADD. We consider operations over $\mathbb{F}_{2^w}$ and let $\ell \overset{\text{def}}{=} \lceil \log_2(w - 1) \rceil$.
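As a quick numeric check of the definition of $\ell$, the sketch below evaluates $\lceil \log_2(w - 1) \rceil$ with integer arithmetic only, which avoids floating-point rounding. The function name `ell_of` and the sample word sizes are assumptions made for illustration.

```python
def ell_of(w):
    """Return ceil(log2(w - 1)) for an integer w >= 2, using integers only."""
    if w < 2:
        raise ValueError("w must be at least 2")
    # For n >= 1, ceil(log2(n)) equals (n - 1).bit_length(); here n = w - 1.
    return (w - 2).bit_length()


for w in (8, 16, 32, 64):
    print(w, ell_of(w))
```

For example, `ell_of(32)` returns 5 and `ell_of(8)` returns 3.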
Now we calculate the cost in SecADD with multiple PRGs. According to Theorem 4, we use a set of $d$-wise PRGs to generate the randoms used by the LR gadgets, as they make a 1-local use of each $\rho^{(k)}_{s,i}$ for $k \in [3]$, and a set of $d$-wise PRGs to generate $r_{ij}$ and $t_{ij}$ in the LatinAND gadget, because they make a 1-local use of each $\rho^L_k$ and $\rho^R_k$. In the following, we separately discuss the number of PRGs and randoms with either $R_1$ or $R_2$.
When using $R_1$, we calculate the number of PRGs and randoms in different situations. First, we consider $\mathrm{PRG}^{(r)}$, which is used to generate the randoms in $\rho^L_k$ and $\rho^R_k$. According to the maximum distance separable (MDS) conjecture [29], we have the following inequality, where there are $2\ell$ LatinAND gadgets used in a SecADD: [...]
Then we consider the case using $R_2$, and we again calculate PRGs and randoms, respectively. As the output of $R_2$ is 3-wise independent and, according to Theorem 4, we always need $d$-wise PRGs, the security order $d$ is no more than 3. Let $x_r$ and $x_s$ be the numbers of randoms needed in $R_2$ for $\rho^L_k$ or $\rho^R_k$ and for $\rho^{(k)}_{s,i}$ for some $k \in [3]$. First, we consider $\mathrm{PRG}^{(r)}$; we can get the following inequality by the definition of $R_2$: [...] $\rceil = 72$ bit seeds to generate all randoms in LR. We compare the randomness cost of $R_1$, $R_2$, and the situation without PRGs in Table 5.
Then, we discuss the case where one set of PRGs is used by multiple SecADD instances. We only consider the use of the PRG $R_1$, for which the maximum number of SecADD instances is $n = \lfloor 2^w / (2\ell \cdot \lceil d/2 \rceil) \rfloor$. Here, there are $2\ell$ LatinAND gadgets in each SecADD and each randomness subset of LatinAND contains at most $\lceil d/2 \rceil$ elements, so $2\ell \cdot \lceil d/2 \rceil$ elements are contained in a $\rho^L_k$ in one SecADD algorithm, and $2^w$ is the number of output variables of a PRG. Hence, $\lfloor 2^w / (2\ell \cdot \lceil d/2 \rceil) \rfloor$ is the maximum number of SecADD instances for one set of $R_1$. We set $d = 1$, $w = 8$, and $\ell = 5$, which is a practically relevant setting. Then, we have $n = 25$. According to Table 5, one SecADD requires $160d^2$ random bits when using $R_1$ and $320d^2 + 1{,}728d$ random bits without PRGs. Therefore, the randomness cost can be reduced by a factor of up to $25(320d^2 + 1{,}728d)/160d^2 = 320$.
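The counting argument above can be reproduced with a few lines of arithmetic. The sketch below is only a check of the stated formulas: the function names are hypothetical, and the per-SecADD costs $160d^2$ and $320d^2 + 1{,}728d$ are taken from the text as given.

```python
from fractions import Fraction


def max_secadd(w, ell, d):
    """n = floor(2^w / (2 * ell * ceil(d / 2)))."""
    # (d + 1) // 2 is ceil(d / 2) for a positive integer d.
    per_secadd = 2 * ell * ((d + 1) // 2)  # elements of one rho^L_k per SecADD
    return (2 ** w) // per_secadd


def reduction_factor(w, ell, d):
    """Cost of n SecADDs without PRGs divided by the cost of one R_1 seed set."""
    n = max_secadd(w, ell, d)
    without_prg = 320 * d ** 2 + 1728 * d
    with_r1 = 160 * d ** 2
    return Fraction(n * without_prg, with_r1)


print(max_secadd(8, 5, 1))        # 25
print(reduction_factor(8, 5, 1))  # 320
```

With $d = 1$, $w = 8$, and $\ell = 5$, the divisor is $2 \cdot 5 \cdot 1 = 10$, so $n = \lfloor 256/10 \rfloor = 25$ and the ratio is $25 \cdot 2048 / 160 = 320$.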
$I\}$ to denote a set of variables whose indices are contained in $I$, and denote the size of the index set $I$ by $|I|$.

FIGURE 1: The comparison of previous works and ours, and the usage of multiple PRGs in PINI-E gadgets.

$2^w$
is defined by a triple $(I, C, O)$, where

FIGURE 2: The randomness matrix of a second-order ISWAND gadget. It reflects the bold part of the above example.

the PINI-E proof of LatinAND, which is given in Appendix B as supporting information.

Definition 10. (Intermediate step). Let $a$ and $c$ be intermediate variables of gadget $G$. We define $a$ as an intermediate step of $c$ if some $b$ exists such that $a \oplus b = c$ or $a \cdot b = c$.

Theorem 6. LatinAND is PINI-E with randomness subsets $\rho^L_k$ and $\rho^R_k$ for $k \in [d]$.

3.2. The Randomness Reuse of LatinAND. We mention that LatinAND makes a 1-local use of $2d$ randomness subsets in Theorem 7. We can build a gadget $G$ with LatinAND which always keeps its 1-local use of randomness subsets by Theorem 3. If $G$ satisfies the $d$-probing security in Theorem 4, we can use $2d$ $d$-wise PRGs to generate all $r_{ij}$ and $t_{ij}$ in LatinAND.

Theorem 7. LatinAND is 1-local use of $\rho^L_k$ and $\rho^R_k$ for $k \in [d]$.

$)$ with $s^{(L)}_{ij} \in L_d$ and $s^{(M)}_{ij} \in M^r_d$. For a $d$-order LatinAND, we define:

FIGURE 7: One can focus on the PINI-E security and the locality property of each single gadget; the probing security of the whole algorithm can then be deduced from the proof sketch.

${}_{12} = r_{12} \oplus t_{13}$ and correspondingly $s^{(N)}_{12} = a_1 b_2 \oplus r_{12} \oplus t_{13}$. We define $|s^{(N)}_{ij}|_I$ as the index set of the corresponding $a_i b_j$ of $s^{(N)}_{ij}$.

FIGURE 8: Constructing an $L_d$ from a Latin square with $d = 5$.

FIGURE 9: Examples of the randomness subsets for Algorithm 2 with $d = 3$. Elements in the same $\rho^L_k$ or $\rho^R_k$ have the same color. Intuitively, in $M^r_d$, the randoms corresponding to the "initial" Latin square of $L_d$ are symmetric about the diagonal of the Latin square, and the randoms in the last row are exactly the randoms on the diagonal. Meanwhile, $M^t_d$ is the mirror image of $M^r_d$.

FIGURE 10: Subparts (a, b) are the two cases for probing randomness subsets $\rho^L_k$ and $\rho^R_k$ at $c_i$ for $i \in [d]$, where the blue line corresponds to the probed $\rho^L_k$ and the orange one corresponds to $\rho^R_k$. Here $(i, j)$ is the index of $(\rho^L_i, \rho^R_j)$ in $M_d$. Subpart (a) refers to the situation where the probed $\rho^L_i$ and $\rho^R_j$ do not intersect at $M_d$, while in subpart (b) they do. Subparts (c, d) are two cases of the intersections of more than one probe of $\rho^L_i$ and $\rho^R_i$, where we assume there are three probes for both $\rho^L_i$ and $\rho^R_i$. The $r$ (resp., $t$) refers to the probed randoms at $\rho^L$ (resp., $\rho^R$), and $r + t$ means that both randoms $r$ and $t$ are probed, which is defined as bare. The red squares in subparts (c, d) highlight the bare variables.
$|s^{(N)}_{i,i}|_I \cap |s^{(N)}_{i,i+1}|_I = \{i + 1\}$. Meanwhile, we have $|s^{(N)}_{ij}|_I \cap |s^{(N)}_{i'j}|_I = \{j\}$ for $i, i' < j$ and $|s^{(N)}_{ij}|_I \cap |s^{(N)}_{i'j}|_I = \{j - 1\}$ for $i, i' \geqslant j$. Therefore, each probe for adjacent $s^{(N)}_{ij}$ can provide at most one more index.
) and 10(b) show two different situations of the intersections of $\rho^L_m$ and $\rho^R_n$. In the situation of Figure 10(a), no $s^{(N)}_{ij}$ is bare, so we only consider the situation of Figure 10(b) according to case 1. The case of Figure 10(b) can be divided into two different situations, as in Figures 10(c) and 10(d). The "shapes" (enclosed by the red lines in Figures 10(c) and 10(d)) of the bare $s^{(N)}_{ij}$ may be a square, a hexagon, or an octagon, depending on the choice and the number of probes. All the shapes can be contained in a square with side length $\ell$, where $\ell \overset{\text{def}}{=} (\ell_1 + \ell_2)/2$

and the indices for the two squares are less than $d$. Hence, we prove the PINI-E security for $c_i$ and their intermediate steps with $i \in [d]$.

(6) The PINI-E security for $c_{d+1}$ and its intermediate steps is trivial. Since the adjacent $s^{(M^r)}_{d+1,j}$ and $s^{(M^t)}_{d+1,j}$ differ from all elements in the other rows, the probed $s^{(M)}_{d+1,j}$ are not adjacent when we probe the adjacent $s^{(M)}_{ij}$ with $i \in [d]$, which means twice as many probes are needed to probe the bare $s^{(N)}_{d+1,j}$. Therefore the PINI-E security also holds for $c_{d+1}$ and its intermediate steps, and we deduce Theorem 6. □

Remark B.1. In this part we give a retrospect of the proof. First, we prove that LatinAND is PINI with either $\rho^L_i$ or $\rho^R_i$. Then in the proof of PINI-E, we reduce the scope of potentially "insecure" intermediate variables and finally prove that all variables are PINI-E. More precisely:
(1) In case 1 we provide the distribution of the bare $s^{(N)}_{ij}$ with a single probe of both $\rho^L_i$ and $\rho^R_i$. Given the PINI security of LatinAND with either $\rho^L_i$ or $\rho^R_i$ for $i \in [d]$, the intermediate variables which are not bare must satisfy PINI, and thus satisfy PINI-E. As a result, we only consider these bare $s^{(N)}_{ij}$.
(2) In case 2 we analyze the indices of the elements of $N_d$ in the same row or column.
(3) In case 3 we provide the relation between the number of probes and the constructions of probed sequences at $N_d$. Considering the index distribution of $N_d$ discussed in case 2, we determine the most efficient probing method to get the most indices.

(5) In case 5 we prove the PINI-E security of $s^{(N)}_{ij}$ and their intermediate steps for $i \in [d]$, using the conclusions of cases 2 and 3.
(6) In case 6 we prove the PINI-E security of the remaining intermediate variables (i.e., $c_{d+1}$ and its intermediate steps).

B.5. Explanations about the Proof of Theorem 6

B.5.1. The Enclosed Part at Figures 10(c) and 10(d). We provide the different shapes in Figure 11, in which the parts enclosed by the dashed lines refer to the probed randomness subsets and those enclosed by full lines refer to the bare variables at $N_d$. Figure 11(a) is the situation where there is an element of $N_d$ at exactly each vertex of the dashed square, so the full-line shape is also a square. Figure 11(b)

B.5.2. The Figure of the Enclosed Part and the Scaled Square.

Figure 11(d) shows the situation where the numbers of probes of $\rho^L_i$ and $\rho^R_i$ are unequal. With the green full lines in Figure 11(d), we know that the red rectangle can be contained in a square with side length $(\ell_1 + \ell_2)/2$

FIGURE 11: Subparts (a-d) are the different shapes of bare $s^{(N)}_{ij}$ and their corresponding extended squares in the proof of Theorem 6. The parts enclosed by the dotted lines refer to the $s^{(N)}_{ij}$ whose randoms $r$ or $t$ are probed, and the parts enclosed by the full lines are the bare $s^{(N)}_{ij}$. Meanwhile, the blue squares refer to the extended squares.
Input: input shares $a_i, b_j$, random $r_{i'j'}$ and an intermediate variable $c$
Output: the intermediate variable $c'$
$c_i \leftarrow \mathrm{PIRT}(a_{i+1}, b_j, r_{j,i+1}, c_i)$

TABLE 2: Running kilocycles/random bits of masked SPARKLE with different security orders.

TABLE 3: Running kilocycles/random bits of masked XTEA with different security orders.

TABLE 4: Running kilocycles/random bits of masked SPECK with different security orders.
Consider the properties of PINI-E: if $(G_i)_{i \in [n]}$ are proved to be PINI-E, their composition will satisfy Theorem 4. Moreover, all $\ell$-local randomness subsets $\rho_j$ in Theorem 2 can be generated by an $\ell d$-wise PRG. In the following, we provide the whole procedure of how to use multiple PRGs in PINI-E gadgets and keep them $d$-private.

How to use multiple PRGs in PINI-E gadgets:

(1) We assume gadget $G$ is the composition of gadgets $(G_{ij})_{i \in [n]}$, where $G$ is the implementation of $f$ and the $G_{ij}$ are those of $f_i$, and each $G_{ij}$ is PINI-E with randomness subsets $(\rho^i_k)_{k \in [q_i]}$, each of which is of $\ell^i_k$-local use. The subscript $k$ distinguishes the different randomness subsets in $G_{ij}$, and the subscript $j$ counts how many times $f_i$ is implemented in $f$.

(2) According to Theorem 3, we add three LR gadgets to the two inputs and one output of each $G_{ij}$, each of which owns the randomness subsets $\rho$ [...], $k \in [3]$ and $i \in [d]$, with 1-local use, so that each randomness subset $\rho^i_k$ keeps its $\ell^i_k$-local use. We define the composition of $G_{ij}$ and LRs as $G'$.

(3) According to Theorem 2, the composition of the $G_{ij}$ with the same $i$ is PINI-E with the randomness subsets $(\rho^i_k)_{k \in [q_i]}$. We define these compositions as $G_i$ for each $i$. The LRs are also PINI-E with randomness subsets $\rho$ [...]. Therefore, according to Theorem 4, $G'$ is still $d$-private with $3d + \sum_{i \in [n]} q_i$ PRGs, among which the $\ell^i_k d$-wise one is used to generate randoms for the $\ell^i_k$-local subset, and the other $3d$ PRGs are used to generate randoms for the LR gadgets.
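To make the notion of a $d$-wise PRG concrete, the sketch below uses a textbook construction: a seed of $d$ field elements is read as the coefficients of a polynomial of degree at most $d - 1$ over $\mathbb{F}_{2^8}$, and the output at index $x$ is the evaluation of that polynomial at $x$. For a uniform seed, any $d$ outputs at distinct indices are jointly uniform. This is an assumption made for illustration: the field size, the reduction polynomial 0x11B, and the names `gf256_mul` and `dwise_prg_output` are not taken from the paper, and the construction is not claimed to be the paper's $R_1$ or $R_2$.

```python
from typing import Sequence


def gf256_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce back into 8 bits
    return result


def dwise_prg_output(seed: Sequence[int], index: int) -> int:
    """Evaluate the seed polynomial at the field element `index`.

    seed[k] is the coefficient of x^k, so len(seed) = d gives degree <= d - 1.
    Horner's rule is used, starting from the highest coefficient.
    """
    acc = 0
    for coeff in reversed(seed):
        acc = gf256_mul(acc, index) ^ coeff
    return acc


# Example: a 3-wise PRG (d = 3) expanded to 256 output bytes from a 3-byte seed.
seed = (0x1F, 0xA2, 0x07)
outputs = [dwise_prg_output(seed, x) for x in range(256)]
```

A seed of $d$ bytes is expanded into up to $2^8$ output bytes, which matches the observation that the number of outputs of such a PRG is bounded by the field size.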
which satisfies PINI-E. As a result, the composition is PINI-E. □

A.4. Application of PINI-E.

TABLE 5: The random bits used in SecADD with $R_1$, with $R_2$, and without PRGs, with $d \in [3]$ for $R_2$ and $d \in [50]$ for $R_1$.