Semantically Secure Symmetric Encryption with Error Correction for Distributed Storage

A distributed storage system (DSS) is a fundamental building block in many distributed applications. It applies linear network coding to achieve an optimal tradeoff between storage and repair bandwidth when node failures occur. Additively homomorphic encryption is compatible with linear network coding. The homomorphic property ensures that a linear combination of ciphertext messages decrypts to the same linear combination of the corresponding plaintext messages. In this paper, we construct a linearly homomorphic symmetric encryption scheme that is designed for a DSS. Our proposal provides simultaneous encryption and error correction by applying linear error correcting codes. We show its IND-CPA security for a limited number of messages based on binary Goppa codes and the following assumption: when dividing a scrambled generator matrix $\hat{G}$ into two parts $\hat{G}_1$ and $\hat{G}_2$, it is infeasible to distinguish $\hat{G}_2$ from random and to find a statistical connection between $\hat{G}_1$ and $\hat{G}_2$. Our infeasibility assumptions are closely related to those underlying the McEliece public key cryptosystem but are considerably weaker. We believe that the proposed problem has independent cryptographic interest.


Introduction
The world's ability to generate, process, and store information is growing at an exponential rate [1]. The Internet of Things (IoT) has enabled objects to collect and share a vast amount of data, enabling new applications and improving efficiency. In distributed IoT, intelligence is pushed to the very edge of the networks. Such a decentralized approach has created challenges regarding the security and privacy of the collected data [2]. A distributed storage system (DSS) is a widely used technology for storing data in a reliable way. It is one of the essential building blocks for distributed applications. Such a system consists of a collection of $n$ storage nodes that may be individually unreliable but apply redundancy to make the system reliable as a whole. Coding schemes are applied to ensure its reliability and to reduce the bandwidth required for repair. In particular, linear network coding has turned out to offer good performance both in theory and in practice.
Complications arise if we cannot be certain that the storage nodes are well-behaved. Encryption needs to be applied to ensure the confidentiality of the data. However, traditional cryptographic primitives are ill-suited for network coding, which requires that data packets from different nodes can be combined according to the coding scheme. Secure network coding [3][4][5][6] has been applied to ensure confidentiality in the information-theoretic security model. However, secure network coding incurs a cost on the storage capacity of the system. The capacity decreases exponentially with the number of compromised nodes [7]. Furthermore, in many cases the storage nodes are provided by a third party storage service provider, leading to systems with zero secrecy capacity [8].
In this paper, we consider the confidentiality of network coding and, in particular, distributed storage systems in a setting where the adversary has complete control of the nodes but is computationally bounded. We devise a symmetric additively homomorphic encryption scheme that is based on linear error correcting codes and is compatible with linear network coding. Our scheme has several advantages compared to ordinary encryption:
(1) Linear network coding can be applied as if working directly with the plaintext messages. Linear operations on the ciphertext space transfer to the plaintext space upon decryption.
(2) The encrypted parts of the file do not disclose which part is which. The part information can be kept in the plaintext domain. This makes it impossible for the storage nodes or the adversary to learn which subsets of the data the user requests.
(3) The plaintext data can be first authenticated and then encrypted. For storage systems, this ordering is often desirable to ensure plaintext integrity. Our scheme can support this functionality with an additively homomorphic message authentication code such as [9], meaning that all linear combinations of the plaintext messages are authenticated.
(4) Our scheme provides simultaneous encryption and error correction.
There are encryption schemes possessing additively homomorphic properties such as the Goldwasser-Micali scheme [9] and the Paillier cryptosystem [10]. However, to apply coding schemes for distributed storage we need flexibility in choosing the ciphertext space field which, for efficiency reasons, is often an extension field of the binary field $\mathbb{F}_2$ when working with big data [11]. The required flexibility is not provided by existing proposals.
We construct a symmetric encryption scheme AddHomSE that is homomorphic from $(\mathbb{F}_q^k, +)$ to $(\mathbb{F}_q^n, +)$, where $k < n$ and $\mathbb{F}_q$ is a finite field. In particular, our security proofs are shown in the case where $\mathbb{F}_q = \mathbb{F}_2$, the binary field, resulting in a scheme that is homomorphic from the additive group $(\mathbb{F}_2^k, \oplus)$ to $(\mathbb{F}_2^n, \oplus)$. We also show that our construction is semantically (IND-CPA) secure in the standard model (on $\mathbb{F}_2$) for a fixed number of messages, showing that it provides indistinguishability for each individual part of the file. We apply problems that are closely related to the McEliece cryptosystem [12]. In particular, we formulate an assumption that is related to the pseudorandomness of the McEliece generator matrix. However, our assumption is much weaker. We believe that the corresponding problem has cryptographic interest in its own right.
The paper is organized as follows. In Section 2 we present work that is related to ours. Section 3 describes the preliminaries for the rest of the paper. We formulate AddHomSE in Section 4. We show that the scheme is IND-CPA secure for a limited number of messages in Sections 5 and 6. In Section 7 we consider the infeasibility of the applied problems and discuss how the scheme can be applied in practice with compact keys. Finally, Section 8 provides the conclusion.

Related Work
The theory of confidentiality of distributed storage is related to that of network coding. Cai and Yeung were the first to consider secure network coding [3,4]. In their security model, a passive wiretapper is able to eavesdrop on a subset of the links between nodes. The adversary is computationally unbounded and privacy is considered information-theoretically. A similar model was considered in [13][14][15][16]. The security model of eavesdropping nodes, which is more natural for distributed storage, was suggested by Pawar et al. [8]. In their model, a computationally unbounded eavesdropper can access data on her selection of the nodes. The maximum file size that can be stored with information-theoretic security in the DSS using an optimal bandwidth MDS code (with exact repair) is called the secrecy capacity of the DSS. Regenerating codes achieving the secrecy capacity were suggested by Shah et al. [5]. Regenerating codes and locally repairable secure codes that achieve minimum storage requirements for a DSS were suggested by Rawat et al. [6]. Multiple simultaneous node failures, cooperative regenerating codes, and their secrecy capacity were considered in [17]. Kosut et al. considered networks where a node behaves traitorously [18]. Multiple nodes containing adversarial errors were considered by Dikaliotis et al. [19]. Pawar et al. considered an active omniscient adversary that has complete knowledge of the data on all nodes and can corrupt  nodes, where 2 <  [20].
The concept of homomorphic encryption was introduced by Rivest et al. [21]. While fully homomorphic encryption enables arbitrary computations on ciphertexts, many proposed schemes have homomorphic properties over specific operations. For example, RSA [22] is homomorphic over multiplication. Additively homomorphic schemes enable the computation of linear combinations of the ciphertexts. For the Goldwasser-Micali scheme [9] and the Paillier cryptosystem [10], multiplication in the ciphertext space corresponds to addition in the plaintext space. The scheme proposed by Lyubashevsky et al. is additively homomorphic with a polynomial ring as the ciphertext space [23]. Other asymmetric schemes with additively homomorphic properties can be found, for example, in [24][25][26][27][28][29]. The functionality of public key encryption incurs a computational burden that is not needed in certain situations. For many applications, symmetric encryption suffices. Few symmetric schemes with the additive homomorphic property have been proposed. Some constructions, mostly concentrating on realizing fully homomorphic encryption, can be found in [30][31][32][33]. In addition, the ciphertext and plaintext spaces in these schemes cannot be easily applied with linear network coding, where we want to work with extension fields of the binary field $\mathbb{F}_2$ for efficiency reasons.

Preliminaries
3.1. Notation. Standard notation will be used for probabilistic algorithms [34]. We denote by $y \leftarrow A(x; \rho)$ the result of running a probabilistic algorithm $A$ on input $x$ with randomness $\rho$ and setting $y$ to be equal to the output. We denote the uniform probability distribution on a set $\Omega$ by $U(\Omega)$. If $X$ is a random variable and $\mathcal{F}$ is a distribution, we denote $X \sim \mathcal{F}$ when $X$ is distributed according to $\mathcal{F}$. A probability ensemble $X = \{X_s\}_{s \in \mathbb{N}}$ is a collection of random variables indexed by the integers. The problem of computationally distinguishing between two probability ensembles $X$ and $Y$ is denoted by $(X, Y)$.
Whenever we refer to indistinguishability of probability ensembles, we mean computational indistinguishability unless stated otherwise. Security proofs are considered in the standard model. That is, all algorithms are considered to be probabilistic polynomial time (PPT) and time complexity is considered in the average case. The success probability (called the advantage) of an adversary $A$ on a problem $\mathcal{P}$ is considered asymptotically as a function of a security parameter $s$ and is denoted by $\mathrm{Adv}^{\mathcal{P}}_{A}(s)$. A function $\epsilon : \mathbb{N} \to \mathbb{R}$ is negligible if for every $c \in \mathbb{N}$ there is $s_c \in \mathbb{N}$ such that $\epsilon(s) \le 1/s^c$ for every $s \ge s_c$. A problem $\mathcal{P}$ is considered infeasible if for all PPT algorithms $A$ the advantage $\mathrm{Adv}^{\mathcal{P}}_{A}(s)$ is negligible.

Dynamic Distributed Storage.
Let $f$ be a file consisting of $B$ elements from a finite field $\mathbb{F}_q$. A dynamic distributed storage system (DSS) consists of $n$ live nodes each storing $\alpha$ symbols over $\mathbb{F}_q$. These nodes can be individually unreliable but the system is designed to apply redundancy in a clever way to achieve robust and efficient data recovery against failures. The file is encoded into a codeword $x = (x_1, x_2, \ldots, x_n)$. Given such a codeword $x$, the part $x_i$ is stored into node $i$. During operation, some of the nodes of the DSS may fail. If node $i$ fails, a new node is added to the network. It contacts $d$ live nodes and downloads $\beta$ symbols from each. The total amount of downloaded data, $\gamma = d\beta$, is called the repair bandwidth. The new node processes these symbols to reconstruct $x_i$. The repair process is conducted so that data stored at $k < n$ nodes allows $f$ to be completely reconstructed (the "$k$ out of $n$ property"). A DSS satisfying such a property is often referred to as an $(n, k)$-DSS.
There is a tradeoff between the repair bandwidth $\gamma$ and the amount of data that can be stored in each node [35]. Dimakis et al. suggested network coding [36,37] for distributed data storage in order to reduce the bandwidth of node repair [35]. They introduced regenerating codes that achieve the optimal tradeoff between storage and repair bandwidth. This tradeoff can be achieved with linear network coding [20]. See Figure 1 for an example of a DSS and the repair process after node failure.
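The repair described for Figure 1 can be checked with a few lines of Python. This is only a sketch under stated assumptions: the block names f1 to f4 are illustrative, and the failed Node 1 is assumed to have stored the first two blocks of the file, which is what the three downloaded blocks named in the caption allow the new node to rebuild.

```python
import secrets

def xor_bytes(a, b):
    """Bitwise XOR of two equal-length blocks (addition over F_2)."""
    return bytes(x ^ y for x, y in zip(a, b))

def repair_node1(d1, d2, d3):
    """Rebuild (f1, f2) from the downloads d1 = f4, d2 = f2^f4, d3 = f1^f2^f4."""
    f2 = xor_bytes(d2, d1)   # (f2 ^ f4) ^ f4 = f2
    f1 = xor_bytes(d3, d2)   # (f1 ^ f2 ^ f4) ^ (f2 ^ f4) = f1
    return f1, f2

# A file of four random 8-byte blocks (hypothetical block size).
f1, f2, f3, f4 = (secrets.token_bytes(8) for _ in range(4))

# Only three blocks are transferred to the replacement node.
downloads = (f4, xor_bytes(f2, f4), xor_bytes(xor_bytes(f1, f2), f4))
assert repair_node1(*downloads) == (f1, f2)
```

The new node never needs the block f3, which is why three transferred blocks suffice instead of all four.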

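Mutual Information. The mutual information of two random variables $X$ and $Y$ is

$$I(X; Y) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)},$$

where $p(x, y)$ is the joint probability distribution function of $X$ and $Y$, $p(x)$ is the marginal probability distribution function of $X$, and $p(y)$ is the marginal probability distribution function of $Y$. We say that $X$ and $Y$ are dependent if $I(X; Y) > 0$. Generalizing this to probability ensembles $X = \{X_s : s \in \mathbb{N}\}$ and $Y = \{Y_s : s \in \mathbb{N}\}$, we say that $X$ and $Y$ are dependent if $I(X_s; Y_s) > 0$ for every $s \in \mathbb{N}$.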
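As a small numerical illustration of this definition, the following sketch computes the mutual information of two discrete random variables from their joint distribution. The function name and the choice of base-2 logarithms are assumptions made for the example; the text does not fix a logarithm base.

```python
from math import log2

def mutual_information(joint):
    """Mutual information in bits, where joint[x][y] = Pr[X = x, Y = y]."""
    px = [sum(row) for row in joint]           # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]     # marginal distribution of Y
    return sum(p * log2(p / (px[x] * py[y]))
               for x, row in enumerate(joint)
               for y, p in enumerate(row) if p > 0)

# Two independent fair bits: the joint distribution factorizes.
independent = [[0.25, 0.25], [0.25, 0.25]]
# Two identical fair bits: knowing one determines the other.
identical = [[0.5, 0.0], [0.0, 0.5]]

assert abs(mutual_information(independent)) < 1e-12
assert abs(mutual_information(identical) - 1.0) < 1e-12
```

Independent variables give zero mutual information, while any statistical link between them gives a strictly positive value, which is the sense of "dependent" used here.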

McEliece Cryptosystem and Related Problems. The McEliece scheme McEliece = (Gen, Enc, Dec) applies binary Goppa codes [38] to enable asymmetric encryption. The key generation algorithm Gen outputs a private/public key pair such that the private key consists of three matrices $(S, G, P)$ with entries in $\mathbb{F}_2$, where $P$ is an $n \times n$ permutation matrix, $S$ is a nonsingular $k \times k$ matrix, and $G$ is the generator matrix for a binary Goppa code that is able to correct up to $t$ errors. The public key is the $k \times n$ composition matrix $SGP$. A message $m \in \mathbb{F}_2^k$ is encrypted by Enc by computing $c = mSGP + e$, where $e$ is a randomly chosen error vector of Hamming weight $t$. For the decryption, Dec first computes $cP^{-1} = mSG + eP^{-1}$ and then decodes the corresponding Goppa codeword to obtain $mS$. Since $S$ is nonsingular, the message $m$ is computed by multiplying with $S^{-1}$ from the right. A semantically secure version of the scheme can be found in [39]. Here, semantic security refers to indistinguishability of ciphertexts under chosen plaintext attack. For details on semantic security, see, for example, [40].
The security of McEliece is based on a certain assumption on the generator matrix $SGP$. Let $\mathcal{G}_s$ denote the random variable determined by the probability distribution of sampling a generator matrix $SGP$ according to Gen($1^s$), where $s$ is a security parameter. Let the probability ensemble $\mathcal{G} = \{\mathcal{G}_s : s \in \mathbb{N}\}$. Let $\mathcal{R}$ denote the probability ensemble of random matrices with the same size as $\mathcal{G}$. The following hardness assumption was first formulated in [41].

Assumption 1 (pseudorandomness of McEliece generator matrix). There exists a negligible function $\epsilon_M$ such that, for every PPT algorithm $A$,

$$\mathrm{Adv}^{(\mathcal{G}, \mathcal{R})}_{A}(s) \le \epsilon_M(s)$$

for every $s \ge 1$.
In addition to this pseudorandomness assumption, McEliece relies on the hardness of the learning parity with noise problem. However, we do not need to apply it in our scheme.

Additively Homomorphic Symmetric Encryption Scheme
In this section, we give a construction of a symmetric encryption scheme that is homomorphic from the additive group $(\mathbb{F}_2^k, \oplus)$ to $(\mathbb{F}_2^n, \oplus)$, where $k, n \in \mathbb{N}$ and $k < n$. Due to linearity, it will be compatible with linear network coding. Our construction is inspired by the symmetric scheme suggested in [42], the homomorphic scheme suggested in [43], and the McEliece public key encryption scheme [12], especially its IND-CPA variant [39]. Similarly to the McEliece scheme, our scheme is based on binary Goppa error correcting codes [38]. However, contrary to the McEliece scheme, we do not disclose the scrambled generator matrix. We also do not add any errors while encrypting, which means that the full error correction capacity of the code can be utilized in applications. It would also be easy to adapt our proposal to apply other codes on an arbitrary finite field $\mathbb{F}_q$. However, binary fields and their extensions are useful for many applications since they enable efficient data combination due to the efficiency of addition modulo 2 [11].
In general, the scheme operates as follows. Suppose that our file is divided into $\ell$ parts constituting $\ell$ plaintext messages $m_1, m_2, \ldots, m_\ell$. Each of these messages is padded with a random suffix $z$ and encrypted by encoding with a scrambled generator matrix $\hat{G}$ of a linear error correcting code: $c_i = (m_i, z)\hat{G}$. Note that the resulting ciphertexts can be linearly combined and the corresponding combination translates back to the plaintext space upon decoding due to the linearity of the code. Furthermore, since the generator matrix is scrambled, an adversary is not able to determine the applied code and is thus not able to decrypt the ciphertexts. In the following, we rigorously formulate this construction and the related computational assumptions. Based on computational indistinguishability, we then proceed to show its semantic security.
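The linearity argument above can be illustrated with a short Python sketch. It is not the scheme itself: a random binary matrix stands in for the scrambled generator matrix, the sizes are toy values chosen for readability, and no decoding is performed. It only shows that adding two ciphertexts gives the encoding of the sum of the two padded plaintexts.

```python
import random

def vec_mat(v, M):
    """Multiply a row vector by a matrix over F_2 (entries are 0/1 ints)."""
    return [sum(v[i] & M[i][j] for i in range(len(v))) % 2
            for j in range(len(M[0]))]

def xor(a, b):
    """Component-wise addition in F_2."""
    return [x ^ y for x, y in zip(a, b)]

def encrypt(m, G_hat, r, rng):
    """Pad m with r random bits z and encode (m, z) with G_hat."""
    z = [rng.randrange(2) for _ in range(r)]
    return vec_mat(m + z, G_hat), z

rng = random.Random(2017)          # fixed seed so the run is repeatable
k, r, n = 3, 4, 12                 # toy sizes, far too small for security
# Assumption: a random binary matrix stands in for the scrambled matrix SGP.
G_hat = [[rng.randrange(2) for _ in range(n)] for _ in range(k + r)]

m1, m2 = [1, 0, 1], [0, 1, 1]
c1, z1 = encrypt(m1, G_hat, r, rng)
c2, z2 = encrypt(m2, G_hat, r, rng)

# Adding ciphertexts equals encoding the sum of the padded plaintexts.
combined = vec_mat(xor(m1, m2) + xor(z1, z2), G_hat)
assert xor(c1, c2) == combined
```

Because the random suffixes add up as well, the combined ciphertext is again a valid encryption of the combined plaintext under a new suffix.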

Definition 2 (AddHomSE). The symmetric encryption scheme AddHomSE = (Gen, Enc, Dec) consists of a three-tuple of algorithms for key generation, encryption, and decryption. In Dec, the plaintext message is obtained by decoding $cP^{-1}$ using the Goppa code, mapping the decoded message by $S^{-1}$, and discarding the last $r$ bits.
The key generation, encryption, and decryption processes are depicted in Algorithms 1, 2, and 3, respectively.
Note that contrary to the McEliece cryptosystem, the matrix $SGP$ is not public. Instead, it is kept as a secret key. In addition, no error vectors are added in the encryption process.
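To make the interplay of key generation, encryption, and decryption concrete, the following toy sketch runs the three steps end to end. It rests on loud assumptions: the [7, 4] Hamming code, which corrects a single error, replaces the binary Goppa code; the sizes (1 plaintext bit, 3 random bits, 7 ciphertext bits) are illustrative only; and the key also carries a precomputed inverse of the scrambling matrix for convenience. It is a sketch of the idea, not the authors' implementation, and it offers no security at this size.

```python
import random

# Assumption: the [7, 4] Hamming code (corrects 1 error) replaces the Goppa code.
A = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
G = [[int(i == j) for j in range(4)] + A[i] for i in range(4)]  # G = [I | A]
K, R, N = 1, 3, 7   # plaintext bits, random bits, ciphertext bits (K <= R - 1)

def vec_mat(v, M):
    """Multiply a row vector by a matrix over F_2."""
    return [sum(v[i] & M[i][j] for i in range(len(v))) % 2
            for j in range(len(M[0]))]

def inverse_gf2(M):
    """Gauss-Jordan inverse over F_2; returns None if M is singular."""
    size = len(M)
    aug = [row[:] + [int(i == j) for j in range(size)] for i, row in enumerate(M)]
    for col in range(size):
        pivot = next((i for i in range(col, size) if aug[i][col]), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(size):
            if i != col and aug[i][col]:
                aug[i] = [a ^ b for a, b in zip(aug[i], aug[col])]
    return [row[size:] for row in aug]

def gen(rng):
    """Sample a nonsingular S and a permutation; G is fixed in this toy."""
    while True:
        S = [[rng.randrange(2) for _ in range(K + R)] for _ in range(K + R)]
        S_inv = inverse_gf2(S)
        if S_inv is not None:
            break
    perm = list(range(N))
    rng.shuffle(perm)
    return S, S_inv, perm

def enc(key, m, rng):
    """c = (m, z) S G P with a fresh random suffix z."""
    S, _, perm = key
    z = [rng.randrange(2) for _ in range(R)]
    w = vec_mat(vec_mat(m + z, S), G)
    return [w[perm[j]] for j in range(N)]

def dec(key, c):
    """Undo P, correct up to one error, undo S, drop the last R bits."""
    _, S_inv, perm = key
    y = [0] * N
    for j in range(N):
        y[perm[j]] = c[j]
    syndrome = [a ^ b for a, b in zip(vec_mat(y[:4], A), y[4:])]
    if any(syndrome):
        columns = A + [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # columns of H
        y[columns.index(syndrome)] ^= 1
    return vec_mat(y[:4], S_inv)[:K]

rng = random.Random(42)
key = gen(rng)
c1, c2 = enc(key, [1], rng), enc(key, [0], rng)
noisy_sum = [a ^ b for a, b in zip(c1, c2)]
noisy_sum[3] ^= 1                       # one bit error in storage or transit
assert dec(key, noisy_sum) == [1]       # 1 xor 0
```

The final check combines two ciphertexts, flips one bit, and still recovers the sum of the plaintexts, which shows the homomorphic property and the error correction working together.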
We shall now proceed to show the IND-CPA security of our construction. Our plan is the following. We first show that AddHomSE can be divided into two parts, Enc$_1$ and Enc$_2$, such that the output of Enc is the sum of the outputs of these two algorithms. We then proceed to show that Enc$_2$ produces a probability ensemble that is indistinguishable from random under a certain (reasonable) assumption. We then consider the sum of the outputs of these two algorithms and proceed to show that (under another reasonable assumption) the complete encryption algorithm produces ciphertexts that are indistinguishable from random.
We start by showing that Enc can be expressed as a sum of two algorithms. Let the scrambled generator matrix $\hat{G} = SGP$ be partitioned into $k \times n$ and $r \times n$ submatrices $\hat{G}_1$ and $\hat{G}_2$ such that $(SGP)^T = (\hat{G}_1^T, \hat{G}_2^T)$, where $T$ denotes transpose. Then we have

$$\mathrm{Enc}((S, G, P), m; \rho) = (m, z)\hat{G} = m\hat{G}_1 \oplus z\hat{G}_2 = \mathrm{Enc}_1((S, G, P), m) \oplus \mathrm{Enc}_2((S, G, P); \rho),$$

where Enc$_1$ is deterministic PT, Enc$_2$ is PPT, and $\rho$ is the internal randomness used by Enc. Now, Enc$_2$ adds a different element $z\hat{G}_2 \in \mathbb{F}_2^n$ to the output of Enc$_1$ determined by the randomness $\rho$. Suppose that we are encrypting $\ell$ messages and that the output of Enc$_2$ is a truly random $R^{(h)} \sim U(\mathbb{F}_2^n)$ for every $h \le \ell$. Then for every $h \le \ell$ and every plaintext message $m_h$ the output of Enc would be characterized by

$$m_h\hat{G}_1 \oplus R^{(h)} \sim U(\mathbb{F}_2^n)$$

and AddHomSE would satisfy perfect secrecy for $\ell$ encryptions. In reality, the output of Enc$_2$ is not truly random. However, in the following we show that it is indistinguishable from random under a certain assumption. Then we consider the connection between Enc$_1$ and Enc$_2$ and, finally, the indistinguishability of encryptions from random. For easier reference, variables used in the description of the scheme, as well as in the following proofs, have been collected into Table 1. Similarly, the used random variables have been collected into Table 2.
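The split of the encryption into a deterministic part and a randomizing part can be checked numerically. In the sketch below, a random binary matrix again stands in for the scrambled generator matrix, its first k rows play the role of the first submatrix and its last r rows the role of the second, and the function names enc1 and enc2 are illustrative.

```python
import random

def vec_mat(v, M):
    """Multiply a row vector by a matrix over F_2."""
    return [sum(v[i] & M[i][j] for i in range(len(v))) % 2
            for j in range(len(M[0]))]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

rng = random.Random(7)
k, r, n = 3, 4, 10                         # toy sizes
# Assumption: a random matrix stands in for the scrambled generator matrix.
G_hat = [[rng.randrange(2) for _ in range(n)] for _ in range(k + r)]
G1, G2 = G_hat[:k], G_hat[k:]              # first k rows, last r rows

def enc1(m):
    """Deterministic part: depends only on the message."""
    return vec_mat(m, G1)

def enc2(z):
    """Randomizing part: depends only on the random suffix."""
    return vec_mat(z, G2)

def enc(m, z):
    """Full encryption of the padded message (m, z)."""
    return vec_mat(m + z, G_hat)

m = [1, 0, 1]
z = [rng.randrange(2) for _ in range(r)]
assert enc(m, z) == xor(enc1(m), enc2(z))
```

The message only ever touches the first submatrix and the randomness only the second, which is why the security argument can treat the two parts separately.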
Security and Communication Networks

The Probability Ensemble Induced by Enc 2
In the following, we consider the probability ensemble such that  2,(ℎ)  = Enc 2 (S, G, P) = z ℎ Ĝ2 for every ℎ ∈ {0, 1, . . .,  − 1}, where (S, G, P) ← Gen(1  ) and z ℎ ← (F  2 ).Note that ,  * , , and  depend on the security parameter .In the following, we have made the dependence explicit.We can consider  2, as a random variable over F ×  2 by setting  2, = Z Ĝ2 , where Z is a  ×   matrix chosen uniformly at random.For convenience, we assume that  2, is written in such a matrix form.
5.1.Indistinguishability of Ĝ2 from Random.Our plan is to show the indistinguishability of  2, from random for all  ≤   .In order to do that we want Ĝ2 to be also indistinguishable from random.We could apply the McEliece assumption (Assumption 1) that states that the complete generator matrix SGP satisfies this property.However, such an assumption is too strong in our case.We derive a weaker assumption that relates only to Ĝ2 .) for every  ∈ N.For every PPT algorithm A, there is a negligible function  such that for every  ≥ 1.
If the generator matrix satisfies the formulated assumption, then Ĝ2 cannot be distinguished from random.Suppose that Ĝ2 is exchanged with truly random matrix.Let   = {   } ∈N be a probability ensemble such that    = ZG  , where Z ← (F ) and G  ← (F ).Let   = {   } ∈N denote the uniform probability ensemble such that    ∼ (F ), where   is determined by Gen(1  ).Clearly, the statistical distance for every  ∈ N and  ≤   since all of the elements of ZG  are uniformly random. ( We shall now provide a connection between Assumption 4 and the indistinguishability of  2, from   for  ≤   .Proposition 5.For every PPT algorithm A there is a PPT algorithm B such that for every  ≤   and  ∈ N.
Proof.The reduction is straightforward.Let  ∈ N be given and let A be a PPT algorithm considered as a distinguisher for  2, and   .Let us define the distinguisher B for  2 and  2 that is shown in Algorithm 4.
If X ←  2  , then B is invoked with   rows of a McEliece generator matrix.By the description of B, A is queried with a matrix sampled according to  2,  .Let now X ←  2  .Then A is invoked with an element sampled according to    and since B outputs the same bit as A, we have A direct consequence of Proposition 5 is the result we aimed for: indistinguishability of  2, from random under Assumption 4. Proposition 6.For every PPT algorithm A and  ≤ , for every  ∈ N.

Semantic Security for 𝑟 Messages
Let us now turn to the probability ensemble induced by the complete encryption algorithm Enc.We establish the semantic security of AddHomSE by proving that it satisfies ciphertext indistinguishability for up to   messages under two assumptions: Assumption 4 and a new one regarding independence of Ĝ1 and Ĝ2 .Let m 0 , m 1 , . . ., m −1 ∈ F  2 be any plaintext messages.Let   = {   } ∈N such that    = ( (0)  ,  (1)   , . . .,  (−1)
6.1. Computational Independence. Assumption 4 concerns the last part $\hat{G}_2$ of the generator matrix. However, we also need to make an assumption regarding $\hat{G}_1$. For example, suppose that it was possible that $\hat{G}_1 = \hat{G}_2$. Then the ensemble of encryptions would be easily distinguishable with high probability by choosing $M = I$, the identity matrix. To foil such attempts, we want $\hat{G}_1$ and $\hat{G}_2$ to be sufficiently independent of each other. We shall formulate an assumption concerning the mutual information of the random variables corresponding to $\hat{G}_1$ and $\hat{G}_2$. Let us define the following experiment in which we attempt to determine whether two probability ensembles are dependent. Suppose that we have three probability ensembles $X$, $Y$, and $Z$. Suppose also that $Z$ is indistinguishable from $Y$. Furthermore, suppose that $I(X_s; Y_s) > 0$ while $I(X_s; Z_s) = 0$ for every $s \in \mathbb{N}$. We define the experiment that is shown in Algorithm 5.
Algorithm 5: The experiment DepExp($A$, $X$, $Y$, $Z$)($s$), which ends by outputting 1 or 0.
In the experiment, $A$ is either given an element from $Y_s$ such that $I(X_s; Y_s) > 0$ or an element from $Z_s$ that is indistinguishable from $Y_s$ such that $I(X_s; Z_s) = 0$. Since $Y$ and $Z$ are indistinguishable, $A$ succeeds in this experiment with nonnegligible probability only if it is able to find the dependence of $Y_s$ on $X_s$.
Definition 7. Let $X = \{X_s : s \in \mathbb{N}\}$, $Y = \{Y_s : s \in \mathbb{N}\}$ be probability ensembles. We say that $X$ and $Y$ are computationally independent if for every PPT algorithm $A$ and every probability ensemble $Z = \{Z_s : s \in \mathbb{N}\}$ such that $Z$ is computationally indistinguishable from $Y$ and $I(X_s; Z_s) = 0$ for every $s \in \mathbb{N}$ there is a negligible function $\epsilon$ such that

$$\mathrm{Adv}^{\mathrm{Dep}(X, Y, Z)}_{A}(s) = \left| 2 \cdot \Pr[\mathrm{DepExp}(A, X, Y, Z)(s) = 1] - 1 \right| \le \epsilon(s)$$

for every $s \in \mathbb{N}$. If this does not hold, then we say that $X$ and $Y$ are noticeably dependent.
Assumption 8 ($\hat{G}_1$ and $\hat{G}_2$ computationally independent). Let $\mathcal{G}^1$ and $\mathcal{G}^2$ denote the probability ensembles of $\hat{G}_1$ and $\hat{G}_2$. For every probability ensemble $Z$ indistinguishable and independent from $\mathcal{G}^2$ and every PPT algorithm $A$ there is a negligible function $\epsilon$ such that

$$\mathrm{Adv}^{\mathrm{Dep}(\mathcal{G}^1, \mathcal{G}^2, Z)}_{A}(s) \le \epsilon(s)$$

for every $s \ge 1$.
The assumption states that it is not feasible to find any information that links $\hat{G}_1$ and $\hat{G}_2$. The assumption is still weaker than the McEliece assumption that states that the whole $\hat{G} = SGP$ is indistinguishable from random. (If it is, then necessarily $\hat{G}_1$ and $\hat{G}_2$ are computationally independent.) However, Assumption 8 does not require $\hat{G}_1$ to be indistinguishable from random. In fact, our proofs do not depend at all on the structure of $\hat{G}_1$ as long as $\hat{G}_1$ and $\hat{G}_2$ are computationally independent. To make the scheme faster, we could, for instance, omit $S$ and $P$ from affecting the first $k$ rows of the generator matrix $G$.
We are now ready to show the semantic security of AddHomSE by showing the indistinguishability of   from random.

Proposition 9. AddHomSE has indistinguishable encryptions for $r_s$ messages under Assumptions 4 and 8.
Proof.Suppose that Assumption 4 holds.We establish the claim by showing that for every set of  ≤   plaintext messages m 0 , m 1 , . . ., m −1 ∈ F   2 and every PPT algorithm A there is a PPT algorithm B such that for  ∈ N, where   is induced by m 0 , m 1 , . . ., m −1 .Then, under Assumption 8, the advantage of A is negligible.Since  2 is truly random, we have ( 2 ;  2 ) = 0.In addition, by Assumption 4,  2 is computationally (1) procedure B(1  , Ĝ1 , X) ⊳X is either Ĝ2 or a random matrix (2) Z ← (F indistinguishable from  2 and therefore Dep( 1 ,  2 ,  2 ) is well defined.Let the security parameter  be fixed and let m 0 , m 1 , . . ., m −1 ∈ F  2 be any messages.Let M be the message matrix of m 0 , m 1 , . . ., m −1 .Written in the matrix form, we have    = M 1  ⊕  2,  and the elements are of the form where ).Let A be any PPT algorithm considered as a distinguisher for (  ,   ).Using A, we construct an algorithm B that determines the dependability of  1 and  2 (see Algorithm 6).
Suppose that the input $X$ is a random matrix. Then $Y = M\hat{G}_1 \oplus ZX$ is a truly random matrix. Therefore, $A$ was invoked with a matrix sampled according to the uniform ensemble. Suppose now that $X = \hat{G}_2$. Then $Y = M\hat{G}_1 \oplus Z\hat{G}_2$ and $Y$ was sampled according to the ensemble induced by Enc. Since $B$ outputs the same bit as $A$, the advantage of $A$ is bounded by the advantage of $B$, which establishes the claim.
AddHomSE is IND-CPA secure under Assumptions 4 and 8 whenever the adversary is restricted to at most $r_s$ queries to the encryption oracle (the test query included). Considering a DSS, whenever the dataset is divided into at most $r_s$ parts, each of those parts remains secret even under a chosen plaintext attack where the adversary is able to choose each of those parts separately and adaptively.

Infeasibility, Key Size, and Error Correction Capacity
Choosing a rate $k^*/n \approx 0.8$ maximizes the complexity of information set decoding attacks [44].
For AddHomSE, the attacker is not given the generator matrix. Instead, the attacker gets at most $r$ scrambled messages under an adaptive chosen plaintext attack. Therefore, $n$ can be drastically lower for AddHomSE. We suggest $k = \lfloor k^*/2 \rfloor - 1$ and $r = k^* - k$ so that the randomization length is slightly more than half of the input. The rate $k^*/n$ should be kept close to a constant. We suggest choosing a rate that is close to 0.8 due to information set decoding attacks [44].
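The suggested split of the code dimension into plaintext length and randomization length is simple arithmetic, sketched below. The function name is hypothetical, and the symbols follow the reading that the plaintext length is the floor of half the code dimension minus one, with the remaining positions used for randomization.

```python
def split_dimension(k_star):
    """Split the code dimension k* into plaintext length k and random length r."""
    k = k_star // 2 - 1      # floor(k*/2) - 1
    r = k_star - k           # the rest of the input is randomization
    return k, r

k, r = split_dimension(200)  # a code of dimension 200
assert (k, r) == (99, 101)
assert k <= r - 1            # the random part is strictly longer than the message
```

For every dimension of at least 2, this choice leaves the random suffix at least two positions longer than the plaintext, so slightly more than half of the encoder input is randomization.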

Key Size.
The key size of AddHomSE is large if truly random matrices are used. In a practical setting, we want to use pseudorandom matrices for $S$ and $P$. The key size is dramatically decreased by replacing these matrices with a short seed and generating $S$ and $P$ using a pseudorandom generator. The generator matrix $G$ of the Goppa code can be derived from the Goppa polynomial $g(x)$ and pseudorandom elements produced by the same generator. Therefore, in practice, the key can be compactly presented by the seed and the polynomial $g(x)$.
Typically, in a distributed storage system we want to encrypt files or file systems that are huge. If a large file is divided into few parts, we do not want to consider each part as a single plaintext message since such an approach would require $k^*$ and $n$ to be at least as large as the length of the file part. In such a case, we can further divide the part into smaller blocks and encrypt those blocks independently. Such an approach enables us to select small and efficient values for $k^*$ and $n$. Note that such a division does not affect the homomorphic property of the scheme provided that each of the file parts is processed similarly and encrypted with the same keys. It also does not have an effect on the key size since the keys of those individual blocks can be derived from the same seed and the polynomial $g(x)$.
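One way to picture the compact key is to expand a short seed into the scrambling matrix and the permutation deterministically. The sketch below is an illustration under explicit assumptions: SHAKE-256 in counter mode is used as the pseudorandom generator (the text does not name one), the class and function names are hypothetical, and the derivation of the Goppa generator matrix from its polynomial is not shown.

```python
import hashlib

class SeedStream:
    """Deterministic byte stream expanded from a short seed.

    Assumption: SHAKE-256 in counter mode plays the role of the PRG.
    """
    def __init__(self, seed):
        self.seed, self.counter, self.buffer = seed, 0, b""

    def read(self, count):
        while len(self.buffer) < count:
            block_input = self.seed + self.counter.to_bytes(8, "big")
            self.buffer += hashlib.shake_256(block_input).digest(64)
            self.counter += 1
        out, self.buffer = self.buffer[:count], self.buffer[count:]
        return out

    def bit(self):
        return self.read(1)[0] & 1      # one byte per bit: simple, not frugal

    def below(self, bound):
        """Uniform integer in [0, bound) by rejection sampling."""
        size = (bound.bit_length() + 7) // 8
        limit = (256 ** size // bound) * bound
        while True:
            value = int.from_bytes(self.read(size), "big")
            if value < limit:
                return value % bound

def is_nonsingular(M):
    """Gaussian elimination over F_2 on a copy of M."""
    rows, size = [row[:] for row in M], len(M)
    for col in range(size):
        pivot = next((i for i in range(col, size) if rows[i][col]), None)
        if pivot is None:
            return False
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for i in range(col + 1, size):
            if rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[col])]
    return True

def derive_s_and_p(seed, k_star, n):
    """Expand a seed into a nonsingular k* x k* matrix and a permutation of n."""
    stream = SeedStream(seed)
    while True:                          # retry until S is invertible
        S = [[stream.bit() for _ in range(k_star)] for _ in range(k_star)]
        if is_nonsingular(S):
            break
    perm = list(range(n))                # Fisher-Yates shuffle
    for i in range(n - 1, 0, -1):
        j = stream.below(i + 1)
        perm[i], perm[j] = perm[j], perm[i]
    return S, perm

S, perm = derive_s_and_p(b"example seed", 8, 16)
assert (S, perm) == derive_s_and_p(b"example seed", 8, 16)
```

Both parties holding the same seed rebuild identical matrices, so only the seed and the Goppa polynomial need to be stored or exchanged.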

Error Correction.
Due to requirements of semantic security and error correction, ciphertexts contain overhead compared to plaintext messages. For example, with $(n, k^*, k) = (256, 200, 100)$, where the rate $k^*/n \approx 0.78$, plaintexts of length 100 will be encrypted into ciphertexts of length 256. The scheme can correct up to $t$ errors, where $t$ is the degree of the Goppa generator polynomial. With these parameters, we should choose $t \ge t_{\min} = 5$ [47]. Choosing the smallest $t$, which results in the most efficient implementation, enables us to correct up to 5 errors in each 256 bits, meaning that the plaintext messages are correctly decrypted with high probability whenever the error rate is less than 2%. If more error correction capacity is needed, then a higher degree Goppa generator polynomial needs to be selected and/or the rate should be lowered. As a final remark, we note that the binary Goppa code can be exchanged with another linear code on a finite field $\mathbb{F}_q$. However, we have only shown the security of AddHomSE based on the indistinguishability of scrambled Goppa generator matrices. The applied linear code has to satisfy a similar infeasibility result.
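The figures in this example follow from three ratios, which the sketch below computes for the quoted parameters. The function name and the dictionary keys are illustrative choices, not notation from the paper.

```python
def overhead(n, k_star, k, t):
    """Rate, ciphertext expansion, and correctable error fraction."""
    return {
        "rate": k_star / n,               # code dimension over code length
        "expansion": n / k,               # ciphertext bits per plaintext bit
        "correctable_fraction": t / n,    # errors tolerated per ciphertext bit
    }

stats = overhead(n=256, k_star=200, k=100, t=5)
assert abs(stats["rate"] - 0.78125) < 1e-12
assert abs(stats["expansion"] - 2.56) < 1e-12
assert stats["correctable_fraction"] < 0.02     # 5/256 is just under 2%
```

With these values each plaintext bit costs 2.56 ciphertext bits, and the 5 correctable errors per 256 bits correspond to a fraction of about 1.95%.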

Conclusion
We propose an additively homomorphic symmetric encryption scheme AddHomSE that is compatible with linear network coding: a linear combination of ciphertext messages decrypts to the same linear combination of corresponding plaintext messages. The scheme can be used for the encryption of data stored in a distributed storage system (DSS), for example, in the distributed Internet of Things. We show that the scheme is semantically secure (IND-CPA) and provides computational indistinguishability for each individual part of the file stored in the DSS. In combination with an additively homomorphic MAC, our scheme supports the authenticate-then-encrypt paradigm that ensures plaintext integrity. Finally, based on Goppa codes, our scheme offers simultaneous error correction. Our proofs are shown for the binary field $\mathbb{F}_2$, which is commonly used for the implementation of a DSS due to computational efficiency reasons. We also discuss the selection of secure parameters for the scheme and explain how it can be applied with compact keys.

Figure 1: An example of a distributed storage system with linear coding. A file $f = (f_1, f_2, f_3, f_4)$ is distributed to $n = 4$ nodes each storing a vector of two parts of the file ($\alpha = 2$). The file is safe if one node fails. If Node 1 fails, it can be replaced by communicating only three blocks $(f_4, f_2 \oplus f_4, f_1 \oplus f_2 \oplus f_4)$ instead of all four.

(1) Gen($1^s$): based on the security parameter $1^s$, Gen chooses a randomization length $r$ and a linear $[n, k^*, d]$ error correcting Goppa code over $\mathbb{F}_2$ with a generator matrix $G$ such that $k^* > r$. It also samples a random nonsingular $k^* \times k^*$ matrix $S$ and a random $n \times n$ permutation matrix $P$. It then sets the cleartext length to be $k$ such that $k^* = k + r$, where $k \le r - 1$, sets $n$, $k$ as public parameters, and outputs $(S, G, P)$ as the secret key.
(2) Enc($(S, G, P)$, $m$): the input consists of a key $(S, G, P)$ and a plaintext $m \in \mathbb{F}_2^k$. It samples a random $z \leftarrow U(\mathbb{F}_2^r)$ and encodes the concatenation $(m, z) \in \mathbb{F}_2^{k^*}$ using $SGP$ to obtain a ciphertext message $c = (m, z)SGP \in \mathbb{F}_2^n$.
(3) Dec($(S, G, P)$, $c$): the input consists of a key $(S, G, P)$ and a ciphertext $c \in \mathbb{F}_2^n$. The plaintext message $m \in \mathbb{F}_2^k$ is obtained by decoding $cP^{-1}$ using the Goppa code, mapping the decoded message by $S^{-1}$, and discarding the last $r$ bits.

Definition 3 .
Let  = {  } ∈N denote a probability ensemble of McEliece generator matrices (chosen according to some schema) such that   is distributed over matrices of size  *  ×  *  for every  ∈ N. Let  1 = { random non-singular matrix S ∈ F  * × *

Table 1: The variables used in AddHomSE and in its proof of security and their descriptions.

Table 2: The used random variables and their descriptions.
2, = { 2,  } ∈N induced by Enc 2 for  encryptions.That is, we have a -tuple ) for every  ∈ N, where  1 is distributed over matrices of size   ×  and  2 is distributed over matrices of size   ×   , where   +   =  *  and   and   are chosen according to Gen(1  ).
7.1. Infeasibility of the Problems. Let us briefly consider the infeasibility of the underlying problems related to AddHomSE. The IND-CPA security is based on assumptions that are weaker but closely related to the ones underlying the McEliece scheme. The selection of parameters for the