Distortion-Free Watermarking Approach for Relational Database Integrity Checking

Nowadays, the Internet is becoming a suitable way of accessing databases. Such data are exposed to various types of attack that aim to confuse ownership proofing or content protection. In this paper, we propose a new approach based on fragile zero-watermarking for the authentication of numeric relational data. Contrary to some previous database watermarking techniques, which introduce distortions into the original database and may not preserve the data usability constraints, our approach simply generates the watermark from the original database. First, the adopted method partitions the database relation into independent square matrix groups. Then, group-based watermarks are securely generated and registered with a trusted third party. The integrity verification is performed by computing the determinant and the diagonal's minors for each group. As a result, tampering can be localized up to attribute group level. Theoretical and experimental results demonstrate that the proposed technique is resilient against tuple insertion, tuple deletion, and attribute value modification attacks. Furthermore, comparison with a recent related effort shows that our scheme performs better in detecting multifaceted attacks.


Introduction
Nowadays, with the wide distribution of digital data, it is necessary to protect them against illicit copying, intellectual property theft, and falsification. Digital watermarking belongs to the broad family of information security techniques that have been proposed to overcome these issues. Traditionally, its purpose is to protect digital content by embedding a secret mark (watermark) into the original data. The technique, initially designed for images, has been extended to other digital data such as video, audio, software, text documents, and relational databases. This paper focuses on numerical relational database watermarking.
In the literature, most research on relational database watermarking has focused on three main applications, regardless of the database attribute type (numeric, nonnumeric, or mixed format): database ownership protection, content authentication, and fingerprinting. Usually, ownership protection and fingerprinting techniques [1][2][3][4] are considered robust watermarking approaches, whereas authentication or integrity checking [5][6][7] is based on fragile watermarking. Whether a watermarking scheme is fragile or robust, it may suffer from intentional or unintentional attacks that may destroy the watermark [8]. A benign update may cause the modification or deletion of marked tuples. Bit and rounding attacks intentionally try to demolish the watermark by changing bit positions in the marked values. In a collusion attack, the attacker has access to many fingerprinted copies of the same database and inserts his own watermark into the marked database to claim ownership. The false claim of ownership attack attempts to give a traitor evidence that casts doubt on whether the data belong to the owner. In the subset reverse order attack, the attacker hopes to destroy the watermark by simply exchanging tuple or attribute positions in the relation. The brute force attack tries to guess the private data used in the embedding process by traversing the possible parameter space. It is generally agreed that a good database watermarking technique should meet the following challenges: (i) imperceptibility: the embedded watermark should be invisible, and the watermark insertion process should not degrade the data usability; (ii) robustness: the watermark should resist attacks that aim to destroy it; (iii) security: the watermark embedding process must use a private key; (iv) blindness: the watermark detection process should not require knowledge of the original data or the watermark information.
Since the first well-known effort that showed the need for watermarking relational databases [9], the area has gained a lot of interest and many works have addressed it. Guo et al. identified the problem and the importance of fragile watermarking for verifying the integrity of numeric streaming data [10]. Their technique can detect and locate any modification made to a data stream. In [11], the authors focused on the connection between relational databases and optimization techniques. This method treats the embedding process as a constrained optimization problem; the watermark decoding process uses a threshold-based technique to minimize the probability of decoding errors. The authors in [12] proposed a reversible watermarking algorithm that enables recovery of the original data from tampered data. This technique is primary key dependent and is not resilient to linear transformation attacks, since the watermark embedded into selected tuples cannot reveal whether marked tuples have been deleted. The work in [13], which seeks to protect database integrity, showed that database table indexes can improve the detection of unauthorized alterations. In this regard, the author used an R-tree data structure, which does not change the attribute values. The work in [14] addresses the rewatermarking attack. The approach integrates watermark information derived from the date stamp to overcome the problem. Furthermore, a bi-folder security scheme is used to detect and resolve conflicting ownership issues in case of rewatermarking attacks.
Our approach is a distortion-free fragile watermarking technique; it does not change any data value in the database. Although there is a large body of research on relational database watermarking, only a few works focus on fragile watermarking [7,15].
In this paper, we make the following contributions. (i) We show that a numerical database can be securely partitioned into a set of square matrices (groups). As a result, matrix-based properties such as the determinant can be applied to check the integrity of the database. (ii) We design an algorithm that can detect and localize the tampered region in the database at group level. (iii) We conduct experiments to show the feasibility and usefulness of the proposed technique and compare it with previous work.
This paper is organized as follows. The next section discusses the related work. In Section 3, the basic terminology and concepts used in this paper are explained. Section 4 describes the details of our proposed scheme; the performance of our approach is also discussed. The results of our experiments are discussed in Section 5. Finally, Section 6 concludes the paper and provides some guidance for future work.

Related Work
Agrawal et al. [9] identified the need for watermarking techniques in relational databases; they embedded the watermark in the least significant bits (LSB) of selected attributes of some selected tuples. A secure message authentication code (MAC) is computed from a secret key and the primary key of each tuple to select the candidate tuples, attributes, and LSB positions for watermark embedding. Inserting watermark bits in the LSB is efficient, but the watermark can be easily compromised by bit attacks. Guo et al. [15] proposed a fragile watermarking scheme to detect, localize, and characterize malicious modifications of a relational database. The technique uses numeric data and divides all tuples into groups according to their primary key hash values. However, their technique modifies two bits in the LSBs of the data values and may not preserve the data usability constraints. In [5], a fragile watermarking technique is used to detect and localize malicious alterations made to a relational database with categorical attributes. It is a distortion-free approach: it does not introduce any distortion into the cover data. Database tuples are securely parsed into groups; watermarks are then embedded and verified at group level independently. In [16], the authors presented a zero-distortion watermarking technique to verify the integrity of relational databases. The partitioning technique is a virtual grouping operation that generates an image of each partition as the watermark of that partition.
Recently, Khan and Husain [17] proposed a fragile scheme based on zero-watermarking to protect the integrity of database relations. Their technique evaluates local characteristics of the database relation, such as the frequency distribution of digits, lengths, and ranges of the data values. Moreover, it can characterize malicious data in the data set by using data parameters such as the fraction of digit, length, and range to quantify the nature of a tampering attack. However, their technique is not resilient to the attribute value substitution attack. In this paper, we present a distortion-free watermarking technique that partitions a database into groups of square matrices and then generates the watermark.

Basic Concept and Terminology
In this section, we present some notations, parameters, and mathematical approaches used in our proposed algorithm.
We consider a relational database $D$ having $m$ attributes, $n$ tuples, and the scheme $(P, A_0, \dots, A_{m-1})$, where $P$ is the primary key and $A_0, \dots, A_{m-1}$ are the $m$ attributes of the database $D$. The primary key $P$ may be one attribute or a combination of some of the attributes $(A_j)_{0 \le j \le m-1}$ of $D$. The Notations section summarizes some parameters used in our algorithms.
A matrix is a rectangular array of numbers. A matrix with $p$ rows and $q$ columns is called a $p \times q$ matrix [18]. The matrix may contain complex numbers, and each number that composes the matrix is called an element of the matrix. Two matrices are equal if they have the same number of rows and the same number of columns and, in addition, the corresponding entries in every position are equal. A matrix $M$ is called a square matrix if its numbers of rows and columns are the same, and it is represented by $M = (a_{ij})$. The determinant of a square matrix $M$, denoted by $\det M$ or simply $|M|$, is the real number that can be computed from the square matrix $M$; formula (1) shows its computation. The minor of any element $a_{ij}$ of a square matrix $M$, denoted by $M_{ij}$, is the determinant obtained when the row and column of that element are deleted. Let $M$ be a square matrix of order $k$ with coefficients in $\mathbb{R}$; the determinant of $M$ is computed as follows:

$$\det M = \sum_{i=1}^{k} (-1)^{i+1} a_{i1} M_{i1}, \qquad (1)$$

where $M_{i1}$ is obtained by deleting the first column and the $i$th row. The ceiling function assigns to the real number $x$ the smallest integer that is greater than or equal to $x$. The value of the ceiling function at $x$ is denoted by $\lceil x \rceil$.
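The definitions above can be made concrete with a small sketch. It assumes integer matrices stored as lists of rows and uses cofactor expansion along the first column; the function names and the sample matrix are illustrative only and do not come from the paper.

```python
def submatrix(M, row, col):
    """Return a copy of M with the given row and column removed."""
    return [[v for j, v in enumerate(r) if j != col]
            for i, r in enumerate(M) if i != row]


def det(M):
    """Determinant by cofactor expansion along the first column."""
    if not M:  # the empty matrix has determinant 1 by convention
        return 1
    # (-1) ** i with a 0-based row index matches (-1)^(i+1) for 1-based rows
    return sum((-1) ** i * M[i][0] * det(submatrix(M, i, 0))
               for i in range(len(M)))


def diagonal_minors(M):
    """Minors of the diagonal elements a_11, ..., a_kk."""
    return [det(submatrix(M, i, i)) for i in range(len(M))]


# Illustrative 3 x 3 matrix (hypothetical values)
M = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 4]]
print(det(M))              # 18
print(diagonal_minors(M))  # [10, 7, 6]
```

Cofactor expansion has factorial cost, so it is suitable only for small orders such as the group sizes discussed in this paper; it keeps integer inputs exact, which matters when values are later compared for equality.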

Proposed Approach
The different phases of the proposed approach are detailed in this section. Our technique is a distortion-free watermarking technique for numeric data; it does not embed any watermark in the original database. Figure 1 shows the architecture of the proposed watermarking approach; there are two main phases: the watermark generation and certification phase and the watermark verification phase.
The watermark generation and certification phase focuses on the characteristics of the content of subsets of the numeric database values and is summarized as follows.
Step 1 (data set partitioning). The secret key $K$ and the number of groups $\nu$ are used to partition the data set $D$ into $\nu$ different square matrix groups or partitions $\{G_1, \dots, G_\nu\}$.
Step 2 (watermark generation). An individual watermark is first computed for each group, and the computed watermarks are then concatenated to obtain the data set watermark $W$.
Step 3 (watermark encryption). The data set watermark $W$ is encrypted using a secure hash function to obtain the encrypted data set watermark $W_E$.
Step 4 (watermark certification and registration). The watermark certificate $C$ is obtained by concatenating the encrypted data set watermark $W_E$, the data set owner ID, and the coordinated universal time (UTC) date-time stamp. The watermark certificate is finally registered at a Certification Authority (CA) for certification purposes.
Our watermarking approach does not modify the data set but simply computes some information from it. The data set $D$ is delivered to the intended recipient; it may be intercepted by an attacker through the insecure channel (attack channel) and may be subject to intentional or unintentional attacks that modify it.
The watermark verification phase compares the data set watermark $W$ registered at the certification authority with the data set watermark $W'$ computed from the suspicious data set.
The watermark verification is depicted as follows.
Step 3 (watermark extraction from the CA). The data set watermark $W$ is extracted from the watermark certificate registered at the certification authority.
Step 4 (watermark comparison). The data set watermarks $W'$ and $W$ are compared to verify the integrity of the target data set $D$.

Data Partitioning.
The proposed data partitioning technique is described in Algorithm 1. The data set $D$, composed of $n$ tuples and $m$ attributes, is partitioned into $\nu$ different groups $(G_1, G_2, \dots, G_{\nu-1}, G_\nu)$, where $G_i \neq G_j$ for $i \neq j$. Our parsing process builds on the partitioning technique used in [11]. The difference between the two algorithms lies in the number of partitions. In [11], the number of partitions is chosen by the data owner, whereas in our approach it is $\nu = \lceil n/m \rceil$, where $n$ is the number of data set tuples, $m$ is the number of data set attributes, and $\lceil \cdot \rceil$ is the mathematical ceiling function. A secure message authentication code (MAC) is computed for each tuple $t$ that belongs to $D$, and a set of tuples whose size is equal to $m$ is logically inserted into each partition $G_i$ $(1 \le i \le \nu)$. The assignment of a tuple $t_j$ to a group is given by $i = \mathrm{Hash}(K \,\|\, t_j.P \,\|\, K) \bmod \nu$, where $t_j.P$ is the primary key of the tuple $t_j$, $K$ is the secret key, $\|$ is the concatenation operator, and $\mathrm{Hash}()$ is the secure hash function. Furthermore, if the database does not have a primary key, a combination of a few attributes of the relational database can serve as the primary key.
If $n \bmod m \neq 0$, we securely insert records to complete the last group so that it forms a square matrix, and we assume that there are no identical tuples in the same group. Note that the added records are deleted after the watermark computation.
To the best of our knowledge, no existing partitioning technique parses the database relation into square matrix-based groups. The number of groups is kept secret in order to reinforce the security of our scheme; however, even if an attacker knows the number of groups used in the watermark generation process, he cannot generate the groups without the secret key.
The major advantage of such a partitioning method is that many properties of square matrices can be applied to check the integrity of the group's data. Knowing the number of attributes and tuples of a data set, the number of groups can be easily computed so that the number of attributes of each group equals its number of tuples.
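The group count and the keyed assignment rule of this subsection can be sketched as follows. This is a minimal illustration under stated assumptions: SHA-256 stands in for the unspecified secure hash function, the secret key is concatenated on both sides of the primary key, and the names `group_count` and `group_index` are hypothetical. The modulo rule alone does not balance group sizes, so the sketch covers only the assignment step, not the completion of groups to square matrices.

```python
import hashlib


def group_count(n_tuples: int, n_attrs: int) -> int:
    """Number of groups: ceiling of n_tuples / n_attrs, in exact integer arithmetic."""
    return (n_tuples + n_attrs - 1) // n_attrs


def group_index(secret_key: bytes, primary_key: str, nu: int) -> int:
    """Keyed group assignment: Hash(K || pk || K) mod nu (SHA-256 is an assumption)."""
    data = secret_key + primary_key.encode("utf-8") + secret_key
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") % nu


nu = group_count(14, 3)  # 14 tuples, 3 attributes
indices = [group_index(b"secret-key", str(pk), nu) for pk in range(1, 15)]
print(nu)       # 5
print(indices)  # group index of each tuple, each in range(nu)
```

Without the secret key, an attacker who knows only the number of groups cannot reproduce the indices, which is the property the text relies on.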

Watermark Group Generation.
In this step, the group watermark generation approach (Algorithm 2) is described. The $m$ tuples of each group are first sorted according to their primary key hash values (in increasing order in our algorithm); this process does not physically affect the group's data values. In lines 5-6, the determinant of the group and the minors of its diagonal elements are computed, and the computed values are finally concatenated to obtain the watermark value of the group. The watermark values of the groups are used in the watermark certificate computation.
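A group watermark of this kind might be computed as in the sketch below. It is not the paper's Algorithm 2: the SHA-256 keyed hash, the `|` separator used for concatenation, and all function names are assumptions made for illustration, and the sample group holds hypothetical values.

```python
import hashlib


def keyed_hash(secret_key: bytes, primary_key: str) -> int:
    """Keyed hash of a primary key (SHA-256 is an assumed choice)."""
    data = secret_key + primary_key.encode("utf-8") + secret_key
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def submatrix(M, row, col):
    """Return a copy of M with the given row and column removed."""
    return [[v for j, v in enumerate(r) if j != col]
            for i, r in enumerate(M) if i != row]


def det(M):
    """Determinant by cofactor expansion along the first column."""
    if not M:
        return 1
    return sum((-1) ** i * M[i][0] * det(submatrix(M, i, 0))
               for i in range(len(M)))


def group_watermark(rows, secret_key: bytes) -> str:
    """rows: list of (primary_key, values) pairs forming a square group."""
    # Virtual sort by keyed hash: the stored tuples are not moved or changed
    ordered = sorted(rows, key=lambda r: keyed_hash(secret_key, r[0]))
    M = [values for _, values in ordered]
    parts = [det(M)] + [det(submatrix(M, i, i)) for i in range(len(M))]
    return "|".join(str(p) for p in parts)  # assumed concatenation format


group = [("t1", [2, 0, 1]), ("t2", [1, 3, 2]), ("t3", [1, 1, 4])]
wm = group_watermark(group, b"secret-key")
print(wm)
```

Because the rows are ordered by keyed hash before the matrix is formed, the result does not depend on the physical order in which the tuples are stored.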

Watermark Computation and Registration.
The watermark generation method is presented in Algorithm 3. In lines 1-2, the data set is partitioned into groups in such a way that each group has an equal number of attributes and tuples. In line 3, the watermark is computed from each group as follows. First, the tuples are sorted in ascending order of their primary key hash values. Note that this is a secure virtual operation; it is governed by the secret key and does not affect the physical position of the tuples or the primary key values. Then, to compute the watermark of each group, we consider the group as a square matrix, and we compute and concatenate its determinant and the minors of its diagonal elements, as described in Section 4.2. It is important to note that it is difficult to find two different matrices having the same determinant and the same diagonal minors. In lines 3-4, the database watermark is computed by concatenating the watermarks of the different groups. The secure hash function and the secret key are used in line 5 to encrypt the data set watermark. Finally, the encrypted watermark is concatenated with additional information (owner ID and coordinated universal time (UTC) time stamp) and then registered with the certification authority (CA), which is a trusted party. The purpose of the CA is to provide a decision authority for the authentication of a suspicious data set. The owner ID and the UTC time stamp may give the certification authority more confidence about the owner of the database.

Input: Group $G_i$
Output: Group watermark $W_i$
(1) Begin
(2) for $i = 1$ to $\nu$ do
(3) Sort all tuples in $G_i$ according to the increasing order of their primary key hash
(4) compute the determinant $d_i$ of $G_i$ // determinant of the $i$th group
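The certificate construction described above could look roughly like the following sketch. Several details are assumptions rather than the paper's specification: HMAC-SHA256 stands in for "a secure hash function used with the secret key", the separators are arbitrary, and the function names and owner ID are hypothetical. Since an HMAC is one-way, a verifier using this variant would compare digests rather than recover the plain watermark.

```python
import hashlib
import hmac
from datetime import datetime, timezone


def dataset_watermark(group_watermarks):
    """Concatenate the group watermarks in group order."""
    return "||".join(group_watermarks)


def protect_watermark(watermark: str, secret_key: bytes) -> str:
    """Keyed digest of the data set watermark (HMAC-SHA256, an assumed choice)."""
    return hmac.new(secret_key, watermark.encode("utf-8"), hashlib.sha256).hexdigest()


def build_certificate(group_watermarks, secret_key: bytes, owner_id: str, now=None) -> str:
    """Digest, owner ID, and UTC time stamp joined into one certificate string."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H:%M:%SZ")
    digest = protect_watermark(dataset_watermark(group_watermarks), secret_key)
    return f"{digest}|{owner_id}|{stamp}"


fixed_time = datetime(2014, 1, 1, tzinfo=timezone.utc)
cert = build_certificate(["18|10|7|6", "5|1|2|3"], b"secret-key", "owner-1", fixed_time)
print(cert)
```

The time stamp is passed in explicitly here so that the example is reproducible; in practice it would be taken at registration time.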

Integrity Checking.
The integrity of a suspicious database is verified in Algorithm 4. In lines 2-3, the watermark certificate registered at the CA is retrieved, and the original relational watermark $W$ is extracted using the secret key $K$ used in watermark generation.
The suspicious data set $D'$ is partitioned using the technique used in the watermark generation; afterwards, the watermark $W'$ is computed in line 6. Note that the watermark generation technique is the same as the one used in Algorithm 2. In lines 7-10, the two watermarks are compared: if they differ, the data set has been modified; if they are the same, the original data set has not been tampered with. To verify the integrity of a suspicious data set, the CA partitions the suspicious data set $D'$ into groups, computes each group watermark, and compares it with the corresponding group watermark extracted from the registered original watermark. If the two watermarks differ, the original data set $D$ has been modified in the group with that index, and the data set as a whole is accordingly considered modified. Furthermore, the CA can resolve multiple watermark conflicts by comparing the UTC date and time stamps issued for the data set owner and for the attacker. It is important to mention the blindness of our technique: it does not require the original database or the original watermark information in the watermark detection.
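The group-level comparison can be sketched as a function that returns the indices of the groups whose watermarks disagree. The function name and the treatment of a group-count mismatch (extra or missing groups are reported as tampered) are assumptions made for this illustration.

```python
def tampered_groups(registered, recomputed):
    """Indices of groups whose registered and recomputed watermarks differ."""
    suspects = [i for i, (a, b) in enumerate(zip(registered, recomputed)) if a != b]
    # Groups present on one side only signal massive insertion or deletion
    shorter = min(len(registered), len(recomputed))
    longer = max(len(registered), len(recomputed))
    suspects.extend(range(shorter, longer))
    return suspects


registered = ["18|10|7|6", "5|1|2|3", "9|4|4|1"]
recomputed = ["18|10|7|6", "5|1|2|8", "9|4|4|1"]
print(tampered_groups(registered, recomputed))  # [1]
```

An empty result means the watermarks agree group by group; a nonempty result localizes the tampering to the listed groups.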

Security Analysis.
In this section, we provide the security analysis of our scheme using probability theory. Since the proposed technique is fragile in nature, the suspicious database can be subject to many attacks that aim to maliciously modify the protected data while leaving the certified watermark untouched. We analyze the probability of successfully changing the database while keeping the watermark intact. We suppose that an attacker has access to the data partitions. Our watermark computation is based on the calculation of the determinant of the group matrix. Interchanging two columns of a matrix twice (that is, applying an even number of column interchanges) leaves the value of its determinant unchanged. Let $D$ be a data set having $m$ attributes and $n$ tuples; an attacker may generate the partitions of the database (recall that partitions are considered as square matrices) and succeed in substituting columns so as to obtain the same determinant value with probability $P_{\mathrm{success}}$, but cannot generate the expected values of the minors for the same matrix. Since our technique is based on fragile database watermarking, any change made to the marked data should strongly affect the original watermark. In this study, we deal with the case of interchanging data values.
We treat the above problem as a distinct permutation of $k$ objects taken $r$ at a time, represented by the symbol ${}^{k}P_{r}$, where ${}^{k}P_{r} = P(k, r) = k!/(k - r)!$. We analyze the case of a large data set that can be partitioned into a number of square matrices. Suppose that each matrix has at least 3 attributes ($m \ge 3$). The probabilities of successfully computing the determinant of a single group and of successfully modifying a data set group while preserving its determinant value depend on $n$, the number of database tuples, and $m$, the number of database attributes. For a large data set, $n/m > 1$, and the probability of successfully modifying the database while preserving the determinant value of each group is very small and approaches zero.
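The claim that an even number of column interchanges preserves the determinant while disturbing the diagonal minors can be checked numerically. The sketch below uses a hypothetical 3 x 3 group and illustrative helper names; it shows one instance, not a proof.

```python
def submatrix(M, row, col):
    """Return a copy of M with the given row and column removed."""
    return [[v for j, v in enumerate(r) if j != col]
            for i, r in enumerate(M) if i != row]


def det(M):
    """Determinant by cofactor expansion along the first column."""
    if not M:
        return 1
    return sum((-1) ** i * M[i][0] * det(submatrix(M, i, 0))
               for i in range(len(M)))


def diagonal_minors(M):
    """Minors of the diagonal elements."""
    return [det(submatrix(M, i, i)) for i in range(len(M))]


def swap_columns(M, a, b):
    """Return a copy of M with columns a and b interchanged."""
    out = [row[:] for row in M]
    for row in out:
        row[a], row[b] = row[b], row[a]
    return out


M = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 4]]
# Two successive interchanges: an even permutation of the columns
N = swap_columns(swap_columns(M, 0, 1), 1, 2)

print(det(M), det(N))      # 18 18
print(diagonal_minors(M))  # [10, 7, 6]
print(diagonal_minors(N))  # [-2, -2, -3]
```

In this instance the attacker keeps the determinant but not the diagonal minors, so a watermark built from both values exposes the change.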

Experimental Results
We performed experiments to show the effectiveness and the accuracy of the proposed technique. We made sure that the subject data set contains some numeric attributes. We evaluated our approach on Forest Cover Type [19], a real-life data set containing 581,012 tuples; each tuple contains 10 integer attributes, 1 categorical attribute, and 44 Boolean attributes. We added a primary key attribute to Forest Cover Type for experimental purposes. We used Microsoft SQL Server 2008 running on a computer with a 2.2 GHz Intel dual CPU and 2 GB of RAM.
We first generated the original data set watermark $W$; afterwards, we simulated different attacks against the data set with the aim of generating the same watermark as the original one. In the detection process, the same secret parameters used in watermark generation are used. The resilience of our approach and of the Khan and Husain approach [17] is verified by performing different attacks on the Forest Cover Type database. The accuracy of our approach after an attack on the database is verified by computing and comparing the original watermark $W$ and the watermark $W'$ obtained after a modification of the data set $D$.
We also measured the cost of our algorithm: the watermark computation and registration program and the watermark verification program were each run twice. The average times needed were 5281 seconds and 4952 seconds, respectively. The data set integrity verification is costly but can be performed, for example, once a day, depending on the need.
We randomly tested two kinds of attacks against the watermarked relation: (i) common attacks: tuple insertion, tuple deletion, and attribute value alteration; and (ii) multifaceted (substitution) attacks, in which the pirate interchanges attribute values and/or replaces some tuples with others while preserving the original size of the database relation [20]. A multifaceted attack is performed by an experienced attacker who may be aware of the whole process but does not know the secret key. Figure 2 shows the results of the different experiments. We have also compared our approach with the Khan and Husain technique [17]. These results (Table 2) clearly demonstrate that Khan's technique is not resilient to the attribute value interchanging attack, whereas with our scheme even a minor data change has a significant impact on the extracted watermark. Because the number of groups used in the watermark computation process is known, massive deletion or insertion attacks are revealed by a number of groups, or an incomplete group, that differs from the expected one. For all types of attacks in fragile watermarking, the aim of the attacker is to alter the original database while keeping the watermark intact.
Mathematical Problems in Engineering
The experimental results of all attacks are shown in Figure 2.

Common Attacks.
Such attacks are traditionally performed to verify the integrity of a relational database watermarking technique.
(i) Insertion Attacks. Both approaches have been tested against insertion attacks. We randomly and progressively inserted new tuples into the data set $D$. Both techniques are resilient to the insertion attack, and the attack is easily detected. In our approach, if the insertion reaches a certain threshold, the tampering can be easily detected by comparing the number of groups of the suspicious database with the expected number of groups $\nu$. Using the number of groups does not compromise the blindness of our technique, because the number of groups is not the original database; it is used as an input of the watermark function in the integrity checking process.
(ii) Deletion Attacks. Tuples are deleted progressively from the original database, and the watermark verification algorithms of both approaches compute the watermarks from the resulting database. Experiments show that the two techniques are resilient to the deletion attack even if the rate of deleted tuples is small. From the partitioning process of our algorithm, we can easily see that the number of groups of the suspicious database is smaller than the expected one, $\nu$; the tampering is then proved.
(iii) Alteration Attacks. We assume that the attacker does not have access to the original data set $D$ or the secret key $K$. We randomly and progressively modify the data values of the original database. Any change in the data set is detected by comparing the original watermark with the one computed from the modified data set. The experimental results reported in Figure 2 show that the two approaches are resilient against the alteration attack; the tampering is proved even if a single data value of the database is modified.

Multifaceted Attacks.
In such attacks, we alter the database as a sophisticated attacker.
(i) Attribute Values Substitution Attacks. In this kind of attack, we change the position of some attribute values of the database. Our approach is resilient to the data value interchanging attack and can prove the nonintegrity of the database even if only two data values have been exchanged. In contrast, Khan's approach is not resilient to such an attack: after computing the watermark of the suspicious database, we may erroneously conclude that the database is intact.
(ii) Tuples Insertion-Deletion Attacks. To simulate this kind of attack, we progressively delete and add the same number of tuples (if $k$ tuples are deleted, $k$ tuples are inserted). The reason is that our technique can easily detect any decrease or increase in the number of tuples in the data set, since it then cannot generate the expected number of groups. The experimental results in Figure 2 show that the two approaches are resilient against the combined insertion and deletion attack.

Discussions
The experiments showed that our technique is feasible and resilient to malicious attacks. Massive insertion and deletion attacks may be easily detected as early as the group partitioning process. The reason is that the number of groups is known by the owner, and the partitioning process may fail by producing an unexpected group after a high number of tuple insertions or deletions. By rearranging the data values of a group, it may be possible to obtain the same group determinant value; however, obtaining the same diagonal minors of that group is more complex and may be impossible. Accordingly, this gives more reliability to our technique. The probability that an attacker can modify a group and still retrieve the determinant and minor values computed in the watermark generation phase approaches zero. After extensive experimentation and research, we have not found a single such case. Suppose that an attacker can generate the data set groups; he or she can modify a group and obtain the same determinant value computed in the watermark generation phase by applying an even number of interchanges to the group attributes. After such an attack, it is clear from Figure 2 that the probability that the attack fails approaches 1. All the experimental results are reported in Figure 2. We use the minors because the data set can be modified while the value of the determinant remains the same; adding the minors gives more strength to our technique, since it is difficult or even impossible to obtain the same minor values. Moreover, the minors can help to localize the tampered area. Algorithm 5 explains how a tampered area can be detected within a specific group after a slight alteration.
The experimental results indicate that our algorithm performs well enough to be used in real-world database applications. We are planning to reduce the cost of our approach by computing and selecting only some individual square matrices for the watermark certificate computation and verification.
In the partitioning process of our approach, if the data set has $n$ tuples and $m$ attributes with $n \bmod m \neq 0$, we securely complete the last group to a square matrix. For example, if $n = 14$ and $m = 3$, we generate 5 groups: 4 groups with 3 attributes and 3 tuples each, considered as 4 square matrices, and a 5th group that is incomplete, to which one more tuple must be added to bring it to 3 tuples. To deal with such a case, we complete the 5th group to 3 rows by adding the first tuple of the first group. The added tuple must be deleted after watermark generation.
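The completion of the last group in this example can be sketched as follows. The sketch makes two simplifying assumptions: tuples are cut into consecutive chunks as a stand-in for the keyed partitioning, and when more than one tuple is missing the first tuples of the first group are borrowed in order (the text specifies only the single-tuple case). The function name is hypothetical.

```python
def pad_last_group(groups, m):
    """Complete the last group to m rows by borrowing rows from the first group."""
    padded = [list(g) for g in groups]
    missing = m - len(padded[-1])
    padded[-1].extend(padded[0][:missing])  # borrowed rows are dropped after watermarking
    return padded, missing


n, m = 14, 3
tuples = list(range(n))  # integers stand in for the 14 tuples
groups = [tuples[i:i + m] for i in range(0, n, m)]

padded, added = pad_last_group(groups, m)
print(len(groups))  # 5
print(groups[-1])   # [12, 13]
print(padded[-1])   # [12, 13, 0]
print(added)        # 1
```

Returning the number of borrowed rows makes it straightforward to remove them again once the watermark has been generated.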
Figure 2 shows the resilience of our approach against various attacks: when an attack occurs, the probability that it fails is 1. Furthermore, even a sophisticated attacker may not be able to alter our database without affecting the watermark.
Comparing our approach with Khan's approach, one important difference is the resilience of our approach against attribute value substitution attacks. Consider, for example, the student records shown in Table 1.
The database has been altered: the two score values of a record $r$ have been interchanged, and accordingly a new altered record $r'$ is created. By computing the determinant and the diagonal minor values of the two versions of the database, we can detect the nonintegrity of the database. However, by computing the length, the range, and the digit of the database attribute values, we may erroneously conclude that the integrity of the database remains intact. Our technique is resilient to this kind of attack, whereas the approach of Khan and Husain is not resilient to data value substitution attacks. Another difference is that our approach is best suited for a database having at least two numeric attributes, whereas Khan's approach can be used even with a database having a single numeric attribute.

Conclusion and Future Work
In this paper, a novel fragile watermarking technique for database integrity verification is presented. The proposed technique is based on distortion-free watermarking concepts and does not modify the original data. Our technique partitions the data set into different sets of square matrices and generates the relational watermark using the determinant and the minors of the generated square matrices. We have compared our technique with a previous technique and conducted experiments to show the simplicity and usefulness of the proposed technique for verifying the integrity of a database after data value interchanging attacks and other malicious attacks. Our approach overcomes some limitations of existing fragile watermarking techniques, such as preserving the data usability constraints and verifying integrity when various attacks occur, and it may localize the tampered attributes up to group level.

Figure 1: Architecture of watermark computation, certification, and verification process.
Output: $\nu$ groups $(G_1, G_2, \dots, G_{\nu-1}, G_\nu)$ of length $m$ each.

Step 1 (data set partitioning). The secret key $K$ and the number of groups $\nu$ are used to partition the data set $D'$ received from the insecure channel, which allows a third party to have access to the data set. The data set $D'$ is partitioned into $\nu$ different square matrix groups or partitions $\{G'_1, \dots, G'_\nu\}$.

Step 2 (watermark generation). An individual watermark is first computed for each group, and the computed watermarks are concatenated to obtain the data set watermark $W'$.

Input: Data set relation $D$, number of groups $\nu$
(5) compute the minors of the diagonal elements of $G_i$ // minors of the $i$th group diagonal
(6) compute $W_i = d_i \,\|\, \dots$
Input: Suspicious database $D'$, secret key $K$
Input: Group watermark $W_i$

Table 2: Comparison with Khan's approach.

The proposed approach does not tolerate any alteration in the numeric database; it therefore detects malicious attacks with high probability. In future work, we plan to insert our own watermark into the original database while controlling the usability constraints and to extend our approach to nonnumeric data.

Notations

$n$: Number of tuples in the database
$m$: Number of attributes in the relation
$K$: Secret key