A software birthmark is an intrinsic characteristic of software that can be used to detect software theft. Comparing the birthmarks of two programs can indicate whether one is a copy of the other. Software theft and piracy, that is, copying, stealing, and misusing software without the permission set out in the license agreement, are rapidly growing problems. Estimating a birthmark can play a key role in understanding its effectiveness. In this paper, a new technique is presented to evaluate and estimate a software birthmark based on the two most sought-after properties of birthmarks, credibility and resilience. For this purpose, soft computing concepts such as probabilistic and fuzzy computing are taken into account, and fuzzy logic is used to estimate the properties of the birthmark. The proposed fuzzy rule based technique is validated through a case study, and the results show that the technique is successful in assessing the specified properties of the birthmark, its resilience and credibility. This, in turn, indicates how much effort will be required to detect the originality of the software based on its birthmark.
A software birthmark is an intrinsic property of software that is used to detect the theft of software systems. A software system can be stolen or pirated, which ultimately results in financial loss to the owner organization. Software piracy is a global problem of unauthorized copying, installation, use, distribution, or sale of software in violation of the exclusive rights that the authors set out in the relevant license agreement. With the growth of the software development industry and the spread of the Internet, software piracy has become a serious warning sign for numerous software companies, which encounter tremendous losses because of it. Software pirates, on the other hand, earn huge sums of money from piracy. The international community at large is not yet aware of the seriousness of this crime. Software piracy happens in diverse ways, which include hard-disk loading, soft lifting, counterfeit goods, rental software, and bulletin board piracy [
Different advanced techniques are used for the detection and prevention of software theft, such as software watermarking and software fingerprints [
The contribution of this paper is a method for estimating a software birthmark in order to show its effectiveness. The estimation is based on the well-defined properties of birthmarks. The method provides an intelligent solution for estimating two commonly used properties, credibility and resilience, which in turn provide an estimate of the birthmark. For this purpose, a fuzzy model based on membership functions and fuzzy rules has been designed to provide an appropriate estimation for software birthmarks.
The structure of the rest of the paper is as follows. In Section
To date, researchers have evaluated the effectiveness of a software birthmark through two important properties, credibility and resilience. Zeng et al. [
Kakimoto et al. [
The following are the main concepts used to define the proposed birthmark estimation technique.
The software industry has faced huge financial losses due to the piracy of software. Software piracy is performed by end-users as well as the dealers. Software piracy causes serious problems which hinder the success of the international software industry. Piracy of software is a global problem of illegal copying, installation, use, distribution, or sale of software in any manner other than that expressed in the appropriate license agreement. The pirates gain easy benefits from the sale of pirated software which ultimately affects the business of the software industry. Figure
Software piracy.
Original licensed software offers customers a number of valuable benefits, including assurance of software quality, availability of upgrades, technical documentation and manuals, and lower bandwidth consumption. Pirated software, on the other hand, provides none of these. An organization that uses pirated software runs the risk of system failure, which might expose it to huge financial loss.
A software birthmark is a unique property of a piece of software that can help in detecting software theft. It is an intrinsic characteristic of a program that can be used to spot the theft. Comparing the birthmarks of two programs tells us whether one is a copy of the other. The following definitions of birthmark are given by Tamada et al. [
Suppose
Suppose
Dynamic birthmarks cannot cover all program paths; a dynamic birthmark only detects theft of the program as it is executed. A static birthmark, on the other hand, is extracted by static program analysis, which is liable to overestimate the properties of the program.
Software birthmark is classified into the following three categories [
Software birthmark is a promising technique used for the detection of software theft. Birthmark does not embed additional code or information in any form in the original program. Software birthmarks only extract the inherent characteristics from the original program to detect the originality of program [
The following sections define the proposed methodology to estimate software birthmarks.
In order to estimate the success of software birthmarks, researchers typically consider two properties, which are credibility and resilience [
According to Tamada et al. [
Let
Let
Property
Property
Figure
Properties based software birthmarks.
In the existing literature on software birthmarks, there is no model which exactly estimates the birthmark of software based on the properties of credibility and resilience. The proposed methodology helps to estimate the birthmarks of software based on these properties.
Fuzzy logic concept was developed by Zadeh in 1965 [
In the proposed method, five membership functions are defined: mf1 over the range 0–19, mf2 over 20–39, mf3 over 40–59, mf4 over 60–79, and mf5 over 80–100. Triangular membership functions are used to plot the fuzziness and to represent weights. The triangular membership function has three parameters (
Details of fuzzy logic concept are given in Zadeh [
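As an illustration of the triangular membership functions described above, the following Python sketch evaluates a triangle with feet `a` and `c` and peak `b`. The breakpoints in `MFS` are an assumption: the text gives only the range of each function, so the feet are placed at the range ends and the peak at the midpoint. The actual parameters used in the MATLAB model are not stated here, and the names `trimf`, `MFS`, and `memberships` are chosen for this sketch only.

```python
def trimf(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising linearly to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Assumed breakpoints (feet at the stated range ends, peak at the midpoint).
MFS = {
    "mf1": (0, 9.5, 19),
    "mf2": (20, 29.5, 39),
    "mf3": (40, 49.5, 59),
    "mf4": (60, 69.5, 79),
    "mf5": (80, 90, 100),
}


def memberships(x):
    """Degree of membership of x (on the 0-100 scale) in each of the five functions."""
    return {name: trimf(x, *abc) for name, abc in MFS.items()}


if __name__ == "__main__":
    print(memberships(50))
```

Under this reading the five triangles do not overlap, so a value that sits exactly on a range boundary receives degree 0 in every function. Fuzzy models usually widen the triangles so that neighbouring functions overlap.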
Estimating software birthmarks is an important part of software system development for guarding against theft of the software system. Most software theft threats are faced during the implementation of the software, and developers are often unsure how to handle such situations. If the birthmarks of the system are estimated, a decision about an alternate design can be made more easily. The proposed methodology, based on fuzzy concepts, provides an estimation model for software birthmarks. Initially, the inputs (properties of the birthmark) on which the estimate is to be based are selected. The membership functions are then plotted for these inputs. A membership function identifies the degree to which a data value belongs to a particular data range. Five membership functions were plotted: mf1, mf2, mf3, mf4, and mf5. The inputs and membership functions are combined in the rule editor to form fuzzy rules. A fuzzy inference system model is obtained based on the membership functions and rules.
The following are the steps to design the proposed model.
(1) Perform domain analysis on software birthmark.
(2) Identify properties of software birthmark on which birthmark is to be estimated.
(3) Establish an input database for these properties.
(4) Design the fuzzy inference system based on these properties (inputs).
(5) Define the membership functions for these properties (for both inputs and output).
(6) Design the fuzzy rules based on membership functions.
(7) Obtain a fuzzy inference system (model to estimate birthmark).
(8) Estimate the inputs accordingly.
The graphical representation of the algorithm is given in Figure
Graphical representation of the proposed algorithm.
The proposed work for estimating software birthmark has been carried out by using MATLAB fuzzy tool box [
The different membership combinations are given in Table
Membership function pairs.
mf1, mf1 | mf1, mf2 | mf1, mf3 | mf1, mf4 | mf1, mf5
mf2, mf1 | mf2, mf2 | mf2, mf3 | mf2, mf4 | mf2, mf5
mf3, mf1 | mf3, mf2 | mf3, mf3 | mf3, mf4 | mf3, mf5
mf4, mf1 | mf4, mf2 | mf4, mf3 | mf4, mf4 | mf4, mf5
mf5, mf1 | mf5, mf2 | mf5, mf3 | mf5, mf4 | mf5, mf5
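The 25 membership function pairs in the table are the Cartesian product of the five labels with themselves. A minimal Python sketch (the names `LABELS` and `pairs` are illustrative and not taken from the paper) enumerates them in the same row order as the table:

```python
from itertools import product

# The five membership function labels used for both inputs.
LABELS = ["mf1", "mf2", "mf3", "mf4", "mf5"]

# All ordered pairs, row by row as in the table: (mf1, mf1), (mf1, mf2), ...
pairs = list(product(LABELS, repeat=2))

if __name__ == "__main__":
    for row_start in range(0, len(pairs), len(LABELS)):
        row = pairs[row_start:row_start + len(LABELS)]
        print(" | ".join(f"{a}, {b}" for a, b in row))
```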
The fuzzy rules and model in the proposed methodology are given in Figure
Proposed fuzzy rules model.
The proposed model can further be explicitly explained in Figure
Detailed fuzzy rules model (inputs, membership functions, rules, and output).
The rules are as follows.
If (credibility is mf1(0–19)) and (resilience is mf5(80–100)) then (output is (0–19)) (0).
If (credibility is mf1(0–19)) and (resilience is mf4(60–79)) then (output is (20–39)) (0.2).
If (credibility is mf1(0–19)) and (resilience is mf3(40–59)) then (output is (40–59)) (0.4).
If (credibility is mf1(0–19)) and (resilience is mf2(20–39)) then (output is (60–79)) (0.6).
If (credibility is mf1(0–19)) and (resilience is mf1(0–19)) then (output is (80–100)) (0.8).
If (credibility is mf5(80–100)) and (resilience is mf1(0–19)) then (output is (80–100)) (0.8).
If (credibility is mf4(60–79)) and (resilience is mf1(0–19)) then (output is (60–79)) (0.6).
If (credibility is mf3(40–59)) and (resilience is mf1(0–19)) then (output is (40–59)) (0.4).
If (credibility is mf2(20–39)) and (resilience is mf1(0–19)) then (output is (20–39)) (0.2).
If (credibility is mf2(20–39)) and (resilience is mf2(20–39)) then (output is (80–100)) (0.8).
If (credibility is mf3(40–59)) and (resilience is mf3(40–59)) then (output is (80–100)) (0.8).
If (credibility is mf4(60–79)) and (resilience is mf4(60–79)) then (output is (80–100)) (0.8).
If (credibility is mf5(80–100)) and (resilience is mf5(80–100)) then (output is (80–100)) (0.8).
If (credibility is mf2(20–39)) and (resilience is mf5(80–100)) then (output is (20–39)) (0.2).
If (credibility is mf3(40–59)) and (resilience is mf5(80–100)) then (output is (40–59)) (0.4).
If (credibility is mf4(60–79)) and (resilience is mf5(80–100)) then (output is (60–79)) (0.6).
If (credibility is mf3(40–59)) and (resilience is mf4(60–79)) then (output is (60–79)) (0.6).
If (credibility is mf2(20–39)) and (resilience is mf4(60–79)) then (output is (40–59)) (0.4).
If (credibility is mf2(20–39)) and (resilience is mf3(40–59)) then (output is (40–59)) (0.4).
If (credibility is mf4(60–79)) and (resilience is mf3(40–59)) then (output is (60–79)) (0.6).
If (credibility is mf5(80–100)) and (resilience is mf3(40–59)) then (output is (80–100)) (0.8).
If (credibility is mf4(60–79)) and (resilience is mf2(20–39)) then (output is (60–79)) (0.6).
If (credibility is mf3(40–59)) and (resilience is mf2(20–39)) then (output is (40–59)) (0.4).
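For illustration, the rules listed above can be encoded as a lookup table keyed by the pair (credibility function, resilience function). The sketch below is one reading of the rule list and not the authors' MATLAB model. Membership functions and output bands are indexed 1 to 5 for brevity, the number printed at the end of each rule is stored as it appears without further interpretation, and the names `RULES`, `BANDS`, `TRAILING`, `lookup`, and `uncovered` are hypothetical.

```python
from itertools import product

# Output bands as printed in the rules, indexed 1..5.
BANDS = {1: (0, 19), 2: (20, 39), 3: (40, 59), 4: (60, 79), 5: (80, 100)}

# Number printed at the end of each rule; in the list it depends only on the output band.
TRAILING = {1: 0.0, 2: 0.2, 3: 0.4, 4: 0.6, 5: 0.8}

# (credibility mf index, resilience mf index) -> output band index, in listed order.
RULES = {
    (1, 5): 1, (1, 4): 2, (1, 3): 3, (1, 2): 4, (1, 1): 5,
    (5, 1): 5, (4, 1): 4, (3, 1): 3, (2, 1): 2,
    (2, 2): 5, (3, 3): 5, (4, 4): 5, (5, 5): 5,
    (2, 5): 2, (3, 5): 3, (4, 5): 4,
    (3, 4): 4, (2, 4): 3, (2, 3): 3, (4, 3): 4,
    (5, 3): 5, (4, 2): 4, (3, 2): 3,
}


def lookup(credibility_mf, resilience_mf):
    """Return (output band, trailing number) for a pair, or None if no rule is listed."""
    band = RULES.get((credibility_mf, resilience_mf))
    if band is None:
        return None
    return BANDS[band], TRAILING[band]


def uncovered():
    """Pairs out of the 25 possible combinations that have no listed rule."""
    return [p for p in product(range(1, 6), repeat=2) if p not in RULES]


if __name__ == "__main__":
    print(lookup(3, 5))
    print(uncovered())
```

Encoded this way, the list contains 23 rules, and the pairs (mf5, mf2) and (mf5, mf4) do not appear in it.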
Based upon the above rules, a fuzzy inference system is obtained for estimating software birthmark, which is given in Figure
Proposed fuzzy inference system.
Figure
Surface view of inputs and outputs (generated in MATLAB).
Once the fuzzy rules model is designed, inputs are given to the model according to the customer requirements, and the model generates the output based on the fuzzy rules. Details of the proposed system, inputs, and output are given in Table
Proposed model (inputs and output).
Model | Name
Input | “Credibility”
Input | “Resilience”
Output | “output”
The present research work has been validated through a case study of a small module of an Android application. The Android radiocalc module consists of 109 lines of code. The methodology has been applied to a similar application for Android. The birthmark of the module has been estimated based on the properties of resilience and credibility.
We applied SandMark [
Inputs and value for the proposed model.
Inputs | Value in % | Value for proposed model
Credibility | 40% | 0.4
Resilience | 80% | 0.8
The inputs to the fuzzy model are as follows. Credibility is 0.4 (40%) and resilience is 0.8 (80%), and these inputs are given to the fuzzy inference system. According to the ranges defined for the membership functions, credibility 0.4 falls within mf3 (40–59) and resilience 0.8 falls within mf5 (80–100). Based on the designed model, the output is 0.500. From this result, one can make a decision about the birthmark of the software.
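The evaluation of the case-study inputs can be sketched as a small Mamdani-style inference in Python. This is a sketch under stated assumptions rather than a reproduction of the MATLAB model. The triangles are assumed to peak near the band centres (10, 30, 50, 70, 90 on the 0–100 scale) with feet 20 units to either side so that neighbours overlap. The numbers printed at the end of each rule are ignored. Rule strength is the minimum of the two input degrees, aggregation is by maximum, and defuzzification is a discretized centroid. All names in the code are hypothetical.

```python
# Assumed overlapping triangles: peaks near the band centres, feet 20 units either side.
PEAKS = {1: 10.0, 2: 30.0, 3: 50.0, 4: 70.0, 5: 90.0}
HALF_WIDTH = 20.0

# (credibility mf index, resilience mf index) -> output band index, from the listed rules.
RULES = {
    (1, 5): 1, (1, 4): 2, (1, 3): 3, (1, 2): 4, (1, 1): 5,
    (5, 1): 5, (4, 1): 4, (3, 1): 3, (2, 1): 2,
    (2, 2): 5, (3, 3): 5, (4, 4): 5, (5, 5): 5,
    (2, 5): 2, (3, 5): 3, (4, 5): 4,
    (3, 4): 4, (2, 4): 3, (2, 3): 3, (4, 3): 4,
    (5, 3): 5, (4, 2): 4, (3, 2): 3,
}


def trimf(x, a, b, c):
    """Triangular membership with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def degree(x, k):
    """Degree of x in the k-th assumed triangle."""
    p = PEAKS[k]
    return trimf(x, p - HALF_WIDTH, p, p + HALF_WIDTH)


def estimate(credibility, resilience, steps=1000):
    """Mamdani-style estimate on the 0-100 scale (min for AND, max aggregation, centroid)."""
    # Firing strength of each output band: strongest rule that points to it.
    strength = {}
    for (c, r), band in RULES.items():
        w = min(degree(credibility, c), degree(resilience, r))
        strength[band] = max(strength.get(band, 0.0), w)
    # Discretized centroid of the clipped, aggregated output sets.
    num = den = 0.0
    for i in range(steps + 1):
        y = 100.0 * i / steps
        mu = max(min(s, degree(y, band)) for band, s in strength.items())
        num += y * mu
        den += mu
    return num / den if den > 0 else None


if __name__ == "__main__":
    print(estimate(40, 80))
```

With these assumed shapes, credibility 40 has degree 0.5 in both the second and third triangles and resilience 80 has degree 0.5 in both the fourth and fifth, so four rules fire with equal strength. The aggregated output set is symmetric about 50, and the centroid is therefore about 50, or 0.5 on the 0–1 scale. Different membership function parameters, or taking the rule weights into account, would generally give a different value.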
A fuzzy inference system is designed to model the system, which in turn estimates the birthmark of the software. Inputs are assigned to the model to estimate the software birthmark in terms of credibility and resilience. The model evaluates the given inputs and produces a result, on the basis of which one can check the estimation of the software birthmark for these two properties. To check the validity of the proposed model, inputs were given as follows: out = evalfis
Software theft is a global problem of copying, stealing, and misusing software without a proper license agreement. The software birthmark, an intrinsic characteristic of software used to detect similarity between programs, is a capable technique for detecting the theft of software systems. Estimating a software birthmark can play a key role in understanding its effectiveness. In this research, fuzzy logic, an efficient and powerful tool for tackling uncertainty, has been used to estimate software birthmarks. The method is based on fuzzy rules that were designed from the fuzzy membership functions. Different techniques are used in practice, but all of them rely on known information, whereas situations of uncertainty also arise in practice. The proposed model works well in cases of uncertainty and with unknown information. The model is based on two properties of the software birthmark, credibility and resilience, and has been validated using some Android applications. Various experiments have been performed using different existing code obfuscation tools, and the software birthmarks have been estimated. The results produced by the proposed process show that the method is efficient and provides satisfactory results. The approach has been tested only for credibility and resilience, because these two are considered the most important properties of software birthmarks. In the future, the model can be expanded to a different set of properties.
The authors declare that there is no conflict of interests regarding the publication of this paper.