Universal Keyword Classifier on Public Key Based Encrypted Multikeyword Fuzzy Search in Public Cloud

Cloud computing delivers third party infrastructure and applications as a service over the internet. Although customers have no visibility into how their data is stored on the service provider's premises, the cloud lowers infrastructure costs and offers more flexibility and simplicity in managing private data. The pay-per-use model helps private data owners manage both costs and data. With the pervasive use of the internet, the focus has shifted towards effective data utilization on the cloud without compromising security. To increase data utilization on public cloud storage, the key is to provide effective data access through fuzzy searching techniques. In this paper, we discuss the existing fuzzy searching techniques and focus on reducing the searching time on the cloud storage server for effective data utilization. Our proposed Asymmetric Classifier Multikeyword Fuzzy Search method provides a classifier search server that creates a universal keyword classifier for multiple keyword requests, which greatly reduces the searching time by learning the search path pattern for all the keywords in the fuzzy keyword set. The BTree fuzzy searchable index is used to resolve typos and representation inconsistencies and to facilitate effective data utilization.


Introduction
Cloud computing [1,2] offers infrastructure, platform, and software as a service to users worldwide. The cloud paradigm lets users outsource their personal data to a cloud storage [3] server, which gives them access to their data anywhere at any time. Users of cloud storage pay only for the storage they actually use. Some companies use cloud storage for data backup.
Problem Formulation. The cloud storage server is responsible for keeping its customers' data available and accessible at all times. It must let customers access a wide range of resources, applications, and data quickly through an internet service interface. The number of customers using cloud storage increases significantly every day, and the stored data grows dynamically with the increasing demands of existing customers and the addition of new ones. The cloud storage is therefore under pressure to handle growing customer data while still providing effective access to it. To retain its customers, the cloud storage must optimize the computational time for searching the requested data, through an efficient searching method or additional provisions that return the requested data immediately. With this observation, we propose a new searching method named Asymmetric Classifier Multikeyword Fuzzy Search, which uses a universal keyword classifier to store the search path pattern of all the keywords of the customers' data. This allows the cloud storage server to use its time effectively to serve its growing number of customers concurrently. Our scheme also resolves typos and representation inconsistencies, since the searching is done on a BTree fuzzy searchable index.
The Scientific World Journal
Our Contributions. In this paper, we propose a new scheme, Asymmetric Classifier Multikeyword Fuzzy Search (ACMFS), which greatly reduces the time spent searching the data and delivers the requested file immediately to the users. It also utilizes the data in the cloud storage effectively through fuzzy search on a BTree fuzzy searchable index. Experimental results show the effective data utilization and search efficiency of the proposed scheme. Our contributions are summarized as follows:
(i) Asymmetric Searchable Encryption allows the server to search over the encrypted BTree fuzzy searchable index, thereby providing effective data utilization.
(ii) The cloud storage server cannot expose the files to illegal access, since it has no information about the multiple keywords and files.
(iii) As the BTree fuzzy searchable index is created from wild card fuzzy keyword set, it tolerates typos and representation inconsistencies of authorized users.
(iv) The classifier search server uses a universal keyword classifier, which stores the search path patterns of the multiple keywords of all the encrypted files, for traversing the storage efficient BTree wild card fuzzy searchable index.
Paper Organization. The rest of the paper is organized as follows. The related work is discussed in Section 2 along with the limitations of the existing searching methods. In Section 3, we formulate our problem by designing the system model and goals of the proposed solution. We then describe the Asymmetric Classifier Multikeyword Fuzzy Search scheme in detail in Section 4, followed by Section 5, which discusses the detailed design and implementation of the algorithms of our proposed method. The experimental results and performance analysis are shown in Section 6. We conclude our paper in Section 7.

Related Work.
Although a Cloud Service Provider (CSP) hosts data from several third parties, Liu et al. [4] pointed out that managing sensitive data leads to security and privacy concerns. Cryptographic methods can be used to disclose the key only to authorized users and so protect the data from an untrusted CSP. Ren et al. [5] state that users exhibit several types of typing behaviour for keywords, commonly termed typos, representation inconsistencies, and typing habits. They suggested fuzzy keyword search to overcome these inconsistencies. Though fuzzy keyword search is prevalent in popular search engines like Google and Bing, it still poses risks in cloud storage due to inherent security and privacy obstacles. Searchable encryption [6][7][8] is recommended, which treats encrypted data as files labeled with keywords and lets users securely search over the files through predefined keywords to retrieve them.
Zhou et al. [9] created a k-gram based fuzzy keyword set for the keywords W of the encrypted files C and used the Jaccard coefficient to calculate keyword similarity.
Wang et al. [10] pointed out that keywords hold sensitive information about the files, and thus keyword privacy must be protected for effective data utilization. Xu et al. [11] identified that a third party could access the files by knowing the keyword search trapdoor. They proposed public key encryption with fuzzy keyword search (PEFKS), in which each keyword corresponds to an exact keyword search trapdoor and a fuzzy keyword search trapdoor.
Wang et al. [12] discuss that search over encrypted data not only involves information retrieval techniques, such as data structures for representing the searchable index, but also depends on efficient search algorithms that run on the index.

Limitation of the Existing Methods
(1) Secured and privacy preserving keyword search [4]: (i) The encryption and decryption process incurs high communication and computational cost.
(2) Secured fuzzy keyword search [5]: (i) It does not support fuzzy search with public key based searchable encryption. (ii) It cannot carry out multiple keyword semantic search. (iii) The update operation on the fuzzy searchable index is not very efficient.
(3) K-gram based fuzzy keyword ranked search [9]: (i) The size of the k-gram based fuzzy keyword set depends on the Jaccard coefficient value.
(4) Verifiable fuzzy keyword search (VFKS) [10]: (i) The symbol tree fuzzy searchable index occupies more space in this search.
(5) Public key encryption with fuzzy keyword search (PEFKS) [11]: (i) Creating both a fuzzy keyword index and an exact keyword index is not suitable for large databases.
(6) Privacy-preserving multikeyword fuzzy search [12]: (i) It demands files with relatively high score to reduce the false negative rate.

Methodology of Our Scheme
3.1. Cloud Data Utilization Service Architecture. In this paper, we consider a cloud data utilization service architecture consisting of four entities: data owner, cloud storage server, classifier search server, and data users, as shown in Figure 1. Here we assume that the authorization is suitably done between the data owner and data users. Initially, the data owner generates the user's public and private key pair. The data owner has a set of data files and creates the fuzzy multikeyword set FMKS for their keywords using the wild card based technique with a predefined edit distance value. The data owner creates the BTree wild card fuzzy searchable index from the fuzzy multikeyword set, encrypts the files and the index using the user public key, and outsources them to the cloud storage server. The data owner sends the user private key as the private secret key, which is used by the data users for creating the keyword trapdoor and for decrypting the files. Now the cloud storage server has the encrypted data files DF and the encrypted BTree wild card fuzzy searchable index BSIE. The cloud storage server shares the encrypted BTree wild card fuzzy searchable index with the classifier search server. The data user submits multiple search keywords, which are encrypted using the private secret key to create the multikeyword trapdoor MKTW, which is sent to the cloud storage server. The server forwards the request to the classifier search server. The universal keyword classifier checks whether the request is arriving for the first time. If so, the keyword classifier captures and stores the search path of the MKTW by searching over the encrypted BTree wild card fuzzy searchable index and sends the search path to the cloud storage server. If the request matches a previous request, then it is a repeated multiple keyword request. In that case, the classifier search server extracts the stored search path patterns of the repeated multikeyword from the universal keyword classifier and sends the search path to the cloud storage server. After receiving the search path pattern of the multiple keywords from the classifier search server, the cloud storage server extracts the set of encrypted files from DF and sends them to the data users. After receiving the encrypted files, the data user decrypts the files using the private secret key.
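The first-time versus repeated request handling described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the class name, the representation of a search path as a tuple of child positions, the choice to treat a multikeyword trapdoor as an unordered set of tokens, and the toy search function are all assumptions made for illustration.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical representation: a search path is the sequence of child
# positions followed from the root of the index down to the matching entry.
SearchPath = Tuple[int, ...]


class UniversalKeywordClassifier:
    """Stores the search paths found for each multikeyword trapdoor."""

    def __init__(self, search_index: Callable[[bytes], SearchPath]) -> None:
        self._search_index = search_index  # full search over the encrypted index
        self._stored: Dict[FrozenSet[bytes], Dict[bytes, SearchPath]] = {}
        self.full_searches = 0  # number of requests that needed a real search

    def resolve(self, trapdoor: List[bytes]) -> Dict[bytes, SearchPath]:
        key = frozenset(trapdoor)  # assumption: keyword order does not matter
        if key not in self._stored:
            # First-time request: search the index and remember the paths.
            self.full_searches += 1
            self._stored[key] = {t: self._search_index(t) for t in key}
        # Repeated request: the stored paths are returned without searching.
        return self._stored[key]


def toy_index_search(token: bytes) -> SearchPath:
    # Stand-in for traversing the encrypted BTree index (illustration only).
    return tuple(b % 4 for b in token[:3])


classifier = UniversalKeywordClassifier(toy_index_search)
first = classifier.resolve([b"tok-a", b"tok-b"])   # triggers a full search
again = classifier.resolve([b"tok-b", b"tok-a"])   # served from stored paths
print(classifier.full_searches)
```

In this sketch the saving comes entirely from skipping the index traversal on a repeated request; how the real classifier encodes and matches requests is not specified here.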

Design Goals.
To effectively optimize the searching time for the multiple keywords in the cloud storage server and for tolerating the typos and representation inconsistencies of authorized users, our searching method seeks to achieve the following design goals.
Search efficiency goals are (i) to construct the universal keyword classifier for BTree wild card fuzzy searchable index for optimizing search time and for tolerating typos and representation inconsistencies of authorized users.
Security goals are (i) to prevent the cloud storage server from gaining knowledge of the data files and keyword set. This is achieved by outsourcing the encrypted files and index to the cloud storage server.
Privacy goals are (i) to provide user privacy by hiding the details of data files, keywords, and index from the cloud storage server; (ii) to support data privacy by encrypting the files and index with the user public key before outsourcing to the cloud storage server; (iii) to attain keyword privacy by forming the BTree wild card fuzzy searchable index from the fuzzy multikeyword set for the predefined set of multiple keywords; (iv) to achieve query privacy by sending the k-gram keyword trapdoor encrypted with the private secret key; (v) to accomplish index privacy by creating the encrypted BTree wild card fuzzy searchable index.

Notations and Preliminaries.
They are as follows.

Searchable Encryption.
Searchable encryption [13] is a cryptographic technique where the data users search the encrypted searchable index by the following steps.
(i) The encrypted tokens in the searchable index hold pointers to the encrypted files. The tokens are the encrypted keywords.
(ii) If the requested token finds a match in the searchable index, then the file pointer is extracted without decryption.
(iii) If the token is not found in the searchable index, then it returns the null file pointer.
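The three steps above can be sketched in Python as a lookup table from tokens to file pointers. This is only an illustration: the keyed hash (HMAC-SHA256) used to derive tokens is a stand-in chosen for brevity, not the public key based encryption of the proposed scheme, and the function names, file names, and secret are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List, Optional


def make_token(secret: bytes, keyword: str) -> bytes:
    # Stand-in for keyword encryption: a deterministic keyed hash (assumption).
    return hmac.new(secret, keyword.lower().encode("utf-8"), hashlib.sha256).digest()


def build_index(secret: bytes, files: Dict[str, List[str]]) -> Dict[bytes, List[str]]:
    # Each token points to the encrypted files labeled with that keyword.
    index: Dict[bytes, List[str]] = {}
    for file_pointer, keywords in files.items():
        for keyword in keywords:
            index.setdefault(make_token(secret, keyword), []).append(file_pointer)
    return index


def search(index: Dict[bytes, List[str]], token: bytes) -> Optional[List[str]]:
    # The server compares tokens only; nothing is decrypted.
    # A missing token yields None, the "null file pointer" case.
    return index.get(token)


secret = b"demo-secret"  # hypothetical value
index = build_index(
    secret,
    {"file-1.enc": ["storage server", "write mode"], "file-2.enc": ["storage server"]},
)
hits = search(index, make_token(secret, "storage server"))
miss = search(index, make_token(secret, "tax benefit"))
print(hits, miss)
```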
The two types of searchable encryption are symmetric and asymmetric (public key based) searchable encryption [14,15]. In Symmetric Searchable Encryption (SSE), the data owner who outsources the encrypted index and data and the server that searches the data share the same secret keys. The efficiency of SSE is high since it uses symmetric cryptographic methods such as block ciphers, pseudorandom functions, and hash functions. The disadvantage of SSE is that the server has a high probability of learning about the owner's data and keywords. In Asymmetric Searchable Encryption (ASE) [16,17], the data owner outsources the index encrypted with the user's public key. The keyword trapdoor is created with the user's private key, so only the authorized users can request a search from the server. The advantage of ASE is that it supports conjunctive or disjunctive keyword searches. The disadvantage of ASE is that it suffers from keyword guessing attacks (KGA).

Edit Distance.
Edit distance is a measure of the similarity of two strings. The edit distance $\mathrm{ed}(w_1, w_2)$ between two strings $w_1$ and $w_2$ is the smallest number of operations necessary to change one string into the other. The three primitive operations are as follows: (i) insertion: inserting one character into the string; (ii) deletion: deleting one character from the string; (iii) substitution: changing one character to another in the string.
The edit distance of two strings in our system is computed by dynamic programming. Under the dynamic programming strategy, the edit distance $\mathrm{ed}(i, j)$ of any two strings of lengths $i$ and $j$ is defined as follows (refer to Algorithm 1). If $i = 0$, then $\mathrm{ed}(0, j) = j$.
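As an illustration of the dynamic programming strategy, the following Python sketch implements the standard Levenshtein recurrence with unit cost for each of the three operations. It is assumed, not confirmed, that Algorithm 1 follows this standard form; the function and variable names are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance with unit-cost insert, delete, substitute."""
    rows, cols = len(a) + 1, len(b) + 1
    # ed[i][j] is the distance between the first i characters of a
    # and the first j characters of b.
    ed = [[0] * cols for _ in range(rows)]
    for j in range(cols):
        ed[0][j] = j  # base case: an empty prefix of a needs j insertions
    for i in range(rows):
        ed[i][0] = i  # base case: an empty prefix of b needs i deletions
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            ed[i][j] = min(
                ed[i - 1][j] + 1,         # deletion
                ed[i][j - 1] + 1,         # insertion
                ed[i - 1][j - 1] + cost,  # substitution or match
            )
    return ed[-1][-1]


print(edit_distance("HEN", "HAN"))
```

The table has (len(a) + 1) by (len(b) + 1) entries, so the running time is proportional to the product of the two string lengths.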

Fuzzy Keyword Set.
Since different users have different typing behaviors, they may misspell keywords, so a fuzzy keyword set is formed to utilize the data effectively. A fuzzy keyword set can be created by the wild card based technique, the k-gram based technique, or the symbol tree based technique. For example, the fuzzy keyword sets for W = "HEN" with edit distance 1 are described below.

Wild Card Based Technique.
The wild card based technique is used to create a storage efficient wild card based fuzzy keyword set. We use the wild card character "#" to represent the positions of the three edit operations (insertion, deletion, and substitution), thereby creating a small fuzzy keyword set. For example, the wild card based fuzzy keyword set for W = "HEN" with edit distance 1 contains a total of 11 keywords.
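One way to enumerate such a set for edit distance 1 is sketched below in Python. The construction is an assumption rather than the paper's listing: it places "#" at every insertion and substitution position and lists the single-character deletions explicitly. Under that assumption it yields 11 keywords for "HEN", which agrees with the total stated above, and it keeps the original keyword as the first component.

```python
from typing import List

WILDCARD = "#"


def wildcard_fuzzy_set(word: str) -> List[str]:
    """Wild card variants of `word` for edit distance 1 (illustrative sketch)."""
    variants = [word]  # the original keyword comes first
    for i in range(len(word) + 1):
        variants.append(word[:i] + WILDCARD + word[i:])      # insertion at position i
    for i in range(len(word)):
        variants.append(word[:i] + WILDCARD + word[i + 1:])  # substitution at position i
    for i in range(len(word)):
        variants.append(word[:i] + word[i + 1:])             # deletion of position i
    # Remove duplicates (e.g. repeated letters) while keeping the order.
    return list(dict.fromkeys(variants))


fuzzy_set = wildcard_fuzzy_set("HEN")
print(len(fuzzy_set), fuzzy_set)
```

Because one wild card stands for any character at a position, the set grows linearly with the keyword length instead of with the alphabet size.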

K-Gram Based Technique.
The k-gram based technique is used to create a k-gram based fuzzy keyword set for a predefined gram value k. The keywords of the k-gram based fuzzy keyword set are a subset of the keyword set. For example, the k-gram based fuzzy keyword set $FKS_{K=1}$ for w = "HEN" with gram value k = 1 contains a total of 3 keywords.

Assumptions of ACMFS.
Before we start our framework design, we make the following assumptions about our proposed scheme, Asymmetric Classifier Multikeyword Fuzzy Search (ACMFS).
(i) We assume that the cloud storage server concentrates on serving more customers and on retaining its existing customers.
(ii) We assume that the authorization is suitably done between the data owner and data users.
(iii) The data owner creates the users' private and public key pair.
(iv) We assume the wild card based fuzzy multikeyword set FMKS of multikeyword set MKW contains the original keyword as the first component.
(v) We assume that each file has multiple keywords and it is possible for a keyword to be the same for multiple files.

Implementation of Asymmetric Classifier Multikeyword Fuzzy Search (ACMFS)
Here we discuss our proposed scheme in detail, with the algorithms for all the functions involved.

Function Definitions of ACMFS.
The following functions are implemented to optimize the searching on the cloud storage server and to achieve the effective data utilization. Functions on data owner are

Overall Framework of Asymmetric Classifier Multikeyword Fuzzy Search (ACMFS).
Our proposed method, Asymmetric Classifier Multikeyword Fuzzy Search, has a classifier search server that creates the search path pattern for all the keywords of the encrypted set of data files. The data owner creates the encrypted BTree wild card fuzzy searchable index for the fuzzy multikeyword set and outsources it to the cloud storage server. This overcomes the typos and representation inconsistencies of the data users. The overall conceptual description of Asymmetric Classifier Multikeyword Fuzzy Search is shown in Figure 2 as an activity diagram. Please refer to Algorithm 3 for the pseudo-code.

Key Generation Algorithm.
Here we use the RSA public key algorithm for generating the public and private keys. It takes two secret keys, SECRET 1 and SECRET 2, which are predefined by the data owner and both belong to $\{0, 1\}^*$. Algorithm 4 is executed by the data owner to generate the public and private key pair. The data owner sends the user private key as the private secret key, which is used by the data users to create the keyword trapdoor and to decrypt the files.
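For orientation, textbook RSA key generation can be sketched in Python as follows. This is the standard construction with toy primes and no padding, shown only as an illustration; it is not a transcription of Algorithm 4, which may differ in detail. The function name, the primes 61 and 53, and the starting exponent 17 are assumptions, with the two primes playing the role that SECRET 1 and SECRET 2 appear to play.

```python
from math import gcd
from typing import Tuple


def generate_keypair(p: int, q: int, e: int = 17) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Textbook RSA key pair from two primes p and q (toy sketch, insecure sizes)."""
    n = p * q                   # modulus shared by both keys
    phi = (p - 1) * (q - 1)     # Euler's totient of n
    while gcd(e, phi) != 1:     # the public exponent must be coprime to phi
        e += 2                  # e starts odd, so it stays odd
    d = pow(e, -1, phi)         # modular inverse of e (Python 3.8+)
    return (e, n), (d, n)       # (public key, private key)


public_key, private_key = generate_keypair(61, 53)
message = 65
cipher = pow(message, public_key[0], public_key[1])   # encrypt with the public key
plain = pow(cipher, private_key[0], private_key[1])   # decrypt with the private key
print(public_key, private_key, plain == message)
```

A real deployment would use large random primes and a padding scheme such as OAEP from a vetted cryptographic library rather than raw modular exponentiation.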

File Encryption Algorithm.
The data owner executes Algorithm 5 to form the encrypted set of data files DF, which are encrypted with the user's public key PUB KEY and outsourced to the cloud storage server.

File Decryption Algorithm.
After receiving the search path pattern of the multiple keywords from the classifier search server, the cloud storage server extracts the set of encrypted files from DF and sends them to the data users. After receiving the encrypted files, the data user decrypts the files using the private secret key by executing Algorithm 6.

Algorithm 4 (fragment):
(1) Declare the integer variables S1, S2, S3, S4, private, public and key
(2) Assign S1 = SECRET 1, S2 = SECRET 2
(3) Find key = S1 * S2
(4) Compute S3 = (S1 − 1) * (S2 − 1), S4 = S3 − (S1 + S2 − 1)
(5) Pick random integer "public". Check gcd(public, S4) = 1 // gcd is Greatest Common Divisor

Table fragment (the first row is incomplete; the columns appear to be number of users, number of files, and time taken):
 ..   ...    420
 12   350    450
 14   400    453
 16   450    530
 18   500    650
 20   550    655
 22   600    710
 24   650    730
 26   700    810
 28   750    980
 30   800   1173

The universal keyword classifier receives the request to check whether the request is coming for the first time. If the request is arriving for the first time, then the keyword classifier captures and stores the path of the MKTW by searching over the encrypted BTree wild card fuzzy searchable index BSIE by executing Algorithm 10 and sends the search path to the cloud storage server. If the request given by the user matches a previously stored request, then it is a repeated multiple keyword request. Then the classifier search server extracts the stored search path patterns of the repeated multikeyword from the universal keyword classifier and sends the search path to the cloud storage server.

Implementation Setup.
The proposed work was implemented through the Asymmetric Classifier Multikeyword Fuzzy Search (ACMFS) cloud data utilization service architecture using the Jelastic PaaS LayerShift cloud storage provider, which offers infrastructure, platform, and application as a service for its customers. The experimentation was carried out with code programmed in Java for the data owner, users, classifier search server, and cloud storage server. MySQL 5.5.42 acted as the database for the proposed system. The simulation was performed with the data owner and data users on our side and the classifier search server and cloud storage server on the Jelastic cloud storage. The data owner authenticates 100 users and defines the multikeyword set for each data file. The data owner creates the wild card fuzzy multikeyword set FMKS for edit distances 1 and 2 and the BTree fuzzy searchable index BSI WC. Prior to evaluating the results, the data owner outsources the encrypted BSI WC and 1000 encrypted files to the Jelastic cloud storage server, so that the cloud storage contains the 1000 encrypted files and the encrypted BSI WC. With this simulated setup, the authorized users access the files in the cloud storage using their individual identities by entering multiple keyword search requests.

Performance Analysis.
The performance of the proposed method was evaluated in terms of search time efficiency and data utilization from the Jelastic cloud storage, by issuing multiple keyword search requests from the data users with the classifier search server. The experimental results obtained with the ACMFS cloud data utilization system architecture are shown in Tables 1-5 and Figures 3-7. Table 1 shows the time taken for creating the wild card based fuzzy multikeyword set with different numbers of users and files, and its analysis chart is shown in Figure 3. Here the data owner predefines five keywords for each file. Table 2 shows the data utilization efficiency for correct keywords in terms of the number of files retrieved from the cloud storage, and its analysis chart is shown in Figure 4. Table 3 shows the data utilization efficiency for misspelled keywords in terms of the number of files retrieved from the cloud storage, and its analysis chart is shown in Figure 5. Table 4 shows the search time efficiency for correct multikeywords with and without the classifier search server for edit distances 1 and 2, and its analysis chart is shown in Figure 6. Table 5 shows the search time efficiency for misspelled multikeywords with and without the classifier search server for edit distances 1 and 2, and its analysis chart is shown in Figure 7.

Algorithm fragment:
... [1]
(3) Assign public1 = PUB KEY[2]
(4) Find the number of elements "Enum" in BSI WC
(5) ...

Sample multikeywords (table residue): Inter college meet; Cause of concern; Web crime issues; Open access journal; The set of files; Storage server; Remote car access; Write mode; Tax benefit.

Conclusion
This work presented Asymmetric Classifier Multikeyword Fuzzy Search, a method that can be used to enhance data utilization and improve search efficiency for public cloud storage. Provided that the data owner stores the set of encrypted files in cloud storage, we showed that users experience improved search time due to the classifier search server, which searches the BTree wild card fuzzy searchable index. The proposed system extracts the wild card fuzzy multikeyword set for the multiple keyword search request, resulting in increased data utilization in terms of the number of files retrieved for the corresponding users.
The proposed system's performance is demonstrated with the classifier search server, which stores the search patterns and helps reduce the search time for repeated multiple keyword search requests. The classifier search server concept adds a new paradigm to cloud storage servers serving several thousands of data owners and their users.