Agent-Based Data Extraction in Bioinformatics

Department of Computer Science, Iqra University, Islamabad, Pakistan
National University of Computer and Emerging Sciences, Peshawar, Pakistan
College of Computing and Information Technology, College of Computer Science and Engineering at Khulais, Department of Information Technology, University of Jeddah, Jeddah, Saudi Arabia
College of Computer Science and Engineering at Khulais, Department of Information Technology, University of Jeddah, Jeddah, Saudi Arabia


Introduction
The volume and complexity of data generated across multiple service providers are growing exponentially. The challenge now is the extraction, retrieval, and processing of relevant information, which makes evident the need for a system that assists users. Data coming in from multiple domains must be integrated to provide a cohesive view. One technology that greatly facilitates this goal is the mashup [1,2]. Mashups give users an integrated, user-oriented view of data and code from multiple heterogeneous sources. Trendsmap, housingmaps, Wheel of Lunch, and InstantWatcher are some examples of mashups. A mashup is a web application that stitches together content, presentation, and application functionality from multiple sources and gives them a new and useful look. In other words, it combines multiple services into a single one [3,4]. In this paper, we apply the concept of the mashup, through the multiagent paradigm, to the domain of bioinformatics. Bioinformatics [5] and [6] is about understanding biological data and is a growing field of research. With advances in technology, the amount of biological data is growing at a tremendous pace.
This makes the field of bioinformatics important to society. Researchers have also made strides in understanding how proteins interact with other proteins, which is known as protein-protein interaction. For example, data reports [8] and [9] include both structural data [8] and other data [10], [11] about proteins. A wide range of computational techniques has been proposed, primarily for image manipulation and pattern detection. With next-generation sequencing (NGS), analysts work on valuable data and perform a lot of intensive work to find patterns. More importantly, it is considerably difficult for them to create complex maps from heterogeneous sources.
There are various experimental approaches [18], but they are very expensive and time consuming because they require a lot of resources and time to measure the physical interaction among proteins. They are also highly prone to error because the experiments are carried out purely in the lab and are not standardized. The adoption of agent technologies and multiagent systems constitutes an emerging area in bioinformatics, where data is large (on the order of gigabytes) and complex in nature. Researchers face problems with data retrieval due to low (1) bandwidth, (2) storage, and (3) computational power on their machines. Processing, analyzing, and transporting molecular data with multiple levels of complexity require high bandwidth and storage. Thus, it becomes very difficult for researchers to conduct research with low-power resources. This study proposes an agent migration-based approach [19][20][21]. The main advantage of an agent-based approach is that the agent takes the request from the requester, visits the service provider, and returns to the requester, which saves the data provider the time needed to make the data available. Another advantage is that data can be transferred to a machine with high computational power: the computation is done at the service side, and the results are available in the same format as they were at the service side. This reduces the size of the data transferred, improves efficiency, and reduces network congestion. The main theme of this paper is an agent-based methodology that tackles the processing and network issues of multiagent systems. It allows a client with low computational resources to complete all the necessary steps. The rest of the paper is organized as follows.
A literature review of bioinformatics, multiagent systems, and mashups is provided in Section 2. Section 3 lists the complete steps of our agent-based solution along with details. In Section 4, a reference implementation is presented as a proof of concept. Section 5 provides the analysis and discussion of the proposed solution. Section 6 concludes this study and outlines future directions.

Literature Review
In this section, the literature on bioinformatics, mashups, and multiagent systems is presented. In each subsection, the importance of the respective domain is discussed. We first turn to the target domain, namely, bioinformatics.

Bioinformatics.
Bioinformatics is an interdisciplinary field that uses the computer as a computational tool for solving problems related to biological data. Computing devices are used to analyze the internal structure and biological functions of living organisms. Their main purpose is to give the data an efficient structure so that it can be interpreted accurately. The field deals mainly with genomes and proteins. One of its important applications is personalized medicine. It is the application of computer processing techniques to the fields of genetics and biochemistry.
Bioinformatics is a branch of computer science that deals with the storage, retrieval, and analysis of biological data [11]. It classifies data in a standard manner, and the data are analyzed in order to determine their structure and content [12]. Bioinformatics includes research in the fields of genetics and genomics and is applied in various areas, such as the study of the evolution and phylogeny of various species [13]. The data generated by bioinformatics are stored at various data centers. These data can be used for the diagnosis, treatment, and prevention of diseases [14]. Protein-protein interactions [22] are of extreme importance because they play a vital role in many biological processes, such as signal transduction and transcription regulation. They also act as protein-based modules that are extensively used by nature to build complex systems. Therefore, they are a primary objective of many bioinformatics algorithms. However, the problem of interactions between proteins in the context of the entire proteome has received less attention. In this context, a recent study proposed a new protein-protein interaction network based on a comparison of the entire proteomes of two different organisms.
Protein-protein interactions [8,9,23] are a key element in the study of molecular biology, as molecular interactions are at the foundation of all biological systems. For this reason, the interactions between proteins have been extensively studied [10,24]. Protein Interactions by Structural Matching (PRISM) [25][26][27] is an online web tool for predicting protein-protein interactions with high confidence. It is based on the structural and functional domain similarity of proteins. The first step in the PRISM-2 algorithm is to generate a structural alignment of two proteins. The proteins are compared by hand-determined structural similarity, and this similarity is used to generate the structural alignment.
There has been enormous growth in biological sequence data, with a huge amount of data being generated and uploaded to sites and servers. At present, to obtain the information, a user has to interact with the interface using web-based queries [28]. This means that the user has to follow individual links to access each of the linked websites.
It is very time consuming and tedious to stay online for each query. The idea of a mashup is to integrate multiple datasets into a single system. The mashup paradigm combines data from multiple heterogeneous data sources into a single view, and it is a good way to recast existing data in a new structure. It is used in situations where data from multiple heterogeneous sources must be presented together.

Mashup.
Information retrieval is becoming a challenging task due to rapid proliferation of data. It becomes more complex when the required information is scattered on multiple service providers.
This complexity demands an efficient system to retrieve the desired results in an appropriate manner. There are different approaches to retrieving information, combining it, and giving it the desired look; the mashup is one such approach [29]. A mashup gives an entirely new and different look, or some added value, to existing data for end users. Service providers offer APIs that act as an interface to data and services. Some APIs are free, and some are proprietary in nature and require authentication and authorization. Asynchronous JavaScript and XML (AJAX), Representational State Transfer (REST), and the Simple Object Access Protocol (SOAP) are some state-of-the-art technologies that have influenced the mashup architecture [30]. REST, screen scraping, and RSS feeds/widgets are used to retrieve content from other websites. Mashups are widely developed for web applications such as social networking, e-government, enterprise resource management, real estate, and more [31,32].
A number of tools exist to create mashups, such as Yahoo Pipes [33] or IBM Mashup Center [34]. Traditionally, a mashup runs inside a web browser, but there are also other environments for it. Two important styles of mashup are the server-side mashup and the client-side mashup [35]. The difference between them is where the data is processed. In a server-side mashup, also called a proxy-style mashup, a web server retrieves all the data from multiple web hosts, stitching takes place on the server side, and the result is rendered in the client's web browser. In a client-side mashup, by contrast, stitching of the services and contents takes place on the client, namely, within the web browser. Client-side mashups are also called Rich Internet Applications (RIAs) and have the added advantage of a prompt response compared with server-side mashups. A mashup can be either a consumer mashup or an enterprise mashup [36]. A consumer mashup, also known as a service or client mashup, integrates data from multiple public sources inside the browser, for example, iGuide. The server-side mashup is the target of this study. Both styles of mashup have their own obvious benefits, as both provide new insight into existing resources. However, users of such mashup tools must trust them: user data is not secure, since it has to be released to third parties. We address this issue using the multiagent paradigm.
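As a minimal sketch of the server-side (proxy-style) mashup described above, the following Python snippet stitches records from two sources into a single view before anything would be sent to a client. The source names, the record fields, and the `stitch` function are hypothetical; they stand in for real service APIs and are not taken from this study.

```python
from collections import defaultdict


def stitch(sources, key):
    """Merge records from several sources into one view, joined on `key`."""
    merged = defaultdict(dict)
    for records in sources.values():
        for record in records:
            # Records sharing the same key value are combined into one entry.
            merged[record[key]].update(record)
    return dict(merged)


# Hypothetical responses from two providers describing the same proteins.
structure_db = [{"protein": "P1", "chains": 2}, {"protein": "P2", "chains": 1}]
interaction_db = [{"protein": "P1", "partners": ["P2"]}]

view = stitch({"structure": structure_db, "interaction": interaction_db}, key="protein")
print(view["P1"])
```

In a client-side mashup the same stitching step would run in the browser instead; only the place of execution changes, not the merge logic.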

Multiagent Systems (MAS).
A multiagent system is a collection of multiple software agents [37]. A software agent is a piece of code that works autonomously and communicates with other agent-oriented and non-agent-oriented software [38,39]. The basic building blocks of an agent are code, data, and state. The data part represents the data structures that preserve important information about an expression before and after evaluation. The configuration of the agent is stored in its data and state parts. It contains information about platforms, which changes dynamically as the agent travels from one node to another within a network. The code part of an agent is the collection of ordered statements that remains nearly constant during execution, though it can change when required. It represents the actual logic of the agent. The state part of an agent represents the current status of the data part. In essence, the state is the collection of information about all data structures.
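The three building blocks can be illustrated with a small Python sketch. The `Agent` class, its field names, and the `count_residues` task are assumptions made for illustration only; they do not reproduce the data model of any particular agent platform.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class Agent:
    # Code: the logic of the agent, which stays (nearly) constant.
    code: Callable[[Dict[str, Any]], Any]
    # Data: the structures the logic reads from and writes to.
    data: Dict[str, Any] = field(default_factory=dict)
    # State: the current status, including where the agent is running.
    state: Dict[str, Any] = field(default_factory=dict)

    def run(self, host: str) -> Any:
        """Execute the code part on `host` and record the new status."""
        self.state["host"] = host
        self.data["result"] = self.code(self.data)
        self.state["status"] = "done"
        return self.data["result"]


def count_residues(data: Dict[str, Any]) -> int:
    """Hypothetical task: count the residues in a sequence string."""
    return len(data["sequence"])


agent = Agent(code=count_residues, data={"sequence": "MKTAYIAK"})
agent.run(host="container-1")
print(agent.data, agent.state)
```

When such an agent moves to another node, the code travels unchanged while the data and state parts carry what has been computed so far.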

Significance of Multiagent Systems.
The agent-oriented software paradigm has become a promising technology that is widely used in distributed environments such as e-commerce [40], network management [41], data mining [42], robotics [43,44], and information extraction [45]. Some interesting applications of agent systems can be found in healthcare systems [46,47] for scheduling patients, storing patients' medical records, and sharing them with the parties concerned. An agent-based system, also called a multiagent system [48], is a system in which multiple agents interact, cooperate, and coordinate with each other. Such a loosely coupled system extends the capabilities of a monolithic system to perform tasks that are beyond the scope of an individual agent. It is widely used to share or obtain resources over the network among agents. The resources might be computational power, logic to solve the problem, software, or expertise distributed temporally and spatially. Normally, systems fall into two categories, client (to make a request) and server (to serve it), but multiagent systems combine the benefits of both in a social, proactive, and reactive manner.

Design Issues in Multiagent Systems.
The most important design issue for multiagent systems is how agents communicate with each other and with other entities. The starting point is to select a tool or middleware that lets developers obtain the core benefits of this technology rather than resolve basic communication issues themselves. Some standards are therefore needed before deploying such a system. The Foundation for Intelligent Physical Agents (FIPA) [49], AGENTLINK [50], and the OMG Agent Platform Special Interest Group (PSIG) [51] are the leading standardization bodies promoting agent technology. This study adopts FIPA as its agent reference and development model. There are various tools for agent-based modeling, such as NOMADS [52,53], AgentScape [54], Agentcities [55], Aglets [56], Voyager [57], Janus [58], TACOMA [59,60], Grasshopper [61], JADE [62,63], JaCaMo [64], Addre Jason [65], and ABLE [66]. In this study, JADE [62] is used to launch the agent platform. The most important reason is that it is open source middleware released under the GNU Lesser General Public License (LGPL). It is implemented entirely in the Java language, which makes it portable. It is one of the most popular middleware platforms within the research community, and it eases the implementation of multiagent systems (MAS). It provides a set of graphical tools that make it very easy to deploy an agent platform on a standalone system as well as over a distributed network. This study therefore recommends JADE as agent middleware. Its infrastructure is very flexible, and the agent community keeps adding add-ons to enhance its features. It is compliant with the FIPA-IEEE Computer Society specifications. The core concept of FIPA is to resolve the issue of interoperability, and JADE has extended FIPA's model in multiple areas. According to the JADE specification, the mobility of an agent can be categorized into two types: interplatform and intraplatform.
In intraplatform mobility, an agent migrates between containers of the same platform but cannot move to containers of a different platform. In interplatform mobility [67], an agent moves among different platforms: the agent leaves its own main container and joins the main container of another platform. The main focus of this study is interplatform mobility; see step 1 for details of each type and of how an agent can migrate from one platform to another.
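The distinction between the two mobility types can be sketched as a small classification function. The `Location` type and the platform and container names are hypothetical; this is an illustration of the definitions above, not part of the JADE API.

```python
from typing import NamedTuple


class Location(NamedTuple):
    platform: str   # the agent platform that owns the container
    container: str  # the container inside that platform


def mobility_type(source: Location, destination: Location) -> str:
    """Classify a migration according to the definitions in the text."""
    if source == destination:
        return "none"
    if source.platform == destination.platform:
        # Same platform, different container.
        return "intraplatform"
    # The agent leaves its platform and joins another one.
    return "interplatform"


client = Location(platform="client-platform", container="Main-Container")
helper = Location(platform="client-platform", container="Container-1")
server = Location(platform="server-platform", container="Main-Container")

print(mobility_type(client, helper))
print(mobility_type(client, server))
```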

Bioinformatics and Multiagent Systems.
This study also explores bioinformatics as a real application area for multiagent systems and examines how a mobile agent can operate in a highly dynamic environment for data dissemination. A mobile agent visits the hosts on its itinerary to collect the required information and stitches it together to give it a new shape. We propose agent migration to mitigate the aforementioned issues by moving the agent to the server side to perform the computations [21,68]. In a nutshell, this study uses the migration characteristic of agents to realize a mashup, which can then be used for data dissemination.
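The itinerary idea can be sketched in a single process as follows. This is only a simulation under stated assumptions: the host names and the two service functions are hypothetical, and real migration over a network is replaced by an ordinary function call at each stop.

```python
def visit_itinerary(itinerary, query):
    """Visit each (host, service) pair in order and collect its answer."""
    collected = {}
    for host, service in itinerary:
        # In a real deployment this call would run on the visited host.
        collected[host] = service(query)
    return collected


# Hypothetical services offered by two providers.
def structure_service(protein_id):
    return {"id": protein_id, "chains": 2}


def interaction_service(protein_id):
    return {"id": protein_id, "partners": ["P2", "P7"]}


itinerary = [("structure-host", structure_service),
             ("interaction-host", interaction_service)]

report = visit_itinerary(itinerary, "P1")
print(report)
```

The dictionary returned at the end plays the role of the stitched result that the agent carries back to the requester.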

Solution
Our solution is based on the mobility characteristic of an agent. The solution provides accurate and fine-grained results even when the bandwidth and storage of the client are low. It also deals with the complexity present in the nature of the data as well as in the dynamic environment. In the agent migration approach, an agent is executed on a client machine and migrates to another machine when the original machine is overloaded. To migrate from one machine to another, the agent must be able to traverse the network. An abstract view is given in Figure 1.
Java Remote Method Invocation (RMI) [69] is one way to extract data from a server, as it allows remote access to Java objects on a remote host. It is a lightweight communication protocol. The payload of Java RMI is a Java object that contains references to the remote methods. A remote object is a Java object created on the remote host. A client stub, which is a proxy object that holds a reference to the remote object, is used to access it. The complete steps used in this study are listed in Figure 2.
Mobile agents use the resources of a system to complete their tasks and then return to the system where they started their execution [10,11]. Agent-based systems are a powerful and effective way to develop intelligent systems because of their simplicity and extensibility. They are more effective in handling problems related to distributed, parallel, and autonomous systems.
They are also effective in handling the problems of complex systems, because they are not computationally heavy and can be used on multiple systems at the same time. This makes these systems easy to design and implement. The steps carried out in this study are shown in Figure 3.

Reference Implementation
To deploy agents, various agent frameworks are available [70,71]. The Java Agent Development (JADE) framework is a FIPA-compliant framework for developing agent systems. It is compliant with the FIPA Agent Communication Language (ACL) specification. We provide the source of our own reference implementation at https://github.com/BioAgent.
For the testbed configuration, two personal computers were used: one as a server and the other one as a client. Both systems were connected through the 4G Huawei E5573s-320 which is a pocket WiFi router. Table 1 shows both hardware and software details of both personal computers.

Results and Discussion
This section presents the study carried out on the performance of mobile agents and Java RMI, compares the Java RMI and agent migration approaches, and discusses the results in detail. Because mobile agents are not dependent on the host application and can be transferred independently to another host, an effective approach to large-scale agent migration has been proposed. Table 2 shows the network load generated by the client using Java RMI and using our agent-based approach. The agent approach is more efficient than the Java RMI approach because it decreases the number of network calls made by the client: the agent is migrated to the server only when there is a need for it, so the client makes fewer network calls and the overhead of those calls is reduced. The Java RMI approach therefore has more network load than the agent approach, even though Java RMI is considerably more mature than the agent-based approach. The agent migration approach imposes no additional network load, as it needs only interconnectivity.
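The argument about network calls can be made concrete with a deliberately simple counting model. It rests on two assumptions that are not measured in this study: a Java RMI client issues one remote invocation per query, and a mobile agent crosses the network only twice (once to reach the server and once to return), running all queries locally in between. Both function names are hypothetical.

```python
def rmi_network_calls(num_queries: int) -> int:
    """Assumed model: one remote invocation crosses the network per query."""
    return num_queries


def agent_network_transfers(num_queries: int) -> int:
    """Assumed model: the agent travels to the server and back, whatever
    the number of queries it runs while it is there."""
    return 2 if num_queries > 0 else 0


for n in (1, 10, 100):
    print(n, rmi_network_calls(n), agent_network_transfers(n))
```

Under these assumptions the number of crossings grows linearly with the number of queries for the remote-invocation style and stays fixed for the agent style. The model counts crossings only and says nothing about the size of each transfer.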
In Figure 4, the x-axis represents the size of the data extracted from multiple databases, while the y-axis shows the network load consumed by both approaches. According to Figure 4, as the result size increases for Java RMI, the network load increases as well. Thus, larger results are obtained only at the cost of a higher network load. The curve shows a direct relationship between result size and network load: the network load depends on the result size. In contrast, the agent-based system delivers large results with a constant network load, which is why its graph is a flat straight line.
Figure 5 shows that the agent-based approach gives better results than Java RMI once the result size increases beyond 2 kB. Similarly, the response time is only 10 seconds at a result size of 5 kB, and a result size of 10 kB is achieved at a response time of only 15 seconds. From Figure 5, which is based on Table 3, we can conclude that, in the Java RMI system, response time is directly proportional to result size: as the size of the result increases, the response time also increases, so a high volume of results takes longer to obtain. In contrast, the agent-based system returns large results with 46.82% less response time than Java RMI, with respect to the data that the agent carries. The average time of the Java RMI approach is approximately 20.22, while the average time of our approach is approximately 10.75; the difference between the two is approximately 9.47.
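The relative saving follows directly from the two averages reported for Table 3. The short Python check below uses only those two reported values; the variable names are ours.

```python
# Average response times reported for Table 3.
rmi_avg = 20.21785714
agent_avg = 10.75047619

# Absolute difference and relative reduction with respect to Java RMI.
difference = rmi_avg - agent_avg
reduction_pct = 100 * difference / rmi_avg

print(round(difference, 2), round(reduction_pct, 1))
```

The computed reduction is about 46.8%, in line with the 46.82% stated in the text.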
In Figure 6, the blue line shows the graph of the agent-based approach and the red line shows the graph of Java RMI. We can clearly see that, when the bandwidth is decreased, the agent-based approach responds faster than Java RMI.
According to Table 4, the average response times of the two approaches are 20.21785714 and 10.75047619. Importantly, our findings reveal that our strategy saves the user up to 16.25% of the average time with respect to bandwidth.
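The bandwidth-related saving can be checked against the row values of Table 4 as they appear in the source. The sketch below assumes, based on the discussion of Figure 6, that the three columns are bandwidth, Java RMI response time, and agent response time, in that order; this column reading is an assumption, since the table header is not available.

```python
# Rows of Table 4 as they appear in the source.
# Assumed column order: (bandwidth, Java RMI time, agent time).
rows = [
    (10, 84, 74), (12, 64, 58), (14, 45, 40), (16, 35, 29),
    (18, 34, 28), (20, 33, 26), (22, 30, 25), (24, 29, 23),
    (26, 28, 21), (28, 25, 19), (30, 24, 18),
]

rmi_avg = sum(r[1] for r in rows) / len(rows)
agent_avg = sum(r[2] for r in rows) / len(rows)
saving_pct = 100 * (rmi_avg - agent_avg) / rmi_avg

print(round(rmi_avg, 2), round(agent_avg, 2), round(saving_pct, 2))
```

Under this reading, the row values give averages of about 39.18 and 32.82 and a saving of about 16.24%, which agrees with the stated figure of up to 16.25%. The averages themselves differ from the two values quoted in the text, which coincide with the averages reported for Table 3.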

Conclusions and Future Work
In this study, we have designed an agent migration approach for transferring information between clients. The agents migrate from client to client to collect the data and transfer it to a central server. The client uses the agent's services. The feedback service of the agent is used to ask the client for any information required by the agent. The client can provide information to the agent so that it can ask the server for any service the client requires. The agent can migrate between the client and the server. The client can also request the agent to migrate to any other client.
This approach has many advantages. The agents are intelligent, and they work well even in areas with poor network connectivity. They can be used for many generic purposes, for example, to find the interactions between proteins. The findings also show that mobile agent technology relieves network load and storage on the client side and that heterogeneous data can be converted into a homogeneous format. The approach does not require the user to be online for the full duration of a request. The main limitation of this study is that an agent environment must be deployed on both the client and the service side. Our research can be adapted to other bioinformatics problems, such as viewing the interaction of sequences, finding the similarity of sequences or of proteins, or even finding a missing sequence in known sequences. It is also possible to find similarities between different organisms.

Table 4: Response time of the Java RMI and agent-based approaches with respect to bandwidth (column labels inferred from the discussion of Figure 6).

Bandwidth    Java RMI    Agent
10           84          74
12           64          58
14           45          40
16           35          29
18           34          28
20           33          26
22           30          25
24           29          23
26           28          21
28           25          19
30           24          18

Security and Communication Networks