A Collaborative Semantic Annotation System in Health: Towards a SOA Design for Knowledge Sharing in Ambient Intelligence

People nowadays spend more and more time performing collaborative tasks anywhere and at any time. Specifically, professionals want to collaborate with each other by using advanced technologies for sharing knowledge in order to improve or automate business processes. Semantic web technologies offer multiple benefits, such as data integration across sources and enablers for automation. Converting widespread Content Management Systems (CMSs) into their semantic equivalents is a relevant step, as it extends the benefits of the semantic web to them. The FLERSA annotation tool makes this possible; in particular, it converts the Joomla! CMS into its semantic equivalent. However, this tool is highly coupled with the Joomla! platform. Furthermore, ambient intelligence (AmI) environments can be seen as a natural way to address complex interactions between users and their environment, which could be transparently supported through distributed information systems. However, building distributed information systems for AmI environments requires important design decisions and the application of techniques at the system/software architecture level. In this paper, a SOA-based design solution consisting of two services and an underlying middleware is combined with the FLERSA tool. It allows end-users to collaborate in a distributed, decentralized way, independently of technical details and specific context conditions.


Introduction
Context-aware systems are defined as those which "use context to provide relevant information and/or services to the user" [1]; this kind of system adapts to the user and his/her environment, which also makes it possible, among other features, to optimise its functioning [2]. Ambient intelligence (AmI) environments make use of techniques and methods in addition to those adopted by context-aware systems to provide a natural, intelligent, and unconscious interaction with other users and the computational system itself. Thus, interconnected smart computer-based systems (sensor networks, platforms, services, and applications) can allow people to carry out their everyday tasks by exchanging information, a.k.a. the Internet of Things (IoT) [3]. However, the Internet of Things involves an increasing volume of heterogeneous information that is difficult for people and smart things to manage [4]. In this context, the Semantic web is presented as an appropriate approach to facilitate the management of this information, for both people and smart things.
One of the main issues to be resolved in order to progress towards the Semantic web is how to convert existing and new information that can be understood by humans into semantically enriched content that can be understood by machines. Semantic enrichment is made possible by tagging documents with metadata, which describe the entities found in the contents and the relations between them. Providing the information elements that currently make up the web with a well-defined meaning [5] would improve contextual search capabilities, increase interoperability between systems for collaboration, and allow the automatic composition of published web services [6] to be used by applications.
Nowadays, a wide range of annotation tools for producing semantic tags is available, such as Amaya [7], Epiphany [8], DOSE [9], or Melita [10]. However, such tools often take a platform-centred approach rather than a user-centred one. They usually require complex installation procedures, and some are becoming obsolete. In response, a semantic annotation tool called FLERSA [11] has been developed. It was built to transform a CMS (Content Management System) into its semantic equivalent, in order to partially mitigate the lack of semantic content on the current web and to take advantage of both the multiple benefits offered by semantic web technologies and the facilities provided by CMS platforms. FLERSA provides several advantages in comparison to existing annotation tools: it has a user-centred interface; it is lightweight; it supports manual and automated annotation; it avoids the "Deep Web" problem (i.e., search engine indexers can access the semantic information stored in documents annotated with the tool, as annotations are embedded within documents in RDFa format); and it offers multi-ontology annotation. However, the FLERSA tool also presents a main architectural limitation: it is highly coupled with Joomla! [12], as the CMS is used as the underlying web infrastructure. That is, the server-side and client-side implementations cannot be easily adapted to another system. Thus, the possibility of several users making multiple semantic annotations simultaneously depends specifically on the Joomla! CMS, and it is not provided.
The Service Oriented Architecture 2.0 (SOA 2.0) [13], together with techniques to manage data distribution/replication, may help to address this limitation. In particular, a SOA 2.0-based architecture can provide the FLERSA tool with capabilities such as scalability, interoperability, and business agility through service encapsulation. Replication is crucial to obtain high data availability, especially in AmI environments. However, such environments pose additional challenges due to frequent changes in their execution context and often limited resources [14]. Thus, centralized services or static deployments must be avoided in order to provide a higher availability of services [15]. The architecture considered here supports the synchronization and consistency management of distributed/replicated resources through two main services (monitoring and synchronization). In the present research work, that software architecture is applied to the FLERSA tool. In this way, it will allow end-users to collaborate anywhere and at any time by using current technologies for more advanced knowledge sharing in order to improve business processes. The main goal is to demonstrate the usefulness of a SOA architecture in a specific application domain, as well as its adoption as a common base to facilitate the design and development of AmI and IoT applications regardless of technical details (e.g., specific data source platforms and technologies) and specific context conditions (e.g., wireless disconnections and battery status of the mobile devices).
The paper is organized as follows. Section 2 outlines related work. Sections 3 and 4 present, respectively, an overview of FLERSA for semantic information enrichment and the SOA 2.0-based architecture to support the synchronization and consistency management of distributed/replicated resources. Section 5 shows how this design solution is applied to FLERSA, allowing end-users to collaborate anywhere and at any time for more advanced knowledge sharing and providing FLERSA with a clear differentiation from, and independence of, the client-side. Section 6 presents some of the new settings that the proposal enables in a health case study, in which doctors from multiple medical departments and hospitals can collaborate in disease diagnosis. Finally, Section 7 discusses relevant points and summarizes conclusions.

Related Work
In recent years, several collaborative semantic annotation systems have been proposed. The majority of these systems are asynchronous, while the synchronous systems usually use a whiteboard to share annotations between remote users. Whiteboards allow users to see and modify the same document or image, but they are unstructured, which makes it difficult to search and retrieve annotations. Other collaborative annotation applications emerged in response to this limitation. The most commonly used collaborative semantic annotation systems are semantic wikis. Wikis have a large number of users who can easily modify or create any page using their web browser. This kind of system is asynchronous: a user modifies a page, and the change becomes available to the remaining users once it has been stored on the server. The problem with wiki systems is that, as they grow, they become unstructured and therefore difficult to navigate. Several semantic wikis can be found, such as SweetWiki [16] or IkeWiki [17].
There exist several annotation applications that allow users to create and modify annotations collaboratively. Vannotea [18] and eSports [19] allow users to create annotations on multimedia files collaboratively and synchronously. Vannotea is focused on secure collaborative actions; it is based on Annotea [20], for the annotation structure; Jabber [21], for instant messaging; Shibboleth [22], a secure middleware to access web resources; and XACML [23], an XML-based language to define access control policies. Vannotea is developed in C# and stores semantic annotations separately from the content; this allows users to annotate the same content with different annotations depending on the context. eSports is designed to support distance coaching. Its main contribution is that it allows live video to be annotated collaboratively in real time. It is based on the NaradaBrokering middleware [24], which follows the publish/subscribe messaging model. These applications present some limitations: they are mainly designed to work on desktops and therefore do not take into consideration the disconnection scenarios that must be handled for mobile devices. Moreover, they have been designed for a specific platform and are not easily portable to other platforms.
Recently, GATE Teamware [25] has been proposed as another collaborative text annotation tool. The main purpose of GATE Teamware is to support complex annotation processes on corpora. It is based on a multirole model in order to reduce conflicts in annotation tasks. GATE Teamware is web-based and has a layered architecture: a user interface layer, which provides the different web user interfaces according to the user's role; an executive layer, which implements user management and user authentication; and a services layer, where web services such as the storage service and the annotation service can be found. However, GATE Teamware is an ad hoc solution; it has been designed to operate only with collaborative text annotation, whereas the service platform proposed here aims to be a generic platform that can be adapted to any shared resource.
FIWARE [26] is a new platform that provides software components (called Generic Enablers) with functionalities for a more rapid, modular, and flexible development of IoT applications. It can be considered a partial implementation of IoT-A. For instance, FIWARE has been applied to the design of a real e-health Remote Patient Monitoring system with an agile software development methodology [27] and to the agriculture domain [28]. In [29] the focus is on automatic service migration and deployment between providers' cloud infrastructures within the FI-STAR FP7 project, in which several cloud services have been developed for the healthcare domain using the FIWARE platform. Our proposed architecture, including middleware and service platform, could be another specific implementation for one of these Generic Enablers (e.g., the Orion Context Broker), but based on replicated (decentralized) services, that is, following the vision present in Mobile Cloud Computing [30] and Fog Computing [31].
In [32] a comparison between smart city platforms and their supported features is presented. The CityPulse framework (like other similar ones) supports the development of semantic-based services focused on data discovery, analysis, and integration from different domains in IoT and social media [33] (also called Web of Things [34]) and includes a more complete set of analytics tools. That framework provides a Resource Management Module for data distribution, but it does not address the consistent management of replicated information under concurrent operation and specific context conditions (e.g., disconnections).

FLERSA: Semantic Annotation on CMS
FLERSA [11] has been developed to transform a CMS into its semantic equivalent, in order to partially mitigate the lack of semantic content on the current web and to take advantage of the multiple benefits offered by semantic web technologies. The main originality of the tool is that it uses manual, multi-user annotations, which can be added at any moment, to learn to annotate documents automatically. Furthermore, these annotations may be related to any kind of piece of an HTML document (the whole document, a node, a set of nodes, or a text segment). The main functions that the tool enables are the following: creation of annotations associated with a range of text; editing and deleting of existing annotations; clearing of all annotations in the document; permanent storage of annotations; creation of global annotations for a web page, where the scope of the annotation is the whole page; visualization of the RDF generated for the page (W3C's RDFa Distiller [35]); ontology-based queries about properties that have been annotated, with inference over the taxonomies of concepts when the search is conducted by annotation properties; and automatic generation of semantic annotations.
In the system architecture layers (Figure 1), the white boxes represent system components and the shaded ones are related to semantic capabilities, which are detailed below:
(i) The core layer contains the Operating System (which provides network services) and the Web Server.
(ii) The data management layer is made up of the system components responsible for storing the content of web documents and the annotations on them, together with the knowledge base consisting of the ontologies of the system.
(iii) The server-side layer is where server application services are developed. All message traffic between web clients requesting services and the programs that provide them takes place at this level. The programs that serve the web interface are implemented in this layer. They make use of the programming libraries and Application Programming Interfaces (APIs) provided by the underlying layers. Among the most frequently used functions provided by these APIs, the facilities for storing and retrieving information, for working with visual objects in front-end programming, and for working with ontologies are worth highlighting.
(iv) Finally, the web interface layer is located at the uppermost level of abstraction of the system architecture, where the user performs all interaction with the semantic annotation tool. At this level the contents of web documents coexist with metadata and with the web technologies in charge of modifying web documents at runtime to provide them with semantic annotations in the form of metadata, and also of handling messages in a timely manner, by using server-side services to provide the functionality of the tool.
However, the FLERSA tool presents a main limitation: it is highly coupled with Joomla! [12], as the CMS is used as the underlying web infrastructure; that is, the server-side (module) and client-side (server interaction) implementations cannot be easily adapted to another system. Thus, the collaborative capability, that is, the possibility of several users making multiple semantic annotations simultaneously, depends on Joomla! and is not provided. Besides, although the most common environment for semantic web tools is a web browser, many enterprise applications could need to integrate web annotations with annotations made on enterprise documents/data (e.g., for strategic scanning, technological watching, or social monitoring), which is not possible with the current FLERSA design.

A Software Architecture to Support Information Sharing and Collaboration
Collaborative systems are complex, which makes their analysis, modelling, and development challenging [36]. One of the main tasks to be solved in collaborative systems is to maintain data consistency when the data are simultaneously shared by several users [37]. Nowadays, in the absence of standardized methods for the synchronization of shared data replicas, most of the proposed solutions are designed in an ad hoc manner. An increasing number of users and resources to be managed in very dynamic environments makes the correct synchronization of these resources even more complex. Thus, a SOA 2.0-based architecture [15] has been proposed, which is intended to provide a common basis for the consistent management of shared information in collaborative systems. It consists of two main services (Figure 2):
(i) Monitoring Service. This service gathers all events related to modifications of shared data. This information can fulfil several purposes, for example, version control or security logs. For synchronization purposes, the specific synchronization algorithm to be applied requires this information in order to know the occurrence and order of the modifications on the resource. The monitoring service is able to communicate under two different paradigms (SOA 2.0 [13,38]): (1) the Publish-Subscribe paradigm, to learn of the modifications produced by the users on the shared data, and (2) the Request-Response paradigm, for example, when the synchronization service asks the monitoring service about the modifications produced on a specific resource in a specific time interval. Thus, the use of an EDA ("Event-Driven Architecture") approach [39], specifically the concept of event, allows developers to provide a reusable service. The monitoring service has been designed to take advantage of the low coupling that EDA provides in the communication between sender and receiver; thus, it is able to monitor any kind of event. In this way, regarding resource monitoring, specializing the platform only requires designing the structure of the events that will be sent when the shared resources are modified.
(ii) Synchronization Service. The proposed service platform aims to be a generic platform that can be applied in any application domain within collaborative systems. However, as synchronization algorithms depend on the resource type and its specific nature and usage, it is not possible to provide a general service for synchronization. Namely, the conflicts that could be generated by the concurrent modification of images are not the same as those generated by the modification of, for example, plain text, and neither are the processes or policies to resolve them. For this reason, and given the goal of providing a reusable service, the synchronization service is designed as an abstract service, which must be specialized for each particular resource to be synchronized. This abstract service uses the monitoring service to obtain information about changes on the different replicas of the shared resource. In order to provide a more generic service platform, it has been designed considering two different levels: (1) the common part related to managing resource synchronization, which is located in the abstract service and its composition with the monitoring service; at this level, the synchronization service uses the information received from the monitoring service to detect the actions that have been applied to other replicas of the resource but not to its associated replica; and (2) the specialization of this service, where the inconsistencies, once detected, should be resolved. This level depends on the requirements associated with the kind of resource and the use of the resource in a specific domain (i.e., different synchronization policies could be applied to the same resource).
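The two-level design described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the implementation referred to in this paper; all class and method names (`MonitoringService`, `SynchronizationService`, `pending`, `resolve`, `ApplyInOrder`) are hypothetical, and the replay policy is only a placeholder for a domain-specific one.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    resource: str      # identifier of the shared resource
    replica: str       # replica on which the modification was made
    timestamp: float   # time of the modification
    action: str        # domain-specific description of the change


@dataclass
class MonitoringService:
    """Gathers modification events and answers interval queries."""
    log: list = field(default_factory=list)

    def publish(self, event):
        # Publish-Subscribe side: modifications arrive as events.
        self.log.append(event)

    def query(self, resource, start, end):
        # Request-Response side: ordered events on one resource in [start, end].
        hits = [e for e in self.log
                if e.resource == resource and start <= e.timestamp <= end]
        return sorted(hits, key=lambda e: e.timestamp)


class SynchronizationService(ABC):
    """Common part: detect actions not yet applied to the local replica."""

    def __init__(self, monitor, replica):
        self.monitor = monitor
        self.replica = replica

    def pending(self, resource, start, end):
        events = self.monitor.query(resource, start, end)
        return [e for e in events if e.replica != self.replica]

    def synchronize(self, resource, start, end):
        return self.resolve(self.pending(resource, start, end))

    @abstractmethod
    def resolve(self, events):
        """Specialization: resource-specific resolution policy."""


class ApplyInOrder(SynchronizationService):
    # Placeholder policy (an assumption): replay remote actions in time order.
    def resolve(self, events):
        return [e.action for e in events]
```

Under these assumptions, only `resolve` has to be rewritten when the kind of shared resource changes; the detection of pending remote actions stays in the abstract class.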

From FLERSA Tool to FLERSA Service
The FLERSA tool has been developed as a Joomla! module in order to transform this CMS into its semantic equivalent. This is the reason for the high coupling between the tool and the CMS. As a solution to this limitation, we propose to adapt FLERSA to the SOA 2.0 architectural design described in the previous section. This provides FLERSA with a clear differentiation from, and independence of, the client-side, that is, the CMS (Figure 3).
In order to adapt an existing tool or service to the proposed architecture, the developer must first clearly identify which functionality of the tool must be implemented on the server-side. In the FLERSA case, this functionality concerns information retrieval, storage, and reasoning on the knowledge base, which will now be implemented in the specialization of the synchronization service in order to take advantage of the proposed architecture. Currently, the services that support this architectural design have been implemented both in C++ and in C#. Nevertheless, the programming language is only a technical issue that does not affect the interoperability of the architecture, since standard protocols for exchanging information (e.g., SOAP) have been adopted, as well as communication approaches for loosely coupled components (e.g., EDA).
Once the server-side functionality of the tool is implemented as the specialization of the synchronization service, it is necessary to identify the shared resource(s), the actions that users can perform on them, and the possible inconsistencies that can arise because of those actions. This is one of the most important steps in the adaptation process, because the correctness of the resource depends on the right identification of the possible inconsistencies and on the resolution policies applied. In the FLERSA case, the shared resource is a knowledge base, whereas the actions that can be carried out on the resource are adding, modifying, and deleting semantic annotations. Moreover, these annotations can be made on the whole document, a node, a subset of nodes, or a text segment, and each of these elements can carry several annotations. Therefore, the conflicts during the use of the FLERSA service can be caused by deletions or modifications coming from different users on the same annotation. In order to solve this kind of conflict, version control has been implemented, so that a deletion or modification is not permanent and it is possible to revert to the previous version of the annotation.
All of these actions are represented as events in the new architectural design. This facilitates the management of the actions performed on the shared resources and their broadcasting among the entities concerned. In the proposed platform, the BlueRose communication middleware [38] is used, which provides a Publish-Subscribe service and an interface for event management. An event is represented by means of a topic-attributes pair, where each attribute is a key-value pair. The topic denotes the event type, which is unique in the system, whereas the set of attributes makes it possible to represent information of widely varying complexity.
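The event representation described above (a topic plus key-value attributes, delivered through Publish-Subscribe) can be sketched as follows. This is an illustrative in-process stand-in written in Python; it does not reproduce the BlueRose API, and the names `Event`, `EventBus`, `subscribe`, and `publish`, as well as the topic strings used with it, are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Event:
    topic: str                                      # event type, unique in the system
    attributes: dict = field(default_factory=dict)  # key-value pairs


class EventBus:
    """Minimal in-process Publish-Subscribe dispatcher (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        # Register a callable to be invoked for every event with this topic.
        self._subscribers[topic].append(callback)

    def publish(self, event):
        # Deliver the event to the subscribers of its topic, if any.
        for callback in self._subscribers.get(event.topic, []):
            callback(event)
```

Because the attributes are an open set of key-value pairs, a new kind of event only needs a new topic; neither the publisher nor the subscriber interface changes.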
The flexibility and low coupling provided by the EDA approach have made it possible to design and implement a monitoring service that can monitor any kind of event. In the FLERSA case, three types of events have been considered, corresponding to the actions that a user can perform: adding, modifying, or deleting an annotation. The generated events contain information about the user who performs the action, the content to which the annotation refers (the whole document, a node, a subset of nodes, or a text segment), the timestamp, and the semantic content of the annotation (see Figure 4). These events are stored in a NoSQL (i.e., nonrelational) database, specifically MongoDB. NoSQL systems arose to address the scalability problems of traditional (i.e., relational) databases by means of a more flexible storage structure. In particular, the absence of a data schema allows any information to be stored as a record with a key-value structure. This makes a NoSQL database well suited to storing any kind of event generated in the system, whereas in a traditional database it would be necessary to create a new table for each new kind of event. In this regard, the monitoring service translates the events from the BlueRose format to the MongoDB (JSON) format and vice versa.
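A minimal sketch of this translation step is given below, using only Python's standard `json` module. The function names and the document layout (`topic` and `attributes` keys) are assumptions for illustration; the actual BlueRose and MongoDB formats used by the monitoring service are not specified here, and no database driver is called.

```python
import json


def event_to_json(topic, attributes):
    """Serialize a topic-attributes event as a schema-free JSON document."""
    return json.dumps({"topic": topic, "attributes": attributes}, sort_keys=True)


def json_to_event(text):
    """Recover the (topic, attributes) pair from a stored JSON document."""
    document = json.loads(text)
    return document["topic"], document["attributes"]


# Illustrative event with the four kinds of information listed above;
# every field name and value is a made-up example.
example = event_to_json(
    "annotation.add",
    {
        "user": "doctor1",
        "target": {"kind": "text_segment", "node": "p3"},
        "timestamp": 1700000000,
        "annotation": {"concept": "Symptom"},
    },
)
```

Events with entirely different attribute sets can be serialized by the same two functions, which is the property that removes the need for one table per event type.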
With the server-side implemented as a specialization of the synchronization service, and the events identified, the functionality of the client-side must also be implemented. This can be done by creating a new application or by adapting an existing one. In the latter case, to make use of third-party applications, the common solution is to implement an intermediate entity (known as a wrapper) that translates the requests of the client to the server and vice versa, and is thus also capable of adapting existing interfaces to the new service. In the FLERSA case, the client-side is located in the CMS module, where a wrapper function is implemented. In this way, if the CMS changes, only a new corresponding implementation of the wrapper function is required in order to use the new FLERSA service. Web documents and metadata, such as RDFa, are also in the client layer. These resources are downloaded to the device temporarily when the client accesses the CMS.
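The role of the wrapper can be sketched as follows. The service interface and both request formats are invented for this example (they are not the real Joomla! or Drupal request structures, nor the actual FLERSA service interface); the point is only that each CMS needs its own small translation layer while the service stays unchanged.

```python
class FlersaService:
    """Hypothetical stand-in for the FLERSA service interface."""

    def __init__(self):
        self.annotations = []

    def add_annotation(self, user, target, concept):
        self.annotations.append(
            {"user": user, "target": target, "concept": concept})
        return len(self.annotations) - 1   # index used as annotation id


class JoomlaWrapper:
    """Translates an invented Joomla-style request into a service call."""

    def __init__(self, service):
        self.service = service

    def handle(self, request):
        if request["task"] != "annotate":
            raise ValueError("unsupported task: " + request["task"])
        return self.service.add_annotation(
            user=request["username"],
            target=request["node"],
            concept=request["concept"],
        )


class DrupalWrapper:
    """Translates an invented Drupal-style request into the same call."""

    def __init__(self, service):
        self.service = service

    def handle(self, request):
        if request["op"] != "annotation/create":
            raise ValueError("unsupported operation: " + request["op"])
        body = request["body"]
        return self.service.add_annotation(
            user=body["uid"], target=body["selector"], concept=body["term"])
```

A new application written directly against `FlersaService` would call `add_annotation` itself and would need no wrapper.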
In addition to the benefits already mentioned of adapting FLERSA to the proposed architectural design, this SOA architecture also manages a number of additional events at the system and infrastructure levels [15], such as the battery level of the device or the network topology. This, together with replication and caching techniques, makes it possible to provide a context-aware solution and therefore to guarantee the quality attributes of the FLERSA service in AmI environments, where changes in context conditions (e.g., disconnections) are continuous and can affect the proper functioning of the service. The integration of FLERSA in the proposed SOA architecture has strengthened three of the initial design requirements of the tool [11]:
(1) Requirement 2. Collaborative design/user-centred: as a result of the integration, the FLERSA service now allows the concurrent and distributed editing of HTML documents.
(2) Requirement 5. Evolution of documents (document and annotation consistency): the synchronization service and the synchronization policies implemented guarantee the consistency of documents, and they allow users to be informed of conflicting modifications and to recover previous versions of documents.
(3) Requirement 8. Integration: FLERSA functionality has now been encapsulated in a service, which facilitates the reusability and interoperability of the tool.

Case Study
The FLERSA service can be useful in several scenarios in the eHealth domain, where the semantic web can help to retrieve information [40], to share patients' medical histories created in different health centres, and to achieve the semantic interoperability of distributed information systems in eHealth [41] for collaborative decision support in disease diagnosis [42].
Specifically, the collaboration between various specialists for the diagnosis of patients with unusual symptoms [43] is of special interest to illustrate the usefulness of the FLERSA service. On the one hand, health is a wide knowledge area with a complex taxonomy, where any department or research group can define its own protocol or vocabulary. Thus, establishing semantic relations between concepts or procedures is a mandatory step to achieve collaboration between different specialists or health institutions. On the other hand, medical centres that cover every health speciality are unusual, and specialised centres are generally more common. Therefore, a tool that allows distributed collaboration could provide clear benefits.
Figure 5 shows a general scenario of how users can work with the FLERSA service and how they can collaborate. Two web servers and five users are depicted in Figure 5. Several users access the FLERSA service by using different types of devices. The web users use web browsers to access the service through web servers acting as front-ends, while the rest of the users use the service through an application deployed on a mobile device (e.g., smartphone or tablet) acting as front-end.
For existing applications, such as web systems, a specific wrapper function is needed for each one, which translates the requests that the web client makes to the web site into requests to the FLERSA service. In the case of new applications, the wrapper function is not required, given that these applications can use the interface of the FLERSA service directly.
Specific scenarios can be considered in Figure 5. For instance, suppose that a doctor (e.g., "Web User 1" in Figure 5) creates a new semantic annotation in an HTML document (e.g., a patient report). This action will be propagated as an event through the system, registered by the monitoring service, and applied by the FLERSA service to the knowledge base. In this way, the change will be reflected on the Drupal CMS as well as on the Joomla! CMS and the rest of the platforms and technologies in the network. That is, FLERSA is now platform independent, which increases its scalability and interoperability. In this situation, "Web User 1" could be collaborating with specialists who belong to another clinical centre and work on Joomla! (e.g., "Web User 2" and "Web User 3"), or even with mobile users.
The process that a user follows to create an annotation on an HTML document is depicted in Figure 6. Note that an annotation can fully or partially overlap with others. FLERSA resolves this by creating different SPAN sections with unique IDs.
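One possible way of deriving such sections from overlapping annotations is sketched below. It is an assumption made for illustration, not a description of FLERSA's actual algorithm: the text is cut at every annotation boundary, and each resulting segment receives a unique identifier together with the annotations that cover it. The function name, the `span-N` identifier scheme, and the use of half-open character ranges are all hypothetical.

```python
def split_into_spans(text, annotations):
    """Split text into non-overlapping segments at annotation boundaries.

    annotations maps an annotation id to a half-open (start, end) character
    range that is assumed to lie within the text.
    Returns a list of (span_id, segment_text, covering_annotation_ids).
    """
    cuts = {0, len(text)}
    for start, end in annotations.values():
        cuts.update((start, end))
    bounds = sorted(cuts)

    spans = []
    for index, (lo, hi) in enumerate(zip(bounds, bounds[1:])):
        # An annotation covers the segment if the segment lies inside its range.
        covering = sorted(
            name for name, (start, end) in annotations.items()
            if start <= lo and hi <= end
        )
        spans.append((f"span-{index}", text[lo:hi], covering))
    return spans
```

With two partially overlapping annotations, the shared characters end up in a segment of their own that lists both annotations, so each segment can be wrapped in a single SPAN element.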
However, under this new configuration, as mentioned above, conflicts can occur due to concurrent editing. For instance, if "Web User 1" modifies an annotation on a patient report while "Web User 2" deletes it, the FLERSA service is in charge of maintaining a consistent version of the shared resource. The users could have created the conflict through a distraction, through a temporary disconnection of one of them so that the changes were not reflected to each other in time (i.e., each is unaware of the action performed by the other), or intentionally. For this reason, the policy implemented in the FLERSA service is to apply the last change but keep the previous version of the annotation in order to avoid information loss, and to notify the users of the existence of a potential conflict. In this way, the users can mediate to decide the correct state of the resource, while the service guarantees that both are consistent versions.
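The policy just described (apply the last change, keep earlier versions, flag a potential conflict) can be sketched as follows. This is a simplified illustration with hypothetical names; in particular, detecting a conflict by comparing the version a user last saw with the latest stored version is an assumption of this sketch, not a detail given in the paper.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    user: str
    value: Optional[str]   # None marks a deletion


@dataclass
class VersionedAnnotation:
    versions: list = field(default_factory=list)

    @property
    def current(self):
        return self.versions[-1].value if self.versions else None

    def apply(self, user, value, seen_version):
        """Apply the change (last change wins) and keep the history.

        seen_version is the index of the version the user was looking at
        (-1 if none). Returns True when a newer version already existed,
        i.e. when the users should be notified of a potential conflict.
        """
        latest = len(self.versions) - 1
        conflict = seen_version != latest
        self.versions.append(Version(user, value))
        return conflict

    def revert(self, user):
        """Restore the previous value as a new version, keeping history."""
        if len(self.versions) < 2:
            raise ValueError("no previous version to revert to")
        self.versions.append(Version(user, self.versions[-2].value))
```

Since a deletion is stored as just another version, reverting it restores the annotation without discarding the record that the deletion took place.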
The architecture can react to context changes in order to guarantee the proper functioning of the service. This feature is of special interest for mobile users, for example, specialists who want to work together while they move across the hospital or travel (e.g., by train), where the connection can easily be lost at certain points. In this case, the FLERSA service receives events regarding the connection loss and, together with caching techniques, allows the user to continue working transparently despite the lost connection. The modifications made on the local copy of the resource during the disconnection are stored locally on the device as a set of events. Later, when the connection is recovered, the FLERSA service is able to synchronize the modifications that the user made offline with those made by the rest of the users who were online. When the FLERSA service (as a specialization of the synchronization service) receives the reconnection request from the client that was offline, together with the actions that he/she performed in that period on the cached copy of the resource (as a set of events), it queries the monitoring service about the changes made to the main copy of the resource during that period by the rest of the users. Once the FLERSA service has the two sets of ordered events, it can detect the conflicting events (i.e., the conflicting modifications) and apply the versioning policy described above. In a similar way, the application could decide to start working locally when the battery of the device is low and synchronize the changes later on, when the energy level is no longer critical.
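The reconnection step can be sketched as follows, assuming each event is a dictionary with hypothetical `annotation`, `action`, `timestamp`, and `user` keys and that both input lists are already ordered by timestamp. Following the conflict definition given earlier, a pair of events is treated as conflicting when both modify or delete the same annotation and come from different users; the function name and data layout are illustrative only.

```python
import heapq

CONFLICTING_ACTIONS = {"modify", "delete"}


def reconcile(offline_events, online_events):
    """Merge two timestamp-ordered event lists and report conflicting pairs."""
    remote_by_annotation = {}
    for event in online_events:
        remote_by_annotation.setdefault(event["annotation"], []).append(event)

    conflicts = []
    for local in offline_events:
        if local["action"] not in CONFLICTING_ACTIONS:
            continue
        for remote in remote_by_annotation.get(local["annotation"], []):
            if (remote["action"] in CONFLICTING_ACTIONS
                    and remote["user"] != local["user"]):
                conflicts.append((local, remote))

    # heapq.merge keeps the global timestamp order of two sorted inputs.
    merged = list(heapq.merge(offline_events, online_events,
                              key=lambda e: e["timestamp"]))
    return merged, conflicts
```

The merged list gives the order in which the changes would be applied under the last-change policy, while each reported pair identifies an annotation whose users should be notified.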

Conclusions
In this paper, the evolution of the FLERSA tool towards a SOA proposal for consistent knowledge sharing and collaboration in AmI environments and IoT has been presented. The proposal provides the FLERSA tool with capabilities such as scalability [44], since the services are loosely coupled; interoperability [45], since it is possible to expose any existing data source as a service and to implement workflows that allow the exchange of information between different services and platforms through a communication protocol; and business agility [46], thanks to service reusability and the use of access and publication standards. The design of tools (like FLERSA) based on the proposed architecture aims to provide a general solution that opens up new possibilities for the development and deployment of other AmI applications by making use of different technologies and heterogeneous platforms in IoT. For example, the FLERSA service can now be deployed simultaneously on several Cloud providers and IoT nodes (smartphones, on-board car systems, etc.), obtaining benefits such as higher availability, since the resources or services are available wherever an Internet connection exists; transparency, since the user no longer needs to be concerned with the geographical location of the service; and elasticity of resources, which can be increased or decreased as needed.
The SOA 2.0-based architecture for the FLERSA deployment is designed to support distributed collaboration in environments that exhibit discontinuous operation. In this regard, it combines caching and replication techniques with a context-aware approach in order to address complex interactions between users in AmI environments in a transparent way. Therefore, it provides a common basis to handle the changing execution context in which AmI and IoT applications can be deployed.
Currently, we are still working on FLERSA with a focus on supporting automated annotation processes in the healthcare environment. We are collaborating with professionals from a public regional hospital in Granada, who have verified its low error rate when categorizing physiology documents [47]. We are now involved in a research project with that hospital; the aim of the project is to use the automated annotation system anywhere and at any time, although limited to the hospital building environment, to generate metadata in the existing electronic clinical records using concepts and properties from medical ontologies. In short, hospital healthcare information systems would take advantage of the multiple benefits offered by semantic web technologies, such as enhanced search capabilities, multidisciplinary queries, and automatic generation of alerts for specialists, among others.

Figure 2: Architecture of the generic service platform to support sharing and collaboration.

Figure 3: The SOA architecture for the FLERSA tool.