Semantic-Based Requirements Content Management for Cloud Software

Cloud Software is a complex software system whose topology and behavior can evolve dynamically in Cloud-computing environments. Given the unpredictable, dynamic, elastic, and on-demand nature of the Cloud, it would be unrealistic to assume that traditional software engineering can “cleanly” satisfy the behavioral requirements of Cloud Software. In particular, the majority of traditional requirements management tools take document-centric approaches, which have a low degree of automation, coarse-grained management, and limited support for requirements modeling activities. To address these challenges, and based on the metamodeling frame RGPS (Role-Goal-Process-Service), an international standard, this paper first presents a hierarchical framework of semantic-based requirements content management for Cloud Software. It then focuses on several important management techniques in this framework: the native storage scheme, an ordered index with keywords, requirements instance classification based on linear conditional random fields (CRFs), and a breadth-first search algorithm for associated instances. Finally, a prototype tool called RGPS-RM for semantic-based requirements content management is implemented to provide supporting services for the open requirements process of Cloud Software. The proposed framework is applied to Cloud Software development to demonstrate its validity and applicability. RGPS-RM also visualizes the effect of fine-grained retrieval and of the breadth-first search algorithm for associated instances.


Introduction
Cloud computing is a collection of web-accessible resources, provisioned under service-level agreements established via negotiation, that should be dynamically composed and virtualized based on consumers' needs on an on-demand basis. Cloud Software is a complex software system whose topology and behavior can evolve dynamically in Cloud-computing environments. Cloud Software tends to evolve at a rapid pace to meet continuously growing requirements. Due to this, Cloud Software can support (i) coordination of independent and self-interested parties, for example, Cloud consumers and Cloud service providers, (ii) efficient reconfiguration of existent and permanent Cloud service compositions, given constantly changing Cloud consumer requirements, (iii) dealing with incomplete knowledge about the existence of Cloud participants and the services they provide, due to the distributed nature of Cloud-computing environments, and (iv) dynamic and automated composition of distributed and parallel Cloud services [1,2]. The adaptability of Cloud Software to Cloud-computing environments and requirement variations becomes critical. Given the unpredictable, dynamic, elastic, and on-demand nature of the Cloud, it would be unrealistic to assume that traditional software engineering can "cleanly" satisfy the behavioral requirements of Cloud Software [3]. Hence, a unified requirements metamodeling frame called RGPS (Role-Goal-Process-Service, International Standard ISO/IEC 19763) [4] is proposed by merging functional and nonfunctional (context and trustworthy) requirements. This trend is an underlying driving force for "X" as-a-service (XAAS).
A phenomenon worth noting is that vast stacks of Cloud Software requirements are deposited in the open interconnected environment. The majority of traditional requirements management tools (such as QSS's DOORS, TBI's Caliber-RM, and IBM Rational RequisitePro) store requirements specifications and related information in a multiuser database. This approach suffers from a low degree of automation, coarse-grained management, and limited support for requirements modeling activities. Software requirements carry insufficient semantic information and have no unified metamodel. Moreover, traditional requirements management is concerned with change control, version management, and requirements tracking, but it cannot adequately support requirements elicitation, requirements analysis, requirements V&V, and other activities. As a result, requirements are rarely reused, and such management is unsuitable for complex software systems including Cloud Software.
Compared to traditional requirements management, requirements management of Cloud Software shows new characteristics: it must satisfy Cloud participants to the greatest extent; requirements of Cloud Software are not only personalized and uncontrolled but must also be managed throughout their lifecycle, from the induction period through the growth and maturity periods to the decline period; requirements changes are more complex and more difficult to control; and requirements reuse must be refined to achieve a finer granularity.
Accordingly, we argue that automatic construction of Cloud Software is very important for Cloud services that depend upon requirement capabilities. The motivation of our work is to provide semantic-based requirements content management for Cloud Software that is based on the RGPS frame and provides effective support for the open requirements process. We also build a prototype tool called RGPS-RM and explore its function and performance.

Metamodeling Frame RGPS.
The ultimate goal of Cloud Software is to realize mass customization at a low cost in a short time. There are many challenges in requirements engineering of Cloud Software, such as the scarcity of a standard description language (like WSDL and WADL) for Cloud services [5], the lack of unified domain knowledge for mass requirements customization, and the absence of a unified description method for personalized preferences. Furthermore, because the requirements of Cloud Software are uncertain, current requirements modeling methods have difficulty dealing with the personalized, massive, and diversified requirements of Cloud Software.
Wuhan University and Tsinghua University joined the ISO/IEC/JTC1/SC32/WG2 workgroup, which developed International Standard ISO/IEC 19763, Metamodel Framework for Interoperability (MFI), for complex software systems including Cloud Software [4]. The metamodel frame RGPS (Role-Goal-Process-Service) is the most important standard. As shown in Figure 1, the metamodel frame RGPS is a hierarchical and cooperative frame, which organizes multilevel and multigranularity requirements into the Role layer, Goal layer, Process layer, and Service layer, together with the relationships among these four layers. In detail, the Role layer characterizes the organization, roles, and actors and describes the interaction and cooperation in the problem domain; the Goal layer depicts the decomposition of goals and determines the constraint relationships among goals; the Process layer distinguishes atomic processes from composite processes and defines the inputs, outputs, preconditions, and effects of processes; the Service layer guides the construction of the service aggregation of service resources.
In fact, the requirements modeling process based on the metamodeling frame RGPS begins with the analysis of the organizational structure of the problem space, proceeds through mapping and transformation among the Role, Goal, Process, and Service layers, and ultimately generates Cloud service-based requirements specifications. The normative formal definition of the metamodeling frame RGPS is given in the OWL language.

Traditional Requirements Management System.
Schwaber and Sterpe provide a definition of requirements management: "The storage of requirements, the tracking of relationships among requirements, and the control of changes to individual requirements and groups of requirements" [6]. Hence, traditional requirements management is the process of documenting, analyzing, tracing, prioritizing, and agreeing on requirements and then controlling change and communicating with relevant stakeholders. To support an efficient requirements management process, firms have developed specific tools that assist organizations in defining and documenting requirements by allowing them to store requirements in a central location.

DOORS.
DOORS is a sophisticated product that can manage requirements on large products. It treats individual requirements as objects but presents them in a visual format that resembles a structured, hierarchical requirements document. The requirements display also shows attribute values, indicators of links to other requirements, and colored bars that indicate a requirement's change status. Defining requirement links through the link matrix is clumsy, but DOORS also provides several other link definition mechanisms.

Caliber-RM.
Caliber-RM takes a database-centric approach to requirements management. Caliber-RM has fairly loose integration with Word, but it has highly flexible import capabilities. Caliber-RM provides a Windows Explorer-like workplace for manipulating the hierarchical requirements tree, with requirement details accessible through a tabbed dialog on the right side of the screen. It can also manage traceability relationships and attributes through a grid display.

RequisitePro.
RequisitePro takes a document-centric approach to requirements management, exhibiting the tightest integration with Word. It can mark selected blocks of text to include in the database as discrete requirements. It is easy to access the details of a requirement, including its revision history, attributes, traceability, hierarchy, and discussions. The mechanisms it uses for synchronizing the requirements in the database with the contents of the software requirements specification (SRS) are a bit clumsy [7,8].

Requirements Management of Cloud Software.
In addition to the above features, requirements management of Cloud Software provides supporting services for other requirements activities, including requirements elicitation, requirements analysis, and requirements validation. The majority of traditional requirements management tools take document-centric approaches because software requirements have no unified metamodel and the computer does not need to understand the contents of requirements specifications. Links between requirements are constructed manually or with information retrieval technologies. By contrast, requirements management of Cloud Software has the metamodel frame RGPS and needs semantic content management of RGPS requirements (Cloud Software requirements). Keyword-based matching is not enough for reuse, whose granularity and accuracy need to be improved, and information retrieval technologies may be inapplicable to building the relationships among RGPS requirements.

Semantic-Based Requirements Management Framework
Facing the new characteristics and challenges of semantic-based requirements management for Cloud Software, we propose a hierarchical framework for semantic-based requirements management of Cloud Software, including a semantic requirements storage layer, a semantic content management layer, a fine-grained semantic indexing layer, a requirements organization and classification layer, and a common APIs layer, as shown in Figure 2.

Semantic Requirements Storage Layer.
When customizing personalized requirements of Cloud Software, a requirements description language called Service-Oriented Requirements Language (SORL) [9] is used to elicit users' requirements precisely.
Based on the metamodel frame RGPS, a user's requirements in SORL can be customized into a goal model and a process chain step by step. Finally, these models, expressed in OWL, can be saved as the corresponding Cloud service-based requirements specifications. In order to get rid of the document-centric mode of traditional requirements management, this layer focuses on the storage mechanisms of Cloud Software requirements and on the manageability of massive Cloud Software requirements in a complex network environment. This layer must be scalable and efficient. Based on the native storage scheme, it takes an ontology hypergraph representation as the data model for its persistent storage, which effectively avoids the cost of data model transformation when accessing Cloud Software requirements data.

Fine-Grained Semantic Indexing Layer.
Besides the basic hypergraph traversal, an indexing mechanism for Cloud Software requirements is designed to accelerate access to the persistent storage, including tree triple indices. The approach is based on the so-called triple indices, which are B-tree index structures built on tree triple sets. In the practical implementation, an ordered index with keywords is an important basis for achieving fine-grained retrieval of requirements and for improving the flexibility of requirements reuse.

Requirements Organization and Classification Layer.
In order to achieve an ordered cache and improve the efficiency of hierarchical retrieval of Cloud Software requirements, a requirements classification algorithm is a necessary component, as in typical classification tasks. Some typical algorithms are (1) kernel-based algorithms, such as support vector machines (SVMs) [10], which maximize the margin of confidence of the classifier and are the method of choice for many such tasks, and (2) probabilistic graphical models, which represent correlations between labels by exploiting problem structure. For example, conditional random fields (CRFs) [11] are a probabilistic framework for labeling and segmenting structured data, such as sequences, trees, and lattices.

Hierarchical Retrieval Layer.
To overcome the single, unitary form of traditional retrieval, this layer constructs indices over concepts, properties, and instances to achieve composite retrieval. In addition, association retrieval of Cloud Software requirements is designed to find the indirect relevance of requirements instances, in order to support trace analysis, change analysis, and postimplementation (i.e., free customization of Cloud service-based requirements specifications).

Common Interfaces Layer.
This layer provides well-defined and rich APIs for integrating and invoking requirements management services whenever requirements activities need them. For example, the requirements query service offers four main interfaces, namely, SimpleQuery(), AdvancedQuery(), HierarchicalQuery(), and RelationQuery(). These interfaces hide the complex implementation of the services and guarantee transparency when requirements activities access the semantic-based requirements management system of Cloud Software.
We design a concise storage scheme, shown as the class diagram in Figure 3, which is implemented on Berkeley DB. GraphVertex represents a vertex in the graph and has three fields: oid, label, and type. Oid is a 64-bit integer used to uniquely identify a vertex. Label is a variable-length string used to record the value of a URI reference or literal. Type is also a 64-bit integer, designed to encode specific semantic information of the vertex in a bitmap manner. GraphEdge represents an edge, where srcVertex, typeVertex, and tarVertex correspond to the oids of the three elements of a triple [12].
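The GraphVertex and GraphEdge records described above can be sketched in Python as follows. This is a minimal illustration, not the RGPS-RM implementation: the bit flags TYPE_URI, TYPE_BLANK, and TYPE_LITERAL, the TripleStore class, and its add_triple method are assumptions introduced here, and a pair of in-memory dictionaries stands in for Berkeley DB.

```python
from dataclasses import dataclass

# Hypothetical bit flags for the `type` bitmap; the actual encoding is not given in the text.
TYPE_URI = 1 << 0
TYPE_BLANK = 1 << 1
TYPE_LITERAL = 1 << 2

MAX_UINT64 = (1 << 64) - 1


@dataclass(frozen=True)
class GraphVertex:
    oid: int    # 64-bit integer that uniquely identifies the vertex
    label: str  # URI reference or literal value
    type: int   # 64-bit bitmap encoding semantic information

    def __post_init__(self):
        # Enforce the 64-bit width stated for oid and type.
        for name in ("oid", "type"):
            if not 0 <= getattr(self, name) <= MAX_UINT64:
                raise ValueError(f"{name} must fit in 64 bits")


@dataclass(frozen=True)
class GraphEdge:
    srcVertex: int   # oid of the subject vertex
    typeVertex: int  # oid of the predicate vertex
    tarVertex: int   # oid of the object vertex


class TripleStore:
    """In-memory stand-in (assumption) for the Berkeley DB backed storage."""

    def __init__(self):
        self.vertices = {}         # oid -> GraphVertex
        self.edges = []            # list of GraphEdge
        self._oid_by_key = {}      # (label, type) -> oid
        self._next_oid = 1

    def _intern(self, label, type_bits):
        # Reuse the oid when the same label/type pair was stored before.
        key = (label, type_bits)
        if key not in self._oid_by_key:
            oid = self._next_oid
            self._next_oid += 1
            self._oid_by_key[key] = oid
            self.vertices[oid] = GraphVertex(oid, label, type_bits)
        return self._oid_by_key[key]

    def add_triple(self, subject, predicate, obj, object_is_literal=False):
        obj_type = TYPE_LITERAL if object_is_literal else TYPE_URI
        edge = GraphEdge(
            self._intern(subject, TYPE_URI),
            self._intern(predicate, TYPE_URI),
            self._intern(obj, obj_type),
        )
        self.edges.append(edge)
        return edge
```

Because every edge stores only three oids, a triple can be read back without converting between a relational row and a graph node, which is the property the native scheme relies on.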

Key Technologies of Semantic-Based Requirements Management Framework
The native persistent storage takes the hypergraph representation of Cloud Software requirements as its data model, which effectively avoids the cost of data model transformation and thereby achieves efficient and scalable access to the RDF data.

Cloud Software Requirements Indices Schema.
Data indexing eases the search of and access to data at any given time. Most approaches maintain a set of six indices covering all possible access schemes an RDF query may require. These indices are PSO, POS, SPO, SOP, OPS, and OSP (P stands for property, O for object, and S for subject) [13]. These indices materialize all possible orders of precedence of the three RDF elements. This representation allows fast retrieval for all triple access patterns. However, it is oriented towards simple statement-based queries and has limitations for the efficient processing of more complex queries.
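The six-index idea can be illustrated with a small Python sketch. It is an assumption-laden toy rather than any particular system's layout: the class name SixIndexStore and the method match are hypothetical, and sorted in-memory lists stand in for B-tree indices. Each index stores the triples reordered by one permutation of S, P, and O, so any combination of bound positions becomes a prefix lookup in one of them.

```python
from bisect import bisect_left
from itertools import permutations


class SixIndexStore:
    # The six orderings: SPO, SOP, PSO, POS, OSP, OPS.
    ORDERS = ["".join(p) for p in permutations("SPO")]

    def __init__(self, triples):
        unique = set(triples)
        # One sorted list per ordering (a stand-in for a B-tree).
        self.indices = {
            order: sorted(self._reorder(t, order) for t in unique)
            for order in self.ORDERS
        }

    @staticmethod
    def _reorder(triple, order):
        parts = dict(zip("SPO", triple))
        return tuple(parts[c] for c in order)

    def match(self, s=None, p=None, o=None):
        """Return all (s, p, o) triples matching the bound positions."""
        bound = {"S": s, "P": p, "O": o}
        bound_keys = [k for k in "SPO" if bound[k] is not None]
        free_keys = [k for k in "SPO" if bound[k] is None]
        # Pick the index whose leading columns are exactly the bound positions.
        order = "".join(bound_keys + free_keys)
        prefix = tuple(bound[k] for k in bound_keys)
        index = self.indices[order]
        results = []
        # Binary search to the first entry that could carry the prefix.
        for pos in range(bisect_left(index, prefix), len(index)):
            entry = index[pos]
            if entry[:len(prefix)] != prefix:
                break
            parts = dict(zip(order, entry))
            results.append((parts["S"], parts["P"], parts["O"]))
        return results
```

For example, a pattern with the property and object bound is answered from the POS index by a single range scan, while a pattern with only the subject bound uses SPO.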
Besides the above set of six indices, there exists another approach to facilitating complex queries. The approach is based on the so-called ordered index with keywords (see Figure 4), a structure built on a lexicon and a placement table (postings). The lexicon holds the keywords and relates each of them to its postings. The placement table is designed as a two-dimensional pointer list: one dimension holds pointers to the software requirements documents; the other holds pointers, within the same requirements document, to the triples that contain the keyword and records the statistics of its frequency.
The first-dimension pointers make it easy to obtain the software requirements documents related to a keyword. The second-dimension pointers determine the exact triple where the keyword appears and also return the relevant statistical information. Thus, the requirements indices schema can ensure fine-grained retrieval and fast keyword search.
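A minimal Python sketch of such a lexicon-plus-postings structure is given below. The names OrderedKeywordIndex, keywords_of, documents, triples, and with_prefix are hypothetical, the tokenizer (lowercased alphanumeric runs) is an assumption, and nested dictionaries stand in for the two pointer dimensions.

```python
import re
from bisect import bisect_left, insort
from collections import defaultdict


def keywords_of(term):
    """Hypothetical tokenizer: lowercased alphanumeric runs of a triple element."""
    return re.findall(r"[a-z0-9]+", term.lower())


class OrderedKeywordIndex:
    def __init__(self):
        self.lexicon = []  # sorted list of distinct keywords
        # postings[keyword][doc_id][triple] = frequency of keyword in that triple
        self.postings = defaultdict(lambda: defaultdict(dict))

    def add(self, doc_id, triple):
        for element in triple:
            for keyword in keywords_of(element):
                if keyword not in self.postings:
                    insort(self.lexicon, keyword)  # keep the lexicon ordered
                per_doc = self.postings[keyword][doc_id]
                per_doc[triple] = per_doc.get(triple, 0) + 1

    def documents(self, keyword):
        """First dimension: documents that mention the keyword."""
        return sorted(self.postings.get(keyword.lower(), {}))

    def triples(self, keyword, doc_id):
        """Second dimension: (triple, frequency) pairs inside one document."""
        per_doc = self.postings.get(keyword.lower(), {}).get(doc_id, {})
        return sorted(per_doc.items())

    def with_prefix(self, prefix):
        """Ordered lexicon lookup: all keywords that start with the prefix."""
        prefix = prefix.lower()
        found = []
        for pos in range(bisect_left(self.lexicon, prefix), len(self.lexicon)):
            if not self.lexicon[pos].startswith(prefix):
                break
            found.append(self.lexicon[pos])
        return found
```

Keeping the lexicon sorted is what allows prefix and range lookups by binary search; a plain hash table would support only exact keyword matches.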

Cloud Software Requirements Instances Classification.
Based on the metamodel frame RGPS, classification algorithms (such as SVM and linear CRFs) are used to achieve the effective organization and automatic classification of Cloud Software requirements. The No Free Lunch (NFL) theorem [14] suggests that a more useful strategy is to gain an understanding of the dataset characteristics that enable different classification algorithms to perform well. Figure 5 shows the general process of automatic classification of Cloud Software requirements. The classification algorithm is crucial to this process. An algorithm's performance is measured by both the percentage of correct classifications and its computational complexity.
In the document summarization area, many classification algorithms treat the summarization task as a two-class problem and classify each sentence individually without leveraging the relationships among sentences. One family works in a discriminative way, with well-known algorithms such as the support vector machine (SVM). An SVM finds the decision surface that maximizes the margin between the data points of the two classes. Yet, as Cloud Software follows the metamodel frame RGPS, its feature space is so large and interdependent that the SVM algorithm cannot fully exploit the potentially useful features.
Conditional random fields (CRFs) are a probabilistic framework for labeling and segmenting structured data, such as sequences, trees, and lattices. Taking the concept instance, attribute, and attribute value of a Cloud Software requirement as three entities, a relationship among them is marked 1 if it exists and 0 otherwise. The input sequences of the CRF are the attribute values and the edges. Given the attribute value sequence $X = (x_1, x_2, \ldots, x_T)$ and the corresponding edge sequence $Y = (y_1, y_2, \ldots, y_T)$, the probability of $Y$ conditioned on $X$ defined in CRFs, $P(Y \mid X)$, is as follows:

$$P(Y \mid X) = \frac{1}{Z_X} \exp\Bigl(\sum_{i,k} \lambda_k f_k(y_{i-1}, y_i, X) + \sum_{i,l} \mu_l g_l(y_i, X)\Bigr), \tag{2}$$

where $Z_X$ is the normalization constant that makes the probability of all state sequences sum to one. $f_k(y_{i-1}, y_i, X)$ is an arbitrary feature function over the entire observation sequence, while $g_l(y_i, X)$ is a feature function of the state at position $i$ and the observation sequence; $\lambda_k$ and $\mu_l$ are the weights learned for the feature functions $f_k$ and $g_l$, reflecting the confidence of each feature function. The feature functions can measure any aspect of a transition $y_{i-1} \to y_i$ and the global characteristics of $X$.
Given the conditional probability of the state sequence defined by a CRF in (2) and the parameters $\Lambda$, the most probable sequence can be obtained as

$$Y^{*} = \arg\max_{Y} P_{\Lambda}(Y \mid X),$$

which can be efficiently calculated by the Viterbi algorithm. CRF algorithms thus approach the classification problem by leveraging the relationships among RGPS features.
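The Viterbi decoding step can be sketched in Python as below. This is a generic linear-chain decoder under stated assumptions, not the classifier used in RGPS-RM: the function name viterbi and the callback score(prev_label, label, i) are hypothetical, and score is assumed to return the summed weighted feature values (transition plus state terms) at position i, with prev_label set to None at the first position. Since the normalization constant does not depend on the label sequence, maximizing the summed scores also maximizes the conditional probability.

```python
def viterbi(length, labels, score):
    """Return the label sequence with the highest total score.

    score(prev_label, label, i) is assumed to give the weighted feature sum
    at position i; prev_label is None when i == 0.
    """
    if length == 0:
        return []
    # best[i][y]: highest score of any label prefix ending in y at position i.
    best = [{y: score(None, y, 0) for y in labels}]
    back = []  # back[i - 1][y]: best predecessor of y at position i
    for i in range(1, length):
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda yp: best[-1][yp] + score(yp, y, i))
            scores[y] = best[-1][prev] + score(prev, y, i)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy example with hypothetical weights: x[i] is a made-up relatedness feature.
x = [0.9, 0.8, 0.1]
MU, LAM = 2.0, 0.5


def toy_score(prev_label, label, i):
    state = MU * (x[i] if label == 1 else 1.0 - x[i])                    # state term
    transition = LAM if prev_label is not None and prev_label == label else 0.0
    return state + transition


decoded = viterbi(len(x), (0, 1), toy_score)
```

The dynamic program visits each position once and compares every pair of adjacent labels, so its cost grows linearly with the sequence length and quadratically with the number of labels, instead of exponentially as in exhaustive enumeration.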
4.4. Hierarchical Retrieval.
Hierarchical retrieval serves the output visualization of semantic requirements management and avoids the unitary form of traditional retrieval. In addition, association retrieval of RGPS requirements is designed to find the potential relevance of requirements instances.
4.4.1. Compound Retrieval.
A Cloud Software requirement is, by nature, ontology data. It is necessary to construct all kinds of composite indices, covering keywords, attributes, and concept instances, to support arbitrarily complex retrieval. On the basis of keyword indexing, we design three indexing structures: $\langle k, \mathrm{URI}(c), \mathit{frequency}\rangle$, $\langle k, \mathrm{URI}(a), \mathit{frequency}\rangle$, and $\langle k, \mathrm{URI}(c), \mathrm{URI}(i), \mathit{frequency}\rangle$, where $k$ is the keyword, $\mathrm{URI}(c)$ is the concept, $\mathrm{URI}(a)$ is the attribute, $\mathrm{URI}(i)$ is an instance URI, and frequency represents the occurrences of keyword $k$. Consequently, compound retrieval with the SPARQL and SeRQL languages is supported.

Association Retrieval.
Association retrieval finds the potential relationships among Cloud Software requirements and thereby supports requirements traceability of Cloud Software.
Requirements traceability is concerned with documenting the life of a requirement. Traditionally, it is difficult to construct the relationships among Cloud Software requirements manually, because the manual approach is not applicable to fine-grained changes of Cloud Software requirements. Through ontology-based relation mining among Cloud Software requirements, including automatic machine analysis, cluster analysis, and statistical analysis, semantic relationships and measurable relationships are discovered to support requirements change analysis and requirements version management.

Definition 2 (association relation).
There are four association relation modes in Cloud Software requirements. The distinction between the direct mode and the indirect mode is very important because it determines whether data mining is needed for the requirements data.
Fortunately, the metamodeling frame RGPS clearly defines the relationships among the concepts that exist in Cloud Software requirements. We design a breadth-first search algorithm for associated instances with the help of the metamodeling frame RGPS, without considering indirect association.
The advantage of this algorithm is that it fully utilizes the metamodeling frame RGPS to guide node expansion. However, this dependence on RGPS is relatively strong, and a large number of nodes need to be expanded, especially when subclasses are deeply nested. Notice that a queue instead of a stack is used to control the instance expansion (see Algorithm 1).
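A queue-driven expansion of this kind can be sketched in Python as follows. It is a minimal illustration under assumptions, not a transcription of Algorithm 1: the function name associated_instances, the adjacency-dictionary graph, the relation names, and the allowed_relations set (standing in for the relations that the RGPS metamodel defines) are all hypothetical.

```python
from collections import deque


def associated_instances(graph, start, allowed_relations, max_depth=None):
    """Breadth-first search for instances directly or transitively linked to start.

    graph maps an instance to a list of (relation, target) pairs. Only relations
    in allowed_relations (assumed to come from the metamodel) are expanded.
    Returns (instance, relation, depth) tuples in order of discovery.
    """
    visited = {start}
    queue = deque([(start, 0)])  # FIFO queue: nearer instances are expanded first
    found = []
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand beyond the requested depth
        for relation, target in graph.get(node, ()):
            if relation not in allowed_relations or target in visited:
                continue
            visited.add(target)
            found.append((target, relation, depth + 1))
            queue.append((target, depth + 1))
    return found


# Hypothetical instance graph for a travel-planning requirement.
example_graph = {
    "#TripleItinerary": [("achievedBy", "#PlanTripProcess"), ("takenBy", "#Tourist")],
    "#PlanTripProcess": [("realizedBy", "#WeatherService"), ("realizedBy", "#TrafficService")],
    "#Tourist": [("likes", "#Seafood")],
}
metamodel_relations = {"achievedBy", "takenBy", "realizedBy"}
related = associated_instances(example_graph, "#TripleItinerary", metamodel_relations)
```

With a queue, instances are reported level by level, so the closest associations appear first and a depth limit can cut off expansion when nested structures would otherwise produce too many nodes; a stack would instead follow one chain to its end before returning to nearer neighbors.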

5.1. An On-Demand Q/A Case in Traveling Domain.
In this section, experiments to test the semantic-based requirements management framework are provided. Through an on-demand Q/A system case in the traveling domain, the corresponding instance of an RGPS requirement is constructed.
Take Xiamen travel planning as an example: Mr. Li chooses a suitable plan by considering factors such as weather conditions, traffic information, and the distance between Xiamen University and the corresponding destination. RGPS guides dynamic requirements modeling by describing personalized requirements in terms of multiple roles, objectives, processes, and services. A requirements description language named SORL is used to elicit users' requirements precisely, and the requirements of Cloud Software, expressed in OWL, can finally be saved as the corresponding Cloud service-based requirements specifications (Cloud service ontology). A prototype tool called RGPS-RM for semantic-based requirements content management is implemented to provide supporting services for the open requirements process of Cloud Software.
Figure 6 demonstrates the process of automatic construction of Cloud Software. Figure 7 gives part of the traffic services categorization ontology. Cloud providers and services of the traveling domain must be searched according to their geographical location, offered services, deployment model, security mechanisms, pricing, and so forth. When a Cloud provider goes out of business, a compliant Cloud provider that meets the requirements must be chosen by a search method (e.g., the ontology-based generic search method [15] or the hidden Markov model-based service discovery method [16]), and it must be ensured that the Cloud services are interoperable.

RGPS-RM.
In order to evaluate the effectiveness of the proposed framework, a prototype tool called RGPS-RM is built, whose backend database is Berkeley DB to ensure reliability and high performance. Combined with an on-demand Q/A system case in the traveling domain, Figure 8 shows an overall view of RGPS-RM, which is very convenient for observation and navigation. Dots of different colors denote the Role layer (red), Goal layer (yellow), Process layer (blue), and Service layer (green). RGPS-RM provides four main interfaces, namely, SimpleQuery(), AdvancedQuery(), HierarchicalQuery(), and RelationQuery(), to achieve retrieval at arbitrary granularity. Figure 8 shows, centered on #TripleItinerary, all of its association relations, which are found by RelationQuery() using the breadth-first search algorithm for associated instances.

Performance Evaluation.
In order to evaluate the performance of the native storage scheme, we compared RGPS-RM with Sesame v2.7.14 on the Lehigh University Benchmark (LUBM) [17]. Sesame was configured to store RDF data in MySQL v5.6.22. All the experiments were conducted on a server with two Dual-Core Intel Xeon CPUs (2.8 GHz), 3 GB of DDR 333 RAM, and two 300 GB hard disks. Query response time comparisons are shown in Figure 9. RGPS-RM is better than Sesame in queries 1, 3, 7, 9, and 10. Since the native storage scheme and the RGPS requirements indices schema in RGPS-RM are designed to improve performance, the experimental results demonstrate the effectiveness of the index structure.
RGPS-RM has similar query response time to Sesame for queries 1, 3, 7, 9, and 10. RGPS-RM and Sesame use a persistent storage based on a common hypergraph data model. RGPS-RM is a prototype system. There is still much room to improve it for query 1; RGPS-RM relies on some redundancy in exchange for improved efficiency.

Classification Result.
We use precision, recall, and $F_1$-measure [18], which are widely used in information retrieval, to evaluate the results. We choose 500 RGPS requirements stored in RGPS-RM. The evaluation was performed by using SVM and CRFs to classify the RGPS requirements. Table 1 shows that CRFs perform better than SVM both in the average value of each category and as a whole.
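For reference, the three measures can be computed per category as in the Python sketch below. The function name precision_recall_f1 and the sample labels are hypothetical and serve only to show the arithmetic; they are not the data behind Table 1.

```python
def precision_recall_f1(gold, predicted, category):
    """Precision, recall, and F1 for one category, given parallel label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == category and p == category)  # true positives
    fp = sum(1 for g, p in pairs if g != category and p == category)  # false positives
    fn = sum(1 for g, p in pairs if g == category and p != category)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels over the four RGPS layers, for illustration only.
gold_labels = ["Role", "Goal", "Goal", "Process", "Goal"]
predicted_labels = ["Role", "Goal", "Process", "Process", "Role"]
goal_scores = precision_recall_f1(gold_labels, predicted_labels, "Goal")
```

In this made-up sample, one of the three Goal requirements is recovered and nothing is wrongly labeled Goal, so precision is 1, recall is 1/3, and the F1 value is 0.5.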

Conclusions
Cloud Software brings many new challenges to traditional requirements management, particularly given its low degree of automation, coarse-grained management, and limited support for requirements modeling activities.
Based on the metamodeling frame RGPS (Role-Goal-Process-Service), an international standard, this paper presents a hierarchical framework of semantic requirements content management for Cloud Software, which has the following key technologies: introducing a native storage scheme to avoid data model transformation; taking an ordered index with keywords to support retrieval of Cloud Software requirements at arbitrary granularity; employing linear CRFs to achieve the effective organization and automatic classification of Cloud Software requirements; and using breadth-first search for associated instances to find the potential association relations among Cloud Software requirements. In addition, we construct a prototype system called RGPS-RM, which demonstrates good performance and visualizes the effect of the breadth-first search algorithm for associated instances.
Future work includes providing better supporting services for requirements elicitation, requirements analysis, requirements V&V, and other activities, completing the prototype system RGPS-RM, and applying it in areas such as Cloud-based e-business and Cloud-based mobile augmentation.

Figure 2: A hierarchical framework for semantic-based requirements management of Cloud Software.

Figure 4: Diagram of an ordered index with keywords.

Figure 6: Automatic construction of Cloud Software.

Figure 7: Part of traffic services categorization ontology.
4.1. Native Storage Scheme.
Data storage modes can be categorized into three groups.
4.1.1. Memory-Based Storage Model.
It loads all data into memory, which gives fast query responses but incurs significant memory overhead.
4.1.2. Relational Object Database-Based Storage Model.
It organizes, operates, and manages data by using mature database technologies. However, a performance bottleneck is inevitable because of the extra cost of data model transformation when accessing Cloud Software requirements data within a non-RDF persistent storage.
4.1.3. Native Storage Schema.
It enhances storage and data access performance, because Cloud Software needs a large amount of Cloud Software requirements and because conversion between the relational model and the ontology graph model is prone to the impedance mismatch effect. The OWL specification defines a mapping of an OWL ontology $O$ into an RDF graph $G(O)$. In fact, a Cloud Software requirement is defined as a set of triples; a formal definition is given as follows. Suppose that $G = (V, E)$ is an RDF graph, where $V = \{v_x \mid x \in U \cup B \cup L\}$ and $E = \{(v_s, v_p, v_o) \mid (s, p, o) \in G\}$; $U$ stands for an infinite set of URI references, $B$ for an infinite set of blank nodes, and $L$ for an infinite set of RGPS literals.

Table 1: Result of SVM and CRF in RGPS classification.