Requirements taxonomies have been found useful in software requirements elicitation and specification, both for educational purposes and in practical usage, for instance, as checklists to ensure that important categories of requirements are not forgotten, and for guidance on how to write various types of requirements. While mobile information systems are becoming increasingly important, traditional requirements taxonomies do not have any category for mobility requirements. This paper reports on a controlled experiment where two groups of students both got the same excerpts of the well-known Volere requirements taxonomy, but for one treatment group the tutorial material was also extended with additional material on mobility requirements as a requirements category in its own right. Using the provided taxonomy material for guidance, the students were asked to write requirements for a system presented in a natural language case description; afterwards their output was analyzed to score the number and quality of requirements found by each student. The main finding was that the students using the extended taxonomy also found more requirements, but there was no significant difference in the quality of requirements between the two groups.
Requirements engineering is a crucial activity in the development of large software systems, bridging between the customer’s needs and what can realistically be implemented in software. Requirements taxonomies have been found useful for this activity, both for educational purposes and in practical usage, for instance, as checklists to ensure that important categories of requirements are not forgotten, and for guidance on how to write various types of requirements. One well-known generic taxonomy for software requirements is the Volere taxonomy [
Recently, mobile devices have seen a lot of improvements in computing power, user interfaces (e.g., smart phone touch screens), and bandwidth. This has made it possible to perform a wide range of information processing tasks on the go that previously had to be performed in the office. This has increased the importance of mobile information systems. Many mainstream requirements techniques were however developed before mobile information systems were in widespread use, and this also applies to the general requirements taxonomies [
The research questions for the experiment are as follows: (1) Does the taxonomy of mobility-related requirements help people find more requirements? (2) Does the taxonomy of mobility-related requirements help people write requirements of better quality? In the context of this experiment, quality was defined in terms of the relevance, testability, and clarity of the requirements.
The rest of the paper is structured as follows: Section
Before getting into the experiment part, a brief introduction of the Volere taxonomy and the proposed Mobility extensions will be useful. Volere [
Volere taxonomy (white), including only level 3 nodes relevant for the experiment, and extensions for mobility (grey).
Volere does not have any particular category for mobility requirements, but it does include some other categories which are clearly related to this. In particular, the category “physical environment” may cover requirements that the system shall be used in, for example, wet, noisy, or warm settings. Also, categories like reliability, availability, and robustness—although generally relevant for all kinds of systems—will have special challenges for mobile systems, for example, because of poor or lost network coverage and various physical impacts on the equipment. Some of the requirements that might be captured as mobility requirements according to our taxonomy in [
Our proposed taxonomy for mobility-related requirements [
• (pure) mobility requirements: specifying some level of mobility without indicating design solutions, that is, specifying mobility challenge factors and mobility achievement levels.
  ◦ Examples of mobility challenge factors are the speed of movement needed (the higher the speed, the more difficult it might be to support the movement or provide nondegraded service while moving) and the area/range of movement (the larger the area, the more difficult).
  ◦ Examples of mobility achievement levels are the ability to (actively) move, which could be particularly relevant for embedded systems (for example, where a software application is running an engine and steering system) but is less relevant for enterprise information systems, which are our main concern; the ability to facilitate movement, for example, through real-time positioning, mapping, and navigational services; and the ability to provide a certain level of service (e.g., no degradation from the normal situation) in spite of the challenging usage context.
• mobility-system requirements: requirements associated with subsystems whose purpose is to support mobility. It obviously depends on whether the system needs to be in a particular network or not. Examples may be a positioning system, a network scanning and acquisition system, or a service scanning and acquisition system.
• mobility data requirements: data requirements associated with subsystems which support data and mobility. Examples may be data the system needs to store about mobile devices in use, their contact details, storage capacity, position data, tracking data, and so forth.
• mobility constraints: design decisions related to mobility that have been lifted to the requirements level. Examples may be a decision to use one specific standard for mobile communication, or a decision to use one specific type of mobile equipment, or equipment compatible with that, given that the customer really
Both the Volere taxonomy and our mobility taxonomy are too big to be explored in their entirety in a single experiment. If mobility requirements are to be considered a category in their own right within a larger, more general taxonomy, the most interesting part of our taxonomy is the pure mobility requirements (first black bullet item in the list above, with white subbullets). So, as a starting point, we took just a simplified version of our taxonomy, focusing on this part, and likewise just an excerpt of the Volere taxonomy that would be most closely related to this, namely, the subcategories of reliability, availability, robustness, and physical environment. The requirements taxonomies that were used in the experiment are thus indicated in the UML class diagram in Figure
For the mobility category, the tutorial explained that such requirements might typically contain a challenge part and an achievement part, as indicated by the aggregation relationship at the bottom of Figure
To our knowledge, there is no other paper reporting an experiment comparing two software requirements taxonomies for the purpose of elicitation and specification of requirements. Reference [
Reference [
Our research goal in this paper is to find out whether a taxonomy for mobility-related requirements will help people find more and better such requirements given a requirements specification task. Typical ways to perform such evaluations are either case studies or controlled experiments, between which there are some obvious trade-offs, especially concerning realism (in favor of case studies) versus control (in favor of experiments) [
For practical reasons, it was decided to run the experiment with student participants. While industry participants might have been better in terms of being more representative of the state-of-practice, we lacked the funds to hire practitioners by the hour to participate in such an experiment. Moreover, since the experiment was about a new requirements taxonomy, which the participants had to learn and then use in the time frame of the experiment, the difference between students and practitioners might be smaller than it would have been if, for example, comparing two RE techniques well known in industry already. That is, in our experiments, practitioners would, in the same manner as students, have had to start by reading about the proposed taxonomy and then try to apply it.
A vital challenge for the experiment design was exactly what to compare. Our main alternatives were as follows: (1) the treatment group gets our mobility requirements taxonomy, while the control group gets nothing and thus uses an ad hoc technique; (2) the treatment group gets an existing requirements taxonomy extended with our mobility requirements taxonomy, while the control group gets only the existing requirements taxonomy without any extension.
The perceived problem here was that alternative 1 would be too biased in favor of the new taxonomy, since, through the tutorial about this taxonomy, the students would get a lot of hints about possible types of requirements and how they could be written, while the control group would get nothing. This could have worked if the participants were experienced RE practitioners who could instead use their state-of-practice technique, but with students lacking such a technique, the result for the control group might easily be very limited or poor output. We therefore chose alternative 2, with both groups getting the same excerpt of the Volere taxonomy, and the treatment group additionally getting one extra category of requirements, on mobility requirements, in their experiment tutorial. Thus, both groups would get some hints on types and styles of requirements; the treatment group would only get a little more. This experiment could more easily go both ways: the treatment group could gain from the extra material if the mobility-related taxonomy turned out useful for identifying some requirements, or alternatively, if this taxonomy turned out confusing or not fitting the other material, or if the treatment group were overloaded with information compared with the control group, the extra material could also be detrimental to their performance.
Another challenge was that we could not include the entire Volere taxonomy and entire taxonomy of mobility-related requirements in the experiment, as this would have caused the tutorial material to be way too long to fit into the time frame of a controlled experiment. Hence, as mentioned before, from the taxonomy of mobility-related requirements, we only included the so-called pure mobility requirements. Similarly, from Volere we only included the categories most closely related to mobility, namely 12d reliability and availability, 12e robustness and fault tolerance, and 13a Expected Physical Environment.
In total, 54 students were recruited from a second-year computer science class to take part in the experiment. The participants were randomly divided into 2 groups: the treatment group using excerpts from the Volere taxonomy extended with a part about mobility requirements, and the control group getting only the excerpts from Volere.
The participants performed the following tasks during the experiment:
(1) Answering a preexperiment questionnaire. The purpose of the preexperiment questionnaire was to investigate the participants’ prior knowledge of related topics like mobility, requirements specification, and so forth, which can be used to control for any accidental group selection bias in spite of random selection (e.g., one group accidentally containing people with much more relevant experience). Questions investigated previous knowledge of requirements elicitation and specification, IT work experience, and familiarity with portable mobile devices, in total 8 questions that were to be answered within 5 minutes.
(2) Reading a tutorial about the requirements taxonomy to be used during the experiment, either a pure Volere excerpt for the control group or the same Volere excerpt extended with a part about mobility requirements for the treatment group.
(3) Specifying requirements for a mobile application for airline check-in, for which a case description was provided in natural language. From the case description, which was simple running prose, the students were asked to write suggestions for natural language “The system shall…” requirements, as many relevant requirements as they could come up with during the allotted time. The precise task stated that a web-based system usable on stationary office PCs already existed for the system in question, so that their task was not to propose requirements for the system in general, only to propose additional requirements that would come up when the system owner now wanted to make the system usable also on mobile devices.
The motivation for focusing exclusively on mobile requirements in item 3 is of course that it is such requirements the mobility taxonomy purports to support. We have no assumption that the mobility taxonomy would be of any help if specifying requirements for a traditional stationary information system, so experimenting with the taxonomy in such a context would not be very useful.
From investigating the subjects’ task performance, we are interested in finding how many requirements each would identify and the quality of these requirements. It will also be interesting to look not only at the total number of requirements identified, but also at different types of requirements. Since the only difference between the two treatments is that one tutorial contained some extra material about mobility requirements that the other one did not contain, the main effect—if any—would be assumed to materialize for this particular type of requirement. Moreover, since mobility requirement is considered a subtype of quality requirement, there could again be some effect on these. Hence we divide requirements in 3 disjoint categories: mobility requirements, other quality requirements (i.e., which are not mobility requirements), and other requirements (i.e., which are not quality requirements). In some cases, it might also be interesting to look at unions of these, for example, all requirements taken together, or all quality requirements taken together (i.e., including mobility requirements).
As for quality of requirements, there are many possible measures for this. For instance, Knauss and El Boustani [ relevance: the degree to which the proposed requirement is relevant to the presented case or not, ranging from 3 (obviously relevant) to 0 (not at all relevant). Relevance is pointed out as an important property of requirements in [ clarity: the degree to which the requirement’s intent is clearly understandable to the reader. This is also pointed out as an essential property of requirements in a survey reported in [ testability: the degree to which it would be possible to write a test which clearly demonstrates whether the requirement has been satisfied or not. For quality requirements this will typically mean some kind of quantification. Again, this is pointed out as essential in [
All in all, this gives us a large number of variables to be measured. To save space later in the paper, each variable is named with three letters. The first letter indicates the measure (number, relevance, clarity, etc.), the second the category of requirements that the measure applies to (mobility requirements, other quality requirements, etc.), and the third letter indicates the group of experimental subjects that the measure applies to (treatment group or control group). Examples of variables are the following:
• NMT, NMC: the number (N) of mobility requirements (M) found by the treatment group (T) or control group (C), respectively;
• NXT, NXC: the number (N) of quality requirements except mobility (X) found by the treatment group (T) or control group (C), respectively;
• NQT, NQC: the number (N) of quality requirements (Q = M + X) found by the treatment group (T) or control group (C), respectively;
• NOT, NOC: the number (N) of other requirements (O, that is, not quality requirements) found by the treatment group (T) or control group (C), respectively;
• NAT, NAC: the number (N) of all requirements (A = Q + O) found by the treatment group (T) or control group (C), respectively;
• RMT, RMC: the relevance (R) of mobility requirements (M) found by the treatment group (T) or control group (C), respectively;
• CMT, CMC: the clarity (C) of mobility requirements (M) found by the treatment group (T) or control group (C), respectively;
• TMT, TMC: the testability (T) of mobility requirements (M) found by the treatment group (T) or control group (C), respectively;
• QMT, QMC: the quality (Q) of mobility requirements (M) found by the treatment group (T) or control group (C), respectively. The quality is calculated as the average of the three quality criteria used, namely relevance, clarity, and testability.
Similar variables would then result for the quality of other quality requirements (e.g., RXT,…QXC), quality requirements, other requirements, and all requirements—for the sake of brevity these are not listed in detail since it is assumed to be fairly obvious what all these variables would be.
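The naming scheme and the composite quality score can be illustrated with a small sketch. The function and class names below are hypothetical, and the sketch assumes that clarity and testability are scored on the same 0 to 3 scale that the text states for relevance.

```python
from dataclasses import dataclass

MEASURES = {"N", "R", "C", "T", "Q"}     # number, relevance, clarity, testability, quality
CATEGORIES = {"M", "X", "Q", "O", "A"}   # mobility, other quality, all quality, other, all
GROUPS = {"T", "C"}                      # treatment, control


def variable_name(measure: str, category: str, group: str) -> str:
    """Build a three-letter variable name such as 'NMT' (measure, category, group)."""
    if measure not in MEASURES or category not in CATEGORIES or group not in GROUPS:
        raise ValueError(f"unknown code in {measure!r}, {category!r}, {group!r}")
    return measure + category + group


@dataclass(frozen=True)
class ScoredRequirement:
    """One requirement with its three scores (assumed scale: 0 to 3 each)."""
    relevance: int
    clarity: int
    testability: int

    def quality(self) -> float:
        # Quality is the plain average of the three criteria.
        return (self.relevance + self.clarity + self.testability) / 3


if __name__ == "__main__":
    req = ScoredRequirement(relevance=3, clarity=2, testability=1)
    print(variable_name("Q", "M", "T"), req.quality())  # QMT 2.0
```

Keeping the three codes as separate arguments makes it easy to enumerate all measure, category, and group combinations programmatically instead of listing them by hand.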
For the posttask questionnaire, we have some questions related to perceived ease of use (PEOU), some to perceived usefulness (PU), and some to intention to use (ITU). Hence it makes sense to measure 3 variables for each group, plus their average as an overall preference:
• PEOU_T, PEOU_C: the response to the PEOU questions by the treatment group (T) or control group (C), respectively;
• PU_T, PU_C: the response to the PU questions by the treatment group (T) or control group (C), respectively;
• ITU_T, ITU_C: the response to the ITU questions by the treatment group (T) or control group (C), respectively;
• P_T, P_C: the overall preference for the taxonomy (i.e., average of PEOU, PU, and ITU) for the treatment (T) and control (C) groups, respectively.
For each of these variable pairs, there will then be a null hypothesis stating that there is no difference between the two variables, and an alternative hypothesis stating that there is a difference, for example:
• H0,NM: there is no difference in the number of mobility requirements found by the two groups (i.e., NMT = NMC); H1,NM: there is a difference in the number of mobility requirements found by the two groups (i.e., NMT ≠ NMC);
• H0,TO: there is no difference in the testability of other (non-quality) requirements found by the two groups (i.e., TOT = TOC); H1,TO: there is a difference in the testability of other (non-quality) requirements found by the two groups (i.e., TOT ≠ TOC);
• H0,PEOU: there is no difference in the perceived ease of use of taxonomies reported by the two groups (i.e., PEOU_T = PEOU_C); H1,PEOU: there is a difference in the perceived ease of use of taxonomies reported by the two groups (i.e., PEOU_T ≠ PEOU_C).
These are just 3 out of a total of 24 pairs of hypotheses, but again we leave out the rest for space reasons, as it would be fairly monotonous to list them all. It can be noted that all the alternative hypotheses are formulated without indicating any expected direction for the difference. Of course, one might think that, for example, the number or quality of mobility requirements should increase for the treatment group, which gets the mobility taxonomy. However, since the taxonomy has not been tried in practice before, it could also happen that the extra taxonomy was only confusing and actually led to fewer requirements or poorer quality. It therefore seemed most appropriate to use hypotheses without any particular assumptions about the outcome.
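For readers who want the full list, the hypothesis pairs can be generated rather than written out. The sketch below is one reading that reproduces the stated total of 24: the count measure for five categories, four quality measures for the four categories that appear in the quality results table, and the three questionnaire variables. This grouping is inferred from the result tables, not stated explicitly in the text, and the helper names are hypothetical.

```python
from __future__ import annotations

COUNT_CATEGORIES = ["M", "X", "O", "Q", "A"]   # categories reported for the count measure N
QUALITY_CATEGORIES = ["M", "X", "O", "A"]      # categories reported for the quality measures
QUALITY_MEASURES = ["Q", "R", "C", "T"]        # quality, relevance, clarity, testability
QUESTIONNAIRE = ["PEOU", "PU", "ITU"]


def variable_stems() -> list[str]:
    """Return the variable stems (names without the group letter)."""
    stems = ["N" + c for c in COUNT_CATEGORIES]
    stems += [m + c for c in QUALITY_CATEGORIES for m in QUALITY_MEASURES]
    stems += QUESTIONNAIRE
    return stems


def hypothesis_pair(stem: str) -> tuple[str, str]:
    """Return the two-sided null and alternative hypothesis for one stem."""
    sep = "_" if stem in QUESTIONNAIRE else ""
    t, c = f"{stem}{sep}T", f"{stem}{sep}C"
    return (f"H0,{stem}: {t} = {c}", f"H1,{stem}: {t} ≠ {c}")


if __name__ == "__main__":
    for stem in variable_stems():
        h0, h1 = hypothesis_pair(stem)
        print(h0, "|", h1)
    print(len(variable_stems()), "pairs")
```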
On average, the participants’ response to the questionnaire was slightly more positive for the treatment group than for the control group, the total average response for all questions being 3.5 versus 3.3, respectively. Since this was scored on a 5-point Likert scale where 3.0 is the neutral midpoint, it seems that the respondents were slightly positive but not really enthusiastic about either of the taxonomies. Table
Results for the posttask questionnaire.
Variable | Treatment group mean | Treatment group st.dev. | Control group mean | Control group st.dev. | Diff | Sign.? |
PEOU | 3.74 | 0.62 | 3.69 | 0.69 | 0.05 | No |
PU | 3.64 | 0.71 | 3.35 | 0.83 | 0.29 | No |
ITU | 3.14 | 0.72 | 2.85 | 0.93 | 0.29 | No |
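The figures in this table can be cross-checked against the overall averages quoted in the text. The sketch below recomputes the differences and an overall mean per group; weighting the three constructs equally is an assumption, since the number of questions per construct is not given here.

```python
# Means copied from the posttask questionnaire table.
RESULTS = {
    "PEOU": {"treatment": 3.74, "control": 3.69, "diff": 0.05},
    "PU":   {"treatment": 3.64, "control": 3.35, "diff": 0.29},
    "ITU":  {"treatment": 3.14, "control": 2.85, "diff": 0.29},
}


def overall_mean(group: str) -> float:
    """Average the three construct means for one group (equal weights assumed)."""
    return sum(row[group] for row in RESULTS.values()) / len(RESULTS)


def recomputed_diffs() -> dict:
    """Recompute treatment minus control for each construct, rounded to 2 decimals."""
    return {
        name: round(row["treatment"] - row["control"], 2)
        for name, row in RESULTS.items()
    }


if __name__ == "__main__":
    print(round(overall_mean("treatment"), 1), round(overall_mean("control"), 1))
    for name, diff in recomputed_diffs().items():
        print(name, diff, "reported:", RESULTS[name]["diff"])
```

Under the equal-weight assumption, the recomputed group averages round to the 3.5 and 3.3 quoted in the text.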
For the task performance, we investigated whether one group would find more requirements than the other group, and whether there would be any difference in the quality of the requirements. Three scores were given for the quality of each requirement, namely its relevance, clarity, and testability. Moreover, requirements were classified as “mobility requirement”, “other quality requirement” (except mobility), or “other requirement” (except quality).
Table
Number of requirements found by the participants.
Variable | Treatment group mean | Treatment group st.dev. | Control group mean | Control group st.dev. | Diff | Effect size | Sign.? |
NM? | 3.39 | 1.57 | 2.30 | 1.46 | 1.09 | 0.68 | |
NX? | 1.71 | 1.63 | 1.46 | 1.53 | 0.25 | 0.16 | No |
NO? | 1.46 | 1.79 | 1.77 | 1.77 | −0.31 | −0.18 | No |
NQ? | 5.11 | 2.47 | 3.77 | 2.23 | 1.34 | 0.55 | |
NA? | 6.57 | 2.44 | 5.54 | 2.63 | 1.03 | 0.40 | No |
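The table's internal consistency can be checked with a short sketch. It verifies the additive relations Q = M + X and A = Q + O on the group means, and recomputes a standardized effect size as Cohen's d with an equal-weight pooled standard deviation. That formula is an assumption: the exact effect size definition used is not stated in this excerpt, and the recomputed values land close to, but not exactly on, the reported ones (all within 0.05).

```python
from math import sqrt

# (treatment mean, treatment sd, control mean, control sd, reported effect size)
ROWS = {
    "NM": (3.39, 1.57, 2.30, 1.46, 0.68),
    "NX": (1.71, 1.63, 1.46, 1.53, 0.16),
    "NO": (1.46, 1.79, 1.77, 1.77, -0.18),
    "NQ": (5.11, 2.47, 3.77, 2.23, 0.55),
    "NA": (6.57, 2.44, 5.54, 2.63, 0.40),
}


def cohens_d(m1: float, s1: float, m2: float, s2: float) -> float:
    """Cohen's d using an equal-weight pooled sd (assumes similar group sizes)."""
    pooled_sd = sqrt((s1 ** 2 + s2 ** 2) / 2)
    return (m1 - m2) / pooled_sd


def additive_gap(total: str, part_a: str, part_b: str, col: int) -> float:
    """Absolute gap between a total mean and the sum of its two part means."""
    return abs(ROWS[total][col] - (ROWS[part_a][col] + ROWS[part_b][col]))


if __name__ == "__main__":
    for name, (m1, s1, m2, s2, reported) in ROWS.items():
        print(f"{name}: d = {cohens_d(m1, s1, m2, s2):.2f} (reported {reported})")
    for col, group in ((0, "treatment"), (2, "control")):
        print(group, "Q vs M+X gap:", round(additive_gap("NQ", "NM", "NX", col), 2))
        print(group, "A vs Q+O gap:", round(additive_gap("NA", "NQ", "NO", col), 2))
```

Small gaps of 0.01 in the additive checks are what rounding of the individual means would produce.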
As the data sets passed an Anderson-Darling test for normality, a simple
Table
Quality analysis of requirements found by the participants in the two groups.
Variable | Treatment group mean | Treatment group st.dev. | Control group mean | Control group st.dev. | Diff | Effect size | Sign.? |
QM? | 1.98 | 0.73 | 1.97 | 0.74 | 0.01 | 0.01 | No |
RM? | 2.35 | 0.82 | 2.15 | 0.98 | 0.20 | 0.20 | No |
CM? | 1.83 | 0.99 | 1.98 | 0.96 | −0.15 | −0.16 | No |
TM? | 1.75 | 0.80 | 1.77 | 0.82 | −0.02 | −0.03 | No |
QX? | 1.08 | 0.75 | 1.19 | 0.72 | −0.11 | −0.15 | No |
RX? | 0.88 | 0.96 | 0.82 | 0.95 | 0.06 | 0.06 | No |
CX? | 1.29 | 1.03 | 1.34 | 0.94 | −0.05 | −0.05 | No |
TX? | 1.08 | 0.90 | 1.42 | 0.86 | −0.34 | −0.39 | No |
QO? | 1.17 | 0.77 | 1.43 | 0.79 | −0.26 | −0.32 | No |
RO? | 0.37 | 0.58 | 0.74 | 1.00 | −0.37 | −0.37 | No |
CO? | 1.51 | 1.29 | 1.80 | 1.05 | −0.29 | −0.28 | No |
TO? | 1.63 | 1.16 | 1.74 | 1.08 | −0.10 | −0.10 | No |
QA? | 1.56 | 0.85 | 1.59 | 0.82 | −0.03 | −0.04 | No |
RA? | 1.52 | 1.19 | 1.35 | 1.19 | 0.17 | 0.14 | No |
CA? | 1.62 | 1.09 | 1.76 | 1.01 | −0.14 | −0.14 | No |
TA? | 1.55 | 0.95 | 1.67 | 0.93 | −0.12 | −0.1 | No |
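Since quality (Q) is defined as the average of relevance, clarity, and testability, the Q rows of this table should equal the mean of the corresponding R, C, and T rows, up to rounding. The sketch below checks this for all four categories and both groups. The control-group RA mean is taken to be 1.35, and the helper name is hypothetical.

```python
# Group means copied from the quality table: category -> group -> measure -> mean.
MEANS = {
    "M": {"T": {"R": 2.35, "C": 1.83, "T": 1.75, "Q": 1.98},
          "C": {"R": 2.15, "C": 1.98, "T": 1.77, "Q": 1.97}},
    "X": {"T": {"R": 0.88, "C": 1.29, "T": 1.08, "Q": 1.08},
          "C": {"R": 0.82, "C": 1.34, "T": 1.42, "Q": 1.19}},
    "O": {"T": {"R": 0.37, "C": 1.51, "T": 1.63, "Q": 1.17},
          "C": {"R": 0.74, "C": 1.80, "T": 1.74, "Q": 1.43}},
    "A": {"T": {"R": 1.52, "C": 1.62, "T": 1.55, "Q": 1.56},
          "C": {"R": 1.35, "C": 1.76, "T": 1.67, "Q": 1.59}},
}


def composite_gap(category: str, group: str) -> float:
    """Absolute gap between the reported Q mean and the average of R, C, and T."""
    row = MEANS[category][group]
    recomputed = (row["R"] + row["C"] + row["T"]) / 3
    return abs(row["Q"] - recomputed)


if __name__ == "__main__":
    for category, groups in MEANS.items():
        for group in groups:
            gap = composite_gap(category, group)
            print(f"Q{category}{group}: gap = {gap:.3f}")
```

All gaps stay below 0.01, which is consistent with the stated definition of the composite quality score.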
An Anderson-Darling test showed that the data for the variables in Table
Wohlin [
If we use
This paper has reported an experiment comparing some excerpts of the Volere taxonomy with an extension of the same excerpts, the extension especially addressing mobility requirements. The motivation is that such requirements are becoming increasingly important as more and more information systems become mobile and multichannel, the users therefore expecting to be able to perform their tasks in a variety of different locations, with various types of equipment. Taxonomies are believed to be helpful in guiding stakeholders about what requirements might have to be elicited and specified, and the question was therefore whether the extension with mobility requirement would guide the experiment subjects more or better than a taxonomy not including any such category of requirements.
The results showed that the treatment group getting the add-on part about mobility requirements found more such requirements than the control group. This is neither surprising nor particularly impressive, since more guidance should result in better performance. Still, it was not obvious that they would find more requirements, as the extra material might also have been felt as inconsistent with the Volere material or in other ways confusing, causing the participants to lose focus and productivity instead of gaining. Even just the factor of having more tutorial material to read and look through could potentially have resulted in less time spent on effective writing, and thus fewer requirements. So, even if the result is not surprising or impressive, it is still encouraging. However, it must be emphasized that the experiment results by no means prove that “mobility requirement” really deserves to be a category in its own right, guidance for writing.
More disappointing, of course, is the finding that the treatment group did not produce better quality requirements than the control group. Ideally, the guidance given in the added tutorial section about mobility should have helped them write more precise and testable mobility requirements. But quality was not poorer for the treatment group either, so at least the result for quality has shown that the increased number of requirements found by the treatment group did not come at the cost of reduced quality and so was indeed a productivity increase.
An important question for further work would be to investigate in more detail why the taxonomy only gave an advantage in numbers, not quality. It could be that a quality increase would have been easier to achieve if there had also been tool support for the taxonomy. This could be investigated in a new experiment, but if so, both groups must of course have similar tool support available, only with the difference that the tool of the treatment group offers one extra category of requirements. For new experiments, it would also be interesting to have more complex cases and maybe have participants working in pairs, for example, one domain expert and one analyst, so that the experimental task becomes more similar to a realistic working situation.
Finally, it would also be important to perform larger case studies in mobile information systems projects, preferably in industry, to check whether advantages observed in a limited experimental setting also hold for real world usage. Such industrial evaluations would also give more insight on the workplace usefulness of applying taxonomies for eliciting requirements.