Generating Feature Models from Requirements: Structural vs. Functional Perspectives

Nili Itzik
Department of Information Systems, University of Haifa, Israel
[email protected]

Iris Reinhartz-Berger
Department of Information Systems, University of Haifa, Israel
[email protected]

ABSTRACT

Adoption of SPLE techniques is challenging and expensive. Hence, automation in the adoption process is desirable, especially with respect to variability management. Different methods have been suggested for (semi-)automatically generating feature models from requirements or textual descriptions of products. However, while there are different ways to represent the same SPL in feature models, addressing different stakeholders' needs and preferences, existing methods usually follow fixed, predefined ways to generate feature models. As a result, the generated feature models may represent perspectives less relevant to the given tasks. In this paper we suggest an ontological approach that measures semantic similarity, extracts variability, and automatically generates feature models that represent structural (object-related) or functional (action-related) perspectives. The stakeholders are able to control the perspective of the generated feature models, considering their needs and preferences for given tasks.

Categories and Subject Descriptors
D.2.1 [Software Engineering]: Requirements/Specifications – languages; D.2.13 [Software Engineering]: Reusable Software – domain engineering

General Terms
Algorithms, Management, Design

Keywords
Feature Models, Reverse Engineering, Mining, Ontology, Semantic Similarity

1. INTRODUCTION

Although Software Product Line Engineering (SPLE) practices have been proven to be successful in reducing development cost and time-to-market and in improving product quality [16], adoption of SPLE techniques is challenging, expensive, time consuming, and error-prone. It requires analyzing the commonality and variability of existing artifacts, potentially introducing changes to these artifacts and tracking those changes, utilizing advanced techniques, methods and tools for SPLE. Therefore, automation of activities related to SPLE adoption is desirable. In particular, support for extracting SPL commonality and variability has valuable benefits when adopting SPLE practices.

Different methods have been suggested for (semi-)automatically generating feature models (or variability models) from requirements or textual descriptions of products, e.g., [1], [7], [18], and [22]. These methods commonly apply different semantic similarity measures and clustering techniques to extract and identify similar features and to hierarchically structure them into models. Each method generates feature (or variability) models that represent a perspective inherited from the method's characteristics and similarity measures. In particular, there are methods that represent the structural perspective of the SPL (namely, the features represent objects or components), the functional perspective of the SPL (namely, the features represent actions, functions, and services), or a mixed perspective of structure and functions. For example, consider an SPL of e-shops. The features can be primarily organized according to what they do (a functional perspective), e.g., search, order, and ship, or according to what they operate on (a structural perspective), e.g., catalog and shopping cart. As later elaborated, each perspective may be beneficial to different development and maintenance tasks.

In [19], we suggested analyzing the variability of SPLs using semantic and ontological considerations. This method, which is called SOVA – Semantic and Ontological Variability Analysis – and supported by a tool [12], extracts behaviors from textual descriptions of (functional) requirements. In particular, it automatically identifies the initial state of a system before a behavior occurs, the external events that trigger the behavior, and the final state of the system after the behavior occurs. Based on these extractions and semantic similarity measures, the method defines behavioral similarity as the weighted average of the initial state, external event, and final state similarities.

Here, we use SOVA to generate different feature models taking into consideration possible structural and functional perspectives, namely, focusing on objects (or components) and actions (or functions), respectively. This introduced flexibility may support generating feature models that are geared to given tasks.

The rest of the paper is structured as follows. Section 2 reviews the literature on constructing variability models from textual descriptions, while Section 3 motivates the need for developing feature models that follow different perspectives. Section 4 describes SOVA, and Section 5 presents preliminary evaluation results. Finally, Section 6 concludes and refers to future research.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. SPLC '14, September 15 - 19 2014, Florence, Italy. Copyright 2014 ACM 978-1-4503-2739-8/14/09…$15.00. http://dx.doi.org/10.1145/2647908.2655966

2. RELATED WORK

Existing studies that use textual descriptions for constructing feature models (or variability models) generate outcomes that follow a predefined perspective, which can be pure (e.g., structural) or mixed (e.g., structural and functional). Acher et al. [2], for example, suggest constructing feature models from public product descriptions expressed in a tabular format. The resultant feature models represent the terminology of the domain and, hence, can be considered as following a pure structural perspective (namely, the features are nouns that mainly represent components and classes). Ferrari et al. [9] suggest extracting common and variable features from public brochures using structural part-of-speech patterns based on nouns. They claim that the method can be extended to represent functional perspectives, but then new verb-based patterns would need to be developed. The method does not support structuring the extracted features into feature models. Davril et al. [7] and Dumitru et al. [8] suggest utilizing publicly available repositories of product descriptions. The features are extracted from nouns, verbs, and adjectives, yielding feature models that represent a mixed perspective of structure and functions. Niu and Easterbrook [18] introduce a semi-automatic method for constructing Orthogonal Variability Models (OVM). Using expert knowledge and linguistic clues, this method extracts functional requirements profiles, represented as pairs of verbs and direct objects. As a result, the generated models also follow a mixed perspective of structure and functions.

The ArborCraft tool, presented in [22], creates feature models by grouping similar requirement sentences using the Latent Semantic Analysis (LSA) method [14], which analyzes the statistical relationships among words in a large corpus of text. The resultant feature models represent a mixed perspective of structure and functions. ArborCraft allows controlling the resultant models through several parameters: the number of sentences (in each requirement), the (maximal) number of levels (in the resultant feature model), the similarity threshold (for clustering similar requirements), and the mandatory percentage (given multiple documents). However, ArborCraft does not enable controlling the perspective of the resulting feature models or creating feature models that represent only structural aspects.
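The LSA-based grouping in ArborCraft rests on vector-space similarity between requirement texts. The following is a minimal sketch of the underlying idea, comparing raw word-count vectors with the cosine measure (full LSA additionally factors the term matrix with SVD; the function name is our own):

```python
from collections import Counter
from math import sqrt

def cosine_sim(text_a: str, text_b: str) -> float:
    """Cosine of the angle between simple word-count vectors."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Requirements sharing vocabulary score higher than unrelated ones.
r1 = "the system updates the shopping cart"
r2 = "the system clears the shopping cart"
r3 = "a customer writes a review"
print(cosine_sim(r1, r2) > cosine_sim(r1, r3))  # → True
```

A purely lexical measure like this groups requirements by surface wording, which is one reason such methods yield a fixed, mixed perspective rather than a controllable one.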

There are also studies that generate feature models from feature configurations using refactoring techniques, e.g., [1], [3], [21]. In these cases the inputs are presented in certain ways (e.g., propositional formulas or feature lists and dependencies), and the feature models are structured applying either user-specified knowledge or inferred knowledge. Bécan et al. [3], for example, use ontological knowledge in order to infer the most suitable parent or siblings for a given feature. Note that in these studies the features are explicitly given in the methods' inputs, so no extraction, potentially following different perspectives, is supported.

To summarize, existing methods generate models representing fixed perspectives derived from the methods' algorithms. We claim that following such a rigid approach may generate variability models that represent perspectives less relevant to the given tasks. Thus, we suggest a method that takes into consideration the stakeholders' needs and preferences when generating feature models.

3. MOTIVATION

A recent survey on variability modeling in industry [4] reports on the typical use cases of variability models. Besides the main (and obvious) uses for variability management and product configuration, variability models are used for requirements specification, design and architecture, software deployment, documentation, QA/testing, and marketing purposes. These tasks are quite different and require presenting different perspectives of variability. Design and architecture, for example, may concentrate on a structural perspective of variability, in which the variability of components is highlighted. Marketing and QA tasks are more likely to be interested in the functional perspective of variability, namely, the variability of functions, actions, and services. There may also be tasks that require a mixed perspective, e.g., documentation and requirements specification.

In order to examine which perspectives are commonly followed in (manually created) feature models, we examined S.P.L.O.T, an academic repository of feature models [20]. Out of the complete S.P.L.O.T repository (508 feature models), we considered all models satisfying the following two conditions (resulting in 189 models): (1) they included unique information, namely, they were not submodels of other feature models in the repository; and (2) they had meaningful feature names in English. Most feature models (146 out of 189, about 77%) followed to some extent a structural perspective (classes, components, and attributes), while about half (92 feature models, about 49%) referred to functionality through actions, functions, and services. We further noticed that actions are usually accompanied by the objects on which they are performed (e.g., search items and order products). Other perspectives, such as agents (stakeholders and users), were also observed, but very rarely. We further noticed that the domain of discourse cannot necessarily predict the selected perspective. In the Smart Home domain, for example, we found a feature model representing a functional perspective, e.g., Fire Control, Light Management, and Flood Detection, and a feature model representing a structural perspective, e.g., Room, Floor, Door, Sensors, Facility, and Alarm.

Based on the above observations, we claim that variability representations mainly follow structural or functional perspectives and need to fit stakeholders' preferences and needs for the given tasks. As an example, consider once again the domain of e-shops. A sales person who aims to recommend the adoption of an e-shop application to customers may be interested in the actions and services that such applications can supply. This point of view is also of interest to testing managers, who plan testing executions, and to project managers, who can use this information to identify appropriate development or support teams. Thus, the feature model in Figure 1, which represents the variable actions in e-shops, may be suitable for those tasks. Among the features in this feature model we can find the following functions: product return, order management, shipment, registration, payment management, and search.

Developers, however, and especially software designers and architects, may be more interested in the variability of objects and components. In this case, a feature model like that in Figure 2 may be more useful. This feature model refers to the following objects: catalog, wish list, shopping cart, payment type, customer profile, inventory, etc. As can be seen, the feature model in Figure 2 does not refer to the functions of e-shops and their variability.

To support appropriate perspectives of variability, we next suggest introducing flexibility into the feature model generation process, enabling the stakeholders to control the perspective appropriate to their needs and preferences. The method is named SOVA – Semantic and Ontological Variability Analysis.

4. THE SOVA METHOD

SOVA adds ontological considerations to the semantic ones already explored by other methods. Figure 3 depicts the three main stages in SOVA: text parsing, behavioral similarity calculation, and feature model construction. The inputs are textual descriptions representing behaviors, e.g., functional requirements of the products, and a perspective profile representing stakeholders' needs and preferences. Each input document (or description) represents a different product in the SPL and consists of statements (i.e., paragraphs). Each statement, which may be composed of one or more sentences, describes a different behavior of the product. The perspective profile, which is further explained in Section 4.3, includes parameters and weights that reflect the desired perspective of the output feature models. Next, we elaborate on each stage.

4.1 Text Parsing

The first stage is responsible for extracting the behaviors from the textual descriptions. This is done using natural language processing (NLP) techniques and an ontological model. First, the statements are analyzed and their constituents are labeled with semantic roles. Following the semantic role labeling (SRL) approach presented in [10], we use six semantic roles that have special importance to functionality: (1) Agent – who performs? (2) Action (the sentence's predicate or verb) – what is performed? (3) Object – on what object is it performed? (4) Instrument – how is it performed? (5) Temporal modifier (AM-TMP) – when is it performed? And (6) Adverbial modifier (AM-ADV) – in what conditions is it performed?

Figure 1. An e-shop feature model – a functional perspective

Figure 2. An e-shop feature model – a structural perspective

Figure 3. An overview of the process in SOVA

Each phrase of a sentence in the input textual description is considered a behavioral vector. In particular, we identify two types of behavioral vectors: action vectors, which are based on verbs and predicates (actions) and potentially combine agents, objects, and instruments, and non-action vectors, which are based on atomic (temporal or adverbial) modifiers. As an example, consider the following requirement from the e-shop domain. "When the products list is displayed, a registered customer can order an item from the list. She enters the shipping preferences and provides payment details. If the payment details are valid, the system generates an order confirmation note." Table 1 presents the six behavioral vectors derived for this requirement. Vectors 1-4 and 6 are action vectors (i.e., represent actions), while vector 5 represents a non-action vector, namely, an adverbial pre-condition of the last action vector (6). Note that the first vector is an action vector whose source is a temporal modifier, and thus it also represents a (temporal) pre-condition. Both pre-conditions in this example (vectors 1 and 5) have unknown agents, as they are phrased in the passive voice.

We classify the different behavioral vectors employing concepts from Bunge's ontological model [5], [6]. According to this model, the world is made of things that possess properties. Properties are known via attributes, which are characteristics assigned to things by humans. State variables are functions that assign values (out of specific ranges) to attributes of things at given points of time. A thing may be at different states, but at a specific point in time a thing will be in exactly one state, defined as the vector of values of the state variables of the thing at that point in time. Finally, an external event is a change in the state of a thing as a result of an action of another thing.
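The two vector types can be sketched as a simple record type; the class and field names below are our own, and the instances reproduce the six vectors of the e-shop example:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BehavioralVector:
    """One phrase of a requirement: (agent, action, object, instrument, modifier)."""
    agent: Optional[str] = None
    action: Optional[str] = None
    obj: Optional[str] = None
    instrument: Optional[str] = None
    modifier: Optional[str] = None
    source: str = "None"          # "AM-TMP", "AM-ADV", or "None"

    @property
    def is_action_vector(self) -> bool:
        # Action vectors are built around a verb/predicate; non-action
        # vectors carry only an atomic temporal or adverbial modifier.
        return self.action is not None

# The six vectors of the e-shop requirement (cf. Table 1):
vectors = [
    BehavioralVector(action="is displayed", obj="the products list", source="AM-TMP"),
    BehavioralVector(agent="a registered customer", action="order", obj="an item",
                     instrument="from the list"),
    BehavioralVector(agent="a registered customer", action="enters",
                     obj="the shipping preferences"),
    BehavioralVector(agent="a registered customer", action="provides",
                     obj="payment details"),
    BehavioralVector(modifier="the payment details are valid", source="AM-ADV"),
    BehavioralVector(agent="the system", action="generates",
                     obj="an order confirmation note"),
]
print(sum(v.is_action_vector for v in vectors))  # → 5
```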
Accordingly, behavioral vectors are classified into those that represent initial states (pre-conditions or inputs of the behavior), those that represent external events (triggers of the behavior or interactions with the environment), and those that represent final states (post-conditions or outputs of the behavior). The classification of the behavioral vectors into these groups is mainly done by analyzing the agent and the action parts of the vectors: behavioral vectors whose agents are external and whose actions have active meanings are classified as external events. Behavioral vectors whose agents are internal are classified to one of the state elements based on their temporal appearance in relation to the external events: behavioral vectors whose agents are internal and that appear after the last external event in the requirement are classified as representing the final state, while behavioral vectors whose agents are internal and that appear before the first external event in the requirement are classified as representing the initial state. Vectors whose agents are missing are classified to multiple behavioral elements (e.g., initial state and event, or event and final state). Vectors which cannot be classified following the aforementioned rules are ignored at this stage, as they represent intermediate outcomes. More about this classification and Bunge's ontological model can be found in [19].

Table 2 demonstrates the classification performed on the e-shop requirement example above. Vector 1 appears twice, as its agent is unknown and therefore it can be considered either an external event (namely, the products list is displayed by an external agent) or an initial state (namely, it is displayed by the system as a pre-condition of the behavior). Vectors 2-4 are classified as external events since their agent is external (a registered customer). Vector 5 ("the payment details are valid") does not appear at all since it represents an intermediate pre-condition of vector 6. This last vector is classified as representing the final state, due to the lack of external agent participation.
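The classification rules above can be sketched as follows; this is a simplified illustration with hypothetical names, assuming each vector is tagged with whether its agent is internal, external, or missing:

```python
# Map each vector index to the behavioral elements it may represent:
# 'S1' (initial state), 'E' (external event), 'S*' (final state).
def classify(vectors):
    ext = [i for i, v in enumerate(vectors) if v["agent"] == "external"]
    first_e, last_e = (ext[0], ext[-1]) if ext else (None, None)
    result = {}
    for i, v in enumerate(vectors):
        if v["agent"] == "external":
            result[i] = ["E"]                 # external agent, active action
        elif v["agent"] == "internal":
            if last_e is not None and i > last_e:
                result[i] = ["S*"]            # internal, after last external event
            elif first_e is not None and i < first_e:
                result[i] = ["S1"]            # internal, before first external event
            else:
                result[i] = []                # intermediate outcome: ignored
        else:                                 # missing agent (passive): ambiguous
            result[i] = ["S1", "E"] if (first_e is None or i <= first_e) else ["E", "S*"]
    return result

# Vectors 1-4 and 6 of the e-shop example (vector 5, the intermediate
# adverbial pre-condition, is filtered out before this step):
vs = [{"agent": "missing"}, {"agent": "external"}, {"agent": "external"},
      {"agent": "external"}, {"agent": "internal"}]
print(classify(vs))  # → {0: ['S1', 'E'], 1: ['E'], 2: ['E'], 3: ['E'], 4: ['S*']}
```

This reproduces Table 2: the passive vector 1 lands in both S1 and E, vectors 2-4 are external events, and the system's final action is the final state.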

4.2 Similarity Calculation

In the second stage, the similarity of each pair of components (agent, action, object, and instrument) and modifiers (temporal and adverbial modifiers) is calculated. We use semantic measures for this purpose, which may be knowledge-based or corpus-based [11], [17]. Corpus-based measures identify the degree of similarity based on information derived from large corpora. Latent Semantic Analysis (LSA) [14] is an example of a well-known corpus-based method, which computes sentence similarity as the cosine of the angle between the vectors representing the sentences' words. Knowledge-based measures use information drawn from semantic networks. Many of these methods use WordNet [23] for measuring word (or concept) similarity. The measure can be calculated in different ways, e.g., by using the path length between terms in the semantic net or by using information content, namely, the probability of finding the concept in a given net. Some of the methods for calculating word similarity have been extended to calculate sentence similarities (e.g., [15] and [17]).

While not limiting our approach to a specific similarity measure, we empirically found the Mihalcea, Corley, and Strapparava (MCS) measure [17] suitable for calculating similarity between phrases and the Wu and Palmer measure [24] for calculating word similarity. The MCS measure calculates sentence similarity by finding the maximal similarity score of word similarities, taking into consideration the parts of speech to which the words belong. The derived word similarity scores are weighted with the inverse document frequency scores of the corresponding words. The Wu and Palmer measure [24] considers paths and distances between words in WordNet. As an example, consider the following pair of requirements from the e-shop domain: (1) A customer can order items from the catalog. (2) The supplier can remove items from the catalog.
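The Wu and Palmer idea can be illustrated over a tiny hand-made taxonomy (the method itself uses WordNet; the taxonomy fragment and function names below are ours): wup(a, b) = 2 · depth(LCS) / (depth(a) + depth(b)), where LCS is the lowest common subsumer.

```python
TAXONOMY = {                     # child -> parent (hypothetical fragment)
    "customer": "person", "supplier": "person",
    "person": "entity", "catalog": "artifact", "artifact": "entity",
}

def path_to_root(word):
    path = [word]
    while path[-1] in TAXONOMY:
        path.append(TAXONOMY[path[-1]])
    return path                   # e.g. ['customer', 'person', 'entity']

def wup(a, b):
    pa, pb = path_to_root(a), path_to_root(b)
    lcs = next(n for n in pa if n in pb)       # lowest common subsumer
    depth = lambda w: len(path_to_root(w))     # root has depth 1
    return 2 * depth(lcs) / (depth(a) + depth(b))

print(round(wup("customer", "supplier"), 2))   # siblings under 'person' → 0.67
print(round(wup("customer", "catalog"), 2))    # only 'entity' shared → 0.33
```

Deeper common subsumers yield higher scores, which is why "customer" and "supplier" come out moderately similar while unrelated terms score low.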

Table 1. Examples of behavioral vectors

| # | Agent | Action | Object | Instrument | Modifier | Source |
|---|-------|--------|--------|------------|----------|--------|
| 1 | | is displayed | the products list | | | AM-TMP |
| 2 | a registered customer | order | an item | from the list | | None |
| 3 | she [a registered customer]* | enters | the shipping preferences | | | None |
| 4 | she [a registered customer]* | provides | payment details | | | None |
| 5 | | | | | the payment details are valid | AM-ADV |
| 6 | the system | generates | an order confirmation note | | | None |

* Replacement of a pronoun by the relevant noun is indicated with pronoun [noun].

Table 2. The behavioral vectors classification for the e-shop example

| Behavioral element | Behavioral vectors |
|---|---|
| S1 (initial state) | bv1 = ( , is displayed, the products list, , AM-TMP) |
| E (external event to which the system responds) | bv1 = ( , is displayed, the products list, , AM-TMP); bv2 = (a registered customer, order, an item, from the list, None); bv3 = (a registered customer, enters, the shipping preferences, , None); bv4 = (a registered customer, provides, payment details, , None) |
| S* (final state the system is expected to have) | bv6 = (the system, generates, an order confirmation note, , None) |

The text parsing stage results in a single behavioral vector (representing an external event) for each requirement. The objects and the instruments of the two behavioral vectors are identical (“items” and “from the catalog”, respectively). The agents are similar to some extent (“customer” and “supplier”, resulting in a similarity value of 0.73), while the actions are relatively different (“order” and “remove”, resulting in a similarity value of 0.44).

4.3 Feature Model Construction

In the third stage, a feature model that represents the variability found in the input textual descriptions is automatically constructed. To this end, the stakeholders' preferences and needs are taken into consideration, using perspective profiles. In these profiles, the similarities of the initial states, external events, and final states are weighted, as well as the similarities of the agents, actions, objects, and instruments in each such behavioral component. Using the perspective profile and the semantic similarity values calculated in the previous stage, the similarity of two behavioral vectors $v_1$ and $v_2$ is expressed as a weighted average of the corresponding vector components. Formally expressed:

$sim(v_1, v_2) = \sum_{comp \in C} w_{comp} \cdot \delta_{comp} \cdot sim(v_1.comp, v_2.comp)$, where $C = \{agent, action, object, instrument\}$

where:
- $w_{comp}$ is the weight given to a specific vector component (agent, action, object, or instrument), with $\sum_{comp \in C} w_{comp} = 1$;
- $\delta_{comp}$ is 1 if the component comp exists and 0 otherwise;
- $sim(v_1.comp, v_2.comp)$ is the semantic similarity of the two vectors' components.

Given two requirements, R1 and R2, the overall similarity is calculated as the weighted average of their initial state, external event, and final state similarities. Those similarities are calculated as the average of the maximal pair-wise similarities of the behavioral vectors classified as the same behavioral element.

As an example, consider an external functional perspective. Such a perspective is mainly interested in the external events and, in particular, in their actions. However, as mentioned before, actions are commonly associated in feature models with the objects on which they are performed. Thus, the profile needs to assign positive weights to the actions and objects of external events, where actions should be much more dominant than objects in the overall similarity calculation. To construct a structural feature model, on the other hand, the initial and final states are likely to be relevant, as they potentially describe inputs and outputs. In these behavioral components, the similarity of objects matters. Thus, the profile needs to assign positive weights to the objects of initial and final states.

The hierarchy of the resultant feature models is composed by applying a hierarchical agglomerative clustering algorithm. This algorithm puts each requirement in a separate cluster and iteratively merges the clusters whose average requirements' similarities are the highest. The output of this algorithm is a binary tree of clusters, which is then flattened by joining grandchild and child nodes whose similarities are alike (i.e., the differences are less than a predefined threshold) under the same parent node. Optionality, as well as OR and XOR relations, is deduced by examining the appearance of the different requirements in the input requirements documents: if at least two requirements that are grouped under the same parent node appear in the same input document, then the corresponding requirements are OR-grouped. If requirements originating from different input files are grouped under the same parent node, we consider the corresponding requirements as XOR-grouped. The parent node is mandatory if it includes requirements from all input files, and optional otherwise. The final output is presented in FeatureIDE format [13].

After the creation of the feature model structure, the features are named, following the perspective profile information. To compose the node names, the method uses up to two behavioral components whose weights are the greatest (e.g., action + object). The lowest common subsumer of similar behavioral components is found in WordNet [23] and used for composing the names of inner nodes in the feature model hierarchy.

Section 5 demonstrates two feature models generated by SOVA for the same set of requirements, following different perspective profiles.

5. PRELIMINARY EVALUATION

To preliminarily evaluate SOVA outcomes, we considered two partial requirements files of e-shop products. These requirements are listed in Figure 4. As can be seen, the requirements do not follow a predefined format or pattern.

We constructed two feature models for the given set of requirements following two perspectives: a structural perspective, which concentrates on the outcomes, as represented by the objects of the final states, and a functional perspective, which focuses on the actions (70%) and objects (30%) of the external events. Figure 5 and Figure 6 depict the generated models, respectively.

As can be seen, the two generated feature models are quite different, although they include the same requirements in the leaves. The structural perspective profile groups the requirements based on the involved objects, e.g., product review, shopping cart, shipment details, and catalog. Consider, as an example, the variability of 'shipping': shipment can be done via land (A7) or via air (A5). The functional perspective, on the other hand, arranges the requirements according to the actions that can be performed by an external agent, including the objects on which they are performed: confirm purchase, review product, and so on. From this perspective, requirement A5 is more similar to requirements A6 and B3, which deal with buying products, while requirement A7 is more similar to requirements A2 and B6, which concern paying.

As can be observed, the same pairs of requirements can be grouped similarly in both perspectives (e.g., requirements A10 and B1). This situation implies that the requirements are similar in both their external events and final states. Both A10 and B1 refer to entering a new product from the functional perspective, and affect the catalog from the structural perspective.
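The average-linkage agglomerative step that Section 4.3 uses to build the hierarchy can be sketched as a toy illustration; the function name and the similarity values are ours (0.89 echoes the B9/B10 LSA score discussed later, the rest are hypothetical):

```python
# Each requirement starts as its own cluster; the two most-similar clusters
# (by average pairwise similarity) are merged until one binary tree remains.
def agglomerate(items, sim):
    """Return a nested-tuple binary tree over `items` (average linkage)."""
    clusters = [(it, [it]) for it in items]            # (tree, members)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                avg = sum(sim(a, b) for a in clusters[i][1]
                          for b in clusters[j][1]) / (
                          len(clusters[i][1]) * len(clusters[j][1]))
                if best is None or avg > best[0]:
                    best = (avg, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

S = {("B9", "B10"): 0.89, ("B9", "A2"): 0.2, ("B10", "A2"): 0.25}
sim = lambda a, b: S.get((a, b), S.get((b, a), 0.0))
print(agglomerate(["B9", "B10", "A2"], sim))  # → ('A2', ('B9', 'B10'))
```

The closely related requirements pair off first, so they end up as siblings deep in the tree, which after flattening become features under a common parent.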

Next we compare our resultant feature models with a feature model created using a different tool for the same set of inputs. Among the (few) existing methods accompanied with tools, we found ArborCraft [22] the most relevant for our purposes. First, this tool uses similar inputs (textual requirements documents) and produces similar outputs (feature models). Second, the tool is available for use. Since the tool refers to a fixed number of sentences in each requirement (given as a parameter), we used requirements that are composed of exactly two sentences. Note that SOVA does not restrict the number of sentences in each requirement.

File 1:
A1 After the system verifies the purchase's payments details, the supplier confirms the purchase. The system asks for shipment details.
A2 The system supports paying with credit cards. If a customer pays with a credit card, the system approves first the payment by contacting the credit card company.
A3 When the system completes recording an order, the supplier can ship the ordered products to the customer. The system sends the shipping documents via email.
A4 When the system finalizes a software order details, the supplier ships the ordered product via email. The system sends the shipping documents.
A5 The system supports different shipping options. However, if a customer buys a very small product, the system supports only air mail shipping.
A6 The system displays the available products. When a registered customer buys a product, the system updates the inventory.
A7 The system supports paying with gift cards. If a customer pays with a gift card, the system supports only land shipping.
A8 A customer can track the purchase status. The system provides details on the product delivery status.
A9 The system presents the product return page. If the customer returns a product, the system updates the inventory.
A10 The system presents the product page. When a supplier enters new products, the system updates the catalog.
A11 The system enables customers writing reviews on products. When a customer reviews a product, the system sends the product review to the relevant supplier.

File 2:
B1 When a supplier receives new products, he enters the new products to the system. The system updates the catalog.
B2 When the system presents the available products list, a customer can purchase a product. The system updates the shopping cart.
B3 The system presents the ordering page. If a customer purchases a product, the system updates the inventory.
B4 When the system finalizes a software order details, the supplier ships the ordered product via email. The system updates the product delivery status.
B5 When the system validates purchase details, the supplier confirms the purchase. The system clears the shopping cart.
B6 The system supports paying with PayPal. If a customer pays with PayPal, the System verifies the PayPal payment information.
B7 The system provide tracking options. When a customer tracks the product's order status, the system presents the shipment details.
B8 The system can handle product return. If a customer returns a damaged product, the system sends a negative review on the product to the supplier.
B9 The system allows canceling an order. If a customer cancels an order that has not been shipped, the system refunds the payment.
B10 The system allows canceling an order. When a customer cancels an order that has been shipped, the system updates the payment by including the goods return impact.
B11 The system can maintain product reviews. If a registered customer reviews a product, the system stores the product review, by product category.

Figure 4. (Partial) requirements files of e-shop products

Figure 5. A feature model automatically generated for a structural perspective profile

Figure 6. A feature model automatically generated for a functional perspective profile

Figure 7. The feature model created using the ArborCraft tool [22]

As noted in the related work section, ArborCraft follows a mixed functional and structural perspective. ArborCraft further uses the entire requirement text to calculate similarity, while SOVA refers to the extracted behavioral vectors and the weights set in the perspective profile. Another difference is that ArborCraft calculates similarity following a purely semantic approach using LSA, while SOVA computes similarity taking into account ontological considerations besides the semantic ones. Finally, in order to avoid comparison bias due to visualization differences, we manually recreated the generated ArborCraft feature model in FeatureIDE. (Note that in the available version of the ArborCraft tool, the automatic naming option is not supported; instead, the nodes in the output get dummy names of feature1, feature2, and so on.) The feature model created by ArborCraft for the given set of requirements is presented in Figure 7. As can be seen, the resulting feature model differs from the models created by SOVA. To analyze the differences between the models, Table 3 shows, for each requirement, its similar requirements according to the three generated models. For 41% of the requirements, ArborCraft followed neither a structural nor a functional perspective; for 32% it followed a structural perspective; and for 27% it followed a functional perspective. To better understand the meaning behind these numbers, consider requirement B9. This requirement is grouped with requirements B6, B10, and A2 in the structural feature model, since these requirements all relate to payment. In the functional feature model, on the other hand, requirement B9 is considered similar to requirement B10, as they both refer to order cancellation. From a functional point of view, requirements A2 and B6 are not considered similar to the latter two requirements, as they refer to paying. ArborCraft grouped requirements B9 and B10 together, as the overall semantic similarity of the whole texts of those requirements is 0.89 (according to LSA). As another example, consider requirement A3.
From a structural point of view, it is considered similar to requirement A4, as they both refer to shipping documents. In the functional perspective, however, requirement A3 is grouped not only with requirement A4 but also with requirement B4, as they all refer to product shipping. Note that requirement B4 refers to the product delivery status, which is not a shipping document; hence, it was not clustered with requirements A3 and A4 in the structural view. ArborCraft also groups requirements A3, A4, and B4, but in two steps: first, requirements A4 and B4, whose LSA similarity is higher, are grouped, and then requirement A3 is grouped with the parent of requirements A4 and B4 (such groupings at higher levels of the feature model are indicated in brackets in Table 3).
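The two-step grouping just described is characteristic of agglomerative hierarchical clustering, which is how tools in this space build feature trees bottom-up. A minimal average-linkage sketch; the similarity values are illustrative stand-ins, not ArborCraft's actual LSA scores:

```python
from itertools import combinations

def average_linkage(cluster_a, cluster_b, sim):
    """Average similarity over all cross-cluster leaf pairs."""
    pairs = [(x, y) for x in cluster_a for y in cluster_b]
    return sum(sim[frozenset(p)] for p in pairs) / len(pairs)

def agglomerate(items, sim):
    """Repeatedly merge the two most similar clusters; return the merge history.
    Each history entry corresponds to one internal node of the feature tree."""
    clusters = [frozenset([i]) for i in items]
    history = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], sim))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        history.append((set(a), set(b)))
    return history

# Illustrative pairwise similarities: A4-B4 is highest, so those two merge
# first, and A3 then joins their parent -- the two-step grouping noted above.
sim = {frozenset({"A4", "B4"}): 0.95,
       frozenset({"A3", "A4"}): 0.80,
       frozenset({"A3", "B4"}): 0.70}
steps = agglomerate(["A3", "A4", "B4"], sim)
# steps[0] merges A4 with B4; steps[1] merges A3 with the {A4, B4} cluster
```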

As a last example, consider requirement B2, which from a structural point of view is similar to requirement B5 (they both refer to shopping carts). However, from a functional perspective, these two requirements are different: requirement B2 refers to order purchasing, while requirement B5 refers to purchase confirmation. Functionally, requirement B2 is similar to requirements B3 and A6, which also refer to product buying. ArborCraft grouped requirement B2 completely differently: it is grouped with requirement B6 (under feature30, as their similarity value is 0.92). These two requirements are also clustered with A9, A10, B1, and B3 under feature30 (although not at the same hierarchy level).

Table 3. Generated feature models comparison

| Req # | Similar to… SOVA, structural (Figure 5) | Similar to… SOVA, functional (Figure 6) | Similar to… ArborCraft (Figure 7) | ArborCraft is similar to FM… |
|-------|------------------------------------------|------------------------------------------|------------------------------------|-------------------------------|
| A1 | B7 | B5 | – | none |
| A2 | B6, B9, B10 | A7, B6 | – | none |
| A3 | A4 | A4, B4 | (A4, B4) | fnc |
| A4 | A3 | A3, B4 | B4, (A3) | fnc |
| A5 | A7 | (A6, B2, B3) | – | none |
| A6 | A9, B3, (A10, B1) | B2, B3, (A5) | B2, (A9, A10, B1, B3) | str |
| A7 | A5 | A2, B6 | – | none |
| A8 | B4 | B7 | – | none |
| A9 | A6, B3, (A10, B1) | B8 | B3, (A6, A10, B1, B2) | str |
| A10 | B1, (A6, A9, B3) | B1 | B1, (A6, A9, B2, B3) | fnc |
| A11 | B8, B11 | B11 | B8 | str |
| B1 | A10, (A6, A9, B3) | A10 | A10, (A6, A9, B2, B3) | fnc |
| B2 | B5 | B3, A6, (A5) | B6, (A9, A10, B1, B3) | fnc |
| B3 | A6, A9, (A10, B1) | B2, A6, (A5) | A9, (A6, A10, B1, B2) | str |
| B4 | A8 | A3, A4 | A4, (A3) | str |
| B5 | B2 | A1 | – | none |
| B6 | A2, B9, B10 | A2, A7 | – | none |
| B7 | A1 | A8 | – | none |
| B8 | A11, B11 | A9 | A11 | str |
| B9 | A2, B6, B10 | B10 | B10 | fnc |
| B10 | A2, B6, B9 | B9 | B9 | str |
| B11 | A11, B8 | A11 | – | none |
| Total | | | | str: 32%, fnc: 27%, none: 41% |

str – ArborCraft FM follows a structural perspective; fnc – ArborCraft FM follows a functional perspective; none – different grouping policy. Groupings occurring at higher levels of the feature model are indicated in brackets.

Note that besides the differences in the similarity calculations that affected the clustering outcomes, there are variability-related differences between SOVA and ArborCraft. ArborCraft tends to use mandatory and optional relations (in the low levels of the feature models), while we constrain variability by XOR and OR relations. Therefore, our solution can be considered stricter, allowing fewer valid configurations to be deduced from the input files.
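The difference in strictness can be made concrete by counting configurations. For a parent with n child features, making all children optional admits 2^n child selections, OR admits 2^n - 1 (at least one child), and XOR admits exactly n. A toy enumerator for a single parent with three children (the feature names are invented for illustration):

```python
from itertools import combinations

children = ["credit_card", "gift_card", "paypal"]

def configurations(relation):
    """Enumerate child selections valid under a given variability relation."""
    subsets = [set(c) for r in range(len(children) + 1)
               for c in combinations(children, r)]
    if relation == "optional":            # any subset, including none
        return subsets
    if relation == "or":                  # at least one child selected
        return [s for s in subsets if s]
    if relation == "xor":                 # exactly one child selected
        return [s for s in subsets if len(s) == 1]
    raise ValueError(relation)

# XOR admits 3 configurations, OR 7, optional 8 -- XOR and OR constrain the
# product line more tightly than making every child optional.
counts = {r: len(configurations(r)) for r in ("xor", "or", "optional")}
```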

6. SUMMARY AND FUTURE WORK
We suggested SOVA – a Semantic and Ontological Variability Analysis method – for constructing feature models following different perspectives. The method uses NLP techniques and an ontological model to parse the requirements and extract their behavioral components. It then calculates the semantic similarity of the different behavioral components. Finally, it presents the variability analysis results in feature models using different perspective profiles, to reflect different stakeholders' needs and preferences. In this paper we focused on two perspectives, structural and functional, which are widely observed in existing feature models. However, any other (pure or mixed) perspective can be applied, generating explainable feature models that can be utilized for different tasks.
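The behavioral components and perspective profiles the method relies on can be sketched as follows; the weighted-average similarity mirrors the paper's description, while the Jaccard stand-in and the concrete weight values are illustrative assumptions (SOVA itself combines semantic and ontological measures):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Behavior:
    """Behavioral components extracted from a requirement: the initial state
    before the behavior, the triggering external events, and the final state."""
    initial_state: str
    external_events: str
    final_state: str

def jaccard(x: str, y: str) -> float:
    """Word-overlap stand-in for the real semantic similarity measure."""
    xs, ys = set(x.lower().split()), set(y.lower().split())
    return len(xs & ys) / len(xs | ys) if xs | ys else 0.0

def behavioral_similarity(a: Behavior, b: Behavior,
                          weights: tuple,
                          component_sim: Callable[[str, str], float] = jaccard) -> float:
    """Weighted average of per-component similarities; the weights form the
    perspective profile (e.g., emphasizing events yields a functional view)."""
    w_init, w_event, w_final = weights
    return (w_init * component_sim(a.initial_state, b.initial_state)
            + w_event * component_sim(a.external_events, b.external_events)
            + w_final * component_sim(a.final_state, b.final_state))

functional_profile = (0.1, 0.5, 0.4)  # assumed weights emphasizing events/actions
```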

In the future we would like to extend the method to support more sophisticated mixed perspectives that structure the features following predefined patterns, such as first clustering according to the objects and then according to the actions, or first clustering according to the stakeholders (agents) and then according to the actions. It would also be interesting to explore merging several perspectives into a single variability model. Additionally, we intend to empirically examine the comprehensibility of the generated models to different stakeholders for performing various tasks.

7. REFERENCES
[1] Acher, M., Baudry, B., Heymans, P., Cleve, A., and Hainaut, J. L. (2013). Support for reverse engineering and maintaining feature models. Proceedings of the 7th VaMoS Workshop (p. 20). ACM.
[2] Acher, M., Cleve, A., Perrouin, G., Heymans, P., Vanbeneden, C., Collet, P., and Lahire, P. (2012). On extracting feature models from product descriptions. Proceedings of the 6th VaMoS Workshop, ACM Press, pp. 45-54.
[3] Bécan, G., Acher, M., Baudry, B., and Ben Nasr, S. (2013). Breathing Ontological Knowledge into Feature Model Management. Technical report, Inria.
[4] Berger, T., Rublack, R., Nair, D., Atlee, J. M., Becker, M., Czarnecki, K., and Wąsowski, A. (2013). A survey of variability modeling in industrial practice. Proceedings of the Seventh International Workshop on Variability Modeling of Software-intensive Systems, pp. 7:1-7:8. ACM.
[5] Bunge, M. (1977). Treatise on Basic Philosophy, vol. 3, Ontology I: The Furniture of the World. Reidel, Boston, Massachusetts.
[6] Bunge, M. (1979). Treatise on Basic Philosophy, vol. 4, Ontology II: A World of Systems. Reidel, Boston, Massachusetts.
[7] Davril, J. M., Delfosse, E., Hariri, N., Acher, M., Cleland-Huang, J., and Heymans, P. (2013). Feature model extraction from large collections of informal product descriptions. The 9th Joint Meeting on Foundations of Software Engineering, pp. 290-300.
[8] Dumitru, H., Gibiec, M., Hariri, N., Cleland-Huang, J., Mobasher, B., Castro-Herrera, C., and Mirakhorli, M. (2011). On-demand feature recommendations derived from mining public product descriptions. 33rd IEEE International Conference on Software Engineering (ICSE'11), pp. 181-190.
[9] Ferrari, A., Spagnolo, G. O., and Dell'Orletta, F. (2013). Mining commonalities and variabilities from natural language documents. Proceedings of the 17th International Software Product Line Conference, pp. 116-120. ACM.
[10] Gildea, D. and Jurafsky, D. (2002). Automatic Labeling of Semantic Roles. Computational Linguistics, 28(3), pp. 245-288.
[11] Gomaa, W. H. and Fahmy, A. A. (2013). A Survey of Text Similarity Approaches. International Journal of Computer Applications, 68(13), pp. 13-18.
[12] Itzik, N. and Reinhartz-Berger, I. (2014). SOVA – A Tool for Semantic and Ontological Variability Analysis. Proceedings of the CAiSE 2014 Forum, pp. 177-184.
[13] Kastner, C., Thum, T., Saake, G., Feigenspan, J., Leich, T., Wielgorz, F., and Apel, S. (2009). FeatureIDE: A tool framework for feature-oriented software development. 31st IEEE International Conference on Software Engineering (ICSE'09), pp. 611-614.
[14] Landauer, T. K., Foltz, P. W., and Laham, D. (1998). Introduction to Latent Semantic Analysis. Discourse Processes, 25, pp. 259-284.
[15] Malik, R., Subramaniam, V., and Kaushik, S. (2007). Automatically Selecting Answer Templates to Respond to Customer Emails. International Joint Conference on Artificial Intelligence (IJCAI'07), pp. 1659-1664.
[16] McGregor, J. D., Muthig, D., Yoshimura, K., and Jensen, P. (2010). Guest Editors' Introduction: Successful Software Product Line Practices. IEEE Software, 27(3), pp. 16-21.
[17] Mihalcea, R., Corley, C., and Strapparava, C. (2006). Corpus-based and knowledge-based measures of text semantic similarity. The 21st National Conference on Artificial Intelligence (AAAI'06), Vol. 1, pp. 775-780.
[18] Niu, N. and Easterbrook, S. (2008). Extracting and modeling product line functional requirements. 16th IEEE International Requirements Engineering Conference (RE'08), pp. 155-164.
[19] Reinhartz-Berger, I., Itzik, N., and Wand, Y. (2014). Analyzing Variability of Software Product Lines Using Semantic and Ontological Considerations. Proceedings of the 26th International Conference on Advanced Information Systems Engineering (CAiSE'14), LNCS 8484, pp. 150-164.
[20] S.P.L.O.T. Software Product Lines Online Tools, http://www.splot-research.org/.
[21] She, S., Lotufo, R., Berger, T., Wasowski, A., and Czarnecki, K. (2011). Reverse engineering feature models. 33rd IEEE International Conference on Software Engineering (ICSE'11), pp. 461-470.
[22] Weston, N., Chitchyan, R., and Rashid, A. (2009). A framework for constructing semantically composable feature models from natural language requirements. Proceedings of the 13th International Software Product Line Conference, pp. 211-220.
[23] WordNet. http://wordnet.princeton.edu/
[24] Wu, Z. and Palmer, M. (1994). Verb semantics and lexical selection. The 32nd Annual Meeting of the Association for Computational Linguistics, pp. 133-138.