TECHNIA – International Journal of Computing Science and Communication Technologies, VOL. 2, NO. 1, July 2009. (ISSN 0974-3375)

Building Software Reuse Library with Efficient Keyword based Search Mechanism

Rajender Nath and Harish Kumar

Department of Computer Science and Applications, Kurukshetra University, Kurukshetra, India. [[email protected], [email protected]]

Abstract - Software reuse is the use of existing software components to build a software system. Effective storage and retrieval of software components is essential in the reuse process. Researchers have developed a number of software reuse techniques for storage and retrieval of components. No one technique is complete on its own; every technique has its own merits and demerits. This paper presents a new approach for building a software reuse library based on keyword searching for storage and fast retrieval of software components. Keywords are matched with the given query character by character, so that a partial match is also given due weightage. To make this paper beneficial to new researchers, the algorithms are explained in detail.

I. INTRODUCTION
The concept of software reuse came into existence about four decades ago, when McIlroy proposed applying reuse to software at a NATO software engineering conference in 1968. No significant work was carried out for about one decade. About thirty years ago, work on reuse-based software technology began [1]. Software reuse is the use of already developed software components. A software component may be a specification document, a design component, a code component or a test case. A good reuse process requires effective classification, storage and fast retrieval of components. This paper presents a simple keyword-based approach to store and retrieve software components effectively. Even if a given specification/query word does not fully match the keywords of the components stored in the library, the number of characters of the query word that match the keywords of a component is calculated and used to determine the extent of the match.

II. SOFTWARE REUSE CLASSIFICATION AND RETRIEVAL TECHNIQUES
“A classified collection is not useful if it does not provide the search-and-retrieval mechanism to use it” [2]. At different times, different software reuse classification and retrieval techniques have been proposed and implemented. Based on researchers’ criteria and available software reuse systems, the methods of software reuse classification and retrieval have been classified differently, but with minor differences [3].

Ostertag et al. [4] classified these methods into three types: 1) free-text keywords, 2) faceted index, and 3) semantic-net based. In free-text keyword based approaches, information retrieval and indexing technology are used to automatically extract keywords from software documentation and index items. The free-text keyword approach is simple, and it is an automatic process. In faceted index approaches, keywords are extracted from program descriptions and documentation by experts and arranged by facets into a classification scheme, which is used as a standard descriptor for software components. To remove ambiguities, a vocabulary is derived for each facet so that a keyword can only be matched within the context of its facet. Faceted classification and retrieval has proved very effective in retrieving reusable components from repositories, but this approach is labor intensive. In semantic-net based approaches, a large knowledge base, a natural language processor, and a semantic retrieval algorithm are required to semantically classify and retrieve reusable software components.

In their faceted classification scheme for software reuse, Prieto-Diaz and Freeman [2] proposed facets, extracted by experts, that describe features of components. Features such as the component’s functionality, how to run the component, and implementation details are used as component descriptors. A weighted conceptual graph, which measures closeness by the conceptual distance among terms in a facet, is used to determine the similarity between a query and software components.

Search and retrieval approaches have been classified by Mili et al. [5] into four types: 1) simple keyword and string match; 2) faceted classification and retrieval; 3) signature matching; and 4) behavior matching. A software library was designed by Mili et al. [6] in which software components are described in a formal specification: a specification is represented by a pair (S, R), where S is a set of specifications and R is a relation on S. This approach is classified as a keyword-based retrieval system. Matching recall is enhanced with sufficient precision: a match is accepted as long as a specification key can refine a search argument. In addition, there are two retrieval operations: a) exact retrieval and b) approximate retrieval. If exact retrieval yields no result, approximate retrieval is performed. Such


retrieved components require some modification to satisfy the specification.

Vitharana et al. [1, 7] proposed a scheme to classify and describe business components within a knowledge-based repository for storage and retrieval. This scheme was divided into two parts: 1) classification and coding for business components and 2) a knowledge-based repository for storage and retrieval. Similar components are classified into a single group and assigned coded symbols. Thus all components are classified into groups of components having similar functionality. They borrowed the idea [1] from the facet-based scheme to describe features of reusable software artifacts. Their classification and coding scheme considers higher-level, business-oriented features. In this scheme, a business component is described by identifiers (structured information such as name and industry type), followed by descriptor facets (unstructured information such as rules and functionality). In their knowledge-based repository design, a database is used as the repository because it is efficient and effective for storage, search, and retrieval; eXtensible Markup Language (XML) is used to code the knowledge base because XML suits an extensible number of descriptor facets.

Girardi and Ibrahim [8] proposed an approach for retrieving software artifacts that is based on natural language processing. In this approach, user queries and software component descriptions are expressed in natural language.

Sugumaran and Storey [1, 9] presented a semantics-based approach for component retrieval. The approach employs a domain ontology to provide semantics in refining user queries expressed in natural language and in matching a user query against components in a reusable repository. The three major steps in this approach are: 1) initial query generation, 2) query refinement, and 3) component retrieval and feedback. This approach includes a natural language interface, a domain model, and a reusable repository.

III. PROPOSED APPROACH

a) Building Component Library
Software components are stored in the form of component files. Associated with the component files, an index table is maintained. Some keywords related to each component are also stored in the table [10].

Sr#   Component file   Keywords showing functionality
1.    File1            KW11, KW12, KW13, KW14
2.    File2            KW21, KW22
3.    File3            KW31, KW32, KW33
4.    File4            KW41, KW42, KW43, KW44, KW45

For searching, a keyword-based search function is used to retrieve the required component. The input to this function is a specification (a multi-word query) given by the user. The search function returns the component file name from the table. A link is established from the returned file name to the component file in the library. A user can decide by checking the name of the component file and select the component file by clicking on the link. The component file can be opened, checked for suitability, modified according to need, and saved by the user at a desired location.

b) Storage of Component Files
All the components of the library are stored in a folder in the memory of the computer. The names of the component files and their keywords are stored in a two-dimensional matrix known as the index table. The keywords of a component file are stored as a single multi-word string. A delimiter can be used between two keywords; in the table below, ‘*’ is used as the delimiter. Corresponding to each component file, a counter (a float variable) is also kept to keep account of the keywords matched. Initially the value of each counter is set to zero.

Sr#   Component file   Keywords               Counter
1.    File1            KW11*KW12*KW13*KW14    0
2.    File2            KW21*KW22*KW23         0
.     .                .                      .
.     .                .                      .

c) Search Mechanism
The multi-word query (specification) entered by the user is stored in string-type array elements QUERY[1], QUERY[2] and so on. A list of common words such as “in”, “on”, “the”, “of” etc. is stored at the time of library construction, and these common words are excluded from the query. E.g., “Reuse of Software” will be stored as QUERY[1] = “Reuse” and QUERY[2] = “Software”. These array elements are compared with the keywords of the component files one by one. When QUERY[i] matches any of the keywords of a component file, the value of the corresponding counter is incremented by 1. A fractional match is also taken into consideration; this is done by comparing QUERY[i] with a keyword character by character. Let the number of characters in QUERY[i] be y and the number of characters matched with a keyword of a particular component be x (x is always less than or equal to y). The fraction of match z is calculated as z = x / y, and the value of the corresponding counter is incremented by z. First QUERY[1] is searched in the first row of keywords. Then QUERY[2] is searched linearly in this row of keywords, and so on. After updating the value of the first counter, the same procedure is applied on the second


row and so on. The entire index table is then sorted on the counter column in descending order. This places the most relevant component file, with the highest counter value, in the first position, the next most relevant component in the second position, and so on. All components with a positive counter value are accessed, and components with a counter value of zero are discarded.
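The search mechanism described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper does not define exactly how characters are "matched", so the sketch assumes a position-by-position, case-insensitive comparison, and it assumes that each query word contributes its best fraction of match over the keywords in a row. The names `STOP_WORDS`, `match_fraction` and `search` are hypothetical.

```python
# Common words excluded from queries (the examples given in the text).
STOP_WORDS = {"in", "on", "the", "of"}


def match_fraction(query_word, keyword):
    """Return z = x / y, where y is the length of the query word and x is
    the number of positions at which query word and keyword agree.
    Assumption: comparison is position by position and case-insensitive."""
    q, k = query_word.lower(), keyword.lower()
    if not q:
        return 0.0
    x = sum(1 for a, b in zip(q, k) if a == b)
    return x / len(q)


def search(index_table, query, delimiter="*"):
    """index_table is a list of (file_name, keyword_string) rows.
    Returns (file_name, counter) pairs sorted by counter, descending,
    with zero-counter rows discarded."""
    words = [w for w in query.split() if w.lower() not in STOP_WORDS]
    results = []
    for file_name, keyword_string in index_table:
        keywords = [k.strip() for k in keyword_string.split(delimiter) if k.strip()]
        counter = 0.0
        for word in words:
            # Assumption: take the best-matching keyword in this row.
            counter += max((match_fraction(word, k) for k in keywords), default=0.0)
        if counter > 0:
            results.append((file_name, counter))
    results.sort(key=lambda row: row[1], reverse=True)
    return results


table = [
    ("File1", "reuse*software*library"),
    ("File2", "sorting*array"),
    ("File3", "xyz"),
]
ranked = search(table, "Reuse of Software")
```

With this sample table, File1 matches both query words fully, File2 matches only part of "Software" through its keyword "sorting", and File3 is discarded because its counter stays at zero.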

d) Updating Component Library
A new component can be added to the library by storing the component in the library, making its entry in the index table, and establishing a link from the index table to the memory location in the library where it is actually stored. Similarly, when a component is to be deleted from the library, it is removed from physical memory along with its entry in the index table and the link from the index table to the memory location. A component can be stored anywhere in the library where free space is available. To make insertion easy, the entry of a new component is made at the last position of the index table. This does not disturb the rest of the entries in the index table and does not affect efficiency, as the index table is searched linearly. But when the entry of a component is deleted from the index table, the remaining entries have to be shifted up one position to avoid a null row in the table.

Algorithm for Library Construction
1. Set SIZE=100 and SR=0 ; SIZE is the size of the library, any integer value can be taken instead of 100
2. Declare parallel arrays
   a. S_NO[SIZE] ; int array to store Sr#
   b. STORED_COMPONENT[SIZE] ; string array to store file names
   c. STORED_KEYWORD[SIZE] ; string array to store keywords associated with each component file
   d. COUNTER[SIZE] ; float array to store the count of matches
3. Set CH=‘y’
4. While (CH == ‘y’), repeat steps 5 to 12
5.    Set SR=SR+1
6.    Print ‘Enter component file name’
7.    Read STORED_COMPONENT[SR]
8.    Print ‘Enter keyword’
9.    Read STORED_KEYWORD[SR]
10.   Set COUNTER[SR]=0
11.   Print ‘Want to add another component? (y/n)’
12.   Read CH
   [End of step 4 loop]
13. Exit.

Algorithm for Searching
1. Declare one dimensional array QUERY[ ] of suitable length to store the words of the given query
2. Repeat for i=1 to n ; n is the total number of components in the library
      Set COUNTER[i] = 0
   [End of loop]
3. Set i=1
4. Print ‘Enter your specification/query’
5. Repeat steps 6 and 7 while (Entered key =/= Return key)
6.    Read QUERY[i]
7.    i=i+1
   [End of step 5 loop]
8. Set m=i-1 ; m is the number of keywords entered by the user
9. Repeat steps 10 to 14 for i=1 to n
10.   Repeat steps 11 to 14 for j=1 to m
11.      Calculate the no. of characters in QUERY[j]; let it be y
12.      Compare QUERY[j] with STORED_KEYWORD[i] character by character; let no. of matched characters=x
13.      Calculate float value z = x / y
14.      COUNTER[i] = COUNTER[i] + z
      [End of step 10 loop]
   [End of step 9 loop]
15. Sort the table on COUNTER column in descending order
16. Set i=1
17. Repeat steps 18 and 19 while (COUNTER[i] =/= 0)
18.   Print STORED_COMPONENT[i]
19.   Calculate i = i + 1
   [End of step 17 loop]
20. Exit.
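The library construction and updating steps described in this section can be sketched in Python as follows. This is only an illustrative sketch: the class `IndexTable` and its methods `add` and `remove` are hypothetical names, the index table is held in memory as a list of rows, and the actual storage and deletion of component files on disk are left out.

```python
class IndexTable:
    """In-memory index table: one row per component file, holding the
    file name, a delimiter-joined keyword string, and a float counter."""

    def __init__(self, delimiter="*"):
        self.delimiter = delimiter
        self.rows = []  # each row: [file_name, keyword_string, counter]

    def add(self, file_name, keywords):
        """Append the new entry at the last position, so existing
        entries are left undisturbed. The counter starts at zero."""
        self.rows.append([file_name, self.delimiter.join(keywords), 0.0])

    def remove(self, file_name):
        """Delete the entry for file_name. Later entries move up one
        position, so no null row is left in the table.
        Returns True if an entry was removed, False otherwise."""
        for i, row in enumerate(self.rows):
            if row[0] == file_name:
                del self.rows[i]
                return True
        return False


library = IndexTable()
library.add("File1", ["KW11", "KW12", "KW13", "KW14"])
library.add("File2", ["KW21", "KW22", "KW23"])
library.add("File3", ["KW31", "KW32", "KW33"])
library.remove("File2")
remaining = [row[0] for row in library.rows]
```

A Python list shifts the later rows up automatically on deletion, which corresponds to the shifting step described in the text; in a fixed-size array, as used in the algorithm listing, that shift would have to be written out explicitly.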

The search mechanism described above is based on blind search. The efficiency of the search mechanism can be improved by classifying the components into different categories. A required component can then be searched for within its particular category instead of in the whole library. This saves time and improves the efficiency of the search. Efficiency can also be improved by using a fast sorting method for sorting the index table.

IV. CONCLUSION
As in other engineering disciplines, reuse has produced fruitful results in software. The basic step in reusing already developed software artifacts is to build a library of such components. Such a library is not


just a collection of software artifacts; it is built with the objective that its components will be stored and retrieved for the purpose of reuse. Components to be stored are developed so that they become more and more reusable. Building such a reuse library requires a suitable mechanism for storage and retrieval of components. One such approach, based on keyword searching, was described in this paper, with algorithms for building the library and for searching. The fraction of match is also taken into consideration to make retrieval more relevant. One of the main objectives of this paper was to make it useful to new researchers, so the language of the paper and the terminology used in the algorithms were kept simple, and the algorithms were explained in detail.

REFERENCES
[1] Yong-liu, Aiguang-yang; Research and Application of Software-reuse; Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, IEEE, 2007, pp. 588-593.
[2] Prieto-Diaz, Ruben, Freeman, Peter; Classifying Software for Reuse; IEEE Software, 1987, vol. 4, no. 1, pp. 6-16.
[3] Haining Yao, Letha Etzkorn; Towards A Semantic-based Approach for Software Reusable Component Classification and Retrieval; ACM Software Engineering, 2004.
[4] Ostertag, Eduardo, Hendler, James, Prieto-Diaz, Ruben, Braun, Christine; Computing Similarity in a Reuse Library System: an AI-based Approach; ACM Transaction on Software Engineering and Methodology, 1992, vol. 1, no. 3, pp. 205-228.
[5] Mili, Rym, Mili, Ali, Mittermeir, Roland T.; A Survey of Software Storage and Retrieval; Annals of Software Engineering, 1998, vol. 5, no. 2, pp. 349-414.
[6] Mili, Rym, Mili, Ali, Mittermeir, Roland T.; Storing and retrieving software components: a refinement based system; IEEE Transaction on Software Engineering, 1997, vol. 23, no. 7, pp. 445-460.
[7] Vitharana, Padmal, Zahedi, Fatemeh M., Jain, Hemant; Knowledge-based repository scheme for storing and retrieving business components: a theoretical design and an empirical analysis; IEEE Transactions on Software Engineering, 2003, vol. 29, no. 7, pp. 649-664.
[8] Girardi, M.R., Ibrahim, B.; Using English to Retrieve Software; Journal of System and Software, 1995, vol. 30, no. 3, pp. 249-270.
[9] Sugumaran, Vijayan, Storey, Veda C.; A Semantic-Based Approach to Component Retrieval; The DATA BASE for Advances in Information Systems – Summer 2003, Vol. 34, No. 3, pp. 8-24.
[10] Rajender Nath, Harish Kumar; Building Software Reuse Library; 3rd International Conference on Advanced Computing and Communication Technology - ICACCT-08; Asia Pacific Institute of Information Technology, Panipat, India; November 08-09, 2008, pp. 585-587.
