Education and Information Technologies, 2(3), 1997, pp. 219-234

Veda - An On-line Generative Testing System
KSR Anjaneyulu, R Chandrasekar and S Ramani
National Centre for Software Technology
Gulmohar Cross Road No. 9, Juhu
Bombay, India 400 049
Email: fanji, mickey, [email protected]

Abstract
Veda is a generative testing system that supports the design and administration of tests, on-line and off-line. Veda provides tools for test designers to create question generators. Veda authenticates the candidate’s identity, administers a test, evaluates the candidate’s answers and keeps a record of the candidate’s scores. The system also provides facilities for item analysis: statistical indices can be computed for individual questions, based on data from tests administered, and used for item evaluation and improvement. An interesting feature of Veda’s design is its built-in security. Cryptographic techniques have been used to create a system offering a high degree of protection against leakage, theft or unauthorized modification. Veda was developed in C under the UNIX operating system, and is in heavy use.

Keywords: Generative, Testing

1 Introduction
Testing has a crucial role in on-line instruction: it permits reliable sensing of a candidate’s level of comprehension. Within the area of testing, our work focusses on generative techniques [3], and demonstrates that the technology to support the widespread use of generative tests has come of age. Our emphasis has been on tests involving multiple questions, used independently of any tutoring system. The tests can be anywhere from a few minutes to a few hours long, and are generally administered and graded on-line by the system.

Similar tests, administered in a more conventional style, play an important role in education at all levels today. Teachers require tools to help them design and construct these tests, and sophisticated computer-based tools have become increasingly available for this purpose. Computer-assisted testing techniques have been used primarily to help teachers construct tests and analyse them off-line. One significant use of computers in this area has been question banking: questions are kept in an on-line database [1,2] and accessed using selected attributes. Question banking systems provide for producing scrambled tests, by randomizing the order of questions and the order of alternatives in multiple-choice questions. They also provide for formatting and printing question papers.

However, computers can be used far more effectively by exploiting their interactive capabilities. Computers can, and should, be used to administer on-line tests, evaluate the candidate’s answers and award marks. Some of the advantages of automated testing are listed below:
a) Testing can take place throughout the year
b) Professional effort (for test design, evaluation, etc.) is not required every time the test is conducted
c) Automated scoring is possible without restricting oneself to multiple-choice questions
d) Test results of candidates can be made available immediately
e) Such tests do not require much change in existing practices
f) Useful statistics can be gathered on-line, and used to improve the tests

In addition, the use of generative techniques provides the following advantages:
a) If a candidate has to be tested repeatedly on a concept, generative mechanisms provide variety. They avoid the monotony and pointlessness of giving him the same questions again and again.
b) The test continues to be useful even after it has been exposed to several thousand candidates.

Further, on-line testing opens up a whole new spectrum of possibilities. It is possible to test a candidate in situations that cannot normally be reproduced in a traditional test. For example, there could be a simulation of an experiment, and the candidate could be tested on his understanding of that experiment. It is also possible to do what cannot normally be done in a traditional test. For example, a candidate can be given a second chance to answer a question (possibly with lower credit) if he is wrong on the first attempt. The computer can also advise a candidate to go to the next question when he has spent too much time on the current one, or offer him a hint at that point.

Veda is a generative on-line testing system developed at the National Centre for Software Technology (NCST). It has been used primarily to test candidates who apply for admission to NCST’s courses in Software Technology. Veda is also used in many courses taught at NCST, covering subjects such as Pascal, UNIX, AI Programming and Expert Systems.

In this paper, we describe Veda’s design, implementation and use. We first present the motivation for the work and then show how tests are created using Veda. We then describe the overall design of Veda. Later sections deal with the security features offered by Veda, the provision to compute statistical indices for item analysis, our experience with on-line testing, the limitations of Veda, possible extensions and, finally, our conclusions.

2 A Typical Application of Veda
Veda has been used extensively, over many years, to test about a thousand candidates a year. These candidates take an entrance examination for NCST’s course in Software Technology; about 90 of them are finally selected for admission. The entrance examination is split into two parts. The first part is a test of general aptitude that serves as a qualifying examination for the second part. The second part is a test of basic computer science and programming knowledge, to be taken after the applicant has read recommended books on Computer Science and Pascal. On-line tests based on Veda are available to cover both parts. In Part I, candidates are tested in the following domains: quantitative reasoning, concepts relating to high school maths, visuo-spatial reasoning, logical reasoning and verbal ability. Part II, covering basic computer science and programming knowledge, was automated later.

3 Creating Generative Questions in Veda
Most questions in Veda are of the generative type. This section deals with the structure of generative questions and the process of creating them; later sections describe the Veda system itself. Consider the following question in geography:
What is the capital of India?
This question can be generalised to make it generative. The word ‘India’ can be made into a variable (a ‘slot’) so that the same generator can generate questions about the capitals of different countries, thus:
What is the capital of ?


Individual instances are generated by substituting suitable values in place of the slot . This generator tests for the same type of information, i.e., the capital city of a country. It can generate different question instances by choosing a filler for the slot from a given set of countries; the set [India, Pakistan, Sri Lanka, Bangladesh] is one such set. The answer depends on the filler selected for the slot . In general, there can be more than one slot in a question template, and the interdependence among slots can then be complex.

Now let us consider a slightly more complex example, which also illustrates how a generator evolves out of an instance. Consider a typical question on polygons:
The shape of a field is a square. Its side measures 100 metres. What is the area of the field?
This question can be made generative by identifying the parts that can be changed. For example, the word ‘square’ can be changed to ‘rectangle’, ‘rhombus’, etc. By making that part of the question a slot, we get a template that can generate a question on any quadrilateral:
The shape of a field is a <quadrilateral>. Its side measures 100 metres. What is the area of the field?
We can now specify the set of quadrilaterals [square, rectangle, rhombus, ...] as the choice set for the slot <quadrilateral>. But the question is not yet complete. If the filler for the slot is anything other than ‘square’, the area cannot be computed from the information given so far, because the description gives only one dimension, and other quadrilaterals such as a rectangle can be specified fully only with two dimensions. The question therefore has to give more information, depending on the filler for <quadrilateral>. We therefore make the description of the quadrilateral a new slot, <description>, which takes different fillers depending on the filler for <quadrilateral>. The template now becomes:
The shape of a field is <quadrilateral>. <description>. What is the area of the field?
The filler for <description> depends on the filler selected for <quadrilateral>. For each choice of filler for <quadrilateral>, the user has to specify the filler (here, a sub-template) for <description>:
a) If <quadrilateral> is ‘square’, then <description> should be ‘The diagonal of the is ’
b) If <quadrilateral> is ‘rectangle’, then <description> should be ‘It has a diagonal measuring and a side measuring ’
c) If <quadrilateral> is ‘rhombus’, then <description> should be ‘The diagonals of the are and ’
d) If <quadrilateral> is ‘trapezium’, then <description> should be ‘The two parallel sides are and and the distance between them is ’
Now the template is complete, and it can be used to generate questions on different quadrilaterals. We can make the template more general by making the quantity to be computed also a variable, <feature>, instead of always the area. The template now becomes:
The shape of a field is <quadrilateral>. <description>. What is the <feature> of the field?
We can then specify the set [area, perimeter, diagonal] as the choice set for <feature>. We have to be careful in specifying the choice set for <feature>, because the filler selected for <feature> should not be a value given in the description, i.e., in the filler for the slot <description>. For example, if the filler for <description> for a square gives the length of the diagonal, then the filler for <feature> should not be ‘diagonal’; it can be either ‘area’ or ‘perimeter’. One way to ensure this is to select two mutually exclusive sets for <feature> and <description>: none of the parameters in the choice set for <feature> may appear in any of the fillers for the slots in the sub-template for <description>. But this may be too strong a restriction. For example, when the filler for <quadrilateral> is ‘rectangle’ and the filler for <description> gives only the two sides of the rectangle, the filler for <feature> can be anything from the set [diagonal, area, perimeter], depending on the filler for <description>. We can generate meaningful problems by ensuring that the features selected to describe the given object are not the ones about which a question is asked. The filler for <feature> can be anything from the set [area, perimeter, diagonal], provided that the filler selected for <description> does not give that particular value. This can be achieved by using an additional constraint:
when <quadrilateral> is ‘rhombus’, then <feature> is not equal to ‘diagonal’
Thus, the process of creating generative questions involves the following stages:
a) Creating question templates with markers to identify variable parts (i.e. slots),
b) Specifying a set of possible fillers (choice set) for each slot, and the relationships among slots, and
c) Specifying the answer, as a procedure.
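The three stages above can be sketched in C, the language Veda’s generators are written in. This is a minimal, hypothetical sketch, not Veda’s library of generative tools: the names Quad, Feature, DESCRIBES, make_question and generate are invented for illustration. Following the text, the square is described by its diagonal and the rhombus by its two diagonals; for the rectangle the sketch assumes the two-sides description mentioned as an alternative, so any feature may be asked. The DESCRIBES table encodes the constraint that a question never asks for a value its description already gives, and the answer is computed by a procedure.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Choice sets for the <quadrilateral> and <feature> slots. */
typedef enum { SQUARE, RECTANGLE, RHOMBUS, N_QUAD } Quad;
typedef enum { AREA, PERIMETER, DIAGONAL, N_FEATURE } Feature;

static const char *QUAD_NAME[N_QUAD] = {"square", "rectangle", "rhombus"};
static const char *FEATURE_NAME[N_FEATURE] = {"area", "perimeter", "diagonal"};

/* 1 if the (assumed) <description> filler already gives that feature:
   square -> its diagonal, rectangle -> its two sides, rhombus -> both diagonals. */
static const int DESCRIBES[N_QUAD][N_FEATURE] = {
    /* AREA PERIMETER DIAGONAL */
    {0, 0, 1},  /* square */
    {0, 0, 0},  /* rectangle */
    {0, 0, 1},  /* rhombus */
};

typedef struct {
    char text[200];
    double answer;
} Question;

/* Square root by Newton's iteration, so no maths library is needed. */
static double newton_sqrt(double x) {
    double r = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 60; i++)
        r = 0.5 * (r + x / r);
    return r;
}

/* Fills *out and returns 1, or returns 0 when the slot combination breaks
   the constraint "never ask for a value the description already gives". */
static int make_question(Quad quad, Feature feature, double p, double q,
                         Question *out) {
    char desc[100];
    assert(out != NULL && quad < N_QUAD && feature < N_FEATURE);
    if (DESCRIBES[quad][feature])
        return 0;
    switch (quad) {
    case SQUARE:    /* p = diagonal */
        snprintf(desc, sizeof desc, "The diagonal of the square is %g metres", p);
        out->answer = feature == AREA ? p * p / 2.0
                                      : 2.0 * newton_sqrt(2.0) * p;
        break;
    case RECTANGLE: /* p, q = sides */
        snprintf(desc, sizeof desc, "Its sides measure %g and %g metres", p, q);
        out->answer = feature == AREA      ? p * q
                    : feature == PERIMETER ? 2.0 * (p + q)
                                           : newton_sqrt(p * p + q * q);
        break;
    default:        /* RHOMBUS: p, q = diagonals */
        snprintf(desc, sizeof desc,
                 "The diagonals of the rhombus are %g and %g metres", p, q);
        out->answer = feature == AREA ? p * q / 2.0
                                      : 2.0 * newton_sqrt(p * p + q * q);
        break;
    }
    snprintf(out->text, sizeof out->text,
             "The shape of a field is a %s. %s. What is the %s of the field?",
             QUAD_NAME[quad], desc, FEATURE_NAME[feature]);
    return 1;
}

/* Picks fillers at random, retrying until the constraint holds. */
static Question generate(unsigned seed) {
    static const double SIZES[] = {30.0, 40.0, 60.0, 80.0, 100.0};
    Question qn;
    srand(seed);
    for (;;) {
        Quad quad = (Quad)(rand() % N_QUAD);
        Feature feature = (Feature)(rand() % N_FEATURE);
        double p = SIZES[rand() % 5];
        double q = SIZES[rand() % 5];
        if (make_question(quad, feature, p, q, &qn))
            return qn;
    }
}
```

For instance, make_question(RHOMBUS, PERIMETER, 6, 8, &qn) produces "The shape of a field is a rhombus. The diagonals of the rhombus are 6 and 8 metres. What is the perimeter of the field?" with an answer of 20, while make_question(RHOMBUS, DIAGONAL, ...) is rejected by the constraint.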

4 Creating Tests
Veda supports test designers in creating three types of questions, namely:
a) Question Generators,
b) Non-generative Questions, and
c) Comprehension Questions, in which several questions are based on a common paragraph.
All three types of questions are graded automatically. The system provides a set of tools that can be used to develop question generators, making this task relatively easy. The designer of a generator uses his understanding of the structure of a type of question, and the generative mechanisms available, to create questions. See Figure 1 for an example. The non-generative questions mentioned in (b) and (c) above, on the other hand, are simply stored as structured text and retrieved when required.

Figure 1 comes here
Questions can be grouped into subject domains, such as quantitative, verbal, etc. These domains, which are the areas or topics on which the candidate is examined, can be presented to candidates in any order. For each question, the test designer can specify the maximum time recommended for answering it. During test administration, when this time expires, the candidate is given a warning and advised to go to the next question. For pre-tested questions, this time can be set to the time within which 90% of the candidates in the test group who answered the question correctly had answered it. The statistics collected by the system enable the computation of this time, called T90, for each question.
Question generators are ‘C’ programs, which call generative tools in the form of a set of ‘C’ functions. After these programs are created, they can easily be integrated with the Veda system by updating entries in the system files. Non-generative questions can also be added easily to the system’s databases using interactive facilities provided by the system. Though the system is primarily meant for on-line testing, it can also be used to generate tests that can be duplicated and administered off-line as conventional written tests.
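The T90 rule can be computed from the recorded answer times of candidates who answered a question correctly. A minimal sketch with hypothetical names (cmp_double, t90); it takes T90 as the smallest recorded time that covers at least 90% of those candidates, which is one reasonable reading of the definition rather than Veda’s exact procedure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Ascending comparison of doubles for qsort. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* T90: the smallest time within which at least 90% of the candidates who
   answered the question correctly had answered it. times[] holds those
   candidates' answer times (seconds) and is sorted in place.
   Returns -1.0 when there is no data. */
static double t90(double *times, size_t n) {
    if (n == 0)
        return -1.0;
    assert(times != NULL);
    qsort(times, n, sizeof *times, cmp_double);
    size_t k = (9 * n + 9) / 10;  /* ceil(0.9 * n) candidates */
    return times[k - 1];
}
```

With ten correct answers, T90 is the ninth-fastest time, so a single very slow candidate does not inflate the recommended limit.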

5 Test Administration
At the time of the test, the invigilator who looks after the system issues the candidate a roll number and a session password. The invigilator then starts a terminal session by entering his (the supervisor’s) password. The candidate then starts the test session using his roll number and session password. A combination of the two passwords is used to decrypt questions in the system. At the beginning of the test, the candidate is given a familiarization session. This session, which lasts about five minutes, allows the candidate to get used to the terminal keyboard. It also presents sample questions that familiarise him with the format of the test. The candidate is then given the real test.


For each test, a qualifying mark can be fixed. When a question is presented, the number of additional marks needed to qualify is displayed, together with the number of questions remaining in the test and an indication of the time remaining. If a candidate gets a question wrong, he is allowed another attempt at it; if his second answer is correct, he is awarded half the marks for the question. A candidate’s test can be terminated when it is clear that he has no chance of qualifying (i.e. if the number of marks yet to be earned is greater than the number of questions left). A candidate who obtains the qualifying mark can also have his test terminated¹. The total duration of the test can be set to any value decided by the test designer; no time limit is imposed on individual questions. The warning given at the end of the time specified for an individual question is only advisory, and a candidate may decide to ignore it. At the end of the test, the system elicits and stores information about the candidate, for example his basic qualification, the field in which he obtained his degree, and other information that may be relevant to those administering the test.
The system automates most of the routine administrative functions involved in conducting a test. Apart from an invigilator, who needs no significant knowledge of the system, no one else is required to look after the day-to-day administration of tests. There are routines to produce hall tickets (with passwords) for the candidates, print the mark lists of the candidates at the end of the day, and produce a daily report on the performance of candidates.

6 Veda: A General Overview
The general structure of Veda is shown in Figure 2. The test designer maintains two system files: the ‘structure’ file and the ‘timeandmarks’ file. The ‘structure’ file specifies the domains included in the test, the order in which they are to be covered and the number of questions to be administered from each domain. The ‘timeandmarks’ file specifies the duration of the test, the total number of questions to be administered to the candidate and the qualifying mark for the test.
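The paper does not give the on-disk layout of the ‘structure’ and ‘timeandmarks’ files, so the sketch below only models their contents as C structs, with illustrative names and values, plus one consistency check a designer might want: that the per-domain question counts add up to the total in ‘timeandmarks’. That check is an assumption, not something the paper says Veda performs.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DOMAINS 8

/* One entry of the 'structure' file: a domain and how many questions to
   draw from it. The array order is the presentation order. */
typedef struct {
    const char *domain;
    int n_questions;
} StructureEntry;

/* Contents of the 'timeandmarks' file. */
typedef struct {
    int duration_minutes;
    int total_questions;
    double qualifying_mark;
} TimeAndMarks;

/* Returns 1 if the per-domain counts add up to the stated total and the
   duration is positive, 0 otherwise. */
static int config_consistent(const StructureEntry *s, size_t n_domains,
                             const TimeAndMarks *t) {
    int sum = 0;
    assert(s != NULL && t != NULL && n_domains <= MAX_DOMAINS);
    for (size_t i = 0; i < n_domains; i++) {
        if (s[i].n_questions < 0)
            return 0;
        sum += s[i].n_questions;
    }
    return sum == t->total_questions && t->duration_minutes > 0;
}
```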

Figure 2 comes here
The ‘init’ program validates a candidate and gives him a familiarization session. It determines the order of questions in a domain and initiates three schedulers, one for each category of questions. The init program passes a ‘master key’² and the order of the questions to the schedulers. The schedulers then present the questions to the candidate. In the case of the Question Generator Scheduler, a ‘question generator’ program is executed by an additional process to create the question. The schedulers are also responsible for maintaining relevant information about each question. Lists of question generators are kept in files, one list for each domain in which a candidate is being tested. Databases of short answer questions, one for each domain, and lists of comprehension passages are kept in separate files.
For every candidate who takes a test, two files are created: a unique ‘test’ file and a unique ‘log’ file. The schedulers write the results for each question posed to the candidate in the ‘test’ file. The information about the candidate (name, address, qualification, etc.) elicited at the end of the test is also stored in this file. The ‘log’ file records, in encrypted form, all the questions given to a candidate and all the answers the candidate gave. If required, this file can be examined by the technical supervisors (the people concerned with the overall management of the test), for example to deal with a complaint about the system. The entire process, including the examination of the log, is protected by considerable security.

¹ The test designer can arrange to let a test run to completion, presenting all questions to each candidate, or have it terminated as mentioned above.
² This is required for decryption of the questions. See Section 8 for details.

7 UNIX Process Structure
To describe how the question generators in Veda work, it is first necessary to outline the structure of the UNIX processes that execute them. A ‘process’ is a program in execution. UNIX allows a user to have more than one process running at a time, and provides mechanisms for one process to communicate with another; Veda’s question generators use this feature. The process that runs initially in Veda is the ‘init’ program, which then passes control to the Question Generator Scheduler. This scheduler determines which question generators are to be run, and then ‘forks’ into two processes: the scheduler becomes the parent process, and the new process is the child. The child process overlays a question generator on itself and executes it (see Figure 3). The parent process waits until the child completes its task and then continues.

Figure 3 comes here
The parent has to pass the child a ‘key’, which we call the ‘master key’. This key is used to decrypt the items of text that make up the question to be produced, since these items are stored in files in encrypted form. The key is passed to the child process through a UNIX ‘pipe’. Pipes allow processes that belong to the same process hierarchy to communicate with each other. The child process then generates a question, takes the candidate’s response and evaluates his answer. When it has finished its work, it passes information (such as the marks obtained by the candidate for the question and the time taken to answer it) back to the parent through a pipe. This important feature allows the test designer to design generators as independent modules and integrate them easily with the system. Each generator is a ‘C’ program that can be developed in total isolation (using the tools provided by Veda) and later integrated with the system.
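A minimal POSIX sketch of this parent/child exchange follows; it illustrates the mechanism under stated assumptions and is not Veda’s code. fake_generator is a hypothetical stand-in for the child overlaying itself with a separately compiled generator (which a real version would do with an exec call), the GenResult fields are illustrative, and two pipes are used because POSIX only guarantees that a pipe carries data in one direction.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* What a generator reports back to the scheduler (illustrative fields). */
typedef struct {
    double marks;
    double seconds_taken;
} GenResult;

/* Stand-in for a generator: reads the key and reports a fixed result. */
static void fake_generator(int key_fd, int result_fd) {
    char key[64] = {0};
    GenResult r = {0.0, 0.0};
    if (read(key_fd, key, sizeof key - 1) > 0) {
        r.marks = 1.0;            /* pretend: decrypted, asked, answered */
        r.seconds_taken = 42.0;
    }
    if (write(result_fd, &r, sizeof r) != (ssize_t)sizeof r)
        _exit(1);
}

/* Forks one child per question: one pipe carries the master key down,
   a second carries the result back up. Returns 0 on success. */
static int run_generator(const char *master_key, GenResult *out) {
    int down[2], up[2];
    assert(master_key != NULL && out != NULL);
    if (pipe(down) == -1)
        return -1;
    if (pipe(up) == -1) {
        close(down[0]);
        close(down[1]);
        return -1;
    }
    pid_t pid = fork();
    if (pid == -1) {
        close(down[0]); close(down[1]); close(up[0]); close(up[1]);
        return -1;
    }
    if (pid == 0) {               /* child */
        close(down[1]);
        close(up[0]);
        fake_generator(down[0], up[1]);
        _exit(0);
    }
    close(down[0]);               /* parent */
    close(up[1]);
    size_t len = strlen(master_key);
    ssize_t sent = write(down[1], master_key, len);
    close(down[1]);
    ssize_t got = read(up[0], out, sizeof *out);
    close(up[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    return (sent == (ssize_t)len && got == (ssize_t)sizeof *out &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Because the parent only exchanges a key and a fixed-size result with the child, each generator can be built and tested on its own, which is the modularity the text describes.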

8 Security Aspects
Since the whole test is automated, it is important to protect the files related to the test. Unauthorised access to questions in Veda’s database must be rendered impossible. It is not enough to rely on UNIX file security, since the privileged user ‘root’ (the system administrator) can override file security mechanisms. Veda therefore uses encryption as its main protection mechanism. The scheme we use remains safe even if there are security lapses by those controlling the computer. All files in the system are encrypted, making it impossible for intruders to learn the questions on the system.
We envisage that three technical supervisors will look after the overall system. At least two of them are required to type in their passwords to encrypt (or re-encrypt) all the files in the system. When the files are encrypted, pairs of administrative and session passwords are produced. A different administrative password is generated for each day on which the test is conducted, and a session password is required for each candidate appearing for the test. Only a combination of the right administrative password and session password will allow a test to be taken. When a candidate comes to take the test, the administrative supervisor types in his password (the administrative password needs to be typed in only once a day). The candidate also types in his password, independently. These passwords are combined to obtain the master key. The program that determines the master key is part of the init program, which chooses the questions to be displayed to the candidate.
The text of the question generators is encrypted before compilation; that is, only the alphanumeric strings are encrypted, not the executable code. When a ‘question generator’ is invoked as a child process, the parent process sends the master key to the child, which uses it for decryption. The question text is decrypted in memory, so no plain-text files are ever available in the filing system. Without the proper master key, the ‘question generating’ program that is invoked will not be able to carry out its task.
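The paper does not name its cipher or say how the two passwords are combined, so the sketch below starts from an already-derived master key and uses a deliberately toy XOR keystream (seeded with an FNV-1a hash, advanced with xorshift32) purely to show the flow: encrypted string constants are decrypted into memory only, and a wrong key yields garbage instead of question text. It is not secure and is not Veda’s scheme; toy_crypt and decrypt_text are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a hash of the master key; used only to seed the toy keystream. */
static uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* XORs buf in place with a xorshift32 keystream seeded from key, so the
   same call both encrypts and decrypts. TOY ONLY: not secure and not
   the cipher Veda used. */
static void toy_crypt(unsigned char *buf, size_t len, const char *key) {
    uint32_t x = fnv1a(key) | 1u;   /* xorshift state must be non-zero */
    for (size_t i = 0; i < len; i++) {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        buf[i] ^= (unsigned char)(x & 0xFFu);
    }
}

/* Decrypts an encrypted string constant into caller memory only; no plain
   text is written to a file. out needs room for len + 1 bytes. */
static void decrypt_text(const unsigned char *enc, size_t len,
                         const char *key, char *out) {
    assert(enc != NULL && key != NULL && out != NULL);
    memcpy(out, enc, len);
    toy_crypt((unsigned char *)out, len, key);
    out[len] = '\0';
}
```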


Notice the multiple levels of security. At the top, no technical supervisor can compromise any information without the connivance of at least one other technical supervisor. The system is also protected against misuse by the administrative supervisor or the candidate, since both the session password and the administrative password are required to run a test. The master key may be changed at periodic intervals for added security.

9 Statistical Analysis of Test Scores
Some questions that test designers have to ask themselves are:
a) How reliable are our tests? Are measurements repeatable?
b) Do they measure what they should?
c) What can we do to improve our tests?
d) How good is our test in predicting real-life performance after the test?

Answers to these questions may be obtained if we start with the observation: ‘Just as questions test people, candidates taking the test, test questions’. Test designers often have an intuitive feel for how good a question is. Statistics is a powerful tool that gives test designers objective and quantitative measures of the quality of questions. However, teachers will not, in general, use statistical tools unless the necessary computations are automated and incorporated in testing software. In the following sections, we introduce some indices that we have used in our work and that have helped us improve our tests. Veda incorporates utilities that take the test files of candidates as input and produce these indices.

9.1 Predictive Value (PV)
Consider the classification of candidates into four categories with respect to performance in the test as a whole and performance in answering a specific, single question correctly. The categories are:
a) Candidates who have not qualified in the test and have answered the particular question wrong,
b) Candidates who have not qualified in the test and have answered the question right,
c) Candidates who have qualified in the test and have answered the question wrong, and
d) Candidates who have qualified in the test and have answered the question right.

With such a classification for each question, we would get a table like the one shown in Figure 4, where a, b, c and d denote the number of candidates in each category, e, f, g and h are row-wise and column-wise totals respectively, and k = a + b + c + d. Note that k is the total number of candidates who have attempted the question.

Figure 4 comes here
What do these numbers tell us? Suppose a + d = k. This means that everyone who qualified in the test got the question right, and everyone who did not qualify failed to answer it. Such a question classifies all candidates perfectly, and so is a very good one. On the other hand, if b + c = k, the question is clearly very bad. We now define the Predictive Value (PV) of a question as the number of ‘properly’ classified candidates divided by the total number of candidates who attempted the question, i.e. PV = (a + d)/k. When b + c = k, PV = 0; when a + d = k, PV = 1. From the argument above, we can infer that a question with a high PV is a good one and one with a low PV is a bad one.
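The PV definition, PV = (a + d)/k, can be computed directly from the four counts of Figure 4. A minimal sketch; ItemTable and predictive_value are hypothetical names, and returning -1.0 for a question nobody attempted is an arbitrary convention.

```c
#include <assert.h>

/* Counts for one question, following categories a) to d) above. */
typedef struct {
    int a; /* not qualified, answered wrong */
    int b; /* not qualified, answered right */
    int c; /* qualified, answered wrong */
    int d; /* qualified, answered right */
} ItemTable;

/* PV = (a + d) / k: the fraction of candidates the question classifies
   the same way as the test as a whole. Returns -1.0 if k == 0. */
static double predictive_value(ItemTable t) {
    assert(t.a >= 0 && t.b >= 0 && t.c >= 0 && t.d >= 0);
    int k = t.a + t.b + t.c + t.d;
    return k == 0 ? -1.0 : (double)(t.a + t.d) / k;
}
```

A table of (30, 10, 20, 40) gives PV = 70/100 = 0.7, between the perfect (a + d = k) and worst (b + c = k) cases.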


9.2 Level of Difficulty (LD)
The Level of Difficulty (LD) gives a test designer a simple measure of how difficult a question is. The LD for a question is the ratio defined in Figure 5.
Figure 5 comes here
A question with an LD of 0.0 is very easy, since everyone has got it right. On the other hand, a question with an LD of 1.0 is very difficult; no one has got it right. Both kinds of question are undesirable in a test, as neither distinguishes one candidate from another. In other words, such questions contribute nothing to the ‘discriminatory’ power of the test and do not deserve inclusion in it. A related question is: what should the LDs of the questions in a test be? Is it better to have questions with an average level of difficulty? The answer seems to be yes, but the ‘average’ level of difficulty has to be determined by the level at which the test is meant to discriminate.
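Figure 5 is not reproduced here, so the sketch below assumes the simplest ratio consistent with the stated endpoints: the fraction of candidates attempting the question who got it wrong, which in the Figure 4 notation is (a + c)/k. The function names are hypothetical, and the paper’s exact ratio may differ, for example if partial credit is counted.

```c
#include <assert.h>

/* Assumed LD: fraction of candidates attempting the question who got it
   wrong (0.0 = everyone right, 1.0 = no one right).
   Returns -1.0 if nobody attempted the question. */
static double level_of_difficulty(int n_wrong, int n_attempted) {
    assert(n_wrong >= 0 && n_wrong <= n_attempted);
    return n_attempted == 0 ? -1.0 : (double)n_wrong / n_attempted;
}

/* Questions at either extreme do not discriminate between candidates. */
static int discriminates(double ld) {
    return ld > 0.0 && ld < 1.0;
}
```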

9.3 Correlation Coefficient (CC)
If we correlate the scores obtained on a particular question with the candidates’ total scores for the test, the correlation coefficient gives us a measure of the ‘goodness’ of the question. A low correlation coefficient indicates that candidates who got that question right have not, on the whole, done well in the test. A high correlation coefficient, on the other hand, indicates that candidates who got the question right have done well in the test. A question with a high correlation coefficient is therefore better than one with a low coefficient.

Figure 6 comes here
The CC also provides information about the classification of a question. Consider a test consisting of questions in three domains, say quantitative reasoning, high school maths and visuo-spatial reasoning. The test designer decides the classification of each question when he designs it. Since the domains involved are not mutually exclusive, questions usually test abilities in more than one domain, and may not strictly belong to the domain in which they are classified. If we correlate a question’s score with the domain scores, we may find that its correlation with some domain X is greater than its correlation with the domain Y in which it was originally classified. We might then decide that the question has been wrongly classified and change its classification to X. Such reclassification helps us build more accurate tests in the future.
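A sketch of the CC computation, assuming the ordinary Pearson product-moment correlation (for a right/wrong question score this is the point-biserial correlation); Figure 6 is not reproduced, so the paper’s exact formula may differ. best_domain illustrates the reclassification check by picking the domain whose scores correlate best with the question. All names are hypothetical, and a Newton-iteration square root is used only so the example needs no maths library.

```c
#include <assert.h>
#include <stddef.h>

/* Square root by Newton's iteration, so no maths library is needed. */
static double newton_sqrt(double x) {
    double r = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 60; i++)
        r = 0.5 * (r + x / r);
    return r;
}

/* Pearson correlation between question scores x[] and total (or domain)
   scores y[] over n candidates. Returns 0.0 if either has zero variance. */
static double correlation(const double *x, const double *y, size_t n) {
    double mx = 0.0, my = 0.0, sxy = 0.0, sxx = 0.0, syy = 0.0;
    assert(x != NULL && y != NULL && n > 1);
    for (size_t i = 0; i < n; i++) {
        mx += x[i];
        my += y[i];
    }
    mx /= n;
    my /= n;
    for (size_t i = 0; i < n; i++) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
        syy += (y[i] - my) * (y[i] - my);
    }
    if (sxx == 0.0 || syy == 0.0)
        return 0.0;
    return sxy / newton_sqrt(sxx * syy);
}

/* Index of the domain whose scores correlate best with the question. */
static size_t best_domain(const double *q, const double *const domains[],
                          size_t n_domains, size_t n) {
    size_t best = 0;
    double best_r = correlation(q, domains[0], n);
    for (size_t d = 1; d < n_domains; d++) {
        double r = correlation(q, domains[d], n);
        if (r > best_r) {
            best_r = r;
            best = d;
        }
    }
    return best;
}
```

If best_domain returns a domain other than the one the question is filed under, that is the signal for possible reclassification described above.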

9.4 Chi-Square Coefficient
The chi-square test that we employ in Veda evaluates the matrix of a, b, c and d given in Figure 4 to determine the statistical soundness of the available evidence (see Figure 7). Unlike all the other indices mentioned in this paper, the chi-square coefficient depends on the size of the test-taking population. The LD, for instance, may not change significantly as we go from information on 10 candidates to information on 100 candidates, but the chi-square coefficient does.
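Figure 7 is not reproduced here. Assuming the standard Pearson chi-square statistic for a 2x2 table, without Yates’ continuity correction, the computation and its dependence on population size can be sketched as follows; chi_square_2x2 is a hypothetical name.

```c
#include <assert.h>

/* Standard Pearson chi-square for a 2x2 table (no continuity correction):
   chi2 = k (ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)), with k = a+b+c+d.
   Returns 0.0 if any row or column total is zero. */
static double chi_square_2x2(double a, double b, double c, double d) {
    assert(a >= 0 && b >= 0 && c >= 0 && d >= 0);
    double k = a + b + c + d;
    double denom = (a + b) * (c + d) * (a + c) * (b + d);
    double diff = a * d - b * c;
    return denom == 0.0 ? 0.0 : k * diff * diff / denom;
}
```

With the counts (3, 1, 1, 5) the statistic is about 3.40; multiplying every cell by ten leaves PV = (a + d)/k at 0.8 but multiplies the chi-square value by ten, to about 34.0, which is the size dependence described above.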

Figure 7 comes here
9.5 Equivalence of tests
A major feature of automated on-line testing is that each test is generated when it is needed. This is in contrast to conventional testing, where copies of one test are administered to a whole batch of candidates. In on-line testing, a specified number of questions can be selected from a large question bank and administered to each candidate at the time of the test. This has an obvious advantage: a candidate gains little by consulting someone who has taken the test earlier. The protection of information about the test is further enhanced if the questions are generated rather than retrieved from passive storage, since no candidate can then pass on specific numbers, names, etc., as the expected answers, to other candidates.
However, this creates a new problem. If the test given to one candidate differs from that given to another, how do we compare the performance of one candidate with that of another? Making reasonable assumptions and simplifications, the problem can be posed as follows. If there is a question bank of M questions, and each candidate is given N questions randomly picked from these M, how do we ensure that the tests given to different candidates are equivalent? Statistics provides the answer to this important question if we accept the simple model described below.
Let us assume that questions are drawn from a large population of questions with a mean level of difficulty, averaged over all questions, of LDp, and a standard deviation over the whole population of questions of SDp. Each test (set of questions) drawn from this population is a sample. We can therefore estimate the likely variation in the mean level of difficulty of the test as we go from the test given to one candidate (consisting of N questions randomly selected from the population) to the test given to another. An estimate of this, in the form of the probable error of the level of difficulty of the test (in comparison with LDp), is SDp/√N, where N is the number of questions in the test. As an indication of variability, in one series of tests we conducted we found LDp = 0.43, SDp = 0.21, and a probable error in the level of difficulty of a test of 0.05. The convention used here normalizes the maximum score to 1.0, so the variability in LD is of the order of 5%.
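The probable-error estimate SDp/√N can be evaluated directly, or inverted to find how many questions a test needs for a target probable error. A sketch with hypothetical function names; with SDp = 0.21, a probable error of at most 0.05 requires N = 18 under this formula. The paper does not state N for its series of tests, so this is only arithmetic on the formula, not a reported figure.

```c
#include <assert.h>

/* Square root by Newton's iteration, so no maths library is needed. */
static double newton_sqrt(double x) {
    double r = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 60; i++)
        r = 0.5 * (r + x / r);
    return r;
}

/* Probable error of a test's mean LD under the model: SDp / sqrt(N). */
static double probable_error(double sd_p, int n_questions) {
    assert(sd_p >= 0.0 && n_questions > 0);
    return sd_p / newton_sqrt((double)n_questions);
}

/* Smallest N for which SDp / sqrt(N) does not exceed target. */
static int questions_needed(double sd_p, double target) {
    assert(sd_p > 0.0 && target > 0.0);
    double ratio = sd_p / target;
    int n = (int)(ratio * ratio);
    if (n < 1)
        n = 1;
    while (probable_error(sd_p, n) > target)
        n++;
    return n;
}
```

Because the error falls only with the square root of N, quadrupling the number of questions merely halves the probable error.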
A variation of this size is well within that seen in conventional testing. The variability in LD from one test to another will be much smaller if a test covers several domains and a fixed number of questions is selected from each domain. This amounts to stratified random sampling and results in a smaller variation of LD from test to test.
Based on this feedback, some minor changes were carried out on the system. Veda was later extensively tested at NCST in informal use and then put into regular use to screen applicants to the one-year part-time course. About 600 to 1000 applicants have taken on-line tests generated by the system every year, over three years.

Summary of Experience and Findings

a) A familiarization session carrying no marks, held before the actual test, was found to be extremely useful. Most candidates had no problems using a terminal, probably because they had to do very little typing.
b) The candidates appreciated having their results declared immediately on completion of the test.
c) All questions in the test were of the short-answer type, requiring a number, a word or a few words to be typed in. Such questions eliminate the errors due to guessing that are common with multiple-choice questions. To avoid ambiguities, it was found necessary to provide hints along with the questions; these hints must help the candidate determine the answer uniquely (see the example in Figure 8).
d) Tests were initially administered on an HP9040 running HP-UX, and later on a VAX8600 running Ultrix. The system was found to be stable, and performed well even when the host system was fully loaded.
e) The major overhead of the system arises from the 'fork' operation involved in running a question generator, since each question generator creates an additional process on the system. On the VAX8600, with 16 megabytes of memory and about 20 other users, we were able to run 20 tests at a time with no problems whatsoever. This was adequate for our needs, and is probably much lower than the limit set by the capabilities of the computer. On any good general-purpose computer running UNIX, every available terminal should be able to run a test at any given time.

10 Limitations and Possible Extensions

Veda, like any other software system, has its limitations. Some of these were anticipated from the start, and we have since overcome some of them. Existing limitations and possible extensions are given below:
a) A candidate cannot skip a question and come back to it later. The facility to skip questions can be provided by storing each generated question, along with its answer, in a file (in encrypted form). Candidates could then return to questions skipped earlier and attempt to answer them.
b) The generators had to be programmed in C. We have since developed an Authoring System for Veda [4], which takes a high-level specification of a generative question and creates the C program that Veda requires. This authoring system makes Veda easier to use, since the test designer no longer needs programming experience.
c) At present, the statistical indices mentioned in section 9 are used to help the designer judge the quality of a question. Ideally, these indices should be stored with the questions and used to select the questions for a test. For example, a test designer may want only questions with a level of difficulty (LD) greater than 0.5; using this index would give the designer better control over the difficulty of the questions in the test. Currently there is no provision for storing these parameters with the questions.
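Extension (c) could be as simple as keeping each question's LD in its record and filtering on it when a test is assembled. The struct layout and the function `select_by_ld` below are hypothetical; as the text notes, Veda does not yet store these indices with the questions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical question record carrying its stored statistical index. */
struct question {
    int id;
    double ld;   /* level of difficulty: 0.0 (easy) .. 1.0 (hard) */
};

/* Copy the ids of questions whose LD exceeds min_ld into out, which has
 * room for out_cap ids. Returns the number of ids written. */
size_t select_by_ld(const struct question *bank, size_t n, double min_ld,
                    int *out, size_t out_cap)
{
    size_t count = 0;
    assert(bank != NULL || n == 0);
    for (size_t i = 0; i < n && count < out_cap; i++)
        if (bank[i].ld > min_ld)          /* strictly greater, as in (c) */
            out[count++] = bank[i].id;
    return count;
}
```

A test assembler could then pick its N questions at random from the selected ids rather than from the whole bank.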

11 Applications for Veda

Applications that we envisage for Veda include the following:
a) Veda can be used with a Computer Aided Instruction (CAI) system. The CAI system could teach a candidate concepts in a certain subject, and Veda could then test the candidate to make sure these concepts have been understood. Veda could be extended to provide references to remedial material, pin-pointing the concepts that the candidate has not understood.
b) Veda provides a mastery learning mode, which we use frequently in our courses. The intention in this mode is not to award marks, but to let the candidate make multiple attempts at a test and determine which concepts he needs to revise. Veda keeps track of the questions he got wrong in an attempt and gives only those questions in subsequent attempts.
c) Veda can be used for Distance Learning applications. Course participants could access a Veda system remotely over a data communication network, and Veda could play a valuable role in evaluating student progress and giving feedback.
d) Veda can be used for highly centralized testing, as in an Open University or a National Testing Service.
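The bookkeeping behind the mastery learning mode in (b) can be sketched as a list of pending question indices that shrinks after each attempt. This is an illustrative sketch only: the function name `next_attempt` and the representation of an attempt's results as a 0/1 array are our assumptions, not Veda's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

/* pending[0..n-1] holds the indices of questions still to be mastered.
 * correct[i] is nonzero if pending[i] was answered correctly in this
 * attempt. Keeps only the wrongly answered questions, in their original
 * order, and returns how many remain for the next attempt. */
size_t next_attempt(int *pending, const int *correct, size_t n)
{
    size_t kept = 0;
    assert((pending != NULL && correct != NULL) || n == 0);
    for (size_t i = 0; i < n; i++)
        if (!correct[i])
            pending[kept++] = pending[i];   /* compact in place */
    return kept;
}
```

The attempts end when `next_attempt` returns 0, i.e. every question has been answered correctly at least once.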

12 Conclusions

Testing is an integral part of instruction. In general, teachers are not able to test candidates as often as they would like, because of the effort involved in designing and conducting tests. On-line testing is one way of relieving this burden on teachers. With on-line testing, candidates could take a test almost any time, in any subject, and get feedback on their ability and knowledge. Given the limited resources available in a developing country, testing can play an invaluable role: improving the evaluation of student performance through computer based testing can be more cost-effective than automating the bulk of the instructional process.


Acknowledgements

The work described in this paper was carried out as part of the Knowledge Based Computer Systems Project funded by the Department of Electronics, Government of India, with assistance from the United Nations Development Programme. Several colleagues at NCST have contributed to the efficient administration of tests, and to the use of Veda in testing. We are grateful to all of them for their cooperation.

References

1. R Chandrasekar, SB Chikarmane, PM Desai, SD Laud and R Ramanan. Design and Implementation of an Instructional Data Base. In: Proceedings of EDINFO-82, International Symposium on Education in Informatics (Computer Society of India, Madras, 1992).
2. G Lippey (ed.). Computer-Assisted Test Construction (Educational Technology Publications, New Jersey, 1974).
3. S Ramani and A Newell. On the Generation of Problems. Technical Report, Dept. of Computer Science, Carnegie Mellon University, November 1973.
4. P Srinivas, KN Prakash, KSR Anjaneyulu and S Ramani. An Authoring System for Generative Testing. In: V Rajaraman (ed.), Proceedings of the Knowledge Based Computer Systems Conference - KBCS '88 (Computer Society of India, Bangalore, 1988), pp. 64-71.


The area of a {pentagonal | hexagonal | octagonal} field is {A} square metres. If all its sides are equal in length, how long is a side of the field?

Constraints: Side S is in the range 6.5 to 10.5, with step 1. Number of sides N = 5, 6 or 8.
Algorithm for Generation: Choose S and N randomly, within the constraints specified.
A = (N * S * S) / (4 * tan(180/N)), with the angle 180/N in degrees.
Substitute the computed quantity for A in the question.
Answer = S

Note: In this example, there is no need to 'compute' the correct answer, since the question is generated starting from the correct answer. Other questions, however, may require the answer to be computed as a function of the attributes (slots).

Figure 1: Example of a Question Generator

[Diagram components: Structure file, Init Program and Timeandmarks file; a Question Generators Scheduler with Lists 1 to K of QGs (question generators); a Short Questions Scheduler with Databases 1 to L of SQs (short questions); a Comprehension Questions Scheduler with Lists 1 to M of CQs (comprehension questions).]

Figure 2: Veda - A General Overview


[Diagram: the Question Generator Scheduler (parent process) creates another process to execute the generator and waits until the child finishes; the Question Generator (child process) runs and terminates; the parent then stores the results in the candidate's file.]

Figure 3: Execution of Question Generators
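The parent/child arrangement in Figure 3 corresponds to the standard UNIX `fork`, `execv` and `waitpid` calls. The sketch below assumes each generator is a separate executable; the function name `run_generator` and the use of the child's exit status to report success are illustrative choices, not details taken from Veda.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a question generator as a child process and wait for it, as in
 * Figure 3. Returns the child's exit status, 127 if the generator could
 * not be executed, or -1 on error. Storing the results in the
 * candidate's file is left to the caller (the parent). */
int run_generator(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);
    pid_t pid = fork();                 /* create another process */
    if (pid < 0)
        return -1;                      /* fork failed */
    if (pid == 0) {
        execv(path, argv);              /* child: become the generator */
        _exit(127);                     /* reached only if execv fails */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)   /* parent: wait for the child */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Each call costs one extra process, which is the 'fork' overhead noted in the findings above.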

How well does the question discriminate?

                        Got Ques Wrong   Got Ques Right   Totals
Not Qualified in Test          a                b            e
Qualified in Test              c                d            f
Totals                         g                h            k

$$PV = \frac{a + d}{a + b + c + d} = \frac{a + d}{k}$$

Figure 4: Predictive Value
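The predictive value is the fraction of candidates for whom the question's outcome agrees with the test outcome (cells a and d of the table). A minimal C helper, with a hypothetical name, that takes the four cell counts:

```c
#include <assert.h>

/* Predictive value from the 2x2 table of Figure 4:
 * a = not qualified & got question wrong, b = not qualified & right,
 * c = qualified & wrong,                  d = qualified & right.
 * PV = (a + d) / k, where k = a + b + c + d. */
double predictive_value(int a, int b, int c, int d)
{
    int k = a + b + c + d;
    assert(a >= 0 && b >= 0 && c >= 0 && d >= 0 && k > 0);
    return (double)(a + d) / k;
}
```

For example, with the invented counts a = 30, b = 10, c = 5, d = 35, PV = 65/80 = 0.8125.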


How Difficult is a Question?

$$LD = \frac{\text{No. of candidates who failed to answer the question correctly}}{\text{Total no. of candidates}} = \frac{a + c}{k}$$

Figure 5: Level of Difficulty
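LD uses the "Got Ques Wrong" column of the Figure 4 table (cells a and c). A minimal C helper, again with a hypothetical name:

```c
#include <assert.h>

/* Level of difficulty: the fraction of all k candidates who got the
 * question wrong, LD = (a + c) / k, using the cells of Figure 4. */
double level_of_difficulty(int a, int b, int c, int d)
{
    int k = a + b + c + d;
    assert(a >= 0 && b >= 0 && c >= 0 && d >= 0 && k > 0);
    return (double)(a + c) / k;
}
```

LD grows with difficulty: a question that nobody answers correctly has LD = 1.0. With the invented counts a = 30, b = 10, c = 5, d = 35, LD = 35/80 = 0.4375.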

Does the question fit in the test? Is it properly classified? Consider:
a) the correlation of the question score with the total score for that domain;
b) the correlation of the question score with the total test score.

Figure 6: Correlation Coefficient

$$\chi^2 = \frac{(bc - ad)^2 \, k}{e \, f \, g \, h}$$

Figure 7: Chi-Square Test
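The statistic in Figure 7 uses only the cell counts and totals of the Figure 4 table, so it can be computed directly. This hypothetical helper applies no continuity correction, matching the formula as given.

```c
#include <assert.h>

/* Chi-square statistic for the 2x2 table of Figure 4:
 * chi2 = (bc - ad)^2 * k / (e * f * g * h), with row totals e = a+b,
 * f = c+d, column totals g = a+c, h = b+d, and k = a+b+c+d. */
double chi_square_2x2(long a, long b, long c, long d)
{
    double e = a + b, f = c + d, g = a + c, h = b + d;
    double k = (double)(a + b + c + d);
    double diff = (double)b * c - (double)a * d;
    assert(e > 0 && f > 0 && g > 0 && h > 0);
    return diff * diff * k / (e * f * g * h);
}
```

With one degree of freedom, a value above about 3.84 indicates an association between getting the question right and qualifying in the test at the 5% significance level.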

Complete the following analogy:
satellite : orbit :: rocket : _ _ _ _ _ _ _ _ _ _ (10 letters)
Hint: t _ _ _ _ _ _ _ _ _
* The hint tells the candidate that the answer starts with the letter 't' and is a word ten letters long.

Figure 8: Example of a Short-Answer Question
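A hint in the style of Figure 8 (first letter followed by one blank per remaining letter) can be derived mechanically from the stored answer. The function `make_hint` below is a hypothetical sketch, not part of Veda; the example word is ours.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Build a Figure 8 style hint: the first letter of the answer followed
 * by " _" for each remaining letter, e.g. "orbit" -> "o _ _ _ _".
 * buf must hold at least 2 * strlen(answer) bytes. */
void make_hint(const char *answer, char *buf, size_t len)
{
    size_t n = strlen(answer);
    size_t pos = 0;
    assert(n > 0 && len >= 2 * n);
    buf[pos++] = answer[0];          /* reveal the first letter */
    for (size_t i = 1; i < n; i++) {
        buf[pos++] = ' ';
        buf[pos++] = '_';            /* one blank per hidden letter */
    }
    buf[pos] = '\0';
}
```

The letter count shown to the candidate, such as "(10 letters)", is simply `strlen(answer)`.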
