
An Authorization System for Digital Libraries

E. Ferrari¹, N. R. Adam², V. Atluri²⋆, E. Bertino¹, U. Capuozzo¹

¹ Dipartimento di Scienze dell’Informazione, Università degli Studi di Milano, Via Comelico, 39/41 20135 Milano, Italy, {bertino,ferrarie}@dsi.unimi.it, ugo [email protected]

² CIMIC and MSIS Department, Rutgers University, 180 University Avenue, Newark NJ 07102, [email protected], [email protected]

Received: date / Revised version: date

Abstract

Digital Libraries (DLs) introduce several challenging requirements with respect to the formulation, specification and enforcement of adequate data protection policies. Unlike conventional database environments, a DL environment is typically characterized by a dynamic subject population, often making accesses from remote locations, and by an extraordinarily large amount of multimedia information, stored in a variety of formats. Moreover, in a DL environment, access policies are often specified based on subject qualifications and characteristics, rather than subject identity. Traditional authorization models are not adequate to meet the access control requirements of DLs. In this paper, we present a Digital Library Authorization System (DLAS). DLAS employs a content-based authorization model, called the Digital Library Authorization Model (DLAM), proposed earlier [1].

⋆ The work of V. Atluri was partially supported by a National Science Foundation CAREER award under grant IRI-9624222.

Key words

Access Control, Digital Libraries, Credentials.

1 Introduction

One of the challenges encountered in a digital library [2] environment is securing information. On one hand, the system must protect information against malicious use and corruption, and must ensure the privacy of its subjects. On the other hand, it should provide open access so that vendors and information producers can add or update information and services at any time. In traditional environments, access control is performed against a set of authorizations stated by security administrators or subjects according to some security policies. Typically, an authorization is specified as a triple ⟨s, o, p⟩, which states that a subject s is authorized to exercise privilege p on object o. Such traditional authorization models and mechanisms are not adequate to specify and enforce the data protection policies that are typical of a DL, because of its highly dynamic subject population, often making accesses from remote locations, and its extraordinarily large collection of objects. In this paper, we present a Digital Library Authorization System (DLAS) that employs a Digital Library Authorization Model (DLAM), originally proposed in [1], in which access control policies can be specified based on subject qualifications and characteristics (called credentials) and the concepts associated with the object. This enables enforcement of the security policies used in conventional libraries, thus easing the transition from paper-based libraries to their digital counterparts. With DLAM, it is possible to specify access control based not only on the content of an object but also on selected components of DL objects, e.g., part of a book, which is not always possible in traditional libraries. DLAM has been implemented using examples from the Global Legal Information Network (GLIN), a project originally undertaken by the Law Library of Congress (LLoC).

Specifically, DLAM provides (1) flexible specification of authorizations based on the qualifications and characteristics of subjects (including positive and negative authorizations); (2) both content-dependent and content-independent access control to digital library objects; and (3) varying granularity of authorization objects, ranging from sets of library objects to specific portions of objects.

DLAS, the digital library authorization system that employs DLAM, has the following features: (1) it adopts a precomputation strategy for efficient evaluation and enforcement of authorizations; (2) it includes an authorization administrative module that enables specification of multiple credentials possessed by subjects as complex credential expressions, and specification of relationships among different credential types (concepts) as a credential (conceptual) hierarchy; and (3) it includes a mechanism to resolve the conflicting privileges that arise from inheritance along the credential type and concept hierarchies.

The remainder of this section reviews the related research. Section 2 presents an overview of DLAM. Section 3 presents the DLAS architecture. Section 4 describes the features and tools of DLAS in detail and presents the algorithm specifying our access control strategy. Finally, Section 5 presents some conclusions.

1.1 Related Work

Since DL is a new and emerging area of research, there has been very little prior work that addresses security for DLs. One of the first proposals of an access control model for WWW documents was presented in [11]; in that model, authorizations can be given either to the whole document or to selected portions within a document. Although we borrow from [11] the idea of selectively granting access to a document, our work differs substantially from the work in [11]. The major difference is that we support a flexible subject specification by which subjects are qualified by predicates on credentials. By contrast, the model in [11] does not support such a flexible subject specification, but relies only on subject-ids, which we believe is not suitable for an open access environment like a DL. Another significant difference is that we provide content-based access control to DL documents.

Winslett et al. [13,14] were the first to identify the need for subject credentials for providing access control in a DL environment. Building on this idea, we formalize the specification of subject credentials by providing a credential specification language. The advantage of using a formal language is that it not only allows flexible specification of credentials, but also enables easy evaluation of authorizations and verification of the consistency of specifications. Another major difference is that we provide an articulated set of features for enforcing access control, ranging from content-based authorizations for non-structured data to the support for positive and negative authorizations and varying granularity of authorization objects.

Because we organize both concepts and credentials into hierarchies, our model may appear similar to the Orion authorization model [10]. Orion is specifically tailored for an object-oriented DBMS storing conventional, structured data; as such, great attention has been devoted to concepts such as versions and composite objects, which are typical of an object-oriented context. However, authorizations in Orion are specified based on the ⟨s, o, p⟩ paradigm, which, as noted earlier, is not suitable for a DL environment.

Content-dependent access control in the context of object-oriented databases has been proposed by Gudes et al. in [5] and by Bertino and Weigand in [3]. However, both proposals deal with conventional data (that is, data whose structure completely fits into a database schema). In such a context, content-based access control can be enforced by imposing conditions on the attribute values. To this purpose, authorization rules can be augmented with special predicates allowing one to specify conditions on the attributes’ values. (For instance, the security policy stating that a user u can access only the information on employees whose salary is less than 30K can be specified using a rule of the form ⟨u, employees, salary < 30k⟩, provided that the database contains a class employee with an attribute salary.) By contrast, due to the nature of DL documents, content-dependent access control in a DL environment deals with the concepts a document is related to, rather than with attribute values. Access control enforcement thus becomes more difficult. First, it is necessary to have some mechanism in place that extracts concepts from documents. Then, the relations that may exist among concepts in a given domain must be taken into account (for instance, a concept may subsume other concepts). These aspects must be considered when enforcing access control. In this paper, we provide a complete architecture for content-based access control to DL documents which addresses all the above-mentioned aspects.

Finally, note that the concept of credential has some similarity with that of role, which is the basis for a class of models known as Role-based Access Control (RBAC) models [9,12]. Roles can be seen as a set of actions or responsibilities associated with a particular working activity. Under role-based models, all authorizations needed to perform a certain activity are granted to the role associated with that activity, rather than being granted directly to users. Users are then made members of roles, thereby acquiring the roles’ authorizations. User access to data is mediated by roles; each user is authorized to play certain roles and, on the basis of the role, he/she can perform accesses on the data. Whenever a user needs to perform a certain activity, the user only needs to be granted the authorization of playing the proper role, rather than being directly assigned the required authorizations. A basic distinction between roles and credentials is that credentials are characterized by a set of attributes, and this allows us to grant access authorizations only to users whose credentials satisfy certain conditions (for instance, access to a document can be granted to all the users with a given age or with a given nationality). This can of course also be done through roles, but it requires the creation of a distinct role for each condition we would like to enforce (for instance, enforcing the access control policy of the previous example requires the creation of two distinct roles, one corresponding to the users with the specified age, and the other corresponding to the users with the specified nationality). This makes the specification and management of authorizations very difficult, given also the large variety of users that typically access a DL.

2 Digital Library Authorization Model (DLAM)

In this section we briefly review the authorization model DLAM, proposed in [1]. We first characterize how DL objects are represented in DLAM. Then, we introduce the concept of credential, and the access privileges supported by DLAM. Finally, we show how all these components are used in the specification of access authorizations.


DL Objects. Each DL document, referred to as a digital library object (dlo), typically contains unstructured information of different media types (e.g., text, images, videos). Each dlo is associated with a unique object identifier (id), which is assigned by the system upon object creation and remains unchanged for the whole life of the object. A dlo is a 4-tuple (i, slots, links, concepts), where i is the id of the dlo; slots is a set of slot names identifying relevant portions within the dlo; links is a set of link identifiers; and concepts is the set of relevant concepts in the dlo. The Object Base, denoted by OB, is the set of all dlos.

The concepts in a dlo are the abstractions or notions one expects to find in that object. In other words, a concept is a means for concisely describing the content of a dlo. However, concepts are not just keywords. For example, a combination of words in a document may elicit a sorrowful reaction from a reader although the words do not include any keyword related to “sadness.” Often, concepts in a given domain (denoted as CP) are interrelated, and can therefore be organized into a conceptual hierarchy, which is a partial order ≺_CP. Given two concepts cp1, cp2 ∈ CP, cp1 is a more specific concept than cp2 if and only if cp1 ≺_CP cp2. The ability to extract concepts and construct a conceptual hierarchy enables one to authorize a subject to access a certain dlo only if the object does not deal with a particular concept or a particular combination of concepts for which the subject does not possess the appropriate authorizations.

In our DLAS, we use a document management mechanism that has been developed and implemented in [6] and is currently in use by GLIN; it is capable of extracting concepts from dlos and building a conceptual hierarchy. We have augmented this system with content-based access control. Access authorizations can be given either on specific objects (by listing their ids) or on all the objects containing a particular concept (or a particular combination of concepts). Moreover, finer-grained access control can be enforced by authorizing subjects to access only specific slots and/or links within an object.
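As an illustration of the dlo 4-tuple and of the partial order ≺_CP, the following minimal Python sketch may help. It is not the DLAS implementation: the concept names Tax and Law, the dictionary PARENTS, and the helper names are assumptions made only for this example.

```python
from dataclasses import dataclass

# Hypothetical concept hierarchy: concept -> set of direct, more general concepts.
PARENTS = {
    "Imports_Tax": {"Tax"},
    "Tax_Incentive": {"Tax"},
    "Tax": {"Law"},
    "Law": set(),
}

def more_general(concept, parents=PARENTS):
    """Return every concept strictly more general than `concept`."""
    seen, stack = set(), list(parents.get(concept, ()))
    while stack:
        cp = stack.pop()
        if cp not in seen:
            seen.add(cp)
            stack.extend(parents.get(cp, ()))
    return seen

def is_more_specific(cp1, cp2, parents=PARENTS):
    """True when cp1 is strictly more specific than cp2 in the hierarchy."""
    return cp2 in more_general(cp1, parents)

@dataclass(frozen=True)
class DLO:
    """A digital library object as the 4-tuple (i, slots, links, concepts)."""
    i: str
    slots: frozenset = frozenset()
    links: frozenset = frozenset()
    concepts: frozenset = frozenset()

def deals_with(dlo, concept, parents=PARENTS):
    """True if the dlo contains `concept` or a concept more specific than it."""
    return any(c == concept or is_more_specific(c, concept, parents)
               for c in dlo.concepts)

bulletin = DLO("dlo1", frozenset({"Blue_page_report"}), frozenset({"l1"}),
               frozenset({"Imports_Tax"}))
print(is_more_specific("Imports_Tax", "Law"))   # True
print(deals_with(bulletin, "Tax"))              # True
print(deals_with(bulletin, "Tax_Incentive"))    # False
```

Storing only the direct parents and computing the closure on demand keeps the hierarchy easy to edit; a real system might cache the closure instead.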

Credentials. To allow for the specification of authorizations based not only on the subject identity but also on the subject characteristics, each subject is associated with one or more credentials. A credential is a set of subject attributes that are needed for security purposes. Credentials are assigned when a subject is created and are updated automatically according to the subject’s profile. To make the task of credential specification easier, credentials with similar structures are grouped into credential types. Credential types are organized into a hierarchy, referred to as the credential type hierarchy. A credential type is a pair (ct_id, attr), where ct_id is the credential type identifier, and attr is a triple (a_name, a_dom, a_type), denoting attribute name, attribute domain, and attribute type (either opt or mand), respectively. If a_type = opt, then the attribute can assume a null value; by contrast, a_type = mand indicates that a value for the attribute should always be supplied.
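The distinction between opt and mand attributes can be sketched as a small validation routine. This is only an illustration under stated assumptions: domains are modelled as Python types, a credential type is assumed to carry several (a_name, a_dom, a_type) triples, and the choice of which employee attributes are mandatory is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Attribute:
    a_name: str
    a_dom: type   # attribute domain, modelled here as a Python type (assumption)
    a_type: str   # "opt" or "mand"

@dataclass(frozen=True)
class CredentialType:
    ct_id: str
    attr: tuple   # tuple of Attribute (assumption: a type carries several triples)

def validate_state(ct, state):
    """Return the list of problems found in `state` (dict: name -> value or None)."""
    problems = []
    for a in ct.attr:
        value = state.get(a.a_name)
        if value is None:
            # Null is allowed only for optional attributes.
            if a.a_type == "mand":
                problems.append(f"missing mandatory attribute {a.a_name}")
        elif not isinstance(value, a.a_dom):
            problems.append(f"{a.a_name} is outside its domain")
    return problems

# Hypothetical declaration of the employee credential type.
employee = CredentialType("employee", (
    Attribute("age", int, "opt"),
    Attribute("address", str, "mand"),
    Attribute("salary", int, "opt"),
))

print(validate_state(employee, {"age": None, "address": "New Street", "salary": 2000}))
# []
print(validate_state(employee, {"age": 29, "address": None}))
# ['missing mandatory attribute address']
```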


A credential c, an instance of a credential type ct, is a 4-tuple (c_id, subject_id, state, ct_id), where c_id is the credential identifier; subject_id is the identifier of the subject with whom the credential is associated; state = (a1 : v1, . . . , an : vn), where a1, . . . , an are the names of the attributes of c and v1, . . . , vn are their values; and ct_id is the identifier of the credential type of which c is an instance. The set of attributes of a credential c that is an instance of a credential type ct is the union of the set of attributes of ct and of all the attributes of the credential types which precede ct in the credential type hierarchy.

Example 1 The following are examples of credentials:
(c1, Helen, (age: null, address: ‘New Street’, salary: 2,000, nationality: ‘US’, national_origin: ‘Italy’), employee);
(c2, Ann, (age: 29, address: ‘Carnaby Street’, salary: null, project: ‘P-125’, nationality: ‘US’, national_origin: ‘US’), legal_research_analyst). △
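The rule that a credential inherits the attributes of all credential types preceding its own type can be sketched as follows. The hierarchy below, in which legal_research_analyst is a child of employee, and the split of attributes between the two types are assumptions chosen to be consistent with Example 1; a single-parent hierarchy is also assumed for brevity.

```python
# Hypothetical hierarchy: credential type -> its direct parent (None for the root).
PARENT = {"employee": None, "legal_research_analyst": "employee"}

# Hypothetical attributes declared locally by each credential type.
LOCAL_ATTRS = {
    "employee": {"age", "address", "salary", "nationality", "national_origin"},
    "legal_research_analyst": {"project"},
}

def all_attributes(ct_id):
    """Union of the attributes of ct_id and of every type preceding it."""
    attrs = set()
    while ct_id is not None:
        attrs |= LOCAL_ATTRS[ct_id]
        ct_id = PARENT[ct_id]
    return attrs

# Credential c2 of Example 1 as a (c_id, subject_id, state, ct_id) tuple.
c2 = ("c2", "Ann",
      {"age": 29, "address": "Carnaby Street", "salary": None,
       "project": "P-125", "nationality": "US", "national_origin": "US"},
      "legal_research_analyst")

c_id, subject_id, state, ct_id = c2
print(set(state) == all_attributes(ct_id))   # True
```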

In DLAM, authorizations can be given explicitly to subjects, by specifying their identifiers, or implicitly, by imposing a set of conditions that the subject credentials must satisfy. These conditions are specified as a credential expression, written in a formal language. For instance, the credential expression employee(X) ∧ X.nationality = ‘US’ denotes all the employees that are American citizens.

Privileges. DLAM supports browsing and authoring privileges, with various subtypes within each privilege type. These privilege types subsume the conventional privileges such as read and write. Browsing privileges allow subjects to see the information in an object (or in some of its slots) and/or to see the existence of a link in an object. Three different types of browsing privileges are supported: view, link and view-all. The view privilege allows a subject to see the information in an object or in some of its slots. The link privilege authorizes a subject to see the existence of a specific link. Finally, the view-all privilege subsumes both the link and the view privilege, since it allows subjects to see the information in an object and all the links it contains. Thus, the view-all privilege on an object is equivalent to the view privilege on the object and the link privilege on all the links contained in the object. Note that DLAM distinguishes between link and view privileges because this distinction makes it possible to grant subjects access to the information in an object (or in some of its slots) without disclosing the relationships of this object to other objects.

Authoring privileges allow subjects to modify the content of an object or create a link. DLAM distinguishes among the following authoring privileges: refer, to include a link in an object; append, to write information in an object (or in some of its slots) without deleting any pre-existing information;¹ and update, to modify the content of an object (or of some of its slots) and to include links in the object. Thus, the update privilege subsumes both the refer and the append privilege. Note that a subject holding the refer or the update privilege on an object can specify links from the object to any other object. No authorization is needed on an object to specify links with that object as destination of the link.

¹ The append privilege can be used to authorize subjects to annotate a given object, without modifying its content.

Access Authorizations. In DLAM both positive and negative authorizations can be specified. Positive authorizations denote access permissions, whereas negative authorizations denote access denials. Authorizations apply to concepts, objects, slots within objects, as well as to links. An object authorization is a 4-tuple (crd-spec, ent-spec, priv, sign), where crd-spec is a credential specification denoting the subjects to whom the authorization is granted, ent-spec is an entity specification denoting the contents, objects, and/or the slots to which the authorization refers, priv is the privilege for which the authorization is granted, and sign ∈ {+, −} indicates whether the authorization is positive (+) or negative (−). A credential specification is either a set of subject identifiers or a credential expression. An entity specification co-spec.slot-spec denotes the slots whose names are in slot-spec of the objects denoted by co-spec. co-spec is either a set of dlo ids or a conceptual expression, that is, a boolean expression of concepts in CP. If slot-spec = ∅, the specification denotes all objects whose identifiers are denoted by co-spec, whereas if co-spec = ∅, it denotes all slots whose names are in slot-spec, regardless of the objects in which they appear.

Example 2 Authorization (employee(X), World_Law_Bulletin.Blue_page_report, view-all, −) prevents all the employees from seeing the information in the Blue page report of the World Law Bulletin. Authorization (NML_employee(X), Imports_Tax ∨ Tax_Incentive, view, +) authorizes NML employees to see the information in all the objects containing the concept Imports Tax or the concept Tax Incentive, but not the links contained in such objects.
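The subsumption among privileges and the shape of an object authorization can be written down compactly. The sketch below is illustrative only: the class and field names are hypothetical, expressions are kept as plain strings rather than parsed, and World_Law_Bulletin is assumed to be a dlo id.

```python
from typing import NamedTuple

# Privileges directly subsumed by each privilege, as stated in the text.
IMPLIES = {
    "view-all": {"view", "link"},
    "update": {"refer", "append"},
}

def implied_privileges(priv):
    """Return priv together with every privilege it subsumes."""
    return {priv} | IMPLIES.get(priv, set())

class ObjectAuthorization(NamedTuple):
    crd_spec: object      # set of subject ids, or a credential expression (string here)
    co_spec: object       # set of dlo ids, or a conceptual expression (string here)
    slot_spec: frozenset  # slot names; empty means the whole objects
    priv: str
    sign: str             # "+" or "-"

# The two authorizations of Example 2.
a1 = ObjectAuthorization("employee(X)", {"World_Law_Bulletin"},
                         frozenset({"Blue_page_report"}), "view-all", "-")
a2 = ObjectAuthorization("NML_employee(X)", "Imports_Tax ∨ Tax_Incentive",
                         frozenset(), "view", "+")

print(sorted(implied_privileges(a1.priv)))   # ['link', 'view', 'view-all']
print(sorted(implied_privileges(a2.priv)))   # ['view']
```

The second result reflects the remark in Example 2 that a view authorization does not disclose the links contained in the objects.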



DLAM also allows the specification of authorizations on links. A link authorization is a 4-tuple (crd-spec, link_id, link, sign), where crd-spec is a credential specification, link_id is the identifier of the link to which the authorization refers, and sign ∈ {+, −} indicates whether the authorization is positive (+) or negative (−). In the following, the term Authorization Base is used to denote a set of object and link authorizations. Moreover, given an object authorization A = (crd-spec, ent-spec, priv, sign), we denote with crd-spec(A), ent-spec(A), priv(A), and sign(A) the credential specification, the entity specification, the privilege, and the sign of A, respectively. Similarly, ent-spec(A).co-spec denotes the conceptual expression in ent-spec(A), whereas ent-spec(A).slot-spec denotes the slot names in ent-spec(A). Finally, if A is a link authorization, link_id(A) denotes the link identifier in A.

Thus, a single authorization may be used to authorize a set of subjects to exercise the same privilege on the same objects, slots or links. These subjects can be explicitly specified, by listing their identifiers, or implicitly denoted by means of a credential expression. In order to enforce access control, it is necessary to identify, given an authorization, the set of subjects to which it applies. A critical issue is the possible presence of null values in subject credentials. Suppose that the credential expression legal_research_analyst(X) ∧ X.age > 18 appears in an authorization A. Consider a subject s, with an associated credential of type legal_research_analyst, such that no value is specified for attribute age in his/her credential. The problem arises in determining whether authorization A applies to subject s, since we have no information on his/her age. In our model we take the most conservative approach: if A is positive, then A does not apply to subject s; by contrast, if A is negative, then it holds for subject s as well. DLAM uses two functions to compute the set of subjects to which an authorization applies, namely Denotes and Undef, which, given a credential expression ce, return respectively the set of subjects identified by ce and the set of subjects that do not satisfy ce because of null values in their credentials.² These functions are used by our access control mechanism, described in subsection 4.2, to verify whether an authorization applies to a given subject.

² Undef and Denotes are formally defined in [1].

Authorizations propagate along both the conceptual and the credential type hierarchy according to a set of propagation rules. Propagation rules are based on the principle that authorizations on a concept and/or credential type propagate to all the more specific concepts and credential types. Thus, an authorization given on the objects containing a given concept c implies an analogous authorization on all objects containing a concept more specific than c. Similarly, an authorization granted to subjects with a given credential implies an analogous authorization for all the subjects with a more specific credential with respect to the credential type hierarchy. Note that authorization propagation is very useful since it is a means to concisely express a set of related authorizations. Moreover, exceptions to this propagation policy can always be specified, since our model supports both positive and negative authorizations.

However, negative authorizations introduce the possibility of conflicts, in that a subject may simultaneously have both a positive and a negative authorization for the same privilege on the same object. To deal with such conflicts we have defined a conflict resolution policy [1], based on the concept of strongest authorization. Intuitively, an authorization is the strongest one with respect to a given subject s, object o and access mode m if, whenever s requests to exercise m on o, the authorization prevails over all the other conflicting authorizations. Given a set of conflicting authorizations, the strongest one is determined by analyzing the credential and entity specifications of each authorization belonging to the set. For instance, authorizations explicitly given to subjects (that is, authorizations in which the subject identifiers explicitly appear) have higher priority than authorizations containing credential expressions, and authorizations specified on a particular object (that is, authorizations in which the object id appears) have higher priority than authorizations containing conceptual expressions. When conflicts are not solved by the conceptual and credential type hierarchy, negative authorizations are considered as prevailing.

When an access request is submitted to the system, our access control mechanism first determines the set of authorizations which apply to the subject requesting the access. From this set, it selects the strongest authorizations and, on the basis of their sign, it decides whether the access can be (totally or partially) authorized or should be denied. More details can be found in [1].
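The conservative treatment of null values can be illustrated with a three-valued evaluation. The formal definitions of Denotes and Undef are in [1]; the sketch below is a simplified stand-in that handles only an expression of the form ct(X) ∧ predicate(X.attr), ignores the credential type hierarchy, and uses hypothetical subjects (Bob and Carl are invented for the example).

```python
# Hypothetical subject credentials: subject id -> (credential type, state).
CREDENTIALS = {
    "Ann":   ("legal_research_analyst", {"age": 29}),
    "Bob":   ("legal_research_analyst", {"age": None}),   # age unknown
    "Carl":  ("legal_research_analyst", {"age": 17}),
    "Helen": ("employee", {"age": None}),
}

def evaluate(subject, ct_id, attr, predicate):
    """Return True, False, or None (undefined because of a null value)."""
    subject_type, state = CREDENTIALS[subject]
    if subject_type != ct_id:
        return False
    value = state.get(attr)
    if value is None:
        return None
    return predicate(value)

def denotes(ct_id, attr, predicate):
    """Subjects that satisfy the expression."""
    return {s for s in CREDENTIALS if evaluate(s, ct_id, attr, predicate) is True}

def undef(ct_id, attr, predicate):
    """Subjects that fail the expression only because of a null value."""
    return {s for s in CREDENTIALS if evaluate(s, ct_id, attr, predicate) is None}

def applies_to(sign, ct_id, attr, predicate):
    """Subjects an authorization applies to under the conservative policy."""
    subjects = denotes(ct_id, attr, predicate)
    if sign == "-":
        # A negative authorization also holds for subjects with unknown values.
        subjects |= undef(ct_id, attr, predicate)
    return subjects

def adult(age):
    return age > 18

print(sorted(applies_to("+", "legal_research_analyst", "age", adult)))  # ['Ann']
print(sorted(applies_to("-", "legal_research_analyst", "age", adult)))  # ['Ann', 'Bob']
```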

[Figure: block diagram. Labels: (u, dlo); Novel dlo; Dlos; Document Classification Module; Access Control Module (DLAM Implementation); Vu(dlo); Dlos Concepts; Conceptual Hierarchy; Authorization Base; User Credentials; Credential Type Hierarchy.]

Fig. 1 DLAS architecture

3 DLAS Architecture

DLAM relies on the use of conceptual hierarchies for supporting content-based access control for dlos. For this reason, the implementation of DLAM has been integrated with a mechanism that extracts concepts from objects, builds a hierarchy of concepts and then classifies new documents into one or more concept-classes. The overall architecture of the system, which we call DLAS (Digital Library Authorization System), is depicted in Figure 1.


DLAS consists of two main modules: The Access Control Module, that is, the implementation of DLAM, and the Document Classification Module which enforces concept extraction from dlos and classifies them based on the extracted concepts. The mechanism we use for concept extraction is based on the work reported in [6], where a methodology to model a domain and then apply information extraction tools for both classifying documents and building a conceptual index of the documents has been developed. Documents are classified based on a conceptual hierarchy covering concepts within a given domain. Each node in the hierarchy represents a class and contains a set of concepts common to all documents within that class. In general, nodes towards the top of the hierarchy contain more general concepts that are common to most or all of the documents in the domain, whereas nodes towards the bottom of the hierarchy contain more specific concepts unique to those classes. An information extraction system is first created for each class (node) in the hierarchy and trained to recognize concepts present in the given class. This system uses a parser (MARMOT [7]) and a sentence analyzer (BADGER [8]). During training, the information extraction system builds a concept dictionary of syntactic and semantic constraints that indicate when a concept is present in the text. A novel document is classified by passing it down the hierarchy. At each node, the appropriate extractor is executed on the document and a list of instantiated concept definitions is formed. When a weighted number of concept definitions have been instantiated, the document is assigned membership to a class in the hierarchy.
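The top-down classification described above can be illustrated with a toy sketch. It does not reproduce the extraction system of [6]: plain keyword matching stands in for the trained MARMOT/BADGER extractors, and the hierarchy, weights, threshold, and the choice to stop descending once a class is not joined are all assumptions made for this example.

```python
# Hypothetical class hierarchy: node -> (weighted concept cues, child nodes).
HIERARCHY = {
    "Law": ({"law": 1.0, "statute": 1.0}, ["Tax", "Labor"]),
    "Tax": ({"tax": 2.0, "import": 1.0, "incentive": 1.0}, []),
    "Labor": ({"employment": 2.0, "wage": 1.0}, []),
}
THRESHOLD = 2.0  # hypothetical weighted count required for class membership

def extract(node, text):
    """Stand-in extractor: cues of `node` that occur as words in the text."""
    cues, _children = HIERARCHY[node]
    words = set(text.lower().split())
    return {cue: weight for cue, weight in cues.items() if cue in words}

def classify(text, node="Law"):
    """Pass the document down the hierarchy and collect the classes it joins."""
    found = extract(node, text)
    if sum(found.values()) < THRESHOLD:
        return []          # assumption: do not descend below a class not joined
    classes = [node]
    for child in HIERARCHY[node][1]:
        classes += classify(text, child)
    return classes

print(classify("the statute amends the law on import tax"))   # ['Law', 'Tax']
```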


Each time a new document is acquired by the DL, it is processed by the Document Classification Module, which extracts the relevant concepts from this object. Information on the relevant concepts is then stored and used by the Access Control Module to perform content-based access control on dlos. When an access request is submitted, the Access Control Module verifies whether the access request can be (totally or partially) authorized or should be rejected. If an access request has to be partially authorized, the system returns to the subject submitting the request a view of the requested document containing all and only those portions of the document for which the subject has an applicable positive authorization.³ Access control enforcement takes into account the credentials of the subject that submits the request and their relative positions in the credential type hierarchy, the authorizations stored in the Authorization Base, and the concepts associated with the requested document. Based on this information, the correct view of the document is returned to the subject who submits the request. In the following section, we illustrate the implementation of DLAM.
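The construction of a partial view can be sketched as a filter over the slots of the requested document. The sketch assumes that the sign of the strongest applicable authorization has already been determined for each slot by the conflict resolution policy; the function names and the sample data are hypothetical.

```python
def document_view(slots, strongest_sign):
    """Keep all and only the slots with an applicable positive authorization.

    slots: dict mapping slot name -> content
    strongest_sign: dict mapping slot name -> "+" or "-"
                    (a missing entry means no authorization applies)
    """
    return {name: content for name, content in slots.items()
            if strongest_sign.get(name) == "+"}

def decide(slots, strongest_sign):
    """Classify the outcome of a request and return it with the view."""
    view = document_view(slots, strongest_sign)
    if not view:
        return "rejected", view
    if len(view) == len(slots):
        return "totally authorized", view
    return "partially authorized", view

dlo_slots = {"Title": "World Law Bulletin",
             "Blue_page_report": "report text",
             "Index": "index text"}
signs = {"Title": "+", "Blue_page_report": "-"}   # no authorization on Index

print(decide(dlo_slots, signs))
# ('partially authorized', {'Title': 'World Law Bulletin'})
```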

4 Implementation of DLAM

Our Access Control Module has been developed on top of the Oracle 8.03 DBMS, using Delphi 3 client/server. The module provides a set of graphical tools by which the Security Officer (SO) can easily perform access control and administrative operations and monitor the state of the Authorization Base. In the following, we first give an overview of the features supported by the Access Control Module; then we discuss the strategies we have adopted to efficiently enforce access control. Finally, we give examples of the graphical environment provided by the module.

³ By applicable positive authorization, we mean a positive authorization which is not overridden by a negative authorization, according to the conflict resolution policy.

Type                         Operations

Authorization Management     Add/remove authorizations
                             View: all authorizations
                                   authorizations on a given object
                                   authorizations of a given subject
                                   the set of subjects/objects to which an authorization applies

Subject Management           Add/remove subjects
                             View: all the subjects
                                   the subjects associated with a given credential
                                   the credentials associated with a given subject
                                   the objects and links which can be accessed by a given subject

Object Management            Insert/delete an object
                             Add/remove a slot/link from an object
                             View: all the objects
                                   the objects containing a given concept
                                   the links/slots/concepts of a given object
                                   the subjects that can totally or partially access a given object

Credential Management        Add/remove a credential type from the hierarchy

Access Control Management    Submit an access request
                             View the result of an access request

Table 1 Access Control Module operations


4.1 Access Control Module Features

Operations supported by our Access Control Module can be divided into the following categories:

– Authorization management: this category includes operations for granting and revoking access authorizations and for analyzing the authorizations stored into the Authorization Base. Authorizations can be viewed according to three different criteria: i) by listing all the authorizations; ii) by listing the authorizations of a specific subject; and iii) by listing the authorizations granted on a particular object. Once the authorizations have been displayed according to one of the above criteria, then it is possible to view the subjects and/or objects to which the authorizations apply. Such feature is particularly useful when the subjects (resp. objects) to which an authorization applies are implicitly denoted by means of a credential (resp. conceptual) expression. – Subject management: this category includes operations for inserting a new subject and filling in the corresponding credentials, for deleting an existing subject and for viewing the subjects authorized to access the system and their associated credentials. Viewing options include: view all the subjects authorized to access the system and their associated credentials; view the subjects associated with a given credential; view the credentials of a subject with a given identifier. Once a subject has been selected, it is possible to list the objects and links he/she can access.


– Object management: this category includes operations to insert and remove a dlo, to structure a dlo into slots, to add or remove links from a dlo, and to acquire information on the dlos stored in the DL. This latter function allows the SO to view the structure of a particular object (that is, its slots and/or links), the concepts associated with the object, and the subjects that can access the object (or some of its slots and/or links).

– Credential management: these are operations to modify the credential hierarchy by adding or removing a credential type.

– Access control management: these are operations to enforce access control, that is, to submit an access request and view its result.

The above operations are summarized in Table 1.

4.2 Access control strategy

Our strategy to enforce access control is based on a precomputation of the set of subjects and objects to which an authorization applies. We have adopted this strategy because checking whether an access request can be authorized may require the evaluation of several credential and entity specifications, which may be expensive at run-time. To reduce this cost, whenever an authorization is inserted into the Authorization Base, we calculate and store the set of objects and subjects to which it applies. Access control thus becomes very efficient, since there is no difference in cost between content-dependent and content-independent authorizations, or between authorizations with implicitly and explicitly denoted subjects. An algorithm enforcing


Algorithm 41 Precomputation Algorithm

INPUT:  1) An authorization A
OUTPUT: O_A: the set of dlo ids to which authorization A applies
        S_A: the set of subject ids to which authorization A applies
METHOD:
1. # O_A computation #
   If A is an object authorization:
       If ent-spec(A).co-spec is a set of dlo ids:
           O_A = ent-spec(A).co-spec
       If ent-spec(A).co-spec is a conceptual expression:
           For each dlo = (i, slots, links, concepts) ∈ OB:
               Let C(dlo) = concepts ∪ {cp | cp ∈ CP such that ∃cp′ ∈ concepts with cp′ ≺_CP cp}
               If the concepts in C(dlo) satisfy ent-spec(A).co-spec:
                   Add i to O_A
           endfor
       endif
       If ent-spec(A).co-spec = ∅:
           O_A = {i′ | (i, slots, links, concepts) ∈ OB, i = i′ and slots ∩ ent-spec(A).slot-spec ≠ ∅}
       endif
   endif
   If A is a link authorization:
       O_A = {i′ | (i, slots, links, concepts) ∈ OB, i = i′ and link_id(A) ∈ links}
2. # S_A computation #
   If crd-spec(A) is a set of subject ids:
       S_A = crd-spec(A)
   If crd-spec(A) is a credential expression:
       If sign(A) = +:
           S_A = Denotes(crd-spec(A))
       else
           S_A = Denotes(crd-spec(A)) ∪ Undef(crd-spec(A))
       endif

Fig. 2 An algorithm for computing the set of subjects/objects to which an authorization applies

such precomputation is shown in Figure 2. The algorithm is executed upon each authorization insertion and returns the set of subjects and objects to which the authorization applies. In the algorithm, we make use of functions Denotes and Undef introduced in Section 2. Algorithm 41 consists of two main steps, which respectively compute the set of objects and the set of subjects to which the authorization applies.

Step 1 computes the set of dlos to which the new authorization applies. This set depends on the kind of the authorization (that is, whether the authorization is an object or a link authorization) and on its entity specification. Let us first consider the case of an object authorization. If the new authorization explicitly denotes a set of objects (by including their ids), then the set of objects denoted by the authorization is equal to the set of objects corresponding to such ids. If the authorization implicitly denotes a set of objects, by means of a conceptual expression, then it must be verified which of the objects in the Authorization Base satisfy this expression. To do this, for each dlo in the Authorization Base, the algorithm computes C(dlo), that is, the set of concepts characterizing dlo. This set is the union of the concepts associated with dlo and all the concepts more general than them. The dlo is included in the set of objects denoted by the authorization if the concepts in C(dlo) satisfy the conceptual expression. Finally, if the authorization denotes a set of slot names, then all the objects having one of these slots are included in the set of objects denoted by the authorization. If the authorization is a link authorization, then the object denoted by the authorization is the one which contains the link specified in the authorization.
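Step 1 described above can be illustrated with a minimal Python sketch. It is not the DLAS implementation: the `Dlo` record, the dictionary layout of an authorization (`kind`, `co_spec`, `slot_spec`, `link_id`), the `parents` map encoding the concept hierarchy, and the choice to model a conceptual expression as a Python predicate over a set of concepts are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dlo:
    """Hypothetical stand-in for a dlo tuple (i, slots, links, concepts)."""
    id: str
    slots: frozenset
    links: frozenset
    concepts: frozenset


def concept_closure(concepts, parents):
    """C(dlo): the given concepts plus every more general concept.

    `parents` maps a concept to its directly more general concepts
    (an assumed encoding of the concept hierarchy).
    """
    closure = set(concepts)
    stack = list(concepts)
    while stack:
        concept = stack.pop()
        for parent in parents.get(concept, ()):
            if parent not in closure:
                closure.add(parent)
                stack.append(parent)
    return closure


def object_set(auth, object_base, parents):
    """Compute the set of dlo ids to which authorization `auth` applies."""
    if auth["kind"] == "link":
        # Link authorization: the objects containing the specified link.
        return {d.id for d in object_base if auth["link_id"] in d.links}
    co_spec = auth.get("co_spec")
    if co_spec is None:
        # No object specification: every object having one of the named slots.
        return {d.id for d in object_base if d.slots & auth["slot_spec"]}
    if callable(co_spec):
        # Conceptual expression, modeled here as a predicate over C(dlo).
        return {d.id for d in object_base
                if co_spec(concept_closure(d.concepts, parents))}
    # Explicit set of dlo ids.
    return set(co_spec)


parents = {"neural networks": ["machine learning"],
           "machine learning": ["computer science"]}
object_base = [
    Dlo("d1", frozenset({"abstract", "body"}), frozenset({"l1"}),
        frozenset({"neural networks"})),
    Dlo("d2", frozenset({"body"}), frozenset(), frozenset({"cooking"})),
]
auth = {"kind": "object", "co_spec": lambda cs: "computer science" in cs}
print(sorted(object_set(auth, object_base, parents)))  # ['d1']
```

In this example d1 is selected although it is only tagged with "neural networks", because the closure adds the more general concepts before the expression is evaluated.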

Step 2 computes the set of subjects to which the new authorization applies. Also in this case, two different strategies are adopted depending on the credential specification of the new authorization. If the credential specification is a set of subject ids, then the set of subjects to which the authorization applies is equal to the set of subjects corresponding to such ids. By contrast, if the credential specification is a credential expression, then the set of subjects denoted by the authorization depends on its sign. If the authorization is positive, then the set of subjects denoted by the authorization is obtained by applying function Denotes to the credential expression. By contrast, if the authorization is negative, this set must be extended by adding to it the set of subjects which do not satisfy the credential expression because of null values in their credential specification (this set is computed by function Undef).

Note that, due to the precomputation strategy, administrative operations, such as the creation, deletion or modification of an object or of a subject, become more expensive. However, they are considerably less frequent than access requests. In the following we briefly sketch the strategies we have adopted for the management of each of the above-mentioned administrative operations.

When an object is acquired by DLAS, it is first processed by the Document Classification Module (see Figure 1), which extracts all its relevant concepts. At this point, all authorizations that contain a concept relevant for the new object (these concepts may be either the concepts extracted from the new object by the Document Classification Module or concepts more general than them) must be considered to check whether they apply to the new object. By contrast, object deletion only requires the removal of the deleted object from all the sets of objects denoted by the authorizations in the Authorization Base. The modification of an object can be handled using the same strategy employed for object acquisition. Moreover, it is important to note that operations such as the modification of an object content are not very frequent and, in general, they do not impact the concepts associated with an object and thus the conceptual expressions an object satisfies. The reason is that in our approach concepts are associated with an object on the basis of the analysis of the object semantics rather than simply on keywords or terms. Therefore, modifications that do not change the object semantics are not likely to impact the concepts associated with the object.

As far as operations on subjects are concerned, the insertion of a new subject may require the update of some of the sets of subjects to which the authorizations in the Authorization Base apply, if any authorization exists in the Authorization Base which applies to the newly created subject. Thus, when a new subject is inserted, the system must select from the Authorization Base the authorizations having a credential expression in their credential specification, and it must verify which of these authorizations apply to the newly inserted subject, on the basis of the subject credentials (this check is performed by applying function Denotes and/or Undef to the credential expression in the authorization). The modification of a subject credential can be handled using a similar strategy, whereas subject deletion can be managed similarly to object deletion.
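Step 2 and the subject maintenance operations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's code: a credential expression is modeled as a Python function returning True, False, or None (None standing for "cannot be evaluated because of a null value"), and `denotes`/`undef` are simplified stand-ins for the functions Denotes and Undef of Section 2. The dictionary layout and the `is_adult` expression are hypothetical.

```python
def denotes(cred_expr, subjects):
    """Subjects whose credentials satisfy the expression."""
    return {sid for sid, cred in subjects.items() if cred_expr(cred) is True}


def undef(cred_expr, subjects):
    """Subjects for which the expression is undefined because of null values."""
    return {sid for sid, cred in subjects.items() if cred_expr(cred) is None}


def subject_set(auth, subjects):
    """Compute the set of subject ids to which authorization `auth` applies."""
    spec = auth["crd_spec"]
    if not callable(spec):
        return set(spec)              # explicit set of subject ids
    result = denotes(spec, subjects)
    if auth["sign"] == "-":
        # A negative authorization also covers the undefined cases.
        result |= undef(spec, subjects)
    return result


def on_subject_inserted(sid, cred, authorizations, precomputed):
    """Update the stored subject sets when a new subject is created.

    Only authorizations with a credential expression need to be re-checked.
    """
    for name, auth in authorizations.items():
        spec = auth["crd_spec"]
        if not callable(spec):
            continue
        outcome = spec(cred)
        if outcome is True or (outcome is None and auth["sign"] == "-"):
            precomputed[name].add(sid)


def on_subject_deleted(sid, precomputed):
    """Remove a deleted subject from every stored subject set."""
    for members in precomputed.values():
        members.discard(sid)


def is_adult(cred):
    """Hypothetical credential expression: age >= 18, undefined if age is null."""
    age = cred.get("age")
    return None if age is None else age >= 18


subjects = {"ann": {"age": 30}, "bob": {"age": 12}, "cy": {}}
positive = {"sign": "+", "crd_spec": is_adult}
negative = {"sign": "-", "crd_spec": is_adult}
print(sorted(subject_set(positive, subjects)))  # ['ann']
print(sorted(subject_set(negative, subjects)))  # ['ann', 'cy']
```

Subject cy has no age value, so the positive authorization does not apply to cy while the negative one does, matching the treatment of null values in Step 2.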

4.3 Administrative tools

In the following we give examples of the graphical environment provided by the Access Control Module to perform administrative and access control operations. We refer the interested reader to [4] for a more detailed description. New authorizations can be entered into the Authorization Base using the form of Figure 3. In this form the SO enters general information on the new authorization, such as the sign and the privilege of the authorization, the type of its credential specification (that is, whether the credential specification is a set of subject identifiers or an expression on subject credentials) and the type of the entity specification.

Fig. 3 Authorization specification

After this preliminary information is entered, the SO must specify both the credential and the entity specification of the new authorization. These operations are performed using a graphical interface that dynamically changes according to the type of credential and entity specification that must be entered. For instance, if the entity specification is a set of object and/or slot identifiers, then the list of all the dlo and slot ids is displayed (see Figure 4). The SO can then select the objects/slots he/she wishes to include in the authorization and see their content. By contrast, if the entity specification contains a conceptual expression, the Access Control Module provides the SO with a visual language to build such an expression (see Figure 5). Similar forms are provided to enter the credential specification.

Fig. 4 Objects and slots

Fig. 5 Specification of conceptual expressions

Other visual environments are provided to update the credential type hierarchy. The insertion of a new credential type is performed using the form shown in Figure 6. The SO selects the position in the credential type hierarchy where he/she wishes to insert the new credential type, by clicking on the credential type hierarchy shown in the upper left corner of the window. Then, he/she enters the name of the new credential type and information on the credential type attributes (that is, the name and the type of each attribute and whether the attribute is optional or mandatory). When a credential type is deleted, the problem arises of managing the subjects that are associated with the deleted credential. The Access Control Module supports two different options: the recursive deletion of all the subjects associated with the deleted credential, and the modification of the subject credential. If the latter option is chosen, then the SO must decide, for each subject associated with the deleted credential, which new credential he/she must be associated with (by selecting a credential type from the credential type hierarchy). This change may require the insertion of new attribute values, since the attributes of the new credential may differ from those of the old one.

Fig. 6 Credential type insertion

Finally, access requests are submitted using the form shown in Figure 7. The subject enters his/her id and the id of the dlo to which he/she requires access. When the Submit button is pressed, the access control algorithm is executed; it verifies, according to the subjects and objects denoted by the authorizations in the Authorization Base, whether the subject has the right to access the requested object. If the subject does not have this right, then an alert message is returned to the subject; otherwise the object (or some of its slots and/or links) is displayed in the bottom part of the window.


Fig. 7 Access control enforcement
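Because the subject and object sets are precomputed, the check triggered by the Submit button reduces to set membership tests. The sketch below illustrates this under loud assumptions: each stored authorization is a dictionary carrying its precomputed `subjects` and `objects` sets, privileges and slot-level (partial) access are ignored, and conflicts are resolved by letting a negative authorization override any positive one. This section does not state the conflict resolution policy, so that rule is an assumption of the sketch and not a description of DLAS.

```python
def check_access(subject_id, dlo_id, auth_base):
    """Return True if the subject may access the dlo.

    Assumption: an applicable negative authorization always wins;
    with no applicable authorization, access is denied.
    """
    granted = False
    for auth in auth_base:
        applies = (subject_id in auth["subjects"]
                   and dlo_id in auth["objects"])
        if not applies:
            continue
        if auth["sign"] == "-":
            return False      # assumed policy: denials take precedence
        granted = True
    return granted


auth_base = [
    {"sign": "+", "subjects": {"ann", "bob"}, "objects": {"d1"}},
    {"sign": "-", "subjects": {"bob"}, "objects": {"d1", "d2"}},
]
print(check_access("ann", "d1", auth_base))  # True
print(check_access("bob", "d1", auth_base))  # False
```

The cost of a request depends only on the number of stored authorizations, not on how their subjects or objects were originally specified, which is the point made in Section 4.2.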

5 Conclusions

In this paper, we have presented a Digital Library Authorization System supporting content-based access to DL documents. The system also supports flexible specification of authorizations based on the qualifications and characteristics of subjects, and varying granularity of authorization objects, ranging from sets of library objects to specific portions of objects. We plan to extend this work along several directions. A first direction deals with content-based access control for images and videos. One of the problems is the difficulty of automatically recognizing image and video contents. We plan to extend our model based on the state of the art in multimedia. The second direction is related to extending our system in order to deal with issues such as copyright, privacy and integrity, which are of critical importance to the DL environment. Further work includes an extensive investigation of the performance of our access control strategy.

References

1. N. Adam, V. Atluri, E. Bertino, and E. Ferrari. A Content-based Authorization Model for Digital Libraries. TR 98-104, CIMIC and MSIS Department, Rutgers University, 1998 (http://cimic.rutgers.edu/~atluri/tr98-104.ps). To appear in IEEE Transactions on Knowledge and Data Engineering.
2. N. Adam et al. Strategic Directions in Electronic Commerce and Digital Libraries: Towards a Digital Agora. ACM Computing Surveys, 28(4), December 1996.
3. E. Bertino and H. Weigand. An Approach to Authorization Modeling in Object-Oriented Database Systems. Data and Knowledge Engineering, 12(1), 1994.
4. E. Ferrari, N. Adam, V. Atluri, E. Bertino, and U. Capuozzo. Implementation of DLAM: A Content-based Authorization Module for Digital Libraries. Technical report, Department of Computer Science, University of Milano, August 1998 (http://www.dsi.unimi.it/rapporti.php).
5. E. Gudes, H. Song, and E.B. Fernandez. Evaluation of Negative, Predicate, and Instance-based Authorization in Object-oriented Databases. In Database Security, IV: Status and Prospects, Elsevier, 1991.
6. R. Holowczak. Extractors for Digital Library Objects. PhD thesis, Rutgers University, Department of MS/CIS, 1997.
7. Natural Language Processing Laboratory. MARMOT subjects Guide. Center for Intelligent Information Retrieval, University of Massachusetts, Amherst, 1996.
8. Natural Language Processing Laboratory. Task Domain Specification and User Guide for BADGER and CRYSTAL. Center for Intelligent Information Retrieval, University of Massachusetts, Amherst, 1996.
9. Proceedings of the Fifth ACM Workshop on Role-Based Access Control, Berlin, Germany, 2000.
10. F. Rabitti, E. Bertino, W. Kim, and D. Woelk. A Model of Authorization for Next-generation Database Systems. ACM Trans. on Database Systems, 16(1):88–131, March 1991.
11. P. Samarati, E. Bertino, and S. Jajodia. An Authorization Model for a Distributed Hypertext System. IEEE Transactions on Knowledge and Data Engineering, 8(4):555–562, 1996.
12. R. Sandhu et al. Role-based Access Control Models. IEEE Computer, pages 38–47, 1996.
13. K.E. Seamons, W. Winsborough, and M. Winslett. Internet Credential Acceptance Policies. In Proc. Workshop on Logic Programming for Internet Applications, Leuven, Belgium, July 1997.
14. M. Winslett, N. Ching, V. Jones, and I. Slepchin. Using Digital Credentials on the World-Wide Web. Journal of Computer Security, 5, 1997.