Digital Identity Matters

Arthur Allison, James Currall, Michael Moss & Susan Stuart
University of Glasgow, UK
August 2003

Abstract

Digital objects or entities present us with particular problems of an acute nature. The most acute of these are the issues surrounding what constitutes identity within the digital world and between digital entities. These are problems that are important in many contexts but, when dealing with digital texts, documents and certification, an understanding of them becomes vital legally, philosophically and historically. Legally the central issues are those of authorship, authenticity and ownership; philosophically we must be concerned with the sorts of logical relations that hold between objects and in determining the ontological nature of the object; and historically our concern centres around our interest in chronology and the recording of progress, adaptation, change and provenance. Our purpose in this current paper is to emphasise why questions of digital identity matter and how we might address and respond to some of them. We will begin by examining the lines along which we draw a distinction between the digital and the physical context and how, by importing notions of transitivity and symmetry from the domain of mathematical logic, we might attempt to provide, at least interim, resolutions of these questions.

Introduction

In this paper we will confront an issue of immense significance to a wide range of people, from archivists and librarians to philosophers, historians and lawyers. The issue in question is how we should present, and attempt to resolve, the difficulties surrounding the identity of digital objects or entities; more explicitly, how we would identify a digital object as being the object it purports to be, and what criteria we would need to establish if we are to re-identify it over a period of time. Broadly expressed, our claim is that the normal criteria for identity between physical objects with, possibly, physical properties fail in the digital context, where we are dealing with objects with a dubious ontology. We will demonstrate this failure, clarify why it is a serious problem, provide a formulation for one way of making sense of it and show what can be gained by this formulation.

What matters?

To disclose the depth of the problem we must first lay out the much more familiar territory of physical documents and texts that are used to provide proof of transactions, from the trivial purchase of a train ticket to serious matters such as the declaration of war. Such documents are produced to certify that transactions have taken place; that I have, for example, purchased a house or piece of land and that, beyond all reasonable doubt, it now belongs to me. It is usual that certificates of this kind are drawn up in such a way as to ensure that there can be no doubt as to their authenticity and meaning. A train ticket will include a date to show when it was purchased and a number indicating where and from which dispensing machine it was purchased. It will be printed on paper or card of a specific size and shape, which is common to the train operator. It will state the extent of the journey and the conditions governing the journey. Taken together, all these features allow railway officials to tell, often at a glance, that the ticket is genuine.

Documents which support more 'significant' transactions, such as the purchase of land or the birth of a child, follow similar but often more rigorous procedures to ensure their evidential value but, like the train ticket, they have a form which can be recognised without necessary reference to any deep content. A legal document will be drawn up using a specific form of words common to the relevant jurisdiction and be certified as authentic, usually by witnesses appending their signatures, which can themselves be witnessed as authentic. To be valid and to guard against accidental loss and forgery, many documents are engrossed in registers held by the appropriate authorities, with holographic copies held by the other party to a transaction. The process of creating, maintaining and caring for registers has grown up over a long period of time and is designed both to obviate the risk of fraud and to safeguard the guardians themselves from accusations of fraud.

Entries and pages in registers are usually numbered so that removal can be easily detected. Alterations are often witnessed by initials and entries can be signed off. It is commonly accepted that the 'original' document is the version which is witnessed or engrossed, certainly not drafts or copies. Copies can be accepted as original, providing legally binding processes for creating holographs are in place. Registered documents can be locked up in a secure place where they will remain until opened. Even if the mice have been busy or all the parties to the documents are dead, it is usually possible from their form to be certain of their authenticity to a reasonable and legally acceptable degree.

For many sensitive documents, creation and custodianship are separated for exactly the same reasons that front and back office operations are segregated in commercial transactions. The function of the custos is to take care of the document as a trustee for the public good, guaranteeing that it cannot be tampered with and only permitting authorised, supervised access, if at all. It is the duty of the custos to produce the document as evidence if required and to object to its subsequent alteration, even by the creator, without due process. The custos usually has the expertise to recognise the authenticity of a document from its tokens, the material on which it is held, its form, the character of the printing or writing and so on, or, if there is doubt, knows from whom to seek expert opinion. Even today, where custodianship has been delegated by juridical officials to archivists who insist on particular environmental conditions for storage, the activity of securing the document has changed little since Roman times; documents are still physically locked away and access controlled by agreed procedures.

There is the abiding issue of who is to be the custodian of the custodian, as Juvenal in Satire VI 346-8 so tellingly pointed out nearly two thousand years ago when he wrote: "I hear all this time the advice of my old friends: 'Put a lock and keep your wife indoors.' Yes, but who will guard the guards themselves?".1 There may not be cunning wives, or perhaps husbands, trying to seduce archivists to escape, but there are plenty of cunning people keen to seduce them to get in so as to alter the record. This problem has been addressed by putting in place procedures within the archive, which are independent of the documents or registers, that make interference easy to detect when combined with the tokens for authentication employed by the creators. The material alone can often confirm substitution, and breaks in cataloguing sequences can confirm removal. It is axiomatic that the custodes cannot on their own authority destroy or cancel documents and registers but, when properly instructed to do so, they are required to carry this out and to certify that it has been done. In other words, there has been no change in fiduciary responsibility or in the way it is discharged.

There have been two notable changes in recent years: the first is the sheer volume of documents which now need to be stored in this way, reflecting the growth in individual rights and expectations; and the second is the manner in which the information, the content of the document, is stored. All financial institutions are now expected to retain 'know your customer' (KYC) forms which record the process of any sale of a product. Failure to do so and to follow recognised procedures may result in heavy penalties.
Many organisations have, out of convenience, opted for electronic solutions and, moreover, the courts have taken the logical view that a document created electronically is the 'original' and not the printed or holographic representation. This has happened without consideration of the implications for established modes of custodianship. It is impossible to apply the concept of 'locking up' with a key to the electronic environment, and the custos should be reluctant to accept the fiduciary responsibility of guaranteeing that the document cannot be tampered with. The use of electronic devices raises serious questions not just about guardianship but also about form. For example, the contents of a KYC form will probably be held in a series of tables in a database and only displayed as a form for convenience. The content is only held in the form while it is being completed or viewed, but it can also be viewed in other ways which can include other information about the individual, such as their credit rating, which is not on the form but is held in the database. Moreover, the content itself is not held within the database as a set of related characters but in binary code and, what is more, the way the KYC form appears on screen is software and hardware dependent, and this can have a significant effect not only on the way that it appears but also on its content. These problems are serious, with legal implications that we cannot continue to overlook. Without an understanding of what might constitute the identity conditions of digital entities, the whole concept of progress in the information age is based not only on shifting but on sinking sand.

1. "[A]udio quid ueteres olim moneatis amici, 'pone seram, cohibe'. sed quis custodiet ipsos custodes? cauta est et ab illis incipit uxor."

Identity Matters

The logic of identity

Establishing the identity of physical objects in the physical world has by no means proved to be a trivial task but, by and large, we accept as true the logical claim that if two things, x and y, have the same properties then they are the same object. This principle, often referred to as Leibniz's Law, or the Identity of Indiscernibles, is usually expressed as follows: if, for every property F, object x has F if and only if object y has F, then x is identical to y, or ∀F(Fx ↔ Fy) → x = y. The converse of the principle, the Indiscernibility of Identicals, is formulated in a similar way: x = y → ∀F(Fx ↔ Fy).2 By having the same properties we mean that they are equivalent in the strong sense that everything that is true about x is true about y, and that x and y are intersubstitutable salva veritate. It follows from this that x and y could not be equivalent in all of their properties and yet stand in relation R to one another for, as Leibniz says, an increase in number (that there are two discernible entities) indicates a difference in properties: x and y must differ in at least one property, making it possible to distinguish the two things as distinct. Leibniz's Law works well for specifying the identity relation in which two objects stand in virtue of their respective properties, though it should be noted that, for Leibniz, two things that are identical in every respect must in fact be one thing, and that one thing would, of course, be unable to stand in relation to itself. Although this is an interesting issue it cannot concern us directly in this paper.

A further notion we must appeal to if we are to have a reasonable understanding of identity relations between objects is that of transitivity. Euclid explains this axiom in his Common Notions3 as: things which are equal to the same thing are also equal to one another. So, for example, if x = y and y = z, then it must be the case that x = z, where the equality relation between x, y and z is transitive. A particularly good, and very familiar, example of a transitive relation is 'being an ancestor'. So, for example, if Edward is an ancestor of Fred and Fred is an ancestor of Jane, then Edward is an ancestor of Jane, though it should be noted that in this case no equality of Edward with Fred or Jane, Fred with Jane or Edward, or Jane with Fred or Edward is implied. 'Being a parent' is a good example of a non-transitive relation, since Edward's being a parent of Fred and Fred's being a parent of Jane does not mean that Edward is a parent of Jane.

So why are the notions of identity, equivalence and transitivity important in the digital context? Well, the same object may appear in different guises, and the identity of the one object represented in different ways may not be immediately obvious; then again, different objects may appear in the same guise, indiscernible by their representation alone and thus appearing to be identical, when an examination of the underlying properties of the objects would reveal that they are in fact distinct. Our particular concern in this paper is with digital entities and the bitstreams of which they are comprised, and with how the identity conditions for one object with another fail whilst continuing to appear to be met. In mathematics, functions map a domain to a range, and we might say, in this instance, that we have a domain of bitstreams, which we will name D, that can be mapped onto a range of representations, which we will name R.
The function, in this case, will be the action of the web browser, which we will name b. All of which can be stated as b : D → R. We could now define equivalence as x ~ x′ if b(x) = b(x′), which would, of course, also entail that x′ ~ x, where x, x′ ∈ D. This is a proper equivalence relation, which is to say that it is reflexive, symmetric and transitive. However, when bitstream x is acted upon to produce b(x), the representation we now have might be referred to as y and not x. We might express this as x → b(x) = y. Having gone through this transformation we now find that to regain x by b⁻¹ would be impossible, since b⁻¹ does not exist; there is no way to go back to the previous state of x from where we are now. The relationship is asymmetrical and non-transitive, and it would be clearly false to say of y that it is equivalent to x, for there are no conditions under which we could operate on y so as to produce x; moreover, under our definition of equivalence, x and y would have to be members of the same domain and they are not.
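To make the mapping b : D → R concrete, the following is a minimal Python sketch; the toy renderer b, which simply collapses runs of whitespace in the manner of an HTML browser, and the example bitstreams are our own illustrative assumptions rather than any real browser behaviour.

def b(bitstream: bytes) -> str:
    # Render a bitstream to its on-screen representation; this toy renderer,
    # like an HTML browser, collapses runs of whitespace into single spaces.
    return " ".join(bitstream.decode("utf-8").split())

x  = "Digital   identity\nmatters".encode("utf-8")   # one bitstream in D
x2 = "Digital identity matters".encode("utf-8")      # a different bitstream in D

assert b(x) == b(x2)   # x ~ x2, since both render to the same representation in R
assert x != x2         # yet the bitstreams themselves are not identical

# b is not injective: from the representation alone there is no way to tell
# which member of the equivalence class {x, x2, ...} produced it, which is
# why no true inverse of b exists.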

2. The conjunction of the two, rather than the former principle by itself, is sometimes known as Leibniz's Law.

3. An etext of Euclid's Common Notions is available at http://aleph0.clarku.edu/~djoyce/java/elements/bookI/cn.html

Here we state only that there is no inverse of b, though there may be some function c which maps R to D such that c(b(x)) ~ x; this does not imply that c(b(x)) = x. That leaves us with a very curious state of affairs indeed, especially because c is not an inverse of b, since it cannot take us back to the bitstream x. Which is to say that, as long as the set of equivalent bitstreams has more than one member in it, an inverse of b will not be unique; the best one can do is find a function c, mapping R to D, which yields something equivalent to the original x though not, except by an extraordinary chance, identical to it. Now, by introducing the set (domain) of stylesheets, S, we can perhaps see the complexity of the situation a little more clearly. If b : D × S → R, then b(x, s₁) = r₁ ∈ R, b(x, s₂) = r₂ ∈ R, and so on. Even if x ~ x′ held under stylesheet s₁, that is, b(x, s₁) = b(x′, s₁), this would not necessarily entail that x ~ x′ under s₂, that is, that b(x, s₂) = b(x′, s₂).
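Extending the earlier sketch, a hypothetical renderer that also takes a 'stylesheet' of options (again, the options and example strings are entirely illustrative) shows how equivalence under one stylesheet need not survive another.

def b(bitstream: bytes, stylesheet: dict) -> str:
    # A toy renderer taking a bitstream and a "stylesheet" of rendering options.
    text = bitstream.decode("utf-8")
    if stylesheet.get("collapse_whitespace"):
        text = " ".join(text.split())
    if stylesheet.get("uppercase"):
        text = text.upper()
    return text

s1 = {"collapse_whitespace": True, "uppercase": True}
s2 = {"collapse_whitespace": True, "uppercase": False}

x  = "digital identity".encode("utf-8")
x2 = "DIGITAL   IDENTITY".encode("utf-8")

assert b(x, s1) == b(x2, s1)   # indistinguishable when rendered with s1 ...
assert b(x, s2) != b(x2, s2)   # ... but distinguishable when rendered with s2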

Before we examine this issue more fully in the section Digital Identity Matters, below, we will attempt to tease out the troublesome notions of originality, authenticity, identity and equivalence in terms of types and tokens. We intend to demonstrate that the notion of an 'original', or type, of which its token is said to be a representation, is curiously problematic, and even more especially so within the digital environment. The language of types and tokens is common within the discourse of aesthetics and it is from this domain that we will begin our discussion. The type/token distinction has been utilised in order to distinguish art forms with 'unique' objects, for example sculpture and painting, from those where there is no single unique object, for example a play or a piece of music that will have a multiplicity of performances. When we speak of Raoul Dufy's Trouville we refer to his 1907 composition using oil on canvas and not to any of the many reproductions of the work. It is unique, the type from which the tokens or reproductions are derived, even if it does change over time through deterioration of the paint or restoration. But the situation is not always this straightforward. To begin with, types and tokens are slippery characters that are not forever fixed as either type or token; thus a token can become a type, and we might say of Nina Simone's rendition of Feeling Good that it is the definitive version and that any other rendition is a token of that type. Secondly, we are not always dealing with physically unique objects: Schubert's Unfinished Symphony is not a physical object, though we might want to argue that the original score is a physical manifestation of it. And finally, we can see that the usual mental/physical divisions that we use to categorise 'objects' are not always clearly appropriate when we talk of performances or of a digital image of something, perhaps Lara Croft, in virtual space. Hartley Slater touches on some aspects of these difficulties when he speaks of the differing temporality and spatiality of works of art, but he continues, falsely in our view, to maintain that all realisations are physical:

We must first distinguish the artwork from its notation or "recipe," and from its various physical realizations. Examples would be: some music, its score, and its performances; a drama, its script, and its performances; an etching, its plate, and its prints; and a photograph, its negative, and its positives. The notations here are "digital" in the first two cases, and "analogue" in the second two, since they involve discrete elements like notes and words in the one case, and continuous elements like lines and colour patches in the other. Realizations can also be divided into two broad types, as these same examples illustrate: there are those that arise in time (performance works) and those that arise in space (object works). Realizations are always physical entities.4
We may accept that the realisation of something is a token and that it might be physical; we might even accept that when we talk of the idea, of which the realisation is a token, the idea is a type which might be considered to be mental. But when we talk of the bits of information that, when interpreted in a particular way, produce before us an image of Sulley from Monsters Inc., do we say that the bits are the type, structured in such a way as to represent this particular image or token? And, if we do, are we to conclude that the bits, and indeed the image, settle neatly into our mental/physical dichotomy? These are not questions that can be easily resolved, and the legal implications of this irresolvability are only now beginning to be realised. In our quest for the 'type' we are looking for something that we might refer to as the original, an object or entity which is often highly prized and the subject of much legal wrangling. Works of art are not the only things over which questions of originality, and of forgery, are posed. Documents of all sorts go through several drafts and sometimes an earlier draft may supersede a later one in terms of becoming the main text to which subsequent reference is made.

4. Hartley Slater at http://www.utm.edu/research/iep/a/aesthetics.htm.

Although all the drafts are considered to be 'originals', for legal purposes the 'final' version is considered to be the original embodying the intention of the author or authors, and such intention may include instructions about representation as, for example, in George Herbert's poem Easter Wings or Stravinsky's recordings of his own works or, more mundanely, a title deed to a property which must conform to a certain standard to be recognisably original. Scholars are interested in the drafts as they reveal details of the decisions and choices involved in the process of arriving at the final product. With the advent of the computer the production of a text has changed, since corrections can be made on-line in a way that was previously only possible by marking a paper draft. It also becomes possible for other hands to contribute directly to the draft, and designers of word-processing packages have wrestled with ways of tracking such changes and differences in authorship, but these are on the whole clumsy and unsatisfactory. The plotting of the development of a text is now much more haphazard than it once was and this will have tremendous consequences for future scholarship. Draft documents will be preserved but they will not represent the evidential milestones they once did.

In the past archivists insisted that 'archives' consisted of original materials, to distinguish themselves from libraries which held objects of which there were multiple copies. Roneo machines and photocopiers blurred this distinction, but the character of the output from copiers usually made it possible to differentiate a copy from an original. This distinction is now lost, as the computer has the ability to generate a very large number of copies - or, more accurately, renditions - and distribute them widely. Nevertheless, if a text is to be published in a formal sense, rather than simply being distributed, it will need to be transformed. This usually begins with sub-editing and copy-editing, most of which is still done on paper renditions, largely to preserve the audit trail, although the alterations are transferred to the bitstream provided to the publisher by the author. At the same time a designer will normally also be involved in choosing typefaces, layout, the position of illustrations, and so on. The production team will clear rights and arrange for tables and other illustrative material to be created. There is an assumption that these processes have been rendered obsolete by the new technology, but that is to confuse the content with the creation and delivery mechanisms. Why should consistency, punctuation, grammar, intelligibility to an audience and so on (the stock-in-trade of editors) be less important on-line than in a printed book? Moreover, since the resource is made available on-line, the user simply views a rendition of the bitstream held on a server, in a form dictated either by choice or by happenstance. Even if the bitstream has been dignified with an ISBN or ISSN, it does not exist, like a printed book, in multiple identical copies but only in multiple, almost certainly non-identical, renditions. Across the board the bitstream is the constant, although behaviours are not. In paper publishing the bitstream ceases to be operated on by a word-processing package and moves to a typesetting platform which will be used to create the film for the printer.
Once the format has been established, no further variation is possible; this is not the case with an on-line edition of a text, and it is this lack of constancy, the difficulty in locating the original document, the lack of identity between rendered texts, and the non-equivalence of what the transformation throws up that we want to emphasise here. See Figure 1.

Digital Identity Matters

In the physical world we can perceive a large proportion of objects directly, although of course that perception is dependent on many factors, some of which are at the level of the individual doing the perceiving. In the digital world the problems of individual perception are overlaid by problems involving variation in the mechanisms that bring the object to a state at which individual perception starts. In order to examine the ramifications of identity in the digital world, we need to examine rather more closely what a digital object is. A fairly full exposition of this is to be found in Thibodeau (2002). Briefly, a digital object is a bitstream: a series of zeros and ones which, taken together, encode information in a particular format. Unlike objects in the physical world, digital objects cannot be perceived directly by human observers. As we write this paper in our word-processing program, we see the combinations of letters that form words only as a result of a whole set of interacting pieces of hardware - keyboard, processor, screen, and so on - and software, for example the operating system, word-processing software, and keyboard and screen drivers. These components interact via a set of interacting stages and processes which are neither obvious to the user nor invariant from computer setup to computer setup. The content of the paper, as perceived by us in the role of document creator, passes through these stages and processes to produce representations on our screen and in our disk storage. These are summarised in Figure 1 and exert a variety of effects on the bitstream to produce a highly mediated experience of the digital object for the human observer; changing any one of these components even slightly has the potential to change the experience of the digital object.

As has already been noted, this 'intermediation' of an individual's experience raises a number of challenges for us as a society: technically, culturally, legally and philosophically.

Figure 1: Simple Rendering

Both the hardware and software environments are in a state of constant flux. For example, Microsoft will produce a new version of its Windows operating system and Office suite every 18 months, and many organisations replace personal computers on a cycle of less than five years. Against this background it is fairly certain that if we wish to revisit the digital object that holds this paper in, say, five years' time, no component of the hardware or software environment will be the same as its equivalent in use today. Consequently, we should be concerned with establishing criteria that will guarantee that the experience of the digital object will be the same in the hardware and software environment of tomorrow. If we do not make this our current concern we will encounter legal and archival problems on a scale hitherto unknown.

Here we are referring to a coarse level of the components that make up the computer system but, in addition, there are more subtle changes possible between computer systems. Even if my colleague and I have identical computer systems with identical operating systems, word-processing software and drivers, we are still unlikely to have exactly the same experience of a particular digital object. This might be because we have configured our word-processing programs differently, or because we use different printers, which will affect the way that the file is presented on screen as well as the format of the printed document. The experience of a digital object is therefore not fixed merely by the bitstream involved; it is heavily mediated by hardware and software which are themselves subject to frequent and arbitrary change. We can expect poor fixity of the experienced digital object, and without fixity there is no way to establish identity and notions of an original, of authenticity and so on.

Digital objects are bitstreams, as described above, whether they are digitised from physical objects or born directly into the digital world. Whilst appearing to the human observer to be composed of a random pattern of bits, a digital object actually has a well-defined structure consisting of at least two levels: encoding and format. The encoding specifies what groups of bits (often, but not always, of a fixed length and referred to as bytes) represent. The format specifies how sub-parts of the digital object are arranged and what they 'mean'. To the human observer, many formats and encodings simply have little or no discernible meaning, at least in terms of the real purpose of the digital object, for example a picture, letter, paper and so on. Thus, if either the encoding or the format is unknown, the information is effectively inaccessible to either human or machine. At a technical level, by examining the bitstream itself, it is possible to say whether or not one bitstream is the same as another. But, given that human observers do not experience bitstreams in this way, this is not going to be enough to settle most disputes about identity. Moreover, and as we have already indicated, a bitstream is likely to be represented inconsistently in different hardware, software and configuration environments, which leads to the possibility of 'same bitstream, different perceived object'.

However, the possibility of there being 'different bitstream, same perceived object' is also entirely plausible, given that there are many ways to achieve a particular screen (or printer) effect, and this is in addition to any difference in perception as a result of features of any individual observer.

A final set of philosophical and legal difficulties centres on the fact that in the digital world it is very easy to do things that are much more difficult in the physical world; for example, it is very easy to make exact copies of an object, whilst at the same time it is extremely difficult to detect changes made to a copy of an object. This allows me to make an exact copy of a document that I have worked on together with my colleagues and to make changes in such a way that it is effectively impossible for anyone to tell whether it is my copy or theirs that is the original.5 In the physical world there are almost always clues left by the copying or change processes that allow forensic analysis to detect which is the original and which the falsified copy. In the case of a paper object there are many clues to both originality and change in, for example, inks, indentations, missing elements, torn edges, and many other things that have no equivalent in the digital object nor, as yet, in the processes associated with digital objects. Ideally we must put in place some technical measure or other procedure that will frustrate tampering or, at least, make it possible to determine when it has happened. Given that, it is unsurprising that the acceptance of digital objects as evidence is problematic. It would be virtually impossible for a court to judge whether the document with which it is being presented is even close to that presented to one of the parties to a disagreement whilst entering into a contract. More difficult still would be documents relating to an individual but maintained by a third party in a fiduciary function (such as case notes); such documents may never be seen, at least not in their entirety, by the individual to whom they pertain.

In the traditional paper world, although it is considerably easier to detect tampering in documents originating prior to the invention of the photocopier than in later documents, it is not to 'technical measures' that societies have turned to address the problem of authenticity. The primary mechanisms for establishing authenticity are processes and cultural conventions. These come in a variety of forms, including editorial and publishing processes, the form and format of different types of correspondence, the rubric and layout of forms, and so on. For example, this paper's 'genuineness' derives as much from the editorial and publishing processes of JASIST as it does from any technical measures. It is to these processes and cultural conventions that academic review boards would turn when evaluating it, rather than to whether or not some computer file has been altered. This aspect is added to our earlier figure in Figure 2.
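Both possibilities can be illustrated with nothing more than standard character encodings; in the following Python sketch the particular strings are, of course, arbitrary.

import unicodedata

# Case 1: same bitstream, different perceived object.
# The identical bytes are legible under one encoding and garbled under another.
raw = "café".encode("utf-8")
print(raw.decode("utf-8"))    # café
print(raw.decode("latin-1"))  # cafÃ©  -- same bits, different experience

# Case 2: different bitstream, same perceived object.
# U+00E9 (precomposed) and U+0065 followed by U+0301 (e plus combining acute)
# display identically, yet the underlying byte sequences differ.
nfc = unicodedata.normalize("NFC", "café").encode("utf-8")
nfd = unicodedata.normalize("NFD", "café").encode("utf-8")
assert nfc != nfd                                    # distinct bitstreams
assert unicodedata.normalize("NFC", nfc.decode()) == \
       unicodedata.normalize("NFC", nfd.decode())    # same perceived text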

Recap: The Boundedness of Digital Objects

We have argued that there are increasing problems in defining the boundaries of digital objects, both at the bitstream and at the experiential level. Two examples serve to illustrate some of the dimensions of this problem. A digital object such as a web page, when represented to human observers, might include hyperlinks to external digital objects as a realisation of, for example, references, footnotes or appendices, which the user of the document may follow, or not, at will. It might have graphical elements as links to external digital objects (graphics) that are embedded in the page by the web server but which are otherwise separate. In this example the web server and the user contribute to a determination of the bounds of the digital object and, as a result, the experienced digital object does not necessarily have the same bounds as the bitstream digital object, combining as it does several distinct bitstream objects.

In the case of a (web-based) form such as a KYC form, the digital object exists only as an experienced object that requests and reveals a small number of fields stored in one or more database tables. The experienced digital object is brought together from many discrete components for a single transaction, never to be experienced in that way by anyone again. Increasingly, web sites are employing this mode of operation to provide users with a personalised web experience, with pages being assembled on demand using information that is current at a particular time and which may never have been seen in exactly that way before and never will be again. In this example, the experienced digital object simply never exists as a discrete bitstream object. The challenge of proving what the experienced digital object was for a particular individual in the past is considerable in both these examples. What is crucial is having some way of defining what the salient features of the experienced digital object are, together with an audit trail of the processes that maintain the component parts and the way that they have been assembled.

5. A very clear account of a comparison of originals and copies in the physical and digital worlds is to be found in Levy (2000).

In addition to our having adequate knowledge of the technical means by which the digital object was put together to enable it to be experienced, it is also essential that we have some knowledge of the experiential context in respect of purpose and circumstances.

Figure 2: Rendering, Process and Cultural Mechanisms

How should we proceed?

Thibodeau (2002) provides a good discussion of both the technical and other approaches to the preservation of digital objects, and the Interpares Project Preservation Task Force suggests a 'Preserve Electronic Records' model (Interpares Project 2001) which sees the problem as being more about processes than about technology. We suggest that devising purely technical measures or different forms of technology to solve this problem would be counter-productive. Further levels of technology simply add more processes intermediated by layers of software and hardware that then stand between the 'evidence' and the parties to an agreement. Digital signatures, for instance, will in theory allow parties to establish that a digital object has not been altered since the agreement was made, but this is essentially at the bitstream level and not at the level of the experience of the parties concerned. The situation is further complicated by the fact that we require a 'black box' of software to make the necessary assertions, and then we have to think of what measures we could use to establish that the 'black box' itself has not been tampered with.

We suggest that there are three strands to a solution to the problems of digital identity. As mentioned briefly above, we might think of starting by defining the 'salient features' of digital objects in terms of their purpose and mode of use. In this we broadly align with the simple case example exhibited in the work of Moore et al (2000a & b) at the San Diego Supercomputer Center. The salient features of a digital object are intended to ensure that the object's content and utility remain the same in spite of differing experiences of the object, and will consist of those features that are essential for that object to continue being that object from one experiential manifestation of it to another. One immediate advantage of this approach is that it takes us to a level beyond having to examine the characteristics of a bitstream.

An obvious disadvantage might be that it is difficult to define more than a small subset of features in any general sense, but an example of a general salient feature in the case of a text document might be that the written characters (however encoded) should remain the same and appear in the same order. In any particular case we might expect an author to be in the best position to provide more object-specific salient features for their own work, but this would only be feasible if there were tools to assist, as it would be unrealistic to expect every digital object creator to undertake this activity from scratch. An overall strategy might be to begin by asking more generally what the salient features of particular types of digital object are, and to work from there. Having once defined the salient features, we would need to find ways of capturing and preserving them, and this would require the devising of auditable processes to ensure that the features are not altered by the transformations of the bitstream that are necessary from time to time in order to preserve it within new hardware and software environments.

Processes surrounding the creation, treatment and use of physical documents and other objects have evolved over hundreds of years. The digital world as we know it is barely 50 years old. As the digital world slowly matures, it will acquire processes to support the establishment and maintenance of the trustworthiness of its objects. Our second strand is concerned with hastening this process. There has been a tendency, particularly amongst technologists, to think that processes in the physical world are rendered unnecessary and superseded by the digital world. A good example of this is in publishing, where some believe that publishing as an activity adds no value except the distribution of information and that, consequently, the world wide web removes the need for both publishing processes and publishers.6 We maintain that process is, if anything, more important when we are dealing with objects whose fixity is at best questionable, which pushes issues of identity to the fore.

In order to understand any object and its significance, the person experiencing it must have a context to set it in. In the physical world, some of that context is readily apparent through direct perception by the human observer, some accompanies the object, either directly or indirectly, and yet more comes from an implicit understanding of the process environment in which the object was created. The digital world is still insufficiently mature to have developed a framework for context capture and retention as a routine part of digital object creation and management. The third strand of our prognosis therefore is the development of a robust contextual environment for digital objects that is both trustworthy and useful. To present digital evidence, those assessing it need to be able to trust the processes of curation, transformation and rendering that take the bitstream from what was actually created (the content) through to what was experienced by those involved at the time and what they themselves are experiencing in assessing the evidence. It must be made possible for a credible set of salient features to be produced on demand, for a trustworthy set of processes to have been followed in the object's creation and curation, and for the digital object to be set in a sufficiently complete context for the experience of it to be meaningful.
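As a rough sketch of how a salient-feature check might sit alongside a bitstream-level check, the following Python fragment takes 'the written characters, in order, disregarding incidental layout' as the salient feature, following the text-document example above; the helper names and the fingerprinting scheme are hypothetical illustrations, not a proposal for a standard.

import hashlib

def bitstream_hash(bitstream: bytes) -> str:
    # Fixity at the bitstream level: any re-encoding or re-wrapping changes it.
    return hashlib.sha256(bitstream).hexdigest()

def salient_hash(bitstream: bytes, encoding: str) -> str:
    # Salient feature: the written characters in order, however encoded,
    # with incidental layout (runs of whitespace) disregarded.
    chars = " ".join(bitstream.decode(encoding).split())
    return hashlib.sha256(chars.encode("utf-8")).hexdigest()

original = "Digital identity matters".encode("utf-8")
migrated = "Digital   identity\nmatters".encode("utf-16")  # re-encoded and re-wrapped

assert bitstream_hash(original) != bitstream_hash(migrated)                  # bitstream identity fails
assert salient_hash(original, "utf-8") == salient_hash(migrated, "utf-16")   # salient feature preserved

A signature or checksum computed over the raw bitstream would fail after such a migration even though, on the account above, nothing that matters to the parties has changed; one computed over an agreed set of salient features could survive it, provided the processes producing it were themselves auditable.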

Conclusion

At least part of our conclusion must be that processes and cultural mechanisms surrounding digital objects, analogous to those surrounding objects in the physical world, will have to be developed. These will rarely be one-for-one analogues, since the requirements for credibility in the digital world are quite strikingly different. The central issue is one of trust and credibility.7 Traditional roles and relationships will need to be carefully examined in the light of a clear understanding of the philosophical, legal and archival issues, with the role of digital trustee having to involve a very subtle blend of traditional trustee skills with a high level of technical understanding and competence. It is clear that a great deal more creative research work is required in multiple areas, including: the processes by which digital objects are manipulated and rendered; the framework within which they are used; the cultural conventions that will build up over time; and tools to help digital object creators to embed their 'creations' in a rich metadata environment which effectively and efficiently captures their 'essence'. This represents a major challenge of the digital age that we ignore at the peril of our cultural heritage.

6. This is in part a backlash against academic publishing, where academics have traditionally given away their copyright to publishers in exchange for peer review, and whose institutions have then bought back access to the material through costly journal subscriptions. (The SPARC initiative, http://www.arl.org/sparc/, is addressing the peer-review publishing end of this problem.)

7. This theme is explored by Lynch (2000).

Bibliography

Cantwell Smith, B. (1996) On the Origin of Objects, Bradford Books, MIT Press.

Garrett, J. & Waters, D. (1996) Preserving Digital Information: Report of the Task Force on Archiving Digital Information, Commission on Preservation and Access, Washington, DC (http://www.rlg.org/ArchTF/tfadi.index.htm).

Interpares Project (2001) "How to Preserve Authentic Electronic Records" (http://www.interpares.org/documents/ptf_draft_final_report.pdf).

Juvenal (1970) D. Ivnii Ivvenalis satvrae XIV: Fourteen Satires of Juvenal, edited by J. D. Duff, London: Cambridge University Press.

Levy, D. M. (2000) "Where's Waldo? Reflections on Copies and Authenticity in a Digital Environment" (http://www.clir.org/pubs/reports/pub92/levy.html).

Lynch, C. (2000) "Authenticity and Integrity in the Digital Environment: An Exploratory Analysis of the Central Role of Trust" (http://www.clir.org/pubs/reports/pub92/lynch.html).

Moore, R., et al. (2000a) "Collection-Based Persistent Digital Archives - Part 1" (http://www.dlib.org/dlib/march00/moore/03moore-pt1.html).

Moore, R., et al. (2000b) "Collection-Based Persistent Digital Archives - Part 2" (http://www.dlib.org/dlib/april00/moore/04moore-pt2.html).

Peirce, C. S. (1931-58) Collected Writings (8 vols.), ed. Charles Hartshorne, Paul Weiss & Arthur W. Burks, Cambridge, MA: Harvard University Press.

Peirce, C. S. (1992) Reasoning and the Logic of Things: The Cambridge Conferences Lectures of 1898, ed. Kenneth Laine Ketner, with an introduction by Kenneth Laine Ketner and Hilary Putnam, Cambridge, MA: Harvard University Press.

Rothenberg, J. (2000) "Preserving Authentic Digital Information", in Authenticity in a Digital Environment, Council on Library and Information Resources (http://www.clir.org/pubs/reports/pub92/rothenberg.html).

Stravinsky, I. (1975) An Autobiography, Calder and Boyars.

Tallis, R. (1999) The Explicit Animal, St. Martin's Press.

Thibodeau, K. (2002) "Overview of Technological Approaches to Digital Preservation and Challenges in Coming Years" (http://www.clir.org/pubs/reports/pub107/thibodeau.html).

Wollheim, R. (1980) Art and its Objects, 2nd ed., Cambridge: Cambridge University Press.