Proceedings of The International Conference on Cloud Security Management ICCSM‐2014 The Cedars University of Reading Reading, UK 23‐24 October 2014 Edited by Dr Barbara Endicott‐Popovsky
Copyright The Authors, 2014. All Rights Reserved. No reproduction, copy or transmission may be made without written permission from the individual authors. Papers have been double‐blind peer reviewed before final submission to the conference. Initially, paper abstracts were read and selected by the conference panel for submission as possible papers for the conference. Many thanks to the reviewers who helped ensure the quality of the full papers. These Conference Proceedings have been submitted to Thomson ISI for indexing. Please note that the process of indexing can take up to a year to complete. Further copies of this book and previous year’s proceedings can be purchased from http://academic‐bookshop.com E‐Book ISBN: 978‐1‐910309‐65‐0 E‐Book ISSN: 2051‐7947 Book version ISBN: 978‐1‐910309‐63‐6 Book Version ISSN: 2051‐7920 CD Version ISBN: 978‐1‐910309‐64‐3 CD Version ISSN: 2051‐7939 The Electronic version of the Proceedings is available to download at ISSUU.com. You will need to sign up to become an IS‐ SUU user (no cost involved) and follow the link to http://issuu.com Published by Academic Conferences and Publishing International Limited Reading UK 44‐118‐972‐4148 www.academic‐publishing.org
Contents Paper Title
Author(s)
Page No.
Preface
iii
Committee
iv
Biographies
vi
Research papers
Administration of Digital Preservation Services in the Cloud Over Time: Design Issues and Challenges for Organizations
Parvaneh Afrasiabi Rad, Jörgen Nilsson and Tero Päivärinta
1
Preservation Services Planning: A Decision Support Framework
Ingemar Andersson, Göran Lindqvist and Frode Randers
9
Secure Video Transcoding in Cloud Computing
Mohd Rizuan Baharon, Qi Shi, David Llewellyn‐ Jones, and Madjid Merabti
18
A Conceptual Model of a Trustworthy Voice Signature Terminal
Megan Boggess, Brooke Brisbois, Nicolai Kuntze and Barbara Endicott‐Popovsky
27
Maintaining streaming video DRM
Asaf David and Nezer Zaidenberg
36
Formalization of SLAs for Cloud Forensic Readiness
Lucia De Marco, Sameh Abdalla, Filomena Ferrucci, and Mohand‐Tahar Kechadi
42
Retention and Disposition in the Cloud ‐ Do You Really Have Control?
Patricia Franks and Alan Doyle
51
National Intelligence and Cyber Competitiveness: Partnerships in Cyber Space
Virginia Greiman
59
Preservation as a Service for Trust: An InterPARES Trust Specification for Preserving Authentic Records in the Cloud
Adam Jansen
67
The Ten Commandments of Cloud Computing Security Management
Issam Kouatli
73
Segmentation of Risk Factors associated with Cloud Computing Adoption
Easwar Krishna Iyer
82
How NSA’s surveillance programs influence cloud services outside the US?
Jyrki Kronqvist and Martti Lehto
90
Authenticity as a Component of Information Assurance and Security
Corinne Rogers
101
Secure Cloud Based Biometric Signatures Utilizing Smart Devices
Bobby Tait
109
PhD Research Papers
119
Digital Photographs in Social Media Platforms: Preliminary Findings
Jessica Bushey
121
Diversified‐NFS
Martin Osterloh, Robert Denz and Stephen Taylor
130
Cache Side‐Channel Attacks in Cloud Computing
Younis Younis, Kashif Kifayat and Madjid Merabti
138
Master’s Research papers
147
In Kernel Implementation of RSA Routines
Asaf Algawi, Pekka Neittaanmäki, Nezer Zaidenberg and Tasos Parisinos
149
TrulyTrusted Operating System Environment
Evyatar Tamir, Nezer Zaidenberg and Pekka Neittaanmaki
154
i
Paper Title
Author(s)
Page No.
HoneyGuard ‐ Opensource VSaaS system
Nezer Zaidenberg and Ramat Gan
158
Non Academic Paper
163
Archives in the Cloud: Challenges and Opportunities
Christopher Fryer, and Adrian Brown
165
ii
Preface

These Proceedings are the work of researchers contributing to the 2nd International Conference on Cloud Security Management (ICCSM 2014), being held this year at the University of Reading, UK, on 23‐24 October 2014. The conference chair is Dr John McCarthy, Vice President of Cyber Security, ServiceTec, UK, and the Programme Chair is Dr Barbara Endicott‐Popovsky, from the Center for Information Assurance and Cybersecurity, University of Washington, Seattle, USA.

As organisations rush to adopt Cloud Computing at a rate faster than originally projected, it is safe to predict that, over the coming years, Cloud Computing will have major impacts, not only on the way we conduct science and research, but also on the quality of our daily human lives. Computation research, education, and business communities have been exploring the potential benefits of Cloud Computing and the changes these imply. Experts have predicted that the move to the cloud will alter significantly the content of IT jobs, with cloud clients needing fewer hands‐on skills and more skills in administering and managing information. Bill Gates was recently quoted: “How you gather, manage, and use information will determine whether you win or lose.” Cloud Computing impacts will be broad and pervasive, applying to public and private institutions alike.

Regardless of the rate of uptake, Cloud Computing has raised concerns. Despite the fact that it has huge potential for changing computation methods for the better and provides tremendous research and commercial opportunities, it also creates great challenges to IT infrastructure, IT and computation management and policy, industry regulation, governance, the legal infrastructure and, of course, information security and privacy. Wikileaks demonstrated the ease with which a massive set of confidential documents, collected and maintained in digital form, can be disseminated. The move to the cloud poses an even greater challenge, aggregating even more massive amounts of information and opening up even greater vulnerabilities, before we have even gained an understanding of the security implications.

ICCSM aims to bring together the academic research community with industry experts working in the field of Cloud Security to hear the latest ideas and discuss research and development in this important area. In addition to the papers presented in these proceedings, there will be keynote addresses from Dr Sally Leivesley, Newrisk Limited, London, UK, on the topic of “Strategic Threats in Cyberspace: Was MH370 the first Cyber Hijack?”; from Bryan Mills, founder of the Computer Services Association, UK; and from Dr Carsten Rudolph of the Fraunhofer Institute for Secure Information Technology SIT on the topic of “Designing Security into Advanced Cyber Infrastructures ‐ A Vision”.

With an initial submission of 51 abstracts, after the double‐blind peer review process there are xx research papers published in these Conference Proceedings, including contributions from Canada, Finland, India, Israel, Lebanon, Nigeria, South Africa, Suomi, Sweden, UK and USA. We wish you a most enjoyable conference.

Dr Barbara Endicott‐Popovsky
Programme Chair
October 2014
iii
Conference Executive Conference Executive Dr John McCarthy, Cranfield University/UK Defence Academy, UK Barbara Endicott‐Popovsky, Center for Information Assurance and Cybersecurity, University of Washington, Seattle, USA Mini track chairs Dr Luciana Duranti, School of Library, Archival and Information Studies, University of British Columbia, Vancouver, Canada Ir Daniel Ng, Kun Hang Group, Shanghai, China Dr. Nasser Abouzakhar, University of Hertfordshire, UK Dr Martti Lehto, University of Jyväskylä, Finland Dr Brent Lagesse, University of Washington – Bothell, USA Dr. Geethapriya Thamilarasu, University of Washington ‐ Bothell, USA Committee Members The conference programme committee consists of key people in the information assurance, information systems and cloud security communities around the world. Dr. Nasser Abouzakhar (University of Hertfordshire, UK), Dr. Todd Andel (University of South Alabama, USA), Darko Androcec (University of Zagreb, Faculty of Organisation and Infomratics, Croatia), Dr. Olga Angelopoulou (University of Derby, UK), Mario Antunes (Polytechnic Institute of Leiria & CRACS (University of Porto), Portugal), Prof. Alexander Bligh (Ariel University Center, Ariel, Israel), Andrew Blyth (University of Glamorgan, UK), Colonel (ret) Colin Brand (Graduate School of Business Leadership, South Africa), Dr. Svet Braynov (University of Illinois at Springfield, USA), Bill Buchanen (Napier University, UK), Prof. David Chadwick (University of Kent, UK), Haseeb Chaudhary (Plymouth University and RBS, UK), Dr. Joobin Choobineh (Texas A&M University, USA), Prof. Sam Chung (University of Washington, Tacoma, USA), Dr. Nathan Clarke (University of Plymouth, UK), Dr. Ronen Cohen (Ariel University Centre, Israel), Prof. Manuel Eduardo Correia (DCC/FCUP Oporto University, Portugal), Dr. Paul Crocker (University of Beira Interior, Portugal), Geoffrey Darnton (Bournemouth University, UK), Evan Dembskey (UNISA, South Africa), Dr. Ratnadeep Deshmukh (Department of Computer Science and IT, Dr. Babasaheb Ambed‐ kar Marathwada University, India), Dr. Frank Doelitzscher (University of Applied Sciences Furtwangen, Germany), Patricio Domingues (Polytechnic Institute of Leiria, Portugal), Prokopios Drogkaris (University of Aegean, Greece), Marc Dupuis (Uni‐ versity of Washington Tacoma, USA), Barbara Endicott‐Popovsky (University of Washington Seattle, USA), Daniel Eng (C‐ PISA/HTCIA, China), Prof. Dr. Alptekin Erkollar (ETCOP, Austria), Dr./Prof Iancu Eugenia (Stefan cel Mare University, Romania), Dr. Cris Ewell (Seattle Children's, USA), John Fawcett (University of Cambridge, UK), Prof. Eric Filiol (Ecole Supérieure en In‐ formatique, Electronique et Automatique, France), Prof. Steve Furnell (University of Plymouth, UK), Dineshkumar Gandhi (Adroit Technologies, india), Javier Garci'a Villalba (Universidad Complutense de Madrid, Spain), Dr. Samiksha Godara (Sham‐ sher Bahadur Saxena College Of Law, India), Dr Habib Goraine (University of South Wales, UK), Virginia Greiman (Boston Uni‐ versity, USA), Dr. Michael Grimaila (Air Force Institute of Technology, USA), Prof. Stefanos Gritzalis (University of the Aegean, Greece), Dr. Marja Harmanmaa (University of Helsinki, Finland), Jeremy Hilton (Cranfield University/Defence Academy, UK), Aki Huhtinen (National Defence College, Finland), Dr. Berg Hyacinthe (Assas School of Law, Universite Paris II/CERSA‐CNRS, France), Prof. Pedro In cio (: University of Beira Interior, Portugal), Dr. 
Abhaya Induruwa (Canterbury Christ Church University, UK), Ramkumar Jaganathan (VLB Janakiammal College of Arts and Science (affiliated to Bharathiar University), India), Hamid Jahankhani (University of East London, UK), Dr. Helge Janicke (De Montfort University, UK), Dr. Kevin Jones (Airbus Group Innovations, UK), Nor Badrul Anuar Jumaat (University of Malaya, Malaysia), Maria Karyda (University of the Aegean, Greece), Vasilis Katos (Democritus University of Thrace, Greece), Jyri Kivimaa (Cooperative Cyber Defence and Centre of Ex‐ cellence, Tallinn, Estonia), Spyros Kokolakis (University of the Aegean, Greece), Dr. Marko Kolakovic (Faculty of Economics & Business, Croatia), Prof. Ahmet Koltuksuz (Yasar University, Dept. of Comp. Eng., Turkey), Theodoros Kostis (Hellenic Army Academy, Greece), Prof. Easwar Krishna Iyer (Great Lakes Institute of management, Chennai, India), Ren Kui (state university of new york at buffalo, USA), Pertti Kuokkanen (Finnish Defence Forces, Finland), Rauno Kuusisto (Finnish Defence Force, Finland), Harjinder Singh Lallie (University of Warwick (WMG), UK), Dr. Laouamer Lamri (Al Qassim University and European University of Brittany, Saudi Arabia), Martti Lehto (National Defence University, Finland), Yale Li (Cloud Security Alliance Seat‐ tle, USA), Juan Lopez Jr. (Air Force Institute of Technology, USA), Volodymyr Lysenko (University of Washington, USA), Dr. Bill Mahoney (University of Nebraska, Omaha, USA), Dr. Hossein Malekinezhad (Islamic Azad University of Naragh, Iran), Mario Marques Freire (University of Beira Interior, Covilhã, Portugal), Ioannis Mavridis (University of Macedonia, Greece), Dr. John McCarthy (Cranfield University , UK), Rob McCusker (Teeside University, Middlesborough, UK), Dr. Srinivas Mukkamala (New Mexico Tech, Socorro, USA), Dr. Francisca Oladipo (Nnamdi Azikiwe University, Nigeria), Dr. Deanne Otto (Air Force Insitute of Technology Center for Cyberspace Research, USA), Tim Parsons (Selex Communications, UK), Dr. Andrea Perego (European Commission ‐ Joint Research Centre, Ispra, , Italy), Dr Bernardi Pranggono (Glasgow Caledonian University, UK), Dr. Yogachandran Rahulamathavan (City University London, UK), Dr. Ken Revett (British University in Egypt,, Egypt ), Dr. Keyun Ruan (University College Dublin, Ireland), Prof. Henrique Santos (University of Minho, Portugal), Ramanamurthy Saripalli (Pragati Engineering College, India), Prof. Chaudhary Imran Sarwar (Mixed Reality University, Pakistan), Sameer Saxena (IAHS iv
Academy, Mahindra Special Services Group , India), Corey Schou (Idaho State University, USA), Prof. Dr. Richard Sethmann ( University of Applied Sciences Bremen, Germany), Dr. Armin Shams (National University of Ireland, University College Cork, Ireland), Dr. Yilun Shang (Singapore University of Technology and Design, Singapore), Dr. Dan Shoemaker (University of De‐ troit Mercy, Detroit, USA), Paulo Simoes (University of Coimbra, Portugal), Prof. Jill Slay (University of South Australia, Austra‐ lia), Dr. William Spring (University of Hertfordshire, UK), Prof. Michael Stiber (University of Washington Bothell, USA), Iain Sutherland (University of Glamorgan, Wales, UK), Anna‐Maria Talihärm (Tartu University, Estonia), Sim‐Hui Tee (Multimedia University, Malaysia), Prof. Sérgio Tenreiro de Magalhães (Universidade Católica Portuguesa, Portugal), Prof. Dr. Peter Trommler (Georg Simon Ohm University Nuremberg, Germany), Dr. Shambhu Upadhyaya (University at Buffalo, USA), Renier van Heerden (CSIR, Pretoria, South Africa), Rudi Vansnick (Internet Society, Belgium), Stilianos Vidalis (Newport Business School, Newport, UK), Prof. Kumar Vijaya (High Court of Andhra Pradesh, India), Dr. Natarajan Vijayarangan (Tata Consul‐ tancy Services Ltd, India), Nikos Vrakas (University of Piraeus, Greece), Mat Warren (Deakin University, Australia, Australia), Dr. Tim Watson (University of Warwick, UK), Dr. Santoso Wibowo (Central Queensland University, Australia), Mohamed Reda Yaich (École nationale supérieure des mines , France), Dr. Omar Zakaria (National Defence University of Malaysia, Malaysia), Dr. Zehai Zhou (University of Houston‐Downtown, USA), Prof. Andrea Zisman (City University London, UK),
v
Biographies Conference Chair Dr John McCarthy is a world renowned authority on cybersecurity strategy, development and imple‐ mentation. He holds a PhD in Cybersecurity and eBusiness Development and is an internationally rec‐ ognized author of a number of academic papers discussing all aspects of cybersecurity in the modern world. John is frequently invited to sit on expert panels and appear as an expert speaker at well‐known cybersecurity events. Past appearances have included talks on ICT Security in the Modern Airport, Se‐ curity in the Digital Age and SCADA threats in the Modern Airport at various prominent international conferences. John is also a leading expert on social engineering awareness training and best practice. John has worked with ServiceTec since 2003 and currently holds the position of Vice President of Cybersecurity for the Air‐ port CyberSec division. His responsibilities include the development and management of a complete set of cybersecurity ser‐ vices created specifically for the airport industry, as well as the deployment of those services at some of the world’s busiest airports. Furthermore, John’s impressive list of posts include seats on a number of prominent US committees that offer ad‐ vice and policy guidance to the US government on cybersecurity matters. He is also panel member of the American Transport Research Board researching cybersecurity best practice for airports throughout North America and an active member of the British Computer Society (BSC), Elite and the International Committee on Information Warfare and Security.
Programme Chair Dr Barbara Endicott‐Popovsky holds the post of Director for the Center of Information Assurance and Cybersecurity at the University of Washington, an NSA/DHS Center for Academic Excellence in Informa‐ tion Assurance Education and Research, Academic Director for the Masters in Infrastructure Planning and Management in the Urban Planning/School of Built Environments and holds an appointment as Research Associate Professor with the Information School. Her academic career follows a 20‐year career in industry marked by executive and consulting positions in IT architecture and project management. Barbara earned her Ph.D. in Computer Science/Computer Security from the University of Idaho (2007), and holds a Masters of Science in Information Systems Engineering from Seattle Pacific University (1987), a Masters in Busi‐ ness Administration from the University of Washington (1985).
Keynote Speakers Dr Sally Leivesley is the Managing Director of Newrisk Limited is an advisor to companies and govern‐ ments on catastrophic risk. Risk assessment, crisis management, mitigation and training covers: techni‐ cal, financial, cyber, chemical, biological, radiological, nuclear and explosives risks, aviation and trans‐ port, global terrorism, natural and industrial hazards and operational risks within many industry and government fields. She has been contracted as an independent catastrophic risk advisor to provide re‐ ports or publications on three nuclear incidents Three Mile Island, Chernobyl and Fukushima and as a risk advisor to an inquiry on a rail mass casualty incident. Her current interests are innovation, exercises and research in catastrophic smart systems risks– smart cities, smart critical national infrastructure (CNI) and smart planes and the development of system of systems solutions. Dr Leivesley holds memberships on the Register of Security Engineers and Specialists (RSES), the International Association of Bomb Technicians and Investigators (IABTI), the Royal United Services Institute for Defence Studies and the Australian College of Education. She holds Fellowships with the Institute of Civil Protec‐ tion and Emergency Management and the Royal Society of Arts, Manufacturing and Commerce. She is a Member of the In‐ formation Assurance Advisory Council (IAAC) Community of Interest, the Independent Information Security Group (IISYG), and the Safeguarding Intelligent Buildings, Stakeholder Group, IET. Dr Leivesley also held an Adjunct Appointment to the Nuclear Science and Engineering Institute (NSEI) University of Missouri, USA (2012‐13). Bryan Mills has a long history of managing IT service companies in the UK, the Netherlands and other European countries. At Cambridge he read Economics and Law; after a brief spell in the Diplomatic Service, he joined Burroughs Machines and began his long association with Information Technol‐ ogy.He co‐founded CMG (Computer Management Group) in 1964 where he held the positions of Chairman and CEO, for 17 years. He retired to Ireland in 1981 where he was actively involved in the Irish software and services industry. In 1984 he became a director and subsequently UK chairman of F International. He then set up Pi Holdings by acquiring several IT service companies. After Pi Hold‐ ings he became an executive and non‐executive director in a number of IT services and other businesses before joining Ser‐ viceTec in 1996 as a consultant, shortly becoming a director and then Chairman. At ServiceTec he played a leading role in the creation of ServiceTec Airport Services International, after the Group focussed on the airport services market. He also led the investigation of new markets, culminating in the approach to the cybersecurity market. Bryan founded and was first Presi‐ dent of the Computer Services Association (now Intellect), was a very early Liveryman of the Worshipful Company of Infor‐ mation Technologists and Freeman of the City of London, and was Vice‐Chairman of the Industrial Participation Association. vi
Mini Track Chair Dr. Luciana Duranti is Chair of Archival Studies at the School of Library, Archival and Information Studies of the University of British Columbia, and a Professor of archival theory, diplomatics, and the manage‐ ment of digital records in its master’s and doctoral archival programs. She is Director of the Centre for the International Study of Contemporary Records and Archives (CISCRA) and of InterPARES, the largest and longest publicly funded research project on the long‐term preservation of authentic electronic rec‐ ords (1998‐2018), the “Digital Records Forensics” Project, and the “Records in the Clouds” Project. She is also co‐Director of “The Law of Evidence in the Digital Environment” Project.
Biographies of Presenting Authors Parvaneh Afrasiabi Rad Received Bsc. degree in Information technology from Amirkabir University of Technology (Tehran Polytechnic), Iran (2008) and Msc. degree in information security from Luleå University of Technology, Sweden (2012). Started her research as a PhD student at Luleå University of Technology (2013) in field of digital preservation by joining de‐ partment of computer science, electrical and space engineering. Researches long term digital preservation, information secu‐ rity, information systems. Ingemar Andersson has a position as a lecturer in department of Computer Science, Electrical and Space Engineering at Luleå University of Technology. Since 2008, he has been involved in R&D EU‐projects at the LDP‐centre. Mohd Rizuan Baharon. Currently, he is a PhD student at Liverpool John Moores University, Liverpool, United Kingdom. He completed his master degree (MSc in Mathematics) in 2006 and his undergraduate studies in 2004 at University Technology Malaysia, Malaysia. His research interests lie in the area of Cryptography and Computer Network Security. Megan Boggess just completed her first of two years working on her Master of Science in Computer Science and Systems at the University of Washington, Tacoma. She is also a recipient of the National Science Foundation's CyberCorp: Scholarship For Service award. Outside of class and this paper, Megan is interested in robots and the intersection of data privacy and se‐ curity. Erik Borglund, PhD in computer and System science is a researcher at the Archival and information management school of Mid Sweden University, and his main research interests are digital recordkeeping, cloud computing, document management, information systems in crisis management, information systems design and Computer Supported Cooperative Work. Erik Borglund has been sworn police officers for 20 years, before he became 100% academic. Brooke Brisbois is a graduate student studying Information Management. She is also a recipient of the National Science Foundation's CyberCorp: Scholarship For Service award. Her interests within the realm of cybersecurity include data visualiza‐ tion and user experience design. Jessica Bushey is a doctoral candidate School of Library, Archival and Information Studies (SLAIS), University of British Co‐ lumbia, Canada. Dissertation explores contemporary photographic practices utilizing social media platforms; investigates implications activities have on digital images as trustworthy records. Jessica graduated as a Research Assistant, InterPARES Trust Project and Law of Evidence in Digital Environment. Publications include, "Web Albums: Preserving the Contemporary Photographic Album," in The Photograph and The Collection and "Convergence, Connectivity, Ephemeral and Performed: New Characteristics of Digital Photographs," in Australian journal Archives & Manuscripts. Lucia De Marco is a third year PhD student of the joint Information Systems and Software Engineering program held by Uni‐ versity of Salerno and University College Dublin. She graduated with a B.Sc. in 2007 and an M. Sc. in 2011, both in Computer Science. She is working on Proactive and Reactive Cloud Forensics since 2012. Robert Denz is a computer engineering PhD candidate at Dartmouth College. His research area is in multicore technologies and their applications to cloud security. Robert came to Dartmouth College in the winter of 2012 after spending five years in the computer security industry. 
Patricia Franks coordinates the Master of Archives and Records Administration degree. She was team lead for the ANSI/ARMA standard, Implications of Web‐based Collaborative Technologies in Records Management, and the technical re‐ port, Using Social Media in Organizations. She is author of the book, Records and Information Management and a member of the interdisciplinary research team InterPARES Trust. vii
Christopher Fryer has worked at the Parliamentary Archives as Senior Digital Archivist since January 2014. Previously he was Digital Curator for Northumberland Estates. He completed his MSc in Information Management and Preservation at the University of Glasgow in 2011. He has worked in various roles including Project Officer for a Museum Galleries Scotland funded project. Virginia Greiman is Professor of Megaprojects and Planning at Boston University and former Deputy Counsel and Risk Manager to Boston’s $14.9 billion Big Dig Project. She is a recognized scholar and expert on megaprojects, project complexity, and international law and development. She previously served as a diplomatic official to the U.S. Department of State and USAID. Issam Kouatli is an associate professor in the field of Management Information Systems. He was granted a PhD degree (1990) from the Engineering School at Birmingham University (UK). His main research interest is intelligent systems in general and fuzzy systems in particular. The emergence of the Cloud Computing concept has also become a recent research interest. Easwar Krishna Iyer is Associate Professor in Marketing at Great Lakes Institute of Management, India. A triple post graduate (last one in Energy Management from the University of Houston) and on the verge of finishing his PhD in the area of Cloud Computing, Prof. Easwar is slowly expanding his research into technology convergence and green computing. Jyrki Kronqvist holds a Master's degree in mathematics and information technology from the University of Jyväskylä, where he is currently working towards his dissertation in the area of information security. His research interests include information security, cyber espionage, cloud computing and data encryption. He also has a career outside the academic world and has worked in large global organizations in several information security related positions. Currently he is working as an information security manager in a Nordic IT company. Martin Osterloh is a computer engineering PhD candidate at Dartmouth College. His research area of interest is in reverse engineering of binaries and their applications to cloud security. Martin came to Dartmouth College in the winter of 2012 after spending two years at the University of Ireland in Galway, Ireland. Corinne Rogers is an adjunct professor (diplomatics, digital records forensics) and doctoral candidate (researching concepts of authenticity of digital records, documents, and data and authenticity metadata) at the University of British Columbia. She is Project Coordinator of InterPARES Trust – international multidisciplinary research into issues of trust in digital objects in online environments, and a researcher with the Law of Evidence in the Digital Environment Project (Faculty of Law, UBC). Bobby Tait received his D.Com from the University of Johannesburg (UJ) in 2009; he has lectured in biometrics and information security at UJ for 10 years. In 2012 he accepted a position at the University of South Africa (UNISA), where he is heading a biometric security research team focusing on cloud security. Stephen Taylor is a Professor of Computer Engineering at Dartmouth College and a nationally recognized leader in cyber security. Among other awards, he has received Secretary of Defense and USAF Medals for Public Service and the DARPA Director's Award for Outstanding Portfolio of Technical Programs. Younis A.
Younis is a PhD student in the School of Computing and Mathematical Sciences at Liverpool John Moores University. His research interests include securing access to cloud computing, public key cryptography and boundary recognition problems in wireless sensor networks. Younis has an MSc in computer network security from Liverpool John Moores University. Nezer Zaidenberg is a faculty member at Shenkar College of Engineering and Design, Israel, and a researcher at the University of Jyväskylä, Finland. Nezer's current project is TrulyProtect, a trusted computing platform. His prime research interests are trusted computing, virtualization and systems.
viii
Administration of Digital Preservation Services in the Cloud Over Time: Design Issues and Challenges for Organizations Parvaneh Afrasiabi Rad, Jörgen Nilsson and Tero Päivärinta Luleå University of Technology, Luleå, Sweden
[email protected] [email protected] [email protected] Abstract: In organizations, information systems produce increasing amounts of digital information that needs to be preserved. Simultaneously, human intervention in preservation activities needs to be minimized, due to the scarce resources available for preservation. Thus, digital preservation services need to be integrated with other organizational information systems as seamlessly as possible, and preservation activities need to be maximally automated. Selection of adequate cloud‐based preservation services poses yet another set of challenges for reaching such seamless integration and automation. Two sources of challenges relate to administering integrated digital preservation and information systems. Firstly, information systems change over time. Secondly, digital preservation services change over time. This paper contributes by identifying and analyzing scenarios and design issues of administering digital preservation in connection to the changes in information systems and preservation services over time. We also suggest seven generic tasks of the “preservation administration” component, which are needed between digital preservation services and information systems, and discuss the choices for locating the operative responsibilities and the middleware. Keywords: digital preservation, integration, information systems, cloud
1. Introduction Long‐term digital preservation (DP) provides processes and tools to store and access digital information beyond the life‐cycles of individual information systems and applications (e.g., CCSDS 2012). In any modern organization, information system (IS) repositories are the sources of digital content that needs to be preserved. The increasing amount of digital content and the limited human resources for conducting preservation tasks raise a need for increased automation of integrated preservation and access processes between IS and DP (see Stewart 2012), including the related administration tasks. However, obstacles to adopting DP in organizations and personal information management exist (Ross 2012, Kanhabua et al. 2014). One reason for the low adoption of DP is the gap between IS and DP services. For example, the prevailing standards for DP, such as the Open Archival Information System (OAIS) reference model (CCSDS 2012), do not address integration issues between IS and DP (e.g. Korb and Strodl 2010). Accordingly, Päivärinta et al. (2014) suggested a generic model for a middleware to support seamless interaction between IS and DP systems. The middleware creates a bridge between one or many IS and one or many preservation services. One organization can, on the one hand, use many preservation and storage services, e.g., in the cloud, while, on the other hand, a particular preservation service can accept submissions and access requests from several information systems (Päivärinta et al. 2014). Cloud preservation services provide a specialized online storage repository that can be utilized as a component of a complete DP service. A preservation service includes a set of management and preservation functionalities (Thibodeau and Zangaro 2013), such as ingest and access, which distinguishes it from simple storage in the cloud. However, both preservation services and IS are subject to changes through time. New business processes and application components, together with modifications to existing applications, lead to frequent upgrades to IS. In addition, DP service implementations and interfaces can be subject to modifications or transformations. Thus, preservation administration activities, supported with adequate middleware, are required to ensure maximally automated preservation and access processes between IS and DP services. Moreover, knowing that the middleware is intended to be developed as a bridge between IS and DP, it is a viable proposition that the preservation administration should, in itself, be maximally automated. In this article, we point out scenarios and design issues for this problem. The rest of this article is organized as follows. Section 2 describes related research. Section 3 summarizes a recently suggested generic model for integration between DP and IS (Päivärinta et al. 2014), followed by Section 4, which summarizes four scenarios in which such integration needs to be altered over time. Section 5 outlines design
issues for preservation administration tasks and responsibilities. Finally, we discuss our contributions and avenues for further research in Section 6 and conclude in Section 7.
2. Related work In a recent literature review on integration between DP and enterprise content management, Päivärinta et al. (2014) recognized that seamless solutions for integrating IS and DP services have yet to be developed. Traditionally, semi‐automated approaches to leverage existing work on preservation metadata and software tools have been suggested (e.g., Hunter and Choudhury 2004). Preservation software modules can be made available as web services, each accompanied by a semantic description through which the service can be automatically discovered on the web (ibid.). A semi‐automated approach can already provide a combination of services for a particular digital object (ibid.). Hunter and Choudhury also developed a semi‐automated approach to preserve composite digital objects using semantic web services (Hunter and Choudhury 2006). In 2008, the Planets project (Preservation and Long‐term Access through NETworked Services; Farquhar and Hockx‐Yu, 2008) commenced with the goal of building services and tools to help ensure long‐term access to digital objects. Such models and their implications reflect the growth of digital collections, which triggers the demand for establishing and maintaining an interaction between DP services and IS repositories. However, increasing rates of content creation and information collection mean that many enterprises can no longer address their requirements within a single DP system. In this respect, Hitchcock et al. (2005) argue for using external DP service providers. Therefore, organizations (and information systems) should be able to utilize a combination of cloud‐based DP services that sufficiently address the preservation requirements (e.g. Andersson et al. 2013). However, integration between IS and DP is still in its infancy. While a few recent research projects, such as Protage (Saul and Klett 2008) and SHAMAN (e.g. Wittek and Daranyi 2012), have proposed partial solutions and prototypes for such integration, mostly from the sole viewpoint of DP, the problem of preservation administration for integrated and automated solutions between IS and DP still remains to be concretized (Päivärinta et al., 2014). The SHAMAN project (Birrell et al. 2011) developed a preservation management and planning interface to manage the creation of preservation information during the ingest phase. Alongside this, it created initial workflow support and tools for the creation of objects and context data (ibid). SHAMAN also provides tools for the adoption of objects, including their descriptive metadata, context metadata, and preservation metadata (ibid). Tessella Digital Preservation delivered the first version of their ‘Safety Deposit Box’ (SDB) service in 2003 to automate the ingest and access processes (Tessella 2013). SDB claims that its ingest process is flexible and expandable, aiming to deliver as much automation as possible or required. The service also includes a web‐based interface allowing users to browse the collection of digital information and search for content. SDB’s key features are designed and implemented so that an information system can communicate with a digital archive operated by SDB. The source of information can be any specific program or a basic file system, and the metadata can be either extracted or merely described. In whatever form information arrives at an archive, SDB can be configured to accept it into the system (Tessella 2013).
To access information, a user interface is available for locating the content of interest and accessing metadata. For small dissemination packages, users can directly download information. Higher volumes of information can be downloaded via configurable access workflows. Tessella’s SDB thus provides automation to ease the interaction between an information system and a DP service, providing tools for configuring a one‐to‐one integration between their service and an information system. However, SDB is designed from the viewpoint of the DP service provider, and it lacks functionality for managing all preservation needs and services in use from the viewpoint of the client organization.
3. Problem scenarios of interaction between IS and DP Päivärinta et al. (2014), following the idea of the Protage (Saul and Klett 2008) and SHAMAN (Birrell et al. 2011) projects, suggest a middleware to be developed and placed between preservation services and IS. Middleware provides the only feasible solution in cases where integrations need to be managed among several systems (Linthicum 2000). Such middleware to integrate IS and (OAIS‐compatible) DP services should involve at least processes for pre‐ingest, access, and preservation administration (Päivärinta et al., 2014). Pre‐ingest involves a set of preservation activities before the ingest process. The access process in the middleware includes those tasks that mediate requests for information from an IS to an OAIS and that follow the access to preserved information in an OAIS but are needed before the client IS can finally access the information (Päivärinta et al. 2014). In addition, preservation administration is a group of tasks and services that are needed for configuring, automatizing, and managing the pre‐ingest and access processes (ibid).
Figure 1 illustrates this middleware and its processes. It also shows on a general level how the middleware processes are supposed to communicate with the components of an IS and with an OAIS‐compatible DP service. A SIP (submission information package) is an information package that is delivered by the producer of information to the preservation system (CCSDS 2012). A DIP (dissemination information package) is an information package that is sent by an archive to the consumer in response to a request to the preservation system (ibid). The three middleware processes are defined in order to automatize interactions between IS and DP services. Integration between DP services and IS is needed when information needs to travel as smoothly as possible both from IS repositories to DP and from DP service(s) to IS. For a piece of information that is communicated from IS to DP, the journey starts from a repository in an IS and goes through pre‐ingest in the middleware, which prepares the information for preservation and sends a SIP forward to the ingest component of a DP service (Päivärinta et al. 2014). On the way back, the access component of the DP system fetches a piece of information from the preservation storage and packs the information object and metadata into a DIP, which the access (middleware) function delivers to the access function of an IS (ibid). In addition to the pre‐ingest and access processes, the middleware also administers the integration of (potentially more than one) IS with (potentially more than one) DP service, which would be a typical scenario in the future‐oriented cloud‐based services to be provided for digital preservation (e.g., Andersson et al. 2013).
Figure 1 Model for seamless interaction between an IS and a DP service (Päivärinta et al. 2014)
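The interaction pattern of Figure 1 can be made concrete with a minimal sketch. The following Python fragment is purely illustrative and not part of any existing implementation; the class and method names (Middleware, DPService, pre_ingest, access) are our own assumptions. It only shows how pre‐ingest wraps IS content and metadata into a SIP for the ingest component of a DP service, and how the middleware's access function unwraps a DIP for the client IS.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class InformationPackage:
    """Generic OAIS-style package: content plus descriptive metadata."""
    content: bytes
    metadata: Dict[str, Any] = field(default_factory=dict)

class DPService:
    """Stand-in for an OAIS-compatible preservation service in the cloud."""
    def __init__(self) -> None:
        self._storage: Dict[str, InformationPackage] = {}

    def ingest(self, object_id: str, sip: InformationPackage) -> None:
        self._storage[object_id] = sip            # archival storage, heavily simplified

    def access(self, object_id: str) -> InformationPackage:
        return self._storage[object_id]           # DIP derived from the stored package

class Middleware:
    """Bridge between an IS and a DP service, as sketched in Figure 1."""
    def __init__(self, dp_service: DPService) -> None:
        self.dp_service = dp_service

    def pre_ingest(self, object_id: str, content: bytes, is_metadata: Dict[str, Any]) -> None:
        # Pre-ingest: enrich the IS metadata and package the content into a SIP.
        sip = InformationPackage(content, {**is_metadata, "packaged_by": "middleware"})
        self.dp_service.ingest(object_id, sip)

    def access(self, object_id: str) -> bytes:
        # Access: request a DIP from the DP service and return the content to the IS.
        dip = self.dp_service.access(object_id)
        return dip.content
```

A client IS would call pre_ingest when records are selected for preservation and access when preserved content has to be re‐contextualized back into an active system; the preservation administration component described in Section 5 configures and monitors both paths.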
4. Generic Scenarios of Managing Integrated Preservation Over Time in the Cloud Traditionally, digital preservation (DP) has been designed “holistically”, which means that a single preservation system is supposed to aggregate the storage, ingest and dissemination, metadata collection and preservation functionalities into one whole (Lavoie and Dempsey 2004). However, non‐aggregated approaches to configuring preservation systems are also possible, in which different functionalities of DP are divided into services that can be distributed over multiple service providers. The service providers could each specialize in a segment of the entire preservation process. The services are supposed to offer functionalities separately yet in an interoperable manner, so that they can be combined in a variety of ways in order to respond to an activity initiated by an IS. For example, one IS may require a continuous process of ingest and access whereas another could submit material to an archive on an irregular basis. A description of every service should be available in its service profile to make its discovery possible. The service profiles are later used for matching service requests and service providers by a semantic matchmaker (Hunter and Choudhury 2004). As advances are made in preservation processes and software, preservation tools and services evolve and new standards arise. Accordingly, it must be possible to incorporate new preservation services. In addition to access to preservation services and their discovery, the link between IS and DP also needs to be adapted to the new services and standards. Service recommender systems and decision support services could assist organizational archivists in selecting optimal combinations of services for specific circumstances. On the other hand, the increasing functionality of web‐based information systems, the integration of data warehouses and enterprise information portals, and modern mobile devices provide very heterogeneous interfaces for accessing and preserving information resources (Munkvold, Päivärinta, Hodne and Stangeland 2006). Therefore, a flexible administration structure and support is needed that establishes interactions between IS and DP while being able to handle potential changes in the IS system architecture.
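To make the notion of service profiles and semantic matchmaking more tangible, the sketch below shows the shape such selection logic could take. None of the structures come from the paper or from an existing DP product; the profile fields and the cost‐based selection are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class ServiceProfile:
    """Self-description published by a cloud DP service provider."""
    name: str
    functionalities: Set[str]   # e.g. {"ingest", "access", "migration"}
    formats: Set[str]           # e.g. {"PDF/A", "TIFF", "XML"}
    cost_per_gb: float

@dataclass
class PreservationRequest:
    """What an IS (or its archivist) requires from a preservation service."""
    functionalities: Set[str]
    formats: Set[str]

def match(request: PreservationRequest, profiles: List[ServiceProfile]) -> Optional[ServiceProfile]:
    """Return the cheapest service whose profile covers the request, if any."""
    candidates = [p for p in profiles
                  if request.functionalities <= p.functionalities
                  and request.formats <= p.formats]
    return min(candidates, key=lambda p: p.cost_per_gb) if candidates else None

profiles = [
    ServiceProfile("archive-a", {"ingest", "access"}, {"PDF/A", "XML"}, 0.04),
    ServiceProfile("archive-b", {"ingest", "access", "migration"}, {"PDF/A", "TIFF", "XML"}, 0.07),
]
need = PreservationRequest({"ingest", "access", "migration"}, {"TIFF"})
print(match(need, profiles).name)   # "archive-b" is the only profile covering migration and TIFF
```

A real matchmaker would reason over richer, semantically described profiles, as in Hunter and Choudhury (2004), but the selection step has essentially this shape.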
Accordingly, on the one hand, several systems are free to select among a combination of DP services in the cloud that may evolve over time. In this case, the interaction link between an IS and a DP service should be able to automatically identify the update or change and decide whether to move to the newer service. On the other hand, a single DP service might face a change in an IS application interface. Here, as well, the middleware needs to identify the change and take proper action: for example, to stop the service, to adapt to the change if possible, or to recommend another service (a small change‐monitoring sketch is given after the scenarios below). Altogether, four scenarios arise from the discussion above:
Steady IS and steady DP service. This scenario identifies a situation in which an IS is integrated with a DP service and neither of the two systems is expected to be altered over time (where alteration includes the upgrade of a software segment or whole system, the change of a software segment or whole system, or switching to a different system or service). Interaction between the IS and the DP service is intended to be smooth and maximally automated. Currently, this type of one‐to‐one configuration for integrating IS and DP is supported by preservation service providers such as Tessella, as described above.
Steady IS and altered DP services. In this case an IS is assumed to remain unchanged over time while integrated with one or many DP service(s) that may be subject to upgrades. After an upgrade, the IS may sometimes need to switch to a different DP service. Such alterations would impact interactions between IS and DP services over time. However, the interactions should stay maximally automated. This scenario, like the following ones, introduces challenges for maintaining and administering integration between IS and DP services over time.
Altered IS and steady DP service. This scenario raises the same challenges as the previous one, but from the opposite viewpoint. In this case, alterations occur to an IS whereas the DP service is assumed to remain unchanged. Here, too, the IS-DP interaction is affected and maintaining automation and integration is challenging.
Altered IS and altered DP services. This scenario covers all the scenarios mentioned above. An information system can be subject to alterations, while it must be able to interact with one or more changeable DP services over time. Likewise, more than one information system of the organization in question may wish to ingest to and/or access a particular DP service from the selected service provider. Hence, future system integration and preservation administration solutions should be designed accordingly. If a solution can address this main scenario, it should also be able to address all the other above‐mentioned challenges.
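As a concrete reading of the monitoring implied by the last three scenarios, the sketch below compares the interface versions the middleware was configured against with the versions currently advertised by the IS and the DP service, and decides whether to continue, reconfigure, or look for an alternative service. The function and its inputs are hypothetical; they are not drawn from any existing middleware.

```python
from enum import Enum, auto

class Action(Enum):
    CONTINUE = auto()             # steady IS and steady DP service
    RECONFIGURE = auto()          # adapt the ingest/access workflows to the new interface
    SELECT_NEW_SERVICE = auto()   # the altered DP service can no longer be used as configured

def check_integration(configured_is: str, current_is: str,
                      configured_dp: str, current_dp: str,
                      dp_still_compatible: bool) -> Action:
    """Decide how the middleware should react to interface changes over time."""
    if configured_is == current_is and configured_dp == current_dp:
        return Action.CONTINUE
    if configured_dp != current_dp and not dp_still_compatible:
        # The altered DP service no longer satisfies the contracted requirements.
        return Action.SELECT_NEW_SERVICE
    # Either the IS or the DP service changed, but adaptation is possible.
    return Action.RECONFIGURE

# Example: the DP service was upgraded but remains compatible, so the workflows are reconfigured.
print(check_integration("is-v1", "is-v1", "dp-v2", "dp-v3", dp_still_compatible=True))
```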
5. Design Challenges and Issues For each scenario above, automation and integration of IS and DP services is of foremost importance. In this regard, two groups of design issues are identified here. The first group refers to the tasks and responsibilities of administering preservation. In addition, a designer needs to address where to organizationally locate the middleware and the related tasks.
5.1 Preservation administration tasks The preservation administration component (with the related tasks) is located in the middleware together with the pre‐ingest and access components (Päivärinta et al. 2014). According to our analysis, in light of the scenarios for managing integration between IS and cloud‐based DP services, the preservation administration tasks can be divided into three main categories (Table 1): automatizing ingest, automatizing access, and general administration tasks. In any of the scenarios, no preservation will take place before a DP service is chosen based on the organizational requirements and cost constraints. This preservation administration task has been under study, e.g., in the recent ENSURE project, which aimed at decision support for conceptualizing organizational quality requirements for preservation and mapping them in connection to technical descriptions and characteristics of
cloud‐based preservation services (see Andersson et al., 2013). However, research on how to implement this task is still at a very early stage. After the DP service is chosen and contracted with the service provider, a maximally automated workflow needs to be configured for ingesting selected information content from the IS to the selected service according to the chosen policies. This component should take advantage of the available standards for exchanging information content and metadata, such as CMIS (Choy et al. 2011) or METS (LoC 2014), to ensure that the configuration operations are maximally automated as well. While “plug and play” functionality for configuring ingest from IS to DP services may still require significant research and development effort, it should be pursued to the maximum extent. Currently, full integration of the ingest process necessarily requires human interaction and application development, while the focus of future research should be on automatizing such interoperability to the extent possible. Similar challenges also apply after a preservation service has been established and some content preserved. If an information system needs to be able to browse the preserved content as well, and eventually to access it, the preservation administration needs to configure access to the preserved resources on request and, ultimately, to re‐contextualize copies of the preserved information resources back into active information systems if needed. Here, it needs to be noted that the system accessing the preserved resources is not necessarily the system that initially preserved them. If access to the preserved content resources is large‐scale and frequent, the middleware should also be capable of optimizing the browsing and access conditions accordingly, e.g., by using cache databases for frequently used content. While browsing solutions (such as Tessella’s SDB above) exist, the integration issues involved in full‐fledged re‐contextualization of a preserved resource into another system will surely need significant research and development attention in the future.
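The access optimization mentioned above can be illustrated with a small sketch. The class below is an assumption of ours, not part of the paper's middleware; it simply keeps an in‐memory least‐recently‐used cache in front of any DP service object exposing an access(object_id) call, so that frequently requested content is not repeatedly fetched from archival storage.

```python
from collections import OrderedDict

class CachedAccess:
    """Access component of the middleware with a simple LRU cache of DIP content."""
    def __init__(self, dp_service, max_entries: int = 128) -> None:
        self.dp_service = dp_service       # any object exposing access(object_id) -> bytes
        self.max_entries = max_entries
        self._cache: "OrderedDict[str, bytes]" = OrderedDict()

    def access(self, object_id: str) -> bytes:
        if object_id in self._cache:
            self._cache.move_to_end(object_id)        # mark as recently used
            return self._cache[object_id]
        content = self.dp_service.access(object_id)   # fetch the DIP from the DP service
        self._cache[object_id] = content
        if len(self._cache) > self.max_entries:
            self._cache.popitem(last=False)           # evict the least recently used entry
        return content
```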
Automatizing ingest: Choose a DP service; Configure ingest workflow
Automatizing access: Configure access; Optimize access conditions
General administration tasks: Manage updates to IS, DP service and middleware (including the standard application processing interfaces); Manage costs; Manage exceptions and errors

Table 1: Preservation administration tasks

The preservation administration task, supported by a middleware, also needs to take care of updates to the connected information systems, preservation systems, and the standard application processing interfaces in use. Moreover, in any larger‐scale preservation solution, the administration task needs to monitor and manage the costs of preservation (and access), being ultimately prepared to switch service provider if a more optimal competitor can be located, e.g., in the cloud. Finally, automation of the ingest and access processes and requests will also most likely create exceptions and errors over time, which need to be solved. This would be the responsibility of the administration task/component. If we compare the suggested administration tasks to the above‐mentioned scenarios, we can see that they correspond logically to the most challenging scenario as follows. In the first place, the tasks address the need for establishing and maintaining fully automated ingest and access processes. Secondly, the suggested administration tasks monitor both needs for changes caused by external events and possible changes initiated in the preservation environment itself, and can, in turn, launch the configuration tasks again accordingly. Hence, at the logical level, we can argue that at least all of these seven abstract tasks for the preservation administration component will be needed for addressing the digital preservation scenarios listed
above. Future research should focus on the possibilities and design choices that would make their maximal automation and computer support possible.
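As a compact summary of Section 5.1, the sketch below lists the seven administration tasks of Table 1 as an interface. The class and method names are ours and purely conceptual; they do not correspond to an existing implementation.

```python
class PreservationAdministration:
    """Sketch of the seven preservation administration tasks of Table 1."""

    # --- Automatizing ingest ---
    def choose_dp_service(self, requirements):
        """Select a DP service based on organizational requirements and cost constraints."""

    def configure_ingest_workflow(self, dp_service):
        """Set up a maximally automated workflow producing SIPs for the chosen service."""

    # --- Automatizing access ---
    def configure_access(self, dp_service):
        """Set up access to preserved resources, including their re-contextualization."""

    def optimize_access_conditions(self):
        """Tune browsing and access, e.g. caching of frequently used content."""

    # --- General administration tasks ---
    def manage_updates(self, changed_component):
        """Adapt to updates in the IS, the DP service, or the middleware itself."""

    def manage_costs(self):
        """Monitor preservation and access costs; prepare to switch providers if needed."""

    def manage_exceptions(self, error):
        """Handle exceptions and errors raised by the automated ingest and access processes."""
```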
5.2 Where To Locate Preservation Administration and Middleware

Although a middleware solution seemed like an obvious choice for the scenarios given above (Linthicum 2000), one question that is not as easily addressed is: who should own and operate the middleware and be responsible for preservation administration? There are three options to choose between here: the content producer could own and manage the middleware, the preservation organisation could own and manage the middleware, or the middleware could be owned and operated by a third party. All have benefits and drawbacks, however.

5.2.1 Middleware managed by producer organization
Having the middleware operate under the management and ownership of the content producer organization could mean that integration with the content producer’s system(s) is optimized. One drawback might be an overly tight coupling between the systems on both a practical and a logical level, which would make it difficult to change either the IS or the middleware. Ideally, however, the middleware would adopt common interfaces and standards for external communication with the preservation system(s), so that it would be easy to change preservation system provider.

5.2.2 Middleware managed by preservation organization
With the middleware managed and owned by a preservation organization, it would likely be focused on support for whatever repository system the preservation organization is running. The ingest process would follow the rules and policies set up by the preservation organization, and if the content producers need to use several preservation organizations/services, they might need to follow several different protocols and formats for ingest, and perhaps also for access.

5.2.3 Middleware managed by third party
If the middleware is operated and owned by a third party, this third‐party provider would likely be interested in providing solutions for several different content producers, as well as for several different preservation systems. This would mean that although both the producer systems and the preservation systems would need to follow some common standards and principles, these would be standards and principles that are well known, and most likely even tailor‐made, for a particular area of application. It would be in the interest of the third‐party provider to make sure that the middleware supports well‐established practices and standards in, e.g., content management and digital preservation. In that way, all producer systems that support, e.g., the Content Management Interoperability Services (CMIS; Choy et al. 2011) would be able to ingest objects into the middleware, which would then process and package the objects according to, e.g., METS principles (LoC 2014). Our initial choice would be to opt for having the middleware provided by a third party, in order to accommodate different producers and producer systems, as well as the possibility of one producer having one form of access to several different preservation systems, e.g. for different types of objects.
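As a toy illustration of that last point, the following sketch packages an object received from a CMIS‐capable producer system into a minimal METS‐like XML wrapper using only the Python standard library. It is deliberately incomplete (a schema‐valid METS document would also need sections such as metsHdr, dmdSec and structMap) and the helper name is our own; it merely indicates the kind of standards‐based repackaging a third‐party middleware could perform.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)

def package_as_mets(object_id: str, mime_type: str, location_url: str) -> bytes:
    """Wrap one content object, e.g. fetched over CMIS, into a minimal METS-like SIP."""
    mets = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": object_id})
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", {"USE": "original"})
    file_el = ET.SubElement(file_grp, f"{{{METS_NS}}}file",
                            {"ID": f"file-{object_id}", "MIMETYPE": mime_type})
    ET.SubElement(file_el, f"{{{METS_NS}}}FLocat",
                  {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": location_url})
    # NOTE: a complete METS package would also carry descriptive and structural metadata.
    return ET.tostring(mets, encoding="utf-8", xml_declaration=True)

print(package_as_mets("doc-42", "application/pdf",
                      "https://producer.example.org/cmis/content/doc-42").decode("utf-8"))
```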
6. Discussion ‐ Contributions and Implications for Future Research In this article, we identified challenges and design issues for an integrated model of cloud‐based preservation services and information systems. We specifically discuss the challenges that are brought about by developments and upgrades that take place over time. In addition, we suggest design issues to address each challenge in a systematic manner. The identified scenarios and design issues complement the preservation‐oriented view of the OAIS model by addressing the need for middleware and a set of preservation administration tasks, which should take place between OAIS‐compliant DP services and organizational information systems. Noteworthy in this article is the recognition of a scenario for future digital preservation arrangements in which changes and upgrades are envisioned to happen in both the IS and the DP services in use. We take a few steps forward from the state of the art, which reflects a preservation situation in which automation is provided one‐to‐one between presumably steady IS and DP. To be able to tackle the scenario of integrating ever‐changing information systems with multiple, cloud‐based preservation services, we suggest seven abstract tasks which any preservation administrator in an organization needs to consider, and which should ideally be
automatized to the extent possible. The identification of these tasks opens interesting avenues for future research and development efforts in the field. To achieve automation and seamless interaction between IS and DP services, we realize that one should also decide and design where to locate the middleware components and administrative tasks. Our suggestion for assigning responsibility for the preservation administration tasks and the middleware goes beyond what has been identified in the recent SHAMAN (2010) and Protage (2008) projects. As such, our work above further concretizes the abstract suggestion for a general‐level middleware by Päivärinta et al. (2014). Our future research will focus on the actual challenges of how to implement the middleware and the preservation administration tasks when integrating IS and DP services. This work takes place, e.g., as a part of the on‐going ForgetIT (2014) project. The project implements solutions for integrated and automated ingest and access processes, through which we will also learn more concretely about the challenges and the current potential for automatizing the preservation administration tasks and components. While we regard our analysis above as a useful start for this journey, we also recognize that there are a number of other issues that need to be considered while developing integration‐ready DP services in the cloud. For example, while today’s storage clouds typically aim at providing low‐cost storage space, they give few guarantees regarding the reliability and security of the stored information. A crucial aspect of enterprise risk management with regard to the digital preservation of business processes is a careful analysis of the service dependencies in connection to specific business processes.
7. Conclusion
Information systems and cloud-based digital preservation services are subject to upgrades and will change over time. This poses several challenges for the ever-increasing need for digital preservation in organizations with scarce human resources for actually performing and managing the task. It is thus important to build more integrated preservation solutions between IS and DP services, in which the preservation tasks remain smooth and maximally automated regardless of alterations in the IS or DP services over time. Moreover, we evidently need better middleware and system support for maximally automated preservation administration, as described above. However, having a middleware between the IS and the DP service does not in itself guarantee a high degree of automation. The four problem scenarios and seven core tasks of preservation administration provide an understanding of the implementation challenges and requirements of such middleware to support automation of ingest, access and, finally, even maximum automation of preservation administration itself.
Acknowledgments This work was partially funded by the European Commission in the context of the FP7 ICT project ForgetIT (under grant no: 600826).
References Andersson, L. Lindbäck, G. Lindqvist, J. Nilsson, and M. Runardotter, (2011). Web Archiving Using the Collaborative Archiving Services Testbed, In Proceedings of the e‐2011. Andersson, I., Randers, F., Runardotter, M., Nilsson, J., & Päivärinta, T. (2013). Design Principles for a Quality Model Supporting the Selection of a Cloud‐based Preservation Solution. Paper presented at Information Systems Research Seminar in Scandinavia, Oslo, Norway, August 11‐13. Birrell, D., Menzies, K., Maceviciute, E., Wilson, T., Wollschläger, T., Konstantelos, L., ... & Zabos, A. (2011). SHAMAN: D14.2 ‐ report on demonstration and evaluation activity in the domain of" memory institututions". CCSDS (2012) Reference Model for an Open Archival Information System (OAIS). Consultative Committee for Space Data Systems. Washington DC: CCSDS Secretariat / Magenta Book. Choy D., Müller F., and McVeigh R. (2011). Content Management Interoperability Services [CMIS] Version 1.1. OASIS Content Management Interoperability Services [CMIS] TC. Farquhar, A., & Hockx‐Yu, H. (2008). Planets: Integrated Services for Digital Preservation. Serials: The Journal for the Serials Community, 21(2), pp. 140‐145. ForgetIT, (2014). Abstract ForgetIT Project. Available at: http://www.forgetit‐project.eu/en/about‐forgetit/abstract/ [22 May 2014] Hitchcock, S., Brody, T., Hey, J., & Carr, L. (2005). Preservation for Institutional Repositories: Practical and Invisible. In, Ensuring Long‐term Preservation and Adding Value to Scientific and Technical Data (PV2005), The Royal Society of Edinburgh, Edinburgh, Scotland, 21 ‐ 23 Nov 2005. 9pp. Hunter, J., & Choudhury, S. (2004). A Semi‐automated Digital Preservation System Based on Semantic Web Services. In Digital Libraries, 2004. In proceedings of the 2004 Joint ACM/IEEE Conference on (pp. 269‐278). IEEE.
Hunter, J., & Choudhury, S. (2006). PANIC: an Integrated Approach to the Preservation of Composite Digital Objects Using Semantic Web Services. International Journal on Digital Libraries, 6(2), pp. 174‐183. Kanhabua, N., Niederée, C., & Siberski, W. (2013). Towards Concise Preservation by Managed Forgetting: Research Issues and Case Study. In Proceedings of the 10th International Conference on Preservation of Digital Objects (iPres’ 2013). Korb, J. and Strodl S. (2010). Digital Preservation for Enterprise Content: a Gap‐analysis Between ECM and OAIS, presented at the iPres 2010, Wien, Lavoie, B. F. (2004). The Open Archival Information System Reference Model: Introductory Guide. Microform & imaging review, 33(2), pp. 68‐81. Lavoie, B., & Dempsey, L. (2004). Thirteen Ways of Looking at... Digital Preservation. D‐Lib magazine, 10(7/8), 20. Linthicum, D. S. (2000). Enterprise Application Integration. Upper Saddle River, NJ: Addison‐Wesley Professional. LoC. (2014) Metadata Encoding and Transmission Standard Official Web Site. Library of Congress. Available at: http://www.loc.gov/standards/mets/ [22 May 2014]. Munkvold, B. E., Päivärinta, T., Hodne, A. K., & Stangeland, E. (2006). Contemporary Issues of Enterprise Content Management. Scandinavian Journal of Information Systems, 18(2), pp. 69‐100 Päivärinta T., Afrasiabi P., Nilsson J. (2014). The Problem of Seamless Interaction between Enterprise Content Management and Long‐Term digital preservation: a Literature Review. Manuscript submitted for publication. Ross S. (2012). Digital Preservation, Archival Science and Methodological Foundations for Digital Libraries, New Review of Information Networking, 17(1), pp. 43–68. Saul C. and Klett F. (2008). Conceptual Framework for the Use of the Service‐oriented Architecture‐Approach in the Digital Preservation, presented at the iPres 2008, pp. 229–234. Stewart C. (2012). Preservation and Access in an Age of E‐Science and Electronic Records: Sharing the Problem and Discovering Common Solutions, Journal of Library Administration, 52(3–4), pp. 265–278, Tessella. (2014). SDB Key features. Available: http://www.digital‐preservation.com/solution/safety‐deposit‐box/sdb‐key‐ features/. [28 April 2014]. Thibodeau, C. and Zangaro, S. (2013). Digital Data Archive and Presentation in the Cloud ‐ What to do and What not to do. Presentation at Storage Networking Industry Association. Wittek P. and Darányi S. (2012). Digital Preservation in Grids and Clouds: A Middleware Approach, Journal of Grid Computing, 10(1) March, pp. 133–149.
Preservation Services Planning: A Decision Support Framework Ingemar Andersson, Göran Lindqvist and Frode Randers Luleå University of Technology, Luleå, Sweden
[email protected] goran.lindqvist@ldb‐centrum.se
[email protected] Abstract: Commercial organizations are experiencing a growing need to access business-critical data over the longer term of their operations. Governmental regulations as well as commercial interests influence this need. Organizations are willing to procure cost-effective services to this end, services that are increasingly delivered as public or private cloud solutions. With the advent of autonomous cloud services comes the possibility to assemble (mix and match) preservation services in a workflow-based, service-oriented solution. Following interaction with information managers in three commercial organizations operating in different markets, and after a review of current literature, we have identified a lack of comprehensive guidelines and decision support for service selection as part of preservation planning. Existing models and frameworks used for assessing the quality of preservation services address either the performance-based features that service providers offer or the technical details of the preservation actions themselves. In this paper we present our preservation-planning framework (Preserv-Qual), which addresses the need for decision support in the selection of preservation services and explicitly acknowledges the differences among aspects of information use within an organization. We describe the outcome of an evaluation of the framework in three commercial organisations as a service quality assessment and decision support tool. This paper shows how our framework supports the use of existing and proven methods, models and principles for service assessment, digital preservation and decision support. Keywords: Digital preservation, Preservation planning, Decision-making, Cloud computing
1. Introduction
The problem of ensuring future access to digital objects, referred to as Digital Preservation (DP), is a multi-faceted effort related to preserving a set of qualities of the digital object for its future users. A typical quality to preserve could, for instance, be authenticity (Nilsson 2009; Duranti 2005). With an increase in both the need to store information and the volume of data, cost-effective preservation solutions have become more interesting. Issues associated with saving data over a longer period of time have traditionally been the focus of libraries, archives, government agencies, and academic institutions (Ross 2012; Anderson 2008). Recently, interest in this area has also come from commercial organizations. This interest is, among other things, driven by requirements for traceability of business transactions, legal obligations, etc. Being able to access data over a longer period of time has become an important part of their business model. This is especially true in the financial and medical care sectors, where evidence of the validity of the data relied upon is crucial (Edelstein et al 2011). Until now, solutions for long-term digital preservation have been designed as monolithic systems located in a discrete environment. The development of scalable solutions designed as cloud services has become an alternative for long-term digital preservation (DuraSpace 2014; Preservica 2014). The ability to assemble (mix and match) autonomous preservation services, known as cloud broker solutions (Hogan 2011), has also become feasible. Keeping pace with this progress, there has been a rapid proliferation of vendors that comply with the new service model. This places new challenges on the preservation planning process in terms of how to compose preservation services. Among the challenges that need to be addressed: given a variety of service providers, how do you choose among them, and to whom will you entrust your data for a long period of time? Should you choose a single service provider or opt for a configuration of services from different providers (the mix-and-match approach)? It is no longer just a question of the sustainability of the individual service provider's infrastructure, nor of the issues surrounding the data itself (such as the choice of file format, software, or even security). As the environment in which all organizations operate is subject to change over time, the requirements regarding what to preserve, how to preserve, and why to preserve may change as well. Because of these social and technical factors of change, and because of the predictive nature of the problem of ensuring future access to information, we need to capture the purpose of preserving the information in the planning process itself. As a result, the act of planning for digital preservation using a set of services becomes more complex.
The objective of this research is to develop and demonstrate a framework (Preserv-Qual) as a foundation for the design of decision support systems supporting the selection of composite preservation services. The decision support is based on the assessment of intersubjective factors, such as staff turnover and the economic stability of the individual service providers, as well as technical factors, such as the choice of storage technology. The key motivation for our work is outlined above. The development of the framework is based on use cases from three different business cases. The organizations are partners in a large project we were involved in (ENSURE 2014). Based on a review of existing literature, the framework is influenced by existing methods for assessing quality of service and by recommended practices for assessing the trustworthiness of digital preservation solutions. The paper is organized as follows: in the next section we describe background and prior work in digital preservation planning and different quality aspects related to digital preservation systems and cloud services. Thereafter, we describe an excerpt of the usage scenarios and present findings from the project. This is followed by the presentation of our framework. The paper continues with a description of the application of the framework and a condensed evaluation. In the last section we discuss the implications of the framework and directions for future work.
2. Background and Related Work
2.1 Digital Preservation Planning: Quality of Service
Development of digital preservation systems is not new. In 1996 the Consultative Committee for Space Data Systems developed OAIS (CCSDS 2012), a high-level model for the operation of archives. Within the preservation area it has been important to establish trust by verifying whether a preservation system fulfils the OAIS standard. For this, various methods such as DRAMBORA (McHugh et al. 2008) and auditing frameworks such as Nestor (Nestor 2006) and the Trusted Digital Repository (TDR) checklist (TDR 2011) have been developed. Alongside these, methods for assessing the quality of services have been developed. An overwhelming majority of these have focused on benchmark tests and provide a catalogue of metrics and methods appropriate for performance-based assessments, such as CloudCmp (Li 2010) and SMICloud (Garg 2013). Figure 1 shows an illustration of the preservation planning landscape for a cloud broker preservation solution. At one end we have a client organization, in OAIS terms called the "Producer", in need of preservation of a digital collection. The client organizations are also part of the entity called "Consumers", i.e. users of the solution. Other kinds of consumers are authorities that verify regulatory compliance of data. Between these stakeholders, we have service providers (SP) that provide the technical services required to make data available over time. These services are divided, according to the OAIS (CCSDS 2012), into the entities of ingest, data management, archival storage, management, and access (see Figure 1). The Producer organization is responsible for the process of selecting the appropriate mix of services based on the characteristics and the purpose of use of the data. Challenges in this preservation planning process include not only the choice of a preservation strategy, such as the choice of an appropriate file format, or the choice of preservation actions, such as migration tools (Becker et al. 2009; Farquhar & Hockx-Yu 2008). Additionally, you now also face the choice of storage and compute services, services for verification of integrity, services for encryption and decryption, services for monitoring the composite service, services for ensuring global security policies over all constituent sub-services, services for ingesting and disseminating information, etc. Since the choice of preservation services composed of a set of these sub-services has to be made in light of cost and quality aspects, risk mitigation has to be addressed. This is a risk management problem that should be supported by tools for decision-making (Ross 2012).
Figure 1: Preservation services planning landscape
What is needed is a framework that takes a more holistic and comprehensive view of the preservation planning process – a framework that provides guidelines for client organizations in choosing an optimal preservation plan reflecting both trust and performance of the services offered by the individual service providers.
3. A Decision Support Framework for Preservation Services Selection
The development of our framework (Preserv-Qual) is part of the ENSURE cloud-broker digital preservation solution (ENSURE 2014), a solution composed of autonomous services in which especially the distributed storage and compute services are key components of interest. Other component parts of the solution are services for integrity checking (fixity), transformation, and encryption offered by different service providers (SP). Our framework is part of a decision support component, a Configurator, which aids business organizations in the selection of the appropriate mix (and match) of services based on quality. The mission of the configurator is to create an optimal solution for its commercial customer (i.e. the owner of the data) and, as such, to strike a balance between cost and economic performance on the one hand and quality of service on the other. A condensed version of the ENSURE system process is: 1) capture user input as basic business requirements and data policies, 2) generate candidate services as output, 3) evaluate those outputs, 4) select the outputs that are suitable for the user's decision-making, 5) present candidate solutions to the user for selection, 6) install the selected services corresponding to the chosen configuration in the preservation runtime system. Development of Preserv-Qual is based on three different use-cases from clinical trials, healthcare, and financial services, obtained through discussions and interviews and distilled from separate revisions of the interviews. Here is a condensed excerpt from two different use-cases (ENSURE scenarios 2014) that influenced the development of our framework. Scenario 1: clinical trials are conducted to bring new drugs to market. Data from medical records are the basis of these experiments and can be used for future reviews. Clinical trial data must be kept for at least 15 years for regulatory reasons, in a way that ensures its authenticity, viability and security, thereby ensuring the authenticity
and validity of the stored data. Beyond the regulatory requirements, there is also a desire to ensure the usability of, and convenient access to, the data, so that audits or inspections may be performed in a cost-effective manner. Scenario 2: the financial sector is characterized by ever increasing volumes of high-frequency market and transaction data. During the past decade, a particular focus has also been placed on research and development into performance improvements of the IT infrastructure of financial businesses. The financial sector is characterized by a variety of rules and obligations at the national and EU level – acting in accordance with rules and legal norms has gained increasing importance. Regulations also encompass archiving obligations regarding trade and customer information in investment advice, among other things as a countermeasure to fraud and money laundering. In particular, any information received from, or provided to, the client must be maintained for the entire duration of the contractual relationship, beyond the legal minimum retention period of five years. Compliance with all such rules and obligations is monitored and reviewed on an annual basis by the regulatory authorities responsible for these controls and their implementation. In summary, a number of observations can be made from the usage scenarios. An important observation is that there are different motives for using preserved information. This means that different quality requirements must be met to varying degrees depending on the purpose of use. Based on an analysis of the scenarios, related work and a literature review, we identified three key dimensions that have influenced our framework (Andersson et al. 2014). Table 1 captures these dimensions.
Table 1: The conceptual dimensions that have influenced our framework
Trustworthiness: Determines the confidence in services from a preservation perspective. Defined by the quality factors authenticity, viability, and security.
Quality of Service: Estimation of performance-based factors and the ability to move data. Defined by the quality factors accessibility and portability.
Purpose of Use (PoU): Determines the use of preserved data. Defined by the different purposes evidential, historic, and business.
Figure 2 presents a holistic view of the Preserv-Qual framework in its context: supporting a business organization in the process of selecting an optimal cloud broker DP solution for its purpose of use, based on quality. The figure shows how the different quality dimensions are used in the framework. The framework is described by splitting it into the Cloud Service, Audit, Configuration, Run-time, and Cloud Consumer layers.
Figure 2: The Preserv‐Qual framework
3.1 Cloud Service Layer The services that will be included in the service‐brokered DP solution are storage and compute services, migration services, fixity services to check accidental corruption of content or bit rot, and various encryption services to protect against unauthorized access and malevolent corruption of content. Different individual service providers may offer these services. The storage and compute services are the cornerstones in the solution and will support other services, e.g. a migration service will use the computation service to get work done. The services are selected as candidate services and registered in the audit result database (QE database).
3.2 Audit Layer – Quality Assessment This layer is responsible for carrying out the quality assessment and keeping track of the candidate service quality measurement result. Different types of services require different metrics and assessment techniques. It is necessary to assess and compare the same types of services in the same way, using the same metrics and the same assessment scale. The framework supports a classification of quality factors (Q‐factors) in two dimensions Trustworthiness and Quality of Service. Trustworthiness represents Q‐factors and metrics related to the management of digital objects in a reliable manner and refers to mechanisms, procedures, staff competence and organizational viability. Related to the Quality of Service dimension are Q‐factors and metrics that enable measuring of performance and portability. The results of the service assessments are registered in the QE database used as input to the Quality Engine in the configuration layer.
3.3 Configuration Layer ‐ Quality Engine
This is a central part of the framework. It is responsible for capturing basic requirements, calculating quality scores and expressing the outcome of the quality assessment to the users charged with making decisions. Examples of basic requirements are data transformation policies, geographical restrictions on the placement of data, and the grouping of usage needs by defining the Purpose of Use (PoU). The configuration layer then produces proposals of various parameterized preservation plans that contain the basic requirements and the available services which conform to the requirements. The outcomes of the Q-factor ranking component are each factor's weight of importance in relation to the PoU. The Q-factor value is obtained by a pairwise comparison of each value and a mathematical calculation in accordance with the Analytic Hierarchy Process (AHP) (Saaty 2008). For each Q-factor, adequate metrics are identified depending on the type of service referred to. The sum of the metric values (audit data) obtained by the service measurement instrument defines the service fulfilment rate for each Q-factor. The calculator component computes a quality score based on the service audit data and the Q-factor ranking, as illustrated in the sketch below. Input to this layer is a parameterized preservation plan proposal with a service specification, triggering events, a digital object specification, basic requirements, and the defined PoU. A graphical user interface (GUI) presents detailed data from the quality measurement results for each service that is part of the preservation plan proposal. The results are presented to preservation planners as scores, graphical charts and quality risk expressions, and the quality results are presented along with the cost assessment. The preservation services plan that matches the client organization's needs is selected as the preservation service execution plan and serves as input to the run-time layer. The objective of the configuration layer is to be able to create an optimal solution for the organization.
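As a concrete illustration of this calculation, the sketch below derives AHP-style weights from a pairwise comparison matrix and combines them with service fulfilment rates into a single quality score. It is only a sketch: the comparison values, the Q-factor order and the fulfilment rates are hypothetical and are not taken from the ENSURE project or the Preserv-Qual implementation.

```python
import numpy as np

# Hypothetical pairwise comparison of three Q-factors for an "evidential"
# purpose of use, on Saaty's 1-9 scale: authenticity vs viability vs security.
pairwise = np.array([
    [1.0, 3.0, 2.0],   # authenticity compared with the other factors
    [1/3, 1.0, 1/2],   # viability
    [1/2, 2.0, 1.0],   # security
])

# AHP principal-eigenvector method: the normalized eigenvector belonging to
# the largest eigenvalue gives the relative weight of each Q-factor.
eigvals, eigvecs = np.linalg.eig(pairwise)
principal = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
weights = principal / principal.sum()

# Hypothetical fulfilment rates (0..1) for one candidate service, as stored
# in the QE database after the audit-layer measurements.
fulfilment = np.array([0.9, 0.6, 0.8])

quality_score = float(weights @ fulfilment)
print(dict(zip(["authenticity", "viability", "security"], weights.round(3))))
print("quality score:", round(quality_score, 3))
```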
3.4 Cloud Consumer Layer ‐ Preservation Planning
This layer is responsible for communicating with the users of the system. The main objective of the framework is to support the decision-making of the users of the system in selecting the preservation plan that best fits the organizational needs. The client organization can specify basic operational requirements and policies, such as how long each type of data must be preserved (affecting the configuration horizon), the data transformation policy (e.g. compression limitations that affect usability), and whether there are geographical restrictions on the placement of data. The requirements are part of a parameterized preservation plan used as input to the configuration layer. Users responsible for preservation planning are presented with the quality assessment result of each proposed solution, with access to details through the visualizer GUI. The users can either modify the initial requirements and policies and run the plan through another loop in the configuration layer, or choose a solution from those proposed. If needed in the future, the organization is able to execute a reconfiguration process with a new set of requirements. The framework has been validated in use cases from the financial, healthcare and clinical trials sectors.
3.5 Run‐Time Layer – a Cloud Broker Preservation System The run‐time layer is responsible for instantiating a preservation system based on the selected services (plugins) available in the repository according to the selected preservation plan. The configurator provides a preservation plan that invokes a plugin manager in the service aggregation component to install the plugins requested by the preservation plan, and then passes control to the workflow engine in the runtime engine component. The storage service component is based on the Preservation DataStores in the Cloud (PDS Cloud) (Rabinovici‐Cohen 2013). PDS Cloud is a preservation aware storage service infrastructure component that provides an abstraction over multiple cloud storage and compute providers.
4. Application and Evaluation
The application of the Preserv-Qual framework is divided into two major phases. The first phase is the preparation of quality assessments, providing data to the QE database that stores the results of the quality measurements of potential services. These services are considered candidate services, which may, or may not, be part of a future DP solution. The second phase uses the information in the QE database to calculate the quality score for the aggregated services in each DP solution proposal. Figures 3 and 4 show how the Preserv-Qual framework can be operationalized.
4.1 Audit Layer ‐ Quality of Service Assessment
Each candidate service that can be included in a proposed DP solution must be assessed, and each class of service requires a specific type of quality assessment. In Figure 3 the main components of the first phase are presented. The "evaluate quality factor for purpose of use" activity is the Q-factor ranking component. Another activity is related to the measurement of a Storage and Compute (S&C) service in two different process activities. The first process relates to the measurement of the Trustworthiness dimension with the related Q-factors of authenticity, viability, and security. The Trustworthy Digital Repository checklist (TDR 2011) is a suitable instrument for performing this assessment. The TDR checklist supports the assessment of service mechanisms that span from organizational staff competence and financial strength to technical infrastructure mechanisms. The second process of the S&C measurement is related to the Quality of Service dimension with the related Q-factors of accessibility and portability. Suitable instruments for this measurement are cloud service benchmark tools such as CloudCmp (Li et al. 2010) and CloudHarmony (CloudHarmony 2014), together with service specifications. The last activity is related to the quality assessment of fixity, encryption, and transformation services. The evaluation of fixity and encryption services is based on existing ratings of algorithms. The evaluation of transformation services is done by an internal component that compares object properties before and after migration. Software that could be used in this activity is the PLANETS testbed (Farquhar & Hockx-Yu 2008).
4.2 Configuration Layer ‐ Quality Engine
This phase (Figure 4) is triggered by the reception of the preservation plan (GPP). The GPP is an XML-based specification composed of candidate services, the purpose of use, the digital object, basic requirements, and preservation event specifications. A quality score for each GPP is calculated based on the results of the various plugin measurements and the result from the Q-factor ranking component, obtained from the QE database. The output is a quality score and risk expressions.
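The GPP schema itself is not reproduced in this paper. Purely as an illustration of the kind of information the text says a GPP carries, and of how the quality engine could combine audit data with Q-factor weights, a parameterized plan might look like the sketch below. The real GPP is an XML document, and all field names, services and numbers here are invented for illustration.

```python
# Hypothetical representation of a parameterized preservation plan (the real
# GPP is an XML specification; every name and value below is invented).
gpp = {
    "purpose_of_use": "evidential",
    "digital_object": {"type": "clinical-trial-record", "format": "PDF/A-1b"},
    "basic_requirements": {"retention_years": 15, "geo_restriction": "EU"},
    "candidate_services": [
        {"class": "storage-compute", "provider": "SP-A"},
        {"class": "fixity", "provider": "SP-B"},
        {"class": "encryption", "provider": "SP-C"},
    ],
    "preservation_events": [{"trigger": "format-obsolescence", "action": "migrate"}],
}

# Hypothetical QE-database entries: per-service fulfilment rates per Q-factor.
audit_db = {
    ("storage-compute", "SP-A"): {"authenticity": 0.9, "viability": 0.7, "security": 0.8},
    ("fixity", "SP-B"): {"authenticity": 0.8, "viability": 0.9, "security": 0.7},
    ("encryption", "SP-C"): {"authenticity": 0.7, "viability": 0.8, "security": 0.9},
}

# Q-factor weights for the plan's purpose of use (cf. the AHP ranking in 3.3).
weights = {"authenticity": 0.5, "viability": 0.2, "security": 0.3}

def quality_score(plan, weights, audit_db):
    """Average, over the plan's candidate services, of the weighted sum of
    each service's Q-factor fulfilment rates."""
    per_service = []
    for s in plan["candidate_services"]:
        rates = audit_db[(s["class"], s["provider"])]
        per_service.append(sum(weights[q] * r for q, r in rates.items()))
    return sum(per_service) / len(per_service)

print(round(quality_score(gpp, weights, audit_db), 3))
```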
Figure 3: Preserv‐Qual – Audit layer: quality of service assessment
Figure 4: Preserv‐Qual – Configuration layer: quality engine
4.3 Evaluation
The evaluation of the ENSURE system (ENSURE evaluation 2014) showed that the user workflows were logical, intuitive and clear. The system was demonstrated for each of the users, followed by a session where end users were able to perform a complete walk-through of the system. The applied evaluation method was more qualitative than quantitative. Individual user feedback from the in-house tests was collected. Users were asked to give their subjective assessment of system usability (utility). The Q-factors used in the test of the configurator, of which Preserv-Qual is part, were demonstrated to be relevant to users in this decision-making context. The result of the evaluation phase showed that the system proved to be reliable and trustworthy, and
designed to meet the auditability criteria as part of its quality assessment. All necessary Q-factors (authenticity, viability, and security) were addressed, fulfilling the precondition of trustworthiness.
5. Discussion and Concluding Remarks
Our framework builds upon previous work in the area of digital preservation by supporting existing methods for measuring the degree of trust in a preservation system (TDR 2011). The framework has also been influenced by, and provides support for parts of, existing methods for measuring the quality of cloud services (Garg 2013; Li 2010). Preserv-Qual can act as a foundation for the design of decision support systems to be used in the preservation planning process. This is the core contribution of our work. The key motivation for our work is the growing volume of data that has to be saved over a long period of time. This has increased the demand for cost-effective preservation solutions. Against this backdrop, cloud-broker solutions composed of autonomous preservation services have become a realistic option. This provides an opportunity for organizations to assemble a preservation solution adapted to their needs. It expands the view of the preservation planning process (CCSDS 2012), in contrast to previous preservation planning approaches that focused on the selection of appropriate file formats and migration services (Becker et al. 2009; Farquhar & Hockx-Yu 2008). We argue that this process has to be supported by a new kind of decision aid – a system that supports the organization in the selection of preservation services based on quality aspects adapted to its individual purpose of use. A practical application of our framework has been tested in the ENSURE project (ENSURE evaluation 2014) with satisfactory results, but there is still room for further development, such as continued research into the identification of appropriate quality factors and improved decision aid in the interpretation of the results. These are the main incentives for further research and development of our framework.
Acknowledgements The research leading to these results has received funding from the European Community´s Seventh Framework Programme (FP7/2007‐2013) under grant agreement n° 270000.
References Anderson, M. and Mandelbaum, J. (2008), Planning for the “long term”... in library time, Digital Archive Preservation and Sustainability (DAPS) Workshop, MSST2008 25th IEEE Symposium on Massive Storage Systems and Technologies, Baltimore, MD, September 22, Available at: [Accessed 24 June 2014] Andersson, I., Randers, F., & Sein, M. K. (2014) “A Conceptual Framework for Preservation Planning”, International Journal of Digital Curation, in review process. Becker, C., Kulovits, H., Guttenbrunner, M., Strodl, S., Rauber, A., & Hofman, H. (2009) ”Systematic planning for digital preservation: evaluating potential strategies and building preservation plans”, International Journal on Digital Libraries, 10(4), 133‐157. CCSDS. (2012) Reference model for an Open Archival Information System (OAIS) Magenta Book CCSDS 650.0‐M‐2. [online] Available at: [Accessed 22 May 2014] CloudHarmony, [online] Available at:[Accessed 22 May 2014] Duranti, L. (2005) “The long‐term preservation of accurate and authentic digital data: the INTERPARES project”, Data Science Journal, 4(25), 106‐118. DuraSpace. DuraCloud. [online] Available at: [Accessed 22 May 2014] Edelstein, O., Factor, M., King, R., Risse, T., Salant, E., & Taylor, P. (2011). ”Evolving domains, problems and solutions for long term digital preservation”. Proceedings of iPRES, 2011. ENSURE. Enabling kNowledge Sustainability Usability and Recovery for Economic value. [online] Available at: [Accessed 22 May 2014] ENSURE evaluation, Activity V Evaluation and conclusion, [online] Available at:[Accessed 22 May 2014] ENSURE scenarios, Activity V Scenario definitions, [online] Available at:[Accessed 22 May 2014] Farquhar, A., and Hockx‐Yu, H. (2008) ”Planets: Integrated services for digital preservation. Serials”, The Journal for the Serials Community, 21(2), 140‐145. Garg, S. K., Versteeg, S., & Buyya, R. (2013) “A framework for ranking of cloud computing services”, Future Generation Computer Systems, 29(4), 1012‐1023.
Hogan, M., Liu, F., Sokol, A., & Tong, J. (2011) ”Nist cloud computing standards roadmap”, NIST Special Publication, 35. Li, A., Yang, X., Kandula, S., & Zhang, M. (2010) ”CloudCmp: comparing public cloud providers”, In Proceedings of the 10th ACM SIGCOMM conference on Internet measurement, pp. 1‐14. McHugh, A., Ross, S., Innocenti, P., Ruusalepp, R., & Hofman, H. (2008), “Bringing self assessment home: repository profiling and key lines of enquiry within DRAMBORA”, In Archiving Conference (Vol. 2008, No. 1, pp. 13‐19). Society for Imaging Science and Technology. Nestor. (2006) Catalogue of Criteria for Trusted Digital Repositories, [online] Available at: [Accessed 22 May 2014] Nilsson, Jörgen (2008) “Preserving useful digital objects for the future” Dissertation, Luleå tekniska universitet. Preservica. World leading digital preservation technology in the cloud. [online] Available at: [Accessed 22 May 2014] Rabinovici‐Cohen, S., Marberg, J., Nagin, K., & Pease, D. (2013) “PDS Cloud: Long term digital preservation in the cloud”, In Cloud Engineering (IC2E), 2013 IEEE International Conference, pp. 38‐45. Ross, Seamus. (2012) “Digital preservation, archival science and methodological foundations for digital libraries”, New Review of Information Networking, 17(1), 43‐68. Saaty, T. L. (2008) “Decision making with the analytic hierarchy process”, International journal of services sciences, 1(1), 83‐ 98. TDR, Trustworthy Digital Repository Checklist (2011) [online] Available at: [Accessed 22 May 2014]
Secure Video Transcoding in Cloud Computing Mohd Rizuan Baharon, Qi Shi, David Llewellyn‐Jones, and Madjid Merabti School of Computing and Mathematical Sciences, Liverpool John Moores University, UK
[email protected] [email protected] D.Llewellyn‐
[email protected] [email protected] Abstract. Video transcoding is one of the services recently made available online, provided by cloud service providers such as Amazon Web Services (AWS). This service enables a user to convert a video from one format to another in a very convenient way. Traditionally, the transcoding process requires a user to buy and manage transcoding software, and to have the right amount of computing resources to run the transcoding job. Using services from the cloud therefore allows a user with limited computing resources and storage space to delegate that heavy job to the cloud. As the cloud has powerful computing resources and massive storage space, the transcoding of a video can be done very efficiently and economically. To transcode a video, all of the video data have to be uploaded to cloud storage. However, outsourcing videos that may contain sensitive information, such as private videos or videos for business purposes, does not guarantee security and privacy, as a third party like the cloud has the ability to access them. To overcome this problem, transcoding a video in encrypted form is required. Currently, a Fully Homomorphic Encryption (FHE) scheme seems to be one of the potential solutions to the stated problem, as such a scheme allows arbitrary computation on encrypted data without decryption. However, existing FHE schemes suffer from efficiency issues in their implementation. Thus, in this paper, we propose a Homomorphic Encryption (HE) scheme based on integers. We design the scheme based on asymmetric encryption that utilizes a public key for encryption and a secret key for decryption. Furthermore, we offer an appropriate technique for transcoding the encrypted video data using the MPEG video compression technique without disclosing any video content to the cloud. In the meantime, we provide a scheme with better efficiency, as all calculations for the transcoding process are executed over the integers. This means that we consider all inputs as integers, perform all operations over the integers, and return the resulting integers. Keywords: Video transcoding, fully homomorphic encryption scheme, cloud computing, asymmetric encryption scheme, the integers, MPEG video compression technique
1. Introduction
With the emerging technology of cloud computing, more and more services are being offered and delivered through the Internet, including video transcoding. Video transcoding techniques are essential solutions for converting a video from one format to another, possibly with different resolutions. Video transcoding can help video content providers avoid preparing all formats of the video content in advance. When a specific format with a particular encoding and resolution is demanded by a customer, video content providers can use transcoding technology to offer a suitable format of the requested video. Transcoding is essential to let the video be stored and carried in a compressed format, so that storage space and communication bandwidth can be scaled down. To provide a transcoding capability, video content providers frequently resort to server technology like the cloud to acquire the necessary computing power and storage space. There are many transcoding solution vendors, such as Amazon Web Services (AWS), Zencoder and Panda, which provide real-time video transcoding services based on cloud computing. Their approach seems reasonable, as video transcoding for a large number of clients needs a great amount of resources such as computing power, memory, and storage space (Ko et al. 2013), (Ashraf et al. 2013), (Jokhio et al. 2013). Many video stream consumers need to view videos through various devices, such as digital TVs, smart phones, and tablets. To satisfy their demands, video content providers need to prepare multiple types of video content for the same video titles, as various video codecs and devices are available and used by consumers (Ko et al. 2013). In addition, the storage of multiple transcoded versions of each source video requires a large amount of disk space. Infrastructure as a Service (IaaS) clouds provide virtual machines for creating a dynamically scalable cluster of servers. Likewise, a cloud storage service may be used to store a large number of transcoded videos (Jokhio et al. 2013).
Prior to transcoding taking place in the cloud, video data need to be uploaded to its storage. However, confidentiality becomes one of the primary concerns if the video data contain private information or video with commercial value. Such an issue can be directly resolved by encryption techniques. However, conventional cryptographic techniques, which in the main aim at encrypting text data, are not very well suited for video data. This is because such techniques cannot process a large amount of video data in real time. Moreover, it is almost impossible to adapt them to special video application paradigms, which pose special requirements that are never encountered when encrypting text data (Liu & Koenig 2010). Research on securing multimedia data is receiving more and more attention from academia as well as from a business perspective. Since the mid-1990s, many research efforts have been devoted to the development of specific video encryption algorithms. Early overviews of these algorithms are given in two research works, by Fuhrt and Kirovski and by Liu and Eskicioglu. These papers qualify them as selective encryption algorithms. This point of view is not entirely right, but it grasps an essential characteristic of most video encryption algorithms: only selected parts of the video stream are encrypted in order to reduce the encryption burden. Since then, a lot of multimedia encryption schemes have been proposed with consideration of the balance between efficiency and security. This is because multimedia data have a large size, and efficiency needs to be prioritized when designing encryption schemes. Even though such criteria have been considered, most of the proposed schemes cannot be implemented as they still have weaknesses and need to be amended. In the meantime, none of the multimedia encryption schemes has applied homomorphic properties in the design of an encryption scheme. To the best of our knowledge, we are the first to apply a homomorphic encryption scheme to allow video transcoding to be processed in the cloud environment. In this paper, we propose a homomorphic encryption scheme for video transcoding. Our system is based on asymmetric encryption that utilizes a multiplicative group over the integers. It has been designed to enable the MPEG compression technique to be processed with encrypted Discrete Cosine Transform coefficients. Such a technique is widely employed in multimedia data for entertainment as well as business purposes. The rest of this paper is organized as follows. We first briefly describe the background of related work in Section 2. Section 3 reviews the details of the MPEG video compression technique used to compress a video into the MPEG format. We then explain the system fundamentals, requirements and other details in Section 4. In Section 5, we provide the security details of the proposed system. Section 6 explains the secure process of video transcoding using the proposed system in the cloud environment. Finally, the conclusion and further work are given in Section 7.
2. Background
2.1 Security of Multimedia Data and Cloud Computing
Saeed B. and Majid N. have proposed the use of a simple and lightweight stream cipher algorithm to secure multimedia data, taking into consideration that such data consist of an excess volume of information and need to be used in real time (Ashraf et al. 2013). To secure the data by means of encryption, additional computation is needed. Thus, their aim is to find a balance between security and the necessity of synchronization. This form of encryption was proved to be secure by Claude E. Shannon in 1949, but the key stream must be generated completely at random, with at least the same length as the plaintext, and cannot be used more than once. Such requirements make the scheme hard to implement in practice, and as a result such schemes have not been widely used except for the most critical applications (Bahrami & Naderi 2013). Furthermore, according to the survey by Fuwen L. and Hartmut K., many encryption algorithms have been proposed that operate after compression, but only two of them operate before compression: the Pazarci-Dipcin scheme and the correlation-preserving video encryption scheme. However, both of them were proven to be insufficiently secure, where the former is not secure against brute force attacks and known- or chosen-plaintext attacks, while the latter is not secure against known-plaintext attacks. Moreover, the latter scheme has a great limitation, as it is merely applicable to video codecs that use only intra-frame technology, such as M-JPEG. It cannot be deployed for the widely used video codecs that apply hybrid coding technologies, such as MPEG-2 and H.264 (Liu & Koenig 2010).
In other research works, several approaches have been proposed to avoid decryption of protected multimedia content at mid-network nodes. Mou et al. have designed a secure media streaming mechanism by making use of existing, highly studied cryptographic techniques. They have proposed a secure media streaming mechanism which combines encryption, authentication, and transcoding to address content protection, sender authentication, and media adaptation, respectively and coherently. However, their scheme cannot be implemented in a cloud computing environment, as they assumed mid-network proxies are trusted devices, so decryption can also be done on mid-network proxies for the purpose of transcoding. This contradicts our assumption, as the cloud service provider is an untrusted party. The cloud providers are responsible only for the transcoding job and are not supposed to see the content of the video they process (Mou et al. 2009).
2.2 Fully Homomorphic Encryption Scheme and its Efficiency
Computation of arbitrary functions on encrypted data is a desired capability for many online service providers, like cloud service providers, as it offers tremendous benefits to their business and to the users as well. By enabling computation on encrypted data, many applications can be outsourced to the cloud to process encrypted data without decryption. Such a process can guarantee the security and privacy of the data processed by those applications. With the advent of cloud technology, many organizations have started to think about moving their in-house business operations to the cloud. This is because the cloud provides huge advantages, including ample computing resources and storage space on a pay-as-you-use basis (Marston 2011). In addition, these services can be leveraged in a very convenient way with minimum IT facility requirements, such as desktop machines and the Internet. In order to fully leverage services from the cloud, a user needs to outsource its data to cloud storage. Nevertheless, outsourcing information that may contain private data raises security and privacy concerns, as the safety of the data is not guaranteed (Zissis & Lekkas 2012). Thus, to preserve the security and privacy of the data, computation of arbitrary functions on encrypted data is certainly required, which enables the user to perform operations on its data without revealing the data contents to the cloud. To achieve such a requirement, a lot of research has been conducted from various perspectives, such as theoretical research and practical applications, to enable arbitrary functions to be computed on encrypted data (Tibouchi 2014), (Kim et al. 2013), (Gentry & Halevi 2011), (Brakerski 2012), (Fan & Vercauteren 2012), (Sahai 2008). Research on computation on encrypted data started over 30 years ago with the work of Rivest, Adleman, and Dertouzos, following the invention of the RSA cryptosystem (Davis 1987). Even though the RSA scheme has a special feature, being homomorphic under multiplication, it is still unable to support the computation of arbitrary functions on encrypted data; to achieve that, an encryption scheme must support homomorphism under both addition and multiplication (Gentry 2010). Since then, a lot of methods have been proposed, but they were still only partially homomorphic, or fully homomorphic only under certain conditions (Boneh 2005), (Naehrig et al. 2011). This continued until Gentry found a homomorphic scheme that supports both additions and multiplications, so as to enable computation of arbitrary functions on encrypted data. The proposed scheme is called fully homomorphic encryption (FHE) over lattices (Gentry 2009). Since then, all the proposed FHE schemes have followed the blueprint of Gentry's scheme (Smart & Vercauteren 2010). His achievement has proven that computation on arbitrary encrypted data is achievable and has provided a new direction for other researchers to conduct research on computation over encrypted data. Even though computation on encrypted data is achievable and many optimizations have been made to improve the existing schemes, their implementation in any practical system is still far from practical, as efficiency is the biggest obstacle. According to (Gentry & Halevi 2011), (Brakerski 2012), and (Brakerski et al. 2012), this efficiency problem comes from various aspects, such as the key generation algorithm, the noise growth, and the technique used to reduce the noise. For example, in (Dijk et al.
2010), the public key is generated with high complexity, making it impractical to implement in any practical system (Mandal et al. 2011). Furthermore, as all of the existing schemes are based on the blueprint of Gentry's scheme, each ciphertext generated has noise attached for security reasons. When the ciphertext is processed, the noise grows, and once it exceeds the size of the secret key, i.e. the threshold, the decryption will be wrong. To remedy this problem, squashing can be used to reduce the noise. However, this technique is time-consuming and needs to be minimized or omitted, as proposed in (Tibouchi 2014), (Kim et al. 2013), (Brakerski 2012).
3. MPEG Video Compression Technique
Raw video contains a large amount of data. On the other hand, communication and storage capabilities are limited and thus expensive. For example, a given HD video signal might have 720 × 1280 pixels per frame at 24 bits per pixel and a playback speed of 40 frames/sec, and this produces an information flow of:

$$720 \times 1280 \times 24 \times 40 \approx 884.74 \text{ Mbit/s}. \qquad (1)$$

For a channel with a bandwidth of 50 Mbit/s, this requires the video to be compressed by a factor of about 18. The way this is achieved is through video compression. Video compression is done through the reduction of redundancy and irrelevancy.
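As a quick check of Equation (1), the raw bit rate and the required compression factor can be computed directly. Note that the 24 bits per pixel colour depth is an assumption implied by the 884.74 Mbit/s figure rather than a parameter stated explicitly in the text.

```python
width, height = 1280, 720      # pixels per frame
bits_per_pixel = 24            # assumed colour depth (implied by the 884.74 Mbit/s figure)
fps = 40                       # frames per second

raw_rate = width * height * bits_per_pixel * fps   # bits per second
channel = 50e6                                      # 50 Mbit/s channel

print(raw_rate / 1e6)          # 884.736 Mbit/s
print(raw_rate / channel)      # ~17.7, i.e. compression by a factor of about 18
```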
3.1 The Overview
The MPEG bit-stream structure can be expressed in an abstract way as in Figure 1. The figure shows the bit-stream structure that results from video compression algorithms. The 8x8 block values are coded by means of the discrete cosine transform.
Figure 1: MPEG Codecs video in a hierarchy of layers (Deneke 2011)
3.2 The DCT and IDCT
The Discrete Cosine Transform (DCT) is a technique used to compress video into the MPEG format. The DCT is one of the most popular transforms used in multimedia compression. It is an orthogonal transform without complex computation, whose inverse can be computed efficiently. For highly correlated image data, the DCT provides efficient compaction and has the property of separability. According to Equation (2), in the two-dimensional case the DCT operates on an $N$ by $N$ block of pixels $f(x,y)$, and its output is an $N$ by $N$ block of coefficients $F(u,v)$:

$$F(u,v) = \frac{2}{N}\,C(u)\,C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y)\,\cos\frac{(2x+1)u\pi}{2N}\,\cos\frac{(2y+1)v\pi}{2N}, \qquad (2)$$

with

$$C(w) = \frac{1}{\sqrt{2}} \ \text{if } w = 0, \qquad C(w) = 1 \ \text{if } w > 0.$$

From Equation (2), $f(x,y)$ is the brightness of the pixel at position $(x,y)$, and $F(u,v)$ is the set of $N$ by $N$ coefficients representing the data in the transformed matrix at position $(u,v)$. A set of waveforms is defined for each possible value of $N$ (usually 8, thus there exist 64 waveforms). Each coefficient can be seen as the weight of each of these basis patterns or waveforms. By summing all the waveforms scaled by the corresponding weight, the original data can be recovered (Bahrami & Naderi 2013).
The Inverse Discrete Cosine Transform (IDCT) formula is given in Equation (3):

$$f(x,y) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u)\,C(v)\,F(u,v)\,\cos\frac{(2x+1)u\pi}{2N}\,\cos\frac{(2y+1)v\pi}{2N}, \qquad (3)$$

where $F(u,v)$ is the transform matrix value at position $(u,v)$ and $f(x,y)$ is the original pixel of the video content, as described above. The IDCT is used by the decoder to reconstruct the pixel values of the compressed video.
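For illustration, the sketch below implements Equations (2) and (3) directly for an 8 × 8 block and checks that the IDCT recovers the original pixels. This direct O(N^4) form is shown only to make the formulas concrete; real codecs use fast, factorized DCT implementations.

```python
import math
import random

N = 8

def c(w):
    # Normalization term C(w) from Equations (2) and (3).
    return 1 / math.sqrt(2) if w == 0 else 1.0

def dct2(block):
    """Direct 2-D DCT of an N x N block, following Equation (2)."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = (2.0 / N) * c(u) * c(v) * s
    return out

def idct2(coeffs):
    """Inverse 2-D DCT, following Equation (3)."""
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            s = 0.0
            for u in range(N):
                for v in range(N):
                    s += (c(u) * c(v) * coeffs[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[x][y] = (2.0 / N) * s
    return out

pixels = [[random.randint(0, 255) for _ in range(N)] for _ in range(N)]
restored = idct2(dct2(pixels))
assert all(abs(pixels[x][y] - restored[x][y]) < 1e-6 for x in range(N) for y in range(N))
```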
4. The system The system is constructed upon some fundamentals and requirements as explained below.
4.1 Multiplicative Group over the Integers
The system uses a multiplicative group over the integers. The group can be explained as follows:
1. Let $n$ be a positive integer. $\mathbb{Z}_n = \{0, 1, 2, \ldots, n-1\}$ is the set of all non-negative integers less than $n$.
2. $\mathbb{Z}_n^*$ consists of all elements of $\mathbb{Z}_n$ which have a multiplicative inverse modulo $n$, i.e. $\mathbb{Z}_n^* = \{a \in \mathbb{Z}_n : ab \equiv 1 \pmod{n} \text{ for some } b \in \mathbb{Z}_n\}$.
Example: Consider the subset $\mathbb{Z}_9^* = \{1, 2, 4, 5, 7, 8\}$ of $\mathbb{Z}_9$. $\mathbb{Z}_9^*$, together with multiplication modulo 9, forms a group of 6 elements. This is shown in Table 1.
Table 1: The multiplicative group $\mathbb{Z}_9^*$
·  1  2  4  5  7  8
1  1  2  4  5  7  8
2  2  4  8  1  5  7
4  4  8  7  2  1  5
5  5  1  2  7  8  4
7  7  5  1  8  4  2
8  8  7  5  4  2  1
Proposition 1: For every $n \in \mathbb{N}$, the set $\mathbb{Z}_n^*$ forms a group under multiplication modulo $n$.
Proof: Suppose that $a, b \in \mathbb{Z}_n^*$. Then there exist $c, d \in \mathbb{Z}_n$ such that $ac \equiv 1 \pmod{n}$ and $bd \equiv 1 \pmod{n}$. Clearly $(ab)(cd) \equiv 1 \pmod{n}$, hence $ab \in \mathbb{Z}_n^*$.
Proposition 2: For every $n \in \mathbb{N}$, $\mathbb{Z}_n^* = \{a \in \mathbb{Z}_n : \gcd(a, n) = 1\}$.
Proof: If $\gcd(a, n) = 1$, then there exist $s, t \in \mathbb{Z}$ such that $as + nt = 1$, so that $as \equiv 1 \pmod{n}$ and hence $a \in \mathbb{Z}_n^*$. On the other hand, if $\gcd(a, n) = d > 1$, then for any $b \in \mathbb{Z}_n$, $ab \bmod n \in \{0, d, 2d, \ldots, n-d\}$, so that $ab \not\equiv 1 \pmod{n}$, hence $a \notin \mathbb{Z}_n^*$.
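The group Z_9* in Table 1 can be verified mechanically. The short sketch below lists the invertible residues modulo an arbitrary n (Proposition 2), checks closure under multiplication modulo n (Proposition 1), and reprints the multiplication table for n = 9.

```python
import math

def units(n):
    """Elements of Z_n*, i.e. the residues coprime to n (Proposition 2)."""
    return [a for a in range(1, n) if math.gcd(a, n) == 1]

n = 9
zn_star = units(n)
print(zn_star)                      # [1, 2, 4, 5, 7, 8]

# Closure: the product of any two units modulo n is again a unit (Proposition 1).
assert all((a * b) % n in zn_star for a in zn_star for b in zn_star)

# Reproduce Table 1, the multiplication table of Z_9*.
for a in zn_star:
    print(a, [(a * b) % n for b in zn_star])
```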
4.2 The Subgroup Decision Problem
Let $\lambda$ be a security parameter (a positive integer) and let $(p, q, \mathbb{Z}_n^*)$ be a tuple produced by an algorithm $\mathcal{G}(\lambda)$, where $n = pq$. The subgroup decision problem can be stated as follows: given $(n, \mathbb{Z}_n^*)$ and an element $x \in \mathbb{Z}_n^*$, output '1' if the order of $x$ is $p$ and output '0' otherwise; that is, without knowing the factorization of the group order $n$, decide whether the element $x$ is in a subgroup of $\mathbb{Z}_n^*$. This problem is named the subgroup decision problem. For an algorithm $\mathcal{A}$, the advantage of $\mathcal{A}$ in solving the subgroup decision problem is defined as:

$$\mathrm{Adv}_{\mathcal{A}}(\lambda) = \Bigl| \Pr\bigl[\mathcal{A}(n, \mathbb{Z}_n^*, x) = 1 : (p, q, \mathbb{Z}_n^*) \leftarrow \mathcal{G}(\lambda),\ n = pq,\ x \leftarrow \mathbb{Z}_n^*\bigr] - \Pr\bigl[\mathcal{A}(n, \mathbb{Z}_n^*, x) = 1 : (p, q, \mathbb{Z}_n^*) \leftarrow \mathcal{G}(\lambda),\ n = pq,\ x \leftarrow \text{the subgroup of } \mathbb{Z}_n^* \text{ of order } p\bigr] \Bigr|.$$

Definition 1: The algorithm $\mathcal{G}$ satisfies the subgroup decision assumption if, for any polynomial time algorithm $\mathcal{A}$, $\mathrm{Adv}_{\mathcal{A}}(\lambda)$ is a negligible function in $\lambda$ (Boneh 2005).
4.3 The System Algorithm
The system is built upon four algorithms, which can be explained as below.
KeyGen($\lambda$): Given a security parameter $\lambda \in \mathbb{Z}^+$, run $\mathcal{G}(\lambda)$ to obtain a tuple $(p, q, \mathbb{Z}_n^*)$. Suppose $n = pq$ and $p, q$ are large random prime numbers. Select two random generators $g, u \in \mathbb{Z}_n^*$ and set $h = u^q$. Then $h$ is a random generator of the subgroup of $\mathbb{Z}_n^*$ of order $p$. The public key is $PK = (n, \mathbb{Z}_n^*, g, h)$, and the secret key is $SK = p$.
Encrypt($PK$, $m$): Let $m \in \{0, 1, \ldots, T\} \cap \mathbb{Z}$ be the message. To encrypt the message $m$ using the public key $PK$, select a random $r \in \{0, 1, \ldots, n-1\}$ and compute
$$C = g^m h^r \bmod n. \qquad (4)$$
Output $C$ as the ciphertext.
Decrypt($SK$, $C$): To decrypt a ciphertext $C$ using the private key $SK = p$, observe that
$$C^p = (g^m h^r)^p = (g^p)^m \bmod n. \qquad (5)$$
Let $\bar{g} = g^p$. To recover $m$, compute the discrete logarithm of $C^p$ base $\bar{g}$. Since $0 \le m \le T$, this takes expected time $O(\sqrt{T})$ using Pollard's Kangaroo Method (Boneh 2005).
Homomorphic properties: A scheme $\mathcal{E}$ is said to be homomorphic under an operation $*$ if
$$\mathcal{E}(m_1) * \mathcal{E}(m_2) = \mathcal{E}(m_1 * m_2). \qquad (6)$$
Theorem 1 (Homomorphic under the addition operation): The system is homomorphic under the addition operation.
Proof: Let $(n, \mathbb{Z}_n^*, g, h)$ be a public key. Given encryptions $C_1, C_2 \in \mathbb{Z}_n^*$ of messages $m_1, m_2 \in \{0, 1, \ldots, T\}$ respectively, these messages can be added together by computing $C = C_1 C_2 \bmod n$. According to Equation (4), $C_1 = g^{m_1} h^{r_1} \bmod n$ and $C_2 = g^{m_2} h^{r_2} \bmod n$. Then
$$C_1 C_2 = g^{m_1} h^{r_1} \cdot g^{m_2} h^{r_2} \bmod n = g^{m_1 + m_2} h^{\tilde{r}} \bmod n,$$
where $\tilde{r} = r_1 + r_2 \bmod n$; thus the system is homomorphic under the addition operation.
Evaluate($PK$, $C_1$, $C_2$, $\ldots$, $C_k$): To evaluate the system over the addition operation, let $C_1, C_2, \ldots, C_k \in \mathbb{Z}_n^*$ be ciphertexts of plaintexts $m_1, m_2, \ldots, m_k$ respectively. Then
$$C = \prod_{i=1}^{k} C_i \bmod n = g^{\sum_{i=1}^{k} m_i} h^{\tilde{r}} \bmod n, \quad \text{where } \tilde{r} = \sum_{i=1}^{k} r_i \bmod n.$$
The result of the summation $\sum_{i=1}^{k} m_i$ can be obtained as below:
$$C^p = (g^p)^{\sum_{i=1}^{k} m_i} \bmod n.$$
Taking the discrete log base $\bar{g} = g^p$ on both sides of the equation gives
$$\sum_{i=1}^{k} m_i = \log_{\bar{g}}\bigl(C^p \bmod n\bigr).$$
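The additive homomorphism in Equations (4)-(6) can be illustrated with the well-known "exponential ElGamal" construction, which, like the scheme above, keeps the message in the exponent and finishes decryption with a small discrete-logarithm search (compare the Pollard kangaroo remark). This is an analogy chosen for a compact, runnable sketch, not the scheme proposed in this paper, and the parameters are toy-sized rather than secure.

```python
import random

# Toy "exponential ElGamal": an additively homomorphic scheme in which the
# plaintext sits in the exponent. Illustration only; parameters are not secure.
P = 2 ** 61 - 1        # prime modulus (toy size)
G = 3                  # fixed base in Z_P*

def keygen():
    x = random.randrange(2, P - 1)          # secret key
    return pow(G, x, P), x                  # public key y = G^x, secret x

def encrypt(y, m):
    r = random.randrange(2, P - 1)
    return pow(G, r, P), (pow(G, m, P) * pow(y, r, P)) % P   # (G^r, G^m * y^r)

def add(c1, c2):
    # Component-wise multiplication of ciphertexts adds the plaintexts.
    return (c1[0] * c2[0]) % P, (c1[1] * c2[1]) % P

def decrypt(x, c, bound=10_000):
    gm = (c[1] * pow(c[0], P - 1 - x, P)) % P   # recover G^m = c2 / c1^x
    for m in range(bound):                      # small discrete-log search
        if pow(G, m, P) == gm:
            return m
    raise ValueError("message outside search bound")

y, x = keygen()
c = add(encrypt(y, 15), encrypt(y, 27))
assert decrypt(x, c) == 42
```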
5. System Security
The system is secure based on the hardness of the subgroup decision problem. The proof is given below.
Theorem 2: The system is semantically secure assuming $\mathcal{G}$ satisfies the subgroup decision assumption.
Proof: Suppose a polynomial time algorithm $\mathcal{B}$ breaks the semantic security of the system with advantage $\epsilon(\lambda)$. An algorithm $\mathcal{A}$ that breaks the subgroup decision assumption with the same advantage is constructed as follows. Given $(n, \mathbb{Z}_n^*, x)$ as input, algorithm $\mathcal{A}$ works as below:
1. $\mathcal{A}$ picks a random generator $g \in \mathbb{Z}_n^*$ and gives algorithm $\mathcal{B}$ the public key $(n, \mathbb{Z}_n^*, g, x)$.
2. Algorithm $\mathcal{B}$ outputs two messages $m_0, m_1 \in \{0, 1, \ldots, T\}$, to which $\mathcal{A}$ responds with the ciphertext $C = g^{m_b} x^r \in \mathbb{Z}_n^*$ for a random $b \leftarrow \{0, 1\}$ and a random $r \leftarrow \{0, 1, \ldots, n-1\}$.
3. Algorithm $\mathcal{B}$ outputs its guess $b' \in \{0, 1\}$ for $b$. If $b' = b$, algorithm $\mathcal{A}$ outputs 1 (meaning $x$ is uniform in a subgroup of $\mathbb{Z}_n^*$); otherwise it outputs 0 (meaning $x$ is uniform in $\mathbb{Z}_n^*$).
It is easy to see that when $x$ is uniform in $\mathbb{Z}_n^*$, the challenge ciphertext is uniformly distributed in $\mathbb{Z}_n^*$ and is independent of the bit $b$. Hence, in this case $\Pr[b = b'] = 1/2$. On the other hand, when $x$ is uniform in the order-$p$ subgroup of $\mathbb{Z}_n^*$, the public key and challenge given to $\mathcal{B}$ are as in a real semantic security game. In this case, by the definition of $\mathcal{B}$, we know that $|\Pr[b = b'] - 1/2| \ge \epsilon(\lambda)$. It now follows that $\mathcal{A}$ distinguishes the two distributions and hence breaks the subgroup decision assumption with advantage $\epsilon(\lambda)$, as required (Boneh 2005).
6. Secure Video Transcoding in the Cloud
6.1 Computation on Encrypted Floating Point Numbers
Most existing encryption systems are incompatible with floating point numbers, mainly when the cryptosystem uses a modulo operation over the integers. This is because a modulo operation over the integers always returns an integer output. In our proposed system, we offer a way to compute with floating point numbers by applying the approach described below.
Multiplication of a floating point number with encrypted data
Suppose an integer m is encrypted using the proposed scheme as C = E(m). To multiply the ciphertext C with a fraction f, the steps below need to be observed:
1. Determine the precision of the computation. For instance, let the precision be p.
2. Represent f as a floating point number with p-digit precision, f ≈ 0.f_1 f_2 … f_p.
3. Multiply 0.f_1 f_2 … f_p by 10^p to convert it into integer form, f_1 f_2 … f_p = f · 10^p ∈ Z.
4. Multiply the ciphertext C with the integer f · 10^p. All operations are executed modulo n for security reasons.
5. Decrypt the result using the scheme's decryption algorithm D.
6. Multiply the output by 1/10^p to retrieve the final result, as shown in Equation (7):
m · f = D(C′) · (1/10^p),   (7)
where C′ denotes the ciphertext obtained in step 4.
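A minimal Python sketch of steps 1–6 is given below. It assumes that "multiplying the ciphertext by the integer" is realized by raising the ciphertext to that integer power (k-fold homomorphic addition, the natural realization in this additively homomorphic scheme), and it relies on hypothetical encrypt/decrypt helpers such as those sketched for Section 4.3; none of these names come from the paper itself.

```python
# Toy sketch: multiplying an encrypted integer by a fraction f with precision p.
# Assumes an additively homomorphic scheme where Enc(m1)*Enc(m2) = Enc(m1+m2) mod n,
# so that Enc(m)^k = Enc(k*m); encrypt/decrypt and the modulus are stand-ins.

def scalar_multiply_encrypted(ciphertext, f, p, modulus):
    """Return a ciphertext of m * round(f * 10^p), given ciphertext = Enc(m)."""
    k = int(round(f * 10 ** p))         # step 3: convert the fraction to an integer
    return pow(ciphertext, k, modulus)  # step 4: k-fold homomorphic addition of m

def recover_result(decrypted_value, p):
    """Step 6: rescale the decrypted integer back to a floating point result."""
    return decrypted_value / 10 ** p

# Usage (with hypothetical encrypt/decrypt helpers and public modulus):
#   C  = encrypt(pk, 12)                              # m = 12
#   C2 = scalar_multiply_encrypted(C, 0.25, 2, modulus)
#   print(recover_result(decrypt(pk, sk, C2), 2))     # -> 3.0
```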
6.2 Secure MPEG Video Compression Technique
As described in subsection 6.1, the proposed approach allows the MPEG video compression technique to be carried out by the cloud securely, without revealing any content of the video to the cloud providers. To illustrate how the process works in ciphertext form, let us consider the following computation on a frame block of size 8×8 pixels, as widely used in the MPEG compression technique. In this case, consider the 8×8 bar diagram and black‐white frame shown in Figure 2.
Figure 2: Bar diagram and black‐white frame, together with the corresponding 8×8 matrix block of pixel brightness values f(i, j), i, j = 0, …, 7
As shown in Figure 2, the bar diagram and the black‐white frame are normally used to represent the brightness of each pixel in an 8×8 block. Both of the diagrams can be represented as a matrix block [f(i, j)]. In order to compress the matrix block, the DCT formula is applied. The first element in the compressed block is computed as below. Since N = 8, u = 0 and v = 0, Equation (2) can be expressed as
F(0,0) = (1/4) ∑_{i=0}^{7} ∑_{j=0}^{7} f(i, j).
As the formula contains a fraction (1/4), the proposed approach described in subsection 6.1 has to be applied to convert the fraction into an integer. Then the result is represented as below:
F(0,0) · 10^λ = ∑_{i=0}^{7} ∑_{j=0}^{7} f(i, j) · (0.25 · 10^λ) = 25 · ∑_{i=0}^{7} ∑_{j=0}^{7} f(i, j), where λ = 2.
Prior to the compression process taking place, every element f(i, j) in the matrix block needs to be encrypted as c(i, j) = E(f(i, j)) for security reasons. Once encrypted, the first compressed coefficient in encrypted form, c_F(0,0), can be computed as below:
c_F(0,0) = ∏_{i=0}^{7} ∏_{j=0}^{7} c(i, j)^{0.25·10^2} = ∏_{i=0}^{7} ∏_{j=0}^{7} c(i, j)^{25} mod n.
The process is repeated until all the data in the block are completely compressed. For validation purposes, the decryption algorithm has to be applied to the compressed data in ciphertext form. The result c_F(0,0) is decrypted as below:
D(c_F(0,0)) = 25 · ∑_{i=0}^{7} ∑_{j=0}^{7} f(i, j) = F(0,0) · 10^2.
Then, the result F(0,0) · 10^2 is divided by 10^2:
F(0,0) = D(c_F(0,0)) / 10^2.
To validate this result, it has to be compared with the compression result F(0,0) computed in plaintext form.
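A short Python sketch of this encrypted computation of the first coefficient follows, reusing the hypothetical encrypt/decrypt helpers and the exponentiation-based scalar multiplication assumed above. Note that the group parameters must be chosen so that 25·∑ f(i, j) stays within the message bound T; the tiny toy primes of the earlier sketch are not large enough for a full 8×8 block of pixel values.

```python
import random

# Toy check of the encrypted computation of F(0,0) for one 8x8 block, assuming
# the additively homomorphic scheme of Section 4.3 with helpers encrypt(pk, m)
# and decrypt(pk, sk, c, T), and a public group modulus `modulus`.
# NOTE: q2 (and hence T) must exceed 25 * sum of the block; tiny toy primes will not do.

def encrypted_dct_dc(block, pk, sk, modulus, scale=25, lam=2):
    # Encrypt every pixel: c(i, j) = E(f(i, j)).
    enc = [[encrypt(pk, pixel) for pixel in row] for row in block]

    # c_F(0,0) = prod_{i,j} c(i,j)^25 mod n   (scalar 25 = 0.25 * 10^lambda).
    c_f00 = 1
    for row in enc:
        for c in row:
            c_f00 = (c_f00 * pow(c, scale, modulus)) % modulus

    # D(c_F(0,0)) = 25 * sum f(i,j) = F(0,0) * 10^lambda, then rescale by 10^lambda.
    total = decrypt(pk, sk, c_f00, T=scale * sum(map(sum, block)))
    return total / 10 ** lam

# block = [[random.randint(0, 255) for _ in range(8)] for _ in range(8)]
# print(encrypted_dct_dc(block, pk, sk, modulus))   # should match the plaintext value:
# print(0.25 * sum(map(sum, block)))
```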
7. Conclusion
We have proposed a new technique that allows video transcoding to be performed securely in the cloud environment. The technique utilizes the homomorphic property of a multiplicative group over the integers, which enables multimedia data to be added in encrypted form without decryption. Such a salient feature is extremely useful because it allows cloud providers to do the transcoding job without being able to see the video data they process. The security of the system is based on the subgroup decision problem. With the selection of suitable parameters and a strong enough encryption key, the system is robust enough to protect multimedia data while it is processed in the cloud environment. In order to realize such a system, we have also proposed a new technique to deal with floating point numbers, which is essential because video transcoding involves floating point computation. For future work, we will analyse the performance of the system using Matlab simulation. The results, mainly the efficiency, will be compared to other existing systems in order to obtain a better encryption system.
References Ashraf, A. et al., 2013. Stream‐Based Admission Control and Scheduling for Video Transcoding in Cloud Computing. 2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, pp. 482–489. Bahrami, S. & Naderi, M., 2013. Encryption of multimedia content in partial encryption scheme of DCT transform coefficients using a lightweight stream algorithm. Optik ‐ International Journal for Light and Electron Optics, 124 (18), pp. 3693–3700. Boneh, D., 2005. Evaluating 2‐DNF Formulas on Ciphertexts. In Second Theory of Cryptography Conference, TCC 2005, Cambridge Proceedings, pp. 325–341. Brakerski, Z., 2012. Fully Homomorphic Encryption without Modulus Switching from Classical GapSVP. IACR Cryptology ePrint Archive, 78. Brakerski, Z., Gentry, C. & Vaikuntanathan, V., 2012. (Leveled) fully homomorphic encryption without bootstrapping. Proceedings of the 3rd Innovations in Theoretical Computer Science Conference on ‐ ITCS ’12, pp. 309–325. Davis, G., 1987. Processing Encrypted Data. Communications of the ACM., 30 (9), pp. 777–780. Deneke, T., 2011. Scalable Distributed Video Transcoding Architecture. Abo Akademi University. Dijk, M. Van et al., 2010. Fully Homomorphic Encryption over the Integers. In IACR Cryptology ePrint Archive, pp. 1–28. Fan, J. & Vercauteren, F., 2012. Somewhat Practical Fully Homomorphic Encryption. In IACR Cryptology ePrint Archive. Gentry, C., 2009. A Fully Homomorphic Encryption Scheme. Stanford University. Gentry, C., 2010. Computing arbitrary functions of encrypted data. Communications of the ACM, 53 (3), p.97. Gentry, C. & Halevi, S., 2011. Fully Homomorphic Encryption without Squashing Using Depth‐3 Arithmetic Circuits. 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science, pp. 107–109. Jokhio, F. et al., 2013. A Computation and Storage Trade‐off Strategy for Cost‐Efficient Video Transcoding in the Cloud. 2013 39th Euromicro Conference on Software Engineering and Advanced Applications, pp. 365–372. Kim, J. et al., 2013. CRT‐based Fully Homomorphic Encryption over the Integers. Cryptology ePrint Archive, Report 2013/057. http://eprint.iacr.org/, pp. 1–18. Ko, S., Park, S. & Han, H., 2013. Design analysis for real‐time video transcoding on cloud systems. Proceedings of the 28th Annual ACM Symposium on Applied Computing ‐ SAC ’13, p. 1610. Liu, F. & Koenig, H., 2010. A survey of video encryption algorithms. Computers & Security, 29 (1), pp. 3–15. Mandal, A., Naccache, D. & Tibouchi, M., 2011. Fully Homomorphic Encryption over the Integers with Shorter Public Keys. Advances in Cryptology–CRYPTO 2011, Springer Berlin Heidelberg., pp. 1–24. Marston, S., 2011. Cloud Computing – The Business Perspective. Sciences‐New York, pp. 1–11. Mou, L. et al., 2009. A secure media streaming mechanism combining encryption, authentication, and transcoding. Signal Processing: Image Communication, 24 (10), pp. 825–833. A Naehrig, M., Lauter, K. & Vaikuntanathan, V., 2011. Can homomorphic encryption be practical? Proceedings of the 3rd ACM workshop on Cloud computing security workshop ‐ CCSW ’11, p. 113. Sahai, A., 2008. Computing on Encrypted Data. Information Systems Security, Springer Berlin Heidelberg., pp. 148–153. Smart, N.P. & Vercauteren, F., 2010. Fully Homomorphic Encryption with Relatively Small Key and Ciphertext Sizes. Lecture Notes in Computer Science, 6056, pp. 420–443. Tibouchi, M., 2014. Scale‐Invariant Fully Homomorphic Encryption over the Integers. Cryptology ePrint Archive: Report 2014/032, pp. 1–18. Zissis, D. & Lekkas, D., 2012. 
Addressing cloud computing security issues. Future Generation Computer Systems, 28 (3), pp. 583–592.
A Conceptual Model of a Trustworthy Voice Signature Terminal Megan Boggess1 3, Brooke Brisbois2 3, Nicolai Kuntze4 and Barbara Endicott‐Popovsky2 3 1 Institute of Technology, UW Tacoma 2 iSchool, UW Seattle 3 Center for Information Assurance & Cybersecurity, UW 4 Fraunhofer‐Institute for Secure Information Technology SIT, Darmstadt, Germany
[email protected] [email protected] [email protected] [email protected] Abstract: Organizations often transmit phone conversations via private business exchanges (PBXs) using Voice over Internet Protocol (VoIP), which is evermore frequently available via cloud computing. Conversations that utilize this technology are easily recorded, and consequently can become digital artifacts that are available for use as forensic evidence. However, their soundness as evidence could be called into question because these artifacts can be manipulated. As a solution to this problem, we present a concept to achieve non‐repudiation for natural‐language conversations by electronically signing packet‐based, digital, voice communication. A digital, voice communication is made up of two parts: a bidirectional data stream and a temporal sequence. Ensuring the security of such a communication involves protecting both its integrity and authenticity. This is achieved using a signature, key‐based approach that is conceptually close to the protocols already inherent to VoIP, allowing it to be interoperable with existing VoIP infrastructures. Additionally, in order to develop a fully trustworthy voice signature terminal, we incorporate various design principles defined by the Trusted Computing Group (TCG) with regard to both software and hardware. By signing both the data streams and the temporal sequence of a communication, various attacks are mitigated. These include tampering with the content of the conversation in order to change the meaning; deliberately suppressing packets; falsely claiming that packets did not arrive; and finally sending dual data streams with the intention to replace or invalidate data. A trustworthy voice signature terminal will provide complete non‐repudiation of conversations by protecting the integrity of voice conversations, authenticating speakers, and electronically signing voice communications. Keywords: Electronic signature; non‐repudiation; voice over IP; interval signature; cryptographic chaining; natural‐ language communication; Trusted Computing Group (TCG)
1. Introduction The current trend for businesses is to move operations to the cloud, utilizing it for any number of capabilities. Internet‐based telephony is not immune to this trend, which continues to expand based on the introduction of unified communications to 75 percent of enterprises (Wlodarz 2013). Voice over internet protocol (VoIP) is the lynchpin of unified communications, allowing for the use of VoIP desk phones, mobile VoIP, and other business‐grade VoIP solutions that will only continue to grow. However, as the use of VoIP solutions continues to expand, so will the security threats associated with the technology–even more so when they are moved to the cloud, as many business solutions are these days. Most of these threats target voice conversations that are communicated over VoIP, where there are many ways an attacker can manipulate these conversations to their liking. In general, VoIP conversations are more amenable to attack because of their digital nature, unlike traditional analog voice communications, which require that the attacker have physical access to the transport medium. The data transmitted across digital networks can be intercepted, stored, and manipulated at several different points along the logical system, with absolutely no physical access needed (Jones 2005). Additionally, efforts to add security features to VoIP products are generally insufficient or outdated. For example, Special Publication 800‐58: Voice Over IP Security was published by the National Institute of Standards and Technology in 2005, with no changes to date (Kuhn 2005). Considering these facts, a cloud‐hosted VoIP service faces serious threats. In their “VoIP Security and Privacy Threat Taxonomy,” the VoIP Security Alliance (VOIPSA) identifies seven overarching threats that VoIP users must be aware of: social threats, eavesdropping, call pattern tracking, traffic capture, interception and modification, service abuse, and intentional interruption of service (VOIPSA 2005). Of these threats, our paper focuses on the interception and modification of VoIP conversations, which, as defined by VOIPSA, are methods “by which an attacker can see the entire signaling and data stream
between two endpoints, and can also modify the traffic as an intermediary in the conversation” (VOIPSA 2005). While these attacks can be executed via the Internet, our concept is particularly relevant to VoIP conversations that happen via the cloud. As previously stated, many businesses now use Software‐as‐a‐Service (SaaS) VoIP solutions for all voice communications. This is true of our specific use case as well. For example, let us imagine an international bank; we’ll call it Bank of the World. Bank of the World has offices across the globe, in over 30 countries. For a simplified communication solution, they use a single, cloud‐hosted VoIP service for all phone calls. An employee from the New York City branch is speaking with an employee from the London branch over VoIP phones. Their conversation is of a highly sensitive nature, and needs to be kept confidential. The information security officer of World Bank would of course be concerned with many different attack vectors, but again, our focus is on interception and modification. For our purposes, we imagine that an attacker knows that our employees’ conversation is taking place and he wants to make changes to the conversation to suit his desires. In real‐time, the attacker manipulates the conversation by omitting and/or rearranging packets of voice data, leading to a different conversation than intended by the two employees. He can even submit two streams of audio: one that the receiving party hears and understands, and another stream (not heard or noticed by the receiving party) that invalidates the recorded, signed conversation or embeds other words into the signed conversation. Additionally, neither of these employees is particularly tech‐savvy, and as such, have no idea that their conversation has been tampered with at all. In our use case, the attack itself is quite complicated, though not impossible. The attacker will need to use voice recognition and pattern matching, but the technology for the actual execution of the attack is outside of the scope of research for our paper. Suffice it to say that, given the technology available these days to anyone who takes the time to learn how to use it, properly or improperly, it is not such an outlandish threat. We also must differentiate our concept from that of voice forensics. Our concept does not make use of voice recognition, which is a “biometric modality that uses an individual’s voice for recognition purposes” as defined by the Federal Bureau of Investigation’s Biometric Center of Excellence, though it could be included in future research (FBI). Additionally, our concept does not give details on how the conversation was modified, nor is it preventative. The purpose of our concept is instead to accurately report the state of the conversation, and to therefore let the communicating parties know whether or not their conversation has been modified or tampered with in any way. Our concept’s contribution to voice forensics is that of non‐repudiation: it would be proof of the authenticity of the conversation, and therefore uphold or discredit its evidentiary value. Therein lies the basic outline of our security concept: the non‐repudiation of voice communications that take place via cloud‐based VoIP services. As such, we depart from the basic security aspects of VoIP communication and view conversations on a transactional level between caller and callee. The top‐level category of protection targets that we consider is non‐repudiation of conversations. 
Three tasks of ascending complexity are addressed in the present work: 1) Protection of the integrity of voice conversations. Protecting a (recorded, digital) voice conversation from falsification and tampering with is different from protecting the integrity of other digital data due to the relevance of the temporal context. In particular, we consider packet ordering and loss, as well as the assignment of a creation time to each conversation. These considerations will protect against attacks by either party aimed to change the conversation by the omission of sections, rearrangement of sections, and/or packet loss simulated by deliberate packet suppression. 2) Authentication of speakers. An initial authentication of caller’s and callee’s devices is the basic approach to this problem. While this issue could be resolved in principle solely on the transport layer, it is advantageous to combine it with methods of 1) to obtain proof that a (recorded) conversation was carried out completely from the authenticated devices. Our solution allows for proof that is created in real‐time through the creation of hash chains throughout the duration of the conversation. 3) Electronic signatures over voice conversations. Building on 1) and 2) it is possible for voice conversations to achieve the level of non‐repudiation provided by electronic signatures over digital documents, i.e., an expression of will. For this, the aforementioned tasks must be complemented by a proof of possession of a trustworthy signature token and device, and the intention to sign. Measures providing for the cohesion and
integrity of these security processes include using a replay window for both parties, numbering and assigning timestamps to RTP packets, and ensuring channel cohesion. We address further requirements for tasks 1‐3 in the following section.
2. Requirements for Audio Voip Signatures The central requirements for achieving non‐repudiation by signing VoIP are related to the information security requirements of confidentiality, integrity, and availability. Of this triad, integrity is the requirement most needed to achieve non‐repudiation for digital, packet‐based natural language communication. In order to ensure integrity, it must be evident and assured that a communication was not changed at any point in time, be it during transmission or after. This includes not only the audio communication content, but any relevant metadata created or used during a call as well. To ensure integrity, we look with particularity to the data that authenticates the communication partners. Due to the special features of voice communication, i.e., a bidirectional, full‐duplex interactive conversation, only both channels together provide the necessary context to fully understand the content of the conversation and to make use of the inherent security that interwoven natural language conversations provide. To ensure that the any part of the recorded conversation is not tampered with or deleted, the envisaged system needs to assure the users of conversational cohesion. By cohesion, we mean that the integrity of the temporal sequencing of the communication and the direction of its data need to be protected in a way that makes later tampering practically unfeasible, i.e., by sufficiently strong cryptographic methods. Additionally, because cohesion is a feature related to time it entails a subsidiary requirement, namely the secure assignment of temporal context to the conversation. Each conversation must be reliably associated with a certain time, which must be as close as possible to the conversation’s start and the initiation of the signing. Therefore, drift of the time base should be mitigated during a signed conversation. Finally, cohesion also refers to qualitative aspects of the communication channel. With regard to written documents, a signatory is well advised not to sign a document that is illegible or ambiguous. Analogously, the quality of the VoIP channel must be maintained to a level that ensures understandability to both partners during the time span in which the conversation is signed. We do not address availability or confidentiality in our solution. Though later availability of the conversation is needed in order for it to be admissible as evidence, the secure archiving of VoIP communication is not in the scope of this paper. A solution using the methods that we exhibit here is contained in (Hett 2006). Similarly, we do not address confidentiality since it can be resolved at the transport layer. There are further requirements for the system during the conversation. Firstly, it is highly desirable, from both a security and an efficiency viewpoint, for the system to apply signatures while users are talking. This requires that the VoIP conversation be signed and secured as “close” as possible to its transmission in real‐time, and conceptually close to the actual VoIP stream. Concurrently, the system must alert the user that an attack is happening in real‐time. Knowledge of an attack is of little use to the user after the fact; to maximize the security benefit of such a system, the user must be informed of suspect activity as it is happening. Finally, it is a requirement that the methods presented in this paper can be applied hassle‐free and be easily understood by the end‐user. 
We do not want to bombard them with gratuitous complexity, but instead provide them with a simple, user‐friendly way to know if the integrity of their conversation is intact.
3. Creating Signatures Signed voice transactions are built on the ability to authenticate the communicating parties and to protect the integrity of voice conversations. In order to do this, the authentication scheme must provide both strong protection of the authentication secrets, as well as a provable trust in the authentication hardware. We have introduced the VoIP signing protocol as a building block to protecting the integrity of voice conversations. In addition to this, we must testify that the hardware and software of the device meets the established requirements to be considered as providing trustworthy input and signature creation units (CEN Workshop Agreement 2004). The solution to this is our trustworthy voice signature terminal. The Trusted Computing Group (TCG) defines the design principles of Trusted Computing (TC) and trusted platforms. Using these principles, we can shield secrets and prove the state of the trustworthy voice signature terminal. The TCG has introduced the Trusted Platform Module (TPM) as the root of trust in a device. The TPM
is a specialized hardware token that is integrated into a trusted platform. It shields the private portions of keys, allowing keys to be created, stored, and used in a secure way. The TPM also stores and shields trust measurements. Trust measurements can be used to perform a remote attestation where a third party receives these trust measurements signed by the TPM with the Attestation Identity Key (AIK). Based on this data, the receiver (and now verifier) can extract two properties about the device. First, the receiver can know whether or not the data packet is created by a well‐configured trusted platform. This is done by inspecting the transmitted AIK certificate. Second, the receiver can determine whether or not the device’s hard‐ and software have been altered and whether or not it is in a trustworthy state. This state is proven by the transmitted trust measurement. The verifier can vet the trust measurement by comparing it with an additional log provided to show how the measurement was computed. In this log, every component of the device is recorded with its own integrity value, e.g. a hash of a firmware or software. The verifier has to know all of the reference values to validate the log. Though it is possible for a malicious device to modify the log, it cannot tamper with the signed measurement. By introducing TC in the context of a VoIP signature device and implementing the signature device as a trusted platform, the desired security requirements are fulfilled. The resulting trusted voice signature terminal would rely on the following base technologies: Certificates and an appropriate infrastructure are used to authenticate a communication party in the context of a voice transaction, and to link a public key to this party. The usage of Trusted Computing as part of the concept to sign VoIP communications also reveals who issued this certificate protected by hardware which establishes a strong identity of the device. In the context of TC, certificates are used to establish trust in the instantiation of a trusted platform. To testify certificates, it is necessary to supply an infrastructure that provides a root of trust to verify the origin of a certificate. To testify the integrity of a single program in the signature device (in our case, the signature application), it is necessary to prove the integrity of the runtime environment. Therefore a measurement of the boot process‐‐ including the involved hardware‐‐is required. TC allows the trusted boot process to solve this problem. During boot, every component measures the following component before it is started. TCG has introduced a special component as the root of this chain, which measures itself and the BIOS, reports these values to the TPM, and loads the BIOS. The BIOS in turn measures, reports, and loads the boot loader, and so on, until all of the hardware has been measured. The integrity of the runtime environment is then proven. Proving the integrity of the involved devices and storing this information in the voice stream is important for the initialization of trust between the communication parties. The attestation protocol enables a mutual approval of the integrity of the respective signature devices. As an extension, a trusted platform can offer a reliable time source that is linked to an external clock provided by a service provider, e.g., a cloud service provider of VoIP communications, which enables a trustworthy time stamping by the signature device itself. 
Thus it is possible to bind a conversation to a certain time without the necessity to request an external time stamp for every conversation. Because voice communication is limited to one channel, namely audio, it is not necessary to protect every I/O channel of the signature device. By only protecting the voice channel we assume that the protection level of the envisaged Trusted Voice Signature Terminal is equivalent to the general concept proposed in the CEN Workshop Agreement 14170. This assumption is based on the functionality offered by the pictured protocol and the enforcement provided by TC technology. Figure 1 shows a schematic presentation of the envisaged Trusted Voice Signature Terminal. An external PKI infrastructure enables the device to verify third parties’ certificates. The trusted platform offers I/O channels, signature creation, and a reliable time that can be used to create time stamped information blocks.
Figure 1: Schematic presentation of the Trusted Voice Signature Terminal. This device is designed as a trusted platform. On this platform trustworthy computing can perform several tasks. It is important to once again emphasize the method’s simplicity of use. By using a TPM and creating a trusted voice signature terminal, users don’t need any other authentication devices, e.g., a smart card. Likewise, it is not a necessity for users to give their consent. The trustworthiness of the device and the voice transaction just exists, until any minute element within the device is tampered with. Then the trustworthiness does not exist. The binary simplicity of the system allows the user to ensure non‐repudiation without any unnecessary complications. Finally, it is important to note that even if a TPM fails, this means that only the one device is tampered with but not all the conversations are exposed.
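The trusted-boot measurement chain and its remote verification described above can be illustrated with a short Python sketch. It is a conceptual simplification under our own assumptions: SHA-1 extend in the style of TPM 1.2 PCRs, invented component names, and no TPM signature over the final value; it is not the TCG specification itself.

```python
import hashlib

def extend(pcr, measurement):
    """TPM-style PCR extend: new PCR value = H(old PCR value || measurement)."""
    return hashlib.sha1(pcr + measurement).digest()

def measured_boot(components):
    """Each boot stage measures the next component before handing over control.

    `components` is an ordered list of (name, binary image) pairs, e.g.
    CRTM, BIOS, boot loader, kernel, signature application (names are illustrative).
    """
    pcr = b"\x00" * 20                        # PCR starts at a known value
    log = []                                  # event log the verifier later replays
    for name, image in components:
        measurement = hashlib.sha1(image).digest()
        pcr = extend(pcr, measurement)
        log.append((name, measurement.hex()))
    return pcr, log

def verify(reported_pcr, log):
    """A remote verifier recomputes the chain from the log and compares it with the
    PCR value attested by the TPM (the AIK signature check is omitted here)."""
    pcr = b"\x00" * 20
    for _, measurement_hex in log:
        pcr = extend(pcr, bytes.fromhex(measurement_hex))
    return pcr == reported_pcr
```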
4. Concept for VoIP Signatures in the Cloud
Once TC is applied and a trusted voice signature terminal device is instantiated, signatures can be used to ensure the integrity of the conversation. It is important that the implementation is efficient and scalable. Solutions such as signing every single packet with a signature algorithm like RSA are therefore neither efficient with respect to bandwidth and storage capacities nor sufficient to protect full conversations. Signing each packet alone easily uses more than 100 bytes to store a signature of an RTP packet carrying only 44 bytes of sampled audio, and is computationally expensive. Alternatively, we introduce the central concept of intervals and interval signatures. The implementation of the signing scenario is based on the SIP/RTP protocol, but the method could also be applied to other protocols like the Inter‐Asterisk protocol IAX (Spencer 2008) or the well‐established H.323 (H.323 2009). The signing protocol extends SIP/RTP (Schulzrinne 2003) in a compatible way to transport signatures and acknowledgments of signatures. Instead of modifying a particular type of phone, we choose to implement this as a proxy that intercepts the SIP call signalling and the RTP audio streams. A strength of the technique is that it does not modify, or in any way delay, the transported audio stream. Instead, signatures are transported sparsely and separately from the audio stream. This is key in that quality of service is not greatly affected by the technique. To begin, each party collects packets in intervals of adjustable length, e.g., one second. Every second, the collected packets are sorted by sequence number and their hashes are assembled in a data structure with additional meta‐information, such as direction, sequence numbers, and time. As shown in Figure 2, this small data structure is then signed with a conventional signing algorithm like RSA, using the private key of party A. The signed intervals are then sent to B, who stores them together with the collected RTP packets he actually received. Note that the full packets are transported only once, as in a normal RTP stream. Bandwidth and CPU time are saved
since computing hashes is much less expensive compared to computing RSA‐signatures, making the whole method applicable in the first place.
Figure 2: The main scenario where A possesses a digital certificate and provides non‐repudiation for the whole conversation and B archives it for later possession. As a side note, it would be possible to further reduce the bandwidth usage since, in principle, the sequence number of the packet is enough for B to reconstruct the hash. Thus the transmission of hashes could be avoided. However, transmitting only packet numbers would bear the cost of additional consistency checks on the part of B. We do not discuss this implementation detail further because the result would be the same but presentation would be more complex than a signature built only on actual data. It is also important to stress that the signature for a complete interval is broken if any of the hash‐values become invalid. If any bit of the signature or in the associated RTP packets is changed or if any packet is missing, that interval and the entire conversation is considered invalid. This is in strong contrast to the technique of stream signatures presented in (Perrig 2000) where the authors show methods of signing unidirectional broadcast traffic in a way that makes it possible to still check a signature if packets are lost in transmission. Although lost packets are very common in VoIP scenarios, they are potential attack vectors to our approach. Consequently, we use a different technique to deal with packet loss, which will be discussed shortly. Signed intervals alone do not ensure cohesion. An attacker could exchange parts of the conversation or cut them out. Therefore we make use of hash chains: every interval contains, embedded in its metadata, a hash of the last interval including its signature. In this way, signatures and hashes are interleaved ensuring that there is a continuous stream of signatures building an unbreakable chain, as demonstrated in Figure 3.
Figure 3: The hash‐chain of signed interval‐packages in our data format. Note that B stores the collected packets next to A’s signature for each frame. The chaining of intervals is further extended to factor into the bidirectional nature of the call. Both channels are interwoven and the chaining applies to both channels. An interval of packets from the channel A → B contains a hash of the last signed interval from the channel B → A and so on.
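A compact Python sketch of the interval-signing idea follows. It is a simplification under our own assumptions: SHA-256 for packet hashes, a placeholder sign() standing in for the RSA signature made with A's private key, and an invented dictionary layout for the interval metadata; the real system embeds the signed data in PKCS#7 envelopes as described in the next sections.

```python
import hashlib
import json
import time

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def build_interval(direction, packets, prev_interval_digest, sign):
    """Assemble and sign one interval of collected RTP packets.

    packets              -- list of (sequence_number, payload_bytes) for this interval
    prev_interval_digest -- digest of the previous signed interval, forming the chain
    sign                 -- callable returning a hex RSA signature under A's private key
    """
    packets = sorted(packets)                         # sort by sequence number
    body = {
        "direction": direction,                       # e.g. "A->B"
        "time": time.time(),
        "sequence_numbers": [seq for seq, _ in packets],
        "packet_hashes": [sha256_hex(payload) for _, payload in packets],
        "previous_interval": prev_interval_digest,    # hash-chain link across channels
    }
    encoded = json.dumps(body, sort_keys=True).encode()
    return {"body": body, "signature": sign(encoded)}

def interval_digest(signed_interval) -> str:
    """Digest over body and signature, embedded in the metadata of the next interval."""
    return sha256_hex(json.dumps(signed_interval, sort_keys=True).encode())
```

If any packet hash or any bit of a stored signature changes, the signature over that interval, and through the chain the whole conversation, fails to verify, which is exactly the all-or-nothing property described above.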
5. Resulting Signed Data Format
As non‐repudiation of calls is only meaningful if the party who is interested in using a conversation as evidence – in this case party B – can later produce it, the signed conversation must be recorded. Special emphasis must be put on the format in which the calls are stored, i.e. the final outcome of the signing protocol. All intervals of a call are simply
stored continuously by the proxy software of party B. In general, additional timestamps (as can be seen in the start chunk in Figure 3) may help pinpoint the exact start and duration of the call. The format is a simple, chunk‐based, continuous data format starting with an initial chunk containing metadata (SIP URIs of caller and callee, date and time of the start of the call, and the mapping of codecs to RTP payload fields). After that, signed intervals are stored as they are produced, reducing working memory requirements to the order of magnitude of one interval. At the end of the call, either by normal hang‐up or by some predetermined policy, a special chunk is added which contains the time and the reason for the termination of the call. For each interval, the associated chunk contains the collected RTP packages of this interval and the following signed data: the direction/channel of the interval (from A to B or vice versa), the date and time, the list of absolute package sequence numbers, and the hashes of each considered package. This data is embedded in a PKCS#7 signed envelope container. The signing is done by party A using his private key. Only the first PKCS#7 envelope needs to store the whole certificate chain; all other envelopes do not need to store any certificates.
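To make the storage layout concrete, a minimal Python sketch of the chunk sequence is given below. The field names and the use of plain dataclasses are our own illustration; in the actual format each interval chunk is a PKCS#7 signed envelope as described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class StartChunk:                      # written once at the start of the recording
    caller_sip_uri: str
    callee_sip_uri: str
    start_time: str
    codec_map: Dict[str, int]          # mapping of codecs to RTP payload types

@dataclass
class IntervalChunk:                   # one per signed interval, appended as produced
    direction: str                     # "A->B" or "B->A"
    timestamp: str
    sequence_numbers: List[int]
    packet_hashes: List[str]
    rtp_packets: List[bytes]           # the collected packets themselves
    pkcs7_envelope: bytes              # A's signed data; certificates only in the first chunk

@dataclass
class EndChunk:                        # appended at hang-up or policy-driven termination
    end_time: str
    reason: str

@dataclass
class SignedCall:                      # the complete chunk-based recording held by B
    start: StartChunk
    intervals: List[IntervalChunk] = field(default_factory=list)
    end: Optional[EndChunk] = None
```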
6. Signing Protocol In this section we describe the protocol that is used to transport acknowledgments, signalling, and signatures for the normal RTP packets containing multi‐media frames. The protocol is a simple and compatible extension to the existing SIP/SDP/RTP system and can be transported easily by the existing infrastructure. The signing protocol includes the typical solutions for common problems arising from NAT routers, like firewall‐hole‐ punching and TURN servers (Rosenberg 2009). In the signalling phase of a SIP, all the descriptions of the call parameters, including IP addresses, ports, and multimedia codecs and their parameters, are transported in the body of SIP messages using the simple SDP protocol (Handley 1998). SDP is able to negotiate the transport characteristics of more than one media per call, e.g., an audio stream, a video stream, and in addition an application‐stream for, e.g., whiteboard data. We use such an additional application stream for transportation of signature data. Note that this stream is unreliable and datagram‐based and not a reliable stream like TCP because it is based on RTP which runs on UDP. For the initial signalling of the fact that someone is calling who has signature capabilities and is willing to sign the conversation, we use the k‐value of the SDP protocol, which is used for securing the call using SRTP (many other variants may exist). Here, A transmits essentially the same data as in the first chunk in Figure 3, signed using his private key. To further describe our protocol, it must be stressed that even though both channels of the duplex conversation need to be signed to provide cohesion, in our base scenario, only party A needs to have a private key for signing. A must sign both directions of the conversation. Accordingly, the signing protocol differs for both channel directions which are described in the next section.
7. Channel Signing Specifics For the channel direction A→B, A could simply send an interval every second that contains the hash‐values of all the RTP‐packets of this interval. If any packet loss occurs – which is normal in current networks – B wouldn’t be able to provide the missing packets as evidence. In this case, his whole archive of the call would be deprived of probative force. Therefore, in our protocol, B transmits a list of all sequence numbers of the packets he received during an interval to A, who will then send the requested signature that covers exactly the received packets. More precisely, both A and B continuously collect all packets for the channel A→B in a small buffer, as indicated in the left‐ and rightmost parts of Figure 4. Whenever B’s interval timer expires, he will send a list of packet sequence numbers that were collected during that interval to A. A will then create the hash‐values out of the collected packets and create and sign an interval data structure as described in Section 4.2 and transmit it to B. B then checks the signature and hashes and stores this together with the collected packets and drops the collection of packets from memory. A can drop his complete packet collection from memory one iteration later, because if the communication that is described in this paragraph fails, then B will retransmit its packet list until A has successfully transmitted the interval signature. While waiting for the response, the interval
timer of B is temporarily suspended as it can only fire after an interval signature was successfully transmitted. The whole process is sketched in Figure 4.
Figure 4: Schematics of the Signing Protocol for Direction A→B
The communication of signatures for the channel B→A is a little different, as shown in Figure 5. Again, A and B both collect all packets for the current interval. This time, however, A decides on the precise point in time at which a new interval starts. Then A takes the collected and sorted packets, which are exactly the packages that were not lost in the transmission from B to A, creates hashes over them, and sends them in a signed interval data structure to B. When B receives this, it has to send a short acknowledgement package, containing the number of the interval, to A. If A does not receive this before a timeout occurs, he will resend the signature package. A will not start a new interval until the acknowledgement package is received, and the interval timer is also temporarily suspended.
Figure 5: Schematics of Signing Protocol for Direction B→A B also has to check the received signature package. B has to check that the quality of service of the conversation is signed by A in the direction/channel B→A. This enables B to detect for instance where A wants to conceal (on protocol/signature level) portions of B’s utterances in the signed conversation. If A refuses to sign enough of what B sends, B should consider the conversation tampered with. It is important to remember that this method signs the device and the channel of the stream: it describes the trustworthiness of the device and channel. The signature of the channel does not, in any way, protect or describe the integrity of the actual contents of the RTP packets themselves. In actual implementation, there would be a simple notification to parties A and B if or when either the device or the channel were compromised. PBXs have evolved to include full graphical screens, therefore allowing a simple notification message of “compromise” to be sent so the user can take appropriate actions. It is important to emphasize
that this notification would be simple–any user should be able to understand that there is an attack underway. This immediate visual notification to users wouldn’t distinguish between the device or channel being compromised, but upon review of logs associated with either the TPM or the signature application, this could be determined by an IT professional.
8. Conclusion The solution that we propose can be seamlessly applied to existing work environments that employ PBX with VoIP technology. While it doesn’t limit various attacks that involve manipulation or deletion of data, it does guarantee the state of the data, i.e., if the data has been tampered with or not. Our technique is simple and does not require complex implementation on the part of the user, nor does it require extra or extensive system resources to provide the signatures (unlike an RSA signature). Instead it offers a simple way to guarantee non‐repudiation in voice conversations over PBX.
References CEN Workshop Agreement 14170. (2004), [online], http://ipsec.pl/files/ipsec/Archiwum/cwa14170‐00‐2004‐May.pdf. Federal Bureau of Investigation (FBI) Biometric Center of Excellence. (n.d.), “Voice Recognition,” [online], http://www.fbi.gov/about‐us/cjis/fingerprints_biometrics/biometric‐center‐of‐excellence/modalities/voice‐ recognition. H.323 (2009) Packet‐based multimedia communications systems, [online], http://www.itu.int/rec/T‐REC‐H.323‐200912‐ I/en. Handley, M. & Jacobson, V. (1998) SDP: Session Description Protocol, [online], http://www.ietf.org/rfc/rfc2327.txt Hett, C., Kuntze, N., & Schmidt, A. U. (2006) A secure archive for Voice‐over‐IP conversations. Paper in Proceedings of the third VoIP Security Workshop, Berlin, Germany, June. Hoene, C., Karl, H., & Wolisz, A. (2004) A perceptual quality model for adaptive VoIP Applications. Paper in Proceedings of SPECTS ’04, San Jose, USA, July. Jones, A. (2005) “The future implications of computer forensics on VoIP.” Digital investigation 2, 206‐208. Kuhn, R. D., Walsh, T. J., Fries, S. (2005) Security Considerations for Voice Over IP Systems, [online], http://csrc.nist.gov/publications/nistpubs/800‐58/SP800‐58‐final.pdf Perrig, A., Canetti, R., Tygar, J. D., & Song, D. (2000) Efficient authentication and signing of multicast streams over lossy channels. Paper in IEEE Symposium on Security and Privacy, Berkeley, USA, May. Rosenberg, J., Huitema, C., & Mahy, R. (2009) Traversal Using Relay NAT (TURN). Available: http://tools.ietf.org/html/draft‐ietf‐behave‐turn‐16. Schulzrinne, H., Casner, S., Frederick, R., & Jacobson, V. (2003) RTP: A Transport Protocol for Real‐Time Applications, [online], http://www.ietf.org/rfc/rfc3550.txt. Spencer, M., Capouch, B., Guy, E., Miller, F., & Shumard, K. (2008) IAX: Inter‐Asterisk eXchange Version 2, [online], http://tools.ietf.org/html/rfc5456. VoIP Security Alliance (VOIPSA). (2005) “VoIP Security and Privacy Threat Taxonomy,” [online], http://www.voipsa.org/Activities/VOIPSA_Threat_Taxonomy_0.1.pdf. Wlodarz, D. (2013) “Moving to VoIP? 10 things you need to know before ditching the PBX,” [online], http://betanews.com/2013/12/17/moving‐to‐voip‐10‐things‐you‐need‐to‐know‐before‐ditching‐the‐pbx/.
Maintaining streaming video DRM Asaf David1 and Nezer Zaidenberg2 1 Tel Aviv‐Jaffa academic college, Tel Aviv, Israel 2 Shenkar college of Engineering and Design, Ramat Gan, Israel
[email protected] [email protected] Abstract: In recent years the Internet has become the leading platform for rich multimedia content distribution (video streaming). The proper licensing of Internet data is turning into a major concern for media owners and providers. DRM solutions are the standard tool used to enforce proper terms of content use, both in academia [Dahl 2012] and in practice [Zeus 2014 and other competing products]. Many DRM methods have been proposed and implemented over the years; these methods fall under white-box protection and obfuscation. Ways to circumvent these DRM methods are usually found a short time after release. Often these systems are attacked by exploiting the ability of a malicious client to intercept the tool at runtime and obtain the encryption key. This work demonstrates a streaming video DRM solution that prevents this vulnerability: the decryption of the video and the rendering of frames take place in the kernel, inaccessible to malicious user processes. Selective video decryption takes place at runtime to ensure the video security. We believe this system achieves an improved level of security compared to existing DRM solutions. Methods to govern the digital distribution of content, such as the Digital Millennium Copyright Act (DMCA), prove ineffective in preventing pirate distribution. Therefore a technological method is required that prevents distribution and also pinpoints the video source (watermarking). The system we describe in this paper is an additional, incremental effort on top of the TrulyProtect infrastructure for digital content delivery. This paper describes a unique method to deliver watermarked, copyrighted video that is protected by the TrulyProtect method (Averbuch et al 2013, Zaidenberg et al 2013), as opposed to software distribution. Keywords: DRM, DMCA, Video, Conditional access
1. Introduction
Data security is crucial for maintaining the business model of commercial multimedia applications. A simple use case is a Pay‐Per‐View service, in which a customer purchases the rights to view a specific event via private telecast. From the perspective of the media publisher, it is imperative that said customer will not be able to redistribute the purchased media to non‐paying users. Technological solutions which enable publishers to control the distribution of digital information are known as Digital Rights Management (DRM). The digital information is usually copy‐protected media such as electronic books, video games, audio/video files or streams. A simple way to limit the redistribution of the content is to encrypt it. Indeed, many DRM solutions are based on media encryption. For example, one of the earliest DRM systems employed was the Content Scramble System (CSS), which was used on almost all commercially produced DVD discs. It utilized a proprietary 40‐bit stream cipher algorithm. One of the problems involved in software‐based media encryption is that it is only effective while the decryption key remains a secret. Since the decryption is executed at the client side, it is exposed to interception by a malicious user. This in turn may lead to leakage of the decryption key. Our aim in this work is to present and implement a secure platform in which decryption can be done. This platform denies malicious users the ability to intercept the process and steal the key.
2. Introduction to digital video encoding The term ‘digital video’ refers to the capturing, manipulation, and storage of moving images that can be displayed on a computer screen. Contrary to analog video, digital video is represented by a binary bit stream. The stream is sampled from a continuous signal (such as camera and microphone). The samples are encoded according to some video coding standard. Data compression is usually applied during the encoding to reduce the amount of information stored. The aim is to maximize the compression while minimizing the impact on the video quality.
A video encoding process involves several consecutive steps. The decoding process involves similar steps, applied in reverse order. The following is a schematic flow of the encoding and decoding processes:
Figure 1: A video encoding process
2.1 Prediction Stage In the prediction stage, the encoder attempts to minimize the amount of stored information by exploiting similarities in the video content. Rather than storing the entire video content, some parts can be predicted based on other parts. This permits to store only the prediction error (also called residual). The video is processed in frame chunks called Macroblocks (blocks of 16×16 displayed pixels). Two common types of prediction are used:
In intra‐frame prediction, pixel values are predicted based on pixel values in neighboring macroblocks of the same frame.
In inter‐frame prediction, pixel values are predicted based on pixel values of the same macroblock in neighboring frames (preceding and/or succeeding).
Figure 2: Previously coded pixels The frames are categorized to 3 groups, based on the prediction types allowed. I‐frames only allow for intra‐ frame predictions; They are the least compressible but don't depend on other frames to decode. P‐frames allow both intra and inter‐frame predictions (which makes them more compressible than I‑frames), but only allow past frames in the inter‐frame predictions. B‐frames allow for both types, with both past and future frames for inter‐frame predictions. They achieve the highest amount of data compression. The encoder is free to make many choices, attempting to minimize the residuals. For example, In intra‐frames, prediction mode (controls which neighboring pixels are used for prediction) as well as the block sizes (either
16×16 or 4×4) can be chosen. In inter‐frames, the encoder selects a macroblock partition (16×16 blocks can be broken into blocks of sizes 8×8, 16×8, or 8×16, while 8×8 blocks can be broken into blocks of sizes 4×4, 4×8, or 8×4).
2.2 Transform stage
In the transform stage, blocks of residual samples are transformed to the frequency domain. MPEG‐2 used the 8×8 discrete cosine transform (DCT) matrix, while H.264 uses a simpler 4×4 integer transform matrix approximating the DCT. After the transformation, the coefficients are scaled and quantized. The quantization is lossy and thus degrades the video quality (this is the only lossy stage in the encoding process). The coarser the quantization, the fewer symbols have to be encoded, which reduces the video size but hurts the video quality.
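As an illustration of this stage, the sketch below applies the 4×4 forward integer core transform used by H.264 to a block of residuals. Only the unscaled transform Y = C·X·Cᵀ is shown; the post-scaling and quantization that the codec additionally applies are omitted, so this is a simplification rather than a full codec path, and the example residual values are hypothetical.

```python
# 4x4 integer core transform of H.264 (forward direction, without the subsequent
# scaling/quantization): Y = C * X * C^T for a 4x4 block X of residual samples.
C = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_core_transform(block):
    """Return the unscaled transform coefficients of a 4x4 residual block."""
    return matmul(matmul(C, block), transpose(C))

# residuals = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
# print(forward_core_transform(residuals))
```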
2.3 Entropy encoding stage
The entropy encoding stage serializes the transform coefficients into a binary stream. MPEG‐2 used a simple run‐level encoding, exploiting the fact that many of the transform coefficients are zero. H.264 uses more sophisticated encoding, allowing a choice between CAVLC (context‐adaptive variable‐length coding) and CABAC (context‐adaptive binary arithmetic coding). CAVLC is the standard method, using simple variable‐length Huffman‐like codes. It is much simpler and requires considerably less processing to decode than CABAC, although it does not compress the data as effectively.
3. Selective encryption
When the need for video encryption initially emerged, the video content was treated as ordinary binary data and was encrypted with standard block ciphers (such as DES and AES). Indeed, such ciphers can and do provide an excellent level of security. Unfortunately, decryption of such ciphers is relatively expensive computationally; it essentially requires dedicated hardware to support high-definition, real‐time streamed video. One strategy to solve this problem is to selectively encrypt only small portions of the video content while leaving most of it in plaintext. The decryption required is thus minimal and can be done in real time. To make a selective encryption algorithm effective, the encrypted portions must be vital for a proper viewing experience, yet sparse enough to reduce the decryption costs. For example, a simple algorithm will encrypt the video headers (video headers contain metadata at various levels, such as the video stream, the frame and the video slice). This method is largely ineffective, because the headers are repetitive and their content can be easily deduced by an attacker from the frames themselves. Another algorithm will encrypt only I‐frames, as they are vital to decoding P‐ and B‐frames. This method is also problematic, as I‐frames comprise most of the stream content, so the decoding complexity remains high. A large number of publications have suggested different methods to implement selective encryption of video content (Stutz 2011). A few examples of information which has proved useful for encryption are prediction modes, residual transform coefficients and motion vectors. Each method rates differently on a number of properties, such as security (each method will leak some of the image qualities), compression efficiency, computational complexity and preserved functionality (such as format compliance, packetization, scalability, transcodability, and watermarking). Of course, different business use cases require different security levels; a certain leakage that might not be considered a security threat in a commercial content distribution scenario may be intolerable in a privacy‐protection scenario.
4. White‐box cryptography
The traditional security model, called "black‐box cryptography", assumes that the attacker is only given black‐box access (i.e., inputs/outputs) to the cryptographic algorithm: the attacker can read signals sent by others and send signals to the box, but cannot change the box itself. In contrast, the "white‐box cryptography" model assumes that the attacker can also access and control the execution environment (Joye 2008). Despite this openness, a good WBC implementation should offer the adversary no or little advantage over a black‐box implementation. Effectively, white‐box cryptography aims to keep the secret key hidden (i.e. inaccessible to the attacker) throughout the working of the algorithm. In our case the video signal may arrive uncontested at the end‐user device (set‐top box, satellite TV or cable TV tuner, etc.), but the end user controls the device (i.e. the white‐box model applies).
5. Methods Used
5.1 Decryption in a secure environment
The aforementioned selective encryption methods share a common limitation: they rely on the existence of a secret key, unknown to an attacker. However, as the decryption takes place on the client side, a malicious process running alongside the decoder process can access the decoder memory and steal the decryption key (for example, under the Windows operating system, a process with the PROCESS_VM_READ permission may use the ReadProcessMemory function (Microsoft 2011) to read data from an area of memory in a specified process). In other words, these methods are effective in the black‐box cryptography model, but not in the white‐box cryptography model. Our system aims to resolve this limitation by confining the decoding to a secure environment, inaccessible to malicious user processes, namely the hypervisor itself. We permit the user space to access the video content in one of exactly two states: either encoded and encrypted, or decoded and decrypted (i.e. actual frames). The intermediate state, encoded but decrypted, is the asset we aim to protect. The following diagram illustrates the data flow:
Figure 3: The data flow
5.2 Limitations While this work attempts to describe a secure DRM platform, some attack vectors were not addressed. They are listed below:
The decoded video frames are exposed; A malicious user may aggregate them while playing the video and then re‐encode and re‐distribute it. This threat, commonly named ‘the analog‐hole’, is a fundamental and inevitable vulnerability in copy protection schemes for digital, non interactive data. Re‐encoding the frames adds latency and damages video quality making this attack unsuitable for some application.
Our system is effective only as long as the TrulyProtect system is effective. Attacks on TrulyProtect (such as attacks on its hypervisor, red pill techniques, running it on emulators, etc.) will also break our playback system. While possible, this is out of scope for this work and may be addressed by future research.
5.3 Selective Encryption
For this work we implemented a selective encryption algorithm. Our algorithm is a simplified version of an efficient video encryption algorithm called RVEA (Real-Time Video Encryption Algorithm (Shi et al 1999)). RVEA achieves its real‐time capabilities by remaining very lightweight, manipulating only a minimal number of bits in the video stream. RVEA was originally targeted at the MPEG‐2 standard; our version adapts it to the H.264 video stream. The algorithm works by manipulating only the sign bits of the transform coefficients. As such, it requires no extra storage in addition to the video itself. On top of that, the encrypted stream is itself a valid H.264 bit stream. The encryption uses an RC4 stream cipher, initialized with a 256‐byte video key.
The process takes place inline with the encoding of the video, during the entropy encoding stage. Each 4×4 block of quantized residuals is encoded using CAVLC coding. For each block an array levels[N] is given, where N is the number of non‐zero residuals in the block (N ≤ 16). To encrypt the level array we do the following:
Pack its sign bits into either one or two bytes (one if N ≤ 8). A sign bit equals 1 if the level is positive and 0 if negative.
Feed the packed byte(s) to a stream cipher, outputting one or two bytes.
Modify the sign bits to match the output bits.
As an example, consider the following levels array: [635, ‐9, ‐5, 4, 4, 2, ‐1, ‐1, 1]. Packing the sign bits into two bytes yields 39h (00111001b), 80h (10000000b), which we feed to the stream cipher. Assuming the stream cipher output was DAh (11011010b), 5Bh (01011011b), the encrypted levels array becomes [‐635, 9, ‐5, 4, 4, ‐2, 1, 1, ‐1]. The following pair of pictures illustrates the effect of the selective encryption on the resulting images. On the left, the image was decrypted with a valid key, while on the right an invalid key was used:
Figure 4: The effect of the selective encryption on the resulting images
As can be seen, the general details of the picture are still visible, but the viewing experience is severely damaged.
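The sketch below is our own illustration of the per‐block operation described above, not the authors' code. Two details are assumptions inferred from the worked example rather than stated in the paper: sign bits are packed least‐significant‐bit first, and the stream‐cipher "output" is the packed byte XORed with an RC4 keystream byte (so decryption is the same operation).

```python
def rc4_keystream(key: bytes):
    """Plain RC4: key scheduling followed by an endless keystream generator."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) % 256]

def encrypt_block_signs(levels, keystream):
    """Replace the sign bits of one CAVLC levels[] block; magnitudes untouched."""
    nbytes = 1 if len(levels) <= 8 else 2          # N <= 16 non-zero residuals
    packed = 0
    for idx, lvl in enumerate(levels):             # sign bit: 1 = positive
        if lvl > 0:
            packed |= 1 << idx
    mask = int.from_bytes(bytes(next(keystream) for _ in range(nbytes)), "little")
    cipher = packed ^ mask                         # stream-cipher output bits
    return [abs(lvl) if (cipher >> idx) & 1 else -abs(lvl)
            for idx, lvl in enumerate(levels)]

# Usage: one keystream, seeded with the 256-byte video key, serves the whole stream.
ks = rc4_keystream(bytes(range(256)))              # placeholder key for illustration
print(encrypt_block_signs([635, -9, -5, 4, 4, 2, -1, -1, 1], ks))
```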
5.4 Key‐agreement scheme
The selective encryption is essentially a symmetric cipher. Thus, the encryptor (the server) and the decryptor (the client) must negotiate a secret symmetric video key before encryption takes place. We used the following simple scheme (a code sketch of this exchange follows the steps below):
The server generates a public and private key pair. It publishes the public key.
The client generates (in the kernel module) a random 256‐byte string (the video key) and encrypts it using the server public key. The client sends the encrypted string to the server, attached to the requested video id.
The server decrypts the video key using its private key. It then selectively encrypts the requested video using the decrypted key and, when done, sends it back to the client.
Having the video key, the client can decrypt the video.
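A minimal sketch of this exchange, assuming RSA with OAEP padding from the `cryptography` package; the key size (4096 bits, large enough for a 256‐byte plaintext under OAEP) and the omission of the video‐id request and transport details are our simplifications, not the authors' implementation.

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

oaep = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Server: generate a key pair and publish the public key.
server_private = rsa.generate_private_key(public_exponent=65537, key_size=4096)
server_public = server_private.public_key()

# Client (kernel module): random 256-byte video key, encrypted to the server.
video_key = os.urandom(256)
encrypted_key = server_public.encrypt(video_key, oaep)   # sent along with the video id

# Server: recover the video key and use it to drive the selective encryption
# (e.g. to seed the RC4 keystream sketched above).
assert server_private.decrypt(encrypted_key, oaep) == video_key
```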
The presented scheme is secure only if the following assumptions hold:
The server private key is indeed private, i.e. no one can access and use it.
The client uses the correct and valid server public key when encrypting the generated video key (i.e. the public key was not forged by some malicious user).
For this work we assume both assumptions above hold (for example, we assume the server's certificate was issued by and held in a secure certificate authority that the client trusts).
6. Summary
DRM solutions that decrypt video on the client side are exposed to the threat of key stealing. In this work we have presented and implemented a platform for video DRM which overcomes this vulnerability; the security is achieved by selectively encrypting an H.264‐encoded video file on the server and decrypting it inside the client's kernel. This ensures that no malicious user‐space process running on the client can intercept the decryption and steal the key.
References
Averbuch, A., Kiperberg, M., Zaidenberg, N. (2013) "Truly Protect: An Efficient VM-Based Software Protection", IEEE Systems Journal, September.
Dahl, Eric (2012) Securing Digital Video, Springer.
David, Asaf and Zaidenberg, Nezer (2013) "Truly Protect Video Delivery", in ECIW 2013, pp 405-408, http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6553406
Joye, Marc (2008) "On white-box cryptography", Security of Information and Networks, pp 7-12.
MSDN - Microsoft (2011) "ReadProcessMemory function (Windows)", accessed 18 Apr. 2014.
Shi, Changgui, Sheng-Yih Wang, and Bharat Bhargava (1999) "MPEG video encryption in real-time using secret key cryptography", in Proc. Int. Conf. Parallel and Distributed Processing Techniques and Applications, 1999.
Stutz, Thomas, and Andreas Uhl (2012) "A survey of H.264 AVC/SVC encryption", IEEE Transactions on Circuits and Systems for Video Technology, Vol 22, No. 3, pp 325-339.
Zaidenberg, Nezer, Resh, Amit, Gavish, Limor, Kiperberg, Michael and Khen, Eviatar (2013) "TrulyProtect 2.0 and attacks on TrulyProtect 1.0", ECIW 2013.
Zeus Kerravala (2014) "Cisco at CES: Videoscape + NDS = Unity", http://www.networkworld.com/article/2223825/cisco-subnet/cisco-at-ces--videoscape---nds---unity.html
Formalization of SLAs for Cloud Forensic Readiness Lucia De Marco1,2, Sameh Abdalla1, Filomena Ferrucci2, and Mohand‐Tahar Kechadi1 1 School of Computer Science and Informatics, University College Dublin, Ireland 2 Department of Management and Information Technology DISTRA ‐ MIT University of Salerno, Fisciano, Italy lucia.de‐
[email protected] [email protected] [email protected] [email protected]
Abstract: The provisioning of a Cloud service requires a number of parties to engage in a legal commitment towards one another. The responsibilities and expectations of both customers and providers are governed by means of a Service Level Agreement (SLA) contract that they sign prior to the activation of the service. Once a Cloud service is provided by one party to another, it is assumed that all the conditions, i.e., clauses, mentioned in the SLA are read and fulfilled by those who signed it. The notion of Forensic Readiness (FR) was introduced in the literature to help control and monitor the behaviour of a computing architecture through a dedicated system; the essence of this capability is motivated by the necessity of facilitating digital investigations in terms of time and cost. In some cases, such a capability can also be meant for alerting about and preventing system attack attempts. In the Cloud, some crimes can be related to violations of the previously agreed upon service security measures. The logging process or documentation of Cloud service violations can then be used by either the provider or the customer to evaluate the quality of the service in question. In addition, it can lead any of the involved parties to take the necessary legal actions. In this paper we emphasize the importance of automating the process of discovering Service Level Agreement violations in Cloud services. We propose a formal framework for building a Cloud Forensic Readiness System (CFRS) that considers the technical aspects of SLAs while monitoring the fulfilment of the addressed service. The system can eventually issue warnings and alerts to the involved parties as soon as a service violation is detected. Our approach represents SLA clauses in terms of formal rules that can then be used as system inputs to validate whether an action occurring in a given Cloud architecture is a service level violation or not.
Keywords: Cloud forensic readiness system, service level agreements, SLA formal specification, cloud security, cyber crimes
1. Introduction
Digital Forensics (DF) is probably one of the most challenging and interdisciplinary research topics of the last decades. It deals with all the aspects concerning the scientific management of digital crimes in computers and digital devices (Palmer 2001). Following the progress of ICT, different branches of the discipline have been developed, such as network (Palmer 2001), mobile (Jansen and Ayers 2007), and Cloud forensics (Ruan et. al 2011), among others. In most cases, adaptations of traditional tools and procedures have been introduced for properly managing crimes committed in different computing architectures (Garfinkel 2010; Casey 2011; Ambhire and Meshram 2012). In Cloud architectures, however, the established forensic techniques and methodologies cannot be successfully applied due to the complexity of the challenges deriving from some Cloud weak points (Reilly et. al 2010, 2011; Ruan et. al 2013; Mishra et. al 2012; Birk and Wegener 2011; Birk 2011; Dykstra and Sherman 2012). Additionally, the Cloud involvement in sophisticated cybercrimes is likely to increase (CSA 2013). One way to address some of these challenges is to add a Digital Forensic Readiness (DFR) capability (Tan 2001; Rowlingson 2004; De Marco et. al 2013). Its main concern is to minimize the time and cost necessary for conducting a forensic investigation while maximizing the hosting computing architecture's capability of being prepared to manage it. Usually such a capability is implemented through an information system that performs readiness activities, e.g., collection and monitoring of critical and sensitive data. In some cases, specific actions can be automatically triggered by system components dedicated to implementing DFR. Indeed, its importance has been recognised (Grobler and Louwrens 2007) and several Forensic Readiness System (FRS) proposals conceived for different computing architectures include this aspect (Endicott-Popovsky et. al 2007; Mouton and Venter 2011; Valjarevic and Venter 2011; Reddy and Venter 2013; Dykstra and Sherman 2013; Trenwith and Venter 2013; De Marco et. al 2013). Cloud Computing (CC) is delivered to users in the form of services regulated by a Service Level Agreement (SLA) contract (Mell and Grance 2011) co-signed by a Cloud Service Provider (CSP) and a customer. In such a contract several clauses are established; they may concern data access rights, behaviour and usage, quality of service constraints, and so on, related to both parties (Baset 2012; Patel et. al 2009). They are written in natural language using legal jargon and have legal validity in courts (Baset 2012).
Due to their structure and contents, SLAs have been taken into consideration by the forensic community. Indeed, some attempts have been made to represent them in a machine-readable format by using formal specification techniques (Gaudel 1994), such that the agreed clauses can be automatically monitored. We believe that SLAs must be considered in Cloud Forensic Readiness (CFR); they must be properly formalized in order to be automatically managed by a Cloud Forensic Readiness System (CFRS).
2. Related works
Service Level Agreement (SLA) contracts have been considered by the forensic community for automatic monitoring in order to guarantee respect of the included clauses, even though not necessarily for forensic purposes. In most cases, such natural-language-based clauses were implemented by adopting formal specification methods. For instance, in (Czajkowski et. al 2002) the focus is on the design of a protocol for negotiating SLAs among several actors. Different types of SLAs are defined, and some formalism is presented, such as the usage of tuples for describing an SLA. Some definitions concerning the metrics to use for services are also provided. In (Skene et. al 2007) the SLAs are formalized by using set theory to define the concepts of actions, actors, events, parties, and actions' requirements. The purpose is to determine the possible degree of monitorability of SLAs in the context of service provisioning through the Internet. In (Paschke and Bichler 2008) a framework called ContractLog for monitoring SLAs is presented, which uses a set of formalisms. The SLAs have been categorized depending on the purpose they were written for; then, the contents' concepts have been abstracted and included in a conceptual framework. Different kinds of rules composing such contracts were identified, such as derivation rules, reaction rules, integrity rules and deontic rules; all of them were included in a homogeneous syntax and knowledge base. Finally, the conceptual framework was evaluated by a tool running some specific test suites. In (Unger et. al 2008), the concepts of Parties, SLA Parameters, and Service Level Objectives were used for formalizing Service-Oriented Architecture SLAs in order to provide a manner of aggregating several SLAs in a single Business Process. The proposal uses the formalisms of tuples, logic predicates, Boolean algebra, and normal forms. In (Ishakian et. al 2011) formal specifications were used for representing SLAs and transformation rules in order to address the issue of verifying efficient co-location of real-time workloads. The approach allows transforming SLAs, whenever they do not meet the workload efficiency requirements, into equivalent SLAs that respect the same QoS. The formalism used is the tuple. The proposal includes a reasoning tool used by the transformation rules' process that comprises inference rules based on a database of concepts, propositions, and syntactic idioms. SLA monitoring for a Storage-as-a-Service facility is instead undertaken by (Ghosh and Ghosh 2012), where a design model for a dedicated system is provided. The tuple formalism for representing the SLA clauses is used; they are decomposed into several service Parameters, established by defining some Service Level Objectives (SLO) describing the QoS levels to guarantee; the SLOs are measured by some Key Performance Indicators (KPI) defined as the atomic metrics to use. Additionally, the results of roughly 30 research projects are illustrated in a European Community report (European Commission 2013). Such projects cover different and complementary automated aspects of the SLA lifecycle, such as specification modelling, management, real-time and storage Cloud constraints, SLA enforcement, and others.
3. Reference Architecture for a Cloud Forensic Readiness System
A Cloud Forensic Readiness System is represented by the Reference Architecture depicted in the complementary views of Figures 1 and 2. The main components of such a system are represented as rounded rectangles modelling sets of software modules dedicated to specific operations; they communicate with each other via dedicated Open Virtualization Format (OVF) channels, a standard language suitable for both the design and the distribution of applications to be executed on different VMs. The CFRS communicates with the underlying Cloud architecture for real-time data collection, respecting existing laws and privacy policies. Such data comprises Cloud Services artifacts, outputs from existing Cloud monitoring tools, and logs. From the bottom white rectangles in both pictures we can see the details of these three types of data, defined as Cloud common features by CSA (2011) and considered a "must have" requirement for a CSP that needs to add an FR capability. The collected data is encrypted and stored in dedicated components of the CFRS, thus complying with the widely adopted British ACPO (2007) and American National Institute of Justice (NIJ 2008) guidelines concerning the preservation of potential digital evidence. The CFRS activities and modules are constantly running and collecting data so that the stored information is always up to date. The Data Management sub-system performs forensic analysis and knowledge extraction in order to construct a correct and reliable timeline of events.
Figure 1: CFRS Reference Architecture (De Marco et. al 2013)
All this information is fed to the Intrusion Detection sub-component, which is responsible for analyzing when a Cloud incident is likely to happen. It implements specific policies for managing suspicious behaviours, and also considers the co-signed SLA fed to the CFRS as a set of formal rules. The Intrusion Detection module strongly collaborates with the Events Alerting one, which generates alarms related to the detected suspicious behaviours. The Data Mining module can generate the case-related digital evidence; this is then managed by the Preservation of Digital Evidences module, where dedicated policies and routines are implemented. Finally, the chain of custody (CoC) report (NIJ 2008), which is necessary for investigations, is produced by the Chain of Custody sub-system, where additional data-related information, e.g., location, date, time, time zone, and system component, has to be recorded. The CFRS can be responsible for interacting with the competent bodies involved in the management of criminal cases; indeed, the system may need to transmit the retrieved data belonging to the arisen cases, together with the digital evidence and the chain of custody documents, in order to pursue law enforcement activities. For this aim, dedicated interfaces and communication modules are included in the CFRS reference architecture. The main reason for needing a system component for SLA monitoring is that SLAs are considered a trigger for specific Forensic Readiness actions. Depending on how close the detected Readiness Events are to violations, specific alerts and warnings are issued. In some cases, crime-preventive actions can be performed, such as data collection or interruption of specific data transmission or storage, among others. They can be automatically instantiated if already provided for by specific SLA clauses concerning these situations; in other cases their appropriateness must be properly evaluated.
Figure 2: CFRS Reference Architecture (De Marco et. al 2014)
4. SLA Formalization
In the Cloud, most SLAs are contracts co-signed between a Cloud Service Provider (CSP), responsible for providing a number of computing services through the Internet, and a Customer who exploits the services provided. Additional SLA negotiation and co-signing situations can exist, for example SLAs co-signed among different CSPs for hardware and software resources outsourcing, and SLAs involving Third Parties. Customers of a given Cloud service are likely to be unaware of the complete flow of data among different sub-providers, because the chain of sub-services necessary to accomplish an activity, and the related SLAs, are not disclosed to unconcerned parties. Figure 3 shows possible Customer and Provider interactions governed by SLAs. Interactions, and SLAs, can expand depending on the Cloud services outsourcing chain. In our research we consider the basic case, namely SLAs co-signed between a CSP and a Customer (top of Figure 3). To the best of our knowledge, a standard for the contents of a Cloud SLA does not exist, but there are some common sections identified among most CSPs by Baset (2012), where the anatomy of such contracts was drawn. The author of the latter paper affirms that an SLA is composed of: 1) the provided service levels and the metrics used for guaranteeing them, 2) the services' time duration and their granularity, 3) the billing structure, 4) the policy about the level measurement, and 5) the reporting manner for service guarantee violations.
Figure 3: Possible interactions involving SLAs
4.1 Forensic Readiness Events
A Forensic Readiness system for the Cloud is responsible for implementing a Forensic Readiness capability. This is meant to observe and monitor the changes of status of the underlying computing architecture in order to render it forensically ready. Its principal purpose is to record all the events happening in such an environment, namely the Cloud, in order to facilitate subsequent criminal investigations. Such events, which represent the changes of status and the operations that happened, include some important investigative details, such as the sender, the recipient, the operation, and the date and time. In this context we can derive that a Forensic Readiness system is composed of a set of Subjects, Objects, and Actions, representing the set of Forensic events recording the changes of status of a Cloud environment.
4.1.1 Subject
During an FR system life cycle, the recording activity involves the presence of subjects. They are the entities performing operations in a computing architecture. They can be both humans and system processes, e.g., an Internet browser session or a Skype ID. Let s be a subject; it belongs to the set of subjects S, where S = {s1, s2, s3, … , sh}, h ∈ ℕ.
4.1.2 Object
An Object is the target of an activity performed by a subject. An Object can be a digital file, or a software or hardware resource. Let O be the set of Objects, o ∈ O, where O = {o1, o2, o3, … , oi}, i ∈ ℕ. An object o ∈ O is described by a set of properties and can be defined as o = {p ∈ Po | o αo p}. Po is the set of properties that can be used to describe an object; examples of properties are filename, date of creation, size and so on. αo is the relation used for connecting an object o ∈ O to a property p ∈ Po that describes it. Since an object o ∈ O is described by a composition of one or several properties p ∈ Po, it follows that O ⊆ P(Po), where P(Po) is the set of all the subsets of Po.
4.1.3 Activity
An activity is an operation executed by an entity with the effect of changing one or more statuses of a system, in our case of a CC environment. Let A be the set of Activities, a ∈ A, where A = {a1, a2, a3, … , aj}, j ∈ ℕ. In a computing system we can have three main types of activities, namely movement m, processing p, and storage s (see Figure 4); all of them involve the presence of a subject, who is the activity performer, and an object, which is the target of the operation. The movement activities include all the operations that move an object from one digital location to another; the storage activities instead reflect the operations dedicated to storing an object in a specific digital location; all the other types of operation are generalized as processing
activities. The three types of activities can be defined as mathematical functions that transform the status of an object of the Cloud environment. All of them have the same domain and codomain, namely the set of objects O, because these functions transform an object into another one with some different properties:
m: O → O, p: O → O, s: O → O
Figure 4: Mathematical Representation of FR Activities
4.1.4 Event
An event is a finite set of activities performed by one or more subjects targeting one or more objects, recorded at a specific time interval. Let e be an event and let E be the set of events; then E = {e1, e2, e3, … , ek}, k ∈ ℕ. An event is defined by a mathematical tuple e = 〈 S, O, A, tstart, tend 〉, where S is the set of Subjects, O is the set of Objects, A is the set of Actions, tstart is the starting time, and tend is the ending time, with tstart ≤ tend.
4.2 SLA and Cloud Forensic Readiness System
In a CFRS the SLAs are of great importance; they define the Cloud Services offered to the Customer by the Cloud Service Provider. An SLA is composed of a set of clauses, which describe all the constraints, behaviours, and duties of the co-signing parties in order to guarantee the predefined Cloud Service(s) level. For instance, some clauses concern the metrics necessary for measuring the described service level attributes, such as latency or average transmission error rate. A Customer using a Cloud Service can utilize and generate Cloud data, usually in the form of digital files described by different computing properties, such as date of creation, size, and type. A clause c belongs to the set of Clauses C, C = {c1, c2, c3, … , cl}, l ∈ ℕ. The SLA's clauses are fed as input to a CFRS and represented as formal rules, in particular as predicates of first-order logic, of the form A → B. In this paper we assume that the set of rules is consistent. Each rule expresses a constraint to be respected by both parties in the context of the Cloud Service(s). For instance, a security constraint can be the allowance of a certain number of login attempts to a Cloud Service from the same IP address. Assume that this number is three, and that an SLA clause states that at the fourth attempt the Cloud Service Provider will deny any further login. The related formal rule can be
login_attempts > 3 → login_denied
In the CFRS, events are recorded in the form described in Section 4.1.4: in this case three events representing the three login attempts are recorded; each of them comprises subjects, objects, actions, start time, and end time.
e1 = 〈 "IPaddress1", "CloudServiceHomePage", p(CloudServiceHomePage), tstart1, tend1 〉
e2 = 〈 "IPaddress1", "CloudServiceHomePage", p(CloudServiceHomePage), tstart2, tend2 〉
e3 = 〈 "IPaddress1", "CloudServiceHomePage", p(CloudServiceHomePage), tstart3, tend3 〉
For all three events we have that IPaddress1 ∈ S; CloudServiceHomePage ∈ O; login is a processing activity type represented as p(CloudServiceHomePage). Finally, the three events have their start and end times. In case more than three login attempts happen, additional FR events similar to e1, e2, e3 are recorded. Depending on the countermeasures designed for security violations, the system will be allowed to behave in specific manners, even though this is not required by the Forensic Readiness capability itself. In order to
demonstrate the positive side effects of the presence of a CFRS, we illustrate a simple scenario. For instance, in order to avoid disallowed login attempts, the system can be capable of raising warnings/alerts to both the CSP and the Customer when the second login attempt is recorded, so that the limit of three attempts is not reached and the recorded Cloud behaviour is compliant with the security measures. A module of the CFRS reference architecture depicted in Figures 1 and 2, namely the Events Alerting one, is dedicated to generating such alerts. Differently, stronger reactions can be triggered when a third login attempt is recorded; for example, they can be designed to deny a subsequent login action because the maximum number of attempts has been reached. Nevertheless, different behaviours can be implemented in case additional attempts are allowed and the FR capability cannot interrupt the flow of actions happening in the Cloud; in this particular option, the CFRS is only dedicated to recording the events as soon as they occur. Figure 5 depicts the CFRS modules interaction.
Figure 5: Excerpt of CFRS Modules Interaction
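The following sketch is our own illustration (not part of the paper's implementation) of how the formalism of Sections 4.1.4 and 4.2 could be encoded: events are recorded as tuples of subjects, objects, activities and a time interval, and the clause login_attempts > 3 → login_denied becomes a rule evaluated over the recorded events. Names such as Event and clause_violated are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """e = <S, O, A, t_start, t_end> from Section 4.1.4."""
    subjects: frozenset
    objects: frozenset
    activities: frozenset
    t_start: float
    t_end: float            # invariant: t_start <= t_end

def login_attempts(events, subject, target):
    """Count recorded login events by one subject against one object."""
    return sum(1 for e in events
               if subject in e.subjects
               and target in e.objects
               and "processing:login" in e.activities)

def clause_violated(events, subject, target, limit=3):
    """Formal rule: login_attempts > 3 -> login_denied (raise an alert)."""
    return login_attempts(events, subject, target) > limit

# The three recorded attempts e1, e2, e3 from the example above.
log = [Event(frozenset({"IPaddress1"}), frozenset({"CloudServiceHomePage"}),
             frozenset({"processing:login"}), t, t + 1.0)
       for t in (0.0, 5.0, 9.0)]
print(clause_violated(log, "IPaddress1", "CloudServiceHomePage"))   # False: limit not yet exceeded
```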
4.3 SLA and FR Formalization
An SLA can be formally represented as an element of the set of SLAs. Let sla be an SLA; it belongs to the set SLA, SLA = {sla1, sla2, sla3, … , slam}, m ∈ ℕ. In light of all the previous considerations, an SLA can be represented as the following tuple
sla = 〈 csp, csc, C, tstart, tend 〉
where csp is a Cloud Service Provider, csc is a Cloud Service Customer, C is the set of Clauses composing the SLA represented as predicates of first-order logic, tstart is the starting time of the SLA validity, and tend is the ending time of the SLA validity. Let csp be a Cloud Service Provider; it belongs to the set of CSPs CSP, CSP = {csp1, csp2, csp3, … , cspn}, n ∈ ℕ. Let csc be a Cloud Service Customer; it belongs to the set of CSCs CSC, CSC = {csc1, csc2, csc3, … , csco}, o ∈ ℕ. Finally, a Cloud Forensic Readiness capability fr is an element of the set of Forensic Readiness capabilities FR, FR = {fr1, fr2, fr3, … , frp}, p ∈ ℕ. An FR capability fr can be formally represented as the following tuple
fr = 〈 S, O, A, E, sla 〉
where S is the set of Subjects, O is the set of Objects, A is the set of Activities, E is the set of recorded FR Events, and sla ∈ SLA is the considered Service Level Agreement contract.
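Continuing the illustration above, the two tuples of this section could be carried as simple records; again this is a hedged sketch with hypothetical names, not the authors' system.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class SLA:
    """sla = <csp, csc, C, t_start, t_end>: clauses are first-order rules."""
    csp: str
    csc: str
    clauses: tuple          # e.g. (("login_attempts > 3", "login_denied"),)
    t_start: float
    t_end: float

@dataclass
class ForensicReadiness:
    """fr = <S, O, A, E, sla>: the capability monitors events against one SLA."""
    subjects: set = field(default_factory=set)
    objects: set = field(default_factory=set)
    activities: set = field(default_factory=set)
    events: list = field(default_factory=list)
    sla: Optional[SLA] = None
```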
5. Discussion
The aim of this paper is to address the problem of managing SLA contracts as part of a Cloud Forensic Readiness system by using a mathematical model. This has two distinct aspects. Firstly, we propose a reference architecture of a CFRS with the purpose of preparing the Cloud for easy and quick digital forensic investigations. The new system has to be able to monitor the Cloud and to detect system artifacts, logs, and all the information necessary for correct CFRS behaviour. In general, we can state that the provision of a forensic readiness capability in the Cloud can lead to several positive side effects, such as an improvement of the customers' data privacy, an enhancement of the Cloud's internal security, and a means of guaranteeing that all parties respect their contractual duties and constraints. Secondly, we develop a mathematical model for an SLA and for the forensic readiness events and capability, which are part of the CFRS. The main goal here is to identify the principal entities and their relationships. Moreover, the manner of representing the contractual clauses and the events arising from their violations is also included. All these concepts are not exhaustive when compared to real cases, but cover only basic ones, i.e., the presence of one SLA, one CSP, and one Customer. Even though it is limited to specific cases, within those cases the formalism can be considered exhaustive and consistent, owing to the absence of contradictory rules, affirmations, and assumptions.
6. Conclusions and future work
Cloud SLA management is an emerging research topic; it can be considered a specific case of natural-language-based document management. The main purpose is to render some sections of such contracts machine-readable and manageable, in order to prevent, as early as possible, cyber-crimes related to Cloud data. The best manner of achieving this purpose is to exploit mathematical formalisms for abstracting the structure and the contents of SLAs, which are represented in the form of mathematical rules. Such rules are used as input to a CFRS component that is in charge of constantly monitoring the distance between the expected and the actual behaviours. As soon as a significant distance is detected, proper Forensic Readiness actions can be triggered. In this paper we focused on a specific case, namely the one involving the existence of an SLA co-signed between a customer and a provider, thus circumscribing the Cloud dimension. We have provided formal definitions for the basic concepts of SLA management and the formalization of some SLA clauses. In the future we intend to extend this proposal in order to cover other cases, namely those involving a chain of CSPs and an intersection of SLAs. One of the achievable purposes is to understand the complete flow of users' and providers' data, paying attention to possible policy and law violations, so that proper warnings can be triggered as soon as one of them is likely to be violated. For these extensions, proper formal rules will be provided, in order to also demonstrate the scalability of the approach. The final part of our research work will comprise a prototype for our set of rules, in order to verify whether they can be easily monitored and effectively used for preventing some Cloud crimes by observing the actual behaviour of Cloud actors and data.
References
ACPO - Association of Chief Police Officers (2007) "Good Practice Guide for Computer Based Electronic Evidence", [online], http://www.acpo.police.uk/asp/policies/Data/ACPO%20Guidelines%20v18.pdf
Ambhire, V. R., Meshram, B. B. (2012) "Digital Forensic Tools", IOSR Journal of Engineering, Vol 2, No. 3, pp 392-398.
Baset, S.A. (2012) "Cloud SLAs: present and future", ACM SIGOPS Operating Systems Review, Vol 46, No. 2, pp 57-66.
Birk, D. (2011) "Technical challenges of forensic investigations in cloud computing environments", Workshop on Cryptography and Security in Clouds, pp 1-6.
Birk, D., and Wegener, C. (2011) "Technical issues of forensic investigations in cloud computing environments", IEEE Sixth International Workshop on Systematic Approaches to Digital Forensic Engineering, pp 1-10.
Casey, E. (2011) Digital Evidence and Computer Crime, 3rd Edition, Academic Press, New York.
CSA - Cloud Security Alliance (2011) "Security Guidance for Critical Areas of Focus in Cloud Computing v 3.0", [online], https://cloudsecurityalliance.org/guidance/csaguide.v3.0.pdf
CSA - Cloud Security Alliance (2013) "The Notorious Nine Cloud Computing Top Threats in 2013", [online], https://downloads.cloudsecurityalliance.org/initiatives/top_threats/The_Notorious_Nine_Cloud_Computing_Top_Threats_in_2013.pdf
Czajkowski, K., Foster, I., Kesselman, C., Sander, V., Tuecke, S. (2002) "SNAP: A protocol for negotiating service level agreements and coordinating resource management in distributed systems", Job Scheduling Strategies for Parallel Processing, Springer Berlin Heidelberg, pp 153-183.
De Marco, L., Kechadi, M-T., and Ferrucci, F. (2013) "Cloud Forensic Readiness: Foundations", Proceedings of the 5th International Conference on Digital Forensics & Cyber Crime, LNICST series, to appear.
De Marco, L., Ferrucci, F., and Kechadi, M-T. (2014) "Reference Architecture for a Cloud Forensic Readiness System", EAI Endorsed Transactions on Security and Safety, ICST, to appear.
Dykstra, J., and Sherman, A.T. (2012) "Acquiring Forensic Evidence from Infrastructure-as-a-Service Cloud Computing: Exploring and Evaluating Tools, Trust, and Techniques", Proceedings of the 12th Annual DF Research Conference, Digital Investigation, Vol 9, pp 90-98.
Dykstra, J., and Sherman, A.T. (2013) "Design and Implementation of FROST: Digital Forensic Tools for the OpenStack Cloud Computing Platform", Proceedings of the 13th Annual DFRWS Conference, Digital Investigation, Vol 10, pp 87-95.
Endicott-Popovsky, B., Frincke, D., Taylor, C. (2007) "A Theoretical Framework for Organizational Network Forensic Readiness", Journal of Computers, Vol 2, No. 3, pp 1-11.
European Commission - Directorate General Communications Networks, Content and Technology - Unit E2 - Software and Services, Cloud (2013) "Cloud Computing Service Level Agreements - Exploitation of Research Results", [online], Editor: Dimosthenis Kyriazis, http://ec.europa.eu/information_society/newsroom/cf/dae/document.cfm?doc_id=2496
Garfinkel, S. L. (2010) "Digital forensics research: The next 10 years", Digital Investigation, Vol 7, pp 64-73.
Gaudel, M.C. (1994) "Formal specification techniques", Proceedings of the 16th International Conference on Software Engineering, pp 223-227.
Ghosh, N., Ghosh, S.K. (2012) "An approach to identify and monitor SLA parameters for storage-as-a-service cloud delivery model", Globecom Workshops (GC Wkshps), pp 724-729.
Grobler, T., Louwrens, B. (2007) "Digital forensic readiness as a component of information security best practice", Proceedings of New Approaches for Security, Privacy and Trust in Complex Environments, 22nd International Information Security Conference, Vol 232, pp 13-24.
Jansen, W., and Ayers, R. P. (2007) "SP 800-101. Guidelines on Cell Phone Forensics", NIST Technical Report, Gaithersburg, MD, United States.
Ishakian, V., Lapets, A., Bestavros, A., Kfoury, A. (2011) "Formal Verification of SLA Transformations", 2011 IEEE World Congress on Services, pp 540-547.
Mell, P., and Grance, T. (2011) "Final Version of NIST Cloud Computing Definition", [online], http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
Mishra, A. K., Matta, P., Pilli, E. S., and Joshi, R. C. (2012) "Cloud Forensics: State-of-the-Art and Research Challenges", International Symposium on Cloud and Services Computing, pp 164-170.
Mouton, F., and Venter, H.S. (2011) "A prototype for achieving digital forensic readiness on wireless sensor networks", Proceedings of IEEE AFRICON, pp 1-6.
NIJ - National Institute of Justice (2008) "Electronic Crime Scene Investigation Guide: A Guide for First Responders", [online], http://www.nij.gov/publications/pages/publication-detail.aspx?ncjnumber=219941
OVF - Open Virtualization Format Standard, [online], http://www.dmtf.org/standards/ovf
Palmer, G. (2001) "A Road Map for Digital Forensic Research", Report from the First Digital Forensic Workshop.
Paschke, A. and Bichler, M. (2008) "Knowledge representation concepts for automated SLA management", Decision Support Systems, Vol 46, No. 1, pp 187-205.
Patel, P., Ranabahu, A. H., and Sheth, A. P. (2009) "Service Level Agreement in Cloud Computing", [online], http://corescholar.libraries.wright.edu/knoesis/78
Reddy, K., and Venter, H.S. (2013) "The architecture of a digital forensic readiness management system", Computers & Security, Vol 32, pp 73-89.
Reilly, D., Wren, C., and Berry, T. (2010) "Cloud computing: Forensic challenges for law enforcement", Proceedings of the International Conference on Internet Technology and Secured Transactions (ICITST), pp 1-7.
Reilly, D., Wren, C., and Berry, T. (2011) "Cloud computing: Pros and cons for computer forensic investigations", International Journal Multimedia and Image Processing, Vol 1, No. 1, pp 26-34.
Rowlingson, R. (2004) "A ten step process for forensic readiness", International Journal of Digital Evidence, Vol 2, No. 3, pp 1-28.
Ruan, K., Carthy, J., Kechadi, T. and Crosbie, M. (2011) "Cloud forensics: an overview", Proceedings of the 7th IFIP International Conference on Digital Forensics, Advances in Digital Forensics, Vol 7.
Ruan, K., Carthy, J., Kechadi, T., and Baggili, I. (2013) "Cloud forensics definitions and critical criteria for cloud forensic capability: An overview of survey results", Digital Investigation, Vol 10, No. 1, pp 34-43.
Skene, J., Skene, A., Crampton, J., and Emmerich, W. (2007) "The monitorability of service-level agreements for application-service provision", Proceedings of the 6th International Workshop on Software and Performance, pp 3-14.
Tan, J. (2001) "Forensic Readiness, Technical Report", [online], @Stake Organization, Cambridge, MA, USA, http://isis.poly.edu/kulesh/forensics/forensic_readiness.pdf
Trenwith, P.M., and Venter, H.S. (2013) "Digital forensic readiness in the cloud", Proceedings of Information Security for South Africa, pp 1-5.
Unger, T., Leymann, F., Mauchart, S., Scheibler, T. (2008) "Aggregation of Service Level Agreements in the Context of Business Processes", Proceedings of the 12th International Conference on Enterprise Distributed Object Computing, pp 43-52.
Valjarevic, A., and Venter, H.S. (2011) "Towards a Digital Forensic Readiness Framework for Public Key Infrastructure systems", Proceedings of Information Security South Africa (ISSA), pp 1-10.
Retention and Disposition in the Cloud ‐ Do You Really Have Control? Patricia Franks1 and Alan Doyle2 1 San Jose State University, San Jose, CA, US 2 University of British Columbia, Vancouver, BC, CA
[email protected] [email protected] Abstract: Effective Information Governance is increasingly recognized as an imperative for corporate compliance and risk mitigation. Records retention and disposition schedules serve to cut costs for discovery and storage as well as reduce risk and increase compliance. By 2017, nearly one half of all large enterprises are expected to be engaged in hybrid (i.e., public/private) cloud computing (Babcock 2013). Although a greater portion of the organization’s records will be in the “possession” or “custody” of a cloud service provider, the organization maintains ultimate responsibility to preserve and produce those records for as long as necessary. It is, therefore, essential that organizations are able to “trust” that their records residing in the cloud can be retained and disposed of in accordance with the same requirements that govern the retention and disposition of records stored within the enterprise. In 2013, a multi‐disciplinary, international research project known as InterPARES Trust (ITrust) was formed to explore issues concerning digital records entrusted to the Internet. The team studying the topic of retention and disposition in a cloud environment conducted an extensive literature review and identified a number of overarching issues related to the implementation of a defensible retention and disposition plan, including metadata control, ownership, records portability, digital continuity, and long‐term sustainability. These issues were further explored in relation to two categories of records: those with short‐term retention requirements and those with long‐term retention requirements. Records residing in the clouds that have a short retention requirement face challenges that include reliable capture, management, control, and disposition. Records with long‐term retention requirements pose additional challenges related to physical or intellectual transfer to an archive, digital preservation, migration, and long‐term sustainability. This paper provides a synthesis of the current literature, including articles documenting the emergence of new approaches related to records retention and disposition in a cloud environment. Keywords: retention, disposition, cloud services, functional requirements, preservation, records management
1. Introduction
Records managers may be involved in all aspects of information management in the entity they work for, but it is the two essential tasks of the records manager that are the focus of this paper: retention and disposition of records residing in the clouds. Today records are predominantly created and stored electronically. This electronically stored information (ESI) may be retained within equipment owned and controlled by the enterprise or by third parties in a cloud environment. Regardless of the location of that information or the party that has physical control, the records are the responsibility of the organization and, as such, must be retained and disposed of in a manner consistent with the organization's basic retention and disposition requirements (ARMA 2010, p. 6). The International Organization for Standardization (ISO) describes basic retention and disposition systems requirements in this manner: "Record systems should be capable of facilitating and implementing decisions on the retention or disposition of records. It should be possible for these decisions to be made at any time in the existence of records, including during the design stage of records systems. It should also be possible, where appropriate, for disposition to be activated automatically. Systems should provide audit trails or other methods to track completed disposition actions" (ISO 15489-1 2001, p. 10). Retention is the continued possession, use, and control of the records that must be kept to meet administrative, fiscal, legal, or historical requirements. Disposition includes the "range of processes associated with implementing records retention, destruction or transfer decisions which are documented in disposition authorities or other instruments" (ISO 15489-1 2001, p. 3). Cloud providers, as agents for the organization, are expected to integrate retention and disposition functionality within their systems and/or to provide a secure,
convenient, cost-effective method for export of those records to the organization for management and preservation when necessary.
2. Problem Statement
The growth and use of big data and social media, coupled with the perceived benefits of cloud services—e.g., elasticity, continuous availability, and cost savings—have changed the way organizations do business. Electronic records that were once stored and managed on servers within the organization are increasingly hosted in a cloud environment under the control of a third party. It is essential that public and private organizations are able to "trust" that their records residing in the cloud can be retained and disposed of in accordance with the same requirements that govern the retention and disposition of records stored within the enterprise. This prompts two questions: "How does the use of cloud services affect our capability to retain and dispose of records in accordance with the law and other applicable guidelines?" and "What can be done to mitigate any risks arising from the gaps between our ability to apply retention and disposition actions to records residing within the enterprise and to those residing in the cloud?" The answers to these questions will yield requirements for service providers and for systems that store records in the cloud which, if met, would give the client organization confidence that the records will be retained and disposed of in accordance with a legally defensible retention and disposition plan.
3. Review of Literature
A key concept that emerged through a review of the literature is that the cloud is an ecosystem consisting of cloud providers, customers, digital device manufacturers, bandwidth providers, and content companies, situated globally and governed by national and international legal and regulatory regimes (Rayport et al. 2009). Consequently, retention and disposition control results in complex management challenges and decisions—there is no one-size-fits-all scenario (InterPARES Trust 2014). Effective governance of records in the cloud presupposes that records are identified and retention and disposition requirements have been determined. It is suggested that service agreements and contracts include "enforcement of retention periods" as one of their key components (Blair 2010). However, the provision of such an element in a contract does not guarantee systematic retention and disposition of records, as the multiplicity of a cloud provider's data centers situated in different geographic locations may cause legal jurisdiction issues (Cunningham 2010). In addition, multiple data centers result in multiple copies of records, and as it may be impossible to determine the exact number of copies, secure disposition cannot be guaranteed (Stuart and Bromage 2010).
3.1 Not all Information is Equal The primary purpose of a records retention and disposition schedule is to ensure that records are retained only as long as necessary and then disposed of when they no longer have value (Franks 2013, p. 100). Records and information created within the organization and transferred to a cloud vendor can be categorized and registered with a records management system before exiting the enterprise. Records and information generated in the cloud are under the physical control of a third party and must be captured after creation and then categorized and managed. Industries that produce big data—e.g., transportation, utilities, and biosciences—must capture and categorize information and records electronically and automatically (without human intervention to impose retention rules). Not all information in the cloud rises to the level of a record; however, the organization is still responsible for managing nonrecord content.
3.2 Nonrecords and Ephemeral Records A nonrecord is any document, device, or item, regardless of physical form or characteristic, created or received that does not serve to document the organization, functions, policies, decisions, procedures, operations, or other activities of the organization. Nonrecords include copies of records, junk mail/spam, and listserv materials.
Records and nonrecords may be intertwined. For example, the organization may consider a social networking profile a record but consider comments nonrecords. Those comments may not need to be retained according to the records retention schedule. They should, however, be monitored, since security and privacy risks emerge through posts that may reveal trade secrets or violate company policy (Franks 2013, p. 179). Nonrecords are termed ephemeral records by some organizations and included on a disposition authority, such as the one developed by the Government of Western Australia. They have no continuing value, are generally only needed for a few hours or a few days, and may not need to be placed in the official recordkeeping system—examples include mailing lists stored in cloud email systems and rough drafts of reports held in file hosting services (storage utilities).
3.3 Functional Requirements for Records Retention and Disposition
Key records management standards and guidelines were examined as part of the literature review (primarily ISO 15489, ISO 23081, ISO 16175, DoD 5015.2 and MoReq 2010), as they inform software retention and disposition functional requirements. The first records management standard, ISO 15489, was designed to meet the ongoing needs for recordkeeping in a business environment and is of use in government and non-government organizations. Part 1, General, details the high-level functional requirements for retention and disposition, whether in the enterprise or in the cloud (2001). Part 2, Guidelines, gives an overview of the processes and factors to be considered for implementation (2001). ISO 23081, Managing Metadata for Records, is a three-part technical specification that defines the metadata needed to manage records. Part 1 addresses the relevance of records management metadata in business processes and the different roles and types of metadata that support business and records management processes. Part 2 covers conceptual and implementation issues, and Part 3 provides guidance on conducting a self-assessment on records metadata in relation to the creation, capture and control of records. ISO 16175, Principles and Functional Requirements for Records in Electronic Office Environments, Parts 1, 2, and 3, is applicable to products often termed "electronic records management systems" or "enterprise content management systems." The functional requirements applicable to retention and disposition which should be extended to the cloud include the directive that systems need to dispose of records in a systematic, auditable and accountable way in line with operational and legal requirements (Part 1, 3.1); a list of functional requirements related to disposition authorities (policies) that authorize the disposal of records by destruction, transfer of control, or application of a review period (Part 2, 3.6); and functional requirements for business systems that include disposition of records and maintenance of the metadata of destroyed records as a record of the disposition activity (Part 3, 3.4). The Electronic Records Management Software Application Design Criteria Standard, DoD 5015.02-STD (2007), establishes design criteria for electronic records management software applications (RMAs) that meet the requirements of the U.S. Department of Defense. The standard emphasizes key principles of records management such as file plan creation (record group classification), access controls, data discovery, metadata, records search and retrieval, vital records, records retention scheduling, records destruction, and interoperability between solutions. Functional requirements for retention and disposition include those that allow only authorized individuals to create, edit, and delete retention schedule components of records categories and provide the capability for only authorized individuals to extend or suspend (freeze) the retention period of record folders or records beyond their disposition. Two additional requirements relate to the identification and presentation for disposition of records, including their metadata, that have met their retention period, and to the application of the records retention schedule to backup copies.
MoReq 2010, Modular Requirements for Records Systems, published by the Document Lifecycle Management forum (http://moreq2010.eu/), contains functional requirements for the management of an Electronic Records Management System (ERMS) that can provide a single service or several services bundled together including classification service, model metadata service, disposal scheduling service, disposal handling service, and export service. MoReq 2010 introduced a new concept over previous versions: records need not be stored in a central repository but can be managed in place where they are created or in specialized business applications.
A disposal authority sets disposal scheduling, and the destruction of records occurs at the record level in a bottom‐up approach. The destruction process means that only some metadata, some event histories, plus the content of the record are deleted, leaving behind a residual entity for audit purposes. The aggregate record is automatically destroyed when all records associated with it have been destroyed. This standard provides an XML schema for the export of records so that no matter when a record is exported, its retention and disposition policies will be transferred to a new system allowing for continuity of the records retention schedule. In this respect MoReq 2010 has many of the features that will help to support emerging technologies such as Cloud Computing. Based on an analysis of the retention and disposition functional requirements recommended in the de jure and de facto standards, questions related to retention and disposition in a cloud environment were formulated and categorized as follows:
Establishing disposition authorities. (Retention and disposition schedules): Can retention periods be applied? Can destruction actions be automated?
Applying disposition authorities. Can a disposition authority—retention and disposition specifications—be applied to aggregations of records? Can records be retained indefinitely or destroyed or transferred at a future date? Can destruction actions be automated?
Executing disposition authorities. Can records (and backups) be deleted according to the schedule? Are users alerted of conflicts related to links from records to be deleted to other records aggregations that have different disposition requirements? If more than one disposal authority is associated with an aggregation of records, can all retention requirements be tracked to allow the manual or automatic lock or freeze on the process (e.g., for litigation or legal discovery)?
Documenting disposal actions. Are disposal actions documented in process metadata? Can all disposal actions be automatically recorded and reported to the administrator?
Reviewing disposition. Are electronic aggregations presented for review along with their records management metadata and disposal authority information so that both content and records management metadata can be viewed? Are all decisions made during review stored in metadata? Can the system generate reports? Is the ability to interface with a workflow facility to support the scheduling, review, and export/transfer process provided or supported?
Additional considerations are those for import and export of records, security in transit and at rest, and compatibility of metadata schema used in repository with other systems, such as Enterprise Content Management or Records Management Systems.
4. Overarching Issues
Among the overarching issues facing organizations considering entrusting records to the clouds are metadata control, ownership, records portability, digital continuity, and long-term sustainability. These issues can be investigated in relation to either short-term or long-term retention requirements for both structured and unstructured data. In order to ensure records are retained and destroyed in compliance with the retention rules of the entity, it is imperative that records managers have access to the material (e.g., through a dashboard provided by the cloud service).
Metadata control and ownership. Over 93% of the respondents to a survey conducted through the University of British Columbia "expressed concern over ownership of data and metadata," and access issues were the most frequently experienced issues (34%) related to cloud usage (UBC 2013). Bailey and Wu (2012) warn that although customers retain ownership of the data they provide and any data they purchase to enrich it, there may be limitations to the form in which the data is returned to them as well as the amount of provider-owned metadata returned. Sufficient metadata must be retained to satisfy access and retention requirements. Organizations employing a private cloud are at an advantage in managing the metadata at the cloud site.
Records portability. Cloud portability is the ability to move records from one cloud-computing environment to another. The US National Archives and Records Administration (NARA) posits, "A lack of portability standards may result in difficulty removing records for recordkeeping requirements or
complicate the transition to another environment.” Technical transparency and industry‐wide standards are needed to allow for easy portability, which will facilitate cloud content transfer and lower an organization’s exposure to solution/vendor lock‐in.
Digital Continuity. The United Kingdom National Archives describes digital continuity as "the ability to use digital information in the way that you need, for as long as you need", so that you can find it when you need it, open it as you need it, work with it in the way you need to, understand what it is and what it is about, and trust that it is what it says it is. Digital continuity occurs when there is a convergence of business needs with usable, complete, and available information assets and the technical services and environment that support business use. Retention and disposition challenges related to digital continuity posed by the use of cloud systems include the lack of contractual arrangements and controls to ensure the safe custody and proper preservation of records.
4.1 Short-Term Retention Requirements
Short-term records have a limited useful life and are retained only as long as needed for the primary purpose for which they were created; they have administrative (operational), legal (regulatory), and/or fiscal value. Organizations operating in heavily regulated industries, such as the finance industry, may receive guidance to help them manage records created when they utilize blogs, microblogs and social networking sites to reach out to current and prospective clients. For example, the Financial Industry Regulatory Authority's FINRA Regulatory Notice 12-29, February 4, 2013, categorizes communications shared by the finance industry as retail communications since they are distributed or made available to more than twenty-five retail investors within a thirty calendar-day period. Social media communications fall in this category and are exempt from pre-use approval requirements; must be managed "after" posting; must comply with NASD Rule 2210(b)(4)(A) concerning recordkeeping requirements; and must be retained for a period of three years (two years on the premises). Short-term records also include those generated dynamically by applications deployed in the cloud as a result of business transactions. This data can be downloaded to an enterprise content management system or managed within the cloud environment. Legal concerns related to internal operations include updating records schedules to reflect new records series resulting from cloud applications and implementing access policies for data hosted by third parties. External considerations that must be resolved include accepting only vendor Terms of Service Agreements (TSAs) and Service Level Agreements (SLAs) that are consistent with the organization's goals and objectives. The elastic nature of the cloud and the free (e.g., social networks) or low-cost cloud services (file storage services) available make disposition appear less urgent. Unless the location of all electronic records is known and rules are in place to dispose of these records automatically, disposition actions may be haphazard. The routine destruction of records is essential to a defensible records management program, yet many institutions effectively never destroy electronic records.
4.2 Long‐Term Records Issues and Challenges Records with long‐term retention requirements present challenges related to the cloud provider’s ability to maintain the integrity of the records over the long term, migrate records to new file formats and systems, and export the records and associated metadata to the enterprise for long‐term or permanent preservation. Standardized metadata is essential for system interoperability; however, many proprietary systems do not adhere to a standardized records management metadata schema. At some point, temporary records must be destroyed. Electronic systems, including those in the cloud, must allow for rules that classify records and establish a destruction date. Automatic notification of the impending destruction is required to allow suspension of destruction if necessary, such as in the case of impending litigation.
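The rule-driven disposition described above can be illustrated with a short sketch. The record series, retention period, and legal-hold flag below are hypothetical examples, not requirements drawn from any particular statute, schedule, or product.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class RetentionRule:
    series: str          # records series this rule applies to
    retention_days: int  # how long to keep records after the cutoff date

@dataclass
class Record:
    record_id: str
    series: str
    cutoff: date         # date on which the retention clock starts
    legal_hold: bool = False

def disposition_action(record: Record, rules: dict, today: date) -> str:
    """Return the action a scheduled disposition job should take for one record."""
    destroy_on = record.cutoff + timedelta(days=rules[record.series].retention_days)
    if record.legal_hold:
        return "suspend destruction (legal hold in effect)"
    if today >= destroy_on:
        return "notify records manager; destroy after approval"
    return f"retain until {destroy_on.isoformat()}"

# Hypothetical three-year rule, loosely modelled on the FINRA example above.
rules = {"retail-communications": RetentionRule("retail-communications", 3 * 365)}
rec = Record("msg-001", "retail-communications", cutoff=date(2013, 2, 4))
print(disposition_action(rec, rules, today=date(2014, 10, 23)))
```

A disposition job of this kind only works if every record, wherever it is stored, is classified against a schedule and if the legal-hold flag can be set before destruction occurs.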
If a cloud preservation service is employed, the organization must be confident that the data is in the custody of a trusted provider that employs a preservation system that is compliant with the Open Archival Information System (OAIS) reference model and protected by a sound disaster recovery plan. Some archives not only protect their collections but also provide access for researchers and other interested parties. Providing access to digital collections in a cloud archive may be an added requirement when searching for a cloud preservation solution.
5. Mixed Trust Response to Retaining Information in the Cloud There seems to be a tension between the economic benefits of cloud computing and the potential risks pertaining to legal compliance, security, and data integrity issues in cloud adoption. While some end‐users and customers report glowing experiences with their transition to the cloud, as cited by Ajero (2012) and Greengard (2013), others urge caution. Wang et al. note that Small and Medium Enterprises (SMEs) are more likely to transition to the cloud because of their lack of a deep IT infrastructure; however, large enterprises, especially multinationals, typically have complex technologies, systems, cultures, and politics, which result in trust barriers to cloud adoption (2012). The authors suggest the potential development of decision‐modeling tools to aid in the cloud provider selection process and service agreement negotiations. Venter et al. note that C‐level executives often report concern about data security and integrity in multi‐tenancy and distributed storage environments. Consequently, when dealing with the retention and disposition of sensitive and highly confidential information, there appears to be a trend to trust only a private cloud deployment (Ferguson‐Boucher et al.; Viewpointe) or a hybrid configuration (Géczy et al.; HP) over public cloud models, which are mostly associated with multi‐tenancy and distributed data centers.
6. New Approaches to Manage Retention and Disposition in the Cloud New approaches to apply retention policies and enforce deletion are emerging; however, such solutions are heavily Information Technology, computer, and software oriented. RIM professionals and archivists are not often key stakeholders in the process. Websites and white papers of a number of cloud providers—ArchiveSocial, Microsoft Azure, Microsoft, Cloud Kite, Egnyte, Gimmal, GoGrid, Google Apps, HP Records Manager, IBM Cloud, Office 365, Rackspace, Smarsh, and CenturyLink—were examined. Product documentation reflects that the data centers of most vendors are designed to be compliant with physical and network security standards, for example Statement on Auditing Standards No. 70 (SAS 70), SSAE, ISO 27001, US‐EU Safe Harbor, HIPAA or GLBA. Only HP’s Records Manager, which can be deployed as either a private or hybrid solution, adheres to ISO 15489:2001 and elements of ISO 16175 and is certified by DoD 5015.02 v3 (perpetual) and VERS (HP 2014). According to Li et al., “scalable management of data retention policies” can be achieved by encrypting data stored in the cloud and securing the key at a secure data center (2012). Nicolaou et al. (2012) and Rabinovici‐Cohen et al. (2011) recommend that all data, in transit and at rest, should be encrypted. Srinivasan proposes a cloud security infrastructure for customers to control their virtual machine, monitor the access logs of cloud providers, and protect their data by holding the encryption key on‐site (2013). Tang et al. propose FADE (file assured deletion) encryption technology to implement and execute retention and disposition policies. This technology will also facilitate complete data withdrawal when switching vendors (2012). Muthulakshmi et al. propose a framework called Cloud Information Accountability (CIA) that will allow users to audit their data and copies made without their knowledge in the cloud environment (2013). In their article, Rabinovici‐Cohen et al. introduce SIRF (self‐contained information retention format) as a means of authenticating data stored in a cloud system (Ibid). Cohasset Associates presents EMC Data Domain Retention Lock, which is compliant with the MoReq2010 criteria of discreteness, completeness, immutability, and destructibility (2013).
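The encryption-based approaches cited above rest on a simple idea: if each record is encrypted under its own key and the key is held outside the cloud, destroying the key renders every stored copy of the ciphertext unrecoverable, which effectively enforces disposition wherever replicas reside. The sketch below illustrates only that general idea using the widely available Python cryptography package; it is not an implementation of FADE, of Li et al.'s scheme, or of any vendor's product, and the in-memory "key vault" and "cloud store" are stand-ins.

```python
# Illustrative only: disposition by destroying per-record encryption keys.
# Requires the third-party "cryptography" package (pip install cryptography).
from cryptography.fernet import Fernet

key_vault = {}    # stand-in for a key store kept outside the cloud provider
cloud_store = {}  # stand-in for ciphertext copies held by the cloud provider

def put_record(record_id: str, content: bytes) -> None:
    key = Fernet.generate_key()            # one key per record
    key_vault[record_id] = key
    cloud_store[record_id] = Fernet(key).encrypt(content)

def get_record(record_id: str) -> bytes:
    return Fernet(key_vault[record_id]).decrypt(cloud_store[record_id])

def dispose(record_id: str) -> None:
    # Destroying the key "deletes" the record even if ciphertext copies remain.
    del key_vault[record_id]

put_record("contract-42", b"terms of a service agreement ...")
dispose("contract-42")
try:
    get_record("contract-42")
except KeyError:
    print("key destroyed: remaining ciphertext copies are unreadable")
```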
Hitachi Data Systems explains that Hitachi Content Platform (HCP) ensures retention and disposition in the cloud environment, enables litigation hold or release, and provides assurances for data segregation in a multi‐tenancy environment. Askhoj et al. suggest remodeling the OAIS with a Platform‐as‐a‐Service (PaaS) Layer, Software‐as‐a‐Service (SaaS) Layer, Preservation Layer, and Interaction Layer in order to preserve records in the cloud (2011). Preservica Cloud Edition, a workflow‐based digital preservation service that conforms to the OAIS digital archiving standard (ISO 14721:2003), is offered on Amazon Web Services (AWS). AWS provides durable storage that maintains multiple copies in multiple locations, verifies the integrity of data stored using checksums, and repairs corrupted files using redundant data. Preservica organizes and processes information ingested into the system to ensure it is findable and preserves it so that it is useful (Preservica 2014).
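Checksum-based fixity verification of the kind AWS and Preservica describe can be sketched in a few lines. The SHA-256 choice and the repair-from-replica step below are illustrative assumptions, not a description of either service's internal mechanics.

```python
import hashlib

def fixity(data: bytes) -> str:
    """SHA-256 digest recorded at ingest and re-checked on a schedule."""
    return hashlib.sha256(data).hexdigest()

def verify_and_repair(replicas: list, expected: str) -> None:
    """Overwrite any replica whose digest no longer matches the ingest value,
    using a replica that still verifies as the repair source."""
    good = next(r for r in replicas if fixity(r) == expected)
    for i, r in enumerate(replicas):
        if fixity(r) != expected:
            replicas[i] = good  # repair the corrupted copy

original = b"archival information package"
expected = fixity(original)                      # stored with the record's metadata
replicas = [original, b"archival information packag#", original]
verify_and_repair(replicas, expected)
print(all(fixity(r) == expected for r in replicas))  # True after repair
```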
7. Conclusion Cloud computing is complex, and similarly retention and disposition in the cloud becomes complex due to multi‐tenancy, cross‐border legal issues, and the required assurances that copies in multiple locations are destroyed at the time of disposition or successfully “frozen” if a legal hold is required. The literature review indicates that good risk analysis and management coupled with information governance principles can help to guide organizations and cloud providers in assuring defensible retention and disposition in the cloud. There is a lack of a strong voice from records management professionals in cloud computing, and the research literature on retention and disposition in the cloud is still emerging. It is imperative that records managers work with their institutions to make clear that records, in any format and any location, must be scheduled for retention or disposition. Records that are allowed to escape the control of records managers and records schedules are records that could cost the enterprise money, reputation, or its own history. The use of cloud services is driven by a desire to improve processes and reduce costs while retaining control—without effective retention and disposition, it will accomplish neither.
References Ajero, M. (2012) Random access: Can We Trust Our Studio Materials to the "Cloud"? American Music Teacher, Vol. 62, No.1, pp 50‐51. [Online]. http://www.thefreelibrary.com/Random+access%3A+can+we+trust+our+studio+materials+to+the+%22cloud%22%3F‐a0299885768 ARMA International. (2010) Guidelines for Outsourcing Records Storage to the Cloud, Overland Park, KS. Askhoj, J., Sugimoto, S., and Nagamori, M. (2011) “Preserving Records in the Cloud,” Emerald, Vol. 21, No. 3, pp 175‐187. [Online]. http://dx.doi.org/10.1108/09565691111186858 Babcock, C. (2013) “Gartner: 50% of Enterprises Use Hybrid Cloud by 2017,” Information Week. [Online]. http://www.networkcomputing.com/cloud‐infrastructure/gartner‐50‐‐of‐enterprises‐use‐hybrid‐cloud‐by‐2017/d/d‐id/1111769 [21 May 2014]. Bailey, D. and Wu, J. (2012) “Seeing the Future of Cloud Computing Standards,” Cutter IT Journal, Vol. 25, No. 8, p 14. Blair, B. (2010) “Governance for Protecting Information in the Cloud,” Information Management, Vol. 44, No. 5, p. 1. Cohasset Associates. (2013) MoReq2010: EMC Data Domain Retention Lock Compliance Edition, Chicago, IL. [Online]. http://www.emc.com/collateral/analyst‐reports/cohasset‐dd‐retention‐lock‐assoc‐comp‐assess‐summ‐ar.pdf Cunningham, P. (2010) “IT's Responsibility for Security, Compliance in the Cloud.” Information Management, Vol. 44, No. 5. DoD 5015.02‐STD. (2007) Electronic Records Management Software Application Design Criteria Standard. [Online]. http://www.dtic.mil/whs/directives/corres/pdf/501502std.pdf Ferguson‐Boucher, K., & Convery, N. (2011) Storing Information in the Cloud – A Research Project. Journal of the Society of Archivists, Vol. 32, No. 2, pp. 221‐239, doi:10.1080/00379816.2011.619693 FINRA. (2013) Regulatory Notice 12‐29: Communications with the Public. [Online]. http://www.finra.org/web/groups/industry/@ip/@reg/@notice/documents/notices/p127014.pdf Franks, P. (2013) Records and Information Management, Neal‐Schuman, Chicago, IL. Géczy, P., Izumi, N., & Hasida, K. (2013) Hybrid Cloud Management: Foundations and Strategies. Review of Business & Finance Studies, Vol. 4, No. 1, pp. 37‐50. [Online]. http://search.proquest.com.libaccess.sjlibrary.org/docview/1445008520?accountid=10361 Greengard, S. (2013) Building Clouds that are Flexible and Secure. Baseline. [Online]. http://www.baselinemag.com/cloud‐computing/building‐clouds‐that‐are‐flexible‐and‐secure/ Hitachi Data Systems. (2013) Introduction to Object Storage and Hitachi Content Platform, Hitachi Data Systems, Santa Clara, CA. [Online]. http://www.hds.com/assets/pdf/hitachi‐white‐paper‐introduction‐to‐object‐storage‐and‐hcp.pdf
HP. (2014) HP Records Manager: A Single Solution for Enterprise‐Scalable Document and Records Management. [Data sheet]. [Online]. http://www.autonomy.com/odoc/assets/global/pdf/Products/Participate/records‐ manager/20140328_RL_HP_Records_Manager_EDRMS_Capabilities_web.pdf [24 July 2014]. InterPARES Trust. (2014) “Retention and Disposition in a Cloud Environment Literature Review.” InterPARES Trust, University of British Columbia, Vancouver, CA. International Organization for Standardization. (2001) ISO 15489‐1. Information and documentation—records management—Part 1: General, p. 3, 10. International Organization for Standardization. (2001) ISO 15489‐2. Information and documentation—records management—part 2: Guidelines. International Organization for Standardization. (2010) ISO 16175 ‐1:2010 – Information and documentation‐‐Principles and functional requirements for records in electronic office environments—Part 1: Overview and statement of principles. International Organization for Standardization. (2011) ISO 16175‐2:2011 – Information and documentation—Principles and functional requirements for records in electronic office environments—Part 2: Guidelines and functional requirements for digital records management systems. International Organization for Standardization. (2010) ISO 16175‐3:2010 – Information and documentation—Principles and functional requirements in electronic office environments—Part 3: Guidelines and functional requirements for records in business systems. International Organization for Standardization. (2006) ISO 23081‐1:2006 – Information and documentation—Records management processes—Metadata for records—Part 1: Principles. International Organization for Standardization. (2009) ISO 23081‐2:2009 – Information and documentation—Managing metadata for records—Part 2: Conceptual and implementation issues. International Organization for Standardization. (2011) ISO/TR 23081‐3:2011– Information and documentation‐‐Managing metadata for records—Part 3: Self‐assessment method. Li, J., Singhal, S., Swaminathan, R., and Karp, A.H. (2012) “Managing Data Retention Policies at Scale,” IEEE Transactions on Network and Service Management, Vol. 9, No. 4, pp. 393‐406. Muthulakshmi, V., Yaseen, A.A., Santhoshkumar, D., and Vivek, M. (2013) “Enabling Data Security for Collective Records in the Cloud,” International Journal of Recent Technology and Engineering Vol. 2, No.1, pp.163‐167. Nara. (2010, September 8) “Guidance on Managing Records in Cloud Computing Environments,” NARA Bulletin 2010‐05. [Online]. http://www.archives.gov/records‐mgmt/bulletins/2010/2010‐05.html [25 June 2014]. National Archives of the UK. (n.d.) [Online]. http://www.nationalarchives.gov.uk/information‐management/our‐services/dc‐what‐is.htm [21 May 2014]. Nicolaou, C., Nicolau, A., and Nicolau, G. (2012) “Auditing in the Cloud: Challenges and Opportunities,” The CPA Journal Vol. 82, No. 1 pp. 66‐70. Preservica. (2014, July 24) Digital Preservation Using Cloud Services. [webinar]. [Online]. http://preservica.com/resource/dp‐using‐cloud‐storage2/. [24 July 2014]. Rabinovici‐Cohen, S., Baker, M., Cummings, R., Fineberg, S., and Marberg, J. (2011, June) “Towards SIRF: Self‐Contained Information Retention Format,” The 4th Annual International Conference on Systems and Storage, Haifa, Israel, doi: 10.1145/1987816.1987836. Rayport, J., and Hayward, A. (2009) “Envisioning the Cloud: The Next Computing Paradigm,” Marketspace. [Online]. 
http://www.hp.com/hpinfo/analystrelations/Marketspace_090320_Envisioning‐the‐Cloud.pdf Smarsh. (n.d.). [Online]. http://www.smarsh.com/archiving‐and‐compliance. [24 July 2014]. Srinivasan, S. (2013) “Is Security Realistic in Cloud Computing?” Journal of International Technology and Information Management Vol. 22, No. 4, pp. 47‐66. Stuart, K. and Bromage, D. (2010) “Current State of Play: Records Management and the Cloud.” Records Management Journal, Vol. 20, No. 2, pp. 217‐225. Tang, Y., Lee, P.P., Lui, J.C., and Perlman, R. (2010) “FADE: Secure Overlay Cloud Storage with File Assured Deletion,” Security and Privacy in Communication Networks, Vol. 50, pp. 380‐397. University of British Columbia. (2013). Records in the Cloud (RiC). [Online]. http://www.recordsinthecloud.org/ritc/research [21 May 2014]. Viewpointe. (2013) Information Governance and Cloud Computing: Approaches for Regulated Industries [White paper]. [Online]. http://www.ciosummits.com/Information_Governance_and_Cloud_Computing_Approaches_for_Regulated_Industries.pdf Wang, H., He, W., & Wang, F. (2012) Enterprise Cloud Service Architectures. Information Technology & Management, Vol. 13, No. 4, pp. 445‐454. doi:10.1007/s10799‐012‐0139‐4. Western Australia, Government of, State Records Office. (2013) DA 2013‐017 General Disposal Authority for State Government Information. [Online]. http://www.sro.wa.gov.au/sites/default/files/general_disposal_authority_for_state_government_information.pdf [21, May 2014].
National Intelligence and Cyber Competitiveness: Partnerships in Cyber Space Virginia Greiman Boston University, USA
[email protected] Abstract: The paradox of national security and cyber competitiveness raises significant questions of balancing cybersecurity interests with national intelligence, corporate competitiveness, information sharing in cyberspace, and concern for the individual’s privacy rights and civil liberties. The expansion of cloud services and big data collection methods around the globe has led to increasing concerns about the privacy of individual users as a result of the United States’ increasing dependency on cyber systems for national defense and intelligence. Cyberspace operates in a dynamic paradigm, and the global interconnectedness of cyberspace enhances the magnitude of the acceleration of change and the need for international coordination and collaboration. Based on empirical research and an analysis of international and national cybersecurity initiatives and policies, this paper explores the advantages and the challenges of establishing a global legal and policy framework for cyber activity that balances national and private interests and that would enhance confidence and improve legal certainty in the global electronic marketplace. Much has been written about the legal rights and interests of government, private industry and individual users in cyberspace. However, relatively little has been written about how codes of conduct, standards and collaborative efforts can be used to structure advancement in technological knowledge for the benefit of all users and how these efforts can better prioritize the rights and responsibilities of each of the actors in cyberspace. The paper first presents the important research questions raised by public and private organizations and individuals concerning the current policy gaps in harmonizing the concerns of these sectors, then suggests areas for collaboration and benchmarks areas for improvement to bring together the pressing concerns of these stakeholders and better frame the decision space for cyberspace policy. Based on empirical research, this paper explores the unique characteristics of cyberspace and the existing legal concepts and doctrines for national intelligence and cyber security and organizes a conceptual legal structure for balancing the triumvirate of national security, corporate competitiveness and individual privacy. Keywords: cyber security, national intelligence
1. Introduction Based on empirical research and an analysis of international and national cybersecurity initiatives and policies, this paper explores the advantages and the challenges of establishing a global legal framework for cyber activity that advances the goals of national intelligence and technological cooperation and innovation, while protecting the competitive advantage of private business and the rights of the billions of users around the globe. The research seeks to emphasize the nature of cyberspace, its jurisdictional boundaries and the required controls and transparency necessary to advance opportunity while protecting the sacredness of individual privacy in surveillance societies. A pressing question for government policy makers and Congress is who bears the ultimate responsibility for a cybersecurity breach or attack. The scope of cyber responsibility and authority by private industry is unclear. Remarkably, if responsibility ultimately rests with private industry, the allocation has not been made by any clear direction of Congress, but incomprehensibly by default (Trope and Humes 2014). Private industry presently has little guidance on how to respond to many cyber threats and what the private sector’s role should be in the absence of clear policy, regulations and standards. Cyber‐security scholars have argued that, rather than thinking of private companies merely as potential victims of cyber‐crimes or as possible targets in cyber‐conflicts, we should think of them in administrative law terms, facing challenges similar to those modern administrative states address in fields like environmental, antitrust, products liability and public health law (Sales 2013). Understanding the problem in regulatory terms provides an entirely new methodology for thinking about cyber security. Scholars who have explored the evolution of best practices for developing international cybersecurity legal frameworks have recommended an on‐going process and a dynamic “bottom up” approach. Among the problems with taking a comprehensive approach (or "top down" approach) to cybersecurity legal frameworks is that the term "comprehensive" means all things to all people… the meaning varies depending on the physical, educational, and economic resources available in different jurisdictions (Satola and Judy 2011). Because cyber technology is in constant flux, the need to have the experts involved at every turn is critical in
staying current with the changing needs of the intelligence and cyber security communities. Finding solutions to balance the tensions between sharing information, the need for secrecy, and privacy and civil liberty concerns is an important goal of this research. During recent hearings of the statutorily mandated Privacy and Civil Liberties Oversight Board (PCLOB) in the United States, the need for technological expertise in fulfilling the national security responsibilities of all government branches, including the Executive Branch, Congress, the FISA courts and other federal and state courts under various constitutional provisions, federal and state statutes, regulations and policies, was frequently cited (PCLOB 2013). As detailed more fully below, the single greatest difficulty encountered thus far in the development of a legal response lies in the transnational nature of cyberspace and the need to secure international agreement for broadly applicable laws, best practices and standards controlling offenses in cyberspace.
2. Research Questions In the course of the research, the author has interviewed 15 experts on national intelligence and cyber security representing 5 government agencies, 3 non‐profits, and 7 technology companies engaged in the business of cybersecurity. In addition, relevant reports, press releases and articles issued by these organizations were reviewed, as well as transcripts from hearings concerning the challenges facing these organizations. Highlighted in Table 1 below is a summary of the research objectives as well as the key questions posed to the experts relating to the intersection, interconnectedness and interoperability of national intelligence, privacy rights, and cyber security.

Objective 1: To survey the national and international legal frameworks relevant to national intelligence and cyber security in the United States. Research question: Are the national and international laws effective in advancing the goals of national intelligence and of cyber security?

Objective 2: To analyze the impact of the National Institute of Standards and Technology (NIST) Framework for Improving Critical Infrastructure Cybersecurity on business competition. Research question: Does the NIST Framework properly address the concerns of private industry in promoting economic prosperity and innovation?

Objective 3: To determine the benefits and challenges of cyber security information sharing between governmental agencies, the government and the private sector, and the government and its citizens. Research questions: What information needs to be shared between the government and the private sector? How would it make a difference? How will the recipient use the information?

Objective 4: To assess the collaborations between the government and the private sector and the challenges and conflicts they face. Research question: What initiatives have been established by law enforcement and service providers to protect private users from surveillance in the cyber sphere?

Objective 5: To understand the implications of the government’s national security policy on the individual’s right to privacy. Research question: Do our laws properly address the individual’s right to privacy, including the right to understand the implications of the government’s national security policy?

Objective 6: To analyze the government’s current organizational structure and the conflicts that exist between the government’s interest in national intelligence and the private sector’s interest in remaining competitive. Research questions: What initiatives have been established to encourage inter‐agency coordination and have they been effective? Have the private sector competitive interests been considered in the government’s national intelligence policy?

Table 1: Research Questions
3. Methodology The objective of this research study was to learn about the perceptions of key participants in national intelligence and cyber security and the issues they felt were most pressing in terms of the conflicts and challenges that exist in their field of expertise and the solutions that might best address these concerns.
4. The Cyber Triumvirate: Conflicts and Collaborations The use of cyber technology by corporations, government agencies and private individuals for differing purposes creates overlapping responsibilities and interests that have resulted in conflicting goals and challenges. National intelligence has become entangled with the need for cyber security, which has become intertwined with corporate competitiveness concerns, which in turn have amplified the inadequacy of privacy protection. Although it is possible that each of these concerns could have been addressed separately by evolving laws and regulatory structures over the past decade, the law’s slow response has allowed each concern to increase in magnitude and complexity, while exacerbating concerns in previously unrelated areas. The resulting triumvirate of privacy, data breaches, and cyber‐security is now far more complex than each of the individual issues and dominates much of the discourse about cyber breaches and cyber security. The priorities of each of these actors create conflicts that result in competing concerns that cannot be easily reconciled. It
also underscores the need for collaboration to resolve universal problems and the need for better national intelligence and defense. Figure 1 diagrams this paradox as well as the various challenges and conflicts that exist among the competing actors that make up the triumvirate.
Figure 1: The Cyber Triumvirate
5. The National Intelligence Cyber Security Conundrum The events of September 11 fundamentally changed the context for relationships between the United States and other main global powers, and opened vast opportunities for cooperation. Since 9/11, the United States has seen a series of organizational and structural changes to address national security defense through direct and continuous action including the prevention of cyber‐attacks using all the elements of national and international power. In 2008 President Bush announced the Comprehensive National Cybersecurity Initiative (CNCI 2008), which heralded a coordinated government response. The following year, President Obama announced the results of his own cybersecurity review, identifying cybersecurity as one of the most serious economic and national security challenges we face (CPR 2009). In 2013 the White House issued Executive Order (EO) 13636, which seeks to establish a cyber‐environment that encourages efficiency, innovation, and economic prosperity. There is no doubt that counterintelligence problems within cybersecurity are, from the perspective of the government, the most significant problem nations face today. However, the governmental strategy to combat cyber‐attacks is complicated by a complex, interconnected intelligence infrastructure. The U.S. Intelligence Community (I.C.) is a coalition of 17 agencies and organizations, including the Office of the Director of National Intelligence (ODNI), within the Executive Branch that work both independently and collaboratively to gather and analyze the intelligence necessary to conduct foreign relations and national security activities. Each of these agencies has highlighted the increasing number of threats that are cyber‐based and the dangers they pose to our national and economic security. The White House Cybersecurity Coordinator, Michael Daniel, expressed the conundrum as follows: “Enabling transparency about the intersection between cybersecurity and intelligence and providing the public with enough information is complicated. Too little transparency and citizens can lose faith in their government and institutions, while exposing too much can make it impossible to collect the intelligence we need to protect the nation” (Daniel 2014).
6. The Major Legal Regimes Governing Cyber Threats Though discussing all the laws and doctrines applicable to cyberspace is beyond the scope of this paper, this section will provide a high level overview of the major legal systems governing cyber law in the United States. These systems include: (1) intelligence law for actions relating to cyber‐based national security threats, (2) privacy law for data protection and civil liberties rights, (3) international law for armed conflict and human rights violations, and (4) criminal law for cyber‐attacks and computer crime.
6.1 Intelligence Law Significantly, intelligence law emanates from the President’s inherent constitutional authority to respond to armed attacks including cyber‐attacks as part of his executive authority. Intelligence law serves to delimit the nature of intelligence activity that US officials may undertake, especially when operating domestically. In general, the purpose of the domestic law is to permit the exploitation of foreign intelligence sources while protecting American civil liberties. It also serves as a foundational source for the authorization of all national intelligence activity. There are few cyber‐specific laws with the exception of the Computer Fraud and Abuse Act (CFAA). There are, however, a number of authorization and limitation laws of general applicability governing national intelligence. The most important of these affecting cyber security are: (1) the United States Foreign Intelligence Surveillance Act (FISA); and (2) the statutory law creating the intelligence operations of the United States. A primary interpreter of the FISA law is the United States Foreign Intelligence Surveillance Court (FISC), also called the FISA Court, established and authorized under the Foreign Intelligence Surveillance Act of 1978 (Cohen and Wells 2004). The FISA Court oversees requests for surveillance warrants against suspected foreign intelligence agents inside the United States by federal law enforcement agencies. Such requests are made most often by the National Security Agency (NSA) and the Federal Bureau of Investigation (FBI). Its powers have evolved and expanded to the point that it has been called "almost a parallel Supreme Court." Under the Foreign Intelligence Surveillance Act (FISA), the government may apply for court orders from the FISA Court to, among other actions, require U.S. companies to hand over users’ personal information and the content of their communications. Civil liberties groups and large technology companies have asked the FISA Court, in several different cases, to roll back the secrecy that regularly shrouds all of its filings, decisions, and precedential law. The controversy over the secrecy of the FISA Court will continue unless and until the government determines that disclosure will not impact national security.
6.2 Data Privacy and Civil Liberties Concerns over data privacy are closely linked to the government’s national surveillance activities. In comparison to other countries, the US does not have a comprehensive overarching national data privacy regime and has instead addressed data privacy through sector‐specific legislation and regulations. Constitutional approaches have been adopted in the European Directives and the Council of Europe (CoE) Convention for the Protection of Individuals with regard to Automatic Processing of Personal Data. More than 80 countries have adopted privacy legislation including many countries in Europe, Latin America and the Caribbean, Asia and Africa (Greenleaf 2014). Threats to cybersecurity come from a number of sources, including outdated legal architecture that does not necessarily reflect or apply well to the Internet and a dissonance of policy and legislative approaches by countries that make international collaboration and cooperation on certain levels difficult. Data security protection in the United States emanates primarily from state legislation. As many as 46 states have data breach notification statutes for breaches of sensitive information (NCSL 2012). The Fourth Amendment, one of the Constitution’s greatest privacy protections, can often be a critical factor in whether or not law enforcement can obtain the evidence needed to secure a conviction. This is nowhere more evident than in cybercrimes, where forensic information contained on home or office computers or cell phones is often valuable evidence of criminal intent. A lack of consistency in privacy laws across jurisdictions makes monitoring compliance with regulatory requirements and assessing risk of non‐compliance difficult and expensive for the private sector service providers. While a fundamental legal right to data protection is recognized both in national laws and in certain regional legal doctrines, on the international level there is no binding data protection law. Instead, treaties such as the Universal Declaration of Human Rights of 1948 (UDHR) and the International Covenant on Civil and Political Rights of 1966 (ICCPR) have been relied upon to fill this gap. Microsoft, in its proposed 2010 Cloud Computing Advancement Act (CCAA), called for the reconciliation of the conflict of law issues in data protection through a multilateral framework by treaty or similar instrument.
6.3 International Law There has been much academic debate on how to define cyberspace as it has evolved into a recognized war‐fighting domain (Mudrinich 2012). There is an international consensus among scholars and the U.N. that cyber‐attacks may be an armed attack under the U.N. Charter even though such an attack is not explicitly mentioned in the Charter (Weissbrodt 2013). There is no international consensus on the definition of what constitutes an armed attack, even in the kinetic sphere. Generally, however, such an assessment looks to the scope, severity,
immediacy, and directness of the use of force at issue (Schmitt 2011). Even if it can be determined that a cyber‐attack is an armed attack, one must next resolve whether the attack can be attributed to a nation‐State. In the cyber world, attribution will often require crossing several sovereign boundaries, and if responsive force is to be used, the actions taken will occur within some other State’s territory. But, in the cyber realm, a State is often not directly involved in the cyber‐attack; the attack is the product of independent or semi‐independent actors. A cyber‐treaty or code of conduct containing normative rules would help define the actions that would constitute a cyber‐threat. Scholars have viewed the need for a cyber‐treaty through different lenses. Some scholars argue that a cyber‐treaty would complement and clarify existing regulations regarding the use of force in the U.N. Charter and customary international law, but that a better option is to focus on developing state practice in a rational way to help develop new customary international norms (Handler 2012). Others contend that a treaty which bans the use of cyber‐attacks or limits their use is not realistic because there is currently no way to ensure compliance (Shackelford & Andres 2011).
6.4 Criminal Law The complexity of developing a cybercrime regime is best demonstrated by the fact that the U.S. government can charge cyber related crimes under more than forty‐five different federal statutes. The most prominent of these statutes is the Computer Fraud and Abuse Act (CFAA), which makes it a crime to access a protected computer without authorization. Other statutes that must be analyzed include the Wiretap Act, the Electronic Communications Privacy Act, the USA PATRIOT Act superseded by the current Foreign Intelligence Surveillance Act (FISA) of 2008, and the Stored Communications Act (SCA), which requires a warrant for police to search messages that have been in storage for 180 days or less. In addition to federal statutes, in 1978, state legislatures began enacting computer crime statutes. Since then, every state has enacted some form of computer‐specific legislation (Kleindienst 2009). Prosecution of cybercrimes under state statutes continues to increase (Perry 2006). One of the most important efforts at harmonization of national laws in the cyber law regime is the enactment of the first international treaty on internet crimes, the Council of Europe Convention on Cybercrime. Of the fifty‐one signatories to the Convention, thirty‐nine have ratified it, including the United States. The Cybercrime Treaty has received criticism from the outset, with some arguing that its provisions are too weak, while others claim it raises constitutional privacy concerns under the Fourth Amendment (Lessig 1995). Others have praised the law as providing the best legal framework for the international community (Crook 2008), and for recognizing the need for a mechanism allowing law enforcement to investigate offenses and get evidence efficiently, while respecting each nation’s sovereignty and constitutional and human rights (Rustad 2009).
6.5 Storm Clouds Ahead The broad definition of cyber law and security raises global concerns about the massive amount of data secured in the cloud. Cloud computing is a style of computing in which infrastructure or software is provided to a user as a service through the Internet (NIST). By its nature the cloud permits the creation of systems with different trust levels at different tiers of interaction. Cloud technology is emerging as one of the largest potential areas for cyber threats. These concerns have been raised by many governmental and other organizations, including the Department of Homeland Security (DHS), the National Security Agency (NSA), the Federal Bureau of Investigation (FBI), the Federal Trade Commission, the Council of Europe, and the Organization for Economic Cooperation and Development (OECD), yet no universal solution has been developed. A few of the most pressing challenges for cloud security are: (1) severely underdeveloped legal frameworks to deal with cloud management; (2) ambiguity as to what state, federal and international civil and criminal laws are relevant to the cloud; (3) the absence of a treaty framework to address the broad potential reach of cloud computing; and (4) the lack of an overarching national and international data privacy regime.
6.6 Information Sharing The difficulty of any regulatory approach or any approach with a strong governmental input can readily be seen in the challenges we face in the comparatively simple world of information sharing. One would think that identifying and communicating about new cyber threat developments would be relatively simple to achieve. It is not. Everyone purports to believe that we need more information sharing but no consensus exists on precisely what that means, or whether it would truly be effective. What information needs to be shared by the
government with the private sector and what from the private sector should be shared with the government? How would it make a difference? And of course, how will the recipient use the information? There are huge disincentives to reporting cyber intrusions. If, for example, a company in the technology industry finds out that it has a vulnerability, then public disclosure of the fact will convert the vulnerability into a stock valuation problem. Governments have viewed information sharing as a way to gain knowledge from the private enterprises which own and operate large critical infrastructures to assist in intelligence gathering. The effects of globalization and the inherent conflicts of interest create a potential for allowing the process to be used as a weapon of information warfare through abuse of the information sharing structure (Ryan 2012). This knowledge includes information on vulnerabilities and attacks experienced by the owners and operators of critical infrastructures. In 2013 President Obama issued Executive Order 13636, Improving Critical Infrastructure Cybersecurity. There are three key elements to the EO: (1) privacy and civil liberties, (2) information sharing, and (3) a cybersecurity framework. In compliance with the EO, the U.S. Department of Commerce, National Institute of Standards and Technology (NIST), recently issued a Preliminary Cybersecurity Framework which adopts standards and best practices that address all of the key elements (NIST Framework 2013). The Framework references technical controls, but it also provides a risk management approach that includes resilience and is not focused solely on protection. Models that encourage not just information sharing but problem solving, and that provide incentives and reduce liability, would be well received in the technology and innovation communities. Beyond domestic partnerships, governments, scholars and professionals worldwide agree that if we are going to reduce or eliminate illegal activity in cyber space, we must join together the efforts of the public and private sectors to bridge the gap between national policy‐making and the operational realities on the ground. Agencies including the FBI and INTERPOL, and private corporations such as Microsoft, Google, eBay, and American Express, have been active in building information sharing alliances. The important question to be analyzed is not whether there should be public‐private partnerships but what form these partnerships should take.
6.7 International Standards and Incentives Notably, despite efforts by the technical community, national and inter‐governmental organizations, and international standards organizations, there has been little movement on legislation for international cyber security standards. More can be done through the UN to devise international cyber norms and rules. Diplomatic initiatives like the OECD’s Financial Action Task Force have used international standards to cause countries to improve their responses to money‐laundering issues. A similar effort such as a Cybersecurity Action Task Force might yield some results in the cyber domain (ABA 2009). Some of the more significant efforts in developing international standards have emanated from the U.S. Department of Homeland Security, the U.S. Department of Commerce, the European Commission, and several prominent organizations, including the National Institute of Standards and Technology (NIST), IEEE, the Cloud Security Alliance (CSA), the Organization for Economic Cooperation and Development (OECD) and the International Telecommunication Union (ITU). These efforts are important because they can serve as models for treaties, future national legislation, international agreements and private contracting. Standards have proven to be a successful starting point for major initiatives. They should be reviewed and prioritized by all nations interested in finding agreement and developing a framework for harmonization of the law, clarifying the role of the provider and the required technical capabilities, and reducing conflict among countries, individuals, and public and private organizations interested in advancing the economic and social benefits of sound cyber policy.
7. Frameworks to Improve the Paradox of National Intelligence, Corporate Competitiveness and Privacy
Based on an analysis of the current cyber security legal paradox and the research conducted, Table 2 summarizes the recommended cyber security frameworks and the actors and effort required to implement them.
Framework: Amendment of Section 215 of the Patriot Act and Section 702 of FISA to allow for more transparency and disclosure of the decision process. Actor(s) and effort required: Congress should require written opinions by the FISA Courts on all decisions to build trust and understanding and to create precedent for future decisions, with limited exceptions for national security threats.

Framework: Multilateral Treaty on Data Protection to guarantee universal protection of data consistent across borders. Actor(s) and effort required: A multilateral data protection treaty has been proposed in various bills in the U.S. Congress but never acted upon, including Microsoft’s 2010 Cloud Computing Advancement Act. The reform of the Electronic Communications Privacy Act (ECPA) by Congress to define and provide stronger protections for individuals and businesses would be a good starting point, followed by the expansion of the role of the Privacy and Civil Liberties Oversight Board (PCLOB), which at present has limited authority. A multilateral treaty could then be developed through a working group of the United Nations.

Framework: International Cyber Security Standards to allow for clarification of the role of private sector operators and the specific technical capabilities required. Actor(s) and effort required: Congress should enact minimal standards in cooperation with private industry and standards organizations, as has been done with environmental laws, nuclear regulatory regimes and law enforcement, to provide for clear responsibilities of the private sector in assisting government and law enforcement in cyber security investigations and enforcement. At present there are few guidelines and limited direction from the federal government, creating inconsistent responses and actions from the private sector.

Framework: Amendment of Executive Order 13636 to provide for: (1) the creation of new entities for information sharing; (2) exemption of the private sector from liability stemming from mandated information sharing; and (3) incentives to encourage voluntary sharing. Actor(s) and effort required: Information sharing is critical to all aspects of cyber security and national intelligence, and the federal government is dependent upon the private sector for technological innovation and knowledge. Congress should consider incentives such as tax credits, subsidies, or financing of technological research and education to enhance technological advancement in cyber security while limiting the liability of the private sector for mandatory information sharing.

Framework: Development of statutorily mandated performance criteria and data collection. Actor(s) and effort required: As evidenced by recent hearings of the Privacy and Civil Liberties Oversight Board (PCLOB) and the issuance of the Cybersecurity Framework by the National Institute of Standards and Technology (NIST), Congress does not have the information it needs to determine whether we have been effective in reducing intrusions into America’s infrastructure and whether international cooperation has improved. Statutorily mandated performance criteria would assist in improving cooperation and measuring success.

Framework: Global Partnerships to advance collaborations between public and private organizations to protect against serious intrusions, national threats to security and cross‐border criminal activity. Actor(s) and effort required: Partnering has been an effective tool in the United States to advance knowledge about attribution and detection of cyber security threats; however, global partnerships presently are limited, particularly in the areas of intrusion detection, law enforcement assistance and technical research. Global partnerships should be developed and expanded through public/private networks and alliances similar to the ICANN Alliance for identification and protection of Internet and domain property rights, the International Cyber Security Protection Alliance (ICSPA), the European Network and Information Security Agency (ENISA) and the International Criminal Police Organization (INTERPOL). Global partnerships would provide the government with the ability to select and implement best practices worldwide.
Table 2: Recommended Cyber Security Frameworks
8. Conclusion Cyber threats have created a paradox among the goals of national intelligence, cyber security and privacy rights due to technological convergence and infinite jurisdictional issues. Scholars agree that how this paradox is managed has serious legal implications in the United States and around the world. There are new laws and standards being proposed that could change the responsibilities of both the government and the private sector actors. This paper explored and delineated where frameworks are needed to advance the goals of the public and the private sectors through cyber partnerships. The next step is for the United States to take the lead and work with other countries to develop an international framework based on minimally acceptable standards, and where feasible enshrine these in international treaties. Balancing the interests of national security, corporate competitiveness and privacy requires a well‐planned comprehensive framework that will foster a better cybersecurity policy to address the diverse interests that must co‐exist in cyberspace.
References (ABA) American Bar Association (2009) National Security Threats in Cyberspace Post‐Workshop Report, A workshop jointly conducted by American Bar Association Standing Committee on Law and National Security and national Strategy Forum, Part of the McCormick Foundation Conference Series, September. Cohen, D. B. and Wells, J. W. (2004) American National Security and Civil Liberties in an Era of Terrorism, Palgrave Macmillan, New York, p. 34. Crook, J.R. (2008) “Contemporary Practice of the United States Relating to International Law: U.S. Views on Norms and Structures for Internet Governance”, The American Society of International Law, American Journal of International Law, 102 A.J.I.L. 648, 650. (CNCI) Comprehensive National Cybersecurity Initiative, March 2, 2010, National Security Presidential Directive 54/Homeland Security Presidential Directive 23 (NSPD‐54/HSPD‐23), January 8, 2008. Daniel, M. (2014) Heartbleed: Understanding When We Disclose Cyber Vulnerabilities, The Whitehouse Blog, Michael Daniel, Special Assistant to the President and The Cybersecurity Coordinator, April 28, accessible at http://www.whitehouse.gov/blog/2014/04/28/heartbleed‐understanding‐when‐we‐disclose‐cyber‐vulnerabilities. (EO) Executive Order 13636: Improving Critical Infrastructure Cybersecurity Incentives Study Analytic Report, June 12, 2013. (EO) Executive Order 13231 on Critical Infrastructure Protection, October 16, 2001. Greenleaf, Graham (2014) "Global Data Privacy Laws: 89 Countries, and Accelerating," Social Science Electronic Publishing, Inc., February 16. Handler, S.G. (2012) “The New Cyber Face of Battle: Developing a Legal Approach to Accommodate Emerging Trends in Warfare,” 48 Stan. J. Int’l L. 209, 216‐19. Kleindienst, K., Coughlin, T. M., and Paswuarella, J.K. (2009) “Computer Crimes”, American Criminal Law Review, 46 Am. Crim. L. Rev. 315, 350. Lessig, Lawrence (1995) “The Path of Cyberlaw”, 104 Yale L.J. 1743, 1743‐45. Mudrinich, E.M. (2012) “Cyber 3.0: The Department of Defense Strategy for Operating in Cyberspace and the Attribution Problem,” 68 Air Force Law Review 167. (NCSL) National Conference of State Legislators (2012). State Security Breach Notification Laws, August 20, Denver: CO. (NIST) National Institute of Standards and Technology (2011) “The NIST Definition of Cloud” Special Publication 800‐145, U.S. Department of Commerce, Gaithersburg: MD, September. (PCLOB) Privacy and Civil Liberties Oversight Board Public Hearing (2013). Consideration of Recommendations for Change: The Surveillance Programs Operated Pursuant to Section 215 of the USA PATRIOT Act and Section 702 of the Foreign Intelligence Surveillance Act, Washington: DC, November 4. Perry, S.W., (2006) Bureau of Justice Statistics, DOJ, National Survey of Prosecutors. Rustad, M. (2009) Internet Law in a Nutshell, Thomson Reuters, St. Paul, MN. Ryan, J. (2012) “Use of Information Sharing Between Government and Industry as a Weapon,” Leading Issues in Information Warfare & Security Research, Vol. 1, Reading: UK Academic Publishing International Limited, First Published in the Proceedings of the 2nd International Conference on i‐Warfare and Security, 2007. Sales, N.A. (2013) “Regulating Cyber‐Security,” 107 Nw. U.L. Rev. 1503. Shackelford, S.J. and Andres, R.B. (2011). “State Responsibility for Cyber Attacks: Competing Standards for a Growing Problem,” 42 Geo. J. Int’l L. 971, 995‐96. Schmitt, M.N. (2011) “Cyber Operations and the Jus Ad Bellum Revisited”, 56 Vill. L. Rev. 569, 576‐77. Satola, D. and Judy, H. L. 
(2011) “Electronic commerce law: towards a dynamic approach to enhancing international cooperation and collaboration in cybersecurity legal frameworks: reflections on the proceedings of the workshop on cybersecurity legal issues at the 2010 united nations internet governance forum,” 37 Wm. Mitchell L. Rev. 1745. Trope R.L. and Humes, S.J. (2014) “Before rolling blackouts begin: briefing boards on cyber attacks that target and degrade the grid,” 40 Wm. Mitchell L. Rev. 647. Weissbrodt, D. (2013) “Cyber‐Conflict, Cyber‐Crime, and Cyber Espionage”, 22 Minn. J. Int’l L. 347. Zagaris, B. (2009) “European Council Develops Cyber Patrols, Internet Investigation Teams and Other Initiatives Against Cyber Crime”, International Enforcement Law Reporter, Cybercrime; Vol. 25, No. 3.
Preservation as a Service for Trust: An InterPARES Trust Specification for Preserving Authentic Records in the Cloud Adam Jansen School of Library, Archival and Information Studies, University of British Columbia, Vancouver, Canada
[email protected]
Abstract: Major issues of trust, authenticity and custodial obligations regarding the management and storage of Internet‐ based records have yet to be resolved, but the adoption of new Cloud based technologies will not wait for legal and regulatory systems to catch up. There exists a strong need for clearly articulated requirements describing effective, tested methods for maintaining the authenticity of records that are removed from their system of creation and placed into the care and custody of Cloud Service Providers. As part of the InterPARES Trust research project, Preservation as a Service for Trust (PaaST) is developing a set of preservation services that detail those actions, business rules and necessary metadata that provide supporting evidence of the authenticity of records entrusted to the Internet. Keywords: Trust; Cloud Computing; Privacy; Access; Digital Preservation; InterPARES
1. Introduction The rapid increase in network bandwidth combined with a precipitous drop in the price per gigabyte of storage devices has presented new commercial opportunities for economy‐of‐scale savings by leveraging computer resources across multiple organizations in a centralized data center. By utilizing a common infrastructure in a shared tenancy environment, an Internet‐based service model (i.e. the Cloud) offers organizations the potential for lower upfront costs, decreased technical staffing requirements, and easy expansion on demand. Given these potential monetary incentives, large numbers of both public and private organizations are turning to Cloud Service Providers (CSP) to create, store and access vast amounts of records in the highly networked, and some would argue easily compromised, environment of the Internet. Many of the organizations that are switching to CSPs are institutions in which the public places an immense amount of trust to protect their often personal and sensitive records ‐‐ such as banks, public utilities, hospitals and government. Yet these public trust organizations often migrate their records from internally managed data centers to offsite CSPs without fully understanding the implications that such a paradigm shift entails. Basic records management questions ‐‐ such as where the records are being stored, how the organization’s records are being managed, whether the hosted records are even manageable, who is responsible for their management, and how long the records will remain accessible ‐‐ often go unasked. Given that Cloud storage itself is still in the early stages of development and definition, providers may not even have considered those questions, let alone be able to provide answers. As the Internet is, by definition, a global system of interconnected computer networks, records placed in the Cloud can, and often do, reside across multiple juridical boundaries. The legal liability for security breaches and malfeasance in massively distributed, multi‐tenant environments is currently unclear, as are the disclosure, subpoena and access rules, regulations, and laws regarding the records being stored. A major area of concern when relying upon CSPs to store important records of legal value is whether those records can be trusted once they leave the control of the creating organization (Duranti, Rogers 2012). If those records are needed again, will sufficient documentation detailing the chain of custody of those records ‐‐ from the time they left the organization and were transferred to the CSP, along with all internal movement, modification, access and relevant CSP policies and procedures ‐‐ be accessible, or even created? Who is responsible for producing the needed documentation? Who owns the documentation produced? To address these issues, records creators and CSPs must work together to confirm that the appropriate mechanisms are in place to ensure that those records that are transferred from one to the other remain, and can be proven to be, authentic.
2. Authenticity of digital records in cyberspace In Archival Science, authenticity encompasses the whole of the context in which the records were created, managed, accessed and stored from the moment of their creation throughout their entire existence. Authenticity is of particular importance with regards to records, a specific sub‐class of data defined
(InterPARES2) as “a document made or received in the course of a practical activity as an instrument or a by‐product of such activity, and set aside for action or reference”. Records are granted a special allowance under the hearsay rules (Federal Rules of Evidence 2013 s VIII, 803(6)) that allows them to be submitted as evidence of the activity that created them, regardless of the format upon which they are stored (i.e. paper, photographs, microfilm or digital). In order for a record to serve as a faithful witness to the activities in which it participates, however, that record must be authentic; that is, it must be what it purports to be and free from manipulation, substitution or falsification. A presumption of authenticity is afforded to a record when it can be shown that it was created in the usual and ordinary course of business to serve the administrative needs for which it was created. Authenticity can be ascertained by establishing the identity of that record and by demonstrating its integrity. The identity of a record is derived from those attributes that uniquely characterize and distinguish it. These attributes of identity allow the examiner to differentiate one record from another based on the parties involved, time and location of creation and use, why and how it was created, and so forth. The integrity of a record concerns its wholeness and soundness; that is, that the record possesses all of its necessary parts and its condition is unimpaired (MacNeil). In this sense, integrity goes beyond the traditional IT concept of being unchanged in the underlying digital bit order. Rather, integrity refers to the degree to which the message for which the record was created is capable of being conveyed over time, complete and unaltered. This presumption of authenticity is strongly influenced by the facts of its creation, handling and custody. When moving a record across space (e.g. transport across a network) or through time (e.g. setting it aside for later retrieval), the essence of the record must not be altered in the process. Retaining authenticity past its creation requires that the record be created, managed and stored in accordance with regular, documented procedures that can be attested to (Eastwood 1994) through an unbroken chain of custody. Providing evidence of the methods used to transmit records across space and through time becomes increasingly important when the record is removed from its original system of creation, as such movement frequently alters or drops important supporting metadata. The stronger the procedures of creation and handling, the stronger the presumption of authenticity that can be afforded to a record (Authenticity Task Force 2002). Conversely, when a record is transmitted without documenting the procedures or chain of custody, it becomes nearly impossible to provide sufficient evidence of the identity and integrity to support its authenticity. Once the presumption of authenticity that is afforded to a record is compromised, it is difficult, if not impossible, to reassert.
3. Trust and digital records in an increasingly networked society To explore these issues regarding the trustworthiness of digital records and data uploaded to Internet‐based environments, the InterPARES Trust (ITrust) research project under Project Director Dr. Luciana Duranti of the University of British Columbia was funded by the Social Sciences and Humanities Research Council of Canada (SSHRC) with the goal (InterPARES Trust 2013a) to: …generate the theoretical and methodological frameworks that will support the development of integrated and consistent local, national and international networks of policies, procedures, regulations, standards and legislation concerning digital records entrusted to the Internet, to ensure public trust grounded on evidence of good governance, a strong digital economy, and a persistent digital memory. The team of researchers working on ITrust represent universities and institutions, both public and private, from around the globe with expertise in archival science, records management, diplomatics, law, information technology, communication and media, e‐commerce, health informatics, cybersecurity, information governance and assurance, digital forensics, computer engineering, and information policy. The researchers are organized into five regional teams ‐‐ North America, Latin America, Europe, Asia and Multinational Organization – that will each undertake a specific stream of research that is germane to their collective expertise and geo‐political environment. Among the larger institutions involved in the project are: British Library, European Commission, International Federation of Red Cross and Red Crescent Societies, International Monetary Fund, International Records Management Trust, National Institute of Standards and Technology, NATO, UNESCO, University of British Columbia, University College London, and University of Washington. To provide a common foundation across all the teams, the research project will build upon the previous three phases of InterPARES research (1998‐2012) by testing and extending those findings through extensive case study, literature and legislative review, and new exploratory research.
ITrust research has been organized into five broad topic domain areas determined to be of particular importance when creating, managing or storing digital records within an Internet‐based environment, whether organizationally or third party hosted. This domain identification and division facilitates targeted research and analysis in the following areas of concern:
Infrastructure: relating to system architecture as it affects those records that are created, managed, or stored in Internet‐based environments. Examples of areas to be studied under this domain include: types of Internet‐based environments and their reliability (e.g. public and private Clouds); types of contract agreements (e.g. service level agreements or SLAs) and their coverage, flexibility and modification; and the cost of services, both upfront and hidden.
Security: relating to the protection of Internet‐based records. Examples of areas to be studied under this domain include: security technologies (e.g. encryption, sharding, obfuscation); data breaches, prevention and handling; cybercrime; risks associated with shared tenancy and third party providers; information assurance; auditability and conduct of audits; and backup policies.
Control: relating to the management of Internet‐based record environments. Examples of areas to be studied under this domain include: authenticity, reliability and accuracy of online records; integrity metadata; chain of custody through social and technological barriers; retention and disposition of records within an online environment; transfer to and retrieval from online environments; intellectual and access controls provided by online systems.
Access: relating to open access and/or open data stored online. Examples of areas to be studied under this domain include: balancing and enforcing the right to know vs. right to be forgotten; ensuring privacy; accountability; and transparency of actions and services.
Legal: relating to juridical issues that arise from the creation, storage, management and use of records online. Examples of areas to be studied under this domain include: the application of legal privilege (including the issue of extra‐territoriality); documenting and enforcing legal holds; providing a chain of evidence; authentication of Internet‐based materials provided as evidence at trial; certification of records stored online; and those soft laws (e.g. UN standard‐setting instruments) that have an impact on records.
4. Preservation Services for Online Environments With the adoption rate of Cloud services outpacing legislation and case law, there exists a strong need for a clearly articulated set of functional requirements defining records‐related services that support the presumption of authenticity within an online environment. Under the ITrust Control Domain, Preservation as a Service for Trust (PaaST) seeks to develop a preservation model that expresses actions and attributes capable of supporting the authenticity of records that are created, managed or stored within Internet‐based environments. The purpose of PaaST (InterPARES Trust 2013b) is to: …provide insight and guidance to both those who entrust records to the Internet and those who provide Internet services for the records. The project will address relevant requirements, insights and concerns developed in other ITrust projects to enrich and strengthen its models. To provide a strong foundation for the proposed preservation services, the PaaST project team (comprised of researchers Dr. Luciana Duranti, Dr. Kenneth Thibodeau, Adam Jansen, Courtney Mumma, Daryll Prescott, and Corrine Rogers, and Graduate Research Assistants Mel Leverich and Shyla Seller) will leverage the Chain of Preservation (CoP) model developed by InterPARES2 (Eastwood et al., 2008). The CoP model stipulates that preservation activities begin with the creation of the record and must be continuously managed throughout the lifespan of that record. As a record moves from creation to active and then inactive stages of its lifecycle, the actions and attributes that are needed at a specific stage to support the record’s authenticity also change. To reflect this changing nature, PaaST has identified five distinct preservation services to be modelled:
Preservation Receive Service – to ensure that a set of records transferred is complete and intact, and is in compliance with any agreements that are in force between the transferring party and the receiving preservation party (e.g. a Cloud Service Provider).
Record Characterization Service – to capture, report and make available those key attributes that characterize the records being transferred (including those that associate digital objects to a single record or aggregates of records, enable the assessment of identity or integrity, or are necessary for ongoing preservation of the records).
Authenticity Assessment Service – to capture and make available those authenticity attributes for comparison to other records as well as to set criteria of authenticity.
Preservation Storage Service – to capture, report and make available those attributes concerning the storage of the records, the movement within the storage system, and the replacement or upgrade of storage media and related technologies.
Preservation Transformation Service ‐ to capture, report and make available those attributes related to the migration, conversion or transformation of the digital objects that constitute a preserved record or the software used to translate the digital bits into a human readable form.
Each service will consist of a set of uniquely numbered functional requirements for the proposed service, the pre‐conditions that must exist before the service can operate, a main workflow detailing the sequential normal operation of the service, and a listing of alternate flows that address the various error and exception handling conditions that can arise. Each functional requirement will detail an operation that the service must provide and those attributes that it must record. Operations and attributes will follow a standardized naming convention that supports access and reusability across other operations and services. To promote the adoption and implementation of the preservation services being developed, ITrust is working with the Object Management Group (OMG) to advance PaaST as an OMG specification and eventual submission as an international standard.
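To make the intended structure of a service definition concrete, a minimal sketch follows. It is illustrative only: the class, operation and attribute names are hypothetical rather than taken from the PaaST requirements, and Python is used merely as convenient notation for what will ultimately be expressed as OMG UML models.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class FunctionalRequirement:
    """One uniquely numbered requirement: an operation plus the attributes it must record."""
    req_id: str                     # e.g. "PRS-01" (hypothetical numbering scheme)
    operation: str                  # operation the service must provide
    recorded_attributes: List[str]  # attributes the operation must capture

@dataclass
class PreservationService:
    """A service = pre-conditions + numbered requirements + main and alternate workflows."""
    name: str
    preconditions: List[str]
    requirements: List[FunctionalRequirement]
    main_flow: List[str]                                       # sequential normal operation
    alternate_flows: List[str] = field(default_factory=list)   # error/exception handling

# Hypothetical illustration of one requirement of a Preservation Receive Service
receive_service = PreservationService(
    name="Preservation Receive Service",
    preconditions=["A submission agreement is in force between the transferring and preserving parties"],
    requirements=[
        FunctionalRequirement(
            req_id="PRS-01",
            operation="VerifyTransferCompleteness",
            recorded_attributes=["TransferIdentifier", "ExpectedObjectCount", "ReceivedObjectCount"],
        )
    ],
    main_flow=["Receive transfer", "Verify completeness and integrity", "Check compliance with agreement"],
    alternate_flows=["Incomplete transfer: notify the transferring party and suspend ingest"],
)

if __name__ == "__main__":
    print(receive_service.name, "-", receive_service.requirements[0].req_id)
```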
5. Object Management Group and Standardization The OMG is an international, non‐profit, open membership technology standards consortium whose mission is: …to develop, with our worldwide membership, enterprise integration standards that provide real‐world value. OMG is also dedicated to bringing together end‐users, government agencies, universities and research institutions in our communities of practice to share experiences in transitioning to new management and technology approaches like Cloud Computing (Object Management Group 2014). To support its mission, OMG hosts organizations such as the Cloud Standards Customer Council (CSCC) and the Consortium of IT Software Quality (CISQ) at its quarterly technical meetings to increase exposure to specifications in process, to foster cross‐sector collaboration and to encourage knowledge sharing between the organizations. The specification development process employed by OMG differs from most other standards bodies in that OMG has a “No Shelf‐ware” policy; that is, all specifications that go through the OMG approval process must have a working product that validates the specification and guarantees that the specification is immediately usable upon publication. Support for the specification continues after its publication as well, with OMG producing educational and training seminars, workshops, certification processes and books. Amongst its better known published specifications are several modeling languages that are widely used in software and systems development, including: Unified Modeling Language (UML), Systems Modeling Language (SysML) and Model Driven Architecture (MDA). Maintaining a strong relationship with both public and private sector organizations allows smaller vertical industry‐oriented standards bodies, consortia and groups (such as ITrust) to work with the OMG to develop the metamodels, Application Program Interfaces (APIs) and specifications that are designed by, and meant for, specific industries. While OMG has been producing highly technical and widely used specifications (such as UML) that cross multiple industries, as a group OMG relies upon input and suggestions from new consortium members to address emerging challenges that affect a specific sector or cut across many sectors – such as the challenges of storing and preserving authentic digital records online that ITrust is researching. To foster a cross‐pollination of sector‐specific knowledge and experience, OMG maintains a membership exchange with industry organizations by way of reciprocal membership agreements with groups such as: Association for Information and Image Management (AIIM), Open GIS Consortium, Integrated Justice Information Systems (IJIS) Institute, and World Wide Web Consortium (W3C). One of the benefits of ITrust working through OMG to develop PaaST as a published specification is that, in addition to the professional and industry organizations, OMG maintains a close relationship with several other global specification and standards bodies. Working closely with these organizations helps to reduce the redundancy, and occasional conflict, that could result if multiple organizations publish similar specifications and/or standards. To reduce the potential for overlap, OMG maintains a formal liaison procedure with groups
such as: International Organization for Standardization (ISO), European Computer Manufacturers Association (ECMA), Institute of Electrical and Electronics Engineers (IEEE), and two Accredited Standards Committee (ASC) committees ‐‐ X12 (electronic data interchange) and T1M1 (network management). Of particular importance to the ITrust project is the special relationship OMG has with ISO. OMG‐approved specifications are recognized by ISO as Publicly Available Specifications, allowing them to be fast‐tracked directly onto a final ballot for approval by the ISO Committee on Information Technology Standards (ISO/IEC JTC1).
6. Next Steps for PaaST The suite of preservation services being modelled by PaaST is still in the early stages of an estimated two to three year development and approval cycle. The initial drafts of the functional requirements and workflows have been created for three of the services (Preservation Receive, Record Characterization and Authenticity Assessment), with the remaining two (Preservation Storage and Preservation Transformation) in the process of being scoped and defined. While the preservation services continue to be expanded and revised, other projects and domains within ITrust will provide feedback on the work product and, based on research in their own areas, propose additional preservation services as the need is identified. Once the ITrust team review is complete, the services developed by PaaST will be formatted into a single OMG Request for Proposal (RFP) and brought before the OMG’s Government Information Sharing and Services Domain Task Force (GovDTF) for review. The intent of an OMG RFP is the articulation of all the information needed by a software development team to create a functional application that performs all the operations detailed within the functional requirements ‐‐ i.e. make software that does everything it is supposed to do. As stated in The OMG Hitchhiker’s Guide (Object Management Group 2008), an RFP is: … a statement of industry need and an invitation to the software supplier community to provide a solution, based upon requirements stated within. The process of identifying need is a culmination of experience within an OMG technical group…and solicitation of industry recommendation. While the RFP is not prescriptive in the sense of dictating how the solution is presented, it does provide guidelines – requirements – that again are derived from the sources noted above. The PaaST RFP will contain the aforementioned functional requirements, pre‐conditions, and main and alternate workflows for each of the preservation services, along with a UML Class diagram of all the methods and attributes described in the functional requirements and other supporting material deemed helpful to implementers. If approved by GovDTF, the RFP will be issued and any OMG member organization may submit a solution for evaluation. Based on the submissions received, the RFP will then be forwarded for OMG approval as a specification, or further revised to address any issues or concerns that arise.
7. Conclusion The goal of ITrust is to generate the theoretical and methodological frameworks that will support the development of an integrated network of policies, procedures, regulations, standards and legislation that can be consistently applied across juridical boundaries in order to ensure public trust grounded on evidence of good governance, a strong digital economy, and a persistent digital memory. In support of that goal, the Preservation as a Service for Trust (PaaST) project is articulating a set of preservation services that support the presumption of authenticity of records entrusted to the Internet. These services will detail those actions and attributes to be documented as records are moved across space, such as transmitted from the creating party to the Internet‐based preserving party, or through time, such as being stored in the Cloud. By integrating the preservation services proposed by PaaST into an Internet‐based storage environment, the record keeping system will create and maintain important metadata that allows the identity of a record to be established and its integrity demonstrated within a documented chain of custody. To ensure that these preservation services are capable of being integrated into existing Internet‐based environments, ITrust is working with the Object Management Group (OMG) to create a working prototype of the PaaST services for evaluation and, if found to be accurate, complete and implementable, approval by OMG as a Publicly Available Specification.
References
Authenticity Task Force (2002) Authenticity Task Force Final Report, [online], InterPARES, http://www.interpares.org/display_file.cfm?doc=ip1_atf_report.pdf.
Duranti, L. and Rogers, C. (2012) “Trust in Digital Records: An Increasingly Cloudy Legal Area”, Computer Law & Security Review, Vol. 28, No. 5, pp. 522‐531.
Eastwood, T. (1994) “What is Archival Theory and Why is it Important”, Archivaria, No. 37, pp. 122‐130.
Eastwood, T., Ballaux, B., Mills, R. and Preston, R. (2008) “Appendix 14: Chain of Preservation Model Diagrams and Definitions”, in Duranti, L. and Preston, R. (Eds.), International Research on Permanent Authentic Records in Electronic Systems (InterPARES) 2: Experiential, Interactive and Dynamic Records, Associazione Nazionale Archivistica Italiana, Padova, Italy.
Federal Rules of Evidence (2013), [online], Administrative Office of the United States Courts, http://www.uscourts.gov/uscourts/rules/rules‐evidence.pdf.
InterPARES Trust (2013a) InterPARES Trust, [online], InterPARES, http://interparestrust.org/trust.
InterPARES Trust (2013b) Research – Studies: Abstracts, [online], InterPARES, http://interparestrust.org/trust/about_research/studies.
InterPARES2 Terminology Database, [online], InterPARES, http://www.interpares.org/ip2/ip2_terminology_db.cfm.
MacNeil, H. (2000) “Providing Grounds for Trust: Developing Conceptual Requirements for the Long‐term Preservation of Authentic Electronic Records”, Archivaria, No. 50, pp. 52‐78.
Object Management Group (2008) The OMG Hitchhiker’s Guide: A Handbook for the OMG Technology Adoption Process, Version 7.8 (omg/2008‐09‐02), [online], Object Management Group, http://www.omg.org/cgi‐bin/doc?omg/08‐09‐02.pdf.
Object Management Group (2014) About the Object Management Group, [online], Object Management Group, http://www.omg.org/gettingstarted/gettingstartedindex.htm.
The Ten Commandments of Cloud Computing Security Management Issam Kouatli ITOM Department, Business school, LAU
[email protected] Abstract: Like any newly emerging technology, cloud computing comes with security issues, and these issues have been amplified recently by mobile access. Managing and maintaining the security of a cloud computing environment depends not only on technological solutions but also heavily on the ethical behavior of the IT professionals working in that environment, as well as on the good manageability of data protection procedures. This paper identifies ten different criteria, with recommended instructions, for effective management of a cloud computing environment. The proposed “commandments” stem from the interdependence of security, protection and ethics: a survey conducted on these three pillars of cloud management shows that ethics and protection, as well as ethics and security, are highly correlated at the 1% significance level. Appropriate management of a cloud computing environment is like a three‐legged stool, where the three legs represent the three pillars of security, protection and ethics, while the surface of the stool represents the IT governance and management of these three critical pillars. Management in such an environment combines motivating IT staff to behave ethically at all times with monitoring and controlling people and machines in a way that eliminates any possible threat. IT governance, on the other hand, aligns the IT strategy with the business strategy to ensure that cloud computing operations stay on track to achieve their strategies and goals, and implements good ways to measure IT performance. The objective is to achieve a proactive approach to securely managing such a technological environment. The “commandments” discussed in this paper would be very important for cloud service providers in gaining the trust of their clients (businesses) when those clients request cloud services. Unlike technological solutions, these commandments are cloud management steps rather than turbulent technological development steps towards secure solutions, which keep changing as new technologies, devices and protocols emerge. Keywords: Cloud ethics, BYOD mobility, Data protection, Cloud security management, Cloud Governance
1. Introduction Today’s cloud computing revolution offers computational resources as scalable, on‐demand services; it is also a cheap and reliable outsourcing model that can provide a platform for organisations and companies to collaborate with others to form business processes. As demand for data and computational services increases, the benefits of cloud computing are highly sought after, but the technology also comes with issues such as data center power efficiency, availability, security, data protection and ethics. Cloud computing comes in different service forms (IaaS, PaaS, SaaS) that can be charged to customers on a per‐use basis. Several technologies are necessary to maintain these services, mainly virtualization, grid computing, utility computing, software services, and networking. This concept attracts many corporates and enterprises from different sectors of industry. As a specific case of information technology security, cloud security is a new, fast‐growing field. Cloud security and protection are difficult to achieve without a proper balance between a strong management strategy and strong technological solutions. In the analogy of a three‐legged stool, a secure cloud requires strong management and IT governance as the surface of the stool, while the three legs represent security techniques and procedures, data protection mechanisms and IT ethics. Kouatli (2015) provided a comprehensive historical account of the development of computer services and technology, with the associated vulnerabilities during each development phase of IT, up to the establishment of today’s cloud computing and mobility services. The success of the fast‐growing field of cloud computing hinges on the balance between these three pillars in the cloud computing environment. Moreover, these pillars can be industry‐sector centric, as cloud architecture can be highly interrelated with a specific enterprise architecture. Bansa (2013, p.48), for example, stated that enterprise architecture does not scale in the same manner: the scalability of a social network architecture will differ from the requirements of a healthcare platform. As such, researchers try to define abstraction levels for users. For example, Goscinski and Brock (2011, p.37) proposed a user‐level abstraction on top of already available cloud abstraction layers and demonstrated its feasibility. Trust relationships between users in peer clouds would also be of utmost necessity. Petri et al. (2012, p.221) presented a mechanism for forming trustworthy clouds in which different end‐users can exchange resources, and proposed a trust model for managing the formation and use of such clouds. Sasikala (2011, p.23) discussed cloud computing perspectives from diverse areas of technologists, the services available, and cloud standards, along with opportunities, challenges and future implications. From a service level agreement perspective, Meland et al. (2014) considered a cloud brokering model that helps negotiate and establish SLAs between customers and providers. All of these issues generate a special need for a cloud‐specific
operating system. This has been addressed by Zhang & Zhou (2012), who proposed a cloud operating system named TransOS in which all traditional operating system code and applications are centrally stored on network servers in a layered structure. Cultural background also has a large influence on typical employee behavior in a cloud computing environment. Unethical behaviors, and their escalation into illegal activities, have manifested themselves since the early stages of information systems development. For example, Kouatli & Balozian (2011) compared the practical perception of IT ethics with the academically taught perception of IT ethics. The study’s main conclusion was that unethical violations were due to ill‐defined boundaries of ethical/legal standards at the time the study was conducted. The impact of unethical IT behavior on businesses in general, and on businesses joining cloud computing services in particular, was also reviewed by Kouatli (2013). Accountability for actions taken by IT professionals in such cases becomes a necessity for cloud computing service providers. Yao et al (2012) outlined and evaluated an approach to enforce strong accountability, with analysis conducted to verify different types of compliance and the extent of verification. Cloud service providers will have to prove to their clients that client data and applications are secured with the strongest possible protection mechanisms. The heart of this protection stems from individuals’ ethical behavior, and as such cloud service providers have to ensure that all their staff and IT professionals behave ethically, according to a preset IT code of ethics, at all times. Setting up cloud services securely and developing the trust relationship between businesses and their cloud service provider is only half the story. Trust between employees and businesses in the new era of “global mobile computing” is also necessary. BYOD (Bring Your Own Device) is a recent term used to describe the use of personal devices in a business environment. BYOD usage adds complexity to the development of a code of ethics, with obvious consequences for security. As network security management is an essential part of protecting any business, CISCO (Document ID: 13601) recommended managerial actions for best‐practice protection of the network/technical environment. Most of these best‐practice steps are also applicable to the cloud computing environment. This paper proposes ten commandments that businesses need to follow before considering joining the cloud.
2. Cloud Security Security concerns in the cloud are emphasized every day and are becoming the major concern for cloud computing adopters. Mobility and BYOD computing have escalated these concerns even further. New, complex cyber threats are expected to emerge with cloud computing services. Auditors will require more efficient monitoring tools to investigate application and data storage integrity. The recent growth of SaaS also dictates that cloud providers need to help users/clients in addressing security concerns. Unfortunately, bring‐your‐own‐device (BYOD), although very practical in a business environment, adds to the complexity of enterprise cloud security, which can be difficult to attain. Mobile devices like iPhones and tablets add to the difficulty of securing and protecting the business from any possible malicious behavior. The challenge of securing mobile devices lies in the fact that these devices are owned personally by employees and are mainly used for personal purposes. However, the very same device is also used by the employee/IT staff to conduct business tasks, and hence some sensitive business information may be stored on these devices. In this case, a leak of business information might occur. In the field of IT, new products are introduced much more frequently than in any other industrial field. These new innovations dictate a new ethical strategy for dealing with them, for example the new mobile devices that are allowed to be brought into corporations (Bring Your Own Device: BYOD) and used for business communication as well as for completing corporate tasks. New rules and regulations are necessary in such a business‐to‐employee (B2E) interaction model. With different cloud types (private, public, and hybrid), plus different requested services like SaaS, PaaS and IaaS, combined with BYOD, enterprise cloud security can be difficult to attain. Add to this the fact that cloud services are still in a turbulent development phase, where services and applications are changing by the day. Some companies, such as Barracuda Networks, Sophos and Zscaler, can provide “Security as a Service”. However, maintaining security in the cloud requires more than software. It requires appropriate management and monitoring techniques to investigate external as well as any possible insider threats. Alignment of corporate security standards with the compliance requirements defined by service providers must be maintained at all times to achieve powerful cloud governance. Cloud experts can help sort through the guidelines and give discrete steps on how to comply.
3. Cloud Data Protection In cloud computing, delivering business data to end‐user devices while keeping that data separated, segmented and managed is not an easy task. The mechanism for data storage and protection in the cloud is not yet standardized; hence different cloud service providers offer different methodologies for storing and protecting data. For public or hybrid cloud storage, storage management tools become a necessity, where the nature of the data type, data location, data replication, growth rates and cost control are all factors in data protection. The demand to protect data is directly proportional to the development of cloud services and applications. Cloud‐to‐cloud backup may be necessary in some cases to protect data. Such a mechanism requires a high level of trust between different cloud vendors. Also, replication of data between data centers and/or cloud providers needs appropriate management to maintain the location and integrity of confidential information. This may require an authentication mechanism to the foreign cloud as well as an encrypted form of the data in the external location. Also, the issue of accountability for day‐to‐day tasks performed by IT professionals in the service provider’s environment has to be addressed. It is of utmost necessity to prove to businesses that their data are secured at all times by tracking the individuals who are responsible for maintenance, backup, and so on. As part of data protection, this might include data replication. At any moment in time, businesses have the right to know the location of all replicas of their data. This is not an easy task, as the system will be automated to replicate the data to different locations. An additional issue related to data replication is that one of the replicas might be located in a country where IT ethics is not respected and legislation, if it exists, might not be clear. This adds to the vulnerability of clients’ data being lost or stolen in a country with no legislation (or at least no clear law) against such acts.
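As a rough illustration of the bookkeeping this implies, the sketch below records where every replica of a client’s data resides and encrypts payloads before they leave for a foreign cloud. It is a minimal sketch under stated assumptions: the provider names, jurisdictions and the use of Fernet symmetric encryption from the Python cryptography package are illustrative choices, not features of any particular provider’s mechanism.

```python
from datetime import datetime, timezone
from cryptography.fernet import Fernet  # assumes the 'cryptography' package is installed

replica_registry = []  # one entry per copy of a client's data

def encrypt_for_foreign_cloud(payload: bytes, key: bytes) -> bytes:
    """Encrypt data before it is replicated to an external provider."""
    return Fernet(key).encrypt(payload)

def record_replica(client_id: str, object_id: str, provider: str, country: str) -> None:
    """Keep an auditable record of every replica's location and creation time."""
    replica_registry.append({
        "client": client_id,
        "object": object_id,
        "provider": provider,
        "country": country,
        "replicated_at": datetime.now(timezone.utc).isoformat(),
    })

def replicas_for(client_id: str):
    """Answer the client's question: where are all copies of my data right now?"""
    return [r for r in replica_registry if r["client"] == client_id]

# Hypothetical usage
key = Fernet.generate_key()
ciphertext = encrypt_for_foreign_cloud(b"confidential client record", key)
record_replica("client-42", "object-001", "ProviderA", "DE")
record_replica("client-42", "object-001", "ProviderB", "SG")
print(replicas_for("client-42"))
```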
4. Cloud Ethics In most business environments, a code of ethics is the vehicle to motivate all employees, IT professionals as well as IT users/workers, to act within the boundaries of the code and to report any violation of it. For example, the IT professional is expected to abide by all copyright laws and agreements. This includes laws against unauthorized use or pirating of software. Similarly, IT workers are also expected to immediately report a colleague’s violation of the code of ethics to the appropriate administrator. Other protective responsibilities of IT professionals include the confidentiality of passwords and other sensitive information entrusted to them. Most insider unethical/illegal IT attacks are conducted by individuals who might have a reason for such unethical behavior, which might have been triggered by a situation in their working environment. To rectify and minimize this unethical behavior, it is not enough to counter the malicious techniques by using anti‐virus technology/techniques. Proper management is necessary to rectify any possible problem of unethical IT behavior. Therefore, the issue is not only a malware technology problem but rather a people‐ware problem (Shaw, Ruby & Post, 1998). Since people are behind both the creation and the attack of the same systems, it is necessary to conduct psychological analysis of information systems criminals in order to safeguard those systems. One primary and essential tool is to create a robust body of guidelines, norms, measures and rules regulating the professional community. One of the earliest was The Ten Commandments of Computer Ethics as stated by Ramon (1992). The creation of IT ethical commandments has witnessed several attempts, but due to the absence of a centralized and authoritative body governing the IT world, they proliferated independently. Obviously, there are two major limitations of the above commandments (Ramon, 1992), as well as of all other attempts at a code of ethics (e.g. BCS): first, the absence of a unified international committee dealing with an all‐inclusive and exhaustive IT code of ethics, and second, the difficulty of enforcing or proving some violations of the code of ethics. A code of ethics must be highlighted to all staff to make them aware of the expected ethical behavior in an IT environment. Although IT professionals usually receive ethics education in their respective schools, a gap exists between their theoretical knowledge and practical behavior. Employees in general and IT professionals in particular must be surveyed to find and analyze such a gap. Then, an awareness seminar/campaign must be provided to staff to maintain appropriate IT ethical behavior and security policy implementation to protect the business. Surveying all employees/IT staff via questionnaires provides good indicators of the level of ethical behavior in the business environment. A similar study was conducted by Kouatli & Balozian (2011), where the objective was to compare individuals’ theoretical knowledge about IT ethics with their practical ethical behavior in business environments. This study discovered that 33.6% of employees would not inform their supervisors about possible faults that might damage the whole system.
This shows that participants were usually worried that their reputation (not being seen to make mistakes) was prioritized over the success of their company’s tasks. Moreover, the statistics reveal that team members were careless when it came to changing their password after it had been noticed by a coworker (“shoulder surfing”): a surprising 65% (nearly two thirds) would not change the password. This shows that the attitude of staff favors friendship over security. Although this indicates good trust and team bonding among employees in the business working environment, those employees might be violating the security policy of the company. Again, good management styles are necessary here to disseminate and motivate good, disciplined policy practice among members. Managers should plan for the dissemination of the code of ethics to employees and require them to obey the policy to the letter.
5. Cloud Security Management Commandments Outsider attacks can only be blocked by technological solutions. However, on top of technological solutions, a good management strategy needs to be implemented effectively. These commandments are proposed as the conclusion of a statistical study indicating the inter‐relationship between security, data protection and ethics in relation to accepting cloud applications and cloud data storage. The sample size was 441 students who had been educated about cloud computing services and applications. The survey was composed of five main sections: financial viability (motivation), availability, security, protection and ethics. The study focuses on the three main issues of security, protection and ethics and their relation to the possible acceptability of data storage (and management) as well as cloud application usage. All participants were university students with adequate education about cloud computing; hence this survey is indicative only, as the sample was taken from a single geographical area and from a single sector of participants (all participants were students, educated about cloud computing, from a single university). Although the sample is based on (and hence biased towards) one geographical location and, specifically, one university, it does show a possible trend in how potential customers think about and trust the cloud. Hence, the statistics can only give an indication of the importance of ethics in relation to the protection and security of cloud computing. Of the survey’s five sections, the three relevant to this paper are “Ethics”, “Security” and “Data Protection”, whose results are shown in Figures 1, 2 and 3 respectively. These three figures represent the three groups of questions related to ethics, security and data protection. Ten “commandments” are proposed and discussed in this paper, and the relevant survey question results are discussed within each commandment. The commandments are not listed in order of importance but rather in the order of management steps needed to achieve an ideal management style for a secure and protected cloud computing environment.
Figure 1: Histogram for the IT Ethics Section.
Figure 2: Histogram for the Security Section.
Figure 3: Histogram for the Data Protection Section.
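The pairwise correlations reported in the abstract (ethics–protection and ethics–security, significant at the 1% level) could be reproduced from per‐respondent section scores along the lines sketched below. This is a minimal sketch: the column names, example values and the choice of Pearson’s r from SciPy are assumptions for illustration; the paper does not state which correlation statistic was used.

```python
import pandas as pd
from scipy.stats import pearsonr  # requires scipy

# Hypothetical layout: one row per respondent, mean score per survey section
responses = pd.DataFrame({
    "ethics":     [4.1, 3.8, 4.5, 3.2, 4.0],
    "security":   [3.9, 3.5, 4.4, 3.0, 4.2],
    "protection": [4.0, 3.6, 4.6, 3.1, 3.9],
})  # in the study itself n = 441

for a, b in [("ethics", "protection"), ("ethics", "security")]:
    r, p = pearsonr(responses[a], responses[b])
    print(f"{a} vs {b}: r = {r:.2f}, p = {p:.4f}")  # p < 0.01 would match the reported 1% significance
```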
5.1 Commandment 1: Establish an Efficient Computer‐Use Policy in Your Organization As can be seen from Figure 1, the highest value is for the second question, related to trust in the professionals who work on the client’s data: participants appear worried about a possible leak of the client company’s data/information to its competitors via the IT professionals. This view can also be seen in the first question, where about 70% of participants indicated that an IT code of ethics must be followed. To address these two issues, a clear and well‐implemented policy must be visible to all employees and IT professionals. This is evident from the last question in this section, which indicated the necessity of a clear computer‐use policy with disciplinary action in case of violations. The computer‐use policy must state what employees are and are not allowed to do, should be well advertised in the company, and should clearly highlight the consequences of violating the policy, with the details of any disciplinary action/procedure. This policy may also include a statement clarifying partners’ “acceptable” use of the IT systems in your business, and it should clearly state the consequences to them if a security attack happens. “Acceptable” use must also be defined for system administrators during planning and the annual evaluation of IT staff/administrators. A comprehensive and clear policy must be advertised on the organization’s intranet, and all employees should sign the ethics policy. Ground rules for BYOD usage also have to be highlighted and incorporated in the policy document. The policy must align with the company’s mission and working environment.
5.2 Commandment 2: Establish Ethical Behavior Monitoring and Motivation To maintain ethical behavior at all times, surveying all employees/IT staff is necessary to gather all concerns about any ethical issues. This is evident from the fourth question in the ethics section
(Figure 1), where over 61% of the participants believe that ethical behavior must be studied in all departments. Moreover, 57% of participants believe that regular seminars should be established for IT professionals to enhance and maintain awareness of the importance of ethical issues. To achieve this, any unclear statements in the mission, objectives or policy must be clarified, and the process should include ethical review and/or internal ethical audit among employees. Cloud provider companies must provide ethics training to all employees in the form of cases and scenarios to learn from. Moreover, the effects of wrongdoing on the business in general and on individuals specifically must be explained. The fine line between good team bonding and friendship versus strict implementation of the policy should also be explained. Regular mandatory workshops are necessary to discuss emerging ethical issues. Encouragement and motivation for ethical behavior must be maintained for enhanced team performance, but at the same time employees should be encouraged to come forward and report wrongdoing by any other member of their team. 40% of the participants prefer to allocate a reward for “best‐practice ethical behavior” (fifth question in the ethics section), and 62% of participants suggest including an anonymous reporting system (Figure 1, Q7) to help employees come forward and report any wrongdoing. This would be necessary to avoid any kind of embarrassment or conflict among employees when reporting on each other. It would also encourage employees to behave ethically at all times, as they do not know who might be reporting them.
5.3 Commandment 3: Establish a High‐Performance Security Team The security team must be led by a team leader and composed of employees representing the different operational departments. They should be aware of the security policy and the technical aspects of security design and implementation, and hence proper training is required for the team members. Only 52% of participants raised the need for adequate training of IT professionals (Figure 2: Q6). This may be because IT professionals are already expected to be highly experienced consultants in their field (and hence only 52% recognize the necessity). The security team has three main responsibilities: policy development, to establish and review policies; practice, to conduct the risk analysis with possible change requests; and response, to address possible threats and fix any violations. 50% of participants believe that risk assessment and the threat risk level (Figure 2: Q8) should be highlighted in each section of the cloud computing environment. The individual roles and responsibilities of the security team members must be defined in the security policy with clear descriptions. BYOD security must also be incorporated into the policy and risk analysis, and any possible violations monitored. Surprisingly, participants do not believe that encryption techniques can secure their data on a shared medium: 45% (Figure 2, Q3) disagree that encryption can be a major factor in accepting cloud services. This is also evident from Figure 2, Q1, where about 57% of participants are worried about the shared environment in cloud services. At the same time, 65% of participants believe that cloud providers are able to secure transmission with encryption techniques (Figure 2, Q2).
5.4 Commandment 4: Establish a Risk Analysis Exercise Identify sections of your network/systems with a possible threat rating for each section. 50% of participants believe that this is a necessity for accepting the cloud service (Figure 2, Q8). Threat identification, with the required level of security in each section, helps in balancing tight security with slow performance against low‐security sections with high performance. For each network resource, assign a specific risk level of low, medium or high. A risk level must be assigned to each of the following: main network devices, network monitoring devices, distribution network devices, access network devices, network security devices (if any), e‐mail systems, network file servers, network print servers, network application servers (DNS and DHCP), data application servers (databases), desktop computers, and other devices (standalone print servers and network fax machines). Also, authentication and access control have to be monitored and updated, enforcing password changes at regular intervals (every 3 months, for example). BYOD access must also be monitored and analyzed regularly.
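A hypothetical sketch of such a risk register follows; the resource names and assigned levels are placeholders rather than a recommended classification.

```python
# Hypothetical risk register: resource -> assigned risk level (low / medium / high)
risk_register = {
    "core_router":        "high",
    "network_monitoring": "medium",
    "email_system":       "high",
    "file_server":        "medium",
    "print_server":       "low",
    "dns_dhcp_server":    "high",
    "database_server":    "high",
    "desktop_fleet":      "low",
    "byod_gateway":       "high",  # BYOD access singled out for regular review
}

def resources_at(level: str):
    """List the resources assigned a given risk level."""
    return sorted(name for name, lvl in risk_register.items() if lvl == level)

print("High-risk resources:", resources_at("high"))
```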
5.5 Commandment 5: Establish an Appropriate Change Management Strategy Approving security changes: your security policy should identify specific security documentation in non‐technical terms. The security team should translate this non‐technical document into a technical security configuration document. Then, these configurations can be applied to any future configuration changes. Maintainability of such a security policy/procedure is of the utmost importance. About 63% of participants expect privacy and
maintainability of their data (Figure 2, Q7). Any changes in devices/procedures must be examined appropriately, otherwise the privacy and maintainability of clients’ data would be vulnerable. Besides these approved guidelines, select a member of the security team to sit on the change management approval committee to monitor any possible changes.
5.6 Commandment 6: Establish Response‐Ability Performance Measurement Security violations: monitor the security of your network. It is recommended to monitor low‐risk equipment weekly, medium‐risk equipment daily and high‐risk equipment hourly. About 48% of participants (Figure 2, Q9) believe that an effective and quick response capability is a major factor in accepting cloud computing services. Your security policy should address how to notify the security team of security violations. This can be automated via network monitoring software capable of sending an SMS or similar notification to a security team member in case of a violation. Quick decisions when a violation is detected are crucial to system recovery. The first action in this case is the notification of the security team, at any time of the day. This should be reflected in your security policy. Moreover, the level of authority given to security team members should be clearly defined.
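Linking the risk levels from Commandment 4 to the weekly/daily/hourly monitoring cadence suggested here could look like the sketch below; notify_security_team is a placeholder for whatever SMS or paging integration the monitoring software actually provides.

```python
MONITORING_INTERVAL_HOURS = {"low": 24 * 7, "medium": 24, "high": 1}  # weekly / daily / hourly

def check_overdue(risk_level: str, hours_since_last_check: float) -> bool:
    """Return True when a resource at this risk level is overdue for a security check."""
    return hours_since_last_check >= MONITORING_INTERVAL_HOURS[risk_level]

def notify_security_team(resource: str, finding: str) -> None:
    """Placeholder: a real deployment would send an SMS or page via the monitoring platform."""
    print(f"ALERT to on-call security member: {resource} - {finding}")

# Hypothetical usage
if check_overdue("high", hours_since_last_check=1.5):
    notify_security_team("core_router", "unauthorized configuration change detected")
```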
5.7 Commandment 7: Establish Maintainability and Restore‐Ability Collect and maintain information during a security attack. Details about the compromised system must be logged to form the basis of a possible prosecution of external violations. This is a necessary maintainability procedure, and your legal department should review the procedures for gathering evidence in case legal action is to be taken. In case of violation, strong authorities are expected to exist: 57% of the participants expect strong authorities to exist in the area/country where the cloud provider’s site is located (Figure 3, Q16). As each system has its own backup procedure, the security policy should define how you conduct, secure, and make available normal backups. In case of a possible threat, the speed of restoration and recovery is highly critical to most businesses. Figure 3, Q12 indicates that 51% of participants trust cloud computing providers to restore data whenever required. However, only 37% believe that data loss incidents will not happen (Figure 3, Q11). In general, participants showed more worry about transmitting and storing personal data, at 81% (Figure 3, Q1), than about transmitting and storing customer data, at 30% (Figure 3, Q2).
5.8 Commandment 8: Establish Data Protection for Customers Customers (businesses joining the cloud) will always be worried about the storage of their data on a shared medium in the cloud service provider’s environment. This is evident from Figure 2, Q3, where the survey shows that 45% do not trust that encryption can protect their data on a shared medium. Hence, segregating competing businesses onto different media and, if possible, assigning different IT personnel to support their data management and protection provides peace of mind and develops a high level of trust. Cloud‐to‐cloud backup is also advisable if a trust relationship has already been established between vendors. One possible problem arises when the backed‐up data in a data center/cloud physically resides in a country that does not have any legislation against IT violations. The location and replication frequency of backups and replicas of customer data must always be known. Only about 43% of participants trust that an adequate mechanism exists to back up their data (Figure 3, Q10). Ideally, for sensitive information, businesses should develop their own private cloud to hold this sensitive and confidential information and then join the public cloud for other applications and non‐sensitive data storage.
5.9 Commandment 9: Establish an Efficient Cloud Audit Team As data storage management becomes the cloud provider’s responsibility in the case of a public cloud, clients no longer have physical possession of their own data. As such, audit services are critical to ensure the integrity and availability of clients’ data and to achieve credibility in cloud computing. 49% of the participants believe that audit and security certifications must exist at cloud providers before their services can be acceptable (Figure 3, Q14). Moreover, 47% of participants believe that a reputable external auditor is a necessity before the cloud can be acceptable (Figure 3, Q15). Some software tools can help towards proving the integrity and ownership of any specific part of the data. However, efficient auditing also requires face‐to‐face investigation of the process of data storage and protection. This includes the replication service (and to which cloud/data center data is replicated), accountability for data maintenance, and the BYOD access monitoring process.
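One family of the “software tools” alluded to here is cryptographic hashing: a digest recorded when data enters the provider’s custody can be recomputed during an audit to demonstrate integrity. The sketch below uses SHA‐256 from Python’s standard library and is illustrative only; a real audit regime would add digital signatures, trusted timestamps and provenance metadata.

```python
import hashlib

def digest(data: bytes) -> str:
    """SHA-256 digest recorded when the client's data is ingested."""
    return hashlib.sha256(data).hexdigest()

def verify_integrity(data: bytes, recorded_digest: str) -> bool:
    """Recompute the digest during an audit and compare it with the recorded value."""
    return digest(data) == recorded_digest

original = b"client ledger 2014-Q3"
stored_digest = digest(original)                             # kept by the audit team at ingest time
print(verify_integrity(original, stored_digest))             # True: data unchanged
print(verify_integrity(b"tampered ledger", stored_digest))   # False: integrity violated
```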
5.10 Commandment 10: Establish an IT Governance Structure Cloud computing dictates a new, creative approach to achieving IT governance, which should be termed cloud governance. Cloud governance models can be built rapidly if the right steps are taken. Governance cannot be handled separately without ensuring that the clients receiving the cloud services also maintain standards suitable for cloud governance. To start with, a strong governance team must be formed, whose members are representatives of the different teams: security, ethics, business and IT personnel. Their main responsibility is to create the rules and processes that need to be followed to ensure policies are met; for example, policies for maintaining best security practices, with appropriate procedures followed and implemented for all IT assets, as well as policies related to the development of applications according to standard design and monitoring. Cloud governance should handle all the issues mentioned above, integrate them into other business functions, and align them with the strategic objective(s) of the cloud service provider. Also, cohesive bonding, with strong IT leadership, between the development team and the support team is almost a necessity. This should lead to effective internal IT communication, whose objective is to plan ahead of any possible emerging risks or threats and hence develop a proactive approach to securing and protecting business data and applications.
6. Conclusion Cloud computing management infrastructure is still in a turbulent phase, as the technology is still evolving. With new applications and services appearing every day, this provides an unstable environment for a systematic approach to managing the cloud. University students knowledgeable about cloud computing were surveyed about cloud computing services; 441 students participated in this survey. Obviously, this survey is biased, since participants are all from a single geographical area and a single university, and are students only (not professionals in industry). However, such a survey can give an indication of how people in that specific part of the world think about cloud computing. A more comprehensive survey covering participants from different geographical locations, industry sectors or professions would be more conclusive. As a result of the survey, three main issues need to be addressed before stable and efficient cloud governance can be achieved. In the analogy of a three‐legged stool, security, data protection and cloud ethics form the three main pillars for building an efficient structure of cloud management and governance. Security concerns keep changing as protocols, devices and mobility structures change. Data protection mechanisms also worry most clients, as they no longer possess their own data, and hence most clients prefer to build their own private cloud for sensitive information. The third pillar, “IT ethics” or in this case “cloud ethics”, is an important factor in maintaining an extremely safe environment so that clients will entrust the service provider with handling their applications and data protection. In an attempt to provide such a guideline, this paper proposed ten “commandments” that must be established in any cloud provider’s environment. These commandments do not highlight technological solutions but rather the managerial steps that can be followed for efficient cloud management and governance, irrespective of the turbulent environment of newly developed applications and services.
References
ACM, Association for Computing Machinery Code of Ethics, [online], http://www.acm.org/about/code‐of‐ethics/
Bansa, N. (2013) “Cloud computing technology (with BPOS and Windows Azure)”, International Journal of Cloud Computing, Vol. 2, No. 1, p. 48.
BCS, The British Computer Society code of conduct, [online], http://www.ccsr.cse.dmu.ac.uk/resources/professionalism/codes/Bcs.html
CISCO, “Network Security Policy: Best Practices”, White Paper, Document ID: 13601.
Goscinski, A. and Brock, M. (2011) “Toward higher level abstractions for cloud computing”, International Journal of Cloud Computing, pp. 37‐57.
Kouatli, I. and Balozian, P. (2011) “Theoretical Versus Practical Perception of IT Ethics in Lebanon”, Society of Interdisciplinary Business Research (SIBR) 2011 Conference on Interdisciplinary Business Research, June 22, 2011, http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1869432
Kouatli, I. (2013) “Impact of unethical IT behaviors to cloudy businesses”, International Journal of Trade and Global Markets. In press (scheduled for the next issue of the journal).
Kouatli, I. (2015) “A comparative study of the evolution of vulnerabilities in IT systems and its relation to the new concept of cloud computing”, Journal of Management History. In press, to be published in Vol. 21, No. 1, 2015.
Meland, P.H., Bernsmed, K., Jaatun, M.G., Castejón, H.N. and Undheim (2014) “Expressing cloud security requirements for SLAs in deontic contract languages for cloud brokers”, International Journal of Cloud Computing, Vol. 3, No. 1, pp. 69‐93.
Petri, I., Rana, O., Rezgui, Y. and Silaghi, G. (2012) “Trust modelling and analysis in peer‐to‐peer clouds”, International Journal of Cloud Computing, Vol. 1, No. 2–3, pp. 221‐239.
Ramon, C.B. (1992) In pursuit of a ‘Ten Commandments’ for computer ethics, Computer Ethics Institute, retrieved from http://cpsr.org/issues/ethics/cei/
80
Issam Kouatli
Sasikala, P. (2011), Cloud computing: present status and future implications”, International Journal of Cloud Computing, Volume 1, Number1, P.23‐36 Shaw, E., Ruby, K.G., Post, J.M. (1998). The Insider Threat to Information Systems. Security awareness Bulletin, 2 (98), 1.Available online at www.pol‐psych.com/sab.pdf Yao J., Chen S., Wang C., Levy D., Zic, J. (2012) Accountability services for verifying compliance in the cloud, International Journal of Cloud Computing Volume 1, Number 2–3/2012 , P. 240‐260 Zhang, Y., Zhou, Y. (2012) TransOS: a transparent computing–based operating system for the cloud, International Journal of Cloud Computing Volume 1, Number 4/2012, P. 287‐301
81
Segmentation of Risk Factors associated with Cloud Computing Adoption
Easwar Krishna Iyer
Great Lakes Institute of Management, India
[email protected]
Abstract: Cloud computing, the world of buying computing as a utility, is on the threshold of massive global acceptance. Though the advantages of Cloud Computing have been widely documented, the adoption of cloud as a viable alternative to traditional stand‐alone IT systems is only building up. Organizations are evaluating their IT asset portfolios to assess what could move out to the cloud and what ought to remain in‐house. In this context, assets are getting bifurcated into ownership assets and utilization assets, and one of the key decision support tools facilitating this division is 'Risk Management'. Any new technology adoption comes with a set of associated risks, some endogenous and the rest exogenous. This paper aims at mapping the complex risk portfolio associated with cloud computing adoption. The work relies on literature support to stratify cloud adoption risk along six independent vectors. It then further subdivides these vectors into multiple sub‐vectors. The paper then proceeds to map these sub‐vectors into the three‐fold risk categorization framework proposed by Robert Kaplan. The study provides both a complete functional segmentation and a characteristics‐based segmentation of the cloud adoption risks.
Keywords: Cloud Computing, Risk Mapping, Segmentation, Robert Kaplan's Risk Framework, EASWAR framework
1. Introduction
Any new technology adoption involves a tradeoff between a set of risks and a set of returns. Let us look at a couple of examples. Dependence on new‐age big data and analytics will yield returns in terms of micro-segmentation and positioning of new product offers; the flip side is the risk of human intuition being replaced by machine intellect in the context of understanding consumer behavior. Office‐on‐the‐move devices like laptops and tablets ensure anytime, anywhere information access; the flip side in this case is a severe compromise on work-life balance, with the risk of work following people to their homes. Any ERP implementation – despite its known benefits – has downsides like cost overruns, incorrect gap analysis, square-peg-in-a-round-hole fitment and resistance to implementation amongst the rank and file. Cloud computing adoption is no different. The cloud landscape today is dominated by technology titans like Amazon, Microsoft, Google and IBM. The gains that cloud platforms bring are tangible. Offerings like Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS) are clearly outlined and well understood by the enterprises that could adopt them. It is the risk part of cloud that requires closer study. Any product goes through four phases of development – the Introduction, Growth, Maturity and Decline phases – a cycle referred to as the Product Development Life Cycle (PDLC). Cloud offerings today are probably in the first of these phases, the Introduction phase. The literature clearly states [Armbrust Michael et al (2010)] that "Cloud Computing has tremendous potential to benefit organizations, but substantial challenges and risks stand in the way of adoption". For cloud to move up the PDLC from the introduction to the growth phase, the industry should closely understand the perceived adoption risks vis‐à‐vis the actual gains. This paper posits that a rigorous segmentation of the aggregate cloud risk factors will go a long way in understanding the overall risk map. Such a mapping will eventually bolster customer confidence, thereby enhancing cloud buying. The paper analyses cloud risks at two levels. At level 01, the paper relies on an existing literature survey to demarcate the risks associated with cloud into six broad clusters: Security Risk, Vendor Risk, No Gain Risk, Efficiency Risk, Business Risk and Data Related Risk. Each of these broad clusters is further sub‐divided to finally obtain 24 independent sub‐vectors of risk. At level 02, the paper classifies the 24 individual risk elements into three distinct buckets based on the inherent risk characteristics of each category. This categorization framework has been adapted from the work of Robert Kaplan et al (2012). The three buckets are Preventable Risks, Strategic Risks and External Risks. The allocation of the risk elements across the three Kaplan vectors will eventually help in the true monetization of all risk elements (the monetization part of risk is not part of this study).
2. Literature Survey
Academics have already carried out substantial reviews of Cloud Computing across all its risk perspectives. A brief overview is given in this section to lay the foundations for cloud risk segmentation and mapping. Bannerman Paul L (2010) compares the impact of ten cloud risk categories across research and practice; the practice data is a mix of expert opinions and industry surveys and the research data is from a literature survey. Most of Bannerman's risk vectors are considered in this paper also. The seminal work by Armbrust Michael et al (2010) on clouds helps in understanding the ten key bottlenecks for cloud adoption; in hindsight, many of them are perceived adoption risks. Achara Sachin et al (2014) focus specifically on security-related risks and their monitoring in the context of cloud adoption; their paper demarcates the risks due to external attacks as well as the risks due to internal architectural configurations such as multi‐tenancy architecture. Alali Fatima A et al (2012) study cloud risk from an accounting and auditing perspective. Dutta Amab et al (2013) arrive at their version of the cloud risk list by developing a questionnaire and distributing it to around 300 IT professionals; the top three scores in terms of high perceived risk were Confidentiality, Legal inconsistency and Vendor Lock-in, and all three vectors find a mention in this paper also. Etro Federico (2011) discusses some macroeconomic perspectives of cloud adoption, such as tax implications, mentioning the possibility of a service tax for cloud service providers operating in the pay-per-use space. Fan Chiang Ku et al (2012) start with the premise that people working in the field of cloud computing are not fully exposed to the entire gamut of risks due to the novelty and complexity of the new technology; among other risks, their paper clearly delineates disclosure risk due to deliberate malice and accidental disclosure. Hosseini A. Khajeh et al (2011) offer a decision template for cloud adoption which is primarily driven by cost modelling. Kalyvas James R et al (2013) offer a framework for measuring and managing cloud computing risks in a detailed article spread over two parts; between the two parts, they cover almost the entire gamut of cloud risks, ranging from disclosure to confidentiality breach to lack of continuity to cloud outages. Krishna Iyer Easwar et al (2013) model the Net Present Value (NPV) of future cash flows for a firm that adopts the cloud; the paper arrives at a bulk unknown risk component, which the authors call Yuk. This parameter represents a non‐cash, yet monetizable, aggregate unknown risk component that goes with cloud adoption. Nkhoma Mathews Z. et al (2013) conduct a large-scale survey of potential cloud adopters to find out the barriers to cloud adoption. Tisnovsky Ross (2010) brings in the crucial element of lack of control – real as well as perceived – for a business that moves its processes to the cloud. A few more papers listed in the references section complete the 360-degree cloud risk perspective. All the papers discussed so far cover the risk angle specifically from the cloud computing perspective. This literature survey has also covered a few papers that do not touch upon cloud but look at risk from a standalone approach. The most important among them is the work of Prof. Robert S. Kaplan, co‐creator of the Balanced Scorecard management system.
Kaplan et al (2012) discuss managing a risk ecosystem in a new framework. They broadly divide risks into rule‐based controllable risks, dialogue-and-discussion-based strategic risks and anticipation-based external risks. This paper uses Kaplan's generic segmentation framework to map the cloud risks and analyse their differential impact; this is dealt with in a dedicated section below. To sum up, there is a body of literature available today that deals with cloud adoption and its associated risks. The purpose of the trilogy of papers on cloud and risk that this author is writing is to arrive at an actuarial evaluation of risk. To achieve this end, the first part of the trilogy mathematically modelled risk as a single, amorphous bulk parameter; details of that part are briefly given in the next section. The second part – this paper – delineates cloud risks and maps them onto the Kaplan framework. The perception of these risks will differ across industries. The third part of the study aims at creating a framework using tools like regression and probability to arrive at a reasonably precise monetized value for risk. From this cumulative perspective, this study is unique.
3. Part 01 of this Trilogy – A Brief
The first study [Krishna Iyer Easwar et al (2013)] examines the behavior of the Net Present Value (NPV) of a firm that is progressively moving into cloud adoption. It aims to find the cloud adoption fraction at which the NPV is maximized. From the very nature of that statement, it becomes clear that NPV is
not a linear function of cloud adoption. The function obtained is actually a maximizing function, implying that NPV increases monotonically up to some cloud adoption level and then starts falling when the adoption intensity increases further. The turnaround is primarily driven by the unknown risk vector [the paper designates this amorphous unknown risk parameter as Yuk]. If one plots NPV against incremental cloud adoption for various arbitrary 'values' of Yuk, it can be observed that the turnaround effect starts happening earlier for higher values of Yuk. The actual value (or fraction) of incremental cloud adoption at which the NPV function is maximized is thus critically hinged on the unknown risk component associated with cloud adoption, Yuk. The paper surmises that the sceptical adoption of cloud happening today is primarily due to the perception of a high amount of risk associated with higher cloud adoption. To improve our understanding of the real adoption roadblocks, a better understanding of the parameter Yuk is necessary. Delineation and segmentation of the risk factors associated with cloud adoption is the purpose of this paper; monetization of the same is the purpose of the last paper of the trilogy.
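The shape of this argument can be illustrated with a small numerical sketch. The functional forms below are assumptions made purely for illustration (the published model of Krishna Iyer Easwar et al (2013) is not reproduced here): savings are taken to be linear in the adoption fraction, while the cost attributed to Yuk grows super-linearly, which is enough to produce the described turnaround.

```python
# Illustrative sketch only: the functional forms below are assumptions,
# not the model published in Krishna Iyer Easwar et al (2013).
import numpy as np

def npv(adoption, yuk, years=5, discount=0.10,
        annual_savings=100.0, risk_exponent=2.0):
    """Toy NPV of moving a fraction `adoption` of IT workload to the cloud.

    Savings are assumed linear in the adoption fraction, while the cost of the
    unknown risk component (Yuk) is assumed to grow super-linearly, so NPV
    rises, peaks, and then falls as adoption increases.
    """
    yearly_cash_flow = annual_savings * adoption - yuk * adoption ** risk_exponent
    discount_factors = [(1 + discount) ** -t for t in range(1, years + 1)]
    return yearly_cash_flow * sum(discount_factors)

adoption_levels = np.linspace(0, 1, 101)
for yuk in (60, 120, 240):                     # arbitrary 'values' of Yuk
    curve = [npv(a, yuk) for a in adoption_levels]
    best = adoption_levels[int(np.argmax(curve))]
    print(f"Yuk={yuk:>3}: NPV peaks at adoption fraction {best:.2f}")
```

Running the sketch shows the NPV-maximizing adoption fraction shrinking as Yuk grows, which is the turnaround behaviour described above.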
4. Functional Segmentation of Cloud Risks
Based on detailed literature reviews, this paper identifies six broad areas of functional risk aggregation in the context of cloud computing adoption. It is worth mentioning at this point that when the paper discusses cloud from the risk analysis point of view, it means public clouds only. For example, risks like 'lack of control and governance' or 'data center location' do not apply to a private cloud in the same way that they do in the public cloud space. The same logic applies to risk elements like vendor lock-in and SLA adequacy. Thus the risk analysis in this paper is confined to the risks of migration to the public cloud space. Each of the six risk spaces is briefly touched upon below.
4.1 Security Related Risks
Based on the frequency with which they are quoted in the literature, security-related risks are the prime concern of customers on the threshold of cloud adoption. A slew of sub‐factors, most of which are endogenous, constitute the overall Security Risk. Disclosure of data, either by malicious insiders or by sheer accident, constitutes a key security threat. Breach of confidentiality – which could be termed 'privacy' in some industries, such as healthcare – is another security vector. The security environment can also be compromised on a large scale by external factors like natural disasters; Japan went through a delicate period in restructuring its datacenters after the twin incidents of earthquake and tsunami.
4.2 Vendor Related Risks
The vendor space in the newly evolving cloud market can be split into the utilities provider space and the solutions integrator space. It is a three‐layer architecture, with the top layer being the utility computing provider and the middle 'intermediary' layer being the web applications and SaaS solutions provider. The third and last layer is the Cloud / SaaS user. Thus, the top two levels aggregate to deliver the final cloud application to the end client. Client firms often burn their bridges by dismantling their traditional in‐house computing architecture before they migrate to the cloud. In this context, 'lock‐in' with an incompatible vendor is the biggest risk they run in the vendor-related risk space. Reputation fate-sharing when another client bungles its use of the cloud, lack of continuity of vendor operations (typically those of the intermediary vendor) and poor uptime of cloud operations all add up to create the platform called Vendor Risk.
4.3 'No Gains' Risks
The key posited cloud gain of zero CAPEX (capital expenditure) often obfuscates the significant increase that cloud adopters will face in running OPEX (operational expenditure). There are studies which indicate that IT operational expenses can exceed the initial savings in capital purchases within a five-year cloud adoption life cycle. Also, the posited cash gains of cloud adoption might be transitory in the event of energy cost escalation. There could also be hidden tax implications, with service tax coming in for cloud offerings. The tax component in a product-buy situation is more obvious than the 'hidden' tax component in a 'pay‐and‐use' service-buy situation like cloud computing. All these factors add up to create the 'No Gain' risk vector.
Figure 1: Six aggregate cloud risk vectors exploded into 24 sub-risks. (The number in the last column indicates the literature reference of each sub-risk.)
4.4 Efficiency Related Risks
A combination of internal and external factors adds up to impede the efficiency of cloud operations. Disruptions to smooth operation can be caused by outages triggered by power or network interruptions. The possibility of quick upsizing and downsizing (dynamic scalability) can create provisioning roadblocks. On the one hand, scalability – both up-scaling and down-scaling – is an intrinsic benefit that cloud computing ushers in; but the constraint it imposes on the system, because of the dynamic nature of provisioning, brings in the element of efficiency risk. Finally, the intrinsic technical problem of latency – driven by the number of hops that data has to traverse from origin to destination – completes the delineation of the efficiency risk space.
4.5 Business Related Risks
With technology in a convergence mode, there could be problems associated with the newly emerging business models themselves. In other words, technology adoption can compound basic business risks. There is the unknown danger of loss of control and governance of IT assets to a third-party hosted facility. The landscape of the legal compliance environment changes with third-party IT support. The location of the data center becomes critical in cloud migration for many industries (such as banks). All these add up to create the fifth dimension of cloud adoption risk – Business Risk.
4.6 Data Related Risks
This paper treats data per se as different from the security framework that envelops data. Data that moves to the cloud is exposed to its own set of sub‐risks, such as compatibility, migration, restoration, integrity and redundancy. These add up to create the sixth and last dimension of risk – Data Related Risk. The 24 sub‐risk vectors are listed in Figure 1. The last column of this table indicates the literature reference number for each sub-risk (taking them in the order in which they appear in 'References'). As an example,
citations on 'Green Cost' are found in references 4 [Bannerman Paul L (2010)] and 8 [Hosseini A. Khajeh et al (2011)]. The frequency of citation can be taken as an empirical indicator of the degree of impact of a given sub‐risk.
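Since Figure 1 itself is not reproduced in this text, the two-level structure can be sketched as a simple data structure using only the sub-risks named in the prose of Section 4. The grouping below is a partial, illustrative reconstruction; the authoritative 24-item breakdown is the one shown in Figure 1.

```python
# Partial reconstruction from the prose of Section 4; the authoritative
# 24-item breakdown is the one shown in Figure 1.
CLOUD_RISK_CLUSTERS = {
    "Security Risk":     ["malicious insider disclosure", "accidental disclosure",
                          "confidentiality / privacy breach", "natural disasters"],
    "Vendor Risk":       ["vendor lock-in", "reputation fate sharing",
                          "lack of continuity of operations", "poor uptime"],
    "No Gain Risk":      ["rising OPEX", "energy cost escalation",
                          "hidden tax on services", "green cost"],
    "Efficiency Risk":   ["outages (power / network)", "provisioning roadblocks",
                          "latency"],
    "Business Risk":     ["loss of control and governance", "legal compliance",
                          "data center location"],
    "Data Related Risk": ["compatibility", "migration", "restoration",
                          "integrity", "redundancy"],
}

total = sum(len(v) for v in CLOUD_RISK_CLUSTERS.values())
print(f"Sub-risks named in the prose: {total} of the 24 in Figure 1")
```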
5. Inherent Characteristics based Segmentation of Cloud Risks
We now proceed to the second level and re‐classify the same set of 24 cloud sub-risks based on the inherent characteristics of each risk. This approach has been adapted from the generic risk framework proposed by Prof. Robert S. Kaplan (2012). The three buckets of risk classification at the second stage are Preventable Risks, Strategic Risks and External Risks. The 24 sub‐risks are re‐categorized according to the Kaplan framework in Figure 2. Each of the 24 sub‐risks is aligned to only one of the Kaplan risk categories; the row‐column alignment and mapping is indicated by a tick mark. If unweighted percentages are any indicator, then 37.5% of the risks are preventable, 46% have a strategy-building quotient and only 16.5% are external. This picture will change substantially when weightings are assigned to individual risks, which is what the third and last part of this study (not part of this paper) is all about.
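As a quick sanity check, the counts implied by the quoted unweighted percentages can be back-calculated. The rounding below is an inference from the stated percentages, not the author's allocation, which is the one shown in Figure 2.

```python
# Back-calculating the implied counts from the quoted percentages;
# the exact allocation is the one shown in Figure 2.
TOTAL = 24
quoted = {"Preventable": 37.5, "Strategic": 46.0, "External": 16.5}
for bucket, pct in quoted.items():
    count = round(pct / 100 * TOTAL)
    print(f"{bucket:<11}: ~{count} of {TOTAL} sub-risks ({count / TOTAL:.1%})")
# -> roughly 9 preventable, 11 strategic (11/24 = 45.8%) and 4 external (16.7%)
```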
5.1 Preventable Risks
Figure 2: Categorizing the 24 sub-risk factors into preventable, strategic and external risks.
Many risks that firms face have no flip‐side gains associated with them. In other words, no user or user group has any strategic leverage in taking such risks. Hence the only two approaches that can be employed to handle such risks are elimination and minimization. Some of these risks can be controlled by putting rigorous rules in place and ensuring compliance. Avoidance of accidental disclosure of data can be achieved by putting checks and processes in place. In some cases, an upfront signaling of value systems (through platforms like vision statements and mission statements) can reduce the occurrence of certain classes of risks; unauthorized and malicious leaks of information by disgruntled employees fall in this category. At a different plane, complete elimination of the risk of data loss can be achieved by bringing in data redundancy measures. SLAs, service levels, service uptime, business continuity and the like can be improved by bringing in a rule-based risk management environment. Across industries, the category of preventable risks is the best understood risk management category.
5.2 Strategic Risks
As previously explained, in the context of the Kaplan framework, strategic risks are those risks that firms adopt to gain a specific strategic leverage. Conscious adoption of a multi‐tenancy framework (despite knowing the risks of common space being shared by multiple clients) is done to gain the better pricing of cloud offerings, since the hardware is shared between multiple users. Similarly, the risk of high OPEX is consciously adopted to save on equipment purchase, upkeep and upgrades; the associated tax costs and green-compromise costs follow from the OPEX costs. Strategic risks do not have the classic control element that one would associate with a preventable risk: a risk like latency is difficult to control. Viewed from a completely different perspective, strategic risks are those risks that follow the archetypal 'risk-return' profile. Cloud adopters knowingly and consciously accept these risks to leverage a better strategic return. The point is driven home better by looking at examples of strategic risks in other industries. Simple credit default is a strategic risk perpetually associated with the banking industry: any lending carries the inherent risk of a payment default, and if banks cannot absorb this risk, their very survival is in question. Similarly, cannibalization of one's own brands or variants is a strategic risk for the FMCG (fast-moving consumer goods) industry: for a short period, the firm runs the risk of a well‐branded product being pulled out while a new variant tries to establish its place. Upfront R&D investment is a strategic risk for the pharmaceutical industry, as there is no guarantee of a breakthrough medicine after every research run.
5.3 External Risks
Risks that are external, unknown, exogenous, probabilistic and sporadic come in this category. Firms normally have no control over risks like arson, disasters, outages and the like. Prevention of such risks would be driven by statistical models of these recurring events based on a priori (historical) data. In the context of cloud computing and external risks, some level of prediction can be achieved for power as well as network outages based on previous data. A finer understanding of the inherent characteristics of each of the three risk buckets is given in Figure 3.
Figure 3: Cloud risk delineation based on the inherent characteristics of the risks
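As a small illustration of the point in Section 5.3 that some prediction of outages is possible from previous data, the sketch below fits a simple Poisson rate to hypothetical monthly outage counts. The model choice and the sample data are assumptions made for illustration; the paper does not prescribe any particular statistical model.

```python
# Minimal sketch: the Poisson assumption and the sample history are
# illustrative, not taken from the paper.
import math

monthly_outages = [0, 1, 0, 2, 0, 0, 1, 0, 0, 1, 0, 0]   # hypothetical history
rate = sum(monthly_outages) / len(monthly_outages)         # outages per month

# Under a Poisson model, P(at least one outage next month) = 1 - exp(-rate)
p_outage = 1 - math.exp(-rate)
print(f"Estimated outage rate: {rate:.2f}/month, "
      f"P(>=1 outage next month) = {p_outage:.1%}")
```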
6. Part 03 of this Trilogy – A Brief
In a broader risk mapping and mitigation context, this paper is the second part of a trilogy of studies on cash flow modelling and risk mapping of public clouds. The first paper – presented elsewhere – draws attention to the non‐cash, yet monetizable, 'aggregate risk component' associated with cloud adoption. This paper
disaggregates that risk into multiple risk vectors. The third and last paper in the trilogy – currently on the drawing board – will do a weighted analysis to deterministically evaluate the actuarials associated with each risk. For both the current study and the previous one, the entire analysis can be done by focusing on the vendor space; no specific cloud customer nuances are mentioned. Thus, in a sense, the first two parts are 'adoption-industry agnostic'. The third part of the study – where individual weights have to be arrived at for each cloud risk – will be highly adoption-industry specific: the individual risk weightings for the banking industry will not be the same as for, say, the healthcare industry, though the broad risk elements will be the same. The entire trilogy will conclude with a risk evaluation framework – Evaluation of Actuarials by Segmented and Weighted Analysis of Risk – or, in brief, the EASWAR framework.
7. Conclusion
As a technology offering, Cloud Computing – which offers distributed, on‐demand, self‐service, location-independent, elastic, pay‐for‐use-only, zero-CAPEX, zero-ownership, utility-driven computing – is here to stay and grow. The movement towards cloud adoption is in line with the global trend of moving from product procurement to service procurement. With Cloud Computing poised to move from its nascent phase to a more robust growth phase, a systemic understanding of the risk space enveloping the cloud is becoming important. This paper is primarily focused on delineating the risk vectors of the cloud landscape. Business risk mapping is the process of identification and segmentation of all hazards that impede the normal running of a business. The hazards or risks first have to be delineated in the functional space. In the context of this paper and the cloud, the six functional spaces identified relate to the 'compromises' that adopting firms will have to make to accommodate the cloud factor. The vectors are 'compromise on security', 'compromise on vendor liaison', 'compromise on actual gains', 'compromise on efficiency', 'compromise on business requirements' and finally 'compromise on data management'. The functional risk space is then divided into 24 sub‐vectors, each of which gives an indication of one slice of the overall risk. The categorization of risk along functional lines is highly industry dependent. Geopolitics, dollar fluctuation, global demand, OPEC supply, pricing of substitutes, raw material quality and regulatory frameworks would be the functional silos through which the risks of the global oil industry could be mapped. For, say, the banking sector, the functional risk spaces would be Operational Risk, Credit Risk and Reputational Risk – the first being the transactional space, the second the payment default space and the third the credibility space. One can see that no two industries will have a common generic set of functional risks. The first part of this paper uses a literature survey as the basis for analysing the functional risks of cloud adoption. Risks or hazards are of three types – preventable risks with no strategic mileage, consciously undertaken risks with a strategic underpinning, and pure external risks on which firms have no significant handle. This paper segments the 24 functional sub‐risk vectors into these three buckets. Threat mitigation on an a priori basis [before the event] would be the approach for the Preventable Risks. Consequence containment on an a posteriori basis [after the event] would be the way to tackle the External Risks. Risk-return profiling, likelihood analysis, scenario mapping and impact analysis would be the measures undertaken to manage the Strategic Risks. This paper stops at finding the functional risk silos and then classifying them into the three risk buckets mentioned. The next step would be to take a typical cloud adoption industry – say the banking industry – and do an actuarial analysis to estimate the 'cost' of each risk. The study of cloud computing risks will be complete when all risk factors are understood, mapped, segmented, weighted and finally monetized.
References
Achara Sachin, Rakesh Rathi (2014), "Security Related Risks and their Monitoring in Cloud Computing", International Journal of Computer Applications (0975-8887), Volume 86, No 13, January 2014
Alali Fatima A., Chia‐Lun Yeh (2012), "Cloud Computing: Overview and Risk Analysis", Journal of Information Systems, Vol. 26, No. 2, Fall 2012, pp. 13-33
Armbrust Michael, A. Fox, R. Griffith, A.D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia (2010), "A View of Cloud Computing," Communications of the ACM, Vol. 53, 2010
Bannerman Paul L (2010), "Cloud Computing Adoption Risks: State of Play", Asia Pacific Software Engineering Conference (APSEC 2010), Cloud Workshop, Nov 2010
Dutta Amab, Peng Guo Chao Alex, Choudhary Alok (2013), "Risks in Enterprise Cloud Computing: The Perspective of IT Experts", Journal of Computer Information Systems, Summer 2013
Etro Federico (2011), "The Economics of Cloud Computing", IUP Journal of Managerial Economics, May 2011, Vol. 9, Issue 2, pp. 7-22
Fan Chiang Ku, Chen Tien‐Chun (2012), "The Risk Management Strategy of Applying Cloud Computing", International Journal of Advanced Computer Science and Applications (IJACSA), Vol. 3, No. 9, 2012
Hosseini A. Khajeh, D. Greenwood, J.W. Smith and I. Sommerville (2011), "The Cloud Adoption Toolkit: Supporting Cloud Adoption Decisions in the Enterprise," Software: Practice and Experience, 2011
Kalyvas James R., Overly Michael R., and Karlyn Matthew A. (2013), "Cloud Computing: A Practical Framework for Managing Cloud Computing Risk—Part I", Intellectual Property and Technology Law Journal, Volume 25, Number 3, March 2013
Kalyvas James R., Overly Michael R., and Karlyn Matthew A. (2013), "Cloud Computing: A Practical Framework for Managing Cloud Computing Risk—Part II", Intellectual Property and Technology Law Journal, Volume 25, Number 4, April 2013
Kaplan Robert S., Mikes Anette (2012), "Managing Risks: A New Framework", Harvard Business Review, June 2012
Krishna Iyer Easwar, Panda Tapan (2013), "Cash Flow Modeling and Risk Mapping in Public Cloud Computing - An Evolutionary Approach", International Journal of Consumer and Business Analytics (IJCBA), Vol. 01, No. 1, Feb 2013, pp. 83-94
Mangiuc Dragoş‐Marian (2012), "Security Issues of Cloud Based Services - a Guide for Managers", Review of International Comparative Management, Volume 13, Issue 3, July 2012
Merton Robert C. (2013), "The Big Idea Innovation Risk: How To Make Smarter Decisions", Harvard Business Review, April 2013
Nkhoma Mathews Z., Dang Duy P.T. and Anthony De Souza‐Daw (2013), "Contributing Factors of Cloud Computing Adoption: a Technology‐Organization‐Environment Framework Approach", Proceedings of the International Conference on Information Management & Evaluation, 2013, pp. 18-19
Otim Samual, Dow Kevin E., Grover Varun, and Wong Jeffrey A. (2012), "The Impact of Information Technology Investments on Downside Risk of the Firm: Alternative Measurement of the Business Value of IT", Journal of Management Information Systems, Summer 2012, Vol. 29, No. 1, pp. 159-193
Solms R. von and Viljoen M (2012), "Cloud computing service value: A message to the board", South African Journal of Business Management, Dec 2012, Vol. 43, Issue 4, pp. 73-81
Tisnovsky Ross (2010), "Risks Versus Value in Outsourced Cloud Computing", Financial Executive, www.financialexecutives.org, November 2010
How NSA's surveillance programs influence cloud services outside the US?
Jyrki Kronqvist and Martti Lehto
Faculty of Information Technology, University of Jyväskylä, Finland
[email protected] [email protected]
Abstract: The National Security Agency's (NSA) core missions are to protect US national security systems and to produce foreign signals intelligence information. To carry out this mission, the first objective of the NSA is to succeed in today's operations. From the US perspective this success requires freedom of action in cyberspace, achieved by exploiting foreign use of electronic signals and systems and by securing information. These freedom-of-action activities in information networks have meant the surveillance of communications, which many have interpreted as cyber espionage. This has increased distrust of US ICT service providers. The recent media reports on NSA surveillance of communications on social networks and public cloud services mean that most non‐US enterprises intending to use cloud services have to reconsider the risks of adopting public cloud services provided by US companies and weigh them against the benefits. This will likely have an impact on the competitiveness of the US cloud computing industry and slow down the overall adoption of public cloud services operated in the US. This paper explores the recent NSA surveillance programs and analyses how they impact non‐US enterprises' cloud adoption and their information security risk landscape. The paper also reviews countermeasures available for defending against the new risks.
Keywords: cyber espionage, cloud services, data encryption, confidentiality, trustworthy
1. Introduction
Revelations of the NSA's electronic surveillance programs mean that most non‐US enterprises intending to use cloud services have to reconsider the risks of adopting public cloud services provided by US cloud service providers. According to a report published by ITIF (Castro, 2014), US cloud computing providers might lose $21.5 to $35 billion over the next three years. Current and potential enterprise users of public cloud services could leave the US cloud services or defer their cloud computing adoption. This kind of development may have a negative impact on overall cloud adoption and on the competitiveness of US-based firms such as Google, Microsoft, and Salesforce. There are also opposite opinions on this matter. Forrester Research analyst Frank Gillett says that he expects the controversy to have an impact on US cloud companies, but reality shows that only a small percentage of customers have actually cancelled their cloud services (Milian, 2014). Enterprises may not cancel their cloud services provided by US high‐tech companies en masse because there are few other competitive options. This paper explores the recent disclosure of documents on the NSA's surveillance programs and how they may impact non‐US based enterprises' public cloud adoption. It also reviews possible countermeasures that large enterprises need to have in place to ensure the confidentiality of cloud data. The paper is based on online research and was carried out as an extensive review of online news and articles related to the selected topics. The structure of this paper is as follows. Section two illustrates the short history of the NSA's surveillance programs. Section three discusses the possibility of non‐US based enterprises postponing their public cloud adoption because of the NSA's surveillance programs. Section four reviews encryption solutions available for cloud data protection. Finally, in section five, conclusions are drawn.
2. NSA's Surveillance Programs - What we know about them
A series of disclosures related to the NSA's surveillance programs was started by Edward Snowden in June 2013. The Guardian (The Guardian, 2014) and The Washington Post (The Washington Post, 2014) revealed the existence of the NSA's PRISM program, which started a discussion about the program, what it actually is and how it works. After the first news, more and more information about details and other programs has leaked out, like the recent report about the NSA seeking to build a quantum computer (Rich and Gellman, 2014). What we know about PRISM is that the NSA used this system to get access to the private communications of customers of nine popular public cloud services, as the classified PowerPoint presentation leaked by Edward Snowden states (see Figures 1 and 2 below (The Guardian, 2014 and The Washington Post, 2014)). As the
presentation states, the program enables "collection directly from the servers" of Microsoft, Yahoo, Google, Facebook and other online companies. The legal basis for the access is governed by Section 702 of the Foreign Intelligence Surveillance Act, enacted in 2008.
Figure 1: Dates when PRISM collection began for each provider (The Guardian, 2014 and The Washington Post, 2014)
The common and uniform reaction from the involved companies was to deny the existence of such surveillance programs. Google's CEO Larry Page and Chief Legal Officer David Drummond stated: "First, we have not joined any program that would give the US government — or any other government — direct access to our servers. Indeed, the US government does not have direct access or a "back door" to the information stored in our data centers. We had not heard of a program called PRISM until yesterday." (Google Official Blog, 2014). Microsoft published its Statement of Microsoft Corporation on Customer Privacy: "We provide customer data only when we receive a legally binding order or subpoena to do so, and never on a voluntary basis. In addition we only ever comply with orders for requests about specific accounts or identifiers. If the government has a broader voluntary national security program to gather customer data we don't participate in it." (Microsoft News Center, 2014). Facebook CEO Mark Zuckerberg underlined in his blog that "Facebook is not and has never been part of any program to give the US or any other government direct access to our servers. We have never received a blanket request or court order from any government agency asking for information or metadata in bulk, like the one Verizon reportedly received. And if we did, we would fight it aggressively. We hadn't even heard of PRISM before yesterday." (Zuckerberg, 2014)
Figure 2: PRISM data collection details (The Guardian, 2014 and The Washington Post, 2014)
What is common to all these statements is that they all emphasize that the companies are not aware of PRISM, that they do not provide any government agency with direct access to their servers, and that any government agency requesting customer data must get a court order. There have been different opinions on whether the companies are sincere in this matter or whether they are just using legalistic language to hide their participation. If the companies do not provide direct access to their servers, are they possibly using other means to provide access to the servers' content? In its article, The International New York Times says that the internet companies have systems that provide access to the NSA based on individual FISA requests reviewed by company lawyers. By sharing the data in this manner, and not sending it automatically or in bulk, the government does not have full access to company servers, as emphasized by the companies (Miller, 2014). This interpretation is also supported by the slides released by Snowden (Ball, 2014), which clearly present PRISM, involving data collection from servers, as distinct from four other programs involving data collection using other techniques, such as the collection of communications on fiber cables and infrastructure as data flows past (see Figure 3 below (Ball, 2014)).
Figure 3: Different methods of data collection under the FISA Amendment Act (Ball, 2014)
This was described in detail in additional slides revealed by Snowden. These slides illustrate the NSA's tool for exploiting data links in a project called MUSCULAR. The NSA and its British counterpart, the Government Communications Headquarters (GCHQ), are copying entire data flows across the fibre‐optic cables that carry information among the data centers of the internet companies (see Figure 4 below) (Gellman and Soltani, 2014).
Figure 4: MUSCULAR program (Gellman and Soltani, 2014)
Documents provided by Edward Snowden also reveal years‐long efforts by both the NSA and Britain's GCHQ to weaken encryption systems so that they could tap emails and internet communications. There is also suspicion that the NSA has undermined the strength of encryption protocols developed by NIST, the US National Institute of Standards and Technology (Ball, Borger and Greenwald, 2013). Returning to the question of whether the companies are sincere concerning their knowledge of the NSA's surveillance programs or whether they just used legalistic language to hide their participation, we may conclude that this question is not so important from the non‐US enterprise's point of view. Whether or not the companies participate in the NSA's surveillance programs, it seems obvious that the NSA and Britain's GCHQ have powerful means to get access to the data they want. The NSA and its allies are using multiple techniques, including legal, technological and human-based techniques, to get access to the data they need.
3. Non-US Enterprises and Public Cloud Adoption
The United States has been seen as the leader in the development of cloud computing and related services. Other countries, such as those in Europe, are trying to develop their own offerings and compete against the US market. The European Commission created a strategy document, "Unleashing the Potential of Cloud Computing in Europe - What is it and what does it mean for me?", to speed up and increase the use of cloud computing across the economy in Europe. The key actions of the strategy include: cutting through the jungle of technical standards so that cloud users get interoperability, data portability and reversibility; support for EU‐wide certification schemes for trustworthy cloud providers; development of model 'safe and fair' contract terms for cloud computing contracts, including Service Level Agreements; and a European Cloud Partnership with Member States and industry to harness the public sector's buying power (20% of all IT spending) to shape the European cloud market (EU Press release database, 2014). As the European Commission states, the benefits of cloud services come from economies of scale. The strategy states that if 80% of organisations adopt cloud computing, cost savings of at least 10-20% can be achieved and significant productivity gains are also to be expected. The strategy outlines actions to deliver 2.5 million new European jobs and an annual boost of EUR 160 billion to EU GDP (around 1%) by 2020 (EU Press release database, 2014). In a report published by ITIF (Castro, 2014), the authors estimated how much US cloud computing providers stand to lose from PRISM. According to the report, on the low end, US cloud computing providers might lose $21.5 billion over the next three years. This estimate assumes the US eventually loses about 10 percent of the foreign market to European or Asian competitors and retains its currently projected share of the domestic market. On the high end, US cloud computing providers might lose $35.0 billion by 2016. This assumes the US eventually loses 20 percent of the foreign market to competitors and retains its current domestic market share. Figure 5 (Castro, 2014) below illustrates the high estimate of losses from the NSA revelations.
Figure 5: High estimate of losses from NSA's revelations (Castro, 2014)
As a basis for these assumptions, ITIF uses a survey conducted by the Cloud Security Alliance in June and July of 2013. The survey included Cloud Security Alliance members, such as industry practitioners, companies, and other cloud computing stakeholders, and the questions were related to their reactions to the NSA leaks (Cloud Security Alliance, 2014). According to the survey results, 10 percent of non‐US resident respondents had cancelled a project with a US‐based cloud computing provider, and 56 percent would be less likely to use US-based cloud computing services. For US residents, 36 percent indicated that the NSA leaks made it more difficult for them to do business outside of the United States. There are also different opinions on this matter. Forrester Research analyst Frank Gillett says that he expects the controversy to have an impact on US cloud companies, but reality shows that only a small percentage of customers have actually cancelled their cloud services. As Gillett said, "I'd be surprised at anything of that magnitude. Those who were already concerned about the US Patriot Act and other types of legal authorities have already avoided the United States, anyway. For them, it wasn't a big surprise." (Milian, 2014) Enterprises may not cancel their cloud services provided by US high‐tech companies en masse because there are few other competitive options, but ITIF's report acknowledges that "the data are still thin" and that "this is a developing story and perceptions will likely evolve". We might nevertheless conclude that, because of the NSA's surveillance programs, US cloud service providers stand to lose some portion of the foreign market in the next few years. One sign of the change is that some providers in Europe are already reporting success: for example, Artmotion, Switzerland's largest hosting company, reported a 45 percent increase in revenue in the month after Edward Snowden revealed details of the NSA's PRISM program (Gilbert, 2014). As the still ongoing media turbulence has shown, US high‐tech cloud service providers are the ones referred to in the NSA's surveillance programs. As the ITIF report states, most developed countries have mutual legal assistance treaties (MLATs) which allow them to access data from third parties whether or not the data is stored domestically (Castro, 2014). As explained in more detail in (Maxwell and Wolf, 2014), the authors state that governmental access to data stored in the cloud exists in every jurisdiction, not just the United States (PATRIOT Act); European countries with strict privacy laws also have anti-terrorism laws that allow expedited government access to cloud data. Are the NSA's surveillance programs on a completely different level compared to other countries? One answer was given by John Kerry, the US secretary of state, who conceded that some of the NSA's surveillance activities had gone too far and that certain practices had occurred without the knowledge of senior officials in the Obama administration (Roberts and Ackerman, 2014).
Several security companies have reported that their fortunes have increased substantially on the back of the NSA's surveillance programs. For example, the US-based company CipherCloud provides a solution which encrypts all traffic to popular cloud services such as Salesforce.com, Google Apps, Microsoft Office 365, and others. The enterprise retains the encryption keys, not CipherCloud, so no one else is able to decrypt the data (CipherCloud, 2014). In just a few months CipherCloud claims to have grown from 1.2 million to 2 million customers. The main question may be whether the NSA revelations will significantly slow cloud adoption. As CipherCloud emphasizes on its web pages (Leichter, 2014), the issue may be much broader than a few dominant US cloud providers' profits. Enterprises have adopted the cloud primarily for economic reasons: the cloud is less expensive, more efficient, and offers powerful new capabilities that most organizations cannot easily implement in‐house. Enterprises have reviewed the public cloud offerings available and moved away from implementing basic ICT services and applications (like servers for development or testing, and CRM or email) in‐house, outsourcing them to infrastructure‐as‐a‐service providers like Amazon Web Services or to application platforms like Google Apps or Salesforce. Cloud computing as a new way to deliver ICT services will not disappear, and the enterprises that have adopted the cloud (public or private) are more competitive than enterprises stuck with expensive, legacy on-premise systems.
4. Is cloud data encryption an answer to the NSA's surveillance programs?
In this section we try to analyse the repercussions of the NSA's surveillance programs on the enterprise side. The disclosure of documents on the NSA's surveillance programs created great concern within large enterprises, and because of that they have changed their policies and instructed their users on how to manage confidential data and how to behave while online. The main concern is the loss of confidential data, such as business secrets and private information, and this has sustained a rapid growth of encryption software, such as data encryption applications for securely sending emails. A common objection to the use of encryption for cloud data is that it is complicated for enterprises to deploy in their organizations. Proper implementation requires the necessary software and tools, enforcement of the appropriate policies, training of employees, and management of the encryption keys. Encryption helps enterprises preserve some of the benefits of maintaining data on the premises, but is often challenging to implement, and it has its own security problems, like key management (Sun et al., 2014). Recent progress in this area looks promising, and the new turnkey encryption solutions available will significantly help enterprises concerned about the NSA's surveillance programs to implement new countermeasures to protect cloud data. The encryption solutions, like those offered by CipherCloud (CipherCloud, 2014), are specifically designed to work with popular public cloud services such as Salesforce.com, Google Apps, and Microsoft Office 365 (Gould, 2014). There is a strong argument that sensitive data needs to be encrypted before outsourcing in order to protect user data against unauthorized usage. Encrypted data, however, makes data utilization a very challenging task: for instance, keyword search functions over documents stored in the cloud need specific algorithms and tools. Without such usable data services, the cloud becomes only remote storage, which provides limited value to users. Searching through encrypted data is a fundamental and common form of data utilization service, enabling users to quickly sort out information of interest from huge amounts of data (Sun et al., 2014). CipherCloud (CipherCloud, 2014) provides a solution that enables enterprises to securely adopt public cloud services and appropriately manage the risks related to data privacy, security, and regulatory compliance. The offering provides a platform located at the enterprise's premises, including security controls such as encryption, tokenization, cloud data loss prevention, cloud malware detection, and activity monitoring. The solution encrypts sensitive information in real time, before it is sent to the cloud, preserving application usability and functionality, and keeps the keys that encrypt and decipher information under the control of the enterprise. As the company emphasizes, this ensures that all information requests must involve the owner, even if the information is stored in a third‐party cloud. The company proposes a procedure for enterprises to ensure protection of their confidential business data in the cloud. It emphasizes that enterprises need to confront the realities of a ubiquitous surveillance environment by taking proactive steps to fully defend that data from exposure, including the following steps (CipherCloud press release, 2014):
Discover. Before enterprises can protect information in the cloud, they need to know where it is and who has access to it: a) who should have access to the information and who should not, b) what content is sensitive, proprietary, or regulated and how can it be identified, and c) where will this data reside in the cloud and what range of regional privacy, disclosure and other laws might apply?
Protect. Enterprises need to protect the data stored in the cloud using appropriate safeguards, like encryption (a minimal illustration of this step follows the list): a) use of strong, known algorithms, like AES‐256, to protect it from unauthorized viewers, b) keeping the keys that encrypt and decipher information under the control of the user organization, and c) for an additional level of security and control, enterprises should customize Data Loss Prevention to protect information according to its level of sensitivity.
Enable. Enterprises should use operations‐preserving encryption, which enables users to search, sort and report on encrypted data in the cloud (one common building block for this is sketched at the end of this section). Additionally, an open platform capable of supporting all cloud applications and integrating third‐party tools provides a stable foundation for protection.
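To make the "Protect" step above concrete, the following is a minimal sketch of client-side encryption with an enterprise-held AES-256 key, using the AES-GCM primitive from the Python cryptography package. It illustrates the principle of keeping keys on-premises only; it is not CipherCloud's implementation, and the record identifiers are hypothetical.

```python
# Minimal sketch of the "Protect" step: client-side AES-256-GCM encryption
# with an enterprise-held key. Illustrates the principle only; it is not
# CipherCloud's implementation.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)   # stays on-premises, never uploaded
aead = AESGCM(key)

def encrypt_for_cloud(plaintext: bytes, record_id: bytes) -> bytes:
    """Encrypt before upload; only the nonce and ciphertext leave the enterprise."""
    nonce = os.urandom(12)                           # fresh 96-bit nonce per record
    return nonce + aead.encrypt(nonce, plaintext, record_id)

def decrypt_from_cloud(blob: bytes, record_id: bytes) -> bytes:
    nonce, ciphertext = blob[:12], blob[12:]
    return aead.decrypt(nonce, ciphertext, record_id)

blob = encrypt_for_cloud(b"customer list", b"crm/account/42")
assert decrypt_from_cloud(blob, b"crm/account/42") == b"customer list"
```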
In the first step above, CipherCloud emphasizes, in a similar manner to the Cloud Security Alliance's document titled Security Guidance for Critical Areas of Focus in Cloud Computing (Cloud Security Alliance, 2011), the importance of properly identifying an enterprise's information assets and what kind of information it is going to transfer to the cloud. Is the information sensitive, like business confidential data, or is it regulated? As this is mainly about risk identification and management, enterprises need to carefully review the information assets they plan to transfer to the cloud, analyse the cloud services, security offerings and architecture models available, and map them to a model of existing compensating security and operational controls to ensure that the appropriate level of security is maintained. If, as an outcome, the confidentiality of an enterprise's sensitive data cannot be guaranteed, enterprises will seek security for their data with non‐US cloud vendors or in‐house private clouds, or alternatively keep their existing legacy on-premise systems. The CipherCloud solution allows use of several public cloud services, like Salesforce, Google Gmail, Microsoft Office 365, and Amazon Web Services. Additionally, CipherCloud Connect AnyApp and Database Gateway enable organizations to extend data protection to hundreds of third‐party cloud and private cloud applications and databases. This approach may satisfy the requirements of even the most security‐conscious enterprises (Gould, 2014). Some European data protection authorities share this view: for example, the French DPA (Commission nationale de l'informatique et des libertés, CNIL) recommends encryption with enterprise-controlled keys as a standard practice for French firms and government agencies adopting cloud services (Commission nationale de l'informatique et des libertés (CNIL), 2014). Another example of the recent changes in the ICT industry is the increased use of open source solutions such as OpenPGP (Pretty Good Privacy). The daily creation of unique keys has nearly tripled since June 2013, just after the NSA's surveillance programs first became public, according to data published by sks‐keyservers.net (Rushe, 2014). The statistics relate to over 80 key servers around the world. The data demonstrated the growth of new PGP key generation in July and August 2013, revealing a trend that has gone from 500 to 2,200 new keys added every day. Figure 6 (Rushe, 2014) below indicates an increase in the adoption of OpenPGP encryption across enterprises, showing that enterprises are becoming aware of security issues and implementing new countermeasures to protect their business-critical data.
Figure 6: A chart showing the development in the number of OpenPGP keys added by day (sks‐keyservers.net, 2014)
Enterprises are adopting the cloud primarily for economic reasons: it is less expensive, more efficient, and offers powerful new capabilities. Implementing an enterprise-wide encryption solution to protect data stored in the cloud is not an easy thing to do despite the recent progress in this area. Some cloud providers have announced that they will improve their security through encryption; for example, Google and Yahoo are expanding their efforts to protect their customers' online activities by encrypting all the communications and other information flowing into the companies' data centers around the world (Rushe, 2014), (Gellman and Soltani, 2014). Despite the ongoing improvements, cloud providers may resist cloud data encryption, at least because encryption with customer-controlled keys is inconsistent with their business model and might require them to modify their existing software systems (Falkenrath and Rosenzweig, 2014). An encryption solution, whether it is an enterprise's own implementation or provided by a public cloud service provider, will increase the costs related to the use of cloud services, and this may impact the ongoing shift towards public cloud services.
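Returning to the "Enable" step and the keyword-search problem raised earlier in this section (Sun et al., 2014): searching, sorting and reporting on encrypted data requires the cloud to match queries it cannot read. Below is a minimal sketch of one common building block, a deterministic HMAC keyword index, where the search key never leaves the enterprise. It is illustrative only and is neither the scheme analysed by Sun et al. nor CipherCloud's operations-preserving encryption; the document identifiers and keywords are hypothetical.

```python
# Minimal sketch of one common approach (a deterministic HMAC keyword index);
# not the specific scheme discussed by Sun et al. (2014).
import hmac
import hashlib

SEARCH_KEY = b"enterprise-held secret key"        # never leaves the enterprise

def keyword_token(word: str) -> str:
    """Deterministic token the cloud can index without learning the keyword."""
    return hmac.new(SEARCH_KEY, word.lower().encode(), hashlib.sha256).hexdigest()

# Client side: index encrypted documents by keyword tokens before upload.
encrypted_index = {}                               # token -> list of document ids
for doc_id, keywords in {"doc-1": ["invoice", "q3"], "doc-2": ["invoice"]}.items():
    for word in keywords:
        encrypted_index.setdefault(keyword_token(word), []).append(doc_id)

# Search: the client sends only the token; the server matches it blindly.
print(encrypted_index.get(keyword_token("invoice"), []))   # -> ['doc-1', 'doc-2']
```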
5. Summary
A shift towards the use of public cloud services is ongoing, and more enterprises will adopt them in the near future. While public cloud services certainly promise to deliver many benefits, this new way of delivering services also introduces new types of risks. As stated in the Cloud Security Alliance's document titled Security Guidance for Critical Areas of Focus in Cloud Computing (Cloud Security Alliance, 2011), this is mainly about risk identification and management: enterprises need to carefully review the information they plan to transfer to the cloud, analyse the cloud services, security offerings and architecture models available, and map them to a model of existing compensating security and operational controls to ensure that the appropriate level of security is maintained. The disclosure of documents on the NSA's surveillance programs has created great concern within large enterprises, and because of that they have changed their policies on how to protect confidential data transferred to and stored in the cloud. Encryption helps enterprises preserve some of the benefits of maintaining data on the premises, but it is often challenging to implement and has its own security problems, like key management (Sun et al., 2014). Recent progress in this area looks promising, and the new turnkey encryption solutions available will significantly help enterprises concerned about the NSA's surveillance programs to implement new countermeasures. Encryption solutions, like those offered by CipherCloud (CipherCloud, 2014), are specifically designed to work with popular public cloud services such as Salesforce.com, Google Apps, and Microsoft Office 365 (Gould, 2014). Cloud service providers will also improve their security: Google and Yahoo have announced that they are improving their security by encrypting all communications and other information flowing into their data centers around the world (Rushe, 2014), (Gellman and Soltani, 2014). Despite the ongoing improvements, the cloud providers may resist cloud data
encryption, at least because encryption with customer-controlled keys is inconsistent with their business model and might require them to modify their existing software systems (Falkenrath and Rosenzweig, 2014). Public-domain encryption alternatives may be a long-lasting solution for non-US enterprises, since it is easier for the NSA to find a backdoor in closed-source software than in open-source software. An encryption solution, whether it is an enterprise's own implementation or one provided by a public cloud service provider, will increase the costs related to the use of public cloud services, and this may affect the ongoing shift to public cloud services. Public cloud vendors in the United States may face losses because of the NSA's surveillance programs. If that trend develops, non-US enterprises will seek security for their data in open source solutions, use non-US cloud vendors, implement in-house private clouds, or keep their existing legacy on-premise systems. This will slow down overall public cloud adoption and have a negative impact on enterprises' competitiveness and on the entire public cloud industry.
References
J. Ball (2014). NSA's Prism surveillance program: how it works and what it can do, The Guardian, http://www.theguardian.com/world/2013/jun/08/nsa-prism-server-collection-facebook-google.
J. Ball, J. Borger, G. Greenwald (2013). Revealed: how US and UK spy agencies defeat internet privacy and security, The Guardian, http://www.theguardian.com/world/2013/sep/05/nsa-gchq-encryption-codes-security.
D. Castro (2014). How Much Will PRISM Cost the U.S. Cloud Computing Industry? The Information Technology & Innovation Foundation (ITIF), http://www2.itif.org/2013-cloud-computing-costs.pdf.
CipherCloud (2014). www.ciphercloud.com.
CipherCloud press release (2014). First PRISM, now XKEYSCORE, cloud surveillance is wake-up call for the enterprise - Extensive NSA snooping generates cloud data protection concerns, http://www.ciphercloud.com/company/press-releases/first-prism-now-xkeyscore-cloud-surveillance-is-wake-up-call-for-the-enterprise/.
Cloud Security Alliance (2011). Security Guidance for Critical Areas of Focus in Cloud Computing V3.0, https://cloudsecurityalliance.org/guidance/csaguide.v3.0.pdf.
Cloud Security Alliance (2014). CSA survey results – Government Access to Information, https://downloads.cloudsecurityalliance.org/initiatives/surveys/nsa_prism/CSA-govt-access-survey-July-2013.pdf.
Commission nationale de l'informatique et des libertés (CNIL) (2014). Recommendations for companies planning to use Cloud computing services, http://www.cnil.fr/fileadmin/documents/en/Recommendations_for_companies_planning_to_use_Cloud_computing_services.pdf.
J. Gould (2014). Is Cloud Data Encryption the Answer to Patriot Act Fears? http://safegov.org/2012/11/9/is-cloud-data-encryption-the-answer-to-patriot-act-fears.
EU Press release database (2014). Digital Agenda: New strategy to drive European business and government productivity via cloud computing, http://europa.eu/rapid/press-release_IP-12-1025_en.htm.
R.A. Falkenrath, P. Rosenzweig (2014). Encryption, not restriction, is the key to safe cloud computing, http://safegov.org/2012/10/5/encryption,-not-restriction,-is-the-key-to-safe-cloud-computing.
D. Gilbert (2014). Companies Turn to Switzerland for Cloud Storage Following NSA Spying Revelations, International Business Times, http://au.ibtimes.com/articles/486613/20130704/business-turns-away-dropbox-towards-switzerland-nsa.htm#.UsLAzNIW0wA.
B. Gellman, A. Soltani (2014). NSA infiltrates links to Yahoo, Google data centers worldwide, Snowden documents say, The Washington Post, http://www.washingtonpost.com/world/national-security/nsa-infiltrates-links-to-yahoo-google-data-centers-worldwide-snowden-documents-say/2013/10/30/e51d661e-4166-11e3-8b74-d89d714ca4dd_story.html.
Google Official Blog (2014). What the …? http://googleblog.blogspot.fi/2013/06/what.html.
The Guardian (2014). The NSA Files, http://www.theguardian.com/world/the-nsa-files.
W. Leichter (2014). Report: NSA Snooping Could Cost US Cloud Companies Billions – a CipherCloud Perspective.
W. Maxwell, C. Wolf (2014). A Global Reality: Governmental Access to Data in the Cloud. A comparative analysis of ten international jurisdictions, http://www.cil.cnrs.fr/CIL/IMG/pdf/Hogan_Lovells_White_Paper_Government_Access_to_Cloud_Data_Paper_1_.pdf.
C.C. Miller (2014). Tech Companies Concede to Surveillance Program, The New York Times, http://www.nytimes.com/2013/06/08/technology/tech-companies-bristling-concede-to-government-surveillance-efforts.html.
Microsoft News Center (2014). Statement of Microsoft Corporation on Customer Privacy, http://www.microsoft.com/en-us/news/press/2013/jun13/06-06statement.aspx.
M. Milian (2014). Thanks to the NSA, the Sky May Be Falling on U.S. Cloud Providers, http://www.bloomberg.com/news/2013-08-08/thanks-to-the-nsa-the-sky-may-be-falling-on-u-s-cloud-providers.html.
S. Rich, B. Gellman (2014). NSA seeks to build quantum computer that could crack most types of encryption, The Washington Post, http://www.washingtonpost.com/world/national-security/nsa-seeks-to-build-quantum-computer-that-could-crack-most-types-of-encryption/2014/01/02/8fff297e-7195-11e3-8def-a33011492df2_story.html.
D. Roberts, S. Ackerman (2014). US surveillance has gone too far, John Kerry admits, The Guardian, http://www.theguardian.com/world/2013/oct/31/john-kerry-some-surveillance-gone-too-far.
D. Rushe (2014). Yahoo to add encryption to all services in wake of NSA spying revelations, The Guardian, http://www.theguardian.com/technology/2013/nov/18/yahoo-encryption-nsa-revelations-privacy.
W. Sun, W. Lou, Y. T. Hou, and H. Li (2014). Privacy-Preserving Keyword Search over Encrypted Data in Cloud Computing, in S. Jajodia et al. (eds.), Secure Cloud Computing, Springer Science+Business Media, New York, 2014, pp. 189-212.
The Washington Post (2014). Here's what we learned about the NSA's spying programs in 2013, http://www.washingtonpost.com/blogs/the-switch/wp/2013/12/31/heres-what-we-learned-about-the-nsas-spying-programs-in-2013/.
M. Zuckerberg (2014). Facebook, https://www.facebook.com/zuck/posts/10100828955847631.
Authenticity as a Component of Information Assurance and Security
Corinne Rogers
University of British Columbia, Vancouver, Canada
[email protected] Abstract: Trust is a universal concern in today’s digital environment. When we trust records, documents, or data online, we assume that they are authentic. But defining authenticity is difficult. Much research has been conducted by records professionals on the nature of digital records and their attributes that may support the presumption of their authenticity. Still, current means of assessing authenticity do not offer any quantifiable measures. There is a pressing need for such measures as our financial, governmental, health, critical infrastructure, and social network systems increasingly rely on complex integrated, interdependent (although not necessarily interoperable), distributed networked systems. Authenticity is frequently cited as either an outcome or a goal in security measures. There are two aspects of security: first, protecting systems from unauthorized access, and second, guaranteeing and protecting the contents of those systems – the records, documents, and data – from theft, alteration, or deletion, and assuring their identity and integrity. Authenticity is therefore an integral part of information assurance and security (IAS). But we know that IAS is not about technology alone. It comprises a comprehensive program involving people, processes and technology (ICC Belgium 2013). Similarly, information governance and records management can only function adequately when they involve an integrated program of standards, policies and procedures, education and training, people and technology. This paper discusses authenticity of digital material as a component of security and information assurance. It introduces research currently underway to establish metrics of authenticity and identify core metadata and documentation that are indicators and/or requirements for authenticity. Preliminary results are presented of a survey conducted by the author that investigates how records professionals establish and maintain authenticity of documents and data for which they are responsible. Keywords: authenticity, security, trusting records, information assurance
1. Introduction
Trust is a universal concern in today's digital environment. When we trust records, documents, or data online, we assume that they are authentic. But defining authenticity is difficult. Authenticity is frequently cited as either an outcome or a goal in security measures. There are two aspects of security: first, protecting systems from unauthorized access, and second, guaranteeing and protecting the contents of those systems – the records, documents, and data – from theft, alteration, or deletion, and assuring their identity and integrity. Authenticity is therefore an integral part of information assurance and security (IAS). But we know that IAS is not about technology alone. It comprises a comprehensive program involving people, processes and technology (ICC Belgium 2013). Similarly, information governance and records management can only function adequately when they involve an integrated program of standards, policies and procedures, education and training, people and technology. This paper discusses authenticity of digital material as a component of security and information assurance. It introduces research currently underway to establish metrics of authenticity and identify core metadata and documentation that are indicators and/or requirements for authenticity. Preliminary results are presented of a survey conducted by the author that investigates how records professionals establish and maintain authenticity of documents and data for which they are responsible.
2. Background – The Problem of Authenticity
For as long as people have written down their ideas, or recorded their transactions, others have resorted to forgery for financial gain, mischief, or exercise of power. Perhaps the most famous example is the Donation of Constantine. This forged Roman imperial decree purported to transfer authority over Rome and the western Roman Empire to the Pope. Probably written some time in the 8th century, it supported the Roman Church's claims of political authority for 700 years. The revival of classical scholarship and textual criticism in the 15th century led the Catholic priest, Lorenzo Valla, to prove conclusively that the Donation, whose authenticity had long been suspected, was in fact a forgery. Valla analyzed the form and content of the document to identify discrepancies in the use of language, as well as cultural and geographic anomalies (MacNeil 2000). Two centuries later, Dom Jean Mabillon wrote the seminal treatise on the science of diplomatics, the systematic study of the external and internal elements of documentary form, the circumstances of the writing, and the juridical nature of the fact communicated, which is used to assess the authenticity of documents (Duranti 1998).
Forgeries are not restricted to historical documents, however. Cheque fraud, for example – whether involving counterfeit, forged, or altered cheques – is among the oldest and most common forms of financial crime (CIBC n.d.) Nor is forgery restricted to the analogue world. Today, the deep integration of digital technology in all areas of our personal and professional lives has proven fertile ground for old fraud in new forms as well as new types of fraud and forgery intent on stealing or misusing the identity of individuals or organizations. Identity theft perpetrated through forgery can occur in the digital environment as electronic signature forgery, financial forgery, commercial forgery, or governmental or administrative forgery (Laws.com n.d.). Authenticity is as important a consideration today as it was in the 17th century, and in the digital environment, the stakes are as high if not higher than they ever have been. Hence, proving the authenticity of documents has been a pressing concern since the advent of writing, and continues to be today. How can a document, regardless of medium, be trusted to be what it claims to be? Most people intuitively understand that authenticity is the quality of genuineness, but few are able to identify exactly what is required to ensure, assess, and guarantee it. Authenticity, the quality of a record that is what it purports to be, has historically been understood as deriving from the circumstances of a document’s creation, if known, or from the place of its preservation. The presence of a signature indicated the agreement of the author with the content of the document and authenticated the transaction recorded therein. Signatures of witnesses or countersigners further verified the document’s authenticity. Signers and countersigners could be questioned if necessary, and their testimony used as a guarantee of genuineness. Such determination of authenticity was based on observation and testimony. In common law legal systems, documentary evidence must be authenticated in order to be admissible at trial. Authenticity, established through processes of authentication, is codified in our legal systems through statute and common law. Authentication of documentary evidence is accomplished through witness testimony, expert analysis, non‐expert opinion, or, in the case of public documents or other special types, circumstances of record creation and preservation (Cornell University Law School n.d.). These heuristics, based primarily on the appearance of documents, have developed over centuries and are still operational today, often misguidedly applied to digital documents. In interviews conducted in 2011 with lawyers, digital forensics experts, and records managers during the Digital Records Forensics Project (a 3‐year collaboration (April 2008‐April 2011) between the University of British Columbia's School of Library, Archival and Information Studies (SLAIS), the UBC Faculty of Law, and the Computer Forensics Division of the Vancouver Police Department), one respondent from the legal domain answered the question about determining authenticity of documents: “You can tell just by looking at it” (Rogers 2011). Much research has been conducted by records professionals on the nature of digital records and their attributes that may support the presumption of their authenticity. Still, current means of assessing authenticity do not offer any quantifiable measures. 
There is a pressing need for such measures as our financial, governmental, health, critical infrastructure, and social network systems increasingly rely on complex integrated, interdependent (although not necessarily interoperable), distributed networked systems. Digital technology’s many benefits and challenges in respect of documentary material are well known. The benefits, including ease of creation, search, access, and sharing, are offset by ease of alteration, loss of integrity that may be difficult or impossible to detect, difficulty in establishing ownership and authorship, or enforcing intellectual rights. The advent of cloud computing has increased the challenges, introducing in particular all the issues arising from third party handling of material, and jurisdictional questions about material created, stored, and transmitted around the globe, to name but two. Archival diplomatic theory links the determination and assessment of authenticity to the circumstances of record creation, and framework of subsequent preservation. To assess the authenticity of a digital object, one must be able to establish its identity and demonstrate its integrity. The identity of a digital object is established by the attributes of the object that uniquely distinguish it from other objects, while integrity refers to its wholeness and soundness, that is, the degree to which it is complete and uncorrupted. Authenticity is thus guaranteed and assessed by the presence of attributes proving the identity of the record(s) and the integrity of the system(s) in which they are created, maintained, transmitted, and preserved (Duranti 2005; Duranti & Preston 2008).
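As a concrete, simplified illustration of this distinction (not an InterPARES specification), the sketch below pairs a digital object's identity attributes with a fixity digest, so that a later user can both recognise the object and detect corruption. The field names and the sample record are hypothetical.

```python
import hashlib
import json

def describe(record_bytes: bytes, identity: dict) -> dict:
    """Bundle identity attributes with an integrity digest for later verification."""
    return {
        "identity": identity,  # attributes that uniquely distinguish the object
        "integrity": {
            "algorithm": "sha256",
            # the digest changes if the object is altered in any way
            "digest": hashlib.sha256(record_bytes).hexdigest(),
        },
    }

def verify_integrity(record_bytes: bytes, description: dict) -> bool:
    """True if the stored digest still matches the object (wholeness and soundness)."""
    return hashlib.sha256(record_bytes).hexdigest() == description["integrity"]["digest"]

record = b"Minutes of the board meeting, 12 May 2014"
meta = describe(record, {
    "title": "Board minutes",
    "author": "Corporate secretary",
    "date_created": "2014-05-12",
    "classification": "CORP/GOV/MIN-2014-05",
})
assert verify_integrity(record, meta)             # an unaltered copy verifies
assert not verify_integrity(record + b"x", meta)  # any later change is detectable
print(json.dumps(meta, indent=2))
```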
One of the longest running, continuously funded research projects, InterPARES (1999‐2012, www.interpares.org) has been conducted in three completed phases, and continues in a fourth phase, InterPARES Trust (2013‐2018, www.interparestrust.org). Funded by the Social Sciences and Humanities Research Council of Canada (SSHRC), InterPARES researches issues of trust in records and data arising from our increasing reliance on the creation, exchange and processing of digital information. Grave threats are posed to records and data by issues such as the rapid obsolescence of hardware and software, the fragility of digital storage media, and the ease with which digital entities can be manipulated. InterPARES uses a theoretical framework based on archival science and diplomatics, and is committed to an inter‐disciplinary process involving a wide spectrum of academic and professional fields, including computer engineering, human computer interaction, and law. Results include guidelines and frameworks for systems that will produce authentic, reliable and accurate records, and for identifying records in digital environments, and assessing and preserving their accuracy, reliability and authenticity throughout their life cycle. Findings are integrated into many practical applications for digital recordkeeping and preservation, including the U.S. Department of Defense Design Criteria Standard for Electronic Records Management Software (DoD 5015.2), and the recordkeeping systems commonly used in the Cuban banking system. (Duranti 2005; Duranti & Preston 2008). The lifecycle of authentic digital records, from the development of records systems through generation, maintenance, use, and preservation of records is captured in the Chain of Preservation (COP) model (Duranti & Preston 2008). The model reflects archival theory and archival diplomatics, and complies with the requirements of the Open Archival Information System (OAIS) Reference Model, ISO 14721:2003 Space Data and Information Transfer Systems (CCSDS, 2012). The model identifies all the activities that must be undertaken to ensure that digital records are properly generated in the first instance, maintain their integrity over time, and can be authentically reproduced at any time throughout their existence. As well, it characterizes the metadata and documentation that must be gathered, stored and utilized throughout the lifecycle. Authenticity is an important component of information assurance and security (IAS). The National Institute of Standards and Technology (NIST) defines information security as “The protection of information and information systems from unauthorized access, use, disclosure, disruption, modification, or destruction in order to provide confidentiality, integrity, and availability”, and information assurance as “Measures that protect and defend information and information systems by ensuring their availability, integrity, authentication, confidentiality, and non‐repudiation. These measures include providing for restoration of information systems by incorporating protection, detection, and reaction capabilities” (National Institute of Standards and Technology 2013). IAS has been described as a multidisciplinary knowledge domain (Cherdantseva & Hilton 2013), and a business‐wide issue that extends far beyond the IT department (ICC Belgium 2013). However, authenticity as defined by archival diplomatics and expressed through research such as InterPARES has not been included explicitly in computer security models. 
The first conceptual computer security model was the CIA‐triad, composed of confidentiality, integrity, and availability. Since its introduction in the mid‐ 1980s security experts have challenged the adequacy of the model and proposed extensions (Cherdantseva & Hilton 2013). In 1998 Donn Parker introduced six foundation elements essential to information security. To the CIA‐triad he added utility, authenticity, and possession. He found integrity, the characteristic of completeness and wholeness, free from corruption or manipulation, to be insufficient without the assurance also of authenticity, or “conformance with reality” (Parker 1998; Kabay 2013). Most recently, Cherdantseva and Hilton have proposed a reference model for information assurance and security that extends the CIA‐triad to the IAS Octave: confidentiality, integrity, availability, privacy, authenticity & trustworthiness, non‐repudiation, accountability and auditability (Cherdantseva & Hilton 2013). Terminology is always a challenge when discussing authenticity, authentication, identity, and integrity. In the realm of computer security, authentication is the process of verifying the identity of a user, and authenticity and integrity of digital material are generally treated as separate. For example, Cherdantseva and Hilton distinguish authenticity from integrity. They define authenticity in tandem with trustworthiness as “An ability of a system to verify identity and establish trust in a third party and in information it provides.” Of integrity they state: A system should ensure completeness, accuracy and absence of unauthorized modifications in all its components” (Cherdantseva & Hilton 2013). NIST identifies authenticity as “The property of being genuine and being able to be verified and trusted; confidence in the validity of a transmission, a message, or message
originator", and authentication as "Verifying the identity of a user, process, or device, often as a prerequisite to allowing access to resources in an information system, ... The process of establishing confidence of authenticity." Furthermore, one may distinguish identity verification, message origin authentication, and message content authentication. Identity is linked primarily to a person: "A set of attributes that uniquely describe a person within a given context... The set of physical and behavioral characteristics by which an individual is uniquely recognizable", although identity is also defined more broadly as "The set of attribute values (i.e., characteristics) by which an entity is recognizable and that, within the scope of an identity manager's responsibility, is sufficient to distinguish that entity from any other entity." Integrity is concerned with change: "Guarding against improper information modification or destruction, and includes ensuring information non-repudiation and authenticity" (National Institute of Standards and Technology 2013).
3. Current research
Even as archival research and scholarship continue to offer insights into the nature of authentic digital objects and their preservation, the affordances of new technologies, specifically distributed networked systems connected through the Internet, create new challenges to security and authenticity. The fourth phase of InterPARES is currently exploring whether the InterPARES model of metadata and the design requirements for preserving authentic digital records still hold true in the developing world of cloud computing, or must be adapted to the changing technological environment. The over-arching goal of InterPARES Trust is to understand these new records environments and develop international frameworks and guidelines to ensure their trustworthiness. Establishing the authenticity of digital records and data is critical to trust, but we can no longer rely on traditional methods to test it, nor do we understand the myriad ways in which such records and data may be used, reused, and combined in the future. Metadata describing records and the actions taken on them are key to establishing and assessing authenticity. Metadata are the machine- and human-readable assertions about information resources that allow for physical, intellectual and technical control over those resources. Users create and attach, and then maintain and preserve metadata, either automatically and/or manually, when maintaining their digital records, documents, and data. These metadata may be technical, administrative, or descriptive. They codify and track the identity and integrity of the material over time and across technological change. However, in cloud environments, technological change no longer means simply refreshing the material or migrating it to new media under the control of the original creator of the material. When these records are entrusted to cloud systems, this creator-generated metadata, remaining inextricably linked to the records, are also stored, and cloud service providers (CSPs) assume control of the material. Within these new environments, these user records will acquire additional metadata from the CSP that will indicate a number of important elements, including, but not limited to, storage locations, access controls, security or protection measures, failed or successful manipulations or breaches, etc. CSPs may also outsource some components of their services to other third parties, who may also generate service metadata that provide assertions about the maintenance and handling of the material, and about their own actions taken in the course of handling the material. While these metadata are linked to the users' records, much of this metadata remains proprietary to the provider and not the user. Consequently, proprietary CSP metadata present a sort of event horizon, beyond which the ability to establish an unbroken chain of custody is lost to the owner of the records. CSPs often remain reluctant to share information about the cloud environment itself, the movements of a client's data within the system, and when the provider (or its contracted third parties) might have access to the data. Additionally, the network of third-party subcontracts employed by a provider may make it impossible for them to know such information. Nevertheless, these metadata remain invaluable to the user in assessing and ensuring the accuracy, reliability, and integrity of the material over the whole service lifecycle (Enrique et al. 2013; Smit et al. 2012).
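The two layers of metadata described above might be pictured as follows. This is a hypothetical structure for illustration only, since, as noted, much of the provider-generated layer is proprietary and may never reach the record owner; none of the field names correspond to a real CSP interface.

```python
from datetime import datetime, timezone

# Creator-generated metadata: produced and controlled by the record owner.
creator_metadata = {
    "record_id": "HR-2014-00042",
    "creator": "Human Resources Department",
    "date_created": "2014-03-03",
    "retention_schedule": "HR-7Y",
}

# Provider-generated metadata: assertions the CSP (or its subcontractors) makes
# about how it handled the record; the owner may only ever see part of it.
csp_event_log = [
    {"time": datetime(2014, 3, 3, 10, 15, tzinfo=timezone.utc).isoformat(),
     "event": "object stored", "location": "eu-west datacentre", "actor": "csp"},
    {"time": datetime(2014, 4, 1, 2, 30, tzinfo=timezone.utc).isoformat(),
     "event": "replicated to secondary region", "location": "undisclosed",
     "actor": "subcontractor"},
]

def chain_of_custody(creator_meta: dict, provider_events: list) -> list:
    """Merge both layers; gaps in the provider layer are gaps in the custody chain."""
    return [{"source": "creator", **creator_meta}] + \
           [{"source": "provider", **event} for event in provider_events]

for entry in chain_of_custody(creator_metadata, csp_event_log):
    print(entry)
```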
4. Survey – Indicators of Authenticity
From March 3‐May 1, 2014, this author conducted a web-based survey to gather basic information about how records, information, or systems professionals ensure, assess, and/or protect digital records' authenticity, what metadata they employ or rely on, and what indicators of authenticity they consider to be important. The survey was sent to the major English-speaking archival and records management listservs, reaching professionals in North America, Europe, Latin America and Asia. The survey consisted of 17 questions. Demographic questions asked respondents to identify their job or position and the sector in which they work, their age, level of education, and discipline of their degree(s).
Subsequent questions explored their main professional responsibilities, the means they used to ensure authenticity, what metadata they routinely applied or relied on for that purpose, whether they had ever been called upon to make a formal attestation of authenticity in a legal or administrative proceeding and if so what indicators had been most important in that attestation, and whether their organization explicitly defined authenticity in its policy instruments. The survey sought to explore the relationship between practice and belief – that is, what records professionals relied on in their work and whether that matched their belief or trust in authenticity indicators. It also sought to distinguish between social and technological indicators of authenticity, where social indicators include policy instruments, systems documentation about design, operation and management, software documentation, information about changes over time and preservation actions, classification or file plans, retention and disposition schedules, and archival description or other descriptive measures, and technological indicators include audit logs, access controls and security measures, cryptographic validation techniques, and system metadata. These distinctions were explored in a series of ranking, Likert‐style questions. They were further supported by open‐ended opinion questions asking respondents to give their own definition of authenticity and identify the indicators they felt were most important.
5. Preliminary Findings
The survey received 441 responses and 254 completions (58%). Respondents self-identified primarily as Records or Information Managers (33%) or Archivists (43%). Educators comprised 6%, management 4%, and IT personnel 3%. The industry sectors most represented were information and cultural industries (including libraries and archives, broadcast and telecommunications) and government (see Figure 1; industry sectors were condensed from the North American Industry Classification System (Statistics Canada & Standards Division 2012)).
Figure 1: Professions and Sectors
Respondents were asked to rate the following tools according to how frequently they were used to ensure authenticity: Written policies and procedures governing the management of the records system; Documentation about the record system (design, operation, management, etc.); Written policies and procedures governing digital records; Information about the software used to create and manage the digital records; Information about changes made to the digital records over time (e.g. migration, normalization, etc.); Information about actions taken to preserve the digital records; Classification scheme and/or file plan; Retention and disposition schedules; Archival description; Access controls/security measures; Audit logs; Cryptographic validation techniques (e.g. digital signatures, hash digests, etc.); Standardized metadata.
Figure 2: Indicators of Authenticity Used
In general, respondents relied on traditional archival and records management tools "most of the time" or "always" for managing authenticity, specifically policies governing records (55%) and record systems (61%), and classification schemes or file plans (62%). Fifty-three percent of respondents used access controls and security measures and standardized metadata "most of the time" or "always". However, fifty-one percent of respondents never or rarely relied on audit logs in the course of their work, and 60% never or rarely used cryptographic validation techniques. When asked to rate the importance of the same tools if they were required to make an attestation of authenticity in a legal or administrative action, however, 68% said that standardized metadata would be very important or extremely important, cryptographic validation techniques were deemed very or extremely important by 66%, audit logs were favoured by 76%, and access controls and security measures were considered very or extremely important by 88% of respondents. Despite their current practice, therefore, technical means of validating authenticity were considered as important as traditional means.
Figure 3: Importance of Indicators of Authenticity
With respect to organizational records and information policies, 54% of respondents said their organization did not define authenticity of digital material, and 17% did not know if their organizational policies contained such definitions. Preliminary analysis of the narrative responses to the final two questions (What is your definition of authenticity of digital records? and What do you believe is essential to proving the authenticity of digital records?) reveals that authenticity is still generally assessed according to traditional social heuristics. In response to the first question, several respondents noted that records produced in the usual and ordinary course of business could be presumed authentic, thus reflecting statute and precedent law governing business records in common law traditions. Most respondents noted integrity as a means of establishing authenticity, and several stated that bitwise integrity was necessary after the moment a record was "fixed" – that is, chosen to be kept as evidence of the action represented in the record, or preserved for long-term reference in an archives. Responses indicated a pragmatic approach to authenticity; for example, one respondent answered:
Is [the record] sufficient for the purposes it may be used for? Would it satisfy a judge or adjudicator? Whatever I can claim about it, can I back that up with facts?
The basic definition of an authentic record is "Can it be used as an authentic record in a situation where an authentic record would be needed?" This is not a yes/no answer (though the question is), but rather a range. I want the records as authentic as they need to be for future uses. They needn't be the MOST authentic - just authentic enough.
The final question explored respondents' beliefs about essential indicators of authenticity. Answers focused on chain of continuity, controls on creation and management, policies on access and to ensure provenance information, and the addition or presence of metadata about the creator and context of creation. Several respondents noted the importance of cryptographic validation techniques, and several specifically stated that security and access controls were paramount (although one respondent noted the importance of these controls in the context of using public cloud-based email and document sharing). This early exploration of the survey data points the way to further research to explore in greater depth the importance of social versus technical indicators of authenticity, and how these are used when authenticity is questioned in legal or administrative hearings. Next steps will include further coding and analysis, particularly of the open-ended survey questions, followed by semi-structured interviews with many of those who indicated their willingness to provide more information. The applicability and authority of indicators of authenticity of digital records and data will be assessed in different environments, in particular when records and data are created, maintained, or preserved in cloud computing applications. This will be of increasing importance as
more organizations turn to cloud service providers to support their operations, and as courts continue to face the increasing challenge of evidence presented in digital form.
6. Limitations
Web-based surveys are convenient ways in which to reach a broad population quickly, but they do have limitations. Primary among these limitations is the non-representative nature of the sample. Even when using professional listservs (where the sample members can be reasonably assured of common purpose, training, and responsibility), respondents choose to reply, and while all respondents may be members of the target population, not all members of that population are members of, or have access to or read these listservs. Generalizability of the results, therefore, is not possible, nor can validity be objectively measured. However, as an indicator of general practice, such surveys provide useful information.
7. Conclusions
Preliminary findings indicate that records professionals still tend to rely on traditional heuristics for ensuring authenticity, even when they claim to put their trust in more technical solutions if required to attest to authenticity. Records and information professionals – archivists and records managers – have traditionally been the trusted professionals who keep records safe, authentic and reliable. As complex technology increasingly mediates between the record and the record user, records professionals necessarily place their trust in information technology professionals. It appears that the trusted records professional is now becoming the trusting technology user – the trustee has become the trustor. However, each discipline has unique and complementary knowledge. The records professional knows what information in the form of records and data has value and must be preserved, and the information technology professional understands how to protect and secure that information. If our documentary heritage is at the root of democracy and accountability, both professions are necessary in its authentic preservation.
References
CCSDS (2012). Reference Model for an Open Archival Information System (OAIS): Recommended Practice Issue 2. Consultative Committee for Space Data Systems. Available at: http://public.ccsds.org/publications/archive/650x0m2.pdf [Accessed May 8, 2014].
Cherdantseva, Y. & Hilton, J., 2013. A Reference Model of Information Assurance & Security. In IEEE, pp. 546–555. Available at: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=6657288 [Accessed May 8, 2014].
CIBC, Other Common Fraud Examples. CIBC. Available at: https://www.cibc.com/ca/legal/other-common-fraud-examples.html.
Cornell University Law School, Rule 901. Authenticating or Identifying Evidence. Legal Information Institute. Available at: http://www.law.cornell.edu/rules/fre/rule_901 [Accessed May 7, 2014].
Duranti, L., 1998. Diplomatics: New Uses for an Old Science, Lanham: Scarecrow Press.
Duranti, L., 2005. The Long-term Preservation of Authentic Electronic Records: Findings of the InterPARES Project, San Miniato: Archilab.
Duranti, L. & Preston, R., 2008. Research on Permanent Authentic Records in Electronic Systems (InterPARES) 2: Experiential, Interactive and Dynamic Records, Padova: Associazione Nazionale Archivistica Italiana.
Enrique, C.-L., Shekhar, M. & Harmon, R., 2013. On the Concept of Metadata Exchange in Cloud Services. Service Technology Magazine, (LXXI). Available at: http://servicetechmag.com/I71/0413-3 [Accessed August 20, 2013].
ICC Belgium, 2013. Belgian Cyber Security Guide: Protect Your Information. Available at: http://www.iccbelgium.be/index.php/activities/becybersecure.
Kabay, M.E., 2013. The Parkerian Hexad. Available at: http://www.mekabay.com/overviews/index.htm [Accessed May 8, 2014].
Laws.com, What You Need to Know About Electronic Forgery. Available at: http://identity-theft.laws.com/electronic-forgery.
MacNeil, H. (2000). Trusting records: legal, historical, and diplomatic perspectives. Dordrecht: Kluwer Academic.
National Institute of Standards and Technology, 2013. Glossary of Key Information Security Terms, Gaithersburg, MD: NIST. Available at: http://nvlpubs.nist.gov/nistpubs/ir/2013/NIST.IR.7298r2.pdf.
Parker, D.B., 1998. Chapter 10 - A New Framework for Information Security. In Fighting Computer Crime: A New Framework for Protecting Information. John Wiley & Sons. Available at: http://common.books24x7.com.ezproxy.library.ubc.ca/toc.aspx?bookid=4856 [Accessed May 8, 2014].
Rogers, C., 2011. Trust Me! I'm a Digital Record: Findings from the Digital Records Forensics Project.
Smit, M. et al., 2012. A Web Service for Cloud Metadata, York University. Available at: http://www.techrepublic.com/resource-library/whitepapers/a-web-service-for-cloud-metadata/post/ [Accessed August 20, 2013].
Statistics Canada & Standards Division, 2012. North American industry classification system (NAICS) Canada, Ottawa, ON: Statistics Canada.
Secure Cloud Based Biometric Signatures Utilizing Smart Devices
Bobby Tait
University of South Africa, Gauteng, South Africa
[email protected] Abstract: On 20 September 2013 Apple released their latest iPhone device, named the iPhone 5s, which incorporates a fingerprint-based biometric scanner. The inclusion of a biometric scanner was met with a host of criticism from the security and privacy community. It was soon demonstrated that the biometric reader on the new iPhone is just as vulnerable to spoofing attacks as devised by researchers such as Matsumoto et al. Nearly seven months later, Samsung released the Galaxy S5, which also incorporates a single inline biometric fingerprint scanner. Apart from gaining access to the device, the Samsung device makes it possible for the user to make payments from a linked PayPal account. As was the case with the iPhone 5s, the Samsung Galaxy S5's biometric security was subverted a few weeks after its release, by SRLabs using a faux fingerprint. It is widely accepted that making use of biometrics alone for effective security during the identification and authentication process is not recommended. People leave latent prints of their fingerprints on everything they touch. Biometric technology is vexed with this problem – a biometric characteristic is not essentially covert, as people deposit their biometric characteristics in various ways in the environment they interact with. In research cases as demonstrated using the iPhone 5s, Samsung Galaxy S5 or many other biometric scanners, a fake biometric characteristic can be manufactured from latent biometric prints to fool the biometric security of the system. Indeed, the ability of biometric technology to directly authenticate an individual is highly desired, and convenient, resulting in many companies investigating and incorporating this technology. If a biometric characteristic is presented, with irrefutable confirmation that the biometric characteristic presented has not been spoofed or tampered with in any way, the authentication environment can be convinced that the person himself is directly authenticated. In the case of a password or token, however, only the presented password or token is authenticated, and not the individual presenting the password or token. Realising the inherent shortcomings but also the opportunities of biometrics, research has been conducted in this field. A cloud based biometric security protocol for smart devices was developed, corroborating that it is possible to authenticate a person indisputably using cloud technology, biometrics and a smart device such as the iPhone or Galaxy. This paper proposes an approach to allow a person to use a smart device such as the iPhone 5s or Galaxy S5 for secure biometric authentication over a networked environment. It is illustrated that a smart device can be considered as a "smart token", to address the security concerns associated with biometric technology. The secondary focus of this paper is to prove that a cloud based biometric security protocol can be used for secure biometrically based digital signatures. Keywords: Biometrics, cloud security, authentication, hacking, biometric signatures, biometric protocol
1. Introduction
The introduction of biometric technology into mainstream smart devices such as the iPhone 5s and Samsung Galaxy S5 is a clear indication that a strong desire exists to leverage the benefits associated with biometric authentication. The release of biometric technology incorporated in the iPhone 5s immediately attracted a host of criticism from the security community and privacy activists. BBC reported on an open letter written to Tim Cook, the CEO of Apple, warning him of the issues associated with biometric technology (Farber, 2013). The issues raised are not new to the biometric research community, and are often cited as the main reason for biometrics not finding its way into the mainstream identification and authentication sphere (Tait, 2014; Woodward and Orleans 2004). The iPhone 5s and the Samsung Galaxy S5 biometric authentication mechanisms were successfully subverted within a few weeks after the release of these devices. A group from Germany demonstrated that the latest iPhone's biometric technology can indeed be subverted using a similar approach to that outlined by Matsumoto et al. (2002). The proof of this subversion was posted on YouTube (iPhone 5's Fingerprint Scanner is already hacked) a few days after a challenge was published (Finckle, 2013) to subvert the iPhone's biometric system (Kovach, 2013, Woollaston, 2013). On 11 April 2014 the Samsung Galaxy S5 was released, and five days later the biometric sensor of the device was successfully hacked by Germany-based SRLabs using a faux fingerprint (Kumparak, 2014). Samsung and Apple are among the major role players in the smart device arena. Both companies included biometric technology in their latest releases. In both instances, the biometric scanners were hacked within days after their release using a faux finger created from a latent biometric characteristic.
Furthermore, the Galaxy S5 smart device allows a user to transact using the biometric technology incorporated in the phone by accessing the user's linked PayPal account (PayPal, 2014). Taking into account the relative ease of subverting the biometric technology of this device, the need for a secure biometric protocol is evident. The following section briefly introduces the background information relating to the research conducted for this paper, followed by section 3, which elaborates on the fundamental operation of the secure cloud based biometric protocol. Section 4 presents the mechanism that allows secure biometric signatures based on applying the secure cloud based biometric protocol. The paper is concluded in section 5.
2. Background
If a password or token is used for authentication, only the password or token is authenticated, with no conclusive evidence that the individual presenting the password or token is the authentic, authorised user of the authentication mechanism presented (Tait, 2012). If, on the other hand, a biometric characteristic is presented, with affirmation that the biometric characteristic presented has not been spoofed in any way, the authentication environment can accept with conviction that the authentic person himself is personally authenticated (Tait, 2012). The success of any current authentication system relies on the fact that the mechanism used to prove authenticity should be totally secret or absolutely unique (Summers, Bosworth, 2004). Therefore, if a password is absolutely secret, the particular authentication mechanism is regarded with a high level of confidence. If a token such as a credit card is totally unique and kept secure and safe, one can assume that the authentication of the owner is trustworthy. Compared to passwords and tokens, biometric characteristics, though unique, are not secret at all (Uludag et al., 2004). Faces (used in facial authentication) are not hidden or concealed. People interact with the environment, during which latent biometric characteristics such as hair, saliva, skin and fingerprints are left behind (Tait, 2014). It is generally accepted therefore that biometric technology has inherent shortcomings which make it unsuitable for credible identification and authentication (Uludag et al, 2004). In essence, whenever biometric characteristics are used to authenticate a person, the system relies on a mechanism which is not totally secure. Biometric characteristics can be stolen in electronic format as biometric data, or manufactured from latent biometric characteristics, and then used for fraudulent authentication or identity theft (Tait, 2014). If a hacker is in possession of a manufactured fake biometric characteristic such as a latex fingerprint, the hacker has the ability to masquerade as the authentic user, and be accepted as such. Compared to typical authentication mechanisms, biometric technology presents the following trade-offs:
Authentication mechanism / person relationship: Passwords and tokens do not authenticate the person personally; with biometric authentication the person is directly linked to the authentication.
Authentication mechanism secrecy: Passwords and tokens are kept secret, yet biometric mechanisms are not concealed at all due to the person’s interactions with the environment.
Duplication and replay: Passwords and biometric data can be replayed. Tokens and biometric characteristics can be duplicated.
In view of the ability of biometric characteristics to directly link a person to the authentication step, biometric characteristics are an attractive possibility for credible digital signatures. To date, important documents are still signed using a written characteristic – the autograph. Current electronic signatures rely on the fact that a secret password is used to generate a short message authentication code (MAC) to sign the document (Dodis et al, 2012). Unfortunately, there is no secure way to ensure that the person who generated the MAC is indeed the person authorised to sign the document. Other than the safekeeping of the password, there is no link between the person and the password used to generate the MAC. The following section briefly illustrates a protocol that allows biometric characteristics to be used over an insecure networked environment, using cloud based biometric authentication.
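Before turning to that protocol, the password-keyed MAC scheme just described can be sketched as follows (Python standard library, using HMAC-SHA256 purely for illustration rather than the DES-based algorithm discussed later in this paper). The point the sketch makes is that verification only proves knowledge of the key, not the identity of whoever presented it.

```python
import hmac
import hashlib

def sign(message: bytes, secret_key: bytes) -> str:
    """Generate a MAC for the message under a shared secret key."""
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()

def verify(message: bytes, mac: str, secret_key: bytes) -> bool:
    """Re-generate the MAC and compare; any change to message or key fails."""
    return hmac.compare_digest(sign(message, secret_key), mac)

key = b"shared-secret-password"     # anyone holding this key can produce a valid MAC
document = b"I authorise payment of 1,000 EUR to account 12-345."

mac = sign(document, key)
assert verify(document, mac, key)                     # integrity confirmed
assert not verify(document + b" and more", mac, key)  # tampering detected
# Verification says nothing about WHO used the key - the gap the biometric
# protocol in the following sections aims to close.
```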
3. Secure Cloud-Based Biometric Authentication Utilising Smart Devices for Electronic Transactions.
During the research into biometric technology, it became clear that biometric data is asymmetrical, because the biometric sensor does not collect all biometric markers in exactly the same way every time a biometric characteristic is scanned (Tait, 2012). This aspect of biometric data can be used to identify every biometric characteristic presented for authentication. Section 3.1 briefly discusses biometric asymmetry, which is fundamental to the working of the cloud-based biometric protocol.
3.1 Biometric Asymmetry
The asymmetric nature of biometric data affords a unique benefit: all biometric data received from a user's biometric characteristic will almost always be unique. It is highly unlikely that, considering all the variables associated with the capturing and digitising of a biometric characteristic, a 100% match will be found with any previously offered biometric data (von Solms, Tait, 2009). The fact that biometric data is uniquely identifiable is the first step towards a cloud-based biometric authentication protocol to prevent the possibility of replay of biometric data. Each instance of accepted biometric data can be linked to a given transaction performed by the user. In order to ensure that biometric data is not being replayed, and to link offered biometric data from the user to a specific transaction, a special biometric transaction log file must be used in the cloud environment. This log file is referred to as a cloud bio archive (CBA). The cloud-based biometric authentication system can detect any replay attempt, and log transactions, by using a CBA. A second problem, as mentioned in section 2, relates to the possibility of sourcing a latent biometric image of a person's biometric characteristic. Research has demonstrated that it is possible to collect fingerprints from surfaces in a user's environment, such as glass, and to manufacture a paper-thin latex overlay to be used to spoof a biometric scanner (Matsumoto et al., 2002; Woollaston, 2013).
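A minimal sketch of the replay check that biometric asymmetry makes possible: because two honest captures of the same characteristic virtually never yield identical data, an exact byte-for-byte match with anything already logged in the CBA is treated as a replay. The hash-based archive below is an illustrative simplification, not the implementation described in (Tait, 2012).

```python
import hashlib

class CloudBioArchive:
    """Simplified CBA: a log of every biometric data sample previously accepted."""
    def __init__(self):
        self._seen = set()   # digests of all previously accepted samples

    def is_replay(self, fresh_bio_data: bytes) -> bool:
        # An exact match with earlier data is suspicious: honest captures differ.
        return hashlib.sha256(fresh_bio_data).hexdigest() in self._seen

    def accept(self, fresh_bio_data: bytes) -> None:
        self._seen.add(hashlib.sha256(fresh_bio_data).hexdigest())

cba = CloudBioArchive()
sample_monday = b"\x12\x9f\x33..."    # digitised fingerprint data (illustrative)
cba.accept(sample_monday)

sample_tuesday = b"\x12\x9f\x34..."   # a new capture - similar, but not identical
print(cba.is_replay(sample_tuesday))  # False: looks like a genuine fresh capture
print(cba.is_replay(sample_monday))   # True: identical bytes imply replayed data
```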
3.2 Cloud-Based Biometric Authentication Protocol
If biometric technology is to be used for secure authentication, a protocol should be used to ensure that the problems outlined earlier in this paper can be properly managed. Though attacks on the biometric system cannot be eliminated, the proposed protocol ensures that attacks on the system can be mitigated. To ensure that a hacker cannot use a fake biometric characteristic, a user-side biometric archive (UBA) is introduced on a smart device such as the iPhone 5s or Galaxy S5. This is not a typical token such as a Smart Card, but rather an intelligent token, due to the fact that this type of token incorporates a processor, a full operating system, and additional security layers. The archive stored in the smart token contains a finite number (for example, 500) of previously offered biometric data samples. The UBA is populated by the cloud-based authentication server, and is updated under trusted conditions. The protocol is outlined in Figure 1. In step 1, the user needs to conduct a transaction requiring authentication, and supplies a fresh biometric characteristic to the smart device's biometric scanner. In step 2, the smart device digitises the biometric characteristic, resulting in fresh biometric data. During previous communication with the cloud, the authentication system sent a challenge to the smart device. This process is outlined in detail in a previously presented paper (Tait, 2014). For this discussion, it is assumed that the authentication server requested the biometric data stored in position 58 of the user bio archive (UBA).
Figure 1: Cloud-based biometric authentication protocol
In step 3, the smart device fetches the biometric data stored in position 58 of the UBA. In step 4, the smart device generates a biometric parcel (bio-parcel): the fresh biometric data is XORed with the historic biometric data from the UBA, resulting in a 'XOR bio-parcel'. In step 5, the XOR bio-parcel generated in step 4 is sent to the cloud for authentication. In step 6, the cloud-based server fetches the historic data in the CBA corresponding to the biometric data challenged from the user's UBA. During step 7, the XOR bio-parcel is unlocked in the cloud by XORing the historic bio-data with the XOR bio-parcel received from the user, yielding the user's fresh biometric data. In step 8, the fresh biometric data received from the user is matched against the reference biometric template of this user on file. If the match is satisfactory, the user is considered authentic, and the fresh biometric data is added to the CBA for this user's profile. At this stage the user is successfully authenticated, and the result of the authentication can be conveyed using various existing protocols to the user or institution requesting authentication.
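A simplified sketch of steps 3-7 follows (Python). The byte-wise XOR and the toy four-byte samples are illustrative, and `matches_template` is a placeholder for a real fingerprint-matching algorithm; the sketch only shows how the parcel hides the fresh data in transit and how the server recovers it using the same historic entry.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length biometric data samples."""
    return bytes(x ^ y for x, y in zip(a, b))

# --- smart device side (steps 2-5) -------------------------------------------
def build_bio_parcel(fresh_bio: bytes, uba: list, challenged_index: int) -> bytes:
    historic = uba[challenged_index]          # e.g. position 58, as challenged
    return xor_bytes(fresh_bio, historic)     # XOR bio-parcel sent to the cloud

# --- cloud server side (steps 6-8) --------------------------------------------
def authenticate(parcel: bytes, cba: list, challenged_index: int, matches_template):
    historic = cba[challenged_index]          # same historic sample, held server-side
    fresh_bio = xor_bytes(parcel, historic)   # unlock: recover the fresh data
    if matches_template(fresh_bio):           # compare against the reference template
        cba.append(fresh_bio)                 # log accepted data for future challenges
        return fresh_bio
    return None

# Illustrative run with toy 4-byte "biometric data":
uba = [b"\x10\x20\x30\x40"] * 60              # device archive, populated when trusted
cba = list(uba)                               # the server's copy of the same history
fresh = b"\x11\x22\x31\x43"                   # freshly captured sample
parcel = build_bio_parcel(fresh, uba, 58)
recovered = authenticate(parcel, cba, 58, matches_template=lambda d: True)
assert recovered == fresh
```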
3.3 Protocol Evaluation
For the protocol to function, a user must be able to supply fresh biometric data and be in possession of a smart token – a smart device, such as an iPhone 5s or Galaxy S5, which can supply historic biometric data from the UBA. Should the smart device be stolen, an instruction can be sent to the device to clear all UBA content. The intelligent token ensures that the UBA can only be used by the authentic user. The following section illustrates how the developed protocol can be used for secure cloud-based digital signatures.
4. Secure Cloud Based Biometric Signatures Utilizing Smart Devices.
To ensure the integrity of a document, a MAC is generated for the document. The creation of a MAC is a one-directional operation, and the original message cannot be derived from the MAC. If the integrity of the document must be confirmed, the original message and a secret key are used to re-generate the MAC; if the resulting new MAC is an exact match with the originally generated MAC, the integrity of the document is confirmed. The message authentication algorithm often used to generate a MAC is based on the data encryption standard (DES), an encryption algorithm that was adapted to yield a 64-bit MAC for a given symmetric secret key and clear-text combination (Pfleeger, Pfleeger, 2006). In a public key infrastructure (PKI)
environment, the key used to generate the MAC and the key used to test the MAC are asymmetric (Pfleeger, Pfleeger, 2006). As it is highly improbable that a person would be able to provide biometric data identical to previously provided biometric data, biometrics are unfit to be used as the secret key for creating a MAC without utilizing the protocol proposed in this paper. The biometric-based protocol proposed in this paper for digital signatures requires a process similar to that followed by the message authentication algorithm. However, that algorithm relies on the symmetry of the key used to test the MAC for a given message. For this reason the CBA and the UBA play a vital role in the cloud based biometric signature protocol. A prototype has been developed as proof of concept. For clarity the process is described in some detail; however, from the user's perspective the process is totally transparent, as the whole process is automated on the smart device and the cloud based authentication server.
4.1 Using a Biometric Characteristic to Generate a MAC
In this section, the cloud based biometric authentication protocol (CBBAP) is used to allow user 1 – let's say John – to sign a message destined for user 2 – let's say Sam – using his biometric characteristic and smart device. This method relies on the fact that both John and Sam are part of the CBBAP environment, just as eBay relies on both buyers and sellers being part of the PayPal environment. To successfully generate a MAC using biometric characteristics, the following six-phase protocol is proposed:
Phase 1: John creates a message and, using his smart device, signs this message with his biometric characteristic. The smart device generates two message bundles: one for the cloud-based server, containing an XOR bio-parcel and the MAC, and one for Sam, containing the message and the MAC.
Phase 2: The authentication server receives the message bundle. The server handles the bio‐parcel according to the rules stipulated by the CBBAP.
Phase 3: Sam reads the message from John and, in order to test the accompanying MAC, her smart device generates a new message bundle destined for the cloud server, containing a CBBAP bio-parcel and the MAC received from John.
Phase 4: The cloud based server confirms Sam’s authenticity using the bio‐parcel received from Sam’s smart device.
Phase 5: The cloud-based server generates a new bio-parcel destined for Sam's smart device. The bio-parcel includes the biometric data John used to generate the MAC of his message for Sam.
Phase 6: Sam’s device extracts John’s biometric data from the bio‐parcel received from the cloud based server and uses this biometric data to generate a MAC for the message she received from John.
4.2 Phase 1: Message and MAC Generation

John intends to send a message to Sam. This message does not necessarily contain any sensitive information; however, it is important that the authenticity of the message can be confirmed beyond any doubt and that its integrity can be tested. Therefore John signs the message using his biometric characteristic. Figure 2 illustrates the first phase in this process. In Step 1, John provides a fresh biometric characteristic to the smart device, which digitizes it, resulting in biometric data. In Step 2, the freshly digitized biometric data is used as the secret key for the MAC algorithm to generate a unique MAC for the message John created. As per the CBBAP mechanism, the smart device has a pointer set in the UBA; in Step 3 this pointer is set to position 21 in John's UBA, so the smart device automatically obtains the 21st biometric data from John's UBA. In Step 4, the smart device takes the fresh biometric data (which was also used as the secret key in the MAC algorithm) and XORs it with the 21st biometric data obtained during step 3, resulting in the XOR bio-parcel. In Step 5, the MAC generated in step 2 is concatenated with the bio-parcel, resulting in a message bundle. This bundle is addressed to the cloud-based authentication
server. In Step 6, a second message bundle is created by the smart device; it consists of the MAC generated in step 2 and the message for Sam, and is addressed to Sam. In Step 7, the message bundles are sent from the smart device over the network to the cloud-based authentication server and to Sam respectively.

Figure 2: Generate message and MAC

If these messages are sniffed during transmission, the hacker would be in possession of an XOR bio-parcel that cannot be used, a MAC that he cannot re-create, and a clear-text message that he can read. If the hacker alters the message, the subsequent testing of the MAC will fail.
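A minimal sketch of Phase 1 follows, reusing make_xor_parcel() and bio_mac() from the earlier sketches. The bundle layouts (plain structs with concatenated fields) and all names are illustrative assumptions; the paper does not define a wire format.

#include <string.h>

struct server_bundle {             /* bundle addressed to the authentication server */
    uint8_t xor_parcel[BIO_LEN];
    uint8_t mac[EVP_MAX_MD_SIZE];
    unsigned int mac_len;
};

struct recipient_bundle {          /* bundle addressed to Sam */
    const uint8_t *message;
    size_t message_len;
    uint8_t mac[EVP_MAX_MD_SIZE];
    unsigned int mac_len;
};

static void phase1_sign(const uint8_t *fresh_bio,      /* step 1: digitised fresh characteristic */
                        const uint8_t *uba_entry_21,    /* step 3: 21st entry from John's UBA */
                        const uint8_t *msg, size_t msg_len,
                        struct server_bundle *to_server,
                        struct recipient_bundle *to_sam)
{
    /* Step 2: the fresh biometric data is the secret key for the MAC. */
    to_server->mac_len = bio_mac(fresh_bio, BIO_LEN, msg, msg_len, to_server->mac);

    /* Step 4: wrap the fresh data with the 21st UBA entry. */
    make_xor_parcel(fresh_bio, uba_entry_21, to_server->xor_parcel);

    /* Steps 5-6: both bundles carry the same MAC; Sam also receives the clear-text message. */
    memcpy(to_sam->mac, to_server->mac, to_server->mac_len);
    to_sam->mac_len = to_server->mac_len;
    to_sam->message = msg;
    to_sam->message_len = msg_len;
}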
4.3 Phase 2: Cloud Based Authentication Server

During the second phase, as illustrated in figure 3, the cloud-based authentication server receives the message bundle John sent.

Figure 3: Generate message and MAC
In Step 1, the server receives the message bundle from John. The message bundle includes a bio-parcel and a MAC. In Step 2, recalling that during previous communication with John's smart device the server sent a challenge to supply the 21st biometric data, the server extracts the bio-parcel from the message bundle and XORs it with the corresponding biometric data from John's CBA. This step yields John's fresh biometric data, which the server tests for replay and authenticity as prescribed by the rules of the CBBAP. In Step 3, if the server is satisfied with the fresh biometric data, this biometric data is added to John's CBA. In Step 4, the server takes the MAC received in the message bundle and associates it with the fresh biometric data just added to John's CBA. The server is now in possession of the MAC and the biometric data that was used as the secret key to generate that MAC.
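The server side of Phase 2 can be sketched as follows, reusing unlock_xor_parcel() and the server_bundle structure from the earlier sketches. The CBA record layout is an assumption, and the replay/authenticity checks are only indicated by a comment, since the paper defers them to the CBBAP rules.

struct cba_entry {                         /* assumed CBA record layout */
    uint8_t bio_data[BIO_LEN];
    uint8_t mac[EVP_MAX_MD_SIZE];          /* MAC associated with this biometric data (step 4) */
    unsigned int mac_len;
};

static int phase2_handle_bundle(const struct server_bundle *b,
                                const uint8_t *cba_entry_21,   /* step 2: entry matching the challenge */
                                struct cba_entry *new_entry)   /* step 3: appended to John's CBA */
{
    uint8_t fresh[BIO_LEN];

    /* Step 2: recover John's fresh biometric data from the bio-parcel. */
    unlock_xor_parcel(b->xor_parcel, cba_entry_21, fresh);

    /* Replay and authenticity checks prescribed by the CBBAP rules would be applied here. */

    /* Steps 3-4: store the fresh data and associate the received MAC with it. */
    memcpy(new_entry->bio_data, fresh, BIO_LEN);
    memcpy(new_entry->mac, b->mac, b->mac_len);
    new_entry->mac_len = b->mac_len;
    return 0;
}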
4.4 Phase 3: Request of John's Biometric Data from Cloud Based Authentication Server

Sam's smart device received a message bundle from John. This message bundle contained the message, which Sam can read immediately, and a MAC to confirm the integrity of the message sent by the authentic John.

Figure 4: Generate message and MAC

In this phase, as illustrated in figure 4, Sam's smart device generates a message bundle destined for the cloud-based authentication server to request the biometric data John used to generate the MAC for this message. In Step 1, the smart device receives the message bundle from John, consisting of the clear-text message and the MAC of this message. Sam can read the message, but to test the MAC the smart device proceeds to the second step in this phase. In Step 2, the smart device requests Sam to provide a fresh biometric characteristic to its biometric scanner and digitizes it. In Step 3, the smart device applies the challenge sent by the authentication server during a previous encounter; in this example, the challenge pointed to the 38th biometric data in Sam's UBA. In Step 4, the smart device takes the fresh biometric data and XORs it with the 38th biometric data obtained in step 3, resulting in the XOR bio-parcel. In Step 5, the MAC received from John in step 1 is concatenated with this new bio-parcel, resulting in a message bundle addressed to the cloud-based authentication server, which is sent by Sam's smart device in step 6.
4.5 Phase 4: Cloud Based Authentication Server Confirms Sam's Authenticity

In Step 1, as illustrated in figure 5, the server receives the message bundle from Sam's smart device. This message bundle includes a bio-parcel and the MAC John sent. In Step 2, recalling that in previous communication with Sam's smart device the server sent a challenge to supply the 38th biometric data in Sam's UBA, the server obtains the biometric data from Sam's CBA that corresponds with the 38th biometric data in Sam's UBA on the smart device. In Step 3, the server extracts the bio-parcel from the message bundle and XORs this bio-parcel with the
corresponding biometric data obtained from Sam's CBA in step 2. This step yields the fresh biometric data Sam supplied. The server tests this fresh biometric data for replay and authenticity. In Step 4, this biometric data is then added to Sam's CBA.

Figure 5: Confirm Sam's identity

The server obtains the MAC received in the message bundle. Once the server has confirmed that the current communication is with the authentic Sam, it proceeds to the next phase to supply Sam's smart device with the biometric data that John used to generate the MAC.
4.6 Phase 5: Supply of John's Biometric Data to Sam

The second part of the message bundle consisted of the MAC that John generated for the message sent to Sam. In this phase, as illustrated in figure 6, the cloud authentication server checks whether this MAC exists in John's CBA; if it does, the authentication server generates a new bio-parcel destined for Sam's smart device.

Figure 6: Supply of Biometric Data
In Step 1, the cloud authentication server searches John's CBA for a match of the MAC received from Sam's smart device. The server determines that the MAC received in the message bundle matches the MAC associated with the 51st biometric data in John's CBA. In Step 2, the cloud authentication server takes the fresh biometric data Sam's smart device supplied in phase 4 and XORs it with the 51st biometric data found in John's CBA, resulting in a new bio-parcel destined for Sam's smart device. In Step 3, the authentication server sends the new bio-parcel back to Sam's smart device. In the final phase, Sam's smart device extracts the biometric data that John used to generate the MAC of the message she received earlier, and uses this biometric data to test the MAC that accompanied that message.
4.7 Phase 6: Test Of MAC Using Biometric Data.
Figure 7: Test MAC using John's biometric data

In Step 1, Sam's smart device receives the bio-parcel sent by the cloud authentication server. In Step 2, Sam's smart device XORs the fresh biometric data generated in the third phase with the bio-parcel received from the cloud server. This yields the biometric data that John used to generate the MAC of the message sent in phase 1. In Step 3, Sam's smart device uses the biometric data extracted from the bio-parcel as the secret key for the MAC algorithm to generate a new MAC for John's message. In Step 4, the smart device compares the MAC received in the message bundle from John with the MAC generated in step 3. In Step 5, as the message was indeed generated with John's biometric data and was not tampered with, the testing of the MAC succeeds, proving that the message from John is authentic and has not been altered since John sent it. At this stage Sam can be satisfied that the message is indeed from the authentic John, as his biometric data, which is directly related to him, generated the same MAC for the message. This also proves that the message was not tampered with and that its integrity is above suspicion.
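The final test on Sam's device can be sketched as below, again reusing the helpers from the earlier sketches; the constant-time comparison with CRYPTO_memcmp is an added precaution, not something the paper prescribes.

#include <openssl/crypto.h>    /* CRYPTO_memcmp */

/* Phase 6 sketch: recover the biometric key John used and re-test the MAC. */
static int phase6_test_mac(const uint8_t *parcel_from_server,   /* step 1 */
                           const uint8_t *sam_fresh_bio,         /* generated in phase 3 */
                           const uint8_t *msg, size_t msg_len,
                           const uint8_t *mac_from_john, unsigned int mac_len)
{
    uint8_t john_bio[BIO_LEN];
    uint8_t mac_new[EVP_MAX_MD_SIZE];
    unsigned int new_len;

    /* Step 2: XORing with Sam's fresh data yields the key John used. */
    unlock_xor_parcel(parcel_from_server, sam_fresh_bio, john_bio);

    /* Steps 3-4: regenerate the MAC and compare it with the MAC John sent. */
    new_len = bio_mac(john_bio, BIO_LEN, msg, msg_len, mac_new);
    if (new_len != mac_len)
        return -1;
    return CRYPTO_memcmp(mac_new, mac_from_john, mac_len) == 0 ? 0 : -1;   /* 0 on success */
}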
5. Conclusion

This paper demonstrated that the CBBAP can be used to facilitate the signing of documents in order to ensure the integrity and authenticity of the document. This is extremely beneficial as the biometric
data is directly linked to the signing party and therefore allows non-repudiation to be enforced successfully. There are once again certain similarities to the PKI environment; however, when using the CBBAP, a user does not need to protect his "private key", as the private key is actually his biometric characteristic and can subsequently be linked to the biometric key used to sign a document. If a person at any stage feels that his or her identity has been compromised in one way or another, that person can remove all existing biometric data by clearing all UBA and CBA data and removing the reference biometric template. A user can then start with a clean slate, re-creating the UBA and CBA with a fresh reference biometric template; the old "identity" is archived. As long as a person is part of the CBBAP environment, the person's biometric characteristics are safe, and such a person does not need to be concerned that latent biometric data might be fraudulently used.
References

Dodis, Y., Kiltz, E., Pietrzak, K. and Wichs, D. (2012) "Message authentication, revisited", Advances in Cryptology – EUROCRYPT 2012, Springer Berlin Heidelberg, pp 355-374.
Farber, D. (2013) "Sen. Franken questions privacy of iPhone 5s fingerprint scanner", September [online] http://news.cnet.com/8301-13579_3-57603947-37/sen-franken-questions-privacy-of-iphone-5s-fingerprint-scanner (accessed 14 May 2014).
Fickle, J. (2013) "Hackers offered cash to crack iPhone fingerprint security", 19 September [online] http://www.reuters.com/article/2013/09/19/us-iphone-hackers-idUSBRE98I10I20130919 (accessed 7 October 2013).
Kumparak, G. (2014) "Samsung's Galaxy S5 Can Be Tricked By The Same Lifted Fingerprint Hack As The iPhone 5S" [online] http://techcrunch.com/2014/04/15/samsung-galaxy-s5-fingerprint-sensor/ (accessed 16 May 2014).
"iPhone 5's Fingerprint Scanner is Already Hacked" [online] http://www.youtube.com/watch?v=-PloUfgIlDk (accessed 14 May 2014).
Kovach, S. (2013) "The iPhone 5s fingerprint scanner has been hacked", 22 September [online] http://www.businessinsider.com/iphone-5s-fingerprint-scanner-hacked-2013-9 (accessed 7 October 2013).
Matsumoto, T., Matsumoto, H., Yamada, K. and Hoshino, S. (2002) "Impact of artificial gummy fingers on fingerprint systems", Proceedings of SPIE, Vol. 4677, Optical Security and Counterfeit Deterrence Techniques IV.
PayPal (2014) "Pay faster with your fingerprint" [online] https://www.paypal-pages.com/samsunggalaxys5/us/index.html (accessed 16 May 2014).
Pfleeger, C.P. and Pfleeger, S.L. (2006) Security in Computing, 4th edition, Prentice Hall, ISBN-13: 978-0132390774.
Summers, W.C. and Bosworth, E. (2004) "Password policy: the good, the bad, and the ugly", WISICT '04: Proceedings of the Winter International Symposium on Information and Communication Technologies, pp 1-6.
Tait, B.L. (2012) "Applied Fletcher–Munson curve algorithm for improved voice recognition", International Journal of Electronic Security and Digital Forensics, Vol. 4, No. 2, pp 178-186.
Tait, B.L. (2014) "Secure cloud-based biometric authentication utilizing smart devices for electronic transactions", International Journal of Electronic Security and Digital Forensics, Vol. 6, No. 1, pp 52-61.
Uludag, U., Pankanti, S., Prabhakar, S. and Jain, A.K. (2004) "Biometric cryptosystems: issues and challenges", Proceedings of the IEEE, Vol. 92, No. 6.
von Solms, B. and Tait, B. (2005) "Biovault: Solving the Problem of Replay in Biometrics", Challenges of Expanding Internet: E-Commerce, E-Business, and E-Government, Springer US, pp 465-479.
Woodward, J.D. Jr. and Orleans, N.M. (2004) Biometrics: Identity Assurance in the Information Age, ISBN 0-07-222227.
Woollaston, V. (2013) "Apple's iPhone 5S fingerprint scanner hacked after just two days", 23 September [online] http://www.dailymail.co.uk/sciencetech/article-2429814/Apples-iPhone-5S-fingerprint-scanner-hacked-just-TWO-days.html (accessed 14 May 2014).
PhD Research Papers
Digital Photographs in Social Media Platforms: Preliminary Findings

Jessica Bushey
University of British Columbia, Vancouver, Canada
[email protected]
Abstract: This paper discusses the early findings of a web-based survey on digital photographs accessed and stored on photo-sharing and social networking platforms. The study is part of a larger doctoral research project in the School of Library, Archival and Information Studies at the University of British Columbia. The working title of that project is "The Trustworthiness of Digital Photographs Accessed and Stored in Social Media Platforms." This article presents a brief overview of the literature on the topic, drawn from different perspectives in the domains of the law and journalism, filtered through the lens of archival science. A number of key issues have been identified in the results of the web-based survey and are presented in this paper as a catalyst for more in-depth discussions throughout the study. The purpose of this research is to gain an understanding of the creation, management and preservation of digital photographs in social media platforms, with a focus on the factors that support or hinder the reliability, accuracy and authenticity of digital photographs and collections. The target audience is archivists and information professionals that are considering methods for acquiring and managing user-generated content that may reside in online platforms that utilize cloud computing services.

Keywords: archives, digital photographs, social media websites, trustworthiness, preservation, contracts, terms of service, case law
1. Introduction

The massive digitization of image-based media and the convergence of cameras into mobile devices with Internet connectivity are transforming twenty-first century communications from text-based to image-driven. Individuals and organizations are documenting their personal and business activities with digital photographs and short videos captured on smartphones and shared instantaneously via Twitter and Instagram. The proliferation of photo-sharing (e.g., Flickr) and social networking platforms (e.g., Facebook, Twitter) and the overwhelming usage statistics of these services demonstrate the global practice of real-time, image-driven communications. Archivists and information professionals are recognizing that, as more activities are conducted online, people are experiencing and building "digital lives" (John et al, 2010). The areas of personal information management (PIM) and personal digital archives (PDA) are receiving scholarly attention as lives become increasingly mediated through digital technologies (Bass, 2013). At this stage it is still not clear how contemporary image-making and keeping practices will impact the roles and responsibilities of information professionals and the activities of digital stewardship conducted by archives, libraries and museums.

The cloud-based services offered by photo-sharing and social networking sites include file hosting, content sharing and storage via Software-as-a-Service (SaaS). Individuals and organizations use these services but do not control the operating system, hardware or network infrastructure on which they run (Mell and Grance, 2011). The adoption of photo-sharing and social networking platforms introduces new tools and methods for managing, accessing and preserving digital photograph collections that need to be considered in relation to existing practices for image-making and keeping. Additional concerns regarding the jurisdiction in which the data reside, legal ownership and control, and privacy related to shared content and membership accounts should be explored from an archival perspective.

The survey on digital photographs in social media platforms is part of a larger research project aimed at developing an understanding of the access, creation, management and storage of digital photographs in social media platforms. The research focuses on the factors that support or hinder the trustworthiness (i.e., reliability, accuracy and authenticity) of digital photographs and photograph collections and will establish how record-making and recordkeeping activities affect the value of digital photographs and collections in photo-sharing and social networking sites as potential records to be acquired and preserved by archival institutions. The primary target audience for this research is the archival and digital heritage community. The secondary target audience for this research is creators of digital photographs that use social media platforms for access, management and storage of their digital collections.
2. Study Significance and Research Questions

The study is significant for several reasons. First, the study shows the extent to which the convergence of cameras into mobile devices with Internet connectivity has changed practices of digital photograph creation, use, management and storage, and the impact these changes have on the trustworthiness of digital photographs and photograph collections. Second, the study reveals social media members' expectations for on-going access to and long-term storage of their digital photograph collections held within social media platforms. Third, the study provides recommendations and strategies to archivists on how they can effectively assess their ability to acquire and preserve individual collections that have been managed and stored in photo-sharing and social networking platforms. The study is guided by the overarching research question: How do we ensure the trustworthiness of digital photographs held within social media platforms? Following the literature review, the study has four data gathering phases: web-based survey, review of Canadian and United States case law dealing with digital photographs in social media sites, review of provider Terms of Use (ToU), and semi-structured interviews. This paper will discuss the literature review and the preliminary findings of the survey on digital photographs in social media platforms. The first phase is guided by the following primary research questions:
What is the primary purpose for which persons use photo‐sharing and social networking sites in regards to digital photographs?
Are persons that use photo‐sharing and social networking services to store their digital photographs concerned with trustworthiness (i.e., authenticity, reliability and accuracy)?
Are persons aware of the risks posed by social media platforms to the trustworthiness of their digital photographs collections?
Are persons that use photo‐sharing and social networking sites aware of the challenges presented by social media platforms to continuing access and long‐term preservation of digital photographs collections/fonds?
3. Literature Review

Camera phone photography and photo-sharing and social networking platforms are a relatively recent phenomenon, made possible by the convergent and emerging technologies that support the immediate exchange of digital information. Along with the technological advancements are a number of cultural changes that encourage new modes of visual communication and digital documentation. This is an emerging area of research that is drawing the interest of scholars from a number of different fields, including journalism and the law. By taking knowledge about digital photography from other disciplines and bringing it to bear upon archival science, the study adopts an interdisciplinary approach and explores the phenomenon from different perspectives filtered through the lens of archival science and digital diplomatics.

A major theme in the journalism literature is the trustworthiness of news photographs in the digital era, due to the rise of citizen journalism (i.e., amateur photographers and the general public that contribute digital photographs and video to mainstream media) and the use of social media platforms for sharing and distributing newsworthy content (Reading, 2009; Singer et al, 2011; Gürsel, 2012). The economics of the digital era are driving many of the decisions regarding who creates news photographs. The results of immediate transmission and global distribution are increased competition between professional photojournalists and amateur "eye witnesses" using camera phones, shorter turn-around times and, in the case of major news events, simultaneous reporting on the event while it is occurring. These circumstances present the opportunity for digital manipulation, transmission errors and incomplete captioning, which lead to inaccurate and possibly falsified digital photographs being published. In an effort to support the transmission of reliable, accurate and authentic digital photographs, the International Press Telecommunications Council (IPTC) updated its existing standards for exchanging news content, adding a schema for image metadata to describe and manage digital photographs. The metadata schema, IPTC Core & Extension, is recognized by the imaging industry as a necessary component of digital photographic practice and has been integrated into the majority of image management software; yet, a recent study conducted by IPTC reveals that some social media platforms remove this pertinent image metadata from digital photographs during upload into the application and/or download out of the application (IPTC, 2013). The stripping away of metadata from image files presents a real
threat to the ability to verify or prove the trustworthiness of digital photographs accessed and stored in social media platforms.

A major theme in the North American legal literature is the inadequacy of existing rules for identifying and authenticating evidence when dealing with digital photographs and/or digital photographs accessed in social networking platforms (Wiebe, 2000; Brown, 2012; Mehlman, 2012). Concerns are being raised by litigators and legal scholars regarding the ease of digital manipulation and the potential for falsified photographs to be admitted into evidence at trial (Witkowski, 2002; Grimm 2007). A review of case law in both Canada and the United States reveals that traditional approaches to authentication of photographic evidence, which are based on the principle that a photograph is a graphic portrayal of oral testimony, are still relied upon in trial courts to determine the admissibility of digital evidence. The testimony of the witness will determine if the photograph is a correct and accurate representation of the relevant facts, see Almond v. State, 553 S.E.2d 803, 805 (Ga. 2001). More recently, issues surrounding reasonable expectations of privacy are being raised in the context of e-discovery of user generated content on social networking sites, along with methods of authentication for digital photographs posted to profile pages, see People v. Beckley, 185 Cal. App.4th 509 (110 Cal. Rptr. 3d 362). Another topic of importance in the legal literature is the issue of Terms of Service contracts between social media providers and customers. The American Society of Media Photographers (ASMP) is working with legal experts to analyse ToS offered by major social media and photo-sharing sites (ASMP 2014; ASMP and PhotoShelter, 2013; Krogh, 2013; Buntrock and Madden, 2013). Their publications highlight the importance of Terms of Service (also referred to as Terms of Use) for establishing roles and responsibilities for controlling user-generated content and membership account information. Key concerns are that the language used by the provider makes it difficult to determine where the hosting company's rights end, what the user is allowed to do with another's content, and if the hosting company is obligated to notify users of changes to the ToS (ASMP 2014; Bradshaw et al, 2011).

A major theme in the archival science literature is the management and preservation of digital records, including digital photographs. A number of studies addressing digital records have been conducted, such as the International Preservation of Authentic Records in Electronic Systems (InterPARES) 1, 2, 3 Projects (1999-2012); Personal Archives Accessible in Digital Media (PARADIGM) Project (2005-2007); and Digital Lives >> Personal Digital Archives for the 21st Century Project (2007-2010). In general, these studies focus on digital records created, managed and preserved offline or in networks that rely on servers that are under the control of the organization or a third party. At the time these studies were conducted, adoption of cloud computing services and infrastructure was not widespread and the studies do not specifically address issues introduced by data accessed and stored in public clouds or social networking platforms. The Digital Lives Project highlights the potential for public and commercial repositories to work with social media platforms in an effort to offer services that support the long-term preservation of authentic records (John et al. 2010, xiii).
One of the recommendations made in the final report of the Digital Lives Project is for future evaluation of the sustainability and security of online services. Under the direction of Luciana Duranti, the InterPARES Projects developed an intellectual framework and methodology based on archival science and diplomatics for understanding the long-term preservation of authentic records created and/or maintained in digital form; and provided the basis for standards, policies, and strategies for ensuring the longevity of digital records across systems and over time (Duranti and MacNeil, 1996; Duranti and Thibodeau, 2006; Duranti and Preston, 2008). As part of InterPARES 2, the "Survey of the Recordkeeping Practices of Photographers using Digital Technology" (Bushey and Braun, 2006) explored the practical aspects of how photographers create and manage their digital photographs as reliable records, and preserve their authenticity over the long term. The findings of this study revealed the importance of the type of storage media and the persistence of image metadata for ensuring the authenticity of digital photographs (Bushey, 2008, 128). This study highlighted how photographic practice is shaped by individual habits, social interaction, the technological environment, and legal requirements.
4. Methodology

The research for this study draws on knowledge from archival science and diplomatics to analyse and determine the trustworthiness of digital photographs in social media platforms. A digital photograph is reliable when it can stand for what it is about: reliability is established by examining the completeness of the digital
photograph's form and the amount of control exercised on the process of its creation. A digital photograph is accurate when its content is precise, correct, free of error or distortion: accuracy is based on the competence of the photographer and the controls on the capture and transmission of the image content. A digital photograph is authentic when it is what it purports to be and has not been tampered with or corrupted: authenticity is assessed on the basis of the photograph identity (i.e., the whole of the attributes of the digital photograph that uniquely distinguish it from other photographs) and integrity (i.e., the degree to which a digital photograph is capable of conveying its message) as well as the reliability of the system that contains it. This latter point is important in the context of photo‐sharing and social networking services. The research began with a literature review, during which key issues impacting the accuracy, reliability and authenticity of digital photographs were identified. Prior empirical studies of digital photography conducted by archival scientists are sufficient for specific scenarios involving offline storage, but studies exploring the context of cloud services and the shared environment of social media are limited. Therefore, to gain a comprehensive view of contemporary photographic practices, a survey was conducted. The questionnaire is composed of thirty‐two questions aiming to elicit basic information related to the views and activities of current users of photo‐sharing and social media platforms regarding access, management and storing of digital photographs and collections. The areas covered by the questions include general demographics of social media users, the variety of accounts and companies people frequently use, the key activities involving digital photography and social media platforms, issues encountered in using social media services, user’s expectations of social media services and their knowledge of Terms of Service etc. Two questions were included to offer respondents the opportunity to receive a summary of the survey results and volunteer to participate in a follow‐up interview. Both closed‐ and open‐ended questions were designed to invite respondents to share their experiences.
5. Early Findings and Discussion

In total, there were 565 respondents with a completion rate of 85%. Respondents are distributed among 20 countries with the majority from North America and the United Kingdom. These responses will serve as the basis of this report. The majority of respondents are under fifty years of age, with a large portion (36%) between the ages of 19 and 29 years old. The top five photo-sharing and social networking sites used by respondents are: Facebook, Twitter, Dropbox, Instagram, and Flickr (listed in order of frequency). The majority of respondents (86%) are using these services for free (see Figure 1).
Figure 1: Which of the following types of membership/accounts do you have?

A very small percentage of respondents (11%) is willing to consider paying for their account if the free service becomes unavailable; however, the majority would not. A follow-up interview question will ask respondents what they would do with their accounts and content if payment was required. A significant percentage of respondents (68%) identified their photographic practice as amateur. A much smaller percentage of respondents (13%) identified their practice as professional. The option to select both
professional and amateur attracted a larger percentage of respondents (17%), and additional comments suggested hybrid practices of academic and research photography and hobby photography. When asked about the general nature of their use of photo‐sharing and social networking services, more than half (58%) of the survey respondents selected personal, and approximately one third (37%) of respondents selected both personal and business. The mobile phone is the most frequently used device for capturing digital photographs, followed closely by the digital camera. The majority of survey respondents uses photo‐sharing and social networking services for sharing their photographs with others (see Figure 2). Respondents also ranked “Sharing images” as the most important feature of photo‐sharing and social networking platforms, followed by Access, Management, and Storage (in order of importance.)
Figure 2: What is your primary activity when using photo-sharing and social networking services as part of your photographic practice?

The combination of these responses provides important context for understanding the primary reason why persons use photo-sharing and social networking sites for their digital photographs. The majority of respondents is engaged in photographic practice that is shaped by mobile phones with Internet connectivity, personal habits and a desire to communicate their worldview with others through images. Therefore, in response to the first research question, the primary purpose for which persons use photo-sharing and social networking sites for their digital photographs is to share their photographs with others. Nearly half (49%) of all respondents have been members of photo-sharing and/or social networking sites for 6-10 years. The numbers of digital photographs stored in respondents' accounts vary, but on average respondents are storing between 101 and 5,000 digital photographs per account, with a small percentage of members (approx. 6%) storing in excess of 10,000 image files (see Figure 3).
Figure 3: In total, how many digital photographs do you have stored in your photo-sharing and social networking accounts?

The activity of adding information about the creation of the photograph, e.g. metadata that include the name of the photographer or the place of capture, is undertaken by less than half (42%) of survey respondents. Additional comments reveal the practice of removing certain types of metadata from image files prior to upload, "I choose the metadata – mostly copyright and licensing things, I strip any info about equipment, location etc." as well as the removal of metadata by social media sites, "No [I don't add metadata], because most social media sites strip it anyway." For those respondents who add metadata to their image files prior to uploading, the most important types of information are the name of the photographer, the place of capture and the subject (see Figure 4).
Figure 4: When adding information to identify your digital photographs, how important are the following types?

After uploading, the majority of respondents adds tags, comments, likes and/or ratings. When respondents were asked if they use watermarks, copyright symbols or Creative Commons licenses to control how people may use their digital photographs, the majority of respondents (73%) answered in the negative. Additional comments highlighted the intended use of the photograph as a determining factor for including watermarks or copyright attribution. The combination of these responses provides important context for gaining an understanding of whether persons using photo-sharing and social networking sites to store their digital photographs are concerned with trustworthiness. The majority of respondents store significant numbers of photographs and control who can access and use them (83%). Yet, approximately half of the respondents (49%) does not add metadata which could assist in uniquely identifying the photograph. The addition of information that contributes to the identity and integrity of the digital photograph is necessary for ensuring its authenticity. Therefore, in response to the second research question, persons using photo-sharing and social networking services to store their digital photographs are concerned with controlling access and use of their collections; however, only half of respondents is using metadata to do so. The latter are more concerned with adding information that identifies the who, what, when and where, than protecting the integrity of their digital photographs from unauthorized alteration or use. Many of the respondents (67%) have downloaded their photographs from a photo-sharing or
social networking site. Of them, less than a quarter (20%) has experienced issues, the most common being unexpected changes in size and colour, and corruption of files (see Figure 5). Additional comments emphasize the inability to download the original file uploaded to specific platforms and the lack of features supporting batch downloads of files.
Figure 5: Please select any similar issues from the list below (check all that apply).

The majority of respondents (91%) routinely keeps a copy of their digital photographs on their personal computer or an external hard drive. Other commonly used devices and platforms for storing copies of photographs that have been uploaded are mobile phones and cloud storage. The least used method is to store copies of digital photographs on optical media (e.g., DVD). Of the small percentage (6%) of respondents who do not keep copies, the photograph uploaded into the social networking site is the only version of that image file. The combination of these responses provides important context to assist in understanding the degree to which users are aware of the risks posed by social media platforms to the trustworthiness of their digital photograph collections. The majority of respondents is taking actions, such as storing copies of their photographs on personal computers and external hard drives, which could be based on an awareness of the risks posed by social media platforms; however, without follow-up questions, it is difficult to determine with any certainty. Although the percentage of respondents who have experienced issues with their photographs after downloading them out of social media platforms is small, the volume of digital photographs stored in photo-sharing and social media sites by many of the respondents is high. Instances of corrupt image files may be manageable for users that are downloading individual files, but large collections of thousands of photographs could give cause for concern. Therefore, in response to the third research question, users are conducting their photographic practice in a manner that minimizes the risks posed by social media platforms to the trustworthiness of their digital photograph collections. The Terms of Use (ToU) is one of the tools that determine the roles and responsibilities of the service provider and the customer. Less than a quarter of respondents (19%) have read the ToU before signing up for a photo-sharing or social networking account and just over half the respondents (59%) skimmed the ToU. Very few respondents (6%) have had their account deleted by the service provider due to inactivity or accident, and of those respondents, the majority was able to contact the service provider and have their account and data restored. Overall, respondents expect to have access to their photographs stored in photo-sharing and social networking sites for many years to come (see Figure 6).
Figure 6: How long do you expect to have access to your digital photographs that are stored in photo-sharing and social networking sites?

The majority of respondents (83%) has not made plans to share their passwords or account information with a family member or beneficiary in an effort to protect the legacy of their digital photograph collections. Additional comments provided by respondents highlight the novelty of valuing digital assets as part of a personal estate, "That is a fascinating idea. I am going to consider doing that." The combination of responses provides important context to assess whether persons using photo-sharing and social networking sites are aware of the challenges presented by social media platforms to continuing access and long-term preservation of their digital collections. The general lack of attention paid by respondents to the Terms of Service presents an opportunity for critical information about account termination, access to digital photographs stored within accounts of deceased persons, and ownership of digital photographs stored in inactive accounts to be overlooked or misunderstood. Therefore, in response to the fourth research question, users are not aware of the challenges presented by social media platforms to continuing access and long-term preservation of digital photograph collections.
6. Conclusion

The survey on digital photographs in social media platforms is only the first phase of a larger study, which explores the phenomenon of contemporary photographic practice utilizing mobile devices with Internet connectivity and its impact on recordkeeping practices performed by archivists and information professionals. At this early stage, the survey findings reveal a number of areas that need to be explored further through semi-structured interviews and analysis of case law and terms of service contracts. Based on the responses, it is clear that individuals currently value photo-sharing and social networking services for sharing their digital photographs with others, yet their unwillingness to pay for an account raises questions about the sustainability of these platforms and the content held within. In light of these responses, it is surprising to learn of users' expectations of social networking sites to store and provide access to user-generated content in excess of twenty years. The survey gathered data from persons who use photo-sharing and social media services to share their personal photographs with friends and the online public. An issue to be explored further in the interviews is the notion of privacy in the networked environment and users' expectations of ownership over their accounts. The fact that very few respondents have taken steps towards ensuring that their family can access their photo-sharing and social media accounts could be an indicator that society has yet to acknowledge the degree to which daily activities and major events are being conducted online and the potential for digital photographs held in social networking accounts to be meaningful to future generations.
References

Almond v. State, 553 S.E.2d 803, 805 (Ga. 2001)
ASMP (2014) Social Media, Legal Considerations and More, [online], American Society of Media Photographers, http://asmp.org/pdfs/Know_Your_Rights_Social_Media.pdf, accessed 21 June 2014.
ASMP and PhotoShelter Inc. (2013) The Photographer's Guide to Copyright, [online], American Society of Media Photographers, http://asmp.org/pdfs/PhotographersGuidetoCopyright.pdf, accessed 21 June 2014.
Bass, J. (2013) "A PIM Perspective: Leveraging Personal Information Management Research in the Archiving of Personal Digital Records," Archivaria No. 75, Spring, pp 5-48.
Bradshaw, S., Millard, C. and Walden, I. (2011) "Contracts for clouds: comparison and analysis of the Terms and Conditions of cloud computing services," International Journal of Law and Information Technology Vol. 19, No. 3, pp 187-223. doi:10.1093/ijlit/ear005.
Brown, I. (2013) "Humanity takes millions of photos every day. Why are most so forgettable?" The Globe and Mail, 21 June 2013, http://www.theglobeandmail.com/life/humanity-takes-millions-of-photos-every-day-why-are-most-so-forgettable/article12754086/?page=all, accessed 21 June 2014.
Buntrock, R. and Madden, J. (2013) "#KnowYour(Copy)Rights: Applying a Legal Filter to Instagram's Revised Terms of Use," The Instagram Papers, pp 16-20, [online], American Society of Media Photographers, http://asmp.org/pdfs/KnowYour(Copy)Rights.pdf, accessed 21 June 2014.
Bushey, J. and Braun, M. (2006) "General Study 07 Final Report: Survey of Recordkeeping Practices of Photographers using Digital Technology," [online], InterPARES 2 Project, http://www.interpares.org/display_file.cfm?doc=ip2_gs07_final_report.pdf, accessed 21 June 2014.
Bushey, J. (2008) "He Shoots He Stores: New Photographic Practice in the Digital Environment," Archivaria Vol. 65, Spring, pp 125-149.
Duranti, L. and MacNeil, H. (1996) "The Preservation of the Integrity of Electronic Records: An Overview of the UBC-MAS Research Project," Archivaria No. 42, Spring, pp 46-67.
Duranti, L. and Thibodeau, K. (2006) "The Concept of Record in the Interactive, Experiential and Dynamic Environments: the View of InterPARES," Archival Science Vol. 6, No. 1, pp 13-68.
Duranti, L. and Preston, R. (2008) "Creator Guidelines, Making and Maintaining Digital Materials: Guidelines for Individuals," InterPARES 2 Project, [online], http://www.interpares.org/ip2/display_file.cfm?doc=ip2%28pub%29creator_guidelines_booklet.pdf, accessed 21 June 2014.
Grimm, Paul, Judge (2007) Jack R. Lorraine and Beverly Mack v. Markel American Insurance Company, Civil Action No.: PWG-06-1893, Memorandum Opinion, District Court of Maryland.
Gürsel, Zeynep Devrim (2012) "The Politics of Wire Service Photography: Infrastructures of Representation in a Digital Newsroom," American Ethnologist Vol. 39, February, pp 71-89.
IPTC (2013) "IPTC study shows some social media networks remove rights information from photos," Media Release, [online], http://www.iptc.org/site/Home/Media_Releases/IPTC_study_shows_some_social_media_networks_remove_rights_information_from_photos, accessed 21 June 2014.
John, J.L. et al. (2010) "Digital Lives. Personal Digital Archives for the 21st Century >> An Initial Synthesis, Digital Lives Research paper, Beta Version 0.2," The British Library, [online], http://britishlibrary.typepad.co.uk/files/digital-lives-synthesis02-1.pdf, accessed 21 June 2014.
Krogh, P. (2013) "The Right to Terminate," The Instagram Papers, pp 5-15, [online], American Society of Media Photographers, http://asmp.org/pdfs/RightToTerminate.pdf, accessed 21 June 2014.
Mehlman, J. (2012) "Facebook and MySpace in the Courtroom: Authentication of Social Networking Websites," American University Criminal Law Brief Vol. 8, No. 1, pp 9-28.
Mell, P. and Grance, T. (2011) "The NIST Definition of Cloud Computing: NIST Special Publication 800-145," Gaithersburg, MD: Computer Security Division, Information Technology Laboratory, National Institute of Standards and Technology.
People v. Beckley, 185 Cal. App.4th 509 (110 Cal. Rptr. 3d 362)
Reading, A. (2009) "Memobilia: The Mobile Phone and the Emergence of Wearable Memories," in Save As... Digital Memories, ed. J. Garde-Hansen, A. Hoskins, and A. Reading, pp 81-95, Palgrave Macmillan, London.
Singer, J. et al. (2011) Participatory Journalism: Guarding Open Gates at Online Newspapers, Wiley-Blackwell, Chichester, W. Sussex.
Witkowski, J. (2002) "Can Juries Really Believe What They See? New Foundational Requirements for the Authentication of Digital Images," Washington University Journal of Law & Policy Vol. 10, pp 267-294.
Diversified-NFS

Martin Osterloh, Robert Denz and Stephen Taylor
Scalable Concurrent Systems Laboratory, Thayer School of Engineering at Dartmouth College, Hanover, New Hampshire, USA
[email protected] [email protected] [email protected]

Abstract: As corporations and governments increasingly move to operate in the cloud, there is a natural concern for the safety of intellectual property and confidential information: advanced threats may lurk within the infrastructure, for long periods of time, with the goal of extracting valuable data. This data theft has the potential to cost companies millions in lost revenues and impact the lives of individuals through identity theft. The problem is exacerbated by vulnerability amplification: the use of a few standard operating systems and tools throughout the cloud allowing a single reusable exploit to access thousands of hosts. This paper describes a new cloud technology, Diversified-NFS (D-NFS), that mitigates vulnerability amplification by dynamically modifying every executable binary at load time. The impact of this change is that every instance of any particular code, throughout the cloud, is unique with no shared addresses against which a single exploit can operate. The technology leverages recent advances in network services, including a diversifying ELF-loader developed at Dartmouth, the libNFS Network File System, Preboot eXecution Environment (PXE-boot), and Dynamic Host Configuration Protocol (DHCP) servers. This combination allows a cloud service provider to manage trusted stores of binaries that are only accessible, out-of-band, via a Local Area Network (LAN) or an even smaller Virtual LAN (V-LAN).

Keywords: Diversity, vulnerability amplification, compiler/loader transformations
1. Introduction

Compilers, such as GCC, transform source-code files into machine independent object-code files that utilize standard, relocatable formats such as Common Object File Format (COFF) and Executable and Linkable Format (ELF). Object files are subsequently linked together, using linkers such as ld and gold, to resolve inter-file dependencies corresponding to function and variable references. This process produces an executable (or binary) file that can be loaded within a particular operating system for execution. On Unix-like systems the loading process occurs through system calls such as fork, to create a process, and exec, to initiate execution of a binary in a process. The exec system call loads the binary file from a mount point in the file system, either on an attached hard-drive or over the network using a distributed file system such as the Network File System (NFS). An unfortunate by-product of this translation and loading process is that if the source code contains an exploitable vulnerability, then it is replicated on each system where the corresponding executable is loaded. If the source code corresponds to a generic application, it may give rise to user-level access. Unfortunately, if the source code is a third-party device driver, privileged service, or the operating system code-base, it may provide a more insidious privileged access, allowing intruders to hide from network defenders.

Cloud systems are relatively homogeneous in comparison to the internet at large: due to the economies of scale produced by modern blade and rack-based systems, typically a constrained number of hardware platforms are used. These platforms are shared by virtual environments that leverage a relatively small number of operating system variants, typically the more recent versions of Windows and Linux. As a result, vulnerabilities imported to a single virtual machine are replicated or amplified throughout the cloud. This amplification makes a single attack vector extremely valuable as it may have broad impact across a large proportion of a cloud structure. Address Space Layout Randomization (ASLR), developed by the Linux PaX team (PaX Team 2001) and later used in many other operating systems including variants of Windows, is the most prominent approach to mitigating vulnerability amplification. It operates by randomly relocating the base address of libraries, program text, stack, and heap at runtime, thereby introducing a non-deterministic (unpredictable) level of diversity into the binary execution. This removes the ability for attackers to statically pre-determine the address of functions and gadgets (code fragments) that can be used to build exploits or arbitrary code segments. ASLR is believed to be effective against both return-to-libc attacks (c0ontex) and return-oriented programming (Shacham 2007)
but it only requires the attacker to discover base addresses to craft a re-usable exploit. Unfortunately, a minimal level of entropy is required to protect against brute-force attacks (Shacham 2004). Early analytical work to quantify the impact on attacker workload has already been conducted; it concludes that approximately 16 bits of entropy, corresponding to 2^16 unique code addresses within a process binary, are required to protect against brute-force attacks within reasonable timeframes, i.e. 20 minutes.
2. Technical Approach

Our approach, illustrated in Figure 1, uses NFS in combination with the Preboot eXecution Environment (PXE-boot) and Dynamic Host Configuration Protocol (DHCP) to apply three transformations (Kanter 2013a; Kanter 2013b):
Compile-time diversity: Random, unused code is injected into every block in the source code. This randomly displaces every entry and exit point of every code fragment within each function and is achieved using a plugin to the clang compiler.
Load‐time diversity: Function and data sections are loaded into random pages, scrambling their order and randomizing their entry and exit points. Code dependencies, such as function calls and variable references, are resolved at load‐time by a modified ELF‐loader.
Replication diversity: Functions and data sections are replicated at load time by the ELF‐loader; those instances to be used are chosen non‐deterministically, providing a form of camouflage.
Figure 1: System Overview

The overall architecture places DHCP, TFTP (used by PXE-boot), and NFS servers on a separate out-of-band subnet or V-LAN within the cloud that is used only for code protection. Each program, corresponding to an application, operating system, or hypervisor, is compiled at an out-of-band server to form a variant repository within the distributed file system. For the sake of simplicity, we currently place all application binaries in /bin; operating system and hypervisor binaries are placed in /var/lib/tftp. A background task at the out-of-band server periodically recompiles each program and replaces the existing version within the repository to avoid the delay associated with on-demand compilation. The random nature of the compile-time diversity transformation ensures that each time a program is recompiled a unique binary variant is available for loading. The standard ELF format segregates an executable program into distinct sections that designate TEXT (code), DATA (initialized variables), RODATA (read-only data), and BSS (uninitialized variables). The ELF file also contains headers that describe how these sections should be stored in memory. Typically, for example in Linux,
functions are loaded sequentially into sections and sections are loaded back-to-back into memory. There is no re-ordering of functions or sections and, as a result, the location of code in memory is deterministic and can be reverse-engineered. Instead, we force the compiler to build a separate section for each function using the compiler -ffunction-sections option. This allows the ELF loader to re-order the function layout, placing each function in a random page at load time. Moreover, using relocations (Rel/Rela sections) generated by the compiler, the loader is able to update inter-section dependencies between functions and data at load time (Kanter 2013c).

When a physical host bootstraps within the cloud, its NIC card obtains an IP address from the DHCP server together with the address of the TFTP server (1). The NIC card then downloads the current hypervisor variant from the TFTP server (2) and executes it using the PXE-loader; this provides a crude form of diversity, based only on the compile-time transformation, to instances of the hypervisor binary throughout the cloud. Each hypervisor includes a virtual NIC-card that follows the same pattern to load an operating system variant from the TFTP server (2); however, the hypervisor employs the modified ELF loader for bootstrapping. This ensures that each instance of an operating system variant has a unique image in memory with additional levels of diversity induced by the load-time transformations. Services and device drivers, loaded during bootstrapping, are also diversified with the modified ELF loader. We assume that the bootstrapped operating system incorporates an NFS client capability, designated in Figure 1 as NFSD. Eventually, the operating system opens a shell and begins execution of commands (3) using an execve system call. This call will load the current variant of the application over the network through NFS (4) into the operating system kernel. The modified ELF loader is then used to initiate execution of the application (5). This ensures that each instance of an application in memory is random and unique.

This process mitigates vulnerability amplification and substantially increases attacker workload: in order to craft a reusable exploit, the attacker must reverse-engineer each program variant and determine its unique memory footprint, on each host in the cloud, as a function of time. The more frequently a program is reloaded, the more difficult the attacker's task. The intent is to force the refresh frequency inside (shorter than) the time to reverse-engineer a code and develop an exploit. This removes the opportunity to use an exploit without the need to detect intrusions. The approach yields the full capacity of entropy (39 bits based on addressing in 64-bit architectures, ~550 billion variants) for even small code bases, allows compile-time code-size overhead and entropy to be adjusted based on threat level, and has negligible run-time overhead; all of the overhead is paid up-front at load time (Kanter 2013).
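The following fragment sketches how a diversifying loader might patch one inter-section reference once function sections have been scattered across random pages. It is a simplified illustration for a single x86-64 relocation type; the array new_base[] and the function name are assumptions of the sketch, and the actual loader of Kanter (2013c) is not reproduced here.

#include <elf.h>
#include <stdint.h>

/* Apply one Rela entry: write the randomly chosen address of the referenced symbol
 * into the section being fixed up. new_base[i] holds the load address assigned to
 * section i after the loader has placed each function section in a random page. */
static void apply_rela(const Elf64_Rela *rela, const Elf64_Sym *symtab,
                       uint8_t *target_section, const uint64_t *new_base)
{
    const Elf64_Sym *sym = &symtab[ELF64_R_SYM(rela->r_info)];
    uint64_t value = new_base[sym->st_shndx] + sym->st_value + rela->r_addend;

    switch (ELF64_R_TYPE(rela->r_info)) {
    case R_X86_64_64:                       /* direct 64-bit reference */
        *(uint64_t *)(target_section + rela->r_offset) = value;
        break;
    /* PC-relative and other relocation types are handled analogously. */
    }
}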
3. Microkernel Implementations

The descriptions in Section 2 assume a traditional monolithic operating system that loads and executes binaries in the kernel with direct use of the network driver. In the latest generation of micro-kernels, typified by Minix (Tanenbaum 2006), device drivers are placed into user space, allowing them to be regenerated and opening the opportunity to diversify them on the fly. Figure 2 illustrates how this process can be accomplished through modifications to the execve system call, used after a new process has been forked. The execve system call is separated into two components, one of which operates with user-level permissions and the other as part of the kernel.
Figure 2: File-based Approach

Since the kernel does not have direct access to the device driver or NFS, each binary is first loaded into user memory. This is achieved by loading the entire binary over the network in a single NFS read operation. Once this user-level operation is complete, execve then passes control to the kernel. The kernel reloads the pages of the user process into a temporary location accessible only within the kernel. It then copies the binary into kernel pages, allowing them to be manipulated by the diversifying ELF loader. A memory I/O abstraction layer has been implemented to provide file operations on kernel memory: instead of using fread(...) and fseek(...) to access components of the binary, the loader is implemented in terms of mem_read(...) and mem_seek(...) operations respectively. These allow the loader to conveniently access all of the core ELF data structures contained within the binary for the purpose of resolving relocations. The ELF loader can then set the instruction pointer (RIP) to the entry point of the diversified ELF file and the new user-space process can be scheduled to run.

Incremental Loading. Figure 3 illustrates an alternative approach that we have also investigated, in which sections of the ELF file are loaded incrementally. As before, the ELF loader is separated into two parts: the first executes in user-land and retrieves ELF sections from the binary file as needed over the network. The primary headers of interest in the ELF format are the file header, program header(s), section headers, string table and symbol table. By reading the first 64 bytes (the file header), all pointers to the subsequent headers can be obtained. Referencing pointer by pointer, all sections of interest can be read into the user-space process's memory.
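As a rough illustration of the abstraction just described, a kernel-resident copy of the binary can be wrapped in a small cursor structure. The struct mem_file type and mem_open() helper below are assumptions introduced for this sketch, while mem_read() and mem_seek() mirror the operations named in the text; the sketch is written as ordinary C and relies only on memcpy().

#include <stddef.h>
#include <string.h>

struct mem_file {
    const unsigned char *base;  /* kernel copy of the ELF binary */
    size_t size;                /* total length of the image     */
    size_t pos;                 /* current cursor, like ftell()  */
};

static void mem_open(struct mem_file *f, const void *buf, size_t len)
{
    f->base = buf;
    f->size = len;
    f->pos  = 0;
}

/* Analogue of fseek(f, off, SEEK_SET): reposition the cursor. */
static int mem_seek(struct mem_file *f, size_t off)
{
    if (off > f->size)
        return -1;
    f->pos = off;
    return 0;
}

/* Analogue of fread(): copy up to n bytes at the cursor into dst. */
static size_t mem_read(struct mem_file *f, void *dst, size_t n)
{
    size_t avail = f->size - f->pos;
    if (n > avail)
        n = avail;
    memcpy(dst, f->base + f->pos, n);
    f->pos += n;
    return n;
}

With this in place, the loader can mem_seek() to e_shoff taken from the file header and mem_read() the section header table, exactly as it would with fseek()/fread() on a file.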
Figure 3: Segment-based Approach

The drawback of this solution is that each section and section header is handled, loaded and stored independently. Thus, code size (attack surface) increases from around 300 lines of code for direct loading to approximately 650 lines for on-demand loading. In order to pass the loaded ELF sections to the kernel, a new structure, shown in Fragment 1, must be created to hold all the essential information.

struct elf_ctx_user {
    struct Elf_Ehdr file_header;
    struct Elf_Phdr *program_headers;
    struct Elf_Shdr *section_headers;
    uint8_t *section_strtab;
    uint8_t *strtab;
    struct Elf_Sym *symtab;
    uint64_t *pt_load_segment;
    int size_pt_load_segment;
    int size_phdr;
    int size_shdr;
    int size_section_strtab;
    int size_strtab;
    int size_symtab;
};

Fragment 1: Kernel ELF structure

Once the sections are loaded and stored in this fashion, the kernel side of the execve system call can be activated. Upon entering the kernel, the context can be copied from user-space memory (i.e. the heap) into kernel space. The ELF loader can then allocate memory as needed, load the new code, and set the instruction pointer (RIP) to the entry point of the ELF file (specified in the file header). Once this is completed, the new user-space process can be scheduled to run. This approach was implemented successfully but unfortunately proved impractical. Recall that, since programs are compiled with -ffunction-sections, each function of a program is placed into its own section. In addition, each section then has a corresponding relocation (Rel/Rela) section. Diversity makes heavy use of ELF's Rel and Rela sections that provide relocation information for resolving references. Although, in the description above, every
section is already loaded, every corresponding Rel/Rela section must also be loaded. For example, a simple ls program contains about 67 section headers, which corresponds to 134 network file I/O operations (seek, then read), plus associated copies to the kernel. If diversity is used, this number doubles to 268 file I/O operations to access the relocation information. On the kernel side, every section header would have to be manually copied in order to pass a copy of the binary to the ELF loader. It soon became clear that loading sections on demand results in unreasonable network management overhead.
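For comparison, the hand-off into the kernel ultimately reduces to copying the user-side context and every array it points to. The sketch below is written against the Linux copy_from_user() interface purely for illustration (the prototype described here has its own execve path); it assumes the size_* fields of Fragment 1 are byte counts, shows only one of the deep copies, and omits error unwinding.

#include <linux/slab.h>
#include <linux/uaccess.h>

static int copy_elf_ctx(struct elf_ctx_user *dst,
                        const struct elf_ctx_user __user *src)
{
    void *buf;

    if (copy_from_user(dst, src, sizeof(*dst)))
        return -EFAULT;

    /* The embedded pointers still reference user memory: each table must
     * be deep-copied in turn. Only the program headers are shown; the
     * section headers, string tables, symbol table and PT_LOAD segments
     * need the same treatment, which is the copying burden noted above. */
    buf = kmalloc(dst->size_phdr, GFP_KERNEL);
    if (!buf || copy_from_user(buf, (const void __user *)dst->program_headers,
                               dst->size_phdr))
        return -EFAULT;
    dst->program_headers = buf;

    return 0;
}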
4. Prototype Implementation

The concepts described in Sections 2 and 3 have been incorporated into a new, from-scratch operating system design -- Bear -- that operates on 64-bit, x86 multi-core blade servers (Nichols 2013). The full system is depicted in Figure 4 and is composed of a minimalist micro-kernel with an associated hypervisor that share code extensively to reduce the attack surface. The complete system is currently less than 11,000 lines of code and presents a kernel attack surface (runtime image) of just over 200 Kbytes. The core functions associated with scheduling user processes and protecting them from each other are handled by the micro-kernel. All processes and layers are hardened by strictly enforcing MULTICS-style read, write, and execute protections (Corbató 1965) using 64-bit x86 address translation hardware. This explicitly removes vulnerabilities associated with code execution from the heap and stack. All potentially contaminated user processes, device drivers and services are executed with user-level privileges and are strictly isolated from the micro-kernel via a message-passing interface. User processes are loaded via an NFS client daemon (NFSD) operating in user space, while the kernel and hypervisor are bootstrapped using PXE-boot, as described in Section 2. On a full-up system, less than 15% of the entire code base is associated with the kernel and hypervisor.
Figure 4: The Bear Operating System

To prevent persistence in compromised device drivers and services, the micro-kernel randomly and non-deterministically regenerates and diversifies them from gold-standard binaries resident in a trusted read-only file store. This store is currently realized through the memory-based file system, achieved with PXE-boot and described in Section 3, which is accessible only from the kernel and hypervisor; however, it could alternatively be realized via read-only memory (ROM) or via an out-of-band, write-enabled channel to flash on new hardware. Unlike the MINIX re-incarnation process (Tanenbaum 2006), regeneration and diversification are carried out without regard to perceived fault or infection status. User processes can be refreshed through pre-arranged or designated schedules; for example, when not in use, every few hours, at night, or just prior to a mission event. To prevent persistence in the micro-kernel, it is also non-deterministically refreshed and diversified from a gold-standard binary in the trusted file store, by the hypervisor. Unlike traditional hypervisors, which are intended to support a general virtual machine execution environment (VMware 2001), (Matthews 2008), (Habib 2008), this minimalist hypervisor is designed to support only the operations required to bootstrap a new micro-kernel and change its network properties (e.g. IP and MAC address) so as to invalidate an adversary's surveillance data. The current running and bootstrapping instances of the micro-kernel are isolated in
hardware through extended page tables, implemented with Intel VT‐x extensions. Similarly, the network card is isolated through a mapping scheme based on Intel VT‐d extensions. Like user processes, kernel refresh and diversification can be organized whenever a machine reboots, or can be forced under pre‐arranged or designated maintenance schedules.
5. Future work

Using the multi-boot standard (The GRUB Team 1995), the prototype hypervisor described in Section 4 has also been used to bootstrap the NetBSD kernel. As a result, we are considering how to incorporate the D-NFS concepts for loading and diversifying binaries into NetBSD, under the presumption of minimal change to the operating system, and the opportunities for diversifying BSD itself in conjunction with hypervisor-based refresh. Any system that conforms to the multi-boot standard may be treated in the same manner. Although the file-based approach to execve operates well on our current prototype, it has yet to be exercised on large binary files (i.e. >10 Mbytes). Since the entire binary must be transferred into memory and then copied into kernel space, it may be a challenge to follow this concept for larger binaries. Our planned solution would provide a synthesis of the segment-based and file-based approaches described in Section 3, slicing a large binary into several large sections and then loading them piecemeal.
6. Conclusion

ASLR has already demonstrated that diversification of binaries is beneficial to network security, yet its weaknesses are already being discovered (Durden 2002). Our previous work has resulted in diversifying transformations that provide negligible runtime overhead while presenting a dynamic, time-dependent element to diversification. Restructuring the operating system to reduce its attack surface and operate under continual refresh and diversification significantly increases the attacker's workload and mitigates the opportunity for vulnerability amplification. This paper has shown how such an operating system can be integrated within existing cloud infrastructure using standard protocols, such as NFS, PXE-boot, and DHCP. The approach assumes that the source code is available and requires changes to the standard tool chain in the form of a compiler plug-in, a modified ELF loader, and changes to the execve system call. It is our expectation that these changes can be readily applied to other operating systems, especially open-source systems such as BSD and Linux variants, and in particular systems conforming to the multi-boot standard.
7. Notice

The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Defense Advanced Research Projects Agency (DARPA) or the U.S. Government.
References

Corbató, F. J. and Vyssotsky, V. A. (1965) 'Introduction and overview of the Multics system.' Proceedings of the November 30-December 1, 1965, fall joint computer conference, part I. ACM.
Durden, T. (2002) 'Bypassing PaX ASLR Protection' Phrack Magazine, vol. 0x0b, no. 0x3b.
Habib, I. (2008) 'Virtualization with kvm.' Linux Journal 2008.166.
Kanter, M. and Taylor, S. (2013a) 'Diversity in Cloud Systems through Runtime and Compile-Time Relocation' In Proceedings of IEEE-HST.
Kanter, M. and Taylor, S. (2013b) 'Attack Mitigation through Diversity', In Proceedings of MILCOM, pp 1410-1415.
Kanter, M. (2013c) Enhancing Non-determinism in Operating Systems, Ph.D. Thesis, Thayer School of Engineering at Dartmouth.
Matthews, J. N. et al. (2008) Running Xen: a hands-on guide to the art of virtualization, Prentice Hall PTR.
Nichols, C., Kanter, M. and Taylor, S. (2013) 'Bear - A Resilient Kernel for Tactical Missions', In Proceedings of MILCOM 2013, pp 1416-1421.
c0ntex, Bypassing Non-Executable-Stack During Exploitation with Return-to-libc, Available online at: http://www.open-security.org/texts/4 (Accessed at 21 May 2014).
PaX Team (2001) Address Space Layout Randomization, Available online at: http://pax.grsecurity.net/docs/aslr.txt (Accessed at 21 May 2014).
Shacham, H. (2007) 'The geometry of innocent flesh on the bone: return-into-libc without function calls (on the x86)', in Proceedings of the 14th ACM conference on Computer and communications security.
Shacham, H. and Page, M. and Pfaff, B. and Goh, E.‐J. and Modadugu, N. and Boneh, D. (2004) ‘On the Effectiveness of Address Space Randomization’, in Proceedings of the 11th ACM conference on Computer and communications security. Tanenbaum, A. S. and Woodhull, A. S. (2006) Operating Systems: Design and Implementation Prentice Hall. The GRUB Team (1995) The Multiboot Specification, version 0.6.96. Available at: http://www.gnu.org/software/grub/manual (Accessed at 21 May 2014). VMware (2011) E.S.X. Server: User's Manual Version, pp 122‐124.
Cache Side-Channel Attacks in Cloud Computing Younis Younis, Kashif Kifayat and Madjid Merabti School of Computing and Mathematical Sciences, Liverpool John Moores University, Liverpool, UK
[email protected] [email protected] [email protected]

Abstract: Cloud computing is considered one of the most dominant paradigms in the Information Technology (IT) industry nowadays. It supports multi-tenancy to fulfil future increasing demands for accessing and using resources provisioned over the Internet. Multi-tenancy enables physical computing resources to be shared among cloud computing tenants and offers cost-effective, on-demand scaling. However, multi-tenancy in cloud computing has unique vulnerabilities such as clients' co-residence and virtual machine physical co-residency. Physical co-residency of virtual machines can give attackers the ability to interfere with another virtual machine running on the same physical machine due to insufficient logical isolation. In the worst scenario, attackers can exfiltrate sensitive information of victims on the same physical machine by using hardware side-channels. Side-channel attacks are an implementation-level attack on cryptographic systems. They exploit the correlation between the higher-level functionality of the software and the underlying hardware phenomena. There are various types of side-channel attacks, which are classified according to the hardware medium they target and exploit, for instance, cache side-channel attacks. CPU caches are among the hardware devices most targeted by adversaries because they have high-rate interactions and sharing between processes. Furthermore, full encryption keys of well-known algorithms (i.e. RSA and AES) have been broken using simple spying processes that collect information about which cache lines have been accessed. This information is analysed and linked to the current virtual machine that occupies the processor. The aim of this paper is to explore potential security issues related to side-channel attacks, particularly cache side-channel attacks, in cloud computing. It highlights research directions, and investigates various real attack scenarios and gaps in the existing approaches that have been proposed to prevent and defend against cache side-channel attacks in cloud computing.

Keywords: cloud computing; security challenges; side channel attacks; cache side-channel attacks.
1. Introduction

Cloud computing is an open standard model, which is Internet-centric and provides various services, either software or hardware. It offers new cost-effective services on demand such as Software as a Service (SaaS), Infrastructure as a Service (IaaS) and Platform as a Service (PaaS). A significant interest in both industry and academia has been generated to explore and enhance cloud computing. It has five essential characteristics: on-demand self-service, measured service, rapid elasticity, broad network access and resource pooling. It aims at giving users the capability to use powerful computing systems with reduced cost and increased efficiency and performance (Mell & Grance 2011). It consolidates the economic utility model with the evolutionary enhancement of many utilised computing approaches and technologies, which include computing infrastructure consisting of networks of computing and storage resources, applications and distributed services. Moreover, there is an ongoing debate in Information Technology (IT) communities about how the cloud computing paradigm differs from existing models and how these differences affect its adoption. One view considers it a modern or fashionable way to deliver services over the Internet, while others see it as a novel technical revolution (Bardin et al. 2009). However, with all of these promising facilities and benefits, there are still a number of technical barriers that may prevent cloud computing from becoming a truly ubiquitous service, especially where a customer has strict and complex requirements over the security of an infrastructure (Zhang et al. 2010). Security is the main inhibitor to cloud adoption. Cloud computing may inherit some security risks and vulnerabilities from the Internet, such as malicious code (viruses, Trojan horses). In addition, cloud computing suffers from data privacy issues and conventional distributed systems attacks, i.e. Distributed Denial of Service (DDoS) attacks, which could have a huge impact on its services. Moreover, cloud computing has brought new concerns such as moving resources and storing data in the cloud, with a probability of residing in another country with different regulations. Computing resources could also become inaccessible for many reasons, such as natural disaster or denial of service. Cloud computing is a shared environment, which shares large-scale computing resources among large consumers (organizations and enterprises) comprising large numbers of users. Hence, cloud computing tenants
will share the same physical resources and are likely to face co-residence vulnerabilities. Virtual Machine (VM) physical co-residency enables attackers to interfere with other virtual machines running on the same physical machine by using hardware side-channels. In the worst scenario, attackers can exfiltrate victims' sensitive and confidential information. There are various types of side-channel attacks, which are classified according to the hardware medium they target and exploit, for instance, cache side-channel attacks. Cache side-channel attacks are a type of Microarchitectural Attack (MA), a large group of cryptanalysis techniques within side-channel analysis attacks (Acıiçmez et al. 2010). In this paper, we look at side-channel attacks and how they can affect multi-tenancy and virtualization in cloud computing. The paper defines side-channel attacks and presents their types. It then concentrates on cache side-channel attacks and how they penetrate the security of cryptographic algorithms. The rest of this paper is structured as follows. Section 2 illustrates side-channel attacks and their impact on virtualization. Section 3 describes different types of cache side-channel attacks and how they can extract information from CPU caches. Section 4 shows gaps in the existing research and a number of countermeasures proposed against cache side-channel attacks in cloud computing. The conclusion and our future work are presented in Section 5.
2. Side-Channel Attacks and Virtualization

2.1 Side-Channels

Cryptographic systems rely on security mechanisms to authenticate communicating entities and ensure the confidentiality and integrity of data. The security mechanisms have to be implemented according to cryptographic algorithms and to meet the security goals of the security systems. Although the security mechanisms can control and specify what functions can be performed, they cannot specify how their functions are implemented. For example, a security protocol's specification usually does not state whether the encryption algorithms are implemented in custom hardware units or in software running on a general-purpose processor. It is also independent of whether the memory used to store intermediate data during computations is on a separate chip or on the same chip as the computing unit (Zhou & Feng 2005). Moreover, cryptographic algorithms are always implemented in the hardware or the software of physical devices that interact with, and are influenced by, their environments. These interactions can be monitored and instigated by attackers. The gap between how security functions are specified and how they are implemented has led to a new class of attack: the side-channel attack. Side-channel attacks are an implementation-level attack on cryptographic systems. They exploit a correlation between high-level functionalities of the software and the underlying hardware phenomena. For instance, they look at the correlation between the internal state of the computation processing device and the physical measurements taken at different points during the computation. They gather information about certain operations taking place on computation processing activities, i.e. power consumption of a custom hardware unit or electromagnetic radiation, as shown in Figure 1.
Figure 1: The cryptographic model including side-channel (Zhou & Feng 2005)

There are various types of side-channel attacks, which are classified according to the hardware medium they target and exploit, for instance, cache side-channel attacks. Furthermore, attackers are always looking for hardware functions that offer a high rate of computing interactions, which can provide attackers with detailed information about the state of the computing operations taking place. For example, CPU caches are among the hardware devices most prone to being targeted by adversaries due to their high rate of interactions and sharing between processes (Wu et al. 2012). Moreover, according to Zhou et al. (Zhou & Feng 2005), there are three major classifications of side-channel attacks:
Classifications depending on the method used in the analysis process.
This type is classified according to the tools or methods used to analyse the sampled data collected from attacks. It has two different methods to perform analysis: Simple Side‐Channel Attack (SSCA): This type exploits the relationship between side‐channel outputs and executed instructions (Zhou & Feng 2005). A single trace is used in an SSCA analysis to extract the secret key. However, the side‐channel information related to the attacked instructions (the signal) has to be larger than the side‐channel information of the unrelated instructions (the noise) to deduce the secret key (Blake et al. 2005). Differential Side‐Channel Attack (DSCA): It exploits the correlation between side‐channel outputs and processed data. In this type, many traces are used in the analysis, and then statistical methods are used to deduce the possible secret keys.
Classifications depending on the control over the computation process.
In this type, the side-channel attacks are classified according to the attacker's control over the computation process. It is divided into two major categories: Passive attacks: the attacked system operates as if no attack had occurred, and the attacker is not noticed interfering with the targeted operation. Active attacks: the attacker interferes with the targeted operation, and some influence might be detected.
Classifications depending on the way of accessing the module.
This type relies on the kind of interfaces the attackers use to exploit the security system. These interfaces can be a set of logical, physical or electrical interfaces. Anderson et al. (Anderson et al. 2006) have classified these attacks into three different types: Invasive Attacks: de-packaging is used in an invasive attack to get direct access to the internal components of a cryptographic device or module. For example, an attacker might open a hole in the passivation layer of a cryptographic module and place a probing needle on a data bus to observe the data transfer. Semi-invasive Attacks: in such attacks, an attacker gains access to the device, yet without damaging the passivation layer, for example using a laser beam to ionize a device in order to alter some of its memories and change its output (Anderson et al. 2006). Non-invasive Attacks: in this attack, an attacker requires close observation or manipulation of the device's operation and does not need direct access to the internal components of cryptographic devices or modules, and can thus remain completely undetectable. In this process, the attacker just analyses data that is unintentionally leaked, such as timing information. Timing analysis correlates an operation performed by a device with the time consumed to execute the operation in order to deduce the value of the secret keys.
2.2 Virtualization in Cloud Computing

Cloud computing is a shared open environment, which has its own characteristics and features such as on-demand services and multi-tenancy. Multi-tenancy employs virtualization to share physical computing resources among cloud computing customers. Virtualization is a key element in cloud computing that provides greater flexibility in terms of resource allocation. It enables various virtual machines with different operating systems to share and beneficially use computing resources (e.g. processors, memories, etc.). Cloud computing utilises various kinds of virtualization technologies such as VMware, Xen and Hyper-V. However, running virtual machines simultaneously on the same physical machine may lead to security breaches. An adversary can penetrate the logical isolation between virtual machines to either attack and try to control the hypervisor or leak other users' confidential data. Multi-tenancy is considered the source of the main security concern in cloud computing. It has unique vulnerabilities such as clients' co-residence and virtual machine physical co-residency. It can give malicious parties the ability to interfere with other running machines and direct access to the hosted server (Keller et al. 2010). Exploiting virtual machine physical co-residency can give attackers the ability to interfere with
another virtual machine running on the same physical machine due to insufficient logical isolation. In the worst scenario, attackers can exfiltrate victims' sensitive information using side-channels. Virtualization in cloud computing makes non-invasive side-channel attacks an easy option for adversaries. An attacker just needs to place his/her Virtual Machine (VM) on the same physical machine hosting the targeted VM, and collect information about the time taken or power consumed to perform cryptographic operations on the targeted VM.
Figure 2: CPU Caches And Their Levels
3. Cache Side-Channel Attacks

A cache is a small, high-speed section of memory built inside and outside the CPU, as illustrated in Figure 2. It is usually Static RAM (SRAM) that any requested data must go through. It contains the most recently accessed data and frequently accessed memory addresses. Furthermore, a cache can lead to massive speed increases by keeping frequently accessed data and reducing the time taken to evict and fetch data from the main memory. It is utilised to increase the speed of memory access, as the time to execute an instruction is by far lower than the time to bring an instruction (or piece of data) into the processor (Kowarschik & Weiß 2003). For example, a 100 MHz processor can execute most instructions in 1 clock (CLK) or 10 nanoseconds (ns), whereas a typical access time for DRAM is 60 ns and for SRAM is 15 ns, which is 3 to 4 times faster than DRAM (Intel 2007). As shown in Figure 2, a core can have three different levels of cache. Level 1 (L1) is the smallest among them, but it is the fastest. It is usually divided into data and instruction caches. If the requested data is not kept at L1, a cache miss results. Otherwise, a cache hit is returned, which means the data is located at L1. Because L1 is small, Level 2 (L2) has been introduced. It is much larger than L1, yet slower. When the core experiences a cache miss from L1, it will look at L2 for the wanted data or address of data. If the core gets another cache miss, it will jump to look for the requested data at Level 3 (L3). L3 is the biggest cache in terms of size, but it is the slowest among them. L3 is shared between the processor's cores, while the others (L1 and L2) are shared between processes and threads. Cache side-channel attacks are a type of Microarchitectural Attack (MA), which is a large group of cryptanalysis techniques within side-channel analysis attacks (Acıiçmez et al. 2010). CPU caches are among the hardware devices most targeted by adversaries due to the high-rate interactions between processes (Wu et al. 2012). Cache side-channel attacks in cloud computing environments take advantage of running multiple virtual machines simultaneously on the same infrastructure to leak secret information about a running encryption algorithm. Furthermore, full encryption keys of well-known algorithms and schemes such as Data
Encryption Standard (DES) (Acıiçmez et al. 2010), Advanced Encryption Standard (AES) (Acıiçmez et al. 2006) and RSA (Acıiçmez & Schindler 2008), have been broken using spying processes to collect information about cache lines, which have been accessed. The information is analysed and linked to the current virtual machine that occupies the processor. In the side‐channel attacks, attackers are always looking for high‐rate hardware functions to explore current running cryptographic operations and the state of the operation in execution. The high‐rate hardware functions can communicate information more quickly and deduce the needed data to yield the secret key. Thus, CPU caches are always an interesting target to adversaries due to the following reasons: They are shared among VMs or cores. Therefore, an attacker can easily use clients’ co‐ residence or VM physical co‐residency to interfere and exfiltrate sensitive information of victims.
They have higher‐rate of computing interactions between processes.
They have the most fine‐grained and detailed information about the state of computing operations running on a system. There are three major types of side‐channel attacks, which facilitate adversaries with various capabilities to attack CPU caches (Acrimez & Koc 2006): Access‐driven side‐channel attack: In access‐driven attacks, an attacker runs a spy program on the physical machine that hosts it and the victim, in order to get information about cache sets accessed by the victim. There are various types of CPU caches’ architectural components can be targeted. Adversaries can monitor the usage of instruction cache (Acıiçmez et al. 2010), data cache (Tromer et al. 2009), branch‐prediction cache (Aciiçmez et al. 2007) or floating‐point multiplier (Aciicmez & Seifert 2007) to get information about the executing cryptographic operation in order to get the secret key. The most well‐known attack in this category is called the Prime and Probe attack [12]. It measures the time needed to read data from memory (RAM) associated with individual cache sets. In this attack, an attacker uses a process to fill cache lines with its own data. This step is named as a prime. Then, the process will wait for a prespecified time to let the victim accessed the cache. Having waited for the predefined time, the process starts the probe stage, which refills the same cache sets with attacker’s data and observes the victim’s activity on cache sets. If the victim accesses a primed line, data on the line will be evicted and cause a cache miss. This will yield a higher time to read this line than if it is still untouched. There is another type of the Prime and Probe attack, which is Flush+Reload (Yarom & Falkner 2013). The spy process shares memory pages with the victim and measures the time to access certain lines. The attack works as follows: The spy process flushes the monitored memory lines form the cache hierarchy (L1, L2 and L3 if found) as shown in figure 3.
Figure 3: The monitored line in the Flush+Reload attack
The spy process waits for a predefined time to allow the victim to access the main memory and cache hierarchy.
The spy process will reload the targeted memory lines and measure the time. If the accessed time is less than a predefined threshold, then it is a hit and the memory lines are accessed by the victim as shown in figure 4. Otherwise, it is miss.
Figure 4: The victim accesses the monitored lines
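As a concrete illustration of the reload measurement just described, the following user-space sketch probes a single monitored line with the x86 clflush and rdtscp instructions via compiler intrinsics. The threshold constant, the busy-wait interval and the choice of monitored address are assumptions that must be calibrated per machine; a real attack targets lines inside a library shared with the victim (for example a table used by its encryption code).

#include <stdint.h>
#include <x86intrin.h>

#define HIT_THRESHOLD 120          /* cycles; calibrate on the target host */

/* Returns 1 if the victim touched *addr during the wait interval. */
static int flush_reload_probe(volatile const uint8_t *addr)
{
    unsigned int aux;
    uint64_t start, end;

    _mm_clflush((const void *)addr);        /* 1. flush the monitored line */

    for (volatile int i = 0; i < 100000; i++)
        ;                                   /* 2. wait for the victim      */

    start = __rdtscp(&aux);
    (void)*addr;                            /* 3. reload the line          */
    end = __rdtscp(&aux);

    return (end - start) < HIT_THRESHOLD;   /* fast reload => victim access */
}

A reload time below the threshold indicates that the victim brought the line back into the cache hierarchy during the wait interval; a slow reload indicates the line was still flushed.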
Time-driven side-channel attack: In this type of attack, an attacker aims to measure the total execution times of cryptographic operations with a fixed key. The whole execution times are influenced by the value of the key. Thus, the attacker will introduce some interference to the victim to learn indirectly whether a certain cache set is accessed by the victim process or not. This attack is called Evict and Time (Osvik et al. 2006). An attacker will execute a round of encryption, evict one selected cache set by writing its own data to it, and measure the time the victim takes to perform a round of encryption. The time to perform the encryption depends on the values in the cache when the encryption starts. Hence, if the victim accesses the evicted set, the round of encryption time tends to be higher (Liu & Lee 2013).
Trace‐driven side‐channel attack: The third class is trace‐driven, which looks at getting information related to the whole number of cache misses or hits for a targeted process or machine (Liu & Lee 2013). An attacker can capture the profile of cache activities during a round of encryption in terms of the victim’s misses and hits accesses. Moreover, these attacks need to monitor some aspect of CPU caches constantly throughout a round of encryption such as electronic emanations (Gandolfi et al. 2001). The ability to monitor CPU caches continuously makes these attacks quite powerful (Zhang et al. 2012).
4. Gaps in Existing Research

Although side-channel attacks in general, and cache side-channel attacks in particular, have been known for quite a long time, there seems to be a lack of remedies and countermeasures that can be applied in cloud computing. Multi-tenancy and co-residency in cloud computing have drawn researchers' attention to explore and examine the level of damage side-channel attacks can do in cloud computing (Ristenpart & Tromer 2009)(Aviram et al. 2010)(Xu et al. 2011). Moreover, it has also been proven that side-channel attacks can extract cryptographic private keys from unwary virtual machines (Zhang et al. 2012). This section focuses on a number of proposed solutions to tackle cache side-channel attacks in cloud computing. The proposed mitigation approaches can be classified into two different types: software-based mitigation techniques and hardware-based solutions.

Hardware-based solutions

A considerable number of hardware solutions have been proposed to tackle and prevent side-channel attacks in general (Page 2005)(Wang & Lee 2007)(Lee 2008)(Domnitser et al. 2012)(Kong & Aciicmez 2013). Most of
these solutions focus on reducing or eliminating interference in cache accesses, such as cache randomisation (Wang & Lee 2007) and cache partitioning (Domnitser et al. 2012). In the randomisation approach, cache interference is randomised by randomising the cache eviction and the permutation of the memory-cache mapping (Lee 2008). The cache partitioning approach, by contrast, focuses on partitioning the cache into distinct zones for different processes. Cache interference is therefore eliminated, because each process can only access its own partition, which has reserved cache lines (Domnitser et al. 2012). Although hardware-based defence techniques appear more secure, as they are more efficient and thwart the root cause of these attacks, they cannot be applied in practice until CPU makers implement them in CPUs, and that does not appear feasible in the near term.

Software-based mitigation techniques

Software-based mitigation techniques are attack-specific solutions, which can only tackle the attacks they are proposed for. Consequently, these solutions might not have the ability to mitigate new side-channel attacks. Assigning predefined cache pages to CPU cores: This solution relies on assigning one or more prespecified private pages of the CPU cache, particularly the last level of cache (L3), to the CPU cores (Kim et al. 2012). Each core will thus have a limited amount of cache, which will not be accessed by or shared with other cores. However, it suffers from insufficient use of the CPU cache, as operations executed by CPU cores demand different numbers of cache pages according to the operations they are performing. As a consequence, cores will be assigned more or fewer cache pages than they need. Furthermore, when the number of virtual machines increases, this approach will suffer from scalability and security issues, as virtual machines can overlap in their use of a CPU core that has been assigned exclusive pages. Therefore, assigning private pages to a CPU core used by various virtual machines will not prevent cache side-channel attacks. Finally, the proposed solution only targets the last level of cache (L3), with extra cost, and aims to mitigate active time-driven and trace-driven side-channel attacks. Hence, it cannot prevent other types of side-channel attacks, such as access-driven side-channel attacks, or deal with the other CPU cache levels (L1 and L2). Flushing the CPU cache: This solution targets the Prime and Probe attack presented in Section 3. It flushes the CPU cache to prevent an adversary from gaining any information about the timing of reads from memory associated with individual cache sets (Godfrey & Zulkernine 2013). In this solution, when two machines overlap and use the same CPU cache, the CPU cache is flushed immediately after switching from one VM to another. Thus, when a virtual machine primes the CPU cache and waits for another virtual machine to access it, the cache will be flushed directly after the second virtual machine takes control of the CPU cache, and that will destroy the probe step. Although this solution can prevent access-driven cache side-channel attacks by preventing interference between virtual machines, it reduces the usefulness of the cache by flushing it whenever virtual machines overlap. It also introduces overhead, particularly when the number of virtual machines increases.
Inject noise to cache timing: This approach aims to inject additional noise into the timing that an adversary may observe from the CPU cache (Zhang & Reiter 2013). This approach is also targeting the prime and probe attacks. When an attacker periodically primes the CPU cache with its own data, a periodic cache cleansing process will be called to cleanse the CPU cache. So, the attacker cannot observe any timing information about the victim when it launches the probe step. The periodic cache cleansing process primes the CPU cache in random order until all the cache entries have been evicted. However, this approach actually flushes all the CPU cache entries, which will reduce the cache usefulness and introduce unacceptable overhead to the CPU.
5. Conclusion

Side-channel attacks are well-known Microarchitectural Attacks (MA), which benefit from the correlation between the higher-level functionalities of software and the underlying hardware phenomena. Their effect gets worse with cloud computing's unique multi-tenancy vulnerabilities, such as clients' co-residence and virtual machine physical co-residency. They enable adversaries to interfere with victims on the same physical machine and exfiltrate sensitive information. In this paper, we have surveyed cache side-channel attacks and how they benefit from multi-tenancy and virtualization in cloud computing. We have defined them and presented their types, along with the ways they penetrate the security of cryptographic algorithms. We have also presented gaps in existing research and a number of possible countermeasures. As a continuation of this work, we are going to propose a generic solution to cache side-channel attacks. It will focus on preventing these attacks without affecting cache and CPU efficiency.
References Acıiçmez, O., Brumley, B. & Grabher, P., 2010. New results on instruction cache attacks. In CHES’10 Proceedings of the 12th international conference on Cryptographic hardware and embedded systems. Springer‐Verlag Berlin, Heidelberg, pp. 110–124. Available at: http://link.springer.com/chapter/10.1007/978‐3‐642‐15031‐9_8 [Accessed March 18, 2014]. Aciiçmez, O., Koç, Ç. & Seifert, J., 2007. On the power of simple branch prediction analysis. In ASIACCS ’07 Proceedings of the 2nd ACM symposium on Information, computer and communications security. pp. 312–320. Available at: http://dl.acm.org/citation.cfm?id=1266999 [Accessed March 19, 2014]. Acıiçmez, O. & Schindler, W., 2008. A Vulnerability in RSA Implementations Due to Instruction Cache Analysis and Its Demonstration on OpenSSL. In CT‐RSA’08 Proceedings of the 2008 The Cryptopgraphers' Track at the RSA conference on Topics in cryptology. pp. 256–273. Acıiçmez, O., Schindler, W. & Koç, Ç., 2006. Cache based remote timing attack on the AES. Topics in Cryptology–CT‐RSA 2007, 4377, pp.271–286. Available at: http://link.springer.com/chapter/10.1007/11967668_18 [Accessed March 19, 2014]. Aciicmez, O. & Seifert, J.‐P., 2007. Cheap Hardware Parallelism Implies Cheap Security. In Workshop on Fault Diagnosis and Tolerance in Cryptography (FDTC 2007). Ieee, pp. 80–91. Available at: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4318988 [Accessed March 19, 2014]. Acrimez, O. & Koc, C., 2006. Trace‐driven cache attacks on AES. In ICICS’06 Proceedings of the 8th international conference on Information and Communications Security. Springer‐Verlag, pp. 112–121. Available at: http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Trace‐Driven+Cache+Attacks+on+AES#4 [Accessed March 19, 2014]. Anderson, R. et al., 2006. Cryptographic Processors‐A Survey. Proceedings of the IEEE, 94(2), pp.357–369. Available at: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=1580505. Aviram, A. et al., 2010. Determinating timing channels in compute clouds. In Proceedings of the 2010 ACM workshop on Cloud computing security workshop ‐ CCSW ’10. New York, New York, USA: ACM Press, p. 103. Available at: http://portal.acm.org/citation.cfm?doid=1866835.1866854 [Accessed May 13, 2014]. Bardin, J. et al., 2009. Security guidance for critical areas of focus in cloud computing. Cloud Security Alliance, pp.0–176. Available at: https://cloudsecurityalliance.org/guidance/csaguide.v3.0.pdf [Accessed October 11, 2012]. Blake, I., Seroussi, G. & Smart, N., 2005. Advances in elliptic curve cryptography, Cambridge University Press,. Available at: http://books.google.com/books?hl=en&lr=&id=E3hVu5ZjbxQC&oi=fnd&pg=PR9&dq=Advances+in+elliptic+curve+cry ptography.&ots=GzQfOPw9Au&sig=qJBISNT6zMN0CXrdOIdyQTbFbJ4 [Accessed March 17, 2014]. Domnitser, L. et al., 2012. Non‐monopolizable caches. ACM Transactions on Architecture and Code Optimization, 8(4), pp.1–21. Available at: http://dl.acm.org/citation.cfm?doid=2086696.2086714 [Accessed May 14, 2014]. Gandolfi, K., Mourtel, C. & Olivier, F., 2001. Electromagnetic analysis: Concrete results. In Cryptographic Hardware and Embedded Systems — CHES 2001. Springer‐Verlag Berlin, Heidelberg, pp. 251–261. Available at: http://link.springer.com/chapter/10.1007/3‐540‐44709‐1_21 [Accessed May 12, 2014]. Godfrey, M. & Zulkernine, M., 2013. A Server‐Side Solution to Cache‐Based Side‐Channel Attacks in the Cloud. In 2013 IEEE Sixth International Conference on Cloud Computing. Ieee, pp. 163–170. 
Available at: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=6676691 [Accessed December 16, 2013]. Intel, 2007. An Overview of Cache. Intel, pp.1–10. Available at: http://download.intel.com/design/intarch/papers/cache6.pdf [Accessed March 19, 2014]. Keller, E. et al., 2010. NoHype: virtualized cloud infrastructure without the virtualization. In ISCA ’10 Proceedings of the 37th annual international symposium on Computer architecture. pp. 350–361. Available at: http://dl.acm.org/citation.cfm?id=1816010 [Accessed March 18, 2014]. Kim, T., Peinado, M. & Mainar‐Ruiz, G., 2012. Stealthmem: system‐level protection against cache‐based side channel attacks in the cloud. In Security’12 Proceedings of the 21st USENIX conference on Security symposium. Available at: http://dl.acm.org/citation.cfm?id=2362804 [Accessed December 31, 2013]. Kong, J. & Aciicmez, O., 2013. Architecting against software cache‐based side‐channel attacks. IEEE TRANSACTIONS ON COMPUTERS, 62(7), pp.1276–1288. Available at: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6178238 [Accessed June 30, 2014]. Kowarschik, M. & Weiß, C., 2003. An overview of cache optimization techniques and cache‐aware numerical algorithms. Algorithms for Memory Hierarchies, 2625, pp.213–232. Available at: http://link.springer.com/chapter/10.1007/3‐ 540‐36574‐5_10 [Accessed March 19, 2014]. Lee, R.B., 2008. A novel cache architecture with enhanced performance and security. In 2008 41st IEEE/ACM International Symposium on Microarchitecture. Ieee, pp. 83–93. Available at: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4771781. Liu, F. & Lee, R.B., 2013. Security testing of a secure cache design. In Proceedings of the 2nd International Workshop on Hardware and Architectural Support for Security and Privacy ‐ HASP ’13. New York, New York, USA: ACM Press, pp. 1– 8. Available at: http://dl.acm.org/citation.cfm?doid=2487726.2487729 [Accessed December 31, 2013]. Mell, P. & Grance, T., 2011. The NIST definition of cloud computing. NIST special publication, pp.1–3. Available at: http://csrc.nist.gov/publications/nistpubs/800‐145/SP800‐145.pdf [Accessed October 15, 2012]. Osvik, D., Shamir, A. & Tromer, E., 2006. Cache attacks and countermeasures: the case of AES. In Topics in Cryptology–CT‐ RSA 2006. pp. 1–25. Available at: http://link.springer.com/chapter/10.1007/11605805_1 [Accessed March 19, 2014].
Page, D., 2005. Partitioned Cache Architecture as a Side‐Channel Defence Mechanism., Available at: https://eprint.iacr.org/2005/280.pdf [Accessed May 14, 2014]. Ristenpart, T. & Tromer, E., 2009. Hey, you, get off of my cloud: exploring information leakage in third‐party compute clouds. In CCS ’09 Proceedings of the 16th ACM conference on Computer and communications security. pp. 199–212. Available at: http://dl.acm.org/citation.cfm?id=1653687 [Accessed March 11, 2013]. Tromer, E., Osvik, D.A. & Shamir, A., 2009. Efficient Cache Attacks on AES, and Countermeasures. Journal of Cryptology, 23(1), pp.37–71. Available at: http://link.springer.com/10.1007/s00145‐009‐9049‐y [Accessed March 19, 2014]. Wang, Z. & Lee, R.B., 2007. New cache designs for thwarting software cache‐based side channel attacks. In Proceedings of the 34th annual international symposium on Computer architecture ‐ ISCA ’07. New York, New York, USA: ACM Press, p. 494. Available at: http://portal.acm.org/citation.cfm?doid=1250662.1250723. Wu, Z., Xu, Z. & Wang, H., 2012. Whispers in the hyper‐space: High‐speed covert channel attacks in the cloud. In Security’12 Proceedings of the 21st USENIX conference on Security symposium. pp. 1–9. Available at: https://www.usenix.org/system/files/conference/usenixsecurity12/sec12‐final97.pdf [Accessed March 17, 2014]. Xu, Y. et al., 2011. An exploration of L2 cache covert channels in virtualized environments. In Proceedings of the 3rd ACM workshop on Cloud computing security workshop ‐ CCSW ’11. New York, New York, USA: ACM Press, p. 29. Available at: http://dl.acm.org/citation.cfm?doid=2046660.2046670. Yarom, Y. & Falkner, K., 2013. Flush + Reload : a High Resolution , Low Noise , L3 Cache Side‐Channel Attack. Cryptology ePrint Archive, pp.1–9. Available at: http://eprint.iacr.org/ [Accessed March 2, 2014]. Zhang, Q., Cheng, L. & Boutaba, R., 2010. Cloud computing: state‐of‐the‐art and research challenges. Journal of Internet Services and Applications, 1(1), pp.7–18. Available at: http://www.springerlink.com/index/10.1007/s13174‐010‐ 0007‐6 [Accessed July 18, 2012]. Zhang, Y. et al., 2012. Cross‐VM side channels and their use to extract private keys. In Proceedings of the 2012 ACM conference on Computer and communications security ‐ CCS ’12. New York, New York, USA: ACM Press, pp. 305–316. Available at: http://dl.acm.org/citation.cfm?doid=2382196.2382230. Zhang, Y. & Reiter, M., 2013. Düppel: retrofitting commodity operating systems to mitigate cache side channels in the cloud. In Proceedings of the 2013 ACM SIGSAC conference …. Available at: http://dl.acm.org/citation.cfm?id=2516741 [Accessed December 31, 2013]. Zhou, Y. & Feng, D., 2005. Side‐Channel Attacks: Ten Years After Its Publication and the Impacts on Cryptographic Module Security Testing. IACR Cryptology ePrint Archive, (60503014), pp.1–34. Available at: http://csrc.nist.gov/groups/STM/cmvp/documents/fips140‐3/physec/papers/physecpaper19.pdf [Accessed March 17, 2014].
Master's Research papers
In Kernel Implementation of RSA Routines Asaf Algawi1, Pekka Neittaanmäki2, Nezer Zaidenberg2 and Tasos Parisinos3 1 Tel Aviv Jaffa Academic college, Department of computer science, Tel Aviv, Israel 2 University of Jyväskylä, Department of information technology, Jyväskylä, Finland 3 Beagle project, Jyväskylä, Finland
[email protected] [email protected]

Abstract: The Microsoft Windows operating system offers an in-kernel implementation of the RSA (Rivest et al 1978) algorithm and other cryptographic algorithms. In-kernel implementations, such as the one offered by Windows, have great security significance compared to user-space implementations: because kernel code cannot be easily replaced using simple DLL or library replacement techniques unless the executing server is already hacked, it is harder to attack such encryption systems. Should the encryption library be implemented in user space (as is done in almost any UNIX system and in OpenSSL), it would be relatively easy to replace it. Malware or viruses can trick the end user by replacing the RSA algorithm implementation with a different, defective user-space RSA implementation. The end user may think a communication with some server is secure as she is using RSA encryption for that communication. In practice, the RSA implementation used for that specific communication, even if the communication was set up by a 3rd party application, may be a replaced or defective RSA implementation, and the defective RSA code will mislead the user. The end user and the authenticating server have no way of knowing whether the RSA implementation used by some 3rd party application on some 3rd party computer is indeed secure and not replaced, and may be off guard. The Linux kernel has been lacking such an RSA implementation. Parisinos (2007) implemented the first step in such RSA protection, but his implementation did not include routines for key management. Our contribution is a kernel library for RSA and for handling math operations with large numbers (128 bit), augmenting Tasos's original contribution. Our paper describes our in-kernel implementation of RSA encryption algorithms. We provide benchmarks of our in-kernel RSA implementation and compare it with standard user-space implementations. Our implementation is GPL-licensed (open source) and offered to the Linux community.

Keywords: RSA Routines
1. Introduction

Efficient user-space libraries for RSA, as well as for virtually any commonly used cryptographic algorithm, have existed for a while (for example OpenSSL, among others). However, user-space libraries (shared objects in Linux) can easily be replaced, for example using LD_PRELOAD or by replacing the code that is called in the executable (ELF) to point to a different DLL (shared object under UNIX). Malicious code or malware may detect applications that are using RSA code and replace the RSA code with defective code. Such applications can even be 3rd party and unrelated code and still be affected by the malware. The defective RSA code may use a non-cryptographically-secure pseudo-random number generator, leak the private keys, or be otherwise defective in countless ways. The result of using such a defective RSA implementation is that the communication is not secure; however, the end user, and even the server that the end user communicates with, are not aware of any attack. As far as the user is concerned, trustworthy algorithms are being used, and the user has no reason to suspect eavesdropping or man-in-the-middle attacks. Likewise, the server cannot detect foul play and warn the user. The end result is even worse than openly insecure communication, because over a known-insecure communication protocol the user is at least alert. The problem with insecure RSA is that the user thinks the communication is secure and will lower her guard. This weakness of user-space code replacement is present in any operating system (not only Linux), and thus Microsoft Windows provides in-kernel routines for encryption. These in-kernel methods can still be replaced, but it is much more difficult to replace or change kernel behaviour as opposed to user-space code. Changing kernel behaviour requires root permissions on the machine (a "rooted device"), which are harder to obtain: the device must be hacked first, as opposed to merely running user code, which is a permission that is usually freely given. However, this critical security feature is still not available under Linux. This vulnerability is part of the white-box (Joye 2008) cryptographic model. Normally (under the black-box cryptographic model) the user is running a cryptographic black box and may attack the inputs and outputs but
cannot attack or change any code that is part of the black box itself. In the white-box environment an attacker has full control of the machine executing the code (as opposed to just its inputs and outputs). In that case the attacker can fully replace routines running on the machine. This is typical of trusted computing environments. In order to increase the protection we propose an in-kernel implementation of RSA. This is not a complete protection against such attacks, because an attacker may even gain administrator access ("root") to the machine and load or replace kernel modules, effectively changing kernel code, but it certainly improves the protection. It is much more difficult to obtain root or administrator permissions (a "rooted device") on the target compared to normal user permissions. Our implementation allows the RSA routines to run encryption and decryption inside the Linux operating system kernel as kernel modules. While running in the operating system kernel, our RSA routines are less susceptible to user-level attempts to replace them. Furthermore, the kernel integrity can easily be verified by a remote server providing the keys, to ensure the operating system is not tampered with (Kennel et al 2002). It is much more difficult to perform such tests on user-space code, because the routines that compute the checksums may themselves be tampered with. Furthermore, anything but the most simple application includes countless libraries, and each of these libraries progresses with its own version numbers and its own code modifications. An application may therefore use many different legitimate libraries and have many legitimate checksums for each library version, resulting in many legitimate application checks, valid checksum generation, and an endless need to support, test and add legitimate checksums as each of the dependent libraries releases a new version. This proves too much for most practical purposes. Microsoft Windows has had these abilities (in-kernel RSA decryption) for a long while (since Windows XP or earlier), but for some reason an in-kernel RSA implementation was left outside the Linux kernel. In this paper we describe our kernel module that fills the gap.
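To make the replacement threat concrete, the fragment below is a hypothetical illustration of library interposition via LD_PRELOAD. rand() is interposed only to keep the example self-contained, but the same mechanism can be aimed at a crypto library's entry points; the file name weak_rand.c and the build line are assumptions introduced for the example.

/* weak_rand.c -- build with: gcc -shared -fPIC -o weak_rand.so weak_rand.c
 * Run a victim program with:  LD_PRELOAD=./weak_rand.so ./some_program     */
int rand(void)
{
    /* A predictable "random" source: any keys derived from it are weak,
     * yet the application and its user see nothing unusual.             */
    static int counter;
    return counter++;
}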
2. Prior and related work

The most common RSA implementation under Linux is part of the OpenSSL (2014) open source software package. The OpenSSL user-space implementation is used in virtually all Linux distributions and also in several other non-Linux open source operating systems. Tasos Parisinos introduced an in-kernel RSA implementation in a forum post on LWN.net (Parisinos 2007), but in order to implement complete systems Parisinos's implementation lacks some critical features. Parisinos's implementation did not include means to communicate with user-space code from the kernel RSA implementation, and the key generation and other key-management features required to actually use the RSA system were lacking. We augment Parisinos's existing, published and tested work by adding several structures and functions. Our additional functions and structures are relevant to key generation. Another additional feature of our implementation is a means of communication between user applications and our kernel-space RSA implementation. Our communication protocol utilizes the Linux netlink socket interface. Last, our implementation introduces new code for cryptographically secure generation and management of random keys.
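As a rough sketch of how a user application might reach such an in-kernel service over netlink, the following opens and binds a netlink socket. The protocol number NETLINK_RSA_PROTO is a made-up placeholder, since a real module registers its own netlink family or protocol number.

#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#define NETLINK_RSA_PROTO 31   /* hypothetical protocol number */

int main(void)
{
    struct sockaddr_nl addr = { .nl_family = AF_NETLINK, .nl_pid = getpid() };
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_RSA_PROTO);

    if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("netlink");
        return 1;
    }
    /* Requests (e.g. "generate key pair", "encrypt block") would now be
     * framed as struct nlmsghdr messages and exchanged with sendmsg()
     * and recvmsg().                                                   */
    close(fd);
    return 0;
}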
3. Our contributions

As previously stated, we use Parisinos's prior implementation as the basis for our implementation, as Parisinos's implementation is already tested for both performance and cryptography and performs satisfactorily on both accords. However, Parisinos's prior implementation includes only the means to encrypt and decrypt blocks using RSA. Thus this work cannot be used with any user processes at all. Furthermore, even if this limitation is somehow lifted by introducing some means for kernel and user communication, the solution is still insufficient. In order to work with this solution the user must supply his own public and private keys. This is insufficient because all previously discussed attacks (libraries that leak keys, non-cryptographically-secure pseudo-random number generators, etc.) may still be used on the user code that calls the kernel code and provides the keys. So, unfortunately, under white-box assumptions the system is still vulnerable and the keys can still be leaked or faulty keys may be generated. Therefore we made the following additions to Parisinos's original work:
Pseudo-random prime number generator, which can alternatively use the Intel platform RDRAND assembler instruction.
Key generation methods and structures
User space and Kernel space communication layer over net link sockets.
The first of the three is the pseudo-random prime number generator. In order to build such a generator we needed a test that can determine whether a given number p is prime. One such test is the Fermat primality test (Cormen et al 2001), based on Fermat's little theorem (Fermat 1640), which states that for a prime p and any a not divisible by p, a^(p-1) ≡ 1 (mod p); the test therefore treats a candidate p for which a^(p-1) mod p ≡ 1 holds for randomly chosen a as probably prime. Based on that test we built the following method:

rsa_op_random_prime_generator(n)
input:  n  // size of number in bytes
output: p  // random prime number
do {
    p = random n-byte candidate
} while (p does not pass the Fermat primality test)

Key generation also requires a modular inverse (used for the RSA private exponent), which is computed with an extended Euclidean loop:

while (left > 1) {
    ri = left % right;
    qi = left / right;
    left = right;
    right = ri;
    tmp = yi2 - qi * yi1;
    yi2 = yi1;
    yi1 = tmp;
}
//validate the sign so that yi1 > 0
if (yi1