Frontiers of information technology


D. E. Dillenberger, D. Gil, S. V. Nitta, and M. B. Ritter

Every year, the IBM Research Division undertakes a year-long process to analyze and map out important trends and future directions shaping the advances and the applications of information technology (IT). The results are summarized in a document called the Global Technology Outlook (GTO), which influences IBM's strategy and technology roadmaps. Coinciding with the IBM Centennial, a special chapter was commissioned in the 2011 GTO, designed both to reflect on the historical evolution of computers and computation and to look a few decades ahead to explore the new frontiers of IT. This paper presents the results of this study. It provides a vision of the future in which advances in technology will enable the creation of a new class of "learning" systems, i.e., systems designed with people as an integral and central element of the process and explicitly aimed at enhancing human cognition. These systems will learn from both structured and unstructured data, find important correlations, create hypotheses for these correlations, and suggest and measure actions to enable better outcomes for users. Systems with these capabilities will transform our view of computers from "calculators" to "machines that learn," a shift that will radically alter our expectations of what computing ought to do for us as humans and that will equip us to navigate successfully the increasing complexity of our globally interconnected world.

Introduction

The twenty-first century presents our species with a defining challenge: How will we achieve our desired outcomes in the context of the massively complex and interdependent world we have built? For millennia, humans have improved their condition by inventing and using tools to shape the world around them. The Industrial Revolution dramatically improved our ability to master the physical world through advances in energy use and mechanization. We can transport goods and people efficiently and rapidly across vast distances thanks to cars, trains, ships, and planes. Machines enable us to drill miles underground, move small mountains, build dikes that shield us from the sea, create cities in the desert, and even explore outer space. We are no longer limited by what we can lift and how fast we can run. In the twentieth century, the information and communications technology revolution further reshaped the world by connecting people and information in unprecedented ways.

The cumulative effect of all of these technological advances has created the modern world. Along with it, however, we have greatly complicated decision making through the massive complexity and interdependence we have produced. Consequently, one of the defining questions of our time is whether we have the right tools to help us navigate this new world. We have compasses and global positioning systems (GPSs) to help us arrive at the correct physical location, but how will we navigate this complex and interdependent world to find the right ideas and solutions for our most pressing challenges?

Computing and decision making

Daily, we are faced with a torrent of unstructured data containing complex interrelationships and correlations. This data is valuable and can be mined to improve decisions and provide better outcomes for businesses, institutions, and individuals. Unfortunately, present computing systems (i.e., the best tools we have been able to build to help us
manage and process information) cannot "learn" these relationships from the sea of data; the best we have been able to do thus far has been to deploy expert programmers to create a constellation of targeted applications to analyze specific domains, an approach that, although powerful, is limited in scalability and is brittle. The applications are brittle because they rely on experts to code known relationships rather than having the system learn all correlations present in the available information sources. Present applications are also brittle because they are static, requiring patches and updates to accommodate changing data, business, and societal conditions, with the net result of limiting the scalability of this expert-based approach.

From "calculators" to "learning systems"

Fortunately, our expanding technology frontiers are providing us with an answer. We anticipate that it will be possible to build a new class of systems that can learn from both structured and unstructured data, find important correlations, create hypotheses for these correlations, and suggest and measure actions to enable better outcomes for users. Systems with these capabilities will transform our view of computers from "calculators" to "machines that learn," a shift that will radically alter our expectations of what computing ought to do for us as humans and that will equip us to navigate successfully the increasing complexity of our globally interconnected world.

IBM's Deep Question and Answer (DeepQA) system, Watson*, is a powerful demonstration that the era of "learning systems" is indeed upon us. Watson's ability to compete and win against the best champions in Jeopardy!** represents both a milestone in DeepQA systems and the beginning of a technological revolution. Expanding computing systems to exhibit more intelligent behavior will require advances (and sometimes breakthroughs) in learning algorithms and architectures, expanded data input and output modalities (e.g., the ability to process text, graphs, images, videos, sound, and other sensory information), and novel device technologies that will exploit the latest semiconductor and nanotechnology advances. We envision a roadmap that will progress toward increasingly dynamic learning capabilities, eventually achieving systems that will be able to learn autonomously across arbitrary domains. These systems will have an impact on virtually every sector of the economy, enabling applications and services that range from preventing fraud and providing better security to improving sales, helping launch products, and improving medical diagnosis.

In the subsequent sections of this paper, we describe a roadmap to build such learning systems. Before we do so, however, we start with a brief history of computing, followed by trends in machine learning. We also describe current tradeoffs between device technology and architecture and
define a new device roadmap that would, in addition to advances in machine learning, enable us to reach our goal of building truly autonomous systems one day.

Brief history of machine computing and intelligence

Ever since humans began trading and, consequently, adding and subtracting numbers, they have perceived a need for mechanical assistance to help them keep track of their transactions. Thus, the earliest computers were born. In this section, we attempt to highlight some of the most noteworthy inventions in the history of computing, with particular emphasis on the evolution of computer intelligence in the last part of the 20th century. The earliest recorded attempts to use machines to perform additions and subtractions can be traced back to the use of the abacus by the Babylonians around 2400 BC [1]. The use of the abacus spread all over Asia and the Middle East and eventually reached Greece around 500 BC [1]. Greece is where the Antikythera mechanism, which represented the next significant advance in computing, was invented around 100 BC. The Antikythera was a very complex astronomical calculator and can be considered the first analog computer; it was so complex that it was not successfully replicated until about 1400 AD in Europe [1]. Advances in the complexity and sophistication of computers happened in brief spurts, followed by long periods of gradual progress. Interestingly, this pattern of technological advance has been observed in other fields, as well as in the evolution of life itself, and it is well chronicled [2]. A good example is the invention of Napier's rods in the 16th century, which set off a series of advances that culminated in Charles Babbage's designs for the difference and analytical engines [1]. Interestingly, Babbage's engines were never realized in his time, and it was only in the last part of the 20th century that his difference engine was fully built [3]. Babbage's designs were significant because they anticipated several features of modern electronic computers, such as internal memory and several forms of output. The next series of rapid advances began when Alan Turing invented the principles of modern computing in 1936. His "Universal Turing Machine," as it came to be known, was a conceptual digital computer with a stored program, a large memory, a heuristic search engine and, intriguingly, an ability to learn from experience by altering its own instructions (see, e.g., http://plato.stanford.edu/entries/computing-history/ or http://www.stottlerhenke.com/ai_general/history.htm). Turing's theory of computation proved that any mathematical deduction could be imitated by manipulating symbols as simple as 0s and 1s. He was also among the first to speculate seriously about intelligence in machines.
Turing's ideas had a great impact on von Neumann, who helped design the Electronic Discrete Variable Automatic Computer (EDVAC), successor to the Electronic Numerical Integrator and Computer (ENIAC), the first general-purpose electronic digital computer ever built. In a further example of exponential trends creating rapid advances in a field, the postwar period in the middle of the 20th century became a particularly fruitful one for computing. Vacuum tubes gave way to the bipolar transistor and, later, the field-effect transistor (FET). The invention of the transistor and the ensuing need to build circuits with many transistors led to the development of the integrated circuit by Jack Kilby at Texas Instruments in 1958 [4]. Robert Noyce independently invented his own version of the integrated circuit several months later, and his approach proved more suitable for manufacturing; thus, the modern electronics industry was born. Programming languages were invented, operating systems were developed, Gordon Moore proposed his famous "law," and Robert Dennard described the essential scaling principle for the FET. All of this resulted in computers rapidly becoming smaller, more powerful, and ever more ubiquitous. Notable events during this time include the birth of the Internet (ARPAnet in 1969), the development of UNIX**, and the arrival of the microprocessor.

In the 1940s and 1950s, scientists began discussing the creation of an artificial brain. Early examples of analog robots created at this time were the Johns Hopkins Beast [5] and W. Grey Walter's turtles [6]; these early nondigital robots wandered the halls looking for electrical outlets to plug themselves into. As digital computers became more accessible in the 1950s, pioneers wrote programs that used rules and symbolic logic to solve increasingly complex reasoning problems. In 1951, Christopher Strachey wrote a checkers program [7] and Dietrich Prinz wrote one for chess [8]. In 1955, Allen Newell, Herbert Simon, and J. C. Shaw created the Logic Theorist, which was able to prove 38 theorems from Whitehead and Russell's Principia Mathematica; some of the proofs it developed were new and more elegant than previously published ones [9]. The term "artificial intelligence" was coined at the Dartmouth Conference of 1956, which asserted that "every aspect of learning or any other feature of intelligence can be so precisely described that a machine can be made to simulate it" [10]. During the 1960s, artificial intelligence used semantics and script-based programming to simulate human conversation in "chatterbot" programs. These programs mimicked human conversation in text with no underlying understanding of human language; an example was ELIZA, created at MIT in 1966 by Joseph Weizenbaum [9]. Rule- and script-based programs were limited by their unchanging rules and the constrained scenarios they were written for, and in changing, real-world environments their uses were narrow.
In the 1970s, worldwide government funding of artificial intelligence waned [11]. Around this time, IBM Research started to apply statistical models to speech processing and machine translation [12]. In the 1980s, as computing systems became more powerful, with faster CPUs and more memory, more data could be stored and mined digitally. Machine learning was able to use real-time statistical techniques and probabilistic models (e.g., Markov models, Bayesian methods, and support vector machines) to find patterns in vast amounts of digital data. Probabilities and actual historical data were used to augment and guide rule- and script-based reasoning. In the 1990s, IBM's Deep Blue* defeated Garry Kasparov in a match while searching more than 100 million chess positions per second. In 2011, IBM's question-answering system, Watson, defeated the two greatest Jeopardy! game-show champions. Further techniques that enable machines to overcome the limitations of static, prewritten rules are neural networks and genetic programming. Neural networks were used to train programs to evolve their own rules and scripts, and genetic programming was introduced to enable programs to change themselves, evolving toward a "fitness" function to optimize a task.

Dramatic developments in computer architecture, machine learning, and communications are now once again positioning the field for rapid evolution. At this juncture, computer capability has advanced to the point where computers can routinely beat the best human chess players, recognize natural language speech, and assist humans by autonomously flying airplanes when desired. Further, they are used to autonomously control important processes in a range of industries from finance to health care. As noted above, Turing predicted that, ultimately, computers would advance to the point where, for all practical purposes, they appear intelligent. In his own words, he suggested that computers would advance to the point that they can "play the imitation game so well that an average interrogator will not have more than 70% chance of making the right identification after five minutes of questioning" [13, 14]. He predicted this would happen around the year 2000. Although his timing was off, it can be argued that the full realization of Turing's vision is inevitable, and the only question that remains is when it will happen.
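
The idea of "evolving toward a fitness function" mentioned above can be made concrete with a minimal, hypothetical sketch (not taken from this paper): random mutation plus selection drives a population of candidate strings toward a target string, a toy analogue of optimizing a task-specific fitness. The target, alphabet, and parameters below are invented for illustration.

```python
import random

random.seed(0)                                    # reproducible toy run
TARGET = "machines that learn"                    # hypothetical task: match this string
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def fitness(candidate: str) -> int:
    """Task-specific fitness: number of positions that already match the target."""
    return sum(c == t for c, t in zip(candidate, TARGET))

def mutate(candidate: str, rate: float = 0.05) -> str:
    """Random variation: occasionally replace a character with a random one."""
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in candidate)

def evolve(pop_size: int = 200, generations: int = 1000) -> str:
    population = ["".join(random.choice(ALPHABET) for _ in TARGET)
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)        # selection pressure
        if fitness(population[0]) == len(TARGET):
            break
        survivors = population[: pop_size // 5]           # keep the fittest fifth
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(pop_size - len(survivors))]
    return max(population, key=fitness)

if __name__ == "__main__":
    print(evolve())   # typically converges to TARGET within a few hundred generations
```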

State-of-the-art learning systems

Systems based on expert rules have been developed and are used in fields such as medicine [15] and automobile repair [16] to help narrow down possible root causes from symptoms. The limitation of expert rule-based systems is that the rules are static and will become obsolete as new knowledge is discovered [17]. Advances have also been made on statistical methods [18], which are used to find
patterns in data. However, humans are still required to augment statistical methods when the context of the data is ambiguous [19]. For example, does the term "cookie" refer to a pastry or to a web-browser software tag? Finally, progress has also been made on neural networks [20] and genetic programming [21]. Yet, for all these different machine-learning methods, humans are still required to tell the system what the essential features of a domain are [22]. For example, expert physicians can provide diagnosis software with essential features to use in finding abnormalities or patterns, such as a person's temperature, blood pH, heart rate, and cholesterol level. The next frontier of learning systems is to autonomously and accurately identify the essential features of any domain (e.g., medicine, fraud, global climate change, and financial risk) over time as conditions change. To do this, learning systems must understand the context of data in order to disambiguate it.

Figure 1. IBM DeepQA hypothesis generation.

The methods that are currently used to identify essential features fall into two categories: human experts and crowd sourcing. Human experts have years of experience in a domain and work with information technology (IT) engineers to build a system that encapsulates their knowledge in rule-based systems, dashboards, automated best practices, and statistical extractors (e.g., medical diagnosis software, aviation autopilots, terrorist identification systems, and police neighborhood crime-indicator forecasters).
The limitations of this approach are that 1) the essential features identified will become obsolete over time and 2) the method is manually intensive, requiring teams of experts for each domain, where each domain has further nuances that require specialist subsets (e.g., financial risk identifiers specific to countries, industries, or types of commodity).

A second category currently used to extract essential features is crowd sourcing. This method captures anonymous mass preferences among choices. For example, crowd-sourcing techniques are used to provide the most "relevant" search results for keywords (page-ranking algorithms), movies one may enjoy (the Netflix** recommendation algorithm), books one may like to purchase (Amazon** book recommendations), and the "best" restaurants by area or cuisine (Yelp**). No one person or team of people sat down to examine explicitly what your personal preferences in movies, books, or restaurants would be. The advantage of this technique for identifying the essential features of a domain (the "best" movie, restaurant, or book tailored to subject, person, country, etc.) is that it is dynamic: the essential features will change over time as crowd choices change.
One disadvantage is a susceptibility to the lemming effect, in which following popular choices may not lead to the best outcome. A second disadvantage is that learning essential features from crowd-sourcing techniques does not lead to novelty: a new feature or trend will not be identified until after groups of people have already chosen it. For example, crowd sourcing will not lead to the discovery of the next Sony Walkman** or the next iPod**.

Figure 2. Timeline for state-of-the-art machine-learning systems.

At IBM Research, a team has developed Watson, a system that was able to compete successfully in the game show Jeopardy!. Watson applies advanced natural language processing, information retrieval, knowledge representation, reasoning, and machine-learning technologies to the problem of open-domain question answering. At its core, Watson is built on IBM DeepQA technology for hypothesis generation, massive evidence gathering, analysis, and scoring. The IBM team provided Watson with millions of documents, including dictionaries, encyclopedias, and other reference material that it could use to build its knowledge. Rather than relying on a single algorithm to answer a question, Watson uses hundreds of algorithms simultaneously to find the correct path to an answer, as shown in Figure 1. The more algorithms that independently arrive at the same answer, the more likely Watson is to be correct. Once Watson comes up with a small number of potential solutions, it uses scoring algorithms to assess how strongly the evidence justifies each answer [23].
Watson's approach was not to build specialized databases of domain-specific knowledge but to advance natural language processing and information-extraction techniques so that free-format text sources could be used. These free-format text sources (human knowledge), combined with the use of multiple algorithms to generate potential answers, allow the answers to be compared and scored and the response most likely to fit the question to be selected.

Watson, crowd-sourcing techniques, and systems that rely on human experts are not 100% accurate all the time: expert knowledge becomes stale, and automated extraction techniques have a difficult time with ambiguous terms. To advance in the area of automated extraction of essential features across any domain, learning systems must do the following:

1. Build a dynamic data corpus: constantly mine data sources (including crowd sourcing) to build and update libraries of background knowledge across domains.
2. Expand hypothesis generation to different domains: leverage crowd sourcing.
3. Add scorers for different input modalities: images, video, voice, environmental, biological, etc.
4. Incorporate deeper reasoning: improve learning algorithms to make them real-time and online, and allow higher levels of semantic abstraction.
5. Invent domain adaptation tools: develop software tools to help automate domain adaptation.
6. Understand context to disambiguate: for example, when a machine-learning system encounters the term "Internet cookie" for the first time, should this term be categorized as a pastry or as a software tag?

Figure 3. Key areas of innovation required for future systems. Biology provides examples of ultimate computational efficiency and integration (left), showing that we have much further to go, and new computing paradigms (right) offer the promise of vastly increased efficiency for specific problems.

A state-of-the-art learning system would achieve understanding of natural language, images, voice, emotion, and other sensory information; be able to formulate its own hypotheses and generate questions across arbitrary domains; and select among multiple algorithms in order to learn autonomously. A timeline of progress toward autonomous learning systems is shown in Figure 2. Currently, the state-of-the-art hardware supporting advanced learning systems consists of massively parallel systems [23]. In the next section, we describe technology trends and roadmaps for building future learning systems.
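
Before turning to devices, the evidence-combination idea described above (many independent scorers evaluate each candidate answer, and agreement across scorers raises confidence) can be made concrete with a minimal, illustrative sketch. The scorers, weights, and data below are invented for illustration and are not IBM's actual DeepQA algorithms.

```python
from typing import Callable, Dict, List, Tuple

# Each hypothetical "scorer" examines a (question, candidate, passage) triple
# and returns a support score in [0, 1]. A real system would use hundreds.
def keyword_overlap_scorer(question: str, candidate: str, passage: str) -> float:
    q = set(question.lower().split())
    p = set(passage.lower().split())
    if candidate.lower() not in passage.lower():
        return 0.0
    return len(q & p) / max(len(q), 1)

def mention_frequency_scorer(question: str, candidate: str, passage: str) -> float:
    # Treat repeated mentions of the candidate in supporting text as weak evidence.
    return min(passage.lower().count(candidate.lower()) / 3.0, 1.0)

# (scorer, weight) pairs; in practice the weights would be learned, not hand set.
SCORERS: List[Tuple[Callable[[str, str, str], float], float]] = [
    (keyword_overlap_scorer, 0.6),
    (mention_frequency_scorer, 0.4),
]

def rank_candidates(question: str,
                    candidates: List[str],
                    evidence: Dict[str, List[str]]) -> List[Tuple[str, float]]:
    """Aggregate weighted scores over all evidence passages for each candidate."""
    totals = {cand: 0.0 for cand in candidates}
    for cand in candidates:
        for passage in evidence.get(cand, []):
            for scorer, weight in SCORERS:
                totals[cand] += weight * scorer(question, cand, passage)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    question = "Which system won at Jeopardy in 2011?"
    evidence = {
        "Watson": ["Watson won at Jeopardy in 2011 against two champions."],
        "Deep Blue": ["Deep Blue defeated Kasparov at chess in 1997."],
    }
    print(rank_candidates(question, list(evidence), evidence))
```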

Device and technology innovation frontiers

Future systems require new innovations in devices and technology to allow lower energy per operation and greater
functional diversification. Moore's Law codifies the remarkable trend of device scaling observed over the past five decades [24], allowing more and more transistors to occupy the same area on a silicon chip. However, in the last decade, microprocessor chips have approached practical power limits, clock speeds have stagnated, and we are now approaching the end of our ability to shrink the transistor. We need new approaches to continue increasing computing capability when Moore's Law in silicon comes to an end. Biology offers our inspiration: the brain can perform an estimated million to billion calculations per second with a power dissipation of only 20 W, which is roughly a million times more efficient than estimates for a computer of similar capability (see Figure 3). The brain does not solve all problems as well as today's computers; however, it does process and correlate sensory input, producing an understanding of context and meaning that we have not yet achieved with the von Neumann computational paradigm, regardless of the size of the computer. If we are to continue advancing computing and adding intelligence, we must tackle this efficiency problem.

Figure 4. A new device technology roadmap is required to continue scaling the performance of von Neumann architectures (bottom) and to implement new non-von Neumann architectures (top).

We must also increase the functional diversity of the microsystems we create. Biology again provides the inspiration: migratory birds can fly, navigate successfully over thousands of kilometers per year, eat, reproduce, and live for decades, yet they weigh only 100 g. If we instrument ourselves and the environment around us for greater medical, societal, and environmental understanding, the sensors we use must be incredibly power efficient (perhaps scavenging their own power) and must perform sensing, processing, and communication tasks. New technologies must be developed to provide the functional diversity needed to meet such challenging environmental and functional specifications.

As silicon scaling approaches fundamental limits, we also need to explore new computing paradigms to drive new levels of performance. For example, quantum computing affords dramatic speedups for a restricted class of problems, the best-known example being the factoring of large numbers. With a quantum computer employing the algorithm in [25], one could factor, in a humanly practical time, a number that would take longer than the age of the universe on the biggest traditional computer. Quantum algorithms can also provide a quadratic acceleration of large database search tasks using the algorithm in [26]. Biologically inspired and neuromorphic
computation also promises to increase the efficiency and scope of future computation, allowing great advances in speech, image, video, and other recognition tasks, as well as greater levels of machine understanding.

A new device technology roadmap (see Figure 4) is required to deliver greater efficiency and to allow new computing paradigms to be realized. The first portion of this roadmap includes technologies to support traditional von Neumann architectures, which are characterized by computing with programs that fetch instructions and data from memory and then decode and execute those instructions. For existing architectures, technology scaling will continue, with a likely shift to carbon nanotubes or other nonsilicon device-scaling approaches below roughly 7 nm. New approaches for making very low voltage devices must be pursued in parallel, with tunnel FETs [27] and spin-based memory devices showing the most promise at present [28]. Reconfigurable logic technologies will be used to increase the efficiency of key computations or to allow feature upgrades without requiring the design and fabrication of new silicon. A combination of these technologies could enable power efficiencies 10–100 times better than today's silicon technology, allowing new architectures employing
increased parallelism afforded by hundreds of billions of low-voltage devices. As a complement to new devices, we must also create new technologies for greater subsystem integration. Key examples are 3-D silicon technologies that place more memory very close to processors [29, 30], storage-class memory that provides the density of flash memory at much greater speed [31, 32], and silicon photonics that provides greater data bandwidth to compute and memory chips [33, 34].

Figure 5. Learning system high-level architectural flowchart.

Finally, new circuit, machine, and software architectures will be required to reap the benefits of the new devices and technologies. Lower-power-per-operation devices would enable highly parallel architectures, potentially employing 3-D packaging for ultimate performance. However, effectively utilizing more processors will require advances in local storage density and communication bandwidth, as well as efficient parallelization of software tasks. Non-von Neumann architectures, which allow computing without programming, could give the greatest efficiency for certain classes of problems. Biomimetic approaches to increasing machine intelligence will likely require device technologies with much greater fanout and architectures with features far different from those employed in traditional von Neumann machines. For example, we have already begun research employing phase-change-memory crossbar arrays to mimic neuronal synapses [35, 36]. Systems employing quantum computation are an example of a radical change in the entire computing-system architecture. In quantum computation, the quantum bits, or qubits, hold information and perform
computation through the interactions of the qubits. One promising approach to the implementation of qubits is based on superconducting quantum interference devices [37], which must operate at cryogenic temperatures in order to reduce thermal noise and preserve the quantum information. In such a quantum computing system, one sets the initial state of the qubits, allows time for the system of qubits to interact, and then reads the answer from the final state of the qubits; all of the qubits interact with one another to determine the final answer. Standard von Neumann machines will likely always complement non-von Neumann computation approaches, requiring new hybrid system architectures for practical system implementation.
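
For context, the asymptotic speedups behind the algorithms cited above [25, 26] are standard textbook results, not findings of this study; constants and error terms are omitted:

```latex
% Grover's algorithm [26]: searching an unstructured collection of N items needs
% O(\sqrt{N}) quantum oracle queries, versus \Theta(N) queries classically.
T_{\mathrm{Grover}}(N) = O\!\left(\sqrt{N}\right), \qquad T_{\mathrm{classical}}(N) = \Theta(N)

% Shor's algorithm [25]: an n-bit integer can be factored in time polynomial in n
% (roughly n^3 with standard arithmetic), whereas the best known classical method,
% the general number field sieve, is sub-exponential in n.
T_{\mathrm{Shor}}(n) = O\!\left(n^{3}\right), \qquad
T_{\mathrm{GNFS}}(n) = \exp\!\left( O\!\left( n^{1/3} (\log n)^{2/3} \right) \right)
```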

Learning systems

Given the advances in machine learning, data analytics, and the incorporation of crowd sourcing, as well as the innovations in architecture and device technology described in the prior sections, we now describe what the capabilities of future learning systems might be. A high-level architectural flowchart of a learning system is given in Figure 5. Training and learning engines receive input from the Internet, from environmental sensors, and from society, institutions, and archives. They build models, find correlations among the inputs, and thereby derive insights from the torrent of data entering the machine. Hypothesis engines then attempt to understand the origin of these correlations by creating conjectures to explain them. Verification engines are then used to find valid
conjectures by gathering more information and by running simulations relevant to the domain of application in order to select the best hypotheses. In some cases, the training and learning engines will need to be modified through "active learning," i.e., a process in which experts update the algorithms for increased accuracy. Policy engines constrain the set of hypotheses that are considered by imposing business, legal, and ethical rules. The outcome engine implements only tested hypotheses, suggesting best actions to human operators; in cases requiring a fast response, the outcome engine can act directly. The system then tracks those actions, collecting data to ensure that they achieve the desired outcome.
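
To make the engine flow just described (and sketched in Figure 5) more concrete, here is a minimal, hypothetical sketch of the pipeline, using a toy transaction-scoring domain. All names, thresholds, and rules below are invented for illustration and are not part of the GTO study.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Set

@dataclass
class Transaction:
    customer: str
    merchant: str
    amount: float

# --- Training/learning engine: build a simple per-customer spending baseline. ---
def learn_baseline(history: List[Transaction]) -> Dict[str, float]:
    amounts: Dict[str, List[float]] = {}
    for t in history:
        amounts.setdefault(t.customer, []).append(t.amount)
    return {c: mean(a) for c, a in amounts.items()}

# --- Hypothesis engine: conjecture a risk score from deviation against baseline. ---
def hypothesize_risk(t: Transaction, baseline: Dict[str, float]) -> float:
    typical = baseline.get(t.customer, t.amount)
    return min(abs(t.amount - typical) / (typical + 1.0), 1.0)

# --- Verification engine: gather further evidence (here, a second toy signal). ---
def verify(t: Transaction, risk: float, flagged_merchants: Set[str]) -> float:
    return min(risk + (0.3 if t.merchant in flagged_merchants else 0.0), 1.0)

# --- Policy engine: business/legal/ethical rules adjust the decision threshold. ---
def policy_threshold(t: Transaction, key_customers: Set[str]) -> float:
    return 0.95 if t.customer in key_customers else 0.7

# --- Outcome engine: act (allow or block); the action would then be tracked. ---
def decide(t: Transaction, baseline: Dict[str, float],
           flagged: Set[str], key_customers: Set[str]) -> str:
    risk = verify(t, hypothesize_risk(t, baseline), flagged)
    return "block" if risk > policy_threshold(t, key_customers) else "allow"

if __name__ == "__main__":
    history = [Transaction("alice", "grocer", 40.0), Transaction("alice", "grocer", 55.0)]
    baseline = learn_baseline(history)
    print(decide(Transaction("alice", "casino", 900.0), baseline, {"casino"}, set()))
```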

System opportunity matrix

In Figure 6, the areas of application of these new systems are presented in a system opportunity matrix. The vertical axis is a measure of increasing task complexity; the more complex tasks are at the top of the chart. The horizontal axis represents increasing speed of the system (or decreasing latency of response), with the tasks to the right requiring the most rapid system response. Slow tasks of low complexity are in the lower left quadrant; these include technologically aided human navigation (e.g., GPS) and other tasks that require relatively simple or no information aids for a human to accomplish. This quadrant would include commodity information appliances that generate or consume information from larger servers or learning systems.

Figure 6. Opportunity matrix separating categories of learning-system applications by complexity (increasing along the vertical axis) and required response speed (increasing to the right).

The upper left quadrant contains very complex tasks such as product planning and launch, revenue maximization for large retail businesses, and medical diagnosis and treatment. These tasks are currently accomplished by teams of specialists utilizing tailored applications and equipment to mine data in an attempt to optimize business processes and medical outcomes. Learning systems would greatly augment these tasks, as we shall describe in the detailed scenarios below.

The upper right quadrant represents tasks that are both complex and need to be executed with low latency (or high speed). Security and first-responder applications exemplify this quadrant: sectors of the economy characterized by decisions that must be made quickly, given imperfect information. In these applications, humans are typically unable to gather and process a sufficient volume of information quickly enough to arrive at optimal options for action. Learning systems would help collect and digest information, presenting better choices for better outcomes.

In the lower right quadrant are tasks such as fraud prevention, where response speed is critical. Here, a human could sift through data and arrive at an understanding of what decision to make, but the process would simply take too much time to perform the desired task with the intended outcome.

A more detailed description of potential scenarios for learning systems applied to four of the above tasks will help illustrate the power of the new systems that we are proposing.

Figure 7. Learning system flowchart describing features required for a revenue maximization scenario.

Financial: Fraud prevention

A typical medium-size bank processes approximately one billion transactions per year. Approximately 1 in 5,000 transactions is fraudulent, accounting for losses of as much as $100 million per year for such a bank. Applications to find and prevent fraud exist today [38], but they are based on static machine-learning and expert-system approaches with infrequent updates. Unfortunately, static approaches are unable to match the ingenuity of swindlers, who typically change their modus operandi rather quickly once they realize they have been discovered. Fraud prevention is thus an ideal application for learning systems. Training and learning engines would monitor automatic teller machine (ATM), retail store, gas station, online, and other transactions, learning normal and fraudulent or suspect transaction behavior. Hypothesis engines would dynamically assign risk scores to each customer, business, and transaction; they could also posit patterns of collusion among businesses and verify them through further transaction observations. The score assigned to a transaction would have a threshold dynamically modified by the policy engine, which contains business, legal, and ethical rules concerning types of transactions. For example, the transaction of a key customer checking into a hotel should not be denied even if it appears abnormal, because the card issuer might lose a very valuable client. However, transactions at other businesses
might have a much lower fraud threshold because of a past history of fraud. The outcome engine must then allow or block the transaction within a few milliseconds. If the transaction is blocked, a device with a camera, such as an ATM, could acquire an image or video of the person requesting the transaction, aiding in the eventual apprehension of criminals.

Retail: Revenue maximization

Consider another example of wide interest but great complexity: retail store revenue maximization. Every store manager has one principal goal: given the size of the store, how to maximize revenue. This is a seemingly simple question that admits no trivial answer for most businesses. More stores are adopting digital pricing, allowing prices to change with store promotions. The collocation of certain products, as well as the general flow of categories of products through the store, can favorably influence consumer purchasing patterns. Security cameras record activities for review in the event of illegal behavior. Point-of-sale devices record purchases and are already connected to systems that monitor inventory and even initiate ordering and track supply-chain issues. Store ambiance, such as music or lighting, and staff location can also influence sales. Given the staggering complexity of these interrelated issues, store managers need new tools to help optimize revenue.

A learning system (see Figure 7) would be a partner in revenue maximization and could use varied inputs, such as point-of-sale terminals and even store security cameras, to collect data. Learning engines would detect correlations of purchasing behavior with the time of day or year, the age and number of shoppers in a group, the effect of staff availability and location on the volume of purchases, and the effect of ambiance on customer behavior. Hypothesis engines could propose actions to increase revenue by collocating products, offering targeted discounts based on buying correlations, or changing ambiance or staff location. These hypotheses could be validated against models of consumer behavior, economic models of price elasticity, and known cultural or geographical preferences (e.g., in ambiance), and more data could be gathered to choose the hypothesis most likely to succeed. Finally, validated hypotheses could be put into action with store manager approval, and impact data could be collected and tabulated to confirm the validity of the chosen action. Management could move from targeted implementation of actions in a few stores to deployment across the franchise as the validity of the changes was demonstrated.

Security

Learning systems could also significantly enhance the security of public places, buildings, parking lots, and other poorly monitored locations. The system would have the ability to detect anomalous events and differentiate between normal and abnormal human behavior, as well as identify known troublemakers. As an example, it could detect the difference between a person who is searching for his vehicle in a parking lot and one who is loitering and looking for an opportunity to steal a car. It would form a hypothesis based on behavioral observation, further attempt to ascertain vital information such as elevated pulse rate and body temperature, and then send an alarm to security forces to investigate the incident. This capability of distinguishing anomalous human behavior from normal behavior could be particularly useful in situations where an enormous amount of streaming video from banks of cameras would typically overwhelm the few humans positioned to examine the data.

Medical: Wellness delivery

Wellness delivery could also benefit from advanced learning systems, for no patient is "average." Local information appliances, eventually containing some degree of learning capability, would provide secure telemetry of patient vital signs to remote monitoring centers, where systems would learn and track each patient's condition. The system would have access to all patient records and tests, could suggest further tests based on best practices or changes in patient condition, and could even suggest the
dispatch of emergency services in critical cases. If a patient's condition deteriorated, the system could prompt the local information appliance to send more detailed information to aid in the decision to dispatch special caregivers. The power of such a system would lie not only in its ability to learn and monitor conditions specific to each patient but also in its ability to correlate the efficacy of treatment approaches across a population of anonymized patients with similar issues. The result would be better outcomes for individuals and for society, as best-practice treatment regimens often reduce cost and speed recuperation.
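
As a minimal, illustrative sketch of that last point (correlating the efficacy of treatment approaches across anonymized patients with similar issues), consider the following; the records, fields, and values are invented for illustration and are not clinical data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical anonymized records: (condition, treatment, good_outcome).
Record = Tuple[str, str, bool]

def efficacy_by_treatment(records: List[Record], condition: str) -> Dict[str, float]:
    """Share of good outcomes per treatment among patients with the same condition."""
    counts = defaultdict(lambda: [0, 0])          # treatment -> [good, total]
    for cond, treatment, good in records:
        if cond == condition:
            counts[treatment][0] += int(good)
            counts[treatment][1] += 1
    return {t: good / total for t, (good, total) in counts.items()}

if __name__ == "__main__":
    records: List[Record] = [
        ("hypertension", "drug_a", True),
        ("hypertension", "drug_a", True),
        ("hypertension", "drug_b", False),
        ("hypertension", "drug_b", True),
        ("diabetes", "drug_c", True),
    ]
    # A learning system would compute this continuously over a far larger corpus
    # and feed the best-supported regimen back to caregivers as a suggestion.
    print(efficacy_by_treatment(records, "hypertension"))   # {'drug_a': 1.0, 'drug_b': 0.5}
```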

Conclusion

It is not too farfetched to think that learning systems will become our partners in expanding the horizon of human cognition. This partnership, which is anchored in the belief that computing ought to do much more for us and that it should do so on our own terms while remaining centered on our most pressing needs, is the new frontier of IT.

Acknowledgments

The authors would like to thank D. Ferrucci, D. Gondek, J. Kosloski, F. Gebara, N. Donofrio, I. Wladawsky-Berger, M. Fleming, B. Dietrich, P. Pattnaik, J. Kelly III, M. Dean, A. Lewis, L. Haas, P. Varker, A. Paradkar, V. Salapura, T. C. Chen, and T. Theis.

*Trademark, service mark, or registered trademark of International Business Machines Corporation in the United States, other countries, or both.

**Trademark, service mark, or registered trademark of Jeopardy Productions, Inc., The Open Group, Netflix, Inc., Amazon.com, Inc., Yelp!, Inc., Sony Corporation, Apple, Inc., or Sun Microsystems, Inc., in the United States, other countries, or both.

References

1. History of Computing. [Online]. Available: http://en.wikipedia.org/wiki/History_of_computing
2. R. Kurzweil, The Singularity Is Near. New York: Viking Press, 2005.
3. Computer History Museum. [Online]. Available: http://www.computerhistory.org
4. The History of the Integrated Circuit. [Online]. Available: http://nobelprize.org/educational/physics/integrated_circuit/history/index.html
5. D. Watson and D. Scheidt, "Autonomous systems," Johns Hopkins APL Tech. Dig., vol. 26, no. 4, pp. 368–376, 2005.
6. R. Hayward, "The tortoise and the love machine: Grey Walter and the politics of electro-encephalography," Sci. Context, vol. 14, no. 4, pp. 615–642, 2001.
7. M. J. C. Gordon and C. Strachey, "Recollections of his influence," Higher-Order Symbolic Computat., vol. 13, no. 1/2, pp. 65–67, Apr. 2000.
8. A Brief History of Computing. [Online]. Available: http://www.alanturing.net
9. P. McCorduck, Machines Who Think, 2nd ed. Natick, MA: A. K. Peters, 2004.
10. J. Weizenbaum, Computer Power and Human Reason: From Judgment to Calculation. New York: W. H. Freeman and Company, 1976.
11. J. Lighthill, "Artificial intelligence: A general survey," in Artificial Intelligence: A Paper Symposium, Science Research Council, 1974, pp. 317–322.
12. F. Jelinek, Statistical Methods for Speech Recognition. Cambridge, MA: MIT Press, Jan. 1998.
13. A. M. Turing, "Computing machinery and intelligence," Mind, vol. 59, pp. 433–460, 1950.
14. K. Appel and W. Haken, "The solution of the four-color-map problem," Sci. Amer., vol. 237, pp. 108–121, 1977.
15. A. Kele, A. Kele, and U. Yavuz, "Expert system based on neuro-fuzzy rules for diagnosis breast cancer," Expert Syst. Appl., vol. 38, no. 5, pp. 5719–5726, May 2011.
16. P. W. Tse, E. Y. Li, J. C. Chan, and J. T. Leung, "Automatic generator health assessment system that embedded with advanced fault diagnosis and expert system," in Proc. PHM Conf., 2010, pp. 1–7.
17. J. Bigham, "Knowledge-based expert systems in industry," Electron. Power, vol. 33, no. 11/12, p. 747, Nov./Dec. 2010. [Online]. Available: http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5188571
18. J. Mortera, A. P. Dawid, and S. L. Lauritzen, "Probabilistic expert systems for DNA mixture profiling," Theor. Population Biol., vol. 63, no. 3, pp. 191–205, May 2003.
19. B. Baharudin, L. H. Lee, and K. Khan, "A review of machine learning algorithms for text-documents classification," J. Adv. Inf. Technol., vol. 1, no. 1, pp. 4–20, Feb. 2010.
20. B. Wilamowski, "Efficient neural network architectures and advanced training algorithms," Inf. Technol., no. 8, pp. 345–352, 2010. [Online]. Available: http://www.eng.auburn.edu/~wilambm/pap/2010/Efficient%20Neural%20Network%20Architectures%20and%20Advanced%20Training%20Algorithms%20_%20Gdansk.pdf
21. D. Anand and K. K. Bharadwaj, "Adaptive user similarity measures for recommender systems: A genetic programming approach," in Proc. 3rd IEEE ICCSIT, 2010, pp. 121–125.
22. W. N. Greene, Y. Zhang, T. T. Lu, and T.-H. Chao, "Feature extraction and selection strategies for automated target recognition," in Proc. SPIE Independent Component Analyses, Wavelets, Neural Networks, Biosystems, and Nanoengineering VIII, 2010, vol. 7703, p. 77030B, DOI: 10.1117/12.848007.
23. D. Ferrucci, E. Brown, J. Chu-Carroll, J. Fan, D. Gondek, A. A. Kalyanpur, A. Lally, J. W. Murdock, E. Nyberg, J. Prager, N. Schlaefer, and C. Welty, "Building Watson: An overview of the DeepQA project," Artif. Intell. Mag., vol. 31, pp. 59–79, 2010.
24. G. E. Moore, "Cramming more components onto integrated circuits," Electron. Mag., vol. 38, no. 8, pp. 114–117, Apr. 1965.
25. P. W. Shor, "Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer," SIAM J. Comput., vol. 26, no. 5, pp. 1484–1509, Oct. 1997.
26. L. K. Grover, "A fast quantum mechanical algorithm for database search," in Proc. 28th Annu. ACM STOC, New York, 1996, pp. 212–219, DOI: 10.1145/237814.237866.
27. K. E. Moselund, H. Ghoneim, M. T. Björk, H. Schmid, S. Karg, E. Lortscher, W. Riess, and H. Riel, "Comparison of VLS grown Si NW tunnel FETs with different gate stacks," in Proc. ESSDERC, 2009, pp. 448–451.
28. D. C. Worledge, G. Hu, P. L. Trouilloud, D. W. Abraham, S. Brown, M. C. Gaidis, J. Nowak, E. J. O'Sullivan, R. P. Robertazzi, J. Z. Sun, and W. J. Gallagher, "Switching distributions and write reliability of perpendicular spin torque MRAM," in IEDM Tech. Dig., 2010, pp. 296–299.
29. P. W. Coteus, J. U. Knickerbocker, C. H. Lam, and Y. A. Vlasov, "Technologies for Exascale Systems," IBM J. Res. & Dev., vol. 55, no. 5, pp. 14:1–14:12, 2011 (Paper 14).
30. J. U. Knickerbocker, C. S. Patel, P. S. Andry, C. K. Tsang, L. P. Buchwalter, E. J. Sprogis, R. R. Horton, R. J. Polastre, S. L. Wright, and J. M. Cotte, "3D silicon integration and silicon packaging technology using silicon through-vias," IEEE J. Solid-State Circuits, vol. 41, no. 8, pp. 1718–1725, Aug. 2006.
31. G. W. Burr, M. J. Breitwisch, M. Franceschini, D. Garetto, K. Gopalakrishnan, B. Jackson, B. Kurdi, C. Lam, L. A. Lastras, A. Padilla, B. Rajendran, S. Raoux, and R. Shenoy, "Phase change memory technology," J. Vac. Sci. Technol. B, vol. 28, no. 2, pp. 223–262, 2010.
32. G. W. Burr, B. Kurdi, J. C. Scott, C. H. Lam, K. Gopalakrishnan, and R. Shenoy, "An overview of candidate device technologies for storage-class memory," IBM J. Res. & Dev., vol. 52, no. 4, pp. 449–464, Jul. 2008.
33. W. M. J. Green, M. J. Rooks, L. Sekaric, and Y. A. Vlasov, "Ultra-compact, low RF power, 10 Gb/s silicon Mach-Zehnder modulator," Opt. Exp., vol. 15, no. 25, pp. 17106–17113, Dec. 2007, DOI: 10.1364/OE.15.017106.
34. J. Van Campenhout, W. Green, S. Assefa, and Y. A. Vlasov, "Low-power, 2 × 2 silicon electro-optic switch with 110-nm bandwidth for broadband reconfigurable optical networks," Opt. Exp., vol. 17, no. 6, pp. 24020–24029, Dec. 2009.
35. H. Choi, H. Jung, J. Lee, J. Yoon, J. Park, D. Seong, M. Hasan, G.-Y. Jung, and H. Hwang, "An electrically modifiable synapse array of resistive switching memory," Nanotechnology, vol. 20, no. 34, p. 345201, Aug. 2009, DOI: 10.1088/0957-4484/20/34/345201.
36. J. Borghetti, Z. Li, J. Straznicky, X. Li, D. A. A. Ohlberg, W. Wu, D. R. Stuart, and R. S. Williams, "A hybrid nanomemristor/transistor logic circuit capable of self-programming," Proc. Nat. Acad. Sci., vol. 106, no. 6, pp. 1699–1703, 2009.
37. M. Steffen, D. P. DiVincenzo, J. M. Chow, T. N. Theis, and M. B. Ketchen, "Quantum computing: An IBM perspective," IBM J. Res. & Dev., vol. 55, no. 5, pp. 13:1–13:11, 2011 (Paper 13).
38. R. J. Bolton and D. J. Hand, "Statistical fraud detection: A review," Statist. Sci., vol. 17, no. 3, pp. 235–255, Aug. 2002.

Received May 16, 2011; accepted for publication June 22, 2011

Donna Eng Dillenberger IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY 10598 USA ([email protected]). Ms. Dillenberger joined IBM in 1988 and has worked on future hardware simulations, mainframe operating systems, workload management, machine learning algorithms, scalable compilers, web servers, video servers, and Java** containers. She is a Distinguished Engineer, a Master Inventor, a member of the IBM Academy, and an Adjunct Professor at Columbia University, Graduate School of Engineering. She is currently working on systemic risk, hybrid servers, and cyber vaults.

Dario Gil IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY 10598 USA ([email protected]). Dr. Gil is a Research Staff Member and Program Director in the Industry Solutions and Emerging Business Department at the T. J. Watson Research Center. He received the Ph.D. degree in electrical engineering from the Massachusetts Institute of Technology, Cambridge, in 2003. A technical expert in the field of lithography and nanofabrication, he was a leader in the team that built the world's first microprocessor with immersion lithography in 2004, an achievement that was recognized with an IBM Outstanding Technical Achievement Award. Dr. Gil's research results have appeared in more than 20 international journals and conferences, and he has numerous patents in the field of lithography and nanofabrication. Prior to his current position as the leader of IBM's worldwide Energy Research efforts, Dr. Gil was a Program Manager in the office of the Vice President of Science and Technology in IBM Research, where he was responsible for working with executives to develop the Science and Technology Strategy for IBM's research laboratories. Dr. Gil is a member of the IBM Academy of Technology.

Satya V. Nitta IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY 10598 USA ([email protected]). Dr. Nitta is a Research Staff Member and a program manager in the office of the Vice President of Science and Technology at the IBM T. J. Watson Research Center, Yorktown Heights, NY. Previously, he was the manager of the Advanced Interconnect Technology group at the IBM T. J. Watson Research Center. Dr. Nitta
received a Ph.D. degree in chemical engineering from Rensselaer Polytechnic Institute, Troy, NY, in 1998 and has been at IBM since 1999, during which time he has served as a technical leader of teams that have developed several new aspects of on-chip interconnect technology. He was named the IEEE Spectrum "Innovator of the Year" in 2008 and also won an IEEE Spectrum Ace Award for "Technology of the Year" in 2008. He has also been recognized as a Master Inventor at IBM Research. Dr. Nitta serves as one of IBM's technical liaisons to Rensselaer Polytechnic Institute, where he is a member of the Industrial Advisory Boards for the Rensselaer Nanotechnology Center and the Center for Future Energy Systems. He has authored or coauthored over twenty-five publications and one book chapter and currently holds more than 35 U.S. patents.

Mark B. Ritter IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY 10598 USA ([email protected]). Dr. Ritter is a senior manager in the Physical Sciences department and manages groups working on materials, analytics, and the science of information processing. He received the B.S. degree in physics from Montana State University, Bozeman, in 1981 and the M.S., M.Phil., and Ph.D. degrees from Yale University, New Haven, CT, in 1987. In his work at IBM, he has contributed to Fibre Channel, 10-Gb/s Ethernet, and other high-speed communication products and standards, as well as to efficient, physics-based models of vias for interconnections. Dr. Ritter is a member of the IBM Academy of Technology and was the recipient of the 1982 American Physical Society Apker Award, three IBM Outstanding Innovation Awards, and several Research Division and Technical Group Awards. He is author or coauthor of numerous technical publications and holds 22 U.S. patents.
