Proceedings of the National Conference on Current Trends in Computing (NCCTC), 23rd-24th March, 2014, Baripada

Editor-in-Chief Prof. (Dr.) HIMA BINDU MARINGANTI Professor and Head, Dept. of Computer Applications North Orissa University Baripada-757003, Mayurbhanj, Odisha, India Email: [email protected]

Organized by:

Department of Computer Applications North Orissa University Baripada-757003, Mayurbhanj, Odisha, India ISBN: 978-3-642-24819-6

About :: NCCTC
The National Conference on Current Trends in Computing provides an excellent forum for sharing knowledge and research on current and recent trends in computing and on multi-disciplinary applications of Computer Science, and for presenting theories and results in theoretical models and applications. Eminent scientists and researchers all over the world and in the country are working in various domains of computing, including Super Computing, NLP, Soft Computing, Genome Computing, Artificial Intelligence, Cloud Computing, Parallel Computing, Chip Computing, Wireless Sensor Computing and Brain Computing, to name a few. The present conference on "Current Trends in Computing" is a forum for researchers and scholars working in the domains mentioned above and in research directed towards the integration of multiple disciplines, so that the amalgamation is put to its best use for the benefit of society at large and human development in particular. It also provides opportunities for delegates to exchange new ideas and application experiences face to face, to establish business or research relations, and to find partners for future collaborative work. Authors are solicited to contribute to this national conference by submitting articles that illustrate research results, projects, theoretical models, survey reports and practical experiences describing promising and significant advances. Topics of interest for submission include, but are not limited to:

• Algorithms and Bioinformatics
• Networking and Communications
• Database and Data Mining
• DSP / Image Processing / Pattern Recognition / Multimedia
• Embedded Systems and Software
• Grid and Scalable Computing
• Mobile and Ubiquitous Computing
• Modeling and Simulation
• Soft Computing (AI, Neural Networks, Fuzzy Systems, etc.)
• Web and Internet Computing
• Theoretical Computer Science
• Natural Language Processing
• Information Security
• Robotics and Automation
• Genome Computing
• Cloud Computing
• Parallel Computing
• Chip Computing
• Wireless Sensor Computing
• Brain Computing

Organizing Committee

Chief Patron: Prof. Sanghamitra Mohanty, Vice-Chancellor, NOU
Patron: Mr. Madhusudan Sahoo, Registrar, NOU

Invited Speakers:
1. Prof. Ganapati Panda, IIT Bhubaneswar
2. Prof. Anupam Basu, IIT Kharagpur
3. Prof. Ashok Dutta, ISI, Kolkata
4. Prof. Ajit Pal, IIT Kharagpur
5. Prof. Shivaji Bandopadhyaya, Jadavpur University, Kolkata
6. Dr. Rabi Narayan Behera, NIC, Bhubaneswar
7. Prof. Amiya Kumar Rath, DRIEMS
8. Dr. Anirban Dutta

Advisory Committee:
1. Prof. N.N. Das, Chairman, P.G. Council
2. Prof. B. Sahu, Director, SFC
3. Prof. J. Lenka, HOD, Department of Economics
4. Prof. Hima Bindu M., HOD, Department of Computer Applications
5. Mr. Madhusudan Sahoo, Registrar
6. Sri S.N. Soren, Controller of Finance

Local Organizing Committee:
1. Prof. Hima Bindu Maringanti, Convener
2. Dr. J.K. Mantri, Co-Convener & Hospitality
3. Mr. S.S. Basa, Travel and Accommodation
4. Mr. P.S. Mishra, Publication
5. Mrs. Sanjukta Mohanty, Registration
6. Miss Meenakhi Sahu, Event Management
7. Miss Gayatri Dey, Hospitality
8. Mr. Ambarish Parida, Publication
9. Mr. Biswamitra Rana, Event Management

Editorial
The 21st century has seen a variety of applications of Computer Science, growing multi-dimensionally and penetrating every field: not only science, but also the arts, medicine and philosophy. The Turing machine, conceived roughly eight decades ago, had the then scarcely believable objective of computing virtually everything. Now that concept has truly been realized, and the present generation is able to see and feel it working. Artificially intelligent systems were then theories and theoretical models; now talking, diagnosing, speaking, understanding, monitoring and assisting systems are really possible. Computer algorithms that solve real-world problems, and in real time, are the challenges of today. Though computer engineering deals with the design of computers and computer systems, it is not limited to that: it may involve the design of new hardware, the design of PDAs and supercomputers, or the use of computers to control an industrial plant, and it spans various domains of computing, including Super Computing, Natural Language Processing, Soft Computing, Genome Computing, Artificial Intelligence, Cloud Computing, Parallel Computing, Chip Computing, Wireless Sensor Computing and Brain Computing, to name a few. The present conference on "Current Trends in Computing" is a forum for researchers and scholars working in the domains mentioned above and in research directed towards the integration of multiple disciplines, so that the amalgamation is put to its best use for the benefit of society at large and human development in particular. Desktop computers represent a tiny fraction of the devices a computer engineer might work on, as computer-like architectures are now found in a range of devices including

video game consoles and DVD players, washing machines, ePets, e-assistants and more. Even a fraction of active participation deeply influences the success of this national event, and I must acknowledge your response to this conference. This conference is only a small step towards a network of knowledge and relationships. I wish every success to the paper presenters, and I congratulate the participants on being selected for this conference. I extend my heartfelt thanks to the members of faculty from different institutions, research scholars, delegates, members of the NOU family, members of the technical and organizing committees, and all the student volunteers involved in the various committees. Above all, my salutations are due to the Almighty.

Editor-in-Chief
Prof. (Dr.) Hima Bindu Maringanti
Professor and Head, Department of Computer Applications
North Orissa University, Baripada, Odisha
Email: [email protected]

Associate Editors
• Mr. Partha Sarathi Mishra
• Ms. Gayatri Dey

TABLE OF CONTENTS

Editor-in-Chief
• Prof. (Dr.) Hima Bindu Maringanti

Keynote Speakers
I. Introduction to Cloud Computing
   Dr. Amiya Kumar Rath, Principal, DRIEMS
II. Recent Trends in Bio-inspired Computing
   Prof. G. Panda
III. Security in Cyber Space
   Dr. R.N. Behera
IV. In The Offing
   Asoke Kumar Datta, BOM, Kolkata
V. Wireless Sensor Networks
   Prof. Ajit K. Pal, IIT, Kharagpur

Papers
1. Rough Set and Current Trends in Computing (pp. 01-09)
   Nilaratna Kalia, J.K. Mantri, Debadutta Mohanty
2. Cyber Crime Assessments (pp. 10-14)
   Yerra Shankar Rao
3. Survey on Security Issues of Wireless Sensor Networks (pp. 15-24)
   K. Lakshmanarao, Hima Bindu Maringanti
4. Machine Learning Approaches for Solving Computationally Intensive Problems (pp. 25-35)
   Partha Sarathi Mishra
5. A Backoff Analysis to Mitigate the Correlated Contention for Event-Driven Wireless Sensor Networks (pp. 36-44)
   Pushpa Mamoria, Rajeev Shakya, Hima Bindu Maringanti
6. Hybrid Approach for Odia Handwritten Character Recognition (pp. 45-51)
   Ambarish Parida, Hima Bindu Maringanti
7. Santali Morphological Analysis (pp. 52-59)
   Chitralekha Marandi, Hima Bindu Maringanti
8. Opinion Mining of Movie Reviews Using Term Frequencies (pp. 60-66)
   Rakesh Ch. Balabantaray, Oaindrila Das
9. Simulation of Behavior of Neurotransmitters (pp. 67-72)
   Bishwamitra Rana, Hima Bindu Maringanti
10. Thematic Analysis of Odia Language (pp. 73-81)
   Gayatri Dey, Hima Bindu Maringanti
11. Multi-Modal Perception Model of Sentence Comprehension (pp. 82-87)
   Meenakhi Sahu, Hima Bindu Maringanti
12. Affordable Interactive Multitouch Whiteboard Technology for Smart Classroom (pp. 88-94)
   Manas Kumar Rath, Prasanta Kumar Swain
13. Genome Computing (pp. 95-102)
   Bibhuti P. Barik, Priyanka Mishra
14. Redundancy Control in Wireless Sensor Network Using Rahtap Algorithm (pp. 103-112)
   Alina Dash
15. A Novel DNA Cryptography Model for Wireless Sensor Network (pp. 113-120)
   Utpal Chandra De, Debashis Mishra, Prasanta Kumar Swain, Jogeswar Mohanta
16. Fuzzy Impulse Noise Detection in Color Image (pp. 121-128)
   Isita Bose, Debashis Mishra, Utpal Chandra De, Amitava Sen
17. Object-Oriented Challenges for Testing Process (pp. 129-137)
   Santi Swarup Basa
18. Test Data Generation for Path Coverage Based Testing Using Differential Evolution Optimization Algorithm (pp. 138-147)
   Madhumita Panda, Partha Pratima Sarangi, Srinivas Pattnaik

Introduction to Cloud Computing
Dr. Amiya Kumar Rath, Principal, DRIEMS

Cloud computing is a subscription-based service through which one can obtain networked storage space and computer hardware and software resources. One way to think of cloud computing is to consider one's experience with email: the mail lives on remote servers rather than on one's own machine. Cloud computing is best regarded as a computing model rather than as a single technology: it is the use of computing resources (hardware and software) delivered as a service over a network. Cloud computing is used for business improvement; it boosts output while keeping costs down, streamlines operations, and reduces cost and strain on resources. Through cloud computing, applications can be made available anywhere in the globe on any computer. To improve operations and productivity, business systems nowadays employ cloud computing for many different purposes.

Types of Clouds
There are different types of clouds that one can subscribe to, depending on one's needs. A home user or small business owner will most likely use public cloud services.
1. Public Cloud
2. Private Cloud
3. Community Cloud
4. Hybrid Cloud

Choosing a Cloud Provider
Each provider serves a specific function, giving users more or less control over their cloud depending on the type. When choosing a provider, compare your needs to the cloud services available; your cloud needs will vary depending on how you intend to use the space and resources associated with the cloud. Providers are commonly grouped by service model:

1. Software as a Service (SaaS)
2. Platform as a Service (PaaS)
3. Infrastructure as a Service (IaaS)

Security
The information stored in the cloud is often seen as valuable to individuals with malicious intent. People store a great deal of personal information and potentially sensitive data on their computers, and this information is now being transferred to the cloud. Companies remain reluctant, for a variety of reasons, to trust the cloud with their mission-critical applications. Among those reasons are: data privacy, security, licensing, applications, interoperability, compliance, SLAs and network monitoring.

Talk 1: Recent Trends in Advanced Computing
Prof. G. Panda, Former Deputy Director and Professor, School of Electrical Sciences, IIT Bhubaneswar
The distinguished lecture will give an overview of the advances in intelligent computing (soft computing and evolutionary or bio-inspired computing), distributed computing, robust computing, mobile computing, cloud computing, grid computing and ubiquitous computing. The lecture will include details of the Genetic Algorithm and Differential Evolution. In addition, the talk will cover interesting application areas of these computing techniques.

Talk 2: Bio-inspired Computing: Theory and Applications
This talk will cover bio-inspired techniques such as Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO) and Bacterial Foraging Optimization (BFO). The talk will be supported by a number of real-life examples from various disciplines.

Security in Cyber Space
Dr. R.N. Behera, Sr. Technical Director, National Informatics Centre
E-Mail: [email protected]

Information and Communication Technologies (ICTs) have transformed modern lifestyles. They have provided us with real-time communications, borderless and almost unlimited access to information, and a wide range of innovative services. The key is to use technology, economic growth, education, health services and government programs to enhance the socio-economic condition of citizens. Technology, I believe, can play an important role in this process. ICT brings about openness, accessibility, connectivity, democratization and decentralization, which ultimately result in social transformation. IT delivers rich information content to people everywhere and anywhere, irrespective of their economic background, caste, religion or race. At the same time, IT has also created new opportunities for exploitation and abuse, and cyber threats have become one of the biggest global issues of our time. Information is an asset which, like other important business assets, has value to an organization and consequently needs to be suitably protected. Information security protects information from a wide range of threats. Protecting confidential information is a business requirement, and in many cases also an ethical and legal requirement. Information security means protecting information and information systems from unauthorized access, use, disclosure, disruption, modification or destruction. The terms information security, computer security and information assurance are frequently, and incorrectly, used interchangeably. Governments, the military, corporations, financial institutions, hospitals and private businesses amass a great deal of confidential information about their employees, customers, products, research and financial status. Most of this information is now collected, processed and stored on electronic computers and transmitted across networks to other computers. Should confidential information about a business's customers, finances or new product line fall into the hands of a competitor, such a breach of security could lead to lost business, lawsuits or even bankruptcy of the business. Information security is therefore now a major concern at all levels of society.

Cyber security is used in different contexts to mean anything from information and computer security; security of the Internet infrastructure; security of anything connected to the Internet (including "essential services" such as electricity distribution and flight navigation); security of data, applications and communications; to the safety of Internet users (particularly children); and it frequently encompasses notions of "national" as well as "private" security. Indeed, there is no consensus on what the term means. In cyber security terms, however, transborder cybercrime may be easier to commit, and attacks can have far-reaching effects. Cyber terrorism in India is no longer a fictional idea but a reality: although there is no clear-cut definition of cyber terrorism, the use of ICT by terrorists is beyond any dispute or doubt. Cyber security standards are security standards that enable organizations to practice safe security techniques and so minimize the number of successful cyber security attacks. ISO 27001 is the international cyber security standard that provides a model for establishing, implementing, operating, monitoring, reviewing, maintaining and improving an Information Security Management System. Such guides provide general outlines as well as specific techniques for implementing cyber security. For certain specific standards, cyber security certification by an accredited body, e.g. the National Informatics Centre (NIC), can be obtained. Cyber security can be broadly classified into two areas:
1) Network Security: vulnerability assessment services, firewall services, SSL services, VPN (virtual private network) services, antivirus services, digital signature certificates.
2) Application Security: application security is also a major concern, since intruders enter through the application layer to mount cyber attacks on websites and applications. Government and NIC have a host of applications deployed in various user departments. These applications may involve the collection and storage of important data over the network. The possibility of an attack on specific software applications is usually overlooked, and these may be the first to be exploited by a hacker. Such applications should thus have built-in features that take care of aspects such as access control, data integrity, change controls, accountability, service continuity and the ability to trace unauthorized access to the application or the stored data. Cyber Security

audits of these applications can be carried out to determine such requirements. The Open Web Application Security Project (OWASP) is a project dedicated to sharing knowledge and developing open-source software that promotes understanding of web application security. The OWASP Top Ten represents a broad consensus on the most critical web application vulnerabilities, adopted in a collaborative effort by the US Federal Trade Commission, US DoD, VISA and other companies including Sprint and IBM, and it provides a minimum standard for web application security. The Top Ten are: unvalidated input, broken access control, broken authentication and session management, cross-site scripting (XSS) flaws, buffer overflows, injection flaws, improper error handling, insecure storage, denial of service, and insecure configuration management. Further, the Government of India is trying to curb cyber security attacks through legislation: the IT Act 2000, the IT (Amendment) Act 2008 and the National Cyber Security Policy, 2013.
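The injection-flaw entry in the list above can be made concrete with a short sketch. This is an illustrative example, not material from the talk: the `users` table, its rows and the attack payload are hypothetical, and only Python's built-in sqlite3 module is used.

```python
# Sketch of an SQL injection flaw and its standard fix: the same lookup
# written unsafely (string concatenation) and safely (parameterized query).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

user_input = "' OR '1'='1"  # a classic injection payload

# UNSAFE: the payload becomes part of the SQL text and matches every row.
unsafe = conn.execute(
    "SELECT name FROM users WHERE name = '" + user_input + "'"
).fetchall()

# SAFE: the driver binds the payload as a literal value; nothing matches.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(unsafe))  # 2 -- injection succeeded, all rows returned
print(len(safe))    # 0 -- payload treated as data, not SQL
```

The parameterized form is exactly the "validated input" discipline the OWASP list calls for: user data never reaches the SQL parser as code.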

Conclusion: There is an urgent need to strengthen the cyber security of India, and the Government of India has recently begun to take cyber security seriously. India needs a dedicated cyber security research and training centre at the national level. Efforts are on to train government officials holding crucial posts, law enforcement officials, judges, lawyers and others. The Government of India, NIC and DARPG have evolved the Indian Government web guidelines; these guidelines and cyber security standards are a new dimension in curbing cyber security attacks. There are 115 metrics to be implemented in all websites and web applications in India, and all websites and applications are to be cyber security audited and made compliant. There should be secure network communication during web updating and data transfer, a virus-free environment for all computers, and the latest operating system patches should be applied. A strong techno-legal solution is required from the Government of India, along with global co-operation to curb cross-border cyber threats.

IN THE OFFING
Asoke Kumar Datta, BOM, Kolkata

The standard procedure in AI is first to locate physical parameters in the signal emanating from external reality. These are presumably extracted by our external organs, analyzed by the sub-cortical neural processes, and then cognized and interpreted in the cortex. The problems lie at many levels:
• Locating the parameters responsible for a particular act of cognition
• The processing of the parameters
• Cognition of the processed streams
• Interpretation

Let me begin with the difficulties I faced in dealing with AI problems.
• The problems in the hard sciences are purely objective because they deal with parameters represented by numbers, so-called pointer readings, and are therefore discrete in nature.
• The problems in the soft sciences are subjective in nature. These are normally converted, through some non-deterministic processes, again into numbers to 'deal with'.
• This 'dealing with' is now digital computation.
• This is where my problems lie.
In aural cognition the sound signal is converted by the transducer in the ear (the cochlea) into some well-known basic units like F0 (pitch), amplitude (loudness) and spectral structure (timbre), and other lesser-known shape parameters like attack time, extrema rate, zero-crossing rate, etc. The subcortical pathways, including the medial geniculate body, process these basic components, possibly with some involvement of the cortex, and send them to specific innervation areas in the auditory cortex. For at least 70 years the cognition of the articulatory positions of phonemes has been firmly believed to depend on spectral structure.
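Two of the signal parameters named above, amplitude (as RMS) and zero-crossing rate, are simple enough to sketch directly; this is an illustration added by the editors, not material from the talk, and the sample rate and test frequency are arbitrary choices.

```python
# RMS amplitude and zero-crossing count for a synthetic 100 Hz sine
# sampled at 8 kHz for one second.
import math

SR = 8000          # sample rate (Hz), arbitrary illustrative value
F0 = 100           # fundamental frequency of the test tone (Hz)
signal = [math.sin(2 * math.pi * F0 * n / SR) for n in range(SR)]

# RMS amplitude: square root of the mean squared sample value.
rms = math.sqrt(sum(x * x for x in signal) / len(signal))

# Zero crossings: sign changes between consecutive samples.  A pure sine
# crosses zero twice per cycle, so roughly 2 * F0 crossings per second.
crossings = sum(
    1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0)
)

print(round(rms, 3))   # ~0.707 for a unit-amplitude sine
print(crossings)       # ~200 for a 100 Hz tone over one second
```

Estimating F0 itself is harder, which is part of the talk's point: these "basic units" are far from trivially extracted from real, aperiodic speech signals.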

It is reported that the excitation of some neurons in the cochlea produces patterns which may be called auditory spectra of the signal. These led to the traditional belief that the cognition processes are based on formant and anti-formant structures. Together with this, the development of the rigorous mathematical tool of the Fourier transform, and the corresponding computer algorithms used in machine recognition of phonemes, seemed to cement this golden concept. Over these seven decades numerous meticulous experiments have been conducted to establish that:
• Vowels are cognitively differentiated substantively on the basis of the first two formants.
• The cognition of the four places of articulation of plosives is borne by the dynamics of the spectral structure.
• The neurological processes involved in aural cognition are well researched; the existence of mechanisms for extracting and processing spectral-domain features like formants is well established.

CONTRARY COGNITIVE EVIDENCE began to appear in India (at ISI) in the early 1990s. It demonstrates that formant frequencies are neither necessary nor sufficient for the cognition of categories of vowels or plosives. The results seriously question the role of formant transitions in the cognition of these categories of phonemes.

YET THE AFORESAID COGNITION EXPERIMENTS SHOW THAT THERE IS SOMETHING BEYOND
• The existence of neurological complexes for extracting objective parameters does not guarantee their exclusive role in cognition.
• Experiments with illusory contours in the visual domain exhibit a similar paradox.

• In the physics of speech and sound, fundamental frequency is an extraneous thing: it is the number of impulses hitting the board in a second!
• Because of the significant random perturbations associated with the mucosal, semi-fluid nature of the vibrators, the sounds do not have a stable periodicity. The signal is produced by repeated interruptions. Yet the Fourier transform is applied with aplomb.

A NEW PARADIGM OF PHYSICS IS NEEDED TO DEAL WITH ORAL SIGNALS.  The enormous complexity of human cognition may be guessed from the fact that human brain consists of 1010 active individual analogue comparators with 1011 possible linkages between themselves working 24x7 hours a day for years together.  The addressing mechanism is associative, not sequential.  Soma is an analogue computer which computes ∑wi yi and fires when this exceeds a threshold value  There could be 100 billion connection between the neurons. They are in groups. Each group is a memory element. They are formed during learning period which generally extends up to 11 years, 24x7 hours a day, more than 2 billion seconds.  One needs to notice the enormous learning that goes on in the brain.

Cognition is an associative, distributive phenomenon in the brain. The memory elements fired during the learning stage are organized and distributed into different functional lobes of the brain. One particular assembly is the 'significance' of that particular input from reality. The 'significance', 'meaning' or 'cognition' of an external event is the associative pattern of the areas activated. Cognition may have three manifestations:
• ACTION
• EMOTION
• EVOKED STATE OF MIND WITHOUT MANIFESTATION

SUMMARY  Computer (understandably) cannot help in finding what physical features are used by the brain in processing electrical signals transuded by the external organs.  In AI we are trying to simulate analogue processing in brain by digital computation  As soon as we digitize real parametric space it becomes nonEuclidean  Brain work on associative addressing mode  Brain does not compute, it matches patterns: A COMPARATOR

NON-COMPUTATIONAL PROCESSING OF NON-DISCRETE FUNCTIONS (A New Paradigm)

• SIMILARITY for defining basic units of relevance (scaling, normalization: analogue operations): symbol formation
• ASSOCIATION of units (symbols) to define intermediate concept units (similarity of clusters)
• Memorization using DISTRIBUTIVE logic
• ASSOCIATIVE ADDRESSING: matching using a neighborhood concept

WIRELESS SENSOR NETWORKS
Prof. Ajit K. Pal, IIT Kharagpur

Wireless sensor networks are gaining immense importance in today's world owing to the numerous applications in which they can be used. These applications range from military surveillance and environment monitoring to inventory management, habitat monitoring and health monitoring. Almost all these applications need the sensor network to stay alive for months or even years, so the main objective when sensor nodes are deployed is to maximize the life of the sensor network, so that it lasts as long as the application demands. The lifetime of a sensor network depends on the lifetime of the constituent sensor nodes. Given that the sensor nodes are battery powered and the batteries cannot be recharged or replaced once completely discharged, a sensor node lasts only as long as its battery. So, in order to improve the sensor network lifetime, the battery lifetime of the individual motes has to be maximized. Some approaches to improving sensor node lifetime are energy-efficient sensor node architectures, power-aware sensor network architectures and power-aware protocols. The power dissipation in sensor networks can be broadly divided into two parts: communication power and computation power. It has been observed that the amount of energy consumed in the communication process is much higher than the amount consumed in computation, so by using power-aware protocols in the protocol stack the amount of energy consumed can be minimized. Research has been carried out in the fields of power-aware routing protocols and power-aware MAC-layer protocols.
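The claim that communication dominates a node's energy budget is often illustrated with a first-order radio energy model. The sketch below is an editorial illustration: the constants are commonly quoted textbook values and do not come from this talk.

```python
# First-order radio energy model for a sensor node.  Transmitting k bits
# over distance d costs electronics energy plus amplifier energy that
# grows with d^2; receiving costs electronics energy only.

E_ELEC = 50e-9      # J/bit, radio electronics energy (illustrative value)
EPS_AMP = 100e-12   # J/bit/m^2, amplifier energy (illustrative value)

def tx_energy(bits, distance):
    """Energy (J) to transmit `bits` over `distance` metres, d^2 path loss."""
    return E_ELEC * bits + EPS_AMP * bits * distance ** 2

def rx_energy(bits):
    """Energy (J) to receive `bits`."""
    return E_ELEC * bits

# Sending a 1000-bit packet 50 m costs several times what receiving it
# does, and both dwarf typical per-instruction CPU energies.
print(tx_energy(1000, 50))   # about 3e-04 J
print(rx_energy(1000))       # about 5e-05 J
```

This quadratic (or worse) growth with distance is one reason power-aware routing protocols favour several short hops over one long transmission.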

Rough Set and Current Trends in Computing

__________________________________________________

Rough Set and Current Trends in Computing
Nilaratna Kalia¹, J.K. Mantri², Debadutta Mohanty³, Lipika P. Kalia⁴

¹ Department of Mathematics, U.N. College, Nalagaja, Mayurbhanj, Odisha, 757073, India
² Department of CS and Engineering, NOU, Baripada, Mayurbhanj, Odisha, 757001, India
³ Department of Mathematics, SMV, Jharpokhoria, Mayurbhanj, 757068, Odisha, India
⁴ Department of CS & Engineering, KISD, Bhubaneswar, Odisha, India

Email: [email protected], [email protected], [email protected], [email protected]

Abstract: Rough set theory, now developed into rough set technology for soft computing, finds wide application today. Rough set theory manages vagueness in a data system and has been successfully used to formulate rules; these rules can be used to discover hidden patterns in data. It offers a set-theoretic approach to managing uncertainty in data systems and has been used for the discovery of data dependencies, the importance of features, patterns in sample data, feature-space dimensionality reduction, and the classification of objects. Objects can be classified by means of their attributes when considered in the context of an approximation space. Rough set analysis requires no external parameters and uses only the information presented in the given data. In this paper we introduce the basic concepts of rough sets as a current trend in soft computing, covering our ability to create a scenario out of incomplete information so that we can take a decision.
Keywords: Rough Set, Approximation Space, Soft Computing, Knowledge.

1. INTRODUCTION
Computer technology is now advancing not just rapidly but explosively, and the nature of scientific computing has changed dramatically over the past couple of decades. We recall the words of Heraclitus (534-475 B.C.): "Nothing is permanent, but change!" The appearance of the personal computer in the early 1980s changed the landscape of computing forever, and today's scientific computing environment is still changing, often at breathtaking speed. Rough Set Theory was proposed by Zdzislaw Pawlak in 1982, and since then it has been in a state of constant development. Its methodology is concerned with the classification

__________________________________________________ National Conference on Current Trends in Computing (NCCTC) ISBN No. : 978-3-642-24819-6, 23rd -24th March, 2014 1


and analysis of imprecise, uncertain or incomplete information and knowledge. It is also considered one of the first non-statistical approaches to data analysis. The fundamental concept behind Rough Set Theory is the approximation of a set by lower and upper approximation spaces, these approximations being the formal classification of knowledge about the domain of interest. The lower approximation is characterized by objects that definitely belong to the subset of interest, whereas the upper approximation is characterized by objects that possibly belong to it. Every subset defined through its upper and lower approximations is known as a rough set. Over the years Rough Set Theory has become a valuable tool for problems such as the representation of uncertain or imprecise knowledge, knowledge analysis, evaluation of the quality and availability of information with respect to consistency, image processing, pattern recognition and neural networks. Against this backdrop, ten computing trends are often cited for the 21st century:
(1) Computers will become powerful extensions of human beings, designed to augment intelligence, learning, communication and productivity.
(2) Computers will become intuitive: they will learn, recognize and know what we want, who we are and even what we desire.
(3) Computer chips will be everywhere, and they will become invisible, embedded in everything from brains and hearts to clothes and toys.
(4) Computers will manage essential global systems, such as transportation and food production, better than humans will.
(5) Online computing resources will enable us to download applications on demand via wireless access anywhere and anytime.
(6) Computers will become voice-activated, networked, video-enabled and connected together over the Net, linked with each other and with humans.
(7) Computers will have digital senses (speech, sight, smell, hearing), enabling them to communicate with humans and other machines.
(8) Neural networks and other forms of artificial intelligence will make computers both as smart as humans and smarter for certain jobs.
(9) Human and computer evolution will converge; synthetic intelligence will greatly enhance the next generations of humans.
(10) As computers surpass humans in intelligence, a new digital species and a new culture will evolve in parallel with ours.
Knowledge discovery in data mining with rough sets is a multi-step process consisting mainly of discretization and rule generation on a training set, and it fits naturally into real-life applications. Rough set theory is a mathematical approach to imperfect knowledge. Its applications span domains such as engineering, banking, medicine and decision making, where it is used to extract knowledge from databases. Rough set theory has been proposed for mining rules from data warehouses: it constructs concise classification rules for each


Rough Set and Current Trends in Computing

concept satisfying the given classification accuracy. Clustering algorithms are probably the most commonly used methods in data mining; applications can be found in virtually any knowledge domain, with bioinformatics, engineering and marketing among the prominent areas. The rough set approach to data analysis hinges on two basic concepts, the lower and the upper approximation:
1.1 Lower Approximation: The lower approximation consists of all the data that, based on the knowledge of the attributes, belong to the set without any ambiguity (the elements that doubtlessly belong to the set).
1.2 Upper Approximation: The upper approximation consists of the objects that possibly belong to the set, i.e. those that cannot be described as not belonging to the set based on the knowledge of the attributes (the elements that possibly belong to the set).
1.3 Approximations of Sets: Let 𝑈 be a set called the universe, and let 𝑅 be an equivalence relation on 𝑈 called an indiscernibility relation. Equivalence classes of the relation 𝑅 are called elementary sets. The pair 𝑆 = (𝑈, 𝑅) is called an approximation space, and let 𝑋 be a subset of 𝑈.
1.4 The lower approximation and the upper approximation: For any set 𝑋 ⊆ 𝑈, the lower approximation of 𝑋 in 𝑆 under the indiscernibility relation 𝑅 is defined by

(i) R̲𝑋 = { 𝑥 ∈ 𝑈 : [𝑥]_𝑅 ⊆ 𝑋 }

and the upper approximation of 𝑋 in 𝑆 under 𝑅 is defined by

(ii) R̄𝑋 = { 𝑥 ∈ 𝑈 : [𝑥]_𝑅 ∩ 𝑋 ≠ 𝜑 }.

A set 𝑋 ⊆ 𝑈 is called rough with respect to 𝑅 if and only if R̲𝑋 ≠ R̄𝑋, and 𝑋 is called definable with respect to 𝑅 if and only if R̲𝑋 = R̄𝑋.
1.5 The boundary region: The borderline region of 𝑋 with respect to 𝑅 is denoted 𝐵𝑁_𝑅(𝑋) and is given by 𝐵𝑁_𝑅(𝑋) = R̄𝑋 − R̲𝑋. For an element 𝑥 ∈ 𝑈, 𝑥 is certainly in 𝑋 under the equivalence relation 𝑅 if and only if 𝑥 ∈ R̲𝑋, and 𝑥 is possibly in 𝑋 under 𝑅


if and only if 𝑥 ∈ R̄𝑋. The borderline region under 𝑅 is the undecidable area of the universe.
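As a small illustration (ours, not taken from the paper), the definitions in (i) and (ii) can be computed directly when the equivalence relation 𝑅 is represented by its partition of 𝑈 into elementary sets; the universe and partition below are invented for the example:

```python
# Sketch: lower/upper approximation and boundary region of X under an
# equivalence relation R, where R is represented by its partition of U
# into elementary sets. Universe and partition are illustrative.

def lower_approximation(partition, X):
    # R_lower(X) = { x in U : [x]_R is a subset of X }
    return {x for block in partition for x in block if block <= X}

def upper_approximation(partition, X):
    # R_upper(X) = { x in U : [x]_R meets X }
    return {x for block in partition for x in block if block & X}

def boundary_region(partition, X):
    # BN_R(X) = upper minus lower: the undecidable area of the universe
    return upper_approximation(partition, X) - lower_approximation(partition, X)

partition = [{1, 2}, {3, 4}, {5, 6}]   # elementary sets of R
X = {1, 2, 3}

print(lower_approximation(partition, X))   # {1, 2}
print(upper_approximation(partition, X))   # {1, 2, 3, 4}
print(boundary_region(partition, X))       # {3, 4} -> X is rough w.r.t. R
```

Since the boundary region {3, 4} is nonempty, this toy 𝑋 is rough with respect to 𝑅.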

2. ROUGH SET
We say 𝑋 is rough with respect to 𝑅 if and only if R̲𝑋 ≠ R̄𝑋, which is equivalent to 𝐵𝑁_𝑅(𝑋) ≠ 𝜑; otherwise 𝑋 is said to be 𝑅-definable, that is, R̲𝑋 = R̄𝑋 or 𝐵𝑁_𝑅(𝑋) = 𝜑. Letting 𝜑 denote the empty set and −𝑋 the complement of 𝑋 in 𝑈, we have the following propositions ([1]):
1. R̲𝑋 ⊆ 𝑋 ⊆ R̄𝑋
2. R̲𝑈 = R̄𝑈 = 𝑈, R̲𝜑 = R̄𝜑 = 𝜑
3. 𝑋 ⊆ 𝑌 ⇒ R̲𝑋 ⊆ R̲𝑌 and R̄𝑋 ⊆ R̄𝑌
4. R̄(𝑋 ∪ 𝑌) = R̄𝑋 ∪ R̄𝑌 and R̲(𝑋 ∩ 𝑌) = R̲𝑋 ∩ R̲𝑌
5. R̄(𝑋 ∩ 𝑌) ⊆ R̄𝑋 ∩ R̄𝑌 and R̲(𝑋 ∪ 𝑌) ⊇ R̲𝑋 ∪ R̲𝑌
6. R̲(−𝑋) = −R̄𝑋, R̄(−𝑋) = −R̲𝑋
7. R̲(R̲𝑋) = R̄(R̲𝑋) = R̲𝑋, R̄(R̄𝑋) = R̲(R̄𝑋) = R̄𝑋
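These propositions can be checked numerically on random subsets of a toy universe; the sketch below (our own, not from the paper) uses the partition representation of 𝑅 and verifies propositions 1, 4 and 6:

```python
# Sketch: numerically verifying Pawlak's propositions 1, 4 and 6 on
# random subsets of a toy universe. The partition of U is illustrative.
import random

def lower(partition, X):
    return {x for b in partition for x in b if b <= X}

def upper(partition, X):
    return {x for b in partition for x in b if b & X}

U = set(range(12))
partition = [{0, 1, 2}, {3, 4}, {5, 6, 7, 8}, {9}, {10, 11}]

random.seed(1)
for _ in range(200):
    X = {x for x in U if random.random() < 0.5}
    Y = {x for x in U if random.random() < 0.5}
    assert lower(partition, X) <= X <= upper(partition, X)                       # prop. 1
    assert upper(partition, X | Y) == upper(partition, X) | upper(partition, Y)  # prop. 4
    assert lower(partition, X & Y) == lower(partition, X) & lower(partition, Y)  # prop. 4
    assert lower(partition, U - X) == U - upper(partition, X)                    # prop. 6
print("propositions 1, 4 and 6 hold on all samples")
```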

𝑈/𝑅 is defined as the family of equivalence classes of 𝑅.
2.1. Information System
From the above approximation space we derive an information system. An information system 𝐼𝑆 can be seen as a pair 𝐼𝑆 = (𝑈, 𝐴), where 𝑈 is the universe (a finite set of objects, 𝑈 = {𝑥1, 𝑥2, …, 𝑥𝑛}) and 𝐴 is the set of attributes (also called features or variables). Each attribute 𝑎 ∈ 𝐴 defines an information function 𝑎 : 𝑈 → 𝑉𝑎, where 𝑉𝑎, the set of values of 𝑎, is called the domain of the attribute.
2.2. Indiscernibility
For every set of attributes 𝐵 ⊆ 𝐴, an indiscernibility relation Ind(𝐵) is defined as follows: two objects 𝑥𝑖 and 𝑥𝑗 are indiscernible by the set of attributes 𝐵 if 𝑏(𝑥𝑖) = 𝑏(𝑥𝑗) for every 𝑏 ∈ 𝐵. An equivalence class of Ind(𝐵) is called an elementary set in 𝐵 because it represents the smallest discernible group of objects. For any element 𝑥𝑖 of 𝑈, the equivalence class of 𝑥𝑖 in the relation Ind(𝐵) is


represented as [𝑥𝑖]_Ind(𝐵). The construction of elementary sets is the first step in classification with rough sets. In particular, let 𝑋 denote a subset of elements of the universe 𝑈. The lower approximation of 𝑋 in 𝐵 (𝐵 ⊆ 𝐴), denoted B̲𝑋, is defined as the union of those elementary sets which are contained in 𝑋. More formally:

B̲𝑋 = { 𝑥𝑖 ∈ 𝑈 : [𝑥𝑖]_Ind(𝐵) ⊆ 𝑋 }.

The above statement is to be read as: the lower approximation of the set 𝑋 is the set of objects 𝑥𝑖 which belong to the elementary sets contained in 𝑋 (in the 𝐵 space); that is, the elements that doubtlessly belong to the set. The upper approximation of the set 𝑋, denoted B̄𝑋, is the union of those elementary sets which have a non-empty intersection with 𝑋:

B̄𝑋 = { 𝑥𝑖 ∈ 𝑈 : [𝑥𝑖]_Ind(𝐵) ∩ 𝑋 ≠ 𝜑 }.
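Computing the elementary sets of Ind(𝐵) from a table is straightforward; the following sketch (objects, attribute names and values are invented for illustration) groups objects that take equal values on every attribute in 𝐵:

```python
# Sketch: elementary sets (equivalence classes of Ind(B)) computed from
# a small information table. The table content is illustrative.
from collections import defaultdict

table = {
    "x1": {"headache": "yes", "temperature": "high"},
    "x2": {"headache": "yes", "temperature": "high"},
    "x3": {"headache": "no",  "temperature": "high"},
    "x4": {"headache": "no",  "temperature": "normal"},
}

def ind_classes(table, B):
    """Partition the universe: objects with equal values on every b in B."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[b] for b in B)].add(obj)
    return list(classes.values())

# On B = {headache}: x1, x2 are indiscernible, and so are x3, x4.
# Adding temperature splits x3 and x4 apart.
print(ind_classes(table, ["headache"]))
print(ind_classes(table, ["headache", "temperature"]))
```

Taking lower and upper approximations with respect to these classes then proceeds exactly as in Section 1.4.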

That is, the elements that possibly belong to the set. The degree of completeness of knowledge is expressed by the accuracy measure formulated in Section 2.5 below; that measure captures the size of the boundary region of the data sets, but the structure of the knowledge is not easily captured. However, a fundamental advantage of rough set theory is its ability to handle a category that cannot be sharply defined given a knowledge base. Inexactness and a topological characterization of imprecision can be expressed by the following measures. There are four basic classes of rough sets, that is, four categories of vagueness:

(1) 𝑋 is roughly 𝐵-definable, iff B̲𝑋 ≠ 𝜑 and B̄𝑋 ≠ 𝑈;
(2) 𝑋 is internally 𝐵-indefinable, iff B̲𝑋 = 𝜑 and B̄𝑋 ≠ 𝑈;
(3) 𝑋 is externally 𝐵-indefinable, iff B̲𝑋 ≠ 𝜑 and B̄𝑋 = 𝑈;
(4) 𝑋 is totally 𝐵-indefinable, iff B̲𝑋 = 𝜑 and B̄𝑋 = 𝑈.

These classifications of rough sets can be characterized by the size and structure of the boundary region. 𝑃𝑂𝑆_𝐵(𝑋) = B̲𝑋 is called the 𝐵-positive region of 𝑋; it is the set of objects which can with certainty be classified as belonging to the set 𝑋.
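The four categories translate directly into code; this helper (ours, illustrative) takes the two approximations as precomputed sets:

```python
# Sketch: classifying a set X into the four categories of vagueness from
# its lower and upper approximations. The example sets are illustrative.

def vagueness_category(lower_X, upper_X, U):
    if lower_X and upper_X != U:
        return "roughly B-definable"
    if not lower_X and upper_X != U:
        return "internally B-indefinable"
    if lower_X and upper_X == U:
        return "externally B-indefinable"
    return "totally B-indefinable"

U = {1, 2, 3, 4}
print(vagueness_category({1}, {1, 2}, U))     # roughly B-definable
print(vagueness_category(set(), {1, 2}, U))   # internally B-indefinable
print(vagueness_category({1}, U, U))          # externally B-indefinable
print(vagueness_category(set(), U, U))        # totally B-indefinable
```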


𝑁𝐸𝐺_𝐵(𝑋) = 𝑈 − B̄𝑋 is called the 𝐵-negative region of 𝑋; it is the set of objects which belong to the complement of 𝑋.

2.3. Information Tables
Data are often presented as a table, where each column is labelled by an attribute, each row is labelled by an object of interest, and each entry of the table contains an attribute value. Such tables are known as information systems, attribute-value tables or information tables. Here we distinguish data sets in two forms: information tables and decision tables. In both cases the columns represent variables and the rows represent cases (objects). In information tables all variables are called attributes, while in decision tables one variable must be specified as the decision variable and the remaining variables are attributes. This representation holds the data that will be utilized by rough sets, where each object has a given number of attributes; objects are described in accordance with the format of the data table, in which rows are the objects for analysis and columns are the attributes. A decision table (DT) is a quadruple 𝑇 = (𝑈, 𝐴, 𝐶, 𝐷), where 𝑈 is a nonempty finite set of objects called the universe, 𝐴 is a nonempty finite set of primitive attributes, and 𝐶, 𝐷 ⊆ 𝐴 are two subsets of attributes called the condition attributes and the decision attributes.
2.4. Core and Reduct of Attributes
If the set of attributes is dependent, one may be interested in finding all possible minimal subsets of attributes which lead to the same number of elementary sets as the whole set of attributes (reducts), and in finding the set of all indispensable attributes (core). The concepts of core and reduct are two fundamental concepts of rough set theory. A reduct is the essential part of an IS, which can discern all objects discernible by the original IS. The core is the


common part of all reducts. To compute reducts and the core, the discernibility matrix is used. The discernibility matrix has dimension 𝑛 × 𝑛, where 𝑛 denotes the number of elementary sets, and its element (𝑖, 𝑗) is defined as the set of all attributes which discern the elementary sets [𝑥]𝑖 and [𝑥]𝑗.
2.5. Accuracy of Approximation of a Rough Set
The accuracy measure of a set 𝑋 in 𝐵 ⊆ 𝐴 is defined as

𝜇_𝐵(𝑋) = card(B̲𝑋) / card(B̄𝑋),

where the cardinality of a set is the number of objects it contains, here applied to the lower and upper approximations of the set 𝑋. We see that 0 ≤ 𝜇_𝐵(𝑋) ≤ 1. If 𝑋 is definable in 𝑈 then 𝜇_𝐵(𝑋) = 1, and if 𝑋 is undefinable in 𝑈 then 𝜇_𝐵(𝑋) < 1; that is, 𝑋 is rough or imprecise with respect to 𝐵.
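The accuracy measure is a one-line computation once the approximations are available; the sketch below (ours, with an invented partition) reuses the partition representation of the indiscernibility classes of 𝐵:

```python
# Sketch: accuracy measure mu_B(X) = card(lower)/card(upper), with the
# indiscernibility classes of B given as a partition. Data illustrative.

def accuracy(partition, X):
    low = {x for b in partition for x in b if b <= X}
    up  = {x for b in partition for x in b if b & X}
    # Empty upper approximation only happens for empty X, which is definable.
    return len(low) / len(up) if up else 1.0

partition = [{1, 2}, {3, 4}, {5, 6}]
print(accuracy(partition, {1, 2, 3}))   # 2/4 = 0.5 -> X is rough in B
print(accuracy(partition, {1, 2}))      # 1.0 -> X is B-definable
```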

3. DEPENDENCY
The semantic aspect of dependency is that knowledge 𝑄 is derivable from knowledge 𝑃 if all elementary categories of 𝑄 can be defined in terms of some elementary categories of knowledge 𝑃. In this connection the properties

(i) R̄(𝑋 ∩ 𝑌) ⊆ R̄𝑋 ∩ R̄𝑌
(ii) R̲(𝑋 ∪ 𝑌) ⊇ R̲𝑋 ∪ R̲𝑌

are of great importance. Pawlak ([1]) has observed that the knowledge included in a distributed knowledge base is less than that in the integrated one; in other words, dividing the knowledge base into smaller fragments in general causes loss of information, which is logical. Let 𝑈 be a nonempty finite universe and let 𝐶1, 𝐶2 be two coverings of 𝑈. We say 𝐶1 ⊆ 𝐶2 if and only if for every 𝐾 ∈ 𝐶1 there exists at least one 𝑀 ∈ 𝐶2 such that 𝐾 ⊆ 𝑀.
3.1 Definition: Let 𝐶1, 𝐶2 be two coverings of 𝑈. The covering 𝐶1 depends on 𝐶2, denoted 𝐶2 ⇒ 𝐶1, if and only if 𝐵𝑁_𝐶2(𝑋) ⊆ 𝐵𝑁_𝐶1(𝑋) for all 𝑋 ⊆ 𝑈 and 𝐶2 ⊆ 𝐶1.
3.2 Proposition: Let 𝑈 be the universal set and let (𝑈, 𝐶1), (𝑈, 𝐶2) and (𝑈, 𝐶3) be three covering approximation spaces, where 𝐶1, 𝐶2, 𝐶3 are three different coverings of 𝑈. Then 𝐶2 ⇒ 𝐶1 and 𝐶3 ⇒ 𝐶2 imply 𝐶3 ⇒ 𝐶1.


Proof: We have 𝐶2 ⇒ 𝐶1 and 𝐶3 ⇒ 𝐶2. These imply, for any subset 𝑋 ⊆ 𝑈, 𝐵𝑁_𝐶2(𝑋) ⊆ 𝐵𝑁_𝐶1(𝑋) and 𝐵𝑁_𝐶3(𝑋) ⊆ 𝐵𝑁_𝐶2(𝑋), so that 𝐵𝑁_𝐶3(𝑋) ⊆ 𝐵𝑁_𝐶1(𝑋), and 𝐶3 ⊆ 𝐶1. This implies 𝐶3 ⇒ 𝐶1. Hence proved.

4. CONCLUSION
In the analysis of data, an important discovery is the dependency between attributes. The great advances in information technology have made it possible to store huge quantities of data, and the capabilities of both generating and collecting data have increased rapidly. This explosive growth in data and databases has generated an urgent need for new computational techniques and tools that can intelligently and automatically transform the stored data into useful information and knowledge.

REFERENCES
[1] Pawlak, Z. (1991), Rough Sets: Theoretical Aspects of Reasoning About Data, Kluwer Academic Publishers.
[2] Kalia, N.R. (2010), Ph.D. Thesis, North Orissa University.
[3] Joint Rough Set Symposium (2013), October 11-14, Halifax, Canada.
[4] Rough Sets and Current Trends in Computing (2004), 4th International Conference, RSCTC 2004, Uppsala, Sweden, June 1-5, 2004.
[5] Rough Sets and Current Trends in Computing (2012), 8th International Conference, RSCTC 2012, Chengdu, China, August 17-20, 2012.
[6] Walczak, B. and Massart, D.L. (1999), "Rough Sets Theory" (tutorial), Chemometrics and Intelligent Laboratory Systems, 47, 1-16, Elsevier.

Authors Short Biography
1. Sri Nilaratna Kalia, M.Sc. (Mathematics), is presently working as HOD, Department of Mathematics, U.N. College, Nalagaja, Mayurbhanj, Odisha, and has 29 years of teaching experience. His areas of interest are Fuzzy Set


Theory, Rough Set Theory and Knowledge Engineering. To his credit he has completed two Minor Research Projects under UGC assistance. He has organised one State Level and one National Level Conference under UGC assistance. He has published and presented more than 10 research papers in reputed International Journals, International Conferences and National Conferences. He has also edited two Proceedings of National Conferences. He is a life member of the Orissa Mathematical Society and a Founder Member of the Mayurbhanj Mathematical Society.
2. Dr J.K. Mantri, M.Sc., M.Phil., M.Tech., Ph.D., is presently working in the Department of Computer Science and Engineering, North Orissa University, Baripada, Mayurbhanj, Odisha, and has 25 years of teaching experience. His areas of interest are Soft Computing, Data Mining, Graph Theory, Cryptography and Knowledge Engineering. He has published and presented more than 38 research papers in reputed International/National Journals, International Conferences and National Conferences. He has also edited three Proceedings of National Conferences. To his credit he is a guide of four Ph.D. research scholars and has published three books of International Edition.
3. Dr Debadutta Mohanty, M.Sc., M.Phil., Ph.D., is presently working as HOD, Department of Mathematics, Seemanta Mahavidyalaya, Jharpokhoria, Mayurbhanj, Odisha, and has 27 years of teaching experience. His areas of interest are Fuzzy Set Theory, Rough Set Theory, Soft Sets, Near Sets and Artificial Intelligence. To his credit he has completed one Minor Research Project and one Major Research Project under UGC assistance. He has organised one National Level Conference under UGC assistance. He has published and presented more than 25 research papers in reputed International Journals, International Conferences and National Conferences. He has also edited two Proceedings of National Conferences. He is a life member of the Orissa Mathematical Society and the Indian Mathematical Society, and a Founder Member of the Mayurbhanj Mathematical Society. To his credit he is a guide of more than seven Ph.D. research scholars.
4. Lipika Priyadarshini Kalia, M.Sc. (Computer Science), is continuing her M.Tech in Computer Science and Engineering at Kaustav Institute of Self Domain, Bhubaneswar, Odisha.




Cyber Crime Assessments
Yerra Shankar Rao 1
1 Department of Mathematics, Gandhi Institute of Excellent Technocrats, Bhubaneswar, 752054, Orissa

Abstract: Cybercrime is a kind of crime that happens in "cyberspace", that is, crime that happens in the world of computers and the Internet. Although many people have only limited knowledge of "cybercrime", this kind of crime has the potential for severe impact on our lives and society, because our society is becoming an information society, full of information exchange happening in "cyberspace". The elderly are a vulnerable group who have been deprived of information about the latest technologies and innovations, especially in the area of computing; they lack knowledge about the Internet and so become victims of different types of cybercrime. The main objective was to assess the types of cybercrime faced by the elderly. The research design was cross-sectional in nature. 60 respondents each residing in their own homes and in old age homes respectively were selected from different areas of Bhubaneswar city, for a total sample size of 120. The purposive random sampling technique was used to collect the data. The findings of the study revealed that the majority of respondents reported that they were not affected by cyber pornography, phishing, money laundering, password sniffers, credit card fraud or even web jacking, whether residing in their own homes or in old age homes.
Keywords: cybercrime, knowledge.

1. INTRODUCTION
As the Internet, mobile phones, and other computer technologies have flourished, criminals have found ways to use them for old-fashioned goals such as theft, fraud, intimidation, and harassment. Crimes committed through the use of computer systems are known as cybercrimes. Here are some common cybercrimes faced by individuals:
a) Assault by threat – threatening a person with fear for their lives or the lives of their families or persons whose safety they are responsible for (such as employees or communities) through the use of a computer network such as email, videos, or phones.
b) Child pornography – the use of computer networks to create, distribute, or access materials that sexually exploit underage children.
c) Cyber laundering – electronic transfer of illegally-obtained monies with the goal of hiding their source and possibly their destination.
d) Cyber stalking – express or implied physical threats that create fear through the use of computer technology such as email, phones, text messages, webcams, websites or videos. [3]
e) Cyber terrorism – premeditated, usually politically-motivated violence committed against civilians through the use of, or with the help of, computer technology.
f) Cyber theft – using a computer to steal. This includes activities related to breaking and entering, DNS cache poisoning, embezzlement and unlawful appropriation, espionage, identity theft, fraud, malicious hacking, plagiarism, and piracy.
g) Hardware hijacking – researchers at Columbia University recently discovered a serious security flaw in certain printers as well. Many printers automatically update their software when accepting a print job, connecting to the Internet to download the latest print drivers.

h) Spam – unsolicited mass e-mail, known colloquially as "spam", is more than annoying: spam messages can be used to trick people into giving up sensitive personal information (known as "phishing"), or as carriers for computer worms and viruses. [1]

i) Script kiddies – wannabe hackers: people who want to be hackers (or think they are) but lack any serious technical expertise. They are usually only able to attack very weakly secured systems.


j) Insiders – they may be only 20% of the threat, but they produce 80% of the damage. These attackers are considered to be the highest risk. To make matters worse, as the name suggests, they often reside within an organization.
k) Advanced Persistent Threat (APT) agents – this group is responsible for highly targeted attacks carried out by extremely organized, state-sponsored groups. Their technical skills are deep and they have access to vast computing resources. [2]

Additionally, opening spam e-mail could leave you vulnerable to "spoofing", where a spammer gains the ability to send more of this junk e-mail through your account. To keep oneself updated one has to depend on cyber facilities. Not only the new or young generation, but adults and even the elderly are taking advantage of the Internet and computers to perform day-to-day activities like paying bills, booking tickets and online shopping; but if one is not kept up to date with the latest information about computers and cyberspace, he or she may easily become the target of cybercrime. In this respect the most vulnerable group is the elderly, because most of them do not use computers and the Internet regularly as the young do. They use them occasionally and thus are easily targeted by cybercrime.

2. METHODOLOGY
2.1 Aims and Objectives
The aim and objective of the study is to assess the types of cybercrime faced by the elderly.
Research design: A research design is the specification of methods and procedures for acquiring the information needed. The research design for the present study was a cross-sectional research design. The cross-sectional method was used because it is extensive and can be used to collect data from a large sample at a particular point of time.
2.2 Sampling Design
The sample for the present study consisted of 120 respondents (60 elderly residing in their own homes and 60 elderly residing in old age homes). The purposive random sampling technique was used to select the sample from the selected area.
2.3 Methods of Data Collection
As the study is cross-sectional in nature, the survey method was adopted to collect information from the target population. A well-structured and pre-tested interview schedule was given to the subjects to record their responses. The interview schedule was administered with great care so as to minimize possible bias. The "English" version of the interview schedule was used.


2.4 Data Analysis
For the analysis of data the following steps were followed:
2.4.1 Coding – A coding plan was developed in which code numbers were given to every question and its responses, which were then tabulated on the coding sheet.
2.4.2 Tabulation – The coded data was transferred from the coding sheet to comprehensive tables to give a clear picture of the findings.
2.4.3 Statistical Analysis – The descriptive statistics applied were frequency and percentage distribution.
Distribution of respondents on the basis of type of cybercrime faced by the elderly (N = 120).

The above table indicated that a majority of the respondents (60 percent) residing in their own homes were affected by hacking, whereas more than half of the respondents (58.3 percent) residing in old age homes were not affected by hacking. The reason may be that respondents residing in old age homes do not use computers frequently. The data also showed that a large majority of the respondents (93.3 percent) were not targeted by cyber stalking, whether residing in their own homes or in old age homes. A majority of respondents reported that they were not affected by cyber pornography, phishing, money laundering, password sniffers, credit card fraud or even web jacking, whether residing in their own homes or in old age homes. The reason could be rare use of technology and ignorance of these crimes.

REFERENCES
[1] Subashini, S. and Kavitha, V., "A survey on security issues in service delivery models of cloud computing," Journal of Network and Computer Applications, vol. 34, no. 1, pp. 1-11, 2011.
[2] Shinder, D.L. (2002), Scene of the Cybercrime: Computer Forensics Handbook, Syngress Publishing Inc., 88 Hingham Street, USA.
[3] Sutherland, E. (1939), Principles of Criminology, fourth edition.
[4] Vladimir, G. (2005), International Cooperation in Fighting Cyber Crime, www.crimeresearch.org.
[5] Types of cyber crime, http://www.slideshare.net/ferumxxl/types-ofcomputer-crimes. Accessed December 2012.
[6] Different types of cyber crime, http://www.faronics.com/news/blog/7types-of-cyber-criminals/. Accessed November 2012.
[7] Six common types of cyber crimes, http://www.selectitschools.com/Articles/tabid/67/Entryld/3/six-common-types-of-cybercrime.aspx. Accessed December 2012.
[8] Fighting types of cybercrime, http://www.computerweekly.com/feature/A-new-approach-to-fighting-varied-types-of-cybercrime-cases. Accessed January 2013.
[9] Saini, H., Rao, Y.S. and Panda, T.C., "Cyber Crimes and their Impacts: A Review," IJERA, vol. 2, pp. 201-206, March 2012.

About the Author
Mr Y. Shankar Rao is pursuing his Ph.D. at SOA University (ITER), Bhubaneswar. He is also working as an Assistant Professor in the Department of Mathematics, GIET, Bhubaneswar.




Survey on security issues of Wireless sensor networks
K. Lakshmanarao 1, Hima Bindu Maringanti 2
1 IT Department, GMRIT, Rajam, 532127, India.
2 Department of Computer Applications, North Orissa University, Baripada, 757003, Mayurbhanj, Odisha, India.

Abstract: Wireless sensor networks (WSNs) are an emerging technology used to sense highly sensitive data for further processing, e.g. for communication of sensitive data, monitoring of restricted areas such as battlefields, or analysis of the behavior of an environment. These networks are vulnerable to various security threats during sensitive data communication. Since sensor nodes have limited computation capability and are resource-constrained, traditional cryptographic methods with high computing power and heavy resource usage are infeasible for WSNs. Hence security in wireless sensor networks is a challenging task. In this paper, we survey the present state of the art in security mechanisms for WSNs. We review several protocols which provide security to WSNs, with an emphasis on the security requirements of WSNs, unattended operation, and attacks on the WSN protocol stack.
Keywords: security attacks, unattended operation, protocol stack.

1. INTRODUCTION
Advances in Micro-Electro-Mechanical Systems (MEMS) [2] technology have facilitated the development of smart sensors that are small in size and communicate over short distances. Sensors equipped with MEMS are low cost compared to traditional sensor nodes, and are very small in size, with limited processing and computing resources. Smart sensor nodes are low-power devices equipped with one or more sensors, a processor, memory, a power supply, a radio, and an actuator. An actuator [2] is an electromechanical device that can be used to control different components in a system. In a sensor node, actuators can actuate different sensing devices, adjust sensor parameters, move the sensor, or monitor power in the sensor node. Wireless sensor nodes have many limitations [3] such as memory limitations, energy limitations, and speed limitations in


processing. Hence it is infeasible to use traditional cryptographic methods to provide security to WSNs; we have to consider the limitations of WSNs to develop effective security methods that protect data communication between sensor nodes. Sensor nodes fulfill the requirements of different application objectives [4] such as industrial, environmental, manufacturing, traffic surveillance, health care, military sensing, air traffic control, distributed robotics and commercial fields. In general the sensor nodes [5] are deployed densely in one area and remain fixed (static) in that region most of the time; mobile nodes are deployed according to application requirements. The sensor nodes keep monitoring the network, and if any event occurs, one of the surrounding sensor nodes can detect it, generate a report, and transmit the report to a base station through a multi-hop wireless link. The base station acts as a gateway between the wireless sensor network and the external world. New WSN applications account for more than a 50% annual growth rate in the present embedded-system market [6], and these new applications force the need for new security methods to protect WSN data. The sample structure of a WSN is shown in Fig. 1; the base station is also called a sink [8]. The sensor nodes are usually scattered in a sensor field, and each of these scattered sensor nodes has the capability to collect data and route it back to the sink and end users, as shown in Fig. 1. The sink may communicate with the task manager node via the Internet or via satellite communications. Securing WSNs is very difficult due to limited energy, short communication range, low bandwidth, the openness of wireless communication, the heavy computation needed for the extensive mathematical calculations that ensure data security, and the limited storage in each wireless sensor node [2].
In general, four categories of cryptographic primitives [6] are used to ensure security in WSNs: public key primitives, private key primitives, hashing functions, and elliptic curve cryptography (ECC). In each category, efficient security methods have been proposed by many researchers. These security methods can be implemented either in software or in hardware.

Fig.1: Architecture of Wireless sensor networks


Many researchers have proposed mechanisms in the literature to provide security for WSNs. This paper concentrates on different security aspects of WSNs, namely the security requirements, unattended operation, and attacks on the protocol stack. Section two describes the security requirements of WSNs and section three describes unattended operation.

2. SECURITY REQUIREMENTS
Wireless sensor nodes should have an efficient security mechanism to defend themselves against security attacks during the communication of secret (key exchange) information, and to protect their resources from threats and node misbehavior. Every security mechanism for WSNs should satisfy the security requirements listed below [7].
2.1 Data confidentiality [8]: In order to avoid passive attacks like eavesdropping, the sensor node should provide confidentiality for the sensed data during transit. In general, data encryption techniques are used to achieve data confidentiality. Enforcement of access control policies at the base station, along with encryption, prevents misuse of information through traffic analysis of the ciphertext. To obtain better data confidentiality, the following rules should be followed [7, 8]:
• The sensor node should avoid leakage of sensitive data; that is, it should not share the keys used for encryption and decryption with neighboring nodes.
• Secure channels should be built into WSNs.
• The identities of sensor nodes should be encrypted to defend against traffic analysis attacks.
2.2 Integrity [7, 8]: Data integrity ensures that the received data has not been altered or deleted during transit. Data confidentiality [3] does not imply that the data are secure: an adversary could still modify the data to affect the overall operation of the network. To ensure data integrity, techniques like message authentication codes or cyclic codes are used.
2.3 Freshness: Data freshness [3, 9, 7] means that the data are recent, and assures that no old message has been resent. This requirement is especially important when shared-key strategies are used. To solve this problem, a time-dependent counter or a nonce may be added to the packet to assure data freshness. Data freshness also protects data aggregation [9] schemes against replay attacks by ensuring that the transmitted data is recent.

__________________________________________________ National Conference on Current Trends in Computing (NCCTC) ISBN No. : 978-3-642-24819-6, 23rd -24th March, 2014 17

Survey on security issues of Wireless sensor networks

2.4 Authentication [7]: Authentication of communicating nodes is essential to stop outsiders from inserting or spoofing packets and from gaining unauthorized access to a node. Authentication ensures that a communicating node is the one it claims to be. It can be provided by symmetric-key cryptography using message authentication codes (MACs) [9, 10]: the sender and the receiver share a secret key with which they compute a MAC for all transmitted data.

2.5 Availability [8, 10]: A denial-of-service (DoS) attack on a sensor node usually leads to a loss of availability. Loss of availability may also open a back door for enemy invasion in battlefield-surveillance applications, and it can disrupt critical real-time applications, such as those in the healthcare sector, that require 24-hour operation. The availability of an entire WSN can fail for the following reasons:
- A node becomes unavailable when its battery drains due to additional processing.
- Additional communication between nodes also drains battery power and can likewise make a node unavailable.
- A single point of failure can bring down the availability of the entire WSN.
To tolerate such availability losses, node redundancy is used: more sensor nodes are deployed to perform the same task.
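The shared-key MAC scheme of Section 2.4 can be sketched as follows. HMAC-SHA256 stands in here for whatever MAC primitive a real sensor platform would use, and the key and payload are placeholders.

```python
# Sketch of symmetric-key message authentication with a shared secret, as
# described in Section 2.4. HMAC-SHA256 is a stand-in MAC; key and payload
# are illustrative placeholders.
import hmac
import hashlib

SHARED_KEY = b"sensor-network-secret"  # pre-shared between sender and receiver

def make_mac(key: bytes, data: bytes) -> bytes:
    """Sender side: compute the MAC tag transmitted alongside the data."""
    return hmac.new(key, data, hashlib.sha256).digest()

def verify(key: bytes, data: bytes, tag: bytes) -> bool:
    """Receiver side: recompute the MAC and compare in constant time."""
    return hmac.compare_digest(make_mac(key, data), tag)

reading = b"temp=23.5;node=12"
tag = make_mac(SHARED_KEY, reading)
assert verify(SHARED_KEY, reading, tag)           # authentic packet accepted
assert not verify(SHARED_KEY, b"temp=99.9", tag)  # altered packet rejected
```

The same tag also provides the integrity guarantee of Section 2.2, since any modification of the data in transit invalidates the MAC.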

3. UNATTENDED OPERATION: In many WSN applications, sensor nodes are left unattended for long periods of time. This situation leads to the following three problems:
- Physical attacks on nodes: the sensor nodes are deployed in an environment open to adversaries, which exposes them to physical attacks.
- Failure of remote management: a node may lose contact with the base station due to physical attack or network-maintenance problems.
- No central point of administration: in wireless sensor networks, data distribution, a most important aspect, must work without a central point of administration.


Unattended operation fails when a WSN can neither withstand physical attacks on its nodes nor operate without a central point of administration.

4. THE PROTOCOL STACK [1]: The sensor node protocol stack is shown in Fig. 2. It has five layers and three management planes. The layers are the application, transport, network, data link, and physical layers. Different types of application software can be built on the application layer. The transport layer maintains the flow of data according to the requirements of the sensor application. The network layer takes care of routing. The data link layer concentrates on medium access management. The physical layer addresses the need for simple but robust modulation, transmission, and reception techniques. In addition, the power, mobility, and task management planes monitor the power, movement, and task distribution among the sensor nodes. These planes help the sensor nodes coordinate the sensing task and lower the overall power consumption.

Fig. 2. Sensor node protocol stack

4.1 Attacks on the protocol stack: Every layer of the protocol stack is exposed to many security threats, and each threat has corresponding countermeasures.
Attacks on the physical layer: Table 1 presents security attacks on the physical layer of WSNs.
Attacks on the data link layer: Attackers gain access to the transmission medium, create radio interference, prevent legitimate sensor nodes from communicating, or launch DoS attacks. Table 2 presents the link-layer attacks on WSNs.
Attacks on the network layer: Attackers can gain access to a node's routing table and redirect traffic, propagate false routing information into the WSN, or launch DoS attacks. Table 3 presents the routing attacks on WSNs.
Attacks on the application and transport layers: WSNs are also vulnerable to transport-layer attacks. Table 4 lists some of the application- and transport-layer attacks.


Table-1. Physical layer attacks on WSNs.

Table-2. Data link layer attacks on WSNs.


Table-3. Network layer attacks on WSNs.

Table-4. Application and transport layer attacks on WSNs.


5. CONCLUSIONS: Security should be an integral part of wireless sensor networks if they are to satisfy user requirements. This paper has explored the security requirements of WSNs, unattended operation, and attacks on the protocol stack; it described each security requirement with appropriate countermeasures, and explained the attacks at each layer of the protocol stack together with countermeasures for each attack. We hope that this paper gives readers a better view of the different attacks on WSNs and thereby helps in designing secure WSNs.

REFERENCES:
[1] I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, E. Cayirci, "A survey on sensor networks", IEEE Communications Magazine, 40(8), 2002, pp. 104-112.
[2] Jennifer Yick, Biswanath Mukherjee, Dipak Ghosal, "Wireless sensor network survey", Computer Networks, 52 (2008), pp. 2292-2330.
[3] Luis E. Palafox, J. Antonio Garcia-Macias, "Security in Wireless Sensor Networks", Chapter XXXIV.
[4] K. Lakshmanarao et al., "Survey on different issues of Sensor-Cloud", International Journal of Engineering Research & Technology (IJERT), Vol. 2, Issue 10, October 2013, ISSN: 2278-0181.
[5] T. Kavitha, D. Sridharan, "Security Vulnerabilities in Wireless Sensor Networks: A Survey", Journal of Information Assurance and Security, 5 (2010), pp. 031-044.
[6] Abidalrahman Moh'd et al., "A Secure Platform of Wireless Sensor Networks", Procedia Computer Science, 5 (2011), pp. 115-122 (2nd International Conference on Ambient Systems, Networks and Technologies).
[7] Jaydip Sen, "A Survey on Wireless Sensor Network Security", International Journal of Communication Networks and Information Security (IJCNIS), Vol. 1, No. 2, August 2009.
[8] Peter Stavroulakis, Mark Stamp (Eds.), Handbook of Information and Communication Security, Chapter 26, Springer, 2010, ISBN: 978-3-642-04116-7.
[9] S. Ozdemir, Y. Xiao, "Secure data aggregation in wireless sensor networks: A comprehensive overview", Computer Networks, 53 (2009), pp. 2022-2037.
[10] Sankardas Roy, Mauro Conti, Sanjeev Setia, Sushil Jajodia, "Secure Data Aggregation in Wireless Sensor Networks", IEEE Transactions on Information Forensics and Security, Vol. 7, No. 3, June 2012.
[11] H. Jadidoleslamy, "A Comprehensive Comparison of Attacks in Wireless Sensor Networks", International Journal of Computer Communications and Networks (IJCCN), Vol. 4, Issue 1, February 2014, ISSN 2289-3369.
[12] A. Perrig, J. Stankovic, D. Wagner, "Security in Wireless Sensor Networks", Communications of the ACM, Vol. 47, No. 6, 2004.
[13] S. Mohammadi, H. Jadidoleslamy, "A Comparison of Physical Attacks on Wireless Sensor Networks", International Journal of Peer to Peer Networks (IJP2P), Vol. 2, No. 2, 2011, pp. 24-42.
[14] S. Mohammadi, R.A. Ebrahimi, H. Jadidoleslamy, "A Comparison of Routing Attacks on Wireless Sensor Networks", International Journal of Information Assurance and Security (JIAS), Vol. 6, 2011, pp. 195-215.
[15] Y. Hu, A. Perrig, D.B. Johnson, "Rushing Attacks and Defense in Wireless Ad Hoc Network Routing Protocols", Carnegie Mellon University / Rice University, San Diego, California, USA, 2003.
[16] Prachi Bansal, Beenu Yadav, Sonika Gill, Harsh Verma, "Security Attacks in Wireless Sensor Network", International Journal of Scientific & Engineering Research, Vol. 3, Issue 4, April 2012, ISSN 2229-5518.
[17] C. Karlof, D. Wagner, "Secure Routing in Wireless Sensor Networks: Attacks and Countermeasures", Ad Hoc Networks (Elsevier), Special Issue on Sensor Network Applications and Protocols, 2003.

AUTHOR BIOGRAPHY
K. Lakshmanarao is a Ph.D. student at North Orissa University, Baripada, Odisha, working as a research scholar under the guidance of Prof. Hima Bindu M. He received his M.Tech in Computer Science and Technology (CSE) from Andhra University, Visakhapatnam, AP, India, in September 2009. His research interests include computer networks (especially wireless sensor networks) and information security. He may be reached at [email protected].


Hima Bindu Maringanti received her doctorate (Ph.D.) in Artificial Intelligence from the Indian Institute of Information Technology, Allahabad, India, in 2009. She has worked with the Bhabha Atomic Research Centre; ISM, Dhanbad; and IIIT, Allahabad. Presently she is a Professor and the Head of the Department of Computer Applications, North Orissa University, Odisha, India. Her research interests are artificial intelligence, image processing and pattern recognition, natural language processing and cognitive science, computer networks, and information security. She has published many papers in national and international conferences and journals, serves on the review boards of various reputed journals, and is a board-of-studies member for various autonomous institutions and universities. She can be contacted by email at [email protected].




Machine Learning Approaches for Solving Computationally Intensive Problems


Machine Learning Approaches for Solving Computationally Intensive Problems
Partha Sarathi Mishra
Dept. of Computer Applications, North Orissa University, Baripada, 757003, Odisha, India

Abstract: Machine learning approaches are progressively being applied to problems for which there is no human expertise. Even though human expertise should be used when it is needed and available, it often proves inadequate for automated problem-solving routines. The financial domain is a challenging and promising field for researchers, comprising computationally intensive problems such as stock-index prediction, bankruptcy prediction, and corporate-failure prediction. The present paper introduces several machine learning approaches, namely the multilayer perceptron with back-propagation (MLP-BP), the genetic algorithm (GA), and radial basis function networks hybridized with a GA (RBFNs-GA), and assesses and compares their performance, taking stock-index prediction as the computationally intensive problem under consideration. Finally, the results show that the cross-fertilised RBFNs-GA model is the most practical, with significant advantages in error rates, convergence speed, and generalization ability over the others.

Keywords: MLP-BP, GA, RBFNs, Stock index

1. INTRODUCTION
In the last decade, the financial prediction domain has attracted keen research interest in solving financial problems with new, emerging techniques. These techniques generally include neural networks, genetic algorithms, genetic programming, grammatical evolution, particle swarm optimisation, and ant colony models. While neural networks, and to a lesser extent genetic algorithms and ant algorithms, have attracted considerable interest, other forms of computational intelligence have received relatively little attention. Radial basis function (RBF) networks, which emerged as a variant of artificial neural networks, have been successfully applied to a large diversity of applications, including prediction, classification, interpolation, chaotic time-series modelling, control engineering, image restoration, and data fusion [1, 2]. Genetic algorithms


are adaptive and robust computational procedures modelled on the mechanics of natural genetic systems [3]. Owing to their extensive global optimization capability, genetic algorithms have been applied to neural networks, among other models. We therefore consider three different machine learning techniques, MLP-BP, GA, and the hybridized RBFNs-GA, using different potential indicators [4], and compare the performance of the learning algorithms for financial prediction modelling. Each approach offers a method of prediction that users can understand easily, as in expert systems. The rest of the paper is organised as follows. Section 2 provides a brief description of MLP-BP, GA, and RBFNs. Section 3 describes the cross-fertilisation of GA and RBFNs. Section 4 reports the model development and the results of the experiments. Section 5 discusses conclusions and future research issues.

2. BACKGROUND
Research into financial prediction using past data has a long history. Prediction of stock-market indices, corporate failure, bankruptcy, and bond ratings using past financial data is a well-documented topic. Early studies in the financial domain include stock-market index prediction using artificial neural networks [5, 6], bankruptcy prediction using statistical techniques such as multiple discriminant analysis and logit and probit regression models, and corporate-failure prediction using grammatical evolution [7] and genetic algorithmic techniques [8]. The financial application of GAs is growing, with successful applications in trading systems [9], stock selection [10], portfolio selection [11], bankruptcy prediction [12], and credit evaluation [13]. Although some statistical methods have been used for financial prediction, they suffer from shortcomings. Machine learning approaches based on potential indicators can prove fruitful for computationally intensive financial problems.

2.1. MULTI-LAYER PERCEPTRONS
Multilayer perceptrons (MLPs) are feed-forward neural networks with one or more layers of neurons, called hidden layers, between the network's input and its output layer. A three-layer feed-forward network, commonly used for prediction, with an adequate number of neurons in the hidden layer, is shown in Figure 1. It consists of an input layer and an output layer, corresponding to the model input and output variables x and y respectively, as well as a hidden layer.


y(x) = w01^(2) + Σ_{i=1..m} x_i^(1) w_i1^(2)

where w_i1^(2), 1 ≤ i ≤ m, are the weights associated with the connections between the hidden and output layers. The value of x_i^(1), 1 ≤ i ≤ m, is computed as follows:

x_i^(1) = f(ν_i),   ν_i = x_0^(0) w_0i^(1) + Σ_{j=1..n} x_j^(0) w_ji^(1)

where f is known as the activation function. The two popular activation functions are (i) the logistic and (ii) the tansigmoid function:

Logistic:    f(ν) = 1 / (1 + exp(−ν))
Tansigmoid:  f(ν) = (1 − exp(−ν)) / (1 + exp(−ν))

The logistic activation function bounds f in the range (0, 1), while the tansigmoid function bounds f in the range (−1, 1). In the model development stage, samples of data (x, y), called training data, are generated from simulation or measurement. The neural network is then trained by adjusting the weights w_ji so that the network's predicted output best matches the target output of the training data; this is done by minimizing an error function between the predicted outputs of the neural network and the target outputs. MLPs were long unpopular owing to the lack of an effective learning algorithm, but this changed with the back-propagation (BP) learning algorithm proposed by Rumelhart, Hinton and Williams [14].

Figure 1.Multi-layer Perceptron with 11 neurons in the input layer and 5 neurons in the hidden layer and one neuron in the output layer.
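The forward pass defined by the equations above can be sketched in NumPy as follows. The random weights are placeholders for trained values, and the 11-5-1 layout mirrors Figure 1.

```python
# Forward pass of a 3-layer MLP (n inputs, m hidden neurons, 1 output),
# following the equations above; weights are random placeholders.
import numpy as np

def logistic(v):
    return 1.0 / (1.0 + np.exp(-v))                 # bounds output in (0, 1)

def tansig(v):
    return (1.0 - np.exp(-v)) / (1.0 + np.exp(-v))  # bounds output in (-1, 1)

def mlp_forward(x, W1, b1, w2, b2):
    """y(x) = w01 + sum_i x_i^(1) * w_i1, with x_i^(1) = f(nu_i)."""
    nu = b1 + W1 @ x          # nu_i = w_0i + sum_j x_j * w_ji
    hidden = logistic(nu)     # hidden-layer outputs x_i^(1)
    return b2 + w2 @ hidden   # linear output unit

rng = np.random.default_rng(0)
n, m = 11, 5                  # as in Figure 1: 11 inputs, 5 hidden neurons
x = rng.normal(size=n)
W1, b1 = rng.normal(size=(m, n)), rng.normal(size=m)
w2, b2 = rng.normal(size=m), 0.1
y = mlp_forward(x, W1, b1, w2, b2)
print(float(y))  # a single scalar prediction
```

Training by back-propagation would repeatedly adjust W1, b1, w2, b2 to reduce the error between y and the target, as described above.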


2.2. GENETIC ALGORITHMS
Genetic algorithms (GAs) were first introduced by John Holland to model natural evolution: the development of highly complex, highly fit organisms from simpler ones. Rooted in the natural law of survival of the fittest and in genetics, a GA operates on a population. GAs are stochastic search techniques that can explore large and complicated spaces using ideas from natural genetics and evolutionary principles. They are particularly suitable for multi-parameter optimization problems whose objective function is subject to numerous hard and soft constraints. The algorithm operates on the whole population: selection, crossover, and mutation operators produce new generations until the results are satisfactory. Because a genetic algorithm searches the solution space stochastically, it can locate the global optimum. The canonical GA can therefore be described as an algorithm that turns one population of candidate encodings and corresponding solutions into another using a number of stochastic operators, as shown in Fig. 2.

Figure 2. Flow chart of Canonical Genetic Algorithm.
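A minimal canonical GA matching the flow chart of Fig. 2 (selection, crossover, and mutation in a generational loop) might look like the following. The bit-string encoding and "one-max" fitness (count of 1-bits) are toy stand-ins for a real encoding and objective.

```python
# Minimal canonical GA: selection -> crossover -> mutation, repeated.
# Bit-string chromosomes and a "one-max" fitness are illustrative only.
import random

def evolve(pop_size=20, length=16, generations=60,
           p_cross=0.8, p_mut=0.02, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    fitness = lambda ind: sum(ind)          # toy objective: count of 1-bits

    for _ in range(generations):
        new_pop = []
        while len(new_pop) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            c1, c2 = p1[:], p2[:]
            if rng.random() < p_cross:      # one-point crossover
                cut = rng.randrange(1, length)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (c1, c2):          # bit-flip mutation
                for i in range(length):
                    if rng.random() < p_mut:
                        child[i] ^= 1
                new_pop.append(child)
        pop = new_pop[:pop_size]
    return max(pop, key=fitness)

best = evolve()
print(sum(best), "of 16 bits set in the best individual")
```

In the cross-fertilised model of Section 3, the chromosome instead encodes real-valued RBF centres, variances, and output weights, and the fitness comes from the network's prediction error.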

2.3. RADIAL BASIS FUNCTION NEURAL NETWORKS
Radial basis function neural networks (RBFNs) were proposed by Broomhead and Lowe [15]. This type of network differs from networks with sigmoidal activation functions in that it uses basis functions in the hidden layer that respond locally to the input stimulus. The RBFs are embedded in a two-layer neural network in which each hidden unit implements a radially activated function. The


output units implement a weighted sum of the hidden-unit outputs. The topological structure of the RBF network is shown in Fig. 3. In an RBF network, the outputs of the input layer are determined by calculating the distance between the network inputs and the hidden-layer centres. The second layer is the linear hidden layer, and its outputs are weighted forms of the input-layer outputs. Each neuron of the hidden layer has a parameter vector called a centre. Here the input vector is X = [x1, x2, …, xN]; C_i is a centre of the RBF neural network, a constant vector with the same dimension as X; N is the dimension of the input vector; M is the number of neurons in the hidden layer; φ(·) is the radial basis function; ||X − C_i|| is the Euclidean distance between X and C_i; j indexes the output nodes, j = 1, 2, …, P; and W_ij is the weight connecting the i-th hidden node to the j-th output node. As shown in Fig. 3, the ideal outputs Y_j (j = 1, 2, …, P), the actual outputs Ŷ_j, and the output-layer weights W_ij can be obtained from the RBF neural network. Choosing the Gaussian function φ_i(x) = exp[−||x − c_i||² / (2σ²)] as the radial basis function, the actual output Ŷ_j is calculated by the following formula:

Ŷ_j = Σ_{i=1..M} w_ij φ_i(x) = Σ_{i=1..M} w_ij exp[−||x − c_i||² / (2σ²)]    (1)

Then, the weight values w_ij are adjusted to satisfy the following formula, from which the final result of the RBF neural network is obtained:

(2)

where M is the number of neurons in the hidden layer (i ∈ {1, 2, …, M}), P is the number of neurons in the output layer (j ∈ {1, 2, …, P}), W_ij is the weight between the i-th hidden neuron and the j-th output, φ_i is the radial basis function, C_i is the centre vector of the i-th neuron, and Ŷ_j is the network output at the j-th output neuron.
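Equation (1) can be evaluated directly. In the sketch below, the centres, weights, and σ are illustrative placeholders rather than trained values.

```python
# Gaussian-RBF network output per Eq. (1):
#   y_j = sum_i w_ij * exp(-||x - c_i||^2 / (2*sigma^2))
# Centres, weights, and sigma are illustrative placeholders.
import numpy as np

def rbf_output(x, centres, W, sigma):
    # Squared Euclidean distances ||x - c_i||^2 for every centre.
    d2 = np.sum((centres - x) ** 2, axis=1)
    phi = np.exp(-d2 / (2.0 * sigma ** 2))   # hidden-layer activations
    return W.T @ phi                          # weighted sum per output node

centres = np.array([[0.0, 0.0], [1.0, 1.0]])  # M=2 centres in N=2 dimensions
W = np.array([[1.0], [2.0]])                  # M x P weight matrix (P=1 output)
y = rbf_output(np.array([0.0, 0.0]), centres, W, sigma=1.0)
print(float(y[0]))  # = 1*exp(0) + 2*exp(-1) ≈ 1.7358
```

Only the local centre (distance 0) contributes its full weight; the distant centre's contribution decays with exp of the squared distance, which is exactly the locality property described above.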


3. CROSS-FERTILIZATION OF GA AND RBFNS
In this paper the GA uses real coding, taking the radial basis function centres C_i, the RBF variances σ_i, and the output-layer weights of the RBF network as its parameters. If binary coding were used instead, the coding string would be too long and would have to be translated to real values before the fitness could be calculated; hence real coding is used in this paper, as shown in Fig. 4. The output-layer weights, radial basis function centres, and RBF variances carried in a chromosome are assigned to the network structure, arranged in a fixed order. The training samples serve as the input and output of the network.

3.1 Defining the Fitness Function
The fitness function for the above method is defined as follows:

F(x) = 1 / [1 + (Σ_{k=0..N} [y(k) − ŷ(k)]²) / N]    (3)

δ = (Σ_{k=0..N} [y(k) − ŷ(k)]²) / N                 (4)

where y(k) is the actual output, ŷ(k) is the predicted output, and x is the chromosome. The fitness function F(x) and the prediction-error function δ identify the optimal output.

3.2 Setting GA Parameters
The crossover rate Pcross and the mutation rate Pmut strongly affect the behaviour of the genetic algorithm. If Pmut is too large, new individuals are produced quickly, but good genetic material is damaged. If Pcross is too large, the genetic algorithm degenerates into a random search, while too small a Pcross generates too few new individuals. Therefore the crossover and mutation rates are adjusted adaptively, as shown in the equations below:

P_mut1 = 0.95 × (1 − gen/G),   P_mut2 = 0.4 × (1 − gen/G)        (5)

P_cross1 = 0.3 × (1 − gen/G),  P_cross2 = 0.01 × (1 − gen/G)     (6)

P_mut = P_mut1 − (P_mut1 − P_mut2) × (f′ − f_avg) / (f_max − f_avg),  if f′ ≥ f_avg;
P_mut = P_mut1,  if f′ < f_avg                                    (7)

P_cross = P_cross1 − (P_cross1 − P_cross2) × (f_max − f) / (f_max − f_avg),  if f ≥ f_avg;
P_cross = P_cross1,  if f < f_avg                                 (8)

where gen is the current generation, G is the total number of generations, and f_max and f_avg are the maximum and average fitness values of the current population.
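Equations (5) through (8) can be implemented as a small helper function; the sample fitness values in the usage line are arbitrary.

```python
# Adaptive crossover/mutation rates per Eqs. (5)-(8): the gen/G scaling
# and the fitness-dependent interpolation follow the formulas above.

def adaptive_rates(gen, G, f, f_prime, f_avg, f_max):
    pm1, pm2 = 0.95 * (1 - gen / G), 0.4 * (1 - gen / G)    # Eq. (5)
    pc1, pc2 = 0.3 * (1 - gen / G), 0.01 * (1 - gen / G)    # Eq. (6)
    if f_prime >= f_avg:                                     # Eq. (7)
        p_mut = pm1 - (pm1 - pm2) * (f_prime - f_avg) / (f_max - f_avg)
    else:
        p_mut = pm1
    if f >= f_avg:                                           # Eq. (8)
        p_cross = pc1 - (pc1 - pc2) * (f_max - f) / (f_max - f_avg)
    else:
        p_cross = pc1
    return p_cross, p_mut

# Early in the run, a below-average individual gets the full base rates:
pc, pm = adaptive_rates(gen=0, G=100, f=0.2, f_prime=0.2, f_avg=0.5, f_max=0.9)
print(pc, pm)  # 0.3 and 0.95 at gen = 0 when fitness is below average
```

Note that both rates also decay linearly with gen/G, so the search shifts from exploration toward exploitation as the run progresses.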

By the above analysis, the GA-based RBFNs optimization algorithm can be stated as follows:
a. Initialize the population and set the evolution generation counter gen = 0.
b. Evaluate each individual of P(gen).
c. Select two individuals from P(gen), perform crossover and mutation on them, and form the new generation; then set gen = gen + 1.
d. If the prediction accuracy is satisfactory, stop; otherwise go to step (b).

4. EXPERIMENTAL MODEL SETUP
A total of 378 days of daily closing prices was collected from the Dow Jones index, from 24th July 2000 to 22nd October 2007, a period of seven years. Figure 5 shows the daily price change of the DJIA. Where a required feature has no data available, the value zero is assigned. We then remove entries that miss more than half of the required features, in order to reduce the possible noise in the benchmark. To be specific, every row contains eleven attributes of


different potential indicators extracted from the index price: (i) 10-day EMA, (ii) 20-day EMA, (iii) 30-day EMA, (iv) ADO, (v) STI, (vi) RSI-9, (vii) RSI-14, (viii) PROC-27, (ix) CPACC, (x) HPACC, and (xi) 10-day Williams %R. The whole dataset is divided into two subsets, one for training and the other for testing. This data set has been used to study the performance of the three proposed machine learning approaches, MLP-BP, GA, and GA-RBFNs, using MATLAB. The error measures used to evaluate the performance of the experiments are the mean squared error, MSE = Σ_i (d_i − y_i)² / n, and the mean absolute percentage error, MAPE = Σ_i |(d_i − y_i) / d_i| / n, where d_i is the real value, y_i is the predicted value, and n is the number of data points.
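The two error measures just defined can be computed as below; the price series shown are placeholders, not data from the experiments.

```python
# The two error measures used in the experiments, as defined above:
#   MSE  = sum_i (d_i - y_i)^2 / n
#   MAPE = sum_i |(d_i - y_i) / d_i| / n

def mse(d, y):
    n = len(d)
    return sum((di - yi) ** 2 for di, yi in zip(d, y)) / n

def mape(d, y):
    n = len(d)
    return sum(abs((di - yi) / di) for di, yi in zip(d, y)) / n

actual = [100.0, 200.0, 400.0]     # placeholder closing prices
predicted = [110.0, 190.0, 420.0]
print(mse(actual, predicted))      # (100 + 100 + 400) / 3 = 200.0
print(mape(actual, predicted))     # (0.1 + 0.05 + 0.05) / 3 ≈ 0.0667
```

MAPE is scale-free, which makes it the more comparable measure across index levels that vary over the seven-year window.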

4.1 Experimental Result Analysis
The results for all the methods considered are shown in Table 1. From the table it is observed that the MLP-BP and GA methods reach good results both in training and in validating the dataset. However, the cross-fertilised GA-RBFNs model shows outstanding performance on all the measured factors: R-squared value, correlation, root mean squared error (RMSE), mean squared error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), for both training and validation. Hence the cross-fertilised GA-RBFNs model is the most acceptable of the three, given its higher R² value and lower MAPE value, which indicate better prediction. Fig. 6 shows the error rate versus the number of neurons for the cross-fertilised GA-RBFNs model.


5. CONCLUSIONS
The combination of genetic algorithms with various forms of artificial neural structures is a remarkable area of current research. The objective of this study was to determine the performance of different machine learning approaches, including their cross-fertilisation, for prediction of the stock market. The developed model showed outstanding performance in achieving this goal. Extensions using other biologically inspired algorithms, such as ACO-FLANN and PSO-RBFN, are indicated for future work. The developed GA-RBFNs model will be tested on additional datasets to further assess the generality of the promising results of this study.

ACKNOWLEDGEMENTS
The author would like to thank Prof. M. Hima Bindu, H.O.D., Dept. of Computer Applications, for her consistent helping hand and encouragement in writing this paper.

REFERENCES
[1] Chen, S., Cowan, C.F.N., Grant, P.M. (1991), "Orthogonal Least Squares Learning Algorithm for Radial Basis Function Networks", IEEE Transactions on Neural Networks, 3, pp. 302-308.
[2] Yu, C.-g., Ying, Y.-b., Wang, J.-p. (2005), "Determining Heating Pipe Temperature in Greenhouse Using Proportional Integral Plus Feedforward Control and Radial Basis Function Neural Networks", Journal of Zhejiang University SCIENCE, 6A(4), pp. 265-269.
[3] Holland, J.H. (1975), Adaptation in Natural and Artificial Systems, Ann Arbor: The University of Michigan Press.
[4] Mishra, P.S. and Dehuri, S., "Potential indicators for stock index prediction: A perspective", International Journal of Electronic Finance, Vol. 6, No. 2, pp. 157-183, DOI: 10.1504/IJEF.2012.048465, ISSN (online): 1746-0077.
[5] Brabazon, A. (2002), "Financial time series modelling using neural networks: An assessment of the utility of a stacking methodology", Proceedings of AICS 2002, Lecture Notes in Artificial Intelligence (2464), O'Neill et al. (Eds.), Springer, pp. 137-144.
[6] Nygren, K. (2004), Stock Prediction: A Neural Network Approach, Master's Thesis, Royal Institute of Technology (KTH), Sweden.
[7] Brabazon, A. and O'Neill, M. (2004), "Diagnosing Corporate Stability using Grammatical Evolution", International Journal of Applied Mathematics and Computer Science, 14(3), pp. 363-374.
[8] Varetto, F. (1998), "Genetic algorithms in the analysis of insolvency risk", Journal of Banking and Finance, 22(10), pp. 1421-1439.
[9] Colin, A.M. (1994), "Genetic algorithms for financial modeling", in Deboeck, G.J. (Ed.), Trading on the Edge, New York: John Wiley, pp. 148-173.
[10] Mahfoud, S. and Mani, G. (1996), "Financial forecasting using genetic algorithms", Applied Artificial Intelligence, 10(6), pp. 543-565.
[11] Rutan, E. (1993), "Experiments with optimal stock screens", Proceedings of the 3rd International Conference on Artificial Intelligence Applications on Wall Street, pp. 262-273.
[12] Kingdon, J. and Feldman, K. (1995), Genetic Algorithms for Bankruptcy Prediction, Search Space Research Report No. 01-95, Search Space Ltd., London.
[13] Walker, R., Haasdijk, E. and Gerrets, M. (1995), "Credit evaluation using a genetic algorithm", in Goonatilake, S. and Treleaven, P. (Eds.), Intelligent Systems for Finance and Business, Chichester: John Wiley, pp. 35-59.
[14] Rumelhart, D., Hinton, G. and Williams, R. (1986), "Learning internal representations by error propagation", in Rumelhart, D. and McClelland, J. (Eds.), Parallel Distributed Processing, Vol. 1, MIT Press.
[15] Broomhead, D.S. and Lowe, D. (1988), "Multivariable functional interpolation and adaptive networks", Complex Systems, 2, pp. 321-355.

Author
Partha Sarathi Mishra is a Lecturer and a Ph.D. student at North Orissa University. He received his M.Tech degree in Computer Science from Fakir Mohan University, Odisha. His research interests include higher-order neural networks, bio-inspired algorithms, and OOP and its applications. He has published a number of research papers in reputed journals and refereed conferences, and has published a book for undergraduate and postgraduate students. He may be contacted at [email protected].



A back off analysis to Mitigate the Correlated Contention for Event-Driven Wireless Sensor Networks

Pushpa Mamoria(1), Rajeev Shakya(2), Hima Bindu Maringanti(3)

(1) Department of Computer Applications, U.I.E.T., C.S.J.M. University, Kanpur, India
(2) Department of Electrical Engineering, Indian Institute of Technology, Kanpur, India
(3) Department of Computer Science and Application, North Orissa University, Orissa, India

Email : [email protected], [email protected], [email protected]

Abstract: Sensor nodes may suffer heavy collisions due to spatially correlated traffic when an event is detected by many nodes of the same neighborhood in event-driven applications. This paper presents an analysis using the basic CSMA/CA scheme commonly used in most medium access control protocols for wireless sensor networks. An enhancement of the Binary Exponential Backoff (BEB) algorithm is proposed to handle correlated traffic. We show that the proposed algorithm enhances performance by minimizing the chance of collisions in event-driven wireless sensor networks.

Keywords: CSMA/CA, IEEE 802.11, Backoff algorithm, performance analysis, Wireless Sensor Network.

1. INTRODUCTION

Sensor networks are an emerging area of mobile ad-hoc networking that presents novel networking issues because of their different application requirements, limited resources, capabilities and functionalities, small packet sizes, and dynamic multi-hop topologies. Medium Access Control (MAC) is an important mechanism for sharing the wireless communication channel: it resolves collisions and thus enables successful transmission of packets. The primary function of a sensor network is to sample sensory information such as temperature or light and report it to the base station or sink node. Because data generation depends on the sensors, the traffic characteristics are variable and correlated: traffic may change little over long periods of time, yet be very intense for short periods. Such correlated traffic is characteristic of densely deployed wireless sensor network applications. For example, in a room-monitoring application, a fire in a room of a building triggers a number of nodes attached to temperature sensors

to begin reporting a common event. All these nodes simultaneously become active and transmit packets. This causes the nodes to operate under spatially correlated contention, where multiple nodes of the same neighborhood sense a common event at the same time [4]. Under such behavior, when traffic is rare a sensor node does not need to wait a long time before transmitting a report: when the channel is sensed idle in each slot, the backoff timer should decrease fast (i.e. exponentially) in order to reduce the number of free idle slots. In the case of bursty traffic, by contrast, the backoff timer should decrease linearly, so as to remain friendly during the congestion period. In event-driven applications, the latency requirement is bounded by the sensor network application; for example, events related to a critical temperature in forest-fire detection must be reported to the sink within a few seconds. Tay et al. [4] have defined the characteristics of event-driven sensor network applications as follows. (i) An event-driven sensor network encounters spatially correlated contention: when an event occurs in a particular region/zone of the monitored area, multiple nodes of the same neighborhood sense it at the same time, and a synchronized burst of transmissions results. (ii) In many applications, not all packets are equally important; it may be enough if some of the nodes succeed in transmitting their packets. (iii) The number of nodes activated by an event in a particular region changes with time. For example, when a target enters a sensor field, the number of active sensing nodes can become large very quickly. Under these traffic patterns, channel access delay and network lifetime are the performance-limiting factors when the reporting nodes are large in number.
To mitigate the impact of spatially correlated contention among the active nodes on network performance, particularly observed in event-driven sensor applications [4], we propose a new adaptive and predictable algorithm, called adaptive random backoff (ARB). We use the basic access mechanism of IEEE 802.11 DCF without the RTS/CTS scheme to analyze the proposed random backoff algorithm for event-driven sensor network applications. Since packets are typically small in WSNs, use of RTS/CTS is not preferred; hence collision problems are more severe without RTS/CTS in the event area. An extension of the IEEE 802.11 MAC protocol is proposed to solve the contention problem of event-reporting nodes in event-driven wireless sensor network applications.





In IEEE 802.11, the backoff time is chosen randomly in the range [0, CW). The node waits for the chosen random number of vacant slots before transmitting. If two nodes contend for the channel at the same time, and both find the channel free for a DIFS period, both transmit, resulting in a collision. After a collision, each

colliding node chooses a random number of vacant slots to wait from the range [0, CW). Here the CW for a node is 2^i * w_0, where i collisions have been suffered for the node's packet and w_0 is the initial window. The node whose backoff time corresponds to the lower slot number gets the chance to access the channel, and the node with the higher slot number waits for the other node to finish before starting its own transmission. The backoff time is a random variable and can take a large value, depending on the range from which the random number is selected.
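The slot-selection rule just described can be sketched as follows (a minimal illustration; the initial window w0 = 32 and the stage cap are illustrative values, not taken from the paper):

```python
import random

def beb_slot(num_collisions, w0=32, max_stage=5):
    """Binary Exponential Backoff: after i collisions the window is
    CW = 2**i * w0 (capped at max_stage doublings), and a slot is
    drawn uniformly at random from [0, CW)."""
    cw = (2 ** min(num_collisions, max_stage)) * w0
    return random.randrange(cw)

# Two contending nodes each draw a slot; the smaller draw wins the channel,
# the other node defers until that transmission finishes.
a, b = beb_slot(0), beb_slot(0)
winner = "A" if a < b else ("B" if b < a else "collision again")
```

Because the window doubles with each collision, the chance of two nodes drawing the same slot shrinks quickly, at the cost of longer average waits.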

2. RELATED WORK

In WSNs, research on MAC protocol design has focused mainly on energy-latency trade-offs. S-MAC [5] is designed to save energy by listening and sleeping periodically, with the collision avoidance facilities of the IEEE 802.11 standard. S-MAC uses a synchronization mechanism to form virtual clusters with a common sleep/wakeup schedule and to avoid the overhearing problem. Many variants of S-MAC, such as D-MAC [6] and T-MAC [7], have been proposed to further decrease energy consumption. All these variants deal with the major sources of energy wastage: idle listening, overhearing and collisions. For event-driven sensor network applications, very little work has been published in the literature [1, 8, 9, 10]. CC-MAC, proposed in [8], takes spatial correlation into consideration for removing redundant data in WSNs. In CC-MAC, an Iterative Node Selection (INS) algorithm calculates a correlation radius based on a correlation model; only one node is allowed to transmit event information within a correlation radius, in order to suppress redundant transmissions from nodes within the same radius. In the first contention phase, all nodes within the correlation radius contend for channel access; only the node winning the contention is selected as the representative node for the region denoted by the correlation radius, while the other nodes go to sleep. Sift [4] is another MAC protocol designed for event-driven WSNs. Its objective is to minimize latency when spatially correlated contention occurs in a monitored area. Jamieson et al. [4] argued that only R nodes out of the N nodes reporting a common event need to succeed in transmitting the event information to the sink node. Sift uses a non-uniform geometric distribution to pick a slot for transmission within a fixed-size contention window (32 slots).
Recently, using an alternative correlation model, a generic model has been developed to capture spatial correlation for omni-directional sensor networks [9, 10].

3. DETAILS OF ADAPTIVE RANDOM BACKOFF ALGORITHM

Since each event triggers a large number of nodes for sensing and transmission, it is prudent to have an initial window, say W_avg, estimated from past experience; the chances of collision are thereby reduced to a great extent. Further, the CW can be adapted when collisions happen: when a node's transmission collides for the i-th time, the CW can be increased to 2^i * dW + W_avg. This provides the desired adaptability. Here dW is the increment in CW after the first collision. The CW at which a packet is successfully transmitted can also be used to update W_avg for use in future transmissions. In this section, we present the adaptive random backoff algorithm, which is based on minor modifications of the Binary Exponential Backoff algorithm.

3.1. Assumptions and Model Used

In event-driven WSNs, we assume a network of N sensor nodes deployed with a sink node one hop away from the sensor nodes. Based on Bianchi's model [2], we analyze the performance of our modified protocol in order to optimize channel access, especially for event-driven wireless sensor networks. We are interested in a specific area inside the sensor field where an event is generated by some activity, for example detection of an intrusion or a sudden change of temperature in case of fire. When an event occurs, a significant number of sensor nodes close to it get activated, and each triggered node independently has a packet to transmit to the sink node using the MAC. Following [2, 3], for a

given node, let w(t) be the stochastic process representing the i-th backoff stage in the range i = 0, 1, 2, ..., L, where L is the maximum retransmission limit (retry limit), and let b(t) be the stochastic process representing the k-th backoff counter in the range k = 0, 1, 2, ..., w_i - 1, where w_i is given by 2^i * w_min. The system can therefore be modeled as a Markov chain {w(t), b(t)} representing the state {i, k} of each node. The model assumes that each packet collides with a constant and independent probability p during each transmission attempt, irrespective of the number of retransmissions suffered. It has been shown in [2, 3] that:

The probability p can be expressed as

p = 1 - (1 - tau)^(N-1)    (1)

The probability tau can be expressed as

tau = SUM_{i=0..L} b_{i,0} = b_{0,0} * (1 - p^(L+1)) / (1 - p)    (2)

where b_{0,0} is given by Eq. (3):

b_{0,0} = [ SUM_{i=0..L} p^i * (w_i + 1) / 2 ]^(-1)    (3)
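The system formed by Eqs. (1)-(3) can be solved numerically for p and tau; a minimal damped fixed-point sketch in Python (the parameter values N = 10, w_min = 32, L = 5 are illustrative, not the paper's settings):

```python
def solve_bianchi(N=10, w_min=32, L=5, iters=10_000, tol=1e-10):
    """Solve Eqs. (1)-(3) for (p, tau) by damped fixed-point iteration."""
    tau = 0.1                                    # initial guess
    p = 0.0
    w = [w_min * 2 ** i for i in range(L + 1)]   # w_i = 2^i * w_min
    for _ in range(iters):
        p = 1 - (1 - tau) ** (N - 1)                                   # Eq. (1)
        b00 = 1 / sum(p ** i * (w[i] + 1) / 2 for i in range(L + 1))   # Eq. (3)
        tau_new = b00 * (1 - p ** (L + 1)) / (1 - p)                   # Eq. (2)
        if abs(tau_new - tau) < tol:
            tau = tau_new
            break
        tau = 0.5 * (tau + tau_new)              # damping for stability
    return p, tau

p, tau = solve_bianchi()
```

Damping the update (averaging the old and new tau) keeps the iteration stable even when the undamped map oscillates.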

According to Bianchi's model [2], tau is known as the attempt probability. Bianchi used a two-dimensional model to obtain the expression with no retransmission limit; the maximum retransmission limit was addressed later in [3]. Eq. (1) and Eq. (2) form a system of two non-linear equations that has a unique solution and can be solved numerically for the values of p and tau.

3.2. Illustration of Adaptive Random Backoff Algorithm

ARB operates as follows (as shown in Algorithm 1). When a node has a data packet to transmit, a cw_i is selected randomly from [0, cw_min], as in BEB. Upon a successful data transmission, if cw_i is less than cw_th (cw_th is used to reset the lower bound of CW), the lower bound of CW for the next selection cw_{i+1} is assigned as CW_lb = beta * CW_avg^i, where CW_avg^i = (cw_i + CW_avg^{i-1}) / 2. If cw_i is equal to 0, CW_lb is set to a default value (here, the default value of CW_lb is 1); otherwise, CW_lb is reset to zero. The node then selects cw_{i+1} from the range [CW_lb, CW_min] for the next transmission. On a failed transmission, the CW is doubled and the backoff value cw is selected from the range [CW_lb + 1, min(2^(i_min + n_f), 2^(i_max)) - 1], where n_f is the number of failed transmissions. The CW keeps increasing until it reaches CW_max.
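Our reading of the CW update rules described above can be sketched in Python (a reconstruction from the text, not the authors' code; the constants beta = 0.5, cw_th = 16, i_min = 5, i_max = 10 are illustrative assumptions):

```python
import random

CW_MIN, CW_MAX = 32, 1024
I_MIN, I_MAX = 5, 10            # so CW_MIN = 2**I_MIN and CW_MAX = 2**I_MAX

def arb_on_success(cw_i, cw_avg, beta=0.5, cw_th=16):
    """After a successful transmission: update the running average of
    winning windows and derive the lower bound for the next draw."""
    cw_avg = (cw_i + cw_avg) / 2                 # CW_avg^i = (cw_i + CW_avg^{i-1}) / 2
    if cw_i == 0:
        cw_lb = 1                                # default lower bound
    elif cw_i < cw_th:
        cw_lb = min(int(beta * cw_avg), CW_MIN - 1)  # clamped to keep the range valid
    else:
        cw_lb = 0
    return random.randint(cw_lb, CW_MIN), cw_avg

def arb_on_failure(cw_lb, n_f):
    """After the n_f-th failed transmission: double the window, capped at CW_MAX."""
    hi = min(2 ** (I_MIN + n_f), 2 ** I_MAX) - 1
    return random.randint(cw_lb + 1, hi)
```

The key difference from BEB is the non-zero lower bound after a success: the next draw starts near the recent winning average instead of collapsing back to zero.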


In this way, a weighted average value of CW is computed dynamically on each unsuccessful transmission attempt and kept unchanged upon a successful attempt, for use in future transmissions. In addition, the collision ratio is higher on detection of an event (as explained in the previous section). The BEB algorithm, by contrast, does not consider the current status of backlogged nodes (indicated by the current, larger value of CW) after a successful transmission: it decrements the CW to CW_min immediately.

4. PERFORMANCE EVALUATION

We use the analytical model given in [2] to calculate the normalized system throughput, the frame-dropping probability and the expected channel access delay for comparison with the Binary Exponential Backoff (BEB) algorithm. The analytical results were produced by solving the non-linear equations of the model using MATLAB; simulations were then conducted using the ns-2 network simulator to validate the analytical results. All the parameters used in the simulations can be found in [2] for DSSS. This paper uses a simulation model with assumptions similar to those of the analytical model [2]. Lucent's WaveLAN parameters are used for the radio model, with a 250 m transmission range and 1 Mb/s channel capacity. It is assumed that an event is generated for a duration of 5 seconds.

Fig. 1(a), Fig. 1(b) and Fig. 2 show the normalized system throughput, channel access delay and frame-dropping probability, respectively, for the BEB model and the proposed ARB model over the number of active source nodes for an event. As shown in the figures, the ARB model has better throughput and channel access delay than the BEB model of IEEE 802.11 DCF. Throughput does not improve much, but latency shows a significant improvement, because the first report of each reporting node is transmitted successfully thanks to the dynamic adjustment of CW based on past experience. Hence ARB is well suited to event-driven scenarios where events must be reported within a short period of time. In addition, it is clearly shown that the ARB model with initial contention window CW_avg, and halving of the contention window with probability f after each successful transmission, contributes to a further improvement in performance.

Fig.1. (a) Normalized System Throughput, (b) Channel Access Delay

Fig.2. Frame-dropping Probability

Table 1 shows the simulation results using NS-2 versus the numerical results using the model for the proposed ARB algorithm, where the relative error is calculated as

|Simulation Result - Analytical Result| / Analytical Result.

As shown in the table, the two sets of results agree closely, with minimal errors for normalized system throughput (NST) and channel access delay (CAD).
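The relative-error metric used for the table is a one-liner (taking the absolute difference, as is conventional; the sample values are illustrative):

```python
def relative_error(simulated, analytical):
    """Relative error of a simulated value against the analytical one."""
    return abs(simulated - analytical) / analytical

# e.g. a simulated throughput of 0.82 against an analytical 0.80
err = relative_error(0.82, 0.80)   # 0.025, i.e. 2.5%
```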

5. CONCLUSION

In event-driven WSNs, the ARB algorithm performs well through adaptive CW updates when active reporting nodes contend for channel access. When a packet is successfully transmitted, ARB resets the CW to half of its current value according to the probability of packet collisions. Using Bianchi's model, we analyzed the performance of the ARB algorithm, and the numerical results were validated against simulation results in NS-2. Our results and analysis show that ARB achieves low latency with a number of successful transmissions sufficient for event reporting.

REFERENCES [1]

Shakya, K.Rajeev, Singh N.Yatindra & Verma K.Nishchal, (2012) “A Novel Spatial Correlation Model for Wireless Sensor Network Applications”, in proc. of IEEE WOCN 2012 Indore, India, pp.1-6.

[2]

Bianchi, G., (2000) “Performance analysis of the IEEE 802.11 distributed coordination function”, IEEE J. Sel. Areas Commun., vol. 18, no. 3, pp. 535-547.

[3]

Wu H., Peng Y., Long K., Cheng S., & Ma J., (2002) “Performance of reliable transport protocol over IEEE 802.11 wireless LAN: analysis and enhancement”, In Proc. of the IEEE Information Communications (INFOCOM ‟02), vol. 2, pp. 599-607.

[4]

Jamieson K., Balakrishnan H. and Tay Y. C., (2006) “Sift: a MAC protocol for event-driven wireless sensor networks,” in Proc. of the 3rd European Workshop on Wireless Sensor Networks (EWSN ‟06), pp.260-275.

[5]

Ye W., Heidemann J., and Estrin D., (2002) “An energy-efficient MAC protocol for wireless sensor networks,” in Proc. of the Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM), vol. 3, pp. 1567-1576.

[6]

Lu G., Krishnamachari B., and Raghavendra C., (2004) “An adaptive energy-efficient and low-latency MAC for data gathering in sensor networks,” in Proc. IPDPS.

[7]

van Dam T., and Langendoen K., (2003) “An Adaptive Energy-Efficient MAC Protocol for Wireless Sensor Networks”, in Proc. of ACM SenSys ’03, pp. 5-7.

[8]

Vuran M. C. and Akyildiz I. F., (2006) “Spatial Correlation-based Collaborative Medium Access Control in Wireless Sensor Networks”, IEEE/ACM Trans. on Networking, vol. 14, no. 2, pp. 316-329.

[9]

Shakya, K. Rajeev, Singh N. Yatindra & Verma K. Nishchal, (2013) “Generic Correlation Model for Wireless Sensor Network Applications”, IET Wireless Sensor Systems, vol. 3, no. 4, pp. 266-276.

[10]

Shakya, K.Rajeev, Singh N.Yatindra & Verma K.Nishchal, (2012) “A Correlation Model for MAC protocols in Event-driven Wireless Sensor Networks”, in proc. of IEEE TENCON 2012 Cebu, Philippines, pp.1-6.

[11]

Mamoria P., (2012) “Energy controlled event reporting in event-driven sensor networks”, International Journal of Engineering Science and Technology, ISSN: 0975-5462, vol. 4, no. 9, September 2012.




Hybrid Approach for Odia Handwritten Character Recognition

Ambarish Parida(1), Hima Bindu Maringanti(2)

(1, 2) Department of Computer Applications, North Orissa University, Baripada, 757003
Email: (1) [email protected], (2) [email protected]

Abstract: The history and classification of different scripts has been instrumental in understanding the evolution of script and, as such, of civilization. The different phases of evolution of script have been critically examined for the Devanagari script, which has been instrumental in the development of other scripts in the Indian subcontinent. In particular, the key to the evolution of the Odia script is the process of transformation of the Devanagari script. This can be further examined with the help of Optical Character Recognition (OCR) and Natural Language Processing (NLP) techniques, automating the study of the various phases of Odia script evolution. A step prior to the transformation process is the recognition of Odia text, to understand how it differs from the Devanagari script. To achieve this objective, a hybrid approach of image processing and soft computing, with a chosen set of heuristic parameters for features, is proposed. Odia handwritten characters have a typical shape and structure, which is analyzed and, using a segmentation technique, identified by giving each sub-structure shape a special name, falling into one of two categories: line or reservoir (Figure 3).

Keywords: OCR, NLP, Odia Script, Devanagari Script

1. INTRODUCTION

At the beginning of civilization there was no script. Knowledge was transferred from generation to generation through rote learning of ancestral knowledge, known as Shruti. Later, script was developed to represent our expressions. Presently, script serves handwriting as a standard method of information storage and transfer, enabling communication of thoughts and ideas in a language by visual encoding and decoding, i.e. writing and reading. The Odia script in stone engravings, copper plates and palm-leaf manuscripts shows its antiquity. It has been a carrier of vibrant literature, a medium of

instruction and a means of communication through the centuries. The modern Odia script, like the Devanagari script, is a descendant of the Brahmi script. But unlike Devanagari, its characters have a circular look, possibly under the influence of the Dravidian writing system, and are devoid of horizontal lines, perhaps so as to be drawn on the palm leaves used as writing material in earlier times. The modern Odia script consists of simple and complex characters derived from the basic characters of the Odia alphabet (the first 11 are vowels and the rest consonants), shown in Figure 1. Segmentation of handwritten text into lines, words and characters is one of the important steps in the handwritten script recognition process. The task of segmenting individual text lines from unconstrained handwritten documents is complex because the characters of two consecutive text lines may touch or overlap; these overlapping or “touching” characters complicate the line-segmentation task. Many techniques exist, such as global and partial projection analysis and techniques based on statistical modeling. In Odia script (as in some other Indian languages), a vowel following a consonant takes a modified shape which, depending on the vowel, is placed at the left, right (or both) or bottom of the consonant; these are called modified characters. A consonant or vowel following a consonant sometimes takes a compound orthographic shape, which we call a compound character. Such characters can be combinations of consonant and consonant, as well as consonant and vowel. A text line may be partitioned into three zones: the upper zone denotes the portion above the mean line, the middle zone covers the portion of basic (and compound) characters between the mean line and the baseline, and the lower zone is the portion below the baseline. An imaginary line where most of the uppermost (lowermost) points of the characters of a text line lie is referred to as the mean line (baseline). The mean line and baseline thus partition the text line into three zones.

2. STATE-OF-THE-ART

Paper [1] explores the complex ways in which writing systems relate to the language they depict. Writing, as Coulmas contends, is not only the guide or garment of spoken language, but has a deep and lasting effect on the development of language itself. After studying the Egyptian hieroglyphics and the cuneiform system of ancient times, the author describes

Chinese writing, discussing why an apparently cumbersome system has been in continuous use for more than 3,000 years, and ranges across the writing systems of western Asia and the Middle East, the Indian families, and the various alphabetic traditions that have their origin in the multifarious world of Semitic writing and came to full bloom in pre-classical Greece. Work [2] discusses how the alphabet was born from hieroglyphs: the author gives an account of the origin of the alphabet from Egyptian hieroglyphics to the first sound-pictures, Hebrew, and now all languages. Book [3] explains in great detail what the Chinese language and its ancient writing system are all about. The World's Writing Systems [4] systematically explores most of the world's writing systems from the earliest times onwards. Aryan Alphabets [5] gives an account of the origin and development of letters; the volume was undertaken with the intention of compiling a brief account of recent discoveries on the origin of the alphabet and its subsequent developments.

3. CONTRIBUTORY WORK

The existing Odia character set is shown below:

Figure 1. Odia Character Set

A new model of Odia handwritten character recognition has been developed, in which a hybrid approach of image processing and soft computing, with a chosen set of heuristic parameters for features, is proposed, as shown in Figure 2.


Figure 2: Odia Character Recognition System

The algorithmic steps to be followed to achieve the objective using the hybrid technique of image processing and machine learning are presented in the next section.

3.1 Algorithm
1. Start
2. Input Odia handwritten sample
3. Apply pre-processing steps:
   (i) Resize with respect to standard pixel size (np*np)
   (ii) Convert the character image into gray scale
   (iii) Noise removal using background subtraction
   (iv) Rotation and skewing
4. Zoning: divide the Odia handwritten sample into different zones by drawing horizontal and vertical grid lines
5. Feature extraction and segmentation: each cell, formed at the intersection of the horizontal and vertical grid lines, which is the

ROI (region of interest) containing one Odia character, is now segmented into various units:
   (i) Lines: rlt, rlnt, rsls, rslb, lsl, hl, stlt, slnt
   (ii) Reservoirs: uprb, uprs, lrb, lrs, rnd
6. Generation of feature vector [{L},{R}]
7. Training of the ANN
8. Odia character recognition
9. Stop

The Odia handwritten text is taken as input and pre-processed. Pre-processing includes resizing to a standard [n*n] pixels, converting to gray scale, noise removal by background subtraction, and rotating and skewing the character to a standard straight-line alignment/orientation. Zoning is done by dividing a line of text into four parts, as shown in Figure 3: upper line, mean line, base line and lower line.
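The zoning step above can be sketched in pure Python, splitting the character image (a list of pixel rows) into grid cells; the 2 x 2 grid below is an illustrative choice:

```python
def zone(image, n_rows, n_cols):
    """Split a 2-D pixel grid (list of rows) into n_rows x n_cols cells,
    each cell itself a list of pixel rows."""
    h, w = len(image), len(image[0])
    ch, cw = h // n_rows, w // n_cols            # cell height / width
    return [[[row[c * cw:(c + 1) * cw] for row in image[r * ch:(r + 1) * ch]]
             for c in range(n_cols)]
            for r in range(n_rows)]

# A 4x4 "image" split into a 2x2 grid of 2x2 cells:
img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
cells = zone(img, 2, 2)
# cells[0][0] is the top-left cell: [[1, 2], [5, 6]]
```

Each cell then becomes a region of interest from which the line and reservoir features of step 5 are extracted.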

Figure 3: Different Zones of an Odia Text Line

In the segmentation phase, various line and reservoir features are identified. The features thus extracted form a feature vector
[L,R] = [{rlt, rlnt, rsls, rslb, lsl, lslb, ...}, {uprb, uprs, uprbd, lrb, lrs, lrd, rnd, rnds, rndb, uprnds, uprlts, srnds, ...}]

rlt - right line touching
rlnt - right line not touching
rsls - right slanting line (small)
rslb - right slanting line (big)
lsl - left slanting line
lslb - left slanting line (big)
uprb - upper reservoir (big)
uprs - upper reservoir (small)

uprbd - upper reservoir (big, distorted)
lrb - lower reservoir (big)
lrs - lower reservoir (small)
lrd - lower reservoir (distorted)
rnds - round (small)
rndb - round (big)
uprnds - upper round (small)
uprlts - upper right line touching (small)
srnds - slanting round (small)

The feature vector is then fed to a Neural Network (NN) for training, to learn the structural characteristics in order to identify the Odia character. The converged output of the NN is thus mapped onto a single Odia handwritten character.
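The glossary above suggests a simple binary encoding for the feature vector of step 6; the following sketch uses a feature ordering of our own (an illustrative choice, not the authors'):

```python
# Line and reservoir feature names from the glossary (order illustrative).
LINE = ["rlt", "rlnt", "rsls", "rslb", "lsl", "lslb"]
RESERVOIR = ["uprb", "uprs", "uprbd", "lrb", "lrs", "lrd",
             "rnds", "rndb", "uprnds", "uprlts", "srnds"]
FEATURES = LINE + RESERVOIR

def feature_vector(found):
    """Encode the set of features detected in a character cell as a
    binary vector [{L},{R}] suitable as input to the ANN."""
    return [1 if name in found else 0 for name in FEATURES]

vec = feature_vector({"rlt", "uprb"})   # a character with one line, one reservoir
```

A fixed-length binary vector like this gives the neural network a consistent input dimension regardless of which sub-structures a particular character contains.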

4. CONCLUSION

The present proposal is a straightforward and simple procedure compared to the cropping algorithm given in (Debananda Padhi 2012). The proposed algorithm is a hybridized approach combining basic and routine image processing steps with a machine learning technique, involving a set of heuristic parameters for features. The structural features based on shapes (both line and reservoir) from basic image processing are combined with the artificial NN method for learning the character's features.

ACKNOWLEDGEMENTS

The authors would like to thank Ms. Meenakhi Sahu, Ms. Gayatri Dey and Mr. Biswamitra Rana for their consistent help and encouragement.

REFERENCES
[1] Coulmas, Florian 1996. The Blackwell Encyclopaedia of Writing Systems. Oxford: Blackwell Publishers Ltd.
[2] Goldwasser, Orly Mar/Apr 2010. How the Alphabet Was Born from Hieroglyphs. Biblical Archaeology Review (Washington, DC: Biblical Archaeology Society) 36 (1).
[3] DeFrancis, John 1990. The Chinese Language: Fact and Fantasy. Honolulu: University of Hawaii Press.
[4] Daniels, Peter T., and William Bright 1996. The World's Writing Systems. Oxford University Press.

[5] Isaac Taylor 2003. History of the Alphabet: Aryan Alphabets, Part 2. Kessinger Publishing.
[6] Debananda Padhi 2012. Novel Hybrid Approach for Odia Handwritten Character Recognition System. Volume 2, International Journal of Advanced Research in Computer Science and Software Engineering, India.
[7] Millard, A. R. 1986. The Infancy of the Alphabet. World Archaeology 17 (3): 390-398, doi:10.1080/00438243.1986.9979978.
[8] S. Srinivasan, J. V. M. Joseph and P. Harikumar 2012. Indus script deciphered: the method of semblance at work. Current Science, Vol. 103, No. 3, pp. 268-281.
[9] Clyde Winters 2012. Dravidian is the Language of the Indus writing. Current Science, Vol. 103, No. 10, pp. 1220-1225.

Authors

Ambarish Parida, M.Tech in Computer Science, teaches the MCA students of North Orissa University.

Hima Bindu Maringanti, Ph.D. in Artificial Intelligence from IIIT, Allahabad, India. She is currently working as a Professor and Head of the Department at North Orissa University, where she teaches, among other subjects, Theory of Computation, Information Retrieval and AI. She is a reviewer for various international conferences and journals, and for books published by reputed publishers such as Pearson and McGraw-Hill. She has a total of 20 years of experience, and her favorite areas of interest are AI, NLP, Cognitive Science, Computational Neuroscience and Medical Imaging.




Santali Morphological Analysis


Chitralekha Marandi(1), Hima Bindu Maringanti(2)

(1) Department of Computer Science & Applications, North Orissa University, Baripada, Odisha, India, [email protected]
(2) Department of Computer Science & Applications, North Orissa University, Baripada, Odisha, India, [email protected]

Abstract:

Natural Language Processing holds great promise for making computer interfaces easier to use, since people will (hopefully) be able to interact with the computer in their own language. Santali is a language native to countries like India, Bangladesh, Nepal and Bhutan; in India its native speakers live in states like Jharkhand, Assam, Bihar, Odisha, Tripura and Bengal. Although many people speak and write Santali, there has been very little research in Santali language processing. Our work concentrates on Santali morphological analysis, beginning with a literature survey of the relevant methodologies and tools, and working towards the development of a Santali Morphological Analyzer (SMA) that will return all the morphemes of a word and their grammatical categories, such as the part of speech and the various forms of the root word. It consists of various paradigms under which nouns, adjectives and finite verbs of Santali are classified. The main objective of the work is to make this beautiful language more widely acclaimed.

Keywords: Santali, Morphology, Natural Language Processing, Lexical Analysis, Morphological Analysis.

1. INTRODUCTION

Nowadays Natural Language Processing (NLP) is one of the areas of greatest interest for researchers. NLP is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages. Many language and linguistics researchers have begun work in this area for different languages such as Hindi, Bengali and Odia [1][2][3][4]. Santali is a language native to countries like India, Bangladesh, Nepal and Bhutan. The native speakers in India live in states like Jharkhand, Assam, Bihar, Odisha, Tripura and Bengal. Many people speak and write in the Santali language. It is a language in the Munda subfamily of Austro-Asiatic, related to Ho and Mundari, and is spoken by around 6.2 million people in the world [5][6]. Santali is a suffixing, agglutinating language with its own distinctive features. The invention of the Ol Chiki script during the 1930s by Pandit Raghunath Murmu created a new era for the Santali language and its literature [7]. Free TrueType Ol Chiki fonts were developed by Prof. R. C. Hansdah and the members of the Wesanthals e-group during 2002-2005; they also suggested a draft keyboard mapping for the Ol Chiki script, which all free Ol Chiki fonts presently use. Still, there has been almost no computational work on the Santali language. So in this work we have done a literature survey on morphological analysis and designed a morphological analyzer for the Santali language. Here we have used different phases of NLP, such as lexical analysis and morphological analysis, to generate root words, part of speech, suffixes and synonyms of words in Santali text.

2. STATE-OF-THE-ART

The paper [5] deals with the analysis of the Santali language and provides a versatile framework for translating Santali into English and vice versa, which helps in writing good technical articles in Santali. The paper [3] deals with the analysis and design of an Odia Morphological Analyzer (OMA). The OMA system is designed according to object-oriented analysis to increase its reusability, robustness and extensibility, and some examples integrating the OMA with OriNet, the WordNet for Indian languages, are presented. The development is based on the syntactic approach of Sanskrit, so it is hoped that the technology involved can be extended to any other Indian language. The paper [4] describes an object-oriented model for the OriNet system, using the object-oriented programming principles that Java offers to represent and retrieve Odia language related information. The paper [1] presents a Hindi derivational morphological analyzer. The paper [2] describes the development of an open-source morphological analyser for the Bengali language using finite-state technology; it discusses the challenges of creating a morphological analyser for a highly inflectional language like Bengali and proposes a solution using lttoolbox, an open-source finite-state toolkit.

3. METHODOLOGY

The different phases that can be used for morphological analysis in Santali are:


3.1. Tokenization

Tokenization is the process of dividing a stream of text into words and removing punctuation marks. Tokenization does this by locating word boundaries, i.e. the point where one word ends and the next begins. The outputs are called terms or tokens.

3.2. Stemming

Stemming is the process of reducing inflected words to their stem, base or root form. Tokens are the input to the stemmer, which outputs the stem or root word; in this step affixes are separated from the tokens. An affix is a bound morpheme that is joined before, after, or within a root or stem.
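Since Santali is a suffixing agglutinating language, a first-cut stemmer can simply strip known suffixes from the right edge of a token. The sketch below assumes a toy suffix list; the entries are illustrative placeholders, not a verified Santali suffix inventory:

```java
import java.util.List;

public class SimpleStemmer {
    // Illustrative suffix list only; a real analyzer would use a curated
    // inventory of Santali suffixes.
    private static final List<String> SUFFIXES = List.of("ren", "ko", "ak");

    // Strip the first matching suffix from the end of the token.
    public static String stem(String token) {
        for (String suffix : SUFFIXES) {
            if (token.endsWith(suffix) && token.length() > suffix.length()) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token; // no known suffix: the token is its own stem
    }
}
```

Ordering matters in practice: when one suffix is a proper suffix of another, the longer one should be tried first so the match is maximal.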

3.3. Tagging

In Natural Language Processing, part-of-speech tagging (POS tagging or POST), also called grammatical tagging or word-category disambiguation, is the process of assigning a part of speech, such as noun, verb, pronoun, preposition, adverb or adjective, to each word in a sentence.
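In its simplest form, tagging can be done by lexicon lookup, assigning each word a stored category. The entries below are hypothetical placeholders rather than a curated Santali lexicon, and a practical tagger would also need contextual disambiguation for ambiguous words:

```java
import java.util.Map;

public class LexiconTagger {
    // Hypothetical sample lexicon; not a curated Santali word list.
    private static final Map<String, String> LEXICON = Map.of(
        "daka", "NOUN",
        "jom", "VERB",
        "in", "PRONOUN"
    );

    // Return the stored part of speech, or UNKNOWN for out-of-lexicon words.
    public static String tag(String word) {
        return LEXICON.getOrDefault(word, "UNKNOWN");
    }
}
```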

The functional block diagram of morphological analysis is shown in Figure 1. Here I have taken Santali text as input. The Santali text is fed to the tokenization phase, where I have used the methods of Java's StringTokenizer class to tokenize the text and remove the punctuation marks. These methods are:

• String nextToken(String delimiters)
• String nextToken()
• boolean hasMoreTokens()
• int countTokens()
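As a sketch, these methods can be combined as follows; the input sentence is an illustrative romanized placeholder, and the delimiter set (whitespace plus common punctuation) is an assumption:

```java
import java.util.StringTokenizer;

public class TokenizeDemo {
    public static void main(String[] args) {
        // Placeholder romanized sentence, not verified Santali text.
        String text = "noa katha do napay kana.";
        // Delimiters: whitespace plus the punctuation marks to be removed.
        StringTokenizer st = new StringTokenizer(text, " \t\n.,;:?!");
        System.out.println("Tokens: " + st.countTokens());
        while (st.hasMoreTokens()) {
            System.out.println(st.nextToken());
        }
    }
}
```

Because the punctuation characters are listed as delimiters, the trailing full stop is dropped automatically and only the word tokens are emitted.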

In the stop word elimination step I have used Algorithm-1 to identify and remove stop words such as ar, ikHan, hoN, arhoN, etc.
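This stop word elimination step can be sketched in Java as follows; the stop list holds the words named above, and any token list passed in is an illustrative placeholder:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StopWordFilter {
    // Stop words named in the text: ar, ikHan, hoN, arhoN.
    private static final List<String> STOPLIST =
            Arrays.asList("ar", "ikHan", "hoN", "arhoN");

    // Copy every token that is not a stop word into the pure-token list.
    public static List<String> removeStopWords(List<String> tokens) {
        List<String> pureTok = new ArrayList<>();
        for (String token : tokens) {
            if (!STOPLIST.contains(token)) {
                pureTok.add(token);
            }
        }
        return pureTok;
    }
}
```

The size of the returned list plays the role of COUNTPURE in the algorithm: it is the number of pure tokens that survive filtering.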


Figure 1. Functional Block Diagram of Morphological Analysis Algorithm-1 STOP(TOKENLIST,STOPLIST,TEMP,T,S,TEMP,PURETOK,COUNTPURE) [Here TOKENLIST is an array that contains T numbers of Santali text tokens and STOPLIST is an array that contains S numbers of stop words. TEMP is the counter. This is the algorithm that identifies and removes the stop words from TOKENLIST and stores these pure tokens without stop words in an array PURETOK and counts COUNTPURE the number of pure tokens.] 1.

[Initialize Counter]

Set TEMP:=0, COUNTPURE:=0. 2. [Compaire the text tokens of TOKENLIST with the stop words of stop word list STOPLIST] Repeat while (i