Commonsense Scene Semantics for Cognitive Robotics
Towards Grounding Embodied Visuo-Locomotive Interactions

Jakob Suchan¹ and Mehul Bhatt¹,²

arXiv:1709.05293v1 [cs.RO] 15 Sep 2017

¹ EASE CRC: Everyday Activity Science and Engineering (http://ease-crc.org), University of Bremen, Germany; Spatial Reasoning, www.spatial-reasoning.com

² Machine Perception and Interaction Lab (https://mpi.aass.oru.se), Centre for Applied Autonomous Sensor Systems (AASS), Örebro University, Sweden

Abstract

We present a commonsense, qualitative model for the semantic grounding of embodied visuo-spatial and locomotive interactions. The key contribution is an integrative methodology combining low-level visual processing with high-level, human-centred representations of space and motion rooted in artificial intelligence. We demonstrate practical applicability with examples involving object interactions and indoor movement.

1. Introduction

Practical robotic technologies and autonomous systems in real-world settings are confronted with a range of situational and context-dependent challenges from the viewpoints of perception & sensemaking, high-level planning & decision-making, human-machine interaction, etc. Many research communities and sub-fields addressing this range of challenges from different perspectives have flourished in recent years: computer vision, artificial intelligence, cognitive systems, human-machine interaction, cognitive science, multi-agent systems, and control & systems engineering, to name a few. Driven by the need to achieve contextualised practical deployability in real-world non-mundane everyday situations involving living beings, there is now a clearly recognised need for integrative research that combines state of the art methods from these respective research areas. Towards this, the research presented in this paper addresses commonsense visuo-spatial scene interpretation in indoor robotics settings at the interface of vision, AI, and spatial cognition. The focus of this research is on activities of everyday living involving people, robots, movement, and human-machine interaction.

Interpreting Embodied Interaction: On Grounded Visuo-Locomotive Perception

Visuo-locomotive perception denotes the capability to develop a conceptual mental model (e.g., consisting of abstract, commonsense representations) emanating from multi-sensory perceptions during embodied interactions and movement in a real world populated by static and dynamic entities and artefacts (e.g., moving objects, furniture). Visuo-locomotive perception in the context of cognitive robotics technologies and machine perception & interaction systems involves a complex interplay of high-level cognitive processes. These could, for instance, encompass capabilities such as explainable reasoning, learning, concept formation, and sensory-motor control; from a technical standpoint of AI technologies, this requires the mediation of commonsense formalisms for reasoning about space, events, actions, change, and interaction [5]. With visuo-locomotive cognition as the context, consider the task of semantic interpretation of multi-modal perceptual data (e.g., about human behaviour, the environment and its affordances), with objectives ranging from knowledge acquisition and data analyses to hypothesis formation, structured relational learning, learning by demonstration, etc. Our research focusses on the processing and semantic interpretation of dynamic visuo-spatial imagery, with a particular emphasis on the ability to abstract, reason, and learn commonsense knowledge that is semantically founded in qualitative spatial, temporal, and spatio-temporal relations and motion patterns. We propose that an ontological characterisation of human activities, e.g., encompassing (embodied) spatio-temporal relations, serves as a bridge between high-level conceptual categories (e.g., pertaining to human-object interactions) on the one hand, and low-level / quantitative sensory-motor data on the other.

Figure 1: A Sample Activity – "Making a cup of tea" (egocentric view from a head-mounted RGB-D capture device)

Commonsense Scene Semantics: Integrated Vision and Knowledge Representation

The starting point of the work presented in this paper is in formal commonsense representation and reasoning techniques developed in the field of Artificial Intelligence. Here, we address the key question:

How can everyday embodied activities (involving interaction and movement) be formally represented in terms of spatio-temporal relations and movement patterns (augmented by context-dependent knowledge about objects and environments) such that the representation enables robotic agents to execute everyday interaction tasks (involving manipulation and movement) appropriately?

We particularly focus on an ontological and formal characterisation of space and motion from a human-centered, commonsense formal modeling and computational reasoning viewpoint, i.e., space-time as it is interpreted within the AI sub-disciplines of knowledge representation and reasoning and commonsense reasoning, within spatial cognition & computation, and more broadly, within spatial information theory [1, 5, 6, 8, 9, 24]. We build on state of the art methods for visual processing of RGB-D and point-cloud data for sensing the environment and the people within it. In focus are 3D-SLAM data for extracting floor-plan structure based on plane detection in point-clouds, and people detection and skeleton tracking using Microsoft Kinect v2. Furthermore, we combine robot self-localisation and people tracking to localise observed people interactions in the global space of the environmental map.

We emphasise that the ontological and representational aspects of our research are strongly driven by computational considerations focussing on: (a) developing general methods and tools for commonsense reasoning about space and motion, from the viewpoint of commonsense cognitive robotics in general and human-object interactions occurring in the context of everyday activities in particular; and (b) founded on the established ontological model, developing models, algorithms and tools for reasoning about space and motion, and making them available as part of cognitive robotics platforms and architectures such as ROS. The running examples presented in the paper highlight the semantic question-answering capabilities that are directly possible based on our commonsense model in the context of constraint logic programming.

2. Commonsense, Space, Motion

Commonsense spatio-temporal relations and patterns (e.g. left-of, touching, part-of, during, approaching, collision) offer a human-centered and cognitively adequate formalism for logic-based automated reasoning about embodied spatio-temporal interactions involved in everyday activities such as flipping a pancake, grasping a cup, or opening a tea box [8, 26, 27, 30]. Consider Fig. 1, consisting of a sample human activity, "making a cup of tea", as captured from an egocentric viewpoint with a head-mounted RGB-D capture device. From a commonsense viewpoint, the high-level steps typically involved in this activity, e.g., opening a tea-box, removing a tea-bag from the box, and putting the tea-bag inside a tea-cup filled with water while holding the tea-cup, each qualitatively correspond to high-level spatial and temporal relationships between the agent and the other involved objects.

Table 1: Commonsense Spatio-Temporal Relations for Abstracting Space and Motion in Everyday Human Interaction

Spatial Domain (QS) | Formalisms | Spatial Relations (R) | Entities (E)
Mereotopology | RCC-5, RCC-8 [23] | disconnected (dc), external contact (ec), partial overlap (po), tangential proper part (tpp), non-tangential proper part (ntpp), proper part (pp), part of (p), discrete (dr), overlap (o), contact (c) | arbitrary rectangles, circles, polygons, cuboids, spheres
Mereotopology | Rectangle & Block algebra [16] | proceeds, meets, overlaps, starts, during, finishes, equals | axis-aligned rectangles and cuboids
Orientation | LR [25] | left, right, collinear, front, back, on | 2D point, circle, polygon with 2D line
Orientation | OPRA [21] | facing towards, facing away, same direction, opposite direction | oriented points, 2D/3D vectors
Distance, Size | QDC [19] | adjacent, near, far, smaller, equi-sized, larger | rectangles, circles, polygons, cuboids, spheres
Dynamics, Motion | Space-Time Histories [17, 18] | moving: towards, away, parallel; growing / shrinking: vertically, horizontally; passing: in front, behind; splitting / merging; rotation: left, right, up, down, clockwise, counter-clockwise | rectangles, circles, polygons, cuboids, spheres

For instance, one may most easily identify relationships of contact and containment that hold across specific time-intervals. Here, parametrised manipulation or control actions (Θ1(θ), ..., Θn(θ)) effectuate state transitions, which may be qualitatively modelled as changes in topological relationships amongst the involved domain entities. Embodied interactions, such as those involved in Fig. 1, may be grounded using a holistic model for the commonsense, qualitative representation of space, time, and motion (Table 1). In general, qualitative, multi-modal, multi-domain¹ representations of spatial, temporal, and spatio-temporal relations and patterns, and their mutual transitions, can provide a mapping and mediating level between human-understandable natural language instructions and formal narrative semantics on the one hand [8, 13], and symbol grounding, quantitative trajectories, and low-level primitives for robot motion control on the other. By spatio-linguistically grounding complex sensory-motor trajectory data (e.g., from human-behaviour studies) to a formal framework of space and motion, generalized (activity-based) qualitative reasoning about dynamic scenes, spatial relations, and motion trajectories denoting single and multi-object path & motion predicates can be supported [14]. For instance, such predicates can be abstracted within a region-based 4D space-time framework [3, 4, 18], object interactions [10, 11], and spatio-temporal narrative knowledge [12, 13, 29]. An adequate qualitative spatio-temporal representation can therefore connect with low-level constraint-based movement control systems of robots [2], and also help ground symbolic descriptions of actions and objects to be manipulated (e.g., natural language instructions such as cooking recipes [28]) in the robot's perception.

¹ Multi-modal in this context refers to more than one aspect of space, e.g., topology, orientation, direction, distance, shape. Multi-domain denotes a mixed domain ontology involving points, line-segments, polygons, and regions of space, time, and space-time [18]. Refer to Table 1.
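To give a flavour of how qualitative relations such as those listed in Table 1 can be grounded in metric geometry, the following is a minimal Prolog sketch that decides a few coarse topological relations for axis-aligned rectangles. The predicate and term names are illustrative assumptions for this paper's discussion, not the interface of CLP(QS) [7].

% Illustrative sketch: coarse topological relations (cf. Table 1)
% between axis-aligned rectangles rect(Xmin, Ymin, Xmax, Ymax).
% Predicate names are examples, not the CLP(QS) API.

% discrete (dr): the rectangles share no point.
topology(dr, rect(X1a, Y1a, X1b, Y1b), rect(X2a, Y2a, X2b, Y2b)) :-
    ( X1b < X2a ; X2b < X1a ; Y1b < Y2a ; Y2b < Y1a ), !.

% non-tangential proper part (ntpp): the first lies strictly inside the second.
topology(ntpp, rect(X1a, Y1a, X1b, Y1b), rect(X2a, Y2a, X2b, Y2b)) :-
    X1a > X2a, Y1a > Y2a, X1b < X2b, Y1b < Y2b, !.

% otherwise the rectangles overlap in some way (po, ec, tpp, ...).
topology(o, _, _).

A query such as ?- topology(R, rect(0,0,1,1), rect(2,2,3,3)). then yields R = dr; richer calculi (full RCC-8, LR, OPRA) refine this coarse case analysis.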

3. Visuo-Locomotive Interactions: A Commonsense Characterisation

3.1. Objects and Interactions in Space-Time

Activities and interactions are described based on visuo-spatial domain-objects O = {o1, o2, ..., oi} representing the visual elements in the scene, e.g., people and objects. The Qualitative Spatio-Temporal Ontology (QS) is characterised by the basic spatial and temporal entities (E) that can be used as abstract representations of domain-objects, and the relational spatio-temporal structure (R) that characterises the qualitative spatio-temporal relationships amongst the entities in (E). Towards this, domain-objects (O) are represented by their spatial and temporal properties, and abstracted using the following basic spatial entities:

– points are triplets of reals (x, y, z);
– oriented-points consist of a point p and a vector v;
– line-segments consist of two points p1, p2, denoting the start and the end point of the line-segment;
– poly-lines consist of a list of vertices (points) p1, ..., pn, such that the line connecting the vertices is non-self-intersecting;
– polygons consist of a list of vertices (points) p1, ..., pn (spatially ordered counter-clockwise), such that the boundary is non-self-intersecting;

and the following temporal entities:

– time-points are reals t;
– time-intervals are pairs of reals (t1, t2), denoting the start and the end point of the interval.

The dynamics of human activities are represented by 4-dimensional regions in space-time (sth), representing people and object dynamics by a set of spatial entities in time, i.e. STH = (εt1, εt2, εt3, ..., εtn), where εt1 to εtn denote the spatial primitive representing the object o at the time points t1 to tn.
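As one possible (hypothetical) Prolog encoding of such a space-time history, the sketch below stores a list of time-stamped spatial entities together with a lookup helper; the term names are illustrative and not the paper's implementation.

% Illustrative encoding of a space-time history (sth) as a list of
% time-stamped spatial entities; term names are assumptions.
sth(object(cup),
    [entity(t1, point(0.20, 0.50, 1.90)),
     entity(t2, point(0.22, 0.48, 1.85)),
     entity(t3, point(0.25, 0.45, 1.80))]).

% entity_at(+Obj, +Time, -Entity): the spatial primitive of Obj at Time.
entity_at(Obj, Time, Entity) :-
    sth(Obj, History),
    member(entity(Time, Entity), History).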

Declarative Model of Visuo-Locomotive Interactions. Based on the qualitative spatio-temporal ontology (QS), human interactions in the environment are represented using a declarative model of visuo-locomotive interactions, encompassing the dynamics of humans, objects, and the environmental characteristics.

• Human Body Pose: The human body is represented using a declarative model of the body-structure (see Fig. 2); within this model we ground the human body-pose in 3D data of skeleton joints and body-parts obtained from RGB-D sensing.

• Semantics of the Environment: The semantic structure of the environment is represented using a topological map corresponding to the floor-plan of the environment, extracted from 3D point-clouds obtained from 3D-SLAM data.

Using these models, visuo-locomotive interactions involving humans, robots, and objects can be declaratively abstracted by the spatio-temporal characteristics of the involved domain-objects, and may be used for high-level interpretation and reasoning about scene dynamics.

person(full_body, [upper_body, lower_body]).
person(upper_body, [head, left_arm, right_arm, ...]).
...
body_part(left_upper_arm,
    joint(shoulder_left), joint(elbow_left)).
body_part(left_forearm,
    joint(elbow_left), joint(wrist_left)).
...
joint(spine_base, id(0)).
joint(spine_mid, id(1)).
joint(neck, id(2)).
joint(head, id(3)).
...
joint(thumb_right, id(24)).

Figure 2: Declarative Model of Human-Body Posture

3.2. Spatio-Temporal Characteristics of Human Activities

The space-time histories (sth) used to abstract the dynamics of human activities are based on basic spatio-temporal entities obtained from the sensed data, corresponding to the declarative model of visuo-locomotive interactions. To extract these entities, we define functions for the specific spatio-temporal properties of domain-objects, i.e., the following functions are used for static spatial properties.

– position: O × T → R × R × R, gives the 3D position (x, y, z) of an object o at a time-point t;
– size: O × T → R, gives the size of an object o at a time-point t;
– distance: O × O × T → R, gives the distance between two objects o1 and o2 at a time-point t;
– angle: O × O × T → R, gives the angle between two objects o1 and o2 at a time-point t;

To account for changes in the spatial properties of domain-objects, we use the following functions for dynamic spatio-temporal properties.

– movement velocity: O × T × T → R, gives the amount of movement of an object o between two time-points t1 and t2;
– movement direction: O × T × T → R, gives the direction of movement of an object o between two time-points t1 and t2;
– rotation: O × T × T → R, gives the rotation of an object o between two time-points t1 and t2;

These functions are used to obtain basic spatial entities, e.g. points, lines, regions, from the sensor data. Spatio-temporal relationships (R) between the basic entities in E may be characterised with respect to arbitrary spatial and spatio-temporal domains such as mereotopology, orientation, distance, size, motion, and rotation (see Table 1 for a list of considered spatio-temporal abstractions). E.g., let D1, ..., Dn be spatial domains (e.g. the domain of axis-aligned rectangles). A spatial relation r of arity n (0 < n) is defined as:

r ⊆ D1 × · · · × Dn.

The spatio-temporal dynamics of the scene can then be represented based on the relations holding between the objects in the scene, and the changes within them.
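For illustration, the static and dynamic property functions above could be realised over timestamped position facts as in the following sketch; the position/3 facts and predicate names are hypothetical stand-ins for values extracted from the sensor stream, not the paper's implementation.

% Hypothetical position facts: position(Object, point(X,Y,Z), TimePoint).
position(object(bread), point(0.22, 0.50, 1.92), t1).
position(object(bread), point(0.22, 0.49, 1.75), t2).
position(body_part(right_hand, person(id(1))), point(0.40, 0.30, 1.60), t1).

% distance(+O1, +O2, +T, -D): Euclidean distance between two objects at T.
distance(O1, O2, T, D) :-
    position(O1, point(X1, Y1, Z1), T),
    position(O2, point(X2, Y2, Z2), T),
    D is sqrt((X1-X2)^2 + (Y1-Y2)^2 + (Z1-Z2)^2).

% movement_velocity(+O, +T1, +T2, -V): displacement of O between T1 and T2
% (here the raw displacement is returned for simplicity; a per-unit-time
% value would additionally divide by the numeric time difference).
movement_velocity(O, T1, T2, V) :-
    position(O, point(X1, Y1, Z1), T1),
    position(O, point(X2, Y2, Z2), T2),
    V is sqrt((X1-X2)^2 + (Y1-Y2)^2 + (Z1-Z2)^2).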

[Figure 3 shows space-time (space vs. time) history diagrams for the motion patterns moving(o), stationary(o), growing(o), shrinking(o), curved(o), cyclic(o), parallel(o1, o2), moving into(o1, o2), moving out(o1, o2), splitting(o1, o2), merging(o1, o2), inside(o1, o2), overlapping(o1, o2), discrete(o1, o2), and attached(o1, o2).]

Figure 3: Commonsense Spatial Reasoning with Space-Time Histories Representing Dynamics in Everyday Human Activities

Interaction (Θ) | Description
pick_up(P, O) | a person P picks up an object O.
put_down(P, O) | a person P puts down an object O.
reach_for(P, O) | a person P reaches for an object O.
pass_over(P1, P2, O) | a person P1 passes an object O to another person P2.
moves_into(P, FS) | a person P enters a floor-plan structure FS.
passes(P, FS) | a person P passes through a floor-plan structure FS.

Table 2: Sample Interactions Involved in Everyday Human Activities: Human-Object Interactions and People's Locomotive Behaviour

Spatio-temporal fluents are used to describe properties of the world, i.e. the predicates holds-at(φ, t) and holds-in(φ, δ) denote that the fluent φ holds at time point t, resp. in time interval δ. Fluents are determined by the data from the depth sensing device and represent qualitative relations between domain-objects, i.e. spatio-temporal fluents denote that a relation r ∈ R holds between basic spatial entities ε of a space-time history at a time-point t. Dynamics of the domain are represented based on changes in spatio-temporal fluents (see Fig. 3); e.g., two objects approaching each other can be defined as follows.

holds-in(approaching(oi, oj), δ) ⊃ during(ti, δ) ∧ during(tj, δ) ∧ before(ti, tj) ∧ (distance(oi, oj, ti) > distance(oi, oj, tj)).   (1)

Interactions. Interactions Θ = {θ1, θ2, ..., θi} describe processes that change the spatio-temporal configuration of objects in the scene at a specific time; these are defined by the involved spatio-temporal dynamics in terms of changes in the status of space-time histories caused by the interaction, i.e. the description consists of (dynamic) spatio-temporal relations of the involved space-time histories before, during, and after the interaction (see Table 2 for exemplary interactions). We use occurs-at(θ, t) and occurs-in(θ, δ) to denote that an interaction θ occurred at a time point t or in an interval δ; e.g., a person reaching for an object can be defined as follows.

occurs-in(reach_for(oi, oj), δ) ⊃ person(oi) ∧ holds-in(approaching(body_part(hand, oi), oj), δi) ∧ holds-in(touches(body_part(hand, oi), oj), δj) ∧ meets(δi, δj) ∧ starts(δi, δ) ∧ ends(δj, δ).   (2)

These definitions can be used to represent and reason about interactions involving people's movement in the environment, as well as fine-grained activities based on body-pose data.
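A minimal Prolog rendering of definitions (1) and (2), reading the definitional implications as rules, could look as follows; the ground facts are hypothetical stand-ins for sensor-derived values, and interval handling is simplified to the two sampled time points.

% Hypothetical ground data (sampled from the sensor stream):
person(person(id(1))).
distance(body_part(hand, person(id(1))), object(bread), t1, 0.40).
distance(body_part(hand, person(id(1))), object(bread), t2, 0.05).

% Definition (1): two objects are approaching within an interval if their
% distance decreases between two sampled time points of that interval.
holds_in(approaching(Oi, Oj), interval(Ti, Tj)) :-
    distance(Oi, Oj, Ti, D1),
    distance(Oi, Oj, Tj, D2),
    D1 > D2.
% A touching fluent observed directly from the depth data (assumed fact).
holds_in(touching(body_part(hand, person(id(1))), object(bread)),
         interval(t2, t3)).

% Definition (2), simplified: reaching for an object is approaching
% followed by touching over meeting sub-intervals.
occurs_in(reach_for(Oi, Oj), interval(T1, T3)) :-
    person(Oi),
    holds_in(approaching(body_part(hand, Oi), Oj), interval(T1, T2)),
    holds_in(touching(body_part(hand, Oi), Oj), interval(T2, T3)).

With the facts above, ?- occurs_in(reach_for(P, O), I). yields P = person(id(1)), O = object(bread), I = interval(t1, t3).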

4. Application: Grounding Visuo-Locomotive Interactions

We demonstrate the above model for grounding everyday activities in perceptual data obtained from RGB-D sensing. The model has been implemented within (Prolog-based) constraint logic programming, based on formalisations of qualitative space in CLP(QS) [7].

Figure 4: RGB-D data of Human Activities with Corresponding Skeleton Data

Human Activity Data

RGB-D Data (video, depth, body skeleton): We collect data using Microsoft Kinect v2, which provides RGB and depth data. The RGB stream has a resolution of 1920×1080 pixels at 30 Hz and the depth sensor has a resolution of 512×424 pixels at 30 Hz. Skeleton tracking can track up to 6 persons with 25 joints for each person. Furthermore, we use the point-cloud data to detect objects on the table using tabletop object segmentation, based on plane detection to detect the tabletop and clustering of points above the table. For the purposes of this paper, simple colour measures are used to distinguish the objects in the scene.

3D SLAM Data (3D maps, point-clouds, floor-plan structure): We collect 3D-SLAM data using Real-Time Appearance-Based Mapping (RTAB-Map) [20], which directly integrates with the Robot Operating System (ROS) [22] for self-localisation and mapping under real-time constraints. In particular, for the semantic grounding presented in this paper, we use the point-cloud data of the 3D maps obtained from RTAB-Map to extract floor-plan structures by 1) detection of vertical planes as candidate wall-segments, 2) pre-processing of the wall-segments using clustering and line-fitting, and 3) extraction of room structures based on the extracted wall-segments and lines.

Plane Detection. Planes in the point-cloud data are detected using a region-growing approach based on the normals of points. To extract candidate wall-segments, we select planes that are likely to be part of a wall, i.e., vertical and sufficiently high or connected to the ceiling. These planes are then abstracted as geometrical entities, specified by their position, size, and orientation (given by the normal), which are used for further analysis.

Clustering and Line-Fitting. The detected wall-segments are grouped in a two-stage clustering process using density-based clustering (DBSCAN) [15]: in the first step we cluster wall-segments based on their 2D orientation; in the second step we align all wall-segments based on the average orientation of the cluster they are in and cluster the wall-segments based on the distance between the parallel lines determined by the wall-segments. We use least-squares linear regression to fit lines to the resulting wall clusters, which provide the structure of the environment.

Extracting Room Structures. Candidate structures for rectangular rooms and corridors are determined by the intersection points of the lines fitted to the wall clusters, by considering each intersection point as a possible corner of a room or a corridor. The actual rooms and corridors are then selected based on the corresponding wall segments, projected onto the lines.
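As a purely illustrative stand-in for the first (orientation-based) clustering stage, the following sketch groups hypothetical wall_segment/3 facts by a simple angular-distance threshold; the fact layout and threshold are assumptions, and the actual system uses DBSCAN as described above.

% Hypothetical output of the plane-detection step:
% wall_segment(Id, OrientationDeg, Centre).
wall_segment(w1,   2.0, point(1.0, 0.0)).
wall_segment(w2, 178.5, point(4.0, 0.1)).
wall_segment(w3,  91.0, point(0.0, 3.0)).

% Angular distance between two (undirected) wall orientations in degrees.
orientation_distance(O1, O2, D) :-
    D0 is abs(O1 - O2),
    D1 is 360 - D0,
    DMin is min(D0, D1),
    D is min(DMin, abs(180 - DMin)).   % undirected lines: 0 and 180 coincide

% Two segments fall into the same orientation cluster if their (undirected)
% orientations differ by less than a threshold (here 10 degrees).
same_orientation_cluster(S1, S2) :-
    wall_segment(S1, O1, _),
    wall_segment(S2, O2, _),
    S1 \= S2,
    orientation_distance(O1, O2, D),
    D < 10.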

Ex 1. Human-Object Interactions

Sample Activity: "Making a Sandwich". The activity of making a sandwich is characterised with respect to the interactions between a human and its environment, i.e. the objects the human uses in the process of preparing the sandwich. Each of these interactions is defined by its spatio-temporal characteristics, in terms of changes in the spatial arrangement of the scene (as described in Sec. 3). As a result, we obtain a sequence of interactions performed within the track of the particular instance of the activity, grounded in the spatio-temporal dynamics of the scenario. As an example, consider the sequence depicted in Fig. 4; the interactions in this sequence can be described as follows: "Person1 reaches for the bread, picks up a slice of bread, and moves the hand together with the bread back to the chopping board."

[Figure 5 shows the floor-plan extraction pipeline on 3D SLAM data: RGBD SLAM input data, plane detection, extraction of 2D wall-segments, orientation clustering, points in each cluster, clusters and average orientation of clusters, aligned distance clustering, line-fitting, and extraction of room structure (rooms, corridors, and their connections) based on line intersections and wall-segments.]

Figure 5: Extracting Floor-Plan Semantics from 3D SLAM Data

The data we obtain from the RGB-D sensor consists of 3D positions of skeleton joints and tabletop objects for each time-point.

at(joint(id(0), person(id(1))),
   tracking_status(2),
   pos_3d(point(0.230083,-0.0138919,2.05633)),
   time_point(2168577589)).
at(joint(id(1), person(id(1))),
   tracking_status(2),
   pos_3d(point(0.228348,0.275798,1.98048)),
   time_point(2168577589)).
...
at(object(id(0)), type(bread),
   pos_3d(point(0.223528,0.500194,1.92038)),
   time_point(2168577589)).
...

Grounded Interaction Sequence. Based on the sensed body-pose data and the detected objects, a sequence of interactions can be queried from the example sequences using the interactive query answering mode of Prolog.

?- grounded_interaction(
     occurs_in(Interaction, Interval), Grounding).

This results in all interactions identified in the example sequence and their respective grounding with respect to the spatio-temporal dynamics constituting the interaction:

Interaction = reach_for(person(id(1)), object(bread)),
Interval = interval(t1, t3),
Grounding =
  [holds_in(
     approaching(
       body_part(right_hand, person(id(1))), object(bread)),
     interval(t1,t2)),
   holds_in(
     touching(
       body_part(right_hand, person(id(1))), object(bread)),
     interval(t2,t3))];

Interaction = pick_up(person(id(1)), object(bread)),
Interval = interval(t4, t6),
Grounding =
  [occurs_at(
     grasp(
       body_part(right_hand, person(id(1))), object(bread)),
     timepoint(t4)),
   holds_in(
     attached(
       body_part(right_hand, person(id(1))), object(bread)),
     interval(t5,t8)),
   holds_in(
     move_up(body_part(right_hand, person(id(1)))),
     interval(t5,t6))];
...

In particular, the interaction reach_for(person(id(1)), object(bread)), occurring between time-points t1 and t3, is composed of the spatio-temporal pattern approaching, stating that the right hand of person 1 is approaching the bread during time-interval t1 to t2, and the pattern touching, stating that the right hand of person 1 is touching the bread during time-interval t2 to t3. Similarly, the interaction pick_up(person(id(1)), object(bread)) is composed of grasping, attachment, and upwards movement, with the difference that grasping itself is an interaction that can be further grounded in movement dynamics. This kind of declarative grounding can be used, e.g., for relational learning by demonstration, etc.

Ex 2. Visuo-Locomotive Interactions

Sample Activity: "Indoor Robot Navigation". Robots that have to behave and navigate in environments populated by humans have to understand human activities and interactions, and have to behave accordingly. In this context, high-level abstractions of human everyday activities and the semantics of the environment can be used to guide robot decision-making, in order to account for humans moving in the environment.

As an example, consider a robot that has to move from room1 to room2; during this movement the robot has to pass through the corridor corridor2. The structure of the environment is represented as follows:

floorplan_structure(id(room1), type(room)).
geom(id(room1), polygon([
    point(33.04074928940648, 47.78135448806469),
    point(41.95226523762394, 53.36407052934558),
    point(44.20648040233633, 49.48777998147939),
    point(35.17204967399538, 43.83961776145081)
])).

floorplan_structure(id(corr1), type(corridor)).
geom(id(corr1), polygon([
    point(34.17204967399538, 42.83961776145081),
    ... ])).

floorplan_structure(id(room2), type(room)).
geom(id(room2), polygon([ ... ])).
...

Based on the extracted semantic floor-plan structure and the detected people in the environment, the robot can make decisions using high-level navigation rules, e.g. defining how to behave in narrow passages, such as a corridor, when there are people in the corridor. E.g., we can define a rule stating that a robot can only enter a corridor (i.e., the robot can execute the control action enter(Floorplan_Structure, T)) if there is no person in the corridor, or if the person in the corridor is passing through it in the same direction as the robot. For this example we use a simple action language for planning the actions of the robot; in this context the predicate poss_at(θ, t) defines the spatio-temporal configuration in which the control action θ can be executed.

poss_at(
    enter(fp_struct(id(FS_ID), type(corridor))),
    timepoint(T)) :-
    not(holds_at(
        inside(person(P_ID), fp_struct(FS_ID)),
        timepoint(T))).

poss_at(
    enter(fp_struct(id(FS_ID), type(corridor))),
    timepoint(T)) :-
    holds_at(
        inside(person(P_ID), fp_struct(id(FS_ID), _)),
        timepoint(T)),
    occurs_in(
        passing(person(P_ID), fp_struct(id(FS_ID), _), Dir1),
        interval(I)),
    time(during(T, I)),
    trajectory(person(P_ID), P_Path, interval(I)),
    planed_trajectory(R_Path),
    movement_direction(P_Path, Dir1),
    movement_direction(R_Path, Dir2),
    same_direction(Dir1, Dir2).

The above rules state that the robot can only execute the control action enter(fp_struct(id(FS_ID), type(corridor))) if one of the two rules is true. The first rule simply states that the robot can enter the corridor if there is no person in the corridor. The second rule states that if there is a person inside the corridor, and the person is passing through the corridor, the robot can enter the corridor if the trajectory of the person passing through the corridor and the planned path of the robot traverse the corridor in the same direction.

In this way, high-level semantic behaviour descriptions can be used for guiding low-level robot controls, such as path planning.
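To make the inside/2 fluent used by these rules concrete in a toy setting, the sketch below grounds it in the geom/2 polygon facts, approximating containment by the polygon's axis-aligned bounding box and assuming hypothetical position/3 facts for tracked people; it is a simplification for illustration, not the paper's implementation, and the term shape follows the first rule above. The controller would then check the precondition with a query of the form ?- poss_at(enter(fp_struct(id(FS_ID), type(corridor))), timepoint(T)).

% Hypothetical person position facts: position(Person, point(X, Y), TimePoint).
position(person(id(7)), point(35.1, 44.0), t10).

% A person is inside a floor-plan structure at time T if its 2D position
% lies within the structure's polygon; containment is approximated here
% by the polygon's axis-aligned bounding box (a simplification that works
% with complete geom/2 facts such as the one given for room1 above).
holds_at(inside(person(P_ID), fp_struct(FS_ID)), timepoint(T)) :-
    position(person(P_ID), point(X, Y), T),
    geom(id(FS_ID), polygon(Vertices)),
    findall(VX, member(point(VX, _), Vertices), Xs),
    findall(VY, member(point(_, VY), Vertices), Ys),
    min_list(Xs, Xmin), max_list(Xs, Xmax),
    min_list(Ys, Ymin), max_list(Ys, Ymax),
    X >= Xmin, X =< Xmax, Y >= Ymin, Y =< Ymax.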

5. Summary and Outlook

Visuo-locomotive sensemaking for practical cognitive robotics in contextualised settings with humans, mobility, and human-machine interaction is a complex endeavour requiring integrative methodologies combining the state of the art from several research areas such as vision, AI, HCI, and engineering. Our research emphasises the particular merit of combining visual processing with semantic knowledge representation and reasoning techniques rooted in artificial intelligence, particularly commonsense reasoning. We have presented a declarative commonsense model for grounding embodied visuo-locomotive interactions; the proposed model, consisting of a formal characterisation of space, time, space-time, and motion patterns, is geared towards knowledge representation and reasoning capabilities such as commonsense abstraction, learning, reasoning, and embodied simulation. With a particular focus on the representation of space-time histories and motion patterns, the model is illustrated with select RGB-D datasets corresponding to representative activities from a larger dataset of everyday activities. Immediate next steps involve integration with state of the art robot control platforms such as ROS; this will be accomplished via integration into the ExpCog commonsense cognitive robotics platform for experimental / simulation purposes, and within openEASE as a state of the art platform for cognition-enabled control of real robots.²

Acknowledgements

This research is partially funded by the German Research Foundation (DFG) via the Collaborative Research Center (CRC) EASE - Everyday Activity Science and Engineering (http://ease-crc.org). We also acknowledge the support of Omar Moussa, Thomas Hudkovic, and Vijayanta Jain in the preparation of parts of the overall activity dataset.

² ExpCog - http://www.commonsenserobotics.org; openEASE - http://ease.informatik.uni-bremen.de/openease/

References

[1] M. Aiello, I. E. Pratt-Hartmann, and J. F. v. Benthem. Handbook of Spatial Logics. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2007.
[2] G. Bartels, I. Kresse, and M. Beetz. Constraint-based movement representation grounded in geometric features. In Proceedings of the IEEE-RAS International Conference on Humanoid Robots, Atlanta, Georgia, USA, October 15-17, 2013.
[3] B. Bennett, A. G. Cohn, P. Torrini, and S. M. Hazarika. Describing rigid body motions in a qualitative theory of spatial regions. In Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, pages 503-509, 2000.
[4] B. Bennett, A. G. Cohn, P. Torrini, and S. M. Hazarika. A foundation for region-based qualitative geometry. In Proceedings of the 14th European Conference on Artificial Intelligence, pages 204-208, 2000.
[5] M. Bhatt. Reasoning about space, actions and change: A paradigm for applications of spatial reasoning. In Qualitative Spatial Representation and Reasoning: Trends and Future Directions. IGI Global, USA, 2012.
[6] M. Bhatt, H. Guesgen, S. Wölfl, and S. Hazarika. Qualitative spatial and temporal reasoning: Emerging applications, trends, and directions. Spatial Cognition & Computation, 11(1):1-14, 2011.
[7] M. Bhatt, J. H. Lee, and C. P. L. Schultz. CLP(QS): A declarative spatial reasoning framework. In Spatial Information Theory - 10th International Conference, COSIT 2011, Belfast, ME, USA, September 12-16, 2011, Proceedings, pages 210-230, 2011.
[8] M. Bhatt, C. Schultz, and C. Freksa. The 'Space' in Spatial Assistance Systems: Conception, Formalisation and Computation. In T. Tenbrink, J. Wiener, and C. Claramunt, editors, Representing Space in Cognition: Interrelations of Behavior, Language, and Formal Models. Series: Explorations in Language and Space. Oxford University Press, 2013.
[9] A. G. Cohn and J. Renz. Qualitative spatial reasoning. In F. van Harmelen, V. Lifschitz, and B. Porter, editors, Handbook of Knowledge Representation. Elsevier, 2007.
[10] E. Davis. Pouring liquids: A study in commonsense physical reasoning. Artif. Intell., 172(12-13):1540-1578, 2008.
[11] E. Davis. How does a box work? A study in the qualitative dynamics of solid objects. Artif. Intell., 175(1):299-345, 2011.
[12] E. Davis. Qualitative spatial reasoning in interpreting text and narrative. Spatial Cognition & Computation, 13(4):264-294, 2013.
[13] M. Eppe and M. Bhatt. Narrative based postdictive reasoning for cognitive robotics. In COMMONSENSE 2013: 11th International Symposium on Logical Formalizations of Commonsense Reasoning, 2013.
[14] C. Eschenbach and K. Schill. Studying spatial cognition - a report on the DFG workshop on "The Representation of Motion". KI, 13(3):57-58, 1999.
[15] M. Ester, H. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In E. Simoudis, J. Han, and U. M. Fayyad, editors, Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, Oregon, USA, pages 226-231. AAAI Press, 1996.
[16] H. W. Guesgen. Spatial reasoning based on Allen's temporal logic. Technical Report TR-89-049, International Computer Science Institute, Berkeley, 1989.
[17] P. J. Hayes. Naive physics I: Ontology for liquids. In J. R. Hobbs and R. C. Moore, editors, Formal Theories of the Commonsense World. Ablex Publishing Corporation, Norwood, NJ, 1985.
[18] S. M. Hazarika. Qualitative Spatial Change: Space-Time Histories and Continuity. PhD thesis, The University of Leeds, School of Computing, 2005. Supervisor: Anthony Cohn.
[19] D. Hernández, E. Clementini, and P. Di Felice. Qualitative distances. Springer, 1995.
[20] M. Labbe and F. Michaud. Online global loop closure detection for large-scale multi-session graph-based SLAM. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2661-2666, September 2014.
[21] R. Moratz. Representing relative direction as a binary relation of oriented points. In ECAI, pages 407-411, 2006.
[22] M. Quigley, K. Conley, B. P. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A. Y. Ng. ROS: an open-source Robot Operating System. In ICRA Workshop on Open Source Software, 2009.
[23] D. A. Randell, Z. Cui, and A. G. Cohn. A spatial logic based on regions and connection. KR, 92:165-176, 1992.
[24] J. Renz and B. Nebel. Qualitative spatial reasoning using constraint calculi. In Handbook of Spatial Logics, pages 161-215. 2007.
[25] A. Scivos and B. Nebel. The finest of its class: The natural, point-based ternary calculus LR for qualitative spatial reasoning. In C. Freksa et al., editors, Spatial Cognition IV. Reasoning, Action, Interaction, volume 3343 of Lecture Notes in Computer Science, pages 283-303. Springer, Berlin Heidelberg, 2004.
[26] M. Spranger, J. Suchan, and M. Bhatt. Robust natural language processing - combining reasoning, cognitive semantics and construction grammar for spatial language. In IJCAI 2016: 25th International Joint Conference on Artificial Intelligence. AAAI Press, 2016.
[27] M. Spranger, J. Suchan, M. Bhatt, and M. Eppe. Grounding dynamic spatial relations for embodied (robot) interaction. In PRICAI 2014: Trends in Artificial Intelligence - 13th Pacific Rim International Conference on Artificial Intelligence, Gold Coast, QLD, Australia, December 1-5, 2014, Proceedings, pages 958-971, 2014.
[28] S. Tellex. Natural Language and Spatial Reasoning. PhD thesis, Massachusetts Institute of Technology, 2010.
[29] A. Tyler and V. Evans. The Semantics of English Prepositions: Spatial Scenes, Embodied Meaning and Cognition. Cambridge University Press, Cambridge, 2003.
[30] F. Wörgötter, E. E. Aksoy, N. Krüger, J. Piater, A. Ude, and M. Tamosiunaite. A simple ontology of manipulation actions based on hand-object relations. 2012.