Learning Hierarchical Bayesian Networks for human skill modelling

Elias Gyftodimos
Computer Science Department, University of Bristol
[email protected]

Peter A. Flach
Computer Science Department, University of Bristol
[email protected]

Abstract

In previous work [3] we have proposed Hierarchical Bayesian Networks (HBNs) as an extension of Bayesian Networks. HBNs are able to deal with structured domains, and use knowledge about the structure of the data to introduce a bias that can contribute to improving inference and learning methods. In effect, nodes in an HBN are (possibly nested) aggregations of simpler nodes. Every aggregate node is itself an HBN modelling independencies inside a subset of the whole world under consideration. In this paper we introduce inference in HBNs using a stochastic sampling algorithm, and a learning method for HBNs based on the Cooper and Herskovits structure likelihood measure [1]. We furthermore explore how HBNs can be applied to the problem of modelling right arm motion in cello playing. This problem is inherently hierarchical and therefore well-suited for modelling by HBNs. The task is to construct a descriptive model for a player's movements by observing the position of different joints as well as muscular activity of the right arm during the execution of a short musical extract. Different datasets were used to construct models both for an amateur and a professional cello player, and differences between the derived HBNs can be used to interpret the differences in each player's "tacit knowledge" of the task.


1 Introduction

Bayesian Networks [6] are a popular framework for reasoning under uncertainty. However, inference mechanisms for Bayesian Networks are compromised by the fact that they can only deal with propositional domains. Hierarchical Bayesian Networks (HBNs) [3] are an extension of Bayesian Networks in which nodes in the network may correspond to (possibly nested) tuples of atomic types. Links in the network represent probabilistic dependencies in the same way as in standard Bayesian Networks, the difference being that those links may lie at any level of nesting in the data structure.

In this paper we define inference and learning methods for HBNs. Inference is performed using a stochastic sampling algorithm, and learning is based on the approach of [1]. We validate the approach by applying HBNs to the problem of modelling human cello performance. This problem has been studied in [2], where Bayesian Networks are proposed as a suitable descriptive model. That approach is based on analysing data measurements acquired from an amateur and a professional cello player executing a short music extract. The task is to build a model of each performer's behaviour and see how differences in their playing are reflected in those models. Our experiments demonstrate that HBNs are able to employ hierarchical domain knowledge in an intuitive way. Furthermore, the declarative bias arising from assuming a hierarchy makes exhaustive search feasible. Finally, the learned models are meaningful and capture essential differences in playing style in an intuitive way.

The outline of the paper is as follows. We begin by presenting preliminary terminology and definitions on HBNs in Section 2. Section 3 presents an adaptation of a popular algorithm for learning standard Bayesian Networks to the case of HBNs. Section 4 addresses the problem of using an HBN to perform inference. Section 5 presents the application of HBNs to the problem of modelling cello performance. We conclude with our main findings and perspectives for further work.

2 Hierarchical Bayesian Networks: Preliminaries

A standard Bayesian Network is a graphical model that is used to represent conditional independencies among a set of variables. Typically, it consists of two parts: the structural part, a directed acyclic graph in which nodes stand for random variables and edges for direct conditional dependencies between them; and the probabilistic part, which quantifies the conditional dependencies and, in the case of discrete variables, is a set of CPTs, each containing the conditional probability of a variable given the values of its parents in the graph. The underlying property of a Bayesian Network is that a variable is independent of its non-descendants given the values of its parents in the graph. This property can be exploited to decompose the full joint probability of all the variables using the chain rule of probabilities:

    P(x1, x2, ..., xn) = ∏_{i=1}^{n} P(xi | πi)

[Figure 1: A simple Hierarchical Bayesian Network. (a) Nested representation. (b) Tree representation. (c) Standard BN expressing the same dependencies. (d) Probabilistic part.]

where πi denotes the set of parents of xi in the graph.

Hierarchical Bayesian Networks are a generalisation of standard Bayesian Networks, defined over structured data types. An HBN consists of two parts: the structural part and the probabilistic part. The former (also referred to as the HBN-tree structure or simply the HBN structure) describes the part-of relationships and the probabilistic dependencies between the variables. The latter contains the quantitative part of the conditional probabilities for the variables that are defined in the structural part. In this paper we restrict our analysis to discrete domains, so the probabilistic part will be a set of conditional probability tables.

Figure 1 presents a simple Hierarchical Bayesian Network. The structural part consists of three variables, A, B and C, where B is itself a pair (BI, BII). This may be represented either using nested nodes (a), or by a tree-like type hierarchy (b). We use the symbol t to denote a top-level composite node that includes all the variables of our world.

In (c) it is shown how the probabilistic dependency links unfold if we flatten the hierarchical structure to a standard Bayesian Network. In an HBN two types of relationships between nodes may be observed: relationships in the type structure (called t-relationships) and relationships that are formed by the probabilistic dependency links (p-relationships). We will make use of everyday terminology for both kinds of relationships, and refer to parents, ancestors, siblings, spouses etc. in the obvious meaning. In the previous example, B has two t-children, namely BI and BII, one p-parent (A) and one p-child (C). The scope of a probabilistic dependency link is assumed to "propagate" through the type structure, defining a set of higher-level probabilistic relationships. Trivially, all p-parents of a node are also considered its higher-level parents. For example, the higher-level parents of C are B (as a trivial case), BI and BII (because they are t-descendants of B and there exists a p-link B → C).

We will now provide more formal definitions for HBNs. We begin by introducing hierarchical type aggregations, over which Hierarchical Bayesian Networks are defined. Currently, the only aggregation operator that we allow for composite types is the Cartesian product, but we plan to extend composite types to include aggregations such as lists and sets. This will demand a proper definition of probability distributions over these constructs, such as the ones used in the 1BC2 first-order naive Bayesian classifier [4].

Definition 2.1 (Type) An atomic type is a domain of constants. If τ1, τ2, ..., τn are types, then the Cartesian product τ = τ1 × τ2 × ... × τn is a composite type. The types τ1, τ2, ..., τn are called the component types of τ.

Definition 2.2 (Type structure) The type structure corresponding to a type τ is a tree t such that: (1) if τ is an atomic type, t is a single leaf labelled τ; (2) if τ is composite, t has root τ and as children the type structures that correspond to the components of τ.

Definition 2.3 (HBN-tree structure) Let τ be an atomic or composite type, and t its corresponding type structure. An HBN-tree structure T over the type structure t is a triplet ⟨R, C, E⟩ where:

- R is the root of the structure, and corresponds to a random variable of type τ;
- C is a set of HBN-tree structures called the t-children of R. If τ is an atomic type then this set is empty, otherwise it is the set of HBN-tree structures over the component types of τ. R is also called the t-parent of the elements of C;
- E is a set of directed edges between elements of C such that the resulting graph contains no directed cycles. For (v, v′) ∈ E we say that v and v′ participate in a p-relationship, or more specifically that v is a p-parent of v′ and v′ is a p-child of v.

If τ is an atomic type, an HBN-tree structure over t will be called an HBN-variable. We will use the term HBN-variable to refer also to the random variable of type τ that the root of the structure is associated with.

Definition 2.4 (Higher-level parents and children) Given an HBN-tree structure T = ⟨R, C, E⟩ and a t-child T′ = ⟨R′, C′, E′⟩ of R, then for any vP, vC and vi ∈ C′ such that (vP, R′) ∈ E and (R′, vC) ∈ E, we say that vP is a higher-level parent of vi, and that vi is a higher-level parent of vC. Furthermore, if vHLP is a higher-level parent of R′, then vHLP is also a higher-level parent of vi; and if R′ is a higher-level parent of vHLC, then vi is also a higher-level parent of vHLC.

Definition 2.5 The HBN-probabilistic part related to an HBN-structure T consists of: (1) a probability table for each HBN-variable in T that does not have any p-parents or higher-level parents; (2) a conditional probability table for each other HBN-variable, given the values of all HBN-variables that are its p-parents or higher-level parents.

Definition 2.6 A Hierarchical Bayesian Network is a triplet ⟨T, P, t⟩ where:

- t is a type structure;
- T = ⟨R, C, E⟩ is an HBN-tree structure over t;
- P is the HBN-probabilistic part related to T.
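To make these definitions concrete, here is a minimal sketch of a type structure and an HBN-tree structure as Python data structures, encoding the HBN of Figure 1. The class and field names (TypeNode, HBNNode, p_links) are our own illustrative choices, not notation from the paper.

    from dataclasses import dataclass, field
    from typing import Dict, List, Tuple

    @dataclass
    class TypeNode:
        """A node in the type structure (Definition 2.2): atomic iff it has no components."""
        name: str
        components: List["TypeNode"] = field(default_factory=list)

    @dataclass
    class HBNNode:
        """An HBN-tree structure (Definition 2.3): a root variable, its t-children,
        and an acyclic set of p-links between those t-children."""
        type_node: TypeNode
        t_children: Dict[str, "HBNNode"] = field(default_factory=dict)
        p_links: List[Tuple[str, str]] = field(default_factory=list)  # (p-parent, p-child)

    # The HBN of Figure 1: t = (A, B, C) with B = (BI, BII);
    # p-links A -> B and B -> C at the top level, BI -> BII inside B.
    BI, BII, A, C = TypeNode("BI"), TypeNode("BII"), TypeNode("A"), TypeNode("C")
    B = TypeNode("B", [BI, BII])
    t = TypeNode("t", [A, B, C])

    b_node = HBNNode(B, {"BI": HBNNode(BI), "BII": HBNNode(BII)}, [("BI", "BII")])
    root = HBNNode(t, {"A": HBNNode(A), "B": b_node, "C": HBNNode(C)},
                   [("A", "B"), ("B", "C")])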

We will refer to two useful operations on HBN structures, pruning and flattening (Figure 2). Pruning a structure on a composite node v replaces the whole subtree under v with a single variable node. The domain of the new node is the cross product of the domains of the variable nodes that existed in the subtree. Flattening the structure on v is a process of "shifting" the t-children of v up one level in the structure, replacing the composite node. The p-links between v and its t-siblings are replaced by links to or from each one of the "shifted" nodes. If we apply this flattening operation repeatedly on all the t-children of the root of an HBN until there are no more composite nodes, we end up with an HBN that has only one level of type nesting under the root. The nodes on this level will be the variable nodes of the initial HBN, and the p-relationships between them will correspond to the initial higher-level parents and children. Disregarding the root of the structure, what is left is a standard Bayesian Network. We will call this the corresponding Bayesian Network of the original HBN.

[Figure 2: (a) A part of an HBN-tree structure. (b) Result of pruning the structure on node C. (c) Result of flattening the structure on C.]

Definition 2.7 (Probability distributions over types) If τ is an atomic type, Pτ(x), x ∈ τ, is the probability distribution over τ. If τ = τ1 × ... × τn and x ∈ τ, then P(x) = P(x1, ..., xn), where xi ∈ τi are the components of x.

An HBN maps the conditional independencies between its variable nodes, in such a way that the value of an atomic variable is independent of all atomic variables that are not its higher-level descendants, given the values of its higher-level parents. The independencies that an HBN describes can be exploited using the chain rule of conditional probability, to decompose the full joint probability of all the atomic types into a product of conditional probabilities, in the following way:

    P(x) = PτX(x)                            if x ∈ τX is atomic
    P(x) = ∏_{i=1}^{n} P(xi | Par(xi))       otherwise

where x1, x2, ..., xn are the components of x and Par(xi) are the (direct) p-parents of xi in the structure.

Example 2.8 For the HBN structure of Figure 1, we have: P(t) = P(A, B, C) = P(A) P(B | A) P(C | B) = P(A) P(BI, BII | A) P(C | BI, BII) = P(A) P(BI | A) P(BII | BI, A) P(C | BI, BII).
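As an illustration of flattening, the sketch below (reusing the HBNNode class from the earlier snippet; helper names are ours) expands every p-link into edges between atomic leaves, recovering the corresponding Bayesian Network of Figure 1(c) and hence exactly the factorisation of Example 2.8.

    def leaves(node: HBNNode) -> List[str]:
        """Names of the atomic variables under an HBN node."""
        if not node.t_children:
            return [node.type_node.name]
        return [v for child in node.t_children.values() for v in leaves(child)]

    def corresponding_bn(node: HBNNode) -> List[Tuple[str, str]]:
        """Flatten repeatedly: every p-link P -> C induces an edge from each
        leaf of P to each leaf of C; then recurse into the t-children."""
        edges = []
        for parent, child in node.p_links:
            edges += [(p, c) for p in leaves(node.t_children[parent])
                             for c in leaves(node.t_children[child])]
        for sub in node.t_children.values():
            edges += corresponding_bn(sub)
        return edges

    # For the Figure 1 HBN this prints
    # [('A', 'BI'), ('A', 'BII'), ('BI', 'C'), ('BII', 'C'), ('BI', 'BII')],
    # i.e. BI has higher-level parent A, BII has {A, BI}, and C has {BI, BII}.
    print(corresponding_bn(root))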

3 Learning HBNs

One important area of concern is the problem of learning HBNs: given a database of observations, construct an HBN that fits the data in a satisfactory way. This problem can be addressed at several levels. For instance, we can learn the HBN-probabilistic part for a given HBN-structure; or we can learn both the HBN-probabilistic and structural part, given the type structure. Although learning the type structure as well is possible in theory, we do not address this problem, for two reasons. First, the complexity of the task is significant, and combined with learning the HBN structure it would give no computational gains over learning a standard Bayesian Network directly. More importantly, we see the type structure as available prior knowledge that is an inherent characteristic of the domain and should be exploited.

In our analysis, we assume that there are no missing values in the database, and that different observations in the database occur independently. Learning the probabilistic part can be achieved in a straightforward manner, using the relative frequencies of events in the database to estimate the values of the respective conditional probabilities. Given the independence of different instances, the relative frequencies converge to the actual probability values when the database is sufficiently large. So, when the HBN structure is known and there is enough data available, estimating the conditional probabilities is a relatively easy process.

Deriving the HBN structure from the database is a more complex task. Knowledge of the type structure is exploited in HBNs as a declarative bias, as it significantly reduces the number of possible network structures. For example, the number of possible structures for a Bayesian Network with 10 nodes exceeds 10^18, whereas given a type structure that consists of a pair of 5-tuples the total number of possible HBN structures is approximately 2.3 × 10^10, and for a type structure of a 5-tuple of pairs the number drops to approximately 7 × 10^6. Clearly, HBNs subsume standard Bayesian Networks, for which the type structure consists of a single level, so the actual gains achieved in a domain depend on the type structure. However, we argue that many domains are naturally structured in a hierarchical way, and it is exactly in those domains that HBNs are a powerful tool.

Our approach to learning the HBN structure is an adaptation of the method described in [1]. We use a Bayesian method to compute the likelihood of a structure given the data, and search for the structure that maximises that likelihood. In [1] a formula is derived to compute P(BS, D) for a Bayesian Network structure BS, depending on the prior P(BS). That result is based on the assumptions that (a) the variables in the database are discrete, (b) different instances occur independently given the structure, (c) there are no missing values, and (d) before seeing the database, we consider all possible conditional probability value setups for a given structure equally likely.
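The first, easier task described above, learning the probabilistic part from relative frequencies, might be sketched as follows; the row-dict data layout is our own assumption, and we include the Laplace correction (used later in Section 5) to avoid zero estimates.

    from collections import Counter

    def estimate_cpt(data, x, parents, domain_size, laplace=1):
        """P(x | parents) from relative frequencies over a list of row-dicts;
        laplace=0 gives the plain maximum-likelihood (relative-frequency) estimate."""
        joint = Counter((tuple(r[p] for p in parents), r[x]) for r in data)
        marg = Counter(tuple(r[p] for p in parents) for r in data)
        return {(j, k): (n + laplace) / (marg[j] + laplace * domain_size)
                for (j, k), n in joint.items()}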

Theorem 1 (Cooper and Herskovits) Let BS be a Bayesian Network structure containing n discrete variables xi, each associated with a domain {v_i1, ..., v_iri}, and let πi be the set of parents of xi in BS. Suppose D is a database of m instantiations of the variables xi, and let {w_i1, ..., w_iqi} be all the unique instantiations of πi in D. Let N_ijk be the number of cases where xi = v_ik and πi is instantiated to w_ij, and let N_ij = ∑_{k=1}^{ri} N_ijk. The joint probability of having the structure BS and the database D is given by:

    P(BS, D) = P(BS) ∏_{i=1}^{n} ∏_{j=1}^{qi} [ (ri - 1)! / (N_ij + ri - 1)! ] ∏_{k=1}^{ri} N_ijk!

Definition 3.1 Let BHS be an HBN structure, and BS the corresponding Bayesian Network structure of BHS. We define the joint probability of the structure BHS and the database D as P(BHS, D) = α P(BS, D), where α is a normalising constant such that ∑_{HS} P(BHS, D) = 1. This means that in order to compute the above joint probability, instead of parents (in the standard Bayesian Network case) we consider higher-level parents. The constant α is needed because there are fewer possible HBN structures than standard BN structures containing the same variables, given the type structure. Additionally, we assume equal prior probabilities among different structures. So, in order to compute the likelihood for an HBN structure, we simply need to extract the corresponding BN structure from it and apply the above formula.
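Since the factorials in Theorem 1 overflow quickly, any implementation would work in log space; the following is a minimal sketch (same row-dict layout as the earlier snippets, names our own).

    import math
    from collections import Counter

    def log_ch_score(data, parents, domains):
        """log P(BS, D) up to the constant structure prior P(BS):
        sum over i, j of log[(ri - 1)! / (N_ij + ri - 1)!] plus
        sum over i, j, k of log N_ijk!, using lgamma(n + 1) = log n!."""
        score = 0.0
        for x, pa in parents.items():
            r = len(domains[x])
            n_ijk = Counter((tuple(row[p] for p in pa), row[x]) for row in data)
            n_ij = Counter()
            for (j, _k), n in n_ijk.items():
                n_ij[j] += n
            score += sum(math.lgamma(r) - math.lgamma(nij + r) for nij in n_ij.values())
            score += sum(math.lgamma(n + 1) for n in n_ijk.values())
        return score

Comparing two candidate structures then amounts to comparing their scores, since the prior and the normalising constant α of Definition 3.1 are the same for both.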

The next step is to find the HBN structure that maximises that expression. Since probabilistic links occur only between siblings in the type structure, this limits the number of possible parents we need to consider for each node. The search space of possible structures is significantly reduced compared to the standard Bayesian Network case. Furthermore, if we apply an ordering on the nodes of the type structure (in fact, we only need an ordering for each subset of siblings), the number of possible structures decreases dramatically. In [1] it is shown that with an ordering on the nodes and a sufficiently tight limit on the number of parents for each node, the derivation of the structure that maximises P(BS, D) is computationally feasible. Our approach is a clear step in that direction: allowing p-links only between t-siblings reduces the maximum number of parents, but in a domain-specific way, instead of simply imposing a hard limit for all nodes.

Our learning algorithm is a recursive search for the best possible p-link setup among a set of t-siblings (t-children of the same node in the type structure), beginning from the root of the type structure and proceeding towards the leaves.

Algorithm 1 (HBN-structure learning)
Given: a type structure T; a node k in the type structure; a (partial) setup E of p-links of the HBN-structure; and a database D of instantiations of the atomic variables.
Output: a setup of p-links over the HBN-structure.
To compute learn(T, k, E, D):
1. if k is a variable node, return E
2. if k is composite:
3. find the set E′ of p-links between the t-children of k that maximises P(E ∪ E′, D)
4. for each t-child ki of k, let Ei be the result of learn(T, ki, E ∪ E′, D)
5. return the set E ∪ E′ ∪ ⋃i Ei of all the p-links derived

Typically, for a type structure T with root t, the initial query is learn(T, t, ∅, D). However, prior knowledge about p-links that we want to assert will occur in the solution can be provided at this stage, by substituting the empty set with the set of those p-links. As an example, suppose we wanted to reconstruct the HBN structure of Figure 1. The type structure would be known, and a database of instances of values for A, BI, BII and C would be available. The algorithm would initially search among all possible p-link configurations between A, B and C, find the optimal structure, and subsequently extend it by searching among possible configurations of p-links between BI and BII. A sketch of this recursive search is given below.

Currently, in order to maximise P(E ∪ E′, D), we perform an exhaustive search among possible structures. An ordering on the nodes may be taken into account, although one is not necessary. Note that such a search would normally be infeasible if conducted on a standard Bayesian Network; the reduction of the search space due to the hierarchy allows for a more detailed search. However, if the families in the type structure grow too large, exhaustive search will eventually become infeasible. In such cases, the search space of possible sets of p-links could be traversed using any traditional search method, such as steepest-ascent hill climbing (which is used by Cooper and Herskovits in their K2 algorithm), beam search, iterative deepening, etc.
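Algorithm 1 could be realised along the following lines, with an exhaustive enumeration of acyclic p-link sets among each set of t-siblings. This is a hedged sketch: the score callback is expected to evaluate P(E ∪ E′, D), e.g. by flattening the HBN (corresponding_bn above) and applying log_ch_score; all helper names are ours.

    from itertools import chain, combinations

    def is_acyclic(edges, nodes):
        """Kahn's algorithm: True iff the directed graph (nodes, edges) has no cycle."""
        indeg = {v: 0 for v in nodes}
        for _a, b in edges:
            indeg[b] += 1
        queue = [v for v, d in indeg.items() if d == 0]
        seen = 0
        while queue:
            v = queue.pop()
            seen += 1
            for a, b in edges:
                if a == v:
                    indeg[b] -= 1
                    if indeg[b] == 0:
                        queue.append(b)
        return seen == len(nodes)

    def candidate_link_sets(siblings):
        """All acyclic p-link sets over a (small) set of t-siblings (step 3)."""
        pairs = [(a, b) for a in siblings for b in siblings if a != b]
        subsets = chain.from_iterable(combinations(pairs, k) for k in range(len(pairs) + 1))
        return [list(s) for s in subsets if is_acyclic(s, siblings)]

    def learn(node: HBNNode, fixed, score):
        """Recursive search of Algorithm 1. `score(links)` evaluates the structure
        likelihood of a p-link set, e.g. log_ch_score over the corresponding BN.
        Initial call: learn(root, [], score)."""
        if not node.t_children:                    # step 1: variable node
            return []
        siblings = list(node.t_children)
        best = max(candidate_link_sets(siblings), key=lambda E: score(fixed + E))
        node.p_links = best                        # record E' on the structure
        derived = list(best)
        for child in node.t_children.values():     # step 4: recurse into t-children
            derived += learn(child, fixed + best, score)
        return derived                             # step 5: E' plus all nested links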

4 Inference in HBNs

Hierarchical Bayesian Networks provide a means for probabilistic inference on the variables they contain.

The aim is, given a valuation for a subset of the variables (the evidence), to compute the probability that a second set of variables (the query variables) takes some specific values. One way of estimating the probability P(Q | E) for given valuations of sets of query and evidence variables Q and E is by sampling data from the Hierarchical Bayesian Network. Our approach is a straightforward extension of existing algorithms for standard Bayesian Networks, as described e.g. in [7, 6]. The HBN serves as a model that describes the joint probability of all the variables. In order to derive a value for one variable stochastically, all we need is the values of its higher-level parents. Since the graph resulting from higher-level relationships is always acyclic, there exists a total ordering of the variables such that every variable is preceded by its higher-level parents. We can then assign values to the variables in that order, sampling each variable according to its conditional probability given its higher-level parents, which will already have been assigned values. The first variables instantiated will be the ones that have no higher-level parents, sampled according to their prior probability distribution. If we generate a large number of such instantiations, the relative frequency of the cases where both Q and E hold, divided by the relative frequency of the cases where E holds, will converge to P(Q | E).

One shortcoming of the above method is that when the likelihood of the evidence is small, convergence will be very slow. We address this problem using the weighting method described in [7]: we associate each instance we generate with a weight, initially set to 1. When the instantiation procedure reaches an evidence node, instead of stochastically choosing a value, we deterministically assign to it the value it has as evidence, and multiply the weight for that run by the corresponding conditional probability of the node given the values of its parents (which will have already been instantiated). In order to compute the final relative frequency of the query being satisfied given the evidence, we take into account the final weight of each instance.

A further improvement is to use an MCMC method, such as Gibbs sampling [5], which produces the same results with increased efficiency. We present here an algorithm for Gibbs sampling in HBNs. This is a straightforward adaptation of Gibbs sampling for standard Bayesian Networks. We denote by π(X) the higher-level parents and by φ(X) the higher-level children of a variable X.

Algorithm 2 (HBN-Gibbs sampling)
Given: a set V of variables; an instantiation Q of a set of query variables; an instantiation E of a set of evidence variables; and a sample size n.
Output: an approximation of P(Q | E).
To compute sample(Q, E, n):
1. instantiate the variable nodes, in the order from higher-level parents to higher-level children; if a variable xi is in the evidence set, assign it the evidence value vi, otherwise choose a value randomly according to the variable's CPT
2. let e ← 0
3. choose a cyclic ordering of the nodes V − E
4. repeat n times, assigning to X the next variable in the ordering:
5. sample the value of X from the distribution P(X | π(X)) ∏_{Y ∈ φ(X)} P(Y | π(Y)), given the values of the variables in V
6. if the instance satisfies Q then let e ← e + 1
7. return e / n
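The weighted forward-sampling scheme (the second method above, not the Gibbs variant) might be sketched as follows. The CPT layout, a dict per variable keyed by tuples of parent values with a {value: probability} dict per entry, and all names are our own assumptions.

    import random

    def weighted_sample(order, cpts, evidence):
        """One forward pass in topological order (higher-level parents first).
        order is a list of (variable, [its higher-level parents]) pairs;
        variables without parents are keyed by the empty tuple."""
        inst, weight = {}, 1.0
        for x, parents in order:
            dist = cpts[x][tuple(inst[p] for p in parents)]
            if x in evidence:                 # clamp evidence, reweight the run
                inst[x] = evidence[x]
                weight *= dist[evidence[x]]
            else:                             # sample from the conditional
                r, acc = random.random(), 0.0
                for v, p in dist.items():
                    acc += p
                    if r < acc:
                        break
                inst[x] = v
        return inst, weight

    def query(order, cpts, q, evidence, n=10000):
        """Estimate P(Q | E) as a weight-normalised relative frequency."""
        num = den = 0.0
        for _ in range(n):
            inst, w = weighted_sample(order, cpts, evidence)
            den += w
            if all(inst[x] == v for x, v in q.items()):
                num += w
        return num / den if den else 0.0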

5 Modelling Arm Activity in Cello Playing

Modelling human skill aims at increasing our understanding of how individuals learn to perform specific tasks. Furukawa et al. have proposed a Bayesian Network representation for modelling cello performance [2]. One of the specific problems they address consists in measuring the right arm motion of a performer during the execution of a particular music extract. The motion is described using several attributes of two distinct kinds: attributes that describe the position of the arm structure, and attributes that describe muscle activity captured by surface EMG. The former are a sequence of measurements, through the duration of the execution of the musical extract, of the angles formed at the wrist, the elbow and the shoulder. The latter are sequences of EMG-derived signals from six different positions, namely the thenar muscle (located between the elbow and the wrist), the biceps and triceps brachii muscles, and three different signals from the deltoid muscle group, referred to as the anterior, middle and posterior deltoid signals. The music extract is executed by two players of different skill levels. The goal of the experiment is to construct a descriptive model for each of the two players. Differences in the respective models will reflect the professional player's expert knowledge. The reason why such an analysis is useful is that human motor skills, acquired through intensive practice and repetition of finger exercises, are hard to explain. A professional player is able to execute difficult exercises with great precision, but may be unable to describe exactly how this precision is achieved, what groups of muscles are activated at each moment, and what the posture of the arm should be. The detailed control occurs unconsciously, and it is therefore useful to extract and model a player's "tacit knowledge" by analysing music performance, for instance by Machine Learning methods as in this paper.

5.1 Representation

[2] argue that a potentially good descriptive model is a Bayesian Network where nodes represent joint motion and muscle activity, and edges follow the direction in which muscles affect the joints, from the shoulder to the wrist. Some joints are also influenced by muscles that are located higher on the arm. It is evident in their approach that a tree-like type structure is suitable for describing this domain: nodes that correspond to joints are composite, containing information on both angular velocity and acceleration; furthermore, similar dependencies are observed between joints and groups of muscles that are located close together in the arm, so it is meaningful to cluster those groups of muscles into composite nodes in an HBN. Figure 3(a) shows an HBN structure that is an adaptation of a standard Bayesian Network used by [2] to describe this prior knowledge of the domain. The type structure is ours, while the dependencies come from the Bayesian Network used by [2]. It is important to note that this model was provided by an expert, rather than learned from the data. In the next section we describe the HBNs that we learned from the data, using the approach described in Section 3.

5.2 Learning the Musculoskeletal HBN

In order to apply our approach to this data, we calculated angular velocity and acceleration for each instance as the first and second derivatives, respectively, of the corresponding angle variable. Subsequently, both angular and EMG data were quantised to a small number of possible values. Low-pass filters were used throughout the process in order to eliminate noise. The derived values for each time point were used as independent instances of data. Note that this is a significant simplification, and that information on temporal dependencies in the data is lost.*

We have conducted two series of experiments in order to learn HBNs that model the available data. In the first, we derived the probabilistic part for an HBN having the structure shown in Figure 3(a). Different CPTs were derived for the amateur and the professional player, applying the method described in Section 3.

* Some of the temporal dependency is actually captured by the use of derivatives.
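The preprocessing just described might look roughly as follows; the moving-average filter and the equal-frequency binning are illustrative stand-ins, and the window and bin counts are our guesses, not values from the paper.

    import numpy as np

    def preprocess(angle, emg, fs, bins=3, window=5):
        """Angle series -> quantised (angle, velocity, acceleration); EMG -> quantised.
        fs is the sampling frequency; a moving average stands in for low-pass filtering."""
        smooth = lambda s: np.convolve(s, np.ones(window) / window, mode="same")
        angle = smooth(np.asarray(angle, dtype=float))
        velocity = np.gradient(angle) * fs          # first derivative
        acceleration = np.gradient(velocity) * fs   # second derivative

        def quantise(s):
            # equal-frequency binning into `bins` discrete values
            edges = np.quantile(s, np.linspace(0, 1, bins + 1)[1:-1])
            return np.digitize(s, edges)

        return (quantise(angle), quantise(velocity), quantise(acceleration),
                quantise(smooth(np.asarray(emg, dtype=float))))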

[Figure 3: Musculoskeletal HBNs, proposed by expert (a), and derived from the data for the amateur (b) and the professional (c) player.]

In order to eliminate zero values, we made use of the Laplace estimate. In the second series of experiments, only the type structure was given, and the task was to learn both the p-links and the probabilistic part. For the structure learning we applied the method described in Section 3, both for the amateur and the professional data. For efficiency reasons, an ordering on the nodes was taken into account on the first level of the type structure, in the direction from the shoulder to the wrist. We make no further assumptions about dependencies between the nodes. The structures derived are shown in Figure 3(b,c); we see that the optimal structures for the available data deviate from the model supplied as prior knowledge in Figure 3(a). Notable differences exist between the structures derived for the amateur and the professional. For instance, for the professional, shoulder and brachii are independent given deltoid, which does not hold for the amateur. There are also various differences at the lowest level of the type hierarchy. Finally, the respective CPTs were derived for each player using the same method as above.

Learning in HBNs is more efficient than in standard BNs in terms of computational complexity. In the example presented here, the reduction of the size of the search space allowed for an exhaustive search over all possible HBN structures. In a standard Bayesian Network with 15 variables, such a search would be impossible. To overcome the complexity problem, one could use a different search strategy, e.g. the K2 algorithm of [1], which is a greedy search that begins with an empty structure and adds, one by one, links that increase the likelihood of the structure, until no improvement can be achieved. The disadvantage of such an approach is that it is prone to being trapped in a local optimum.

Table 1: Probabilities associated with various queries.

Query   HBN_P (amateur)   HBN_P (professional)   HBN_D (amateur)   HBN_D (professional)
Q1      0.12              0.68                   0.43              0.79
Q2      0.46              0.68                   0.41              0.78
Q3      0.46              0.66                   0.46              0.78
Q4      0.27              0.20                   0.22              0.11
Q5      0.19              0.21                   0.18              0.19

Another strategy in standard Bayesian Networks would be to impose a tight limit on the number of parents allowed for each node. This also dramatically reduces the search space. The advantage of using HBNs is that the bias introduced is directly related to the particular domain.

5.3 Model Evaluation

In the previous section we described how four different models were derived. For each of the players we built two HBNs: one over a fixed HBN structure that was supplied as prior knowledge (HBN_P) and one where the HBN structure was derived from the observations (HBN_D). In order to evaluate those models further, we applied the inference method described in Section 4 to compute the probabilities of a set of queries that reflect some intuitive rules for the domain.

Q1: P(wrist.velocity ≠ 0): a measure of the extent to which the player was using his wrist.

Q2: P(elbow.velocity ≠ 0 | shoulder.angle = Closed): quantifies the movement of the elbow when the arm is close to the body.

Q3: P(elbow.velocity ≠ 0 | shoulder.angle = Open): quantifies the movement of the elbow when the arm is away from the body.

Q4: P(triceps = High | elbow.velocity > 0): a measure of the triceps muscle activity when the elbow joint is opening.

Q5: P(deltoid.anterior = High | elbow.velocity > 0): a measure of the anterior deltoid muscle activity when the elbow joint is opening.
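Using the sampler sketched in Section 4, a query such as Q2 could be estimated along these lines (the variable names, domains dictionary and the order/cpts inputs are hypothetical, in the layout of the earlier snippets):

    # P(elbow.velocity != 0 | shoulder.angle = Closed): sum the sampler's
    # estimates over the mutually exclusive nonzero velocity values.
    nonzero = [v for v in domains["elbow.velocity"] if v != 0]
    p_q2 = sum(query(order, cpts, {"elbow.velocity": v},
                     {"shoulder.angle": "Closed"}) for v in nonzero)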

Results displayed in Table 1 show that a lot of meaningful information is captured regarding the professional's skill. From Q1 we see that the experienced player makes significantly more use of his wrist. Q2 and Q3 suggest that the professional also moves his elbow more, for different positions of the shoulder joint. These observations reflect a known fact among cello tutors, namely that experienced students make better use of their wrist and elbow, whereas beginners tend to rely heavily on their shoulder for moving the bow. Q4 and Q5 show that the professional is more likely to be using his deltoid muscle than the brachii triceps when opening the elbow (the "down bowing" movement); for the amateur, the probabilities are slightly higher in favour of using the brachii triceps. Note that this difference is better reflected in the HBN_D models.

We have shown that HBNs can be useful for modelling right arm movement in cello playing. Further evaluation will require queries specified by domain experts in order to analyse the derived results in depth. It will also prove useful to acquire more data for different musical extracts that closely correspond to the application of specific skills of the player.

6 Conclusions and Further Work

In this paper we have presented Hierarchical Bayesian Networks, a framework for inference and learning from structured data. We have defined inference in HBNs using a stochastic sampling algorithm, and a learning method for HBNs based on the Cooper and Herskovits structure likelihood measure. This framework was applied to the problem of modelling the movement of the right arm in cello playing. We have demonstrated how our approach exploits domain knowledge in order to increase learning efficiency, compared to standard Bayesian Networks. Evaluation of the derived results by domain experts is still needed, in order to compare the quality of the proposed models with previous ones and to judge the extent to which our method contributes to the problem of skill modelling. Extensions of the present work may take the time factor into account, exploiting temporal dependencies in the data. Presently, we are also working towards extending HBNs by introducing more aggregation operators for types, such as lists and sets. This will allow the application of our framework to structures of arbitrary form and length, such as web pages or DNA sequences.

Acknowledgements

Part of this work was funded by the EPSRC project Efficient models for inference and learning. Thanks are due to Koichi Furukawa and Ken Ueno for providing the cello data. Thanks also to Mark Crean for designing and implementing an XML interface for HBNs.

References

[1] Gregory F. Cooper and Edward Herskovits. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9:309-347, 1992.

[2] K. Furukawa, S. Igarashi, K. Ueno, T. Ozaki, S. Morita, N. Tamagawa, T. Okuyama, and I. Kobayashi. Modeling human skill in Bayesian network. Linköping Electronic Articles in Computer and Information Science, 7(012), 2002. Submitted. Available at http://www.ida.liu.se/ext/epa/cis/2002/012/tcover.html.

[3] Elias Gyftodimos and Peter A. Flach. Hierarchical Bayesian networks: A probabilistic reasoning model for structured domains. In Edwin de Jong and Tim Oates, editors, Proceedings of the ICML-2002 Workshop on Development of Representations. University of New South Wales, 2002.

[4] Nicolas Lachiche and Peter A. Flach. 1BC2: a true first-order Bayesian classifier. In S. Matwin and C. Sammut, editors, Proceedings of the 12th International Conference on Inductive Logic Programming, pages 133-148. Springer-Verlag, 2002.

[5] Judea Pearl. Evidential reasoning using stochastic simulation of causal models. Artificial Intelligence, 32(2):245-257, 1987.

[6] Judea Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.

[7] Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 2nd edition, 2003.