Institute for Electronic Systems

AALBORG UNIVERSITY

A Simple Method for Finding a Legal Configuration in Complex Bayesian Networks
by Claus Skaanning Jensen
R-96-2046

November 1996

Institute for Electronic Systems

Department of Computer Science | Fredrik Bajers Vej 7E | DK-9220 Aalborg Ø | Denmark | Tel.: +45 98 15 85 22 | Fax: +45 98 15 98 89


A Simple Method for Finding a Legal Configuration in Complex Bayesian Networks

Claus Skaanning Jensen
Aalborg University
[email protected]

January 14, 1998

Abstract

This paper deals with an important problem concerning large and complex Bayesian networks. Exact inference in these networks is simply infeasible due to the huge storage requirements of exact methods. Markov chain Monte Carlo methods, however, are able to deal with these large networks, but to do this they require an initial legal configuration to set off the sampler. So far, nondeterministic methods like forward sampling have often been used for this, even though the forward sampler may take an eternity to come up with a legal configuration. In this paper a novel algorithm is presented that allows finding a legal configuration in a general Bayesian network in polynomial time in almost all cases. The algorithm will not be proven deterministic, but empirical results will document that this holds in most cases. Also, the algorithm is justified by its simplicity and ease of implementation.

Keywords: Bayesian network, junction tree, pedigree analysis, Markov chain Monte Carlo, Gibbs sampling, starting configuration, conditioning


1 Introduction

For inference in Bayesian networks many methods have been developed over recent years. Fast and exact methods (Pearl 1986, Lauritzen & Spiegelhalter 1988, Shenoy & Shafer 1990, Lauritzen 1992) handle only moderate-size networks, as computation in Bayesian networks is NP-hard. In general, the storage requirements of exact methods grow exponentially with the size and complexity of the Bayesian network, complexity having directly to do with the number of loops in the network. In Bayesian networks with a high number of loops, the cliques (a notion from the junction-tree representation of Bayesian networks (Jensen 1988)) grow too large. For exact inference in the Bayesian network of Figure 1, the storage requirements are more than 10^160 MB. This network represents a pedigree of a population of Greenland Eskimos (Gilberg, Gilberg, Gilberg & Holm 1978). Markov chain Monte Carlo (MCMC) methods, including the Metropolis-Hastings algorithm (Metropolis, Rosenbluth, Rosenbluth & Teller 1953, Hastings 1970) and the Gibbs sampler (Geman & Geman 1984), have provided a good alternative, as they are able to handle problems of very large size. For recent reviews, see (Gelfand & Smith 1990, Thomas, Spiegelhalter & Gilks 1992, Gelman & Rubin 1992, Geyer 1992, Smith & Roberts 1993). MCMC methods simulate realizations from probability distributions whose densities are known up to a normalizing factor. If h(x) is a probability distribution on the sample space, the Gibbs sampler and the Metropolis-Hastings algorithm simulate a Markov chain whose equilibrium distribution is proportional to h(x), using only evaluations of h(x). Even for very complicated problems, it is often possible to design a Markov chain with the desired equilibrium distribution. If the Markov chain is irreducible, the averages of the realizations of the sampler converge to expectations under the equilibrium distribution as the sample size goes to infinity, but if the chain is mixing slowly it may require huge sample sizes to get precise results. Slow mixing occurs when problems are of high dimension, have very correlated variables, and when the sampler updates one variable at a time, as is the case for the Gibbs sampler.
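To make the mechanics concrete, the following is a minimal single-site Gibbs sampler for a toy two-variable distribution; the unnormalized table h and all names are invented for illustration and are unrelated to the pedigree networks discussed in this paper. It shows the one property the text relies on: the sampler only ever evaluates h, never its normalizing constant.

    import random

    # Unnormalized target h(x, y) on {0,1} x {0,1}; a made-up example.
    h = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 4.0, (1, 1): 1.0}

    def draw(weights):
        """Draw an index with probability proportional to the weights."""
        r = random.uniform(0.0, sum(weights))
        acc = 0.0
        for i, w in enumerate(weights):
            acc += w
            if r <= acc:
                return i
        return len(weights) - 1

    def gibbs(n_sweeps, x=0, y=0):
        """Single-site Gibbs: alternately resample x given y, y given x."""
        samples = []
        for _ in range(n_sweeps):
            x = draw([h[(0, y)], h[(1, y)]])
            y = draw([h[(x, 0)], h[(x, 1)]])
            samples.append((x, y))
        return samples

    # Long-run frequencies approach h / sum(h.values()) as sweeps grow.
    counts = {}
    for s in gibbs(100_000):
        counts[s] = counts.get(s, 0) + 1
    print(counts)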

Figure 1: A Bayesian network representation of a large pedigree. The pedigree contains 1614 individuals of the Greenland Eskimo population. The pedigree was pretty-printed using methods described by Thomas (1988) and implemented by the author. Using this software, the process of arranging the individuals in an "optimal" way took more than one week on a SPARCstation-20.

Another serious problem occurs when the Markov chain is reducible. This means that the sample space contains noncommunicating sets (or classes) that cannot be bridged by an MCMC method updating too few variables at a time. For these hard problems better MCMC samplers are needed. One recent promising algorithm is simulated tempering (Geyer 1991, Marinari & Parisi 1992, Geyer & Thompson 1995), which adopted its name from simulated annealing (Kirkpatrick, Gelatt & Vecchi 1983). Simulated annealing is a method for optimization that starts with a "heated" version of the problem, then "cools" it down until a solution is found. Simulated tempering maintains a hierarchy of Markov chains ranging from the most "heated" chain, which is easy to sample from, to the "coldest" chain, which induces the distribution of interest and suffers from reducibility and/or multimodality. The simulated tempering algorithm samples from one of the chains for a period and then proposes a move from the current chain to one of its adjacent chains. Moves are proposed such that the individual limit distributions are maintained. At the "coldest" chain we can collect information on the target distribution. At the "hottest" chain the sampler mixes much faster and is able to jump between different modes. There are some problems with this approach, however. The computational overhead of running all the non-target chains may be large, and the construction of such a hierarchy of "heated" chains may be a difficult task in practice. A more direct algorithm for approaching high-dimensional reducible or multimodal problems is blocking Gibbs sampling (Jensen, Kong & Kjærulff 1995). This algorithm is based on the substantial amount of research in the area of exact inference in Bayesian networks and a particular exact scheme, the junction-tree propagation method (Lauritzen & Spiegelhalter 1988, Jensen, Lauritzen & Olesen 1990). The algorithm effectively makes it possible to sample blocks of

variables simultaneously, with blocks usually containing more than 90% of the variables. Due to the joint updating of the majority of variables, this sampling scheme can have very fast mixing and avoid problems of reducibility and multimodality. The method has been used successfully on very hard applications in genetics, see (Jensen et al. 1995, Jensen & Kong 1996). There are still some issues that must be settled before blocking Gibbs becomes a truly general method for inference in very large and complex Bayesian networks. The most important is the selection of blocks. As of yet there is no general method for the construction of blocks that guarantees irreducibility. This problem corresponds to finding the noncommunicating classes of an MCMC sampler, discussed by Lin, Thompson & Wijsman (1994) and more recently by Jensen & Sheehan (1996). A general problem for all of the above MCMC methods is that of finding a suitable starting configuration. In general, if the Bayesian network is too large and complex for exact methods, it may be difficult to find a legal configuration. Very little research has been done on this problem so far. Until now, researchers have applied nondeterministic methods such as forward sampling or the annealing-like rejection sampler of Sheehan & Thomas (1993). If the Bayesian network contains many observed variables, the forward sampler may have to run for years before coming up with a legal configuration. A simple calculation quickly illustrates this. Imagine, in Figure 1, that 100 bottom-level variables have been observed and that each of these has three states. Now, the forward sampler has a probability of (1/3)^100 ≈ 1.9 × 10^-48 of finding a legal configuration in one iteration. Another small calculation shows that the forward sampler must run for more than 10^30 years to have a realistic chance of finding a legal configuration. The rejection sampler of Sheehan & Thomas (1993) has been used successfully for finding legal configurations in large complex Bayesian networks, but the generality of the method is not known.
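This back-of-the-envelope figure is easy to reproduce; in the snippet below, the per-iteration sampling speed is an assumed number chosen only to show the order of magnitude, not a measurement from the paper.

    # 100 observed three-state variables, each matched with probability
    # 1/3 per forward pass.  iterations_per_second is an assumption made
    # for illustration only.
    p_legal = (1.0 / 3.0) ** 100
    print(f"P(legal per iteration) = {p_legal:.2e}")        # ~1.9e-48

    iterations_per_second = 1e6                             # assumed speed
    seconds_per_year = 3600.0 * 24.0 * 365.0
    expected_years = 1.0 / (p_legal * iterations_per_second * seconds_per_year)
    print(f"expected waiting time = {expected_years:.2e} years")  # ~1.6e34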


These methods for finding a legal configuration are both nondeterministic. In the following, an almost deterministic algorithm will be presented. The algorithm will not be proven deterministic, but empirical results will show that in practice this seems to be the case. Furthermore, the algorithm is highly useful, as it is simple and very easy to implement. Finally, the complexity of the algorithm will be outlined, and a discussion will conclude with directions for further research.

2 The Algorithm

The underlying idea of the algorithm is to take advantage of conditioning to break the Bayesian network into manageable pieces. Conditioning has been described in detail by Pearl (1988). Its benefits can be explained as follows. Consider Figure 2. In Figure 2a, a simple Bayesian network is shown. Variables c and h have been observed. With this information, the network in Figure 2a can be turned into the identical but different-looking network in Figure 2c. The Bayesian network in Figure 2c is obtained in two steps. First, as c is observed, the loops through c between the parents of c and the offspring of c can be broken, due to the simple fact that if c is instantiated to any value, the parents and offspring of c become independent, and each of the offspring of c becomes independent of the other offspring. To illustrate this, a number of clones of c replace c: one connected to the parents of c (c_1), and one for each of the offspring (c_2, c_3, c_4). Second, as h is observed, the same operation can be carried through for h, replacing it with three clones. In Pearl (1988), conditioning was used as a method for reducing the complexity of a Bayesian network such that exact inference becomes possible. A variable v, connecting many parts of the network and thus being responsible for much of the complexity, is selected, and then v is instantiated with each of its states, one at a time. With v instantiated, it can be replaced by a number of clones, thus effectively breaking loops in the Bayesian network.
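The clone-splitting operation itself is mechanical. The sketch below performs it on a parent-list representation of a network; the toy graph is loosely modeled on Figure 2 (the exact child set of c there is not reproduced here), and the naming scheme is invented.

    def split_observed(parents, v):
        """Split observed variable v into clones: one clone keeps v's
        parents, and each child of v gets its own parentless clone."""
        new = {x: list(ps) for x, ps in parents.items()}
        new[v + "_1"] = new.pop(v)           # clone attached to v's parents
        i = 2
        for child, ps in list(new.items()):  # snapshot; clones added below
            if v in ps:
                clone = f"{v}_{i}"
                i += 1
                new[clone] = []              # child-side clones are roots
                ps[ps.index(v)] = clone
        return new

    # Toy network: a, b -> c -> d, e, f.  Splitting c yields c_1..c_4,
    # cutting every loop through c (cf. Figure 2).
    g = {"a": [], "b": [], "c": ["a", "b"],
         "d": ["c"], "e": ["c"], "f": ["c"]}
    print(split_observed(g, "c"))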

Figure 2: Conditioning upon variables c and h turns the initial Bayesian network in (a) into the broken-up network in (c). The dark circles represent variables that have been observed.


For each of the states of v, exact inference is performed in the network, obtaining marginal beliefs on all other variables with v instantiated. Finally, the true marginal beliefs can be found by summing over the states of v. In this algorithm, the idea of conditioning is applied in a quite different fashion, and one of the basic assumptions of conditioning is violated. Splitting a variable into a number of clones to break loops is only legal if the variable is observed. In this algorithm, we perform this splitting into clones even if the variable is only partly observed, i.e., contains soft evidence. Soft evidence is non-categorical evidence that places a belief on some (more than one) of the states of the variable, as opposed to hard evidence, which places a positive belief on only one state. Using soft evidence to break loops is essentially illegal, as this type of evidence does not render the parents independent of the offspring, or one offspring independent of another. However, as we will see, the algorithm attempts to maintain their dependency by inserting the beliefs of one of the independent variables as soft evidence in another, etc. We operate with two Bayesian networks in the algorithm. BN_0 is the original network that we want to find a legal configuration for. BN_e is an "exploded" version in which all variables are assumed observed and replaced by their clones. BN_e thus consists of a large number of nuclear families[1], as seen in Figure 3, which depicts the "exploded" version of the network in Figure 2a. It follows that exact inference in BN_e is always possible for arbitrary BN_0. The feasible states of a variable v in BN_0 are the states that are currently possible given the current evidence in the system (soft evidence on clones of v, and initial evidence on other clones in the network). The goal of the algorithm is to narrow down the feasible states of all variables sufficiently that no states illegal due to evidence are among them, then to select one of these states and move on to the next variable:

[1] Here, offspring can have one or more parents.


1. BN_0 is "exploded" into BN_e.

2. Evidence is inserted into all clones, i.e., if v in BN_0 is observed, this evidence is inserted into all clones of v in BN_e.

3. For each variable v in BN_0, perform substeps 3a, 3b and 3c:

   (a) While the feasible states of any variable in BN_0 can be narrowed further down, perform substeps 3(a)i and 3(a)ii:

       i. For each variable w in BN_0, perform substeps 3(a)iA and 3(a)iB:

          A. Multiply the marginal beliefs of the clones of w together.

          B. Insert the normalized resulting belief table (the feasible states of w) as soft evidence in each of w's clones.

       ii. Perform exact inference in BN_e (for instance with the junction-tree propagation method), propagating the effects of the soft evidence of the last step further away.

   (b) Multiply the marginal beliefs of v's clones together to find the feasible states of v.

   (c) Select one of the feasible states of v and insert it as hard evidence in each of v's clones.

This is not the complete algorithm, however. It may occur, despite the continuing narrowing down of feasible states, that an illegal state (illegal given the initial evidence) is inserted at step 3c. This will later be detected at step 3(a)iA, where an all-zero belief table will show up for some variable. In this case, an extra backtracking step must be added to the algorithm after step 3(a)iA:

Backtrack: if an all-zero belief table is found, then step back to the previously instantiated variable and try its next feasible state. If all feasible states for this variable have already been "tried", then backtrack to the variable before it, and so on.
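A compact sketch of the whole procedure is given below. To stay self-contained it replaces exact junction-tree propagation in BN_e (step 3(a)ii) with local feasibility propagation over the nuclear-family factors, so it keeps the narrow/select/backtrack structure of step 3 and the Backtrack step, but not the full soft-evidence bookkeeping; the example network at the bottom is invented.

    from itertools import product

    # Each family is (variables, table): a nuclear-family factor of BN_e,
    # with table[assignment] > 0 iff that assignment is legal.  narrow()
    # plays the role of steps 3(a)i-ii, but with local feasibility
    # propagation instead of exact junction-tree inference.
    def narrow(domains, families):
        """Shrink the feasible-state sets to a fixpoint; return None if a
        variable loses all states (the all-zero table of step 3(a)iA)."""
        changed = True
        while changed:
            changed = False
            for vars_, table in families:
                for i, v in enumerate(vars_):
                    feasible = {combo[i]
                                for combo in product(*(domains[w] for w in vars_))
                                if table.get(combo, 0.0) > 0.0}
                    if feasible != domains[v]:
                        if not feasible:
                            return None
                        domains[v] = feasible
                        changed = True
        return domains

    def find_configuration(domains, families, order):
        """Steps 3b-3c plus Backtrack, as depth-first search: the recursion
        stack remembers which states of each variable were already tried."""
        domains = narrow(dict(domains), families)
        if domains is None:
            return None                          # dead end: backtrack
        if not order:
            return {v: next(iter(s)) for v, s in domains.items()}
        v, rest = order[0], order[1:]
        for state in sorted(domains[v]):         # try each feasible state
            trial = dict(domains)
            trial[v] = {state}                   # step 3c: hard evidence
            result = find_configuration(trial, families, rest)
            if result is not None:
                return result
        return None

    # Invented two-variable example with evidence y = 1.
    domains = {"x": {0, 1}, "y": {1}}
    families = [(("x", "y"), {(0, 0): 0.9, (0, 1): 0.1,
                              (1, 0): 0.2, (1, 1): 0.8})]
    print(find_configuration(domains, families, ["x", "y"]))  # {'x': 0, 'y': 1}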

Figure 3: The "exploded" version of the Bayesian network in Figure 2a.

Thus we have to remember the order in which the variables are determined in step 3c, and in addition the feasible states that have been "tried" for each variable. Usually only one state is "tried" for each variable. However, if backtracking has occurred, more than one of v's states has been "tried". This information must be stored such that if backtracking at v ever occurs again, the algorithm will know which states have already been "tried".

3 Results

In this section an empirical investigation of the algorithm of Section 2 will be performed. We will attempt to disclose the complexity of the algorithm, along with some statistics on the occurrence of backtracking. First, we have applied the algorithm to a number of linkage analysis problems of various sizes and types, some constructed and

some real-world problems. These linkage analysis problems are highly complex and difficult to handle for any inference algorithm, as they often contain a large number of variables and loops. Furthermore, the conditional probability tables of the variables contain a large number of zeros, invalidating samplers that update one variable at a time, due to the large number of noncommunicating classes. The goal of linkage analysis is to establish linkage between a marker (a gene of known location) and a disease gene, thus effectively finding the location of the disease gene. This can be done by representing the entire pedigree along with variables for the marker and disease genes and their recombination in a Bayesian network, and then applying some MCMC method, e.g., a Gibbs sampler. The first pedigree (Figure 4) is a highly inbred human pedigree affected with a rare heart disease (the LQT syndrome), originating from Professor Brian Suarez. The pedigree (henceforth termed Problem 1) contains 73 individuals, but the Bayesian network representation of the linkage problem contains 903 variables (see (Kong 1991, Jensen & Kong 1996) for descriptions of this representation), many of which are observed. Similarly, in the remaining examples, the Bayesian network that the algorithm is applied to contains several times more variables than the number of individuals in the pedigree. Problem 2 is shown in Figure 1. In this problem it is attempted to establish linkage between the ABO and MN blood groups. The pedigree contains 1614 individuals, but the Bayesian network representation of the linkage analysis problem contains 19762 variables, many of which are observed. In both Problems 1 and 2, evidence is located all over the pedigree, with a majority in the lower part. Problems 3 and 4 are reduced versions of Problem 2. In Problem 3, the top-most and bottom-most generations of Problem 2 have been removed. In Problem 4, the top-most and bottom-most generations of Problem 3 have been removed. Problem 5 is shown in Figure 5. It is a highly inbred pedigree with problems of reducibility, and it was constructed particularly to highlight problems with MCMC methods in pedigree analysis, see (Jensen & Sheehan 1996).

Figure 4: The LQT pedigree. The special marriage node graph representation is used in the figure. Marriages and individuals are each depicted by nodes on the graph; the individual nodes are square for males, circular for females, and diamond-shaped for unmarried offspring of unknown sex. Individuals are shaded if they have data. In this case, marker genotypes are in brackets, and darker diamonds denote affected offspring. Some diamonds represent several individuals; e.g., 29-38 represents the 10 individuals 29, ..., 38, four of which have genotype (1,3), three of which have genotype (1,4), and three of which have genotype (3,4).

Noncommunicating classes for MCMC methods updating one variable at a time are created on individuals 2 and 3, and on 11 and 12. The homozygous individuals 7 and 10 force their respective offspring, 11 and 12, to carry an A allele each, and the common offspring of 11 and 12, individual 13, forces them to carry a B and a C allele. Thus, 11 and 12 have only two legal configurations, (g_11 = AB, g_12 = AC) and (g_11 = AC, g_12 = AB). Similarly, the noncommunicating configurations of 2 and 3 can be found. It will be impossible for any single-variable updating scheme to jump between these configurations. It is assumed that the presence of the noncommunicating classes will cause the algorithm of Section 2 trouble, forcing it to backtrack more often.
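This constraint argument is small enough to check by brute force. The snippet below enumerates genotype pairs for individuals 11 and 12 under the two constraints just stated; the allele and genotype encoding is ad hoc.

    from itertools import combinations_with_replacement, product

    # Brute-force check of the two-configuration claim for individuals 11
    # and 12 in Figure 5: each must carry an A (homozygous AA parents 7
    # and 10), and together they must pass a B and a C to offspring 13.
    alleles = "ABC"
    genotypes = ["".join(g) for g in combinations_with_replacement(alleles, 2)]

    def legal(g11, g12):
        if "A" not in g11 or "A" not in g12:     # forced A alleles
            return False
        # Offspring 13 = BC: one parent contributes B, the other C.
        return ("B" in g11 and "C" in g12) or ("C" in g11 and "B" in g12)

    print([pair for pair in product(genotypes, genotypes) if legal(*pair)])
    # -> [('AB', 'AC'), ('AC', 'AB')]: the two noncommunicating
    #    configurations, with no single-variable path between them.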

Figure 5: A small inbred pedigree. Again, the marriage node graph representation is used.

Problem 6 is shown in Figure 6. This pedigree is not inbred like the one in Problem 5, but there are still problems with noncommunicating classes. Like Problem 5, Problem 6 is a constructed pedigree. Problem 6 is also discussed in the context of determining the noncommunicating classes by Jensen & Sheehan (1996).

Figure 6: A small non-inbred pedigree.

Problem 7 is shown in Figure 7. This inbred pedigree is also constructed. A simple implementation of the algorithm described in the previous section has been run on the seven test problems. The results are shown in Table 1. We can make a number of observations from these results. First, in the real-world examples there is no backtracking; it occurs only in the constructed Bayesian networks. This is not sufficient to conclude that backtracking will be rare in real-world networks, but it does suggest it.

Figure 7: Another small inbred pedigree.

Table 1: Results of the algorithm for the seven linkage problems. Complexity is the amount of storage required to handle the problem exactly. Time is the time it took the algorithm to find a legal configuration (s = seconds, m = minutes, h = hours). All results are averages over 10 runs.

Problem   #Variables   Complexity     Time      Time/Variable   #Backtracks
1         903          2.2 × 10^8     2.83 m    0.19 s          0
2         19762        7.0 × 10^159   32.37 h   5.90 s          0
3         7829         9.8 × 10^55    3.96 h    1.8 s           0
4         3513         2.9 × 10^12    51.1 m    0.87 s          0
5         121          4.0 × 10^5     5.8 s     0.048 s         8
6         149          4.0 × 10^6     14.5 s    0.097 s         2
7         153          4.7 × 10^5     9.5 s     0.062 s         3

Second, we can see that the most backtracking occurred in Problem 5 (Figure 5), which is actually the smallest network, with only 121 variables. However, this pedigree was constructed especially to cause problems for the algorithm. We will now analyze a situation in which backtracking occurs. Due to the structure of this pedigree, configurations are forced upon individuals by the information on their relatives. Here, individuals 2 and 3 must each carry one A allele, enforced by their respective homozygous offspring, 5 and 6. Furthermore, the two alleles B and C carried by individual 13 must originate from 2 and 3, each of which can carry only one of them. This forces 2 and 3 to be in one of two legal configurations: (g_2 = AB, g_3 = AC) or (g_2 = AC, g_3 = AB), where g_i denotes the genotype of individual i. Imagine that the algorithm visits individual 2 and sets this variable to genotype AB. Even with this information, the algorithm is not able to narrow down 3's genotype to AC, which is its only legal state. It is then possible for 3 to be set to, e.g., AA, causing problems for the subsequent narrowing down. The algorithm thus has to backtrack. If, in another run, individual 2 were first set to genotype AA, the subsequent narrowing down would not detect the mistake and would continue setting new variables until a situation arises where backtracking is necessary. In this situation, the algorithm may have to backtrack through several variables to get into a legal configuration with g_2 = AB or g_2 = AC. The problem of backtracking is likely to occur whenever there are noncommunicating classes in the Bayesian network such as those described above. It is the author's belief that this occurs rarely in networks representing real-world problems, and even when it occurs, it is possible to modify the algorithm such that it handles the situation better. This can be done by handling the variables in a specific order instead of the random one outlined in the description of the algorithm. The variables could be ordered such that variables that are connected in the

Bayesian network are treated in succession. This would ensure that the algorithm rarely has to backtrack far to get into a legal configuration. The above is of course based on the belief that the effects of an observed variable usually do not travel very far from the variable itself. The beliefs of all variables in the network may be modified slightly by a new observation, but states are usually only rendered impossible for variables in the local neighborhood. In all types of Bayesian networks where this property holds, the algorithm will be efficient. One way to realize such an ordering is sketched below.
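The following visits variables in breadth-first order over an adjacency map of the network, so each newly instantiated variable is adjacent to already-instantiated ones and backtracking stays local; the toy graph is invented, and a real implementation would use the moral graph of BN_0.

    from collections import deque

    def bfs_order(adjacency, start):
        """Order variables so connected ones are treated in succession."""
        order, seen, queue = [], {start}, deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        return order

    adjacency = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
    print(bfs_order(adjacency, "a"))   # ['a', 'b', 'c', 'd']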

4 Complexity of the Algorithm

The complexity of the algorithm can be analyzed by looking at the description in Section 2. It is:

    O(sn^2),    (1)

where n is the number of variables in the network, and s is the average number of states. The first n is for the main loop, which performs its steps for each variable in the network. The factor s·n is for the inner loop, which performs its steps until no feasible states of any variable can be narrowed further; as there are s·n states in all, in the limit this loop can run s·n times. Each time a narrowing-down step has been performed, exact inference is performed in the "exploded" network. This operation has complexity O(n), as the network consists of nuclear families only. Thus, the complexity is O(n(sn + n)), which reduces to O(sn^2). In Figure 8, the results from Table 1 are shown as a graph. The x-axis represents the number of variables and the y-axis the time measured in seconds. Both axes are in log-scale to determine whether (1) holds approximately. If the algorithm has polynomial complexity O(a·n^b), the points in Figure 8 will lie on a line.

As we see, this is in fact the case. The slope of the fitted line is approximately 2, meaning that for the examined cases the complexity was very close to that of (1).
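The log-log fit is easy to reproduce from Table 1 alone; the snippet below converts the reported times to seconds and fits a line with ordinary least squares, which stands in for whatever fitting procedure produced Figure 8.

    import numpy as np

    # (n, time) pairs from Table 1, times converted to seconds.
    n = np.array([903, 19762, 7829, 3513, 121, 149, 153], dtype=float)
    t = np.array([2.83 * 60, 32.37 * 3600, 3.96 * 3600, 51.1 * 60,
                  5.8, 14.5, 9.5])

    # Least-squares line through (log n, log t); the slope estimates b
    # in t ~ a * n^b.
    slope, intercept = np.polyfit(np.log(n), np.log(t), 1)
    print(f"fitted slope = {slope:.2f}")   # close to 2, matching O(s n^2)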

Figure 8: The points from Table 1 plotted on log-log axes. The x-axis represents the number of variables, and the y-axis represents the time measured in seconds. The best fit to the points is shown as a line. The diamond points (Points1) are taken from Table 1, and the cross points (Points2) are some extra results, included to provide a better basis for fitting the line.


5 Discussion

We have shown with empirical results that the outlined algorithm is indeed quite efficient, with a complexity of O(sn^2). Furthermore, the algorithm is easy to understand and implement. One of the weak points of the algorithm is that the amount of backtracking it will have to do in a general setting is uncertain. Here, we can only provide vague indications that it will rarely have to backtrack much in real-world problems, as the noncommunicating classes that force backtracking seem to occur rarely. Furthermore, the algorithm can easily be modified to counter this by handling the variables in a more intelligent order. The implementation used to document the complexity of the algorithm on the example Bayesian networks was very inefficient, thus causing the large running times shown in Table 1. However, the algorithm could quite easily be optimized to run many times faster, providing legal configurations for most networks quickly. Indeed, Heath (1996) independently developed a similar algorithm in the setting of pedigrees and peeling, implemented far more efficiently than this one. Another way to optimize the algorithm would be to handle larger parts of the network exactly. It is perfectly possible, and not hard, to implement an algorithm that only "explodes" a minimal amount of the Bayesian network, just enough for the partly "exploded" network to be handled by an exact inference method. This would lead to even less backtracking, as many variables would be sampled jointly. However, the algorithm would probably become slower on most "easy" networks where little backtracking occurs. As a future investigation, the author would like to examine further the situations in which the algorithm has to backtrack. It would be interesting to identify this class of Bayesian networks.

6 Acknowledgements

The author thanks the remaining members of the Decision Science Support Systems group at Aalborg University (http://www.cs.auc.dk/odin) for providing a stimulating environment. This research was supported by the Danish Research Councils through the PIFT programme.

References

Gelfand, A. E. & Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities, Journal of the American Statistical Association 85(410): 398-409.

Gelman, A. & Rubin, D. B. (1992). Inference from iterative simulation using single and multiple sequences (with discussion), Statistical Science 7: 457-511.

Geman, S. & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence 6(6): 721-741.

Geyer, C. (1992). Practical Markov chain Monte Carlo, Statistical Science 7(4): 473-511.

Geyer, C. J. (1991). Markov chain Monte Carlo maximum likelihood, Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface, pp. 156-163.

Geyer, C. J. & Thompson, E. A. (1995). Annealing Markov chain Monte Carlo with applications to ancestral inference, Journal of the American Statistical Association 90(431): 909-920.

Gilberg, A., Gilberg, L., Gilberg, R. & Holm, M. (1978). Polar Eskimo Genealogy, Vol. 203, No. 4 of Meddelelser om Grønland, Nyt Nordisk Forlag, Arnold Busck, København.

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications, Biometrika 57(1): 97-109.

Heath, S. (1996). Generation of consistent genotypic configurations for multi-allelic loci and complex pedigrees. Unpublished.

Jensen, C. S. & Kong, A. (1996). Linkage analysis in large pedigrees with many loops: blocking Gibbs, in M. Clementi & P. Forabosco (eds), European Mathematical Genetics Meeting, Cooperativa Libraria Editrice Università di Padova, pp. 84-98.

Jensen, C. S., Kong, A. & Kjærulff, U. (1995). Blocking-Gibbs sampling in very large probabilistic expert systems, International Journal of Human-Computer Studies 42: 647-666.

Jensen, C. S. & Sheehan, N. (1996). Limitations of an algorithm for determining the noncommunicating classes for MCMC applications in pedigree analysis. Unpublished; to be submitted to Biometrics.

Jensen, F. V. (1988). Junction trees and decomposable hypergraphs, Research report, Judex Datasystemer A/S, Aalborg, Denmark.

Jensen, F. V., Lauritzen, S. L. & Olesen, K. G. (1990). Bayesian updating in causal probabilistic networks by local computations, Computational Statistics Quarterly 4: 269-282.

Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. (1983). Optimization by simulated annealing, Science 220: 671-680.

Kong, A. (1991). Efficient methods for computing linkage likelihoods of recessive diseases in inbred pedigrees, Genetic Epidemiology 8: 81-103.

Lauritzen, S. L. (1992). Propagation of probabilities, means and variances in mixed graphical association models, Journal of the American Statistical Association 87(420): 1098-1108.

Lauritzen, S. L. & Spiegelhalter, D. J. (1988). Local computations with probabilities on graphical structures and their application to expert systems, Journal of the Royal Statistical Society, Series B 50(2): 157-224.

Lin, S., Thompson, E. & Wijsman, E. (1994). Finding noncommunicating sets for Markov chain Monte Carlo estimations on pedigrees, American Journal of Human Genetics 54: 695-704.

Marinari, E. & Parisi, G. (1992). Simulated tempering: A new Monte Carlo scheme, Europhysics Letters 19: 451-458.

Metropolis, N., Rosenbluth, A., Rosenbluth, M. & Teller, A. (1953). Equation of state calculations by fast computing machines, Journal of Chemical Physics 21: 1087-1092.

Pearl, J. (1986). Fusion, propagation, and structuring in belief networks, Artificial Intelligence 29: 241-288.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Series in Representation and Reasoning, Morgan Kaufmann Publishers, Inc.

Sheehan, N. & Thomas, A. (1993). On the irreducibility of a Markov chain defined on a space of genotype configurations by a sampling scheme, Biometrics 49: 163-175.

Shenoy, P. P. & Shafer, G. R. (1990). Axioms for probability and belief-function propagation, in R. D. Shachter, T. S. Levitt, L. N. Kanal & J. F. Lemmer (eds), Uncertainty in Artificial Intelligence 4, Elsevier Science Publishers B.V. (North-Holland), Amsterdam, pp. 169-198.

Smith, A. F. M. & Roberts, G. O. (1993). Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B 55(1): 3-23.

Thomas, A. (1988). Drawing pedigrees, IMA Journal of Mathematics Applied in Medicine & Biology 5: 201-213.

Thomas, A., Spiegelhalter, D. J. & Gilks, W. R. (1992). BUGS: A program to perform Bayesian inference using Gibbs sampling, in J. M. Bernardo, J. O. Berger, A. P. Dawid & A. F. M. Smith (eds), Bayesian Statistics 4, Clarendon Press, Oxford, UK, pp. 837-842.
