American Journal of Medical Genetics Part C (Semin. Med. Genet.) 121C:32–37 (2003)

A R T I C L E

Anabaptist Genealogy Database

RICHA AGARWALA, LESLIE G. BIESECKER, AND ALEJANDRO A. SCHÄFFER*

In late 1996 we set out to build a computer-searchable genealogy of the Old Order Amish of Lancaster County, Pennsylvania, for use by geneticists. The goals of the project included: 1) using the genealogy to expedite the mapping of genes mutated in three rare recessive disorders under study at the National Institutes of Health (NIH); 2) building a freely available software package, PedHunter, to answer genetically relevant queries on our database and other similar databases; and 3) providing genealogy assistance to researchers outside NIH. All of these scientific goals had to be accomplished while maintaining the confidentiality of the persons in the database and the confidentiality of preliminary research results. We expanded the project to include complementary data sources that contained many individuals who were Anabaptist, but not Amish, and many individuals who never lived in Lancaster County. For this reason, the project was renamed Anabaptist Genealogy Database (AGDB). All of the initial goals of the project have been accomplished, and we recently marked the 5-year anniversary of answering the first of over 100 queries by researchers outside NIH. Thus, it is an opportune time to review the construction of AGDB, summarize its usage to date, and speculate on future projects it might stimulate and facilitate. Published 2003 Wiley-Liss, Inc.†

KEY WORDS: Amish; Mennonite; Anabaptist; consanguinity; inbreeding; genealogy; Steiner trees

INTRODUCTION

Amish and Mennonite communities are fascinated with their ancestry, which is one of many factors that make them attractive study populations for medical geneticists [McKusick, 1978a]. There are thousands of Anabaptist genealogy books, several libraries that collect them, and social groups, such as the Lancaster Mennonite Historical Society, which meet regularly to discuss them. Genealogy books are a valuable resource in constructing large pedigrees, useful for genetic linkage analysis and disease gene hunting [Angius et al., 2001; Ewald et al., 2002]. Genealogy books tend to be more reliable sources of relationship data than interviews with family members [Zlotogora et al., 1998]. Leafing through genealogy books can be enjoyable, but medical geneticists can spend their time more productively in the clinic and the laboratory. Therefore, we set out to construct a digital genealogy database and query software that would solve pedigree-related problems automatically and systematically.

Dr. Richa Agarwala received her Ph.D. in Computer Science from Iowa State University in 1994. She was a Postdoctoral Fellow for “Special Year(s) in Mathematical Support for Molecular Biology” at the Center for Discrete Mathematics and Theoretical Computer Science (DIMACS), Rutgers University from 1994 to 1996. She is currently a Staff Scientist at the National Center for Biotechnology Information (NCBI), National Institutes of Health (NIH). Her interest is to research and develop algorithmic tools for better understanding of the human genome.

Dr. Leslie Biesecker received his M.D. from the University of Illinois. He received pediatrics training at the University of Wisconsin and Medical and Molecular Genetics training at the University of Michigan. He is a senior investigator at the National Human Genome Research Institute at the NIH in Bethesda, MD. He directs a clinical and laboratory research program in the molecular genetics of birth defects.

Alejandro Schäffer was born July 1, 1963 in Montevideo, Uruguay. He received his Ph.D. in Computer Science from Stanford University in 1988, focusing on theoretical computer science. In 1992, he switched his research focus to software for genetics. Dr. Schäffer is best known for leading the development of the genetic linkage analysis software package FASTLINK and for implementing the PSI-BLAST module of the sequence analysis software package BLAST. Since 1998, he has been a Staff Scientist at the National Center for Biotechnology Information (NCBI), National Institutes of Health (NIH).

*Correspondence to: Alejandro A. Schäffer, DHHS/NIH/NLM/NCBI, Building 38A, Room 8N805, 8600 Rockville Pike, Bethesda, MD 20894. E-mail: [email protected]

DOI 10.1002/ajmg.c.20004

Published 2003 Wiley-Liss, Inc. †This article is a U.S. Government work and, as such, is in the public domain in the United States of America.

We have heard anecdotally that medical geneticists often consider a pedigree construction successful if all affected individuals are connected to a common ancestor or a common ancestor couple, who serve as founder(s); we call this the connectedness criterion. For disorders inherited in an autosomal recessive pattern, one seeks to connect each obligate carrier to a founder and to explain how two copies of the mutated gene may have been passed to each affected individual. However, it was rarely determined that 1) the putative founder was the most recent common ancestor, 2) all relevant parent-child links were included, or 3) the connected pedigree reflected the most likely paths of transmission of the disease-associated allele(s) by any criterion. Checking these conditions by eye is tedious and difficult, especially since the genealogy sources may contain errors or inconsistencies that are not apparent by checking a few entries at a time [Ewald et al., 2002].

Even by the simple connectedness criterion, medical geneticists’ genealogy tracing and pedigree construction efforts often fail. This can be seen in the numerous papers that announce the discovery of a founder effect based on the discovery of the identical causative mutation in supposedly unrelated individuals, only after the disease gene has been identified by positional cloning. For example, in the case of sitosterolemia (OMIM 210250), Lee et al. [2001] extended studies of Berge et al. [2000] to multiple communities and did a “concerted genealogical search” to connect two obligate carrier Amish couples who carried a shared haplotype across the sitosterolemia locus and were not known to be related initially. We suggest that for many populations, such as Anabaptists, it is more efficient to find putative founders at the start of the disease gene hunt and use more genealogy information to make larger pedigrees, thereby increasing the power to detect linkage and reducing the number of individuals who need to be tested for mutations when a candidate gene is identified.

We initially set out to construct an Amish Genealogy Database (AGDB) of the Old Order Amish of Lancaster County, Pennsylvania, with the goal of constructing large pedigrees to use in mapping the causative genes for three rare disorders inherited in an autosomal recessive pattern under investigation by researchers at the National Institutes of Health (NIH). These disorders included McKusick-Kaufman syndrome, Amish nemaline myopathy, and Amish microcephaly. We subsequently expanded the database and included numerous non-Old Order Amish and Mennonites from North America. Therefore, we renamed the database Anabaptist Genealogy Database, retaining the acronym AGDB.
We hoped that our database would be useful to researchers outside NIH studying the Amish and Mennonites, and that our query software, PedHunter, would be useful for searching other genealogies.

All of these goals have been met. We used pedigrees constructed from AGDB toward identifying the causative genes for McKusick-Kaufman syndrome (OMIM 236700) [Stone et al., 1998], Amish nemaline myopathy (OMIM 605355) [Johnston et al., 2000], and most recently Amish microcephaly (OMIM 607196) [Rosenberg et al., 2002]. We have answered over 100 queries from other researchers. The AGDB genealogy information has been used in studies on blood pressure [Hsueh et al., 2000a], diabetes [Hsueh et al., 2000b], obesity [Hsueh et al., 2001; Steinle et al., 2002], aging [Mitchell et al., 2001], family size [Pollin et al., 2001], and osteogenesis imperfecta [McBride et al., 2002]. We used the database ourselves for a large-scale study of the effects of inbreeding [Agarwala et al., 2001], reexamining and extending the pioneering studies of Khoury et al. [1987a, 1987b, 1987c, 1987d] based on an earlier, smaller Amish genealogy [Egeland, 1972]. Our query software has been used by a few other groups and in one published study that we know of [Greenwood et al., 2001].

In this review article we summarize what is in AGDB, how AGDB can be accessed, and how we have used PedHunter to query AGDB. We conclude with some speculations about benefits of our genealogy project and how this resource might be used in the future.

Amish and Mennonite Genealogy Sources

We merged three genealogy sources to construct AGDB. In the process we corrected numerous errors and inconsistencies. Our gene mapping research efforts have been focused on the Old Order Amish of southeastern Pennsylvania, so we began with the Fisher Family History (henceforth denoted FFH) [Beiler, 1988], which is the most complete book for the contemporary Amish in Lancaster, PA. Thanks to the editor of that book, we have an updated version of the published book that now has 55,636 individuals organized by family units, with an informal syntax. While using the first version of AGDB (which contained only FFH data), it became apparent that nearly all individuals in the book were descendants of an Amish pioneer immigrant named Christian Fisher. Therefore, pedigree construction efforts would be highly biased toward making him and his two spouses the predicted mutation-carrying founders for any large pedigree of a disorder inherited in an autosomal recessive pattern.

We therefore chose to merge into the database a second source book entitled Amish and Amish Mennonite Genealogies (AAMG) [Gingerich and Kreider, 1986], which organized families under 226 categories in 848 pages and included 30,853 individuals. The AAMG book focused on individuals born before 1870. As the title implies, AAMG contains information about many Anabaptist individuals who were not Old Order Amish; also, many of the individuals never lived in eastern Pennsylvania. Because of the date limitation, AAMG has limited use for studying present-day individuals, but the combination of FFH and AAMG works well. For example, we were able to find potential founders for a pedigree of 33 sibships with Amish nemaline myopathy [Johnston et al., 2000], even though Christian Fisher (or anybody else in FFH) appears not to be an ancestor of all 66 obligate carrier parents. The engineering challenges of digitizing the AAMG data and merging it with the FFH data were described previously in Agarwala et al. [1999].

Through Dr.
Judith Westman (Ohio State University), we became aware of a much larger computerized Anabaptist genealogy maintained by Mr. James
Hostetler (Richmond, VA), who kindly provided us with his file. Mr. Hostetler’s file is in GEDCOM format, which is widely used by genealogists [GEDCOM, 1997], making it easier to parse than the first two sources. The merger of Mr. Hostetler’s data with the earlier sources to make AGDB 3.0 was summarized previously [Agarwala et al., 2001]. Mr. Hostetler continues to find and add more individuals to his Anabaptist genealogy, and we hope to update AGDB in the future with his information.

Access to AGDB

Creation and maintenance of AGDB is considered human subjects research and was done under a research protocol approved by an Institutional Review Board (IRB) at NIH. In response to concerns expressed by the board and Amish bishops, AGDB information is not accessible from any web page on the World Wide Web. Queries (see description of query software below) are sent to R.A. or A.A.S. by e-mail or in person, and output files representing the answers are returned by e-mail. All queries are kept confidential. In most cases, we do not know what phenotype was being studied, nor do we know who among the submitted individuals is affected by that phenotype. To the IRB, we report only the number of queries we received, not the diseases that were studied. Any researcher who has an IRB-approved protocol or an IRB exemption for a study of Amish or Mennonites may apply to obtain access to AGDB, and all such requests have been approved.

AGDB Contents

AGDB version 3.0 currently contains information on 295,122 Amish and Mennonite individuals and 68,216 marriages. The number of individuals is slightly higher than reported in Agarwala et al. [2001] because we subsequently added those individuals to the database for whom we had only one known parent. A few duplicate entries were also discovered. The data are organized in four tables: person table, relationship table, ID table, and generation table. These tables together contain the following information:

1. A consistent set of family relationships among individuals
2. The name, gender, address, and birth and death dates for most individuals (names may be missing and may never have been given if a child was stillborn or died as a neonate)
3. A marriage date for most couples

Information about adoptions, occupation, and religious designation is present but is incomplete. Unlike some project-specific medical genealogy databases (e.g., PhenoDB [Cheung et al., 1996]), AGDB contains no information on disease phenotypes or genotypes. Information on a few traits of interest to geneticists such as twinning [Agarwala et al., 2001], lifespan [Mitchell et al., 2001], and family size [Pollin et al., 2001] can be extracted by analyzing birth dates, death dates, and family relationships. The lack of disease information is essential to protect patient confidentiality and is desirable to allow multiple, competing researchers to use AGDB to study the same disease.
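As a rough illustration of how tables of this kind can encode a consistent set of family relationships, the following Python sketch models a person table and a relationship (marriage) table and derives founders and children from them. The class names `Person` and `Marriage`, all field names, and the helper functions are assumptions made for illustration only; the actual AGDB schema is not specified in this article, and the ID and generation tables are omitted because their layout is not described.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Person:
    """One row of a hypothetical person table (field names are assumed)."""
    pid: int                          # internal numeric identifier
    name: Optional[str] = None        # may be missing, e.g. for a stillborn child
    gender: Optional[str] = None
    birth_date: Optional[str] = None
    death_date: Optional[str] = None
    father: Optional[int] = None      # pid of the father, if known
    mother: Optional[int] = None      # pid of the mother, if known


@dataclass
class Marriage:
    """One row of a hypothetical relationship table."""
    husband: int
    wife: int
    date: Optional[str] = None        # a marriage date is known for most couples


def founders(people: Dict[int, Person]) -> List[int]:
    """Individuals with no recorded parent in the genealogy."""
    return sorted(p.pid for p in people.values()
                  if p.father is None and p.mother is None)


def children_of(people: Dict[int, Person], marriage: Marriage) -> List[int]:
    """Children recorded for both spouses of a marriage."""
    return sorted(p.pid for p in people.values()
                  if p.father == marriage.husband and p.mother == marriage.wife)


# Tiny invented example: a couple (1, 2) with two children (3, 4).
people = {
    1: Person(1, "A", "M"),
    2: Person(2, "B", "F"),
    3: Person(3, "C", "F", father=1, mother=2),
    4: Person(4, None, "M", father=1, mother=2),   # name never given
}
couple = Marriage(husband=1, wife=2, date="1800")
```

Because the tables hold only relationships and vital dates, traits such as family size can be derived from them (here, `len(children_of(people, couple))`) without storing any phenotype.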

PEDHUNTER QUERY SOFTWARE AND SAMPLE APPLICATIONS

We developed query software called PedHunter to efficiently extract information from AGDB. PedHunter 1.0 was described in Agarwala et al. [1998]; the current version (v 1.2) includes additional queries, a few of which are mentioned below. PedHunter is freely available and can be downloaded by following links from http://www.ncbi.nlm.nih.gov/CBBresearch/Schaffer/pedhunter.html. PedHunter can be, and has been, used to query genealogy databases other than AGDB [Greenwood et al., 2001]. It provides several useful features not available in earlier systems such as PEDSYS [Dyke, 1992]. We made two variants of PedHunter depending on whether the user wished to store the data in a traditional relational database using the query language SQL [Date, 1990] or instead with the tables in a structured ASCII format, but no use of SQL. Within the main PedHunter programs and AGDB, individuals are numbered 1, 2, 3, ... For AGDB usage, we have developed utility programs that convert from the identifier formats used in the three sources into the AGDB identifiers and back.

PedHunter 1.2 supports four categories of queries as basic operations:

1. Queries testing a relationship. For example, Is X an ancestor of Y? Is X a first cousin of Y?
2. Queries to find all individuals satisfying a relationship. For example, find all aunts and uncles of X; find all descendants of Y; find all founders.
3. Requests to print information. For example, print name, birth date, and death date for every identifier in a file.
4. Complex queries. One example is to find the inbreeding coefficient (with respect to the genealogy) of every individual in a file. Another example is, given a set of individuals, find a maximal subset that has a common ancestor. A third example is to find all the connected sets in the genealogy when allowing parent-child and marriage links.

Queries of the first two types are essential building blocks to answer the complex queries, but can also be used alone to check previous information or construct lists of individuals who might be of interest in an ongoing study. For example, we used PedHunter to check and correct information in a previously published pedigree for McKusick-Kaufman syndrome [McKusick, 1978b], and McBride et al. [2002] used information on ancestors and descendants to identify likely carriers of a known mutation. Queries regarding demographic information have been essential in large-scale studies of family relationships [Agarwala et al., 2001] and aging [Mitchell et al., 2001]. The query to construct all connected sets in the genealogy was useful for discovering duplicates in the sources that were not obvious due to major discrepancies in the two versions of the individual’s data; one version of the duplicate individual was stranded in a small connected set. AGDB 3.0 contains one large connected set of 294,895 individuals and 52 small sets ranging in size from 1 to 13 individuals.

Two complex queries in PedHunter, ASP (all shortest paths) and minimal, were especially pertinent to the problem of selecting a pedigree for linkage analysis, given a set, C, of affected individuals. The ASP query finds the set L of lowest common ancestors of C, and then for each ancestor A in L, it finds all minimum-length parent-child paths (i.e., fewest generations) from A to each individual in C. The output is presented as a set of LINKAGE-format pedigree files, one for each common ancestor. The common ancestors and shortest paths can be computed quickly using algorithms well known in computer science [Dijkstra, 1959; Even, 1979]. The minimal query takes as input an ASP pedigree and a required set of individuals, R, and produces as output a pedigree with a minimal-size set of parent-child links, such that there is a path of inheritance from the founder to each required individual. In the canonical application, one is studying a disease inherited in an autosomal recessive pattern and R is the set of obligate carrier parents of any affected children. In this application, a minimal pedigree is one that has the fewest possible number of meioses, while still providing an explanation of how each affected child might be homozygous for the causative mutation.
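The ASP idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PedHunter's implementation: the genealogy is assumed to be a dictionary `parents` mapping each individual to a tuple of known parents, the function names are hypothetical, plain breadth-first search stands in for whatever graph algorithms PedHunter actually uses, and no LINKAGE-format output is produced.

```python
from collections import deque


def ancestors(parents, person):
    """Return the set of all recorded ancestors of `person`."""
    seen = set()
    queue = deque(parents.get(person, ()))
    while queue:
        p = queue.popleft()
        if p not in seen:
            seen.add(p)
            queue.extend(parents.get(p, ()))
    return seen


def lowest_common_ancestors(parents, individuals):
    """Common ancestors of all `individuals` that have no descendant
    who is also a common ancestor."""
    common = set.intersection(*(ancestors(parents, c) for c in individuals))
    higher = set()
    for a in common:
        higher |= ancestors(parents, a)   # ancestors of a common ancestor
    return common - higher


def all_shortest_paths(parents, ancestor, person):
    """All minimum-length parent-child paths from `ancestor` down to `person`."""
    # Breadth-first search upward from `person`, counting generations.
    dist = {person: 0}
    queue = deque([person])
    while queue:
        x = queue.popleft()
        for p in parents.get(x, ()):
            if p not in dist:
                dist[p] = dist[x] + 1
                queue.append(p)
    if ancestor not in dist:
        return []
    target = dist[ancestor]
    paths = []

    def climb(x, path):
        if x == ancestor:
            paths.append(path[::-1])      # report the path from the ancestor downward
            return
        if dist[x] >= target:
            return                        # too far up to lie on a shortest path
        for p in parents.get(x, ()):
            if dist[p] == dist[x] + 1:    # stay on breadth-first levels
                climb(p, path + [p])

    climb(person, [person])
    return paths


# Invented example: c1 and c2 are related through the couple (g, h).
parents = {
    "c1": ("f1", "m1"),
    "c2": ("f2", "m2"),
    "f1": ("g", "h"),
    "m2": ("g", "h"),
    "g": ("gg", "gh"),
}
lca = lowest_common_ancestors(parents, ["c1", "c2"])
path_c1 = all_shortest_paths(parents, "g", "c1")
```

In this example the grandparents `gg` and `gh` are common ancestors of both individuals but are excluded from `lca` because their descendant `g` is also a common ancestor, which mirrors the notion of a most recent common ancestor used in the text.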
The problem of constructing minimal pedigrees is one type of problem in a well-studied class called Steiner tree problems [Hwang and Richards, 1992] or, more generally, Steiner arborescence problems [Zelikovsky, 1997; Cong et al., 1998]. Most Steiner tree problems are intractable in a formal sense for large instances, and the special case of minimal pedigree construction turns out to be intractable also [Provan, 1983]. The heuristic method we implemented to solve smaller instances of minimal pedigree construction was described previously in Agarwala et al. [1998, appendix]. For the larger instances that arose in the nemaline myopathy and microcephaly projects, we used Steiner tree software that is much more sophisticated but not freely available [Koch and Martin, 1998].

The pedigree outputs from the two pedigree construction queries can be easily combined with phenotype and genotype data for genetic linkage analysis, as we have done in several studies. The pedigrees can also be drawn with programs such as CYRILLIC [Chapman, 1990], PEDDRAW [Curtis, 1990], or PedigreeDraw [Mamelka et al., 1993]. The pedigrees in Stone et al. [1998], Johnston et al. [2000], and Rosenberg et al. [2002] were all drawn initially with PedigreeDraw, but manual editing was required to make the layout publication ready.

We illustrate the functionality of PedHunter and information in AGDB 3.0 by reconsidering the genealogical search done manually in Lee et al. [2001] to link two couples, each of whom had a child with the disease sitosterolemia, which is inherited in an autosomal recessive pattern. They used the AAMG hard copy book, and they reported finding five possible pedigrees headed by founders from different surnames: Hertzler, Schmucker, Blank, Mast, and Yoder. The names, partial birth dates, and partial AAMG identifiers for the four obligate carriers analyzed by Lee et al. [2001] were kindly given to us by Dr. Alan Shuldiner (University of Maryland). Using the utility program that converts name and identifier information, we found that AGDB 3.0 has three of the four individuals.
We explored information on a website (http://www.omii.org/omii.htm) that has more recent Hostetler data than what we used while making AGDB 3.0 (but cannot be easily used for pedigree construction) and found that the parents of the fourth individual are present in AGDB 3.0 as well. Using ASP, we found six couples with both spouses as possible minimal ancestors connecting all four individuals. We found four of the five founders found by Lee et al. [2001], but did not find a Yoder as a founder because one of his descendants, a spouse of Hertzler, was found as a more recent ancestor common to all four carriers. Two additional founder couples we found were Siever (AAMG id SV) and Beiler (AAMG id BY3). Minimal pedigrees for six ASP pedigrees have 40 (Hertzler), 37 (Schmucker), 42 (Blank), 40 (Mast), 44 (Siever), and 40 (Beiler) individuals. Our analysis, including tracking parents for the fourth individual, took us less than 1 day. We show the Beiler pedigree in Figure 1 because 1) it was not found at all by Lee et al. [2001]; 2) it has the unusual property, which we have not seen before, where ASP and minimal pedigrees are identical; and 3) it illustrates that in order to exhaustively find all minimal common ancestors, one should continue to trace back even those paths that do not find a common ancestor for several generations, which is hard to do without software. If we were to choose one of the six pedigrees for linkage analysis, we would follow our criterion of picking the one with the fewest individuals and choose the Schmucker pedigree.
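To make the notion of a minimal pedigree concrete, the sketch below searches a tiny invented pedigree for a smallest set of parent-child links that still connects the founder to every required individual. It is deliberately a brute-force enumeration whose running time grows exponentially with the number of links, so it is usable only on toy inputs; it is neither the heuristic of Agarwala et al. [1998] nor the Steiner tree software of Koch and Martin [1998]. The function names and the representation of links as `(parent, child)` pairs are assumptions for illustration.

```python
from itertools import combinations


def reaches_all(links, founder, required):
    """True if every required individual is a descendant of `founder`
    using only the given (parent, child) links."""
    children = {}
    for parent, child in links:
        children.setdefault(parent, []).append(child)
    seen, stack = {founder}, [founder]
    while stack:
        x = stack.pop()
        for c in children.get(x, ()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return set(required) <= seen


def minimal_pedigree(links, founder, required):
    """Smallest subset of links connecting `founder` to all of `required`.

    Subsets are tried in order of increasing size, so the first subset
    that works has the fewest parent-child links (meioses)."""
    links = list(links)
    for k in range(len(links) + 1):
        for subset in combinations(links, k):
            if reaches_all(subset, founder, required):
                return list(subset)
    return None   # the founder is not an ancestor of every required individual


# Invented example: r2 can be reached through either a or b.
links = [("g", "a"), ("g", "b"), ("a", "r1"), ("a", "r2"), ("b", "r2")]
best = minimal_pedigree(links, "g", ["r1", "r2"])
```

Here the route through `b` is dropped because reaching both required individuals through `a` needs three links instead of four. Comparing the sizes of such minimal pedigrees across candidate founders corresponds to the selection criterion applied to the six pedigrees above.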

DISCUSSION

We created AGDB and PedHunter to aid medical genetics researchers in mapping disease-causing genes. Specifically, we sought to enable clinicians studying Anabaptist populations to have more time in the clinic and the lab, by eliminating the need to peruse genealogy books and to connect hypothetical pedigrees. AGDB has been used in early steps of mapping three disease genes and in studying some complex traits. One of us (L.G.B.) estimated that the work to construct the McKusick-Kaufman syndrome pedigrees in Stone et al. [1998] would have taken 50 hr of manual checking of one source only, with limited confidence that all relevant parent-child links had been explored. We can extrapolate to 142 AGDB queries to date and estimate that 3.5 person-years of book searching have been saved.
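The extrapolation can be reproduced with a few lines of arithmetic. The query count and the 50 hr figure come from the text; the number of working hours in a person-year is not stated in the article, so the value of 2,000 hr used below (roughly 50 weeks of 40 hr) is an assumption chosen because it is consistent with the reported result.

```python
queries = 142                  # AGDB queries to date (from the text)
hours_per_query = 50           # manual search estimate for one pedigree (from the text)
hours_per_person_year = 2000   # ASSUMPTION: about 50 weeks x 40 hr per week

total_hours = queries * hours_per_query
person_years = total_hours / hours_per_person_year
print(total_hours, round(person_years, 2))   # prints: 7100 3.55
```

Under this assumption the total of 7,100 hr corresponds to about 3.5 person-years, matching the estimate given above.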

Figure 1. This pedigree was derived from a semiautomated search of the AGDB using PedHunter. It connected two nuclear families affected with sitosterolemia (originally published in Lee et al. [2001]) to a shared ancestral couple six generations back in the genealogy. This pedigree was not found by Lee et al. [2001]; see text for discussion.

Ideally, finding a disease gene and the causative mutation(s) in isolated populations is intended to help the study population, in addition to the obvious benefit to scientific and medical knowledge. Side projects from AGDB gave us three opportunities to aid members of the Lancaster community in their avid pursuit of Anabaptist genealogy information. First, the data sources contained numerous errors and inconsistencies, even ignoring intersource discrepancies. If one knows what types of problems to look for, they can be systematically detected by using computer programs. Thus, we were able to provide the keepers of FFH and AAMG with lists of items to check and correct. Second, one of us (R.A.) implemented a set of computer programs that a member of the Anabaptist community has used to add new entries to the computerized version of FFH, and these programs can also produce a new edition using a format similar to that of Beiler [1988] in Microsoft Word. Third, the Lancaster Mennonite Historical Society (current holders of the copyright on AAMG [Gingerich and Kreider, 1986]) sought and received massive data files from us, toward the goal of producing a new edition. This was desirable because the source data for the original was on disks for which no reading device could be found, and because we have corrected numerous typographical errors and inconsistencies.

The large size of the AGDB affords some opportunities for geneticists to study common traits, for social scientists to study demographic trends, and for computer scientists to study practical instances of some algorithmic problems. We give some examples that we have touched on, but hardly put to rest, in our research. Geneticists might be interested in AGDB as a tool to study factors controlling twinning [Agarwala et al., 2001], aging [Mitchell et al., 2001], family size [Agarwala et al., 2001; Pollin et al., 2001], etc. One could extract a twin registry from AGDB, with the limitation that one would not know, without fieldwork, if same-gender twins were monozygotic. Social scientists might be interested in AGDB to test hypotheses about migration within North America, immigration from Europe, changes in surname spelling, isonymy, etc. Computer scientists might wish to use AGDB as a source of Steiner tree problems, pedigree drawing problems, or subgraph isomorphism problems (e.g., testing whether the individuals and parent-child relationships in two pedigrees correspond one to one). The problems we tackled in merging the three discrepant and partially overlapping data sources are quite analogous to, and special cases of, problems in synthesizing data from different pages on the World Wide Web. Another type of problem we have seen, at the interface of genetics and computer science, is how to use the genealogy information to optimize sample collection and fieldwork. A basic example arose in recent work on locating and testing Amish individuals who might carry a mutation of the COL1A2 gene that sometimes has a subclinical phenotype [McBride et al., 2002].

In summary, the construction and usage of AGDB has assisted medical geneticists, aided the Lancaster community, and provided us some challenging, practical scientific research problems. We are committed to maintaining AGDB and PedHunter for the foreseeable future.

REFERENCES

Agarwala R, Biesecker LG, Hopkins KA, Francomano CA, Schäffer AA. 1998. Software for constructing and verifying pedigrees within large genealogies and an application to the Old Order Amish of Lancaster County. Genome Res 8:211–221.
Agarwala R, Biesecker LG, Tomlin JF, Schäffer AA. 1999. Towards a complete North American Anabaptist genealogy: a systematic approach to merging partially overlapping genealogy resources. Am J Med Genet 86:156–161.
Agarwala R, Schäffer AA, Tomlin JF. 2001. Towards a complete North American Anabaptist genealogy II: analysis of inbreeding. Hum Biol 73:533–545.
Angius A, Melis PM, Morelli L, Petretto E, Casu G, Maestrale GB, Fraumene C, Bebbere D, Forabosco P, Pirastu M. 2001. Archival, demographic and genetic studies define a Sardinian sub-isolate as a suitable model for mapping complex traits. Hum Genet 109:198–209.
Beiler K. 1988. Fisher family history. Lancaster, PA: Eby’s Quality Printing. 568 p.
Berge KE, Tian H, Graf GA, Yu L, Grishin NV, Schultz J, Kwiterovich P, Shan B, Barnes R, Hobbs HH. 2000. Accumulation of dietary cholesterol in sitosterolemia caused by mutations in adjacent ABC transporters. Science 290:1771–1775.
Chapman CJ. 1990. A visual interface to computer programs for linkage analysis. Am J Med Genet 36:155–160.
Cheung K-H, Nadkarni P, Silverstein S, Kidd JR, Pakstis AJ, Miller P, Kidd KK. 1996. PhenoDB: an integrated client/server database for linkage and population genetics. Comp Biomed Res 29:327–337.
Cong J, Kahng AB, Leung KS. 1998. Efficient algorithms for the minimum shortest path Steiner arborescence problem with applications to VLSI physical design. IEEE Trans Comput Aided Des Integr Circ Syst 17:24–39.
Curtis D. 1990. A program to draw pedigrees using LINKAGE or LINKSYS data files. Ann Hum Genet 54:365–367.
Date CJ. 1990. An introduction to database systems, vol. 1. New York: Addison-Wesley. 800 p.
Dijkstra EW. 1959. A note on two problems in connexion with graphs. Numerische Mathematik 1:269–271.
Dyke B. 1992. PEDSYS: a pedigree data management system. Population Genetics Laboratory, San Antonio, TX: Southwest Foundation for Biomedical Research.
Egeland JA. 1972. Descendants of Christian Fisher and other Amish-Mennonite pioneer families. Baltimore: Moore Clinic. 605 p.
Even S. 1979. Graph algorithms. Rockville, MD: Computer Science Press. 249 p.
Ewald H, Flint TJ, Jorgensen TH, Wang AG, Jensen P, Vang M, Mors O, Kruse TA. 2002. Search for a shared segment on chromosome 10q26 in patients with bipolar affective disorder or schizophrenia from the Faroe Islands. Am J Med Genet 114:196–204.
GEDCOM. 1997. GEDCOM Coordinator–3T, Family History Department, 50 East North Temple, Salt Lake City, UT 84150. E-mail: [email protected].
Gingerich HF, Kreider RW. 1986. Amish and Amish Mennonite genealogies. Gordonville, PA: Pequea Publishers. 858 p.
Greenwood CMT, Bureau A, Loredo-Osti JC, Roslin NM, Crumley MJ, Brewer CG, Fujiwara TM, Goldstein DR, Morgan K. 2001. Pedigree selection and tests of linkage in a Hutterite asthma pedigree. Genet Epidem 21:S244–S251.
Hsueh WC, Mitchell BD, Schneider JL, Wagner MJ, Bell CJ, Nanthakumar E, Shuldiner AR. 2000a. QTL influencing blood pressure maps to the region of PPH1 on chromosome 2q31-34 in Old Order Amish. Circulation 101:2810–2816.
Hsueh WC, Wagner MJ, Mitchell BD, St Jean PL, Aburomia R, Knowler WC, Pollin T, Burns DK, Sakul H, Bell CJ, Ehm MG, Shuldiner AR, Michelsen BK. 2000b. Diabetes in the Old Order Amish: characterization and heritability analysis of the Amish Family Diabetes Study. Diabetes Care 23:595–601.
Hsueh WC, Mitchell BD, Schneider JL, St. Jean PL, Pollin TI, Ehm MG, Wagner MJ, Burns DK, Sakul H, Bell CJ, Shuldiner AR. 2001. Genome-wide scan of obesity in the Old Order Amish. J Clin Endocrinol Metab 86:1199–1205.
Hwang FK, Richards DS. 1992. Steiner tree problems. Networks 22:55–89.
Johnston JJ, Kelley RI, Crawford TO, Morton DH, Agarwala R, Koch T, Schäffer AA, Francomano CA, Biesecker LG. 2000. A novel nemaline myopathy in the Amish caused by a mutation in Troponin T1. Am J Hum Genet 67:814–821.
Khoury MJ, Cohen BH, Chase GA, Diamond EL. 1987a. An epidemiologic approach to the evaluation of the effect of inbreeding on prereproductive mortality. Am J Epidemiol 125:251–262.
Khoury MJ, Cohen BH, Diamond EL, Chase GA, McKusick VA. 1987b. Inbreeding and prereproductive mortality in the Old Order Amish I. Genealogic epidemiology of inbreeding. Am J Epidemiol 125:453–461.
Khoury MJ, Cohen BH, Newill CA, Bias W, McKusick VA. 1987c. Inbreeding and prereproductive mortality in the Old Order Amish II. Genealogic epidemiology of prereproductive mortality. Am J Epidemiol 125:462–472.
Khoury MJ, Cohen BH, Diamond EL, Chase GA, McKusick VA. 1987d. Inbreeding and prereproductive mortality in the Old Order Amish III. Direct and indirect effects of inbreeding. Am J Epidemiol 125:473–483.
Koch T, Martin A. 1998. Solving Steiner tree problems in graphs to optimality. Networks 32:207–232.
Lee M-H, Gordon D, Ott J, Lu K, Ose L, Miettinen T, Gylling H, Stalenhoef AF, Pandya A, Hidaka H, Brewer B Jr, Kojima H, Sakuma N, Pegoraro R, Salen G, Patel SB. 2001. Fine mapping of a gene responsible for regulating dietary cholesterol absorption; founder effects underlie cases of phytosterolaemia in multiple communities. Eur J Hum Genet 9:375–384.
Mamelka PM, Dyke B, MacCluer JW. 1993. Pedigree/Draw for the Apple Macintosh. Population Genetics Laboratory Technical Report 1. San Antonio, TX: Southwest Foundation for Biomedical Research. [Originally published in 1988; 2nd edition, 1993].
McBride DJ Jr, Streeter EA, Mitchell BD, Shuldiner AR. 2002. Variable expressivity of a COL1A2 gly-61-cys mutation in a large Amish pedigree [abstract]. Am J Hum Genet 71(Suppl):351.
McKusick VA. 1978a. Medical genetic studies of the Amish: selected papers. Baltimore: Johns Hopkins University Press. 525 p.
McKusick VA. 1978b. The William Allan Memorial Award Lecture: genetic nosology: three approaches. Am J Hum Genet 30:105–122.
Mitchell BD, Hsueh W-C, King TM, Pollin TI, Sorkin J, Agarwala R, Schäffer AA, Shuldiner AR. 2001. Familial contributions to life span in the Old Order Amish. Am J Med Genet 102:346–352.
Pollin TI, Agarwala R, Schäffer AA, Lodge AL, King TM, Shuldiner AR, Mitchell BD. 2001. Fecundity is a familial trait in the Old Order Amish [abstract]. Am J Hum Genet 69(Suppl):422.
Provan JS. 1983. A polynomial algorithm for the Steiner tree problem on terminal planar graphs. University of North Carolina Chapel Hill Report TR-83/10.
Rosenberg MJ, Agarwala R, Bouffard G, Davis J, Fiermonte G, Hilliard MS, Koch T, Kalikin LM, Makalowska I, Morton DH, Petty EM, Weber JL, Palmieri F, Kelley RI, Schäffer AA, Biesecker LG. 2002. Mutant deoxynucleotide carrier DNC is associated with congenital microcephaly. Nat Genet 32:175–179.
Steinle NI, Hsueh WC, Snitker S, Pollin TI, Sakul H, St Jean PL, Bell CJ, Mitchell BD, Shuldiner AR. 2002. Eating behavior in the Old Order Amish: heritability analysis and a genome-wide linkage analysis. Am J Clin Nutr 75:1098–1106.
Stone D, Agarwala R, Schäffer AA, Weber JL, Vaske D, Oda T, Chandrasekharappa SC, Francomano CA, Biesecker LG. 1998. Genetic and physical mapping of the McKusick-Kaufman syndrome. Hum Molec Genet 7:475–481.
Zelikovsky A. 1997. A series of approximation algorithms for the acyclic directed Steiner tree problem. Algorithmica 18:99–110.
Zlotogora J, Bisharat B, Barges S. 1998. Can we rely on family history? Am J Med Genet 77:79–80.