Hum Genet (1992) 88:613-618

© Springer-Verlag 1992

Genetic structures in the Po Delta: principal components, systemic functions and the relative age of the beta-thalassemia polymorphism*

I. Barrai 1, M. Beretta 1, E. Mamolini 1, C. Scapoli 1, R. Canella 1, R. Barale 1, C. Vullo 2, and A. Ravani 2

1 Department of Evolutionary Biology, University of Ferrara, Via Borsari 46, I-44100 Ferrara, Italy
2 USL (Local Health Unit) 31, Ferrara, Italy

Received July 1, 1991

Summary. The principal component representations of the genetic structure of the human population of the Po Delta, obtained from 7 polymorphic loci, are compared with the representations obtained from the systemic function of gene frequencies devised by Womble (1951). It is noted that, when tridimensional representations are used, some consistency is visible in the results of the two methods for the description of the genetic population structure in the area under study. Both methods indicate that the present structure of the balanced polymorphism for beta-thalassemia in the area appears to be more recent than the structure of the neutral polymorphisms studied.

Introduction

Beretta et al. (1989), using 7 presumably neutral genetic systems, presented a summary structure of the human population of the Po Delta in Northern Italy, from which it was possible to draw some inferences regarding the migratory pressures that may have influenced the colonization of this area in historical times. They conjectured that, with their data, it might also be possible to infer a tentative chronology of the emergence of polymorphisms in the Po Delta. To pursue this conjecture, Barrai et al. (1989b, 1991) explored the behavior of kinship profiles for 7 different genetic systems in the same area. Since the 7 systems are presumed to be neutral, and since the relevant phenotypes were determined in the same samples of individuals, Barrai et al. (1989b, 1991) were able to test, with this set of data, whether the spatial behavior of kinship was the same in each system. Kinship analysis indicated that, for some genetic systems, allele frequencies were similar at the longitudinal margins of the province. A tentative chronology of the emergence of polymorphisms was also proposed on the basis of the behavior of kinship profiles as a function of geographic distance.

We now want to present the application of an interesting descriptive method, the 'systemic function' devised by Womble (1951) for differential systematics, to the Delta gene frequencies, and to compare this method with other descriptive methods, notably the geographic maps of principal components (Menozzi et al. 1978; Piazza et al. 1981; Rendine et al. 1986) that were used by Beretta et al. (1989) to describe the structure of the Delta population. In so doing, we also hope to obtain some information on the chronology of the emergence of the beta-thalassemia polymorphism in the area, relative to the neutral polymorphisms studied.

* This paper is in celebration of the Sixth Centennial Jubilee of the University of Ferrara, 1391-1991

Offprint requests to: I. Barrai

Materials and methods

The genetic systems

The 7 genetic systems studied were ACP, ESD, GLO1, GPT, PGD, PGM1 and PGP; each was typed in 1363 persons in the 26 communes of the Ferrara province. The frequencies of the principal allele in each system are taken from Beretta et al. (1989). In Table 1, we give, for each commune, the allele frequencies and the coordinates on the map of the Ferrara province at a scale of 1/100000, measured from the lower left corner of the map; coordinates are in centimeters, so that the distances calculated from them are in kilometers. Table 1, which is taken from Beretta et al. (1989), is the source of all our calculations in the present work, and may be used to test further hypotheses and to propose other descriptive methods.
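Because 1 cm on a 1/100000 map corresponds to 100 000 cm, i.e. 1 km, on the ground, distances computed directly from the Table 1 coordinates are already in kilometers. A minimal Python sketch of that arithmetic, using the Table 1 coordinates of Ferrara (X = 52, Y = 56) and Comacchio (X = 98, Y = 39); the function names are illustrative only, not taken from the original work:

```python
import math

MAP_SCALE = 100_000      # the 1/100000 map used for Table 1
CM_PER_KM = 100_000      # 1 km = 100 000 cm

def map_cm_to_ground_km(d_cm: float, scale: int = MAP_SCALE) -> float:
    """Convert a distance measured on the map (cm) into ground distance (km)."""
    return d_cm * scale / CM_PER_KM

def commune_distance_km(a: tuple, b: tuple) -> float:
    """Straight-line distance in km between two (X, Y) map coordinates in cm."""
    d_cm = math.hypot(b[0] - a[0], b[1] - a[1])
    return map_cm_to_ground_km(d_cm)

ferrara = (52, 56)      # X, Y from Table 1
comacchio = (98, 39)
print(round(commune_distance_km(ferrara, comacchio), 1))  # 49.0
```

At this scale the conversion factor is exactly 1, so the map distance in centimeters can be read directly as kilometers.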

Table 1. Frequencies of 7 alleles of 7 enzyme systems in 26 communes of the province of Ferrara, 1987. X and Y are the coordinates of the communes on the map of Italy at 1/100000

Communes: Argenta, Berra, Bondeno, Cento, Codigoro, Comacchio, Copparo, Ferrara, Formignana, Goro, Jolanda, Lagosanto, M. Torello, Massafisc., Mesola, Migliarino, Migliaro, Mirabello, Ostellato, P. Renatico, P. Maggiore, Ro Ferrara, S. Agostino, Tresigallo, Vigarano, Voghiera

Alleles: ACP*A, ESD*1, GLO1*1, GPT*1, PGD*1, PGM1*1, PGP*1

0.231 0.281 0.255 0.380 0.250 0.415 0.250 0.189 0.280 0.340 0.220 0.250 0.321 0.279 0.290 0.241 0.260 0.222 0.245 0.245 0.324 0.300 0.306 0.308 0.343 0.310

0.398 0.368 0.443 0.343 0.433 0.481 0.346 0.344 0.410 0.377 0.400 0.375 0.375 0.442 0.390 0.448 0.450 0.389 0.400 0.439 0.435 0.370 0.426 0.461 0.472 0.370

0.972 0.974 0.962 0.972 0.952 0.991 0.981 0.992 1.000 0.981 0.960 0.986 0.973 0.990 0.960 0.974 0.970 0.972 0.964 0.980 0.963 0.970 1.000 0.971 0.944 0.990

0.787 0.640 0.717 0.694 0.711 0.755 0.702 0.664 0.630 0.689 0.760 0.833 0.750 0.664 0.830 0.672 0.660 0.806 0.736 0.816 0.648 0.800 0.787 0.750 0.713 0.830

0.574 0.535 0.613 0.593 0.452 0.509 0.539 0.549 0.570 0.491 0.540 0.444 0.500 0.596 0.500 0.457 0.490 0.509 0.591 0.500 0.454 0.590 0.343 0.510 0.528 0.580

0.935 0.939 0.934 0.917 0.942 0.943 0.942 0.934 0.940 0.896 0.960 0.986 0.911 0.923 0.930 0.922 0.920 0.926 0.946 0.969 0.926 0.950 0.935 0.923 0.935 0.940

0.843 0.912 0.821 0.843 0.856 0.755 0.914 0.820 0.820 0.877 0.900 0.847 0.884 0.789 0.840 0.741 0.830 0.898 0.809 0.908 0.843 0.900 0.843 0.836 0.833 0.830

X: 70 80 36 27 91 98 69 52 71 98 81 94 65 83 91 76 80 40 77 42 67 63 34 74 43 62

Y: 32 74 61 44 55 39 62 56 56 57 61 48 51 53 66 49 51 55 46 48 40 68 51 54 56 48

Fig. 1. Map of the Po Delta showing the communes studied

Multivariate methods

Principal components were calculated from the dispersion matrix of the 7 allele frequencies. Calculations of components were checked by reconstructing the eigenvalue corresponding to each component from the inner products of the vector of the 7 frequencies at each location with the corresponding eigenvector. The systemic function used is that defined by Womble (1951) as the weighted average of the absolute values of the derivatives of gene frequencies at corresponding points. These derivatives are calculated as the slope between any two values of frequencies at a given distance. In the present case, the systemic function was not weighted, since all systems were typed in the same individuals, and approximately the same number of individuals was typed at each location.
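A minimal NumPy sketch of the component calculation and of the eigenvalue check, assuming that the 'dispersion matrix' is the sample covariance matrix of the 7 frequencies over the 26 locations, and that the check consists of recomputing each eigenvalue as the variance, over locations, of the inner products between the frequency vectors and the corresponding eigenvector. The random matrix is a placeholder for Table 1, not real data, and the function names are hypothetical:

```python
import numpy as np

def principal_components(freqs: np.ndarray):
    """freqs: (n_locations, n_alleles) matrix of allele frequencies.

    Returns eigenvalues (descending) and eigenvectors (as columns) of the
    sample covariance ('dispersion') matrix of the allele frequencies.
    """
    cov = np.cov(freqs, rowvar=False)           # (n_alleles, n_alleles), divisor n - 1
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric matrix, so eigh
    order = np.argsort(eigvals)[::-1]           # largest component first
    return eigvals[order], eigvecs[:, order]

def check_eigenvalues(freqs, eigvals, eigvecs):
    """Rebuild each eigenvalue from the inner products x_i . v_k.

    For a unit eigenvector v of the covariance matrix C, the scores X v have
    sample variance v' C v = lambda, so the two must agree.
    """
    scores = freqs @ eigvecs                     # (n_locations, n_components)
    rebuilt = scores.var(axis=0, ddof=1)
    return np.allclose(rebuilt, eigvals)

rng = np.random.default_rng(0)
placeholder = rng.uniform(0.2, 1.0, size=(26, 7))  # 26 communes x 7 alleles (fake)
lam, vecs = principal_components(placeholder)
print(check_eigenvalues(placeholder, lam, vecs))   # True
```

Centering the frequencies before taking the inner products would only shift each column of scores by a constant, so the recomputed variances, and hence the check, are unchanged.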

Tridimensional representations of frequencies

The frequencies of the principal alleles of the 7 genetic systems, together with the North-South and East-West coordinates of the locations where the samples were collected, were used in a standard program (Sowerbutts 1988) that interpolates the frequencies in a rectangular area defined by the user and displays them as surfaces. The area is a rectangle containing the Ferrara province (Fig. 1), and the surface of gene frequencies is represented by isophenes interpolated by the program. The derivatives are also represented as surfaces (Barbujani et al. 1989; Barrai et al. 1989a).
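The interpolation program itself is not reproduced here. As a rough, hypothetical illustration of how a phenic surface can be built from scattered samples, the sketch below interpolates by inverse distance (the method named in the Discussion) onto a rectangular lattice; the weighting exponent, the function name and the three sample points are assumptions:

```python
import numpy as np

def idw_surface(xy, values, gx, gy, power=1.0):
    """Inverse-distance interpolation of sampled values onto a lattice.

    xy     : (n, 2) sample coordinates in km (column 0 = X, column 1 = Y)
    values : (n,) gene frequencies at the sampled points
    gx, gy : 1-D lattice coordinates along X and Y, in km
    power  : exponent in the weights 1 / d**power (1 = plain inverse
             distance; an assumption, not a documented program setting)
    Returns an array of shape (len(gy), len(gx)), rows running along Y.
    """
    X, Y = np.meshgrid(gx, gy)                    # lattice nodes, shape (ny, nx)
    d = np.hypot(X[..., None] - xy[:, 0], Y[..., None] - xy[:, 1])  # (ny, nx, n)
    exact = d == 0                                # node falls on a sample point
    hit = exact.any(axis=-1)
    safe_d = np.where(exact, 1.0, d)              # avoid division by zero
    w = np.where(exact, 0.0, 1.0 / safe_d ** power)
    surface = np.empty(X.shape)
    surface[~hit] = (w[~hit] @ values) / w[~hit].sum(axis=-1)
    surface[hit] = values[np.argmax(exact[hit], axis=-1)]   # keep observed value
    return surface

# Three made-up samples on a 20 x 20 lattice with 3-km spacing (400 nodes).
pts = np.array([[0.0, 0.0], [30.0, 10.0], [15.0, 40.0]])
freq = np.array([0.30, 0.45, 0.40])
grid = np.arange(20) * 3.0
surf = idw_surface(pts, freq, grid, grid)
print(surf.shape)    # (20, 20)
print(surf[0, 0])    # 0.3, because node (0, 0) coincides with the first sample
```

Every interpolated node is a weighted average of the samples, so the surface never leaves the range of the observed frequencies; it adds resolution for display, not information.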

Results

Phenic surfaces

Following Womble (1951), we call a surface of gene frequencies a 'phenic' surface. The phenic surfaces of the 7 different alleles are given, each with its derivative, in Figs. 2-15. The same procedure is used to display the principal components of gene frequencies in Figs. 16-23. The tridimensional representations of the principal components are equivalent to the maps devised by Menozzi et al. (1978), except that, instead of colors of different intensities, the same color is used for isophenes of different heights; the highest points correspond to the strongest color intensity in the method of Menozzi et al. (1978). The systemic function was also calculated for the principal components, which are linear combinations of gene frequencies and, as such, can be considered traits of the population studied.

Differential surfaces

Womble (1951) visualizes discontinuities in the gene frequencies in an area through the derivative between points at the same distance. Lines joining points with the same value of the derivative are defined as isoclines. We are interested in North-South migration, and have differentiated each phenic surface along the East-West direction, at points 3 km apart. The derivatives are displayed as surfaces. Following Womble (1951), it would seem appropriate to call these surfaces 'clinal', but we prefer the term 'differential' surfaces to avoid confusion with phenic surfaces that might constitute a cline in the traditional sense. Each differential surface is displayed below the corresponding phenic surface. There is considerable variation among the surfaces of the different systems; in general, the differential surfaces give indications that appear to be consistent with the results from kinship studies. The variances of the differential surfaces, considered as sets of points, correlate with the values of Fst in each system, but the correlation does not reach significance, possibly because we have only 7 values of Fst (z = 0.989, t = 2.27, 5% < P < 10%).
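A sketch of the two calculations in this paragraph, under explicit assumptions: the East-West derivative at a node is taken as the forward difference to the node 3 km to the east, divided by 3 km; and the correlation between the variances of the differential surfaces and Fst over the 7 systems is assumed to be tested with the usual t statistic for a Pearson coefficient on n - 2 = 5 df. The two-sided P value uses the closed-form Student t distribution for odd degrees of freedom, so no statistics library is needed. All function names are hypothetical:

```python
import math
import numpy as np

def differential_surface(phenic, spacing_km=3.0):
    """East-West slope of a phenic surface (rows = Y, columns = X).

    Forward difference between nodes spacing_km apart along X, in frequency
    units per km; the result has one column fewer than the input.
    """
    return np.diff(phenic, axis=1) / spacing_km

def t_from_r(r, n):
    """t statistic for a Pearson correlation r computed from n pairs (n - 2 df)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

def two_sided_p_odd_df(t, df):
    """Two-sided P value of Student's t for a positive odd df (closed form)."""
    if df < 1 or df % 2 == 0:
        raise ValueError("df must be a positive odd integer")
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    a = theta
    if df > 1:
        term = total = c
        for k in range(1, (df - 3) // 2 + 1):
            term *= (2 * k) / (2 * k + 1) * c * c
            total += term
        a += s * total
    return 1.0 - 2.0 * a / math.pi    # 2a/pi is P(|T| < |t|)

# A linear cline along X gives a uniform differential surface.
cline = np.tile(np.linspace(0.40, 0.49, 4), (3, 1))    # 3 rows x 4 columns
print(differential_surface(cline))                      # every entry close to 0.01
# Reported t = 2.27 with 7 - 2 = 5 df:
print(round(two_sided_p_odd_df(2.27, 5), 3))            # 0.072
```

With t = 2.27 on 5 df the two-sided P value is about 0.07, which lies between the 5% and 10% levels. The uniform slope of the linear cline also shows why a regular gradient, such as the ESD cline, yields a flat differential surface.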

The systemic function and the Euclidean composition of principal components

The systemic function of the Delta area obtained from the 7 systems is given in Figs. 24 and 25: Fig. 24 is the tridimensional representation, and Fig. 25 is the bidimensional East-West projection of the surface, viewed from the South.
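A minimal sketch of the unweighted systemic function as defined in Materials and methods: at each lattice node, the mean over systems of the absolute East-West slope. The profile 'viewed from the South' is approximated here, as an assumption, by the maximum of the surface over each North-South column; the two example surfaces are synthetic and the function names are hypothetical:

```python
import numpy as np

def systemic_function(phenic_surfaces, spacing_km=3.0):
    """Unweighted systemic function on a lattice.

    phenic_surfaces: array (n_systems, ny, nx) of interpolated gene-frequency
    surfaces sharing one lattice (rows = Y, columns = X). Returns the mean over
    systems of the absolute East-West slope, shape (ny, nx - 1). Equal weights
    mirror the unweighted version used for the Delta data.
    """
    slopes = np.diff(phenic_surfaces, axis=2) / spacing_km
    return np.abs(slopes).mean(axis=0)

def east_west_profile(surface):
    """Profile seen from the South, taken here as the maximum over each N-S column."""
    return surface.max(axis=0)

# Two made-up systems whose frequencies change mostly near the western and
# eastern edges; the resulting profile is U-shaped.
row_a = np.array([0.30, 0.40, 0.41, 0.41, 0.42, 0.52])
row_b = np.array([0.60, 0.50, 0.49, 0.49, 0.48, 0.38])
surfaces = np.stack([np.tile(row_a, (4, 1)), np.tile(row_b, (4, 1))])  # (2, 4, 6)
profile = east_west_profile(systemic_function(surfaces))
print(np.round(profile, 4))
```

Taking absolute values before averaging matters: the two systems change in opposite directions, and a signed average would cancel the edge discontinuities that the systemic function is meant to reveal.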

Discussion

The construction of phenic surfaces over an area requires many more points than were actually sampled. In the present case, the 26 original points were increased to 400 points in a rectangular lattice made up of squares with a 3-km side, and the interpolation was by inverse distance from the sampled points. However, there are just 26 observations and 25 df for each surface; any consideration regarding the significance of these representations should therefore be viewed in the light of this basic limitation. The limitation also applies to the geographic projection of principal components, but it is less problematic there, since it is possible to test the significance of the eigenvalue associated with the surface. The representations help to visualize phenomena whose significance has previously been demonstrated by standard statistical analysis, for example the ESD cline; here, the phenic surface neatly visualizes the cline detected by the linear correlation between gene frequency and geographic distance. The tridimensional representations of the principal components bring the maps of Menozzi et al. (1978) into relief; our representations add nothing to their maps, but the vertical coordinate provides a ready measure of the color intensity. The other point of some interest in this description is the systemic function of the Delta. With the present set of data, the systemic function can be compared exactly with the chromatic representations of principal components. We note that the systemic function is U-shaped along the East-West direction, indicating a sharp change in gene frequencies at the margins of the area; this is consistent with the results obtained from kinship analysis and from the first principal component.
Such a shape of the systemic function could be generated by a North-South migrational wave (or waves) that only moderately changed the frequencies at the margins of the area, toward the sea and on the western side of the province. We have calculated the correlations, using pairs of geographically corresponding points, between the total systemic function and (I) the Euclidean composition of the principal components, (II) the differential surface of this composition, and (III) the systemic function of the three principal components, considered separately. None of the correlation coefficients was significant, although all were near or below the 10% probability level. We invoke migration as an explanation of these results, since a local adaptive differential would seem unlikely for apparently neutral systems in such a small area. The inferred migration wave must have been remote enough in time either not to disrupt the East-West gradient of the frequencies of beta-thalassemia in the area (Barrai et al. 1984, 1991), or to permit re-formation of the thalassemia gradient, which is indeed correlated with a local adaptive differential corresponding to the intensity of past malarial endemism. As a consequence, we believe that the present thalassemia structure in the area is more recent than the structure of the polymorphisms of the other systems studied here.
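Assuming, as a hypothetical reading, that the 'Euclidean composition' of the principal components is the Euclidean norm of the first three component-score surfaces at each lattice node, correlation (I) over geographically corresponding points can be sketched as below. Synthetic surfaces stand in for the real ones, and the function names are illustrative. Because the 400 nodes are interpolated from 26 sampled points, such a coefficient should be read with the same caution about the 26 observations:

```python
import numpy as np

def euclidean_composition(pc_surfaces):
    """Hypothetical 'Euclidean composition': node-wise norm of PC-score surfaces.

    pc_surfaces: array (n_components, ny, nx), e.g. the first three
    principal components interpolated on the lattice.
    """
    return np.sqrt((pc_surfaces ** 2).sum(axis=0))

def pointwise_r(a, b):
    """Pearson r between two surfaces over geographically corresponding nodes."""
    if a.shape != b.shape:
        raise ValueError("surfaces must share the same lattice")
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

rng = np.random.default_rng(1)
pcs = rng.normal(size=(3, 20, 20))       # synthetic PC-score surfaces
comp = euclidean_composition(pcs)        # shape (20, 20)
sysf = rng.uniform(size=(20, 19))        # synthetic systemic function
# Forward differences along X leave one column fewer, so pair each difference
# with its western node (an assumption) by dropping the last column.
r = pointwise_r(comp[:, :-1], sysf)
print(round(r, 3))
```

The shape check in `pointwise_r` enforces that only nodes at the same geographic position are paired, which is the sense of "geographically corresponding points" in the text.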

Figs. 2-15. Tridimensional representations of gene frequencies in the Ferrara province. The cline observed for allele ESD*1 emerges clearly from the phenic surface. It may be noted that the ordered frequencies result in a uniform differential surface for this allele.