Jogging for the money? Long term labour market effects of individual sports activities Michael Lechner*

First version: February, 2008 Date this version has been printed: 26 February 2008

Preliminary version. Please do not quote without permission of the author. Comments very welcome.

Abstract: This microeconometric study analyzes the effects of individual leisure sports participation on long term labour market variables, on socio-demographic as well as on health and subjective wellbeing indicators for West Germany based on individual data from the German Socio-Economic Panel study (GSOEP) 1984 to 2006. Econometric problems due to individuals choosing their own level of sports activities are tackled by combining informative data and flexible semiparametric estimation methods with a specific way of using the panel dimension of the data. The paper shows that sports activities have sizeable long term labour market effects in terms of earnings, wages, and the labour supply of women. One example of several results concerning non-economic variables is that sports activities reduce the number of men being divorced and separated and increase the number of men living together with their wife.

Keywords: Leisure sports, health, labour market, matching estimation, panel data. JEL classification: I12, I18, J24, L83, C21. Address for correspondence: Michael Lechner, Professor of Econometrics, Swiss Institute for Empirical Economic Research (SEW), University of St. Gallen, Varnbühlstrasse 14, CH-9000 St. Gallen, Switzerland, [email protected], www.sew.unisg.ch/lechner.

* I am also affiliated with ZEW, Mannheim, CEPR and PSI, London, IZA, Bonn, and IAB, Nuremberg. This project received financial support from the St. Gallen Research Center in Aging, Welfare, and Labour Market Analysis (SCALA). I thank Marc Flockerzi for helping in the preparation of the GSOEP data and for carefully reading a previous version of this manuscript. The usual disclaimer applies.

1 Introduction

Positive effects of physical activities on individual health are widely acknowledged in academia and by the general public. Nevertheless, there is still a substantial part of the population that is not active in sports. For example, in Germany about 40% of the population older than 18 years does not participate in sports activities at all, which is about an average value for Europe (they tend to be lower in Southern and higher in Northern Europe). Similarly in the USA: Based on a broader definition, about 25-30% of the relevant adult population does not engage in leisure time sports.1 These non-activity figures are surprisingly high considering that many Western countries substantially subsidize the leisure sports sector.2 The large subsidies can be justified by considerable externalities sports participation may have, for example by increasing public health and fostering social integration of migrants or other social groups which are difficult to integrate (for Germany, see Deutscher Bundestag, 2006).

This paper addresses two issues that are important from the perspective of the individual as well as the public: The first question is whether the health gains appearing in medical studies are still observable when taking a long run perspective. It is conceivable that the health gains disappear, because the additional 'health capital' may be 'invested' in less healthy activities such as working harder on the job. This of course would raise doubts about one of the main justifications for public subsidies. Second, even if the direct health effects are absent in the long run, sports participation may increase individual productivity, which appears desirable as well. Such an increase will be visible in standard labour market outcomes like earnings, wages, and labour supply. Knowing such effects would be valuable information that could be used in public information campaigns to increase participation in leisure sports.

1 The figures for Germany are taken from Deutscher Bundestag (2006, p. 94). The source for the European numbers is Gratton and Taylor (2000, chapter 5), while the US figures come from Ruhm (2000) and Wellman and Friedberg (2002). The US figures are based on a broader definition of activities than the European ones, including general physical activities.

2 Public expenditures come in various forms and from various levels of government. They may be directed to investments in infrastructure and the subsidisation of sports organisations, information campaigns, tax rebates for sports related expenditures (in particular donations), etc. The relative importance of the different expenditure categories and the overall amounts, as well as the way the support system is organized, vary drastically from one country to another (see Gratton and Taylor, 2000). In addition, health organisations and firms invest in encouraging people to take up physical activities. This diversity of sponsoring institutions and types of expenditures makes it extremely difficult to get a reliable estimate of the total expenditures for non-professional sports.

There are at least four strands of the literature that are relevant for this topic. The first strand analyzes the effects of participating in high school sports on future labour market outcomes. Based on different data sets mainly from the USA and different econometric methods to overcome the problem of self-selection into high school sports, this literature broadly agrees that participation in this type of sports improves future labour market outcomes (e.g., Barron, Ewing, Waddell, 2000, Ewing, 1998, Ewing 2007, Long and Caudill, 2001, Persico, Postlewaite, and Silverman, 2004, and Stevenson 2006, for the USA, and Cornelissen and Pfeifer, 2007, for Germany).3 Next, the positive effect of sports activity on health is well documented in the medical and epidemiological literature (e.g., Lüschen, Abel, Cockerham, and Kunz, 1993, US Department of Health and Human Services, 1996). Furthermore, there is recent microeconometric evidence of a positive relationship: Rashad (2007) analyzes the effects of cycling on health outcomes. Lakdawalla and Philipson (2006) find that physical activity at work reduces body weight and thus the probability of individuals becoming obese (which is known to cause many illnesses, e.g., Andreyeva, Michaud, and van Soest, 2005, or Wellman and Friedberg, 2002). Bleich, Cutler, Murray, and Adams (2007) consider the relation of physical activity to the obesity problem as well, although they find that the international trend of increasing obesity is more related to changes in eating behaviour than to reductions in physical activity. This finding is somewhat in contrast to some of the medical literature suggesting a more important role of declining physical activities over time (e.g., Prentice and Jebb, 1995).
Third, there is the literature linking health and labour market outcomes: Declining health reduces productivity and, as a consequence, it reduces wages and might reduce labour market participation. There is one channel that is particularly relevant for this study, namely the impact of body weight, in particular obesity, on labour market outcomes. Obesity increases the risk of mortality, diabetes, high blood pressure, asthma, and other diseases.4 Finally, it is stressed in many policy papers (e.g., Deutscher Bundestag, 2006) that an important channel of how participation in sports, particularly team sports, may improve future labour market performance is by increasing social skills. Therefore, the sociological literature on how social capital may improve labour market performance (e.g., Aguilera and Barnabé, 2005) and on 'positive' extracurricular activities of youths leading to a more successful labour market performance in later years (e.g., Eccles, Barber, Stone, and Hunt, 2003) is relevant as well.

Despite this large literature on topics closely related to the subject studied in this paper, there are no estimates of the effects of leisure sports on labour market outcomes. Since the effects of sports on labour market success take some time to materialise, estimating long run effects appears to be of particular relevance. Such a study comes with particular challenges: The first challenge is that a data set is required that allows following individuals for a sufficiently long time. These data should contain measurements of sports activities, labour market and other outcome variables of interest, as well as the (confounding) variables that jointly influence both the outcomes of interest and the decision about sports activities. In the next section, it is argued that the German Socio-Economic Panel Study (GSOEP) with annual measurements from 1984 to currently 2006 is such a data set.5 The second challenge concerns the problems coming from individual self-selection into different sports activity levels.
For example, if those individuals in well-paying jobs choose higher levels of sports activity, then a comparison of the labour market outcomes of individuals with low and high sports activity will not only contain the effects of different activity levels, but also reflect that these groups are different in other dimensions (selection bias). The fact that selection into sports is not at all random is documented, for example, by Becker, Klein, and Schneider (2006) and Schneider and Becker (2005) for Germany, and by Farrell and Shields (2002) for England. However, solving this problem in the usual way, which means conditioning on the variables that pick up these 'differences in other dimensions' (it is argued below that the GSOEP contains the key variables influencing sports activity levels), may not solve the problem, as the values of these conditioning variables may depend on past sports participation (endogeneity problem of control variables). This endogeneity problem is solved by using a flexible semiparametric econometric estimation technique (a specific variant of a so-called matching estimator) together with considering only changes in sports activities, i.e. by performing the analysis in subsamples defined such that in each subsample all individuals have the same level of sports activities. Then, within each subsample the effects of the next subsequent change of these levels are analyzed. This approach removes (most of) the endogeneity problem as the control (confounding) variables are measured in a period when everybody has the same level of sports activity and their measurement can thus not be influenced by differences in activities.

3 For a related analysis of the effect of high school sports participation on suicides, see Sabo, Miller, Melnick, Farrell, and Barnes (2005).

4 For examples of this literature see the extensive list of references given in Ruhm (2007).

The paper intends to contribute to the literature in three dimensions: The first goal is to learn more about the correlates of sports activities by using the GSOEP data with its wealth of information. Since this is done in a way to overcome the endogeneity problem, the interpretation of the results should be less controversial than in previous studies. The second and main contribution of this study is to uncover the long run effects of sports participation on labour market success and several other socio-demographic and health variables.
Finally, a methodological point is made by adapting existing semiparametric econometric estimation methods to the specific panel data situation without having to impose the restrictive assumptions the popular fixed and random effects panel data estimators imply.

The results of the analysis of the selection process into leisure sports activities suggest that sports activities are higher for men than for women. They are much lower for non-Germans, particularly for female non-Germans. Sports activities increase with education, earnings, and 'job quality'. They are also higher for healthier women, whereas no such relationship appears for men. Marriage and children as well as a higher age are associated with lower sports activities.

The analysis of the effects of sports activities on outcomes revealed sizeable labour market effects. As a rough estimate, participants in sports earn about 1000 to 1500 EUR p.a. more than non-participants over a horizon of 15 years. There are also positive effects on wages and on the labour supply of women. Effects are not at all homogeneous for men and women: For example, sports activities increase the number of married men living together with their wife (reducing the number of men being divorced and separated), whereas no such effects appear for women. No clear-cut results are found for health outcomes. This suggests that sports participation increases health enough to compensate for the higher injury risk of participants, but probably not much more. This non-result on health allows the speculation that leisure sports participation may increase productivity via different routes, like providing relaxation, increasing mental capacity, and providing social contacts.

The next section analyzes the correlates of the participation in sports activities. It also describes the data and the endogeneity problem. Section 3 describes the econometric approach to identify and estimate the effects of sports on the various outcome variables considered. Section 4 contains the main results and robustness checks. Section 5 concludes. Appendix A discusses a couple of data issues. Appendix B contains an exact description of the econometric matching estimator used. For the sake of brevity, additional results are relegated to a second appendix that is available on the internet (www.sew.unisg.ch/lechner/sports_GSOEP).

5 For a detailed description of these data, see Wagner, Frick, and Schupp (2007).

2 Who participates in leisure sports activities?

2.1 Previous results

As mentioned in the introduction, there seems to be common agreement in the literature that sports activities tend to decrease with age, tend to increase with earnings or social status, and that men are more active than women. However, although not much is known in general about the further determinants of sports participation, there are some studies based on individual data that allow uncovering more factors that influence the participation in sports activities. Based on the British Health and Lifestyle Survey with interviews around 1984 and using a logit analysis for sports participation, Gratton and Taylor (2000) report in addition negative associations for past illnesses. Furthermore, they find positive associations of sports participation and not working full-time, as well as for sports participation and being separated or divorced. In a more recent study based on the Health Survey for England conducted in 1997, Farrell and Shields (2002) roughly confirm these findings using a probit model for sports participation. They further point to a negative association of sports participation and the presence of young children, as well as to a positive association with respect to the presence of older children for men. Furthermore, being a drinker, being white, and not being a smoker are also positively associated with sports participation. Schneider and Becker (2005) use a binary logit model and the German National Health Survey with interviews between 1997 and 1999 for a similar analysis. They confirm the previous findings, except with respect to smoking. They further find that being more satisfied with life in general, having a lower body mass index (BMI), and having received medical advice on physical activity are also positively associated with sports participation. In similar work, Becker, Klein, and Schneider (2006) analyze the 2003 cross-section from the GSOEP. In addition to the 'usual' findings concerning education and age, they find that for 2003 women are more likely than men, and singles are more likely than people who are or have been married, to participate in sports. They also find a negative correlation for being a foreigner. Furthermore, they detect correlations for some subjective variables on social networks, subjective and objective health variables, as well as variables capturing policy interest and general life satisfaction (all measured at the same time as sports participation).

However, how to interpret the results of these cross-sectional studies is not obvious because they relate a phenomenon that could have been going on for a long time (sports activity) to other variables that may be influenced by past and present sports activities as well. For example, in the last study it is not at all clear whether good health increases sports activity or sports activity improves health. The same problem holds for some of the other time varying variables. This gives rise to the so-called endogeneity or reverse causality problem, which makes a causal interpretation of the correlates identified in such studies difficult. In the following section, we suggest using panel data to considerably reduce, if not eliminate, this problem.

2.2 Findings based on the German Socio-Economic Panel (GSOEP)

2.2.1 The endogeneity problem reconsidered

In a cross-section, the different sports participation statuses of the individuals have to be related to covariates measured at the same time as the participation status. Therefore, the measurement of the time varying variables in a particular period may already be influenced by current or past sports participation. If we were able to observe values of those variables as they would have been realized for a specific sports participation status, such values would not be subject to the endogeneity problem as they are not influenced by the actual realisation of the sports participation (i.e. the values of past labour market experience had the individual not participated in sports activities). However, as for every individual we observe only the values of the covariates along with specific realized sports participation, such (partly counterfactual) values are not available in a cross-section. Particularly so as variation in the sports participation status is needed to be able to analyze its determinants.


With panel data it is possible to circumvent this problem by exploiting the variation of the sports status both over time and across individuals. 'Determinants' of sports status should be measured close but prior to the sports participation decision (as it would be logically inconsistent for future events to influence past events). Therefore, the endogeneity problem is resolved if the analysis is based on populations comprising individuals who are in the same sports status in the period before the specific sports participation decision is analyzed, and measurements of the covariates prior to that period are used. Thus, using some standard cross-sectional logit or probit model, for example, with the sports participation status of the current period as the dependent variable and last period's measurements of the covariates as independent variables for such a specific subsample (e.g., the sample of those who did not do any active sports in the previous period) leads to results that are considerably more credible than in a cross-section.6 Of course, the drawback is that the conclusions are valid only for the specific population with the particular sports participation status. However, this can be resolved by considering all such populations one-by-one (and taking appropriate averages if desired).

2.2.2 The data
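The subsample logic described in Section 2.2.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the field names (`sports`, `earnings`), and the toy values are hypothetical and do not reflect actual GSOEP variable names or data.

```python
# Hypothetical toy panel keyed by (person_id, year); not actual GSOEP data.
panel = {
    (1, 1984): {"sports": "none", "earnings": 1800},
    (1, 1985): {"sports": "weekly", "earnings": 1850},
    (2, 1984): {"sports": "none", "earnings": 1500},
    (2, 1985): {"sports": "none", "earnings": 1520},
    (3, 1984): {"sports": "monthly", "earnings": 2100},
    (3, 1985): {"sports": "none", "earnings": 2150},
    (4, 1984): {"sports": "weekly", "earnings": 2400},
    (4, 1985): {"sports": "weekly", "earnings": 2450},
}


def build_subsamples(panel, decision_year):
    """Split individuals by their sports status in the year before the decision.

    Each row pairs the current participation dummy with a covariate
    measured in the previous year, i.e. before the decision is taken.
    """
    samples = {"no_sports": [], "sports": []}
    persons = sorted({pid for pid, _ in panel})
    for pid in persons:
        before = panel.get((pid, decision_year - 1))
        now = panel.get((pid, decision_year))
        if before is None or now is None:
            continue  # both years must be observed
        key = "no_sports" if before["sports"] == "none" else "sports"
        samples[key].append({
            "person_id": pid,
            "participates": int(now["sports"] != "none"),
            "lagged_earnings": before["earnings"],
        })
    return samples


samples = build_subsamples(panel, decision_year=1985)
for name, rows in samples.items():
    print(name, rows)
```

Within each of the two lists everybody shares the same prior sports status, so the lagged covariate cannot reflect differences in that status; a logit or probit would then be estimated separately on each list.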

The empirical analysis uses two such subsamples of the West German population. The no-sports sample consists of those individuals who did not participate in sports the year before the decision is analyzed. The sports sample comprises all individuals indicating some sports activity prior to the relevant year.7 Furthermore, since previous results suggest very substantial differences between men and women, the empirical analysis is stratified by sex. To be able to concentrate on a fairly homogeneous population for which long-term labour market outcomes can still be observed fifteen years later, individuals under consideration are required to be between 18 and 45 in the year of the decision.8 Again, for being able to measure long-run outcomes as well as pre-decision control variables, the focus is on the West German subsample (observed yearly from 1984 to 2006) and sports participation decisions in the years 1985, 1986, 1988, and 1990.9 The price to pay for focusing on these four subsamples is a lack of precision due to the smaller sample sizes as compared to a pooled sample. We mitigate this problem by pooling the four different starting cohorts within each of those four samples. Furthermore, to be consistent with the sections discussing the effects of sport, the results from a balanced panel are reported.10 Furthermore, individuals stating that they were hospitalized in the year of the decision or the year before are not considered to avoid basing the results on seriously ill people, who are expected to participate in sports for other reasons if at all. As an unavoidable side effect, this rule excludes most women giving birth in those two years.

Sports participation is measured in four different categories (at least every week, at least every month but not every week, less often than every month, none). In Table 2.1 we show the development of that variable over time for the combined sample considered to get an idea about the dynamics of sports participation. First of all, Table 2.1 shows that in 1985 35% of the men and 50% of the women did not participate in any sports, whereas 36% of the men and 26% of the women were active on a weekly basis. However, in 2005 these gender differences had disappeared: although more women than men did not participate in any activity (40% compared to 37%), fewer men than women (32% compared to 37%) are active at least on a weekly basis. Thus, while the women in the sample increased their activity levels, the activity levels for men remained fairly constant over time. Becker, Klein, and Schneider (2006) find similar trends using GSOEP data starting in 1992. However, the activity levels are lower, because they base their analysis on a broader definition of the underlying population.11

Table 2.1: Trends of sports participation over time for men and women (balanced sample)

Frequency of leisure sports activities (in %)

                   Men                               Women
Year   weekly  monthly  < monthly  none   weekly  monthly  < monthly  none
1985     36       8        21       35      26       6        18       50
1986     38       7        19       35      27       6        17       50
1988     36       8        19       37      27       6        18       49
1990     38      11        26       25      32       9        23       36
1992     32      11        22       36      27       6        20       47
1994     31       9        23       36      26       7        20       47
1995     36       9        24       31      32       8        22       38
1996     32       9        24       35      27       7        21       44
1997     31       9        23       38      28       6        19       46
1998     33      11        25       31      32       7        24       37
1999     29      10        23       37      29       7        18       47
2001     30       9        21       40      32       5        17       46
2003     33      10        27       30      41       5        18       36
2005     32       9        21       37      36       6        18       40

Note: In 1990, 1995, 1998, and 2003 a five point scale is used. In such a case, the categories daily and weekly are combined for the analysis of this paper.

The no-sports sample consists of those individuals answering 'none' in the previous year, whereas the sports sample comprises the other individuals. The same holds true for sports participation, although the definitions have been varied to assess the sensitivity of the results with respect to how sports participation is defined (see Section 4.3). Unfortunately, there are not enough observations to analyze the different categories within the sports sample separately. In the no-sports sample there are 1297 men and 1697 women, of whom 457 men and 452 women started sports in the next period. In the sample with at least some sports, out of the 2126 men and 1554 women, 350 men and 326 women ended their sports activities. It is already clear from these numbers that men are more likely to participate in sports than women.

6 I refrain from using off-the-shelf panel econometric models, i.e. in this case fixed effects or random effects logit or probit models, because they require a considerable number of undesirable assumptions, such as strict exogeneity, and rely heavily on functional form assumptions for identification (e.g., see Lechner, Magnac, and Lollivier, 2008).

7 See Appendix A for further information on the data, the sample, and the variables used.

8 Increasing the lower age limit to 24 years leads to similar results, but there is a loss of precision due to the smaller sample size.

9 Information for East Germany starts in 1990. For the West, the years 1987 and 1989 are omitted due to data limitations regarding the sports variable.

10 To be precise, it is required to be observed in the years -1, 0, 1, 2, 3, 4, 5, 7, 9, 11, 13, and 15 (0 denotes the year of the participation decision, -1 the year before, etc.). The results for a corresponding unbalanced panel requiring only to be observed in the years -1 and 0 are available on request. They support the findings presented in this paper, but of course the meaning of the outcome variables (coded 0 if individuals were not observed) is less clear. These results also show that sports participation does not influence the probability of being observed in the balanced sample, indicating that the analysis can be conducted on the balanced sample without attrition bias.

11 Note that there are the expected effects in the years in which a five point scale is used instead of a four point scale. In those years, it appears that people avoid the 'extremes' of the scale more frequently. This pattern has also been observed by Breuer (2004).

2.2.3 Results
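The counts reported at the end of Section 2.2.2 can be converted into raw transition rates, which makes the stated gender difference explicit. The following sketch simply divides the figures as they are given in the text; the dictionary layout and the labels are illustrative choices.

```python
# Counts as stated in the text of Section 2.2.2:
# (individuals changing their sports status, size of the subsample).
counts = {
    ("no sports before", "men"): (457, 1297),
    ("no sports before", "women"): (452, 1697),
    ("some sports before", "men"): (350, 2126),
    ("some sports before", "women"): (326, 1554),
}

# Share of each subsample (in %) that changes status in the next period.
rates = {key: 100.0 * changed / total for key, (changed, total) in counts.items()}

for (sample, sex), rate in rates.items():
    action = "started" if sample == "no sports before" else "stopped"
    print(f"{sample:18s} {sex:5s}: {rate:4.1f}% {action} sports")
```

Taking these counts at face value, men start sports more often than women (about 35% versus 27%) and stop less often (about 16% versus 21%).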

Table 2.2 contains sample means of the various covariates for the four different samples (sports / no sports and men / women), further stratified according to the sports status in the year analyzed. Thus, pair-wise comparisons of columns (2) vs. (3), (5) vs. (6), (8) vs. (9), and (11) vs. (12) allow assessing the covariate differences that come with the different sports participation status within each subsample. As a further tool to assess the relevance of the specific covariates, estimated coefficients of a binary probit model with sports participation as dependent variable are presented in columns (4), (7), (10), and (13). To avoid flooding the reader with numbers, coefficients not significant at least at the 10% level are omitted from the table (empty cell). When specific variables are omitted from the probit specification, it is usually either because they have been chosen as being part of the reference category (denoted by 'R'), or because the cell counts are too small ('-').12

Next, the different groups of variables are considered in turn. The cohort dummies capturing the year of the pooled participation decisions indicate that sports participation is rising over time, a finding consistent with Table 2.1 as well as with the literature mentioned above. However, the 1990 dummy variable may also pick up the effects of using a five point scale compared to a four point scale in the previous years.

The next block of variables relates to the socio-demographic situation. The results show that younger individuals are more likely to be observed to be active. The relation of sports activity and nationality is clear-cut: Non-Germans are less likely to be observed active, and this relation is considerably stronger for women than for men. In addition, being married is associated with less sports activity, whereas the relation of divorce to sports activities is somewhat unclear. Finally, the existence of children in the household of different ages is generally related to lower levels of sports activities.13

Table 2.2: Descriptive statistics for the selection process into sports activities

Individuals with no sports before: Men (Mean in subsample, Probit), Women (Mean in subsample, Probit). Characteristics (1)

Sport

No S.

(2)

(3)

S-NS

Sport

No S.

S-NS

(10)

(11)

(12)

(13)

R

24 .26 .24 .26

28 .29 .27 .16

R

-

31 .29 .97 .55 .05 .81 .10 .34 .51

32 .26 .80 .64 .07 .97 .17 .41 .56

R

38

50

R

32 34 32 32 34 17 17 8 .31* 28 21 29 21 44 16 25 59 66 50 60 59 10 7 3 14 9 Past and current employment status (in years) 9.5 12 -.03* 5.1 6.4 -.02* 7.3 9.3 .16 .17 1.6 1.2 .19 .24 -.16** .19 .42 .28 .33 .17 .23 Current employment status (in %) -.58** 1 1 26 41 1 1 4 6 4 4 3 3 2 1 23 16 3 1 84 86 39 37 81 86 Information on current employer (coded 0 if not employed; in %) 14 9 .16* 15 8 23 15 17 19 16 14 17 18 21 24 -.22* 13 8 26 20

40 22 16 64 10

34 16 28 57 8

.28** .37*

5.6 1.3 .19

5.7 1.4 .25

.03*

21 3 19 46

25 4 21 44

19 18 12

15 14 13

1985 1986 1988 1990

.32 .20 .19 .28

Age in years Age: 18-25 (dummy) German nationality Married Divorced # of kids in household Mother of kids age < 3 Mother of kids age < 7 Mother of kids age < 10

32 .25 .78 .61 .03 1.0 -

Lower secondary school or no degree Intermediate sec. school Upper secondary school No vocational degree Degree below university University

49

Full time work Part time work Unemployment Out of labour force Unemployed Part time employed Full time employed Public sector Firm size < 20 Firm size > 2000 Table 2.2 to be continued.

S-NS

Sport

No S.

S-NS

Individuals with some sports before Men Women Mean in ProMean in Prosubsample bit subsample bit Sport

No S.

(4) (5) (6) (7) (8) (9) Year of sports participation considered .31 R .30 .30 R .24 .25 -.27** .29 -.22* .19 .30 .27 .29 .24 .20 .24 .24 .31 .16 .36** .30 .17 .44** .26 .15 Socio-demographic characteristics 33 32 33 30 32 .16 .33* .26 .18 31 26 .68 .85 .60 .56** 84 69 .71 -.52* .64 .77 .49 .62 .02 .07 .05 .30 .04 .04 1.3 .95 1.4 .77 .99 .16 .20 .43 .54 .20 .58 .79 -.19 Highest schooling and vocational degrees (in %) 51 R 49 60 R 40 45

30 21 22 63 10

.33**

.27** -.24*

.40** -.03* 1.1** -.39**

-.38+ -.41**

12 To support these probit specifications, tests for omitted variables, as well as further general specification tests against non-normality and heteroscedasticity are conducted. These respective test statistics do not point to serious violations of the statistical assumptions underlying the probit model. They are available on request from the author.

13 Further socio-demographic information such as immigration information and household income has been considered in the estimation but not presented in the table because they have no further explanatory power in the probit (conditional on the variables already included).

Table 2.2 continued … Individuals with no sports before Men Women Mean in ProMean in Prosubsample bit subsample bit Characteristics (1)

Monthly earnings in EUR Net family income Weekly hours In vocational training Self-employed Civil servant Occupation: Office Occ. with low autonomy … below medium auton. … medium autonomy … high autonomy … fits vocational degree Job prestige (Treimann, 13-78, 78: highest)

Sport (2)

No S. (3)

S-NS (4)

Sport (5)

No S. (6)

S-NS (7)

Individuals with some sports before Men Women Mean in ProMean in Prosubsample bit subsample bit Sport (8)

No S. (9)

S-NS (10)

Sport (11)

No S. (12)

Information on current occupation (coded 0 if not employed) 1883 1842 .0001* 761 653 1825 1734 2150 1953 1978 1955 2204 2066 35 36 19 17 34 35 .05 .03 .04 .02 .08 .08 .04 .07 -.46* .03 .03 .04 .04 .06 .04 .15 .08 .13 .06 .15 .10 .21 .13 .26 .13 .33* .19 .29 .13 .24 -.35* .10 .23 .27 .26 R .17 .11 R .20 .28 .20 .14 .21 .23 .11 .25 .16 .35** .15 .15 .05 .04 .20 .13 .47** .39 .36 .28 .17 .45 .37 36 33 .009* 35 30 36 35

901 2192 21 .07 .04 .05 .27 .06 .17 .28 .09 .37 38

795 1910 20 .05 .03 .04 .22 .14 .17 .19 .07 .25 34

S-NS (13)

.0001**

R

Health and smoking .25 .24 .28 .23 .26 .21 .29** .33 .34 -.16 .43 .45 .41 .37 .25* 2.9 2.5 1.8 1.5 .01 2.7 2.5 .16 .16 -.18 .12 .10 .17 .15 .24** .50 .61 -.16* .45 .40 .52 .53 General satisfaction with life (in %) Medium 39 40 38 39 37 37 34 37 High 28 29 29 25 .40* 30 26 29 28 Highest 27 25 26 29 .28 28 30 30 30 Regional information (in %) Unemployment rate 7.9 8.2 8.0 8.1 7.7 7.7 .03 8.2 7.8 Southern states 37 31 .28* 37 39 37 35 .18 32 38 Central states 17 17 14 16 15 18 14 14 Town > 500.000 33 36 31 35 -.23 28 32 30 30 100.000-500.00 8 12 -.38* 10 12 9 8 10 11 20.000-100.000 7 6 R 7 6 R 6 7 8 6 R 5.000-20.000 9 9 11 10 10 9 9 10 < 5.000 8 8 8 9 6 7 7 9 City centers 26 29 28 29 25 28 28 27 # of observations 457 822 1279 452 1145 1597 1876 350 2636 1228 326 1554 Note: The dependent variable in the probit is a dummy variable which is one if the individual participated in some sports activities in the relevant year. All independent variables are measured prior to the dependent variable. Coefficients are only reported when significant at the 10% level. If they are significant at the 5% (1%) level, they are marked by one (two) '*'. The probit includes a constant term. It also includes a control for the sports intensity (1-3) for the 'sports-before' sample. Some variables in the table are not included in the estimation. They are either marked by R (reference category), or '-' (variable deleted for other reasons like too small cell size). Some groups of explanatory variables do not add up to 100% because of variables omitted, or because of missing values. Satisf. with health high Satisf. w. health highest Visits of medical doctor Chronical illness Never smoked

.26 .27 1.5 .11 .43

.27 .25 1.6 .11 .34

The educational information known from other studies to play an important role is described by several variables related to formal schooling as well as vocational education. The results of Table 2.2 support the general finding that sports activities increase with education.


For those who worked in the year before they started their sports participation, variables are included to characterize the firm (size, sector) and the job (duration, earnings, hours, required vocational education, sector, occupation, prestige, autonomy, position; only selected variables are shown in the table). For those not working, their current status is known as well (unemployed, out of labour force, pensioner, in educational system, etc.). Furthermore, there is information on job histories, such as the total duration in full-time or part-time employment. The results for these durations are, however, difficult to interpret as they are by definition positively correlated with age. The most clear-cut association is for women who are not working and not unemployed: they are more likely to be observed as being active. There is also a clear positive relation to earnings, although the respective coefficients are not significant in all four probit specifications. This is most likely due to the fact that several other occupational variables as well as educational variables pick up those effects, as they are highly correlated with earnings. By and large, the different occupational variables confirm the general finding that individuals in 'better' jobs (jobs that require more responsibility and a higher level of training, and that pay more) as well as individuals with jobs in the public sector are more likely to be observed to be active in sports. The association with firm size appears to be somewhat ambiguous.

Health is measured by several variables. There are some 'objective' health measures, like the number of visits to a medical doctor in the last three months, days hospitalized (not presented), degree of disability (not presented), or whether the individual suffers from any chronic disease. Furthermore, there is a measure of self-assessed satisfaction with own health using an 11-point scale. For the women in the sports sample, subjective health status is positively associated with sports participation.
No such relation can be found in the other samples or for the other variables. Smoking is known to be a potentially important factor in sports participation (e.g. Farrel and Shields, 2002). However, it is observed only from 1998 onwards in the GSOEP, which impedes its use as a control variable, because it has already been influenced by earlier sports participation. However, in 1999, 2001, and 2002, individuals are also asked whether they 'never smoked'. This variable is included in the probit estimation.14 The results are somewhat surprising in the sense that no or only small differences appear for the sports sample, whereas in the no-sports sample non-smoking men are more likely to be observed in active sports, while non-smoking women are less likely to participate in sports.

Variables measuring worries (not presented), general life satisfaction and height (not presented) are considered as well to capture further individual traits that may influence the decision to participate. However, no systematic differences appear. Unfortunately, weight is measured only much later so that a pre-decision BMI could not be calculated. The same is true for alcohol and tobacco consumption.

To account for regional differences, the information on the German federal states (used at an aggregated level) and the types of urbanization is supplemented by regional indicators coming from the special regional files of the GSOEP allowing an extensive socio-economic characterization of the region the individual is living in. However, it is hard to detect any systematic patterns, perhaps with the exception that living in a large city seems to be negatively associated with sports participation. Even the latter finding is hardly visible in the sports sample.

To conclude, these results confirm the findings that exist in the literature so far (see Section 2.1) with the exception of the relation to smoking and being divorced, and the effects of older kids on men's sports activities (note that compared to 2003, between 1985 and 1990, men are much more likely to participate in sports than women). Interestingly, Section 4 will show that sports participation has an effect on being divorced, so that one might conjecture that those differences are related to the endogeneity problem most of the other studies are confronted with. Of course, it is impossible to exclude the possibility that the differences for the smoking variable are due to the already mentioned measurement problem of this study.

14

This variable relates to the past as well as to the present and the future and is thus less influenced by current sports participation. To avoid ignoring this important selection variable it is included despite the endogeneity problem. However, the internet appendix contains the complete set of results without this variable. These results show clearly that none of the conclusions depend on the inclusion of this variable.

3 The effect of sports participation on labour market and other outcomes: Identification and estimation

3.1 Identification

The previous section showed that participation in sports activities is not a random event. Based on this analysis, comparing the earnings of sports participants and non-participants is expected to result in a positive earnings effect for the sports participants simply because better educated individuals are more likely to participate in sports. Therefore, such crude comparisons lead to biased estimates of the 'causal effects' of sports participation, and these biases have to be corrected for. Such a correction for the presence of confounding variables influencing the selection into sports can be done by various econometric methods, some of which are discussed below, if these confounding variables are not affected by sports participation (i.e. if they are exogenous). The 'confounding' variables that are needed to remove selection bias, i.e. the required control variables, are those variables that are jointly related to the outcomes of interest (e.g. earnings 15 years later) and the particular sports participation 'decision' (e.g. Rubin, 1974, 1979). Again, the previous section showed how the emphasis on particular subsamples with the same sports status prior to the participation solves the endogeneity problem.15

Therefore, the next step consists in defining which variables should be considered as 'confounding'. The empirical literature discussed above points to a couple of variables that are almost all covered in our data base, mostly in a more detailed way than in those studies. The main missing variables are some life-style related variables measuring eating and drinking habits. These variables are measured in the GSOEP, but only in recent years. Thus, they cannot be used: because of their late measurement they are more like 'outcomes' than control variables, i.e. they are not exogenous. The literature (e.g. Farrel and Shields, 2002) suggests that drinking may in fact be related to higher sports participation and could also be negatively related to earnings. Thus, a downward bias appears to be likely. On the other hand, excess weight is related to lower sports participation and worse labour market outcomes, which leads to an upward bias.

There are several reasons why these biases might not be too severe: First, the missing life-style variables are correlated with other socio-economic variables that are controlled for, in particular labour market histories, earnings, type of occupation, and education, among others. Second, the biases plausibly go in different directions so some of them are likely to cancel. Third, it is reassuring that no significant effect of sports participation on them could be detected when treating weight, drinking and smoking formally as outcome variables in the estimation process.

An alternative route to analyze the selection problem is to consider sports participation from a rational choice perspective comparing expected costs and benefits from this activity (see for example Cawley, 2004, who used this approach to analyze eating and drinking behaviour). The expected costs consist of direct monetary costs (e.g. buying equipment, fees for fitness studio, travel to sports facilities), as well as foregone earnings, foregone home production, and foregone utility from other leisure activities (assuming that sports activity is a substitute for work or leisure, or both). Some types of (unpleasant) sports activities may also be associated with a direct disutility. The gains from leisure sports come as direct utility from sports activities (fun, relaxing after an exhausting working day, etc.), as well as from the role of sports as an investment in so-called health capital. The latter can be seen as a part of an individual's human capital as it enhances productivity and the value of leisure (see Grossmann, 1972).

15

A remaining problem could be that people anticipate that they will start sports activities next year and change behaviour already today in anticipation of that. However, such long term planning for a leisure activity seems to be unlikely.
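The selection problem described at the beginning of this subsection can be illustrated with a small simulation. The following Python sketch is purely illustrative and not part of the empirical analysis: the data generating process, all parameter values (a true effect of 100, an education premium of 500, participation probabilities of 0.7 and 0.3) and the function names are assumptions made for this example. It shows that a crude comparison of mean earnings overstates the effect when education raises both sports participation and earnings, whereas a comparison within education groups comes close to the true effect.

```python
import random
from statistics import mean


def simulate(n=20000, true_effect=100.0, seed=0):
    """Simulate (education, sports, earnings) with education as a confounder."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        educ = 1 if rng.random() < 0.5 else 0
        # Assumption: better educated individuals are more often active.
        p_sport = 0.7 if educ else 0.3
        sport = 1 if rng.random() < p_sport else 0
        # Assumption: education raises earnings by 500, sports by true_effect.
        earnings = 2000 + 500 * educ + true_effect * sport + rng.gauss(0, 300)
        rows.append((educ, sport, earnings))
    return rows


def naive_difference(rows):
    """Crude comparison of mean earnings of active and inactive individuals."""
    active = [y for _, d, y in rows if d == 1]
    inactive = [y for _, d, y in rows if d == 0]
    return mean(active) - mean(inactive)


def stratified_difference(rows):
    """Compare within education groups, weighted by each group's share of the active."""
    n_active = sum(d for _, d, _ in rows)
    effect = 0.0
    for level in (0, 1):
        active = [y for e, d, y in rows if e == level and d == 1]
        inactive = [y for e, d, y in rows if e == level and d == 0]
        effect += len(active) / n_active * (mean(active) - mean(inactive))
    return effect


rows = simulate()
print(round(naive_difference(rows)), round(stratified_difference(rows)))
```

In this hypothetical design the crude difference is expected to be about 300 (the true effect of 100 plus 500 x (0.7 - 0.3)), while the stratified comparison, which weights the education groups by their share among the active, is centred on 100.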


What do these considerations imply with respect to the variables that are needed as controls so that the empirical analysis has a causal meaning? In fact, they are the same variables as already discussed. For example, direct costs depend on location since sports participation is typically more expensive when living in inner cities than in suburbs or small villages. Furthermore, opportunity costs depend on the value of the alternatives to sports, which are work, household production, and leisure. The value of these alternatives is in turn highly correlated with (and determined by) the socio-demographic variables discussed above (type of occupation, education, household composition, health, age, gender, etc.). Furthermore, their value should be related to the conditions in the local labour market. The concept of health capital appears to suggest that individuals with higher returns (or lower investment costs) should invest more in such capital. Again, it could be conjectured that the socio-demographic variables that determine the returns from work will also be related to the stock of health capital. However, this remains somewhat speculative as there is not much empirical research on how to measure the returns from health capital. Furthermore, the individual discount factors should play some role since individuals who value the future relatively more should invest more in their health capital. However, such preferences are notoriously hard to measure in surveys.

It follows from these considerations that using the homogeneous initial sample approach allows conditioning on most of the relevant exogenous variables. Thus, it will most likely remove (most of) the selection bias and does not require further restrictive modelling assumptions.

An alternative to the proposed approach is the use of fixed effects models.
They appear to be attractive at first sight because they allow for some unobserved heterogeneity related to the selection process.16 However, these models rely on assumptions that are unattractive in this context. First, generally only the linear version of the fixed effects models identifies the required effects. As many of the dependent variables are binary, this is clearly unattractive. Second, the assumption of strict exogeneity of the independent variables (i.e. the assumption that last year's sports participation does not influence next year's measurement of the independent variables) is very unlikely to hold. Third, the assumption that the unobservable variable has a constant effect over 15 years would be very hard to justify in this context.

A further alternative to identify the effects would be to use an instrumental variable approach (e.g. Imbens and Angrist, 1994). Such an approach requires an exogenous variable that influences the outcomes under consideration only by influencing sports participation (any direct effect is ruled out). In the present context such a variable does not appear to be available.

16

The comparison here is made for fixed effects models, as random effects models require strictly stronger assumptions than the methods proposed below, because random effects models do not allow for any unobservables correlated with the regressors (see Lechner, Lollivier, and Magnac, 2008).

3.2 Estimation methods

As explained above, the identification and estimation problem can be tackled using an approach that exploits the panel structure of the data by performing the analysis in subsamples defined by the sports activities in the previous year and then analyzing the effects of the movements in or out of sports. In principle, once the data have been reconfigured to correspond to such a set-up, a linear or non-linear regression analysis could be used with future labour market and other outcomes as dependent variables and sports participation as well as all the other control variables as independent variables (measured in the last period when all individuals are in the same state). Such methods are well known and have been heavily used, but they suffer from potential biases when the implied functional form assumptions are not satisfied. This is particularly worrying as these assumptions in turn imply that the effects have to be homogeneous in the population or specific subpopulation (see for example Heckman, Smith, and LaLonde, 1999). Such assumptions are clearly not attractive in this context. Recently, a flexible semiparametric method that circumvents these problems, the method of matching, has become very popular in labour economics (see Imbens, 2004, for a survey). It is briefly described and applied below.


Before going more into details, it is worth pointing out that all possible parametric, semi- and nonparametric estimators of (causal) effects are built on the principle that for every comparison of two states (here sports activity versus no sports activity), we need comparison observations from the other state with the same distribution of relevant characteristics. As discussed above, characteristics are relevant if they jointly influence selection and outcomes. Here, an adjusted propensity score matching estimator is used to produce such comparisons (see Rosenbaum and Rubin, 1983, for the basic ideas). A clear advantage of these estimators is that they are semiparametric and that they allow for arbitrary individual effect heterogeneity. To obtain estimates of the conditional choice probabilities (the so-called propensity scores) used in the selection correction mechanism to form the comparison groups, the probit models presented in the previous section are applied. The matching procedure actually used incorporates the improvements suggested by Lechner, Miquel, and Wunsch (2005). These improvements tackle two issues: (i) To allow for higher precision when many 'good' comparison observations are available, they incorporate the idea of calliper or radius matching (e.g. Dehejia and Wahba, 2002) into the standard algorithm used for example by Gerfin and Lechner (2002). (ii) Furthermore, matching quality is increased by exploiting the fact that appropriately weighted regressions that use the sampling weights from matching have the so-called double robustness property. This property implies that the estimator remains consistent if either the matching step is based on a correctly specified selection model, or the regression model is correctly specified (e.g. Rubin, 1979; Joffe, Ten Have, Feldman, and Kimmel, 2004). 
Moreover, this procedure should reduce small sample bias as well as asymptotic bias of matching estimators (see Abadie and Imbens, 2006a) and thus increase robustness of the estimator. The actual matching protocol is shown in Table B.1 in Appendix B. See Lechner, Miquel, and Wunsch (2005) for more detailed information on this estimator.
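To fix ideas, the basic logic of radius matching on the propensity score can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the matching protocol of Table B.1: it takes the propensity scores as given (e.g. predicted from a probit), uses a fixed radius chosen arbitrarily for the example, weights all comparison observations within the radius equally, falls back to the single nearest neighbour when the radius is empty, and omits the weighted regression adjustment. The function name and the example data are hypothetical.

```python
def radius_matching_atet(treated, controls, radius=0.05):
    """Average treatment effect on the treated by simple radius matching.

    treated, controls: lists of (propensity_score, outcome) pairs.
    radius: maximum distance in the propensity score (an assumed value).
    """
    effects = []
    for p_t, y_t in treated:
        # All comparison observations whose score lies within the radius.
        matches = [y_c for p_c, y_c in controls if abs(p_c - p_t) <= radius]
        if not matches:
            # No observation inside the radius: use the nearest neighbour.
            nearest = min(controls, key=lambda c: abs(c[0] - p_t))
            matches = [nearest[1]]
        # Estimated counterfactual outcome: mean outcome of the matches.
        effects.append(y_t - sum(matches) / len(matches))
    return sum(effects) / len(effects)


# Hypothetical data: (propensity score, monthly earnings in EUR).
active = [(0.60, 2500.0), (0.30, 2100.0)]
inactive = [(0.58, 2300.0), (0.62, 2400.0), (0.31, 2000.0), (0.90, 5000.0)]
print(radius_matching_atet(active, inactive))
```

In the example, the first active individual is compared with the mean of the two inactive individuals with scores 0.58 and 0.62, the second with the one at 0.31; the inactive individual with score 0.90 is not used because nobody among the active has a comparable participation probability.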


An open issue is how to draw inference for this rather involved estimator that is a combination of weighted radius matching and weighted regression. Although Abadie and Imbens (2006b) show that the 'standard' matching estimator is not smooth enough and, therefore, bootstrap based inference is not valid, the version of the estimator implemented here is by construction much smoother than the estimator studied by Abadie and Imbens (2006b). Therefore, it is conjectured that the bootstrap is valid. It is implemented following MacKinnon (2006) by bootstrapping the p-values of the t-statistic directly based on symmetric confidence intervals (rejection regions). The p-values for the non-symmetric confidence intervals are typically smaller and are reported in the internet appendix. Bootstrapping the p-values directly, as compared to bootstrapping the distribution of the effects or the standard errors, has advantages because the t-statistics on which the p-values are based are asymptotically pivotal whereas the standard errors or the coefficient estimates are not.
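What bootstrapping the p-value of a t-statistic means in practice can be made concrete with a minimal Python sketch. It is an illustration under simplifying assumptions, not the procedure implemented for the matching estimator: the statistic is a simple difference in means of two independent samples, the resampling scheme is a plain nonparametric bootstrap within each group, and all function names are hypothetical. The symmetric p-value is the share of bootstrap t-statistics, recentred at the original estimate, that exceed the original t-statistic in absolute value; the equal-tail version doubles the smaller of the two one-sided shares.

```python
import math
import random
from statistics import mean, variance


def t_statistic(x, y, null_diff=0.0):
    """t-statistic of the difference in means for H0: difference = null_diff."""
    diff = mean(x) - mean(y)
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (diff - null_diff) / se, diff


def bootstrap_p_values(x, y, reps=999, seed=0):
    """Return (symmetric, equal-tail) bootstrap p-values for H0: no difference."""
    rng = random.Random(seed)
    t_hat, diff_hat = t_statistic(x, y)
    t_star = []
    for _ in range(reps):
        xb = [rng.choice(x) for _ in x]
        yb = [rng.choice(y) for _ in y]
        # Recentre at the original estimate so that H0 holds in the bootstrap world.
        t_b, _ = t_statistic(xb, yb, null_diff=diff_hat)
        t_star.append(t_b)
    symmetric = sum(abs(t) > abs(t_hat) for t in t_star) / reps
    upper = sum(t > t_hat for t in t_star) / reps
    equal_tail = 2 * min(upper, 1 - upper)
    return symmetric, equal_tail


# Hypothetical example data with a large true difference in means.
rng = random.Random(1)
group_a = [rng.gauss(2.0, 1.0) for _ in range(50)]
group_b = [rng.gauss(0.0, 1.0) for _ in range(50)]
print(bootstrap_p_values(group_a, group_b, reps=199, seed=2))
```

The standard error is recomputed in every bootstrap sample, so that it is the studentized statistic, not the raw difference, whose distribution is approximated.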

4 Results

4.1 Introductory remarks

In this section the effects of sports participation on various outcome measures are presented. The considered outcomes relate to success in the labour market, like earnings, wages, and employment status, as well as to various objective and subjective health measures, further socio-demographic outcomes and some direct measures of satisfaction with life in general. For each group of outcome variables, only a few specific variables are presented in the following tables for the sake of brevity. For (almost) all variables the effects are estimated annually over the 15 years after the change in sports participation status. Again for the sake of brevity, results are presented only for every second year (starting with year 3). The mean effects for the no-sports sample and the sports sample are shown separately in Tables 4.1 and 4.2. Finally, the effects presented in the tables are those for the group of individuals remaining or becoming active (average treatment effect on the treated). The results for the groups of men and women becoming or remaining inactive are mentioned when relevant and are available on request from the author.

As before, the four decision years (1985, 1986, 1988, and 1990) are pooled to increase precision. To acknowledge the considerable sex specific heterogeneity in the selection process and to uncover interesting heterogeneity, sex specific results are reported. Inference is based on the usually more conservative symmetric bootstrapped p-values based on 999 bootstrap replications as explained above. In Tables 4.1 and 4.2 only significance levels are indicated (+ for 10%, * for 5%, and ** for 1%), but for the no-sports sample the internet appendix contains many more details, like descriptive statistics for the levels of the various outcome variables, bootstrapped standard errors, and the p-values for the asymmetric bootstrap. The detailed results for the sports sample are available on request.

Before discussing the effects of sports participation on various outcome measures in detail, it is useful to precisely define the 'treatment', i.e. sports participation. It is the contrast between the no-sports state and any sports activity in one of the four periods. This contrast is conditional on the pre-decision state that is measured either one year (1985, 1986) or two years earlier (1988, 1990). Then, the effect of this contrast is observed over the following 15 years. Over this period, there is no guarantee that the different sports statuses of the two groups remain constant.17 Therefore, the first row in the upper and lower parts of Tables 4.1 and 4.2 shows the impact of the particular change in sports participation status considered on future sports participation.18 Although the differences between the groups shrink with respect to their sports activities, persistent differences remain that are significant for all 15 years.
Interestingly, in the no-sports sample the values for men converge because the average activity level of the 'non-active' men increases over time whereas those classified as 'active' do not change their average activity level. For women, both groups become more active in a way that the difference remains stable after some initial period of convergence. The trends in the sports sample are different. While women of the active group keep their average activity level fairly constant, women classified as non-active (who were active before, otherwise they would not be in the sports sample) increase their average activity level. Non-active men and active men decrease their activity levels somewhat so that the estimated differences between active and non-active men tend to be fairly constant. These trends may be part of possible explanations for potential differences between the effects obtained in the four different samples below.

17

Keeping the sports status constant over this long period would raise the endogeneity problems discussed before because time varying covariates would have to be included to correct for dynamic selection problems. Flexible selection corrections in such a dynamic framework would require dynamic treatment models of the sort discussed by Robins (1986) or Lechner (2008). However, such models are too demanding with respect to sample size to be applicable in this context.

4.2 Estimation results

4.2.1 Labour market effects of sports participation

The upper parts of Tables 4.1 and 4.2 contain three measures for the changing returns of work due to sports participation. Monthly earnings are measured as gross earnings in the month before the interview. Average monthly earnings are the monthly earnings summed up year by year until the year in question and averaged over the valid interviews. They are constructed as a measure of the total effect over time. They have the additional advantage that these averages are much smoother than the yearly snapshots. Thus, they exhibit a smaller variance so that smaller effects could be detected with given precision. Wages are computed by dividing monthly gross earnings by monthly hours (weekly hours x 4.3). All these variables are coded as zero when the individual is not working. Furthermore, they are de- or inflated to Euros of the year 2000 to facilitate comparisons over time and entry cohorts.

Some non-monetary labour market outcomes are considered as well: Employment intensity is directly measured by weekly hours worked (0 if not working), and by indicator variables for working full time or part time. Of course, there are substantial differences in the levels of these variables for men and women, as about 90% of the men work full time and only very few men work part time, whereas only about 35% of the women work full time and a similar share of women work part time.

The earnings results for women in both samples suggest that there is a monthly gross earnings gain of about 100 EUR (or more) that appears to be fairly persistent over time. It is more pronounced in the sports sample than in the no-sports sample. The fact that sports activities increase employment levels to some extent as well suggests that part of this increase comes from additional employment, but most likely not all of it (there is some evidence of increasing wages, which is however significant only for the sports sample).

The corresponding effects for men are less stable as they increase to a level of above 200 EUR and then fall drastically towards the end of the 15 year follow-up period. This is partly due to the fact that sports participation not only increases earnings, but also shifts men from full time employment towards non-employment as well as part time employment when the sample members age. Since there is no employment reaction before, it is not surprising that there is also evidence of increased wages by about 1 EUR per hour in that period.

In conclusion, the results for the summary measure of average earnings suggest that after 15 years there is a monthly average earnings gain of about 100 EUR (averaging the results for the sports and no-sports sample) which adds up to an overall gain of close to 20.000 EUR over this period due to increased sports activity for men as well as for women. Comparing those results to the results obtained for the groups of individuals deciding to be non-active (average treatment effect for the non-treated) reveals similar earnings effects, but without the decline at the end observed for men. Thus, these results suggest that this group would have benefited substantially had they chosen to participate in sports instead.

18

The ordinal coding of the sports variable is used directly (on the 4 point scale with 4 meaning 'no sports'). Using dummy variables for the different categories instead gives similar results.

4.2.2 Health effects of sports activities

Individual health is measured by objective as well as subjective measures. Objective measures include days spent in a hospital in the last year, degree of disability (a reduction in the capacity to work on a scale between 0 and 100%; not presented), the number of visits to a medical doctor in the last three months prior to the interview, as well as whether somebody dies (not presented19). These measures are supplemented by subjective health information in which people either state their degree of health on a five point scale from very good to very bad, or indicate their general satisfaction with their health status on an 11-point scale.20 Overall, the results for men and women point fairly unsystematically in different directions that change over time. It appears that either there is no overall systematic effect of sports on health, or such an effect is too small to be detected with the size of the samples at hand. It is however conceivable as well that there is a positive effect of sports participation on health, but that the positive effect of sports on earnings counteracts that effect. For example, people who are more productive due to their sports activities might work harder or in a more stressful and better paid job so that no overall health effect is visible. However, this remains speculative. For the group of non-active individuals there is some more systematic indication that sports participation would indeed lead to better health in the long run, in particular when using the subjective measures.

19

The effect of sports on death has been estimated on an unbalanced sample.

20

It is rightly not considered good econometric practice to use ordinal scales directly as outcome measures. However, since using (many) indicators for the specific values of the scales qualitatively leads to the same results as when using the scales directly, the effects on the ordinal scales are good summary measures in this case.


Table 4.1: Effects of sports participation on various outcome measures: No-sports sample

Mean effects x years after starting sports activities

Outcome variable                                          Sex    3       5       7       9       11      13      15

Sports activities
Sports activities (scale 1-4; 4: none)                    Men    -.66**  -.60**  -.46**  -.40**  -.30**  -.28**  -.26*
Labour market
Monthly gross earnings                                    Men    33      143*    107     250+    260*    115     92
Average monthly gross earnings                            Men    46      65      79*     104*    124*    129*    125*
Gross wages per hour                                      Men    -.22    .69+    .56     .79     1.3*    1.1+    .11
Weekly working hours                                      Men    -.17    -.02    -.48    1.1     2.1+    .27     -1.7
Full time employed (in %)                                 Men    -1      -2      -1      0       2       -2      -5+
Part time employed (in %)                                 Men    0       2*      1       0       1       0       2*
Health
Days at hospital in last year                             Men    -.26    .20     -.86    -.10    .59     -.50    -.29
Doctoral visits in last 3 months                          Men    .15     -.55    .46     -.16    -.02    .02     -.47+
State of health (scale 1-5; 5: very bad)                  Men    n.a.    n.a.    -.04    -.04    -.03    -.03    -.09
Satisfied with health (scale 0-10; 10: high)              Men    .26+    .12     -.12    .02     .04     .16     .18
Marital status
Married (in %)                                            Men    3       6*      6*      6*      6*      8**     8*
Divorced (in %)                                           Men    -4      -4+     -3      -3+     -4      -5*     -4+
Worries and general life satisfaction
Considerable worries about the economic situation (in %)  Men    [-]3    2       0       -1      -6+     2       -3
Satisfied with life (scale 0-10; 10: high)                Men    .10     .01     -.03    -.01    .15     -.08    -.07

Future sports activities
Sports activities (scale 1-4; 4: none)                    Women  -.54**  -.54**  -.45**  -.45**  -.47**  -.44**  -.55**
Labour market
Monthly gross earnings                                    Women  59      103*    102+    59      64      74      92
Average monthly gross earnings                            Women  -3      15      31      36      38      42      47
Gross wages per hour                                      Women  .19     .52     .51     .44     -.07    .74     .14
Weekly working hours                                      Women  1.8+    2.4*    2.2+    .53     .35     1.4     .43
Full time employed (in %)                                 Women  6+      7*      6+      4       0       3       2
Part time employed (in %)                                 Women  0       -1      0       -1      1       -2      -1
Health
Days at hospital in last year                             Women  -.71    -.42    -.30    .48     .93*    -.65    -.31
Doctoral visits in last 3 months                          Women  -.30    -.07    -.12    -.14    -.06    .04     -.18
State of health (scale 1-5; 5: very bad)                  Women  n.a.    n.a.    -.08    .01     .03     .02     -.05
Satisfied with health (scale 0-10; 10: high)              Women  .10     .00     .29+    .06     -.09    .18     .25+
Marital status
Married (in %)                                            Women  0       -3      -2      -1      0       1       3
Divorced (in %)                                           Women  1       1       2       1       0       1       0
Worries and general life satisfaction
Considerable worries about the economic situation (in %)  Women  -6*     -3      -9**    -7*     -3      -1      -1
Satisfied with life (scale 0-10; 10: high)                Women  -.07    -.07    .13     .22*    -.01    .23*    .07

Note: One (two) '*' denotes significance at the 5% (1%) level based on symmetric p-values (bootstrapped, see Section 3.2); + denotes significance at the 10% level. Bootstrap based on 999 replications. Monthly average earnings are accumulated over valid yearly interviews and divided by the number of valid interviews. All monetary information is in EURO, inflated or deflated to the year 2000 by using the (West) German consumer price index. All monetary and job related information is coded as '0' if the individual does not work. They are all based on the imputed version of the gross earnings provided in the GSOEP.
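The construction of the monetary outcome variables described in the note to Table 4.1 can be sketched as follows. This Python fragment is only an illustration of the stated definitions (wages as monthly gross earnings divided by weekly hours x 4.3, zeros for individuals not working, running averages over valid interviews, conversion to prices of the year 2000). The function names, the price index values and the example earnings are hypothetical, and the treatment of missing interviews is an assumption.

```python
WEEKS_PER_MONTH = 4.3  # conversion factor from weekly to monthly hours


def hourly_wage(monthly_gross, weekly_hours):
    """Gross hourly wage; coded as 0 if the individual does not work."""
    if monthly_gross <= 0 or weekly_hours <= 0:
        return 0.0
    return monthly_gross / (weekly_hours * WEEKS_PER_MONTH)


def to_prices_of_2000(nominal, cpi_year, cpi_2000=100.0):
    """Inflate or deflate a nominal amount to prices of the year 2000.

    cpi_year and cpi_2000 are consumer price index values; the numbers used
    in the example below are hypothetical, not the actual German index.
    """
    return nominal * cpi_2000 / cpi_year


def average_monthly_earnings(monthly_by_year):
    """Running average of monthly earnings over the valid interviews so far.

    monthly_by_year: earnings in interview order; None marks a year without
    a valid interview (assumption), 0 marks a year without employment.
    """
    total, valid, averages = 0.0, 0, []
    for value in monthly_by_year:
        if value is not None:
            total += value
            valid += 1
        averages.append(total / valid if valid else None)
    return averages


print(hourly_wage(2150.0, 40.0))
print(to_prices_of_2000(1100.0, cpi_year=110.0))
print(average_monthly_earnings([2000.0, None, 2600.0, 0.0]))
```

Because years without employment enter with zero earnings, the running average combines effects on wages and on employment, which is why it serves as a summary measure of the total effect over time.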


Table 4.2: Effects of sports participation on various outcome measures: Sports sample

Mean effects x years after continuing sports activities

Outcome variable                                          Sex    3       5       7       9       11      13      15

Sports activities
Sports activities (scale 1-4; 4: none)                    Men    -.94**  -.70**  -.60**  -.54**  -.62**  -.55**  -.50**
Labour market
Monthly earnings                                          Men    85      88      35      262**   73      79      -65
Average monthly earnings                                  Men    80**    83*     71      98+     96+     94+     83
Wages per hour (0 if not employed)                        Men    1.0*    1.1*    .60     1.6**   .38     -.88    -.45
Weekly working hours (0 if not employed)                  Men    .70     -.14    -.69    1.7     .56     .69     -.33
Full time employed (in %)                                 Men    -1      0       -1      3       -2      -2      -5*
Part time employed (in %)                                 Men    1       2**     0       -1      0       2*      2**
Health
Days at hospital in last year                             Men    -.15    .26     .44**   .23     .04     .65+    -.06
Doctoral visits in last 3 months                          Men    .36     .15     .67*    .47**   .00     .24     .20
State of health (scale 1-5; 5: very bad)                  Men    n.a.    n.a.    -.06    -.03    -.11    -.08    -.06
Satisfied with health (scale 0-10; 10: high)              Men    -.14    .11     -.10    -.04    .20     .21     .15
Marital status
Married (in %)                                            Men    -2      -5+     -4      -4      0       2       8*
Divorced (in %)                                           Men    2       5**     3+      4*      1       -1      -6
Worries and general life satisfaction
Considerable worries about the economic situation (in %)  Men    -4      -1      -4      -6+     -5      0       -4
Satisfied with life (scale 0-10; 10: high)                Men    -.01    -.17    -.03    -.04    .20     .13     .29*

Future sports activities
Sports activities (scale 1-4; 4: none)                    Women  -.79**  -.61**  -.66**  -.61**  -.57**  -.47**  -.54**
Labour market
Monthly earnings                                          Women  55      111     174*    166+    204*    195*    155+
Average monthly earnings                                  Women  64*     75*     92*     104*    114*    123**   124*
Wages per hour (0 if not employed)                        Women  .37     .33     .75     1.0     1.2*    1.2**   .46
Weekly working hours (0 if not employed)                  Women  .75     2.5+    1.5     1.3     1.2     .14     .85
Full time employed (in %)                                 Women  3       4       1       0       -1      0       2
Part time employed (in %)                                 Women  2       -5      0       1       6+      1       1
Health
Days at hospital in last year                             Women  -.64    -.24    -.67    .43+    -.58    -.46    .21
Doctoral visits in last 3 months                          Women  -.19    .15     -.52    -.20    -.25    -.15    -.36
State of health (scale 1-5; 5: very bad)                  Women  n.a.    n.a.    -.06    -.03    .06     -.01    .03
Satisfied with health (scale 0-10; 10: high)              Women  .11     .07     .07     .12     -.03    -.15    -.13
Marital status
Married (in %)                                            Women  1       -1      -1      -3      1       2       3
Divorced (in %)                                           Women  -2      2       1       3       1       0       0
Worries and general life satisfaction
Considerable worries about the economic situation (in %)  Women  -1      -3      0       -3      -1      3       0
Satisfied with life (scale 0-10; 10: high)                Women  .05     -.05    -.07    .07     .01     -.02    -.02

Note: See note below Table 4.1.

4.2.3 Effects of sports participation on marital status, worries, and life satisfaction

Several variables are used to indicate marital status. The table presents the results for being married and being divorced (and not remarried). The other indicators for marital status (single, widowed, etc.) as well as variables for household composition, in particular regarding children in the household, are not presented in the tables as they do not contain much additional information. The results in both tables show the same clear picture: For women, the effects are small and insignificant throughout. For men, however, sports activities significantly increase the share of men being married (and living together) and decrease the share of men being divorced or being married but separated. Note that for the sports sample, the latter effect comes with some delay, after an initial increase in divorces and separations and an initial corresponding decrease in being married and not separated.21 Further results indicating positive long-run effects with respect to the number of older children living in the household point in the same direction. Very similar effects appear for the group of non-sports participants. So far, it is not obvious why these effects appear so clearly for men but not for women.

Finally, a number of indicators are considered that measure subjective quality of life and subjective outlooks. The tables show two such measures, namely an indicator of whether an individual is very worried about the economic situation in general, and a measure of general satisfaction with life (measured on a scale from 0 to 10). There is some scattered evidence in Tables 4.1 and 4.2 that sports participation reduces worries and increases life satisfaction somewhat for sports participants, and that it would do the same for actual non-participants.

21 Although it is easy to speculate that this effect might be due to remarrying, the data are silent with respect to the credibility of such speculation. In any case, it would not be clear why such an explanation would be relevant for the sports sample only.

4.3 Sensitivity checks

Several checks are performed to better understand the sensitivity of the results with respect to arbitrary specification and variable choices, as well as to discover further important heterogeneity.

The first set of checks concerns socio-demographic variables influencing outcomes and selection that do not come as a surprise but can be planned or anticipated. In that case, the individual takes into account events that materialize in these variables one or two years ahead. If this is true, these future values should be included in the probits or sample selection rules, as they indicate current or past decisions that have not yet materialized. Here, children and being married (two years ahead) are included in the probits. Furthermore, individuals with days in hospital in the current and the following year (year 1) are removed from the sample. However, the results are robust with respect to both of these changes. In a similar attempt, several ways to specify the various health variables (different functional forms, different sets of variables) are explored, but the final results are not sensitive to different (reasonable) ways to measure health. The health variables are also used to select the sample in different ways, but again no sensitivity could be detected.

The second set of checks concerns the definition of the sports participation variable. The following checks are performed: (i) comparing the two most extreme categories (1 & 2) to the no-sports category (4); (ii) comparing (1 & 2) to (3 & 4); and (iii) comparing (2 & 3) with (4), on the consideration that too much sport may not be good either. These changes did not alter the results much, although it should be noted that at least the first, sharper definition of the 'treatment' as well as definition (iii) reduce the number of observations and thus lead to noisier estimates.

In another check, estimation was conducted without conditioning on the previous sports status. This results in a much larger sample (3780 observations) and thus more precise estimates. A couple of health variables become significant in the expected direction. Nevertheless, this specification remains dubious because of the endogeneity problem discussed above.

To understand the robustness with respect to enforcing the balanced panel structure (required for meaningful interpretation of many of the outcome variables), the effect of sports participation on being in the balanced part of the sample has been estimated in an unbalanced panel design. It turned out that there is no such effect, and thus it appears innocuous in this particular application to require a balanced panel over such a long horizon.
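The outcome variable of this last check, membership in the balanced part of the sample, can be illustrated with a minimal sketch. The function name, the person identifiers, and the 1985 start year are hypothetical; the 18-year window is used only as an example length.

```python
def in_balanced_panel(observed_years, first_year, n_years):
    """Return True if the person is observed in every year of the required window."""
    required = range(first_year, first_year + n_years)
    return all(year in observed_years for year in required)


# Hypothetical person identifiers mapped to the years in which they were interviewed.
panel = {
    "p1": set(range(1985, 2003)),           # observed in all 18 years
    "p2": set(range(1985, 2003)) - {1994},  # one interview missing
}

# Indicator that would serve as the outcome in the unbalanced panel design.
balanced = {pid: in_balanced_panel(years, 1985, 18) for pid, years in panel.items()}
```

The resulting indicator is what the effect of sports participation would be estimated on; the estimation itself is not shown here.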


The age restriction may also be of concern, as some fairly young individuals are included when requiring a lower age limit of 18 years, and some of them may still be in the education system. Restricting the sample to individuals aged 24 and older instead leads to an efficiency loss due to the smaller sample, but otherwise to similar results. Increasing the upper age limit to 50 instead of 44 increases precision, but some of the individuals are then 65 at the end of the follow-up period. Therefore, more observations withdraw from the labour market, which makes it much harder to detect any earnings effects. This phenomenon was already visible in the main results, particularly for men at the end of the follow-up period (most pronounced in the sports sample).

Furthermore, the sample has been restricted to those working full-time in the relevant period to get the 'pure' earnings effects. The results point in the same direction as for the overall sample. However, the samples are reduced considerably, and the additional noise makes it very hard to obtain significant estimates.

In conclusion, the results appear to be robust to reasonable deviations from the specifications underlying the conclusions drawn from Tables 2.2, 4.1, and 4.2.
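The three alternative groupings of the four-point sports scale used in the second set of checks can be written down compactly. The following is only a sketch: the coding of categories 1 to 3 is an assumption (the tables state only that 4 means no sports), and the definition labels and the function name are hypothetical.

```python
from typing import Optional

# Assumed coding of the four-point scale; only "4: none" is stated in the tables.
WEEKLY, MONTHLY, LESS_OFTEN, NEVER = 1, 2, 3, 4

# Hypothetical labels mapped to (treated categories, comparison categories).
DEFINITIONS = {
    "frequent_vs_none": ({WEEKLY, MONTHLY}, {NEVER}),
    "frequent_vs_rare_or_none": ({WEEKLY, MONTHLY}, {LESS_OFTEN, NEVER}),
    "moderate_vs_none": ({MONTHLY, LESS_OFTEN}, {NEVER}),
}


def treatment_status(category: int, definition: str) -> Optional[bool]:
    """Return True (treated), False (comparison), or None (dropped from the sample)."""
    treated, comparison = DEFINITIONS[definition]
    if category in treated:
        return True
    if category in comparison:
        return False
    return None
```

Returning None for categories that belong to neither group makes explicit why the sharper definitions shrink the sample: observations in the omitted category are dropped rather than reassigned.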

5 Conclusion

This microeconometric study described the correlates of sports participation and analyzed the effects of sports participation on long-term labour market variables, on socio-demographic outcomes, and on health and subjective well-being outcomes for West Germany, using individual data from the German Socio-Economic Panel study (GSOEP) 1984 to 2006. The problem that people choose their level of sports activities, so that participants in sports may not be comparable to individuals not active in sports, is addressed by using very informative data, flexible semiparametric estimation methods, and an innovative use of the panel dimension of the GSOEP.


The analysis of the selection process into leisure sports activities suggests that sports activities are higher for men than for women, and much lower for non-Germans, particularly for female non-Germans. Activities increase with education, earnings, and 'job quality'. They are also higher for healthier women, whereas no such relationship appears for men. Marriage and children as well as a higher age are associated with lower sports activities.

The analysis of the effects of sports activities on outcomes revealed sizeable labour market effects. As a rough estimate, active sports increase earnings by about 1,000 to 1,400 EUR p.a. over 15 years compared to no sports activities. The results translate to rates of return of sports activities in a range of 2% to 10%, suggesting magnitudes roughly similar to those for one additional year of schooling. There are also positive effects on wages and on the labour supply of women. Effects are not at all homogeneous for men and women. For example, sports activities increase the number of men who are married and live together with their wife (reducing the number of men being divorced or separated), whereas no such effects appear for women.

One potential channel for the positive labour market effects could be health. However, although the data contain some objective and subjective health information, no clear-cut pattern emerged. This suggests at least that sports participation improves health enough to compensate for the higher injury risk of participants. With respect to relevant channels for productivity increases, one therefore has to speculate that leisure sports participation increases productivity via different routes, such as providing relaxation, increasing mental capacity, and providing social contacts, among other effects.

Future research should focus on improving data quality in longitudinal studies to better understand the channels from sports participation to labour market outcomes. Such improved data should include not only much more detailed health and lifestyle data, but also more information on the intensity and type of sports. Obviously, even if such a database were started now, it would take a long time before an empirical analysis could be based on it. Until then, it is hoped that this paper provides valuable information about the effects of leisure sports participation on labour market and socio-demographic outcomes.

References

Abadie, A., and G. W. Imbens (2006a): "Large Sample Properties of Matching Estimators for Average Treatment Effects", Econometrica, 74, 235-267.
Abadie, A., and G. W. Imbens (2006b): "On the Failure of the Bootstrap for Matching Estimators", mimeo.
Aguilera, V., and M. Bernabé (2005): "The Impact of Social Capital on the Earnings of Puerto Rican Migrants", The Sociological Quarterly, 46, 569-592.
Andreyeva, T., P. Michaud, and A. van Soest (2005): "Obesity and Health in Europeans Aged 50 and above", Working Paper, Rand, 331.
Barron, J. M., B. T. Ewing, and G. R. Waddell (2000): "The Effects of High School Athletic Participation on Education and Labor Market Outcomes", The Review of Economics and Statistics, 82, 409-421.
Becker, S., T. Klein, and S. Schneider (2006): "Sportaktivität in Deutschland im 10-Jahres Vergleich", Deutsche Zeitschrift für Sportmedizin, 57, 226-232.
Bleich, S., D. Cutler, C. Murray, and A. Adams (2007): "Why Is The Developed World Obese?", NBER Working Paper 12954.
Breuer, C. (2004): "Zur Dynamik der Sportnachfrage", Sport und Gesellschaft, 1, 50-72.
Cawley, J. (2004): "An Economic Framework for Understanding Physical Activity and Eating Behaviors", American Journal of Preventive Medicine, 27 (3S), 117-125.
Cornelissen, T., and C. Pfeifer (2007): "The Impact of Participation in Sports on Educational Attainment: New Evidence from Germany", IZA DP 3160.
Crossley, Th. F., and S. Kennedy (2002): "The reliability of self-assessed health status", Journal of Health Economics, 21, 643-658.
Dehejia, R. H., and S. Wahba (2002): "Propensity-Score-Matching Methods for Nonexperimental Causal Studies", Review of Economics and Statistics, 84, 151-161.
Deutscher Bundestag (2006): "11. Sportbericht der Bundesregierung", Drucksache des Deutschen Bundestags, 16/3750, 4.12.2006, Berlin.
Eccles, J. S., B. L. Barber, M. Stone, and J. Hunt (2003): "Extracurricular Activities and Adolescent Development", Journal of Social Issues, 59, 865-889.
Ewing, B. T. (1998): "Athletes and work", Economics Letters, 59, 113-117.
Ewing, B. T. (2007): "The Labor Market Effects of High School Athletic Participation: Evidence From Wage and Fringe Benefit Differentials", Journal of Sports Economics, 8, 255-265.
Farrell, L., and M. A. Shields (2002): "Investigating the economic and demographic determinants of sporting participation in England", Journal of the Royal Statistical Society A, 165, 335-348.


Gerfin, M., and M. Lechner (2002): "A Microeconometric Evaluation of the Swiss Active Labor Market Policy", The Economic Journal, 112, 854-893.
Gratton, C., and P. Taylor (2000): The Economics of Sport and Recreation, London: Taylor and Francis.
Grossman, M. (1972): "On the Concept of Health Capital and the Demand for Health", The Journal of Political Economy, 80, 223-255.
Imbens, G. W. (2004): "Nonparametric Estimation of Average Treatment Effects Under Exogeneity: A Review", The Review of Economics and Statistics, 86, 4-29.
Imbens, G. W., and J. D. Angrist (1994): "Identification and Estimation of Local Average Treatment Effects", Econometrica, 62, 467-475.
Joffe, M. M., T. R. Ten Have, H. I. Feldman, and S. Kimmel (2004): "Model Selection, Confounder Control, and Marginal Structural Models", The American Statistician, 58-4, 272-279.
Heckman, J. J., R. LaLonde, and J. A. Smith (1999): "The Economics and Econometrics of Active Labor Market Programs", in: O. Ashenfelter and D. Card (eds.), Handbook of Labour Economics, Vol. 3, 1865-2097, Amsterdam: North-Holland.
Krueger, A. B., and D. A. Schkade (2007): "The Reliability of Subjective Well-Being Measures", NBER Working Paper 13027.
Lakdawalla, D., and T. Philipson (2006): "Labor Supply and Weight", mimeo.
Lechner, M. (2008): "Sequential Causal Models for the Evaluation of Labor Market Programs", forthcoming in the Journal of Business & Economic Statistics.
Lechner, M., R. Miquel, and C. Wunsch (2005): "Long-Run Effects of Public Sector Sponsored Training in West Germany", CEPR Discussion Paper 4851.
Lechner, M., S. Lollivier, and T. Magnac (2008): "Parametric Binary Choice models", forthcoming in P. Sevestre and L. Matyas (eds.), The Econometrics of Panel Data, 2nd edition, chapter 7.
Long, J. E., and S. B. Caudill (2001): "The Impact of Participation in Intercollegiate Athletics on Income and Graduation", The Review of Economics and Statistics, 73, 525-531.
Lüschen, G., T. Abel, W. Cockerham, and G. Kunz (1993): "Kausalbeziehungen und sozio-kulturelle Kontexte zwischen Sport und Gesundheit", Sportwissenschaft, 23, 175-186.
MacKinnon, J. G. (2006): "Bootstrap Methods in Econometrics", mimeo.
Michaud, P., A. H. O. van Soest, and T. Andreyeva (2007): "Cross-Country Variation in Obesity Patterns among Older Americans and Europeans", Forum for Health Economics & Policy, 10 (2), Article 8, 1-30.
Persico, N., A. Postlewaite, and D. Silverman (2004): "The Effect of Adolescent Experience on Labor Market Outcomes: The Case of Height", Journal of Political Economy, 112, 1019-1053.
Prentice, A. M., and S. A. Jebb (1995): "Obesity in Britain: gluttony or sloth", British Medical Journal, 311, 437-439.
Rashad, I. (2007): "Cycling: An Increasingly Untouched Source of Physical and Mental Health", NBER Working Paper 12929.
Robins, J. M. (1986): "A New Approach to Causal Inference in Mortality Studies with Sustained Exposure Periods - Application to Control of the Healthy Worker Survivor Effect", Mathematical Modelling, 7, 1393-1512.

Rosenbaum, P., and D. Rubin (1983): "The Central Role of the Propensity Score in Observational Studies for Causal Effects", Biometrika, 70, 41-55.
Rubin, D. B. (1974): "Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies", Journal of Educational Psychology, 66, 688-701.
Rubin, D. B. (1979): "Using Multivariate Matched Sampling and Regression Adjustment to Control Bias in Observational Studies", Journal of the American Statistical Association, 74, 318-328.
Ruhm, C. J. (2000): "Are Recessions Good For Your Health?", The Quarterly Journal of Economics, 617-650.
Ruhm, C. J. (2007): "Current and Future Prevalence of Obesity and Severe Obesity in the United States", Forum for Health Economics & Policy, 10 (2), Article 6, 1-26.
Sabo, D., K. E. Miller, M. J. Melnick, M. P. Farrell, and G. M. Barnes (2005): "High School Athletic Participation And Adolescent Suicide: A Nationwide US Study", International Review For The Sociology of Sport, 40/1, 5-23.
Schneider, S., and S. Becker (2005): "Prevalence of physical activity among the working population and correlation with work-related factors. Results from the First German National Health Survey", Journal of Occupational Health, 47, 414-423.
Statistisches Bundesamt (2005): "Körperliche Aktivität", Robert-Koch-Institut, Gesundheitsberichterstattung des Bundes, Heft 26.
Stevenson, B. A. (2006): "Beyond the Classroom: Using Title IX to Measure the Return to High School Sports", American Law & Economics Association Annual Meetings, Year 2006, Paper 34.
US Department of Health and Human Services, Centers for Disease Control and Prevention and National Center for Chronic Disease Prevention and Health Promotion (1996): "Physical Activity and Health: A Report of the Surgeon General", International Medical Publishing, Atlanta, 87-144.
Wagner, G. G., J. R. Frick, and J. Schupp (2007): "The German Socio-Economic Panel Study (SOEP) – Scope, Evolution and Enhancements", Schmollers Jahrbuch, 127, 139-169.
Wellman, N. S., and B. Friedberg (2002): "Causes and consequences of adult obesity: health, social and economic impacts in the United States", Asia Pacific Journal of Clinical Nutrition, 11 (Suppl), S705-S709.


Appendix A: Data

A.1 Definition of some important variables

In this section some additional information on key variables is provided. Such variables are those defining the 'treatment' (sports participation), the outcomes, and the control variables. Discussing all of the latter variables would be beyond the space constraints of this paper, so only those based on ordinal scales are discussed, as they are probably non-standard.

A.1.1 Sports participation in the GSOEP

The information on leisure sports activity differs over the years. For example, in the initial survey of 1984, the relevant question asked in three categories whether people do sports in their free time ("How often do you engage in the following activities in your free time? Active sports: never/rarely; occasionally; often/regularly"). Individuals answering 'never/rarely' constitute the no-sports sample with respect to the sports decision in 1985, whereas the remaining group constitutes the sports sample.

In 1985 and thereafter there were two types of questions. Both are more precise than the 1984 version. The first type says "Which of the following activities do you do in your free time? Please enter how often you practice each activity. … Active sports participation: each week; each month; less often; never". This question was posed in 1985, 1986, 1988, 1992, 1994, 1996, 1997, 1999, 2001, and 2005. The alternative formulation, used in 1990, 1995, 1998, and 2003, was "How frequently do you do the following activities? … do sports: daily; once per week; once per month; less than once a month; never". Although the wording is not exactly the same, once the first two categories (daily, once a week) of the second type of question are aggregated, both types of questions appear to be sufficiently similar to be used in combination. This is also confirmed by a comparison of the respective descriptive statistics over time.

A more serious problem is that for the years 1987, 1989, 1991, 1993, 2000, 2002, and 2004 no such information is available. When required for the definition of the pre-participation status and the outcomes, the missing information is taken from the previous year.

A.1.2 Selected subjective variables

The questions about worries are phrased in the following way: "How about the following areas? Do they worry you? … general economic development: ... Very worried, slightly worried, not worried". The variable used in the empirical analysis is an indicator for 'very worried'. One of the health questions uses a 5-point scale and the following wording: "How would you describe your health at present? Very good; good; satisfactory; poor; very poor." The variables for satisfaction with health are based on the following wording: "How satisfied are you today with the following areas of your life? Please answer by using the following scale, in which 0 means totally unhappy and 10 means totally happy. If you are partly happy and partly not, select a number in between. How satisfied are you ... with your health?". Finally, the question about satisfaction with life in general is worded in the following way: "At the end we like to ask you for your satisfaction with your entire life. Please answer by using the following scale, in which 0 means totally unhappy and 10 means totally happy. How happy are you at present with your life as a whole? …".22

There may be an issue with the quality of subjective health information. Recent work suggests that self-assessed health data may contain a random component that may be related to other socio-economic variables (e.g., Crossley and Kennedy, 2002). However, because a panel data set is used in which these factors can be held constant over time, and because many socio-economic characteristics are conditioned on in the empirical analysis, these issues are unlikely to be particularly relevant for this analysis. Of course, similar concerns may be raised concerning the subjective well-being measures.23 Again, note that this issue would only be relevant if there were a systematic difference in reliability between participants and non-participants in sports activities. It is very hard to see why this should be the case.

22 All translations of the questions from the (German) questionnaires are taken from the official website of the GSOEP.

A.2 Sample selection rules

The motivation and construction of the sports and no-sports samples, as well as the pooling of the different sports participation decisions, are already discussed in the main part of the text. The following additional sample selection rules are applied.

First, individuals without valid sports information in the relevant years of and before the participation decision are not considered.

Second, the analysis is based on a balanced panel over up to 18 years so that the long-term outcome variables as well as the covariates have meaningful measurements. Using an unbalanced panel for the 15 years in which the outcomes are measured, sports participation has no effect on the probability of being observed in the balanced part of the sample. Thus, requiring a balanced panel is unlikely to induce any substantial bias in the results presented.

Third, individuals are restricted to be aged between 18 and 44. The lower age limit is to avoid analyzing individuals still in school, whereas the upper limit is imposed to prevent retirement issues from becoming too important, as individuals will not be older than 60 when their long-term outcomes are measured.

Fourth, only individuals not disabled in the years of and before the participation decision are considered. Furthermore, it is required that the individual does not spend time in hospital during the year of the decision or the year after. Both restrictions are imposed to be able to concentrate on the healthy part of the population.

Finally, due to very small cell sizes, individuals in agriculture, mining, etc., all physically demanding occupations, are removed.

23 However, Krueger and Schkade (2007) study the reliability of such measures and conclude optimistically that "While reliability figures for subjective well-being measures are lower than those typically found for education, income and many other microeconomic variables, they are probably sufficiently high to support much of the research that is currently being undertaken on subjective well-being, particularly in studies where group means are compared (e.g., across activities or demographic groups)." (last sentence of their abstract).

Appendix B: The matching estimator used

For the sake of completeness, the matching protocol for the estimator used here is reproduced below. For further explanations and details the reader is referred to Lechner, Miquel, and Wunsch (2005). The computation of the standard errors of this estimator is explained in Section 3.


Table B.1: A matching protocol for the estimation of the average effect for the sports participants

Step 1  Estimate a probit model to obtain the choice probabilities conditional on covariates for all observations: $\hat{P}(X_i)$.

Step 2  Restrict sample to common support: Delete all observations with probabilities larger than the smallest maximum or smaller than the largest minimum of all subsamples defined by S. In each of the 4 samples no more than 20 observations had to be removed.

Step 3  Estimate the respective (counterfactual) expectations of the outcome variables. The following steps are performed:

        Standard propensity score matching step (binary treatments)
        a-1) Choose one observation in the subsample defined by participation in sports and delete it from that pool.
        b-1) Find an observation in the subsample of non-participants that is as close as possible to the one chosen in step a-1) in terms of $[\hat{P}(x), x]$. 'Closeness' is based on the Mahalanobis distance. Do not remove that observation, so that it can be used again.
        c-1) Repeat a-1) and b-1) until no participant in sports is left.

        Exploit thick support of X to increase efficiency (radius matching step)
        d-1) Compute the maximum distance (d) obtained for any comparison between treated and matched comparison observations.
        a-2) Repeat a-1).
        b-2) Repeat b-1). If possible, find other observations in the subsample of non-participants in sports that are at least as close as R * d to the one chosen in step a-2) (to gain efficiency); we choose R to be 90%. Do not remove these observations, so that they can be used again. Compute weights for all chosen comparison observations that are proportional to their distance (calculated in b-1). Normalise the weights such that they add to one.
        c-2) Repeat a-2) and b-2) until no participant in sports is left.
        d-2) For any potential comparison observation, add the weights obtained in a-2) and b-2).

        Exploit double robustness properties to adjust small mismatches by regression
        e) Using the weights $w(x_i)$ obtained in d-2), run a weighted linear regression of the outcome variable on the variables used to define the distance (and an intercept).
        f-1) Predict the potential outcome $y^l(x_i)$ of every observation in l (no sports) and m (sports) using the coefficients of this regression: $\hat{y}^l(x_i)$.
        f-2) Estimate the bias of the matching estimator for $E(Y^l | S = m)$ as:
             $\sum_{i=1}^{N} \left[ \frac{1(S = m)\, \hat{y}^l(x_i)}{N_m} - \frac{1(S = l)\, w_i\, \hat{y}^l(x_i)}{N_m} \right]$.
        g) Using the weights obtained by weighted matching in d-2), compute a weighted mean of the outcome variables among the non-active. Subtract the bias from this estimate.

        Final estimate
        h) Compute the treatment effect by subtracting the weighted mean of the outcomes in the comparison group of non-active from the weighted mean in the group of sports participants.
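To make the logic of the protocol more concrete, the following minimal sketch covers the common support rule (Step 2) and the nearest-neighbour and radius matching steps (a-1 to d-2), followed by the final difference in means. It is not the estimator used in the paper, and it rests on several simplifying assumptions: the propensity scores are taken as given rather than estimated by a probit, distance is the absolute difference in the propensity score rather than the Mahalanobis distance on $[\hat{P}(x), x]$, all comparison observations matched to one participant receive equal weights, and the regression-based bias correction (steps e to f-2) is omitted. The function names are hypothetical.

```python
from collections import defaultdict


def common_support(treated, controls):
    """Step 2: keep observations whose propensity score lies in the overlap region.

    treated, controls: lists of (pscore, outcome) pairs.
    """
    lower = max(min(p for p, _ in treated), min(p for p, _ in controls))
    upper = min(max(p for p, _ in treated), max(p for p, _ in controls))
    trimmed_treated = [(p, y) for p, y in treated if lower <= p <= upper]
    trimmed_controls = [(p, y) for p, y in controls if lower <= p <= upper]
    return trimmed_treated, trimmed_controls


def radius_match_effect(treated, controls, radius_share=0.9):
    """Average effect for the treated from nearest-neighbour plus radius matching.

    Simplification: distance is |p_i - p_j| on the propensity score only.
    """
    # Steps a-1) to c-1): distance to the nearest control, matching with replacement.
    nearest = [min(abs(p - pc) for pc, _ in controls) for p, _ in treated]
    # Step d-1): largest nearest-neighbour distance, scaled by R to give the radius.
    radius = radius_share * max(nearest)

    weights = defaultdict(float)  # step d-2): accumulated weight per control
    for (p, _), d_nn in zip(treated, nearest):
        # Steps a-2), b-2): the nearest neighbour plus all controls within the radius.
        chosen = [j for j, (pc, _) in enumerate(controls)
                  if abs(p - pc) <= max(radius, d_nn)]
        # Assumption: equal weights within a match set, normalised to add to one.
        for j in chosen:
            weights[j] += 1.0 / len(chosen)

    n_treated = len(treated)
    mean_treated = sum(y for _, y in treated) / n_treated
    # Step g) without the bias correction: weighted mean of the matched controls.
    mean_matched = sum(w * controls[j][1] for j, w in weights.items()) / n_treated
    # Step h): difference between participants and matched non-participants.
    return mean_treated - mean_matched
```

Because each participant distributes a total weight of one over its matched comparison observations, the accumulated weights add up to the number of participants, which is why the weighted sum is divided by that number.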
