COMMUNITY COLLEGE SCIENTISTS AND SALARY

Manuscript

Running head: COMMUNITY COLLEGE SCIENTISTS AND SALARY GAP

Community College Scientists and Salary Gap: Navigating Socioeconomic and Academic Stratification in the U.S. Higher Education System

Manuel S. González Canché
The Institute of Higher Education
University of Georgia

Author Note

Manuel S. González Canché is an Assistant Professor at the Institute of Higher Education, University of Georgia. Mailing address: The University of Georgia, 116 Meigs Hall, Athens, GA 30602. Tel. 706-583-0048, email: [email protected]


Abstract


More than four decades of research on community colleges indicates that students who begin in these institutions realize lower levels of educational attainment than initial 4-year entrants. In terms of labor market outcomes, studies overwhelmingly focus on comparing 2-year entrants with high school graduates who did not attend college. In contrast, this study concentrates on 2-year entrants who became scientists in STEM fields and compares their individual and professional characteristics and monetary compensation over a 10-year period with those of scientists who entered college in the 4-year sector. The data analyzed come from two longitudinal and nationally representative National Science Foundation samples of doctorate recipients. The analytic techniques relied on the instrumental variables approach for dynamic panel data and propensity score weighting. Findings consistently revealed that 2-year entrants came from lower-income backgrounds and had lower mean salaries and lower salary growth than their 4-year sector counterparts. Despite these negative salary-based effects, data showed that the 2-year sector has had an active function in the early formation of scientists. As the competition for science and technology development tightens worldwide, initiatives should identify understudied venues to increase the production of STEM graduates. Considering its scope, the 2-year sector could be one of them.

Community College Scientists and Salary Gap: Navigating Socioeconomic and Academic Stratification in the U.S. Higher Education System

Introduction

As of 2014, less than 2.9% of the U.S. population 18 years of age or older had earned a doctoral or professional degree (U.S. Census Bureau, Current Population Survey, 2014 Annual Social and Economic Supplement [Table 1-01], 2014). Of these advanced degree holders, roughly 22% have historically majored in science, technology, engineering, and mathematics (STEM) fields (U.S. Census Bureau, Survey of Income and Program Participation, 2008 Panel [Table 3E], 2009). Synthesizing this information (2.9% × 22% ≈ 0.6%), one can say that only about 0.6% of the U.S. population 18 years or older holds a doctorate or professional degree in STEM fields. As the competition for science and technology development tightens across the world, the production of STEM graduates is of increasing importance to any country aspiring to remain competitive in scientific production worldwide. Ultimately, these highly trained individuals are required to conduct “creative work undertaken on a systematic basis to increase the stock of knowledge...[and] their ideas have the potential to radically change our understanding of an important existing scientific concept” (National Science Board, 2010, p. 9). Based on this conceptualization, doctorates in STEM fields will be referred to as scientists herein, as they not only hold the highest academic degree offered, but they are also responsible for advancing knowledge and spurring innovation on a global level.

Educational stratification in the U.S. is not a new concept (Hauser, 1970).
Research has consistently shown a strong relationship between academic achievement and socioeconomic status (SES) whereby students coming from high SES families tend to have better educated parents, have access to academically rigorous (usually private and selective) schools that prepare them to perform better academically, earn higher scores on standardized admission tests, and finish high school ready to tackle college-level academic demands (Palardy, 2008; Wyatt, Wiley, Proestler, & Camara, 2012). With so many potential advantages associated with high SES, educational stratification lenses entice one to imagine scientists as students coming from high SES families, who started college in selective institutions that provided them with the means to navigate undergraduate and graduate education without interruptions, eventually resulting in the attainment of graduate degrees in prestigious fields.

What is not part of our current conceptualization of the academic trajectories followed by scientists is that a portion of them first entered college in what is considered one of the most controversial post-secondary education sectors: the public, 2-year (or community) college sector (Brand, Pfeffer, & Goldrick-Rab, 2014; Dougherty, 1994). Analyzing this unusual career path is relevant because, despite the well-known fact that community colleges are unique institutions in terms of the heterogeneous student populations they enroll and their commitment to providing multiple pathways of access, especially for first-generation, ethnic minority, low-income, and academically under-prepared students (Bragg, Kim, & Barnett, 2006; Brand et al., 2014; Dietrich & Lichtenberger, 2015; Goldrick-Rab, 2010), few studies to date (Melguizo & Dowd, 2009) have tried to shift the rhetoric of previous work related to the community college penalty. Indeed, this sector has been labeled as an unrealistic route toward a 4-year degree that is only marginally better than dropping out of the higher education system altogether (Lin & Vogt, 1996; Reynolds, Stewart, MacDonald, & Sischo, 2006). The presence of the 2-year path culminating in a doctorate in a STEM field, then, provides some evidence that may contradict the social reproduction function in higher education that the 2-year sector has arguably played in the United States. Considering that community colleges continue to educate 53% of students attending public institutions and 84% of all part-time undergraduate students (IPEDS, 2014), additional research is needed to examine stratification issues in higher education that revolve around the effect of the 2-year sector on academic and professional opportunities.
This study departs from the high school to college transition literature that has characterized more than four decades of study on the effects of community colleges. Instead, it focuses on the analysis of the characteristics and monetary compensation received by scientists who began their postsecondary education in the 2-year sector. Given the lack of literature concerning this sub-group of highly trained individuals, they will be referred to as “Community College” (CC) scientists hereafter. The analysis of how CC scientists are faring in their professional careers is important not only to re-evaluate our notions of the stratification of opportunities in American society, but also to explore under-studied ways in which the formation of well-prepared workforces can be boosted. The U.S. cannot afford to waste the potential contributions of citizens who, likely due to financial reasons, have followed ‘nontraditional’ pathways to a graduate education in STEM fields.

The purpose of this study is two-fold. The first purpose is to analyze whether CC scientists are systematically different from their non-CC counterparts in the variables employed to estimate the models. This analysis is important because if there are no systematic differences between these two groups of scientists, then this finding would serve as evidence that the 2-year sector merely helped students who could have started in the 4-year sector (e.g., came from better SES and academic backgrounds than typical 2-year students) rather than contributed to closing the educational and socioeconomic gaps (Lee & Frank, 1990). However, if socioeconomic indicators systematically vary between groups, reflecting that CC scientists came from lower SES, minority, and/or other backgrounds traditionally underrepresented in higher education, then this would be evidence of a more equalizing role of the 2-year sector in American higher education than is traditionally depicted in the literature.

The second purpose of the study is to analyze whether there are salary disparities between CC and non-CC scientists over time. Relying primarily on an instrumental variable (IV) estimation for dynamic panel data, the magnitude of the effects of important individual time-variant (e.g., years since PhD degree, sector of employment) and time-invariant (e.g., race, field of formation, type of degree-granting institution) variables will be measured for two nationally representative and longitudinal panel samples taken from the National Science Foundation’s Survey of Doctorate Recipients (SDR). If systematic differences are found between CC and non-CC scientists’ observed characteristics, the models will also rely on propensity score weighting (PSW) mechanisms to ensure the only observed difference between participants is CC status.
The PSW models will complement the IV dynamic panel specification with the purpose of obtaining less biased estimates (see Figure 2).

Background and previous studies

An extensive literature review revealed no evidence of studies aimed at comparing the characteristics and occupational trajectories of CC and non-CC scientists. The literature on community colleges and labor market outcomes is largely characterized by comparison of community college students’ salaries with high school graduates’ salaries (Grubb, 2002; Kane & Rouse, 1995; Marcotte, Bailey, Borkoski, & Kienzl, 2005; Sanchez, Laanan, & Wiseley, 1999). Perhaps the most influential work in the literature on this topic is a study by Kane and Rouse (1995), wherein these authors used data from NLSY:72 and NLSY:79 to examine wage and annual earnings differentials 14 years after high school graduation, and 6 to 13 years after high school graduation, respectively. They found the average person who attended a 2-year college earned about 10% more than a person without a college education. An important study by Grubb (2002) reviewed then-current research on the returns of sub-baccalaureate degrees and coursework at community colleges. Grubb found that individuals who completed associate degrees earned about 20% to 30% more than high school graduates, with estimates for men being somewhat lower than those for women, a finding similar to that shown by Marcotte et al. (2005). Grubb also concluded that one year of coursework (without completing a degree) at either a 2- or a 4-year college increased an individual’s earnings by about 5% to 10%. Similarly, Marcotte et al. (2005) found sufficient evidence to conclude that, compared to a high school diploma, a community college education, with or without credential attainment, had positive effects on earnings.

Jacobson and Mokher (2009) explored the impact of different types and levels of schooling on salaries of students enrolled in Florida in 1996. This descriptive study showed that higher levels of schooling are associated with higher earnings. The authors also provided evidence regarding the benefits of attending community colleges relative to dropping out of the postsecondary education system after high school. A similar study by Sanchez et al. (1999) analyzed data from 700,564 students from the California community college system during the 1992-1993 academic year and showed that there were positive payoffs to students attending a community college relative to high school graduates. Furthermore, they found this impact was particularly strong for economically disadvantaged students. A recent study (Jepsen, Troske, & Coomes, 2012) showed that among 2-year entrants, associate’s degrees and diplomas have higher returns than certificates.

From the literature reviewed it is apparent that researchers have mostly focused on the returns to community college attendance relative to high school completion. To date there are very few studies that have gone beyond this approach by comparing the annual salaries among 2-year students (Dadgar & Trimble, 2014) and between 2- and 4-year students (Gill & Leigh, 2003; González Canché, 2012). Dadgar and Trimble (2014) relied on longitudinal non-self-reported educational and occupational records of students who attended the Washington State Community College System and found that earning an associate’s degree leads to positive increases in wages in practically any field, compared to earning some credits but not attaining a credential. González Canché (2012), relying on NELS:88-00 data, compared the salaries of 2- and 4-year students who attained a bachelor’s degree within nine years of initial college enrollment and found no evidence of a salary gap. Similarly, Gill and Leigh (2003), who compared 2- and 4-year entrants’ salaries using data from the NLSY:79, found that 4-year college graduates who started at a community college were not at a substantial earnings disadvantage relative to those who started at a 4-year college. Furthermore, Gill and Leigh observed that 2-year students in terminal training programs had a positive payoff comparable to non-graduating students who started in a 4-year college. In summary, based on the literature reviewed, previous research would indicate that analysis of CC and non-CC scientists should not present differences in annual salaries as both groups attained similar levels of education (as found by Gill & Leigh, 2003, and González Canché, 2012) in prestigious fields. Nonetheless, empirical evidence is still required to corroborate this assumption.

Conceptual framework

This study relies upon social stratification and human capital theories.
Under social stratification theory (see Marx, 1887, 2000), it is clear that hierarchies constructed by societies matter and that groups of people, institutions, or organizations that are able to place themselves close to the top of a given hierarchy will have access to more resources, power, and greater overall probabilities of success in maintaining their privilege. An important source of inequality in determining the payoffs associated with investment in human capital may be the level of prestige of the sector in which students began their college education (e.g., 2- or 4-year sector). While it is clear that these two groups of students, who eventually became scientists in STEM fields, valued education and invested time, effort, and monetary resources in expanding their knowledge (G. S. Becker, 1962), scientists beginning in the 2-year sector may be expected to receive smaller returns on their investments, despite having successfully navigated the stratified higher education system. This expectation is justified when considering that economists have recognized that the payoff for investment in human capital, such as education and training, could be unequal and even skewed to favor certain groups over others, often based on characteristics such as gender, ethnicity, and family SES (G. S. Becker, 1962; Mincer, 1958). In synthesis, then, payoffs for investment in human capital are affected by differing levels of stratification wherein access to more selective institutions to some (or to a great) extent dictates students’ educational and occupational prospects.

Human capital and stratification theories inform this study by positing that students had access to different forms of capital (Bourdieu, 1986) that affected their decisions to start college in the 2-year sector, which has been associated with significantly lower probabilities of academic success (Dougherty, 1992, 1994; Doyle, 2009; Long & Kurlaender, 2009; Stephan, Rosenbaum, & Person, 2009). Accordingly, the variables selected to estimate the models aimed at accounting for as many as possible of those resources that not only may have affected initial decisions to attend the 2-year sector, but also may explain salary variation. The sub-section titled “Intersection between theoretically sound variables and data source”, contained in the methods section below, highlights the theoretical relevance of the variables selected to achieve the purposes of this study.

Methods

Data source and Analytic samples

The data analyzed come from a total of 14 restricted-use datasets provided by the National Science Foundation’s Survey of Doctorate Recipients (SDR). Through a repeated panel design, the SDR has been administered every two or three years since 1973 to individuals who received a doctoral degree from a U.S. institution in a science, engineering, or health field. The efforts and investment of NSF and SDR involve continuously gathering data and concomitantly updating the analytic sample (including new doctorate recipients) while also following a subset of participants over time. This unique data gathering strategy allowed for the creation of two analytic samples. The first sample captured individual trends for scientists who participated in the 1995, 1997, 1999, 2001, and 2003 waves. The second analytic sample accounted for participants in the 1999, 2001, 2003, 2006, and 2008 surveys. The use of 1995 as the baseline survey for the first analytic sample (1995-2003) is due to the fact that 1995 was the first time the SDR collected information about scientists’ initial college enrollment in a 2-year institution. It is worth noting that the participants in the two analytic samples are mutually exclusive. In order to be included in either sample, scientists were required to have participated in five survey follow-ups with first (or baseline) participation having taken place in 1995 or 1999, depending on the sample. As such, none of the participants of the 1995-2003 follow-ups were included in the 1999-2008 sample. The second inclusion criterion is that scientists were required to have majored in STEM fields during their doctoral programs. Following Gonzalez and Kuenzi’s (2012) guidelines, the doctorate recipients with academic formation in social sciences and psychology were excluded from the models as these fields do not fall within the STEM category.

Once the inclusion criteria were implemented, sample sizes were the following. The 1995 dataset had an unweighted total sample of approximately 30,930 respondents¹ with valid responses in the category of interest (community college attendance). The standard sampling procedures followed by SDR, briefly described above, rendered data on 26,500 cases followed over this 10-year period. The second analytic sample (1999-2008) contains 28,300 observations.

The purpose of using two mutually exclusive samples that are nationally representative is two-fold. The first is based on testing for robustness of the results. That is, rarely are studies capable of relying on more than one sample composed of completely different individuals who are also from the same population of interest to test the same hypotheses.
Ideally, if the findings from the initial sample are based upon structure and not random chance, then they should be consistent with those found in the second sample, thus providing an empirical test for robustness and validity concerning the conclusions reached. The second purpose of using two samples is to test whether any potential salary disparities between CC and non-CC scientists remained constant or changed across samples (over time).
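The inclusion rule described in this subsection (five waves per sample, a baseline in 1995 or 1999, and no respondent in both samples) can be illustrated with a short sketch. This is not the procedure used with the restricted-use SDR files. The respondent IDs, the helper name assign_sample, and the reading of "baseline" as the earliest wave answered among the waves analyzed are all assumptions made for illustration.

```python
# Sketch only: the assignment rule is one possible reading of the inclusion
# criteria; respondent IDs and helper names are hypothetical.

SAMPLE_1_WAVES = (1995, 1997, 1999, 2001, 2003)
SAMPLE_2_WAVES = (1999, 2001, 2003, 2006, 2008)


def assign_sample(waves_answered):
    """Return 1, 2, or None for one respondent.

    Assumed rule: the respondent must have answered all five waves of a
    sample, and the earliest wave answered must be that sample's baseline.
    """
    answered = set(waves_answered)
    if not answered:
        return None
    baseline = min(answered)
    if baseline == 1995 and answered.issuperset(SAMPLE_1_WAVES):
        return 1
    if baseline == 1999 and answered.issuperset(SAMPLE_2_WAVES):
        return 2
    return None


# Hypothetical participation histories.
participation = {
    "r01": [1995, 1997, 1999, 2001, 2003],              # complete first panel
    "r02": [1999, 2001, 2003, 2006, 2008],              # complete second panel
    "r03": [1995, 1997, 1999, 2001, 2003, 2006, 2008],  # stays in the first panel only
    "r04": [1999, 2001, 2006, 2008],                    # missed the 2003 wave
    "r05": [1997, 1999, 2001, 2003, 2006, 2008],        # baseline is neither 1995 nor 1999
}

assignment = {rid: assign_sample(waves) for rid, waves in participation.items()}
sample_1 = {rid for rid, s in assignment.items() if s == 1}
sample_2 = {rid for rid, s in assignment.items() if s == 2}
```

Because each respondent has a single baseline under this rule, the two sets cannot overlap, which mirrors the mutually exclusive design described above.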

Intersection between theoretically sound variables and data source

The variables used in the models were strategically selected to comprehensively capture CC and non-CC scientists’ access to various resources. The selection and inclusion of these variables is important given that research regarding community college has consistently shown that, compared to 4-year entrants, 2-year students come from socioeconomic backgrounds traditionally associated with higher risks of non-degree completion (Dougherty, 1992, 1994; Doyle, 2009; Long & Kurlaender, 2009; Stephan et al., 2009). Consequently, whenever possible, financial indicators should be included in any models comparing 2- and 4-year students’ outcomes. Such indicators, like parental socioeconomic resources and student loan debt outcomes across CC and non-CC scientists, are available in the SDR surveys. Accordingly, all financial information available in SDR was included in the models and can be found under the subtitle called socioeconomic indicators in Table 1. The current state of the literature would also indicate that differences in individual ability or skills between 2- and 4-year entrants play an important role in explaining college enrollment paths (i.e. starting in the 2- versus 4-year sector). Unfortunately, SDR data do not contain high school academic performance indicators or ACT/SAT standardized test scores that would help researchers corroborate or negate whether such differences were realized in the particular samples analyzed herein. The reason for the omission of such academic indicators is that the sample analyzed is not concerned with the typical high school to college transition phenomenon upon which the differences in the literature are based. All participants included in the analytic samples of this study attained the highest conferrable degree in STEM fields in what is arguably the most important higher education system worldwide. 
Thus, differences in individual ability or skills between CC and non-CC scientists may or may not have played an important role in explaining their college enrollment paths. The only known and certain fact associated with ability is that the enrollment paths of CC and non-CC scientists differed and that the reasons for this difference were unobserved. This raised the need to account for unobservables in the models. Given that the datasets analyzed are individual-level panels, econometric theory and methods were employed to address issues of omitted or unobserved variable bias (Arellano, 2003; Wooldridge, 2010). To this end, the “Unobserved heterogeneity” section describes the rationale followed to allow for the inclusion of individual unobserved differences in ability that may have affected salary differences and salary increases during the 10 years that each individual was observed. Figure 2 contains the analytic framework documenting all the steps and decisions made to fit the final set of models.

The models also accounted for individual traits such as gender, ethnicity, and U.S. born status. To control for potential differences in academic formation and job market characteristics, the models included Ph.D. and post-Ph.D. indicators. The Ph.D. related indicators accounted for field of study, type of doctoral granting institution, and time to degree. The post-doctoral indicators included sector of employment, post-doctoral appointment,² and type of employing institution. In addition, following a Mincerian wage equation (Mincer, 1974), experience in the job market after doctorate attainment along with its quadratic term were added to the models. These variables were measured in years since Ph.D. attainment as they indicate when individuals completed their highest degree and were expected to start working with no school-related obligations. It is important to note that post-doctoral indicators are time-variant as, for instance, scientists could have moved from one sector of employment to a different one during the 10-year period in which they were followed. All the pre-doctoral indicators are time-invariant, as they did not change over this 10-year period given the retrospective nature of this information. Note further that the indicators shown in Table 1 are the scientists’ baseline conditions across samples conditional on CC status.

Analytic techniques

The main predictor of interest between analytical subgroups and across samples is CC status.
The longitudinal, panel structure of the datasets provides an ideal framework to implement robust techniques designed with quasi-experimental rigor in mind but that still allow for relaxation of some of the assumptions associated with such approaches. To this end, this study primarily relies on instrumental variables estimation for dynamic panel data (Balestra & Varadharajan-Krishnakumar, 1987; Croissant, Millo, et al., 2008; Wooldridge, 2005). While this method contains the words “instrumental variables” in its title, the use of this concept is not the traditional instrumental variables (IV) approach as described by Angrist, Imbens, and Rubin (1996). In the traditional IV approach, researchers need to meet several key assumptions associated with a valid instrumental variable through the reliance on numerous ad hoc and formal tests of these assumptions (see Angrist et al., 1996). Contrary to the complexity inherent to the traditional IV approach, the modeling technique employed herein does not require the introduction of external instruments but instead requires the use of a longitudinal panel dataset. In models of this type, the time-variant factors and covariates are considered the best instruments, conditional on adequately modeling their lagged and current influence on the outcome of interest (Wooldridge, 2005). This simplifies the modeling process by allowing one to straightforwardly model time-variant effects as contemporaneously exogenous instruments (Wooldridge, 2005). Due to the dynamism of these time-variant indicators, modeling their variation renders more precise estimations as the models will be based upon matrices that are fully robust to serial correlation and heteroskedasticity of the error terms (Balestra & Varadharajan-Krishnakumar, 1987; Croissant et al., 2008).

It is important to emphasize that while using instrumental variables is a relevant approach when estimating dynamic panel data models, the instruments in this case are not performing their classic role of removing or reducing endogeneity between the key independent variable and the dependent variable but are instead removing the endogeneity between the lagged dependent variable and the error term. That is, the approach followed in this study accounts more for an issue of simultaneity by modeling time-variant effects as contemporaneously exogenous instruments (Croissant et al., 2008; Wooldridge, 2005). Under this framework one begins with a standard linear panel data model

$$ y_{it} = \alpha_{it} + X_{it}\beta + U_{it}, \quad t = 1, \ldots, T, \tag{1} $$

where $i = 1, \ldots, n$ is the individual index (scientists in this case), $t = 1, \ldots, T$ is the time index, the set of predictors or control variables $X_{it}$ is $1 \times K$ for all $t$, and for each $t$ we have a vector of instruments $Z_{it}$, which is $1 \times L_t$ with $L_t \geq K$. The main assumption is that $E(Z_{it}' U_{it}) = 0$, $t = 1, \ldots, T$. The instruments are contemporaneously exogenous so that $X_{i,t-1}$ is uncorrelated with $U_{it}$, where $U_{it}$ is the disturbance term for individual $i$ at time $t$. Consequently, we allow the dimension of $Z_{it}$ to change with $t$ so that the instruments used in some periods, which are not strictly exogenous in other time periods, are contemporaneously exogenous across the system (Wooldridge, 2005). Following Croissant et al. (2008), equation (1) can be expanded to include the first lag of time-variant indicators as instruments in the model. Thus, the instruments for time $t$ can include the observed history of the response variable up through time $t-2$. Based on this framework we obtain $E(U_{it} \mid X_{i,t-1}, X_{i,t-2}, \ldots, X_{i1}) = 0 = E(Z_{it}' U_{it})$. Therefore, we can use prior lags of $X$ as instruments ($X_{i,t-1}$ was used in this study) for the $t$th equation (see Balestra & Varadharajan-Krishnakumar, 1987; Croissant et al., 2008; Wooldridge, 2005, for a more detailed explanation of this approach).

Unobserved heterogeneity

Although the variable selection employed in this study aimed at capturing as much observed information as possible, there are unobserved factors that (a) change from scientist to scientist but remain, to a great extent, constant over time (e.g., values and beliefs about college education) and others that (b) define each scientist but evolve over time and are likely to have an effect on decision making processes and salary variation. An example of the latter is a scientist’s ability or skill set at a given observation point, which defines each scientist’s employment prospects at that point in time. This skill set, however, is not fixed; to the contrary, it is expected to change as time passes and the scientist continues to gain experience in her/his profession and area of specialization. Another example of this type of factor is motivation, which also influences scientists’ decisions and salary prospects. Since these factors were not measured by the SDR, failing to include them in the models would translate into a problem of omitted variable bias.
That is, to the extent that skill set, motivation, and values/beliefs are associated with at least one of the predictors, such as community college enrollment, this issue will escalate to a problem of endogeneity as one of the predictors will be correlated with the error term. The information conveyed by the unobserved factors described in items (a) and (b) above is called individual heterogeneity (Arellano, 2003; Green, 1993). Since we did not observe it, we refer to this as unobserved heterogeneity (Arellano, 2003; Stock & Watson, 2007). One of the most common ways in which econometric theory for panel data deals with issues of unobserved heterogeneity is by relying on fixed or random effects (Arellano, 2003; Green, 1993; Stock & Watson, 2007; Wooldridge, 2010). Following this approach, the models accounted for the influence of unobserved individual and unobserved time effects. To this end equation (1) was expanded to the form

$$ y_{it} = \alpha_{it} + X_{it}\beta + Z_i\gamma_1 + S_t\gamma_2 + U_{it}, \quad i = 1, \ldots, n, \quad t = 1, \ldots, T, \tag{2} $$

where the term Zi accounts for unobserved factors captured in point (a) mentioned above, and St accounts for unobserved factors captured in point (b) also exemplified above. To test whether unobserved heterogeneity effects were needed in the model specifications, Lagrange multiplier tests were implemented as suggested by Gourieroux, Holly, and Monfort (1982). Under these tests, if the null hypotheses of unobserved heterogeneity effects are not rejected, it means that no individual- or time-fixed procedures are required. Both cases rendered significant χ2 test statistics, indicating the need to control for unobserved heterogeneity and supporting the inclusion of models relying on the specification shown in equation (2).3 Dealing with systematic observed differences The samples accounted for in this study concern highly trained professionals who have navigated undergraduate and graduate education successfully; nonetheless, to assume that the only observed difference between them is the sector where they started college would be shortsighted. Just as with the case of unobserved differences, CC scientists may be systematically different in the set of observed variables that define their academic and professional prospects. This concern is especially salient given that the body of literature that compares 2- and 4-year entrants has consistently shown 2-year entrants coming from systematically lower academic and SES backgrounds (González Canché, 2014). If these differences are observed in the analytic samples, then they are likely to affect CC-scientists’ job

prospects and opportunities for salary increases during the period studied, above and beyond CC status. To evaluate whether systematic differences existed between CC and non-CC scientists, propensity score weighting techniques were employed. Because the results indicated that these differences were present (see Figure 1), these weighting techniques were needed to estimate the effect of CC-scientist status once all the other observables were statistically the same across groups. Recall that the propensity score, as introduced by P. Rosenbaum and Rubin (1983), is obtained as follows:

$$e(x) = \Pr(Z_i = 1 \mid X_i), \tag{3}$$

where e(x) is the propensity score, or the probability of being in the treated group ($Z_i = 1$) given a set of theoretically relevant and empirically influential covariates $X_i$. In this framework, every participant included in the sample receives a propensity toward CC status that ranges from 0 to 1, given all the theoretically sound predictors included in Table 1. Given that e(x) is continuous and can take infinitely many values, one method to utilize its statistical power is to rely on matching mechanisms (P. Rosenbaum & Rubin, 1983) where, conditional on e(x) values, the covariates $X_i$ become balanced (see S. O. Becker & Ichino, 2002, for a survey of the most frequently used balancing mechanisms). Another use of e(x) consists in using it as a weight to create a balanced sample in “which the distribution of measured baseline covariates is independent of treatment assignment” (Austin, 2011, p. 408). The main advantage of the weighting method is that these weights can be used in a similar form to survey sampling weights, thus allowing researchers to use them in different statistical approaches. For example, the propensity score weights (PSW) could be used to improve the model specification shown in equation (2). The PSW are defined as follows:

$$w(x) = K\,\frac{f(z = 1 \mid x)}{f(z = 0 \mid x)} = K\,\frac{e(x)}{1 - e(x)}, \tag{4}$$

where e(x) is the propensity score described above and K is a normalization constant that will cancel out in the outcomes analysis (Ridgeway, McCaffrey, Morral, Burgette, & Griffin, 2014, p. 27). Figure 1 displays the performance test of the PSW applied. More specifically, it shows the pre- and post-weighting p-values of comparisons between CC and non-CC scientists’ individual

characteristics presented in Table 1. Before weighting (closed circles), the groups have statistically significant differences across many variables (i.e., p-values are near zero). After weighting, the p-values are even larger than would be expected in a randomized study, as these p-values are distributed across the 45-degree line (see Ridgeway et al., 2014, for more details about this approach).

[Insert Figure 1 about here]

Although the weighting procedures created comparable groups across all observable characteristics, f(x|z = 1) = w(x)f(x|z = 0), propensity score methods cannot adjust for unobserved heterogeneity. Consequently, simply controlling for observables may be subject to omitted variable bias. To address this issue, the IV models for panel data with individual and time unobserved heterogeneity shown in equation (2) must be improved by including the weights obtained from the PSW approach. The final set of models is a combination of PSW and covariate adjustment that resulted in triply robust estimations, as the models accounted for observables, unobservables, and covariate adjustment. Figure 2 summarizes all the methodological steps and the rationale followed to estimate these models.

[Insert Figure 2 about here]

Performance test under the IV panel models framework and diagnostic check

After specifying the models, several performance tests were conducted (Croissant et al., 2008; Wooldridge, 2010). The poolability test assesses whether the same coefficients apply to each individual included in the sample across time. According to Croissant et al., this test is “a standard F test based on the comparison of a model obtained for the full sample and a model based on the estimation of an equation for each individual” (Croissant et al., 2008, p. 26). If this test renders non-significant results, researchers would not need to rely on panel data estimations, as the coefficients, excluding the intercepts, would be the same and a simple Ordinary Least Squares model would suffice to obtain consistent results (Croissant et al., 2008).
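The logic of this F test can be sketched for the simplest case of a single regressor. The Python sketch below is an illustration under stated assumptions and is not the procedure used in this study, which relied on the implementation by Croissant et al. (2008) applied to the weighted IV panel models. The function names `centered_sums`, `make_unit`, and `poolability_f`, as well as the toy panel, are hypothetical. The restricted model imposes a common slope with individual intercepts; the unrestricted model fits a separate line for each individual.

```python
def centered_sums(xs, ys):
    """Return (Sxx, Sxy, Syy): sums of squares and cross-products around the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxx, sxy, syy


def poolability_f(panel):
    """F statistic for equal slopes across individuals (intercepts left free).

    panel maps an individual id to (xs, ys). Each individual needs at least
    three observations with some variation in xs, so that the unrestricted
    residual sum of squares has positive degrees of freedom.
    """
    stats = [centered_sums(xs, ys) for xs, ys in panel.values()]
    n_units = len(stats)
    n_obs = sum(len(xs) for xs, _ in panel.values())

    # Restricted model: one common slope, individual intercepts.
    sxx_tot = sum(s[0] for s in stats)
    sxy_tot = sum(s[1] for s in stats)
    syy_tot = sum(s[2] for s in stats)
    rss_restricted = syy_tot - sxy_tot ** 2 / sxx_tot

    # Unrestricted model: a separate intercept and slope per individual.
    rss_unrestricted = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in stats)

    df_num = n_units - 1          # number of slope-equality restrictions
    df_den = n_obs - 2 * n_units  # residual df of the unrestricted model
    f_stat = ((rss_restricted - rss_unrestricted) / df_num) / (rss_unrestricted / df_den)
    return f_stat, df_num, df_den


def make_unit(intercept, slope):
    """Hypothetical individual observed over four periods with fixed noise."""
    xs = [0.0, 1.0, 2.0, 3.0]
    noise = [1.0, -1.0, -1.0, 1.0]
    ys = [intercept + slope * x + e for x, e in zip(xs, noise)]
    return xs, ys


# Toy panel (hypothetical data): three individuals with different slopes.
toy_panel = {
    "a": make_unit(10.0, 1.0),
    "b": make_unit(20.0, 2.0),
    "c": make_unit(30.0, 3.0),
}
print(poolability_f(toy_panel))
```

On this toy panel the statistic equals 2.5 with 2 and 6 degrees of freedom. A large statistic relative to the corresponding F distribution would indicate that the slopes differ across individuals.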
This test resulted in significant differences, indicating that, within samples, all coefficients exhibited sufficient instability to require the implementation of panel data analyses. Finally, a test for serial

autocorrelation for models with few time periods and a large sample size (such as the models specified herein) was computed following Wooldridge (2002, 2010) and as implemented by Croissant et al. (2008). As expected, because the models used lagged time-variant predictors, the results from this test indicated that residuals were robust to AR(1) issues across the four models.

Findings

The main outcome of interest across models is salary. Each salary amount reported by the scientists was adjusted for inflation using the Consumer Price Index Inflation Calculator provided by the Bureau of Labor Statistics (2015) and is expressed in 2015 dollars. Summary statistics of salary and its variation over time based on CC status are presented in Figure 3. The raw information consistently reflects that CC scientists earned lower salaries at each observation point across waves. Greater color saturation in the figure indicates a higher concentration of salaries in each period observed. Given the panel structure of the datasets, each plot contains two regression lines intended to capture trends. The red line corresponds to a local regression model accounting for the unconditional distribution of salary captured in the figure. The locality of this regression was set at each observation point. If this red line differed drastically from a linear regression line fitted across time (the yellow line in the plots), then the assumption of linearity would not hold in the models. As observed, the figure provides evidence supporting the feasibility of fitting models that assume linearity as a function of time variation across cohorts and within groups.

[Insert Figure 3 about here]

Table 1 contains the baseline predictor variables of the two samples analyzed.⁴ The first baseline pertains to the 1995 survey participants; the second is for the 1999 survey participants.
Each baseline was separated by CC status and shows a p-value for the difference between groups within samples. For instance, in the 1995-2003 sample, 91% of the CC scientists were born in the U.S., compared to 76% of non-CC scientists, and this difference is statistically significant (p < .001). The 1999-2008 sample followed a similar pattern, with 92% and 79% of CC and non-CC scientists, respectively, being U.S.-born. This means that scientists who used the 2-year path were significantly more likely to have been born in the U.S. than non-CC scientists.

[Insert Table 1 about here]

Consistent with research indicating that 2-year entrants have fewer monetary resources, Table 1 shows that in both samples CC scientists appeared to have relied more on loans than their non-CC counterparts, and these differences are significant (p < .001). Indeed, CC scientists consistently reported having accrued debt amounts between $20,001 and $30,000 (21% and 14% in the 1995 and 1999 samples, respectively). Parental education showed similar patterns across samples. Table 1 shows that CC scientists tended to have parents without college degrees (some college, high school, or no high school diploma). The opposite pattern was found for non-CC scientists, as they generally came from families where parents had 4-year college degrees or more. Gender composition was not significantly different in the two samples. Overall, the samples were composed mostly of men (around 75%), and women were equally present in the 2- and 4-year sectors across samples. CC scientists were more likely to be white than their non-CC counterparts (about 84% and 73%, respectively), a finding that was consistent and significant across samples (p < .001). Marital status presented no differences across samples (approximately 80% of participants were married); CC scientists, however, consistently tended to have more dependents than their non-CC counterparts (p < .001). Regarding the type of doctoral granting institution, the only difference was that CC scientists tended to have enrolled in Research II institutions at a greater rate across samples (p