Transport Costs and Economic Geography

Transport Costs and Economic Geography: Evidence from Indonesia’s Highways
Alexander D. Rothenberg∗
April 1, 2011

PRELIMINARY DRAFT: DO NOT CITE

Abstract

How do road improvements affect the spatial distribution of economic activity? Existing theories of economic geography offer conflicting predictions. In this paper, I study how manufacturing location choices responded to changes in road quality in Indonesia. Using new data, I document massive upgrades to the highway networks of Java, Sumatra, and Sulawesi during the 1990s, a period in which national funding for transport infrastructure increased by 83 percent. I first show how these road improvements were accompanied by a significant dispersion of manufacturing activity, away from urban areas. The amount of dispersion varied across industries in predictable ways. Next, I develop a structural model of firm location choice in which firms face a tradeoff between locating closer to their sources of demand and paying higher factor prices. The model can be estimated with discrete choice techniques, but identifying its parameters is extremely challenging. For instance, many of the location characteristics that firms consider when determining where to operate are themselves affected by the decisions that firms make, creating simultaneity problems. Using techniques to estimate a random coefficients logit model with endogenous choice characteristics, I find significant differences in willingness to pay for market access between different industrial sectors. Counterfactual policy simulations suggest that road improvements cause dispersion, but only to the immediate peripheries of existing urban areas.

∗ I thank Frederico Finan, Paul Gertler, Bryan Graham, Robert Helsley, Maximilian Kasy, Patrick Kline, Edward Miguel, and Sarath Sanga for helpful comments and suggestions. I also thank Glen Stringer for several invaluable detailed discussions of the IRMS dataset, as well as Ir. Taufik Widjoyono, Ir. Julius J. Sohilait, and Yohanes Richwanto at Departemen Pekerjaan Umum for generously granting me access to their data and answering my many questions about it. This work was supported, in part, by a dissertation fellowship from the Fischer Center for Real Estate and Urban Economics, as well as grants from the Institute of Business and Economic Research (IBER) and the Center of Evaluation of Global Action (CEGA).

1 Introduction

Inequality in the distribution of economic activity across space is a central feature of modern economies. In Indonesia, as in many countries, regional differences in output and manufacturing employment are large and striking. For instance, in 1990, per capita regional product of the richest province (East Kalimantan) was 16 times as large as that of the poorest province (East Nusa Tenggara).1 Manufacturing activity is also heavily concentrated in Indonesia. Four provinces on Java (East, West, Central, and DKI Jakarta) constitute over 75% of the nation’s manufacturing employment, and only a handful of provinces outside of Java contain significant portions of manufacturing activity. Although spatial inequalities and dense concentrations of economic activity are widely viewed as productive, there are considerable costs. Large agglomerations are often badly congested, as commuters do not internalize the externalities they impose upon others, and this congestion creates serious environmental problems. A long literature in urban economics discusses how market outcomes yield inefficiently large city sizes (e.g. Henderson, 1974), and nowhere is this more relevant than in Jakarta, where commuting times, even for short distances, are often astronomical. Moreover, regional differences in per-capita incomes can also create political controversy, generally over the levels of transfers between prosperous areas and poorer regions.2 In Indonesia, for instance, violent separatist movements of resource-wealthy regions in Kalimantan and Sumatra in the 1950s (e.g. Aceh) threatened to overwhelm Sukarno’s presidency and fragment the Indonesian nation. Regional tensions over how to properly distribute natural resource wealth between its sources in remote areas and the political center continued throughout Suharto’s regime. 
While it is far from clear on a priori grounds that policymakers should try to reduce spatial inequalities, roads and transportation improvements have often been cited as a potential regional policy lever. For instance, in planning documents, the government of Indonesia has stated that investments in transport infrastructure “support economic growth, national stability, and the equitable distribution and dissemination of development efforts, penetrating the isolation and backwardness of remote areas, to further strengthen the Archipelago.”3

1 This figure is taken from Hill (2000). Note that it is possibly misleading, because the GDRP figures contain oil and mining revenues. However, even looking at per-capita non-oil GDRP, Indonesia’s richest province (DKI Jakarta) in 1990 was nearly 8 times richer than the poorest province (East Nusa Tenggara).
2 Road improvements can also be important for political reasons. Friend (2003) quotes a village headman from Central Java in 1999 saying the following: “This village voted PPP in 1982. Then we were promised our road would be asphalted if we voted Golkar, so we did, in 1987, 1992, and 1997. But nothing has happened yet. This time I’m neutral. If the people want to choose whoever, let them.” (p. 210).
3 This quotation, translated by the author, is taken from a planning document describing transportation development objectives in Repelita VI. Similar sentiments are echoed throughout other planning documents.

This notion that road improvements can foster growth in remote areas has some support in economic theory, particularly in the literature on regional science and economic geography. For instance, as modeled by Helpman (1998), when cities become better connected with their hinterlands, firms may opt to locate outside of urban borders, taking advantage of access to cheaper land and labor. However, existing theories in this literature offer conflicting predictions about how lower trade costs affect the spatial distribution of economic activity. In the classic core-periphery model of Krugman (1991), reducing trade costs between two regions causes firms to agglomerate, pulling the entire manufacturing sector into one region. Thus, road improvements may actually exacerbate spatial inequalities instead of reducing them.

The absence of sharp theoretical predictions calls for credible empirical work. In this paper, I study this question by examining the response of Indonesian manufacturing plants to large road improvements in the 1990s. During the 1970s and 1980s, quality paved highways in Indonesia were generally confined to a few major arteries connecting provincial capitals and other large cities. However, in the early 1990s, there was an 83 percent increase in funding allocated for road improvements, and road networks throughout the archipelago were rapidly improved. Improvement projects were not uniform over space or time, producing substantial variation in transport costs between locations that can be used to more credibly estimate their effects.

To construct measures of transport costs, I make use of new data from Indonesia’s Integrated Road Management System. These data are extremely detailed, documenting the evolution of road quality measures (width, roughness, and surface type) along kilometer-post intervals of major inter-urban highways, annually from 1990 to 2007. These data are rich enough to enable the creation of annual transport cost measures between districts on the islands of Java, Sumatra, and Sulawesi, the three islands with the largest amounts of population and manufacturing activity in Indonesia.

Using a series of reduced form exercises, I first document that Indonesia’s road improvements induced a modest dispersion of manufacturing activity. During the same period in which road improvement projects were occurring, I find that the spatial concentration of manufacturing employment, measured by the spatial Herfindahl and the Ellison and Glaeser (1997) index, fell on average by 20-25 percent. Importantly, the amount of dispersion varies across industries in predictable ways. For instance, producers of perishable goods, which deteriorate rapidly in transit and need to be consumed close to where they are produced, did not experience any dispersion, while producers of durable goods did. I also find that over the period, new manufacturing plants were locating more intensively away from existing agglomerations, into neighboring areas. However, they were not locating in the remotest parts of Indonesia. Finally, using a series of linear fixed effects regressions, which are identified due to aspects of Indonesia’s road improvement program, I estimate positive, significant average effects of road improvements on new manufacturing establishments and employment.

Unfortunately, this reduced form analysis has several limitations. It does little to uncover any mechanisms behind these effects, and because it only delivers an average treatment effect, it is unable to shed light on any heterogeneity in the responses of different regions to transport cost reductions.

In order to provide richer, more policy-relevant counterfactual predictions, I next develop and estimate a structural model determining firm location choices. I begin with a multiple-region model of monopolistic competition and regional trade (e.g. Head and Mayer, 2004b), in which firms face a tradeoff between locating closer to their sources of demand and paying higher factor prices for production. The model’s key prediction is that firm profits depend on a location’s market potential (Harris, 1954), a weighted average of real regional incomes, where the weights decline with transport costs. Importantly, the model allows for sectoral differences in willingness to substitute between different location characteristics, motivated by the industry differences highlighted in the reduced form exercises. With some distributional assumptions on the unobserved components, I show how parameters of the model that govern firm location decisions can be estimated with discrete choice techniques. Unfortunately, identification is extremely challenging.
Many characteristics of locations that firms consider when determining where to operate (including local wages, rents, and access to other markets) are themselves affected by the decisions that firms make, and this simultaneity undermines causal interpretation. New road improvements may also be targeted to developing particular areas, and estimates of the effects of better market access may be confounded with the fact that areas with more accessibility were selected by policymakers, creating targeting bias. Moreover, without data on how wages, rents, and market access vary across space and time, it seems impossible to distinguish features of firm profit functions that depend on these characteristics from those that depend on fixed natural productive amenities, many of which may be unobserved.4 Variation in location characteristics over time, especially variation in transport costs inducing changes in market access, is difficult to exploit because few datasets exist. This is especially true for a developing country context, where agglomerations are nascent and where we would expect to see the greatest location effects of transport cost improvements, to the extent that they matter at all.

4 It is also worth emphasizing that the most important natural amenities of a particular location may well reflect transport costs. For instance, because maritime trade has been so fundamental to economic activity in the archipelago, many of Indonesia’s cities are coastal or riverine ports.

To overcome these identification problems, I make use of new panel data on location characteristics and adapt techniques from industrial organization that allow researchers to estimate discrete choice models with endogenous choice characteristics (Berry et al., 1995). Annual data on road quality and market access enable me to control for time-invariant unobservables that might be correlated with the provision of infrastructure. For example, in Indonesia, long-term spatial plans dictated that certain areas would be targeted for road improvements. These plans were revised only once a decade, and to the extent that they were adopted, controlling for location fixed effects enables me to remove any targeting bias from parameter estimates. Fixed effects also allow me to separate out the effect of other unobserved factors, such as time-invariant productive amenities, from parameter estimates, so that I can isolate the effects of road improvements from omitted factors.

To deal with simultaneity problems associated with identifying choice parameters, I combine the inclusion of location fixed effects with several sets of instrumental variables. I first use functions of lagged location characteristics as instruments for current location characteristics, under the assumption that regional productivity shocks are innovations, unpredictable given past information. This strictly relaxes the identification assumptions required for estimation with fixed effects alone. Next, I search for factor price shifters: instruments that affect wages, rents, taxes, and market access across locations but are unrelated to firm choices or choices made by policymakers. In Indonesia, many regional economies are predominantly agricultural, and because of this, changes in rainfall and earthquakes can explain variation within locations over time in output, rents, and wages.
The fact that natural phenomena can partially predict movements in wages, rents, and market access gives me an additional way of breaking the simultaneity problem. However, this may create new problems, since estimates with the rainfall and earthquake instruments deliver effects only for compliers with those instruments (Imbens and Angrist, 1994), and this subpopulation of firms may be peculiar.

I find that better market access increases a location’s mean profits, but there is substantial heterogeneity in firms’ willingness to pay for greater market access across industrial sectors. Textile firms and food producers have stronger preferences for locating closer to large markets than chemical manufacturers, makers of wood products, and other firms producing more durable goods. From counterfactual simulations of what would have happened to firm locations had the government of Indonesia more equitably paved its highways in earlier periods, I find that road improvements generally induce a suburbanization of industry. With better roads, manufacturing activity would have moved further outside of existing urban centers, but it would not have relocated to the remotest parts of Indonesia.

This work contributes to a long-standing research program that focuses on testing models of economic geography (e.g. Davis and Weinstein, 2003; Redding and Sturm, 2008). Within this literature, there is a line of research that uses discrete choice models to estimate firm location choices, dating back to Carlton (1983). The vast majority of papers in this literature estimate choices for a single cross-section of firms (Coughlin et al., 1991; Head et al., 1995; Henderson and Kuncoro, 1996; Head and Mayer, 2004b; Deichmann et al., 2005). While pioneering, this research design makes it impossible to distinguish between the effects of observed location characteristics and the effects of fixed unobservable factors. Moreover, in this literature, firm profit functions are generally specified in an ad-hoc manner, with little attempt to connect location choices to theory (Head and Mayer, 2004a). The few papers that actually derive choice probabilities from theory and study the effect of transport costs or market potential on firm locations make use of noisy measures of transport costs, such as road density (Deichmann et al., 2005) or physical distance (Head and Mayer, 2004b). Given these noisy measures, it is not surprising that existing work has found only modest effects of better market access on firm location choice probabilities. Most importantly, to my knowledge, no papers have adequately addressed the fundamental endogeneity problems associated with estimating firm location choices. The fact that a location’s wages, rents, and market potential not only affect firms’ choices but are also affected by the choices that firms make necessitates the use of instrumental variables.
By working with a theoretical model, constructing a time-varying measure of transport costs from a dataset on road quality, estimating the model on a panel of new firms, and using modern techniques to address the endogeneity of location characteristics, this paper should make several contributions to the empirical literature on economic geography and transportation.

Section 2 describes Indonesia’s road construction program and manufacturing activity in the late 1980s and 1990s, providing some background information on the policy experiment. Section 3 describes a new dataset on road quality in Indonesia and discusses how these data are used to construct proxies for transport costs. It also discusses data on newly entering manufacturing firms and the characteristic space of locations. Section 4 presents some reduced form evidence on how road improvements induced dispersion of manufacturing activity. To better understand the mechanisms behind the observed dispersion and to develop richer counterfactual predictions, Section 5 presents a structural model of monopolistic competition and regional trade. After delivering an expression that relates the probability a firm chooses a certain location to observable and unobservable characteristics, it discusses identification and estimation issues. Section 6 presents parameter estimates from the choice model and the results of policy simulations, and Section 7 concludes.

2 Roads and Manufacturing in the 1990s

Although infamous for political repression, violence, and corruption, Suharto’s regime had an extraordinary development record. During the three decades in which he was in power, annual growth rates of GDP were around 5% per year, and the poverty rate fell from 60% in the mid 1960s to around 10% in the early 1990s (Hill, 2000). One reason for Indonesia’s economic successes was that the government invested heavily in major public works programs. These investments included building schools and hospitals, developing irrigation systems, and improving transport infrastructure. Roads that were developed by the Dutch colonial authorities in the late 18th and early 19th centuries were left to crumble and deteriorate during Sukarno’s regime after Indonesia gained independence (1945-1967). Recognizing the need to quickly improve the country’s infrastructure, Suharto made road improvements a priority of his first two five-year development plans, Repelita I (1969-1974) and Repelita II (1974-1979).5 However, funding was insufficient for broad transport improvements, and the roads that were upgraded were mostly major arteries connecting provincial capitals and other major urban centers. Projects during this period included the Trans-Java and Trans-Sumatra Highways, which were designed by the Dutch and largely consisted of metalled (gravel) roads. Rehabilitation of the major North-South artery of the Trans-Sumatran Highway began in the late 1970s, but large sections connecting the Southern provinces remained unfinished until the early 1980s. Importantly, improvements at this time were not targeted at smaller roads or roads directly connecting rural areas to the rest of the network.6 After the collapse of oil revenues in the late 1970s, spending on road infrastructure slowed considerably and was not a priority of either Repelita III (1979-1984) or Repelita IV (1984-1989). 
This was a transitional period for the Indonesian economy, as slow growth led to a reconsideration of the inefficient, state-controlled industrial policies that had been characteristic of the early Suharto years. During the mid 1980s, it became clear that in order to encourage more rapid growth, the private sector, and particularly labor-intensive manufacturing, would need to be supported. This period saw several banking deregulations, making it easier for firms to acquire financial capital, and it saw a number of trade reforms (reform of the customs system, reduction of trade barriers) that encouraged export growth in the private sector.

5 In Bahasa Indonesia, the phrase rencana pembangunan lima tahun is literally translated as “five year development plan”. In characteristic Indonesian fashion, this phrase is seldom spelled out but instead expressed by the acronym Repelita.
6 Discussion of transportation improvements in Indonesia during this period is difficult to find in the literature, but Leinbach (1989) and Azis (1990) provide some useful discussion.

As manufacturing activity started to grow again, and roads that were improved in the 1970s began to require heavy maintenance, transportation again became a large development priority toward the end of the decade. Table 1 shows the massive increase in funding allocated to improving roads between Repelitas IV, V, and VI. During Repelita IV, the total budget for road improvements was 17.8 trillion IDR (in constant 2000 IDR). This was increased by 84 percent in Repelita V (1989-1994), to a sum of 32.7 trillion IDR. Transportation investments were the single largest item of the budget during Repelita V, forming nearly 18 percent of total planned development expenditures. Funds for road improvements in Repelita VI (1994-1999) were planned to be kept at levels similar to those of the first half of the decade, but the Asian financial crisis of 1997-1998 and its concurrent political upheaval resulted in less spending than originally intended.

During the 1990s, road improvements were substantial and aimed at a much wider variety of projects than before. Explicit attention was given to connecting rural and less densely populated areas, and to infrastructure improvements outside of the major islands. Roads that had previously been under the control of local kabupaten governments were reassigned to the national government and were rapidly improved. The large increases in budgeted spending translated into huge improvements in the network. For instance, according to new data described in the next section, in 1990, 84 percent of Sulawesi’s network of national roads was unpaved. However, after a decade, only 46 percent of the network remained unpaved. In Sumatra, 68 percent of the network was unpaved in 1990, but by 2000, only 30 percent of the network was unpaved.
Importantly, road improvements during this period were designed to adhere to long-term national spatial plans. These plans dictated that particular regions should receive infrastructure improvements, and they were revised very infrequently (approximately once a decade). This suggests that the road authorities were not very sophisticated in their targeting, as they were not regularly responding to changes in outcomes, and it also suggests that location fixed effects can remove much of the targeting bias.

As the road network improved and rapidly expanded, Indonesia’s manufacturing sector grew considerably. From 1985 to 1992, manufactured exports grew at an average annual rate of over 20 percent in real terms, while the share of labor intensive manufactures grew from 40 percent of exports in 1982 to over 60 percent in 1992.7

7 For more details on the rise of labor-intensive manufacturing in Indonesia, see Hill (2000).

After the Asian Financial Crisis (1997-1998), in which Indonesia experienced a massive exchange-rate depreciation that caused a financial crisis and political upheaval, spending on transport infrastructure slowed considerably. Moreover, local governments began to assert more authority during Indonesia’s program of decentralization, and this involved transferring the maintenance of many national roads to local kabupaten governments. Anecdotal evidence suggests that many kabupaten governments were not well equipped to maintain the roads under their charge, and roads began to deteriorate. A special report by The Economist found that by 2007, infrastructure was the top contributor to the cost of doing business in Indonesia.8 Many urban road networks, particularly in Jakarta and other large cities, remain woefully inadequate to handle the growing number of cars, trucks, and motorcycles. Congestion contributes to higher commuting times and causes serious environmental problems.

2.1 To Do

• Read Hudalah, Delik (2010) “Peri-urban planning in Indonesia: Contexts, approaches and institutional capacity”. PhD Thesis. Expand the discussion of spatial planning.

3 Data and Measurement

In this section, I first discuss new data that document these highway improvements and their subsequent deterioration, and I explain how they are used to construct a panel of transport costs between locations. I also discuss Indonesia’s Survei Industri (SI), an annual census of manufacturing plants with over 20 employees. More details on the data I use, as well as their various sources, can be found in Appendix A.

3.1 Data on Highway Improvements

Many of the roads and highways used in Indonesia today have been around in some form for centuries, meaning that their effects can only be studied by using variation in quality over time. This type of variation is different from the spatial variation in infrastructure access often used in the literature (Michaels, 2008; Donaldson, 2009). An understanding of the effects of road quality improvements is likely to be more relevant for policymakers acting in developing countries, since it is often much cheaper to repair and upgrade existing roads than to build new ones. Building new roads requires land acquisition, detailed feasibility studies, and considerably more materials than resurfacing existing roads.

8 The Economist, “A Special Report on Indonesia: A Golden Chance,” September 12, 2009.

Data on the evolution of road quality in Indonesia come from a unique source: the Integrated Road Management System (IRMS), maintained by the Department of Public Works (Departemen Pekerjaan Umum, or DPU). In the late 1980s, DPU began to conduct extensive annual surveys of its road networks, collecting data along kilometer-post intervals of all major highways. Road quality surveys were conducted by a team of surveyors, who collected data on surface type, width, and interval data for computing the international roughness index (IRI).9 The original dataset is extremely detailed, with more than 1.2 million kilometer-post-interval-year observations. Although some of the road-link identifiers changed as roads were upgraded and given new function classifications, it is possible to merge the kilometer-post interval data to shapefiles of the road networks. This yields a panel of quality along major inter-urban roads from 1990 to 2007.10 Figures 1, 2, and 3 depict the evolution of pavement along the highway networks of Java, Sumatra, and Sulawesi respectively. These show considerable spatial variation in the timing and extent of the improvements, and they also highlight the magnitude of the road improvement program.

3.2 Measuring Transport Costs

Measuring the transport costs associated with regional trade is extremely challenging in the Indonesian context. A common approach in the trade literature is to first estimate a gravity equation, using detailed data on regional trade flows, and to back out transport costs from parameter estimates.11 Unfortunately, regional trade flow data have never been systematically collected in Indonesia, so this approach is infeasible. Another method involves backing out transport costs from price differences. This requires invoking an iceberg trade costs assumption (Samuelson, 1954), prior knowledge of where certain goods are produced, and observations of prices of those goods in various locations.12 Although BPS collects detailed data on goods prices used in constructing the CPI, they do so only for a limited number of provincial capital cities, making it difficult to exploit much spatial variation.13 Moreover, many of these provincial capitals are also ports, so trade between them would not necessarily rely on using the road network. It is also difficult to pin down goods that are only produced in a single location.14

9 The international roughness index (IRI) is a measure of road quality that was developed by the World Bank in the 1980s. It is constructed as the ratio of a vehicle’s accumulated suspension motion (in meters) to the distance travelled by the vehicle during measurement (in kilometers). Expressed in units of slope (m/km), IRI is a characteristic of a road’s longitudinal profile. See Appendix A.1.2 for more details on what IRI is and how it was measured.
10 Appendix A.1 presents more detail about the road quality data, and in particular the process of merging the interval data to network shapefiles and the creation of variables.
11 See Anderson and Van Wincoop (2004).
12 For instance, using the locations of mines producing different varieties of salt in colonial India and information on changes in the railway network, Donaldson (2009) relates changes in price differences to changes in railway access to obtain the effect of railroad improvements on transport costs.

Faced with these challenges, I construct a reasonable proxy for transport costs using the data available to me. The proxy for transport costs is based on road roughness. When faced with potholes, ragged pavement, or unpaved surfaces, drivers slow down, and this reduction in speed increases travel time and hence the cost of travel. Of course, there is not a one-to-one relationship between road roughness and speed, because drivers choose the speed at which they travel, and different preferences for ride smoothness or the desired arrival time might induce different choices of speed. Yu et al. (2006) provide a mapping between subjective measures of ride quality and roughness at different speeds. This mapping can be used to infer the maximum speed that one can travel over a road with a given roughness level in order to maintain a certain level of ride quality. Given this roughness-induced speed limit, it is straightforward to calculate travel times along network arcs and to compute the shortest path between kabupaten centroids, using travel time as the single cost factor (Dijkstra, 1959). Note that the travel times on road sections were computed using speeds derived from the extremely detailed kilometer-post-interval roughness data, which were then aggregated to form cost measures for the network arcs.15 Travel time is a useful way of measuring transport costs, because it is correlated with distance (and hence fuel consumption) and should also be related to drivers’ wage bills.
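The travel-time construction described above can be illustrated with a minimal Python sketch. It is not the procedure used in the paper: the roughness-to-speed thresholds in `max_speed_kmh` are hypothetical placeholders rather than the Yu et al. (2006) mapping, and the three-node network, its arc lengths, and all function names are invented for illustration. Under those assumptions, the sketch converts interval-level roughness into a speed cap, sums interval travel times along each arc, and runs Dijkstra's algorithm with travel time as the single cost factor.

```python
import heapq


def max_speed_kmh(iri):
    """Hypothetical roughness-induced speed cap (km/h) for an IRI in m/km.

    The thresholds are illustrative assumptions, NOT the Yu et al. (2006) mapping.
    """
    if iri < 4.0:
        return 80.0
    if iri < 8.0:
        return 50.0
    return 25.0


def arc_travel_time(intervals):
    """Hours needed to traverse an arc given (length_km, iri) intervals."""
    return sum(length / max_speed_kmh(iri) for length, iri in intervals)


def build_graph(arcs):
    """Turn {(u, v): intervals} into an undirected adjacency list of hours."""
    graph = {}
    for (u, v), intervals in arcs.items():
        hours = arc_travel_time(intervals)
        graph.setdefault(u, []).append((v, hours))
        graph.setdefault(v, []).append((u, hours))
    return graph


def shortest_travel_times(graph, source):
    """Dijkstra's algorithm: minimum travel time from source to each node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, hours in graph.get(u, []):
            nd = d + hours
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Invented toy network of three centroids; each arc is a list of (km, IRI) intervals.
arcs = {
    ("A", "B"): [(40.0, 2.0), (40.0, 2.0)],  # 80 km of smooth road
    ("B", "C"): [(50.0, 6.0)],               # 50 km of moderately rough road
    ("A", "C"): [(60.0, 10.0)],              # 60 km of very rough road
}
times = shortest_travel_times(build_graph(arcs), "A")
print(times)
```

In this toy network the quickest route from A to C runs through B (2.0 hours), because the direct link is rough and takes 2.4 hours. Lowering the roughness of the direct link to an IRI of 2 would cut its travel time to 0.75 hours and make it the shortest path, which is the kind of within-route variation over time that the measure is meant to capture.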
Conducting surveys of trucking firms in throughout Indonesia, the Asia Foundation (2008) found that fuel and labor costs were the largest contributors to vehicle operating costs, reinforcing confidence in this measure.16 While most of the variation in travel times will be Another approach would be to make use of price data taken from consumption surveys, such as the SUSENAS. Unfortunately, prices are not directly recorded in those surveys, and the researcher must back out unit values from total quantities consumed and total values purchased. As discussed by Deaton (1997), unit values are subject to considerable measurement problems. Because differences in unit values across locations only compound these difficulties, it isn’t clear if much can be learned from trying to predict them. 14 Yet another approach would be to draw on civil engineering models, such as the World Bank’s HDM-IV, which estimate land transport costs more directly, through models of how vehicles consume different amounts of fuel along roads of different geometries and roughness levels. These models require a number of choices for calibration parameters and involve detailed data, some of which I have for this context, but others which I do not. The closest attempt at using this approach in economics is Combes and Lafourcade (2005). 15 See Appendix A.1.4 for more details. Note that the travel time measure incorporates a continuous measure of road quality, the international roughness index (IRI), rather than a simpler binary measure for whether not a road is paved. This was done to better match the transportation literature, but both measures are highly correlated. 16 Fuel and labor costs amounted to 53 percent of vehicle operating costs on average, according to their surveys. Other cost factors included lubricants and tires (13%), and other maintenance costs (4%), all of which should increase as cars are driven on rougher roads and experience wear and tear. 
The remaining cost factors are depreciation (5%), interest (10%), and others (5%) which should be less affected by road 13


from differences in quality of given roads over time, some new toll roads were also constructed during the period, creating variation in physical distances (and speeds) that is also captured in the measure.17 Table 2 presents summary statistics of average transport costs between a given kabupaten and all other kabupatens on that island for Java, Sulawesi, and Sumatra, for the period 1990-2005. Physical distances did not change substantially, because only a few toll roads opened up over the period, and these new toll roads were confined exclusively to Java. On Java, the average distance falls very slightly, but travel times decrease by 17 percent from 1990 to 2000. Although physical distances remained unchanged in Sumatra, travel times fell by 24.3 percent, on average, from 1990 to 2000. Similarly, travel times in Sulawesi fell by an average of 38.3 percent over the same period, despite no change in physical distances.18 The average summary statistics presented here mask substantial geographic and temporal variation in the areas that received the largest improvements in transport costs. Looking across Java, the largest improvements in travel times over the 1990-2000 period occurred in Central Java (Figure 4). For Sumatra, the largest reductions in travel times occurred in the provinces of Riau, Jambi, Bengkulu, and South Sumatra (Figure 5). For Sulawesi, the provinces that received the largest improvements in average transport costs were Gorontalo and South Sulawesi (Figure 6).
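As a rough illustration of the travel-time construction described in this section, the following Python sketch converts segment-level roughness into a speed cap, aggregates segment travel times to network arcs, and runs Dijkstra's algorithm between nodes. The linear roughness-to-speed rule (`speed_from_roughness`), its parameters, and the toy network are hypothetical stand-ins; they are not the Yu et al. (2006) mapping or the IRMS data.

```python
import heapq

def speed_from_roughness(iri, v_max=80.0, v_min=10.0, slope=5.0):
    """Hypothetical linear rule: higher IRI (m/km) means a lower speed cap (km/h)."""
    return max(v_min, v_max - slope * iri)

def arc_travel_time(segments):
    """Sum travel time in hours over (length_km, iri) segments of one arc."""
    return sum(length / speed_from_roughness(iri) for length, iri in segments)

def shortest_travel_times(graph, source):
    """Dijkstra's algorithm; graph maps node -> list of (neighbor, hours)."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        t, node = heapq.heappop(queue)
        if t > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for nbr, hours in graph.get(node, []):
            cand = t + hours
            if cand < best.get(nbr, float("inf")):
                best[nbr] = cand
                heapq.heappush(queue, (cand, nbr))
    return best

# Toy network (made-up numbers): the direct A-C road is short but very rough.
arcs = {
    ("A", "B"): [(10.0, 2.0), (10.0, 4.0)],
    ("B", "C"): [(30.0, 12.0)],
    ("A", "C"): [(40.0, 14.0)],
}
graph = {}
for (u, v), segs in arcs.items():
    hours = arc_travel_time(segs)
    graph.setdefault(u, []).append((v, hours))  # roads are two-way
    graph.setdefault(v, []).append((u, hours))

times = shortest_travel_times(graph, "A")
print(times)
```

In this toy case the quickest route from A to C runs through B, because the shorter direct road is rough enough to be slower. This is the sense in which roughness, and not only distance, drives the cost measure.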

3.3

Survey of Manufacturing Firms

Data on manufacturing plants come from Indonesia’s Annual Survey of Manufacturing Establishments (Survei Tahunan Perusahaan Industri Pengolahan, or SI). The SI is intended to be a complete annual enumeration of manufacturing plants with over 20 employees. Administered by the Industrial Statistics Division of Indonesia’s central statistical agency (Badan Pusat Statistik, or BPS), the survey is extremely detailed, recording information on plant employment sizes, their industry of operation, cost variables, and measures of output and value added. Importantly for this work, enumerators recorded each plant’s operating location at the kabupaten level, enabling me to link the
roughness.
17 Toll roads are always coded with minimum levels of roughness when they are introduced. Because the fee for using toll roads is generally very small compared to the value of goods or services shipped, I ignore it when measuring transport costs.
18 Another stark feature of the table is the significant increase in travel times from 2000 to 2005. This is due both to the rapid deterioration of roads typical in tropical countries and to changes in government road management policies following Indonesia’s decentralization.


firm-level data to data on transport costs and other location characteristics.19 Although the dataset has firm-level identifiers that enable the recovery of a panel structure, in practice firms do not change their kabupaten of residence.20 I instead treat the data as a repeated cross-section of new firms, each of which begins production during the 1990-2005 period. Over this period, transport costs first improved rapidly and then deteriorated. New firms are counted when they appear in the dataset having never appeared before. Occasionally firms were not surveyed during their first year of operation, but since enumerators record each firm’s starting year, I can accurately time the entry of all firms in the sample.21
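The entry-timing rule described above can be sketched as follows. The record layout `(plant_id, survey_year, start_year)` is a hypothetical simplification of the SI files, with `None` standing in for a missing starting year; the rule uses the reported starting year when one exists and otherwise falls back on the first year the identifier appears.

```python
def entry_years(records):
    """Map each plant identifier to its entry year.

    records: iterable of (plant_id, survey_year, start_year_or_None).
    """
    first_seen = {}
    reported = {}
    for plant_id, survey_year, start_year in records:
        # Track the earliest survey year in which the plant appears.
        if plant_id not in first_seen or survey_year < first_seen[plant_id]:
            first_seen[plant_id] = survey_year
        # Keep the first reported starting year, if any.
        if start_year is not None and plant_id not in reported:
            reported[plant_id] = start_year
    return {pid: reported.get(pid, first_seen[pid]) for pid in first_seen}

# Made-up records: p1 was first surveyed a year after it started operating;
# p2 has no reported starting year, so first appearance is used.
records = [
    ("p1", 1993, 1991),
    ("p1", 1992, 1991),
    ("p2", 2003, None),
    ("p2", 2002, None),
]
entries = entry_years(records)
print(entries)
```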

3.4

To Do

• Need to add a discussion of ports and the creation of links between islands.
• Need to add information about the source of toll road improvements here.
• More discussion of sample selection (dropping state-owned enterprises) would be prudent.

4

Trends in Industrial Location

In this section, I present some reduced form evidence on how the locations of Indonesian manufacturing plants changed in response to changes in transport costs. My aim is to keep this analysis as descriptive as possible, without imposing much structure on the data. I first discuss how industrial concentration measures have evolved over time for different industries. I then discuss trends in how new firms located in different types of regions. Finally, I link the changes in observed industrial concentrations to changes in market access.

4.1

Measures of Industrial Concentration

In the period from 1985 to 1996, Indonesian manufacturing was marked by a substantial number of new entrants. The upper-left panel of Figure 7 plots a time series of the number of new firms operating each year. In 1985, almost 1,700 new firms entered the market, and Throughout the discussion, I use plants and firms interchangeably. This is not technically correct, because in principle, multiple plants could be owned and operated by the same firm. However, in practice, it is likely that less than 5% of plants in the dataset are operated by multi-plant firms (Blalock and Gertler, 2008). 20 When firms do change locations, it is generally due to a coding error, since they often switch back to their original location in the next year. 21 Note that the starting year variable was not collected after 2000, so for 2001-2005, entry is determined by the first year that the plant’s unique identifier appears in the panel. 19


entry peaked in 1990 with over 2,100 new firms. However, in 1997, Indonesia experienced a massive exchange-rate depreciation that caused a financial crisis and political upheaval, and this coincided with a large reduction in entry for the years 1997-1999. Entry recovered to pre-crisis levels by 2001, though the series exhibits some volatility from 2001-2005.22 As new firms entered the Indonesian market, they moved away from existing agglomerations, and this resulted in reductions in industrial concentration across space. The simplest way to measure this is to construct the spatial employment Herfindahl index, which measures the probability that any two workers, randomly selected in a given year, work in the same region. The index is defined by $H_t^k = \sum_{r=1}^{R} (\lambda_{rt}^k)^2$, where $\lambda_{rt}^k$ is the share of industry $k$’s employment in region $r$ at time $t$. This index was constructed for each 5-digit ISIC code and year, and the upper-right panel of Figure 7 depicts how the mean and median of this index evolved over time. From 1985 to 1996, average spatial concentration across all industries fell from 0.139 to 0.111, a reduction of over 20 percent.23 The index was roughly flat during the crisis, though it did increase slightly in 2001, leveling off from 2002-2005.24 Ellison and Glaeser (1997) emphasize that typical measures of spatial concentration, such as the spatial Herfindahl, do not adequately account for lumpiness in the plant size distribution. That is, an industry should not be classified as concentrated simply because it has a few large firms that dominate employment shares. To investigate the possibility that the drop in employment Herfindahls was due to changes in the plant size distribution, I construct plant size Herfindahls for each industry-year, given by $HP_t^k = \sum_{s=1}^{M^k} (z_{st}^k)^2$, where $z_{st}^k$ is the share of workers in industry $k$ who work in plant $s$ at time $t$, for $s = 1, ..., M^k$.
As depicted in the bottom-left panel of Figure 7, plant size Herfindahls were decreasing on average from 1985-1996. This indicates that some of the observed reduction in spatial concentration might be a product of the fact that new entrants over this period were generally smaller firms, and the reduction in concentration across plants mechanically corresponds to reductions in the spatial concentration measure. However, even when we compute the concentration index proposed by Ellison and Glaeser (1997) to adjust for changes in the distribution of plant sizes, reductions in concentration are still very apparent.25 As depicted in the bottom right panel of Figure 7, Some of the volatility in 2001-2005 may be accounted for by the fact that the starting year variable disappears in 2001 and I have to construct it by using the first year a new firm identifier appears in the panel. 23 This difference is statistically significant at an α = 0.05 level using a two-sample comparison of means (t = 3.760, 2-sided p−value = 0.0003, 1-sided p−value = 0.0001). 24 Note that in selecting the sample of industries for the analysis, I dropped industries if they were missing too many firm-year observations to construct a consistent measure over time or if they had fewer than 10 firms throughout the period. In a few instances, very similar ISIC codes were merged together in order to avoid dropping both from the sample. 25 To construct the Ellison and Glaeser (1997) index, it is helpful to first compute the “sum of square 22


the Ellison and Glaeser (1997) index experiences a sharp decline, from an average of 0.046 in 1985 to 0.033 in 1996, a reduction of over 25 percent.26 Interestingly, while spatial Herfindahls increased on average from 2001 to 2005, the Ellison and Glaeser (1997) index stays more or less flat over this period, suggesting that the post-crisis increases in spatial employment Herfindahls were due entirely to the fact that new firms entering after the crisis were larger than before. A variety of industries experienced reductions in concentration, as shown in Table 3. This table summarizes average changes in concentration across all 5-digit ISIC codes nested within a given 2-digit ISIC code. The furniture and wood products industry (ISIC 33) experienced the largest dispersion over the 1990-1996 period, with an average percent change in employment concentration of almost 30 percent. Reductions in concentration occurred for eight out of the ten 5-digit codes that comprise the 2-digit ISIC, with major reductions for producers of wood veneer and excelsior (ISIC 33114), home furnishings (ISIC 33230), and handicraft and wood carving (ISIC 33140). Food and beverage producers (ISIC 31) also experienced large reductions in concentration, with major dispersion for canned fruits and vegetables (ISIC 31130) and for preserved, processed meat (ISIC 31112). Interestingly, industrial concentration even fell for the iron and steel industry (ISIC 37100). However, within industries, there are substantial differences in concentration trends. For instance, while storable processed foods, such as coconut and palm oil (ISIC 31151) and canned, processed seafood (ISIC 31140) experienced reductions in concentration, more perishable food products, such as tofu and tempe (ISIC 31242) and ice (ISIC 31230) remained flat or experienced increases in concentration.
One hypothesis suggested by this comparison is that, during a period of large transport improvements, producers of durable goods experience reductions in concentration, while producers of highly perishable products
deviations” measure of industrial concentration, $G_t^k$, defined by:

$$G_t^k = \sum_{r=1}^{R} \left( \lambda_{rt}^k - x_{rt} \right)^2$$

where $x_{rt}$ is kabupaten $r$’s share of total manufacturing employment in year $t$, and $\lambda_{rt}^k$ is the share of industry $k$’s employment that is located in kabupaten $r$ in year $t$. This index is meant to capture the fact that certain regions ought to have larger proportions of industries, simply because they are larger and have greater populations. The Ellison and Glaeser (1997) index can be thought of as an adjustment to $G_t^k$, given by:

$$EG_t^k = \frac{G_t^k - \left(1 - \sum_{r=1}^{R} x_{rt}^2\right) HP_t^k}{\left(1 - \sum_{r=1}^{R} x_{rt}^2\right)\left(1 - HP_t^k\right)}$$

26 This difference is statistically significant at an α = 0.10 level using a two-sample comparison of means (t = 1.863, 2-sided p−value = 0.0653, 1-sided p−value = 0.0326).


remain unaffected. Highly perishable products need to be produced very close to where they are consumed, while more durable goods can be produced farther away, provided that transport costs are sufficiently low. Moreover, while finished metal, machines, and electronics (ISIC 38) experienced a modest reduction in concentration over the period, manufacturers of radios and television (ISIC 38320) and producers of optical and photographic equipment (ISIC 38500) experienced increases in concentration. These industries are more skill intensive than others and are probably more subject to Marshallian agglomeration economies than other manufacturers. This suggests that while road improvements might induce the dispersion of producers of low skill durable goods, they may not affect industries in which strong external economies are important. Table 4 presents difference-in-difference estimates and differential trend estimates for industrial concentration. The treatment group is the group of industries for which inventory shares of output are “large”, relative to the control group; larger inventory shares are a proxy for durability. The change in industrial concentration for treated industries is negative relative to the change in industrial concentration for the control industries.27
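To make the three concentration measures used in this section concrete, here is a minimal Python sketch of the spatial employment Herfindahl, the plant size Herfindahl, and the Ellison and Glaeser (1997) adjustment, following the definitions given in the text and footnotes. The function names and the toy employment figures are illustrative assumptions, not values from the SI data.

```python
def spatial_herfindahl(industry_emp):
    """H = sum over regions of squared industry employment shares."""
    total = sum(industry_emp.values())
    return sum((e / total) ** 2 for e in industry_emp.values())

def plant_herfindahl(plant_sizes):
    """HP = sum over plants of squared within-industry employment shares."""
    total = sum(plant_sizes)
    return sum((z / total) ** 2 for z in plant_sizes)

def ellison_glaeser(industry_emp, total_emp, plant_sizes):
    """EG = (G - (1 - sum x^2) HP) / ((1 - sum x^2)(1 - HP))."""
    ind_total = sum(industry_emp.values())
    all_total = sum(total_emp.values())
    # G: squared deviations of industry shares from overall manufacturing shares.
    g = sum(
        (industry_emp.get(r, 0.0) / ind_total - total_emp[r] / all_total) ** 2
        for r in total_emp
    )
    x2 = sum((e / all_total) ** 2 for e in total_emp.values())
    hp = plant_herfindahl(plant_sizes)
    return (g - (1.0 - x2) * hp) / ((1.0 - x2) * (1.0 - hp))

# Made-up example: one industry with four plants spread over three regions.
industry_emp = {"r1": 60, "r2": 30, "r3": 10}
total_emp = {"r1": 500, "r2": 300, "r3": 200}
plant_sizes = [40, 20, 30, 10]

h = spatial_herfindahl(industry_emp)
hp = plant_herfindahl(plant_sizes)
eg = ellison_glaeser(industry_emp, total_emp, plant_sizes)
print(h, hp, eg)
```

In this toy case the raw Herfindahl is sizeable while the adjusted index is negative: with only four plants, the observed regional shares are no more concentrated than the plant size distribution alone would produce.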

4.2

Regional Trends

Another way of exploring the reductions in concentration is to investigate trends in the types of regions that were receiving new plants. Figure 9 depicts the shares of new firms locating in cities (defined as of 1990), in kabupatens that are neighbors of cities, in neighbors of neighbors of cities, and in other kabupatens, classified as rural. In 1985, 40 percent of new firms located in cities, but by 1996, only 27 percent of new firms located in cities. Neighbors of cities experienced an increase in the share of new firms of 19 percent (from 33 percent in 1985 to 39 percent in 1996), while neighbors-of-neighbors experienced a 73 percent increase (from 14 percent in 1985 to 24 percent in 1996). Rural shares are mostly flat, except for some noise during the crisis.28 Figure 10 decomposes the new firm location shares into those attributed to durable
27 Note that these results on dispersion are very different from Sjöberg and Sjöholm (2004), who argue that over the 1980-1996 period, spatial concentration remained more or less unchanged. There are several reasons for discrepancies between my analysis and theirs, but the most important, I suspect, is that I use kabupatens as my spatial unit of analysis, while they use provinces. Since many of the changes I observe take place within provinces, it is not surprising that they find mixed results, while I find evidence pointing towards dispersion. Moreover, I also use a finer level of industrial classification (5-digit ISIC) to compute these indices, and I have access to surveys for every year. Without proper cleaning, these indices are very sensitive to outliers (misreported employment), especially for smaller industries, and for this reason, I drop all 5-digit ISIC codes with fewer than 10 firms.
28 Reclassifying kabupatens by physical centroid distance to the nearest 1990 city reveals similar trends.


goods industries (high inventory shares) and perishable goods industries (low inventory shares). It is apparent that most of the movements in these percentages are coming from the high-inventory share firms, which I’ve argued should be responding the most to changes in transport cost. For instance, of the 13 percentage point reduction in the share of new firms locating in cities between 1985 and 1996, 11 percentage points are attributable to firms with high inventory shares, while only 2 percentage points are attributable to firms with low inventory shares. Similarly, the entire 10 percentage point increase in the share of new firms locating in neighbors-of-neighbors of cities can be explained by high inventory share firms.

4.3

Regression Analysis

We can summarize the effects of road improvements on the activity of new manufacturing plants by estimating a series of fixed-effects regressions, exploiting variation in the timing and placement of road improvements across regions in Indonesia. Let $r = 1, ..., R$ index regions (kabupatens), let $j = 1, ..., J$ index 5-digit industrial sectors, and let $t = 1, ..., T$ index years. In Table 5, we begin by estimating models of the following form:

$$y_{rjt} = \beta MP_{rt} + \varepsilon_{rjt}$$

where $MP_{rt}$ denotes region $r$’s market potential in year $t$ and $\varepsilon_{rjt}$ is an error term. In the first column of both panels, we assume that the errors take on the following form:

$$\varepsilon_{rjt} = \delta_r + \delta_j + \delta_t + \Delta\varepsilon_{rjt}$$

where the $\Delta\varepsilon_{rjt}$ are assumed to be strictly exogenous. This specification controls for all time-invariant unobservables that influence outcomes for particular sectors and for particular regions, and it also controls for all national unobservables that affect outcomes in each year. In the second column, we weaken the restrictions on the error term by making the following assumption:

$$\varepsilon_{rjt} = \delta_r + \delta_{jt} + \Delta\varepsilon_{rjt}$$

As before, we control for all time-invariant unobservables that affect regions, but we now also control for any omitted variables that influence outcomes in each sector-year. In all specifications, standard errors are clustered at the region level to deal with possible serial correlation in the disturbances, as well as to allow for arbitrary correlations between the disturbances affecting different industries in the same region. Overall, these regressions show strong positive relationships between market potential

and new employment, new firms, and new value added. The dependent variable and explanatory variables are both expressed in logs, so that the coefficients can be interpreted as elasticities. A one percent increase in a region’s market potential results in a 0.6 percent increase in new firms and a 3.7 percent increase in employment. Since market potential varies only at the region-year level, the above specifications cannot rule out the possibility that other time-varying, region-specific confounders might actually be driving the results. However, since we have industry-level data, and we know that certain industries (durable goods producers) are more likely to be influenced by better market access, we can exploit variation within region-years in the effects of market potential across sectors. In the third column, we estimate models of the following form:

$$y_{rjt} = \gamma \left( D_j \times MP_{rt} \right) + \varepsilon_{rjt}$$

where $D_j$ is an indicator for whether or not the industry is a producer of durable goods, and

$$\varepsilon_{rjt} = \delta_{rj} + \delta_{jt} + \Delta\varepsilon_{rjt}$$

These specifications do not allow us to estimate the entire effect of improving market potential. Instead, they deliver estimates of the differential effect of market potential improvements on durable goods producers, relative to non-durables producers. The coefficient estimates are smaller, but still significant. Relative to non-durables producers, a one percent increase in a region’s market potential results in a 0.08 percent increase in new durable goods plants and a 0.28 percent increase in workers in the durable sector.
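The first-column specification, with additive region, sector, and year effects, can be illustrated with a within transformation on simulated data. The sketch below is an assumption-laden toy: it uses a balanced panel (where subtracting the three sets of group means and adding back twice the grand mean sweeps out the fixed effects exactly), made-up parameter values, and no clustered standard errors. It is not the estimation code behind Table 5.

```python
import random
from collections import defaultdict

def demean_three_way(values, keys):
    """Remove additive region, sector, and year means (balanced panel only)."""
    grand = sum(values) / len(values)
    out = list(values)
    for dim in range(3):  # 0 = region, 1 = sector, 2 = year
        sums, counts = defaultdict(float), defaultdict(int)
        for v, k in zip(values, keys):
            sums[k[dim]] += v
            counts[k[dim]] += 1
        for idx, k in enumerate(keys):
            out[idx] -= sums[k[dim]] / counts[k[dim]]
    return [o + 2.0 * grand for o in out]

def within_slope(y, x, keys):
    """OLS slope of y on x after sweeping out the three sets of fixed effects."""
    y_t, x_t = demean_three_way(y, keys), demean_three_way(x, keys)
    return sum(a * b for a, b in zip(x_t, y_t)) / sum(a * a for a in x_t)

# Simulated data; every number here is invented for illustration.
rng = random.Random(0)
R, J, T, beta_true = 30, 4, 6, 0.6
d_r = [rng.gauss(0, 1) for _ in range(R)]
d_j = [rng.gauss(0, 1) for _ in range(J)]
d_t = [rng.gauss(0, 1) for _ in range(T)]
# Log market potential varies by region-year and is correlated with the
# region effect, so pooled OLS would be biased.
log_mp = {(r, t): d_r[r] + rng.gauss(0, 1) for r in range(R) for t in range(T)}

keys, x, y = [], [], []
for r in range(R):
    for j in range(J):
        for t in range(T):
            keys.append((r, j, t))
            x.append(log_mp[(r, t)])
            y.append(beta_true * log_mp[(r, t)] + d_r[r] + d_j[j] + d_t[t]
                     + rng.gauss(0, 0.05))

beta_hat = within_slope(y, x, keys)
print(round(beta_hat, 3))
```

With unbalanced panels or interacted effects such as the sector-year and region-sector terms in the later columns, simple demeaning of this kind no longer applies and an iterative or dummy-variable approach would be needed.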

4.4

Summary

Overall, this analysis indicates that during the sample period, Indonesia has experienced reductions in industrial concentration. Areas that received improvements and expansions in market potential as a result of the road improvement program experienced a growth in manufacturing activity and employment, on average. Unfortunately, a limitation of this reduced form analysis is that it does little to shed light on why the road improvement program has had its effects. Moreover, the estimated parameters in the regression analysis are not particularly relevant for policymakers, since this average effect presumably masks substantial heterogeneity in the responses of different regions to improved accessibility. Heterogeneity in outcomes is expected from the theory and is important for policymakers in determining the likely impacts of different road improvement projects on different regions. To shed light on the mechanisms and provide a


richer set of counterfactual predictions, in the next section I develop and estimate a structural model of firm location choice.

4.5

To Do

• It would be good here to discuss how in models of multiple equilibria, we might not expect to see the same effects in both directions. For instance, if road improvements induce dispersion, road deterioration might not have any effect, because the dispersion created new equilibria. There are threshold effects that would require adjustment back to the old equilibrium, and deterioration does not reach those thresholds.
• I should spend some time discussing the industries chosen for this analysis, how I dropped state-owned enterprises and very small industries (with fewer than 10 firms, and hence extremely noisy spatial concentration measures).

5

Structural Model

In this section, I present an extension to the firm location choice model of Head and Mayer (2004b). The original model is static, has multiple regions, and focuses on a single manufacturing sector, whose firms produce differentiated products under increasing returns to scale and imperfect competition. Randomness is introduced to firm marginal costs which, with specific distributional assumptions, leads to a likelihood function that can be used to estimate the parameters governing choice probabilities. Importantly, the model is partial equilibrium, in the sense that firms take demand and marginal cost variables as given and ignore the effect that their location choices have on these cost shifters. Unlike the original model, I explicitly allow for multiple industrial sectors, highlighting the importance of sectoral differences in location choice parameters. I also allow for unobserved productive amenities, common to all firms and all industries, that shift marginal cost functions at particular locations. Because these unobservable amenities may be directly correlated with wages, rents, and other factors influencing marginal costs, identification of the choice model’s parameters requires conditional moment restrictions and estimation becomes substantially more involved. There are R regions, indexed by r = 1, ..., R. We will often use o and d to index regions, denoting origin and destination regions for the production and consumption of tradable manufactures. As in Krugman (1991), there are two types of consumers who work in two different sectors: unskilled agricultural workers, and skilled manufacturing workers. Workers in the agricultural sector, which is constant returns to scale, are tied to their land and perfectly immobile. Workers in manufacturing, which is produced under increasing returns to scale and imperfect competition, are initially perfectly mobile and free to decide upon a location that maximizes their utility. However, once they choose a location, skilled workers must live and work entirely in that location, spending all of their income locally.

5.1

Consumer Preferences

There are two types of goods consumed by individuals: manufactured products and agricultural products. Manufactured goods are differentiated products produced in one of $K_s$ industries, indexed by $k = 1, ..., K_s$. Consumers choose quantities of varieties from each industry, indexed by $j \in [0, 1]$, and a quantity of the agricultural good, $A$, to maximize the following utility function:

$$U = C \left( \prod_{k=1}^{K_s} M_k^{\mu_k} \right) A^{1-\mu} \quad \text{where} \quad \sum_{k=1}^{K_s} \mu_k + \mu = 1 \qquad (1)$$

This utility function represents Cobb-Douglas preferences over both agriculture and CES aggregates of manufacturing varieties for each industry, $M_k$, which are given by:

$$M_k = \left( \int_0^1 q^k(j)^{\frac{\sigma_k - 1}{\sigma_k}} \, dj \right)^{\frac{\sigma_k}{\sigma_k - 1}}, \qquad \sigma_k \geq 1, \quad k = 1, ..., K_s$$

where $q^k(j)$ is the quantity of industry $k$’s variety $j$ consumed, and $\sigma_k$ is an industry-specific parameter governing the elasticity of substitution between an industry’s varieties. As $\sigma_k$ tends to 1, varieties in that industry become less substitutable for one another, and competition within the industry is weaker as a result. As $\sigma_k$ grows larger, the varieties in industry $k$ become more substitutable, and competition grows more intense.29 We solve the consumer’s optimization problem by first choosing optimal bundles within a given industry and then by determining how to distribute income across industries. Using this approach and the usual CES calculus, it is straightforward to show that a consumer’s demand for variety $j$ in industry $k$ is given by:

$$q^k(j) = \frac{p^k(j)^{-\sigma_k} \mu_k Y}{(P^k)^{1-\sigma_k}} \qquad (2)$$

29 $C$ is a constant used to normalize the scale of a consumer’s indirect utility function, given by $C^{-1} = (1-\mu)^{1-\mu} \prod_{k=1}^{K_s} \mu_k^{\mu_k}$.


where $P^k$ is the CES price index specific to industry $k$, given by:

$$P^k = \left( \int_0^1 p^k(i)^{1-\sigma_k} \, di \right)^{\frac{1}{1-\sigma_k}} \qquad (3)$$
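A quick numerical check of (2) and (3) may be helpful. The Python sketch below approximates the unit mass of varieties by a small number of equally weighted varieties, computes the CES price index and demands, and confirms that total spending on the industry equals the Cobb-Douglas budget $\mu_k Y$. The prices and parameter values are arbitrary illustrative choices.

```python
def ces_price_index(prices, sigma):
    """Discretized version of (3): each variety has mass 1/n on the unit interval."""
    n = len(prices)
    return (sum(p ** (1.0 - sigma) for p in prices) / n) ** (1.0 / (1.0 - sigma))

def ces_demand(p, price_index, sigma, mu_k, income):
    """Demand for one variety, equation (2)."""
    return p ** (-sigma) * mu_k * income / price_index ** (1.0 - sigma)

# Arbitrary illustrative values.
prices = [1.0, 1.2, 0.9, 1.5]
sigma, mu_k, income = 4.0, 0.3, 100.0

P = ces_price_index(prices, sigma)
quantities = [ces_demand(p, P, sigma, mu_k, income) for p in prices]

# Integrate p * q over the unit mass of varieties (each has mass 1/n).
spending = sum(p * q for p, q in zip(prices, quantities)) / len(prices)
print(P, spending)
```

The spending total matches $\mu_k Y$ regardless of the prices chosen, while cheaper varieties receive disproportionately larger quantities when $\sigma_k$ is high.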

Given this, we can also show that the consumer’s indirect utility function, mapping prices and income into the maximized value of utility (or welfare), is given by:

$$V(p_A, P^1, ..., P^{K_s}, Y) = \frac{Y}{(p_A)^{\mu} \prod_{k=1}^{K_s} (P^k)^{\mu_k}}$$

where $p_A$ is the price of the agricultural good.30

5.2

Agriculture

Each region, indexed by $r = 1, ..., R$, is endowed with a mass $L_r^A$ of unskilled workers who each produce one unit of a homogenous agricultural good, $A$. The agricultural good is freely traded across locations, so that its price is the same everywhere. It is produced with constant returns to scale, so that a worker’s agricultural wage is equal to his or her marginal product. We set $w_A = p_A \equiv 1$, so that the agricultural wage is the numeraire of the model.

5.3

Manufacturing and Trade

Manufacturing varieties are produced with increasing returns to scale under Dixit-Stiglitz imperfect competition. Conditional on operating in region $o$, the cost to produce a quantity $q_o^k(i)$ of variety $i$ in industry $k$ is given by:

$$c\left(q_o^k(i)\right) = F_o^k + m_o^k(i) w \, q_o^k(i)$$

where $F_o^k$ represents fixed costs of production in region $o$, and $m_o^k(i) w$ represents the marginal labor requirement. Note that the marginal cost is specific to the industry, operation region, and the variety. We will be more explicit about exactly what $m_o^k(i)$ is in a few sub-sections. Because of fixed costs, firms choose a single location in which to produce, shipping their products to all other locations. All firms face industry-specific iceberg transport costs (Samuelson, 1954), representing the amount that must be produced in region $o$ in order to

30 For a derivation of (2) and (3), see Appendix B.1.


deliver one unit of the product to region $d$. These are given by:

$$\tau_{od}^k \geq 1$$

Due to the transport technology, $(\tau_{od}^k - 1)$ units of the good “melt away” while being transported, so that only 1 unit is delivered to the destination region. We assume that $\tau_{oo}^k = 1$ for all regions $o$, so that transport within a region is costless.31 Transport costs are also assumed to satisfy a triangle inequality:

$$\tau_{od}^k \leq \tau_{os}^k \tau_{sd}^k \quad \text{for all } s = 1, ..., R$$

This assumption rules out any cross-region arbitrage opportunities in transport. Finally, for simplicity, we assume that the transport cost for industry $k$ is just an industry-specific constant times an average transport cost measure, $\tau_{od}$:

$$\tau_{od}^k = \eta^k \tau_{od} \quad \text{for all } k = 1, ..., K_s \qquad (4)$$
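The multiplicative triangle inequality is easy to verify for any candidate matrix of iceberg costs. The following Python sketch does so by brute force; the function name and the two small matrices are hypothetical and serve only to show one matrix that satisfies the condition and one that violates it.

```python
from itertools import product

def satisfies_triangle_inequality(tau, tol=1e-12):
    """Check tau[o][d] <= tau[o][s] * tau[s][d] for every triple (o, s, d)."""
    regions = range(len(tau))
    return all(
        tau[o][d] <= tau[o][s] * tau[s][d] + tol
        for o, s, d in product(regions, repeat=3)
    )

# Hypothetical iceberg costs for three regions, with tau_oo = 1.
tau_ok = [
    [1.0, 1.2, 1.5],
    [1.2, 1.0, 1.3],
    [1.5, 1.3, 1.0],
]
# Same matrix, but shipping 0 -> 2 directly costs more than routing via 1
# (1.2 * 1.3 = 1.56 < 2.0), which would create an arbitrage opportunity.
tau_bad = [
    [1.0, 1.2, 2.0],
    [1.2, 1.0, 1.3],
    [2.0, 1.3, 1.0],
]

print(satisfies_triangle_inequality(tau_ok))
print(satisfies_triangle_inequality(tau_bad))
```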

Since my measure of transport costs involves travel times based on the quality of road infrastructure, this assumption allows for different industries to be charged different rates for travel time expended during transport. As noted in Section 3.2, fuel and labor costs are the largest components of shipping costs in Indonesia, and both vehicle and labor costs increase with travel time. The form of the consumer’s utility function implies that all consumers in all locations consume every variety of every industry. This is obviously a gross simplification of reality, but it helps to clearly show how trade affects firm profits. Conditional on locating in region $o$, a firm in industry $k$ has operating profits that are equal to the sum of profits obtained from shipping its output to all destination locations:

$$\Pi_o^k(i) = \sum_{d=1}^{R} \pi_{od}^k(i) = \sum_{d=1}^{R} \left( p_{od}^k(i) - \tau_{od}^k m_o^k(i) w \right) q^k(i)$$

Firms are operating under Dixit-Stiglitz monopolistic competition, and they choose prices ignoring their effects on regional industry price indices, $P_d^k$. From the structure of competition and consumer demands, we can show that the firm’s optimal pricing formula is

31 I have data that would enable me to relax this assumption, since I have very good measures of within-region transport. However, I’d need to think about exactly how to work with this, as calculating travel times to the farthest border might be a bit tricky to implement.


given by:32

$$p_{od}^k(i) = \left( \frac{\sigma_k}{\sigma_k - 1} \right) m_o^k(i) w \, \tau_{od}^k \qquad (5)$$

This expression implies a mill pricing strategy, as $p_{od}^k(i) = \tau_{od}^k p_o^k(i)$. Moreover, prices are just industry-specific markups over the firm’s marginal cost. The size of the markup is governed by the size of $\sigma_k$, the elasticity of substitution. As $\sigma_k$ grows larger and industry $k$’s products become more substitutable, competition between firms in that industry intensifies and the markup falls. As $\sigma_k$ grows smaller and industry $k$’s products become less substitutable, firms are less pressured by competition and can increase their markups. Note that a firm’s profits from locating in region $o$ and shipping to region $d$ are given by:

$$\pi_{od}^k(i) = \left( p_{od}^k(i) - \tau_{od}^k m_o^k(i) w \right) q^k(i)$$

Plugging in expressions for consumer demand (2), transport costs (4), and optimal pricing (5), we can rewrite this expression as:

$$\pi_{od}^k(i) = \gamma_k \left( m_o^k(i) w \right)^{1-\sigma_k} Y_d \left( \frac{P_d^k}{\tau_{od}} \right)^{-(1-\sigma_k)}$$

where $\gamma_k$ is a constant specific to industry $k$, and $P_d^k$ denotes the price index for industry $k$’s products consumed in region $d$.33 Summing across destination locations, we obtain the firm’s total operating profits from locating in region $o$:

$$\Pi_o^k(i) = \gamma_k \left( m_o^k(i) w \right)^{1-\sigma_k} \left[ \sum_{d=1}^{R} Y_d \left( \frac{P_d^k}{\tau_{od}} \right)^{-(1-\sigma_k)} \right] \qquad (6)$$

This expression tells us that a firm’s profits from operating in region $o$ depend on an industry-specific constant, $\gamma_k$, marginal costs, and the expression in brackets, which is

32 A derivation of this result can be found in Appendix B.2.
33 The exact form of the constant $\gamma_k$ is given by:

$$\gamma_k = \frac{1}{\sigma_k} \left( \frac{\sigma_k \eta^k}{\sigma_k - 1} \right)^{1-\sigma_k} \mu_k$$

This constant depends on the industry’s elasticity of substitution, transport cost parameters, and Cobb-Douglas budget share parameters for industry $k$.


defined as the industry-specific real market potential:

$$RMP_o^k \equiv \sum_{d=1}^{R} Y_d \left( \frac{P_d^k}{\tau_{od}} \right)^{-(1-\sigma_k)}$$

This is a weighted sum of regional incomes, where the weights decline in transport costs and increase in the price index for that specific industry. In this model, market potential is the crucial variable that links firm profits from locating in region $o$ to transport costs between that region and all others. As a location becomes closer to larger sources of demand, $RMP_o^k$ increases. In the formula for real market potential, the price index should be thought of as a measure of the intensity of competition. Lower price indexes correspond to locations with lower markups and fiercer competition, while higher price indexes correspond to larger markups and weaker competition. Firms in industry $k$ want to locate in regions that are closer to larger markets, but this preference is tempered by the competitiveness of those locations, reflected in the price indexes. The industry-specific real market potential is closely related to another variable, nominal market potential, discussed in an older literature on economic geography (Harris, 1954):

$$NMP_o = \sum_{d=1}^{R} \frac{Y_d}{\tau_{od}}$$

The difference between $NMP_o$ and $RMP_o^k$ is that real market potential explicitly accounts for competition, through the inclusion of price indices.

5.4

Firm Location Choices

Firms locate in region $o$ if and only if their expected operating profits minus fixed costs from operating in region $o$ are greater than those of all other locations. Following Head and Mayer (2004b), we assume that the fixed cost of locating in region $o$ for a firm operating in industry $k$, $F_o^k$, is the same across all locations, i.e. $F_o^k = F^k$ for all $o = 1, ..., R$. Given this assumption, fixed costs do not play any role in location choice decisions and hence can be ignored. Define $V_o^k(i)$ to be firm $i$’s value function for region $o$, a simple log transformation of operating profits minus fixed costs:

$$V_o^k(i) \equiv \frac{\ln \Pi_o^k(i) - \ln \gamma_k}{\sigma_k - 1} = \frac{1}{\sigma_k - 1} \ln RMP_o^k - \ln\left( m_o^k(i) w \right)$$

Now, assume that the log of the firm’s marginal cost of producing in region o can be written as follows:

$$\ln\left(m_o^k(i)\, w\right) = x_o' \beta_i - \xi_o - \varepsilon_{io}$$

Here, $x_o$ denotes a $(K \times 1)$ vector of observable shifters of marginal costs, while $\xi_o$ denotes an unobserved factor, common to all firms and industries. It will be useful to think of $\xi_o$ as an unobserved productive amenity (e.g. average ability of the workforce, or quality of life in region o), which shifts marginal costs for all firms and all industries. The term $\varepsilon_{io}$ is a stochastic component of marginal costs, assumed to be distributed i.i.d. type 1 extreme value across locations for each firm. Note that the parameters $\beta_i$ are specific to each firm. Define $D_i$ to be a $(D \times 1)$ vector of firm-specific observables (including industry dummies). Also, let $v_{ik}$ denote a random value component for $x_{ok}$, and define $\alpha_k = 1/(\sigma_k - 1)$. Using this notation, we can write the firm’s value function as:

$$V_o^k(i) = \alpha_i \ln RMP_o^k - x_o' \beta_i + \xi_o + \varepsilon_{io} \tag{7}$$

where

$$\alpha_i = \alpha + \sum_{r=1}^{D} \pi_{\alpha,r} D_{i,r} + \sigma_\alpha v_i$$

$$\beta_{i,k} = \beta_k + \sum_{r=1}^{D} \pi_{k,r} D_{i,r} + \sigma_k v_i, \qquad k = 1, ..., K$$
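As a rough illustration of how the firm-specific coefficients enter the value function in (7), the following sketch builds $\alpha_i$ and $\beta_{i,k}$ from mean parameters, firm observables, and a single random draw. All names and parameter values are hypothetical, and the draw is treated as a scalar $v_i$, as in the displayed equations.

```python
def firm_coefficient(mean, pi, sigma, demographics, draw):
    """mean + sum_r pi_r * D_{i,r} + sigma * v_i."""
    return mean + sum(p * d for p, d in zip(pi, demographics)) + sigma * draw


def value_function(ln_rmp, x_o, alpha_i, beta_i, xi_o, eps_io):
    """Equation (7): alpha_i * ln RMP - x_o' beta_i + xi_o + eps_io."""
    cost_term = sum(x * b for x, b in zip(x_o, beta_i))
    return alpha_i * ln_rmp - cost_term + xi_o + eps_io


# Hypothetical firm: two industry dummies (firm is in the first sector)
# and one standard-normal-style draw v_i.
demographics = [1.0, 0.0]
draw = 2.0

alpha_i = firm_coefficient(1.0, [0.5, -0.2], 0.3, demographics, draw)
beta_i = [
    firm_coefficient(0.4, [0.1, 0.0], 0.0, demographics, draw),  # e.g. log wage
    firm_coefficient(0.1, [0.0, 0.3], 0.2, demographics, draw),  # e.g. log rent
]

# Hypothetical location: ln RMP = 2, cost shifters x_o, amenity xi_o.
v = value_function(ln_rmp=2.0, x_o=[1.0, 3.0], alpha_i=alpha_i,
                   beta_i=beta_i, xi_o=0.2, eps_io=0.0)
```

Setting every $\pi$ and $\sigma$ to zero collapses each firm-specific coefficient to its mean, which is the constant coefficient case.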

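In this setup, $\pi_{k,r}$ is a coefficient measuring how $\beta_{i,k}$ varies with firm characteristics, while $\sigma_k$ represents the standard deviation of firm valuations for $x_{ok}$. Given this setup, and absorbing the minus sign on the cost shifters in (7) into the coefficients, we can write the utility a firm has from choosing location o as follows:

$$\begin{aligned}
V_{oi} &= \alpha_i \ln RMP_o^k + \sum_{k=1}^{K} x_{ok} \beta_{i,k} + \xi_o + \varepsilon_{io} \\
&= \left[ \alpha \ln RMP_o^k + \sum_{k=1}^{K} x_{ok} \beta_k + \xi_o \right] + \left[ \left( \sum_{r=1}^{D} D_{i,r} \pi_{\alpha,r} + \sigma_\alpha v_i \right) \ln RMP_o^k + \sum_{k=1}^{K} \left( \sum_{r=1}^{D} D_{i,r} \pi_{k,r} + \sigma_k v_i \right) x_{ok} \right] + \varepsilon_{io} \\
&= \delta_o + \mu_{oi} + \varepsilon_{io}
\end{aligned}$$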

The first term in this expression, $\delta_o$, is the mean utility of choosing location o and is common to all firms in all industries. It depends on $(\alpha, \beta')'$, the mean preference parameters, as well as $\xi_o$, the unobserved productive amenity. The second term, $\mu_{oi}$, represents mean-zero heteroskedastic deviations from mean utility, capturing the effects of the sectoral differences. Firm i in industry k chooses to operate in location o if $V_o^k(i) > V_d^k(i)$ for all other locations, d. This expression implicitly defines the set of observed and unobserved variables that lead to the choice of location o. Formally, we can denote this set by $A_o$:

$$A_o = A_o(x, \xi_o, \delta_\cdot; \theta_2) = \left\{ (D_i, v_i, \varepsilon_{io}) \;\middle|\; V_o^k(i) \geq V_d^k(i) \;\; \forall\, d = 1, ..., R \right\}$$

5.5

Identification of the Choice Model

In an ideal experiment for studying firm location choices, we could randomly assign locations with factor prices, infrastructure access, and exogenous geographic features, and we could record firms’ location choice responses. However, in observational studies, market access and other cost shifters are not randomly assigned, and instead reflect a host of factors, such as the availability of commercial land for real estate, local supplies of labor and consumers, and other characteristics unobserved to researchers. If unobserved productive amenities are present, they will clearly raise the profitability of locating in certain regions, which, ceteris paribus, increases the number of workers and firms who locate there and raises incomes. Hence, the model implies that market access, wages, and rents will be directly correlated with unobserved productive amenities. This necessitates the use of instrumental variables: variables that are correlated with the endogenous choice characteristics but uncorrelated with omitted factors explaining the choices of firms. Distinguishing between omitted factors, such as natural advantages, and other theories in understanding why agglomerations form is a classic identification problem in empirical urban economics (Ellison and Glaeser, 1999). While cross-sectional instruments are doubtless useful, finding them is challenging and their exclusion restrictions are often difficult to motivate. However, if unobserved natural advantages are constant over time, the use of panel data and fixed effects can help us distinguish between natural advantages and transport costs theories. Panel data is useful for another reason: if firm cost functions are time-invariant, then as location characteristics change, with increases or decreases in wages, rents, and market access, the identifying power of our model improves.
Although the parameters of the model could be estimated from data on a single market, with firms making only one choice, such an approach seems far removed from the ideal experiment of repeatedly assigning locations with different bundles of characteristics and observing responses (Nevo, 2000). Nevertheless, in most applications of discrete choice to location decisions, such as Head and Mayer (2004b) and, in the Indonesian context, Henderson and Kuncoro (1996) and Deichmann et al. (2005), authors only study a cross section of firm choices. To improve the identifying power of the discrete choice model, I estimate the parameters using variation in location characteristics across all years possible, where each year is viewed as a separate market. The fact that I make use of multiple time periods in the estimation suggests that dynamic considerations should be brought to the forefront of the modeling exercise. However, to avoid the complications with assuming that firms form expectations over the evolution of various cost shifters over space and time, we assume that the pool of firms is large enough so that each firm ignores the effect that its presence has on both current and future cost shifters. The parameters of the model can then be thought of as the effect of current plus future valuations. Abusing notation, collect all of the choice characteristics for location o at time t as $x_{ot} = [\ln RMP_o, x_{ot}']'$, and let $\beta_i = (\alpha_i, \beta_i')'$ collect all of the choice parameters. With multiple time periods, the firm’s value function for location o at time t is the following:

$$V_{oit} = \delta_{ot} + \sum_{r} (D_{ir} \beta_r + v_i \sigma)' x_{ot} + \varepsilon_{iot}$$

where the mean utility terms, $\delta_{ot}$, are given by:

$$\delta_{ot} = x_{ot}' \beta + \xi_o + \upsilon_{ot}$$

Here, $\xi_o$ represents any time-invariant unobserved productive amenity for region o (e.g. favorable geography), while $\upsilon_{ot}$ can be thought of as an unobserved, time-varying productivity shock to location o. Conditional moment restrictions on $\upsilon_{ot}$ will enable us to identify the model. I make use of three different conditional moment restrictions in order to identify the choice parameters. The first is similar to a strict exogeneity condition in linear panel models (Chamberlain, 1984):

$$E[\, \upsilon_{ot} \mid \xi_o, x_{o1}, ..., x_{oT} \,] = 0 \tag{8}$$

In words, this restriction says that once we condition on the unobserved fixed factor, $\xi_o$, the productivity shocks are uncorrelated with the entire history of the location characteristics, $x_{o1}, ..., x_{oT}$. I view this condition as a benchmark case. Making use of it is an improvement over existing work, but in practice it is unlikely to hold. For instance, if policymakers were targeting more productive areas with better infrastructure, we would expect past productivity shocks, $\upsilon_{o,t-1}$, to be correlated with future market access, $x_{ot}, x_{o,t+1}, ..., x_{oT}$. Motivated by these dynamic targeting concerns, a second conditional moment restriction relaxes the first:

$$E[\, \upsilon_{ot} \mid \xi_o, x_{o1}, ..., x_{o,t-1} \,] = 0 \tag{9}$$

This is a weak exogeneity moment restriction (Chamberlain, 1992), stating that current productivity shocks are innovations, uncorrelated with all previous realizations of the $x_{ot}$’s. Note that this is a strictly weaker identifying assumption than (8), and if (8) holds, then so does (9). Finally, I make use of time-varying instruments for the location characteristics:

$$E[\, \upsilon_{ot} \mid \xi_o, z_{o1}, ..., z_{o,t-1} \,] = 0 \tag{10}$$

As I will document, conditional on fixed location effects, rainfall and earthquakes have significant predictive power as cost-shifters for wages, rents, and the regional incomes which determine a location’s market access. Indonesia is located in the Pacific Ring of Fire, a region noted for its extreme seismic activity, and large earthquakes are common but difficult to predict.34 Regional rainfall totals also shift agricultural productivity, which helps to determine local wages and regional incomes. While these variables seem like legitimate instruments, the firms who comply with these instruments (i.e. choose to locate in a place because of favorable rainfall or earthquake shocks) might be extremely short-sighted. Hence, the local average treatment effect that these instruments deliver is potentially not very interesting.
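The role of the location fixed effect under the benchmark restriction (8) can be illustrated with a textbook within (fixed effects) estimator. This is only a sketch with one regressor and made-up, noise-free data; it is not necessarily the estimator used in the paper, and the weaker restrictions (9) and (10) would call for instrumental variables methods not shown here.

```python
def within_estimator(panel):
    """Slope from regressing location-demeaned delta on location-demeaned x.

    panel maps a location name to a list of (x, delta) pairs over time.
    Demeaning by location removes the time-invariant amenity xi_o.
    """
    num = den = 0.0
    for obs in panel.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        d_bar = sum(d for _, d in obs) / len(obs)
        for x, d in obs:
            num += (x - x_bar) * (d - d_bar)
            den += (x - x_bar) ** 2
    return num / den


def pooled_ols(panel):
    """Slope from a pooled regression that ignores the location effects."""
    pts = [p for obs in panel.values() for p in obs]
    x_bar = sum(x for x, _ in pts) / len(pts)
    d_bar = sum(d for _, d in pts) / len(pts)
    num = sum((x - x_bar) * (d - d_bar) for x, d in pts)
    den = sum((x - x_bar) ** 2 for x, _ in pts)
    return num / den


# Hypothetical data: true wage coefficient is -1.5, and the high-amenity
# location B (xi = 5) also has higher log wages than location A (xi = 2).
beta_true = -1.5
amenity = {"A": 2.0, "B": 5.0}
log_wage = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}
panel = {
    loc: [(x, beta_true * x + amenity[loc]) for x in xs]
    for loc, xs in log_wage.items()
}

beta_within = within_estimator(panel)
beta_pooled = pooled_ols(panel)
```

Because the amenity is positively correlated with wages in this example, the pooled slope is pushed toward zero, while the within estimator recovers the true coefficient.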

5.6

Estimation of the Choice Model

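The assumption on the joint distribution of the $\varepsilon_{iot}$’s gives rise to an expression for the probability that firm i chooses location j at time t, approximated by averaging over $NS$ simulation draws $v_i^s$:

$$\hat{P}_{ijt} = \frac{1}{NS} \sum_{s=1}^{NS} \frac{\exp\left\{ \delta_{jt} + \sum_{k=1}^{K} x_{jt}^k \left( \sigma_k v_i^s + \pi_{k1} D_{i1} + ... + \pi_{kD} D_{iD} \right) \right\}}{1 + \sum_{m=1}^{J} \exp\left\{ \delta_{mt} + \sum_{k=1}^{K} x_{mt}^k \left( \sigma_k v_i^s + \pi_{k1} D_{i1} + ... + \pi_{kD} D_{iD} \right) \right\}} \tag{11}$$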

where the utility from choosing the outside option is normalized to zero in each period.35

34 A figure would be useful here.

35 Note that because I do not observe new firms in every location at every period of time, the outside option (roughly, locating outside of kabupatens on Java, Sumatra, and Sulawesi) changes across years. To a certain extent, this variation in the choice set over time is artificial, but I don’t know how to resolve this. Restricting to the set of locations that are chosen every year results in too few locations and gets rid of much of the market access variation that I’d like to exploit.

I estimate the choice model using a two-step procedure.36 In the first step, I estimate the $\delta_{jt}$’s and $\theta_2$ using maximum simulated likelihood. Although a full search over the $\delta_{jt}$’s and $\theta_2$ is possible in principle, the large number of locations in the dataset and the multiple years over which those locations are observed make it computationally infeasible in practice. Consequently, I maximize the likelihood function only over $\theta_2$. For each value of $\theta_2$, I choose $\delta_{jt} = \delta_{jt}(\theta_2)$ to ensure that the mean utility components satisfy a market share constraint. In the second step, to recover the linear parameters, I estimate the following regression, making use of conditional moment restrictions (8), (9), and (10):

$$\hat{\delta}_{ot} = x_{ot}' \beta + \xi_o + \upsilon_{ot}$$

Specific details, such as how to compute the correct gradient in the maximum likelihood step and how to work out standard errors, correcting for the fact that the $\hat{\delta}_{ot}$’s are estimated, are relegated to Appendix C.
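One common way to impose a market share constraint of this kind is the fixed-point iteration of Berry et al. (1995), sketched below for a single period. The text does not confirm that this is the author's implementation. The sketch simplifies heavily: one location characteristic, no firm observables $D_i$, a fixed list of simulation draws, and hypothetical parameter values throughout.

```python
import math


def simulated_shares(delta, x, sigma, draws):
    """Average logit choice probabilities over simulation draws, as in (11).

    delta: mean utilities, one per location
    x: one scalar characteristic per location
    sigma: standard deviation of the random coefficient on x
    draws: simulated values of v_i
    The outside option has utility normalized to zero.
    """
    n_loc = len(delta)
    shares = [0.0] * n_loc
    for v in draws:
        expu = [math.exp(delta[j] + sigma * v * x[j]) for j in range(n_loc)]
        denom = 1.0 + sum(expu)
        for j in range(n_loc):
            shares[j] += expu[j] / denom / len(draws)
    return shares


def invert_shares(observed, x, sigma, draws, tol=1e-10, max_iter=10000):
    """Find delta(theta_2) such that simulated shares match observed shares.

    Iterates delta <- delta + log(s_observed) - log(s_simulated(delta)).
    """
    outside = 1.0 - sum(observed)
    delta = [math.log(s / outside) for s in observed]  # plain logit start
    for _ in range(max_iter):
        sim = simulated_shares(delta, x, sigma, draws)
        step = [math.log(o / s) for o, s in zip(observed, sim)]
        delta = [d + st for d, st in zip(delta, step)]
        if max(abs(st) for st in step) < tol:
            break
    return delta


# Hypothetical check: generate shares from known mean utilities, then
# recover those mean utilities from the shares alone.
true_delta = [0.5, -0.2, -1.0]
x = [1.0, 2.0, 0.5]
draws = [-1.0, -0.3, 0.2, 0.9, 1.4]
observed = simulated_shares(true_delta, x, 0.8, draws)
recovered = invert_shares(observed, x, 0.8, draws)
```

The recovered mean utilities would then serve as the dependent variable in the second-step regression. When the random coefficient's standard deviation is zero, the starting value is already the solution, which is the constant coefficient logit case.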

5.7

Further Work

• Brief discussion of collinearity problems. Given that market access should change both wages and rents, how can we identify parameters? Some of the identification comes from adjustment: if wages and rents respond slowly to changes in market access, then we can effectively hold them constant in the regression analysis. Collinearity is only a problem if wages and rents respond perfectly to changes in market access or productivity shocks.
• I need to spend more time discussing the time-invariant cost function assumption. If new technologies are introduced and change cost functions, the panel data that I exploit isn’t helpful at all. Wie (2000) suggests that Indonesian manufacturing in the 1990s and early 2000s is very labor intensive and characterized by a strong absence of technical progress (one reason why post-crisis growth has been so slow), so perhaps this is not a strong concern. But for this and a variety of reasons, I absolutely need to add time fixed-effects in estimation of the choice model.

36 This two-step estimation procedure is similar to that used in Langer (2010) in studying demographic preferences for new vehicles, although that study uses second-choice data.

6

Results

6.1

Summary Statistics

Making use of a variety of data sources, I construct measures to proxy for the characteristics that firms examine when determining where to locate. Summary statistics for these location characteristics can be found in Table 7. The first set of characteristics are the endogenous cost shifters: wages, commercial rents, market potential, and local GDRP, all of which vary over space and time. Wage and rent data are taken directly from the SI, measured as the average wage rate and commercial rent paid by all firms surveyed in a particular location in a particular year. I also include a variety of physical and agro-climatic characteristics, including area, ruggedness, elevation, land uses, and physical distances to Jakarta, provincial capitals, and major ports. Physical distances to Malaysia and Singapore were also included. These variables were created using GIS software and a variety of raster files and digital elevation maps, and more details on these variables are contained in Appendix A.3.

Table 6 presents summary statistics for the 19,730 new manufacturing plants entering over the period from 1990 to 2005. To make the model more parsimonious, I classify firms into one of 9 industrial sectors (Food and Beverage Processing; Textiles and Clothing; Wood Products; Paper Products; Chemical and Oil Products; Ceramics, Glass, Clay, and Non-Metallic Products; Iron and Steel Products; Finished Metal Products; and Other Manufacturing), using 2-digit industry definitions provided by BPS. Many new firms are textile producers (23%) and food producers (23%), while wood producers (19%) and finished metal producers (11%) are also significantly represented in the data. The least represented product category is other manufacturing (3%). Firms were also much more likely to enter earlier in the period. Entry in the crisis period (1997-1999) is significantly reduced, but it increases starting in 2001 to pre-crisis levels.

6.2

Constant Coefficient Logit Results

Table 9 presents results from estimating a constant coefficient version of the random coefficients logit model. This effectively sets σ and π equal to zero in (11), and the mean technological parameters are estimated from linear regression (Berry, 1994). The exact form of the linear regression is the following:

$$y_{jt} \equiv \log(s_{jt}) - \log(s_{0t}) = x_{jt}' \beta + \alpha_j + \varepsilon_{jt}$$
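The dependent variable in a regression of this type can be built directly from entry counts. The sketch below assumes the standard log-share difference from Berry (1994), $\log s_{jt} - \log s_{0t}$, for a single year; the location names and counts are hypothetical, and every location is assumed to receive at least one entrant so that the logarithm is defined.

```python
import math


def berry_dependent_variable(entrants_by_location, outside_entrants):
    """Return log(s_j) - log(s_0) for each location j in one year.

    entrants_by_location: dict of location -> number of new firms (all > 0)
    outside_entrants: number of new firms choosing the outside option (> 0)
    """
    total = sum(entrants_by_location.values()) + outside_entrants
    s0 = outside_entrants / total
    return {
        loc: math.log(n / total) - math.log(s0)
        for loc, n in entrants_by_location.items()
    }


# Hypothetical counts of new entrants in one year.
y = berry_dependent_variable({"A": 30, "B": 10}, outside_entrants=60)
```

Locations with zero entrants in a given year have an undefined log share, which is one practical reason the choice set can vary from year to year.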

This specification is used to highlight some points about the methodology. Columns 1 and 2 present estimates of the mean technology parameters for a single cross-section of firms, here using all new entrants in the year 1990 to construct market shares. Column 1 includes no other control variables, while Column 2 adds several fixed controls (e.g. elevation, ruggedness, type of land). Though not always statistically significant, the signs on the wage and rent variables are positive, suggesting that firms are more profitable when they locate in places with higher factor prices. This is not exclusively a feature of my dataset; for instance, Head and Mayer (2004b) find positive wage coefficients in many of their specifications predicting the location decisions of Japanese car manufacturers in Europe. The problem is that the wage and rent variables are correlated with unobservable productive amenities, and without instruments, estimation on a single cross-section of firms cannot hope to recover accurate parameter estimates. This is the same problem observed by Berry et al. (1995) in their study of consumer demand for cars; a conditional logit gives a positive relationship between prices and demand, but this is because prices are correlated with unmeasured product quality. Columns 3-6 use the entire panel of locations (from 1990-2005) and estimation includes fixed effects, which should control for any time-invariant unobservable productive amenities. Column 3 shows that the wage coefficient, which was previously positive and insignificant, is now negative and statistically significant. Coefficients on rents are still positive, but they are insignificant (potentially because of mis-measurement). The coefficient on market potential is large and statistically significant, and the ratio of the wage and market potential coefficients suggests that firms would be willing to accept a 2.4 percent wage increase for a 1 percent increase in a location’s market potential.
In Columns 4 and 5, I include the density of paved roads as another location characteristic.37 Road density is used frequently as a proxy for the quality of local infrastructure, and its coefficient is sometimes interpreted to reflect market access (Deichmann et al., 2005). Although both measures are correlated, including both in the specification suggests that firms are more willing to accept wage increases in response to market potential improvements than they are for improvements in road density. This is evidence that while manufacturing firms do care about local road quality, the usefulness of better roads is driven primarily through a market access channel. To the extent that other infrastructure improvements were occurring at the same time as the road improvements, my coefficients might be biased, picking up more than they should.

37 The density of paved roads is measured as total km of paved roads per 100 km² of land.

In Column 6, in addition to the market potential variable, I include as an explanatory variable the log of the median percentage of electricity consumed by firms in the region that is produced by the state electricity company, Perusahaan Listrik Negara (PLN). Electricity provision was improved dramatically over the period, and the coefficients on this variable are large and statistically significant. They imply that firms are willing to accept a 2.7 percent wage increase for a 1 percent increase in electricity provided by the state. The coefficient on market potential is attenuated slightly, suggesting that firms would accept only a 2.1 percent increase in wages in response to a 1 percent improvement in market potential. Nevertheless, the electricity effects do not overwhelm the market potential effects, which are still large and significant. In all future specifications, I include the median PLN share as a time-varying location characteristic.

In Column 7, I allow the effect of distance in the market potential variable to vary non-linearly. Recall that the market potential variable used in the analysis, $MP_{ot}$, was defined as

$$MP_{ot} = \sum_{d=1}^{R} \frac{Y_{dt}}{\tau_{odt}}$$

where $Y_{dt}$ is real GDRP for kabupaten d at time t, and $\tau_{odt}$ is the roughness-based transport cost measure between locations, measured as the travel time (in hours) between locations o and d at time t. In Column 7, I estimate the effects of market potential when it is defined slightly differently:

$$\widetilde{MP}_{ot} = \sum_{d=1}^{R} \frac{Y_{dt}}{f(\tau_{odt})}$$

where

$$f(\tau_{odt}) = \delta_0 + \delta_1 \tau_{odt} + \delta_2 \tau_{odt}^2 + \delta_3 \tau_{odt}^3$$
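The two definitions of market potential differ only in how travel time is transformed, which a short sketch makes concrete. The function names, GDRP values, travel times, and polynomial coefficients below are hypothetical; the default coefficients (0, 1, 0, 0) reproduce the baseline measure with $f(\tau) = \tau$.

```python
def distance_function(tau, d0, d1, d2, d3):
    """Third-order polynomial f(tau) = d0 + d1*tau + d2*tau^2 + d3*tau^3."""
    return d0 + d1 * tau + d2 * tau ** 2 + d3 * tau ** 3


def market_potential(gdrp, travel_times, coefs=(0.0, 1.0, 0.0, 0.0)):
    """Sum over destinations of Y_dt / f(tau_odt) for one origin and year."""
    return sum(
        y / distance_function(t, *coefs)
        for y, t in zip(gdrp, travel_times)
    )


# Hypothetical origin with two destinations.
gdrp = [100.0, 40.0]          # real GDRP of each destination
travel_times = [2.0, 4.0]     # hours from the origin

mp_baseline = market_potential(gdrp, travel_times)
mp_flexible = market_potential(gdrp, travel_times, coefs=(1.0, 0.5, 0.0, 0.0))
```

Because the polynomial coefficients sit inside the denominator of each term, the regression is non-linear in them even though it is linear in the remaining location characteristics.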

Estimation proceeds by non-linear least squares, because the δ’s enter the sum-of-squared-residuals minimization problem non-linearly. Column 7 shows large, statistically significant coefficients for a third-order polynomial, and the implied distance function is depicted graphically in Figure 11. Results with panel instruments are presented in Columns 8, 9, and 10. In Column 8, I use dynamic panel instruments motivated by (9), the weak exogeneity moment restriction. Note that this restriction is a relaxation of the strict exogeneity moment restriction, (8), implying that shocks to a location’s productivity are innovations, uncorrelated with lagged location characteristics. Column 9 uses rainfall and earthquake variables as instruments, invoking (10), the panel IV moment condition. Column 10 combines both sets of instruments. Generally, the instrumental variables estimates imply larger willingness to pay; in Columns 8 and 10, for instance, firms would be willing to accept a 2.8 percent increase in wages for a 1 percent improvement in market potential.
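The willingness-to-pay figures quoted in this section are ratios of coefficients. Assuming both wages and market potential enter mean utility in logs, holding utility constant gives $d \ln w / d \ln MP = -\beta_{MP} / \beta_w$. The sketch below uses hypothetical coefficients picked only so that the ratio equals 2.4; they are not the estimates in Table 9.

```python
def wage_equivalent(beta_mp, beta_wage):
    """Percent wage increase a firm accepts for a 1 percent rise in MP.

    Assumes log wages and log market potential both enter mean utility
    linearly, and that the wage coefficient is negative.
    """
    if beta_wage >= 0:
        raise ValueError("wage coefficient must be negative for this ratio")
    return -beta_mp / beta_wage


# Hypothetical coefficients, not taken from the paper's tables.
wtp = wage_equivalent(beta_mp=1.2, beta_wage=-0.5)
```

The guard reflects the discussion of the cross-sectional results: with a positive wage coefficient, the ratio has no sensible willingness-to-pay interpretation.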

6.3

To Do: BLP Results

The following questions will be answered once decent-looking results are obtained:

1. Are the wage coefficients negative? Statistically significant?
2. Are the rent coefficients negative? Statistically significant?
3. Are market potential coefficients positive? Statistically significant?
4. Are the PLN share coefficients positive? Statistically significant?
5. Are any standard deviation terms significant?
6. Which industries have the largest wage sensitivity? Market potential sensitivity? PLN share sensitivity?
7. Are there significant differences in the substitutability of market potential across different industrial sectors? Do these differences conform to expectations / reduced form results?
8. How do the different IV specifications affect estimates? Are any IV specifications unrealistic?

To evaluate the fit of the model, it will be useful to discuss whether or not the predicted cross-price elasticities are sensible. The cross-wage and cross-market potential elasticities between location k and location j at time t, denoted $\eta^w_{jk,t}$ and $\eta^{MP}_{jk,t}$ respectively, are defined as follows:

$$\eta^w_{jk,t} = \frac{\partial s_{j,t}}{\partial w_{k,t}} \cdot \frac{w_{k,t}}{s_{j,t}}, \qquad \eta^{MP}_{jk,t} = \frac{\partial s_{j,t}}{\partial MP_{k,t}} \cdot \frac{MP_{k,t}}{s_{j,t}}$$

These cross-price elasticities tell us the percentage change in the share of new firms choosing location j that would result from a one-percent increase in wages (or market access) in location k. We would expect $\eta^w_{jk,t}$ to be positive; increasing wages in location k should increase demand for location j. We would expect $\eta^{MP}_{jk,t}$ to be negative; increasing market potential in location k should decrease demand for location j. Figures 12 and 13 summarize the distribution of predicted cross-wage and cross-market potential elasticities for the year 1990. They also depict how these are related to differences in location characteristics, such as physical distance, 1990 population differences, and 1990 GDRP differences.

1. Test whether or not $\eta^w_{jk,t}$ is a decreasing function of the distance between j and k. Of the difference in 1990 populations and 1990 GDRP levels?
2. Test whether or not $\eta^{MP}_{jk,t}$ is a decreasing function of the distance between j and k. Of the difference in 1990 populations and 1990 GDRP levels?
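For intuition about the expected signs, the cross-wage elasticity has a closed form in the constant coefficient logit: if log wages enter mean utility with coefficient $\beta_w$, then $\eta^w_{jk,t} = -\beta_w s_{k,t}$ for $j \neq k$. The sketch below checks this against a numerical derivative. It covers only the constant coefficient case, not the random coefficients model, and all values are hypothetical.

```python
import math


def shares(log_wages, other_utility, beta_w):
    """Logit shares with the outside option's utility normalized to zero."""
    expu = [math.exp(beta_w * lw + o) for lw, o in zip(log_wages, other_utility)]
    denom = 1.0 + sum(expu)
    return [e / denom for e in expu]


def cross_wage_elasticity(log_wages, other_utility, beta_w, k):
    """Analytic elasticity of s_j with respect to w_k for any j != k."""
    return -beta_w * shares(log_wages, other_utility, beta_w)[k]


def numerical_elasticity(log_wages, other_utility, beta_w, j, k, h=1e-6):
    """Central difference of log s_j with respect to log w_k."""
    up = list(log_wages)
    dn = list(log_wages)
    up[k] += h
    dn[k] -= h
    s_up = shares(up, other_utility, beta_w)[j]
    s_dn = shares(dn, other_utility, beta_w)[j]
    return (math.log(s_up) - math.log(s_dn)) / (2.0 * h)


# Hypothetical three-location example with a negative wage coefficient.
log_wages = [7.0, 7.2, 6.9]
other_utility = [3.0, 3.4, 2.8]   # everything in mean utility except wages
beta_w = -0.5

eta_analytic = cross_wage_elasticity(log_wages, other_utility, beta_w, k=1)
eta_numeric = numerical_elasticity(log_wages, other_utility, beta_w, j=0, k=1)
```

Note that the analytic expression does not depend on j: in the constant coefficient logit, a wage increase in location k shifts firms toward all other locations in proportion to their shares, regardless of distance. Testing whether elasticities decline with distance is therefore only informative in the random coefficients specification.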

7

Conclusion

To be written. Some ideas:

• Krugman (2009): The original core-periphery model is all about increasing returns, and as Krugman discusses in his Nobel lecture, “concentrations due to increasing returns peaked before World War II.” The agglomeration-inducing forces of lower transport costs seem to be waning in the United States, with the collapse of the manufacturing belt.
• Kim (1998): In the U.S., industrial concentration peaked in the 1930s.

References

Anbarci, N., M. Escaleras, and C. A. Register (2005): “Earthquake Fatalities: The Interaction of Nature and Political Economy,” Journal of Public Economics, 89, 1907–1933.

Anderson, J. E. and E. Van Wincoop (2004): “Trade Costs,” Journal of Economic Literature, 42, 691–751.

Azis, I. J. (1990): “Analytic Hierarchy Process in the Benefit-Cost Framework: A Post-Evaluation of the Trans-Sumatra Highway Project,” European Journal of Operations Research, 48, 38–48.

Bennett, C. R., A. Chamorro, C. Chen, H. de Solminihac, and G. W. Flintsch (2007): “Data Collection Technologies for Road Management,” Technical report, East Asia Pacific Transport Unit, World Bank.

Berry, S. (1994): “Estimating Discrete-Choice Models of Product Differentiation,” RAND Journal of Economics, 25, 242–262.

Berry, S., J. Levinsohn, and A. Pakes (1995): “Automobile Prices in Market Equilibrium,” Econometrica, 63, 841–890.

Blalock, G. and P. J. Gertler (2008): “Welfare Gains from Foreign Direct Investment Through Technology Transfer to Local Suppliers,” Journal of International Economics, 74, 402–421.

Carlton, D. (1983): “The Location and Employment Choices of New Firms: An Econometric Model with Discrete and Continuous Endogenous Variables,” Review of Economics and Statistics, 65, 440–449.

Chamberlain, G. (1984): “Chapter 22 Panel data,” in Handbook of Econometrics, Volume 2, ed. by Z. Griliches and M. D. Intriligator, Elsevier, 1247–1318.

Chamberlain, G. (1992): “Comment: Sequential Moment Restrictions in Panel Data,” Journal of Business & Economic Statistics, 10, 20–26.

Combes, P.-P. and M. Lafourcade (2005): “Transport Costs: Measures, Determinants, and Regional Policy Implications for France,” Journal of Economic Geography, 5, 319–349.

Coughlin, C. C., J. V. Terza, and V. Arromdee (1991): “State Characteristics and the Location of Foreign Direct Investment within the United States,” Review of Economics and Statistics, 73, 675–683.

Davis, D. and D. Weinstein (2003): “Market Access, Economic Geography and Comparative Advantage: an Empirical Test,” Journal of International Economics, 59, 1–23.

Deaton, A. (1997): The Analysis of Household Surveys: A Microeconometric Approach to Development Policy, World Bank Publications.

Deichmann, U., K. Kaiser, S. V. Lall, and Z. Shalizi (2005): “Agglomeration, Transport, and Regional Development in Indonesia,” World Bank Policy Research Working Paper 3477.

Dijkstra, E. W. (1959): “A Note on Two Problems in Connexion with Graphs,” Numerische Mathematik, 1, 269–271.

Donaldson, D. (2009): “Railroads of the Raj: Estimating the Impact of Transportation Infrastructure,” Job Market Paper, Unpublished.

Ellison, G. and E. L. Glaeser (1997): “Geographic Concentration in U.S. Manufacturing Industries: A Dartboard Approach,” Journal of Political Economy, 105, 889–927.

Ellison, G. and E. L. Glaeser (1999): “The Geographic Concentration of Industry: Does Natural Advantage Explain Agglomeration?” American Economic Review, 89, 311–316.

Friend, T. (2003): Indonesian Destinies, Cambridge: Belknap Press.

Harris, C. D. (1954): “The Market as a Factor in the Localization of Industry in the United States,” Annals of the Association of American Geographers, 44, 315–348.

Head, K. and T. Mayer (2004a): “Chapter 59: The Empirics of Agglomeration and Trade,” in Handbook of Regional and Urban Economics, Volume 4, ed. by J. V. Henderson and J.-F. Thisse, Elsevier, 2609–2669.

Head, K. and T. Mayer (2004b): “Market Potential and the Location of Japanese Investment in the European Union,” The Review of Economics and Statistics, 86, 959–972.

Head, K., J. Ries, and D. Swenson (1995): “Agglomeration Benefits and Location Choice: Evidence from Japanese Manufacturing Investments in the United States,” Journal of International Economics, 38, 223–247.

Helpman, E. (1998): “The Size of Regions,” in Topics in Public Economics: Theoretical and Applied Analysis, ed. by D. Pines, E. Sadka, and I. Zilcha, Cambridge University Press: Cambridge, 33–54.

Henderson, J. V. (1974): “The Sizes and Types of Cities,” The American Economic Review, 64, 640–656.

Henderson, J. V. and A. Kuncoro (1996): “Industrial Centralization in Indonesia,” The World Bank Economic Review, 10, 513–540.

Hill, H. (2000): The Indonesian Economy, Cambridge: Cambridge University Press.

Imbens, G. and J. Angrist (1994): “Identification and Estimation of Local Average Treatment Effects,” Econometrica, 62, 467–475.

Kim, S. (1998): “Economic Integration and Convergence: US Regions, 1840-1990,” Journal of Economic History, 58, 659–683.

Krugman, P. (1991): “Increasing Returns and Economic Geography,” Journal of Political Economy, 99, 483–499.

Krugman, P. (2009): “The Increasing Returns Revolution in Trade and Geography,” American Economic Review, 99, 561–571.

Langer, A. (2010): “Demographic Preferences and Price Discrimination in New Vehicle Sales,” Job Market Paper, Unpublished.

Leinbach, T. R. (1989): “Transport Policies in Conflict: Deregulation, Subsidies, and Regional Development in Indonesia,” Transportation Research Part A: General, 23, 467–475.

Michaels, G. (2008): “The Effect of Trade on the Demand for Skill: Evidence from the Interstate Highway System,” The Review of Economics and Statistics, 90, 683–701.

Nevo, A. (2000): “A Practitioner’s Guide to Estimation of Random-Coefficient Logit Models of Demand,” Journal of Economics and Management Strategy, 9, 513–548.

Redding, S. J. and D. M. Sturm (2008): “The Costs of Remoteness: Evidence from German Division and Reunification,” American Economic Review, 98, 1766–1797.

Samuelson, P. A. (1954): “The Transfer Problem and Transport Costs, II: Analysis of Effects of Trade Impediments,” The Economic Journal, 64, 264–289.

Sappington, J. M., K. Longshore, and D. Thompson (2007): “Quantifying Landscape Ruggedness for Animal Habitat Analysis: A Case Study using Bighorn Sheep in the Mojave Desert,” Journal of Wildlife Management, 71, 1419–1426.

Sayers, M. W., T. D. Gillespie, and W. D. Paterson (1986): “Guidelines for Conducting and Calibrating Road Roughness Measurements,” World Bank Technical Paper 46.

Sjöberg, Ö. and F. Sjöholm (2004): “Trade Liberalization and the Geography of Production: Agglomeration, Concentration, and Dispersal in Indonesia’s Manufacturing Industry,” Economic Geography, 80, 287–310.

Wie, T. K. (2000): “The Impact of the Economic Crisis on Indonesia’s Manufacturing Sector,” The Developing Economies, 38, 420–453.

Yu, J., E. Chou, and J. Yau (2006): “Development of Speed-Related Ride Quality Thresholds using International Roughness Index,” Transportation Research Record, 1974, 47–53.

Table 1: Transportation Budgets for Indonesia’s 5-Year Development Plans

                                  Repelita IV    Repelita V    Repelita VI
                                  FY 1984-89     FY 1989-94    FY 1994-99

Roads                                17.8           32.7          32.9
Railways and Freight                  6.7            6.4           5.6
Ports and Shipping                    8.3            6.0           4.4
Airports and Aircraft                 5.6            7.0           5.7
Total                                38.4           52.1          48.7

Transport as a Percentage
of Total Allocations                 11.6           17.6          18.8

Figures report trillions of IDR allocated to spending on transportation during Indonesia’s five year development plans (Rencana Pembangunan Lima Tahun, abbreviated as Repelita). Budget numbers were converted to 2000 IDR using OECD data on annual CPI indices. Source: Various planning documents for Repelita IV, V, and VI.

Table 2: Transport Cost Summary Statistics

Java                                     1990        1995        2000        2005

Distance on Roads (km)                  375.44      375.44      374.87      374.87
                                       (233.79)    (233.79)    (233.32)    (233.32)
                                       N = 5671    N = 5671    N = 5671    N = 5671

Roughness-based Travel Time (hours)       4.59        4.16        3.81        4.51
                                         (2.68)      (2.54)      (2.27)      (2.69)
                                       N = 5671    N = 5671    N = 5671    N = 5671

Sumatera                                 1990        1995        2000        2005

Distance on Roads (km)                  725.83      725.83      725.83      725.83
                                       (436.00)    (436.00)    (436.00)    (436.00)
                                       N = 2145    N = 2145    N = 2145    N = 2145

Roughness-based Travel Time (hours)      10.74        9.49        8.12        9.62
                                         (6.24)      (5.58)      (4.82)      (5.79)
                                       N = 2145    N = 2145    N = 2145    N = 2145

Sulawesi                                 1990        1995        2000        2005

Distance on Roads (km)                  683.69      683.69      683.69      683.69
                                       (494.97)    (494.97)    (494.97)    (494.97)
                                       N = 561     N = 561     N = 561     N = 561

Roughness-based Travel Time (hours)      13.77       10.59        8.50        8.89
                                        (10.33)      (7.03)      (5.78)      (6.08)
                                       N = 561     N = 561     N = 561     N = 561

Source: Unit of observation is a pair of kabupatens on the same island. Standard deviations in parentheses.

Table 3: Changes in Employment Herfindahls, by Industry, 1990-1996

      Description                                  Mean ∆    Median %∆    # Decreased / Total

33    Furniture and Wood Products                  -0.080      -22.6            8/10
37    Iron and Steel                               -0.047      -34.5            1/1
31    Food and Beverages                           -0.028      -11.1           21/29
34    Paper Products                               -0.017      -11.2            4/5
39    Other Manufacturing                          -0.016      -15.5            4/5
35    Chemical Products                            -0.011      -15.2            9/16
38    Finished Metal, Machines, and Electronics    -0.011      -16.0           11/14
32    Textiles                                     -0.003       -2.6           10/15
36    Ceramics, Glass, Cement and Clay Products     0.015        0.9            4/8

Source: SI and author’s calculations. Averages are taken over all 5-digit industries within a given 2-digit industry.

Table 4: Industrial Concentration Regressions Panel A: Difference-in-Differences Employment HH 1985-1996 1985-2000 post

treatedXpost

Adj. R2 N 5-Digit ISIC FE

EG Index 1985-2000 1985-2000

0.004 (0.004)

0.007 (0.006)

0.006 (0.003)*

0.007 (0.004)*

-0.062 (0.027)**

-0.061 (0.026)**

-0.050 (0.026)*

-0.049 (0.027)*

0.611 206 Yes

0.583 206 Yes

0.616 206 Yes

0.569 206 Yes

Panel B: Trend Regressions Employment HH 1985-1996 1985-2000

EG Index 1985-2000 1985-2000

trend

0.007 (0.001)***

0.005 (0.001)***

0.006 (0.001)***

0.005 (0.001)***

trendXinvShare

-0.082 (0.007)***

-0.060 (0.006)***

-0.071 (0.007)***

-0.055 (0.006)***

Adj. R2 N 5-Digit ISIC FE

0.849 1236 Yes

0.826 1648 Yes

0.883 1236 Yes

0.862 1648 Yes

Robust standard errors in parentheses. * denotes significant at the 10% level, ** denotes significant at the 5% level, and *** denotes significant at the 1% level.


Table 5: Reduced Form Regressions

Panel A: New Firms

                     (1)           (2)           (3)
log MP               0.589         0.589
                     (0.149)***    (0.149)***
log invXMP DM                                    0.078
                                                 (0.011)***
Adj. R2              0.097         0.100         0.082
N                    359264        359264        352672
Kabupaten FE         Yes           Yes           .
Year FE              Yes           .             .
Sector FE            Yes           .             .
Kabu-Year FE         .             .             Yes
Sector-Year FE       .             Yes           Yes

Panel B: Employment

                     (1)           (2)           (3)
log MP               3.776         3.776
                     (0.829)***    (0.830)***
log invXMP DM                                    0.280
                                                 (0.041)***
Adj. R2              0.096         0.102         0.093
N                    359264        359264        352672
Kabupaten FE         Yes           Yes           .
Year FE              Yes           .             .
Sector FE            Yes           .             .
Kabu-Year FE         .             .             Yes
Sector-Year FE       .             Yes           Yes

Unit of observation is a region-industry-year. Robust standard errors in parentheses, clustered at the kabupaten level. * denotes significant at the 10% level, ** denotes significant at the 5% level, and *** denotes significant at the 1% level.


Table 6: New Firm Summary Statistics

                                                        Mean     SD        N
Industrial Sector (2-Digit)
31. Food and Beverage Processing                        0.224    (0.417)   18492
32. Textiles and Clothing                               0.235    (0.424)   18492
33. Wood Products                                       0.175    (0.380)   18492
34. Paper Products                                      0.037    (0.188)   18492
35. Chemical and Oil Products                           0.098    (0.297)   18492
36. Ceramics, Glass, Clay, and Non-Metallic Products    0.076    (0.266)   18492
37. Iron and Steel Products                             0.009    (0.094)   18492
38. Finished Metal Products                             0.117    (0.321)   18492
39. Other Manufacturing                                 0.030    (0.170)   18492

Year
1990                                                    0.109    (0.312)   18492
1991                                                    0.086    (0.280)   18492
1992                                                    0.086    (0.280)   18492
1993                                                    0.068    (0.253)   18492
1994                                                    0.078    (0.268)   18492
1995                                                    0.078    (0.269)   18492
1996                                                    0.061    (0.238)   18492
1997                                                    0.038    (0.191)   18492
1998                                                    0.027    (0.163)   18492
1999                                                    0.033    (0.179)   18492
2000                                                    0.018    (0.131)   18492
2001                                                    0.072    (0.258)   18492
2002                                                    0.042    (0.201)   18492
2003                                                    0.047    (0.212)   18492
2004                                                    0.086    (0.280)   18492
2005                                                    0.072    (0.258)   18492

Source: SI and author’s calculations.


Table 7: Choice Characteristics Summary Statistics

                                      Mean      SD        N
Endogenous Cost Shifters
Log Wages                             7.143     (1.261)   2186
Log Commercial Rents                  11.005    (1.330)   2186
Log Market Potential                  16.852    (1.417)   2186
Log PLN Share                         0.613     (0.210)   2186

Physical and Agroclimatic Chars
Area                                  3.182     (5.077)   2186
Ruggedness                            7.143     (8.468)   2186
Elevation                             0.268     (0.244)   2186
Percentage of Cultivated Land         0.366     (0.166)   2186
Percentage of Forested Land           0.207     (0.162)   2186
Percentage of Grassland               0.140     (0.080)   2186
Distance to Jakarta                   6.800     (5.273)   2186
Distance to Major Cities              0.778     (0.509)   2186
Distance to Major Ports               0.856     (0.563)   2186
Distance to Malaysia                  7.781     (2.625)   2186
Distance to Singapore                 11.228    (4.763)   2186

Year
1990                                                      169
1991                                                      171
1992                                                      157
1993                                                      155
1994                                                      159
1995                                                      157
1996                                                      146
1997                                                      113
1998                                                      112
1999                                                      107
2000                                                      77
2001                                                      140
2002                                                      126
2003                                                      131
2004                                                      140
2005                                                      126

Source: SI and author’s calculations.


Table 8: BLP First Stage Results

                           Dependent Variable
                           Market Potential   Wages          Land Value
MPIV rainTotal             0.026              -0.001         0.336
                           (0.008)***         (0.042)        (0.037)***
MPIV shockMonths2          0.072              0.674          0.259
                           (0.020)***         (0.121)***     (0.076)***
MPIV shockMonths1          -0.008             0.240          -0.031
                           (0.035)            (0.191)        (0.149)
MPIV QuakeI                1.224              0.855          3.105
                           (0.046)***         (0.226)***     (0.188)***
MPIV bigQuakeI             -1.976             8.797          -5.889
                           (0.225)***         (1.366)***     (1.048)***
rainTotal                  -0.000             -0.000         -0.000
                           (0.000)***         (0.000)        (0.000)***
shockMonths2               -0.057             -0.079         -0.108
                           (0.011)***         (0.062)        (0.043)**
shockMonths1               0.031              -0.181         0.079
                           (0.012)**          (0.072)**      (0.052)
QuakeI                     -0.076             0.029          -0.183
                           (0.006)***         (0.029)        (0.024)***
bigQuakeI                  1.202              -7.168         4.371
                           (0.210)***         (1.204)***     (1.026)***
Adj. R2                    0.987              0.393          0.652
N                          2186               2186           2186
Kabupaten Fixed Effects    Yes                Yes            Yes

Robust standard errors in parentheses. * denotes significant at the 10% level, ** denotes significant at the 5% level, and *** denotes significant at the 1% level.


0.403 (0.063)***

MP

(2)

(3)

(4)

(5)

(6)

0.595 2186 15.326 . Yes . .

(8)

-0.059 2006 41.170 . Yes Yes .

0.792 (0.227)***

0.651 (0.131)***

0.016 (0.036)

-0.230 (0.020)***

-0.227 2178 24.959 . Yes . Yes

-0.157 (3.257)

1.891 (0.458)***

-0.205 (0.123)*

-0.415 (0.060)***

(9)

-0.058 2006 41.265 . Yes Yes Yes

0.786 (0.227)***

0.647 (0.130)***

0.015 (0.036)

-0.231 (0.020)***

(10)

Panel IV (1990-2005)

Robust standard errors in parentheses. * denotes significant at the 10% level, ** denotes significant at the 5% level, and *** denotes significant at the 1% level.

Adj. R2 N F Statistic Agroclimatic Controls Kabupaten FE Dynamic Panel IVs Nature IVs

0.016 (0.003)***

0.542 (0.167)***

0.283 (0.090)***

0.025 (0.023)

δ3

0.596 2186 23.563 . Yes . .

(7) -0.171 (0.018)***

0.087 (0.018)***

0.477 (0.160)***

0.370 (0.094)***

0.014 (0.024)

-0.177 (0.020)***

δ2

0.596 2186 22.346 . Yes . .

0.146 (0.075)*

0.292 (0.120)**

0.012 (0.024)

-0.175 (0.020)***

0.494 (0.072)***

0.595 2186 27.343 . Yes . .

0.262 (0.059)***

0.038 (0.022)*

-0.160 (0.019)***

δ1

0.595 2186 28.378 . Yes . .

0.430 (0.093)***

0.012 (0.024)

-0.178 (0.020)***

4.173 (0.220)***

0.279 169 9.010 Yes . . .

0.398 (0.079)***

0.090 (0.070)

-0.056 (0.151)

Fixed Effects LS (1990-2005)

δ0

sharePLN

0.210 169 15.519 . . . .

0.155 (0.069)**

land value

pavedDensity

0.006 (0.139)

wage rate

(1)

OLS (1990)

Table 9: Constant Coefficient Logit Results

Table 10: Random Coefficients Logit Results: Fixed Effects

                          Means for Industrial Sectors
             Foods &      Textiles &   Wood         Chemicals &  Ceramics,    Finished     Other        Standard
             Beverages    Clothing     Products     Oil Prods.   Glass, &     Metal        Products     Deviations
                                                                 Non-Metals   Products                  σ
wage rate    -0.353       -0.043       -0.104       0.155        -0.693       0.837        0.085        0.004
             (0.000)***   (0.002)***   (0.002)***   (0.002)***   (0.002)***   (0.002)***   (0.003)***   (0.000)***
land value   -0.100       -0.109       -0.454       0.228        -0.241       0.256        0.419        0.002
             (0.000)***   (0.001)***   (0.001)***   (0.001)***   (0.001)***   (0.001)***   (0.001)***   (0.000)***
MP           1.080        1.377        0.806        1.074        0.901        1.026        1.461        0.002
             (0.000)***   (0.000)***   (0.000)***   (0.000)***   (0.000)***   (0.000)***   (0.000)***   (0.000)***
PLN share    -0.083       2.255        0.043        0.166        0.197        0.096        1.003        -0.026
             (0.001)***   (0.003)***   (0.000)***   (0.001)***   (0.002)***   (0.001)***   (0.003)***   (0.002)***

The model is estimated on the full sample of the new firms dataset. There are 18,492 firms across all years choosing locations, and given the variation in the choice set across years, there are a total of 2,690,167 observations. The first step mixed logit model was estimated with simulated log-likelihood equal to −77077.78, and the simulated likelihood ratio index is equal to ρ ≡ 1 − SLL(0)/SLL(β̂) = 0.917. Standard errors in parentheses, computed using asymptotic GMM results and the delta method (see Appendix C.3 for more details). * denotes significant at the 10% level, ** denotes significant at the 5% level, and *** denotes significant at the 1% level.

Table 11: Random Coefficients Logit Results: Dynamic Panel GMM

                          Means for Industrial Sectors
             Foods &      Textiles &   Wood         Chemicals &  Ceramics,    Finished     Other        Standard
             Beverages    Clothing     Products     Oil Prods.   Glass, &     Metal        Products     Deviations
                                                                 Non-Metals   Products                  σ
wage rate    -0.360       -0.050       -0.111       0.148        -0.700       0.830        0.078        0.004
             (0.000)***   (0.002)***   (0.003)***   (0.003)***   (0.003)***   (0.003)***   (0.004)***   (0.000)***
land value   0.019        0.010        -0.335       0.347        -0.122       0.375        0.538        0.002
             (0.000)***   (0.001)***   (0.001)***   (0.001)***   (0.001)***   (0.001)***   (0.002)***   (0.000)***
MP           0.901        1.198        0.627        0.895        0.721        0.847        1.281        0.002
             (0.000)***   (0.000)***   (0.000)***   (0.000)***   (0.000)***   (0.000)***   (0.001)***   (0.000)***
PLN share    0.196        2.534        0.323        0.445        0.476        0.375        1.282        -0.026
             (0.000)***   (0.007)***   (0.003)***   (0.001)***   (0.004)***   (0.002)***   (0.008)***   (0.003)***

The model is estimated on the full sample of the new firms dataset. There are 18,492 firms across all years choosing locations, and given the variation in the choice set across years, there are a total of 2,690,167 observations. The first step mixed logit model was estimated with simulated log-likelihood equal to −77077.78, and the simulated likelihood ratio index is equal to ρ ≡ 1 − SLL(0)/SLL(β̂) = 0.917. Standard errors in parentheses, computed using asymptotic GMM results and the delta method (see Appendix C.3 for more details). * denotes significant at the 10% level, ** denotes significant at the 5% level, and *** denotes significant at the 1% level.


Table 12: Random Coefficients Logit Results: Natural Instruments GMM

                          Means for Industrial Sectors
             Foods &      Textiles &   Wood         Chemicals &  Ceramics,    Finished     Other        Standard
             Beverages    Clothing     Products     Oil Prods.   Glass, &     Metal        Products     Deviations
                                                                 Non-Metals   Products                  σ
wage rate    -0.716       -0.406       -0.468       -0.208       -1.056       0.474        -0.279       0.004
             (0.000)***   (0.002)***   (0.002)***   (0.002)***   (0.002)***   (0.002)***   (0.003)***   (0.000)***
land value   -0.469       -0.477       -0.822       -0.141       -0.609       -0.112       0.050        0.002
             (0.000)***   (0.001)***   (0.001)***   (0.001)***   (0.001)***   (0.001)***   (0.001)***   (0.000)***
MP           4.489        4.786        4.215        4.483        4.309        4.435        4.869        0.002
             (0.000)***   (0.000)***   (0.000)***   (0.000)***   (0.000)***   (0.000)***   (0.001)***   (0.000)***
PLN share    -6.545       -4.207       -6.418       -6.296       -6.265       -6.366       -5.459       -0.026
             (0.002)***   (0.003)***   (0.000)***   (0.002)***   (0.003)***   (0.000)***   (0.005)***   (0.002)***

The model is estimated on the full sample of the new firms dataset. There are 18,492 firms across all years choosing locations, and given the variation in the choice set across years, there are a total of 2,690,167 observations. The first step mixed logit model was estimated with simulated log-likelihood equal to −77077.78, and the simulated likelihood ratio index is equal to ρ ≡ 1 − SLL(0)/SLL(β̂) = 0.917. Standard errors in parentheses, computed using asymptotic GMM results and the delta method (see Appendix C.3 for more details). * denotes significant at the 10% level, ** denotes significant at the 5% level, and *** denotes significant at the 1% level.

Table 13: Random Coefficients Logit Results: All Instruments GMM

                          Means for Industrial Sectors
             Foods &      Textiles &   Wood         Chemicals &  Ceramics,    Finished     Other        Standard
             Beverages    Clothing     Products     Oil Prods.   Glass, &     Metal        Products     Deviations
                                                                 Non-Metals   Products                  σ
wage rate    -0.359       -0.049       -0.111       0.149        -0.699       0.831        0.078        0.004
             (0.001)***   (0.002)***   (0.003)***   (0.003)***   (0.003)***   (0.003)***   (0.004)***   (0.000)***
land value   0.013        0.005        -0.341       0.341        -0.128       0.369        0.532        0.002
             (0.000)***   (0.001)***   (0.001)***   (0.001)***   (0.001)***   (0.001)***   (0.002)***   (0.000)***
MP           0.915        1.213        0.641        0.909        0.736        0.861        1.296        0.002
             (0.000)***   (0.000)***   (0.000)***   (0.000)***   (0.000)***   (0.000)***   (0.001)***   (0.000)***
PLN share    0.168        2.506        0.294        0.417        0.448        0.347        1.254        -0.026
             (0.000)***   (0.007)***   (0.003)***   (0.001)***   (0.003)***   (0.002)***   (0.008)***   (0.003)***

The model is estimated on the full sample of the new firms dataset. There are 18,492 firms across all years choosing locations, and given the variation in the choice set across years, there are a total of 2,690,167 observations. The first step mixed logit model was estimated with simulated log-likelihood equal to −77077.78, and the simulated likelihood ratio index is equal to ρ ≡ 1 − SLL(0)/SLL(β̂) = 0.917. Standard errors in parentheses, computed using asymptotic GMM results and the delta method (see Appendix C.3 for more details). * denotes significant at the 10% level, ** denotes significant at the 5% level, and *** denotes significant at the 1% level.


Table 14: Counterfactual Simulated Herfindahls

                           Number of Firms             Employment
Actual Herfindahl Index    0.0259                      0.0486
Counterfactual 1 Mean      0.0255 (0.0235,0.0265)      0.0309 (0.0258,0.0373)
Counterfactual 2 Mean      0.0254 (0.0235,0.0267)      0.0307 (0.0259,0.0373)
Counterfactual 3 Mean      0.0256 (0.0235,0.0267)      0.0310 (0.0259,0.0373)

This table shows the effects of counterfactual transport cost arrangements on Herfindahl indices of the share of new firms and new manufacturing employment. Average indices are reported, as well as 95 percent confidence intervals (taken from the empirical distribution of all simulations). There were 15 simulations for each of the 3 counterfactual scenarios.


Figure 1: Evolution of Pavement on Java’s Road Network

Source: IRMS and author’s calculations. Thick black lines correspond to road sections that are 80 percent paved or greater, while thin black lines correspond to road sections that are less than 80 percent paved.


Figure 2: Evolution of Pavement on Sumatra’s Road Network

Source: IRMS and author’s calculations. Thick black lines correspond to road sections that are 80 percent paved or greater, while thin black lines correspond to road sections that are less than 80 percent paved.


Figure 3: Evolution of Pavement on Sulawesi’s Road Network

Source: IRMS and author’s calculations. Thick black lines correspond to road sections that are 80 percent paved or greater, while thin black lines correspond to road sections that are less than 80 percent paved.

Figure 4: Average Travel Times: Java


Figure 5: Average Travel Times: Sumatera


Figure 6: Average Travel Times: Sulawesi


Figure 7: Evolution of New Firm Counts and Industrial Concentration

Source: SI data and author’s calculations. Lines depict annual means or medians of different indices of industrial concentration across 5-digit industries.


Figure 8: Evolution of Industrial Concentration Measures (Inventory Shares)


Figure 9: Share of New Firms Locating in Different Types of Kabupatens (Adjacency)

Source: SI data and author’s calculations. Lines depict shares of new firms locating in different types of kabupatens within Java, Sumatra, and Sulawesi. A total of 51 out of 218 kabupatens were classified as Kota / Kotamadya in 1990. “Neighbors of Kota / Kotamadya” are kabupatens that share a border with 1990 cities; there were 60 kabupatens in this category. “Neighbors of Neighbors of Kota / Kotamadya” are kabupatens that share a border with kabupatens who share a border with 1990 cities; there were 78 kabupatens in this category. The remaining 29 kabupatens are categorized as “Rural”. In classifying, some kabupatens fit into multiple categories, and when this occurred, the kabupaten was assigned to the closest group possible.


Figure 10: Share of New Firms Locating in Different Types of Kabupatens (Adjacency, Inventory Share)

Source: SI data and author’s calculations. Lines depict shares of new firms locating in different types of kabupatens within Java, Sumatra, and Sulawesi. A total of 51 out of 218 kabupatens were classified as Kota / Kotamadya in 1990. “Neighbors of Kota / Kotamadya” are kabupatens that share a border with 1990 cities; there were 60 kabupatens in this category. “Neighbors of Neighbors of Kota / Kotamadya” are kabupatens that share a border with kabupatens who share a border with 1990 cities; there were 78 kabupatens in this category. The remaining 29 kabupatens are categorized as “Rural”. In classifying, some kabupatens fit into multiple categories, and when this occurred, the kabupaten was assigned to the closest group possible.


Figure 11: Distance Function

The x-axis is τ (travel time, in hours), and the y-axis is f(τ) = δ0 + δ1 τ + δ2 τ² + δ3 τ³.


Figure 12: Cross Wages Elasticities: 1990

Source: Author’s calculations.


Figure 13: Cross Market Potential Elasticities: 1990

Source: Author’s calculations.


A Data Appendix

A.1 Road Quality Data

Data on the quality of Indonesia’s highway networks were produced by DPU as part of Indonesia’s Integrated Road Management System (IRMS). This appendix section begins by providing some background on road management in Indonesia, describing the road classification system and discussing IRMS coverage. It then discusses the measures of road quality that are collected in IRMS and how they are measured. I then discuss how the road network data were created.

A.1.1 Background on Road Management

Indonesia’s national road network is currently managed and maintained by the Department of Public Works (Departemen Pekerjaan Umum, DPU), specifically by the Directorate General of Highways (Direktorat Jenderal Bina Marga). According to Law No. 38, 2004, roads are classified into four types, primarily based on their function for users. Arterial roads (jalan arteri) serve as the major transportation linkages between urban areas, and are characterized by longer distances, higher speeds, and limited access. Speeds are meant to be a minimum of 60 km/h, and width should be at least 11 meters to accommodate larger traffic volumes. Collector roads (jalan kolektor) serve “collector or distributor transportation” and are characterized by medium distance travel with medium speeds. Collector roads are subdivided into primary collector roads (jalan kolektor primer), which should have a minimum speed of 20 km/h and width of 9 meters, and secondary collector roads, which should have a minimum speed of 20 km/h and width of 9 meters. Local roads (jalan lokal) and neighborhood roads (jalan lingkungan) serve local areas at lower speeds, and are characterized by unlimited access.

Roads can also be classified by their management authority, or “status” (wewenang penyelenggaraan). Generally, arterial and primary collector roads are managed by the national government (specifically by DPU). Secondary and tertiary collector roads are managed by provincial governments, while local and neighborhood roads are managed by the kabupaten, kecamatan, and desa governments. Table A.1 describes the road classification system, minimum speed and width guidelines, and management authorities.

Table A.2 depicts the coverage of the IRMS dataset by road function and managing authority, as measured by counts of the number of kilometer-post observations that appear in the entire dataset. Most of the observations, and indeed most of the road network, are made up of collector roads (K1-K3); the category with the next largest coverage is arterial roads. Local and neighborhood roads are not well surveyed in this dataset. Although the network of village and kabupaten roads is doubtless extremely dense, I cannot use this dataset to say very much about it. But since the data do cover arterial and collector roads, the major roads connecting regions and cities in Indonesia, this dataset seems particularly well suited for evaluating models of economic geography and regional trade.

A.1.2 Measures of Road Quality

Transport engineers have developed a number of different devices for collecting measurements of road quality, and there are several different measures of road quality. The most widely used measure of road roughness, and the measure used in this study, is the international roughness index (IRI), developed by the World Bank in the 1980s. IRI is constructed as a filtered ratio of a standard vehicle’s accumulated suspension motion (in meters), divided by the distance travelled by the vehicle during measurement (in kilometers). Expressed in units of slope (m/km), IRI is a characteristic of a road’s longitudinal profile. Importantly, since it is a measure of a physical quantity, IRI is standardized, as opposed to other subjective measures of ride quality. Figure A.1 shows the relationship between different ranges of IRI and surface type; generally, larger roughness levels correspond to worse surfaces, but the mapping is not one-to-one.

Bennett et al. (2007) distinguish between several different types of devices for measuring road roughness and provide a good overview of their relative strengths and weaknesses. Over the course of its existence, Indonesia’s IRMS has largely made use of two different types of measuring devices.38 Before 1999, roads were surveyed using devices like the ROMDAS, which estimate IRI indirectly. The ROMDAS machine is a bump integrator, which must first be calibrated and which estimates IRI from correlation equations. It is very useful for measuring roughness on bumpy roads and can record high levels of IRI, but the device must be calibrated manually, and measurement error can occur if the device is miscalibrated. The ROMDAS device is also portable, meaning that it can be used inside different vehicles (each of which would require its own calibration).

This portability contrasts with devices like the high-speed laser profilometer, which is essentially a separate vehicle reserved entirely for the purpose of collecting road quality data. The device uses lasers and optical techniques to scan the road as it is traversed and create measures of surface profiles. These instruments are very accurate, but are much more expensive. Moreover, they might become miscalibrated on extremely rough roads. Indonesia started using the high-speed laser profilometer for collecting its road quality data in 1999, licensing vehicles from the Australian government.

Road width and surface type are more straightforward variables to measure, involving visual inspection and simple measurement. I categorize a kilometer-post interval as being unpaved if it is either an earth, gravel, or sand road, or if it was given a granular base (crushed stone) treatment, a first step in the process of paving.

A.1.3 Creation of Road Network Data

Using GIS shapefiles of the road network provided to me by DPU, I have georeferenced the kilometer-post observations of road quality, in order to capture the evolution of Indonesia’s transportation network over space and time. This proved to be a challenging exercise, because the identifiers for each road-link-interval observation were not consistent over time, and because the identifiers in the shapefile and in the linearly referenced dataset were often different, even though both referred to exactly the same link.

Once the IRMS interval data were successfully merged to the regional network shapefiles, I converted the GIS database of road links into a weighted graph of arcs and nodes, as commonly used in the transportation literature. Nodes represent locations (such as ports, cities, or the centroids of kabupatens, my unit of analysis), arcs represent the possibility of traveling between two nodes, and weights represent the cost of moving goods along a given arc. Weights were constructed according to the IRMS data on road quality, and for simplicity, the cost of moving along each road was assumed to be the same in both directions of travel.39 For computational reasons, I have used a simplified representation of Indonesia’s road network, where the number of nodes and links was small enough for network algorithms to operate on it using a desktop computer.40

Table A.3 depicts the number of network arcs, the total distance of the network, and merge statistics for the kilometer-post observations. Merge statistics are good for arterial and collector roads, but the quality of merges falls substantially for local and neighborhood roads, most likely because of poor shapefile coverage for that type of road network. The interval observations were not matched directly to their exact locations in the network, because I had no knowledge of the exact location of the kilometer posts. To deal with this, I first aggregated the kilometer-post interval observations to the road-link level by constructing distance-weighted averages of the road quality variables. Each network arc-year observation was then assigned the value of this average road quality variable that corresponds to its road link.41

38 I am very grateful for the extensive discussions I’ve had with Glen Stringer about IRMS; this section of the appendix benefits highly from our conversations.
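The aggregation step described above can be sketched as follows. This is a minimal illustration, assuming each kilometer-post interval is represented by its length and one road quality value (for example IRI); the function name and data layout are hypothetical and do not come from the IRMS files.

```python
def link_average(intervals):
    """Distance-weighted average of a road quality variable over one road link.

    intervals: list of (length_km, value) pairs, one per kilometer-post
    interval on the link (a hypothetical layout used for illustration).
    """
    total_km = sum(length for length, _ in intervals)
    if total_km <= 0:
        raise ValueError("link must have positive total length")
    return sum(length * value for length, value in intervals) / total_km


# A link with 1 km of smooth pavement (IRI 4) and 3 km of rough road (IRI 8):
link_iri = link_average([(1.0, 4.0), (3.0, 8.0)])
```

Longer intervals receive proportionally more weight, so the link-level value here is pulled toward the rough 3 km stretch.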

A.1.4 Roughness, Speed, and Ride Quality

One effect that rough roads have on vehicles is that they require the driver to travel at lower speeds. When faced with potholes, ragged pavement, or poor surfaces, drivers slow down, and this reduction in speed increases travel time and hence the cost of travel. Of course, there is not a one-to-one relationship between road roughness and speed, because drivers choose the speed at which they travel, and different preferences for smoothness of the ride or the desired arrival time might induce different choices of speed.

Yu et al. (2006) explore the relationship between jolt, or the “jerk” experienced by road users, and subjective measures of ride quality and road roughness at different speeds.42 Using survey data in which users were asked to rate the quality of particular rides, the authors find that people experience greater discomfort while traveling at higher speeds on rough roads, but that lowering speed on rough roads can reduce discomfort. The authors provide a mapping between subjective measures of ride quality and roughness at different speeds, and this mapping can be used to infer the maximum speed at which one can travel in order to achieve a ride of a certain quality, given pavement roughness. Table A.4 reproduces this mapping. Because travel times were unreasonably long for high quality rides given Indonesia’s rough roads, and because the subjective quality measures were chosen by Western drivers, I have focused on the poor ride quality speed thresholds in my empirical work.

39 Another tedious issue involved the construction of junction points where the road links intersected. The shapefiles were originally stored as MapInfo files, an older shapefile format that required conversion for use with ArcView, and in this conversion, information on where the roads crossed was lost, requiring painstaking editing. The shapefiles were also not designed to be used in any network analysis, so much care had to be taken to make them usable.

40 The road lines were straightened using the “Generalize” command from ET Geotools, which employs the Ramer-Douglas-Peucker algorithm for reducing the number of points that represent a line.

41 In some cases, when a network arc had no data for a particular year, I assigned the network arc the average value of road quality for arcs with the same function. This was done because constructing the transport cost variables involved a search over the entire network, and if certain network arcs were coded as missing, this could distort the search substantially. Overall, imputation amounted to no more than 5 percent of network arc observations in any given year.

42 Jolt is officially defined as the vector that specifies the time-derivative of acceleration; in other words, the third derivative of the vehicle’s vertical displacement with respect to time t.

Given the maximum speed that one can travel on roads of different roughness levels, it is straightforward to calculate travel times for each network arc, the primary measure of transport costs used in this study. Note that the travel times on road sections were computed using the detailed kilometer-post interval roughness data. These were then aggregated to the network arcs using distance-weighted averages.
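The calculation can be sketched as below. The roughness-to-speed thresholds in `SPEED_BY_IRI` are invented placeholders, not the values in Table A.4, and the function names and input layout are likewise hypothetical; the sketch only illustrates computing hours per kilometer on each interval and aggregating to the arc with distance weights.

```python
# Hypothetical thresholds: (upper IRI bound in m/km, maximum speed in km/h).
# These are placeholders for illustration only, NOT the values in Table A.4.
SPEED_BY_IRI = [(4.0, 80.0), (8.0, 60.0), (12.0, 40.0), (float("inf"), 20.0)]


def max_speed_kmh(iri):
    """Maximum speed implied by roughness under the assumed thresholds."""
    for upper_bound, speed in SPEED_BY_IRI:
        if iri <= upper_bound:
            return speed
    raise ValueError("IRI must be a number")


def arc_travel_time_hours(intervals, arc_length_km):
    """Travel time for one network arc.

    intervals: list of (length_km, iri) pairs for the kilometer-post
    intervals matched to the arc's road link. Hours per kilometer are
    computed interval by interval, averaged with distance weights, and
    scaled by the arc length.
    """
    total_km = sum(length for length, _ in intervals)
    if total_km <= 0:
        raise ValueError("intervals must have positive total length")
    hours_per_km = sum(
        length / max_speed_kmh(iri) for length, iri in intervals
    ) / total_km
    return arc_length_km * hours_per_km


# A 10 km arc whose surveyed intervals are half smooth, half rough:
hours = arc_travel_time_hours([(1.0, 2.0), (1.0, 10.0)], 10.0)
```

Averaging hours per kilometer rather than speeds keeps the arc-level figure consistent with the time actually spent on each interval.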

A.1.5 Shortest Paths Between Kabupaten Centroids

Given the distance and travel times associated with traversing each network arc, constructing the shortest path between points on the network is straightforward, using Dijkstra’s shortest path algorithm. Although the network of inter-urban roads is fairly dense, kabupaten centroids were generally not directly connected to the network. When this was the case, a small segment was added to the network connecting the centroid to the closest road junction point, on the assumption that the network of local and neighborhood roads is sufficiently dense for this to be a reasonable approximation. The shortest-path search was then conducted on this augmented network.
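A minimal sketch of the shortest-path step, using a textbook heap-based Dijkstra search on an undirected weighted graph. The node labels, the toy arcs, and the helper names are hypothetical; in particular, the connector from a centroid to its nearest junction is represented simply as one more arc.

```python
import heapq


def build_undirected_graph(arcs):
    """arcs: iterable of (node_a, node_b, hours); costs are symmetric."""
    graph = {}
    for a, b, hours in arcs:
        graph.setdefault(a, {})[b] = hours
        graph.setdefault(b, {})[a] = hours
    return graph


def shortest_travel_times(graph, source):
    """Dijkstra's algorithm: least travel time from source to every node."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        time_so_far, node = heapq.heappop(heap)
        if time_so_far > best.get(node, float("inf")):
            continue  # stale heap entry; a shorter route was already found
        for neighbor, hours in graph.get(node, {}).items():
            candidate = time_so_far + hours
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return best


# Toy network: two centroids attached to junctions by short connectors.
arcs = [
    ("centroid_A", "junction_1", 0.2),  # hypothetical connector segment
    ("junction_1", "junction_2", 1.0),
    ("junction_2", "junction_3", 2.0),
    ("junction_1", "junction_3", 5.0),
    ("junction_3", "centroid_B", 0.3),  # hypothetical connector segment
]
times = shortest_travel_times(build_undirected_graph(arcs), "centroid_A")
```

Running the search once per origin centroid yields the full matrix of pairwise travel times on an island's network.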

A.2 Administrative Boundaries

Administrative boundary shapefiles were constructed by BPS for use during the 2000 Household Census. These shapefiles contain the polygon boundaries of all provinces, kabupatens, kecamatans, and desas for the entire extent of the Indonesian archipelago. However, after the fall of Suharto and a massive decentralization program, many new kabupatens were created, splitting existing kabupatens into new ones. For instance, in 1990 there were 290 kabupatens and kotas, but by 2003, there were 416 kabupatens and kotas. The fact that administrative boundaries are not fixed over time creates difficulties for the analysis. Because of the need for a geographic unit of analysis that was consistently defined over time, I used kabupaten borders as they were defined in 1990. BPS provided the administrative boundary shapefile for 2000, as well as a correspondence table between kabupaten codes in 2000 and kabupaten codes from 1990 to the present. This information was processed using ArcView to create the 1990 shapefiles that form the basis of the analysis. Throughout the paper, all survey data were merged back to the 1990 kabupaten definitions.

A.3 Spatial, Topographical, and Agro-climatic Variables

Agricultural and climatic variables were created from a variety of sources and were often calculated with the assistance of GIS software (ArcView). This section describes those data in detail and how each of the variables was constructed.

A.3.1 Map Projection

To compute distances correctly, using linear units of measurement (i.e., meters or kilometers), I made use of the Batavia Transverse Mercator (TM) 109 SE projected coordinate system in all of my GIS work. Specific details on the map projection for use with ArcView or other GIS software are the following:

Projected Coordinate System: Batavia_TM_109_SE
Projection: Transverse_Mercator
False_Easting: 500000.00000000
False_Northing: 10000000.00000000
Central_Meridian: 109.00000000
Scale_Factor: 0.99960000
Latitude_Of_Origin: 0.00000000
Linear Unit: Meter
Geographic Coordinate System: GCS_Batavia
Datum: D_Batavia
Prime Meridian: Greenwich
Angular Unit: Degree

A.3.2 Slope, Aspect, and Elevation Data

Topographical variables were created using raster data from the Harmonized World Soil Database (HWSD), Version 2.0.43 The raster files are compiled from high-resolution source data and aggregated to 30 arc-second grids (approximately 1 km2 cells). Elevation data were computed for each administrative boundary polygon as the average elevation over the entire polygon. They were also computed for each centerline GPS coordinate, and in the event that the altitude was not properly recorded during the centerline survey, the HWSD elevation data were used.44 Slope and aspect data were also recorded for each administrative boundary polygon and calculated similarly. Slope rasters were computed as the percentage of each 30 arc-second grid that has a 0% to 0.5% gradient (slope1), a 0.5% to 2% gradient (slope2), a 2% to 5% gradient (slope3), a 5% to 10% gradient (slope4), a 10% to 15% gradient (slope5), a 15% to 30% gradient (slope6), a 30% to 45% gradient (slope7), and a gradient greater than 45% (slope8). Aspect raster data were recorded as the percentage of each 30 arc-second cell sloping North (315◦ 45◦ ), South (135◦ 225◦ ), East (45◦ 135◦ ), and West (225◦ − 315◦ ). The raster files are only calculated if the gradient is greater than 2%. Variables equal to the average share of each administrative boundary corresponding to each slope class were constructed using ArcView Software. Data Citation: Fischer, G., F. Nachtergaele, S. Prieler, H.T. van Velthuizen, L. Verelst, D. Wiberg, 2008. Global Agro-ecological Zones Assessment for Agriculture (GAEZ 2008). IIASA, Laxenburg, Austria and FAO, Rome, Italy. Data from the HWSD project are publicly available and can be downloaded here: http://www.iiasa.ac. at/Research/LUC/luc07/External-World-soil-database/HTML/index.html?sb=1. The terrain, slope, and aspect database provided by HWSD researchers was compiled from a high-resolution digital elevation map constructed by the Shuttle Radar Topography Mission (SRTM). 
SRTM data is also publicly available as 3 arc-second digital elevation maps (DEM) (approximately 90 meters resolution at the equator), available here: ftp://e0srp01u.ecs.nasa.gov/srtm/. The proper data citation is: Fischer, G., F. Nachtergaele, S. Prieler, H.T. van Velthuizen, L. Verelst, D. Wiberg, 2008. Global Agro-ecological Zones Assessment for Agriculture (GAEZ 2008). IIASA, Laxenburg, Austria and FAO, Rome, Italy. 44 The HWSD elevation raster file records the median elevation (in meters) for each 30 arc-second grid of the Earth’s surface. The median is computed across space, from the values of all 3 arc-second cells in the SRTM database. 43


A.3.3 Ruggedness

A 30 arc-second ruggedness raster was computed for Indonesia according to the methodology described by Sappington et al. (2007). The authors propose a Vector Ruggedness Measure (VRM), which captures the dispersion between a vector orthogonal to a topographical plane and the orthogonal vectors in a neighborhood of surrounding elevation planes. To calculate the measure, one first calculates the x, y, and z coordinates of the vectors that are orthogonal to each 30 arc-second cell of the Earth’s surface. These coordinates are computed using a digital elevation model and standard trigonometric techniques. Next, a resultant vector is computed by adding a given cell’s vector to each of the vectors in the surrounding cells; the neighborhood or window is supplied by the researcher. Finally, the magnitude of this resultant vector is divided by the number of cells in the window and subtracted from 1. This results in a dimensionless number that ranges from 0 (least rugged) to 1 (most rugged).45 For example, on a (3 × 3) flat surface, all orthogonal vectors point straight up, and each vector can be represented by (0, 0, 1) in the Cartesian coordinate system. The resultant vector obtained from adding all nine vectors is equal to (0, 0, 9), and the VRM is equal to 1 − (9/9) = 0. As the (3 × 3) surface deviates from a perfect plane, the length of the resultant vector gets smaller, and the VRM increases toward 1.
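The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ArcView script used for the paper: the function names `unit_normals` and `vrm` are hypothetical, and the orthogonal vectors are obtained from finite differences of the elevation grid rather than from the slope and aspect trigonometry of Sappington et al. (2007).

```python
import math

def unit_normals(dem, cell_size=1.0):
    """Unit vectors orthogonal to the surface at each cell of an elevation grid.

    Assumption: slopes are approximated by central differences in the interior
    and one-sided differences at the grid edges.
    """
    rows, cols = len(dem), len(dem[0])
    normals = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(r - 1, 0), min(r + 1, rows - 1)
            c0, c1 = max(c - 1, 0), min(c + 1, cols - 1)
            dzdy = (dem[r1][c] - dem[r0][c]) / ((r1 - r0) * cell_size) if r1 > r0 else 0.0
            dzdx = (dem[r][c1] - dem[r][c0]) / ((c1 - c0) * cell_size) if c1 > c0 else 0.0
            norm = math.sqrt(dzdx ** 2 + dzdy ** 2 + 1.0)
            normals[r][c] = (-dzdx / norm, -dzdy / norm, 1.0 / norm)
    return normals

def vrm(dem, r, c, half_window=1, cell_size=1.0):
    """Vector ruggedness at cell (r, c): 1 - |resultant| / (number of cells in window)."""
    normals = unit_normals(dem, cell_size)
    sx = sy = sz = 0.0
    n = 0
    for i in range(r - half_window, r + half_window + 1):
        for j in range(c - half_window, c + half_window + 1):
            if 0 <= i < len(dem) and 0 <= j < len(dem[0]):
                nx, ny, nz = normals[i][j]
                sx, sy, sz = sx + nx, sy + ny, sz + nz
                n += 1
    return 1.0 - math.sqrt(sx * sx + sy * sy + sz * sz) / n

flat = [[0.0] * 3 for _ in range(3)]                              # the flat example in the text
tilted = [[0.0, 1.0, 2.0] for _ in range(3)]                      # a tilted but perfectly planar surface
rough = [[0.0, 5.0, 0.0], [5.0, 0.0, 5.0], [0.0, 5.0, 0.0]]       # an uneven surface
```

On the flat grid all nine vectors equal (0, 0, 1), so the sketch reproduces the worked example. A tilted plane also has a VRM of essentially zero, because all of its orthogonal vectors are parallel; only the uneven grid yields a strictly positive value.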

A.3.4 Rainfall and Temperature

We obtain historical rainfall and temperature data from weather stations across Indonesia using the Global Historical Climatology Network (GHCN) Precipitation and Temperature Data (Version 2). The data include monthly records for each weather station as well as the latitude and longitude coordinates of the station’s location.46 The station coordinate information was first projected using ArcView, and distances to the centerlines of each kilometer post were computed. We disallow matches between districts and rainfall stations that are more than 500 kilometers apart. Because the number of rainfall and temperature stations varies over time, data from different rainfall stations may be linked to the same district in different periods.
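The matching rule above can be illustrated with a short sketch. It is not the procedure used in the paper, which computes distances in ArcView on projected coordinates; here great-circle (haversine) distance is assumed instead, and the names `haversine_km` and `nearest_station`, as well as the example coordinates, are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption: spherical Earth)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_station(point, stations, max_km=500.0):
    """Closest station within max_km of point, as (station_id, distance_km), else None.

    stations maps a station id to (lat, lon) and should contain only the
    stations reporting in the period of interest, since coverage varies over time.
    """
    best = None
    for station_id, (lat, lon) in stations.items():
        d = haversine_km(point[0], point[1], lat, lon)
        if d <= max_km and (best is None or d < best[1]):
            best = (station_id, d)
    return best

# Illustrative coordinates only (approximate city locations).
stations = {"jakarta": (-6.2, 106.8), "medan": (3.6, 98.7)}
match_near = nearest_station((-6.9, 107.6), stations)   # a point near Bandung
match_far = nearest_station((-8.0, 125.0), stations)    # a point far to the east
```

Passing a different `stations` dictionary for each period reproduces the feature noted in the text: the same location can be linked to different stations over time.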

A.3.5 Earthquakes

Data on significant earthquakes that have occurred in Indonesia are taken from the U.S. Geological Survey’s Earthquake Data Base.47 The database contains a number of variables, including the latitude and longitude coordinates of the earthquake’s epicenter, the magnitude of the earthquake (measured by the Richter scale), and importantly the depth of the epicenter.

45 The authors have generously provided a Python script for computing their Vector Ruggedness Measure (VRM) in ArcView. The script and detailed instructions for installation can be found here: Vector Ruggedness Measure (VRM) Tool for ArcGIS, http://arcscripts.esri.com/details.asp?dbid=15423.

46 These data are produced jointly by the National Climatic Data Center, Arizona State University, and Carbon Dioxide Information Analysis Center at Oak Ridge National Laboratory. They are publicly available and can be accessed online here: http://www.ncdc.noaa.gov/oa/climate/research/ghcn/ghcn.html.

47 The USGS Earthquake Data Base is publicly available and can be found here: http://neic.usgs.gov/neis/epic/database.html.

I use data on all earthquakes that occurred from 1985 to 2007, although the database extends much further back. Many earthquakes occur underneath the sea floor and can cause tsunamis, such as the 9.3 magnitude Indian Ocean earthquake in December 2004, which triggered a massive tsunami and caused tremendous devastation in the province of Aceh in northern Sumatra. For this reason, I included all earthquakes within a latitude-longitude window of the Indonesian archipelago in my analysis.48

To measure the intensity of earthquakes that affected given kabupatens, I construct a measure relating the magnitude of all earthquakes experienced in year t, weighted by the focal distance between the earthquake’s epicenter and the particular kabupaten centroid. Let $q_t = 1_t, 2_t, \ldots, Q_t$ index the set of earthquakes occurring in year t, and let $D_{jq_t}$ measure the distance between earthquake $q_t$ and the centroid of kabupaten j, given by the following:

$$D_{jq_t} = \sqrt{(x_{q_t} - x_j)^2 + (y_{q_t} - y_j)^2 + (z_{q_t} - z_j)^2}$$

where x and y denote longitude and latitude coordinates, respectively, and z denotes elevation (or depth). The earthquake data were projected onto the GCS Batavia projected coordinate system, so that the latitude and longitude coordinates were converted to meters. Let $M_{q_t}$ denote the magnitude of earthquake $q_t$, measured by the usual Richter scale. I compute the intensity of earthquakes experienced in kabupaten j in year t as follows:

$$I_{jt} = \sum_{q_t = 1}^{Q_t} \frac{M_{q_t}}{D_{jq_t}}$$

This measure is standard in the geological literature and has been used in economics, for instance, by Anbarci et al. (2005). Since not all earthquakes are large enough to do significant damage, I also compute an intensity measure using only large earthquakes, defined as those with a Richter magnitude greater than 5.5.
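As a minimal sketch of the intensity measure, the following Python functions sum magnitude over focal distance for one kabupaten and one year, with an optional cutoff for the large-earthquake variant. The function names and the example coordinates are hypothetical; coordinates are assumed to be already projected to meters, as described above.

```python
import math

def focal_distance(quake_xyz, centroid_xyz):
    """Euclidean distance between an earthquake focus and a kabupaten centroid."""
    return math.sqrt(sum((q - c) ** 2 for q, c in zip(quake_xyz, centroid_xyz)))

def intensity(quakes, centroid_xyz, min_magnitude=None):
    """Sum of magnitude / focal distance over the earthquakes of one year.

    quakes is a list of (magnitude, (x, y, z)) tuples. If min_magnitude is
    given, only earthquakes with magnitude strictly above it are counted.
    """
    total = 0.0
    for magnitude, xyz in quakes:
        if min_magnitude is not None and magnitude <= min_magnitude:
            continue
        total += magnitude / focal_distance(xyz, centroid_xyz)
    return total

# Two made-up earthquakes: one 5 km away at the surface, one 10 km deep.
quakes = [(6.0, (3000.0, 4000.0, 0.0)), (5.0, (0.0, 0.0, -10000.0))]
centroid = (0.0, 0.0, 0.0)
all_quakes = intensity(quakes, centroid)                          # 6/5000 + 5/10000
large_quakes = intensity(quakes, centroid, min_magnitude=5.5)     # 6/5000 only
```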

48 The exact window used in the data search is given by the following latitude and longitude coordinates: (11° N, 145° E) and (11° S, 89° E).

Table A.1: Indonesia’s Road Classification System

Function        Code    Minimum Speed    Minimum Width    Management Authority
Arterial        A       60 km/h          11 m             National
Collector-1     K1      40 km/h          9 m              National
Collector-2     K2      20 km/h          9 m              Provincial
Collector-3     K3      20 km/h          9 m              Provincial
Local           L       20 km/h          7.5 m            Kabupaten & Desa
Neighborhood    Z       15 km/h          6.5 m            Kabupaten & Desa

Source: Departemen Pekerjaan Umum, 2008

Table A.2: Road Function and Managing Authority, Kilometer-Post Observations, 1990-2007

            Road Function                                Managing Authority
            Code     Number of Obs.   Share of Total     Code     Number of Obs.   Share of Total
Java        A        52,917           0.17               N        93,808           0.30
            K1       40,889           0.13               P        132,649          0.42
            K2       121,386          0.39               K        15,862           0.05
            K3       10,714           0.03               S        72,068           0.23
            L        15,862           0.05
            Z        72,619           0.23
            Total    314,387          1.00               Total    314,387          1.00

Sumatra     A        103,160          0.20               N        202,915          0.39
            K1       99,782           0.19               P        263,409          0.50
            K2       235,750          0.45               K        11,391           0.02
            K3       27,632           0.05               S        45,680           0.09
            L        11,391           0.02
            Z        45,680           0.09
            Total    523,395          1.00               Total    523,395          1.00

Sulawesi    A        54,496           0.21               N        143,147          0.54
            K1       87,728           0.33               P        72,198           0.27
            K2       71,234           0.27               K        18,232           0.07
            K3       1,887            0.01               S        29,371           0.11
            L        18,232           0.07
            Z        29,371           0.11
            Total    262,948          1.00               Total    262,948          1.00

Source: IRMS and author’s calculations. Data come from kilometer-post observations. Standard deviations in parentheses.

Table A.3: Number of Network Arcs, Distances, and Merge Statistics (by road function)

                         Road Function
                         A          K1         K2          K3         L          Z          Miss
Java
  # of Arcs              1168       889        2618        309        315        37         .
  # of Road IDs          220        129        354         43         72         6          .
  Total Distance         2944.91    1970.65    5832.59     750.39     663.44     92.16      .
  Link-Years Merged      16538      13685      38719       3876       4689       14572      3015
  Link-Years Unmerged    1838       735        1842        45         971        21772      157
  % Merged               0.90       0.95       0.95        0.99       0.83       0.40       0.95
  Arc-Years Merged       20844      16002      46350       5562       5670       666        .
  Arc-Years Unmerged     180        0          774         0          0          0          .
  % Merged               0.99       1.00       0.98        1.00       1.00       1.00       .

Sumatra
  # of Arcs              1485       1205       2975        453        277        22         41
  # of Road IDs          207        165        412         87         66         6          13
  Total Distance         4964.69    4469.43    11551.28    1492.97    571.67     56.44      147.56
  Link-Years Merged      24755      20035      49171       6808       2603       8730       1406
  Link-Years Unmerged    718        373        537         52         394        9722       12
  % Merged               0.97       0.98       0.99        0.99       0.87       0.47       0.99
  Arc-Years Merged       26730      21690      51876       7830       4986       396        0
  Arc-Years Unmerged     0          0          1674        324        0          0          738
  % Merged               1.00       1.00       0.97        0.96       1.00       1.00       0.00

Sulawesi
  # of Arcs              1624       2319       2051        15         391        .          45
  # of Road IDs          113        116        150         4          44         .          1
  Total Distance         2836.96    3805.92    4369.33     28.35      732.96     .          70.34
  Link-Years Merged      24006      24006      34711       30911      551        5670       5674
  Link-Years Unmerged    25         356        410         339        9          118        4755
  % Merged               1.00       0.99       0.99        0.99       0.98       0.98       0.54
  Arc-Years Merged       25794      35694      33660       270        7038       .          0
  Arc-Years Unmerged     3438       6048       3258        0          0          .          810
  % Merged               0.88       0.86       0.91        1.00       1.00       .          0.00

Source: IRMS and author’s calculations. Missing function information is attributable to poorly coded shapefiles. Arc-Years could be unmerged potentially because there were no surveys done on that particular link; statistics are computed assuming a balanced panel. Road IDs are defined in the shapefile, while Link IDs are defined from the IRMS data.

Table A.4: Roughness and Ride-Quality Speed Limits

Max Speed    Good                    Fair                     Mediocre                 Poor
120 km/h     IRI ∈ [0.00, 1.49]      IRI ∈ [0.00, 1.89]       IRI ∈ [0.00, 2.70]       IRI ∈ [0.00, 3.24]
100 km/h     IRI ∈ [1.49, 1.79]      IRI ∈ [1.89, 2.27]       IRI ∈ [2.70, 3.24]       IRI ∈ [3.24, 4.05]
80 km/h      IRI ∈ [1.79, 2.24]      IRI ∈ [2.27, 2.84]       IRI ∈ [3.24, 4.05]       IRI ∈ [4.05, 4.63]
70 km/h      IRI ∈ [2.24, 2.57]      IRI ∈ [2.84, 3.25]       IRI ∈ [4.05, 4.63]       IRI ∈ [4.63, 5.40]
60 km/h      IRI ∈ [2.57, 2.99]      IRI ∈ [3.25, 3.79]       IRI ∈ [4.63, 5.40]       IRI ∈ [5.40, 6.25]
50 km/h      IRI ∈ [2.99, 3.59]      IRI ∈ [3.79, 4.54]       IRI ∈ [5.40, 6.25]       IRI ∈ [6.25, 8.08]
40 km/h      IRI ∈ [3.59, 4.49]      IRI ∈ [4.54, 5.69]       IRI ∈ [6.25, 8.08]       IRI ∈ [8.08, 10.80]
30 km/h      IRI ∈ [4.49, 5.99]      IRI ∈ [5.69, 7.59]       IRI ∈ [8.08, 10.80]      IRI ∈ [10.80, 16.16]
20 km/h      IRI ∈ [5.99, 8.99]      IRI ∈ [7.59, 11.39]      IRI ∈ [10.80, 16.16]     IRI ∈ [16.16, 32.32]
10 km/h      IRI ∈ [8.99, ∞)         IRI ∈ [11.39, ∞)         IRI ∈ [16.16, ∞)         IRI ∈ [32.32, ∞)

Source: Author’s calculations and Yu et al. (2006), Table 2. IRI denotes the international roughness index, measured in m/km. Ride quality levels are subjective and measured on a 5-point scale (“Very Good”, “Good”, “Fair”, “Mediocre”, and “Poor”).
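One way to read Table A.4 is as a lookup from a road segment’s measured roughness to the highest speed at which a given ride quality is maintained. The sketch below encodes that reading with the standard-library `bisect` module. The function name `max_speed_kmh` is hypothetical, the assignment of threshold columns to ride-quality labels follows the order in which the table lists them, and because adjacent intervals in the table share an endpoint, a boundary value is assumed to belong to the faster bin.

```python
from bisect import bisect_left

SPEEDS_KMH = [120, 100, 80, 70, 60, 50, 40, 30, 20, 10]

# Upper IRI bound (m/km) of each speed bin except the last, open-ended bin.
IRI_UPPER_BOUNDS = {
    "Good":     [1.49, 1.79, 2.24, 2.57, 2.99, 3.59, 4.49, 5.99, 8.99],
    "Fair":     [1.89, 2.27, 2.84, 3.25, 3.79, 4.54, 5.69, 7.59, 11.39],
    "Mediocre": [2.70, 3.24, 4.05, 4.63, 5.40, 6.25, 8.08, 10.80, 16.16],
    "Poor":     [3.24, 4.05, 4.63, 5.40, 6.25, 8.08, 10.80, 16.16, 32.32],
}

def max_speed_kmh(iri, ride_quality="Good"):
    """Highest tabulated speed whose IRI interval for ride_quality contains iri."""
    if iri < 0:
        raise ValueError("IRI cannot be negative")
    bounds = IRI_UPPER_BOUNDS[ride_quality]
    # bisect_left places a value equal to a bound in the faster (lower-IRI) bin.
    return SPEEDS_KMH[bisect_left(bounds, iri)]
```

For example, under these assumptions a segment with an IRI of 3.0 m/km supports 50 km/h at a “Good” ride quality but the full 120 km/h if only a “Poor” ride quality is required.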


Figure A.1: Roughness and Surface Type

Source: Sayers et al. (1986).


B Derivations for the Structural Model

B.1 Consumer Demands

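To derive the consumer demands for individual varieties, (2), first let $E_k$ represent consumer expenditures on industry k. To choose optimal bundles of varieties from industry k, we set up the following Lagrangian:

$$\mathcal{L}_k = M_k + \lambda_k \left( E_k - \int_0^1 p^k(j) q^k(j)\, dj \right) = \left[ \int_0^1 q^k(j)^{\frac{\sigma_k - 1}{\sigma_k}}\, dj \right]^{\frac{\sigma_k}{\sigma_k - 1}} + \lambda_k \left( E_k - \int_0^1 p^k(j) q^k(j)\, dj \right)$$

Taking the derivative of this with respect to $q^k(j)$, we have:

$$\frac{\partial \mathcal{L}_k}{\partial q^k(j)} = \frac{\sigma_k}{\sigma_k - 1} \left[ \int_0^1 q^k(j)^{\frac{\sigma_k - 1}{\sigma_k}}\, dj \right]^{\frac{1}{\sigma_k - 1}} \frac{\sigma_k - 1}{\sigma_k}\, q^k(j)^{\frac{\sigma_k - 1}{\sigma_k} - 1} - \lambda_k p^k(j) \overset{\text{set}}{=} 0$$

$$\implies \left[ \int_0^1 q^k(j)^{\frac{\sigma_k - 1}{\sigma_k}}\, dj \right]^{\frac{1}{\sigma_k - 1}} q^k(j)^{-\frac{1}{\sigma_k}} = \lambda_k p^k(j)$$

Rearranging terms, we have:

$$\left[ \int_0^1 q^k(j)^{\frac{\sigma_k - 1}{\sigma_k}}\, dj \right]^{\frac{1}{\sigma_k - 1}} = \lambda_k p^k(j)\, q^k(j)^{\frac{1}{\sigma_k}}$$

$$\implies \left[ \int_0^1 q^k(j)^{\frac{\sigma_k - 1}{\sigma_k}}\, dj \right]^{\frac{\sigma_k}{\sigma_k - 1}} = \lambda_k^{\sigma_k}\, p^k(j)^{\sigma_k}\, q^k(j)$$

$$\implies M_k \lambda_k^{-\sigma_k} p^k(j)^{-\sigma_k} = q^k(j) \tag{12}$$

Now, multiplying both sides by $p^k(j)$ and integrating over the set of varieties, we have:

$$M_k \lambda_k^{-\sigma_k} p^k(j)^{1 - \sigma_k} = p^k(j) q^k(j)$$

$$M_k \lambda_k^{-\sigma_k} \int_0^1 p^k(j)^{1 - \sigma_k}\, dj = \int_0^1 p^k(j) q^k(j)\, dj \equiv E_k$$

So, rearranging, we have:

$$M_k \lambda_k^{-\sigma_k} = \frac{E_k}{\int_0^1 p^k(j)^{1 - \sigma_k}\, dj} \tag{13}$$

Plugging (13) into (12), we arrive at the following expression:

$$q^k(j) = \frac{p^k(j)^{-\sigma_k} E_k}{\int_0^1 p^k(j)^{1 - \sigma_k}\, dj} = \frac{p^k(j)^{-\sigma_k} E_k}{(P^k)^{1 - \sigma_k}}$$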

where $P^k$ is the price index defined in (3). All that remains is to determine $E_k$, the amount of the budget spent on manufacturing varieties from industry k. But note that (1) is just a Cobb-Douglas utility function over the CES manufacturing indices. Hence, the budget shares are determined by the Cobb-Douglas weights, $\mu_k$, and $E_k = \mu_k Y$.
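The demand expression can be checked numerically. The sketch below is an illustration only: it replaces the continuum of varieties with a finite list of prices (an assumption made purely for the check), and the function name `ces_demands` is hypothetical. It verifies two properties implied by the derivation: total spending exhausts $E_k$, and relative demands depend on relative prices with elasticity $\sigma_k$.

```python
def ces_demands(prices, expenditure, sigma):
    """Demand for each variety: p_j^(-sigma) * E / sum_j p_j^(1 - sigma).

    The sum over a finite list of prices stands in for the integral that
    defines (P^k)^(1 - sigma) in the text.
    """
    price_index_term = sum(p ** (1.0 - sigma) for p in prices)
    return [p ** (-sigma) * expenditure / price_index_term for p in prices]

prices = [1.0, 2.0, 4.0]
quantities = ces_demands(prices, expenditure=100.0, sigma=3.0)
spending = sum(p * q for p, q in zip(prices, quantities))   # equals the budget of 100
ratio = quantities[0] / quantities[1]                        # equals (1/2)^(-3) = 8
```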


B.2 Firm Pricing

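To derive the profit-maximizing prices that firms charge for varieties, note that a firm’s profits from operating in region o and shipping goods to region d are given by:

$$\pi^k_{od}(i) = \left( p^k_{od}(i) - m^k_o(i)\, w\, \tau^k_{od} \right) q^k(i)$$

Note that this expression takes into account the iceberg transport costs assumption: in order to deliver one unit of the variety to region d, $\tau^k_{od}$ units must be produced. Taking the derivative of this profit function with respect to $p^k_{od}(i)$, we have:

$$\frac{\partial \pi^k_{od}(i)}{\partial p^k_{od}(i)} = q^k(i) + \left( p^k_{od}(i) - m^k_o(i)\, w\, \tau^k_{od} \right) \frac{\partial q^k(i)}{\partial p^k_{od}(i)}$$

Setting this expression equal to zero and rearranging, we have:

$$q^k(i) + p^k_{od}(i) \left( \frac{\partial q^k(i)}{\partial p^k_{od}(i)} \right) = m^k_o(i)\, w\, \tau^k_{od} \left( \frac{\partial q^k(i)}{\partial p^k_{od}(i)} \right)$$

$$p^k_{od}(i) \left[ \frac{q^k(i)}{p^k_{od}(i)} + \frac{\partial q^k(i)}{\partial p^k_{od}(i)} \right] = m^k_o(i)\, w\, \tau^k_{od} \left( \frac{\partial q^k(i)}{\partial p^k_{od}(i)} \right)$$

$$p^k_{od}(i) \left[ 1 + \left( \frac{\partial q^k(i)}{\partial p^k_{od}(i)} \right) \left( \frac{p^k_{od}(i)}{q^k(i)} \right) \right] = m^k_o(i)\, w\, \tau^k_{od} \left( \frac{\partial q^k(i)}{\partial p^k_{od}(i)} \right) \left( \frac{p^k_{od}(i)}{q^k(i)} \right) \tag{14}$$

We compute $\left( \partial q^k(i) / \partial p^k_{od}(i) \right) \left( p^k_{od}(i) / q^k(i) \right)$ using the consumer’s demand function, (2), and noting that because of the Dixit-Stiglitz structure of competition, firms ignore the effect that their prices have on the price index for their industry in region d, $P^k_d$. This gives us:

$$\left( \frac{\partial q^k(i)}{\partial p^k_{od}(i)} \right) \left( \frac{p^k_{od}(i)}{q^k(i)} \right) = \left[ \frac{-\sigma_k\, p^k_{od}(i)^{-\sigma_k - 1} \mu_k Y_d}{\left( P^k_d \right)^{1 - \sigma_k}} \right] \left( \frac{p^k_{od}(i)}{q^k(i)} \right) = -\sigma_k \left[ \frac{p^k_{od}(i)^{-\sigma_k} \mu_k Y_d}{\left( P^k_d \right)^{1 - \sigma_k}} \right] \left[ \frac{\left( P^k_d \right)^{1 - \sigma_k}}{p^k_{od}(i)^{-\sigma_k} \mu_k Y_d} \right] = -\sigma_k$$

Plugging this result into (14), we have:

$$p^k_{od}(i) \left( 1 - \sigma_k \right) = m^k_o(i)\, w\, \tau^k_{od} \left( -\sigma_k \right)$$

from which (5) follows immediately.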


C Logit Model Estimation

In the paper, I set up a choice model that allows for endogenous choice characteristics, as in the usual BLP random coefficients logit framework. For each firm i, I take NS = 50 scrambled Halton draws from an N(0, 1) distribution to compute the random coefficients component of the choice probabilities. The probability that firm i chooses location j at time t is then approximated by:

$$\hat{P}_{ijt} = \frac{1}{NS} \sum_{s=1}^{NS} \frac{\exp\left\{ x_{jt}'\beta + \xi_j + \upsilon_{jt} + \sum_{k=1}^{K} x^k_{jt} \left( \sigma_k v^s_i + \pi_{k1} D_{i1} + \ldots + \pi_{kD} D_{iD} \right) \right\}}{1 + \sum_{m=1}^{J} \exp\left\{ x_{mt}'\beta + \xi_m + \upsilon_{mt} + \sum_{k=1}^{K} x^k_{mt} \left( \sigma_k v^s_i + \pi_{k1} D_{i1} + \ldots + \pi_{kD} D_{iD} \right) \right\}}$$

where we normalize the utility of choosing the outside option to zero. I have access to a census of manufacturing firms, and so, assuming there is no sampling error, I have both the macro data (the total probability that firms choose a particular location at time t) and the micro data. Noting that the terms $x_{jt}'\beta + \xi_j + \upsilon_{jt}$ are common to all individuals, we can write:

$$\delta_{jt} \equiv x_{jt}'\beta + \xi_j + \upsilon_{jt}$$

Crucially, the unobserved component of mean utility, $\upsilon_{jt}$, which creates all of the estimation problems in usual random coefficient discrete choice models, is entirely subsumed within the $\delta_{jt}$’s. Let $\theta_1 = (\beta', \xi')'$ collect the “linear” parameters of the model, which are subsumed within the $\delta_{jt}$’s, and let $\theta_2 = (\pi', \sigma')'$ denote the non-linear parameters of the model, including the coefficients on the demographic interactions as well as the standard deviation terms. To estimate $\theta = (\theta_1', \theta_2')'$, I make use of the following 2-step estimation routine:

1. Step 1: Estimate the $\delta_{jt}$’s and $\theta_2$ using maximum simulated likelihood.
   • Although a fully implemented maximum simulated likelihood is theoretically possible, in practice it is computationally infeasible. My dataset has nearly 200 locations observed for 10 years, so the $\delta_{jt}$ parameter space is far too large to search over. Consequently, I maximize the simulated likelihood only over $\theta_2$. For each value of $\theta_2$, I choose $\delta_{jt} = \delta_{jt}(\theta_2)$ to ensure that the mean utility components satisfy a market share constraint.
2. Step 2: To recover the linear parameters, $\theta_1$, we estimate the following regression using 2SLS/GMM:

$$\hat{\delta}_{jt} = x_{jt}'\beta + \xi_j + \upsilon_{jt}$$

where we use instruments for the endogenous $x_{jt}$’s. The method of moments estimator of $\theta_1$ solves the sample analogues of the following moments:

$$E\left[ Z' \left( \delta - X\beta \right) \right] = 0$$

where Z is a matrix of M instruments. Additional moment restrictions are appended at this second stage.

C.1 Interactions

Note that in order for the procedure to work, we need consistent estimation of the δjt ’s, which are the mean utility parameters. When constructing the interaction terms, care must be taken to ensure that the δjt ’s accurately reflect mean utility and not utility of the omitted demographic group.


In the estimation, I created indicators for each group as follows. Index groups by d = 1, ..., D, and let the last group, D, denote the omitted demographic group. Define $\tilde{D}_{id}$ as an indicator for whether firm i belongs to industrial sector d. Then, define:

$$D_{id} = \tilde{D}_{id} - \frac{1}{N} \sum_{i=1}^{N} \tilde{D}_{id} = \begin{cases} 1 - \frac{1}{N} \sum_{i=1}^{N} \tilde{D}_{id} & \text{if } i \text{ is in group } d \\ -\frac{1}{N} \sum_{i=1}^{N} \tilde{D}_{id} & \text{else} \end{cases}$$

This just amounts to demeaning the interaction terms, once they are created. To see why this works, let $P^G_d$ denote the probability that firm i belongs to group d, written $\mu_d$ in what follows. Note that for plants in the included group d, the expected value of $y_{ijt}$ (the choice indicator) is given by:

$$E[\, y_{ijt} \mid i \in d \,] = P_{ijt} = \int \frac{\exp\left\{ \delta_{jt} + \sum_{k=1}^{K} x^k_{jt} \left( \sigma_k v_i + \pi_{kd} - \pi_{k1}\mu_1 - \ldots - \pi_{k,D-1}\mu_{D-1} \right) \right\}}{1 + \sum_{m=1}^{J} \exp\left\{ \delta_{mt} + \sum_{k=1}^{K} x^k_{mt} \left( \sigma_k v_i + \pi_{kd} - \pi_{k1}\mu_1 - \ldots - \pi_{k,D-1}\mu_{D-1} \right) \right\}}\, dF(v_i)$$

$$= \int \frac{\exp\left\{ x_{jt}' \left( \beta + \pi_{kd} - \mu'\pi_k + \sigma v_i \right) + \xi_j + \upsilon_{jt} \right\}}{1 + \sum_{m=1}^{J} \exp\left\{ x_{mt}' \left( \beta + \pi_{kd} - \mu'\pi_k + \sigma v_i \right) + \xi_m + \upsilon_{mt} \right\}}\, dF(v_i)$$

where $\pi_k = (\pi_{k,1}, \pi_{k,2}, \ldots, \pi_{k,D-1})$ is a ((D − 1) × 1) vector of coefficients on the included demographic interaction terms for variable k, and $\mu = (\mu_1, \mu_2, \ldots, \mu_{D-1})$ is a ((D − 1) × 1) vector of probabilities that firms are members of each group. Hence, to uncover mean parameters for included demographic group d, we simply add the mean parameters $\beta$ to the interaction terms $\pi_{kd}$, then subtract $\mu'\pi_k$.

For the omitted group D, note that in expectation, $D_{id} = -\mu_d$ for all included groups, $d \neq D$. Hence, the expected value of $y_{ijt}$ is given by:

$$E[\, y_{ijt} \mid i \text{ is in group } D \,] = P_{ijt} = \int \frac{\exp\left\{ \delta_{jt} + \sum_{k=1}^{K} x^k_{jt} \left( \sigma_k v_i - \mu'\pi_k \right) \right\}}{1 + \sum_{m=1}^{J} \exp\left\{ \delta_{mt} + \sum_{k=1}^{K} x^k_{mt} \left( \sigma_k v_i - \mu'\pi_k \right) \right\}}\, dF(v_i)$$

$$= \int \frac{\exp\left\{ x_{jt}' \left( \beta - \mu'\pi_k + \sigma v_i \right) + \xi_j + \upsilon_{jt} \right\}}{1 + \sum_{m=1}^{J} \exp\left\{ x_{mt}' \left( \beta - \mu'\pi_k + \sigma v_i \right) + \xi_m + \upsilon_{mt} \right\}}\, dF(v_i)$$

So, to uncover the mean parameters for the omitted demographic group D, we subtract $\mu'\pi_k$ from the mean parameters, $\beta$. Of course, the goal of the exercise is not to estimate parameters on the demeaned interaction terms, $\pi_k$, but instead to estimate parameters for each industry:

$$\gamma_{kd} \equiv \begin{cases} \beta_k + \pi_{kd} - \mu'\pi_k & d \in \{1, \ldots, D-1\} \\ \beta_k - \mu'\pi_k & d = D \end{cases}$$

To construct these parameters and perform inference on the $\gamma$’s, we make use of the so-called Delta method. Specifically, we will show later that the estimated parameters, $\hat{\beta}_k = (\hat{\beta}_k, \hat{\pi}_{k,1}, \ldots, \hat{\pi}_{k,D-1})'$, are asymptotically normal:

$$\sqrt{N} \left( \hat{\beta}_k - \beta_k \right) \overset{d}{\longrightarrow} N(0, V)$$


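Define:

$$\underset{(D \times D)}{R} = \begin{bmatrix} 1 & (1 - \mu_1) & -\mu_2 & \ldots & -\mu_{D-1} \\ 1 & -\mu_1 & (1 - \mu_2) & \ldots & -\mu_{D-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & -\mu_1 & -\mu_2 & \ldots & (1 - \mu_{D-1}) \\ 1 & -\mu_1 & -\mu_2 & \ldots & -\mu_{D-1} \end{bmatrix}$$

It is easy to see that $R\hat{\beta}_k = \hat{\gamma}_k$, the vector of mean utilities plus demographic parameters, which are the estimates we care about.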

C.2 Details: Step 1

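The probability that firm i chooses location j at time t is given by the following:

$$P_{ijt} = \int \frac{\exp\left\{ x_{jt}'\beta + \xi_j + \upsilon_{jt} + \sum_{k=1}^{K} x^k_{jt} \left( \sigma_k v_i + \pi_{k1} D_{i1} + \ldots + \pi_{kD} D_{iD} \right) \right\}}{1 + \sum_{m=1}^{J} \exp\left\{ x_{mt}'\beta + \xi_m + \upsilon_{mt} + \sum_{k=1}^{K} x^k_{mt} \left( \sigma_k v_i + \pi_{k1} D_{i1} + \ldots + \pi_{kD} D_{iD} \right) \right\}}\, dF(v_i)$$

This integral is not computed analytically, but is instead approximated by simulation. For each plant, we draw NS = 50 values of $v^s_i$ from a standard normal distribution. We then approximate each individual choice probability by computing the following:

$$\hat{P}_{ijt} = \frac{1}{NS} \sum_{s=1}^{NS} \frac{\exp\left\{ x_{jt}'\beta + \xi_j + \upsilon_{jt} + \sum_{k=1}^{K} x^k_{jt} \left( \sigma_k v^s_i + \pi_{k1} D_{i1} + \ldots + \pi_{kD} D_{iD} \right) \right\}}{1 + \sum_{m=1}^{J} \exp\left\{ x_{mt}'\beta + \xi_m + \upsilon_{mt} + \sum_{k=1}^{K} x^k_{mt} \left( \sigma_k v^s_i + \pi_{k1} D_{i1} + \ldots + \pi_{kD} D_{iD} \right) \right\}} \equiv \frac{1}{NS} \sum_{s=1}^{NS} P^s_{ijt}$$

where $P^s_{ijt}$ is just the usual logit probability for simulation s. The simulated log-likelihood function is formed in the usual way:

$$LL\left( \theta_2, \delta(\theta_2) \right) = \sum_{t=1}^{T} \sum_{i=1}^{N_t} \sum_{j=1}^{J_t} y_{ijt} \ln \hat{P}_{ijt}\left( \theta_2, \delta(\theta_2) \right)$$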
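A minimal sketch of the simulated choice probabilities and the simulated log-likelihood for a single period follows. It is meant only to make the averaging over draws concrete. The function names are hypothetical, the mean utilities are passed in directly as `delta`, the demographic interactions are collapsed into one shift per characteristic, and in the usage lines pseudo-random normal draws stand in for the scrambled Halton draws used in the paper.

```python
import math
import random

def logit_probs(utilities):
    """Logit probabilities over J locations; the outside option has utility zero."""
    shift = max(0.0, max(utilities))            # subtract the max for numerical stability
    expu = [math.exp(u - shift) for u in utilities]
    denom = math.exp(-shift) + sum(expu)        # exp(0 - shift) is the outside option
    return [e / denom for e in expu]

def simulated_probs(delta, x, sigma, demo_shift, draws):
    """Average of the logit probabilities over one firm's simulation draws.

    delta[j]       mean utility of location j
    x[j][k]        characteristic k of location j
    sigma[k]       standard deviation parameter for characteristic k
    demo_shift[k]  sum over d of pi_kd * D_id for this firm
    draws          list of NS standard normal draws for this firm
    """
    n_loc, n_char = len(delta), len(sigma)
    avg = [0.0] * n_loc
    for v in draws:
        util = [
            delta[j] + sum(x[j][k] * (sigma[k] * v + demo_shift[k]) for k in range(n_char))
            for j in range(n_loc)
        ]
        for j, p in enumerate(logit_probs(util)):
            avg[j] += p / len(draws)
    return avg

def simulated_loglik(chosen, delta, x, sigma, demo_shifts, draws_by_firm):
    """Sum over firms of the log simulated probability of the chosen location."""
    total = 0.0
    for i, j in enumerate(chosen):
        probs = simulated_probs(delta, x, sigma, demo_shifts[i], draws_by_firm[i])
        total += math.log(probs[j])
    return total

random.seed(0)
example_draws = [random.gauss(0.0, 1.0) for _ in range(50)]   # stand-in for Halton draws
example_probs = simulated_probs([0.5, -0.2], [[1.0], [0.5]], [0.8], [0.1], example_draws)
```

When all elements of `sigma` and `demo_shift` are zero, every draw gives the same plain logit probability, which provides a simple check of the code.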

Note that the number of individuals choosing at time period t, $N_t$, and the number of locations chosen over at time period t, $J_t$, vary over time. I estimate the choice model on the sample of new firms each year, and not all locations are chosen each period, which is why the size of choice sets and the number of firms change each year. I maximize the simulated likelihood function over $\theta_2 = (\pi', \sigma')'$, but at each iteration, I solve for the $\delta_{jt}$’s that equate actual market shares with predicted shares, using the standard BLP contraction mapping:

$$\delta^{H+1}_{jt} = \delta^{H}_{jt} + \ln S_{jt} - \ln \hat{S}_{jt}\left( \theta_2, \delta^{H} \right)$$
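The iteration can be sketched as a short fixed-point loop. This is an illustration under simplifying assumptions rather than the estimation code: the function names are hypothetical, the predicted-share function is passed in as an argument, and the usage lines use plain logit shares (no random coefficients), for which the solution is known in closed form, $\delta_j = \ln S_j - \ln S_0$, where $S_0$ is the share of the outside option.

```python
import math

def logit_shares(delta):
    """Predicted shares for a plain logit with the outside option normalized to zero."""
    denom = 1.0 + sum(math.exp(d) for d in delta)
    return [math.exp(d) / denom for d in delta]

def solve_delta(observed_shares, predicted_shares, delta0, tol=1e-12, max_iter=10000):
    """Iterate delta <- delta + ln(S) - ln(S_hat(delta)) until the update is below tol."""
    delta = list(delta0)
    for _ in range(max_iter):
        pred = predicted_shares(delta)
        new = [d + math.log(s) - math.log(p)
               for d, s, p in zip(delta, observed_shares, pred)]
        if max(abs(a - b) for a, b in zip(new, delta)) < tol:
            return new
        delta = new
    raise RuntimeError("contraction mapping did not converge")

shares = [0.5, 0.2]                                   # observed shares; outside option has 0.3
delta_hat = solve_delta(shares, logit_shares, [0.0, 0.0])
```

In the full model, `predicted_shares` would average the simulated choice probabilities over firms at the current value of $\theta_2$, and `tol` plays the role of the convergence tolerance for the $\delta_{jt}$’s.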

The contraction mapping reduces the dimensionality of the parameter space considerably, but this creates some additional complications when computing the gradient. Since we estimate $\theta_2$ and $\delta_{jt}$ conditional on $\theta_2$, we have to be careful when computing the score of the likelihood function with respect to $\theta_2$. We need to account for the fact that changing $\theta_2$ also changes the $\delta_{jt}$’s:

$$\frac{dLL\left( \theta_2, \delta(\theta_2) \right)}{d\theta_2} = \underbrace{\frac{\partial LL}{\partial \theta_2}}_{(1)} + \underbrace{\frac{\partial LL}{\partial \delta}}_{(2)} \cdot \underbrace{\frac{\partial \delta}{\partial \theta_2}}_{(3)}$$

To simplify exposition in the discussion that follows, I subsume all of the interaction terms in one vector. Let $X_{ijt}$ denote the (1 × (K × D)) vector of choice characteristics interacted with demographic characteristics that each individual i faces when choosing location j at time t:

$$X_{ijt} \equiv \left[ x^1_{jt} D_{i1}, \ldots, x^1_{jt} D_{iD},\; x^2_{jt} D_{i1}, \ldots, x^2_{jt} D_{iD},\; \ldots,\; x^K_{jt} D_{i1}, \ldots, x^K_{jt} D_{iD} \right]$$

This notation lets us write the following:

$$\Pi X_{ijt} = \sum_{k=1}^{K} x^k_{jt} \left( \pi_{k1} D_{i1} + \ldots + \pi_{kD} D_{iD} \right)$$

We can also do the same thing for the choice characteristics interacted with the simulation draws:

$$V^s_{ijt} \equiv \left[ x^1_{jt} v^s_i,\; x^2_{jt} v^s_i,\; \ldots,\; x^K_{jt} v^s_i \right]$$

This is a (1 × K) vector, unique for each simulation s.

C.2.1 Gradient, First Term

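The first term of the gradient is straightforward conceptually, but a bit challenging to compute. The fact that the choice probabilities involve averages of logit probabilities, instead of logit probabilities themselves, makes them more computationally demanding:

$$\frac{\partial LL}{\partial \theta_2} = \sum_{t=1}^{T} \sum_{i=1}^{N_t} \sum_{j=1}^{J_t} y_{ijt} \frac{\partial \ln \hat{P}_{ijt}}{\partial \theta_2} = \sum_{t=1}^{T} \sum_{i=1}^{N_t} \sum_{j=1}^{J_t} \frac{y_{ijt}}{\hat{P}_{ijt}} \frac{\partial \hat{P}_{ijt}}{\partial \theta_2}$$

For the ((K × D) × 1) vector of demographic parameters, $\Pi$, the derivative of the simulated choice probabilities is given by:

$$\frac{\partial \hat{P}_{ijt}}{\partial \Pi} = \frac{1}{NS} \sum_{s=1}^{NS} \frac{\partial}{\partial \Pi} \left[ \frac{\exp\{ \delta_{jt} + \Pi X_{ijt} + \Sigma V^s_{ijt} \}}{1 + \sum_{k=1}^{J_t} \exp\{ \delta_{kt} + \Pi X_{ikt} + \Sigma V^s_{ikt} \}} \right]$$

$$= \frac{1}{NS} \sum_{s=1}^{NS} \left[ \frac{\left( 1 + \sum_k \exp\{\cdot_k\} \right) \exp\{\cdot_j\} X_{ijt} - \exp\{\cdot_j\} \left( \sum_k \exp\{\cdot_k\} X_{ikt} \right)}{\left( 1 + \sum_k \exp\{\cdot_k\} \right)^2} \right]$$

$$= \frac{1}{NS} \sum_{s=1}^{NS} \left[ \frac{\exp\{\cdot_j\} X_{ijt}}{1 + \sum_k \exp\{\cdot_k\}} - \frac{\exp\{\cdot_j\} \left( \sum_k \exp\{\cdot_k\} X_{ikt} \right)}{\left( 1 + \sum_k \exp\{\cdot_k\} \right)^2} \right]$$

$$= \frac{1}{NS} \sum_{s=1}^{NS} \left[ P^s_{ijt} X_{ijt} - \frac{\exp\{\cdot_j\}}{1 + \sum_k \exp\{\cdot_k\}} \sum_{k=1}^{J_t} \frac{\exp\{\cdot_k\} X_{ikt}}{1 + \sum_k \exp\{\cdot_k\}} \right]$$

$$= \frac{1}{NS} \sum_{s=1}^{NS} \left[ P^s_{ijt} X_{ijt} - P^s_{ijt} \sum_{k=1}^{J_t} P^s_{ikt} X_{ikt} \right]$$

where $\exp\{\cdot_k\}$ abbreviates $\exp\{ \delta_{kt} + \Pi X_{ikt} + \Sigma V^s_{ikt} \}$. Similarly, for the choice characteristics interacted with the simulation draws, we have:

$$\frac{\partial \hat{P}_{ijt}}{\partial \Sigma} = \frac{1}{NS} \sum_{s=1}^{NS} \left[ P^s_{ijt} V^s_{ijt} - P^s_{ijt} \sum_{k=1}^{J_t} P^s_{ikt} V^s_{ikt} \right]$$

To compute $\partial LL / \partial \theta_2$, we first compute the components of $\partial \hat{P}_{ijt} / \partial \theta_2$ that correspond to each simulation (i.e. the terms inside the large brackets in the equations above). We then average those components over all simulations. Finally, we interact the $\partial \hat{P}_{ijt} / \partial \theta_2$ matrix with the ratio of the dependent variable, $y_{ijt}$, to the simulated choice probabilities, $\hat{P}_{ijt}$. This creates an ($N_{obs}$ × (KD + K)) matrix, where $N_{obs} = \sum_{t=1}^{T} N_t J_t$ is the total number of observations, and we sum the columns of this matrix over all observations to form a (1 × (KD + K)) vector of derivatives.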
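The closed-form derivative inside the brackets can be checked against a finite difference. The sketch below does this for a single simulation draw and a single interaction parameter, so that $\Pi$ is a scalar `pi` and each $X_{ijt}$ is a number; the draw-specific term is held fixed and absorbed into `delta`. All names and numerical values are hypothetical.

```python
import math

def probs(delta, X, pi):
    """Logit probabilities exp(delta_j + pi X_j) / (1 + sum_k exp(delta_k + pi X_k))."""
    expu = [math.exp(d + pi * x) for d, x in zip(delta, X)]
    denom = 1.0 + sum(expu)
    return [e / denom for e in expu]

def dprobs_dpi(delta, X, pi):
    """Analytic derivative: P_j X_j - P_j * sum_k P_k X_k."""
    P = probs(delta, X, pi)
    weighted_x = sum(p * x for p, x in zip(P, X))
    return [p * (x - weighted_x) for p, x in zip(P, X)]

delta = [0.3, -0.4, 1.1]
X = [0.5, 2.0, -1.0]
pi, h = 0.7, 1e-6
analytic = dprobs_dpi(delta, X, pi)
numeric = [(up - down) / (2 * h)
           for up, down in zip(probs(delta, X, pi + h), probs(delta, X, pi - h))]
```

In the full gradient, the same expression is evaluated for every draw, averaged over draws, and then weighted by $y_{ijt} / \hat{P}_{ijt}$ before summing over observations.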

C.2.2 Gradient, Second Term

The second term in the gradient is similar to the first. Let $N_{alt} = \sum_{t=1}^{T} J_t$ denote the total number of alternatives (location-years) in the data, and let $D_i$ be a (1 × $N_{alt}$) vector of indicators determining whether observation i corresponds to a given location-year. More precisely, $D_i$ is defined as follows:

$$D_i = \left[\, 1\{\text{location } 1, \text{time } 1\},\; \ldots,\; 1\{\text{location } J_T, \text{time } T\} \,\right]$$

This notation simplifies our expression for the (1 × $N_{alt}$) partial derivative of the simulated likelihood with respect to $\delta$. Note that:

$$\frac{\partial LL}{\partial \delta} = \sum_{t=1}^{T} \sum_{i=1}^{N_t} \sum_{j=1}^{J_t} \frac{y_{ijt}}{\hat{P}_{ijt}} \frac{\partial \hat{P}_{ijt}}{\partial \delta}$$

Using $D_i$ and a technique similar to the one used for the first term of the gradient, we can write:

$$\frac{\partial \hat{P}_{ijt}}{\partial \delta} = \frac{1}{NS} \sum_{s=1}^{NS} \left[ P^s_{ijt} D_i - P^s_{ijt} \sum_{k=1}^{J_t} P^s_{ikt} D_i \right]$$

Note that $\partial \hat{P}_{ijt} / \partial \delta$ is an ($N_{obs}$ × $N_{alt}$) matrix. It is block diagonal, because $\partial \hat{P}_{ijt} / \partial \delta_{ks} = 0$ if $s \neq t$. The computation of this portion of the gradient is done year-by-year because of the block-diagonal structure.

C.2.3 Gradient, Third Term

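To obtain an expression for $\partial \delta / \partial \theta_2$, we proceed by remembering that $\delta$ is implicitly defined by $\theta_2$ as the solution to:

$$S_{jt} - \hat{S}_{jt}(\theta_2, \delta) = 0$$

Taking derivatives with respect to $\theta_2$, using the chain rule, and rearranging, we have:

$$0 = \frac{d\hat{S}_{jt}}{d\theta_2} = \frac{\partial \hat{S}_{jt}}{\partial \theta_2} + \frac{\partial \hat{S}_{jt}}{\partial \delta} \cdot \frac{\partial \delta}{\partial \theta_2}$$

$$\implies \frac{\partial \delta}{\partial \theta_2} = - \underbrace{\left[ \frac{\partial \hat{S}_{jt}}{\partial \delta} \right]^{-1}}_{(A)} \underbrace{\left[ \frac{\partial \hat{S}_{jt}}{\partial \theta_2} \right]}_{(B)}$$

Note that $\partial \hat{S}_{jt} / \partial \delta$ is an ($N_{alt}$ × $N_{alt}$) matrix, and $\partial \hat{S}_{jt} / \partial \theta_2$ is an ($N_{alt}$ × K) matrix.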

C.2.4 Gradient, Third Term, Part (A)

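Although $\partial \hat{S}_{jt} / \partial \delta$ is a large ($N_{alt}$ × $N_{alt}$) matrix, fortunately many of its elements are zeros, because:

$$\frac{\partial \hat{S}_{jt}}{\partial \delta_{ks}} = 0 \quad \text{if } t \neq s$$

Hence, the matrix is block diagonal and given by:

$$\frac{\partial \hat{S}_{jt}}{\partial \delta} = \frac{1}{N_t} \sum_{i=1}^{N_t} \frac{\partial \hat{P}_{ijt}}{\partial \delta_{jt}} = \frac{1}{NS \cdot N_t} \sum_{i=1}^{N_t} \sum_{s=1}^{NS} \left[ P^s_{ijt} D_i - P^s_{ijt} \sum_{k=1}^{J_t} P^s_{ikt} D_i \right]$$

This is an ($N_{alt}$ × $N_{alt}$) matrix of derivatives. The computation of this portion of the gradient is also done year-by-year because of the block-diagonal structure.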

C.2.5 Gradient, Third Term, Part (B)

To form $\partial \hat{S}_{jt} / \partial \theta_2$, the ($N_{alt}$ × K) matrix of partial derivatives, note that we have:

$$\frac{\partial \hat{S}_{jt}}{\partial \theta_2} = \frac{1}{N_t} \sum_{i=1}^{N_t} \frac{\partial \hat{P}_{ijt}}{\partial \theta_2}$$

We already computed $\partial \hat{P}_{ijt} / \partial \theta_2$, an ($N_{obs}$ × (KD + K)) matrix, when we constructed the first term of the gradient. So, to build this term, we just average the columns of this matrix over all individuals with a choice situation for location j.

C.3 Standard Errors

To get appropriate standard errors, we will characterize the estimation procedure with GMM, stacking the moments from each step. For the non-linear parameters, $\theta_2$, the method of moments estimator sets the sum of the scores of the log-likelihood equal to zero. Let $W_{ijt}$ collect all variables used in the first step (i.e. choice and time indicators, interactions of choice characteristics with firm characteristics). The method of moments estimator solves the following sample moment condition:

$$\Psi_1\left( \theta_2, \delta(\theta_2) \right) = LL_{\theta_2}\left( \theta_2, \delta(\theta_2) \right) = 0$$

The second sample moment is just the usual 2SLS moment condition:

$$\Psi_2\left( \delta(\theta_2), \beta \right) = Z' \left( \delta(\theta_2) - X\beta \right) = 0$$

Define $\theta \equiv (\theta_2', \beta')'$ to be a vector collecting all of the parameters estimated directly in the model. Estimating $\theta$ with GMM, we have the usual asymptotic results:

$$\sqrt{N} \left( \hat{\theta}_{GMM} - \theta_0 \right) \overset{d}{\longrightarrow} N(0, V_0)$$

where

$$V_0 = \left( G_0' C_0 G_0 \right)^{-1} G_0' C_0 \Lambda_0 C_0 G_0 \left( G_0' C_0 G_0 \right)^{-1}$$

and we have:

$$\Lambda_0 = E \begin{bmatrix} \Psi_1 \Psi_1' & \Psi_1 \Psi_2' \\ \Psi_2 \Psi_1' & \Psi_2 \Psi_2' \end{bmatrix} \quad \text{and} \quad G_0 = E \begin{bmatrix} \partial \Psi_1 / \partial \theta_2 & \partial \Psi_1 / \partial \beta \\ \partial \Psi_2 / \partial \theta_2 & \partial \Psi_2 / \partial \beta \end{bmatrix}$$

and $C_0$ is a weighting matrix, set to I because of the 2-step nature of the computation.49 Note that $\Lambda_0$ is easily computable. As for $G_0$, note that the upper right term in the matrix, $\partial \Psi_1 / \partial \beta$, is zero. Moreover, the upper left term in the matrix, $\partial \Psi_1 / \partial \theta_2$, is just the Hessian of the log-likelihood function, $H(\theta_2)$, which is returned in the estimation procedure. The bottom right term, $\partial \Psi_2 / \partial \beta$, is just $-Z'X$. The only term that is challenging is the bottom left term:

$$\frac{\partial \Psi_2}{\partial \theta_2} = Z' \left( \frac{\partial \delta(\theta_2)}{\partial \theta_2} \right)$$

However, we solved for $\partial \delta(\theta_2) / \partial \theta_2$ above in computing the third term of the gradient (parts A and B). So, we have:

$$\hat{G} = \begin{bmatrix} H(\theta_2) & 0 \\ Z' \left( \partial \delta(\theta_2) / \partial \theta_2 \right) & -Z'X \end{bmatrix}$$

49 Clearly this is inefficient, because I’m not using the optimal GMM weight matrix. However, if I do the appropriate weighting, I think I’d be getting back to the computational difficulties encountered earlier this summer.

Table C.1: Results for Different Estimation Attempts

                              (A)             (B)             (C)                (D)             (E)              (F)
High Tolerance                10e−6           10e−6           10e−9              10e−9           10e−12           10e−12
Low Tolerance                 10e−3           10e−3           10e−6              10e−6           10e−9            10e−9
Starting Values               Original        ≈0              Original           ≈0              Original         ≈0

Job Number                    klein job 94    klein job 93    klein job 99       klein job 98    klein job 100    klein job 101
Number of Iterations          11              17              33                 27              42               35
Reason for Stopping           Search Dir      Search Dir      Search Dir, Tol    Search Dir      Search Dir       Search Dir
Estimation Start              4 Nov 11:36     3 Nov 16:50     9 Nov 12:09        8 Nov 07:07     13 Nov 07:39     16 Nov 04:42
Estimation End                5 Nov 04:36     4 Nov 10:55     10 Nov 14:57       9 Nov 04:52     15 Nov 23:58     18 Nov 08:02
Estimation Time               17:00           18:05           26:48              21:45           64:19            51:20
# of Iters for δ_jt Conv.     40              40              80                 80              136              136
ITSTART                       Yes             Yes             Yes                Yes             Yes              Yes
LL(θ̂)                         -18435.63       -18434.96       -18382.78          -18382.78       -18376.20        -18376.20
max δ̂_jt                      8.39            8.62            9.19               9.18            9.26             9.26
min δ̂_jt                      1.71            1.61            2.28               2.27            2.30             2.30
max L_θ(θ̂_d)                  10.99           16.51           0.11               0.47            0.01             0.03
min L_θ(θ̂_d)                  -8.11           -2.01           -0.14              -0.22           -0.01            -0.02
Reasonable θ̂_d’s?             Yes             Yes             Yes                Yes             Yes              Yes

Author’s calculations. All estimation attempts use the 25% sample, which includes data on 3,981 new firms. The testing dataset has 660,040 observations, with approximately 160 locations for each firm’s choice set (of course the choice sets vary by year).