A Centered Measure of Spatial Concentration

A Centered Measure of Spatial Concentration: A Gravity-Based Approach with an Application to Population and Capital Cities∗

Quoc-Anh Do† and Filipe R. Campante‡

This version: October 2007

Abstract We construct an axiomatic index of spatial concentration around a center or capital point of interest, a concept with wide applicability from urban economics, economic geography and trade, to political economy and industrial organization. We spell out two basic properties (decomposability and monotonicity), and provide a simple model of migration that motivates a third property, the “axiom of gravity”, which disentangles the attraction exerted by the capital point of interest. We show that there is a unique class of functions that has all these properties, defined over any Euclidian space. We also establish a formal connection between our index and the gravity equation used in the trade literature. Finally, we illustrate the application of our index by computing the concentration of population around capital cities across countries and US states. The cross-country data show interesting correlation patterns with population size, which is not captured by alternative measures, and with measures of quality of governance, which vary according to the level of democracy. It can also help test conjectures about the determinants of the choice of where to locate the capital city.

Keywords: Spatial Concentration, Population Concentration, Capital Cities, Gravity, Harmonic Functions, Axiomatics. JEL Classification: C43, F10, R23.



∗ We are grateful to Philippe Aghion, Alberto Alesina, Davin Chor, Ed Glaeser, Jerry Green, Michael Kremer, David Laibson, Erzo Luttmer, Rohini Pande, Karine Serfaty de Medeiros, and participants at Harvard’s theory and development lunches for helpful comments, and to Ngan Dinh, Janina Matuszeski, and C. Scott Walker for help with the data. The usual disclaimer applies.
† Department of Economics, Harvard University. Littauer Center, Room 200, Cambridge, MA 02138, USA. Email: [email protected].
‡ John F. Kennedy School of Government, Harvard University. 79 JFK St., Cambridge, MA 02138. Email: filipe [email protected].

1 Introduction

Spatial concentration is a very important concept in the social sciences, and in economics in particular – both in the sense of geographical space, as studied by urban economics, economic geography or international trade, and in more abstract settings (e.g. product or policy spaces) that are studied in a number of different fields, from industrial organization to political economy. As a result, a number of methods have been developed to measure this concept, from relatively ad hoc measures such as the Herfindahl index to theoretically grounded approaches such as the “dartboard” method of Ellison and Glaeser (1997), and also including the adaptation of indices used to capture related concepts such as inequality (Gini coefficient, entropy measures). These measures are well-suited to analyzing the concentration of a given variable over a “uniform” space, in which no point is considered to be of particular importance in an ex ante sense. In practice, however, it is often the case that some points are indeed more important than others. In other words, we might be interested in measuring the concentration of a given variable around a point (e.g. a city or a specific site), rather than its concentration over some area (e.g. a region or country). Examples of circumstances in which there is specific ex ante knowledge of the importance of a given point are not hard to come by. For instance, the study of urban sprawl puts a lot of emphasis on the concentration of population and economic activity around a geographical center (Glaeser and Kahn 2004). By the same token, it is often the case that capital cities can be naturally thought of as being particularly important points: Ades and Glaeser (1995, p. 
198-199) observe that, for a number of reasons, “spatial proximity to power increases political influence”, and hence proximity to the capital city is related to political power.1 Moving to a non-geographical context, within a product space in which distances measure the likelihood that a country might move from one type of product to another (Hidalgo et al. 2007), one might be interested in how concentrated a country’s economy is around a specific industry, say, oil production. Finally, in yet another non-geographical context, within a policy space it can be the case that the status quo policy has special clout, and the concentration of preferences around that status quo point may be of particular interest.

1 In fact, the political importance of the capital city is vividly illustrated by the rich history of relocation decisions, often with an important political component (Campante and Do 2007).

The standard indices of concentration are not suited to capture this type of situation, and more generally they leave aside plenty of information on actual spatial distributions. For instance, if we are measuring the concentration of the auto industry around Detroit, it matters whether car plants are in nearby Ohio or in distant Georgia; however, a standard index of concentration computed at the state level would stay unchanged if all the plants in Ohio were moved to Georgia and vice-versa.

This paper presents an axiomatic method for generating a measure that is suited for such situations – what we call a centered index of spatial concentration. We choose an axiomatic approach because we want to build a common language that can codify the concept of spatial concentration around a point across a broad range of applications – ultimately, the concentration of any variable in any space of economic interest. This search for generality leads us to look for basic properties that are robust across different contexts, as opposed to model-specific.

Following this principle, we start by designating two basic properties that such an index ought to display: (i) Decomposability, which facilitates computation and interpretation, and ensures that the index is founded at the individual level (in the sense that the index is the sum of the impact of every individual observation in the distribution it describes);2 and (ii) Monotonicity, whereby the index should increase when individual observations are moved closer to the point of interest to which the index refers (the “capital point”, or simply the “capital”). We show that these two properties already define a class of measures consisting of the sum of any decreasing, integrable, real-valued function whose only argument is the distance between individual observations and the capital.

In order to pin down the index, we build a stylized model of migration within a country, in which the equilibrium distribution of population depends on the “gravitational pull” exerted by all cities in that country. We posit that our index should be able to isolate the gravitational pull exerted by the capital point of interest.
2 Here we follow Echenique and Fryer’s (2007) approach to their axiomatically-based segregation index. As we will discuss later, this does not preclude interaction between individual observations in determining the index.

Our main theoretical result is that adding an axiom that encapsulates this idea pins down a unique and very convenient class of measures (indexed by a simple parameter), defined over any Euclidian space and indexed by the dimensionality of that space under analysis. In two-dimensional space, for instance, it turns out to be the class of decreasing log-linear functions of the distance between individual observations and the center. We also show that our index, which we call G-CISC (Gravity-based Centered Index of Spatial Concentration), ties neatly into the literature in international trade and economic geography.

On the former, it captures the concept of “multilateral resistance”, or “remoteness”, that shows up in the gravity equation (e.g. Anderson and van Wincoop 2003); on the latter, it has a convenient interpretation in terms of the concept of “market potential” (see Fujita et al. 1999), and it actually highlights a link between this concept and that of potential (e.g. gravitational potential) in physics. The literature has grappled with the question of devising a centered measure of spatial concentration. Most notably, the literature on urban sprawl has tried to measure the “centrality” of “mononuclear” urban areas – namely, the degree to which development in urban areas is concentrated around a central business district. Galster et al. (2001), for instance, measure this centrality by the inverse of the sum of the distances of each observation to the center. Another example is Busch and Reinhardt (1999), who measure the concentration of population around a geographical center by adding up a negative exponential function of distance. While certainly useful, the approaches in the literature have been ad hoc, and hence limited by what an intuitive grasp of the properties of a given space will provide. This intuitive grasp quickly reaches its limits when we move away from specific, concrete applications. Our axiomatic approach, in contrast, guarantees basic desirable properties for our index in any n-dimensional Euclidian space, which ensures wide applicability, to any situation that can be mapped onto such a space. In addition, the ad hoc approaches have left us with a myriad of different measures, as reviewed by Glaeser and Kahn (2004), without a unified framework within which to compare and assess their properties. Our approach gives us such a framework. With it we can analyze the properties of any centered measure of spatial concentration within this system, and compare them to our index. 
In particular, we can guarantee that any other measure will violate one or more of the desirable properties that we have spelled out.

We have emphasized that non-centered measures of concentration are not suited for capturing the concept our G-CISC does. We can also show that our index has the additional advantage that it naturally leads into a non-centered measure, by simple aggregation, in a way that guarantees that the basic properties from the centered context also carry over to the noncentered context. This underscores the versatility and generality of our approach. Finally, we also show how our index relates to the literature measuring other features of distributions; in particular, it provides a way to generalize Esteban and Ray’s (1994) and Duclos et al.’s (2004) approach to measuring polarization.

The second part of the paper provides an example of empirical implementation of our measure, by computing an index of population concentration around capital cities across countries. We show that our index provides a more sensible ranking of countries than currently used ad hoc alternatives, and that it uncovers a negative correlation between the size of population and its concentration around the capital city that is not detected by those alternatives. Throughout our empirical implementation, we also show that the picture that would emerge from using a non-centered measure of concentration such as the Gini coefficient as a proxy for the centered notion would in fact be very distorted. In addition, motivated by the aforementioned idea that political influence diminishes with distance to the capital, we consider the correlation between population concentration and a number of measures of quality of governance. We show that there is a positive correlation between concentration and the checks that are faced by governments, and that this correlation is present only in non-democratic countries. (This is consistent with Campante and Do (2007), who present a theory of revolutions and redistribution where concentration is key in increasing the redistributive pressures faced by non-democratic governments.) The statistical significance of this correlation is substantially improved by using our index instead of the ad hoc alternatives. We also illustrate how our index can shed light on the issue of the choice of where to locate the capital city, which goes back at least to James Madison during the debates at the US Constitutional Convention of 1787. We show that there is a pattern in which both very autocratic and very democratic countries tend to have their capital cities in places with relatively low concentration of population. The remainder of the paper is organized as follows. 
Section 2 presents the basic properties that we impose on the index, motivates the gravity property, and characterizes the unique class of indices satisfying those properties. Section 3 discusses other properties of the index, comparing it to alternative measures of concentration and establishing the link with the trade and economic geography literatures. Section 4 contains the empirical implementation, and Section 5 concludes.
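
The Detroit example from this Introduction can be made concrete with a small numerical sketch. Everything in it is hypothetical and chosen only for illustration: the plant counts, the distances to Detroit, and the logarithmic impact function (one of many decreasing functions of distance that a centered index could use). It shows a state-level Herfindahl index staying unchanged when the Ohio and Georgia plants are swapped, while a distance-based centered index falls.

```python
import math

def herfindahl(counts):
    """Sum of squared shares; it ignores where each unit is located."""
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())

def centered_index(counts, distance_to_center, h):
    """Sum over units of a decreasing impact function h of distance to the center."""
    return sum(n * h(distance_to_center[state]) for state, n in counts.items())

# Hypothetical distances (km) from a representative plant in each state to Detroit.
distance_km = {"Michigan": 50.0, "Ohio": 250.0, "Georgia": 1000.0}

# Hypothetical plant counts before and after swapping Ohio and Georgia.
plants = {"Michigan": 10, "Ohio": 8, "Georgia": 2}
swapped = {"Michigan": 10, "Ohio": 2, "Georgia": 8}

def log_impact(d):
    """Assumed impact function for illustration: decreasing in distance."""
    return -math.log(d)

hhi_before, hhi_after = herfindahl(plants), herfindahl(swapped)
cisc_before = centered_index(plants, distance_km, log_impact)
cisc_after = centered_index(swapped, distance_km, log_impact)

print("Herfindahl:", hhi_before, hhi_after)
print("Centered index:", cisc_before, cisc_after)
```

The two Herfindahl values coincide because the swap only permutes the state shares, whereas the centered index registers that most plants are now farther from Detroit.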

2 The Gravity-Based Centered Index of Spatial Concentration

2.1 Main definitions

We start by spelling out the definitions of the main mathematical objects and transformations that are required in constructing our index. Since our main concern is the spatial concentration of a variable – which can be thought of as population, economic activity, etc. – around a center, we start with the definition of centered distributions and subdistributions in Euclidian spaces. For mathematical convenience, we use smooth, positive, compact-support distributions in the space $\mathbb{R}^n$ as an approximation of real world distributions (of population, economic activities, preferences over policy, etc.).

Definition 1 (Centered distribution and subdistribution) A centered distribution is a pair consisting of (i) a positive Lebesgue-dominated distribution µ of compact support on $\mathbb{R}^n$, and (ii) one point C in the interior of the support of µ. C is called the capital point of the distribution.3 A distribution ν is a subdistribution of the distribution µ if ν is dominated by µ, the support of ν is a subset of the support of µ, and ν(A) ≤ µ(A) for every ν-measurable A. We denote ν ≺ µ.

In words, a subdistribution of the distribution µ is a distribution dominated and majored by µ, or equivalently, ν ≺ µ if and only if ν is dominated by µ and both ν and µ − ν are positive distributions. This definition designates a special point of interest, the capital point (e.g. the capital city). It is now straightforward to define the centered index of spatial concentration (henceforth referred to as CISC), which is the object we are ultimately interested in:

Definition 2 (Centered Index of Spatial Concentration) A centered index of spatial concentration (CISC) I is a real function defined on the set of centered distributions, denoted as I(µ, C).

Related to the concept of spatial concentration, we define the required transformations on distributions, namely squeeze (homothety) and rotation. A squeeze brings observations from parts of the distribution closer (when the scaling ratio is positive and less than one) to a center point by the same proportion. A rotation turns parts of the distribution around the center point by the same angle. These (along with reflection, which is not important for our purposes) are fundamental similarity transformations, which preserve the “shape” of objects.

Definition 3 (Squeezes, or homothetic transformations) We denote all relevant squeezes of origin $O \in \mathbb{R}^n$ and ratio $\rho \in \mathbb{R}$ as $S_{(O,\rho)}$. A set-squeeze $S_{(O,\rho)}$ is a mapping within $\mathbb{R}^n$ that maps each set $A$ to $A' = S_{(O,\rho)}(A)$ such that: $X' \in A' \Leftrightarrow \exists X \in A,\ X' - O = \rho(X - O)$. A distribution-squeeze $S_{(O,\rho)}$ is a mapping within SD that maps each spatial distribution $\mu$ to $\mu'$ such that for every µ-measurable set $A$ in $\mathbb{R}^n$, $\mu(A) = \mu'(S_{(O,\rho)}(A))$. A point-squeeze $S_{(O,\rho)}$ is a set-squeeze on singletons in $\mathbb{R}^n$. A circle-squeeze $S_{(O,\rho)}$ is a distribution-squeeze on uniform distributions whose support is a circle of center O (which we will also call an evenly-distributed circle in short).

Definition 4 (Rotations, or orthogonal transformations)4 We denote all relevant rotations of origin $O \in \mathbb{R}^n$ and rotation matrix $M \in \mathbb{R}^{n \times n}$ as $R_{(O,M)}$, where $M$ must be an orthogonal matrix ($M M^t = \mathrm{Id}_n$). A set-rotation $R_{(O,M)}$ is a mapping within $\mathbb{R}^n$ that maps each set $A$ to $A' = R_{(O,M)}(A)$ such that: $X' \in A' \Leftrightarrow \exists X \in A,\ X' - O = M(X - O)$. A distribution-rotation $R_{(O,M)}$ is a mapping within SD that maps each geographic distribution $\mu$ to $\mu'$ such that for every µ-measurable set $A$ in $\mathbb{R}^n$, $\mu(A) = \mu'(R_{(O,M)}(A))$.

Importantly for our purposes, the rotation is an isometry, that is, the distance between two points does not change after they are rotated. As a special case, the rotation does not affect the distance from a point to the center.

3 We use the term “capital point”, and not “center”, to emphasize that said point need not be located at any spatial concept of a center, such as a barycenter or the center of a circle.

4 More precisely, we define orthogonal transformations, of which rotations are a subset in two- and three-dimensional spaces. Our use of the term “rotations” is meant to convey the intuition of what the transformation achieves, but it does not apply, rigorously speaking, when the dimension is higher than three.
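
The two transformations can be illustrated on a finite set of points in the plane. The sketch below is only an illustration under simplifying assumptions: it works with point-squeezes and planar rotations rather than general distributions in $\mathbb{R}^n$, and the function names and sample coordinates are hypothetical. It shows that a squeeze of ratio ρ scales every distance to the origin of the squeeze by ρ, while a rotation leaves distances unchanged.

```python
import math

def squeeze(points, origin, rho):
    """Point-squeeze of origin O and ratio rho: X' - O = rho * (X - O)."""
    ox, oy = origin
    return [(ox + rho * (x - ox), oy + rho * (y - oy)) for x, y in points]

def rotate(points, origin, theta):
    """Planar rotation about O; the matrix [[cos, -sin], [sin, cos]] is orthogonal."""
    ox, oy = origin
    c, s = math.cos(theta), math.sin(theta)
    return [(ox + c * (x - ox) - s * (y - oy),
             oy + s * (x - ox) + c * (y - oy)) for x, y in points]

def dist(p, q):
    """Euclidean distance between two points of the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

capital = (1.0, 2.0)                                # hypothetical capital point
points = [(4.0, 6.0), (-2.0, 2.0), (1.0, -3.0)]     # hypothetical observations

squeezed = squeeze(points, capital, 0.5)
rotated = rotate(points, capital, math.pi / 3)

for p, s, r in zip(points, squeezed, rotated):
    # original distance, distance after the squeeze, distance after the rotation
    print(dist(p, capital), dist(s, capital), dist(r, capital))
```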

2.2 The Basic Properties

At this point, we are sufficiently equipped to state the axioms that will spell out the basic desired properties of a CISC I(µ, C). The first basic property is one of decomposability, which will facilitate the computation and the interpretation of the index. The second property, monotonicity, is the most basic substantive property that a CISC ought to display: the index increases if observations are moved closer to the capital point of interest.

2.2.1 Decomposability

The decomposability property can be understood as follows: suppose our distribution of interest can be partitioned into a certain number of subdistributions. For instance, we may be interested in the spatial distribution of population in a given city, and this population can be split into groups A, B, and C – say, along ethnic lines, or census tracts, or any other arbitrary criterion. It will be very convenient if we can compute the index of concentration separately for each group, and from those three indices be able to obtain the overall index for the entire population. This is what our first axiom ensures:

Axiom 1 (Additivity) For any distribution, and for any partition of this distribution into subdistributions, the CISC is equal to the sum of the indices associated with the subdistributions in the partition. Formally: For all ν and µ in SD, I(ν, C) + I(µ, C) = I(µ + ν, C). Or equivalently: I(µ, C) is such that:5

$$I(\mu, C) = \int h(x, C)\, d\mu,$$

where h(x, C) is an integrable function w.r.t. µ called the impact function.

This axiom states that, in the context of our simple example, the concentration index for the entire city is the sum of the three indices computed for groups A, B, and C. In other words, the index can be (additively) decomposed into the indices attached to any relevant subdistributions into which the main distribution may be partitioned.

The additive component means that each group’s – and by extension, each individual’s – impact on the index is independent of any other group or individual. In that sense, it implies that our index is founded at the individual level: the total measure of concentration can be understood as the sum of the impact of every individual observation in the distribution.6 It is important to note, however, that our results do not rule out interactions between individuals. In fact, in the Appendix we show that it is possible to introduce a generalized version of Axiom 1 that allows for local interactions between individual observations – in combination with the other axioms spelled out below, this version would still imply the same results and the same CISC that we derive here. The current version is nevertheless more convenient in terms of interpretation.

Note also that the second formal expression of Axiom 1 introduces the concept of what we call the impact function. This is another convenient implication of the axiom, which means that the properties of any index can be analyzed in reference to this object.

5 This equivalence can be established formally, yet since it relates mostly to measure theory and has only tangential interest to the theory we are presenting, we choose to omit the proof.

2.2.2 Monotonicity

We have noted that our definition of a centered distribution automatically designates the capital point as a special point of interest. If we think of concentration around this center, it is quite natural that our index should increase when observations are moved closer to that point. Our second axiom is a first step towards establishing this very basic monotonicity property that any CISC should satisfy:

Axiom 2 (Squeeze Monotonicity) The centered index of spatial concentration should increase when the distribution is squeezed around the center. Formally: For any squeeze of µ around C, $S_{(C,\rho)}$ (with ρ < 1), we have:

$$I(\mu, C) < I\left(S_{(C,\rho)}(\mu), C\right) \quad \forall \mu.$$

To illustrate what the axiom means, suppose we are interested in the concentration of population of a given country around the capital city. If all the individuals who live on a 200-mile-radius circumference around the capital move 100 miles closer along the corresponding radius, then any reasonable index should increase as a result of that move. (This is schematically illustrated in Figure 1a.)

FIGURE 1 HERE

6 The approach we follow is thus analogous to the individually disaggregated approach used by Echenique and Fryer (2007) in constructing their segregation index.

Note that Axiom 2 does not imply that the index will increase whenever individual observations are moved closer to the capital point. This is because a squeeze refers to a move along a radius, and as such the axiom does not rule out the possibility that bringing individuals closer to the capital point along some other direction would have a different effect. The following axiom rules out such situations:

Axiom 3 (Isotropy) The centered index of spatial concentration should stay unchanged when the distribution is rotated around the capital point. Formally: For any rotation of µ around C, $R_{(C,M)}$, we have:

$$I(\mu, C) = I\left(R_{(C,M)}(\mu), C\right) \quad \forall \mu.$$

This axiom means that moving observations in any direction, while keeping the distance to the capital point constant, should not affect the index. In that sense, no direction is “special”, once the distance to the capital point is taken into account.7 Within our previous example, this axiom means that if the individuals move along the 200-mile-radius circumference, then the index remains unchanged. (This is illustrated by Figure 1b.)

It is straightforward to see that Axioms 2 and 3, put together, imply that moving individuals closer to the capital point, along any direction, will increase concentration as measured by our index. This guarantees the basic monotonicity property we were after.

2.2.3 First Result: Only Distance Matters

In fact, these two basic properties spelled out in the three axioms above already have a powerful implication: all that matters is the distance between each observation and the capital point of interest. This is the essential content of the following:

Proposition 1 Axioms 1, 2 and 3 define the following class of population concentration indices:

$$I(\mu, C) = \int h(|x - C|)\, d\mu, \tag{1}$$

with h(d) being a decreasing function on $\mathbb{R}_+$ so that the right hand side’s integrand is integrable on $\mathbb{R}^n$.

7 On a technical note, any isometry in the plane can be decomposed as a product of a translation, a rotation and an axis reflection. While translations are not relevant to the purpose of a centered measure (since the center is not the fixed point of the transformation), we have defined our measure’s invariance property with respect to rotations, and it could be easily shown that it is also invariant to axis reflections. When we add squeezes, we cover all congruent transformations on the plane $\mathbb{R}^2$. In sum, our three simple axioms deliver universal invariance properties – in the sense that we cover the space of all centered congruent transformations, which is the first order approximation of all centered regular transformations on the plane.

Proof of Proposition 1.

It is easy to see that an index defined in (1) is additive with respect to the population distribution (Axiom of Additivity). It is rotation-invariant (Axiom of Isotropy) because a rotation around the center does not affect the distance to the center. A change of variable shows that:

$$I(\mu, C) = \int h(|x - C|)\, d\mu = \int h\left(\tfrac{1}{\rho}|x - C|\right) dS_{(C,\rho)}(\mu) \le I\left(S_{(C,\rho)}(\mu), C\right),$$

where the inequality holds because h is decreasing and 1/ρ > 1; thus the Axiom of Squeeze Monotonicity is satisfied.

In reverse, suppose I(µ, C) satisfies the three Axioms. As stated in Axiom 1, additivity implies $I(\mu, C) = \int h(x, C)\, d\mu$. Now consider two points $E$ and $E'$ at the same distance from C. There exists a rotation $R_{(C,M)}$ that maps $E$ to $E'$. Take a distribution $\mu_E$ in a neighborhood of $E$: its image through $R_{(C,M)}$ is located in a neighborhood of $E'$, and is denoted $\mu_{E'}$. The Axiom of Isotropy implies that $I(\mu_E, C) = I(\mu_{E'}, C)$. When we let the distribution $\mu_E$ tend towards a mass point distribution at $E$, $I(\mu_E, C) \to I(E, C)$ and $I(\mu_{E'}, C) \to I(E', C)$. The impact function h must therefore be rotation-invariant as well: it can then be rewritten as a function of the distance from each point x to C, i.e. $h(|x - C|)$. A similar consideration of a neighborhood of $E$ shows that in order to satisfy the Axiom of Squeeze Monotonicity, $h(|E - C|) \le h(\rho|E - C|)$ for all $0 \le \rho \le 1$. It follows that h(d) needs to be a decreasing function.
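
A discrete analogue of equation (1) makes the content of Proposition 1 easy to check numerically. The sketch below is a hedged illustration, not part of the formal argument: it replaces the distribution µ with a handful of unit-mass points in the plane, and the sample coordinates, group labels, and the three candidate impact functions are all assumptions made for the example. For each decreasing h it checks additivity across two groups, invariance under a rotation about the capital, and a strict increase under a squeeze of ratio 0.5.

```python
import math

def index(points, capital, h):
    """Discrete analogue of equation (1): sum of h(|x - C|) over unit-mass points."""
    return sum(h(math.hypot(x - capital[0], y - capital[1])) for x, y in points)

def squeeze(points, origin, rho):
    """Point-squeeze: X' - O = rho * (X - O)."""
    ox, oy = origin
    return [(ox + rho * (x - ox), oy + rho * (y - oy)) for x, y in points]

def rotate(points, origin, theta):
    """Planar rotation about the origin O by angle theta."""
    ox, oy = origin
    c, s = math.cos(theta), math.sin(theta)
    return [(ox + c * (x - ox) - s * (y - oy),
             oy + s * (x - ox) + c * (y - oy)) for x, y in points]

# Hypothetical decreasing impact functions of the distance d > 0.
impact_functions = {
    "negative exponential": lambda d: math.exp(-d),
    "linear": lambda d: -d,
    "logarithmic": lambda d: -math.log(d),
}

capital = (0.0, 0.0)
group_a = [(3.0, 4.0), (-1.0, 2.0)]                  # hypothetical observations
group_b = [(0.5, -6.0), (2.0, 2.0), (-4.0, -3.0)]
population = group_a + group_b

checks = {}
for name, h in impact_functions.items():
    total = index(population, capital, h)
    additive = math.isclose(total, index(group_a, capital, h) + index(group_b, capital, h))
    isotropic = math.isclose(total, index(rotate(population, capital, 1.0), capital, h))
    monotone = index(squeeze(population, capital, 0.5), capital, h) > total
    checks[name] = (additive, isotropic, monotone)
    print(name, checks[name])
```

All three impact functions pass the three checks, which is the sense in which Axioms 1 to 3 leave a large class of admissible indices.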

2.3 Pinning down the Index: A Gravity Model

Proposition 1 shows that the first three axioms already impose considerable restrictions on the class of admissible indices. However, it still leaves us with a fairly large class. After all, any decreasing function of the distance between individual points and the center C will be an admissible impact function h(d): a few obvious suggestions of standard functional forms include a step function, and logarithmic, exponential or polynomial (including linear) functions. In order to pin down a more specific class of measures, we will need a fourth axiom, which we now motivate in the context of a “gravity-based” model of migration.


At this point, two notes are in order. First, this very simple model is not meant to represent a full-fledged attempt at understanding migration; it is meant to illustrate more precisely another property that is reasonable to require from a CISC, across a broad range of possible applications. Second, the “gravity” approach is very useful in that regard, as it points to the generality of the framework. In fact, it yields a direct connection between our index and the literature on international trade, which we will explore further in the paper.

2.3.1 The Model

Consider a country whose population is distributed over a few “cities” and the “countryside” – that is to say, given a gridded map of the country, a few cells in the grid are occupied by cities, and all other cells are considered to be part of the countryside. One of these cities is designated as the capital. Each individual faces a choice of where to live, and from her standpoint, each city has its own peculiar characteristics, which can be advantageous (higher wages and occupation possibilities, economies of agglomeration, political power and rents) or not (communication and transportation costs, congestion), while the countryside is assumed to have uniform properties. More formally, let us denote the capital city as C0 . At any given location T, each individual k has the following baseline utility:

$$u_{k,T} = \bar{u}_T + \varepsilon_k. \tag{2}$$

The first term, $\bar{u}_T$, designates the inherent characteristics of that location, and is thus common to all individuals living at T. The second term, $\varepsilon_k$, is an idiosyncratic variable indicating the “attachment” of individual k to her hometown, and is uniformly distributed on a segment [d, D], where d > 0. In turn, if the individual moves to city i, she gets the following utility:

$$u_{k,i} = \bar{u}_i - f(|T - C_i|). \tag{3}$$

The term $\bar{u}_i$ refers to the fixed attraction exerted by the city $C_i$, linked to its peculiar characteristics. The function f(·) captures the costs of distance to the hometown, in terms of transportation and communication (individuals from farther places bear higher costs of migration and are less likely to migrate), and we assume it is positive and monotonically increasing. Taken together, the two terms can be thought of as representing the “gravitational” pull exerted by the city $C_i$ over individual k, which depends on the characteristics of $C_i$ and on the distance between it and that individual.

Each individual faces a choice over whether to migrate or not, and let us focus for simplicity on the migration choice of individuals living in the countryside.8 Given the uniformity of the countryside, there is no incentive for any individual to move to another countryside location. The same uniformity allows us to normalize $\bar{u}_T = 0$ for all T other than the points denoted by $C_i$.

To further simplify the analysis, let us break down the migration decision into two stages: in the first stage, individuals decide whether to migrate or not, and in the second stage they figure out in which city they will end up, conditional on having decided to migrate. We proceed by backward induction.

In the second stage, we assume that the decision of the destination city depends on fixed factors regarding these cities, so that there is a fixed probability $p_i$ (with $\sum_i p_i = 1$) that the individual would migrate to city i, conditional on her migrating. This probability is a function of city characteristics (including $\bar{u}_i$), and could also depend on individual characteristics, say according to a multinomial logit model, but not on the individual’s distance to the city. This is a restrictive assumption, but it simplifies the analysis and allows us to clearly identify the forces we want to highlight in the model, as will become clear.9 Under these assumptions, the expected utility of migration is then:

$$\sum_i p_i \left(\bar{u}_i - f(|T - C_i|)\right). \tag{4}$$

Back to the first stage, the probability of not migrating is then modeled as:10

$$P_{kT} = 1 - \frac{\sum_i p_i \left(\bar{u}_i - f(|T - C_i|)\right) - d}{D - d}. \tag{5}$$
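
Equations (4) and (5) can be evaluated directly once the primitives are given numbers. The sketch below is a minimal illustration under stated assumptions: the two cities, their attractions and destination probabilities, the linear cost function, and the bounds d and D are all hypothetical values chosen so that the expected utility of migration falls inside [d, D], which is needed for (5) to be a valid probability.

```python
import math

def expected_utility(T, cities, f):
    """Equation (4): sum over cities of p_i * (u_bar_i - f(|T - C_i|))."""
    return sum(
        c["p"] * (c["u_bar"] - f(math.hypot(T[0] - c["pos"][0], T[1] - c["pos"][1])))
        for c in cities
    )

def prob_stay(T, cities, f, d, D):
    """Equation (5); a valid probability only when d <= expected utility <= D."""
    return 1.0 - (expected_utility(T, cities, f) - d) / (D - d)

# Hypothetical cities: position, fixed attraction u_bar, destination probability p.
cities = [
    {"pos": (0.0, 0.0), "u_bar": 10.0, "p": 0.6},   # the capital C_0
    {"pos": (8.0, 0.0), "u_bar": 8.0, "p": 0.4},    # another city C_1
]

def cost(distance):
    """Assumed migration cost f: positive and increasing in distance."""
    return 0.5 * distance

d, D = 1.0, 12.0   # hypothetical bounds of the uniform attachment term

stay_near_capital = prob_stay((1.0, 0.0), cities, cost, d, D)
stay_midway = prob_stay((4.0, 3.0), cities, cost, d, D)
print(stay_near_capital, stay_midway)
```

With these numbers, a countryside resident at (4, 3), five units from both cities, faces a higher expected migration cost than one at (1, 0) and is therefore more likely to stay.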

Now consider a situation where the countryside population is initially distributed uniformly, so that the original density $\mu_T$ is equal to $\bar{\mu}$ for every location T. In equilibrium, the expected number of individuals who stay back at their hometown in the countryside is then:

$$P_T \bar{\mu} = \bar{\mu} \left(1 - \frac{\sum_i p_i \left(\bar{u}_i - f(|T - C_i|)\right) - d}{D - d}\right). \tag{6}$$

By the same token, the expected number of individuals moving to city $C_j$ is:

$$\bar{\mu}\, p_j \left(\frac{\sum_i p_i \left(\bar{u}_i - f(|T - C_i|)\right) - d}{D - d}\right), \tag{7}$$

also a linear function of each distance.

We can thus see that the problem of predicting ex ante who will be migrating to the capital city depends on a complex interaction of characteristics of all cities, which quickly becomes unmanageable if the number of cities is large. However, the model differentiates between the effect of the “gravitational pull” exerted by the capital city (captured by $\bar{u}_0$ and $f(|T - C_0|)$), the attraction of the other cities ($\bar{u}_i$ and $f(|T - C_i|)$ for $i \neq 0$), and the distribution of “attachment” preferences (captured by D and d), all of which will in turn help determine the equilibrium distribution of the population. We propose that the index of concentration around the capital city be a measure of this city’s gravitational pull, which depends on characteristics of the body that exerts it and on its distance to the point under its influence.

8 This is the same as assuming that the two cities are far enough apart that no individual would want to move from one to the other, and that $\bar{u}_0$ and $\bar{u}_1$ are sufficiently greater than $\bar{u}_T$ that no one would want to move from a city to the countryside.

9 Here is where it is important to keep in mind the caveat that it is not our goal to provide a full-fledged model of migration. We should also stress that the influence of changes in distance on the probabilities would in any case be second-order, so our restrictive assumption could also be thought of as representing a first-order approximation to a more complex model.

10 This is where the assumption that the probabilities are not a function of distance simplifies the analysis.

2.3.2 The Axiom of Gravity

As discussed above, the measure of the gravitational pull exerted by the capital point, which we propose as the foundation of our index, requires that we disentangle from the data the impact of the presence of other points. Pushing the gravity analogy further, measuring the gravitational pull exerted by the Sun over Earth requires that we leave aside the influence of other planets, focusing only on the distance between the two bodies and on their characteristics (mass). In other words, our measure of the degree of attraction of $C_0$ should be invariant to changes in the degree of attraction of another city, say $C_1$.

Our model provides a solution to this problem. Note that an increase in the attraction exerted by $C_1$, say due to an increase in $\bar{u}_1$, results in the same change in $P_T \bar{\mu}$ for all locations $T$ at the same distance $r$ from $C_1$. It is thus equivalent to moving a fixed number of individuals, $\bar{\mu}\, p_1 \frac{\Delta \bar{u}_1 - \Delta f(r)}{D - d}$, from each location on a circle $(C_1, r)$ towards $C_1$, in a “circle squeeze” (provided that the whole circle still lies within the support of the distribution).11 It follows that our “gravity-based” measure is to be invariant with respect to such a change. This is captured by the following axiom:

Axiom 4 (Gravity) For any point $T$ other than the capital point $C$, the centered index of spatial concentration should stay unchanged when a uniform subdistribution over a circumference centered on $T$ is squeezed around that point (provided that point $C$ is unaffected). Formally: Any circle-squeeze $S_{(T,\rho)}$, with $\rho < 1$ and arbitrary $T$ in the support of $\mu$, of a sufficiently small circle of center $T$ will not change concentration. That is, $I(\nu + \eta, C) = I(\nu + S_{(T,\rho)}(\eta), C)$ $\forall \eta \prec \mu$, $\mu = \eta + \nu$, such that $\eta$ is an evenly-distributed circle of center $T$ that does not contain $C$.12

In words, suppose we have a uniform subdistribution on a circumference around any point $T$ other than the capital point, and we squeeze the distribution around $T$ in a way that leaves the capital $C$ unaffected. (This is depicted in Figure 2.) The axiom implies that our measure will not be affected, just as a measure of the gravitational pull exerted by the capital point should not be in the context of our model.

FIGURE 2 HERE

11 We also need a technical assumption that each city $C_i$, $i > 0$, only has local influence within a range $R_i$ that is smaller than its distance to the capital city.
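The invariance required by Axiom 4 can be checked numerically in $\mathbb{R}^2$. The following is a minimal sketch under assumed values: the coordinates of $C$ and $T$, the radius, the squeeze factor, and all function names are hypothetical choices made only for illustration. It compares the average impact over a circle around $T$ before and after a squeeze, for a log-distance impact function and for a linear one.

```python
import math

def mean_impact_on_circle(h, center, radius, capital, n_points=720):
    """Average of h(|x - capital|) over n_points evenly spaced on a circle."""
    total = 0.0
    for k in range(n_points):
        angle = 2.0 * math.pi * k / n_points
        x = center[0] + radius * math.cos(angle)
        y = center[1] + radius * math.sin(angle)
        total += h(math.hypot(x - capital[0], y - capital[1]))
    return total / n_points

def log_impact(d):
    return -math.log(d)   # h(d) = alpha * log(d) + beta with alpha = -1, beta = 0

def linear_impact(d):
    return -d             # h(d) = alpha * d + beta with alpha = -1, beta = 0

capital = (0.0, 0.0)      # capital point C (assumed location)
t_point = (3.0, 0.0)      # squeeze center T (assumed location)
radius, rho = 1.0, 0.5    # the circle does not contain C; rho < 1 is the squeeze factor

log_before = mean_impact_on_circle(log_impact, t_point, radius, capital)
log_after = mean_impact_on_circle(log_impact, t_point, rho * radius, capital)
lin_before = mean_impact_on_circle(linear_impact, t_point, radius, capital)
lin_after = mean_impact_on_circle(linear_impact, t_point, rho * radius, capital)

print("log-distance change:", log_after - log_before)
print("linear change:", lin_after - lin_before)
```

Under these assumptions the log-distance average is unchanged up to rounding, while the linear average rises after the squeeze.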

2.4

The Gravity-based CISC

Adding the Axiom of Gravity to our previous set of axioms has quite powerful implications in terms of pinning down a specific CISC. These are immediately apparent in the simple case of the real line R. Axiom 4 would then amount to saying that when two points on the same side of the center came closer to each other at the same “speed”, our index of concentration should remain unchanged. Within the class of indices determined by Proposition 1, it is straightforward to see that this further limits the impact function to be linear: h(d) = αd + β, α < 0. Quite remarkably, we can show (with the help of a well-developed body of harmonic function theory) that this idea extends to any n-dimensional Euclidian space, as our Axiom 4 defines a specific subclass of impact functions within the class defined by Proposition 1:

12 Note that we limit ourselves to local circle-squeezes. It is easy to see that a more ambitious version of Axiom 4, unlimited with regard to the “local circle”, is in direct contradiction with Axiom 2. Indeed, when a circle of center T that contains C in its interior is squeezed around T, all of its points get closer to C, thus population concentration increases according to Axiom 2.


Proposition 2 (Gravity-Based Centered Index of Spatial Concentration (G-CISC)) Within the class of centered spatial concentration indices defined in Proposition 1, Axiom 4 determines the following subclass of indices:

$$ I(\mu, C) = \int h(|x - C|)\, d\mu, \tag{8} $$

with

$$ h(d) = \begin{cases} \alpha d + \beta \quad \forall d > 0,\ \alpha < 0 & \text{if } n = 1 \\ \alpha \log d + \beta \quad \forall d > 0,\ \alpha < 0 & \text{if } n = 2 \\ \alpha d^{2-n} + \beta \quad \forall d > 0,\ \alpha > 0 & \text{if } n \geq 3. \end{cases} $$
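For a discrete distribution (a finite list of points with weights), the integral in (8) becomes a weighted sum. The sketch below is one possible way to evaluate it. The function names `impact` and `g_cisc`, the default choice $|\alpha| = 1$, and $\beta = 0$ are assumptions made here for illustration, not part of the proposition.

```python
import math

def impact(d, n, alpha=None, beta=0.0):
    """Impact function h(d) from Proposition 2 for an n-dimensional space.

    alpha defaults to -1 for n = 1, 2 and to +1 for n >= 3 (assumed defaults
    that respect the sign restrictions of the proposition).
    """
    if d <= 0:
        raise ValueError("h is defined for d > 0 only")
    if n == 1:
        a = -1.0 if alpha is None else alpha
        return a * d + beta
    if n == 2:
        a = -1.0 if alpha is None else alpha
        return a * math.log(d) + beta
    a = 1.0 if alpha is None else alpha
    return a * d ** (2 - n) + beta

def g_cisc(points, weights, capital, alpha=None, beta=0.0):
    """Weighted sum of h(|x - C|) over a discrete distribution."""
    n = len(capital)  # dimension of the space
    return sum(
        w * impact(math.dist(p, capital), n, alpha, beta)
        for p, w in zip(points, weights)
    )

# Two locations in the plane with populations 2 and 1, capital at the origin.
index_2d = g_cisc([(1.0, 0.0), (0.0, math.e)], [2.0, 1.0], (0.0, 0.0))
# One location in three dimensions at distance 2 from the capital.
index_3d = g_cisc([(2.0, 0.0, 0.0)], [1.0], (0.0, 0.0, 0.0))
```

The function does not enforce the sign of a user-supplied `alpha`; a caller who overrides the default is assumed to respect the restrictions stated in the proposition.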

Before tackling the proof of this proposition, let us consider the simplest case of concentration on the line R. In this case, the operation described in Axiom 4 refers to a mean-preserving shrink (or spread) around any point T. It is straightforward to see that the sum of any linear function of distance to C is invariant to such a mean-preserving shrink. On the other hand, for a function of distance to C that is locally strictly convex (resp. concave) at T, the mean-preserving shrink would decrease (resp. increase) the index. We thus easily reach the conclusion of Proposition 2 in the unidimensional case. However, in higher-dimensional spaces, the concept of distance cannot be reduced directly to the one-dimensional case, so we need a much more powerful proof.

Proof of Proposition 2. While the mathematically rigorous proof is relegated to the appendix, we sketch the intuition for the proof, taking $\mathbb{R}^2$ as an example. The proof consists of two steps. First, we will show that Axiom 4 is equivalent to a special condition: the impact function needs to satisfy the Laplace equation, namely that the sum of all second-order partial derivatives must be zero:

$$ \Delta f \overset{\text{def}}{=} \frac{\partial^2 f}{\partial x_1^2} + \frac{\partial^2 f}{\partial x_2^2} + \ldots + \frac{\partial^2 f}{\partial x_n^2} \equiv 0. \tag{9} $$

Second, we show that the functional forms laid out in Proposition 2 are the unique solution to that equation, taking into account the result of Proposition 1.

In the first step, let us consider a squeeze around a point $T$ of coordinates $(x_1, x_2)$. First we pick a quadruple of points evenly spread around the circle of center $T$ and radius $dr$: $(x_1 + dr, x_2)$, $(x_1 - dr, x_2)$, $(x_1, x_2 + dr)$, $(x_1, x_2 - dr)$, and squeeze them towards $T$. The total measure would decrease by:

$$ [h(x_1 + dr, x_2) + h(x_1 - dr, x_2) - 2h(x_1, x_2)] + [h(x_1, x_2 + dr) + h(x_1, x_2 - dr) - 2h(x_1, x_2)]. $$

As $dr \to 0$, the first-order approximation of this expression is 0, while the second-order approximation is $\left( \frac{\partial^2 h}{\partial x_1^2} + \frac{\partial^2 h}{\partial x_2^2} \right) dr^2$. It is straightforward to show that when we pick another quadruple of points evenly spread around $T$, namely $(x_1, x_2) \pm (dn_1, dn_2)$ and $(x_1, x_2) \pm (dn_2, -dn_1)$ (here $dn_1^2 + dn_2^2 = dr^2$, since these points lie on the same circle), the change of the total measure is

$$ \left( \frac{\partial^2 h}{\partial x_1^2} + \frac{\partial^2 h}{\partial x_2^2} \right) (dn_1^2 + dn_2^2) = \left( \frac{\partial^2 h}{\partial x_1^2} + \frac{\partial^2 h}{\partial x_2^2} \right) dr^2 $$

(approximated at second order). When we integrate over all quadruples around the circle, Axiom 4 implies that the total change must equal zero, thus its second-order approximation must cancel out everywhere except the center:

$$ \frac{\partial^2 h}{\partial x_1^2} + \frac{\partial^2 h}{\partial x_2^2} = 0 \quad \forall (x_1, x_2) \neq C. $$

The reverse of this claim (any impact function $h$ satisfying the Laplace equation (9) must satisfy Axiom 4) is proven in the appendix using the formal theory of harmonic functions (functions that satisfy the Laplace equation).

In the second step, we plug the form $h(|x - C|)$ derived from Proposition 1 into (9) and deduce that:

$$ \frac{h''(|x - C|)}{h'(|x - C|)} = -\frac{1}{|x - C|} \;\Rightarrow\; \log h'(d) = -\log d + \text{const} \;\Rightarrow\; h(d) = \alpha \log d + \beta. $$

In case $n = 1$, it is easy to see that the solution of (9) is the linear function of distance. The same deduction would show that:

$$ \frac{h''(|x - C|)}{h'(|x - C|)} = -\frac{n - 1}{|x - C|} \;\Rightarrow\; \log h'(d) = -(n - 1) \log d + \text{const} \;\Rightarrow\; h(d) = \alpha d^{2-n} + \beta, \quad n > 2. $$
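The substitution in the second step can be spelled out as follows. This is a sketch assuming that $h$ is twice differentiable away from $C$; the shorthand $r = |x - C|$ and the coordinate notation $C = (C_1, \ldots, C_n)$ are introduced here only for illustration.

```latex
\begin{align*}
\frac{\partial}{\partial x_k} h(r) &= h'(r)\,\frac{x_k - C_k}{r}, \\
\frac{\partial^2}{\partial x_k^2} h(r) &= h''(r)\,\frac{(x_k - C_k)^2}{r^2}
  + h'(r)\left(\frac{1}{r} - \frac{(x_k - C_k)^2}{r^3}\right), \\
\Delta h(r) &= \sum_{k=1}^{n} \frac{\partial^2}{\partial x_k^2} h(r)
  = h''(r) + \frac{n - 1}{r}\,h'(r).
\end{align*}
```

Setting the last line equal to zero gives $h''(r)/h'(r) = -(n-1)/r$, which is the ratio used in the derivation for $n = 2$ and for $n > 2$.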

In short, our impact function has to be harmonic, meaning that the sum of the second derivatives with respect to all variables has to be zero. The intuition is that the type of squeeze that is considered by Axiom 4 is one where the “average” (in the sense of harmonic mean) of the distances to the center is left unaffected, and a function that displays this property is a harmonic function.13 Note in particular that in R2 , which is the case on which we will focus for our empirical implementation of the index, the limit case yields a log-linear function, h(d) = α log d + β.

13 The use of “harmonic mean” is precise for the three-dimensional case; in the two-dimensional case it is the geometric mean.


2.4.1

Normalization

A crucial feature of the gravity-based CISC that is defined in Proposition 2 is its flexibility: as long as it is applied to a distribution µ that is defined over a Euclidian space of any dimension, its desirable properties are ensured. This flexibility means that in any application it will be possible to shape the index in order to make it most suitable to the specific goals of the analysis. This can be done both by conveniently redefining the distribution under analysis and by making use of the degrees of freedom afforded by Proposition 2 with regard to the choice of parameters α and β. In order to see this more clearly, and motivated by our empirical implementation, let us fix ideas by focusing on a situation where our index is applied to the concentration of the population of a given geographic unit of analysis (e.g. a country) around a point of interest (e.g. the capital city), in two dimensions – which means that the impact function is given by h(d) = α log d + β. (It is straightforward to extend the following analysis to other contexts.) In this context, we can think of any given country as a centered distribution (µ, C). If we compute the G-CISC for this distribution, it will in fact incorporate a wealth of information from the distribution µ – for instance, on population size and geographical size: the index measured for Country A may be higher than that of Country B simply because the former has a larger population. However, in many cases it would be convenient to be able to disentangle the effects of population size, geographical size, and concentration per se. In addition, we would like to have an easily interpretable scale. A convenient way would be to restrict the index to the [0, 1] interval, with 0 and 1 representing situations of minimum and maximum concentration, respectively. 
The latter can be defined simply as a situation in which the entire population is located in the center of interest; the former is a case in which it is located as far from the center as possible, where “as far as possible” is suitably defined. For these purposes, we want a normalized G-CISC. Indeed, we can proceed with this normalization in two steps: (1) normalize the distribution $\mu$, transforming it into a distribution $\mu'$ that contains only the information we are interested in; and (2) set the parameters α and β. The specifics of each of these steps will depend on how the scale is defined, and in what follows we discuss a few benchmark examples:


Maximum distance across units of analysis A first approach is to set the minimum concentration based on the maximum possible distance between a point and the capital city in any of the countries for which the index is to be computed. In this case, the index is evaluated at zero if the entire population lives as far away from the center as it is possible to live in any country.14 (As a result, only one country, the one where this maximum distance is registered, could conceivably display an index equal to zero.) This will be appropriate if we want to compare each country’s concentration against a single benchmark. In order to achieve this, the two steps are:

1. Normalize the distribution by dividing $\mu$ by population size, $\int d\mu$. This implies that we will be taking each country to have a population of size one, thus separating population size from concentration per se.

2. Set:

$$ (\alpha, \beta) = \left( -\frac{1}{\log(d)},\ 1 \right) $$

where $d \equiv \max_{x_i, i} |x_i - C_i|$ is the maximum distance between a point and the center in any country. This means that we take the (logarithm of the) largest distance between a point and the capital in any country to be equal to one.

Maximum distance within unit of analysis Another possibility is to evaluate the index at zero for a given country if its entire population lives as far away from the capital as it is possible to be in that particular country. This is appropriate if we want to compare each country’s actual concentration to what its own conceivably lowest level would be. With that in mind, the two steps are now:

1. Normalize the distribution by dividing $\mu$ by population size, $\int d\mu$, and by $\log(d_i)$, where $d_i \equiv \max_x |x - C_i|$ is the maximum distance between a point in country $i$ and that country’s capital. This means that we not only take each country’s population to be of size one, but also that we take the (logarithm of the) largest distance between a point and the capital of that country to be one as well.

2. Set: $(\alpha, \beta) = (-1, 1)$.

14 For instance, in the cross-country implementation later in the paper, the largest recorded distance from the capital city within any country is between the westernmost tip of Hawaii and Washington, DC, in the United States.

2.4.2

Classification of Indices

In addition to defining the class of measures described in Proposition 2, the theory of harmonic functions also enables us to classify all the indices that satisfy Proposition 1. This in turn lets us evaluate the basic properties of any such index. To do so, let us first define the following:

Definition 5 Super-, just- and sub-concentrated indices: A centered index of spatial concentration satisfying Axioms 1-4 is called a just-concentrated index. A centered index of spatial concentration is called a super-concentrated index if it satisfies Axioms 1-3, and the following inequality: $I(\nu + \eta, C) \leq I(\nu + S_{(T,\rho)}(\eta), C)$ $\forall \eta \prec \mu$, $\mu = \eta + \nu$, for any circle-squeeze $S_{(T,\rho)}$, with $\rho < 1$ and $\eta$ an evenly-distributed circle of center $T$ with the interior of its circle included in the domain $\mathbb{R}^2 \setminus B(C, 1)$. A centered index of spatial concentration is called a sub-concentrated index if it satisfies Axioms 1-3, and the following inequality: $I(\nu + \eta, C) \geq I(\nu + S_{(T,\rho)}(\eta), C)$ $\forall \eta \prec \mu$, $\mu = \eta + \nu$, for any circle-squeeze $S_{(T,\rho)}$, with $\rho < 1$ and $\eta$ an evenly-distributed circle of center $T$ with the interior of its circle included in the domain $\mathbb{R}^2 \setminus B(C, 1)$.

Within the terms of Axiom 4 (as depicted in Figure 2), an index is super-concentrated (resp. sub-concentrated) if a squeeze around a point $T$ weakly increases (resp. decreases) concentration. Clearly, a just-concentrated index, which is characterized in Proposition 2, is both super- and sub-concentrated. While it can be the case that an index satisfying Axioms 1-3 does not fall in any of the three categories, it can be shown that every such index can be locally classified into one of the categories, and that these correspond to whether the function is sub- or super-harmonic.15 This is what we do in Proposition 3, which is relegated to the Appendix.

15 More precisely, the index must belong to one category within an annulus $\{x \mid \underline{r} < |x - C| < \bar{r}\}$.
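Since the classification turns on whether the impact function is sub- or super-harmonic, the sign of its Laplacian can be inspected numerically. The sketch below does this in $\mathbb{R}^2$ for a radial function, using the radial form $h''(\theta) + h'(\theta)/\theta$ of the Laplacian and central finite differences. The function names, the step size, the tolerance, and the evaluation point $\theta = 2$ are assumptions chosen for illustration.

```python
import math

def radial_laplacian_2d(h, theta, step=1e-4):
    """Finite-difference estimate of h''(theta) + h'(theta) / theta in the plane."""
    first = (h(theta + step) - h(theta - step)) / (2.0 * step)
    second = (h(theta + step) - 2.0 * h(theta) + h(theta - step)) / step ** 2
    return second + first / theta

def classify(h, theta, tol=1e-5):
    """Label h at distance theta by the sign of its estimated Laplacian."""
    value = radial_laplacian_2d(h, theta)
    if abs(value) <= tol:
        return "harmonic"
    return "superharmonic" if value < 0 else "subharmonic"

# Candidate impact functions, all decreasing in distance.
candidates = {
    "log": lambda t: -math.log(t),
    "linear": lambda t: -t,
    "inverse": lambda t: 1.0 / t,
}

labels = {name: classify(h, 2.0) for name, h in candidates.items()}
print(labels)
```

A local check of this kind only describes the function near the chosen distance; the annulus in footnote 15 is the region over which a single label is meant to hold.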


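We thus have a general classification of the class of indices described by Proposition 1. To illustrate this, we can consider the particularly relevant case of $\mathbb{R}^2$. While Proposition 2 has established that only indices with a log-linear impact function are just-concentrated, we can now consider a few other possible impact functions:

Corollary 2.1 If the impact function $h(\theta) \equiv h(|x - C|)$ is linear ($-\theta$), negative-power ($-\theta^{\lambda}$, $\lambda > 0$), or negative-exponential ($-e^{\lambda\theta}$, $\lambda > 0$), then the index $I$ is super-concentrated. If $h(\theta)$ is positive-power ($\theta^{-\lambda}$, $\lambda > 0$), then the index $I$ is sub-concentrated. If $h(\theta)$ is positive-exponential ($e^{-\lambda\theta}$, $\lambda > 0$), then the index $I$ is super-concentrated when $\theta \leq \frac{1}{\lambda}$, and sub-concentrated otherwise, since the Laplacian of $e^{-\lambda\theta}$ in $\mathbb{R}^2$ is $\lambda e^{-\lambda\theta}\left(\lambda - \frac{1}{\theta}\right)$, which is non-positive exactly when $\theta \leq \frac{1}{\lambda}$.

2.4.3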

Additional Properties

In addition to the desirable properties guaranteed by the axioms, let us briefly discuss a couple of additional features displayed by our index.

Gravity Revisited The G-CISC displays an additional property that helps illustrate its nature. To understand this property, consider the following situation, depicted in Figure 3. To fix ideas, let us imagine that it represents a country with two cities, C and T, with the former being the capital. These two cities, which represent mass points of population, stand r kilometers apart, and the rest of the population of the country is uniformly distributed over the circumference of radius r. Now suppose the circumference is shifted (keeping C fixed), so that the two cities, C and T, move closer to each other; or that it is shifted in the other direction, so that the two cities are now further apart. It is natural to expect that the former move increases the measure of concentration around the capital point, whereas the latter decreases it.

FIGURE 3 HERE

It can be shown that our measure satisfies this property:

Property 1 Let the distribution µ be the sum of a uniform circle distribution ν on a circumference (C, r) and a mass point at T on (C, r). A translation of C towards T must (weakly) increase concentration, and a translation of C away from T must (weakly) decrease concentration.

In fact, we can go further and show that this property is an immediate consequence of – and is indeed equivalent to – the Axiom of Gravity. (The proof is shown in the Appendix.)

Robustness to Measurement Errors In practice, the information used to compute the index typically comes in a grid format, where we only know the aggregate information of each cell. This introduces a source of measurement error. A just-concentrated index, such as the G-CISC, is unaffected by such measurement error, thanks to the Axiom of Gravity. Indeed, if the population is symmetrically distributed within each cell, the Axiom of Gravity implies that we could replace that population with one mass point at the center.16 Meanwhile, a super-concentrated index would be biased upwards, and a sub-concentrated index would be biased downwards.

3

Discussion

Having characterized our gravity-based CISC, we can now elaborate on its interpretation in a number of directions.

3.1

Comparison with Other Indices

3.1.1

Comparison with other CISCs

The first obvious comparison is to other centered indices of spatial concentration used in the literature – such as the one in Galster et al. (2001), for instance, who use the inverse of the sum of the distances of each observation to the center as a measure of “centrality” in the context of studying urban sprawl. While certainly useful, such ad hoc approaches are limited by what an intuitive grasp of the properties of a given space will provide. This intuitive grasp quickly reaches its limits when we move away from specific, concrete applications. Our axiomatic approach, in contrast, guarantees basic desirable properties for our index in any n-dimensional Euclidian space, which ensures wide applicability to any situation that can be mapped onto such a space. In addition, our approach provides a unified framework within which to compare such indices and assess their properties, as shown by the classification system presented in the previous section.

16 This is due to the Mean Value Property of harmonic functions, as developed in the proof to Proposition 2 in the Appendix.


Our uniqueness result guarantees that any other measure will violate one or more of the desirable properties that we have spelled out. While it might be the case that specific applications could justify such violations, our approach ensures properties that are robust across different applications. This implies that our index can be applied to many different contexts and provide a single “language” to codify the concept of spatial concentration around a point. In that sense, our G-CISC is analogous to the Gini coefficient as a measure of inequality: it might be less suited than some other measure within a given specific application, but it has robust properties that make it a good measure across a wide variety of applications, and as such it provides a universal language to talk about inequality.

Besides these theoretical points, the ultimate proof has to be in the empirical pudding. The next section, in which we exemplify the implementation of the G-CISC, will compare our index with alternative measures in terms of how they capture the concept of spatial concentration around a point in practice.

3.1.2

Comparison with Non-Centered Indices of Concentration

Besides the obvious distinction that our index is built on the concept of a particular “center”, it also contains considerably more spatial information than non-centered indices. For instance, let us consider the many indices of concentration that are based on measures first designed to deal with inequality – such as the Gini coefficient and information-based indices (e.g. entropy).17 Such measures do not take into account the actual spatial distribution. Indeed, consider a thought experiment where half of the cells contain exactly one individual observation, and the other half contain zero observations. Such indices do not make any distinction between a situation in which the former cells are all in the East and all the latter cells are in the West and another situation in which both types are completely mingled together in a chess-board pattern. Generally speaking, this type of measure fails to take into account the relative positions between the cells, which can be highly problematic in many circumstances.18 In addition, these measures are also highly non-linear with respect to individuals, because they contain a function of the cell distribution. In that sense, they are not grounded at the individual level in the way our G-CISC is.

These differences are highlighted by the fact that we can actually derive a non-centered index from our centered measure. We can do so by averaging the centered index over all possible centers (i.e. all points) within the support of the distribution. This could be computationally intensive, but feasible in most applications. Such an aggregate non-centered index still satisfies generalized versions of the Axioms of Additivity, Concentration and Isotropy. In fact, the non-centered index could be derived from a set of four axioms, the first three of which coincide with the axioms for the centered index, and the last one is as follows: The change due to a squeeze around any point T does not depend on the distribution of individual observations that are sufficiently far away (i.e. outside the support of the squeeze). This axiom provides a form of spatial decomposability for the index: when a space is partitioned into two regions, the changes in the index with respect to squeezes are the sum of changes coming from the two regions. These four axioms in fact pin down two different functional forms: the log-distance and the squared-distance, each having different additional properties. For now, we leave this subject for future work, but in any case it is not possible to follow the reverse path and obtain a centered index from a given non-centered measure. This underscores the versatility of our approach.

17 For instance, the Gini coefficient is used, inter alia, in the context of economic geography (Krugman 1991, Jensen and Kletzer 2005), studies of migration (Rogers and Sweeney 1998), and political economy (Collier and Hoeffler 2004).
18 The same can be said of cruder measures of concentration, such as population density. A related index that does use spatial information is the measure of compactness developed by Fryer and Holden (2007).

3.1.3

Comparison with Related Indices: Inequality, Segregation, Polarization

Finally, it is worth noting the links between our index and other indices designed to measure other aspects of distributions, be they spatial or not. The connection with inequality measures has already emerged from the very fact that such measures are used to capture spatial concentration. It can be pushed further when we realize that a Lorenz-curve-type concentration curve could be formed from spatial distributions: taking the example of population, we can sort individuals according to the distance from the center within which they are located. These curves could be ranked when one dominates another, which can be seen along the lines of Proposition 1: if one distribution “Lorenz”-dominates another, the former’s corresponding index would be higher than the latter’s, for any functional form of the impact function satisfying the specification of Proposition 1. It is well-known, however, that this order is not complete. In that regard, our index implies a population-weighted measure for a complete ranking of concentration curves, which in two-dimensional space is based on the log-scale of distance. This connection is exactly akin to the one that is emphasized by Echenique and Fryer (2007) with respect to their segregation index: any index that intends to rank distributions that are not in a Lorenz-dominance relationship implies choosing a weighting system, and our axioms give us a well-founded reason to prefer one system to other alternatives.

Our index also has an illuminating link to the axiomatic measure of polarization devised by Esteban and Ray (1994) and Duclos, Esteban and Ray (2004) (henceforth DER). Just as they do, we set out an additive functional form (with Axiom 1), and then provide a set of axioms that determine the class of indices of interest. More substantially, if we think of the opposite of our concept of concentration around a center, say, “dispersion around a center”, it would be quite similar to their concept of polarization. Indeed, in the one-dimensional space $\mathbb{R}^1$ to which they limit themselves, to the extent that there is a natural point of interest that could be considered the capital point, our concept is really close to the idea of polarization in that a move away from the center increases polarization as well as dispersion from the center.19 In that sense, our index generalizes their approach to n-dimensional spaces, in case there is a capital point of interest to be considered: our index satisfies all of DER’s axioms.20

3.2

Trade, Gravity and Potential

While the “gravity” moniker is likely to elicit images of stars and planets for some, the term may bring a different association when it comes to trade economists. This is not by accident, and we can make quite explicit the connection between our theory and the gravity equation, used to estimate trade flows as a function of GDP, distance and other barriers to trade.21 More precisely, we will show that our G-CISC can be linked to the concept of “remoteness”, or “multilateral resistance” in trade theory, and in the process, we show that our index illuminates the possible theoretical foundations of the gravity equation.

19 The only difference is in DER’s axiom 2, where a consolidation of polarized “peaks” actually increases polarization.
20 More precisely, with respect to their axiom 2, our index does not change when non-local squeezes are performed, according to our Axiom 4.
21 See Anderson and van Wincoop (2003) for a comprehensive exposition of and a rationale for the gravity equation, which they describe as “one of the most empirically successful in economics” (p. 170).

Let us start with a standard microfoundation of the gravity equation, as presented in Anderson and van Wincoop (2003). Suppose each representative consumer in country $j$ consumes $c_{ij}$ amount of the good produced in country $i$, which generates utility

$$ u_j = \left( \sum_i \left( \frac{c_{ij}}{\beta_i} \right)^{\frac{\sigma - 1}{\sigma}} \right)^{\frac{\sigma}{\sigma - 1}} $$

subject to the constraint $\sum_i p_{ij} c_{ij} = y_j$, where $\sigma$ is the elasticity of substitution between the consumed goods from different sources. In the limit case of $\sigma = 1$, we have a Cobb-Douglas utility function. We further assume an iceberg transport cost model with $p_{ij} = p_i t_{ij}$, where $p_i$ is the supply price for the good produced in country $i$, and $t_{ij}$ varies with distance between the two countries: $t_{ij}$ is usually modeled as a polynomial function of distance, times other barriers. Denote the nominal value of exports by $x_{ij} = p_{ij} c_{ij}$; then from the optimizing behavior of firms and consumers we obtain the following formula for the flow of trade from country $i$ to country $j$ (in logs):

$$ \log x_{ij} = \log y_i + \log y_j - \log y^W + (1 - \sigma) \log t_{ij} - (1 - \sigma) \log P_i - (1 - \sigma) \log P_j, \tag{10} $$

with $P_j = \left( \sum_i (\beta_i p_i t_{ij})^{1 - \sigma} \right)^{\frac{1}{1 - \sigma}}$ being the price index, and representing the parameter of multilateral resistance. The index is determined in equilibrium by the following system:

$$ P_j^{1 - \sigma} = \sum_i \theta_i \left( \frac{t_{ij}}{P_i} \right)^{1 - \sigma}, \tag{11} $$

with $\theta_i = \frac{y_i}{\sum_k y_k} = \frac{y_i}{y^W}$ being the share of income of country $i$.

In order to show the link with our index, consider the limit case of a Cobb-Douglas utility function, i.e. $\sigma = 1$.22 The system that determines the price indices in equilibrium could be rewritten in a log-linear form, from (11):

$$ \log P_j = \sum_i \theta_i (\log t_{ij} - \log P_i) \quad \forall j. \tag{12} $$

The solution for $\log P_j$ is then:

$$ \log P_j = \sum_i \theta_i \log t_{ij} - \frac{1}{2} \sum_{i,j} \theta_i \theta_j \log t_{ij}. \tag{13} $$

This formula shows how our index is related to the concept of multilateral resistance in a micro-founded gravity equation. The first term, $\sum_i \theta_i \log t_{ij} \equiv -CISC_j$, represents (minus) the centered index of income concentration calculated with country $j$ taken as the capital point of interest, when $t_{ij}$ is modeled as a power of distance. The second term, $\sum_{i,j} \theta_i \theta_j \log t_{ij} = \sum_j \theta_j \left( \sum_i \theta_i \log t_{ij} \right) = -\sum_j \theta_j CISC_j$, is (minus) the weighted average of each country’s concentration. This happens to be equal to $-2\,CISC$, where $CISC$ is our proposed non-centered index of concentration, as described in the previous section.23 In sum, we have:

22 A treatment of the general CES case is available upon request.
23 The factor of 2 comes about because our index counts each pair of countries only once, without double counting.

$$ \log P_j = CISC - CISC_j \tag{14} $$
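As a numerical check on (12)-(14), the sketch below computes $\log P_j$ from the closed form (13) for a made-up three-country example and verifies that it solves the log-linear system (12). The income shares, the matrix of log trade costs, and the function name `multilateral_resistance` are all hypothetical values chosen for illustration; the check relies on the shares summing to one.

```python
def multilateral_resistance(theta, log_t):
    """Closed-form log price indices from equation (13).

    theta: income shares summing to one.
    log_t: matrix with log_t[i][j] = log of the trade cost from i to j.
    """
    n = len(theta)
    # First term of (13): sum_i theta_i * log t_ij, which equals -CISC_j.
    centered = [sum(theta[i] * log_t[i][j] for i in range(n)) for j in range(n)]
    # Double sum of (13): sum_{i,j} theta_i * theta_j * log t_ij, which equals -2 CISC.
    world = sum(theta[j] * centered[j] for j in range(n))
    return [centered[j] - 0.5 * world for j in range(n)]

# Hypothetical three-country world.
theta = [0.5, 0.3, 0.2]
log_t = [
    [0.0, 0.4, 0.9],
    [0.4, 0.0, 0.6],
    [0.9, 0.6, 0.0],
]

log_p = multilateral_resistance(theta, log_t)

# Residuals of system (12): log P_j - sum_i theta_i * (log t_ij - log P_i).
residuals = [
    log_p[j] - sum(theta[i] * (log_t[i][j] - log_p[i]) for i in range(3))
    for j in range(3)
]
print(log_p, residuals)
```

In this sketch, `centered[j]` plays the role of $-CISC_j$ and `0.5 * world` that of $-CISC$, so each returned value is $CISC - CISC_j$ as in (14).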

In words, the multilateral resistance exerted by country j can be understood as the difference between the world concentration of income and the concentration of world income around that country. If world income is not very concentrated around country j, it follows that this country exerts a high level of multilateral resistance: the world income is much more spatially concentrated around Mexico than around South Africa, for instance, hence the multilateral resistance exerted by the latter is greater. Our index thus makes clear the interpretation of remoteness as a low spatial concentration of world income around the country.

The trade connection is not the end of our gravity analogy. In fact, an interesting aside is that our index is closely related to the concept of potential in physics, of which gravitational potential is an example. It refers to the potential energy stored within a physical system – the gravitational potential is the stored energy that results in forces that could move objects in space. In fact, in our “real” three-dimensional space, the potential at a point with respect to a mass is the sum of the inverse distances from that point to each location in the mass.24 Now remember that in $\mathbb{R}^3$ our index is also the weighted sum of the inverse of distance ($d^{2-n} = d^{-1}$): it coincides exactly with the physical concept of potential. Our gravity-based index is therefore, roughly speaking, a measure of the potential associated with the capital point of interest.

The idea of potential is also present in the literature on economic geography (Fujita et al. 1999), which uses the concept of “market potential”. While the formal link between this idea and the concept of potential in physics is, to the best of our knowledge, not explicit, market potential is usually conceptualized as a weighted measure of income (or purchasing power), with the weights being inversely related to distance. Our concept of spatial concentration, when applied to income, can be thought of as a measure of market potential, but with the weights having been axiomatically obtained. In sum, our index unearths a relationship between the economic concepts of the gravity equation and of market potential and the physical concepts of gravitation and potential.

24 Since the gravity force is the derivative of the potential, it is proportional to the weight, as well as the inverse of squared distance, as one may recall from the classical Newtonian representation.


4

Application: Population Concentration around Capital Cities

Having established our G-CISC and discussed its properties, we now move on to illustrate its applicability in practice. We focus on the distribution of population around capital points of interest – capital cities across countries and across US states, and the political center (e.g. the location of city halls) in US metropolitan areas. We will discuss descriptive statistics and basic correlations with variables of interest, and also how the index can be used to shed light on competing theories, using as an example the determinants of the location of capital cities.

4.1

Cross-country implementation

In our first application, we calculate population concentration around capital cities across countries in the world. We use the database Gridded Population of the World (GPW), Version 3 from the Socio-Economic Data Center (SEDC) at Columbia University. This dataset, published in 2005, contains the information for the years 1990, 1995 and 2000, and is arguably the most detailed world population map available. Over the course of more than 10 years, these data are gathered from national censuses and transformed into a global grid of 2.5 arc-minute side cells (approximately 5km), with data on population for each of the cells in this grid.25
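As a rough check on the cell size quoted above, and as a starting point for computing distances from each cell to a capital, the following Python sketch converts an angular cell side into kilometers and computes great-circle distances. The mean Earth radius of 6371 km, the spherical (haversine) formula, and the function names are assumptions of this sketch; the description of the data does not specify how distances are to be computed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius: an assumption of this sketch


def cell_side_km(arc_minutes, latitude_deg=0.0):
    """Return the (north-south, east-west) extent in km of a grid cell."""
    angle = math.radians(arc_minutes / 60.0)
    north_south = EARTH_RADIUS_KM * angle
    # Meridians converge toward the poles, so the east-west side shrinks.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


ns_equator, ew_equator = cell_side_km(2.5)
ns_60, ew_60 = cell_side_km(2.5, latitude_deg=60.0)
quarter = haversine_km(0.0, 0.0, 0.0, 90.0)  # a quarter of the equator

print(round(ns_equator, 2), round(ew_60, 2))
```

At the equator a 2.5 arc-minute side comes out at a little over 4.6 km, consistent with the "approximately 5km" figure; at 60 degrees of latitude the east-west side is about half of that.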

We compute two different versions of our population G-CISC, which as a shorthand we call “Population Concentration Index” (PCI). The first version ($PCI_1$) is normalized by the maximum distance across countries, and the second version ($PCI_2$) is normalized by the maximum distance within the country, both as described in section 2.4.1: the former captures concentration relative to what it could possibly be in any country, while the latter captures concentration relative to what it could possibly be in that specific country.26

25 We limit our analysis to countries with more than one million inhabitants, since most of the examples with extremely high levels of concentration come from small countries and islands. The results with the full sample are very similar, however, and are available upon request.

26 While the non-normalized measure may be of some interest in itself, we do not report it because of its extremely high correlation with population size, which prevents us from disentangling any independent effect.

4.1.1

Descriptive Statistics

Table 1 shows the basic descriptive statistics for the two versions of the index, for the three years in the sample, and Table 2 presents their correlation. The first remarkable fact is that
there is very little variation over that span of time: the autocorrelation is extremely high, and almost all variation comes from the cross-country dimension. This suggests that the pattern of population distribution is fairly constant within each country, and that a period of 10 years may be too short to see important changes in that pattern. For this reason, we choose to focus on one of the years; we choose 1990 because it is the one that has the highest quality of data, as judged by the SEDC.

[TABLES 1 AND 2 HERE]

Tables 1 and 2 also present the descriptive statistics and correlation with two alternative measures of concentration. The first alternative is the Gini coefficient, a non-centered measure that is often used in the literature, and the second one is the inverse of the average distance (“Inv Avg Dist”), which provides a benchmark for comparison with another centered index. The first thing to note with regard to this comparison is that the appropriate benchmark G-CISC is $PCI_1$, and not $PCI_2$, since neither Gini nor Inv Avg Dist, as usually used in the literature, normalizes by the geographical size of each country.

The striking fact that immediately jumps out from Table 2 is that our index captures a very different concept from what the Gini coefficient is capturing: they are negatively correlated. This underscores the point that typical measures of concentration are ill-suited for getting at the idea of concentration around a given point. This point becomes even more striking when we compare the lists of countries with very high and very low levels of concentration, which are displayed in Table 3. We can see that the list of the countries whose population is least concentrated around their capital cities accords very well with what was to be expected: these are by and large countries where the capital city is not the largest city. (The exceptions are Russia, on which we will elaborate later, and the Democratic Republic of the Congo, formerly Zaire, whose capital is located in the far west corner of the country.) By the same token, the list of highly concentrated countries is quite intuitive as well, with Singapore leading the way.

The same list for the Gini coefficient, in contrast, surely helps us understand why the correlation between the two is negative. It ranks very highly countries that have big territories and unevenly distributed populations. While this concept of concentration may of course be useful for many applications, it is quite apparent that using non-centered measures of concentration can be very misleading if the application calls for a centered notion of concentration.27

[TABLE 3 HERE]

On the other hand, Table 2 shows the correlation between $PCI_1$ and the alternative centered index Inv Avg Dist to be positive and relatively high, as was to be expected, though not overwhelming. Nevertheless, there are very important empirical differences between the two, in addition to the conceptual properties that our axiomatic approach guarantees. The first such difference can be seen from Figure 4, which plots histograms of both indices. We can see from the figure that the distribution of Inv Avg Dist is very skewed, whereas our measure has a more compelling bell-shaped distribution. This implies that our measure is generally less sensitive to extreme observations.

A second important difference can be illustrated by considering a specific comparison, between Brazil and Russia. Russia’s capital, Moscow, is the country’s largest city, and is located about 600km (slightly less than 400 miles) from the country’s second largest city, St Petersburg. In contrast, Brazil’s capital, Brasília, is now the country’s sixth largest city, and is around 900km (more than 550 miles) away from the country’s largest cities, São Paulo and Rio de Janeiro, whose combined metropolitan area population is about ten times as large as Brasília’s.28 One would thus be led to expect that a measure of population concentration around the capital city would rank Russia ahead of Brazil. Table 3 shows that this is the case with our $PCI_1$, but not with Inv Avg Dist. The reason why Inv Avg Dist paints this relatively distorted picture is that it gives a larger weight to people who are very far from the capital point of interest; roughly speaking, it gives a relatively large weight to people who are in Vladivostok.
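The mechanism behind this ranking reversal can be illustrated with a stylized Python sketch. It compares the inverse of the average distance with a logarithmic index of the form suggested by Proposition 2 for the plane. The normalization used here (one minus the population-weighted mean of log distance divided by the log of a maximum distance), the function names, and the two hypothetical countries are assumptions made purely for illustration; they are not the data or the exact normalization of section 2.4.1.

```python
import math


def inv_avg_dist(distances, shares):
    """Inverse of the population-weighted average distance to the capital."""
    total = sum(shares)
    average = sum(s * d for d, s in zip(distances, shares)) / total
    return 1.0 / average


def log_index(distances, shares, d_max):
    """Assumed normalization: 1 - weighted mean of log(d) / log(d_max)."""
    total = sum(shares)
    mean_log = sum(s * math.log(d) for d, s in zip(distances, shares)) / total
    return 1.0 - mean_log / math.log(d_max)


D_MAX = 7000.0  # hypothetical maximum distance across countries, in km

# Hypothetical country A: 80% of people 50 km from the capital, 20% 6000 km away.
a_dist, a_share = [50.0, 6000.0], [0.8, 0.2]
# Hypothetical country B: everyone 900 km from the capital.
b_dist, b_share = [900.0], [1.0]

inv_a, inv_b = inv_avg_dist(a_dist, a_share), inv_avg_dist(b_dist, b_share)
log_a, log_b = log_index(a_dist, a_share, D_MAX), log_index(b_dist, b_share, D_MAX)

print("Inv Avg Dist ranks A above B:", inv_a > inv_b)
print("Log index ranks A above B:", log_a > log_b)
```

In this toy case the distant fifth of country A's population dominates its average distance, so Inv Avg Dist ranks A below B, while the logarithmic impact function dampens the influence of that distant group and ranks A above B.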
Because of this weight on distant people, Inv Avg Dist tends to be pushed down for countries with big territories; in fact, this tendency to give extra weight to outliers is also behind the skewed distribution that we pointed out in the preceding paragraph. Our measure corrects for this tendency, and that is why it produces a more intuitive ranking.

Finally, we also note an interesting pattern emerging from Table 3, regarding the “size-normalized” version of our G-CISC, $PCI_2$: the countries with the most concentrated populations seem to be fairly small ones (in terms of territory). This does not arise from “mechanical” reasons, first of all because the measure is normalized for size: the pattern suggests that the population of relatively small countries is more concentrated than that of large ones, relative to what it could be. In addition, while the measure for these countries may be less precise because of their small size, and the consequently smaller number of grid cells, we know that our index is unbiased under classical measurement error. We will explore this pattern more systematically in our regression results.

27 It is also worth noting that a measure such as Gini is quite sensitive to how “coarse” the grid that is being used to compute the index is: the fewer cells there are, the lower the Gini will tend to be. Our index, on the other hand, has the “unbiasedness” feature that we have already discussed.

28 According to official data, the metro area populations of São Paulo, Rio de Janeiro, and Brasília are around 19 million, 12 million, and 3 million, respectively.

4.1.2

Regression analysis

We can also investigate the correlation patterns of our index with several variables of interest. We will stop short of providing a discussion of causal inference, as it falls outside the scope of this paper, but we can nevertheless provide some interesting results that can be built upon by future research.

Economic variables. We start by regressing PCI on a number of economic variables of interest.29 The results are described in Table 4. The first thing to note is that there is a negative correlation between land area and concentration around the capital city: countries with larger territories have populations that are less concentrated around the capital. This correlation is robust to the inclusion of a number of controls. It is also worth noting that the correlation between land area and concentration is positive when the latter is measured by the Gini coefficient, which is not surprising in light of Table 3, but nevertheless underscores the point that using Gini as a proxy for concentration of population around the capital city is deeply misleading.

29 All of the variables that are time-variant are measured with a 5-year lag in our main specifications. Experimenting with other lags did not affect the results.

[TABLE 4 HERE]

It is not that surprising that the measures that are not normalized for size will indicate a negative correlation with territorial size. However, our $PCI_2$ index, which is normalized, also displays a very significant and robust negative correlation, as anticipated from Table 3, which suggests that such correlation is more than a mechanical artifact of the construction of the indices.

The correlation with land area is also present when the concentration of population around the capital city is measured using the Inv Avg Dist index. The same cannot be said of the second robust correlation pattern displayed by our PCI indices: there is a negative correlation between the size of population and how concentrated it is around the capital. In other words, the smaller the country’s population is, the more concentrated it is around the capital. In this case, measuring concentration using Inv Avg Dist fails to capture this pattern, as the coefficient is insignificant and has the opposite (positive) sign. One can speculate over the reasons behind this negative correlation; perhaps countries with larger populations are more likely to have other centers of attraction that lead to the equilibrium distribution of population being more dispersed around the capital city. (We should note, however, that the Axiom of Gravity, which isolates the attraction exerted by the capital point of interest, ensures that the existence of other centers of attraction will not be mechanically built into the index.) This can be the subject of future research.30

30 One tentative way of probing deeper into this link with population size is to consider the effects of openness. Introducing openness into the regression reduces the coefficient and significance of population size, which may indicate that part of the negative relationship is indeed linked to the relative attraction of the capital city, which may be more pronounced in a more open, outward-oriented economy. The high correlation between openness and population makes it hard to disentangle their effects, however.

Governance variables. We have argued elsewhere (Campante and Do 2007) that population concentration is an important determinant of redistributive pressures, particularly so in non-democratic countries. The basic idea, as expressed in Ades and Glaeser (1995), is that proximity to the capital city increases an individual’s political influence. This is particularly the case with regard to “non-institutional” channels like demonstrations, insurgencies and revolutions, as opposed to democratic elections. As such, a more concentrated population is more capable of keeping a non-democratic government in check. With that idea in the background, we study the correlation between our measure of concentration and a number of measures of the quality of governance, compiled by Kaufmann, Kraay and Mastruzzi (2006). A striking pattern emerges, where we can distinguish quite clearly the correlations in democratic countries from those in non-democratic ones. This is the main message from Table 5.

[TABLE 5 HERE]

For five of the six variables (control of corruption, voice and accountability, government effectiveness, rule of law, and quality of regulation), a higher degree of concentration around the capital city strongly predicts higher governance quality only in less democratic countries, with an increase of around 30% of a standard deviation for an increase of one standard deviation in PCI. No such positive effect is found for more democratic countries: there, an increase in concentration is associated with lower governance quality, though in some cases the relationship is not statistically significant. This is precisely in line with the idea that the concentration of population represents a check on non-democratic governments. We should also note that the significance of the coefficients is generally improved with PCI, compared to the alternative ad hoc Inv Avg Dist: once again, our index provides a considerably clearer picture of the effects of population concentration around the capital city. Moreover, the regression coefficients with Inv Avg Dist become insignificant when we exclude its extreme values, illustrating the observation already made about its skewed distribution. Once again, the picture that would be obtained by treating the Gini index as a proxy for concentration around the capital would be entirely distorted.31

4.1.3

Where to Locate the Capital?

The idea that the capital city is a particularly important point from a political standpoint, and the correlation between the concentration of population around the capital and the extent of the checks on the government, suggest that governments, and non-democratic ones in particular, would have an incentive to pick suitable locations for their capital. This draws attention to the endogeneity of the location of the capital city: not only is the concentration of population a variable that is determined in equilibrium, but the concentration patterns can also influence the choice of where to locate the capital. While a full treatment of the different avenues of causality is beyond the scope of this paper, we can nevertheless illustrate how our index can shed light on this topic. More generally, we can illustrate how our index helps approach the issue of the choice of the capital point of interest.

Consider a country with a given spatial distribution of its population, and let us think of the problem faced by a ruler with respect to where to locate the country’s capital.32 There are centripetal forces that would lead the ruler to consider spots where the concentration would be very high: economies of agglomeration, broadly speaking. But there are other, centrifugal forces, such as the aforementioned checks on his power, that would lead him to place the capital in a low-concentration spot. The question is, which of these forces will prevail under which circumstances?

31 The results for Inv Avg Dist and the Gini index are presented in the Appendix.

32 The history of changes in the location of capital cities, considered at some length in Campante and Do (2007), is proof that this problem is very often explicitly considered.

Our index can provide an avenue for answering this question. For every country, we compute the concentration of population around every single point in that country.33 We then identify the point where this concentration reaches its maximum value. Interestingly, for three fourths of the countries (in the year 1990) this maximum-concentration location lies right within the capital city. This high rate is explained in part by the choice to put the capital in a central location, and in part by the fact that being the capital increases the location’s attractiveness to migrants and to economic activity in general. More broadly, the maximum-concentration location is often at the largest city.34 We can then measure the gap between this site and the actual capital, as an indicator of how far a country’s actual choice of capital is from the point that would maximize the “agglomeration economies”.

We regress this distance, normalized by the greatest distance to any point in the country, on a set of political variables using OLS and Tobit regressions. The results are presented in Table 6. When we limit ourselves to non-democratic countries, we see that a higher level of autocracy predicts a greater distance between the capital city and the concentration-maximizing location. When we instead limit ourselves to non-autocratic countries, a higher level of democracy also predicts a greater distance. When combined, both the autocracy and the democracy variables predict a greater distance: this shows a type of U-shaped relationship, in which the centrifugal forces are strongest at both extremes of autocracy and democracy. This pattern is very robust to the inclusion of many dummy variables, including regional dummies and legal origin dummies. We can speculate that, on the autocratic side, more autocratic governments have greater incentive and/or ability to insulate themselves from popular pressure by locating their capital cities in low-concentration spots. On the democratic side, it is perhaps the case that additional democratic openness will lead to greater decentralization, and a lower level of attraction exerted by the capital. We are far from having a theory to fully account for that at this point, but the stylized fact is quite interesting nonetheless, and we also leave it to future research.

33 More precisely, every single cell in the grid that covers the country.

34 The exceptions are often illustrative. In China, it is close to Zhengzhou, the largest city in the province of Henan, which is the country’s most populous; in India, similarly, it is in the state of Uttar Pradesh, which is also the most populous. In the US, it is Columbus, OH, right in the middle of the large population concentrations of the East Coast and the Midwest.

[TABLE 6 HERE]
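The search for the maximum-concentration location described in this subsection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code behind Table 6: it uses planar coordinates in km, a logarithmic index normalized by an assumed maximum distance `d_max`, a floor of one distance unit to keep the logarithm non-negative, and a four-cell toy country; all function names are hypothetical. The gap is normalized here by the greatest distance from the capital to any cell, which is one possible reading of the normalization mentioned above.

```python
import math


def log_concentration(cells, center, d_max):
    """Population-weighted mean of 1 - log(d) / log(d_max) around `center`."""
    total = sum(pop for _, _, pop in cells)
    score = 0.0
    for x, y, pop in cells:
        # Assumption: distances are floored at one unit (here 1 km).
        d = max(1.0, math.hypot(x - center[0], y - center[1]))
        score += pop * (1.0 - math.log(d) / math.log(d_max))
    return score / total


def max_concentration_cell(cells, d_max):
    """Cell around which the population is most concentrated."""
    return max(cells, key=lambda c: log_concentration(cells, (c[0], c[1]), d_max))


def normalized_gap(cells, capital, d_max):
    """Distance from the capital to the maximizing cell, over the greatest
    distance from the capital to any cell."""
    best = max_concentration_cell(cells, d_max)
    gap = math.hypot(best[0] - capital[0], best[1] - capital[1])
    farthest = max(math.hypot(x - capital[0], y - capital[1]) for x, y, _ in cells)
    return gap / farthest


# Toy country: (x km, y km, population) for each grid cell; capital at the origin.
cells = [(0.0, 0.0, 5.0), (100.0, 0.0, 50.0), (200.0, 0.0, 10.0), (300.0, 0.0, 1.0)]
capital = (0.0, 0.0)
D_MAX = 400.0

best_cell = max_concentration_cell(cells, D_MAX)
gap = normalized_gap(cells, capital, D_MAX)
print(best_cell, gap)
```

In the toy country the most populous cell, 100 km from the capital, maximizes concentration, so the normalized gap is one third. Evaluating the index around every cell costs a number of operations that is quadratic in the number of cells, which matters for a fine grid.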

4.2

US State-level Implementation

Building on the previous section’s discussion on the location of the capital city, there is no better country in which to take our empirical implementation to the regional level than the United States, with its long tradition of dealing with the issue. Most famously, James Madison elaborated at length on the choice of the site of the capital city, during the 1789 Constitutional convention, arguing that one should “place the government in that spot which will be least removed from every part of the empire,” and that “regard was also to be paid to the centre [sic] of population.” He also pointed out that state capitals had sometimes been placed in “eccentric places,” and that in those cases “we have seen the people struggling to place it where it ought to be.”35 The forces identified by Madison have been very much at play in the case of US states, and our index enables us to get a snapshot of what the outcome has been. Regarding concentration normalized by country size, shown in Table 7, Illinois is less concentrated around its capital Springfield than any country in the world. Even with country size, its level of concentration is still comparable with that of Canada, ranking 10th in the world. This holds not only for Illinois: many US states where the capital is not the largest city have levels of concentration comparable to those of the least concentrated countries in the world.

[TABLE 7 HERE]

5

Concluding Remarks

We have presented a general, axiomatic approach to building a centered measure of spatial concentration. We show that requiring a few basic properties, meant to be robust across a variety of applications, enables us to pin down a specific class of measures, defined over any Euclidean space. We then go on to illustrate the empirical implementation of the measure, and how this implementation highlights some of the advantages of our index over alternative approaches.

We emphasize that our approach is a very general one, and unapologetically so. Our idea was to build an index that is not model-specific, so that it can provide a common language to operationalize the concept of centered spatial concentration over a broad scope of applications, in geographical and also in more abstract spaces. We certainly hope that it can be widely applied. In addition, our illustrative implementation also opens up avenues for future research, as the correlations that we are able to point out between our index and a number of variables of interest can be exploited further, with particular attention to issues of causality that are left outside the scope of this paper.

35 These quotations were obtained from the website The Founders’ Constitution (on Article 1, Section 8, Clause 17), available at http://press-pubs.uchicago.edu/founders/.


References

[1] Ades, Alberto F. and Edward L. Glaeser (1995), “Trade and Circuses: Explaining Urban Giants,” Quarterly Journal of Economics 110: 195-227.
[2] Anderson, James E. and Eric van Wincoop (2003), “Gravity with Gravitas: A Solution to the Border Puzzle,” American Economic Review 93: 170-192.
[3] Armitage, David H. and Stephen J. Gardiner (2001), Classical Potential Theory. London: Springer-Verlag.
[4] Axler, Sheldon, Paul Bourdon and Wade Ramey (2001), Harmonic Function Theory. New York: Springer-Verlag.
[5] Busch, Marc L. and Eric Reinhardt (1999), “Industrial Location and Protection: The Political and Economic Geography of U.S. Nontariff Barriers,” American Journal of Political Science 43: 1028-1050.
[6] Campante, Filipe R. and Quoc-Anh Do (2007), “Inequality, Redistribution, and Population,” Harvard University (mimeo).
[7] Center for International Earth Science Information Network (CIESIN), Columbia University; and Centro Internacional de Agricultura Tropical (CIAT) (2004), “Gridded Population of the World (GPW), Version 3,” Columbia University. Available at http://beta.sedac.ciesin.columbia.edu/gpw.
[8] Collier, Paul and Anke Hoeffler (2004), “Greed and Grievance in Civil War,” Oxford Economic Papers 56: 563-595.
[9] Duclos, Jean-Yves, Joan Esteban, and Debraj Ray (2004), “Polarization: Concepts, Measurement, Estimation,” Econometrica 72: 1737-1772.
[10] Echenique, Federico and Roland G. Fryer, Jr. (2007), “A Measure of Segregation Based on Social Interactions,” Quarterly Journal of Economics 122: 441-485.
[11] Ellison, Glenn and Edward L. Glaeser (1997), “Geographic Concentration in U.S. Manufacturing Industries: A Dartboard Approach,” Journal of Political Economy 105: 889-927.
[12] Esteban, Joan and Debraj Ray (1994), “On the Measurement of Polarization,” Econometrica 62: 819-851.
[13] Fujita, Masahisa, Paul Krugman, and Anthony J. Venables (1999), The Spatial Economy: Cities, Regions, and International Trade. Cambridge, MA: MIT Press.
[14] Galster, George, Royce Hanson, Michael R. Ratcliffe, Harold Wolman, Stephen Coleman, and Jason Freihage (2001), “Wrestling Sprawl to the Ground: Defining and Measuring an Elusive Concept,” Housing Policy Debate 12: 681-717.
[15] Glaeser, Edward L. and Matthew Kahn (2004), “Sprawl and Urban Growth,” in V. Henderson and J. Thisse (eds.), The Handbook of Regional and Urban Economics, Amsterdam: North Holland.
[16] Hidalgo, C. A., B. Klinger, A.-L. Barabási, and R. Hausmann (2007), “The Product Space Conditions the Development of Nations,” Science 317: 482-487.
[17] Jensen, J. Bradford and Lori Kletzer (2005), “Tradable Services: Understanding the Scope and Impact of Services Offshoring,” Brookings Trade Forum 6: 75-134.
[18] Krugman, Paul R. (1991), Geography and Trade. Cambridge, MA: MIT Press.
[19] Ransford, Thomas (1995), Potential Theory in the Complex Plane. Cambridge, UK and New York: Press Syndicate of the University of Cambridge.
[20] Rogers, Andrei and Stuart Sweeney (1998), “Measuring the Spatial Focus of Migration Patterns,” The Professional Geographer 50: 232-242.
[21] Saff, Edward and Vilmos Totik (1997), Logarithmic Potentials with External Fields. Berlin: Springer-Verlag.


A

Appendix: Propositions

Proposition 3 (Classification) A concentration index satisfying Axioms 1-3 is super-concentrated if its impact function is super-harmonic. It is sub-concentrated if its impact function is sub-harmonic. For any index satisfying Axioms 1-3, the domain $\mathbb{R}^n \setminus B(C, 1)$ can be partitioned into a countable set of open annuli $\{x : r_i < |x - C| < r_{i+1}\}$, $r_1 = 1 < r_2 < \dots$, such that within each annulus the index is either super- or sub-concentrated.

Proposition 4 (Alternative characterization) Within the class of centered spatial concentration indices defined in Proposition 1, Axiom 4 is equivalent to Property 1. Property 1 thus also characterizes the same class of indices determined by Proposition 2.

B

Appendix: Proofs

The proofs of Propositions 2 and 3 require a digression on harmonic function theory.

Definition 6 (Harmonic Function) A real function $f(x_1, x_2, \dots, x_n)$ is said to be harmonic on an open domain $D$ of $\mathbb{R}^n$ if it satisfies the Laplace equation over that domain (provided the partial derivatives are well defined):

\[ \Delta f \overset{\text{def}}{=} \frac{\partial^2 f}{\partial x_1^2} + \frac{\partial^2 f}{\partial x_2^2} + \dots + \frac{\partial^2 f}{\partial x_n^2} \equiv 0. \]

Axler et al. (2001) covers the basics of harmonic function theory. In particular, a harmonic function transformed through a summation, scalar multiplication, translation, squeeze, rotation or partial/directional differentiation is still a harmonic function. The essential property of harmonic functions is the Mean Value Property, stated as follows:

Mean Value Property Given a ball $B(T, r)$ within the open domain $D \subset \mathbb{R}^n$, its sphere $S(T, r) = \partial B(T, r)$, and a harmonic function $f$ on $D$, the mean of $f$ on the sphere is equal to the value of $f$ at the center of the sphere:

\[ M_{S(T,r)}(f) \overset{\text{def}}{\equiv} \frac{1}{\sigma_S(S)} \int_S f(x)\,d\sigma_S = f(T), \tag{15} \]

where $\sigma_S$ is the uniform surface measure on the sphere $S$. Conversely, a function $f$ satisfying this property over $D$ must be harmonic on $D$.

In order to prove this property we will use Green’s identity, stated in the following lemma:

Lemma 1 (Green’s identity) Given a regular open domain $\Omega \subset \mathbb{R}^n$ with regular boundary $\partial\Omega$, and regular functions $u$ and $v$, as well as volume measure $d\nu$ and surface measure $d\sigma$:

\[ \int_{\Omega} (u\Delta v - v\Delta u)\,d\nu = \int_{\partial\Omega} (u D_n v - v D_n u)\,d\sigma, \]

where $D_n$ is the Gateaux derivative with respect to the normal vector that points outward of the domain $\Omega$.
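The Mean Value Property in (15) can be illustrated numerically in the plane. The Python sketch below averages two functions over a circle that does not contain the center point $C$: the logarithm of the distance to $C$, which is harmonic away from $C$, and the squared distance to $C$, whose Laplacian is strictly positive. The helper `circle_mean`, the sample size, and the particular points $T$ and $C$ are assumptions chosen for illustration only.

```python
import math


def circle_mean(f, center, radius, samples=2000):
    """Approximate the mean of f over the circle S(center, radius) in the plane
    by averaging f at equally spaced points on the circle."""
    cx, cy = center
    total = 0.0
    for k in range(samples):
        angle = 2.0 * math.pi * k / samples
        total += f(cx + radius * math.cos(angle), cy + radius * math.sin(angle))
    return total / samples


C = (0.0, 0.0)  # the center point of interest


def log_distance(x, y):
    """log|x - C|: harmonic on the plane minus the point C."""
    return math.log(math.hypot(x - C[0], y - C[1]))


def squared_distance(x, y):
    """|x - C|^2: its Laplacian in the plane equals 4 everywhere."""
    return (x - C[0]) ** 2 + (y - C[1]) ** 2


T = (3.0, 1.0)  # |T - C| = sqrt(10) > r, so the disc B(T, r) excludes C
r = 2.0

harmonic_gap = circle_mean(log_distance, T, r) - log_distance(*T)
nonharmonic_gap = circle_mean(squared_distance, T, r) - squared_distance(*T)

print(harmonic_gap)     # numerically zero: the Mean Value Property holds
print(nonharmonic_gap)  # close to r**2 = 4: the property fails
```

For the squared distance the circle average exceeds the value at the center by $r^2$, so the property fails for a function that is not harmonic.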


Proof of Lemma 1. The divergence theorem states that the integral of the divergence of a vector field over a domain is equal to the net flow out of that domain through its boundary:

\[ \int_{\Omega} \nabla \cdot w\,d\nu = \int_{\partial\Omega} w \cdot n\,d\sigma, \]

where $\nabla \cdot w = \sum_i \frac{\partial w_i}{\partial x_i}$. Replace the field $w$ by $u\nabla v - v\nabla u$, noting that $u\nabla v \cdot n = u D_n v$ and that $\nabla \cdot (u\nabla v - v\nabla u) = u\Delta v - v\Delta u$ (the cross terms $\nabla u \cdot \nabla v$ cancel), and we obtain Green’s identity.

Proof of the Mean Value Property. First consider a $C^2$ function $f$ on the domain $D$ containing $B(T, r)$, and apply Green’s identity with $u = f$ and $v = 1$ on any ball $B(T, t)$ with $0 < t \le r$ to obtain:

\[ \int_{B(T,t)} \Delta f\,d\nu = \int_{S(T,t)} D_n f\,d\sigma. \]

The right hand side integral is converted to an integral on the unit sphere $S(T, 1)$:

\[ \int_{S(T,t)} D_n f\,d\sigma = t^{n-1} \int_{S(T,1)} \frac{\partial}{\partial t} f(T + tE)\,d\sigma(E) = t^{n-1} \frac{d}{dt} \int_{S(T,1)} f(T + tE)\,d\sigma(E) \]

(the differentiation under the integral sign is possible in this case), while the left hand side is:

\[ \int_{B(T,t)} \Delta f\,d\nu = t^{n} \int_{B(T,1)} \Delta f(T + tE)\,d\nu(E). \]

Putting these equalities together, and noting that $\sigma(S(T, 1)) = n\,\nu(B(T, 1))$, we obtain:

\[ t\,M_{B(T,t)}(\Delta f) = n\,\frac{d}{dt} M_{S(T,t)}(f), \tag{16} \]

where $M$ denotes the mean of a function on a domain. From equation (16), when the function $f$ is harmonic ($\Delta f \equiv 0$), $M_{S(T,t)}(f)$ is constant on $t \in (0, r]$, and thus by continuity is equal to its limit as $t \to 0^+$: $M_{S(T,r)}(f) = f(T)$.

Proof of the Reverse of the Mean Value Property. Notice that the mean value property implies that the function $f$ must be of class $C^{\infty}(D)$ (a formal proof can be found in Armitage and Gardiner (2001)). Therefore we obtain equation (16), which when integrated over $t$ implies:

\[ n\,\big(M_{S(T,r)}(f) - f(T)\big) = \int_0^r t\,M_{B(T,t)}(\Delta f)\,dt. \]

As $M_{B(T,t)}(\Delta f) \to \Delta f(T)$ when $t \to 0^+$, the right hand side is approximated by $\frac{1}{2} r^2 \Delta f(T)$ when $r \to 0^+$. We deduce:

\[ \Delta f(T) = 2n \lim_{r \to 0^+} r^{-2}\,\big(M_{S(T,r)}(f) - f(T)\big). \tag{17} \]

When $f$ satisfies the Mean Value Property, the expression under the limit sign is 0, implying that $\Delta f = 0$ at all $T$, i.e. $f$ is harmonic.

We can also define super- and sub-harmonic functions: the Laplace equation, when considered as an inequality, further refines the space $C^2$ into the subspaces of super- and sub-harmonic functions. More precisely:

Definition 7 (Super-/Sub-harmonic functions) Let $f(x_1, x_2, \dots, x_n)$ be a regular real-valued function on an open domain $D$ of $\mathbb{R}^n$. It is said to be super-harmonic on $D$ if $\Delta f \le 0$. It is said to be sub-harmonic if $-f$ is super-harmonic.

The Mean Value Property and its reverse are adapted to the case of super- and sub-harmonic functions as inequalities of the Mean Value, stated as follows:

Mean Value Inequality Given a sphere $S(T, r)$ whose ball $B(T, r)$ lies completely within the open domain $D \subset \mathbb{R}^n$, and a super-harmonic (sub-harmonic) function $f$ on $D$, the mean of $f$ on the sphere is less (greater) than or equal to the value of $f$ at the center of the sphere, namely:

\[ M_{S(T,r)}(f) \le (\ge)\; f(T). \tag{18} \]

Conversely, if $f$ satisfies the mean value inequality in (18) with a $\le$ ($\ge$) sign, then $f$ is super-harmonic (sub-harmonic) on $D$.

Proof of the Mean Value Inequality and Its Reverse. We present the proof for the super-harmonic case. For sub-harmonic functions the proof is almost identical. First, take a smooth super-harmonic function $f$ on $D$. Equation (16) and the condition $\Delta f \le 0$ imply that $M_{S(T,t)}(f)$ is non-increasing in $t$ for $t > 0$. Since its limit when $t \to 0^+$ is $f(T)$, it follows that $M_{S(T,r)}(f) \le f(T)$. Conversely, when the Mean Value Inequality is satisfied, equation (17) implies that $\Delta f \le 0$ at any point $T$, i.e. $f$ is super-harmonic.

The proof presented here applies to sufficiently smooth functions. A more general proof for upper/lower-semicontinuous functions can be found in Ransford (1995).

Intuitively, real-valued super- and sub-harmonic functions on the plane $\mathbb{R}^2$ could be viewed as more “demanding” versions of concave and convex functions. As we will show, under the assumption of isotropy these concepts provide a strong refinement of the class of real functions on the plane, compared to the concepts of concavity and convexity.36

36 For isotropic functions on $\mathbb{R}^n$ concavity and convexity are of little interest, since these concepts are respectively equivalent to the function being increasing (decreasing) in the distance to the center. Besides, an isotropic function that is both concave and convex on $\mathbb{R}^n$ must be constant.

Armed with these concepts, we can now proceed with the proof of Proposition 2.

Proof of Proposition 2. First, we see that conditional on the functional form of the index $I$ given by Proposition 1, Axiom 4 is equivalent to the Mean Value Property of the impact function. Indeed, Axiom 4 could be rewritten as:

\[ \int_{S(T,r)} h(|(x - T) + T - C|)\,d\eta = \int_{S(T,r)} h(|\rho(x - T) + T - C|)\,d\eta, \]

from which, letting the squeeze parameter $\rho$ tend to zero, we obtain the Mean Value Property for $h$ at point $T$ by continuity. Conversely, the Mean Value Property implies straightforwardly that the impact function satisfies Axiom 4. Taken together, this means that conditional on the first three axioms, Axiom 4 is equivalent to the impact function $h$ being harmonic on the domain $\mathbb{R}^n \setminus \{C\}$.

In the next step, we will determine the class of harmonic functions in $\mathbb{R}^n \setminus \{C\}$ satisfying the condition laid out by Proposition 1. We first establish the following Lemma, which is derived straightforwardly from the second derivative of any function depending solely on the distance to a point $C$:

Lemma 2 For a smooth function $h(|x - C|)$ in $\mathbb{R}^n \setminus \{C\}$, we have:

\[ \Delta h(|x - C|) = h''(|x - C|) + \frac{n - 1}{|x - C|}\,h'(|x - C|). \]

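Proof of Lemma 2. Straightforward algebra shows Lemma 2. Writing $x_i$ for the $i$-th coordinate of $x - C$, we have:

$$
\begin{aligned}
\Delta h(|x - C|) &= \sum_{i=1}^{n} \frac{\partial^2 h(|x - C|)}{\partial x_i^2} = \sum_{i=1}^{n} \frac{\partial}{\partial x_i}\left( \frac{h' \cdot x_i}{|x - C|} \right) \\
&= \sum_{i=1}^{n} \left[ h'' \cdot \frac{x_i^2}{|x - C|^2} + h' \cdot \left( \frac{1}{|x - C|} - \frac{x_i^2}{|x - C|^3} \right) \right] \\
&= h''(|x - C|) + \frac{n - 1}{|x - C|}\, h'(|x - C|).
\end{aligned}
$$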
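Lemma 2 can be sanity-checked numerically. The sketch below is illustrative only: it assumes a hypothetical impact function h(d) = exp(-d) in R^3 and arbitrary points x and C (none of which come from the paper), and compares a coordinate-wise finite-difference Laplacian with the radial formula of Lemma 2.

```python
import math

def radial_laplacian(h, r, n, eps=1e-4):
    """Lemma 2 right-hand side, h''(r) + (n - 1) / r * h'(r), by central differences."""
    d1 = (h(r + eps) - h(r - eps)) / (2 * eps)
    d2 = (h(r + eps) - 2 * h(r) + h(r - eps)) / eps ** 2
    return d2 + (n - 1) / r * d1

def cartesian_laplacian(h, x, c, eps=1e-4):
    """Sum over coordinates of the second partial derivatives of x -> h(|x - c|)."""
    def f(p):
        return h(math.dist(p, c))
    total = 0.0
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        total += (f(up) - 2 * f(x) + f(down)) / eps ** 2
    return total

# Hypothetical test case (an assumption for illustration): h(d) = exp(-d) in R^3.
h = lambda d: math.exp(-d)
x, c = (1.0, 2.0, -0.5), (0.2, 0.1, 0.3)
r = math.dist(x, c)
lhs = cartesian_laplacian(h, x, c)
rhs = radial_laplacian(h, r, n=len(x))
print(f"coordinate-wise: {lhs:.6f}   Lemma 2 formula: {rhs:.6f}")
```

The two numbers should agree up to finite-difference error; for this particular h the closed form is exp(-r)(1 - 2/r).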

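Lemma 2 together with the previous argument implies that an index satisfies all four Axioms iff its impact function satisfies the condition in Proposition 1, and solves the following equation:

$$
\begin{aligned}
\frac{h''(|x - C|)}{h'(|x - C|)} &= -\frac{n - 1}{|x - C|} \\
\Leftrightarrow\ \frac{d}{dr}\bigl(\log h'(r)\bigr) &= -(n - 1)\,\frac{1}{r} \\
\Leftrightarrow\ \log h'(r) &= -(n - 1)\log r + \text{const} \\
\Leftrightarrow\ h'(r) &= \text{const} \cdot r^{1 - n} \\
\Leftrightarrow\ h(r) &=
\begin{cases}
\alpha r + \beta, & n = 1, \\
\alpha \log r + \beta, & n = 2, \\
\alpha r^{2 - n} + \beta, & n > 2.
\end{cases}
\end{aligned}
$$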
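As a hedged numerical illustration of this solution family, the sketch below checks that each of the three forms has a vanishing radial Laplacian in the sense of Lemma 2, and reports whether it is decreasing in distance, as the monotonicity condition of Proposition 1 requires. The function names, the sample radii and the value beta = 0.5 are assumptions made for the example.

```python
import math

def harmonic_impact(r, n, alpha, beta):
    """The three solution forms: linear (n = 1), logarithmic (n = 2), power 2 - n (n > 2)."""
    if n == 1:
        return alpha * r + beta
    if n == 2:
        return alpha * math.log(r) + beta
    return alpha * r ** (2 - n) + beta

def radial_laplacian(h, r, n, eps=1e-4):
    """h''(r) + (n - 1) / r * h'(r), approximated by central differences."""
    d1 = (h(r + eps) - h(r - eps)) / (2 * eps)
    d2 = (h(r + eps) - 2 * h(r) + h(r - eps)) / eps ** 2
    return d2 + (n - 1) / r * d1

def is_decreasing(h, radii):
    """True if h is strictly decreasing along the (increasing) sample radii."""
    values = [h(r) for r in radii]
    return all(a > b for a, b in zip(values, values[1:]))

radii = [0.5, 1.0, 2.0, 4.0]  # arbitrary sample distances (assumption)
results = []
for n, alpha in [(1, -1.0), (2, -1.0), (3, 1.0), (5, 1.0)]:
    h = lambda r, n=n, alpha=alpha: harmonic_impact(r, n, alpha, beta=0.5)
    residual = max(abs(radial_laplacian(h, r, n)) for r in radii)
    results.append((n, alpha, residual, is_decreasing(h, radii)))
    print(f"n={n} alpha={alpha:+.1f} max |Laplacian|={residual:.2e} decreasing={results[-1][3]}")
```

The residuals are zero only up to finite-difference error. Flipping the sign of alpha in any row keeps the function harmonic but makes it increasing in distance.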

Coupled with Proposition 1's condition of monotonicity, we deduce that $\alpha \ge 0$ for $n > 2$, and $\alpha \le 0$ for $n = 1, 2$.

Proof of Proposition 3. In a manner similar to Proposition 2's proof, if a concentration index satisfying Axioms 1-3 is super-concentrated, its impact function must satisfy the following inequality for all $T$:

$$\int_{S(T,r)} h(|x - C|)\,d\eta \le \int_{S(T,\rho r)} h(|x - C|)\,dS_{(T,\rho)}(\eta). \tag{19}$$

Letting $\rho \to 0$, this inequality implies the Mean Value Inequality in (18) for all points $T$ within the domain $D$. Thus by the reverse of the Mean Value Inequality, the impact function $h$ must be super-harmonic on $\mathbb{R}^n \setminus \{C\}$. The sub-harmonic case is proven analogously. In reverse, given a super-harmonic impact function $h$, equation (16) shows that the mean value of $h$ over the sphere $S(T, \rho r)$ is decreasing in $\rho$. The corresponding index therefore is super-concentrated. The proof for a sub-harmonic impact function is similar.

Regarding the annular partition, it suffices to notice that solutions in $x$ of $\Delta h(|x - C|) = 0$ are rotation-invariant around $C$, and determine circles around $C$. Since $\Delta h(|x - C|)$ is continuous on $\mathbb{R}^n \setminus \{C\}$, the set of $|x - C|$ for all solutions in $x$ is discrete and can be numbered by $r_i$, $i \in \mathbb{Z}$. Within each annulus between $r_i$ and $r_{i+1}$, as described in Proposition 3, $\Delta h(|x - C|)$ has a determined sign, thus the index is either super- or sub-concentrated.

Proof of Corollary 2.1. First, we show that the sign of $\Delta h(|x - C|)$ is that of $-\frac{d}{d\theta}\bigl(\log(-h'(\theta)) + (n - 1)\log\theta\bigr)$, as $h'(\theta) < 0$. This comes from Lemma 2:

$$\Delta h(|x - C|) = h''(|x - C|) + \frac{n - 1}{|x - C|}\, h'(|x - C|) = h'(|x - C|)\, \frac{d}{d\theta}\Bigl(\log(-h'(\theta)) + (n - 1)\log\theta\Bigr)\Big|_{\theta = |x - C|},$$

with the observation that $h'(|x - C|) < 0$. On $\mathbb{R}^2$, the check of the super-/sub-harmonic property of a radial function $h(|x - C|)$ is then equivalent to the calculation of the sign of $-\frac{d}{d\theta}\bigl(\log(-h'(\theta)) + \log\theta\bigr)$. Applying this result to each of the functions mentioned in the Corollary, we immediately get the results.

Property 2 Given a uniform circle distribution $\nu$ on the circumference of $(O, r)$ such that the center $C$ lies inside its interior. Then the concentration of $(\nu, C)$ does not depend on the location of $O$ with respect to $C$. More precisely:

$$\int_{(O,r)} \log|x - C|\,d\nu = \int \log r\,d\nu, \quad \forall\, |O - C| < r. \tag{20}$$

Proof of Property 2. The proof relies essentially on the inversion transformation of the point $C$ inside the circle $(O, r)$ to the point $C^{-1}$ outside the circle, on the same line that links $O$ and $C$, such that $|O - C| \cdot |O - C^{-1}| = r^2$. The following lemma shows the use of such a transformation:

Lemma 3 (Inversion) For every point $x$ on the circle $(O, r)$, $|x - C| = |x - C^{-1}| \cdot \frac{|O - C|}{r}$.

Proof of Lemma 3. We start with the observation that for any vectors $u$ and $v$ we have the following identity: $|u - v| = \left| \frac{|v|}{|u|}\, u - \frac{|u|}{|v|}\, v \right|$. Applying it to $x - O$ and $C - O$, we have:

$$
\begin{aligned}
|x - C| = |(x - O) - (C - O)| &= \left| \frac{|C - O|}{r}\,(x - O) - \frac{r}{|C - O|}\,(C - O) \right| \\
&= \frac{|C - O|}{r} \cdot \left| (x - O) - \frac{r^2}{|C - O|^2}\,(C - O) \right| \\
&= \frac{|C - O|}{r} \cdot \left| (x - O) - |C^{-1} - O| \cdot \frac{C - O}{|C - O|} \right| \\
&= \frac{|C - O|}{r} \cdot |x - C^{-1}|.
\end{aligned}
$$
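The inversion identity of Lemma 3 is easy to check numerically in the plane. The following is a minimal sketch; the helper names and the sample points are assumptions made for illustration, and C must differ from O for the inverse point to exist.

```python
import math

def invert(c, o, r):
    """Inverse of c with respect to the circle (o, r): the point on the ray from o
    through c such that |o - c| * |o - c_inv| = r^2. Requires c != o."""
    dx, dy = c[0] - o[0], c[1] - o[1]
    scale = r * r / (dx * dx + dy * dy)
    return (o[0] + scale * dx, o[1] + scale * dy)

def lemma3_gap(c, o, r, samples=12):
    """Largest deviation of |x - c| from |x - c_inv| * |o - c| / r over sample points x
    on the circle (o, r)."""
    c_inv = invert(c, o, r)
    ratio = math.dist(o, c) / r
    worst = 0.0
    for k in range(samples):
        t = 2 * math.pi * k / samples
        x = (o[0] + r * math.cos(t), o[1] + r * math.sin(t))
        worst = max(worst, abs(math.dist(x, c) - ratio * math.dist(x, c_inv)))
    return worst

# Hypothetical configurations with C strictly inside the circle.
print(lemma3_gap(c=(0.3, 0.4), o=(0.0, 0.0), r=2.0))
print(lemma3_gap(c=(1.2, -0.3), o=(1.0, 0.5), r=1.5))
```

Both printed gaps should be at the level of floating-point rounding.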

Using this result, the integrand in the left hand side of equation (20) is $\log|C - O| - \log r + \log|x - C^{-1}|$. Notice that since $C^{-1}$ is outside of the circle $(O, r)$, we can apply the Mean Value Property to show that the mean of $\log|x - C^{-1}|$ over $(O, r)$ is exactly $\log|O - C^{-1}|$. The left hand side of equation (20) then equals:

$$\int \bigl( \log|O - C^{-1}| + \log|C - O| - \log r \bigr)\,d\nu = \int \log r\,d\nu.$$

As this quantity is independent of the location of $O$, so is the concentration index.

Proof of Property 1. Property 1 is the direct consequence of Property 2 shown above. With the existence of a mass at point $T$ on the circumference of $(O, r)$, as we move $C$ closer to $T$ we do not change the concentration of $\mu$ around $C$, while the concentration of the mass at $T$ around $C$ increases. The total concentration is the sum of the two, thus must increase according to this move. Analogously, when we move $C$ farther away from $T$, but still within the circle $(O, r)$, total concentration around $C$ must decrease.
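Equation (20) of Property 2 lends itself to a quick numerical check: the mean of log|x - C| over a circle should equal log r wherever the circle's center O sits, as long as C stays strictly inside. The sketch below approximates the uniform circle distribution by equally spaced points; the radius, the center C and the list of offsets are arbitrary assumptions.

```python
import math

def circle_mean(h, c, o, r, samples=2000):
    """Mean of h(|x - c|) over equally spaced points x on the circle (o, r),
    approximating a uniform circle distribution of mass 1."""
    total = 0.0
    for k in range(samples):
        t = 2 * math.pi * k / samples
        x = (o[0] + r * math.cos(t), o[1] + r * math.sin(t))
        total += h(math.dist(x, c))
    return total / samples

r = 2.0
c = (0.0, 0.0)
# Circle centers at increasing distance from C, all with |O - C| < r.
centers = [(0.0, 0.0), (0.5, 0.0), (1.0, -1.0), (0.0, 1.9)]
means = [circle_mean(math.log, c, o, r) for o in centers]
for o, m in zip(centers, means):
    print(f"O={o}: mean of log|x - C| = {m:.12f}   log r = {math.log(r):.12f}")
```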


Property 3 (Strong Reverse) Conversely to Property 2, given a uniform circle distribution $\nu$ of mass 1 on a circumference $(O, r)$ such that the center $C$ lies inside its interior. If the following equation is satisfied for $O$ in a neighborhood of $C$:

$$\int_{(O,r)} h(|x - C|)\,d\nu = \int h(r)\,d\nu, \quad \forall\, |O - C| < \epsilon, \tag{21}$$

then the function $h(|x - C|)$ must be harmonic, and consequently $h(d)$ must be a linear transformation of the function $\log d$ in $\mathbb{R}^2$, and of $d^{2-n}$ in $\mathbb{R}^n$.

Proof. Take a point $O$ in the neighborhood of $C$ and a radius $r > |O - C|$. The inversion transforms $C$ to $C^{-1}$ as discussed in the proof of Property 2, so that according to Lemma 3 we have $h(|x - C|) = h\bigl(|x - C^{-1}| \cdot \frac{|O - C|}{r}\bigr)$ for every point $x$ on the circumference $(O, r)$. By a change of variable, the integral of $h(|x - C|)$ over $(O, r)$ is equal to that of $h(|x' - C^{-1}|)$ where $x'$ runs over a circle of center $O'$ at distance $|O' - C^{-1}| = r$, and of radius $r \cdot \frac{|O - C|}{r} = |O - C|$ (this new circle is the homothetic image of the circle $(O, r)$ through $S_{C^{-1},\, |O - C|/r}$). Equation (21) then means that the mean of $h(|x' - C^{-1}|)$ equals $h(|O' - C^{-1}|)$. By the reverse of the Mean Value Property, we know that $h(|x' - C^{-1}|)$ must be harmonic, thus $h(d)$ must take the form of a linear transformation of the function $\log d$ in $\mathbb{R}^2$, or of the power of degree $2 - n$ in $\mathbb{R}^n$.

Proof of Proposition 4. We have already established that within the class of indices defined in Proposition 1, Axiom 4 leads to Property 1. The proof of the reverse consists of two steps. First, we prove that the condition laid out in Property 3 must be satisfied. Indeed, suppose that when we move $C$ away from the center $O$ of the circle, equation (21) becomes an inequality. Suppose first that the concentration of the uniform circle distribution $\nu$ around $C$ decreases (which is what happens if $h(d) = -d$, for example), say, by an amount $\delta$. Because of the symmetry around $O$, such a movement of $C$ in any direction would incur the same difference $\delta$. Consider the move of $C$ to $C'$ in the direction of the mass point $T$, where the mass is $\epsilon$ (excluding the population of $\nu$ at that point). The increase in concentration of this mass point with respect to $C$ is $\epsilon\,(h(|T - C'|) - h(r))$. Thus there exists $\epsilon$ small enough ($0 < \epsilon < \frac{\delta}{h(|T - C'|) - h(r)}$) so that this increase in concentration is smaller than the decrease $\delta$ in concentration of $\nu$ with respect to $C$. As a result, when $C$ moves to $C'$ closer to $T$, total concentration still decreases, contrary to the condition of Property 1. Analogously, the case when the concentration of $\nu$ around $C$ increases after a move of $C$ also leads to a contradiction with the condition of Property 1. Consequently, equation (21) must be satisfied.

In the second step, starting from this equation, we only need to apply Property 3 to conclude that the function $h(|x - C|)$ must be harmonic on $\mathbb{R}^n \setminus \{C\}$. This is indeed equivalent to the result of Proposition 2.
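The first step of this argument can be illustrated with the example mentioned in the proof, h(d) = -d. The sketch below is a worked instance under assumed numbers (unit circle, C starting at O and moving 0.2 toward T): it computes the loss delta in the concentration of the circle distribution, the gain per unit of mass at T, and the resulting bound on the mass epsilon below which total concentration falls.

```python
import math

def circle_mean(h, c, o, r, samples=4000):
    """Mean of h(|x - c|) over equally spaced points on the circle (o, r), mass 1."""
    total = 0.0
    for k in range(samples):
        t = 2 * math.pi * k / samples
        x = (o[0] + r * math.cos(t), o[1] + r * math.sin(t))
        total += h(math.dist(x, c))
    return total / samples

def h(d):
    """Non-harmonic impact function used as the example in the proof."""
    return -d

# Assumed configuration: unit circle around O, mass point T on it, C moving from O toward T.
o, r = (0.0, 0.0), 1.0
t_point = (1.0, 0.0)
c_before, c_after = o, (0.2, 0.0)

delta = circle_mean(h, c_before, o, r) - circle_mean(h, c_after, o, r)  # loss on the circle
gain_per_unit_mass = h(math.dist(t_point, c_after)) - h(r)              # gain at T
eps_bound = delta / gain_per_unit_mass
eps_mass = 0.5 * eps_bound                                               # any mass below the bound
total_change = eps_mass * gain_per_unit_mass - delta

print(f"delta = {delta:.6f}, bound on epsilon = {eps_bound:.6f}, total change = {total_change:.6f}")
```

With a mass at T below the bound, moving C toward T lowers total concentration, which is the contradiction with Property 1 that the proof exploits.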

C Appendix: Generalization of Axiom 1

There is some concern about our Axiom of Decomposability, in that by directly proposing the formula with respect to sub-distributions, we implicitly eliminate by assumption all forms of inter-individual influence. For example, in a context of migration, individuals may be more inclined to migrate if more individuals from their location have also emigrated. In what follows, we consider a generalized version of Axiom 1 that allows for inter-individual influences within a range $r$ from each individual, by including in the impact function $h$ an arbitrary function of the number of individuals within this range, i.e. a factor of the form $f(\mu(\{|x - T| \le r\}))$. The formula postulated by the old Axiom 1 is the special case in which the function $f$ is constant. Generically, the presence of a non-constant $f$ violates the old, more constraining version of Axiom 1, as we only allow for decomposability with respect to locations, as stated formally in what follows. Somewhat surprisingly, we will prove that when combined with Axioms 2-3, this generalized version of Axiom 1 leads to the same conclusion as Proposition 1.

Axiom 1 (Generalized) The CISC of a distribution $\mu$ with respect to the center $C$ is the sum over each individual, allowing for inter-individual interaction within a small radius $r$. Formally, it is of the following form:

$$I(\mu, C) = \int f\bigl(\mu(\{|x - T| \le r\}), T, C\bigr)\,d\mu(T),$$

where the impact function $f$ is continuous in each of its arguments. We will show that when combined with Axiom 2, we still get the same functional form as before.

Proof of Proposition 1 under the Generalized Axiom 1. Let us place ourselves on a simple line $\mathbb{R}$: the proof is completely identical for higher dimensions. In what follows, for convenience we rewrite the function $f$ as a function of $\mu(\{|x - T| \le r\})$ and $R$, where $R$ denotes the distance between $T$ and $C$ (there is no confusion as $C$ is fixed here). It suffices to prove that $f$ depends only on $R$ to arrive at the same conclusions as Proposition 1. Imagine a distribution of two mass points, one of size $\alpha M$ at distance $R$, and another of size $(1 - \alpha)M$ at distance $R - r$. Now when we move the second mass a little bit outward to $R - r - \epsilon$ and a little bit inward to $R - r + \epsilon$, Axiom 2 implies the following inequality:

$$\alpha f(M, R) + (1 - \alpha) f(M, R - r - \epsilon) \le \alpha f(\alpha M, R) + (1 - \alpha) f((1 - \alpha)M, R - r + \epsilon).$$

By continuity with respect to the second argument of $f$, we deduce that:

$$\alpha f(M, R) + (1 - \alpha) f(M, R - r) \le \alpha f(\alpha M, R) + (1 - \alpha) f((1 - \alpha)M, R - r).$$

Now we can perform the same operation with the mass point at distance $R$ to deduce the reverse inequality:

$$\alpha f(M, R) + (1 - \alpha) f(M, R - r) \ge \alpha f(\alpha M, R) + (1 - \alpha) f((1 - \alpha)M, R - r).$$

Combining the two inequalities, we get the following equality:

$$\alpha f(M, R) + (1 - \alpha) f(M, R - r) = \alpha f(\alpha M, R) + (1 - \alpha) f((1 - \alpha)M, R - r). \tag{22}$$

Take $\alpha = \frac{1}{2}$ and we deduce that $f(M, R) + f(M, R - r) = f(\tfrac{1}{2}M, R) + f(\tfrac{1}{2}M, R - r)$. The application of equality (22) with $M, \tfrac{1}{2}M, \ldots, 2^{-n}M$ implies that $f(M, R) + f(M, R - r) = f(0, R) + f(0, R - r)$. Plugging this identity back into equality (22), we can further deduce:

$$
\begin{aligned}
&\alpha f(M, R) - (1 - \alpha) f(M, R) + (1 - \alpha)\,[f(M, R) + f(M, R - r)] \\
&\quad = \alpha f(\alpha M, R) - (1 - \alpha) f((1 - \alpha)M, R) + (1 - \alpha)\,[f((1 - \alpha)M, R) + f((1 - \alpha)M, R - r)] \\
&\Rightarrow\ \alpha\,[f(M, R) - f(\alpha M, R)] = (1 - \alpha)\,[f(M, R) - f((1 - \alpha)M, R)].
\end{aligned}
\tag{23}
$$

From equality (23), by letting $\alpha$ approach 1 we infer that $f$ must be differentiable with respect to $M$. Furthermore, we also infer that $M \frac{\partial}{\partial M} f(M, R) = f(M, R) - f(0, R)$, which for a differentiable function $f(\cdot, R)$ means that $f(M, R)$ is linear in $M$: $f(M, R) = f(0, R) + \zeta(R) M$. This function must also be decreasing in $R$, so $\zeta(R)$ must be a decreasing function of $R$. Now plug this linear form into the identity for $f(M, R) + f(M, R - r)$, and we infer that $\zeta(R) + \zeta(R - r) = 0$ for all $R$. For a decreasing function $\zeta(\cdot)$, this is only possible when $\zeta(\cdot) \equiv 0$, that is, when the impact function $f(M, R)$ does not depend on $M$. Repeating the rest of the proof as in the case of the original Axiom 1, we deduce the same result.
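The linearity step in the proof above can be illustrated numerically: a function that is linear in M satisfies equality (23) for every alpha, while a nonlinear one generally does not. The two functions below are hypothetical examples chosen for the check, not taken from the paper, and the sketch covers only the linearity step (not the later conclusion that the slope must vanish).

```python
def gap_23(f, m, dist, alpha):
    """Left-hand side minus right-hand side of equality (23)."""
    lhs = alpha * (f(m, dist) - f(alpha * m, dist))
    rhs = (1 - alpha) * (f(m, dist) - f((1 - alpha) * m, dist))
    return lhs - rhs

def linear_f(m, dist):
    """Hypothetical impact function that is linear in the local mass m."""
    return 1.0 / (1.0 + dist) + 0.7 * m / (1.0 + dist)

def quadratic_f(m, dist):
    """Hypothetical impact function that is quadratic in the local mass m."""
    return 1.0 / (1.0 + dist) + m * m

alphas = [0.1, 0.25, 0.5, 0.9]
linear_gaps = [gap_23(linear_f, 3.0, 2.0, a) for a in alphas]
quadratic_gaps = [gap_23(quadratic_f, 3.0, 2.0, a) for a in alphas]
print("linear in M:   ", linear_gaps)
print("quadratic in M:", quadratic_gaps)
```

For the quadratic example the gap equals m^2 * alpha * (1 - alpha) * (2 * alpha - 1), so it vanishes at alpha = 1/2 but not at other values of alpha.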

D Appendix: Data Description

Population Concentration Index: The measures PCI1, PCI2, PCI3 are calculated as explained in the text, using original gridded population maps from the database Gridded Population of the World (GPW), Version 3 from the Socio-Economic Data Center, Columbia University (2005), containing maps for 1990, 1995 and 2000 on a global grid of cells with sides of 2.5 arc-minutes (approximately 5km).

Gap to concentration maximizing location: This variable is calculated for each country by measuring the distance between the actual site of the capital city and the site of the capital that would maximize the PCI. The maximization is done with Matlab's large scale search method (with analytical gradient matrix), from a grid of 50 initial guesses evenly distributed on the country's map for large countries.

Kaufmann, Kraay and Mastruzzi (KKM): From KKM's (2006) indices, including Voice and Accountability, Control of Corruption, Rule of Law, Government Effectiveness, Political Stability, and Regulatory Quality, themselves a composite of different agency ratings aggregated by an unobserved components methodology. On a scale of −2.5 to 2.5. Data are available for 1996-2002 at two-year intervals, and thereafter for 2002-2005 on an annual basis. We use the data in 1996 for our measure of population concentration in 1990. KKM data available at: http://info.worldbank.org/governance/kkz2005/pdf/2005kkdata.xls

Real GDP per capita: From the World Bank World Development Indicators (WDI). Real PPP-adjusted GDP per capita (in constant 2000 international dollars).

Population by year: From the World Bank World Development Indicators (WDI).

Democracy: Polity IV democracy score, on a scale of 0 to 10.

Autocracy: Polity IV autocracy score, on a scale of 0 to 10.

Polity: Polity IV composite score as Democracy minus Autocracy, on a scale of -10 to 10. The reference date for the annual observations in the Polity IV dataset is 31 December of each year. We match these to the data corresponding to 1 January of the following year for consistency with the DPI. Data available at: http://www.cidcm.umd.edu/inscr/polity/

Imports, Openness: From the WDI. Imports of goods and services as a percentage of GDP. The openness measure equals the sum of imports and exports as a share of GDP, also from the WDI.

Government Expenditure: From the WDI. Total government consumption expenditure as a share of GDP.

Legal Origin: From La Porta et al. (1999). Dummy variables for British, French, Scandinavian, German, and socialist legal origin.

Region dummies: Following the World Bank's classifications, dummy variables for: East Asia and the Pacific; East Europe and Central Asia; Middle East and North Africa; South Asia; West Europe; North America; Sub-Saharan Africa; Latin America and the Caribbean.
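The "gap to concentration maximizing location" variable can be illustrated with a toy computation. The sketch below is not the procedure used for the data (which relies on Matlab's gradient-based search from 50 starting points): it assumes a tiny hypothetical grid of population cells, planar distances, a logarithmic impact function h(d) = -log d without the normalization used for the PCI measures, a distance floor `d_min` to keep the logarithm finite at a cell's own location, and an exhaustive search over candidate sites.

```python
import math

def concentration(cells, center, d_min=0.5):
    """Population-share-weighted sum of -log(distance to center).
    cells: list of (x, y, population); d_min floors the distance (assumption)."""
    total_pop = sum(pop for _, _, pop in cells)
    score = 0.0
    for x, y, pop in cells:
        d = max(math.hypot(x - center[0], y - center[1]), d_min)
        score += pop / total_pop * -math.log(d)
    return score

def best_site(cells, candidates):
    """Candidate site with the highest concentration (brute-force search)."""
    return max(candidates, key=lambda site: concentration(cells, site))

def gap_to_best(cells, capital, candidates):
    """Distance between the actual capital and the concentration-maximizing site."""
    site = best_site(cells, candidates)
    return math.dist(capital, site), site

# Hypothetical toy country: one dominant city at (5, 5), capital located at (1, 0).
cells = [(0, 0, 10), (1, 0, 10), (5, 5, 1000), (9, 9, 5)]
candidates = [(i, j) for i in range(10) for j in range(10)]
gap, site = gap_to_best(cells, capital=(1, 0), candidates=candidates)
print(f"maximizing site: {site}, gap to capital: {gap:.3f}")
```

A brute-force search is used here only because the toy grid is small; on real gridded data a multi-start local search, as described above, is the more practical choice.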


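Figure 1. Panels (a) and (b), each marking the center C. [Drawing not recoverable from the extracted text.]

Figure 2. Points T and C. [Drawing not recoverable from the extracted text.]

Figure 3. Points C', T and C. [Drawing not recoverable from the extracted text.]

Figure 4. [Drawing not recoverable from the extracted text.]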

TABLE 1 Cross Country Summary Statistics

Variable              Observations   Mean        Standard Deviation   Min          Max
PCI 1 90              156            0.4639292   0.0970644            0.2455372    0.7640706
PCI 1 95              156            0.4643864   0.0970974            0.243909     0.7641067
PCI 1 00              156            0.4647998   0.0971469            0.2417703    0.7641418
PCI 2 90              156            0.2527099   0.0736542            0.1047438    0.5819907
PCI 2 95              156            0.2533685   0.073643             0.1003575    0.5819913
PCI 2 00              156            0.253967    0.0736807            0.0972518    0.581991
Inv Avg Dist 90       156            0.0073017   0.0094316            0.0006739    0.0979877
Inv Avg Dist 95       156            0.0073248   0.0094405            0.0006652    0.09802
Inv Avg Dist 00       156            0.0073482   0.0094538            0.0006556    0.0980514
Gini Pop 90           156            0.6496209   0.1587509            0.138778     0.986923
Gini Pop 95           156            0.6514677   0.1579596            0.124376     0.987184
Gini Pop 00           156            0.653804    0.1569453            0.109702     0.987656
PCI 1 Growth 90-95    156            0.0004572   0.0025761            -0.0128641   0.0109385
PCI 1 Growth 95-00    156            0.0004134   0.0022544            -0.0068741   0.0115931

TABLE 2 Cross Country Correlation

Column numbers refer to the variables as numbered in the rows.

Variable                   (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)      (10)     (11)     (12)     (13)
(1) PCI 1 90               1
(2) PCI 1 95               0.9996   1
(3) PCI 1 00               0.9988   0.9997   1
(4) PCI 2 90               0.6715   0.6706   0.669    1
(5) PCI 2 95               0.6725   0.6729   0.6723   0.9988   1
(6) PCI 2 00               0.6726   0.674    0.6744   0.9958   0.999    1
(7) Inv Avg Dist 90        0.7251   0.7242   0.7233   0.4231   0.4225   0.4218   1
(8) Inv Avg Dist 95        0.7262   0.7256   0.7249   0.4227   0.4226   0.4223   0.9999   1
(9) Inv Avg Dist 00        0.7272   0.7268   0.7263   0.4219   0.4222   0.4224   0.9996   0.9999   1
(10) Gini Pop 90           -0.2882  -0.2905  -0.2927  0.2809   0.2756   0.2698   -0.2291  -0.2312  -0.2337  1
(11) Gini Pop 95           -0.2881  -0.2895  -0.291   0.2827   0.2792   0.2749   -0.2309  -0.2324  -0.2345  0.9986   1
(12) Gini Pop 00           -0.2869  -0.2874  -0.2882  0.2837   0.2819   0.2791   -0.2322  -0.2333  -0.2348  0.9947   0.9986   1
(13) PCI 1 Growth 90-95    -0.0004  0.0261   0.047    -0.0257  0.0234   0.0623   -0.0229  -0.0143  -0.0064  -0.089   -0.0545  -0.0248  1
(14) PCI 1 Growth 95-00    -0.0135  0.0103   0.0335   -0.0542  -0.0096  0.0338   -0.0227  -0.0141  -0.0051  -0.1019  -0.0715  -0.04    0.8993

Table 3: Ranking by PCI1 90

Each Rank column refers to the variable immediately to its left.

First panel: ten lowest and ten highest countries by PCI 1 90

Code     Country                     PCI 1 90   Rank   PCI 2 90   Rank   Inv Avg Dist 90   Rank   Gini Pop 90   Rank
USA      United States               0.246      1      0.246      74     0.00067           1      0.914         149
BRA      Brazil                      0.247      2      0.147      12     0.00093           6      0.852         140
CHN      China                       0.251      3      0.169      21     0.00090           4      0.751         113
ZAF(b)   South Africa (Cape Town)    0.263      4      0.105      1      0.00091           5      0.923         150
RUS      Russian                     0.269      5      0.250      77     0.00069           2      0.930         153
IND      India                       0.270      6      0.171      22     0.00101           7      0.540         39
MOZ      Mozambique                  0.290      7      0.145      11     0.00103           8      0.661         88
KAZ      Kazakhstan                  0.298      8      0.149      14     0.00133           11     0.750         112
ZAR      Congo Kinshasa (DR)         0.298      9      0.156      15     0.00104           9      0.606         58
CAN      Canada                      0.301      10     0.244      72     0.00087           3      0.987         156
PRI      Puerto Rico                 0.622      147    0.354      146    0.02124           150    0.493         21
SLV      El Salvador                 0.628      148    0.345      142    0.02047           148    0.531         37
CRI      Costa Rica                  0.631      149    0.392      152    0.01834           145    0.654         86
ARM      Armenia                     0.645      150    0.404      154    0.02152           151    0.564         50
TTO      Trinidad and Tobago         0.648      151    0.346      144    0.02940           154    0.614         62
LBN      Lebanon                     0.648      152    0.328      137    0.02443           152    0.596         56
JOR      Jordan                      0.652      153    0.450      155    0.02116           149    0.884         147
KWT      Kuwait                      0.665      154    0.384      149    0.03021           155    0.732         104
MUS      Mauritius                   0.704      155    0.582      156    0.02841           153    0.627         70
SGP      Singapore                   0.764      156    0.353      145    0.09799           156    0.516         28

Second panel: ten countries listed in increasing order of PCI 2 90

Code     Country                     PCI 1 90   Rank   PCI 2 90   Rank   Inv Avg Dist 90   Rank   Gini Pop 90   Rank
EGY      Egypt                       0.513      109    0.367      146    0.00666           102    0.967         154
ISR(b)   Israel (Tel Aviv)           0.612      146    0.377      147    0.01783           143    0.743         107
KWT      Kuwait                      0.665      154    0.384      148    0.03021           155    0.732         103
PRT      Portugal                    0.492      98     0.389      149    0.00454           75     0.743         106
URY      Uruguay                     0.579      139    0.391      150    0.00827           117    0.645         77
CRI      Costa Rica                  0.631      149    0.392      151    0.01834           145    0.654         85
CHL      Chile                       0.478      89     0.392      152    0.00311           52     0.884         145
ARM      Armenia                     0.645      150    0.404      153    0.02152           151    0.564         49
JOR      Jordan                      0.652      153    0.450      154    0.02116           149    0.884         146
MUS      Mauritius                   0.704      155    0.582      155    0.02841           153    0.627         69

Table 4: Predictors of Population Concentration

Dependent Variable            PCI1                        PCI2                        Inv_Avg_Dist                Gini Pop
                              (1)          (2)            (3)          (4)            (5)          (6)            (7)          (8)
Log Population                -0.00831**   -0.00951**     -0.00759     -0.00989*      0.000484     0.0012         -0.0186*     -0.00242
                              [0.0039]     [0.0039]       [0.0061]     [0.0058]       [0.00063]    [0.00091]      [0.010]      [0.013]
Log Land Area                 -0.0476***   -0.0465***     -0.0145***   -0.0147**      -0.00449***  -0.00548***    0.0587***    0.0446***
                              [0.0026]     [0.0026]       [0.0053]     [0.0057]       [0.0014]     [0.0017]       [0.0084]     [0.0094]
Log GDP Per Cap               0.00326      0.00457        0.0121**     0.0103         0.000446     0.00275        0.0566***    0.0750***
                              [0.0035]     [0.0087]       [0.0059]     [0.013]        [0.00042]    [0.0017]       [0.010]      [0.024]
Polity Score                               -0.000848                   0.00132                     -0.000214                   0.00365
                                           [0.00073]                   [0.0014]                    [0.00021]                   [0.0023]
Ethno-Linguistic Frac.                     -0.024                      -0.0376                     0.00509                     0.129*
                                           [0.020]                     [0.032]                     [0.0036]                    [0.065]
Regional Fixed Effects                     YES                         YES                         YES                         YES
Legal Origin Fixed Effects                 YES                         YES                         YES                         YES
Observations                  113          108            113          108            113          108            113          108
R-squared                     0.82         0.87           0.22         0.46           0.52         0.62           0.42         0.58

Intercept omitted. Robust standard errors in brackets. All independent variables are taken with 5-year lag. PCI, Inv_Avg_Dist, GiniPop are calculated as described in the paper. See Appendix for data sources of other variables. *** p