Quality & Quantity (2005) 39: 241–251 DOI 10.1007/s11135-004-5008-8

© Springer 2005

Representation of Developments in Labour Market Research

ROEL POPPING
Department of Sociology, Groningen University, NL 9712 TG 31, Groningen, The Netherlands. E-mail: [email protected]

Abstract. A knowledge graph is a kind of semantic network representing some scientific theory. The paper describes the state of the art in this field and addresses a number of problems that have not been solved yet: implicit relations, the strength of (causal) relations, and conditions. Concepts might be too broad or too complex to be used properly; directions for solving this problem are explored. The solutions are applied to a knowledge graph in the field of labour markets.

Key words: knowledge graphs, knowledge representation

1. Introduction

For some years labour markets have been a relevant research issue in the programme of the research school Interuniversity Center for Social Science (ICS). Based on results found empirically in the first six Ph.D. theses of students of this school, and on interviews with these students, Popping and Strijker (1997) wanted to present the state of the art in the field of labour markets. The theses, however, dealt with different aspects of labour markets, so the network contained several groups of points connected by only a few relations. This was unsatisfying. Therefore the (not empirically tested) theory on labour markets was presented instead, in a network structure. The concepts were represented as labelled points, and relations between concepts as typed links between the points. Such networks are known as knowledge graphs. Still, it remains relevant to show the network based on empirical findings. Only this makes it possible to show growth in later stages, when the network is based on more findings. In this text the empirical network is presented and related to the theoretical one.

2. Labour Markets

The choice for sociological theories about labour markets as knowledge domain is a pragmatic one. The Interuniversity Center for Social Sciences
Theory and Methodology (ICS) is the first official social science research school in the Netherlands. The Ph.D. projects are placed in clusters. One cluster is concerned with organisations and labour markets. The approach to organisations and labour markets is characterised by two specific features (ICS, 1995). Theory formation on both processes is explicitly oriented toward the integration of economic and sociological insights. Organisations and labour markets are seen as phenomena that must be studied as interrelated processes. Five interrelated aspects are being studied:

1. The governance structures (including internal labour markets) of organisations;
2. Basic processes of contracting;
3. The inequalities resulting from selection and allocation in organisations;
4. The demand for particular organisations;
5. Search processes in the labour market.

Labour market research is becoming more and more important, because it is on this market that people find the position that will be their major source of income. The cluster has a coherent research programme. Therefore, one can expect that publications by members of this cluster are in some way related. This suggests that the knowledge graphs based on these publications will have several points in common, which is a prerequisite for the integration of the knowledge graphs.

Knowledge graphs on labour markets can serve different goals. First, they allow looking for inconsistencies between the different graphs (and so in the theory). One of the greatest benefits of knowledge graphs is that knowledge increases through integration: results found by different investigators are integrated in one graph. This allows a larger and more detailed area of research to be represented. Until now such an integration of research results did not occur. Based on this integration, it is possible that specific relations are found without having actually been investigated.
Say that in one investigation it is found that A causes B, and in another that B causes C. There are then reasons to conclude that A causes C, in so far as the conditions in both investigations are comparable. With an integrated knowledge graph it is feasible to avoid redundant research. An investigator can decide whether specific knowledge in a field is already available and does not have to be investigated again. The decision can also be that the knowledge is not available, or is available only under specific conditions.

The graphs also allow gaps in the theory to be found. A gap is a missing relation between two points. It might be that the relation has never been investigated. It is also possible that the relation was investigated and found not to exist. So far there are no technical tools to distinguish between these situations. During link integration a researcher can investigate whether there is really no relation
between two points. The investigator can also notice the gaps when a graph is inspected, and will then wonder why a relation is missing.

3. Knowledge Graphs

The knowledge about the labour markets is represented by using knowledge graphs. These graphs can be viewed as a particular kind of semantic network. One essential difference between knowledge graphs and semantic networks is the explicit choice of only a few types of relations (James, 1992: 98).

The construction of knowledge graphs starts with the extraction of information from texts. This is called text analysis. The result is a list of concepts, represented as labelled points, and a list of typed links between the points. These form the so-called author graph. A concept is a unit of meaning. It is used as the basic unit for the meaning content of what it refers to (Popping, 2000: 17). The most important type of link between points is the causal relation. The goal of the group of scientists working on knowledge graphs was to construct graphs that represent the theory on a specific subject.

The next step is called concept identification. Here the various author graphs are combined into one graph by identifying points with each other. When the texts on which the graphs are based are about the same subject, points with the same label are identified. Authors may use synonyms for a concept; therefore points with different labels may also have to be identified. This is done by comparing the neighbourhoods of points to find the potentially identical pairs. An index has been developed for measuring the similarity between two sets of points. The value this index takes, in combination with a threshold value, can be used to decide upon the identification of two concepts. In the same way it is possible to detect points that have the same label but refer to a different content, the so-called homonyms.
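The neighbourhood comparison used in concept identification can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify the similarity index, so the Jaccard coefficient is used here as a stand-in, and the function names, the threshold of 0.8, and the example links are all hypothetical.

```python
from itertools import combinations

# Hypothetical author-graph links as (concept1, link type, concept2) triples.
edges = [
    ("schooling", "cau", "function level"),
    ("education", "cau", "function level"),
    ("schooling", "cau", "income"),
    ("education", "cau", "income"),
    ("experience", "cau", "income"),
    ("automation", "cau", "job structure"),
]


def neighbourhood(edges, point):
    """Return the set of points directly linked to `point`, in either direction."""
    linked = set()
    for source, _link_type, target in edges:
        if source == point:
            linked.add(target)
        elif target == point:
            linked.add(source)
    return linked


def jaccard(a, b):
    """Jaccard similarity of two sets (an assumed stand-in for the paper's index)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_synonyms(edges, threshold):
    """List pairs of points whose neighbourhood similarity reaches the threshold."""
    points = sorted({p for s, _t, o in edges for p in (s, o)})
    pairs = []
    for p, q in combinations(points, 2):
        # Leave the two points themselves out of each other's neighbourhood.
        score = jaccard(neighbourhood(edges, p) - {q}, neighbourhood(edges, q) - {p})
        if score >= threshold:
            pairs.append((p, q, score))
    return pairs


print(candidate_synonyms(edges, threshold=0.8))
```

In this toy graph "education" and "schooling" have identical neighbourhoods, so they are flagged as a potentially identical pair. Whether they are actually identified remains a decision for the researcher.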
An example of a homonym is chair: a chair is something one sits on, but the word can also refer to one’s position (e.g., a committee leader or a professor). One of these points should get another label. The result is a compiled graph, which is free of linguistic ambiguity. This graph is further investigated in procedures called concept integration and link integration. The first procedure tries to find interesting substructures; the second procedure infers new links from the given ones. The result is called the integrated graph.

In order to represent the structure of knowledge a complex relation is often necessary: the frame relation. This relation combines a number of concepts and relations that are inseparably connected into a single concept. Together these concepts and relations make the frame function as it is supposed to. An example of a frame is the measurability of the quality of work. A bicycle might also be considered a frame; it consists of the frame, the wheels, the handlebar, and so on. Only together do these parts ensure that the bike can
work. Concept integration aims at determining those subgraphs that are candidates for contraction into a frame. (Note that the term frame is used here with a meaning different from the usual one in artificial intelligence.)

In link integration relations are combined to deduce new relations. If there exist relations between the points A and B as well as between B and C, there may be reasons to infer a relation between A and C. To find these new relations path algebra (Carré, 1979: 84–85) is used. Relations are combined by multiplication for the serial combination, and by addition for the parallel combination.

With respect to relations four characteristics are distinguished: directionality, meaning, sign, and strength (Carley, 1993; Popping, 2000: 99). In the theory about knowledge graphs only the first two have been used so far. All relations are unidirectional, and the meaning is denoted by using types, as described hereafter. A meaning like ‘is friends with’ is not used, but ‘is a kind of’ is. The characteristics sign (positive or negative) and strength (usually a value on a 0–1 scale) have not been used up to now.

Originally the idea was to represent knowledge by using as few semantic relations as possible. At first only the types cau, par, and ako were used. The cau relation denotes a cause–effect relation (Unstable market positions cause polarisation.). The relation is asymmetric and transitive. In all methods using networks based on text the causal relation is read as might cause. par stands for the is part of relation (Having relations with high status is a part of social capital.); it characterises a thing. ako refers to is a kind of (A married man is a kind of man; a low educated woman is a kind of woman.); here something is exemplified. The latter two relations are also transitive and asymmetric. Inverse relations are also distinguished: cby (is caused by), hap (has as part), and hak (has as kind) (Stokman and De Vries, 1988). The ass (association) relation also seems relevant.
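A minimal sketch of link integration over typed links may help here. It rests on assumptions that go beyond the text: the full multiplication table of the path algebra is not given, so the serial combination below only infers a link when two consecutive links share the same transitive type (cau, par, or ako), and the parallel combination is reduced to taking the union of links. The function names are hypothetical.

```python
TRANSITIVE = {"cau", "par", "ako"}
INVERSE = {"cau": "cby", "par": "hap", "ako": "hak"}


def multiply(first, second):
    """Serial combination: type of the link inferred from two consecutive links.

    Assumption: only two links of the same transitive type combine.
    """
    if first == second and first in TRANSITIVE:
        return first
    return None


def integration_step(links):
    """One step of link integration over a set of (point, type, point) triples."""
    inferred = set()
    for a, type_ab, b in links:
        for b2, type_bc, c in links:
            if b == b2 and a != c:
                combined = multiply(type_ab, type_bc)
                if combined is not None:
                    inferred.add((a, combined, c))
    # Parallel combination ("addition"), simplified here to a set union.
    return links | inferred


def with_inverses(links):
    """Add the inverse link (cby, hap, hak) for every cau, par, and ako link."""
    return links | {(c, INVERSE[t], a) for a, t, c in links if t in INVERSE}


links = {("A", "cau", "B"), ("B", "cau", "C"), ("C", "cau", "D")}
step1 = integration_step(links)   # adds A cau C and B cau D
step2 = integration_step(step1)   # adds A cau D
print(sorted(step2 - links))
```

Since the causal relation is read as might cause, links inferred this way are best treated as candidates for inspection, not as established findings.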
The ass relation is symmetric: it specifies a connection between two concepts without specifying the type of this connection.

During the text analysis process, the ‘translation’ from text to network, several problems arise for which solutions have been suggested (Popping, 2003). One such problem is the following. Some relations are so obvious that one forgets to mention them. For example, one of the theses was on women in general, while another was on girls just leaving school and ready to enter the labour market. The concepts ‘woman’ and ‘girl’ were used. For this reason at first no relation between the theses was found. This shows that one needs a network in which concepts like these are related.

4. Findings

4.1. The Theses

The theses that formed the basis of the knowledge graph are briefly introduced.

Thesis 1: A study of the impact of automation on the job structure in firms. The goal is (a) to arrive at conclusions about the impact and the content of changes in the job structure of firms after automation, and (b) to find the most prominent determinants of these changes.

Thesis 2: Selection and allocation on the labour market. The goal is to explain why different people get different jobs and different incomes.

Thesis 3: Labour market positions of women and men having a high professional training for an occupation commonly performed by men. The central question is why the starting position of women having a training generally followed by men is lower than that of men having the same qualifications.

Thesis 4: Contacts and career. The central issue of the study is to explain the effects of networks of personal relations on labour market outcomes.

Thesis 5: The way in which school leavers from vocational schools perceive their opportunities on the labour market. First it is explained how differences could arise in the way students leaving school, and having the same opportunities, perceive their possibilities on the labour market. Next, perception of labour market opportunities is introduced. This perception is related to disposition, i.e., the wish to get a job, the benefits of a job, the judgement of the probability of finding a job, and the wish to have a job because of the social environment.

Thesis 6: Repression on the labour market, differences in sector and sex. The investigator tried to answer the question of whether and how it is possible to explain differences in effects of ‘crowding out’ between segments of the labour market and between men and women within these segments.

The author graphs are based on text. The main problem was to obtain a text on which the text analysis could be based. The complete thesis is too much. The summary too often emphasises the why of the project and the interpretation of the findings. The actual findings get little attention.
The authors were asked to write one page in which they present the theoretical starting position, the topic, and the main findings of their research. This did not result in a text that could be converted into a graph, because the authors gave most attention to the legitimisation of the research project. Therefore the authors were interviewed, and together a text was composed that could be used. The final author graphs were verified by the authors. The graphs are a good representation of the main findings of each author.

In some of the studies part of the results hold only for a specific part of the sample: some results were found for men only, some for women only, some for employers only, and some for employees only. In the presentation hereafter only findings for the whole population are presented; this is the complete working population in the Netherlands.

4.2. The Graphs

In the text analysis part, subject–verb–object (SVO) syntactic relations are encoded as concept1-link-concept2 relations. The result is a list of concepts, represented as labelled points, and a list of typed links between the points. These form the so-called author graph. The six author graphs are combined into the compiled graph. The compiled graph is integrated in two steps. If it is found that A is a cause of B, and that B is a cause of C, one finds after the first step of link integration that A is a cause of C. If it is also known that C is a cause of D, one finds in the second step that A is a cause of D.

The compiled graph contains 54 concepts and 57 relations between these concepts; the integrated graph contains 61 relations. The concepts from the study on the impact of automation are completely isolated from the concepts in the other studies. Two of the other studies contain a lot of conditional relations, i.e., the results hold only for a specific group in the population, for example only for men. In the treatment frame relations are not discussed; concepts are treated as simple indications of what they represent. The strength of a relation is not considered either, because it is not clear which quantity should be used for it.

4.3. In- and Outdegrees

The in- and outdegree of a concept show the number of concepts that have a relation directed towards that concept or away from that concept, respectively. Assume a causal relation.
The fact that concept A causes a change in concept B contributes to the indegree of concept B, and also to the outdegree of concept A. So when some concept causes a change in the concept under investigation, this contributes to the indegree of the concept under investigation; when the concept under investigation causes a change in another concept, this contributes to its outdegree. If the indegree equals zero and one or more outgoing relations are causal, the concept might be considered an independent variable: the concept has an effect on other variables. In the reverse situation, where the outdegree is zero and the incoming relations are of type cau, the concept might be considered a dependent variable.
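The degree counts and the resulting reading of concepts as independent or dependent variables can be sketched as follows. The function names, the role labels, and the example links are hypothetical; the rules follow the description above, with "on a path" used for concepts that have both an in- and outdegree greater than zero.

```python
def degrees(links):
    """Map each concept to (indegree, outdegree), counting distinct neighbouring concepts."""
    incoming, outgoing = {}, {}
    for source, _link_type, target in links:
        outgoing.setdefault(source, set()).add(target)
        incoming.setdefault(target, set()).add(source)
    concepts = set(incoming) | set(outgoing)
    return {c: (len(incoming.get(c, ())), len(outgoing.get(c, ()))) for c in concepts}


def classify(links):
    """Label each concept as independent, dependent, on a path, or unclassified."""
    roles = {}
    for concept, (indegree, outdegree) in degrees(links).items():
        in_types = {t for _s, t, target in links if target == concept}
        out_types = {t for source, t, _o in links if source == concept}
        if indegree == 0 and "cau" in out_types:
            roles[concept] = "independent"   # only causes, is never caused
        elif outdegree == 0 and in_types == {"cau"}:
            roles[concept] = "dependent"     # only caused, causes nothing
        elif indegree > 0 and outdegree > 0:
            roles[concept] = "on a path"
        else:
            roles[concept] = "unclassified"
    return roles


# Hypothetical example: A causes B, B causes C, D causes C.
links = [("A", "cau", "B"), ("B", "cau", "C"), ("D", "cau", "C")]
print(degrees(links))
print(classify(links))
```

Here A and D come out as candidate independent variables, C as a candidate dependent variable, and B as a concept on a path.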

Looking at the compiled graph we find the concept wish for a job after training with outdegree 5 and no indegree. Therefore this seems to be a relevant independent variable. The highest indegree combined with no outdegree is found for function level high (6). This might be a relevant dependent variable. Among the concepts that also have an outdegree, the highest indegree is found for improvement educ./funct. (a stronger relation between educational level and function level) and underestimation of the labour market (both 4), but these concepts both have outdegree 1.

Based on the integrated graph after two steps of link integration, the relevant independent variables seem to be wish job after training (5), bureaucratic industry (3), professional service (3), and sector high price control (3). As candidates for dependent variables we find function level high (6), pushing away (lower educated employees are replaced by higher educated ones) (5), graduate finds job later (5), and demand empl. high educated (4). The dependent variables pushing away and graduate finds job later gained connections (indegree from 2 to 5 for both). The concept empl. high educated has a very high indegree (7), but its outdegree is 1.

As mentioned before, the compiled graph contains 53 different concepts; 16 of these concepts are on a path, i.e. they have both an in- and outdegree > 0 before link integration. These concepts are shown in Table 1. The degrees show that there are not many relations. In the compiled graph the average in- and outdegree is 0.38, in the integrated graph it is 0.43.

Table 1. Concepts with in- and outdegree > 0

                            Before linkintegr.    After linkintegr.
Concept                     In    Out             In    Out
Ample financ. resources     1     1               2     1
Better reputation org.      1     1               6     1
Better reward               1     1               1     1
Demand at service org.+     1     1               4     1
Empl. high educated         3     1               7     1
Employee + high product.    1     2               0     2
Financ necessity            1     1               0     1
Improved quality service    1     1               1     1
Improvement educ./funct.    4     1               1     1
Labour market underest.     4     1               0     1
Preference higher job       1     1               0     1
Sector high price contr.    3     3               0     3
Work high price control     1     2               0     3
Works for income            2     1               0     1
Works for status            1     1               0     1
Works if status low         1     1               0     1

Some concepts that were mentioned before are not in the table. The wish for a job after training is linked to concepts indicating whether a fitting job is found or whether it was found fast. These are all concepts that are not related to other concepts (at least, not in the studies under investigation). Whether the function level is high or should be high is caused by the fact that the employee is a man or a woman, possibly married or not, with or without a high level of education. In the present studies no outdegree was found for function level high. The most relevant information is in the concepts that are on a path.

4.4. Representation of the Knowledge Graph

The part of the integrated knowledge graph consisting of the concepts mentioned in Table 1, extended with the concepts that have indegree ≥ 3, is presented in Figure 1. The figure contains one association relation (indicated by ==). This is the relation between employee with high productivity and ample financial resources. All other relations are causal relations (indicated by