This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

Sensors-5755-2011.R1


Proximity Loggers: Data Handling and Classification for Quality Control

Nathan S. Watson-Haigh, Christopher J. O'Neill and Haja N. Kadarmideen

Abstract: Proximity loggers are a novel biotelemetry device used for quantifying animal-animal interactions in a non-invasive way. Such data have been used for studying a range of interactions, from disease spread among badgers and cattle to quantifying cow-calf interactions. Such quantitative behavioural traits could be used for the purpose of selective breeding in domesticated animals. Using real data from a study of oestrus behaviour in cattle populations raised in an extensive grazing system, we have identified poor reciprocal agreement (RA) as a source of variation. To date, RA has not been adequately considered or addressed, and it can have serious implications for further analyses aimed at correctly quantifying and interpreting social behaviour. We provide a database schema for storing the millions of contact records produced by proximity loggers, together with programming functions, for use in the statistical programming language R, for performing raw data quality control, data import, database queries and classification of RA between proximity logger pairs. Poor RA leads to a lack of confidence in the data recorded by a pair of loggers. At best, substantial noise is added to the data set; at worst, it can lead to over- or under-estimation of contacts between pairs of animals. Over successive deployments, the identification and removal of loggers consistently involved in poor RA pairs can improve the level of agreement and confidence in the recorded data. This is a necessity for the accurate estimation of genetic parameters based on proximity logger data.

Index Terms: Biotelemetry, R scripts, Reproduction, Social Behaviour, Cattle, Concordance

I. INTRODUCTION

Observations and electronic devices have long been used to monitor both wildlife [1], [2] and livestock behaviour [3], [4]. Biotelemetry is animal-attached technology that allows remote sensing and reporting of physiology and behaviour (for a review of biotelemetry see Cooke et al. [5]). Biotelemetry data can be used to generate behavioural trait information for use in the genetic analysis of a physiological or behavioural state. For instance, Prayaga et al. [6] showed the genetic basis of bonding behaviour between a cow and her calf, which was measured by biotelemetry. The use of biotelemetry devices is timely for the global cattle industries. These industries are facing significant issues of sustainability [7], [8] because of, among other things, low and/or declining cattle fertility and disease resistance [9]. Animal-animal interactions, measured remotely using biotelemetry devices, are becoming increasingly popular as part of biological and genetic investigations [1], [6]. Biological and genetic studies require accurate separation of the biological or genetic effects of the individual from non-biological/non-genetic effects, as well as from systematic measurement errors [10], [11]. Thus it is important that telemetry devices themselves do not contribute to systematic measurement error or noise.

Manuscript received August 30, 2011. This work was supported in part by CSIRO's Office of the Chief Executive (OCE) Postdoctoral Fellowship and Sustainable Agriculture Flagship (SAF). N. S. Watson-Haigh, C. J. O'Neill and H. N. Kadarmideen were with CSIRO Livestock Industries, Townsville, QLD 4810, Australia. N. S. Watson-Haigh is currently with The Australian Wine Research Institute, Waite Precinct, Adelaide, SA 5064, Australia (e-mail: [email protected]). H. N. Kadarmideen is currently with the University of Copenhagen, Section of Genetics and Bioinformatics, Department of Basic Animal and Veterinary Sciences, Faculty of Life Sciences, 1870 Frederiksberg C, Denmark.

Fig. 1. A timeline showing fictional contacts recorded between two loggers (A and B). There is perfect reciprocal agreement between the two loggers as the contacts recorded by logger A perfectly match the contacts recorded by logger B.

Remote-sensing proximity logging devices, utilizing UHF radio waves, have recently been developed by Sirtrack Ltd (Havelock, New Zealand) [1] for the purpose of quantifying animal-animal interactions. They provide insights into different aspects of social behaviour and are particularly useful for terrestrial species which are difficult to observe. A proximity (contact) event occurring between a pair of animals is recorded by both of the loggers worn by the two animals. Under perfect conditions, two proximity loggers that come into range of each other should record precisely the same contact information (Fig. 1), thus exhibiting perfect reciprocal agreement (RA).
However, there are many factors inherent to UHF radio wave technology, which cannot be easily isolated, that can affect the variability in RA, including but not limited to: 1) the radiation/receptor pattern of the UHF antenna; 2) the relative orientation and height of the animals wearing the devices; 3) differences in body mass/condition resulting in differences in received signal strength; 4) buildings, trees, rocky outcrops, fences and troughs that reflect and block signals, resulting in multipath interference.

To date, proximity loggers have been used for the study of contact rates between animals such as raccoons [1], possums [12] and Tasmanian devils [13], as well as to identify cow-calf pairs [3] and to study the transmission of bovine tuberculosis among badgers and cattle [14]. The use of proximity loggers presents several problems: 1) handling and accessing the millions of contact records produced by the loggers; 2) how to assess the level of RA in the data; 3) how to improve the RA in the data; 4) what to do with contact data recorded by two loggers when there is poor RA; and 5) how best to use the reciprocal nature of the data to quantify the animal-to-animal interactions. To date there has been insufficient work in all these areas, and this is hampering attempts to realise the full potential of this technology in agricultural and biological sciences.

In this study, we address the first three of these problems by providing the tools to handle and store the millions of contact records produced by proximity loggers, and methodologies to assess RA, which is then used to identify loggers consistently involved in poor RA (i.e. those which are systematically incorrect). We use real data collected from 85 heifers (young female cattle which have not yet had a calf) on their oestrus behaviour in a rangeland environment to develop and illustrate proximity logger data handling methods. A suite of R functions that handles proximity logger data is provided with this manuscript.

Copyright (c) 2011 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

II. METHODS

A. Source of Data

An experiment was conducted in 2009 to study the social/sexual behaviour of a group of young female cattle in a rangeland production system. As cattle reproduction is associated with changes in the social behaviours of females [15], proximity loggers (Sirtrack Ltd, Havelock, New Zealand) were used to measure those social interactions. The aim was to identify changes in social behaviour linked to the reproductive cycle, in particular the onset and duration of oestrus. Such traits are difficult to measure [16], especially in extensive environments.
Briefly, the experiment was conducted on CSIRO Belmont Research Station, a 3260 ha property located on the Fitzroy River north of Rockhampton in Northeastern Australia (S23.224, E150.383). A group of 85 2-year-old maiden heifers, consisting of 2 genotypes, were pastured on two paddock types for 2 seasons. Additional details of the experiment can be found in O'Neill et al. [17].

B. Proximity Loggers

A mix of proximity loggers with memory capacities of 16384 or 32768 records was used. Logger detection distances were set to 5 m, and the loggers were randomly attached, by means of a collar, to the neck of an animal at the start of each deployment: 1) 2009-06-19 to 2009-07-03; 2) 2009-07-03 to 2009-07-20; 3) 2009-07-20 to 2009-07-30; 4) 2009-10-28 to 2009-11-10; 5) 2009-11-10 to 2009-11-23; and 6) 2009-11-23 to 2009-12-01. A dedicated laptop was used to program and download data from the loggers to ensure date/time synchronisation with the loggers was consistent both before and after a deployment.

C. Logger Data Files

Logger data files are comma-separated text files with 5 fields: Record ID, Encounter ID, Date day/month/(year), Encounter Start Time and Encounter Length. The Record ID is a sequential integer indicating the row (record) number. Encounter ID is the ID of the other logger involved in the contact. Date is the date (possibly including a year) on which the contact started, in dd/mm or dd/mm/yy format. Encounter Start Time is the time the encounter was initiated, in hh:mm:ss format, and Encounter Length is the duration (integer seconds) over which the encounter took place. Records are ordered by the date/time the contact was terminated. Following each deployment, data was downloaded from each logger using the Sirtrack Proximity Logger Admin Tool (v1.1.0.6).

D. Identifying Erroneous Records

Proximity loggers invariably contained a number of corrupt or invalid records, and in some cases the download process failed completely. This led to complete or partial loss of reciprocal data for one or more contact events between logger pairs. Records in logger data files with the following types of errors were ignored and excluded from the database: 1) internal diagnostic records; 2) incorrectly formatted data, including corrupt data shown as 'FF' for part or all of a data field; 3) dates with a month less than that of the previous record; 4) invalid dates e.g. 00/00 and 34/05; 5) invalid Encounter IDs i.e. IDs of non-existent loggers or loggers not deployed; and 6) dates outside that of the current deployment. Loggers with ≥10% of records containing errors 1-5 or with ≥5% of records containing error 6 were flagged for maintenance (Fig. 2).

Fig. 2. A flowchart showing the steps involved in obtaining data from the proximity loggers, through the identification and removal of erroneous records prior to storing data in the database. Proximity loggers with a high proportion of erroneous records were flagged for maintenance.
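The record checks and flagging thresholds described above can be illustrated with a minimal Python sketch. It is not the R code supplied with the manuscript. The function names, the sample records and the order in which the checks are applied are assumptions made for illustration. Error type 1 (internal diagnostic records) is omitted because the text does not describe how such records appear in the file, and a date without a year is assumed to fall in the year the deployment started.

```python
import csv
import io
from datetime import date, datetime


def classify_record(row, prev_month, deployed_ids, start, end):
    """Return (error_code, month_to_carry_forward); code None means usable."""
    # Error 2: wrong number of fields, or corrupt 'FF' data in any field.
    if len(row) != 5 or any("FF" in field.upper() for field in row):
        return 2, prev_month
    rec_id, enc_id, day_month, start_time, length = (f.strip() for f in row)
    try:
        int(rec_id)
        encounter = int(enc_id)
        int(length)
        datetime.strptime(start_time, "%H:%M:%S")
        parts = [int(p) for p in day_month.split("/")]
    except ValueError:
        return 2, prev_month
    if len(parts) not in (2, 3):
        return 2, prev_month
    # Assumption: a missing year means the year the deployment started.
    year = 2000 + parts[2] if len(parts) == 3 else start.year
    try:
        when = date(year, parts[1], parts[0])
    except ValueError:
        return 4, prev_month          # Error 4: invalid date such as 34/05
    if prev_month is not None and parts[1] < prev_month:
        return 3, prev_month          # Error 3: month earlier than previous record
    if encounter not in deployed_ids:
        return 5, parts[1]            # Error 5: unknown or undeployed logger ID
    if not (start <= when <= end):
        return 6, prev_month          # Error 6: date outside the deployment
    return None, parts[1]


def summarise(text, deployed_ids, start, end):
    """Count records per error code and apply the maintenance thresholds."""
    counts, prev_month, total = {}, None, 0
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        total += 1
        code, prev_month = classify_record(row, prev_month, deployed_ids, start, end)
        counts[code] = counts.get(code, 0) + 1
    general = sum(n for code, n in counts.items() if code in (2, 3, 4, 5))
    flagged = total > 0 and (general / total >= 0.10
                             or counts.get(6, 0) / total >= 0.05)
    return counts, flagged


# Hypothetical logger file: Record ID, Encounter ID, Date, Start Time, Length.
SAMPLE = """1,17,19/06,10:15:02,12
2,17,19/06,10:20:40,45
3,FF,FF/FF,FF:FF:FF,FF
4,17,34/05,11:00:00,3
5,999,20/06,09:00:00,8
6,17,25/12,09:30:00,5
7,17,21/06,14:05:10,30
8,17,02/05,08:00:00,7
"""

counts, flagged = summarise(SAMPLE, {17, 198}, date(2009, 6, 19), date(2009, 7, 3))
print(counts, flagged)
```

In this sketch a record is assigned only the first error it triggers, so the counts per error type add up to the number of records read.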


TABLE I
PROXIMITY LOGGERS WITH MANY ERRONEOUS RECORDS

Columns: Logger ID, followed by one column for each of deployments 1 to 6.
Rows (Logger ID): 1, 3, 4, 7, 14, 15, 18, 20, 24, 25, 27, 29, 31, 38, 105, 111, 113, 123, 131, 141, 155, 157, 186, 187, 188, 190, 193, 194, 198.
[Cell entries not reproduced: the per-deployment marks cannot be aligned to individual loggers in this copy.]

Loggers deployed and found to contain many erroneous records (+); loggers deployed but did not contain many erroneous records (.).

E. Assessment of Reciprocal Agreement

We calculated hourly contact duration data for all pairs of loggers using contact data from the database. Contacts of one second were ignored as they are predominantly phantom contacts [1]. For each logger pair, a 5 day window was used to calculate the level of RA. This window started at 00:00 on the 2nd day of the deployment and finished at 23:59 on the 6th day of the deployment. The first and last days of a deployment were not used as animals were artificially within close proximity to each other in the cattle yards. A 5 day window was chosen to provide adequate data for most logger pairs while avoiding situations where loggers may have failed or filled their memories mid-deployment. Using data from the 5 day window, we calculated the concordance correlation coefficient (CCC) [18] using all data points between pairs of loggers (CCCA). Each logger pair with sufficient data (≥4 data points) to calculate CCCA was classified into one of three classes: top 25th percentile (H), bottom 25th percentile (L) or in between (M). In order to ensure a small number of data points were not overly influencing the CCC, the CCC was recalculated with influential data points excluded (CCCI). Influential data points were identified using the influence.measures() function of the stats package in R [19], which uses: 1) DFBETAS; 2) DFFITS; 3) covariance ratios; 4) Cook's distance; and 5) diagonal elements of the hat matrix to identify influential data points in a linear regression setting. The logger pair was also classified into one of the three classes (H, L and M) using the CCCI. Loggers with ≥20% of their pairs classified as L for both CCCA and CCCI (LL) were flagged for maintenance.

Fig. 3. An entity relationship diagram (ERD) of the logger database. Animal, logger, paddock and deployment data are stored in the animal, logger, paddock and deployment tables respectively. Individual contacts recorded by proximity loggers are stored in the contact table. The many-to-many relationships that exist between animals, loggers, paddocks and deployments are resolved using the intermediate association table animal_deployment. The reciprocal contact data between pairs of animals are implemented as two 1:m relationships between the animal_deployment and contact tables.

III. RESULTS

A. Identifying Erroneous Records

We expected to have a total of 440 logger data files from across all 6 deployments. We retrieved only 421 logger data files that could be correctly linked back to loggers actually deployed. Some data files were absent because the download process failed or the logger did not contain any records. In the majority of these cases, the cause was found to be a broken battery wire. Some data files were excluded as the logger IDs (stored electronically on the logger) did not match those that were deployed. A total of 4.3 million contact records were processed from across the 6 deployments. The vast majority of logger data files (316, 75%) contained only a small number (≤5%) of erroneous records. A small number of loggers (29, 7%) were responsible for 44 data files with a large number (≥10%) of erroneous records (Table I). In total, 0.4 million erroneous records were identified, leaving the remaining 3.9 million records to be inserted into a database.

B. Database and R Functions

A MySQL relational database (DB) was created to store the 3.9 million contact records and associated data about loggers, deployments, animals and paddocks (Fig. 3). An SQL script for generating this DB is provided in supplemental A. R functions used for importing, retrieving and processing contact data from the DB using the Open Database Connectivity (ODBC) interface, as well as for assessing RA, are provided in supplemental B and C.

C. Assessment of Reciprocal Agreement

There was at least one contact recorded for 1728, 1917, 2092, 942, 811 and 1168 logger pairs for deployments 1-6 respectively. However, for some pairs there were insufficient data points within the 5 day window over which RA was measured. As a result, RA was measured for 1467 (85%), 1297 (68%), 1344 (64%), 664 (70%), 557 (71%) and 566 (48%) logger pairs for deployments 1-6 respectively. Fig. 4 shows a plot of CCCI against CCCA for the 1467 logger pairs from deployment 1, together with the upper and lower 25th percentiles for CCCA and CCCI used to classify loggers into one of the three classes. Together, the classifications of CCCA and CCCI result in nine possible classes, as indicated by the discrete rectangular regions in Fig. 4. Similar plots were obtained for the other deployments but with slightly different upper and lower 25th percentiles. The lower 25th percentile for CCCA was 0.275, 0.188, 0.217, 0.158, 0.191 and 0.145 for deployments 1-6 respectively. The lower 25th percentile for CCCI was 0.155, 0.085, 0.118, 0.082, 0.087 and 0.071 for deployments 1-6 respectively. The upper 25th percentile for CCCA was 0.862, 0.797, 0.814, 0.77, 0.804 and 0.771 for deployments 1-6 respectively. The upper 25th percentile for CCCI was 0.741, 0.667, 0.708, 0.601, 0.637 and 0.646 for deployments 1-6 respectively.

We identified a number of loggers consistently involved in poor RA (LL classified) pairs: 23, 16, 15, 10, 12 and 5 from deployments 1-6 respectively. This represents 10-29% of classified loggers from a deployment being flagged for maintenance. The following were identified as poor RA loggers in ≥2 deployments and ≥50% of the deployments in which they were used: 149 (3 of 3 deployments), 26 (5 of 6 deployments), 27 (5 of 6 deployments), 39 (5 of 6 deployments), 33 (4 of 5 deployments), 9 (2 of 3 deployments), 23 (2 of 3 deployments), 24 (2 of 3 deployments), 101 (4 of 6 deployments), 153 (2 of 3 deployments) and 189 (3 of 6 deployments).

Fig. 4. A scatter plot of the concordance correlation coefficients, calculated with (CCCA) and without (CCCI) influential data points, for the 1467 logger pairs in deployment 1. Histograms above and to the right show the distributions of CCCA and CCCI respectively. Logger pairs were classified into 1 of 3 classes (H, L and M) for CCCA and CCCI, using percentiles. The upper (green solid lines) and lower (red dotted lines) 25th percentiles are shown with their corresponding values. Loggers that were consistently involved in pairs classified as HH or LL were identified as being good and bad loggers respectively.

IV. DISCUSSION

A. Identifying Erroneous Records

Prange et al. [1] evaluated the performance of proximity loggers using both laboratory tests and field tests on 42 free-ranging raccoons. During their field trials, approximately 40% of loggers experienced problems which would have resulted in complete or partial loss of reciprocal data. They found that some logger function was lost in the first few months of their deployment, but stabilised thereafter. They comment that these failure rates were similar to other studies using proximity loggers [12] and were comparable to studies using GPS collars (although the GPS units used are now 20 years old) [20].
We identified loggers which consistently had a high proportion of erroneous records. This is a first step towards increasing the number of correctly functioning loggers over multiple deployments and thereby improving RA. It gives the practitioner the opportunity to exclude such loggers from the study until they have been repaired or the errors have been shown to be one-off events, e.g. due to electrical interference from lightning storms or electric fences. At the end of the 6 deployments, it can be seen that loggers 4, 24 and 190 were flagged as containing a significant proportion of erroneous records in three consecutive deployments (Table I). In addition, loggers 27 and 131 were flagged in ≥50% of the deployments in which they were used. Together, these five loggers should be treated with suspicion and undergo maintenance as they are likely contributing to significant loss of data. However, in practice, the researcher would make such assessments of the loggers following each deployment with the aim of identifying and fixing problematic loggers before they are redeployed.



Fig. 5. Time series plots (a-c) and hourly contact data plots (d-f) for three pairs of loggers from deployment 1. Shaded areas in the time series plots indicate the 5 day window over which reciprocal agreement was measured. Hourly contact data plots show the level of concordance in data recorded between two loggers, together with the line of equality (dashed green line) on which we expect our data to fall, the regression lines (solid and dashed red lines) and the concordance correlation coefficients calculated using all data points (CCCA) and following the exclusion of influential data points (CCCI). Logger pairs 117/100 (a and d), 17/198 (b and e) and 145/198 (c and f) were subsequently classified as HH, LL and HH respectively.
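The HH and LL labels used in Fig. 5 come from the percentile classification described in the Methods. The following Python sketch shows one way that classification and the ≥20% LL flagging rule could be expressed. It is not the authors' R implementation. The CCC values are invented, the function names are hypothetical, and the choice of quartile estimator and of inclusive boundaries (≥ upper quartile is H, ≤ lower quartile is L) is an assumption.

```python
from statistics import quantiles


def classify(ccc_by_pair):
    """Label each logger pair H, M or L using the upper and lower quartiles."""
    values = sorted(ccc_by_pair.values())
    lower, _, upper = quantiles(values, n=4, method="inclusive")
    return {
        pair: "H" if value >= upper else "L" if value <= lower else "M"
        for pair, value in ccc_by_pair.items()
    }


def flag_loggers(ccc_a, ccc_i, threshold=0.20):
    """Return loggers with at least `threshold` of their pairs classified LL."""
    class_a, class_i = classify(ccc_a), classify(ccc_i)
    n_pairs, n_ll = {}, {}
    for pair in ccc_a:
        is_ll = class_a[pair] == "L" and class_i.get(pair) == "L"
        for logger in pair:
            n_pairs[logger] = n_pairs.get(logger, 0) + 1
            n_ll[logger] = n_ll.get(logger, 0) + int(is_ll)
    return sorted(lg for lg in n_pairs if n_ll[lg] / n_pairs[lg] >= threshold)


# Invented CCC values for nine logger pairs among five loggers.
CCC_A = {(1, 2): 0.90, (1, 3): 0.85, (2, 3): 0.88, (1, 4): 0.60, (2, 4): 0.10,
         (3, 4): 0.70, (1, 5): 0.15, (2, 5): 0.80, (4, 5): 0.05}
CCC_I = {(1, 2): 0.80, (1, 3): 0.75, (2, 3): 0.78, (1, 4): 0.45, (2, 4): 0.40,
         (3, 4): 0.02, (1, 5): 0.50, (2, 5): 0.30, (4, 5): 0.01}

classes_a = classify(CCC_A)
flagged = flag_loggers(CCC_A, CCC_I)
print(classes_a)
print(flagged)
```

With so few pairs per logger, a single LL pair is enough to cross the 20% threshold, so the rule is only meaningful when each logger has many classified pairs, as in the deployments reported here.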

B. Assessment of Reciprocal Agreement

Prange et al. [1] used mean daily differences and standardised daily differences to conclude that RA was good between pairs of loggers. However, this observation was based on only six pairs of loggers from the 42 loggers used in the experiment. This represents only 0.7% of all possible pair-wise combinations (C(42,2) = 861) and thus should not be used to draw conclusions about the level of RA for other logger pairs. A more recent study [3] avoided the reciprocal nature of logger data by using data from only one collar in every pair. This was done based on the assumption that a strong Pearson correlation of the number or duration of contacts recorded between each pair of loggers meant good RA. However, the use of the Pearson correlation as an indicator of the level of agreement has a long history of misuse and can be misleading if used as such [21], [22]. There is a difference between “agreement” and “correlation” and the two are not necessarily the same [18], [23].

Some studies did not attempt to reconcile the reciprocal data, instead opting to calculate gross animal contact metrics, such as the number and duration of contacts per 24-h for each logger, rather than quantifying individual animal-to-animal contacts [12]. Hamede et al. [13] noted that there was disagreement between the number and duration of contacts recorded between logger pairs. They attempted to address the situation by calculating a pseudo-contact data set for each logger pair by calculating partially overlapping contacts. This is where a contact is said to be occurring when one or both of the loggers involved are recording a contact at a given time (Fig. 6). However, they did not consider the source of the disagreement and how it may affect the calculation of partially overlapping contacts. Although we are unaware of its use, a similar approach to calculating partially overlapping contacts would be the calculation of completely overlapping contacts. This is where a contact is said to be occurring if, and only if, both loggers are recording a contact at a given time (Fig. 6). Both approaches, completely and partially overlapping contacts, will suffer from similar issues if the source of poor RA is not known or understood for a given logger pair. A lack of date/time synchronisation, different detection distances or a lack of free memory in a logger would systematically bias the completely or partially overlapping contacts. Fig. 6 provides a simple fictional example of two loggers (A and B) where logger B is consistently recording longer duration contacts against logger A compared to the reverse. If, for example, the contacts recorded by A are known to be correct, then the partially overlapping contacts will overestimate the true duration of contacts between the pair, while the completely overlapping contacts would be correct. However, in an extensive rangeland where contacts are not observed it is not possible to know which logger's data is correct. Such approaches will clearly affect the conclusions drawn from the data unless the source of poor RA is known.

We used the concordance correlation coefficient (CCC) to measure the level of agreement in hourly contact duration between pairs of loggers. Like the Pearson correlation coefficient, CCC takes values in the interval [-1,1], with 1 being complete agreement, 0 being no agreement and -1 being strong disagreement. Under ideal conditions, we expect the CCC calculated between all pairs to tend towards 1 (i.e. perfect agreement). We found the distribution of CCCs to be bimodal in all 6 deployments (e.g. Fig. 4), with peaks around 1 and 0.
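The distinction between agreement and correlation drawn above can be made concrete with a short Python sketch of the CCC, computed as 2·s_xy / (s_x² + s_y² + (mean_x − mean_y)²) with 1/n moments. This is an illustration only, not the R function supplied with the manuscript, and the hourly durations are invented. It assumes the two series are paired hour by hour and are not both constant.

```python
from math import sqrt
from statistics import fmean


def _moments(x, y):
    """Means, 1/n variances and 1/n covariance of two paired series."""
    mx, my = fmean(x), fmean(y)
    vx = fmean([(a - mx) ** 2 for a in x])
    vy = fmean([(b - my) ** 2 for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return mx, my, vx, vy, cov


def ccc(x, y):
    """Concordance correlation coefficient: penalises any departure from y = x."""
    mx, my, vx, vy, cov = _moments(x, y)
    return 2 * cov / (vx + vy + (mx - my) ** 2)


def pearson(x, y):
    """Pearson correlation: measures linear association only."""
    _, _, vx, vy, cov = _moments(x, y)
    return cov / sqrt(vx * vy)


# Invented hourly contact durations (seconds) recorded by a pair of loggers.
logger_a = [0, 10, 20, 30, 40]
logger_b = [0, 20, 40, 60, 80]   # logger B always records twice as much as A

print(pearson(logger_a, logger_b))  # perfect linear association
print(ccc(logger_a, logger_b))      # well short of perfect agreement
print(ccc(logger_a, logger_a))      # identical records agree perfectly
```

In this example logger B records exactly twice the duration recorded by logger A in every hour, so the Pearson correlation is 1 while the CCC is 4/7 (about 0.57), because the points lie on a straight line that is not the line of equality.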
Most CCCs fell in the interval [0,1] with the lower 25th percentiles being slightly larger than 0, meaning there are logger pairs showing no agreement. This disagreement could have many sources, including systematic errors in the logger (e.g. a fault or incorrect setup), differences in the radiation/receptor pattern of the UHF antenna, differences in the relative orientation and height of the animals wearing the devices, differences in body mass/condition resulting in differences in received signal strength, and objects in the environment such as buildings, trees, rocky outcrops, fences and troughs which may reflect or block signals, resulting in multipath interference. However, the systematic errors due to faulty or poorly configured loggers will result in systematic biases when evaluating contacts between animals in downstream analyses. These types of loggers are easily identified as those which have consistently poor RA with most other loggers. Knowing which loggers have consistently poor (or good) RA could be used to provide a level of confidence in the data they have recorded and may also be used to obtain a better estimate of the true contacts occurring between pairs of animals.

There was a general tendency for CCCI to be less than CCCA, meaning that outlying data points were artificially driving linear regressions closer to perfect concordance, the y = x line. Using both CCCI and CCCA to classify the RA of logger pairs helped to reduce the chance of misclassifying poor RA loggers as good RA loggers by identifying loggers consistently involved in LL classified pairs. Likewise, loggers consistently involved in HH classified pairs can be deemed good loggers. Fig. 5a and 5b show an HH and an LL classified pair of loggers respectively; a clear difference can be seen between the data recorded by the logger pairs. Although out of the scope of the current work, the creation and assessment of neural network approaches [24], [25] to logger classification would be an important next step. Exactly what variables could be used as input to these approaches would have to be determined.

Most loggers had not reached their memory capacity by the end of the deployment; however, for the longer deployments (≥12 days in length) an increased number of loggers had reached their memory capacity. Fig. 5c shows an HH classified pair of loggers in which logger 198 did not record any contacts with logger 145 after 22:00 on 2009-06-23, but logger 145 continued to record contacts against logger 198 until 07:00 on 2009-07-01. On close inspection of the data it was clear that logger 145 had filled all its available memory by 2009-07-01 and could not record further contacts. Logger 198 stopped recording contacts, for an unknown reason, at around 8200 records. For contacts that do not have reciprocated data over a period of time, either due to logger failure or lack of memory, the result is the same: we are unsure if the contacts recorded over that time are accurate. This situation is exacerbated in studies involving wild animals, where loggers may become damaged (e.g. due to road kill) [1], may be retrieved less frequently and thus may have filled their memory, or may have different memory capacities.

Fig. 6. A timeline showing fictional contacts recorded between two loggers (A and B) and the calculated completely and partially overlapping pseudo-data sets. Logger B is detecting the UHF radio signal sent from logger A at a greater distance than the UHF radio signal sent from logger B and received by logger A. As such, logger B is consistently recording fewer contacts of longer duration than logger A.
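The two pseudo-data sets in Fig. 6 correspond to the union (partially overlapping) and the intersection (completely overlapping) of the contact intervals recorded by the two loggers. A minimal Python sketch follows, under the assumption that each contact is represented as a (start, end) pair in seconds. The interval values and function names are invented for illustration and are not taken from Hamede et al. [13] or from the supplied R functions.

```python
def merge(intervals):
    """Sort intervals and fuse any that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def partially_overlapping(a, b):
    """Contact whenever one or both loggers record a contact (union)."""
    return merge(list(a) + list(b))


def completely_overlapping(a, b):
    """Contact only while both loggers record a contact (intersection)."""
    a, b = merge(a), merge(b)
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def total_duration(intervals):
    return sum(end - start for start, end in intervals)


# Invented contacts: A records more, shorter contacts; B fewer, longer ones.
logger_a = [(10, 20), (30, 40), (60, 70)]
logger_b = [(5, 45), (55, 75)]

partial = partially_overlapping(logger_a, logger_b)
complete = completely_overlapping(logger_a, logger_b)
print(partial, total_duration(partial))
print(complete, total_duration(complete))
```

Here the union reproduces logger B's record (60 s of contact) and the intersection reproduces logger A's (30 s), which mirrors the point made in the text: which pseudo-data set is closer to the truth depends on which logger is correct.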
The easiest solution would be to simply exclude contacts taking place over a time period in which the data is not reciprocated. However, in the case of logger pair 145/198 (Fig. 5c) we have strong RA over the 5 day window taken at the start of the deployment, so this may be used to give confidence in the non-reciprocated data for this pair.

V. IMPLICATIONS

The data quality control and handling tools developed here would be particularly useful in animal social behavioural and related sciences. For instance, the accurate detection of female oestrus in cattle remains a major concern for the industries [15], [26], particularly for those dealing with cattle in extensive rangelands of Australia and the Americas. This is because it is simply impossible to monitor animals in the vast open areas of rangelands (often 5000-25000 sq km). Hence, there has been little work on the genetic parameters for sexual/reproductive behaviour in cattle. Oestrus behaviour can now be captured by proximity loggers fitted to the necks of free-ranging animals for the entire duration they are out grazing. Such large-scale remote phenotyping can then open up the possibility of estimating heritabilities and breeding values for social and oestrus behaviour in the same way as for fertility, health and production traits in animals (e.g. [27], [28]) and eventually incorporating social behavioural breeding values in genetic selection programs [15], [26].

VI. CONCLUSION

Proximity loggers are becoming an increasingly popular technology for quantifying animal-to-animal interactions and provide an overwhelming amount of data for current practitioners. As far as we are aware, this is a first attempt to quantify the level of reciprocal agreement in data generated from such technology. As such, we provide a database model and application functions written in the R statistical programming language to stimulate further activity and research into the quality control of proximity logger data. In their current form, these tools allow proximity logger practitioners (such as ethologists, biologists and geneticists) to handle these huge data sets and to assess logger performance in order to identify loggers in need of maintenance. This approach will result in the continual improvement of RA and thus of confidence in the data being recorded between logger pairs.
How best to use RA, other than for identifying faulty loggers, remains an active area of research and depends on the analytical approaches used downstream.

REFERENCES

[1] S. Prange, T. Jordan, C. Hunter, and S. D. Gehrt, “New Radiocollars for the Detection of Proximity among Individuals,” Wildlife Society Bulletin, vol. 34, no. 5, pp. 1333-1344, Dec. 2006.
[2] S. A. Altmann and J. Altmann, “The transformation of behaviour field studies,” Animal Behaviour, vol. 65, no. 3, pp. 413-423, Mar. 2003.
[3] D. L. Swain and G. J. Bishop-Hurley, “Using contact logging devices to explore animal affiliations: Quantifying cow-calf interactions,” Applied Animal Behaviour Science, vol. 102, no. 1-2, pp. 1-11, Jan. 2007.
[4] D. O. Rae, P. J. Chenoweth, M. A. Giangreco, P. W. Dixon, and F. L. Bennett, “Assessment of estrus detection by visual observation and electronic detection methods and characterization of factors associated with estrus and pregnancy in beef heifers,” Theriogenology, vol. 51, no. 6, pp. 1121-1132, Apr. 1999.
[5] S. J. Cooke et al., “Biotelemetry: a mechanistic approach to ecology,” Trends in Ecology & Evolution, vol. 19, no. 6, pp. 334-343, Jun. 2004.
[6] K. C. Prayaga, J. M. Henshall, D. L. Swain, and A. R. Gilmour, “Estimation of maternal variance components considering cow-calf contacts under extensive pastoral systems,” J. Anim. Sci., vol. 86, no. 5, pp. 1081-1088, May 2008.
[7] W. M. Rauw, E. Kanis, E. N. Noordhuizen-Stassen, and F. J. Grommers, “Undesirable side effects of selection for high production efficiency in farm animals: a review,” Livestock Production Science, vol. 56, no. 1, pp. 15-33, Oct. 1998.
[8] J. A. Maas, P. C. Garnsworthy, and A. P. F. Flint, “Modelling responses to nutritional, endocrine and genetic strategies to increase fertility in the UK dairy herd,” The Veterinary Journal, vol. 180, no. 3, pp. 356-362, Jun. 2009.
[9] C. J. O'Neill, D. L. Swain, and H. N. Kadarmideen, “Evolutionary process of Bos taurus cattle in favourable versus unfavourable environments and its implications for genetic selection,” Evolutionary Applications, vol. 3, no. 5-6, pp. 422-433, 2010.
[10] M. McGue, “The End of Behavioral Genetics?,” Behavior Genetics, vol. 40, no. 3, pp. 284-296, May 2010.
[11] D. S. Falconer and T. F. C. Mackay, Introduction to quantitative genetics. Longman, 1996.
[12] W. Ji, P. C. L. White, and M. N. Clout, “Contact rates between possums revealed by proximity data loggers,” Journal of Applied Ecology, vol. 42, no. 3, pp. 595-604, 2005.
[13] R. K. Hamede, J. Bashford, H. McCallum, and M. Jones, “Contact networks in a wild Tasmanian devil (Sarcophilus harrisii) population: using social network analysis to reveal seasonal variability in social behaviour and its implications for transmission of devil facial tumour disease,” Ecology Letters, vol. 12, no. 11, pp. 1147-1157, 2009.
[14] M. Böhm, M. R. Hutchings, and P. C. L. White, “Contact Networks in a Wildlife-Livestock Host Community: Identifying High-Risk Individuals in the Transmission of Bovine TB among Badgers and Cattle,” PLoS ONE, vol. 4, no. 4, 2009.
[15] J. Roelofs, F. López-Gatius, R. H. F. Hunter, F. J. C. M. van Eerdenburg, and C. Hanzen, “When is a cow in estrus? Clinical and practical aspects,” Theriogenology, vol. 74, no. 3, pp. 327-344, 2010.
[16] A. Orihuela, “Some factors affecting the behavioural manifestation of oestrus in cattle: a review,” Applied Animal Behaviour Science, vol. 70, no. 1, pp. 1-16, Nov. 2000.
[17] C. J. O'Neill, S. Goodswen, N. S. Watson-Haigh, P. Williams, and H. N. Kadarmideen, “The Use of Contact Loggers to Study Social and Oestrus Activity in Brahman and Composite Beef Heifers under Field Conditions,” in Proceedings of the Australian Society of Animal Production, 2010, vol. 28.
[18] L. I.-K. Lin, “A Concordance Correlation Coefficient to Evaluate Reproducibility,” Biometrics, vol. 45, no. 1, pp. 255-268, Mar. 1989.
[19] R Development Core Team, R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing, 2009.
[20] R. G. D'Eon and R. Serrouya, “Mule deer seasonal movements and multiscale resource selection using global positioning system radiotelemetry,” Journal of Mammalogy, vol. 86, no. 4, pp. 736-744, Aug. 2005.
[21] V. Bewick, L. Cheek, and J. Ball, “Statistics review 7: Correlation and regression,” Critical Care (London, England), vol. 7, no. 6, pp. 451-459, Dec. 2003.
[22] A. M. Porter, “Misuse of correlation and regression in three medical journals,” Journal of the Royal Society of Medicine, vol. 92, no. 3, pp. 123-128, Mar. 1999.
[23] M. J. Bland and D. Altman, “Statistical methods for assessing agreement between two methods of clinical measurement,” The Lancet, vol. 327, no. 8476, pp. 307-310, Feb. 1986.
[24] G. P. Zhang, “Neural networks for classification: a survey,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 30, no. 4, pp. 451-462, 2000.
[25] S. B. Kotsiantis, “Supervised Machine Learning: A Review of Classification Techniques,” Informatica, vol. 31, no. 3, pp. 249-268, 2007.
[26] K. A. Weigel, “Prospects for improving reproductive performance through genetic selection,” Animal Reproduction Science, vol. 96, no. 3-4, pp. 323-330, Dec. 2006.
[27] H. N. Kadarmideen, R. Thompson, M. P. Coffey, and M. A. Kossaibati, “Genetic parameters and evaluations from single- and multiple-trait analysis of dairy cow fertility and milk production,” Livestock Production Science, vol. 81, no. 2-3, pp. 183-195, Jun. 2003.
[28] H. N. Kadarmideen, D. Schworer, H. Ilahi, M. Malek, and A. Hofer, “Genetics of osteochondral disease and its relationship with meat quality and quantity, growth, and feed conversion traits in pigs,” J. Anim. Sci., vol. 82, no. 11, pp. 3118-3127, Nov. 2004.

Copyright (c) 2011 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].