Comparison of linkage disequilibrium levels in Iranian indigenous cattle using whole genome SNPs data

Background Knowledge of linkage disequilibrium (LD) levels among different populations can be used to detect genetic diversity and to investigate the historical changes in population sizes. Availability of large numbers of SNP through new sequencing technologies has provided opportunities for extensive researches in quantifying LD patterns in cattle breeds. The aim of this study was to compare the extent of linkage disequilibrium among Iranian cattle breeds using high density SNP genotyping data. Results A total of 70 samples, representing seven Iranian indigenous cattle breeds, were genotyped for 777962 SNPs. The average values of LD based on the r2 criterion were computed by grouping all syntenic SNP pairwises for inter-marker distances from 0 Kb up to 1 Mb using three distance sets. Average r2 above 0.3 was observed at distances less than 30 Kb for Sistani and Kermani, 20 Kb for Najdi, Taleshi, Kurdi and Sarabi, and 10 Kb for Mazandarani. The LD levels were considerably different among the Iranian cattle breeds and the difference in LD extent was more detectable between the studied breeds at longer distances. Lower level of LD was observed for Mazandarani breed as compared to other breeds indicating larger ancestral population size in this breed. Kermani breed continued to have more slowly LD decay than all of the other breeds after 3 Kb distances. More slowly LD decay was observed in Kurdi and Sarabi breeds at larger distances (>100 Kb) showing that population decline has been more intense in more recent generations for these populations. Conclusions A wide genetic diversity and different historical background were well reflected in the LD levels among Iranian cattle breeds. More LD fluctuation was observed in the shorter distances (less than 10 Kb) in different cattle populations. Despite of the sample size effects, High LD levels found in this study were in accordance with the presence of inbreeding and population decline in Iranian cattle breeds.


Background
Linkage disequilibrium (LD) is defined as the nonrandom association of alleles at different loci within a population. Pattern of LD within a population can be affected by several factors including selection, mutation rate, migration, genetic drift, population structure and recombination rates [1]. Detection of genomic regions under selection pressures [2], exploring the genetic basis of economically important traits [3] and diversity between cattle breeds [4] can be investigated using comparison of LD maps. In recent years, genomic selection was successfully implemented in dairy cattle and is being developed to other livestock species. The basis of the genomic selection is the existence of LD between causative variants and genetic markers [5,6]. Hence, efficient implementation of this method and delivery of accurate genomic predictions depend on the extent of linkage disequilibrium within a population [7]. Moreover, efficiency of some other routine studies applied to animal breeding such as genome-wide association studies (GWAS), genomic marker imputation, marker assisted selection (MAS), quantitative trait loci (QTL) mapping and parentage testing are impressed by levels of LD in studied populations [4,8].
Depending on the population and the threshold used to measure LD, the average extent of LD is highly variable in different studies. Compared to human studies, likely due to smaller effective population size and intensive selection pressure, higher levels of LD have been found in livestock species [9][10][11]. Although, several studies have been conducted regarding linkage disequilibrium in cattle populations based on microsatellite markers [12,13], availability of large numbers of SNP through new sequencing technologies has provided opportunities for extensive researches in quantifying LD patterns in cattle breeds [14][15][16]. Extensive LD along with different patterns on each chromosome observed in different cattle breeds, have confirmed that the LD maps can be used to characterize the cattle populations. Furthermore, since LD decays as a function of the number of generations, LD data have been frequently applied to estimate Ne at any particular time in the past in the cattle populations [17,18]. Despite of particular LD characteristics of each population, results of these studies have been to some extent affected by some factors such as sample size, minor allele frequency (MAF) thresholds, density of SNP panels and distance between markers [19].
Cattle domestication and raising have an historical origin in Iran. Iranian indigenous cattle have been keeping in different geographical regions of the country and have been adapted with various environment conditions. Some important traits including resistance to local diseases and parasites (such as Theileriosis, Babesiosis and intestinal Nematodes), adaptation to low quality feed resources and heat tolerance were attributed to these breeds. In recent years, high dense SNP data has been widely applying as the standard tools in LD analysis of livestock populations. However, the level of LD has not been previously investigated in the Iranian indigenous cattle. Therefore, the objective of this study was to compare the LD levels among different Iranian cattle populations based on a high density SNP data set. Knowledge of difference in LD levels of these populations can help to detect diversity between cattle breeds and to investigate the historical changes in population sizes. Moreover, the applicability of the modern genomic technologies such as genomic selection and genome-wide association studies can be compared between different populations using the LD data.

Samples collection and genotyping
For this study, ten samples per each breed were collected from 70 individuals representing seven Iranian indigenous cattle breeds. Unrelated individuals were selected where possible based either on pedigree or farmers information.
Animals used in this study were included the three main cattle types: Bos taurus breeds (Sarabi and Kurdi), Bos indicus (Sistani) and composite cattle (Taleshi, Mazandarani, Kermani and Najdi). Samples were genotyped using the BovineHD SNP chip (Illumina, Inc, San Diego, CA, USA) designed to genotype 777,962 SNPs.

Quality control and minor allele frequency distribution
Quality control (QC) were performed using PLINK 1.07 software [20]. The SNPs located on X, Y and mitochondrial chromosomes (39367, 1224 and 343 SNPs, respectively) were excluded from the data set. The whole autosomal genome included 2612.82 Mb and the lengths of autosomal chromosomes ranged from 160.88 Mb (BTA1) to 43.08 Mb (BTA25). SNPs with MAF higher than 0.05 and with call rates of 90% or greater both by locus and by animal were selected. Also, SNPs deviating from Hardy-Weinberg equilibrium (HWE) at a p-value < 10 −7 were removed from data set. MAF was calculated using PLINK for all autosomal SNPs and the distribution of the allelic frequencies was graphed as the proportion of the SNPs represented in 6 different categories of MAF: <0.05, ≥0.05 to <0.1, ≥0.1 to <0.2, ≥0.2 to <0.3, ≥0.3 to <0.4 and ≥0.4 to ≤0.5.

Estimation of linkage disequilibrium
Among several proposed measures to estimate LD in a population, D′ [21] and r 2 [22] are two statistic parameters widely used to measure the extent of LD. The r 2 has been known as a more robust statistic due to less sensitivity to sample size and allele frequency differences [23]. The LD between two SNPs was evaluated using r 2 defined as the correlation coefficient between SNP pairs, based on the following equation [24]: In the above equation, freq A, freq a, freq B and freq b are the frequencies of alleles A, a, B and b, respectively, and freq AB,freq ab, freq aB and freq Ab are the frequencies of the haplotypes AB, ab, aB and Ab in the population, respectively. The measures of LD (r 2 ) were calculated for all marker pairs of each chromosome (syntenic SNPs) using the SnppldHD software (Sargolzaei, M., University of Guelph, Canada). The r 2 calculation was limited to the SNPs within the maximum distances of 15 Mb from each other. A sample size correction was performed on all of the computed r 2 values using the below equation [25]: where, n is the number of haplotypes in the sample.
Average r 2 between all adjacent SNPs was calculated for each breed. Maximum distances between syntenic SNP pairs was categorized as ≤10Kb, ≤100Kb, and ≤1Mb distances and for each distance category, SNP comparisons were binned using bin sizes of 1 Kb, 10 Kb, and 100 Kb, respectively. The mean r 2 was computed for each bin in whole autosomal chromosomes. The mean r 2 in each distance bin was plotted against the median of the distance bin range (Kb).

Results
A total of 70 Iranian native cattle were genotyped for 777962 SNPs. On average, 166742 SNPs remained after quality control, and SNPs had an overall MAF mean of 0.221. Table 1 represented the number of SNPs remained after quality control, overall mean of MAF, average SNP interval (Kb) and total SNP pairwise comparisons in each breed. Figure 1a is given the distribution of the allelic frequencies in different breeds. The highest proportion of SNPs having a MAF less than 0.2 was observed in Sistani breed (58.1 %) while the lowest level of this proportion (39.2 %) was in Kurdi breed. Average proportions of SNPs with MAF < 0.2 in taurine, indicine and composite breeds were equal to 40, 58.1 and 43.8 %, respectively (Fig. 1b).
Average r 2 between adjacent markers was estimated for each chromosome within different breeds ( Table 2). The pattern of LD was significantly different among various chromosomes in each breed. The highest average r 2 between adjacent SNP was in the Sistani breed (r 2 = 0.393) while the lowest average r 2 was observed in Kurdi aniamls (r 2 = 0.321). LD means at various intervals were computed by grouping all syntenic SNP combinations using bins of 1 Kb (intervals spanning up to 10 Kb), 10 Kb (intervals spanning 10 Kb to 100 Kb) and 100 Kb (intervals spanning 100 Kb to 1Mb) in whole autosomal chromosomes. Table 3 was represented the statistical information for average r 2 as distance between SNP pairs up to 500 Kb in each breed. Decay of LD for SNP pairs, categorized in the three distance sets, was represented by the average r 2 of consecutive bins in Fig. 2 The average r 2 declines with increasing physical distance between markers in all breeds. However, the degree of fluctuations was different between various populations. Mazandarani cattle had the lowest average LD among other breeds for all the studied distances between markers. Over very short distances between markers (<3 Kb), Kurdi, Taleshi, Sarabi and Sistani breeds had the higher average LD (r 2 > 0.5) and the Kurdi breed had the highest average LD (r 2 > 0.53) among others. However, LD decayed faster in Kurdi, Sarabi and Taleshi breeds than Sistani breed in the larger distances between markers (see Fig. 2). While Kermani breed had a lower average r 2 than most other breeds in the distances less than 3 Kb, LD decay was very slow in this breed such that the highest LD levels were observed in Kermani breed for higher than 100 Kb distances (see Table 3 and Fig. 2). The average r 2 were different among various autosomal chromosomes in each breed (Fig. 3). Higher LD values were found for BTA5 (r 2 = 0.21), BTA19 (r 2 = 0.24), BTA28 (r 2 = 0.21), BTA11 (r 2 = 0.14), BTA14 (r 2 = 0.12), BTA26 (r 2 = 0.14) and BTA25 (r 2 = 0.2) in Sistani, Kermani, Najdi, Taleshi, Mazandarani, Kurdi and Sarabi breeds, respectively. This may be attributed to different selection criteria in each breed that have influenced the particular QTLs on different chromosomes.

Discussion
The LD maps can be used to explore the diversity between cattle breeds with different evolutionary history. In order to compare the extent of LD in the Iranian cattle genome, we analyzed SNP genotyping data belonged to seven cattle breeds. The observed mean r 2 values were significantly different between Iranian cattle populations. This can be reflected different population history, selection pressures and inbreeding levels in each breed. The average r 2 for SNPs less than 1 Kb apart was found to be equal to 0.579 (Kurdi), 0.546 (Taleshi), 0.537 (Sarabi), 0.527 (Sistani), 0.52 (Najdi), 0.501 (Mazandarani) and 0.499 (Kermani) in this study. Furthermore, the mean r 2 ranged from 0.387 (Mazandarani) to 0.435 (Kurdi) for the SNPs less than 10 kb apart. The LD  [19]. These results confirmed that the logical ranges of average r 2 were obtained for the two first studied distance sets. The mean r 2 declined more slowly with increasing physical distances between markers for distances larger than 100 Kb and was almost constant after 500 Kb of distance. After 100 Kb up to 1 Mb, average r 2 ranged from 0.11 to 0.216 among various defined bins in Iranian cattle LD data. Comparing to the other studies conducted on indigenous Swiss cattle (0.06-0.14) [18], Australian Holstein-Friesian (0.057-0.108) [27], Canchim (0.07-0.1) [29] and Chinese Simmental (0.05-0.08) [19], it appears that r 2 values were overestimated in this study for larger distances (>100 Kb). Khatkar et al. [2008] [27] pointed out that the studies involved relatively small sample sizes are subject to bias and loss of accuracy and this bias may vary with inter-marker distance. Certainly, small sample sizes have influenced the r 2 values obtained for Iranian cattle breeds in this study. However, r 2 values estimated at the shorter distances, have more reliability and can be used to compare LD levels. The main idea of this study was based on representation a general picture to compare the LD levels between Iranian cattle breeds. However, any comparison with other studies should be conservative. LD levels have been generally influenced by factors such as sample size, MAF thresholds, density of SNP panels and distance markers among different studies [19,30]. Despite the sample size bias, it would be reasonable to expect extensive LD in Iranian cattle populations. Iranian indigenous cattle breeds have been included small populations that were exposed to serious extinction risk in recent years. Population decline, increasing inbreeding and uncontrolled crossbreeding are of concerns, and it should be given more attention to conserve these genetic resources.
Much more LD fluctuations were observed in the distances less than 10 Kb among different breeds. After 10 Kb apart between markers, LD decay had the similar trend in most of the breeds. Kermani breed continued to have more slowly LD decay than all of the other breeds after 3 Kb distances. It appears that population decline has been initiated earlier in Kermnai than other breeds. More rapid LD decay was observed in Kurdi breed for distances less than 100 Kb than other breeds. However, LD decay has been slower in Kurdi breed for the larger distances indicating the rising trend of population decline in more recent generations for this population. It appears that Sarabi breed has also had more intense population decline in recent times. Among the studied breeds, Mazandarani had the lowest LD level in different intervals of genome which could be an indicator of larger ancestral population in this breed. Based on both phenotype characteristics and genetic structure analysis, Sistani breed has more indicine genetic background [31].
Results of the previous studies [15,16] indicated that LD levels were less in indicine breeds. However, in this study, Sistani breed (as a indicine breed) had higher LD level among other studied breeds. This can be attributed to historical smaller effective population sizes [25] or a higher ancestral relatedness [32] in Sistani breed. Extensive LD variability was observed among different chromosomes that can probably be evidence on varying recombination rates, selection effects and genetic drift between chromosomes [14]. Meuwissen et al. [5] suggested that the LD levels should be above 0.2 to achieve an accuracy of 0.85 for genomic breeding values. Useful LD to give sufficient power for genome-wide-association studies (GWAS) have been suggested to be above 0.3 [1,33].
Average distance between markers ranged from 11.1 Kb (Mazandarani) to 18.2 Kb (Kurdi) among Iranian cattle populations in this study. Average r2 above 0.2 was given at distances less than 200 Kb (Sistani and Kermani), 100 Kb (Najdi), 70 Kb (Taleshi and Sarabi), 60 Kb (Kurdi) and 50 Kb (Mazandarani) in our study. In other hand, the average r2 above 0.3 was observed at distances less than 30 Kb for Sistani and Kermani, 20 Kb for Najdi, Taleshi, Kurdi and Sarabi, and 10 Kb for Mazandarani. Different LD patterns on individual chromosomes among various breeds could be created through uneven selection pressures on QTLs distributed throughout the genome. So, higher LD can be expected for chromosomes harboring quantitative trait loci (QTL) undergoing selection [34,35]. Number of significant detected QTLs located on some chromosomes was explored for several important traits of interest in Iranian cattle breeds using two QTL databases available online (http://www.animalgenome.org/cgi-bin/ QTLdb/BT/index. and http://bovinegenome.org/bovi-neqtl_v2/login.jsp) ( Table 4). This investigation confirmed that certain chromosomes presented higher LD in each breed have included more numbers of QTLs pertaining to important traits attributed to that breed (see Fig. 3 and Table 4). In accordance with presence of QTLs identified for some traits such as growth, carcass weight, meat percentage, body height, resistance to clinical mastitis and calf size on the chromosome 5, higher average r 2 was also seen for this chromosome in Sisatni breed (that is popular for mentioned traits). Also, it appears that several traits such as milk fat, somatic cell score and udder attachment have been affected by stronger selection in Kermani breed (chromosome 19). Mazandarani breed has some prominent traits such as growth efficiency, carcass weight, tick resistance and high milk fat and protein that can be largely explained by QTLs located on chromosome 14. In agreement with higher average r 2 found for chromosome 26 in Kurdi breed, more QTLs associated with prominent traits of Kurdi breed (such as small body size, Calving ease and milk related traits) were also observed for this chromosome. Similar trend can be reported for QTLs on chromosome 11 that were related to clinical mastitis, SCC and milk fat in Taleshi breed. More selections on calving ease, calf size, dairy capacity and milk protein in Sarabi breed may have led to higher average r 2 found for genomic regions located on chromosome 25. Furthermore, chromosome 28 and 8 have included QTLs affecting traits highly attributed to Najdi cattle such as easy calving, Immunoglobuin G level, heat tolerance and high milk protein. In harmony with these findings, lower LD levels were also observed for chromosomes containing lower numbers of QTLs identified for outstanding traits of Iranian cattle (Table 4). However, LD levels can be depended on some other factors such as recombination rates, mutation rates, genetic drift and population size. Hence, a more detailed study on selected regions of the genome are required and assessing signatures of positive selection can be suggested for future investigations.
In this study, large number of SNPs were excluded from the data set due to lack of enough quality. This may be attributed to DNA quality, however the GCscore, which provides information on the genotyping quality, did not show any abnormal deviations (16 % of markers had a GC score <0.5), and therefore we don't believe that there have been issues with DNA quality. SNP ascertainment bias effect must be acknowledged in this study. Iranian indigenous cattle have not been included in the bovine genome sequencing projects and the SNPs on the chip have been mainly selected based on information from European taurine cattle (also refrence genome is from a taurine breed), this can be somewhat affected genotyping quality. However, the number of remained SNPs was enough to perform analysis and could provide more information, for instance, compared to 50 K chips. Despite of removing a large part of SNP data due to low quality, these results confirmed that the SNP densities used in this work can provide enough accuracy for the future genomic selection programs and GWAS in Iranian cattle.

Conclusions
A wide genetic diversity and different historical background were well reflected in LD levels among Iranian cattle breeds. LD fluctuations were more detectable in the shorter distances (less than 10 Kb) among different breeds. Despite of the sample size effects, High LD levels obtained in this study confirmed the small size of Iranian cattle populations that were exposed to serious extinction risk in recent years.