
Dataset

ID: 50|dedup_wf_001::2db02173f3720ba233381621f4532e82

DOI: 10.5061/dryad.41dq8

Where these data come from
Data from: The standing pool of genomic structural variation in a natural population of Mimulus guttatus

Abstract

Major unresolved questions in evolutionary genetics include determining the contributions of different mutational sources to the total pool of genetic variation in a species, and understanding how these different forms of genetic variation interact with natural selection. Recent work has shown that structural variants (insertions, deletions, inversions and transpositions) are a major source of genetic variation, often outnumbering single nucleotide variants in terms of total bases affected. Despite the near ubiquity of structural variants, major questions about their interaction with natural selection remain. For example, how does the allele frequency spectrum of structural variants differ when compared to single nucleotide variants? How often do structural variants affect genes, and what are the consequences? To begin to address these questions, we have systematically identified and characterized a large set of submicroscopic insertion and deletion (indel) variants (between 1 kb and 200 kb in length) among ten individuals from a single natural population of the plant species Mimulus guttatus. After extensive computational filtering, we focused on a set of 4,142 high-confidence indels that showed an experimental validation rate of 73%. All but one of these indels were < 200 kb. While the largest were generally at lower frequencies in the population, a surprising number of large indels are at intermediate frequencies. While indels overlapping with genes were much rarer than expected by chance, nearly 600 genes were affected by an indel. NBS-LRR defense response genes were the most enriched among the gene families affected. Most indels associated with genes were rare and appeared to be under purifying selection, though we do find four high-frequency derived insertion alleles that show signatures of recent positive selection.
SV_code_dryad
Python code library for identifying structural variation from paired-end Illumina reads.

Deletion_Calls
Spreadsheet containing genomic deletion calls for all deletion alleles. Columns include accession and chromosomal location information, estimated deletion size (size), count of supporting Illumina reads (num_reads), whether the deletion allele includes the ref. genome (contains_IM62), sequence read coverage in the deleted interval (cov), and "N" bases in the deleted interval in the ref. genome (prop_Ns). Also included are annotations of gene and transposable element hits (gene_hits and TE_hits; each includes the annotation and the proportion deleted in parentheses).

SNP.data.Supp.Table.txt
All SNPs collected from 12 Mimulus accessions. Columns listed include SNP position (scaffold and base position (pos)), SNP type (if coding, synonymous or nonsynonymous), and allele state for all 12 accessions. Missing data are represented by a hyphen.

SV.read.data.Supp.Table.txt
Read metadata for all Illumina reads implicated in a structural variant, including read name, accession (line), structural variant category (kind), re-alignment status with Novoalign (reject_by_Novoalign), cluster formation status (failed_to_form_cluster), cluster ID (cluster), status of the ref. genome line in the cluster (cluster_includes_ref_line), and status of cluster coverage or alignment quality QC (failed_cov_or_abnormally_mapped_QC).
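As an illustration of how the SNP table described above might be read, here is a minimal Python sketch. It is not part of SV_code_dryad. It assumes, without confirmation from the record, that the file is tab-delimited with a single header line, and that the first three columns are scaffold, pos and SNP type, followed by one allele-state column per accession. The only detail taken from the description is that missing data are represented by a hyphen. The function names and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter

MISSING = "-"  # per the data description, missing data are a hyphen


def allele_counts(allele_states):
    """Count observed alleles across accessions, skipping missing calls."""
    return Counter(a for a in allele_states if a != MISSING)


def minor_allele_count(allele_states):
    """Return the count of the second most common allele.

    Returns 0 when fewer than two distinct alleles are observed.
    """
    counts = allele_counts(allele_states).most_common()
    if len(counts) < 2:
        return 0
    return counts[1][1]


def read_snp_rows(handle, n_meta_cols=3):
    """Yield (metadata, allele_states) per SNP.

    ASSUMPTION: tab-delimited text, one header line, and n_meta_cols
    leading metadata columns (scaffold, pos, type). Adjust to the real file.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the assumed header line
    for row in reader:
        yield row[:n_meta_cols], row[n_meta_cols:]


# Hypothetical sample rows in the assumed layout (not real data).
SAMPLE = (
    "scaffold\tpos\ttype\tacc1\tacc2\tacc3\tacc4\n"
    "scaffold_1\t101\tsynonymous\tA\tA\tG\t-\n"
    "scaffold_1\t250\tnonsynonymous\tC\tC\tC\tC\n"
)

rows = list(read_snp_rows(io.StringIO(SAMPLE)))
macs = [minor_allele_count(states) for _, states in rows]
```

To use this with the actual file, pass an open file handle for SNP.data.Supp.Table.txt in place of the in-memory sample, after checking the delimiter and column order against the file itself.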
