Data construction
Illumina short reads from the resequencing of 50 rice accessions (1) were obtained from the NCBI Sequence Read Archive (SRA). Low-quality bases were removed using fqcut, and adapter sequences were trimmed using cutadapt (v1.0, 2). Processed reads were mapped to the Kasalath pseudomolecule with BWA (v0.6.2, 3), and local realignment was performed using GATK (v1.6.5, 4). After PCR duplicates were removed with Picard (v1.92, http://picard.sourceforge.net), variants were called with SAMtools (v0.1.19, 5). Gene structures were predicted with Cufflinks using RNA-Seq data derived from young leaf and panicle samples. The effect of each variant site was annotated using snpEff (6).
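The read-processing steps above can be sketched as an ordered list of command lines. This is only an illustration under stated assumptions, not the commands actually used: it assumes single-end reads, a reference that has already been indexed for BWA, SAMtools and GATK, and the legacy command syntax of the tool versions named above. The function name build_pipeline, the file names, the read-group string, the adapter sequence and the snpEff database name kasalath_db are all hypothetical. The fqcut quality-trimming step and the Cufflinks gene prediction are left out because their options are not described here.

```python
def build_pipeline(sample, reference, adapter, snpeff_db="kasalath_db"):
    """Return (step name, shell command) pairs for one single-end sample.

    All file names and the snpEff database name are placeholders.
    The commands are only built as strings; nothing is executed.
    """
    raw = f"{sample}.fastq"
    trimmed = f"{sample}.trimmed.fastq"
    # GATK needs a read group; BWA expects the tabs written as literal "\t".
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    gatk = "java -jar GenomeAnalysisTK.jar"
    return [
        ("trim adapters",
         f"cutadapt -a {adapter} -o {trimmed} {raw}"),
        ("align",
         f"bwa aln -f {sample}.sai {reference} {trimmed}"),
        ("make SAM",
         f"bwa samse -r '{read_group}' -f {sample}.sam "
         f"{reference} {sample}.sai {trimmed}"),
        ("convert to BAM",
         f"samtools view -bS -o {sample}.bam {sample}.sam"),
        # samtools 0.1.x takes an output prefix, not an output file name.
        ("sort",
         f"samtools sort {sample}.bam {sample}.sorted"),
        ("index",
         f"samtools index {sample}.sorted.bam"),
        ("find realignment targets",
         f"{gatk} -T RealignerTargetCreator -R {reference} "
         f"-I {sample}.sorted.bam -o {sample}.intervals"),
        ("local realignment",
         f"{gatk} -T IndelRealigner -R {reference} -I {sample}.sorted.bam "
         f"-targetIntervals {sample}.intervals -o {sample}.realigned.bam"),
        ("remove PCR duplicates",
         f"java -jar MarkDuplicates.jar INPUT={sample}.realigned.bam "
         f"OUTPUT={sample}.dedup.bam METRICS_FILE={sample}.dedup.metrics "
         f"REMOVE_DUPLICATES=true"),
        ("index deduplicated BAM",
         f"samtools index {sample}.dedup.bam"),
        # In samtools 0.1.x, bcftools performs the actual variant call.
        ("call variants",
         f"samtools mpileup -uf {reference} {sample}.dedup.bam "
         f"| bcftools view -vcg - > {sample}.raw.vcf"),
        ("annotate variant effects",
         f"java -jar snpEff.jar eff {snpeff_db} {sample}.raw.vcf "
         f"> {sample}.annotated.vcf"),
    ]


if __name__ == "__main__":
    # Placeholder sample name, reference file and adapter sequence.
    for name, command in build_pipeline("sample1", "kasalath.fa", "AGATCGGAAGAGC"):
        print(f"# {name}\n{command}\n")
```

Building the commands as strings rather than running them keeps the sketch independent of any local installation; each line would need to be checked against the documentation of the installed tool version before use.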
Citation
1. Xu, X. et al. (2011) Resequencing 50 accessions of cultivated and wild rice yields markers for identifying agronomically important genes. Nature Biotechnology 30, 105-111.
2. Martin, M. (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, 10-12.
3. Li, H. and Durbin, R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754-1760.
4. McKenna, A. et al. (2010) The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research 20, 1297-1303.
5. Li, H. et al. (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079.
6. Cingolani, P. et al. (2012) A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6, 80-92.