[Standard] Analyzing pooled pileup data (EA)

Description

Resequencing data from two different sources of dog DNA have been summarized in the following pileup files:

  1. pool3_chr38.bam.raw.pileup.txt
  2. pool6_chr38.bam.raw.pileup.txt

One of the files contains data from a single breed, while the other contains data from a mix of four breeds. Each line in the files has six columns: 1) chromosome, 2) position, 3) reference base, 4) number of reads mapped to this position, 5) sequenced nucleotides and 6) qualities of the sequenced nucleotides. The resequencing data have only been summarized at positions known to vary in a comparison of a larger set of dog and wolf data (i.e. not all positions are shown). In column 5, “.” and “,” indicate that the sequenced nucleotide matched the reference, while any of the characters in “[agctAGCT]” represents an alternative sequenced nucleotide. All other characters can be disregarded.
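The column layout and the rule for column 5 can be sketched in Python as follows. This is only an illustration: the function names parse_pileup_line and count_bases are hypothetical, the columns are assumed to be separated by whitespace, and the code applies the simplified rule stated above (every other character in column 5 is ignored).

```python
from collections import Counter


def parse_pileup_line(line):
    """Split one six-column pileup line into named fields.

    Assumption: columns are separated by whitespace (e.g. tabs).
    """
    chrom, pos, ref, depth, bases, quals = line.split()[:6]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "depth": int(depth),
        "bases": bases,
        "quals": quals,
    }


def count_bases(bases):
    """Count reference matches and alternative nucleotides in column 5.

    "." and "," are reference matches; letters in [agctAGCT] are
    alternative nucleotides (counted case-insensitively). Every other
    character is disregarded, as the exercise instructs.
    """
    ref_matches = sum(1 for ch in bases if ch in ".,")
    alt_counts = Counter(ch.upper() for ch in bases if ch in "acgtACGT")
    return ref_matches, alt_counts
```

For a made-up line such as "chr38 1234 A 6 ..,Gg. IIIIII", count_bases applied to column 5 gives four reference matches and two reads carrying G.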

Objective

  1. Count the number of variable sites in each file and work out which file most likely contains data from four breeds and which from a single breed.
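One possible way to approach the count is sketched below in Python. The exercise does not define "variable site", so the sketch makes an assumption: a site counts as variable when at least two different alleles (the reference and/or alternative nucleotides) are each supported by at least min_count reads. The function names and the min_count threshold are hypothetical, and columns are assumed to be whitespace-separated.

```python
from collections import Counter


def alleles_at_site(bases):
    """Tally alleles in column 5: 'ref' for "." and ",", else the base."""
    counts = Counter()
    for ch in bases:
        if ch in ".,":
            counts["ref"] += 1
        elif ch in "acgtACGT":
            counts[ch.upper()] += 1
        # All other characters are disregarded, as the exercise instructs.
    return counts


def count_variable_sites(lines, min_count=2):
    """Return (variable_sites, total_sites) for an iterable of pileup lines.

    Assumption: a site is variable when two or more alleles are each seen
    in at least `min_count` reads; min_count is a hypothetical guard
    against calling a site variable from a single stray read.
    """
    variable = 0
    total = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        total += 1
        counts = alleles_at_site(fields[4])
        supported = [allele for allele, n in counts.items() if n >= min_count]
        if len(supported) >= 2:
            variable += 1
    return variable, total


def count_variable_sites_in_file(path, min_count=2):
    """Convenience wrapper that reads the pileup lines from a file."""
    with open(path) as handle:
        return count_variable_sites(handle, min_count=min_count)
```

Calling count_variable_sites_in_file on pool3_chr38.bam.raw.pileup.txt and on pool6_chr38.bam.raw.pileup.txt gives a variable-site count and a total for each file, so the fraction of variable sites can be compared between the two pools. Changing min_count shows how sensitive the comparison is to the chosen definition.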