Release Mar-4-2018 Fix to hg19 decoy mapping inconsistency.
Release Jan-4-2018 Updated strand information.
Release Nov-23-2017 Updates:
See here for previous releases.
Note: some old gz files appear double-compressed when downloaded with Firefox. If so, please decompress them twice.
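A minimal Python sketch of the double decompression, assuming the download is ordinary gzip data. The function name `gunzip_fully` and the magic-byte check are illustrative choices, not part of the release. It strips gzip layers for as long as the payload still starts with the gzip magic bytes, so it works for both singly and doubly compressed files.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def gunzip_fully(data: bytes, max_layers: int = 5) -> bytes:
    """Strip gzip layers until the payload no longer looks like gzip."""
    for _ in range(max_layers):
        if not data.startswith(GZIP_MAGIC):
            break
        data = gzip.decompress(data)
    return data

# Self-contained demonstration on synthetic data (not a real annotation file).
payload = b"probeID\tMASK.general\n"
double = gzip.compress(gzip.compress(payload))
assert gunzip_fully(double) == payload
```

For a real download, the bytes could be read with `open(path, "rb").read()` before calling the function. The check is intended for text tables; a binary payload that happens to start with the same two bytes would be misread as another gzip layer.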
Requires the GenomicRanges package from Bioconductor.
seqnames, start and end - the location of the target (1-based coordinates; 2 nucleotides for CpG probes, 1 nucleotide for CpH and SNP probes). strand is left as "*" always. Some erroneous CpH probe coordinates in the manufacturer's manifest have been corrected. SNP probe coordinates are provided.
strand - the strand of the actual probe, consistent with the "orientation" column: '+' for all up-probes, which are positioned at smaller coordinates than the target CpG, and '-' for all down-probes, which are positioned at greater coordinates than the target CpG.
address_A and address_B - addresses of probe A and B on the chip designated by the original manifest.
channel - "Both" for type II probes and "Grn"/"Red" for type I probes.
designType - either "I" or "II".
nextBase - the actual extension base (on the probe strand) after bisulfite conversion ("A" or "C" or "T"). For unmapped probes, the extension base is the one labeled in the original manifest.
nextBaseRef - the extension base (on the hybridized DNA) before bisulfite conversion ("A", "C", "G" or "T"). Unmapped probes have "NA".
probeType - either "cg", "ch" or "rs".
orientation - either "up" or "down", specifying whether the probe is positioned upstream (in smaller coordinates) or downstream (in greater coordinates) of the target.
Note that by design, probes positioned upstream (in smaller coordinates) are always on the Watson strand and probes positioned downstream (in greater coordinates) are always on the Crick strand.
probeCpGcnt - the number of additional CpGs in the probe (not counting the interrogated CpG).
context35 - the number of CpGs in the [-35bp, +35bp] window.
probeStart and probeEnd - the mapped start and end positions of the probe, which is always 50bp long.
ProbeSeq_A and ProbeSeq_B - the probe sequence for allele A and B.
gene - ";"-separated list of gene annotations (unique and alphabetically sorted). Gene models follow GENCODE version 22 (hg38).
gene_HGNC - ";"-separated list of gene annotations (unique and alphabetically sorted). Genes are checked using HGNChelper for compatibility with HGNC. Gene models follow GENCODE version 22 (hg38).
chrm_A, beg_A, flag_A, mapQ_A, cigar_A, NM_A - the mapping info for probe A excluding decoy chromosomes. mapQ is the mapping quality score (0-60, with 60 being the best).
chrm_B, beg_B, flag_B, mapQ_B, cigar_B, NM_B - the mapping info for probe B excluding decoy chromosomes, like above.
wDecoy_chrm_A, wDecoy_beg_A, wDecoy_mapQ_A, wDecoy_cigar_A, wDecoy_NM_A - the mapping info for probe A including decoy chromosomes.
wDecoy_chrm_B, wDecoy_beg_B, wDecoy_mapQ_B, wDecoy_cigar_B, wDecoy_NM_B - the mapping info for probe B including decoy chromosomes.
posMatch - whether the mapping matches the original manifest. This applies only to hg19 and is NA under hg38.
MASK.mapping - whether the probe is masked for mapping reasons. Retained probes should have a high-quality mapping (mapQ >=40 on the 0-60 scale) that is consistent with the designed MAPINFO and contains no INDELs (for both probes in the case of type I).
MASK.typeINextBaseSwitch - whether the probe has a SNP in the extension base that causes a color channel switch from the official annotation (described as color-channel-switching, or CCS SNP in the reference). These probes should be processed differently than designed (by summing up both color channels instead of just the annotated color channel).
MASK.rmsk15 - whether the 15bp 3'-subsequence of the probe overlaps with RepeatMasker annotation. This MASK is NOT recommended.
MASK.sub25.copy, MASK.sub30.copy, MASK.sub35.copy and MASK.sub40.copy - whether the 25bp, 30bp, 35bp and 40bp 3'-subsequence of the probe is non-unique.
MASK.snp5.common - whether the 5bp 3'-subsequence (including extension for type II) overlaps with any of the common SNPs from dbSNP (global MAF can be under 1%).
MASK.snp5.GMAF1p - whether the 5bp 3'-subsequence (including extension for type II) overlaps with any of the SNPs with global MAF >1%.
MASK.extBase - probes masked for extension base inconsistent with specified color channel (type-I) or CpG (type-II) based on mapping.
MASK.general - recommended general purpose masking merged from "MASK.sub30.copy", "MASK.mapping", "MASK.extBase", "MASK.typeINextBaseSwitch" and "MASK.snp5.GMAF1p".
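The merge behind MASK.general can be sketched in Python. This is an illustration under stated assumptions, not the authors' code: it assumes that "merged" means a logical OR of the five component masks, and that the masks are read from a tab-separated table with TRUE/FALSE values. The probe IDs and mask values below are made up for the example.

```python
import csv
import io

# Component masks listed in the MASK.general description.
GENERAL_COMPONENTS = [
    "MASK.sub30.copy",
    "MASK.mapping",
    "MASK.extBase",
    "MASK.typeINextBaseSwitch",
    "MASK.snp5.GMAF1p",
]

def general_mask(row: dict) -> bool:
    """Assumed merge rule: a probe is masked if any component mask is TRUE."""
    return any(row[name] == "TRUE" for name in GENERAL_COMPONENTS)

# Made-up rows standing in for an annotation table (hypothetical probe IDs).
header = ["probeID"] + GENERAL_COMPONENTS
rows = [
    ["probe_1", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE"],
    ["probe_2", "FALSE", "TRUE", "FALSE", "FALSE", "FALSE"],
    ["probe_3", "FALSE", "FALSE", "FALSE", "FALSE", "TRUE"],
]
table = "\n".join("\t".join(r) for r in [header] + rows)

# Keep only probes that no component mask flags.
reader = csv.DictReader(io.StringIO(table), delimiter="\t")
kept = [r["probeID"] for r in reader if not general_mask(r)]
print(kept)  # ['probe_1']
```

In practice the MASK.general column already carries this result, so filtering on that single column should be enough. Recomputing the union is mainly useful when swapping in a different set of component masks.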
Zhou W, Laird PW and Shen H, Comprehensive characterization, annotation and innovative use of Infinium DNA Methylation BeadChip probes, Nucleic Acids Research 2017
Questions regarding this annotation can be addressed to Wanding.Zhou@vai.org.