CpG_chrm |
CpG_beg |
CpG_end |
probe_strand |
Probe_ID |
address_A |
address_B |
channel |
designType |
nextBase |
nextBaseRef |
probeType |
orientation |
probeCpGcnt |
context35 |
probeBeg |
probeEnd |
ProbeSeq_A |
ProbeSeq_B |
gene |
gene_HGNC |
chrm_A |
beg_A |
flag_A |
mapQ_A |
cigar_A |
NM_A |
chrm_B |
beg_B |
flag_B |
mapQ_B |
cigar_B |
NM_B |
wDecoy_chrm_A |
wDecoy_beg_A |
wDecoy_mapQ_A |
wDecoy_cigar_A |
wDecoy_NM_A |
wDecoy_chrm_B |
wDecoy_beg_B |
wDecoy_mapQ_B |
wDecoy_cigar_B |
wDecoy_NM_B |
posMatch |
MASK.mapping |
MASK.typeINextBaseSwitch |
MASK.rmsk15 |
MASK.sub25.copy |
MASK.sub30.copy |
MASK.sub35.copy |
MASK.sub40.copy |
MASK.snp5.common |
MASK.snp5.GMAF1p |
MASK.extBase |
MASK.general |
(1-3) CpG_chrm, CpG_beg, CpG_end: the location of the target. CpG_beg is a 0-based coordinate and CpG_end is 1-based, so the target spans CpG_end - CpG_beg nucleotides: 2 for CpG probes and 1 for CpH and SNP probes. Some erroneous CpH probe coordinates in the manufacturer's manifest have been corrected.
(4) probe_strand: the strand of the actual probe, consistent with the "orientation" column. "+" is used for all up-probes, positioned at smaller coordinates than the target CpG, and "-" for all down-probes, positioned at greater coordinates. "*" is used for unmapped probes.
(5) Probe_ID: Probe ID
(6-7) address_A, address_B: addresses of probe A and B on the chip designated by the original manifest.
(8) channel: "Both" for type II probes and "Grn"/"Red" for type I probes.
(9) designType: either "I" or "II".
(10) nextBase: the actual extension base (on the probe strand) after bisulfite conversion ("A", "C" or "T"). An unmapped probe keeps the extension base labeled in the original manifest.
(11) nextBaseRef: the extension base (on the hybridized/template DNA) before bisulfite conversion ("A", "C", "G" or "T"). An unmapped probe has "NA".
(12) probeType: either "cg", "ch" or "rs".
(13) orientation: either "up" or "down", specifying whether the probe is positioned upstream (in smaller coordinates) or downstream (in greater coordinates) of the target.
Note that by design, probes positioned upstream (in smaller coordinates) are always on the Watson strand and probes positioned downstream (in greater coordinates) are always on the Crick strand.
(14) probeCpGcnt: the number of additional CpGs in the probe (not counting the interrogated CpG).
(15) context35: the number of CpGs in the [-35bp, +35bp] window.
(16-17) probeBeg, probeEnd: the mapped start and end positions of the probe, which is always 50bp long.
(18-19) ProbeSeq_A, ProbeSeq_B: the probe sequence for allele A and B.
(20) gene: ";"-separated list of gene annotations (unique and alphabetically sorted). Gene models follow GENCODE version 22 (hg38).
(21) gene_HGNC: ";"-separated list of gene annotations (unique and alphabetically sorted). Genes are checked using HGNChelper for compatibility with HGNC. Gene models follow GENCODE version 22 (hg38).
(22-27) chrm_A, beg_A, flag_A, mapQ_A, cigar_A, NM_A: the mapping info for probe A excluding decoy chromosomes. mapQ is the mapping quality score (0-60, with 60 being the best).
(28-33) chrm_B, beg_B, flag_B, mapQ_B, cigar_B, NM_B: the mapping info for probe B excluding decoy chromosomes, like above.
(34-39) wDecoy_chrm_A, wDecoy_beg_A, wDecoy_flag_A, wDecoy_mapQ_A, wDecoy_cigar_A, wDecoy_NM_A: the mapping info for probe A including decoy chromosomes.
(40-45) wDecoy_chrm_B, wDecoy_beg_B, wDecoy_flag_B, wDecoy_mapQ_B, wDecoy_cigar_B, wDecoy_NM_B: the mapping info for probe B including decoy chromosomes.
(46) posMatch: whether the mapping matches the original manifest. This applies only to hg19 and is NA under hg38.
(47) MASK.mapping: whether the probe is masked for mapping reasons. Retained probes should map with high quality (mapQ >= 40 on the 0-60 scale), without INDELs, and to a position consistent with the designed MAPINFO. For type I probes this must hold for both probe A and probe B.
(48) MASK.typeINextBaseSwitch: whether the probe has a SNP in the extension base that causes a color channel switch from the official annotation (described as color-channel-switching, or CCS SNP in the reference). These probes should be processed differently than designed (by summing up both color channels instead of just the annotated color channel).
(49) MASK.rmsk15: whether the 15bp 3'-subsequence of the probe overlaps with RepeatMasker annotation. This mask is NOT recommended.
(50-53) MASK.sub25.copy, MASK.sub30.copy, MASK.sub35.copy, MASK.sub40.copy: whether the 25bp, 30bp, 35bp or 40bp 3'-subsequence of the probe, respectively, is non-unique.
(54) MASK.snp5.common: whether the 5bp 3'-subsequence (including the extension base for type II) overlaps with any of the common SNPs from dbSNP (the global MAF of these SNPs can be under 1%).
(55) MASK.snp5.GMAF1p: whether the 5bp 3'-subsequence (including the extension base for type II) overlaps with any SNP with global MAF >1%.
(56) MASK.extBase: whether the probe is masked because its extension base is inconsistent with the specified color channel (type I) or CpG (type II) based on mapping.
(57) MASK.general: the recommended general purpose masking merged from "MASK.sub30.copy", "MASK.mapping", "MASK.extBase", "MASK.typeINextBaseSwitch" and "MASK.snp5.GMAF1p".
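
The coordinate convention for CpG_beg and CpG_end (columns 1-3) can be checked with a few lines of Python. This is a minimal sketch, not part of the manifest tooling: the function names and the example rows are made up for illustration, and it assumes each row has been read into a dict keyed by column name, with "NA" marking missing coordinates.

```python
# Expected target span per probeType, as stated in the description of columns 1-3.
EXPECTED_SPAN = {"cg": 2, "ch": 1, "rs": 1}


def target_span(cpg_beg, cpg_end):
    """Number of nucleotides covered when beg is 0-based and end is 1-based."""
    return int(cpg_end) - int(cpg_beg)


def span_is_expected(row):
    """Return True/False for rows with coordinates, None when they cannot be checked.

    `row` is assumed to be a dict keyed by manifest column name.
    """
    expected = EXPECTED_SPAN.get(row["probeType"])
    if expected is None or "NA" in (row["CpG_beg"], row["CpG_end"]):
        return None
    return target_span(row["CpG_beg"], row["CpG_end"]) == expected


# Made-up rows for illustration only; these are not real probes.
rows = [
    {"Probe_ID": "cg_example", "probeType": "cg", "CpG_beg": "10524", "CpG_end": "10526"},
    {"Probe_ID": "ch_example", "probeType": "ch", "CpG_beg": "20100", "CpG_end": "20101"},
    {"Probe_ID": "rs_unmapped", "probeType": "rs", "CpG_beg": "NA", "CpG_end": "NA"},
]

for row in rows:
    print(row["Probe_ID"], span_is_expected(row))
```

Rows for which the check returns False would be worth inspecting by hand.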
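
The relationship between probe_strand (column 4) and orientation (column 13) can be expressed as a small lookup. The sketch below assumes the values are exactly the strings listed in the descriptions ("+", "-", "*", "up", "down"); the function name is hypothetical.

```python
# Up-probes are on the Watson (+) strand, down-probes on the Crick (-) strand.
STRAND_FOR_ORIENTATION = {"up": "+", "down": "-"}


def strand_matches_orientation(probe_strand, orientation):
    """Check that probe_strand agrees with orientation.

    Returns None for unmapped probes (probe_strand == "*"), since there is
    no strand to compare against.
    """
    if probe_strand == "*":
        return None
    return STRAND_FOR_ORIENTATION.get(orientation) == probe_strand


print(strand_matches_orientation("+", "up"))
print(strand_matches_orientation("-", "up"))
print(strand_matches_orientation("*", "down"))
```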
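
Two of the criteria behind MASK.mapping (column 47), the mapQ threshold and the absence of INDELs, can be approximated from the mapQ_* and cigar_* columns. This is only a partial sketch under stated assumptions: it does not check consistency with the designed MAPINFO, it is not the procedure used to build the manifest, and it assumes that missing values appear as "NA", "*" or an empty string. The function names are hypothetical.

```python
import re

MISSING = (None, "", "NA", "*")


def alignment_passes(mapq, cigar, min_mapq=40):
    """True when mapQ >= min_mapq and the CIGAR string has no insertion or deletion."""
    if mapq in MISSING or cigar in MISSING:
        return False
    has_indel = re.search(r"[ID]", cigar) is not None
    return int(mapq) >= min_mapq and not has_indel


def probe_passes(row):
    """Apply the check to probe A, and also to probe B for type I designs.

    `row` is assumed to be a dict keyed by manifest column name.
    """
    ok = alignment_passes(row["mapQ_A"], row["cigar_A"])
    if row["designType"] == "I":
        ok = ok and alignment_passes(row["mapQ_B"], row["cigar_B"])
    return ok


# Made-up row: a type I probe whose B alignment contains a deletion.
example = {
    "designType": "I",
    "mapQ_A": "60", "cigar_A": "50M",
    "mapQ_B": "60", "cigar_B": "24M1D26M",
}
print(probe_passes(example))
```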
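
For type I probes flagged by MASK.typeINextBaseSwitch (column 48), the description suggests summing both color channels rather than using only the annotated one. One possible reading of that sentence is sketched below. The intensities are made-up numbers, the function name is hypothetical, and signal intensities are not part of the manifest itself.

```python
def type1_signal(grn, red, channel, ccs_masked):
    """Signal for one type I bead.

    grn, red   -- intensities measured in the green and red channels
    channel    -- annotated channel from the manifest ("Grn" or "Red")
    ccs_masked -- value of MASK.typeINextBaseSwitch for this probe
    """
    if channel not in ("Grn", "Red"):
        raise ValueError("type I probes are annotated as 'Grn' or 'Red'")
    if ccs_masked:
        # A color-channel-switching SNP may move signal to the other channel,
        # so both channels are summed.
        return grn + red
    return grn if channel == "Grn" else red


print(type1_signal(100, 900, "Red", ccs_masked=False))
print(type1_signal(100, 900, "Red", ccs_masked=True))
```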
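
The composition of MASK.general (column 57) can be illustrated as a logical OR over its five component columns. The sketch assumes that "merged" means a probe is masked if any component masks it, and that mask values are stored as the strings "TRUE" and "FALSE"; both are assumptions to verify against the actual file. The helper names are hypothetical.

```python
# Component masks listed in the description of MASK.general.
GENERAL_COMPONENTS = (
    "MASK.sub30.copy",
    "MASK.mapping",
    "MASK.extBase",
    "MASK.typeINextBaseSwitch",
    "MASK.snp5.GMAF1p",
)


def as_bool(value):
    """Interpret a mask field; anything other than "TRUE" counts as not masked."""
    return str(value).strip().upper() == "TRUE"


def general_mask(row):
    """True when at least one component mask is set for this probe."""
    return any(as_bool(row[name]) for name in GENERAL_COMPONENTS)


# Made-up row: masked only because of a SNP with global MAF >1%.
example = {name: "FALSE" for name in GENERAL_COMPONENTS}
example["MASK.snp5.GMAF1p"] = "TRUE"
print(general_mask(example))
```

Comparing the result against the MASK.general column of the same row is a quick way to test whether the OR assumption holds for a given manifest version.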