
All files are gzipped, tab-delimited plain text files. Some old gz files appear double-compressed when downloaded with Firefox; in that case, apply decompression twice.
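The double-compression case can be handled without knowing in advance how many layers a download has. The following is a minimal sketch, not part of the original annotation tooling: it assumes that a gzip layer can be recognized by its two leading magic bytes and that the first line of each file is a header row. The function names are hypothetical.

```python
import csv
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def peel_gzip(data):
    """Decompress repeatedly until the payload no longer looks like gzip."""
    while data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)
    return data


def read_rows(path):
    """Read a (possibly double-compressed) tab-delimited file into dicts.

    Assumption: the first line of the file holds the column names.
    """
    with open(path, "rb") as fh:
        text = peel_gzip(fh.read()).decode("utf-8")
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)
```

Because the loop checks the magic bytes each time, the same call works for files that were compressed once, twice, or already decompressed by the browser.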

Human Array

GRCh38 / hg38

GRCh37 / hg19

Column Legends

CpG_chrm, CpG_beg and CpG_end - the location of the target. CpG_beg is a 0-based coordinate and CpG_end is 1-based. The coordinates should span 2 nucleotides for CpG probes, or 1 nucleotide for CpH and SNP probes. Some erroneous CpH probe coordinates in the manufacturer's manifest have been corrected.
probe_strand - the strand of the actual probe, consistent with the "orientation" column. "+" is used for all up-probes positioned at smaller coordinates and "-" for all down-probes positioned at greater coordinates with respect to the target CpGs. "*" is used for unmapped probes.
Probe_ID - the probe ID.
address_A and address_B - addresses of probe A and B on the chip designated by the original manifest.
channel - "Both" for type II probes and "Grn"/"Red" for type I probes.
designType - either "I" or "II".
nextBase - the actual extension base (on the probe strand) after bisulfite conversion ("A", "C" or "T"). Unmapped probes keep the extension base labeled in the original manifest.
nextBaseRef - the extension base (on the hybridized/template DNA) before bisulfite conversion ("A", "C", "G" or "T"). Unmapped probes have "NA".
probeType - either "cg", "ch" or "rs".
orientation - either "up" or "down", specifying whether the probe is positioned upstream (in smaller coordinates) or downstream (in greater coordinates) of the target.
Note that by design, probes positioned upstream (in smaller coordinates) are always on the Watson strand and probes positioned downstream (in greater coordinates) are always on the Crick strand.
probeCpGcnt - the number of additional CpGs in the probe (not counting the interrogated CpG).
context35 - the number of CpG in the [-35bp, +35bp] window.
probeStart and probeEnd - the mapped start and end positions of the probe. The probe is always 50bp long.
ProbeSeq_A and ProbeSeq_B - the probe sequence for allele A and B.
gene - ";"-separated list of gene annotations (unique and alphabetically sorted). Gene models follow GENCODE version 22 (hg38).
gene_HGNC - ";"-separated list of gene annotations (unique and alphabetically sorted). Genes are checked using HGNChelper for compatibility with HGNC. Gene models follow GENCODE version 22 (hg38).
chrm_A, beg_A, flag_A, mapQ_A, cigar_A, NM_A - the mapping info for probe A excluding decoy chromosomes. mapQ is the mapping quality score, 0-60, with 60 being the best.
chrm_B, beg_B, flag_B, mapQ_B, cigar_B, NM_B - the mapping info for probe B excluding decoy chromosomes, like above.
wDecoy_chrm_A, wDecoy_beg_A, wDecoy_mapQ_A, wDecoy_cigar_A, wDecoy_NM_A - the mapping info for probe A including decoy chromosomes.
wDecoy_chrm_B, wDecoy_beg_B, wDecoy_mapQ_B, wDecoy_cigar_B, wDecoy_NM_B - the mapping info for probe B including decoy chromosomes.
posMatch - whether the mapping matches the original manifest. It only applies to hg19 and is NA under hg38.
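The coordinate and strand conventions in this legend lend themselves to a quick sanity check on a parsed row. The sketch below is illustrative only: it assumes each row is a dict keyed by the column names above with string values, and the function name check_row is hypothetical.

```python
# Expected target length (CpG_end - CpG_beg) per probeType, as described
# in the legend: 2 nucleotides for CpG, 1 for CpH and SNP probes.
EXPECTED_SPAN = {"cg": 2, "ch": 1, "rs": 1}

# Up-probes are on "+", down-probes on "-", per the probe_strand entry.
STRAND_FOR_ORIENTATION = {"up": "+", "down": "-"}


def check_row(row):
    """Return a list of human-readable problems found in one manifest row."""
    problems = []
    if row["probe_strand"] == "*":
        # Unmapped probe: no coordinates to validate.
        return problems

    # CpG_beg is 0-based and CpG_end is 1-based, so the difference is the length.
    span = int(row["CpG_end"]) - int(row["CpG_beg"])
    expected_span = EXPECTED_SPAN.get(row["probeType"])
    if expected_span is not None and span != expected_span:
        problems.append(f"span {span} does not match expected {expected_span}")

    expected_strand = STRAND_FOR_ORIENTATION.get(row["orientation"])
    if expected_strand is not None and row["probe_strand"] != expected_strand:
        problems.append("probe_strand disagrees with orientation")
    return problems
```

An empty list means the row is consistent with the conventions described above; it does not say anything about mapping quality.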

Masking

MASK.mapping - whether the probe is masked for mapping reasons. Retained probes should have high-quality mapping (>=40 on the 0-60 scale) that is consistent with the designed MAPINFO and free of INDELs (for both probes in the case of type I).
MASK.typeINextBaseSwitch - whether the probe has a SNP in the extension base that causes a color channel switch from the official annotation (described as color-channel-switching, or CCS SNP in the reference). These probes should be processed differently than designed (by summing up both color channels instead of just the annotated color channel).
MASK.rmsk15 - whether the 15bp 3'-subsequence of the probe overlaps with RepeatMasker annotations. This mask is NOT recommended.
MASK.sub25.copy, MASK.sub30.copy, MASK.sub35.copy and MASK.sub40.copy - whether the 25bp, 30bp, 35bp and 40bp 3'-subsequence of the probe is non-unique.
MASK.snp5.common - whether the 5bp 3'-subsequence (including extension for type II) overlaps with any of the common SNPs from dbSNP (global MAF can be under 1%).
MASK.snp5.GMAF1p - whether the 5bp 3'-subsequence (including extension for type II) overlaps with any of the SNPs with global MAF >1%.
MASK.extBase - probes masked for extension base inconsistent with specified color channel (type-I) or CpG (type-II) based on mapping.
MASK.general - recommended general purpose masking merged from "MASK.sub30.copy", "MASK.mapping", "MASK.extBase", "MASK.typeINextBaseSwitch" and "MASK.snp5.GMAF1p".
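The composition of MASK.general can be reproduced from its five component columns. The following sketch rests on two assumptions that the legend does not state explicitly: that "merged" means a logical OR, and that mask columns hold the strings "TRUE" and "FALSE". The helper names are hypothetical.

```python
# Component masks listed in the MASK.general entry above.
GENERAL_COMPONENTS = (
    "MASK.sub30.copy",
    "MASK.mapping",
    "MASK.extBase",
    "MASK.typeINextBaseSwitch",
    "MASK.snp5.GMAF1p",
)


def is_true(value):
    """Interpret a mask cell; assumes TRUE/FALSE strings (case-insensitive)."""
    return str(value).strip().upper() == "TRUE"


def recompute_general_mask(row):
    """Return True if any component mask flags the probe (assumed OR merge)."""
    return any(is_true(row[name]) for name in GENERAL_COMPONENTS)
```

Comparing the result against the MASK.general column of the same row is a simple way to confirm or refute both assumptions on a given release. The same pattern can build a custom mask by changing the tuple of component names.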

Cross-species Mapping


Mouse Array (see here for working with the mouse array)

GRCm38 / mm10

Column Legends

CpG_chrm, CpG_beg, CpG_end (1-3) - genomic coordinates of the target: length 2 for CpG, length 1 for SNP and CpH. CpG_beg is 0-based and CpG_end is 1-based, as in BED files.
address_A, address_B (4-5) - Chip/tango address for A-allele and B-allele.
target (6) - CG if the probe measures CpG methylation, the reference allele otherwise.
nextBase (7) - the actual extension base (on the probe strand) before bisulfite conversion. "R" (stands for G/A) for Infinium-II probes
channel (8) - color channel, green or red
Probe_ID (9) - the probe ID
lastBase_A (10) - last base on 3'-end of the probe sequence, for Infinium-II CpG probe it will always be a C
mapFlag_A, mapChrm_A, mapPos_A, mapQ_A, mapCigar_A, mapSeq_A, mapNM_A, mapAS_A, mapYD_A (11-19) - Mapping information for allele A. NM is the number of mutations. AS represents alignment score. YD (f/r/n) represents the bisulfite strand. mapFlag can be used to determine direction of the probe sequence. 0 means upstream and 16 means downstream.
lastBase_B (20) - last base on the 3'-end of the B-allele probe sequence, for Infinium-I probes.
mapFlag_B, mapChrm_B, mapPos_B, mapQ_B, mapCigar_B, mapSeq_B, mapNM_B, mapAS_B, mapYD_B (21-29) - Mapping information for allele B. Same as allele A.
design (30) - design category string
mask (31) - whether the probe should be masked. This includes control probes, multi-mapping probes (mapQ<30 for either allele A or B) and non-informative probes (uk probes).
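Two of the rules in this legend, the mapFlag direction and the mapQ<30 multi-mapping criterion, can be sketched as small helpers. This is an illustration under stated assumptions: mapFlag is treated as a SAM-style bit flag (bit 16 set for reverse-strand alignments, bit 4 set for unmapped reads), and non-numeric cells such as "NA" are treated as missing. The function names are hypothetical.

```python
SAM_UNMAPPED = 4   # SAM flag bit for an unmapped read
SAM_REVERSE = 16   # SAM flag bit for a reverse-strand alignment


def probe_direction(map_flag):
    """Map a mapFlag value to 'upstream'/'downstream', or None if unavailable."""
    try:
        flag = int(map_flag)
    except (TypeError, ValueError):
        return None  # e.g. "NA" for a missing allele
    if flag & SAM_UNMAPPED:
        return None
    return "downstream" if flag & SAM_REVERSE else "upstream"


def has_low_mapq(map_q_a, map_q_b, threshold=30):
    """True if either allele has a numeric mapQ below the threshold."""
    for value in (map_q_a, map_q_b):
        try:
            if int(value) < threshold:
                return True
        except (TypeError, ValueError):
            continue  # missing allele (assumed non-numeric), ignore
    return False
```

Note that has_low_mapq covers only the multi-mapping part of the mask column; control probes and uk probes are masked for other reasons, so it is not a substitute for the mask column itself.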

GRCm39 / mm39

Cross-species Mapping


Release Notes

Update Jun-15-2021:
Update Apr-14-2021:
Update Jul-4-2020 Updates:
Release Sep-9-2018 (latest) Updates:
Release Aug-8-2018 Updates:

Release Mar-4-2018 Fix to hg19 decoy mapping inconsistency.

Release Jan-4-2018 Updated strand information.

Release Nov-23-2017 Updates:
Release Mar-13-2017 Updates: fix to relative position in SNP masking of type-I probes.

Reference

Human Array - Zhou W, Laird PW and Shen H, Comprehensive characterization, annotation and innovative use of Infinium DNA Methylation BeadChip probes, Nucleic Acids Research 2017

Mouse Array - In preparation


Contact

Questions regarding this annotation can be addressed to wanding.zhou@pennmedicine.upenn.edu.