Affymetrix Gene Filtering Procedures

Affy Control Gene Filter

Use the Affy Control Gene Filter to remove internal control probes from the analysis. Affymetrix technology uses the intensity values of housekeeping genes as a means of quality control. A predetermined concentration of these control genes is spiked into the cRNA target mixture before it is applied to the microarrays, and the measured intensity values are used for internal quality control. Because these control genes are typically from different species, they have little importance to the analysis.
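As a rough illustration of this filter, the sketch below drops probe sets whose identifier starts with "AFFX". That prefix is the usual naming convention for control probe sets on Affymetrix 3' expression arrays, but it is an assumption here: the text above does not say how control probes are identified, and exon arrays may annotate controls differently. The function name and the example identifiers are illustrative only.

```python
def remove_affy_controls(probe_set_ids, control_prefix="AFFX"):
    """Return the probe set IDs that do not look like control probe sets.

    Assumption: control probe sets are recognizable by an ID prefix
    ("AFFX" by default). This is a naming convention, not part of the
    filter description itself.
    """
    prefix = control_prefix.upper()
    return [ps for ps in probe_set_ids if not ps.upper().startswith(prefix)]


# Illustrative identifiers: two controls and two ordinary probe sets.
ids = ["AFFX-BioB-5_at", "1415670_at", "AFFX-GapdhMur/M33197_5_at", "1415671_at"]
filtered = remove_affy_controls(ids)
```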

MAS5 Absolute Call Filter

Use the MAS5 Absolute Call Filter to either keep or remove probe sets based on their Present or Absent call. The MAS5 algorithm uses probe-pair intensities to generate a detection p-value and assign a Present, Marginal, or Absent call. Each probe pair in a probe set has a potential vote in determining whether the measured transcript is detected (Present) or not detected (Absent). The vote is described by a value called the discrimination score [R]. The score is calculated for each probe pair and compared to a predefined threshold, Tau. Probe pairs with scores higher than Tau vote for the presence of the transcript, and probe pairs with scores lower than Tau vote for its absence. The voting result is summarized as a p-value. The more discrimination scores in a probe set that are above Tau, the smaller the p-value and the more likely it is that the transcript is truly present in the sample. The p-value therefore reflects the confidence of the detection call.
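The voting step can be sketched as follows. Two details are not given in the text above and are assumptions taken from the commonly cited description of the MAS5 detection algorithm: the score is computed as R = (PM - MM) / (PM + MM) from the perfect-match and mismatch intensities of a probe pair, and the default Tau is 0.015. The sketch only counts votes; it does not compute the detection p-value, which MAS5 is generally described as deriving from a one-sided Wilcoxon signed rank test on the scores.

```python
def discrimination_scores(pm, mm):
    """Discrimination score R for each probe pair.

    Assumption: R = (PM - MM) / (PM + MM). Intensities are expected
    to be positive, so PM + MM is never zero.
    """
    return [(p - m) / (p + m) for p, m in zip(pm, mm)]


def count_votes(scores, tau=0.015):
    """Count probe pairs voting for presence (R > tau) and absence (R < tau)."""
    present = sum(1 for r in scores if r > tau)
    absent = sum(1 for r in scores if r < tau)
    return present, absent


# Hypothetical intensities for a probe set with four probe pairs.
pm = [1200.0, 950.0, 400.0, 310.0]
mm = [300.0, 900.0, 410.0, 100.0]
scores = discrimination_scores(pm, mm)
votes = count_votes(scores)  # (votes for presence, votes for absence)
```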

The detection p-value cut-offs, alpha 1 (α1) and alpha 2 (α2), provide boundaries for defining Present, Marginal, or Absent calls. At the default settings, which were determined for probe sets with 16 to 20 probe pairs (α1 = 0.04 and α2 = 0.06), a probe set whose p-value falls below α1 is assigned a Present call, and one whose p-value falls above α2 is assigned an Absent call. Marginal calls are given to probe sets with p-values between α1 and α2.
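A minimal sketch of how the cut-offs translate into calls, using the default values stated above. The handling of p-values exactly equal to α1 or α2 is an assumption (treated here as Marginal and Absent, respectively), and the function names and example probe set IDs are hypothetical.

```python
def mas5_call(p_value, alpha1=0.04, alpha2=0.06):
    """Map a detection p-value to a Present, Marginal, or Absent call."""
    if p_value < alpha1:
        return "Present"
    if p_value < alpha2:
        # Assumption: a p-value equal to alpha1 is Marginal,
        # and a p-value equal to alpha2 is Absent.
        return "Marginal"
    return "Absent"


def filter_by_call(p_values, keep=("Present",)):
    """Keep only the probe sets whose call is in `keep`."""
    return {ps: p for ps, p in p_values.items() if mas5_call(p) in keep}


# Hypothetical detection p-values keyed by probe set ID.
detection_p = {"ps_1": 0.01, "ps_2": 0.05, "ps_3": 0.20}
kept = filter_by_call(detection_p)
```

Passing keep=("Absent",) instead would retain only the undetected probe sets, which corresponds to using the filter to remove Present calls.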

DABG Absolute Call Filter

Analogous to the MAS5 present/absent calls for the Affymetrix 3’ arrays, each probe set on the Affymetrix Exon array is given a p-value associated with the hypothesis that its intensity values can be distinguished from background noise. This p-value is referred to as the detection above background (DABG) p-value. It is generated by comparing each probe in the probe set to a set of background probes with similar GC content. The probe-level p-values are then combined into a DABG p-value at the probe set level. For this filter, a probe set with a p-value less than 0.0001 is considered “present” and a probe set with a p-value greater than or equal to 0.0001 is considered “absent”.
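The present/absent rule for DABG p-values can be written as a short sketch. The cut-off of 0.0001 and the treatment of the boundary come from the description above; the function names and probe set IDs are hypothetical, and the computation of the DABG p-value itself is not shown.

```python
DABG_CUTOFF = 0.0001  # cut-off stated in the filter description


def dabg_call(p_value, cutoff=DABG_CUTOFF):
    """Return "present" if the DABG p-value is below the cut-off, else "absent"."""
    return "present" if p_value < cutoff else "absent"


def filter_by_dabg(p_values, keep="present"):
    """Keep the probe sets whose DABG call matches `keep`."""
    return [ps for ps, p in p_values.items() if dabg_call(p) == keep]


# Hypothetical probe-set-level DABG p-values.
dabg_p = {"ps_a": 0.00005, "ps_b": 0.0001, "ps_c": 0.03}
present_sets = filter_by_dabg(dabg_p)
```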

Heritability Filter

Use the Heritability Filter to limit the probe sets analyzed to those with a high genetic heritability. This filter eliminates probe sets for transcripts whose expression is influenced strongly by environment relative to the strictly genetic influence. For analyses on the Affymetrix Mouse 430 version 2 array, the broad sense heritability has been calculated on two public panels: the inbred mouse panel (20 strains) and the BXD recombinant inbred mouse panel (32 strains). Both panels were normalized using RMA, with poor quality probes eliminated prior to normalization.

For analyses on the Affymetrix Mouse Exon array, the broad sense heritability has been calculated on the core transcript clusters from the public LXS brain recombinant inbred mouse panel normalized using RMA with poor quality probes eliminated prior to normalization. For analyses on the Affymetrix Rat Exon array, the user must choose the tissue of interest (brain, heart, liver, or brown adipose tissue). The broad sense heritability has been calculated separately for each tissue on the core transcript clusters from the public HXB/BXH recombinant inbred rat panel normalized using RMA with poor quality probes eliminated prior to normalization.

The broad sense heritability is calculated separately for each probe set or transcript cluster using an ANOVA model. Because the public data sets are based on the Affymetrix Mouse 430 version 2 array, the Affymetrix Mouse Exon array, and the Affymetrix Rat Exon array, the Heritability Filter is only available for data sets on these chips. Where more than one panel is available, use either panel's heritability values for this filter, and specify a minimum heritability threshold for inclusion. All filtering is done on the probe set or transcript cluster level.
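The text does not specify the exact ANOVA-based estimator. As one common choice, the sketch below assumes a one-way ANOVA with strain as the only factor and estimates broad sense heritability as the proportion of total variation explained by strain (between-strain sum of squares divided by total sum of squares). The function names, the example values, and the 0.5 threshold are all hypothetical.

```python
def broad_sense_heritability(values_by_strain):
    """Estimate heritability for one probe set as SS_between / SS_total.

    Assumption: one-way ANOVA with strain as the factor; this is one
    common estimator, not necessarily the one used for the public panels.
    """
    all_values = [v for vals in values_by_strain.values() for v in vals]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in values_by_strain.values()
    )
    return ss_between / ss_total if ss_total > 0 else 0.0


def filter_by_heritability(heritabilities, min_h2):
    """Keep probe sets whose heritability meets the minimum threshold."""
    return [ps for ps, h2 in heritabilities.items() if h2 >= min_h2]


# Hypothetical expression values for two probe sets in two strains.
expression = {
    "ps_1": {"strain_A": [1.0, 2.0], "strain_B": [3.0, 4.0]},
    "ps_2": {"strain_A": [1.0, 3.0], "strain_B": [1.0, 3.0]},
}
h2 = {ps: broad_sense_heritability(vals) for ps, vals in expression.items()}
heritable = filter_by_heritability(h2, min_h2=0.5)
```

In this example the strain means of ps_1 differ while those of ps_2 are identical, so only ps_1 passes the threshold.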

eQTL/bQTL Filter

Another way to prioritize genes for analysis is to limit those considered to genes whose transcription levels are controlled from the same genetic region that controls the phenotype/behavior of interest (e.g., Tabakoff et al. 2009). We have identified expression quantitative trait loci (eQTL) for probe sets from the BXD recombinant inbred panel on the Affymetrix Mouse 430 version 2 array, for core transcript clusters from brain tissue of the LXS recombinant inbred mouse panel on the Affymetrix Mouse Exon array, and for core transcript clusters from brain, heart, liver, or brown adipose tissue of the HXB/BXH recombinant inbred rat panel on the Affymetrix Rat Exon array.

When data sets are created based on any of these three array technologies, the respective eQTL data sets are used. When using the HXB/BXH recombinant inbred rat panel, the user must choose the correct tissue. The user also chooses a significance threshold for eQTL and the appropriate bQTL to compare to. Probe sets/transcript clusters are retained if their eQTL is significant and the location of the marker (SNP) with the maximum association with transcript expression is within the chosen bQTL limits. All filtering is done on the probe set or transcript cluster level.
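The retention rule can be sketched as below. The record layout is hypothetical: it assumes each eQTL is summarized by a p-value and by the chromosome and position of its most strongly associated marker, and that a bQTL is described by a chromosome and a start and end position. Using a p-value as the significance measure and megabases as the position unit are also assumptions.

```python
from dataclasses import dataclass


@dataclass
class EQTL:
    probe_set: str      # probe set or transcript cluster ID
    p_value: float      # significance of the eQTL (assumed to be a p-value)
    chromosome: str     # chromosome of the marker with maximum association
    position_mb: float  # position of that marker, in megabases


@dataclass
class BQTL:
    chromosome: str
    start_mb: float
    end_mb: float


def passes_eqtl_bqtl(eqtl, bqtl, max_p_value):
    """True if the eQTL is significant and its peak marker lies within the bQTL."""
    significant = eqtl.p_value <= max_p_value
    inside = (
        eqtl.chromosome == bqtl.chromosome
        and bqtl.start_mb <= eqtl.position_mb <= bqtl.end_mb
    )
    return significant and inside


# Hypothetical bQTL region and eQTL results.
region = BQTL(chromosome="2", start_mb=50.0, end_mb=80.0)
eqtls = [
    EQTL("ps_1", 0.001, "2", 65.0),  # significant and inside the region
    EQTL("ps_2", 0.200, "2", 70.0),  # inside the region but not significant
    EQTL("ps_3", 0.001, "5", 65.0),  # significant but on another chromosome
]
retained = [e.probe_set for e in eqtls if passes_eqtl_bqtl(e, region, max_p_value=0.01)]
```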

Gene List Filter

Filtering by Gene List allows you to select a gene list that has been previously created and either keep or remove all the genes within that gene list.
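The keep and remove modes of this filter amount to a set membership test, as in the following sketch. The function name, the mode labels, and the example identifiers are hypothetical.

```python
def filter_by_gene_list(genes, gene_list, mode="keep"):
    """Keep or remove the genes that appear in a previously created gene list."""
    listed = set(gene_list)
    if mode == "keep":
        return [g for g in genes if g in listed]
    if mode == "remove":
        return [g for g in genes if g not in listed]
    raise ValueError("mode must be 'keep' or 'remove'")


# Hypothetical data set identifiers and saved gene list.
dataset_genes = ["gene_1", "gene_2", "gene_3", "gene_4"]
saved_list = ["gene_2", "gene_4", "gene_9"]
kept_genes = filter_by_gene_list(dataset_genes, saved_list, mode="keep")
remaining_genes = filter_by_gene_list(dataset_genes, saved_list, mode="remove")
```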