<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>bcr2138</ui>
   <ji>BCJ</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>A robust classifier of high predictive value to identify good prognosis patients in ER-negative breast cancer</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Teschendorff</snm>
               <mi>E</mi>
               <fnm>Andrew</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>aet21@cam.ac.uk</email>
            </au>
            <au id="A2">
               <snm>Caldas</snm>
               <fnm>Carlos</fnm>
               <insr iid="I1"/>
               <insr iid="I3"/>
               <email>cc234@cam.ac.uk</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Breast Cancer Functional Genomics Laboratory, Cancer Research UK Cambridge Research Institute, Cambridge, CB2 0RE, UK.</p>
            </ins>
            <ins id="I2">
               <p>Department of Oncology, University of Cambridge, Li Ka-Shing Centre, Robinson Way, Cambridge CB2 0RE, UK.</p>
            </ins>
            <ins id="I3">
               <p>Cambridge Breast Unit, Addenbrooke's Hospital, Cambridge University Hospitals NHS Foundation Trust, Hills Road, Cambridge, UK.</p>
            </ins>
         </insg>
         <source>Breast Cancer Research</source>
         <issn>1465-5411</issn>
         <pubdate>2008</pubdate>
         <volume>10</volume>
         <issue>4</issue>
         <fpage>R73</fpage>
         <url>http://breast-cancer-research.com/content/10/4/R73</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18755024</pubid>
               <pubid idtype="doi">10.1186/bcr2138</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>25</day>
               <month>4</month>
               <year>2008</year>
            </date>
         </rec>
         <revreq>
            <date>
               <day>7</day>
               <month>7</month>
               <year>2008</year>
            </date>
         </revreq>
         <revrec>
            <date>
               <day>15</day>
               <month>7</month>
               <year>2008</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>28</day>
               <month>8</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>28</day>
               <month>8</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Teschendorff and Caldas; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Introduction</p>
               </st>
               <p>Patients with primary operable oestrogen receptor (ER) negative (-) breast cancer account for about 30% of all cases and generally have a worse prognosis than ER-positive (+) patients. Nevertheless, a significant proportion of ER- cases have favourable outcomes and could potentially benefit from a less aggressive course of therapy. However, identification of such patients with a good prognosis remains difficult and at present is only possible through examining histopathological factors.</p>
            </sec>
            <sec>
               <st>
                  <p>Methods</p>
               </st>
               <p>Building on a previously identified seven-gene prognostic immune response module for ER- breast cancer, we developed a novel statistical tool based on Mixture Discriminant Analysis in order to build a classifier that could accurately identify ER- patients with a good prognosis.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We report the construction of a seven-gene expression classifier that accurately predicts, across a training cohort of 183 ER- tumours and six independent test cohorts (a total of 469 ER- tumours), ER- patients of good prognosis (in test sets, average predictive value = 94% [range 85 to 100%], average hazard ratio = 0.15 [range 0.07 to 0.36], p &lt; 0.000001) independently of lymph node status and treatment.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>This seven-gene classifier could be used in a polymerase chain reaction-based clinical assay to identify ER- patients with a good prognosis, who may therefore benefit from less aggressive treatment regimens.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Introduction</p>
         </st>
         <p>Oestrogen receptor (ER) negative (-) breast cancer accounts for about 30% of all breast cancer cases and generally has a worse prognosis than ER-positive (+) disease <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. Nevertheless, a significant proportion of ER- cases have shown a favourable outcome and could potentially benefit from a less aggressive course of therapy <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. Reliable identification of such ER- patients with a good prognosis is, however, difficult and at present only possible through examining histopathological factors.</p>
         <p>Recently, attempts have been made to explain the observed clinical heterogeneity of ER- disease in terms of gene expression signatures <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. However, most of these studies clearly indicated the difficulty of identifying a prognostic gene expression signature for ER- disease <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>, unlike ER+ breast cancer, where a multitude of alternative prognostic signatures have been identified <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. Nevertheless, using an integrative analysis of gene expression microarray data from three untreated (no chemotherapy) ER- breast cancer cohorts (a total of 186 patients) <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B8">8</abbr><abbr bid="B10">10</abbr></abbrgrp> and a novel feature selection method <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, it was possible to identify a seven-gene immune response expression module associated with good prognosis. This suggests that at least part of the observed clinical heterogeneity in ER- disease can be explained on the basis of mRNA expression levels <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Specifically, overexpression of this immune response gene module identified a subclass of basal ER- breast cancer, about 25% of all ER- cases, with a reduced risk of distant metastasis (hazard ratio [HR] = 0.49; range 0.29 to 0.83; p = 0.009) compared with ER- cases without overexpression of this module <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, a result that was validated in two independent untreated test cohorts (58 ER- samples) <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B12">12</abbr></abbrgrp>.</p>
         <p>The important role that immune system-related gene expression signatures play in breast cancer prognosis has been further supported by four recent reports <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>. Specifically, one study reported that high expression of lymphocyte-associated genes identifies a good prognosis subgroup within lymph node negative (LN-) human epidermal growth factor receptor 2 positive (HER2+) breast cancer <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. A further study focused on LN- breast cancer and identified a prognostic B-cell metagene signature, confirming that overexpression of this signature correlated with good prognosis in ER- breast cancer, while underexpression correlated with good prognosis in ER+ breast cancer <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. A similar contrasting result between ER- and ER+ breast cancer was also found by deriving a gene expression signature for lymphocytic infiltration (LI) and demonstrating its positive and negative association with good prognosis in ER- and ER+ disease, respectively <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. All these results are consistent with our findings and highlight the importance of stratifying breast cancer patients into ER+ and ER- subtypes before associations with clinical outcome can be derived <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B16">16</abbr></abbrgrp>.</p>
         <p>The discovery and construction of a molecular classifier that can robustly identify ER- patients with a good prognosis is important for two main reasons. First, identification of ER- patients with a good prognosis based on histopathological predictors like LN status or Adjuvant! is far from optimal <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. Second, reliable identification of ER- patients with a good prognosis could further guide their management by allowing less aggressive treatment regimens for such patients. Building on our previous results <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, here we report on the construction of a seven-gene prognostic classifier and further validate this single-sample predictor across six (four untreated and two partially treated) independent ER- breast cancer cohorts: 'UPP' <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, 'JRH-2' <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, 'UNC248' <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, 'CAL' <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, 'Loi' <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> and 'Kreike' <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. This therefore confirms the validity of this classifier in more than 469 ER- patients.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Linear and quadratic discriminant analysis</p>
            </st>
            <p>Before discussing Mixture Discriminant Analysis (MDA), it is convenient to briefly review Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. We assume that we have a training data set <it>X </it>of dimension <it>p </it>&#215; <it>N</it>, where <it>p </it>is the number of dimensions (ie, genes) and <it>N </it>is the number of training samples (ie, tumour samples). We also assume that we have a test set <it>Y </it>of dimension <it>p </it>&#215; <it>n </it>and that we have <it>C </it>phenotype classes among the training set samples.</p>
            <p>In the training process of discriminant analysis one attempts to learn parameters that specify the clusters associated with each of the phenotype classes. In the maximum likelihood framework, one learns parameters (<it>&#960;</it>, <it>&#952;</it>) = (<it>&#960;</it><sub><it>k</it></sub>, <it>&#952;</it><sub><it>k</it></sub>; <it>k </it>= 1,..., <it>C</it>) such that the likelihood function</p>
            <p>
               <display-formula id="M1">
                  <m:math name="bcr2138-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>L</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>&#960;</m:mi>
                           <m:mo>,</m:mo>
                           <m:mi>&#952;</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mi>p</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>X</m:mi>
                           <m:mo>|</m:mo>
                           <m:mi>&#960;</m:mi>
                           <m:mo>,</m:mo>
                           <m:mi>&#952;</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8719;</m:mo>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                                 <m:mi>N</m:mi>
                              </m:munderover>
                           </m:mstyle>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>k</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                                 <m:mi>C</m:mi>
                              </m:munderover>
                           </m:mstyle>
                           <m:msub>
                              <m:mi>&#960;</m:mi>
                              <m:mi>k</m:mi>
                           </m:msub>
                           <m:msub>
                              <m:mi>f</m:mi>
                              <m:mi>k</m:mi>
                           </m:msub>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msub>
                              <m:mi>x</m:mi>
                              <m:mi>i</m:mi>
                           </m:msub>
                           <m:mo>|</m:mo>
                           <m:msub>
                              <m:mi>&#952;</m:mi>
                              <m:mi>k</m:mi>
                           </m:msub>
                           <m:mo stretchy="false">)</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuqqRPxAKvMB6bYrY9gDLn3AGiuraeXatLxBI9gBaebbnrfifHhDYfgasaacPi6xNi=xI8qiVKIOFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemitaWKaeGikaGIaeqiWdaNaeGilaWIaeqiUdeNaeGykaKIaeGypa0JaemiCaaNaeGikaGIaemiwaGLaeGiFaWNaeqiWdaNaeGilaWIaeqiUdeNaeGykaKIaeGypa0ZaaebCaeqaleaacqWGPbqAcqaI9aqpcqaIXaqmaeaacqWGobGta0Gaey4dIunakmaaqahabeWcbaGaem4AaSMaeGypa0JaeGymaedabaGaem4qameaniabggHiLdGccqaHapaCdaWgaaWcbaGaem4AaSgabeaakiabdAgaMnaaBaaaleaacqWGRbWAaeqaaOGaeGikaGIaemiEaG3aaSbaaSqaaiabdMgaPbqabaGccqaI8baFcqaH4oqCdaWgaaWcbaGaem4AaSgabeaakiabiMcaPaaa@5EDC@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>is maximised. In the above, <it>f</it><sub><it>k </it></sub>denotes the probability density of the observation <it>x</it><sub><it>i </it></sub>under cluster <it>k</it>, <it>&#960;</it><sub><it>k </it></sub>denotes the weight of this cluster and <it>&#952;</it><sub><it>k </it></sub>parameterises the cluster. The likelihood is optimised using the expectation-maximisation (EM) algorithm, subject to the constraint that <inline-formula><m:math name="bcr2138-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8721;</m:mo><m:mrow><m:mi>k</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mi>C</m:mi></m:msubsup></m:mstyle><m:msub><m:mi>&#960;</m:mi><m:mi>k</m:mi></m:msub><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuqqRPxAKvMB6bYrY9gDLn3AGiuraeXatLxBI9gBaebbnrfifHhDYfgasaacPi6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaWaaabmaeqaleaacqWGRbWAcqaI9aqpcqaIXaqmaeaacqWGdbWqa0GaeyyeIuoakiabec8aWnaaBaaaleaacqWGRbWAaeqaaOGaeGypa0JaeGymaedaaa@398D@</m:annotation></m:semantics></m:math></inline-formula>, yielding estimates <inline-formula><m:math name="bcr2138-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mo stretchy="false">(</m:mo><m:mn/><m:mover accent="true"><m:mi>&#960;</m:mi><m:mo>^</m:mo></m:mover><m:mn>,</m:mn><m:mover accent="true"><m:mi>&#952;</m:mi><m:mo>^</m:mo></m:mover><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuqqRPxAKvMB6bYrY9gDLn3AGiuraeXatLxBI9gBaebbnrfifHhDYfgasaacPi6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeGikaGIafqiWdaNbaKaacqaISaalcuaH4oqCgaqcaiabiMcaPaaa@3407@</m:annotation></m:semantics></m:math></inline-formula>.</p>
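            <p>As an illustration of the likelihood function above, the following minimal Python sketch evaluates its logarithm for a mixture of clusters. It assumes univariate Gaussian clusters purely for brevity; the function names and the numerical values are hypothetical and are not taken from the study, and the sketch does not perform the EM optimisation itself.</p>
            <p>
```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian (an assumed, simplified cluster model)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def log_likelihood(samples, weights, means, variances):
    """log L(pi, theta) = sum_i log sum_k pi_k f_k(x_i | theta_k)."""
    total = 0.0
    for x in samples:
        # Weighted sum over the C clusters for a single observation x_i.
        mix = sum(w * gaussian_pdf(x, m, v)
                  for w, m, v in zip(weights, means, variances))
        total += math.log(mix)
    return total

# Hypothetical one-dimensional data with C = 2 clusters (illustrative values).
x = [-1.2, -0.8, 0.1, 2.9, 3.3]
good_fit = log_likelihood(x, [0.6, 0.4], [-0.6, 3.1], [0.5, 0.5])
poor_fit = log_likelihood(x, [0.6, 0.4], [5.0, 8.0], [0.5, 0.5])
print(good_fit, poor_fit)
```
            </p>
            <p>Parameters whose cluster means lie close to the data give the larger log-likelihood, which is the quantity that the EM-algorithm increases at each iteration.</p>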
            <p>Having estimated the parameters, we can now classify a test sample <it>y </it>using Bayes' Theorem as follows. The probability that <it>y </it>belongs to class <it>k </it>is just the posterior probability <it>p</it>(<it>k</it>|<it>y</it>), which by Bayes' Theorem can be written as</p>
            <p>
               <display-formula id="M2">
                  <m:math name="bcr2138-i4" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>p</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>k</m:mi>
                           <m:mo stretchy="false">|</m:mo>
                           <m:mi>y</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:msub>
                                    <m:mover accent="true">
                                       <m:mi>&#960;</m:mi>
                                       <m:mo>^</m:mo>
                                    </m:mover>
                                    <m:mi>k</m:mi>
                                 </m:msub>
                                 <m:msub>
                                    <m:mi>f</m:mi>
                                    <m:mi>k</m:mi>
                                 </m:msub>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>y</m:mi>
                                 <m:mo stretchy="false">|</m:mo>
                                 <m:msub>
                                    <m:mover accent="true">
                                       <m:mi>&#952;</m:mi>
                                       <m:mo>^</m:mo>
                                    </m:mover>
                                    <m:mi>k</m:mi>
                                 </m:msub>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                              <m:mrow>
                                 <m:mstyle displaystyle="true">
                                    <m:munderover>
                                       <m:mo>&#8721;</m:mo>
                                       <m:mrow>
                                          <m:mi>c</m:mi>
                                          <m:mo>=</m:mo>
                                          <m:mn>1</m:mn>
                                       </m:mrow>
                                       <m:mi>C</m:mi>
                                    </m:munderover>
                                 </m:mstyle>
                                 <m:msub>
                                    <m:mover accent="true">
                                       <m:mi>&#960;</m:mi>
                                       <m:mo>^</m:mo>
                                    </m:mover>
                                    <m:mi>c</m:mi>
                                 </m:msub>
                                 <m:msub>
                                    <m:mi>f</m:mi>
                                    <m:mi>c</m:mi>
                                 </m:msub>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>y</m:mi>
                                 <m:mo stretchy="false">|</m:mo>
                                 <m:msub>
                                    <m:mover accent="true">
                                       <m:mi>&#952;</m:mi>
                                       <m:mo>^</m:mo>
                                    </m:mover>
                                    <m:mi>c</m:mi>
                                 </m:msub>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuqqRPxAKvMB6bYrY9gDLn3AGiuraeXatLxBI9gBaebbnrfifHhDYfgasaacPi6xNi=xI8qiVKIOFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemiCaaNaeGikaGIaem4AaSMaeGiFaWNaemyEaKNaeGykaKIaeGypa0tcfa4aaSaaaeaacuaHapaCgaqcamaaBaaabaGaem4AaSgabeaacqWGMbGzdaWgaaqaaiabdUgaRbqabaGaeGikaGIaemyEaKNaeGiFaWNafqiUdeNbaKaadaWgaaqaaiabdUgaRbqabaGaeGykaKcabaWaaabCaeqabaGaem4yamMaeGypa0JaeGymaedabaGaem4qameacqGHris5aiqbec8aWzaajaWaaSbaaeaacqWGJbWyaeqaaiabdAgaMnaaBaaabaGaem4yamgabeaacqaIOaakcqWG5bqEcqaI8baFcuaH4oqCgaqcamaaBaaabaGaem4yamgabeaacqaIPaqkaaaaaa@59E5@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>Assigning <it>y </it>to the class which maximises this posterior probability (the maximum probability criterion) minimises the expected misclassification error. Thus,</p>
            <p>
               <display-formula id="M3"><it>k </it>= <it>class</it>(<it>y</it>) = arg max<sub><it>c</it></sub> {<it>p</it>(<it>c</it>|<it>y</it>)|<it>c </it>= 1,..., <it>C</it>}</display-formula>
            </p>
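            <p>The classification rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names <it>posterior</it> and <it>classify</it>, the class weights and the density values are hypothetical, and classes are indexed from 0 rather than from 1.</p>
            <p>
```python
def posterior(weights, densities):
    """Posterior p(k|y) for every class k, by Bayes' Theorem.

    weights   : estimated class weights pi_k (they sum to 1)
    densities : values f_k(y | theta_k) of each class density at the test sample y
    """
    joint = [w * f for w, f in zip(weights, densities)]
    total = sum(joint)  # denominator: sum over all classes c
    return [j / total for j in joint]

def classify(weights, densities):
    """Maximum probability criterion: index of the class with the largest posterior."""
    post = posterior(weights, densities)
    return max(range(len(post)), key=post.__getitem__)

# Hypothetical numbers for C = 2 classes (not taken from the study).
pi_hat = [0.75, 0.25]
f_at_y = [0.10, 0.50]
post = posterior(pi_hat, f_at_y)
label = classify(pi_hat, f_at_y)
print(post, label)
```
            </p>
            <p>In this example the second class is chosen even though its weight is smaller, because its density at <it>y </it>is sufficiently larger.</p>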
            <p>To compute the posterior probabilities one needs to estimate the functions <it>f</it><sub><it>k </it></sub>or, if the functional form is prespecified, the parameters <it>&#952;</it><sub><it>k</it></sub>. The simplest functional approximation one can make is to assume that the clusters are multivariate Gaussians, so that</p>
            <p>
               <display-formula>
                  <m:math name="bcr2138-i5" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtable columnalign="left">
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>f</m:mi>
                                          <m:mi>k</m:mi>
                                       </m:msub>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>y</m:mi>
                                       <m:mo stretchy="false">|</m:mo>
                                       <m:msub>
                                          <m:mi>&#952;</m:mi>
                                          <m:mi>k</m:mi>
                                       </m:msub>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo>=</m:mo>
                                       <m:mi>G</m:mi>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>y</m:mi>
                                       <m:mo stretchy="false">|</m:mo>
                                       <m:msub>
                                          <m:mi>&#956;</m:mi>
                                          <m:mi>k</m:mi>
                                       </m:msub>
                                       <m:mo>,</m:mo>
                                       <m:msub>
                                          <m:mi>&#931;</m:mi>
                                          <m:mi>k</m:mi>
                                       </m:msub>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mtext/>
                                       <m:mo>=</m:mo>
                                       <m:mfrac>
                                          <m:mn>1</m:mn>
                                          <m:mrow>
                                             <m:msqrt>
                                                <m:mrow>
                                                   <m:msup>
                                                      <m:mrow>
                                                         <m:mo stretchy="false">(</m:mo>
                                                         <m:mn>2</m:mn>
                                                         <m:mi>&#960;</m:mi>
                                                         <m:mo stretchy="false">)</m:mo>
                                                      </m:mrow>
                                                      <m:mi>p</m:mi>
                                                   </m:msup>
                                                   <m:mi>det</m:mi>
                                                   <m:msub>
                                                      <m:mi>&#931;</m:mi>
                                                      <m:mi>k</m:mi>
                                                   </m:msub>
                                                </m:mrow>
                                             </m:msqrt>
                                          </m:mrow>
                                       </m:mfrac>
                                       <m:msup>
                                          <m:mi>e</m:mi>
                                          <m:mrow>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mfrac>
                                                <m:mn>1</m:mn>
                                                <m:mn>2</m:mn>
                                             </m:mfrac>
                                             <m:msup>
                                                <m:mrow>
                                                   <m:mo stretchy="false">(</m:mo>
                                                   <m:mi>y</m:mi>
                                                   <m:mo>&#8722;</m:mo>
                                                   <m:msub>
                                                      <m:mi>&#956;</m:mi>
                                                      <m:mi>k</m:mi>
                                                   </m:msub>
                                                   <m:mo stretchy="false">)</m:mo>
                                                </m:mrow>
                                                <m:mi>T</m:mi>
                                             </m:msup>
                                             <m:msubsup>
                                                <m:mi>&#931;</m:mi>
                                                <m:mi>k</m:mi>
                                                <m:mrow>
                                                   <m:mo>&#8722;</m:mo>
                                                   <m:mn>1</m:mn>
                                                </m:mrow>
                                             </m:msubsup>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>y</m:mi>
                                             <m:mo>&#8722;</m:mo>
                                             <m:msub>
                                                <m:mi>&#956;</m:mi>
                                                <m:mi>k</m:mi>
                                             </m:msub>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                       </m:msup>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                           </m:mtable>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuqqRPxAKvMB6bYrY9gDLn3AGiuraeXatLxBI9gBaebbnrfifHhDYfgasaacPi6xNi=xI8qiVKIOFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaqbaeaabiqaaaqaaiabdAgaMnaaBaaaleaacqWGRbWAaeqaaOGaeGikaGIaemyEaKNaeGiFaWNaeqiUde3aaSbaaSqaaiabdUgaRbqabaGccqaIPaqkcqaI9aqpcqWGhbWrcqaIOaakcqWG5bqEcqaI8baFcqaH8oqBdaWgaaWcbaGaem4AaSgabeaakiabiYcaSiabfo6atnaaBaaaleaacqWGRbWAaeqaaOGaeGykaKcabiqaaaubcaWLjaGaeGypa0tcfa4aaSaaaeaacqaIXaqmaeaadaGcaaqaaiabikdaYiabec8aWjabcsgaKjabcwgaLjabcsha0jabfo6atnaaBaaabaGaem4AaSgabeaaaeqaaaaakiabdwgaLnaaCaaaleqabaGaeyOeI0scfa4aaSaaaeaacqaIXaqmaeaacqaIYaGmaaWccqaIOaakcqWG5bqEcqGHsislcqaH8oqBdaWgaaqaaiabdUgaRbqabaGaeGykaKYaaWbaaeqabaGaemivaqfaaiabfo6atnaaDaaabaGaem4AaSgabaGaeyOeI0IaeGymaedaaiabiIcaOiabdMha5jabgkHiTiabeY7aTnaaBaaabaGaem4AaSgabeaacqaIPaqkaaaaaaaa@6E85@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>&#956;</it><sub><it>k </it></sub>is the mean and &#931;<sub><it>k </it></sub>the covariance matrix of the Gaussian. If, furthermore, we assume that the covariance matrices are identical for each cluster (ie, &#931;<sub><it>k </it></sub>= &#931; &#8704; <it>k</it>), then the classification function becomes a linear function of <it>y </it>and the analysis is known as LDA. In the more general case where the covariance matrices of each class are allowed to differ, the classification function is a quadratic function of <it>y </it>and the analysis is known as QDA.</p>
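            <p>The distinction between LDA and QDA can be checked numerically. The Python sketch below is a hedged illustration only: it assumes two classes in <it>p </it>= 2 dimensions, uses the standard multivariate Gaussian normalisation, and all function names and parameter values are hypothetical. It relies on the fact that a linear (affine) function takes, at the midpoint of two points, the average of its values at those points, whereas a quadratic function generally does not.</p>
            <p>
```python
import math

def inv2(s):
    """Inverse and determinant of a 2 x 2 covariance matrix."""
    (a, b), (c, d) = s
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]], det

def log_discriminant(y, weight, mean, cov):
    """log(pi_k) + log G(y | mu_k, Sigma_k) for p = 2 dimensions."""
    inv, det = inv2(cov)
    dx = [y[0] - mean[0], y[1] - mean[1]]
    # Quadratic form (y - mu)^T Sigma^-1 (y - mu).
    maha = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    log_norm = -0.5 * (2 * math.log(2 * math.pi) + math.log(det))
    return math.log(weight) + log_norm - 0.5 * maha

def boundary(y, params):
    """Difference of the two class scores; it is zero on the decision boundary."""
    (w1, m1, s1), (w2, m2, s2) = params
    return log_discriminant(y, w1, m1, s1) - log_discriminant(y, w2, m2, s2)

# Hypothetical two-class parameters (illustrative values only).
shared = [[1.0, 0.3], [0.3, 2.0]]
lda = [(0.5, [0.0, 0.0], shared), (0.5, [2.0, 1.0], shared)]
qda = [(0.5, [0.0, 0.0], shared), (0.5, [2.0, 1.0], [[3.0, 0.0], [0.0, 0.5]])]

a, b = [1.0, -2.0], [3.0, 4.0]
mid = [(a[0] + b[0]) / 2, (a[1] + b[1]) / 2]

# Midpoint value minus the mean of the endpoint values: zero for a linear function.
lda_gap = boundary(mid, lda) - (boundary(a, lda) + boundary(b, lda)) / 2
qda_gap = boundary(mid, qda) - (boundary(a, qda) + boundary(b, qda)) / 2
print(lda_gap, qda_gap)
```
            </p>
            <p>With a shared covariance matrix the quadratic terms in <it>y </it>cancel between the two classes, so the gap vanishes up to rounding error; with class-specific covariance matrices it does not.</p>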
         </sec>
         <sec>
            <st>
               <p>Mixture Discriminant Analysis</p>
            </st>
            <p>The assumption that a phenotype class is best modelled by a multivariate Gaussian is often violated. In the context of gene expression analysis, gene expression profiles often exhibit bi- or multimodality, even when restricted to one phenotype class <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Similarly, gene expression profiles typically also have longer tails than Gaussians. In such circumstances, it seems more appropriate to model each <it>f</it><sub><it>k </it></sub>as a mixture of multivariate Gaussians, since any general density can be approximated by such a mixture. Therefore, one assumes that</p>
            <p>
               <display-formula id="M4">
                  <m:math name="bcr2138-i6" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>f</m:mi>
                              <m:mi>k</m:mi>
                           </m:msub>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>y</m:mi>
                           <m:mo stretchy="false">|</m:mo>
                           <m:msub>
                              <m:mi>&#952;</m:mi>
                              <m:mi>k</m:mi>
                           </m:msub>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>j</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mi>G</m:mi>
                                       <m:mi>k</m:mi>
                                    </m:msub>
                                 </m:mrow>
                              </m:munderover>
                           </m:mstyle>
                           <m:msub>
                              <m:mi>&#964;</m:mi>
                              <m:mrow>
                                 <m:mi>k</m:mi>
                                 <m:mi>j</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mi>G</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>y</m:mi>
                           <m:mo stretchy="false">|</m:mo>
                           <m:msub>
                              <m:mi>&#956;</m:mi>
                              <m:mrow>
                                 <m:mi>k</m:mi>
                                 <m:mi>j</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo>,</m:mo>
                           <m:msub>
                              <m:mi>&#931;</m:mi>
                              <m:mrow>
                                 <m:mi>k</m:mi>
                                 <m:mi>j</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo stretchy="false">)</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuqqRPxAKvMB6bYrY9gDLn3AGiuraeXatLxBI9gBaebbnrfifHhDYfgasaacPi6xNi=xI8qiVKIOFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemOzay2aaSbaaSqaaiabdUgaRbqabaGccqaIOaakcqWG5bqEcqaI8baFcqaH4oqCdaWgaaWcbaGaem4AaSgabeaakiabiMcaPiabi2da9maaqahabeWcbaGaemOAaOMaeGypa0JaeGymaedabaGaem4raC0aaSbaaeaacqWGRbWAaeqaaaqdcqGHris5aOGaeqiXdq3aaSbaaSqaaiabdUgaRjabdQgaQbqabaGccqWGhbWrcqaIOaakcqWG5bqEcqaI8baFcqaH8oqBdaWgaaWcbaGaem4AaSMaemOAaOgabeaakiabiYcaSiabfo6atnaaBaaaleaacqWGRbWAcqWGQbGAaeqaaOGaeGykaKcaaa@56DF@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>&#964;</it><sub><it>kj </it></sub>are the mixture weights and the number of Gaussians to use for phenotype label <it>k </it>is given by <it>G</it><sub><it>k</it></sub>. This number may or may not be specified in advance, resulting in a variety of different implementations. In ordinary MDA <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, one assumes that <it>G</it><sub><it>k </it></sub>is known in advance for each class <it>k </it>and that the covariance matrices are all identical (ie, &#931;<sub><it>kj </it></sub>= &#931;). However, these assumptions are not necessary; instead, one can use the training data to learn the best-fitting mixture model for each phenotype class using, for example, the Bayesian Information Criterion (BIC) <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> or a variational Bayesian framework for model selection <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. This model selection step is a cluster-inference procedure that yields estimates for (<it>&#964;</it><sub><it>kj</it></sub>, <it>&#956;</it><sub><it>kj</it></sub>, &#931;<sub><it>kj</it></sub>, <it>G</it><sub><it>k</it></sub>), from which classification of test samples proceeds as before using the maximum probability criterion. Therefore, MDA is a direct generalisation of LDA and QDA and may reduce to these if the data do not support multiple components per phenotype class <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>.</p>
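            <p>As a minimal illustration of how the mixture densities enter the maximum probability criterion, the following Python sketch evaluates each class density as a weighted sum of Gaussians and returns the posterior probability of each phenotype class. It is not the implementation used in this study: the use of NumPy, the function names (gaussian_pdf, mixture_pdf, mda_posteriors), the equal class priors and the two-dimensional toy parameters are all assumptions made for illustration, and the Gaussian density uses the standard multivariate normalisation.</p>

```python
import numpy as np

def gaussian_pdf(y, mean, cov):
    """Density of a multivariate Gaussian G(y | mean, cov) evaluated at y."""
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    diff = np.asarray(y, dtype=float) - mean
    norm = np.sqrt((2.0 * np.pi) ** mean.size * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm)

def mixture_pdf(y, weights, means, covs):
    """Class density f_k(y) = sum over j of tau_kj * G(y | mu_kj, Sigma_kj)."""
    return sum(w * gaussian_pdf(y, m, c) for w, m, c in zip(weights, means, covs))

def mda_posteriors(y, priors, classes):
    """Posterior probability of each phenotype class given the sample y."""
    joint = np.array([p * mixture_pdf(y, *cls) for p, cls in zip(priors, classes)])
    return joint / joint.sum()

# Hypothetical parameters: two classes, each a mixture of two Gaussians in 2D.
identity = np.eye(2)
good = ([0.5, 0.5], [[2.0, 2.0], [-2.0, -2.0]], [identity, identity])
poor = ([0.5, 0.5], [[2.0, -2.0], [-2.0, 2.0]], [identity, identity])

posteriors = mda_posteriors([1.8, 2.1], priors=[0.5, 0.5], classes=[good, poor])
predicted_class = int(np.argmax(posteriors))  # index 0 = good, index 1 = poor
```

            <p>In this toy setting the test sample lies close to one of the two components of the first class, so the maximum probability criterion assigns it to that class. In practice the weights, means and covariance matrices would be the estimates obtained from the model selection step described above.</p>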
         </sec>
         <sec>
            <st>
               <p>Classification in heterogeneous cancers: the MDAhet classifier</p>
            </st>
            <p>Using mixtures of Gaussians, the density of each phenotype class can be estimated more accurately. Thus, provided that the inferred Gaussian components are biologically meaningful, this approach should in general lead to improved classification performance. However, the implicit assumption in MDA is that we are still interested in classifying samples into the <it>C </it>phenotype classes, whereas in certain circumstances we may only be interested in classifying samples into particular subtypes within the phenotype classes. Thus, while MDA allows for heterogeneity of each phenotype class by estimating its density as a mixture of Gaussians, classification is still performed into the phenotype classes themselves. Alternatively, one can classify samples directly into the Gaussian subcomponents inferred for each phenotype class. We call this variation of MDA Heterogeneous Mixture Discriminant Analysis (MDAhet), because it explicitly takes the heterogeneity of each phenotype class into account.</p>
            <p>As an example, consider the case of two phenotype classes with MDA predicting two Gaussian components for each class. Thus, training data is used to learn the parameters and weights for four Gaussian clusters and classification of test samples is subsequently performed via the Bayes' classifier (equation 3) on these four subclasses. Note therefore that in MDAhet, the cluster-inference step of MDA is used to define the classes for which classification is then performed. Since these inferred classes make up subtypes of the original phenotype labels, this classification framework explicitly takes the heterogeneity of the phenotypes into account.</p>
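            <p>The four-subclass example can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the classifier used in this study: the function names (gaussian_pdf, mdahet_classify), the two-dimensional means, the shared covariance matrix and the subclass weights are hypothetical values chosen for illustration, and NumPy is assumed to be available. Each subclass is treated as a class in its own right, with a single weight and a single Gaussian density.</p>

```python
import numpy as np

def gaussian_pdf(y, mean, cov):
    """Density of a multivariate Gaussian G(y | mean, cov) evaluated at y."""
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    diff = np.asarray(y, dtype=float) - mean
    norm = np.sqrt((2.0 * np.pi) ** mean.size * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm)

def mdahet_classify(y, subclasses):
    """Assign y to the Gaussian subclass with the highest posterior probability.

    subclasses maps a label to (weight, mean, cov), with weights summing to one.
    """
    labels = list(subclasses)
    joint = np.array([w * gaussian_pdf(y, m, c) for w, m, c in subclasses.values()])
    posterior = joint / joint.sum()
    return labels[int(np.argmax(posterior))], dict(zip(labels, posterior))

# Hypothetical two-gene parameters, for illustration only (not fitted values).
cov = 0.25 * np.eye(2)
subclasses = {
    "good-up": (0.2, [1.0, 1.0], cov),
    "good-down": (0.3, [-0.5, -0.3], cov),
    "poor-up": (0.1, [0.4, 0.2], cov),
    "poor-down": (0.4, [-0.6, -0.8], cov),
}

label, posterior = mdahet_classify([1.1, 0.9], subclasses)
# Collapse to a binary predictor: 0 = assigned to one subclass of interest, 1 = otherwise.
x = 0 if label == "good-up" else 1
```

            <p>The only difference from ordinary MDA is the final step: instead of summing the component posteriors within each phenotype class, the sample is assigned to the single most probable subcomponent.</p>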
            <p>In cancer gene-expression studies, it has proved difficult to derive reliable prognostic classifiers for certain cancers, ER- breast cancer being one example. Typically, in the context of prognosis one would expect discriminative gene-expression profiles to exhibit bimodal distributions with the two modes mapping roughly to the two prognostic groups (good and poor) <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. However, as previously shown <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, the best candidate gene-expression prognostic markers can also exhibit bimodal (or multimodal) profiles (ie, mixtures of Gaussians) within a given prognostic class, indicating that these phenotypes are themselves heterogeneous and that classification analysis should attempt to take this heterogeneity explicitly into account. Thus, in such circumstances the proposed classifier MDAhet seems the more appropriate classification scheme to use.</p>
         </sec>
         <sec>
            <st>
               <p>Time-dependent negative predictive value analysis</p>
            </st>
            <p>Following the work by Heagerty and colleagues <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, we estimate time-dependent sensitivity <it>SE</it>(<it>t</it>) and specificity <it>SP</it>(<it>t</it>) values using Kaplan-Meier estimators for the predicted subclasses. In our context, we assume that samples have been classified into two groups, so that the predictor <it>X </it>= 1 predicts poor prognosis, while <it>X </it>= 0 predicts good prognosis (ie, the 'good-up' group). Thus,</p>
            <p>
               <display-formula>
                  <m:math name="bcr2138-i7" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtable columnalign="left">
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mi>S</m:mi>
                                       <m:mi>E</m:mi>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>t</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo>=</m:mo>
                                       <m:mfrac>
                                          <m:mrow>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mn>1</m:mn>
                                             <m:mo>&#8722;</m:mo>
                                             <m:msub>
                                                <m:mover accent="true">
                                                   <m:mi>S</m:mi>
                                                   <m:mo>^</m:mo>
                                                </m:mover>
                                                <m:mrow>
                                                   <m:mi>K</m:mi>
                                                   <m:mi>M</m:mi>
                                                </m:mrow>
                                             </m:msub>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>t</m:mi>
                                             <m:mo stretchy="false">|</m:mo>
                                             <m:mi>X</m:mi>
                                             <m:mo>=</m:mo>
                                             <m:mn>1</m:mn>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:mi>p</m:mi>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>X</m:mi>
                                             <m:mo>=</m:mo>
                                             <m:mn>1</m:mn>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:mn>1</m:mn>
                                             <m:mo>&#8722;</m:mo>
                                             <m:msub>
                                                <m:mover accent="true">
                                                   <m:mi>S</m:mi>
                                                   <m:mo>^</m:mo>
                                                </m:mover>
                                                <m:mrow>
                                                   <m:mi>K</m:mi>
                                                   <m:mi>M</m:mi>
                                                </m:mrow>
                                             </m:msub>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>t</m:mi>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                       </m:mfrac>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mi>S</m:mi>
                                       <m:mi>P</m:mi>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>t</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo>=</m:mo>
                                       <m:mfrac>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mover accent="true">
                                                   <m:mi>S</m:mi>
                                                   <m:mo>^</m:mo>
                                                </m:mover>
                                                <m:mrow>
                                                   <m:mi>K</m:mi>
                                                   <m:mi>M</m:mi>
                                                </m:mrow>
                                             </m:msub>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>t</m:mi>
                                             <m:mo stretchy="false">|</m:mo>
                                             <m:mi>X</m:mi>
                                             <m:mo>=</m:mo>
                                             <m:mn>0</m:mn>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:mi>p</m:mi>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>X</m:mi>
                                             <m:mo>=</m:mo>
                                             <m:mn>0</m:mn>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mover accent="true">
                                                   <m:mi>S</m:mi>
                                                   <m:mo>^</m:mo>
                                                </m:mover>
                                                <m:mrow>
                                                   <m:mi>K</m:mi>
                                                   <m:mi>M</m:mi>
                                                </m:mrow>
                                             </m:msub>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>t</m:mi>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                       </m:mfrac>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                           </m:mtable>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuqqRPxAKvMB6bYrY9gDLn3AGiuraeXatLxBI9gBaebbnrfifHhDYfgasaacPi6xNi=xI8qiVKIOFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaqbaeaabiqaaaqaaiabdofatjabdweafjabiIcaOiabdsha0jabiMcaPiabi2da9KqbaoaalaaabaGaeGikaGIaeGymaeJaeyOeI0Iafm4uamLbaKaadaWgaaqaaiabdUealjabd2eanbqabaGaeGikaGIaemiDaqNaeGiFaWNaemiwaGLaeGypa0JaeGymaeJaeGykaKIaeGykaKIaemiCaaNaeGikaGIaemiwaGLaeGypa0JaeGymaeJaeGykaKcabaGaeGymaeJaeyOeI0Iafm4uamLbaKaadaWgaaqaaiabdUealjabd2eanbqabaGaeGikaGIaemiDaqNaeGykaKcaaaGcbaGaem4uamLaemiuaaLaeGikaGIaemiDaqNaeGykaKIaeGypa0tcfa4aaSaaaeaacuWGtbWugaqcamaaBaaabaGaem4saSKaemyta0eabeaacqaIOaakcqWG0baDcqaI8baFcqWGybawcqaI9aqpcqaIWaamcqaIPaqkcqaIPaqkcqWGWbaCcqaIOaakcqWGybawcqaI9aqpcqaIWaamcqaIPaqkaeaacuWGtbWugaqcamaaBaaabaGaem4saSKaemyta0eabeaacqaIOaakcqWG0baDcqaIPaqkaaaaaaaa@7429@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <inline-formula><m:math name="bcr2138-i8" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>S</m:mi><m:mo>^</m:mo></m:mover><m:mrow><m:mi>K</m:mi><m:mi>M</m:mi></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuqqRPxAKvMB6bYrY9gDLn3AGiuraeXatLxBI9gBaebbnrfifHhDYfgasaacPi6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafm4uamLbaKaadaWgaaWcbaGaem4saSKaemyta0eabeaaaaa@317D@</m:annotation></m:semantics></m:math></inline-formula>(<it>t</it>) denotes the Kaplan-Meier estimator for the overall survival function, while <inline-formula><m:math name="bcr2138-i8" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>S</m:mi><m:mo>^</m:mo></m:mover><m:mrow><m:mi>K</m:mi><m:mi>M</m:mi></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuqqRPxAKvMB6bYrY9gDLn3AGiuraeXatLxBI9gBaebbnrfifHhDYfgasaacPi6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafm4uamLbaKaadaWgaaWcbaGaem4saSKaemyta0eabeaaaaa@317D@</m:annotation></m:semantics></m:math></inline-formula>(<it>t</it>|<it>X </it>= <it>c</it>) denotes the Kaplan-Meier survival estimate for the particular subgroup <it>X </it>= <it>c </it>(<it>c </it>= 0, 1) <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. In our context, however, the most important performance measure is the negative predictive value (NPV), since this is the probability that a patient predicted to have a good prognosis does indeed have a good outcome. Adapting the methods used by Heagerty and colleagues <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, we can obtain time-dependent estimates for the NPV and positive predictive value (PPV) simply as:</p>
            <p>
               <display-formula>
                  <m:math name="bcr2138-i9" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtable columnalign="left">
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mi>N</m:mi>
                                       <m:mi>P</m:mi>
                                       <m:mi>V</m:mi>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>t</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo>=</m:mo>
                                       <m:msub>
                                          <m:mover accent="true">
                                             <m:mi>S</m:mi>
                                             <m:mo>^</m:mo>
                                          </m:mover>
                                          <m:mrow>
                                             <m:mi>K</m:mi>
                                             <m:mi>M</m:mi>
                                          </m:mrow>
                                       </m:msub>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>t</m:mi>
                                       <m:mo stretchy="false">|</m:mo>
                                       <m:mi>X</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mn>0</m:mn>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mi>P</m:mi>
                                       <m:mi>P</m:mi>
                                       <m:mi>V</m:mi>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>t</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo>=</m:mo>
                                       <m:mn>1</m:mn>
                                       <m:mo>&#8722;</m:mo>
                                       <m:msub>
                                          <m:mover accent="true">
                                             <m:mi>S</m:mi>
                                             <m:mo>^</m:mo>
                                          </m:mover>
                                          <m:mrow>
                                             <m:mi>K</m:mi>
                                             <m:mi>M</m:mi>
                                          </m:mrow>
                                       </m:msub>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>t</m:mi>
                                       <m:mo stretchy="false">|</m:mo>
                                       <m:mi>X</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mn>1</m:mn>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                           </m:mtable>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuqqRPxAKvMB6bYrY9gDLn3AGiuraeXatLxBI9gBaebbnrfifHhDYfgasaacPi6xNi=xI8qiVKIOFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaqbaeaabiqaaaqaaiabd6eaojabdcfaqjabdAfawjabiIcaOiabdsha0jabiMcaPiabi2da9iqbdofatzaajaWaaSbaaSqaaiabdUealjabd2eanbqabaGccqaIOaakcqWG0baDcqaI8baFcqWGybawcqaI9aqpcqaIWaamcqaIPaqkaeaacqWGqbaucqWGqbaucqWGwbGvcqaIOaakcqWG0baDcqaIPaqkcqaI9aqpcqaIXaqmcqGHsislcuWGtbWugaqcamaaBaaaleaacqWGlbWscqWGnbqtaeqaaOGaeGikaGIaemiDaqNaeGiFaWNaemiwaGLaeGypa0JaeGymaeJaeGykaKcaaaaa@56B2@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
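            <p>The four time-dependent measures defined above can be computed directly from subgroup Kaplan-Meier estimates. The Python sketch below is a minimal illustration under stated assumptions, not the code used in this study: the function names (kaplan_meier, time_dependent_measures) and the eight-patient toy data set are hypothetical, events are coded as 1 for an observed poor-outcome event and 0 for censoring, and the overall survival estimate at the chosen time is assumed to lie strictly between 0 and 1 so that the ratios are defined.</p>

```python
def kaplan_meier(times, events, t):
    """Kaplan-Meier estimate of the survival probability at time t.

    times  : follow-up time of each patient
    events : 1 if the poor-outcome event was observed, 0 if censored
    """
    surv = 1.0
    event_times = sorted(set(ti for ti, ei in zip(times, events) if ei == 1 and t >= ti))
    for u in event_times:
        at_risk = sum(1 for ti in times if ti >= u)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == u and ei == 1)
        surv *= 1.0 - deaths / at_risk
    return surv

def time_dependent_measures(times, events, predicted, t):
    """SE(t), SP(t), NPV(t) and PPV(t) for a binary predictor X (0 = good, 1 = poor)."""
    def subgroup(c):
        idx = [i for i, x in enumerate(predicted) if x == c]
        return [times[i] for i in idx], [events[i] for i in idx]

    s_all = kaplan_meier(times, events, t)   # overall survival estimate
    s0 = kaplan_meier(*subgroup(0), t)       # survival estimate given X = 0
    s1 = kaplan_meier(*subgroup(1), t)       # survival estimate given X = 1
    p1 = sum(1 for x in predicted if x == 1) / len(predicted)
    p0 = 1.0 - p1
    return {
        "SE": (1.0 - s1) * p1 / (1.0 - s_all),
        "SP": s0 * p0 / s_all,
        "NPV": s0,
        "PPV": 1.0 - s1,
    }

# Hypothetical toy data: four patients predicted good (X = 0), four predicted poor (X = 1).
predicted = [0, 0, 0, 0, 1, 1, 1, 1]
times = [2, 5, 6, 7, 1, 3, 4, 6]
events = [0, 1, 0, 0, 1, 1, 0, 1]

measures = time_dependent_measures(times, events, predicted, t=5)
# For these data NPV(5) = 2/3 and PPV(5) = 1/2.
```

            <p>Because the NPV and PPV are read directly off the subgroup survival curves, censored patients contribute to the risk sets up to their last follow-up time rather than being discarded.</p>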
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>The seven-gene immune response module validates in six external cohorts</p>
            </st>
            <p>Applying a feature selection method designed to remove false positives <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> to an integrated expression data set of 186 untreated ER- samples across 5007 genes <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B8">8</abbr><abbr bid="B10">10</abbr></abbrgrp>, we previously identified a total of 22 prognostic genes, seven of which were associated with immune response functions (<it>XCL2, HLA-F, C1QA, TNFRSF17, SPP1, IGLC2, LY9</it>) <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Furthermore, by mapping the seven genes onto those available on two external platforms, we were able to separate two independent untreated populations of ER- breast cancer patients <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B12">12</abbr></abbrgrp> into two subgroups with statistically significant differences in survival outcome <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Specifically, samples overexpressing this module had significantly better clinical outcomes, as measured by absence of a poor outcome event (disease-specific death or the surrogate distant metastasis if the former was unavailable) (Figures <figr fid="F1">1a, b</figr>).</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Heatmaps of seven-gene immune response-modules</p>
               </caption>
               <text>
                  <p><b>Heatmaps of seven-gene immune response-modules</b>. Heatmaps of gene expression of the seven-gene immune response-module for the training and six test cohorts (red = high relative expression, green = low). Samples are clustered into two groups according to the partitioning around medoids algorithm <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> (purple = group overexpressing the immune response-module, yellow = group underexpressing the immune response-module). Clinical outcome as defined by a disease-specific death event (or distant metastasis if the former is not available) is also shown (black = poor, grey = good, white = missing data). Note that in some cases not all seven genes could be mapped to the external platform. C1QA = complement component 1, q subcomponent, A chain; HLA-F = major histocompatibility complex, class I, F; IGLC2 = immunoglobulin lambda constant 2; LY9 = lymphocyte antigen 9; TNFRSF17 = tumour necrosis factor receptor superfamily member 17; SPP1 = secreted phosphoprotein 1 (osteopontin); XCL2 = chemokine (C motif) ligand 2.</p>
               </text>
               <graphic file="bcr2138-1"/>
            </fig>
            <p>These results motivated us to investigate the prognostic role of the immune response-module further in four additional ER- data sets for which gene expression and clinical data were available <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>. Using the same partitioning around medoids algorithm to separate each of these additional independent cohorts into two subgroups, we were able to confirm the prognostic role of the immune response-module across a total of 469 ER- tumours (Figures <figr fid="F1">1c</figr> to <figr fid="F1">1f</figr>). Given that overexpression of the immune response-module consistently identified a good prognosis subgroup of ER- breast cancer, we asked whether we could derive a robust single-sample prognostic predictor.</p>
         </sec>
         <sec>
            <st>
               <p>Deriving the prognostic classifier</p>
            </st>
            <p>To derive a single-sample prognostic classifier, we first applied a mixture discriminant classifier to the same training set of 186 ER- patients across the seven identified genes. The heterogeneity of the good-prognosis phenotype, as shown by the gene expression patterns of the immune response-module (Figure <figr fid="F1">1</figr>), suggested to us that MDA <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> would be an appropriate classification method to use, since it is designed to work for such heterogeneous phenotypes. Specifically, the MDA classifier estimates, from the training data, densities for each of the good and poor prognosis phenotypes as mixtures of two Gaussians (Figure <figr fid="F2">2</figr>). The choice of two Gaussians to model each phenotype was not arbitrary but followed from the application of a variational Bayesian algorithm that infers the optimal number of Gaussians to use <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> (data not shown). Thus, using the training data, patients with a good prognosis were divided into two groups, one with relatively high expression of the immune response-genes (the 'good-up' group) and another with relatively low expression (the 'good-down' group). A similar subdivision was performed for the poor prognosis patients to yield 'poor-up' and 'poor-down' subgroups. The training process involves learning the mean expression vectors, covariance matrices and weights for each of the four subgroups (Table <tblr tid="T1">1</tblr>).</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>The MDA and MDAhet classifier</p>
               </caption>
               <text>
                  <p><b>The MDA and MDAhet classifier</b>. Four two-dimensional projections of the seven-dimensional Mixture Discriminant Analysis (MDA) and Heterogeneous Mixture Discriminant Analysis (MDAhet) classifiers. Scatterplots show projections of the training expression data (183 oestrogen receptor negative samples) onto arbitrarily chosen two-dimensional subspaces spanned by the genes <it>HLA-F </it>and <it>IGLC2</it>, <it>LY9 </it>and <it>TNFRSF17</it>, <it>SPP1 </it>and <it>XCL2</it>, and <it>IGLC2 </it>and <it>C1QA</it>. Codings: black = poor outcome, grey = good outcome, triangle = training samples classified into the good prognosis subgroup defined by overexpression of seven-gene module 'good-up', circle = training samples not classified into 'good-up' group. In addition, the means and covariance-curves of the two Gaussians that approximate each of the poor (black ellipses) and good outcome (grey ellipses) classes are shown. C1QA = complement component 1, q subcomponent, A chain; HLA-F = major histocompatibility complex, class I, F; IGLC2 = immunoglobulin lambda constant 2; LY9 = lymphocyte antigen 9; TNFRSF17 = tumour necrosis factor receptor superfamily member 17; SPP1 = secreted phosphoprotein 1 (osteopontin); XCL2 = chemokine (C motif) ligand 2.</p>
               </text>
               <graphic file="bcr2138-2"/>
            </fig>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>The Heterogeneous Mixture Discriminant Analysis (MDAhet) classifier.</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>
                           <inline-formula>
                              <m:math name="bcr2138-i10" xmlns:m="http://www.w3.org/1998/Math/MathML">
                                 <m:semantics>
                                    <m:mover accent="true">
                                       <m:mi>&#956;</m:mi>
                                       <m:mo>^</m:mo>
                                    </m:mover>
                                    <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuqqRPxAKvMB6bYrY9gDLn3AGiuraeXatLxBI9gBaebbnrfifHhDYfgasaacPi6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafqiVd0MbaKaaaaa@2F96@</m:annotation>
                                 </m:semantics>
                              </m:math>
                           </inline-formula>
                        </p>
                     </c>
                     <c ca="left">
                        <p>good-down</p>
                     </c>
                     <c ca="left">
                        <p>good-up</p>
                     </c>
                     <c ca="left">
                        <p>poor-down</p>
                     </c>
                     <c ca="left">
                        <p>poor-up</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>HLA-F</p>
                     </c>
                     <c ca="left">
                        <p>-0.31</p>
                     </c>
                     <c ca="left">
                        <p>0.65</p>
                     </c>
                     <c ca="left">
                        <p>-0.29</p>
                     </c>
                     <c ca="left">
                        <p>0.40</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IGLC2</p>
                     </c>
                     <c ca="left">
                        <p>-0.56</p>
                     </c>
                     <c ca="left">
                        <p>0.98</p>
                     </c>
                     <c ca="left">
                        <p>-0.46</p>
                     </c>
                     <c ca="left">
                        <p>0.68</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>LY9</p>
                     </c>
                     <c ca="left">
                        <p>-0.29</p>
                     </c>
                     <c ca="left">
                        <p>0.58</p>
                     </c>
                     <c ca="left">
                        <p>-0.52</p>
                     </c>
                     <c ca="left">
                        <p>1.12</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>TNFRSF17</p>
                     </c>
                     <c ca="left">
                        <p>-0.41</p>
                     </c>
                     <c ca="left">
                        <p>0.97</p>
                     </c>
                     <c ca="left">
                        <p>-0.58</p>
                     </c>
                     <c ca="left">
                        <p>0.59</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SPP1</p>
                     </c>
                     <c ca="left">
                        <p>0.01</p>
                     </c>
                     <c ca="left">
                        <p>-0.38</p>
                     </c>
                     <c ca="left">
                        <p>0.47</p>
                     </c>
                     <c ca="left">
                        <p>-0.57</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>XCL2</p>
                     </c>
                     <c ca="left">
                        <p>-0.36</p>
                     </c>
                     <c ca="left">
                        <p>0.67</p>
                     </c>
                     <c ca="left">
                        <p>-0.41</p>
                     </c>
                     <c ca="left">
                        <p>0.58</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>C1QA</p>
                     </c>
                     <c ca="left">
                        <p>-0.39</p>
                     </c>
                     <c ca="left">
                        <p>0.79</p>
                     </c>
                     <c ca="left">
                        <p>-0.40</p>
                     </c>
                     <c ca="left">
                        <p>0.57</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <inline-formula>
                              <m:math name="bcr2138-i11" xmlns:m="http://www.w3.org/1998/Math/MathML">
                                 <m:semantics>
                                    <m:mover accent="true">
                                       <m:mi>&#931;</m:mi>
                                       <m:mo>^</m:mo>
                                    </m:mover>
                                    <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuqqRPxAKvMB6bYrY9gDLn3AGiuraeXatLxBI9gBaebbnrfifHhDYfgasaacPi6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafu4OdmLbaKaaaaa@2F64@</m:annotation>
                                 </m:semantics>
                              </m:math>
                           </inline-formula> &#8733; <it>I</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.74</p>
                     </c>
                     <c ca="left">
                        <p>0.74</p>
                     </c>
                     <c ca="left">
                        <p>0.58</p>
                     </c>
                     <c ca="left">
                        <p>0.58</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><inline-formula><m:math name="bcr2138-i12" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mover accent="true"><m:mi>&#960;</m:mi><m:mo>^</m:mo></m:mover><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuqqRPxAKvMB6bYrY9gDLn3AGiuraeXatLxBI9gBaebbnrfifHhDYfgasaacPi6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafqiWdaNbaKaaaaa@2F9D@</m:annotation></m:semantics></m:math></inline-formula></p>
                     </c>
                     <c ca="left">
                        <p>0.31</p>
                     </c>
                     <c ca="left">
                        <p>0.28</p>
                     </c>
                     <c ca="left">
                        <p>0.32</p>
                     </c>
                     <c ca="left">
                        <p>0.09</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Estimated mean expression profiles <inline-formula><m:math name="bcr2138-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mover accent="true"><m:mi>&#956;</m:mi><m:mo>^</m:mo></m:mover><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuqqRPxAKvMB6bYrY9gDLn3AGiuraeXatLxBI9gBaebbnrfifHhDYfgasaacPi6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafqiVd0MbaKaaaaa@2F96@</m:annotation></m:semantics></m:math></inline-formula>, covariance matrices <inline-formula><m:math name="bcr2138-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mover accent="true"><m:mi>&#931;</m:mi><m:mo>^</m:mo></m:mover><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuqqRPxAKvMB6bYrY9gDLn3AGiuraeXatLxBI9gBaebbnrfifHhDYfgasaacPi6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafu4OdmLbaKaaaaa@2F64@</m:annotation></m:semantics></m:math></inline-formula> and weights <inline-formula><m:math name="bcr2138-i12" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mover accent="true"><m:mi>&#960;</m:mi><m:mo>^</m:mo></m:mover><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuqqRPxAKvMB6bYrY9gDLn3AGiuraeXatLxBI9gBaebbnrfifHhDYfgasaacPi6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafqiWdaNbaKaaaaa@2F9D@</m:annotation></m:semantics></m:math></inline-formula> for the four subgroups, as estimated from the training set. Note that the optimal covariance matrices were all proportional to the identity matrix <inline-formula><m:math name="bcr2138-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mover accent="true"><m:mi>&#931;</m:mi><m:mo>^</m:mo></m:mover><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuqqRPxAKvMB6bYrY9gDLn3AGiuraeXatLxBI9gBaebbnrfifHhDYfgasaacPi6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafu4OdmLbaKaaaaa@2F64@</m:annotation></m:semantics></m:math></inline-formula> &#8733; <it>I </it>and are thus summarised by a single value, the variance of expression of the corresponding cluster. C1QA, complement component 1, q subcomponent, A chain; HLA-F, major histocompatibility complex, class I, F; IGLC2, immunoglobulin lambda constant 2; LY9, lymphocyte antigen 9; TNFRSF17, tumour necrosis factor receptor superfamily member 17; SPP1, secreted phosphoprotein 1 (osteopontin); XCL2, chemokine (C motif) ligand 2.</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Evaluation of the prognostic classifier: MDAhet versus MDA</p>
            </st>
            <p>Once the parameters for each phenotype have been estimated, external samples can be classified by applying the MDA to the test sample's gene expression profile, which yields the probability of the sample belonging to each phenotype class; the class is then assigned by the maximum probability criterion. Since each phenotype class is modelled as a mixture of two Gaussians (Figure <figr fid="F2">2</figr>), class assignment can also be made on the four subclasses. We call this novel variation of MDA MDAhet, because it explicitly takes the heterogeneity of each phenotype into account in the classification process. This variation is crucial because it allows a more reliable identification of good prognosis samples, as measured by the NPV.</p>
            <p>In detail, MDAhet assigns a test sample with a seven-gene expression profile <it>y </it>to one of the four subclasses <it>c </it>(<it>c </it>= 1, 2, 3, 4) using the maximum probability criterion</p>
            <p>
               <display-formula id="M5">
                  <m:math name="bcr2138-i13" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>c</m:mi>
                           <m:mo>=</m:mo>
                           <m:mo stretchy="false">{</m:mo>
                           <m:mi>j</m:mi>
                           <m:mo>:</m:mo>
                           <m:munder>
                              <m:mstyle displaystyle="true">
                                 <m:mrow>
                                    <m:mi>m</m:mi>
                                    <m:mi>a</m:mi>
                                    <m:mi>x</m:mi>
                                 </m:mrow>
                              </m:mstyle>
                              <m:mrow>
                                 <m:mi>j</m:mi>
                                 <m:mo>=</m:mo>
                                 <m:mn>1,2,3,4</m:mn>
                              </m:mrow>
                           </m:munder>
                           <m:mfrac>
                              <m:mrow>
                                 <m:msub>
                                    <m:mover accent="true">
                                       <m:mi>&#960;</m:mi>
                                       <m:mo>^</m:mo>
                                    </m:mover>
                                    <m:mi>j</m:mi>
                                 </m:msub>
                                 <m:mi>G</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>y</m:mi>
                                 <m:mo stretchy="false">|</m:mo>
                                 <m:msub>
                                    <m:mover accent="true">
                                       <m:mi>&#956;</m:mi>
                                       <m:mo>^</m:mo>
                                    </m:mover>
                                    <m:mi>j</m:mi>
                                 </m:msub>
                                 <m:mn>,</m:mn>
                                 <m:msub>
                                    <m:mover accent="true">
                                       <m:mi>&#931;</m:mi>
                                       <m:mo>^</m:mo>
                                    </m:mover>
                                    <m:mi>j</m:mi>
                                 </m:msub>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                              <m:mrow>
                                 <m:mstyle displaystyle="true">
                                    <m:munderover>
                                       <m:mo>&#8721;</m:mo>
                                       <m:mrow>
                                          <m:mi>k</m:mi>
                                          <m:mo>=</m:mo>
                                          <m:mn>1</m:mn>
                                       </m:mrow>
                                       <m:mn>4</m:mn>
                                    </m:munderover>
                                 </m:mstyle>
                                 <m:msub>
                                    <m:mover accent="true">
                                       <m:mi>&#960;</m:mi>
                                       <m:mo>^</m:mo>
                                    </m:mover>
                                    <m:mi>k</m:mi>
                                 </m:msub>
                                 <m:mi>G</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>y</m:mi>
                                 <m:mo stretchy="false">|</m:mo>
                                 <m:msub>
                                    <m:mover accent="true">
                                       <m:mi>&#956;</m:mi>
                                       <m:mo>^</m:mo>
                                    </m:mover>
                                    <m:mi>k</m:mi>
                                 </m:msub>
                                 <m:mn>,</m:mn>
                                 <m:msub>
                                    <m:mover accent="true">
                                       <m:mi>&#931;</m:mi>
                                       <m:mo>^</m:mo>
                                    </m:mover>
                                    <m:mi>k</m:mi>
                                 </m:msub>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mfrac>
                           <m:mo stretchy="false">}</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuqqRPxAKvMB6bYrY9gDLn3AGiuraeXatLxBI9gBaebbnrfifHhDYfgasaacPi6xNi=xI8qiVKIOFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaem4yamMaeGypa0JaeG4EaSNaemOAaOMaeiOoaOZaaybuaeqaleaacqWGQbGAcqaI9aqpcqaIXaqmcqaISaalcqaIYaGmcqaISaalcqaIZaWmcqaISaalcqaI0aanaeqakeaacqGGTbqBcqGGHbqycqGG4baEaaqcfa4aaSaaaeaacuaHapaCgaqcamaaBaaabaGaemOAaOgabeaacqWGhbWrcqaIOaakcqWG5bqEcqaI8baFcuaH8oqBgaqcamaaBaaabaGaemOAaOgabeaacqaISaalcuqHJoWugaqcamaaBaaabaGaemOAaOgabeaacqaIPaqkaeaadaaeWbqabeaacqWGRbWAcqaI9aqpcqaIXaqmaeaacqaI0aanaiabggHiLdGafqiWdaNbaKaadaWgaaqaaiabdUgaRbqabaGaem4raCKaeGikaGIaemyEaKNaeGiFaWNafqiVd0MbaKaadaWgaaqaaiabdUgaRbqabaGaeGilaWIafu4OdmLbaKaadaWgaaqaaiabdUgaRbqabaGaeGykaKcaaOGaeGyFa0haaa@6B40@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>G </it>denotes the seven-dimensional multivariate Gaussian and the parameters <inline-formula><m:math name="bcr2138-i14" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mo stretchy="false">(</m:mo><m:msub><m:mover accent="true"><m:mi>&#956;</m:mi><m:mo>^</m:mo></m:mover><m:mi>j</m:mi></m:msub><m:mn>,</m:mn><m:msub><m:mover accent="true"><m:mi>&#931;</m:mi><m:mo>^</m:mo></m:mover><m:mi>j</m:mi></m:msub><m:mn>,</m:mn><m:msub><m:mover accent="true"><m:mi>&#960;</m:mi><m:mo>^</m:mo></m:mover><m:mi>j</m:mi></m:msub><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuqqRPxAKvMB6bYrY9gDLn3AGiuraeXatLxBI9gBaebbnrfifHhDYfgasaacPi6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeGikaGIafqiVd0MbaKaadaWgaaWcbaGaemOAaOgabeaakiabiYcaSiqbfo6atzaajaWaaSbaaSqaaiabdQgaQbqabaGccqaISaalcuaHapaCgaqcamaaBaaaleaacqWGQbGAaeqaaOGaeGykaKcaaa@3B3A@</m:annotation></m:semantics></m:math></inline-formula> are estimated from the training set (Table <tblr tid="T1">1</tblr>).</p>
            <p>The distribution of samples from the six external cohorts across the four subclasses, as determined by MDAhet, showed that test samples were classified most often into the 'poor-down' and 'good-up' classes (Table <tblr tid="T2">2</tblr>). Since samples falling into the 'good-down' and 'poor-down' categories could not be discriminated in terms of prognosis (a sign that these subclasses are not distinguishable on the basis of the expression of these seven genes), we pooled them in order to compare the predicted proportions more objectively with those estimated from the training set. This revealed that for four cohorts, JRH-2 (8 vs. 16), CAL (13 vs. 33), UNC248 (28 vs. 56) and Loi (13 vs. 27), the 'good-up' group was about half the size of the pooled 'down' group (Table <tblr tid="T2">2</tblr>), which is consistent with the relative proportions estimated from the training set (0.28 vs. 0.63). For the other two cohorts, UPP (14 vs. 20) and Kreike (35 vs. 59), the relative proportions still did not deviate markedly from the training set proportions, although some deviations might be expected due to inherent cohort differences.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Classification of test samples.</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="left">
                        <p>Test cohort</p>
                     </c>
                     <c ca="left">
                        <p>Size</p>
                     </c>
                     <c ca="left">
                        <p>good-down</p>
                     </c>
                     <c ca="left">
                        <p>good-up</p>
                     </c>
                     <c ca="left">
                        <p>poor-down</p>
                     </c>
                     <c ca="left">
                        <p>poor-up</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>UPP</p>
                     </c>
                     <c ca="left">
                        <p>34</p>
                     </c>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c ca="left">
                        <p>14</p>
                     </c>
                     <c ca="left">
                        <p>16</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>JRH-2</p>
                     </c>
                     <c ca="left">
                        <p>24</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>8</p>
                     </c>
                     <c ca="left">
                        <p>11</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>CAL</p>
                     </c>
                     <c ca="left">
                        <p>46</p>
                     </c>
                     <c ca="left">
                        <p>13</p>
                     </c>
                     <c ca="left">
                        <p>13</p>
                     </c>
                     <c ca="left">
                        <p>20</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Kreike</p>
                     </c>
                     <c ca="left">
                        <p>97</p>
                     </c>
                     <c ca="left">
                        <p>18</p>
                     </c>
                     <c ca="left">
                        <p>35</p>
                     </c>
                     <c ca="left">
                        <p>41</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>UNC248</p>
                     </c>
                     <c ca="left">
                        <p>85</p>
                     </c>
                     <c ca="left">
                        <p>28</p>
                     </c>
                     <c ca="left">
                        <p>28</p>
                     </c>
                     <c ca="left">
                        <p>28</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Loi</p>
                     </c>
                     <c ca="left">
                        <p>40</p>
                     </c>
                     <c ca="left">
                        <p>8</p>
                     </c>
                     <c ca="left">
                        <p>13</p>
                     </c>
                     <c ca="left">
                        <p>19</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Distribution of test samples into the four subclasses by the Heterogeneous Mixture Discriminant Analysis (MDAhet) classifier.</p>
               </tblfn>
            </tbl>
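            <p>The pooled comparison described above can be reproduced directly from the counts in Table 2. The short Python sketch below is purely illustrative (the helper name is hypothetical): it pools the 'good-down' and 'poor-down' groups and computes the ratio of the 'good-up' group to the pooled 'down' group for each test cohort and for the training set weights of Table 1.</p>

```python
# Counts from Table 2, in the order (good-down, good-up, poor-down, poor-up).
TABLE2 = {
    "UPP": (4, 14, 16, 0),
    "JRH-2": (5, 8, 11, 0),
    "CAL": (13, 13, 20, 0),
    "Kreike": (18, 35, 41, 3),
    "UNC248": (28, 28, 28, 1),
    "Loi": (8, 13, 19, 0),
}
# Training set mixture weights from Table 1, in the same order.
TRAINING_WEIGHTS = (0.31, 0.28, 0.32, 0.09)


def up_to_down_ratio(good_down, good_up, poor_down, poor_up):
    """Size of the 'good-up' group relative to the pooled 'down' groups."""
    return good_up / (good_down + poor_down)


ratios = {name: up_to_down_ratio(*counts) for name, counts in TABLE2.items()}
training_ratio = up_to_down_ratio(*TRAINING_WEIGHTS)

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name}: good-up / pooled down = {ratio:.2f}")
    print(f"training set: {training_ratio:.2f}")
```

            <p>For JRH-2, CAL, UNC248 and Loi the ratio lies between 0.39 and 0.50, close to the training set value of 0.44; for UPP and Kreike it is 0.70 and 0.59, respectively.</p>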
         </sec>
         <sec>
            <st>
               <p>Validation of MDAhet in external cohorts</p>
            </st>
            <p>To evaluate the performance of the MDAhet classifier in the training and test cohorts, we used several measures and models of prognostic separation, depending on the clinical outcome variable. As binary outcome we used the absence or presence of a disease-specific death event or, if this was not available, of distant metastasis as a surrogate. Since this does not take the time dependence of events into account, binary outcome was also evaluated at four years after surgery, adapting methods for time-dependent receiver operating characteristic (ROC) curve analysis <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. In addition, we considered continuous outcome in full stratified Cox proportional hazards regression models, where stratification was performed on a per cohort basis to take inter-cohort differences in the types of survival data (ie, whether disease-specific survival or distant metastasis) into account.</p>
            <p>Performance indicators based on the binary outcome measures are shown in Table <tblr tid="T3">3</tblr>. The most important performance indicator in our context is the NPV, since this represents the probability that a patient classified as good prognosis does in fact have a good outcome. As shown, the NPV was very high, with average values of 0.8 in the training sets and 0.96 in the test sets (range 0.85 to 1). Indeed, a significant improvement over simple predictions based on <it>a priori </it>known proportions was observed in all test sets (Table <tblr tid="T3">3</tblr>). In line with these results, sensitivity values were also very high, with average values of 0.84 in training sets and 0.94 in test sets (range 0.76 to 1). Results evaluated at four years after surgery were, as expected, not markedly different, indicating that the prognostic classifier performs equally well in terms of short-term survival outcomes (Table <tblr tid="T3">3</tblr>).</p>
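            <p>For readers less familiar with these indicators, the following Python sketch shows how the NPV and sensitivity can be computed from binary predictions and binary outcomes. It assumes the convention that a 'positive' is a poor outcome (an event) and a 'negative' is a good prognosis call; the function name and the toy data are hypothetical and are not taken from any cohort in this study.</p>

```python
import numpy as np


def npv_and_sensitivity(predicted_good, had_event):
    """NPV and sensitivity, treating a poor outcome (event) as 'positive'.

    predicted_good: True where the classifier calls good prognosis.
    had_event:      True where the poor outcome was actually observed.
    Assumes at least one good prognosis call and at least one event.
    """
    predicted_good = np.asarray(predicted_good, dtype=bool)
    had_event = np.asarray(had_event, dtype=bool)
    predicted_poor = np.logical_not(predicted_good)
    no_event = np.logical_not(had_event)

    true_neg = np.sum(np.logical_and(predicted_good, no_event))
    false_neg = np.sum(np.logical_and(predicted_good, had_event))
    true_pos = np.sum(np.logical_and(predicted_poor, had_event))

    # NPV: fraction of good prognosis calls that had no event.
    npv = true_neg / (true_neg + false_neg)
    # Sensitivity: fraction of events that were called poor prognosis.
    sensitivity = true_pos / (true_pos + false_neg)
    return float(npv), float(sensitivity)


if __name__ == "__main__":
    # Toy data for ten hypothetical patients.
    predicted_good = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    had_event = [0, 0, 0, 1, 1, 1, 0, 0, 1, 0]
    print(npv_and_sensitivity(predicted_good, had_event))
```

            <p>Note that the two indicators share the false negative count: every patient with an event who is called good prognosis lowers both the NPV and the sensitivity.</p>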
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Performance measures of the seven-gene Heterogeneous Mixture Discriminant Analysis (MDAhet) classifier.</p>
               </caption>
               <tblbdy cols="8">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Training set</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2" ca="left">
                        <p>Test sets</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cohort</p>
                     </c>
                     <c ca="left">
                        <p>NKI2+EMC+NCH</p>
                     </c>
                     <c ca="left">
                        <p>UPP</p>
                     </c>
                     <c ca="left">
                        <p>JRH-2</p>
                     </c>
                     <c ca="left">
                        <p>CAL</p>
                     </c>
                     <c ca="left">
                        <p>Kreike</p>
                     </c>
                     <c ca="left">
                        <p>UNC248</p>
                     </c>
                     <c ca="left">
                        <p>Loi</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cohort size</p>
                     </c>
                     <c ca="left">
                        <p>186</p>
                     </c>
                     <c ca="left">
                        <p>34</p>
                     </c>
                     <c ca="left">
                        <p>24</p>
                     </c>
                     <c ca="left">
                        <p>46</p>
                     </c>
                     <c ca="left">
                        <p>97</p>
                     </c>
                     <c ca="left">
                        <p>85</p>
                     </c>
                     <c ca="left">
                        <p>40</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Annotated</p>
                     </c>
                     <c ca="left">
                        <p>183</p>
                     </c>
                     <c ca="left">
                        <p>31</p>
                     </c>
                     <c ca="left">
                        <p>24</p>
                     </c>
                     <c ca="left">
                        <p>46</p>
                     </c>
                     <c ca="left">
                        <p>71</p>
                     </c>
                     <c ca="left">
                        <p>80</p>
                     </c>
                     <c ca="left">
                        <p>34</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Good prognosis (%)</p>
                     </c>
                     <c ca="left">
                        <p>59</p>
                     </c>
                     <c ca="left">
                        <p>81</p>
                     </c>
                     <c ca="left">
                        <p>75</p>
                     </c>
                     <c ca="left">
                        <p>67</p>
                     </c>
                     <c ca="left">
                        <p>76</p>
                     </c>
                     <c ca="left">
                        <p>74</p>
                     </c>
                     <c ca="left">
                        <p>76</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Poor prognosis (%)</p>
                     </c>
                     <c ca="left">
                        <p>41</p>
                     </c>
                     <c ca="left">
                        <p>19</p>
                     </c>
                     <c ca="left">
                        <p>25</p>
                     </c>
                     <c ca="left">
                        <p>33</p>
                     </c>
                     <c ca="left">
                        <p>24</p>
                     </c>
                     <c ca="left">
                        <p>26</p>
                     </c>
                     <c ca="left">
                        <p>24</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Chemotherapy (%)</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>67</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>66</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>MDA</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>NPV (%)</p>
                     </c>
                     <c ca="left">
                        <p>74</p>
                     </c>
                     <c ca="left">
                        <p>92</p>
                     </c>
                     <c ca="left">
                        <p>93</p>
                     </c>
                     <c ca="left">
                        <p>69</p>
                     </c>
                     <c ca="left">
                        <p>83</p>
                     </c>
                     <c ca="left">
                        <p>74</p>
                     </c>
                     <c ca="left">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PPV (%)</p>
                     </c>
                     <c ca="left">
                        <p>55</p>
                     </c>
                     <c ca="left">
                        <p>28</p>
                     </c>
                     <c ca="left">
                        <p>56</p>
                     </c>
                     <c ca="left">
                        <p>35</p>
                     </c>
                     <c ca="left">
                        <p>29</p>
                     </c>
                     <c ca="left">
                        <p>27</p>
                     </c>
                     <c ca="left">
                        <p>40</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SE (%)</p>
                     </c>
                     <c ca="left">
                        <p>69</p>
                     </c>
                     <c ca="left">
                        <p>83</p>
                     </c>
                     <c ca="left">
                        <p>83</p>
                     </c>
                     <c ca="left">
                        <p>53</p>
                     </c>
                     <c ca="left">
                        <p>71</p>
                     </c>
                     <c ca="left">
                        <p>38</p>
                     </c>
                     <c ca="left">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SP (%)</p>
                     </c>
                     <c ca="left">
                        <p>61</p>
                     </c>
                     <c ca="left">
                        <p>48</p>
                     </c>
                     <c ca="left">
                        <p>78</p>
                     </c>
                     <c ca="left">
                        <p>52</p>
                     </c>
                     <c ca="left">
                        <p>44</p>
                     </c>
                     <c ca="left">
                        <p>63</p>
                     </c>
                     <c ca="left">
                        <p>54</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>MDAhet</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><b>NPV </b>(%)</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>80</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>100</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>100</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>100</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>85</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>92</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>100</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PPV (%)</p>
                     </c>
                     <c ca="left">
                        <p>51</p>
                     </c>
                     <c ca="left">
                        <p>30</p>
                     </c>
                     <c ca="left">
                        <p>37</p>
                     </c>
                     <c ca="left">
                        <p>45</p>
                     </c>
                     <c ca="left">
                        <p>29</p>
                     </c>
                     <c ca="left">
                        <p>36</p>
                     </c>
                     <c ca="left">
                        <p>35</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><b>SE </b>(%)</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>84</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>100</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>100</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>100</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>76</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>90</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>100</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SP (%)</p>
                     </c>
                     <c ca="left">
                        <p>44</p>
                     </c>
                     <c ca="left">
                        <p>44</p>
                     </c>
                     <c ca="left">
                        <p>44</p>
                     </c>
                     <c ca="left">
                        <p>42</p>
                     </c>
                     <c ca="left">
                        <p>41</p>
                     </c>
                     <c ca="left">
                        <p>42</p>
                     </c>
                     <c ca="left">
                        <p>42</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><b>NPV at 4 years </b>(%)</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>83</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>100</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>100</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>100</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>88</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>93</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>100</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PPV at 4 years (%)</p>
                     </c>
                     <c ca="left">
                        <p>42</p>
                     </c>
                     <c ca="left">
                        <p>24</p>
                     </c>
                     <c ca="left">
                        <p>33</p>
                     </c>
                     <c ca="left">
                        <p>35</p>
                     </c>
                     <c ca="left">
                        <p>25</p>
                     </c>
                     <c ca="left">
                        <p>45</p>
                     </c>
                     <c ca="left">
                        <p>35</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><b>SE at 4 years </b>(%)</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>83</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>100</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>100</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>100</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>79</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>88</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>100</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SP at 4 years (%)</p>
                     </c>
                     <c ca="left">
                        <p>44</p>
                     </c>
                     <c ca="left">
                        <p>42</p>
                     </c>
                     <c ca="left">
                        <p>43</p>
                     </c>
                     <c ca="left">
                        <p>37</p>
                     </c>
                     <c ca="left">
                        <p>40</p>
                     </c>
                     <c ca="left">
                        <p>45</p>
                     </c>
                     <c ca="left">
                        <p>43</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>LN</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>NPV (%)</p>
                     </c>
                     <c ca="left">
                        <p>61</p>
                     </c>
                     <c ca="left">
                        <p>84</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                     <c ca="left">
                        <p>85</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                     <c ca="left">
                        <p>85</p>
                     </c>
                     <c ca="left">
                        <p>76</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PPV (%)</p>
                     </c>
                     <c ca="left">
                        <p>50</p>
                     </c>
                     <c ca="left">
                        <p>30</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                     <c ca="left">
                        <p>46</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                     <c ca="left">
                        <p>37</p>
                     </c>
                     <c ca="left">
                        <p>0<sup>a</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SE (%)</p>
                     </c>
                     <c ca="left">
                        <p>27</p>
                     </c>
                     <c ca="left">
                        <p>50</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                     <c ca="left">
                        <p>80</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                     <c ca="left">
                        <p>71</p>
                     </c>
                     <c ca="left">
                        <p>0<sup>a</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SP (%)</p>
                     </c>
                     <c ca="left">
                        <p>81</p>
                     </c>
                     <c ca="left">
                        <p>70</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                     <c ca="left">
                        <p>55</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                     <c ca="left">
                        <p>58</p>
                     </c>
                     <c ca="left">
                        <p>100<sup>a</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>NPV at 4 years (%)</p>
                     </c>
                     <c ca="left">
                        <p>67</p>
                     </c>
                     <c ca="left">
                        <p>88</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                     <c ca="left">
                        <p>90</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                     <c ca="left">
                        <p>82</p>
                     </c>
                     <c ca="left">
                        <p>77</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PPV at 4 years (%)</p>
                     </c>
                     <c ca="left">
                        <p>39</p>
                     </c>
                     <c ca="left">
                        <p>37</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                     <c ca="left">
                        <p>38</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                     <c ca="left">
                        <p>47</p>
                     </c>
                     <c ca="left">
                        <p>0<sup>a</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SE at 4 years (%)</p>
                     </c>
                     <c ca="left">
                        <p>25</p>
                     </c>
                     <c ca="left">
                        <p>84</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                     <c ca="left">
                        <p>85</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                     <c ca="left">
                        <p>69</p>
                     </c>
                     <c ca="left">
                        <p>0<sup>a</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SP at 4 years (%)</p>
                     </c>
                     <c ca="left">
                        <p>80</p>
                     </c>
                     <c ca="left">
                        <p>74</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                     <c ca="left">
                        <p>53</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                     <c ca="left">
                        <p>60</p>
                     </c>
                     <c ca="left">
                        <p>100<sup>a</sup></p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p><sup>a</sup>Loi's cohort consists only of LN- samples. The table summarises performance indicators of the seven-gene MDAhet classifier and lymph node status (LN) across oestrogen receptor negative (ER-) training and test sets. For each cohort, we also give the number of tumours (cohort size), number of clinically annotated tumours (annotated), the percentage of good and poor prognosis patients (as defined by disease-specific death or distant metastasis event) and the percentage of patients treated with chemotherapy. NPV, PPV, SE and SP are evaluated at four years and at end of study. NPV, negative predictive value (precision for good prognosis); PPV, positive predictive value (precision for poor prognosis); SE, sensitivity; SP, specificity.</p>
               </tblfn>
            </tbl>
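            <p>As a reading aid for Table 3, the four binary performance indicators can be sketched from a two-by-two table of predicted versus observed outcome. The sketch below follows the convention in the table footnote, where 'positive' means poor prognosis, so NPV is the precision for good prognosis. The function name and the counts are hypothetical illustrations and are not taken from any of the cohorts.</p>
            <p>
```python
def predictive_values(tp, fp, tn, fn):
    """Return NPV, PPV, SE and SP in percent.

    Convention (as in the table footnote): 'positive' = poor prognosis.
    tp: poor outcome, predicted poor    fp: good outcome, predicted poor
    tn: good outcome, predicted good    fn: poor outcome, predicted good
    """
    def pct(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return 100.0 * num / den if den else float("nan")

    return {
        "NPV": pct(tn, tn + fn),  # precision for good prognosis
        "PPV": pct(tp, tp + fp),  # precision for poor prognosis
        "SE": pct(tp, tp + fn),   # sensitivity
        "SP": pct(tn, tn + fp),   # specificity
    }


# Hypothetical counts, for illustration only (not study data).
example = predictive_values(tp=9, fp=16, tn=14, fn=1)
for name, value in example.items():
    print(name, round(value, 1))
```
            </p>
            <p>With these illustrative counts, almost every sample predicted to have a good prognosis does have a good outcome, which gives a high NPV, even though many good-outcome samples are predicted as poor, which gives a modest specificity.</p>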
            <p>Stratified Cox-regression models further confirmed the much better prognosis of the predicted subclass overexpressing the immune response-module relative to samples classified as poor prognosis (Table <tblr tid="T4">4</tblr>). Specifically, samples classified as good prognosis with overexpression of the immune response-module ('good-up' group) have less than half the risk of a poor outcome event (death or distant metastasis) relative to samples classified as poor prognosis, a result that we found to be independent of LN status and chemotherapy (Table <tblr tid="T4">4</tblr>). Note that four of the test cohorts, like the training set itself, were untreated (no chemotherapy) populations (Table <tblr tid="T3">3</tblr>), confirming the prognostic relevance of the classifier, and that chemotherapy itself was not prognostic in the two partially treated populations (Table <tblr tid="T4">4</tblr>).</p>
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Stratified Cox-regression model of seven-gene Heterogeneous Mixture Discriminant Analysis (MDAhet) classifier</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Training set</p>
                     </c>
                     <c ca="left">
                        <p>Combined test set</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Annotated</p>
                     </c>
                     <c ca="left">
                        <p>183</p>
                     </c>
                     <c ca="left">
                        <p>286</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>MDAhet</p>
                     </c>
                     <c ca="left">
                        <p>0.29 (0.16&#8211;0.56) <it>p </it>= 0.0002</p>
                     </c>
                     <c ca="left">
                        <p>0.15 (0.07&#8211;0.36) <it>p </it>&lt; 0.000001</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>LN</p>
                     </c>
                     <c ca="left">
                        <p>1.31 (0.73&#8211;2.33) <it>p </it>= 0.36</p>
                     </c>
                     <c ca="left">
                        <p>3.25 (1.61&#8211;6.58) <it>p </it>= 0.001</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>CT</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                     <c ca="left">
                        <p>0.68 (0.34&#8211;1.39) <it>p </it>= 0.29</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>LN+MDAhet</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>MDAhet</p>
                     </c>
                     <c ca="left">
                        <p>0.29 (0.15&#8211;0.55) <it>p </it>= 0.0002</p>
                     </c>
                     <c ca="left">
                        <p>0.06 (0.01&#8211;0.27) <it>p </it>= 0.0002</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>LN</p>
                     </c>
                     <c ca="left">
                        <p>1.59 (0.81&#8211;3.11) <it>p </it>= 0.18</p>
                     </c>
                     <c ca="left">
                        <p>3.68 (1.32&#8211;10.13) <it>p </it>= 0.012</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>CT+MDAhet</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>MDAhet</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                     <c ca="left">
                        <p>0.27 (0.15&#8211;0.48) <it>p </it>= 0.00001</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>CT</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                     <c ca="left">
                        <p>0.76 (0.27&#8211;2.13) <it>p </it>= 0.6</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Stratified Cox-proportional hazards regression performance of the seven-gene MDAhet classifier, lymph node status (LN) and chemotherapy (CT) across oestrogen receptor-negative training and test sets, with strata defined by cohorts. For the univariate analysis, hazard ratios (HR), 95% confidence intervals (CI) and LR-test p-values are given. In the multivariate models, the p-values quoted are from the corresponding Wald test.</p>
               </tblfn>
            </tbl>
            <p>Kaplan-Meier survival curves stratified according to the type of survival data (disease-specific death or distant metastasis) further confirmed the better prognosis of the predicted 'good-up' group (Figure <figr fid="F3">3</figr>). These survival curves further show that the classifier in the test sets is unable to discriminate the good prognosis samples that do not overexpress the immune response-module ('good-down') from the poor outcome samples. This result is expected since the seven-gene module is hypothesised to only identify a particular subgroup of good prognosis <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Kaplan-Meier curves for MDAhet classifier</p>
               </caption>
               <text>
                  <p><b>Kaplan-Meier curves for MDAhet classifier</b>. Kaplan-Meier survival curves for the three subclasses 'good-down' (light green), 'good-up' (dark green) and 'poor-down' (blue), as predicted by the Heterogeneous Mixture Discriminant Analysis (MDAhet) classifier, in the training and combined test cohorts. The class 'poor-up' is not shown due to small sample size (Table 2). Hazard ratios (HR), 95% confidence intervals (CI) and log-rank test p-values are given for the predicted 'good-up' class relative to the predicted poor prognostic classes, as given by a stratified Cox-regression model with strata defined by cohorts. The Kaplan-Meier curves for each subclass are shown separately for disease-specific survival (solid lines) and distant metastasis (broken lines).</p>
               </text>
               <graphic file="bcr2138-3"/>
            </fig>
            <p>Since the maximum probability criterion assigns test samples to classes without regard to how large the maximal posterior class probabilities are, we tested the robustness of our results by only classifying samples passing a minimum probability threshold. For a probability threshold of 0.3 (already significant compared with the minimum possible maximal probability of 1/4 = 0.25), 94% of all test samples passed this threshold, indicating that our results are indeed robust. For a threshold of 0.4, we found that 68% of samples were classifiable and results were still in line with those reported for the minimum threshold of 0.25 (data not shown).</p>
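            <p>The thresholded version of the maximum probability criterion described above can be sketched as follows. This is a minimal illustration only: it assumes four classes (so that the smallest possible maximal posterior is 1/4), it assumes that a sample passes when its maximal posterior is at least equal to the threshold, and the posterior values and function names are hypothetical rather than output of the MDAhet classifier.</p>
            <p>
```python
# Four subclasses, so the smallest possible maximal posterior is 1/4.
CLASSES = ("good-up", "good-down", "poor-up", "poor-down")


def classify(posteriors, threshold=0.25):
    """Assign the class with the largest posterior probability.

    Returns None (unclassifiable) when the largest posterior falls
    below the threshold. Using 'at least equal to' is an assumption.
    """
    best = max(range(len(posteriors)), key=lambda i: posteriors[i])
    if posteriors[best] >= threshold:
        return CLASSES[best]
    return None


def fraction_classifiable(samples, threshold):
    """Fraction of samples whose maximal posterior passes the threshold."""
    labels = [classify(p, threshold) for p in samples]
    return sum(label is not None for label in labels) / len(labels)


# Hypothetical posterior vectors, for illustration only (not study data).
samples = [
    (0.70, 0.10, 0.05, 0.15),  # confident 'good-up' call
    (0.35, 0.30, 0.05, 0.30),  # weak 'good-up' call
    (0.28, 0.26, 0.22, 0.24),  # nearly uninformative
]

for threshold in (0.25, 0.3, 0.4):
    print(threshold, fraction_classifiable(samples, threshold))
```
            </p>
            <p>Raising the threshold can only reduce the fraction of classifiable samples, so the robustness check amounts to asking whether the conclusions hold on the progressively smaller, more confidently classified subsets.</p>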
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Based on the seven genes we had identified previously as defining an immune response-related prognostic module in ER- breast cancer, we have now constructed a single-sample classifier and have validated it in six external, independent ER- cohorts, four of which were untreated populations. Remarkably, we find that overexpression of this immune response-module considerably reduces the risk of disease-specific death or distant metastasis in both untreated and partially treated ER- populations (HR = 0.15; 95% confidence interval 0.07 to 0.36; <it>p </it>&lt; 10<sup>-6</sup>) (Table <tblr tid="T4">4</tblr>). Importantly, we also found that this association is independent of LN status (Table <tblr tid="T4">4</tblr>). In terms of binary outcome measures, the classifier shows clinical promise with consistently high NPV values across all test cohorts, even when time-dependent outcome measures are taken into account (Table <tblr tid="T3">3</tblr>). For example, the NPV and sensitivity values at four years after surgery were 100% in four of the six cohorts and in all cases larger than 85%. Thus, the classifier could potentially be used for identifying high-grade ER- patients that may benefit from a less aggressive course of chemotherapy, or from none at all.</p>
         <p>The remarkably high NPV values in the test cohorts, however, raise some important questions. First, we found that the performance in the test sets was better than in the training set (Tables <tblr tid="T3">3</tblr> and <tblr tid="T4">4</tblr>). While this is true for the NPV analysis, the Cox-regression analysis also shows that the 95% confidence intervals (CI) are overlapping. Therefore, statistically, there is no discrepancy. In any case, a plausible explanation for why the performance is slightly worse in the training set could be related to the merging step involved in building the training set <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. By merging different microarray expression sets together we gain power from the considerable increase in sample size; however, merging may also compromise the accuracy of the expression profiles, because these need to be renormalised before merging is performed <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Therefore, it is entirely plausible that small errors in the merging procedure may have affected the classifier's performance in the training set. In this context it is important to point out that the training set is only used to derive a classifier and that the gold-standard evaluation of any classifier is determined by its performance in the test cohorts <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. As shown here, the MDAhet classifier is strongly prognostic across six totally independent breast cancer cohorts profiled on different array platforms.</p>
         <p>A second important point relates to the nature of the MDAhet classifier. As remarked in a previous study <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, in the context of validating gene expression signatures across different array platforms, some renormalisation is inevitable. Thus, our MDAhet classifier is not strictly speaking a single-sample predictor because the gen