Patients and Samples
US Oncology 02-103 was a single-arm neoadjuvant trial involving women with stage II/III breast cancer. Patients with Human Epidermal Growth Factor Receptor 2 (HER2)-negative cancer received FEC 100, which consisted of 5-fluorouracil (Adrucil [5-FU], 500 mg/m2) + epirubicin (Ellence, 100 mg/m2) + cyclophosphamide (Cytoxan, 500 mg/m2) IV on Day 1 every 21 days (x 4 cycles, 12 weeks total) followed by wTX, Taxotere (35 mg/m2) weekly Days 1 and 8 every 21 days (x 4 cycles, 12 weeks total) (FEC/TX group). Patients with HER2-positive tumors received FEC 75, which consisted of 5-fluorouracil (Adrucil [5-FU], 500 mg/m2) + epirubicin (Ellence, 75 mg/m2) + cyclophosphamide (Cytoxan, 500 mg/m2) IV on Day 1 every 21 days with trastuzumab (Herceptin) 4 mg/kg IV x 1 as initial loading dose on Day 1 followed by 2 mg/kg IV weekly x 12 (x 4 cycles, 12 weeks total), followed by wTX, Taxotere (35 mg/m2) weekly Days 1 and 8 every 21 days (x 4 cycles, 12 weeks total) and Herceptin 2 mg/kg weekly x 12 (FEC/TX plus H group). HER2 status was assessed by immunohistochemistry (IHC) or fluorescence in situ hybridization (FISH). IHC ≥3+ was considered positive and IHC 1+ or 2+ was confirmed by FISH. The primary study endpoint was pathologic complete response (pCR) rate, defined as no viable invasive cancer in the breast and lymph nodes after completion of neoadjuvant chemotherapy. The US Oncology 02-103 clinical trial was approved by the institutional review board of the practice group, and all patients provided written informed consent to participate in the therapeutic trial and to provide a specimen for genomic analysis of the cancer. Pre-treatment fine-needle aspiration (FNA) specimens were obtained, immediately placed in RNAlater (Ambion, Austin, TX), and shipped to the University of Texas MD Anderson Cancer Center (UTMDACC) for RNA extraction and gene expression profiling with Affymetrix HU133A gene chips (Affymetrix, Santa Clara, CA) as described previously.
Tissue analysis was approved by the Institutional Review Board of UTMDACC. Full gene expression data are available at the Gene Expression Omnibus under accession number GSE42822.
In vitro chemosensitivity testing of breast cancer cell lines
Forty-two breast cancer cell lines were obtained from ATCC (Manassas, VA) or DSMZ (Braunschweig, Germany). Cell lines were selected primarily because expression data for them were publicly available, they could be purchased commercially, and they were compatible with an in vitro chemosensitivity assay. All cell lines were maintained in RPMI 1640 (Mediatech, Herndon, VA) containing 10% FBS (HyClone, Logan, UT) at 37°C in 5% CO2. Upon reaching approximately 80% confluence, each cell line was trypsinized and seeded into 384-well microtiter plates (Corning, Lowell, MA) for in vitro chemotherapy sensitivity testing.
Cell lines were treated with the combination of docetaxel (0.1 nM – 25 nM), 5-fluorouracil (0.1 μM – 50 μM), epirubicin (0.7 nM – 13.5 μM) and preactivated cyclophosphamide (4-hydroperoxycyclophosphamide) (0.2 μM – 13.6 μM) (TFEC) to simulate the treatment protocol of FEC followed by TX in the US Oncology 02-103 clinical trial. Ten serial dilutions prepared in 10% RPMI 1640 media, along with a media control that did not contain drugs, were added in triplicate to each cell line. The combination treatment was composed of equal volumes of each drug at each dose number, i.e. combination treatment dose 1 contained equal volumes of the dose 1 concentration of each component drug, combination treatment dose 2 contained equal volumes of the dose 2 concentrations of each component drug, and so on. The clinical formulation of docetaxel was used, which was supplied as a concentrated solution of docetaxel in polysorbate 80 plus a vial of diluent (13% w/w ethanol in water for injection). Cells were incubated for 72 hours at 37°C in 5% CO2. Non-adherent cells and medium were then removed from each well. The remaining adherent cells were fixed in 95% ethanol and stained with DAPI (Molecular Probes, Eugene, OR). A proprietary automated microscope was used to count the number of stained cells remaining after drug treatment. A survival fraction (SF) at dose i (i = 1, 2, …, 10) was calculated as $SF_i = \mathrm{mean}_{\mathrm{treated},i} / \mathrm{mean}_{\mathrm{control}}$, where $\mathrm{mean}_{\mathrm{treated},i}$ is the average number of surviving cells in the drug-treated wells at dose i, and $\mathrm{mean}_{\mathrm{control}}$ is the average number of living cells in the control wells. The area under the dose–response curve, which is the summation of SF values over the 10 doses, $\mathrm{AUC} = \sum_{i=1}^{10} SF_i$, was used to quantify the sensitivity of each cell line to TFEC treatment, with a lower AUC score indicating greater sensitivity.
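The survival fraction and AUC calculation described above can be illustrated with a minimal Python sketch. The function names and the cell counts are hypothetical and chosen only for illustration; the sketch assumes triplicate wells per dose and a single set of drug-free control wells per cell line.

```python
from statistics import mean


def survival_fractions(treated_counts, control_counts):
    """Return one survival fraction per dose.

    treated_counts: list of per-dose lists of cell counts (replicate wells).
    control_counts: cell counts from the drug-free control wells.
    """
    control_mean = mean(control_counts)
    return [mean(wells) / control_mean for wells in treated_counts]


def auc_score(sf_values):
    """Sum of survival fractions over all doses (lower = more sensitive)."""
    return sum(sf_values)


# Hypothetical counts: 3 control wells and 10 doses in triplicate.
control = [980, 1000, 1020]
treated = [[m - 10, m, m + 10] for m in range(1000, 0, -100)]

sf = survival_fractions(treated, control)
auc = auc_score(sf)
print([round(v, 2) for v in sf], round(auc, 2))
```

Because the score is a plain sum over ten doses, it ranges from 0 (no surviving cells at any dose) to roughly 10 (no effect at any dose), so AUC values are comparable across cell lines only when the same ten dose levels are used.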
Development of the TFEC-MGP
Gene expression profiles for the 42 breast cancer cell lines, generated with the Affymetrix HG-U133 Plus 2.0 Array (Affymetrix, Santa Clara, CA), were downloaded from the Gene Expression Omnibus database (accession number GSE12777). The RMAExpress V1.05 software package (http://rmaexpress.bmbolstad.com) was used to generate probe level intensities with the following operating parameters: Background adjust: Yes; Normalization: Quantile; Summarization method: Probe level model. The probe level intensities were log2-transformed before further analyses. Non-specific filtering was applied to remove probe sets having small variation (interquartile range < 0.5) or low expression values (median < log2(100)) across all cell lines. The expression values were then standardized to mean zero and standard deviation one for each cell line.
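The non-specific filtering and per-cell-line standardization can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: the probe names and expression values are invented, the quartile convention (`method="inclusive"`) and the use of the sample standard deviation are assumptions, and the input is taken to be already log2-transformed.

```python
from math import log2
from statistics import mean, median, quantiles, stdev


def iqr(values):
    """Interquartile range; the quartile convention is an assumption."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def filter_probe_sets(expr, min_iqr=0.5, min_median=log2(100)):
    """Drop probe sets with IQR < min_iqr or median < min_median.

    expr: dict mapping probe-set ID -> log2 values across cell lines.
    """
    return {
        probe: values
        for probe, values in expr.items()
        if iqr(values) >= min_iqr and median(values) >= min_median
    }


def standardize_per_cell_line(expr):
    """Scale each cell line (column) to mean 0 and SD 1 across probe sets."""
    probes = list(expr)
    n_lines = len(expr[probes[0]])
    out = {probe: [] for probe in probes}
    for j in range(n_lines):
        column = [expr[probe][j] for probe in probes]
        mu, sd = mean(column), stdev(column)
        for probe in probes:
            out[probe].append((expr[probe][j] - mu) / sd)
    return out


# Hypothetical log2 expression values for four cell lines.
expr = {
    "probe_a": [7.0, 8.0, 9.0, 10.0],   # kept
    "probe_b": [8.0, 8.1, 8.2, 8.3],    # removed: IQR < 0.5
    "probe_c": [3.0, 4.0, 5.0, 6.0],    # removed: median < log2(100)
    "probe_d": [9.0, 7.0, 11.0, 8.0],   # kept
}
kept = filter_probe_sets(expr)
z = standardize_per_cell_line(kept)
print(sorted(kept))
```

Filtering is applied before standardization here, following the order in which the two steps are described, so the mean and standard deviation of each cell line are computed over the retained probe sets only.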
The MGP was developed based on supervised principal components regression [17, 18] and implemented using the Superpc V1.05 software package (http://www-stat.stanford.edu/~tibs/superpc) under the programming environment R 2.11.1 (http://www.r-project.org). Code is provided in Additional file 1. Briefly, univariate linear regression analysis was first conducted to calculate the association between the cell lines’ AUC scores derived from the dose response curves and the expression values for each probe set. Probe sets with a regression coefficient larger than the threshold (1.8) estimated by 10-fold cross-validation were selected and their expression values were used for principal component analysis. The first principal component was then chosen as an independent variable in a linear regression model to predict the patient’s chemotherapy response. A lower prediction score corresponds to greater chemotherapy sensitivity and therefore a higher likelihood of achieving pCR. CEL files from cancer biopsies were provided to PTI by UTMDACC without any accompanying clinical information. These array data were processed by RMA using the same procedure as the one used for cell lines. Prediction scores were calculated by investigators at PTI and returned to collaborators at UTMDACC to calculate AU-ROC curves and compare scores between patients with pCR versus residual disease (RD) response outcome.
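The general shape of supervised principal components regression can be illustrated with a small pure-Python sketch. It is not the Superpc implementation and does not reproduce the code in Additional file 1: the scoring formula (a standardized univariate regression score), the power-iteration PCA, all function names, and the toy data are assumptions made for illustration, and the threshold of 1.8 is reused only as an example value rather than re-estimated by cross-validation.

```python
from math import sqrt
from statistics import mean


def univariate_scores(X, y):
    """Standardized univariate regression score per probe set (assumed form)."""
    y_mean = mean(y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mu = mean(col)
        xc = [v - mu for v in col]
        norm = sqrt(sum(v * v for v in xc))
        dot = sum(a * (b - y_mean) for a, b in zip(xc, y))
        scores.append(dot / norm if norm else 0.0)
    return scores


def first_pc(Xc, n_iter=200):
    """Leading principal axis of centered data via power iteration."""
    # Start from the data row with the largest norm (a simple heuristic).
    v = list(max(Xc, key=lambda row: sum(a * a for a in row)))
    for _ in range(n_iter):
        proj = [sum(a * b for a, b in zip(row, v)) for row in Xc]
        w = [sum(p * row[j] for p, row in zip(proj, Xc)) for j in range(len(v))]
        norm = sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    return v


def fit_superpc(X, y, threshold):
    """Select probe sets by score, then regress y on their first PC."""
    scores = univariate_scores(X, y)
    keep = [j for j, s in enumerate(scores) if abs(s) > threshold]
    if not keep:
        raise ValueError("no probe set passes the threshold")
    mus = [mean(row[j] for row in X) for j in keep]
    Xc = [[row[j] - m for j, m in zip(keep, mus)] for row in X]
    loading = first_pc(Xc)
    pc = [sum(a * b for a, b in zip(row, loading)) for row in Xc]
    # Least squares of y on the (mean-zero) first principal component.
    slope = sum(p * t for p, t in zip(pc, y)) / sum(p * p for p in pc)
    return {"keep": keep, "mus": mus, "loading": loading,
            "slope": slope, "intercept": mean(y)}


def predict(model, x):
    """Predicted AUC-like score for one new expression profile."""
    pc = sum((x[j] - m) * w
             for j, m, w in zip(model["keep"], model["mus"], model["loading"]))
    return model["intercept"] + model["slope"] * pc


# Toy data: 6 cell lines x 3 probe sets; y plays the role of the AUC score.
X = [[1.0, 2.2, 0.5], [2.1, 3.9, -0.4], [2.9, 6.1, 0.3],
     [4.2, 8.0, -0.5], [5.0, 9.8, 0.4], [5.8, 12.1, -0.3]]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
model = fit_superpc(X, y, threshold=1.8)
print(model["keep"], predict(model, [1.0, 2.0, 0.0]))
```

The sign of a principal component is arbitrary, but the fitted slope absorbs it, so predictions from this sketch do not depend on which sign the power iteration converges to.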
To understand the functions of these probe sets, gene set enrichment analysis was performed based on the c2 collection of the Molecular Signatures Database (MSigDB) v3.0 provided by the Broad Institute (http://www.broadinstitute.org/gsea/msigdb/index.jsp). The q-value of each gene set was calculated by a permutation test. Gene sets with a q-value less than 0.1 were considered to be enriched.
Clinical validation of MGP
MGP scores were compared between patients with pCR and RD using the non-parametric Wilcoxon test. The scores were used as a continuous variable in receiver operating characteristic (ROC) curve analysis to evaluate the predictive performance of the MGP. Univariate and multivariate logistic regression analyses were also performed, including ER status, nodal status and tumor grade as categorical variables and age, tumor size and the MGP score as continuous variables. To control for the confounding effect of trastuzumab, analyses were done separately for patients treated with FEC/TX plus H and patients treated with FEC/TX.
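The comparison of scores between the pCR and RD groups and the area under the ROC curve are closely related, which the following sketch illustrates. It is a simplified illustration with invented scores: the AUC is computed from pairwise comparisons with lower scores counted as favoring pCR, and the two-sided p-value uses a normal approximation without tie or continuity correction, which is an assumption and may differ from the procedure used in the study.

```python
from math import erfc, sqrt


def rank_sum_auc(scores_pcr, scores_rd):
    """AUC as the fraction of (pCR, RD) pairs where the pCR score is lower.

    Ties count as half. Also returns the Mann-Whitney U statistic.
    """
    u = 0.0
    for a in scores_pcr:
        for b in scores_rd:
            if a < b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u / (len(scores_pcr) * len(scores_rd)), u


def wilcoxon_p_value(u, n1, n2):
    """Two-sided p-value from a normal approximation (no tie correction)."""
    mu = n1 * n2 / 2.0
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    return erfc(abs(z) / sqrt(2.0))


# Hypothetical MGP scores; lower scores are expected in patients with pCR.
pcr_scores = [0.1, 0.2, 0.3]
rd_scores = [0.4, 0.5, 0.6, 0.25]

auc_roc, u_stat = rank_sum_auc(pcr_scores, rd_scores)
p_value = wilcoxon_p_value(u_stat, len(pcr_scores), len(rd_scores))
print(round(auc_roc, 3), round(p_value, 3))
```

With groups this small the normal approximation is crude; an exact test would be preferable in practice, and the sketch is meant only to show how the rank-based statistic and the ROC area are linked.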