Open Access Highly Accessed Research article

BreastMark: An Integrated Approach to Mining Publicly Available Transcriptomic Datasets Relating to Breast Cancer Outcome

Stephen F Madden1*, Colin Clarke1, Patricia Gaule1, Sinead T Aherne1, Norma O'Donovan1, Martin Clynes1, John Crown1 and William M Gallagher2

Author Affiliations

1 Molecular Therapeutics for Cancer Ireland, National Institute for Cellular Biotechnology, Dublin City University, Glasnevin, Dublin 9, Ireland

2 UCD School of Biomolecular and Biomedical Science, UCD Conway Institute, University College Dublin, Dublin 4, Ireland

For all author emails, please log on.

Breast Cancer Research 2013, 15:R52  doi:10.1186/bcr3444

Published: 2 July 2013

Abstract

Introduction

Breast cancer is a complex heterogeneous disease for which a substantial resource of transcriptomic data is available. Gene expression data have facilitated the division of breast cancer into, at least, five molecular subtypes, namely luminal A, luminal B, HER2, normal-like and basal. Once identified, breast cancer subtypes can inform clinical decisions surrounding patient treatment and prognosis. Indeed, it is important to identify patients at risk of developing aggressive disease so as to tailor the level of clinical intervention.

Methods

We have developed a user-friendly, web-based system to allow the evaluation of genes/microRNAs (miRNAs) that are significantly associated with survival in breast cancer and its molecular subtypes. The algorithm combines gene expression data from multiple microarray experiments which frequently also contain miRNA expression information, and detailed clinical data to correlate outcome with gene/miRNA expression levels. This algorithm integrates gene expression and survival data from 26 datasets on 12 different microarray platforms corresponding to approximately 17,000 genes in up to 4,738 samples. In addition, the prognostic potential of 341 miRNAs can be analysed.

Results

We demonstrated the robustness of our approach in comparison to two commercially available prognostic tests, oncotype DX and MammaPrint. Our algorithm complements these prognostic tests and is consistent with their findings. In addition, BreastMark can act as a powerful reductionist approach to these more complex gene signatures, eliminating superfluous genes, potentially reducing the cost and complexity of these multi-index assays. Known miRNA prognostic markers, mir-205 and mir-93, were used to confirm the prognostic value of this tool in a miRNA setting. We also applied the algorithm to examine expression of 58 receptor tyrosine kinases in the basal-like subtype, identifying six receptor tyrosine kinases associated with poor disease-free survival and/or overall survival (EPHA5, FGFR1, FGFR3, VEGFR1, PDGFRβ, and TIE1). A web application for using this algorithm is currently available.

Conclusions

BreastMark is a powerful tool for examining putative gene/miRNA prognostic markers in breast cancer. The value of this tool will be in the preliminary assessment of putative biomarkers in breast cancer. It will be of particular use to research groups with limited bioinformatics facilities.