Little is known about gene expression patterns in normal breast tissue. We have identified a group of twelve normal breast tissue samples (cluster 1) that cluster tightly together across different clustering algorithms and gene lists and that share characteristics of stromal cells, stem cells and the claudin-low phenotype.
The cluster 1 samples show reduced expression of the epithelium-defining keratin genes and up-regulation of several mesenchymal markers, such as TWIST1, SPARC and VIM. This leads to the hypothesis that the cluster 1 samples represent more immature or dedifferentiated epithelial cells and/or are enriched for stromal cells. This is supported by our findings that gene expression in the cluster 1 samples resembles published gene lists characterizing stromal tissue and that gene ontology terms associated with the extracellular matrix are overrepresented. Reliable and specific stem cell markers are still unavailable, but cells in the cluster 1 samples show similarities with stem-like or progenitor-like cells.
The interindividual differences observed may reflect true differences between women with different risk or exposure histories, or may represent different normal tissue subtypes that are present within a single woman, at different sites in the breast, at different times during the lifespan or in different proportions. For example, stem cell niches may be oversampled in the cluster 1 biopsies.
The stem cell niche refers to a zone of the breast epithelium where stem and progenitor cells reside. The microenvironment constitutes the niche and influences the stem cells. Stem cell niches are thought to be present in the breasts of all women, but some women may have more than others. The immature breasts of nulliparous women may contain larger volumes of stem cell niches than post-lactationally involuted breasts. This could explain why the proportion of nulliparous women is higher in cluster 1 than in cluster 2. Understanding the intra- and interindividual variation in normal breast tissue is important, and this investigation raises the question of whether the clustering patterns observed apply to only a fraction of women or whether all women have cells or niches with these characteristics, with some women having a higher fraction than others.
The fact that the stem cell niche is constituted by the microenvironment could explain the combined stem-like and stromal-like characteristics identified in the cluster 1 samples. In breast cancer, the stem cell niche may contain mesenchymal cells derived from the normal breast stroma or recruited from the bone marrow, and the current results raise the hypothesis that mesenchymal cells may also be present in normal breast stem cell niches. The link between mesenchymal and stem cell traits is also made clear by Mani and colleagues, who showed that immortalized breast cells undergoing epithelial-mesenchymal transition acquire stem cell-like characteristics and that normal mouse mammary stem cells express mesenchymal markers.
This study was not designed to predict the risk of developing breast cancer. However, we do have four separate sources of information from which risk can be inferred: mammographic density, source of referral to the breast diagnostic center, occurrence of breast cancer after inclusion in the study and the previously published malignancy risk predictor developed by Chen and colleagues. The information from these sources does not point in the same direction. There is no difference between the two clusters in mammographic density, one of the strongest risk factors for breast cancer (Table 1). When we apply the malignancy risk predictor, the cluster 1 samples tend to have a slightly lower predicted risk. The women with samples in cluster 1 with known referral patterns were all referred to the mammographic centers because of palpable breast lumps or a positive family history, not from the screening program. All four breast cancers that developed after inclusion in the study occurred in women belonging to cluster 2, and none in cluster 1. Given the low numbers, this difference is not statistically significant. All four cancers were estrogen receptor positive.
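The statement that four cancers in cluster 2 versus none in cluster 1 is not statistically significant can be illustrated with a two-sided Fisher's exact test. The sketch below is an illustration only, not the analysis performed in the study. It assumes that cluster 2 comprises the 67 women not in cluster 1, that each woman contributed one sample and that follow-up time can be ignored; the function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k events in row 1 with fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in (prob(k) for k in range(k_min, k_max + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Rows: cluster 1, cluster 2; columns: cancer, no cancer.
# Assumed counts: 0 of 12 women in cluster 1 and 4 of 67 women in cluster 2.
p_value = fisher_exact_two_sided(0, 12, 4, 63)
print(f"two-sided p = {p_value:.3f}")
```

Under these assumptions, observing no cancers among 12 women is the single most likely outcome when four cancers are distributed at random over 79 women, so the test gives no evidence of a difference between the clusters.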
The malignancy risk predictor is dominated by proliferative genes and may represent proliferation more than risk of developing breast cancer. A low proliferation rate is also seen in stem cells, and an increased proportion of stem cells may explain the low proliferation estimated. The higher frequency of a family history of breast cancer among the women belonging to cluster 1 could point toward a higher risk of developing breast cancer through genetic as opposed to environmental causes. The four cancers diagnosed in women belonging to cluster 2 were all estrogen receptor positive, supporting a more environmental/hormonal etiology. Cluster 1 is smaller than cluster 2 and the lack of cancers in this cluster is not statistically significant. The two clusters may differ not so much in the risk of breast cancer as in the type of breast cancer the women are predisposed to develop. Since the cluster 1 samples have a stem-like gene expression profile, certain myoepithelial/basal characteristics and a higher frequency of family history of breast cancer, one may speculate that these women, if they develop breast cancer, will have a greater proportion of estrogen receptor negative cancers.
All 12 samples in cluster 1 were classified as claudin-low, compared with only three of the remaining 67 samples. Similarly, in the AHUS1 dataset, the claudin-low samples were found exclusively in the smaller cluster, which is the one resembling the MDG cluster 1. The claudin-low subtype was developed for the classification of breast cancers and was not expected to characterize a group of normal breast samples. The claudin-low nature of the cluster 1 samples is, however, striking. Down-regulation of E-cadherin, occludin and claudins 3, 4 and 7, together with up-regulation of the mesenchymal genes and SNAI2, is in line with the features described in claudin-low tumor samples. The low expression of ESR1 corresponds with the estrogen receptor negative trend of the claudin-low subtype. Claudin-low tumors are thought to arise from mammary stem cells. The hypothesis that the cluster 1 samples are enriched for immature cells is further supported by the down-regulation in these samples, compared with the cluster 2 samples (p = 3.8E-9), of GATA3, which is also down-regulated in claudin-low samples.
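To give a sense of how unlikely the claudin-low counts above would be under random assignment, the following sketch computes a one-sided hypergeometric tail probability from the numbers quoted in the text (12 of 12 cluster 1 samples and 3 of the remaining 67). It is descriptive only: the claudin-low call and the clustering are both derived from the same expression data, so the two are not independent, and the function name `hypergeom_upper_tail` is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n_draw, n_success, n_total):
    """P(X >= k) when n_draw items are drawn without replacement from
    n_total items, of which n_success carry the label of interest."""
    denom = comb(n_total, n_draw)
    k_max = min(n_draw, n_success)
    numer = sum(comb(n_success, i) * comb(n_total - n_success, n_draw - i)
                for i in range(k, k_max + 1))
    return numer / denom


# Counts quoted in the text: 12 of 12 cluster 1 samples and
# 3 of the remaining 67 samples were classified as claudin-low.
n_total = 12 + 67          # all samples
n_claudin_low = 12 + 3     # all claudin-low samples
p_enrich = hypergeom_upper_tail(12, 12, n_claudin_low, n_total)
print(f"P(all 12 cluster 1 samples claudin-low by chance) = {p_enrich:.1e}")
```

The probability is on the order of 1e-11, which is consistent with describing the overlap as striking, although it should not be read as an independent test of the clustering.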
The biopsies used in this study are unique in that they represent a group of women who were examined at breast diagnostic centers. Since the sample size is small, the use of additional datasets is important for validation of the results. The AHUS1 dataset consists of two main types of samples: mammoplasty reductions and cancer normals. Mammoplasty reductions are widely used to represent normal breast tissue, although one can expect the biology to be slightly biased toward fat-related processes. Cancer normals may be influenced by the biology of the cancer or may represent normal tissue in high-risk breasts. A dataset consisting of these two tissue types therefore represents a variety of normal tissue. The fact that the AHUS1 dataset separates into two clusters with biology similar to those seen in the MDG dataset is interesting and indicates that our results are reproducible.
The reduced expression of epithelial surface markers could be explained by a large component of adipocytes in the biopsies. This is, however, unlikely, as the biopsies were taken from mammographically dense areas. In addition, when this dataset was clustered with other datasets containing biopsies from normal breast tissue with varying proportions of fatty tissue, the cluster 1 samples did not segregate with the adipocyte-rich biopsies (Additional file 3, Figure S2).
There was a greater proportion of nulliparous women in cluster 1. The association between cluster and parity was, however, not confirmed using a gene list describing post-pregnant epithelial cells (Table 2 and Additional file 3, Table S3 and Figure S1). The breasts of nulliparous women are not fully matured and the fraction of differentiated epithelial cells is lower than in post-pregnant breasts. The gene list published by Asztalos et al. is short and may not capture all parity-related gene expression alterations. The cluster 1 samples may represent more immature breasts with less differentiated epithelial cells, but the association between the cluster 1-type gene expression profile and parity needs to be elucidated further.
The difference between the cluster 1 samples and the remaining normal samples could be due to differences in the fractions of the cell types present in the biopsies. For ethical reasons, the number of biopsies per woman was limited and we did not have enough tissue for both RNA extraction and histology. The lack of histology of the biopsies prior to extraction prevents exact knowledge of the cell types contributing to the expression profiles. It has become evident that the development and progression of breast cancers are not limited to epithelial cells and that the total microenvironment is important. Approximately 95% of normal breast tissue may be composed of stroma, and therefore cell type differences in stroma are most likely captured rather than subtle differences in epithelial content. For evaluation of the putative interplay between all the cells at this location of the breast, expression analysis of the entire biopsy provides the most comprehensive picture of the situation. Previous studies have shown that different biopsies from one tumor share a gene expression profile. The variability of gene expression between different locations in one breast is not known, but King and colleagues have shown that microdissected and bulk tissue samples from normal breasts have highly similar gene expression and that such technical differences are minor compared with biological differences. There is, therefore, reason to believe that the variability seen represents differences that affect the biology of the breast and not only random sampling.
This study is limited by the relatively low number of women included. Larger datasets with several biopsies representing different parts of the breast will be needed to allow further study of the variation in the normal biology of the breast.
The similarity of the cluster 1 gene expression profile to stromal and stem-like gene signatures and, less prominently, to mesenchymal cells suggests a biology dominated by less developed cells. This is further supported by the trend of more nulliparous women and the striking similarity with the claudin-low breast cancer phenotype. These samples may represent breasts with an increased number of non-proliferating, undifferentiated stem cells with the accompanying stromal niche. There seem to be fewer differentiated luminal cells. We hypothesize that the women belonging to cluster 1 have an increased risk of claudin-low and basal-like breast cancer. This is supported by the immature and partly myoepithelial features of the breast and the higher frequency of positive family history in this group.