Publications
Here is a selection of my publications. For a complete list, please see my Google Scholar page.
Journal Articles
2023
- Cancer phylogenetic tree inference at scale from 1000s of single cell genomes. Sohrab Salehi, Fatemeh Dorri, Kevin Chern, Farhia Kabeer, and 11 more authors. 2023
A new generation of scalable single cell whole genome sequencing (scWGS) methods allows unprecedented high resolution measurement of the evolutionary dynamics of cancer cell populations. Phylogenetic reconstruction is central to identifying sub-populations and distinguishing the mutational processes that gave rise to them. Existing phylogenetic tree building models do not scale to the tens of thousands of high resolution genomes achievable with current scWGS methods. We constructed a phylogenetic model and associated Bayesian inference procedure, sitka, specifically for scWGS data. The method is based on a novel phylogenetic encoding of copy number (CN) data, the sitka transformation, that simplifies the site dependencies induced by rearrangements while still forming a sound foundation to phylogenetic inference. The sitka transformation allows us to design novel scalable Markov chain Monte Carlo (MCMC) algorithms. Moreover, we introduce a novel point mutation calling method that incorporates the CN data and the underlying phylogenetic tree to overcome the low per-cell coverage of scWGS. We demonstrate our method on three single cell datasets, including a novel PDX series, and analyse the topological properties of the inferred trees. Sitka is freely available at https://github.com/UBC-Stat-ML/sitkatree.git.
2021
- Clonal fitness inferred from time-series modelling of single-cell cancer genomes. Sohrab Salehi, Farhia Kabeer, Nicholas Ceglia, Mirela Andronescu, and 104 more authors. Jul 2021
Progress in defining genomic fitness landscapes in cancer, especially those defined by copy number alterations (CNAs), has been impeded by lack of time-series single-cell sampling of polyclonal populations and temporal statistical models [1–7]. Here we generated 42,000 genomes from multi-year time-series single-cell whole-genome sequencing of breast epithelium and primary triple-negative breast cancer (TNBC) patient-derived xenografts (PDXs), revealing the nature of CNA-defined clonal fitness dynamics induced by TP53 mutation and cisplatin chemotherapy. Using a new Wright–Fisher population genetics model [8,9] to infer clonal fitness, we found that TP53 mutation alters the fitness landscape, reproducibly distributing fitness over a larger number of clones associated with distinct CNAs. Furthermore, in TNBC PDX models with mutated TP53, inferred fitness coefficients from CNA-based genotypes accurately forecast experimentally enforced clonal competition dynamics. Drug treatment in three long-term serially passaged TNBC PDXs resulted in cisplatin-resistant clones emerging from low-fitness phylogenetic lineages in the untreated setting. Conversely, high-fitness clones from treatment-naive controls were eradicated, signalling an inversion of the fitness landscape. Finally, upon release of drug, selection pressure dynamics were reversed, indicating a fitness cost of treatment resistance. Together, our findings define clonal fitness linked to both CNA and therapeutic resistance in polyclonal tumours.
2017
- ddClone: joint statistical inference of clonal populations from single cell and bulk tumour sequencing data. Sohrab Salehi, Adi Steif, Andrew Roth, Samuel Aparicio, and 2 more authors. Mar 2017
Next-generation sequencing (NGS) of bulk tumour tissue can identify constituent cell populations in cancers and measure their abundance. This requires computational deconvolution of allelic counts from somatic mutations, which may be incapable of fully resolving the underlying population structure. Single cell sequencing (SCS) is a more direct method, although its replacement of NGS is impeded by technical noise and sampling limitations. We propose ddClone, which analytically integrates NGS and SCS data, leveraging their complementary attributes through joint statistical inference. We show on real and simulated datasets that ddClone produces more accurate results than can be achieved by either method alone.
Contributed Articles
2024
- Single-cell mtDNA dynamics in tumors is driven by coregulation of nuclear and mitochondrial genomes. Minsoo Kim, Alexander N. Gorelick, Ignacio Vázquez-García, Marc J. Williams, and 17 more authors. Nature Genetics, 2024
The extent of cell-to-cell variation in tumor mitochondrial DNA (mtDNA) copy number and genotype, and the phenotypic and evolutionary consequences of such variation, are poorly characterized. Here we use amplification-free single-cell whole-genome sequencing (Direct Library Prep (DLP+)) to simultaneously assay mtDNA copy number and nuclear DNA (nuDNA) in 72,275 single cells derived from immortalized cell lines, patient-derived xenografts and primary human tumors. Cells typically contained thousands of mtDNA copies, but variation in mtDNA copy number was extensive and strongly associated with cell size. Pervasive whole-genome doubling events in nuDNA associated with stoichiometrically balanced adaptations in mtDNA copy number, implying that mtDNA-to-nuDNA ratio, rather than mtDNA copy number itself, mediated downstream phenotypes. Finally, multimodal analysis of DLP+ and single-cell RNA sequencing identified both somatic loss-of-function and germline noncoding variants in mtDNA linked to heteroplasmy-dependent changes in mtDNA copy number and mitochondrial transcription, revealing phenotypic adaptations to disrupted nuclear/mitochondrial balance.
- Single-cell decoding of drug induced transcriptomic reprogramming in triple negative breast cancers. Farhia Kabeer, Hoa Tran, Mirela Andronescu, Gurdeep Singh, and 22 more authors. Genome Biology, 2024
The encoding of cell intrinsic drug resistance states in breast cancer reflects the contributions of genomic and non-genomic variations and requires accurate estimation of clonal fitness from co-measurement of transcriptomic and genomic data. Somatic copy number (CN) variation is the dominant mutational mechanism leading to transcriptional variation and notably contributes to platinum chemotherapy resistance cell states. Here, we deploy time series measurements of triple negative breast cancer (TNBC) single-cell transcriptomes, along with co-measured single-cell CN fitness, identifying genomic and transcriptomic mechanisms in drug-associated transcriptional cell states.
- CDC7 inhibition impairs neuroendocrine transformation in lung and prostate tumors through MYC degradation. Alvaro Quintanal-Villalonga, Kenta Kawasaki, Esther Redin, Fathema Uddin, and 29 more authors. Signal Transduction and Targeted Therapy, 2024
Neuroendocrine (NE) transformation is a mechanism of resistance to targeted therapy in lung and prostate adenocarcinomas leading to poor prognosis. To date, even if patients at high risk of transformation can be identified by the occurrence of Tumor Protein P53 (TP53) and Retinoblastoma Transcriptional Corepressor 1 (RB1) mutations in their tumors, no therapeutic strategies are available to prevent or delay histological transformation. Upregulation of the cell cycle kinase Cell Division Cycle 7 (CDC7) occurred in tumors during the initial steps of NE transformation, already after TP53/RB1 co-inactivation, leading to induced sensitivity to the CDC7 inhibitor simurosertib. CDC7 inhibition suppressed NE transdifferentiation and extended response to targeted therapy in in vivo models of NE transformation by inducing the proteasome-mediated degradation of the MYC Proto-Oncogene (MYC), implicated in stemness and histological transformation. Ectopic overexpression of a degradation-resistant MYC isoform reestablished the NE transformation phenotype observed on targeted therapy, even in the presence of simurosertib. CDC7 inhibition also markedly extended response to standard cytotoxics (cisplatin, irinotecan) in lung and prostate small cell carcinoma models. These results nominate CDC7 inhibition as a therapeutic strategy to constrain lineage plasticity, as well as to effectively treat NE tumors de novo or after transformation. As simurosertib clinical efficacy trials are ongoing, this concept could be readily translated for patients at risk of transformation.
2023
- Identification of transcriptional programs using dense vector representations defined by mutual information with GeneVector. Nicholas Ceglia, Zachary Sethna, Samuel S. Freeman, Florian Uhlitz, and 10 more authors. Nature Communications, 2023
Deciphering individual cell phenotypes from cell-specific transcriptional processes requires high dimensional single cell RNA sequencing. However, current dimensionality reduction methods aggregate sparse gene information across cells, without directly measuring the relationships that exist between genes. By performing dimensionality reduction with respect to gene co-expression, low-dimensional features can model these gene-specific relationships and leverage shared signal to overcome sparsity. We describe GeneVector, a scalable framework for dimensionality reduction implemented as a vector space model using mutual information between gene expression. Unlike other methods, including principal component analysis and variational autoencoders, GeneVector uses latent space arithmetic in a lower dimensional gene embedding to identify transcriptional programs and classify cell types. In this work, we show in four single cell RNA-seq datasets that GeneVector was able to capture phenotype-specific pathways, perform batch effect correction, interactively annotate cell types, and identify pathway variation with treatment over time.
2022
- Single-cell genomic variation induced by mutational processes in cancer. Tyler Funnell, Ciara H. O’Flanagan, Marc J. Williams, Andrew McPherson, and 116 more authors. Nature, 2022
How cell-to-cell copy number alterations that underpin genomic instability [1] in human cancers drive genomic and phenotypic variation, and consequently the evolution of cancer [2], remains understudied. Here, by applying scaled single-cell whole-genome sequencing [3] to wild-type, TP53-deficient and TP53-deficient;BRCA1-deficient or TP53-deficient;BRCA2-deficient mammary epithelial cells (13,818 genomes), and to primary triple-negative breast cancer (TNBC) and high-grade serous ovarian cancer (HGSC) cells (22,057 genomes), we identify three distinct ‘foreground’ mutational patterns that are defined by cell-to-cell structural variation. Cell- and clone-specific high-level amplifications, parallel haplotype-specific copy number alterations and copy number segment length variation (serrate structural variations) had measurable phenotypic and evolutionary consequences. In TNBC and HGSC, clone-specific high-level amplifications in known oncogenes were highly prevalent in tumours bearing fold-back inversions, relative to tumours with homologous recombination deficiency, and were associated with increased clone-to-clone phenotypic variation. Parallel haplotype-specific alterations were also commonly observed, leading to phylogenetic evolutionary diversity and clone-specific mono-allelic expression. Serrate variants were increased in tumours with fold-back inversions and were highly correlated with increased genomic diversity of cellular populations. Together, our findings show that cell-to-cell structural variation contributes to the origins of phenotypic and evolutionary diversity in TNBC and HGSC, and provide insight into the genomic and mutational states of individual cancer cells.
- Accurate determination of CRISPR-mediated gene fitness in transplantable tumours. Peter Eirew, Ciara O’Flanagan, Jerome Ting, Sohrab Salehi, and 23 more authors. Nature Communications, 2022
Assessing tumour gene fitness in physiologically-relevant model systems is challenging due to biological features of in vivo tumour regeneration, including extreme variations in single cell lineage progeny. Here we develop a reproducible, quantitative approach to pooled genetic perturbation in patient-derived xenografts (PDXs), by encoding single cell output from transplanted CRISPR-transduced cells in combination with a Bayesian hierarchical model. We apply this to 181 PDX transplants from 21 breast cancer patients. We show that uncertainty in fitness estimates depends critically on the number of transplant cell clones and the variability in clone sizes. We use a pathway-directed allelic series to characterize Notch signaling, and quantify TP53 / MDM2 drug-gene conditional fitness in outlier patients. We show that fitness outlier identification can be mirrored by pharmacological perturbation. Overall, we demonstrate that the gene fitness landscape in breast PDXs is dominated by inter-patient differences.
2020
- TMEM30A loss-of-function mutations drive lymphomagenesis and confer therapeutically exploitable vulnerability in B-cell lymphoma. Daisuke Ennishi, Shannon Healy, Ali Bashashati, Saeed Saberi, and 53 more authors. Nature Medicine, 2020
Transmembrane protein 30A (TMEM30A) maintains the asymmetric distribution of phosphatidylserine, an integral component of the cell membrane and ‘eat-me’ signal recognized by macrophages. Integrative genomic and transcriptomic analysis of diffuse large B-cell lymphoma (DLBCL) from the British Columbia population-based registry uncovered recurrent biallelic TMEM30A loss-of-function mutations, which were associated with a favorable outcome and uniquely observed in DLBCL. Using TMEM30A-knockout systems, increased accumulation of chemotherapy drugs was observed in TMEM30A-knockout cell lines and TMEM30A-mutated primary cells, explaining the improved treatment outcome. Furthermore, we found increased tumor-associated macrophages and an enhanced effect of anti-CD47 blockade limiting tumor growth in TMEM30A-knockout models. By contrast, we show that TMEM30A loss-of-function increases B-cell signaling following antigen stimulation—a mechanism conferring selective advantage during B-cell lymphoma development. Our data highlight a multifaceted role for TMEM30A in B-cell lymphomagenesis, and characterize intrinsic and extrinsic vulnerabilities of cancer cells that can be therapeutically exploited.