Sohrab Salehi

Memorial Sloan Kettering Cancer Center

321 East 61st Street

New York, NY 10065

I am a postdoctoral research fellow at Memorial Sloan Kettering Cancer Center in the Department of Epidemiology and Biostatistics under Drs. Sohrab P. Shah and Charles M. Rudin, and a postdoctoral research scientist at the Irving Institute for Cancer Dynamics at Columbia University under Dr. David M. Blei. My current research focuses on developing causal inference methods to understand the mechanisms of drug resistance and metastasis in human cancers at the single-cell level. I did my PhD in Bioinformatics at the University of British Columbia (UBC) under Dr. Alexandre Bouchard-Côté. My PhD research focused on developing Bayesian models to quantify the evolutionary dynamics and fitness of human cancers from single-cell whole-genome sequencing data.

Selected Publications

Disrupted priming within draining lymph nodes drives immune quiescence in gastric cancer

Sohrab Salehi, Emily E Stroobant, Hannah Lees, Ya-Hui Lin, and 7 more authors

2025

The gastric mucosa is characterized by continuous innate immune surveillance and inflammatory signaling, yet a high proportion of gastric carcinomas (GCs) are recalcitrant to immune-directed therapies. The mechanisms by which GCs evade adaptive immune surveillance within the highly antigenic microenvironment of the gastric mucosa remain unknown. To address this, we collected patient-matched tumor tissue, distant normal tissue, metastasis, and draining lymph nodes to generate a large-scale single-cell immune profiling dataset from 64 patients (n=179 samples, >150,000 cells). From single cell analysis, we identified two distinct sources of impaired tumor surveillance within tumor draining lymph nodes. First, we observed that a significant fraction of tumor draining lymph nodes had undergone cytokine-driven reprogramming, leading to reduced dendritic cell homing and limited T cell priming. Second, T cells undergoing successful activation exhibited limited expansion and constrained differentiation, marked by expression of the quiescence-associated transcription factor Kruppel-like Factor 2 (KLF2). Overexpression of KLF2 in primary T cells limited both their differentiation and cytotoxic capacity. These findings implicate both impaired T cell priming and KLF2-dependent T cell quiescence in limiting T cell immunity in gastric adenocarcinoma. We suggest these findings represent an emerging model for immune silencing in tumors developing from tissues with chronic inflammation.

@article{salehi2025disrupted,
  title = {Disrupted priming within draining lymph nodes drives immune quiescence in gastric cancer},
  author = {Salehi, Sohrab and Stroobant, Emily E and Lees, Hannah and Lin, Ya-Hui and Shimada, Shoji and Abate, Miseker and Zatzman, Matthew Jason and Ceglia, Nicholas and Freeman, Samuel and Laszkowska, Monika and others},
  journal = {bioRxiv},
  pages = {2025--05},
  year = {2025},
  publisher = {Cold Spring Harbor Laboratory},
  url = {https://www.biorxiv.org/content/10.1101/2025.05.05.651897v1.abstract},
  type = {first_author}
}

Population Priors for Matrix Factorization

Sohrab Salehi, Achille Nazaret, Sohrab P Shah, and David Blei

2025

We develop an empirical Bayes prior for probabilistic matrix factorization. Matrix factorization models each cell of a matrix with two latent variables, one associated with the cell’s row and one associated with the cell’s column. How to set the priors of these two latent variables? Drawing from empirical Bayes principles, we consider estimating the priors from data, to find those that best match the populations of row and column latent vectors. Thus we develop the twin population prior. We develop a variational inference algorithm to simultaneously learn the empirical priors and approximate the corresponding posterior. We evaluate this approach with both synthetic and real-world data on diverse applications: movie ratings, book ratings, single-cell gene expression data, and musical preferences. Without needing to tune Bayesian hyperparameters, we find that the twin population prior leads to high-quality predictions, outperforming manually tuned priors.

@article{salehipopulation,
  title = {Population Priors for Matrix Factorization},
  author = {Salehi, Sohrab and Nazaret, Achille and Shah, Sohrab P and Blei, David},
  journal = {Transactions on Machine Learning Research},
  url = {https://openreview.net/forum?id=AT9G5s1pOj},
  year = {2025},
  type = {first_author}
}

Cancer phylogenetic tree inference at scale from 1000s of single cell genomes

Sohrab Salehi, Fatemeh Dorri, Kevin Chern, Farhia Kabeer, and 11 more authors

2023

A new generation of scalable single cell whole genome sequencing (scWGS) methods allows unprecedented high resolution measurement of the evolutionary dynamics of cancer cell populations. Phylogenetic reconstruction is central to identifying sub-populations and distinguishing the mutational processes that gave rise to them. Existing phylogenetic tree building models do not scale to the tens of thousands of high resolution genomes achievable with current scWGS methods. We constructed a phylogenetic model and associated Bayesian inference procedure, sitka, specifically for scWGS data. The method is based on a novel phylogenetic encoding of copy number (CN) data, the sitka transformation, that simplifies the site dependencies induced by rearrangements while still forming a sound foundation to phylogenetic inference. The sitka transformation allows us to design novel scalable Markov chain Monte Carlo (MCMC) algorithms. Moreover, we introduce a novel point mutation calling method that incorporates the CN data and the underlying phylogenetic tree to overcome the low per-cell coverage of scWGS. We demonstrate our method on three single cell datasets, including a novel PDX series, and analyse the topological properties of the inferred trees. Sitka is freely available at https://github.com/UBC-Stat-ML/sitkatree.git.

@article{10_24072_pcjournal_292,
  author = {Salehi, Sohrab and Dorri, Fatemeh and Chern, Kevin and Kabeer, Farhia and Rusk, Nicole and Funnell, Tyler and Williams, Marc J. and Lai, Daniel and Andronescu, Mirela and Campbell, Kieran R. and McPherson, Andrew and Aparicio, Samuel and Roth, Andrew and Shah, Sohrab P. and Bouchard-C{\^o}t{\'e}, Alexandre},
  title = {Cancer phylogenetic tree inference at scale from 1000s of single cell genomes},
  journal = {Peer Community Journal},
  eid = {e63},
  publisher = {Peer Community In},
  volume = {3},
  year = {2023},
  doi = {10.24072/pcjournal.292},
  language = {en},
  url = {https://peercommunityjournal.org/articles/10.24072/pcjournal.292/},
  type = {first_author}
}

Single-cell genomic variation induced by mutational processes in cancer

Tyler Funnell, Ciara H. O’Flanagan, Marc J. Williams, Andrew McPherson, and 116 more authors

Nature, 2022

How cell-to-cell copy number alterations that underpin genomic instability in human cancers drive genomic and phenotypic variation, and consequently the evolution of cancer, remains understudied. Here, by applying scaled single-cell whole-genome sequencing to wild-type, TP53-deficient and TP53-deficient;BRCA1-deficient or TP53-deficient;BRCA2-deficient mammary epithelial cells (13,818 genomes), and to primary triple-negative breast cancer (TNBC) and high-grade serous ovarian cancer (HGSC) cells (22,057 genomes), we identify three distinct ‘foreground’ mutational patterns that are defined by cell-to-cell structural variation. Cell- and clone-specific high-level amplifications, parallel haplotype-specific copy number alterations and copy number segment length variation (serrate structural variations) had measurable phenotypic and evolutionary consequences. In TNBC and HGSC, clone-specific high-level amplifications in known oncogenes were highly prevalent in tumours bearing fold-back inversions, relative to tumours with homologous recombination deficiency, and were associated with increased clone-to-clone phenotypic variation. Parallel haplotype-specific alterations were also commonly observed, leading to phylogenetic evolutionary diversity and clone-specific mono-allelic expression. Serrate variants were increased in tumours with fold-back inversions and were highly correlated with increased genomic diversity of cellular populations. Together, our findings show that cell-to-cell structural variation contributes to the origins of phenotypic and evolutionary diversity in TNBC and HGSC, and provide insight into the genomic and mutational states of individual cancer cells.

@article{genomic_variation_funnell,
  author = {Funnell, Tyler and O'Flanagan, Ciara H. and Williams, Marc J. and McPherson, Andrew and McKinney, Steven and Kabeer, Farhia and Lee, Hakwoo and Salehi, Sohrab and V{\'a}zquez-Garc{\'\i}a, Ignacio and Shi, Hongyu and Leventhal, Emily and Masud, Tehmina and Eirew, Peter and Yap, Damian and Zhang, Allen W. and Lim, Jamie L. P. and Wang, Beixi and Brimhall, Jazmine and Biele, Justina and Ting, Jerome and Au, Vinci and Van Vliet, Michael and Liu, Yi Fei and Beatty, Sean and Lai, Daniel and Pham, Jenifer and Grewal, Diljot and Abrams, Douglas and Havasov, Eliyahu and Leung, Samantha and Bojilova, Viktoria and Moore, Richard A. and Rusk, Nicole and Uhlitz, Florian and Ceglia, Nicholas and Weiner, Adam C. and Zaikova, Elena and Douglas, J. Maxwell and Zamarin, Dmitriy and Weigelt, Britta and Kim, Sarah H. and Da Cruz Paula, Arnaud and Reis-Filho, Jorge S. and Martin, Spencer D. and Li, Yangguang and Xu, Hong and de Algara, Teresa Ruiz and Lee, So Ra and Llanos, Viviana Cerda and Huntsman, David G. and McAlpine, Jessica N. and Hannon, Gregory J. and Battistoni, Georgia and Bressan, Dario and Cannell, Ian G. and Casbolt, Hannah and Jauset, Cristina and Kova{\v c}evi{\'c}, Tatjana and Mulvey, Claire M. and Nugent, Fiona and Ribes, Marta Paez and Pearson, Isabella and Qosaj, Fatime and Sawicka, Kirsty and Wild, Sophia A. and Williams, Elena and Laks, Emma and Smith, Austin and Roth, Andrew and Balasubramanian, Shankar and Lee, Maximilian and Bodenmiller, Bernd and Burger, Marcel and Kuett, Laura and Tietscher, Sandra and Windhager, Jonas and Boyden, Edward S. and Alon, Shahar and Cui, Yi and Emenari, Amauche and Goodwin, Daniel R. and Karagiannis, Emmanouil D. and Sinha, Anubhav and Wassie, Asmamaw T. and Caldas, Carlos and Bruna, Alejandra and Callari, Maurizio and Greenwood, Wendy and Lerda, Giulia and Eyal-Lubling, Yaniv and Rueda, Oscar M. 
and Shea, Abigail and Harris, Owen and Becker, Robby and Grimaldo, Flaminia and Harris, Suvi and Vogl, Sara Lisa and Joyce, Johanna A. and Watson, Spencer S. and Tavare, Simon and Dinh, Khanh N. and Fisher, Eyal and Kunes, Russell and Walton, Nicholas A. and Al Sa'd, Mohammed and Chornay, Nick and Dariush, Ali and Gonz{\'a}lez-Solares, Eduardo A. and Gonz{\'a}lez-Fern{\'a}ndez, Carlos and Yolda{\c s}, Ayb{\"u}ke K{\"u}pc{\"u} and Miller, Neil and Zhuang, Xiaowei and Fan, Jean and Lee, Hsuan and Sep{\'u}lveda, Leonardo A. and Xia, Chenglong and Zheng, Pu and Shah, Sohrab P. and Aparicio, Samuel and Consortium, IMAXT},
  date = {2022/12/01},
  doi = {10.1038/s41586-022-05249-0},
  id = {Funnell2022},
  isbn = {1476-4687},
  journal = {Nature},
  number = {7938},
  pages = {106--115},
  title = {Single-cell genomic variation induced by mutational processes in cancer},
  url = {https://doi.org/10.1038/s41586-022-05249-0},
  volume = {612},
  year = {2022},
}

Clonal fitness inferred from time-series modelling of single-cell cancer genomes

Sohrab Salehi, Farhia Kabeer, Nicholas Ceglia, Mirela Andronescu, and 104 more authors

Jul 2021

Progress in defining genomic fitness landscapes in cancer, especially those defined by copy number alterations (CNAs), has been impeded by lack of time-series single-cell sampling of polyclonal populations and temporal statistical models. Here we generated 42,000 genomes from multi-year time-series single-cell whole-genome sequencing of breast epithelium and primary triple-negative breast cancer (TNBC) patient-derived xenografts (PDXs), revealing the nature of CNA-defined clonal fitness dynamics induced by TP53 mutation and cisplatin chemotherapy. Using a new Wright–Fisher population genetics model to infer clonal fitness, we found that TP53 mutation alters the fitness landscape, reproducibly distributing fitness over a larger number of clones associated with distinct CNAs. Furthermore, in TNBC PDX models with mutated TP53, inferred fitness coefficients from CNA-based genotypes accurately forecast experimentally enforced clonal competition dynamics. Drug treatment in three long-term serially passaged TNBC PDXs resulted in cisplatin-resistant clones emerging from low-fitness phylogenetic lineages in the untreated setting. Conversely, high-fitness clones from treatment-naive controls were eradicated, signalling an inversion of the fitness landscape. Finally, upon release of drug, selection pressure dynamics were reversed, indicating a fitness cost of treatment resistance. Together, our findings define clonal fitness linked to both CNA and therapeutic resistance in polyclonal tumours.

@article{salehi_clonal_2021,
  title = {Clonal fitness inferred from time-series modelling of single-cell cancer genomes},
  volume = {595},
  issn = {1476-4687},
  url = {https://doi.org/10.1038/s41586-021-03648-3},
  doi = {10.1038/s41586-021-03648-3},
  number = {7868},
  journal = {Nature},
  author = {Salehi, Sohrab and Kabeer, Farhia and Ceglia, Nicholas and Andronescu, Mirela and Williams, Marc J. and Campbell, Kieran R. and Masud, Tehmina and Wang, Beixi and Biele, Justina and Brimhall, Jazmine and Gee, David and Lee, Hakwoo and Ting, Jerome and Zhang, Allen W. and Tran, Hoa and O’Flanagan, Ciara and Dorri, Fatemeh and Rusk, Nicole and de Algara, Teresa Ruiz and Lee, So Ra and Cheng, Brian Yu Chieh and Eirew, Peter and Kono, Takako and Pham, Jenifer and Grewal, Diljot and Lai, Daniel and Moore, Richard and Mungall, Andrew J. and Marra, Marco A. and Hannon, Gregory J. and Battistoni, Giorgia and Bressan, Dario and Cannell, Ian Gordon and Casbolt, Hannah and Fatemi, Atefeh and Jauset, Cristina and Kovačević, Tatjana and Mulvey, Claire M. and Nugent, Fiona and Ribes, Marta Paez and Pearsall, Isabella and Qosaj, Fatime and Sawicka, Kirsty and Wild, Sophia A. and Williams, Elena and Laks, Emma and Li, Yangguang and O’Flanagan, Ciara H. and Smith, Austin and Ruiz, Teresa and Lai, Daniel and Roth, Andrew and Balasubramanian, Shankar and Lee, Maximillian and Bodenmiller, Bernd and Burger, Marcel and Kuett, Laura and Tietscher, Sandra and Windhager, Jonas and Boyden, Edward S. and Alon, Shahar and Cui, Yi and Emenari, Amauche and Goodwin, Dan and Karagiannis, Emmanouil D. and Sinha, Anubhav and Wassie, Asmamaw T. and Caldas, Carlos and Bruna, Alejandra and Callari, Maurizio and Greenwood, Wendy and Lerda, Giulia and Eyal-Lubling, Yaniv and Rueda, Oscar M. and Shea, Abigail and Harris, Owen and Becker, Robby and Grimaldi, Flaminia and Harris, Suvi and Vogl, Sara Lisa and Weselak, Joanna and Joyce, Johanna A. and Watson, Spencer S. and Vázquez-García, Ignacio and Tavaré, Simon and Dinh, Khanh N. and Fisher, Eyal and Kunes, Russell and Walton, Nicholas A. and Sa’d, Mohammad Al and Chornay, Nick and Dariush, Ali and González-Solares, Eduardo A.
and González-Fernández, Carlos and Yoldas, Aybüke Küpcü and Millar, Neil and Whitmarsh, Tristan and Zhuang, Xiaowei and Fan, Jean and Lee, Hsuan and Sepúlveda, Leonardo A. and Xia, Chenglong and Zheng, Pu and McPherson, Andrew and Bouchard-Côté, Alexandre and Aparicio, Samuel and Shah, Sohrab P. and {IMAXT Consortium}},
  month = jul,
  year = {2021},
  pages = {585--590},
  type = {first_author}
}

ddClone: joint statistical inference of clonal populations from single cell and bulk tumour sequencing data

Sohrab Salehi, Adi Steif, Andrew Roth, Samuel Aparicio, and 2 more authors

Mar 2017

Next-generation sequencing (NGS) of bulk tumour tissue can identify constituent cell populations in cancers and measure their abundance. This requires computational deconvolution of allelic counts from somatic mutations, which may be incapable of fully resolving the underlying population structure. Single cell sequencing (SCS) is a more direct method, although its replacement of NGS is impeded by technical noise and sampling limitations. We propose ddClone, which analytically integrates NGS and SCS data, leveraging their complementary attributes through joint statistical inference. We show on real and simulated datasets that ddClone produces more accurate results than can be achieved by either method alone.

@article{salehi_ddclone_2017,
  title = {{ddClone}: joint statistical inference of clonal populations from single cell and bulk tumour sequencing data},
  volume = {18},
  issn = {1474-760X},
  url = {https://doi.org/10.1186/s13059-017-1169-3},
  doi = {10.1186/s13059-017-1169-3},
  number = {1},
  journal = {Genome Biology},
  author = {Salehi, Sohrab and Steif, Adi and Roth, Andrew and Aparicio, Samuel and Bouchard-Côté, Alexandre and Shah, Sohrab P.},
  month = mar,
  year = {2017},
  pages = {44},
  type = {first_author}
}

Adaptive Nonparametric Perturbations of Parametric Bayesian Models

Bohan Wu, Eli N Weinstein, Sohrab Salehi, Yixin Wang, and 1 more author

arXiv preprint arXiv:2412.10683, Dec 2024

Parametric Bayesian modeling offers a powerful and flexible toolbox for scientific data analysis. Yet the model, however detailed, may still be wrong, and this can make inferences untrustworthy. In this paper we study nonparametrically perturbed parametric (NPP) Bayesian models, in which a parametric Bayesian model is relaxed via a distortion of its likelihood. We analyze the properties of NPP models when the target of inference is the true data distribution or some functional of it, such as in causal inference. We show that NPP models can offer the robustness of nonparametric models while retaining the data efficiency of parametric models, achieving fast convergence when the parametric model is close to true. To efficiently analyze data with an NPP model, we develop a generalized Bayes procedure to approximate its posterior. We demonstrate our method by estimating causal effects of gene expression from single cell RNA sequencing data. NPP modeling offers an efficient approach to robust Bayesian inference and can be used to robustify any parametric Bayesian model.

@article{wu2024adaptive,
  title = {Adaptive Nonparametric Perturbations of Parametric Bayesian Models},
  author = {Wu, Bohan and Weinstein, Eli N and Salehi, Sohrab and Wang, Yixin and Blei, David M},
  journal = {arXiv preprint arXiv:2412.10683},
  year = {2024},
  url = {https://arxiv.org/abs/2412.10683}
}