Over the past six months, we have been hard at work building our meta-analytic software platform for making go/no-go decisions from early clinical trials in solid tumor oncology. Our main goal is to estimate the Probability of Technical Success (PTS) in late-stage trials. To do that, we use joint event and semi-mechanistic models of tumor growth, hierarchical Bayesian models fit with Stan to a large portfolio of completed trials, and proper statistical adjustments (poststratification) to account for the differences between the sample and the target population. We are excited to share what we have built. If you are coming to ACoP in Orlando next week, please grab a demo time slot or reach out by email at acop2019@generable.com.

### The big picture: disease area view

One of the key questions we should be asking is whether treatments are getting better over time. In oncology, tumors are typically classified by histology or by primary site, like Non-Small Cell Lung Cancer (NSCLC) or prostate cancer. We refer to these as disease areas, and we built a disease area view where you can see the time trends for completed trials.

Another question is what the right measure of efficacy is or, more specifically, what should be on the y-axis. There are many possibilities, but in oncology we think the measure that is most meaningful to patients and clinicians is overall survival (OS). OS is not always available, particularly in early trials, where a common measure of efficacy is the Objective Response Rate (ORR) based on the RECIST criteria [1]. But ORR has many problems. By constructing large, meta-analytic, patient-level models, we are working to overcome some of them.

### What is ORR and why is it problematic?

RECIST defines categories that are based on the radiological examination of tumor scans, where a distinction is made between on-target and off-target lesions (RECIST itself calls these target and non-target lesions). On-target lesions are identified at baseline, their longest diameters are measured, and the sum of those diameters (SLD) is monitored and reported over time. The time-series observations of the SLD constitute the tumor trajectories, which are discretized into the following response categories:

• Complete Response (CR): on-target lesions are no longer visible on the scan
• Partial Response (PR): at least a 30% reduction in the SLD of the on-target lesions relative to the baseline measurement
• Progressive Disease (PD): at least a 20% increase in the SLD of the on-target lesions from its lowest point together with an absolute increase of at least 5 mm, or the appearance of new lesions
• Stable Disease (SD): not CR, not PR, and not PD. The category is supposed to represent no major changes in tumor trajectories, no new lesions, and no changes in non-target lesions.
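To make the discretization concrete, here is a minimal Python sketch of how an SLD trajectory could be mapped to these categories. It is a simplification under stated assumptions, not the full RECIST 1.1 rule set: it looks only at on-target lesions plus a new-lesion flag, treats an SLD of 0 mm as "no longer visible", uses the "at least 30%" and "at least 20% plus 5 mm" thresholds, and ignores non-target lesions, lymph node rules, and response confirmation. The function name `classify_response` and its arguments are ours for illustration.

```python
from typing import Sequence


def classify_response(sld_mm: Sequence[float], new_lesion: bool = False) -> str:
    """Classify the most recent visit; sld_mm[0] is the baseline SLD in mm."""
    if len(sld_mm) < 2:
        raise ValueError("need a baseline and at least one follow-up measurement")
    baseline, current = sld_mm[0], sld_mm[-1]
    if baseline <= 0:
        raise ValueError("baseline SLD must be positive")

    # Nadir: smallest SLD recorded before the current visit, baseline included.
    nadir = min(sld_mm[:-1])
    growth = current - nadir

    # PD: at least 20% and at least 5 mm above the nadir, or any new lesion.
    if new_lesion or (growth >= 5.0 and growth >= 0.20 * nadir):
        return "PD"
    # CR: simplified here to "the on-target lesions measure 0 mm".
    if current == 0:
        return "CR"
    # PR: at least a 30% reduction relative to baseline.
    if current <= 0.70 * baseline:
        return "PR"
    return "SD"


# Illustrative trajectories (made-up numbers, in mm).
for trajectory in ([100, 69], [100, 5], [100, 60, 75], [100, 95]):
    print(trajectory, "->", classify_response(trajectory))
```

Note that the trajectories `[100, 69]` and `[100, 5]` land in the same category (PR) even though the second tumor has almost disappeared. That loss of information is what discretization costs.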

The ORR is taken to be the percentage of patients that are either in CR or PR. In early phase platform trials, it is not uncommon to have fewer than 30 patients allocated to each treatment arm. To get a sense of how noisy ORR is, the following simulation shows what you might expect to see after running many trials with true ORRs ranging between 0.35 and 0.50, with Cohort A slightly more responsive than Cohort B. This therapeutic range is shown in yellow.
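A simulation of this kind can be sketched in a few lines of Python. This is not the code behind our figure. The arm size (25 patients), the true ORRs (0.45 for Cohort A and 0.40 for Cohort B), and the number of simulated trials are assumptions picked from within the ranges mentioned above, and `simulate_observed_orr` is a name we made up for this example.

```python
import random
import statistics


def simulate_observed_orr(true_orr: float, n_patients: int, n_trials: int, seed: int) -> list:
    """Observed ORR in each of n_trials simulated single-arm cohorts."""
    rng = random.Random(seed)
    observed = []
    for _ in range(n_trials):
        # Each patient responds independently with probability true_orr.
        responders = sum(rng.random() < true_orr for _ in range(n_patients))
        observed.append(responders / n_patients)
    return observed


N_PATIENTS = 25  # assumed arm size (the text gives only an upper bound of 30)
N_TRIALS = 2000  # assumed number of simulated trials

cohort_a = simulate_observed_orr(0.45, N_PATIENTS, N_TRIALS, seed=1)  # assumed true ORR
cohort_b = simulate_observed_orr(0.40, N_PATIENTS, N_TRIALS, seed=2)  # assumed true ORR

for name, obs in (("A", cohort_a), ("B", cohort_b)):
    cuts = statistics.quantiles(obs, n=20)  # cut points at 5%, 10%, ..., 95%
    print(f"Cohort {name}: mean={statistics.mean(obs):.3f}, "
          f"5%-95% range={cuts[0]:.2f} to {cuts[-1]:.2f}")

# How often the truly weaker Cohort B shows the higher observed ORR.
wrong_order = sum(b > a for a, b in zip(cohort_a, cohort_b)) / N_TRIALS
print(f"B looks better than A in {wrong_order:.0%} of simulated trials")
```

The last line is the practical worry: with arms this small, the less responsive cohort comes out ahead in a sizable share of simulated trials.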

Consider that using just one realization from this distribution, a clinical team is expected to pick one or two “best performing” arms to advance to late-stage testing or stop the program completely. (Notice that there is no control group here.) The noisiness of the estimator is just one problem with the ORR. A bigger problem is that the association between ORR and OS seems to be quite weak [2].

### From early trials to late-stage predictions and decisions

Given the problems cited above, it is difficult to imagine decision criteria that could be constructed with ORR that would lead to either detecting good candidates for late-stage testing or rejecting candidates from further study.

We propose a better approach that consists of:

1. Modeling tumor trajectories directly instead of discretizing continuous measurements.
2. Using multi-trial meta-analysis to estimate drug effects.
3. Assessing the impact of tumor dynamics (SLD and other biomarkers) on OS by utilizing Bayesian Joint Models [3] fit with Stan [4].
4. Adjusting the estimates to reflect the characteristics of the Phase III population by using multilevel regression and poststratification (MRP) [5].
5. Computing the Probability of Technical Success in late-stage trials by mimicking the classical analysis that is typically performed by biostatisticians.
6. Given the estimated PTS, applying decision theory by considering costs and benefits of the late-stage program under drug success and failure scenarios.
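Steps 4 through 6 can be illustrated with a toy Python sketch. It is not our platform's implementation. It assumes the model has already produced posterior draws of stratum-level log hazard ratios for OS (here replaced by made-up normal draws), that the classical Phase III analysis is a one-sided test at level 0.025 in a 1:1 randomized trial, and that the estimated log hazard ratio is approximately normal with standard error 2 / sqrt(number of deaths). The stratum names, population shares, event count, and payoff figures are all hypothetical.

```python
import math
import random
from statistics import NormalDist
from typing import Mapping, Sequence

STD_NORMAL = NormalDist()


def poststratify(effect_by_stratum: Mapping[str, float], target_share: Mapping[str, float]) -> float:
    """Average stratum-level effects using the target population's stratum shares."""
    total = sum(target_share.values())
    return sum(effect_by_stratum[s] * share for s, share in target_share.items()) / total


def prob_significant(log_hr: float, n_events: int, alpha: float = 0.025) -> float:
    """Chance that a 1:1 randomized trial with n_events deaths rejects HR >= 1."""
    se = 2.0 / math.sqrt(n_events)  # normal approximation for the log hazard ratio
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha)
    return STD_NORMAL.cdf(-log_hr / se - z_crit)


def probability_of_technical_success(log_hr_draws: Sequence[float], n_events: int) -> float:
    """Average the classical test's success probability over posterior draws."""
    return sum(prob_significant(d, n_events) for d in log_hr_draws) / len(log_hr_draws)


def expected_net_value(pts: float, value_if_success: float, program_cost: float) -> float:
    """Expected payoff of the late-stage program; the cost is paid either way."""
    return pts * value_if_success - program_cost


# Made-up stand-in for posterior draws of stratum-level log hazard ratios.
rng = random.Random(2019)
phase3_share = {"stratum_1": 0.3, "stratum_2": 0.7}  # hypothetical Phase III mix
draws = [
    poststratify(
        {"stratum_1": rng.gauss(-0.35, 0.15), "stratum_2": rng.gauss(-0.15, 0.15)},
        phase3_share,
    )
    for _ in range(4000)
]

pts = probability_of_technical_success(draws, n_events=300)
value = expected_net_value(pts, value_if_success=500.0, program_cost=150.0)
decision = "go" if value > 0 else "no-go"
print(f"PTS = {pts:.2f}, expected net value = {value:.0f}, decision = {decision}")
```

Averaging the test's success probability over posterior draws, rather than plugging in a single point estimate, is what lets uncertainty about the drug effect flow through to the PTS and from there into the go/no-go decision.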

Of course, just applying these steps does not guarantee that the approach will be successful. We need to do a lot of work to build up these models and subject them to rigorous testing; for that, we generally follow the Bayesian workflow [6].

I hope to write more about this in the future. In the meantime, if you are curious about how we implemented this approach in our platform, please stop by our booth at ACoP or contact us directly to set up a demo.

We would like to thank AstraZeneca and in particular the AZ project lead Sergey Aksenov for supporting our work.

### References

[1] Eisenhauer, E. A., P. Therasse, J. Bogaerts, L. H. Schwartz, D. Sargent, R. Ford, J. Dancey, et al. “New Response Evaluation Criteria in Solid Tumours: Revised RECIST Guideline (Version 1.1).” European Journal of Cancer (Oxford, England: 1990) 45, no. 2 (January 2009): 228–47. https://doi.org/10.1016/j.ejca.2008.10.026.

[2] Blumenthal, Gideon Michael, Stella Karuri, Sean Khozin, Dickran Kazandjian, Hui Zhang, Lijun Zhang, Shenghui Tang, Rajeshwari Sridhara, Patricia Keegan, and Richard Pazdur. “Overall Response Rate (ORR) as a Potential Surrogate for Progression-Free Survival (PFS): A Meta-Analysis of Metastatic Non-Small Cell Lung Cancer (MNSCLC) Trials Submitted to the U.S. Food and Drug Administration (FDA).” Journal of Clinical Oncology 32, no. 15_suppl (May 20, 2014): 8012–8012. https://doi.org/10.1200/jco.2014.32.15_suppl.8012.

[3] Brilleman, Samuel L., Michael J. Crowther, Margarita Moreno-Betancur, Jacqueline Buros Novik, James Dunyak, Nidal Al-Huniti, Robert Fox, Jeff Hammerbacher, and Rory Wolfe. “Joint Longitudinal and Time-to-Event Models for Multilevel Hierarchical Data.” ArXiv:1805.06099 [Stat], May 15, 2018. http://arxiv.org/abs/1805.06099.

[4] Carpenter, Bob, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, et al. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76, no. 1 (2017). https://doi.org/10.18637/jss.v076.i01.

[5] Gelman, Andrew, and Thomas C. Little. “Poststratification Into Many Categories Using Hierarchical Logistic Regression,” 1997.

[6] Gabry, Jonah, Daniel Simpson, Aki Vehtari, Michael Betancourt, and Andrew Gelman. “Visualization in Bayesian Workflow,” September 5, 2017. https://arxiv.org/abs/1709.01449.