Cancer Vulnerability Explorer

To use CaVu, the main task for the user is to define a cancer subtype of interest. There are two ways of doing this, by histology and by mutation, which reflect the most common subtyping strategies. Examples of histology-based subtype definitions are “lung adenocarcinoma” and “squamous cell carcinoma”. Subtyping by mutation enables definitions such as “BRAF^V600E melanoma”, “BRAF mutant melanoma”, or “all BRAF mutant pathologies”.

Subtype Definition

To perform this selection, the user enters the desired keyword(s) into the appropriate column header, and the table is automatically filtered to the matching cell lines. By entering different filters into different column headers (e.g., “V600E” under mutation and “skin” under tissue), the user can easily construct compound definitions. Once the desired subtype is defined, the user presses the “Select All” button. If necessary, subsequent filters can be added and the “Deselect All” button used to remove any cell lines with undesired characteristics. The user can also toggle the selection of an individual cell line with a single click on its row.
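The compound filtering described above can be illustrated with a minimal Python sketch. The annotation records, field names, and the `filter_cell_lines` helper are hypothetical and are not CaVu's actual data model or code; the sketch only assumes that each column filter is a case-insensitive keyword match and that filters on different columns are combined with a logical AND.

```python
# Hypothetical cell line annotations; names and fields are illustrative only.
CELL_LINES = [
    {"name": "CL1", "tissue": "skin", "histology": "melanoma", "mutation": "BRAF V600E"},
    {"name": "CL2", "tissue": "skin", "histology": "melanoma", "mutation": "NRAS Q61R"},
    {"name": "CL3", "tissue": "large intestine", "histology": "adenocarcinoma", "mutation": "BRAF V600E"},
    {"name": "CL4", "tissue": "skin", "histology": "melanoma", "mutation": "BRAF V600K"},
]


def filter_cell_lines(rows, **column_filters):
    """Keep rows where every filtered column contains its keyword (case-insensitive)."""
    def matches(row):
        return all(
            keyword.lower() in row[column].lower()
            for column, keyword in column_filters.items()
        )
    return [row for row in rows if matches(row)]


# Compound definition: "V600E" under mutation AND "skin" under tissue.
selected = filter_cell_lines(CELL_LINES, mutation="V600E", tissue="skin")
print([row["name"] for row in selected])  # ['CL1']
```

Dropping the tissue filter in this sketch would widen the definition to all BRAF V600E cell lines regardless of lineage.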

When defining the subtype of interest, it is important to make sure that the subtype contains 6 or more cell lines. Our power analysis shows that subtypes with fewer cell lines provide insufficient statistical power. One way to increase the cell line count, especially when using mutation-based subtype definitions, is to include cell lines from other lineages. For example, the user might select ARID1A deleted cell lines of all lineages as opposed to those from only one tissue. The user should take care that any such definition makes biological sense.

The mutation data are from the Cancer Cell Line Encyclopedia.

Vulnerability Identification

When done defining your subtype, click the “Execute” button to identify genes whose knockdown is selectively lethal to the subtype of interest. In the resulting page, the columns are as follows:

KD Target: The knocked down gene. Each row assesses the hypothesis that the user-specified cancer subtype is vulnerable to the inhibition of the gene in this column.
#Lethal in query: The number of cell lines in the specified subset for which the knockdown of this gene is lethal.
#Lethal total: The number of cell lines in the entire dataset for which the knockdown of this gene is lethal.
Query set size: The number of cell lines in the subtype defined by the user.
p-value: The p-value of the hypothesis that the knockdown of this gene is specifically lethal to the query set. This is measured using the hypergeometric enrichment test. We discuss how to interpret this value below.
q-value: The false discovery rate (FDR) for the hypothesis that the knockdown of this gene is specifically lethal to the user-defined subtype as calculated by the Benjamini-Hochberg multiple hypothesis testing correction method. We recommend using this value to evaluate target candidates.
Lethal CLs: The names of the cell lines for which the knockdown of this gene is lethal. Note that the number of cell lines in this column is equal to the number in the #Lethal in query column.
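To make the relationship between these columns and the p-value concrete, the following is a minimal sketch of a hypergeometric enrichment test in Python. It assumes a one-sided (upper-tail) test and takes the total number of cell lines in the dataset as an extra input, which is not one of the displayed columns; the function name and the example numbers are hypothetical and do not come from CaVu.

```python
from math import comb


def hypergeom_enrichment_p(lethal_in_query, lethal_total, query_size, total_cell_lines):
    """One-sided P(X >= lethal_in_query) under the hypergeometric null.

    The null model: the query set is a random draw of `query_size` cell lines
    from `total_cell_lines`, of which `lethal_total` are sensitive to the knockdown.
    """
    N, K, n, k = total_cell_lines, lethal_total, query_size, lethal_in_query
    upper = min(K, n)  # cannot observe more lethal hits than either bound
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical row: 6 of 8 query cell lines are sensitive,
# while only 12 of 400 cell lines are sensitive overall.
p = hypergeom_enrichment_p(lethal_in_query=6, lethal_total=12,
                           query_size=8, total_cell_lines=400)
print(f"p-value: {p:.3e}")
```

A small p-value here means that the sensitive cell lines are concentrated in the query set far more than a random draw of the same size would explain.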

The top KD target genes in this page constitute drug target candidates against the subtype of interest because their inhibition is lethal to the user-defined subtype, but not to the other cell lines. For target identification purposes, this is the endpoint of the analysis.

The user can choose to use the knockdown data from Project DRIVE by Novartis or an integration of the DRIVE, Achilles, and Marcotte datasets as provided by Demeter2. DRIVE uses significantly more shRNAs per gene but covers a smaller number of genes and cell lines (around 8k genes and 400 cell lines). Consistent with this deeper per-gene coverage, in our benchmarking results we found DRIVE to be more effective than Demeter2 in identifying Cancer Gene Census genes. However, DRIVE is not usable if the disease of interest is not covered by sufficiently many cell lines in DRIVE, or if major genes suspected of being vulnerabilities are absent. In those cases, the larger coverage of Demeter2 (17k genes, 500 cell lines) presents a strong alternative.

Interpreting the Results

Here, the strength of the hypothesis that a specific gene is a candidate target is represented by a p-value. However, because thousands of such hypotheses are tested at once, we correct for multiple hypothesis testing using the Benjamini-Hochberg method. The False Discovery Rate (FDR) is represented in the q-value column. We strongly urge the user to base their decisions on the q-value. The p-value is provided mainly for transparency purposes.
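For readers who want to see how p-values relate to q-values, here is a minimal sketch of the Benjamini-Hochberg adjustment in Python. It is the standard textbook procedure rather than CaVu's own code, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p-values for four knockdown targets.
p_values = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(p_values, benjamini_hochberg(p_values)):
    print(f"p = {p:.3f}  q = {q:.3f}")
```

Because each q-value depends on the rank of its p-value among all hypotheses tested, a q-value is always at least as large as the corresponding p-value.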

This is an exploratory analysis, not a confirmatory one. For exploratory analyses, an FDR threshold of 0.25 (meaning that about 1 in 4 reported discoveries is expected to be false) has previously been recommended in popular bioinformatics tools. We concur that this is a reasonable cutoff for most purposes. However, we underline that this should not be viewed as a “set-in-stone” number. The user has to determine the threshold appropriate for their own purpose. When making this decision, the cost of experimentally testing a false discovery is a key consideration.

We would like to caution the user that the q-values we report are valid for exactly one run of the method from beginning (subtype definition) to end (vulnerability identification). If the user goes back, changes the subtype definition, then runs again to repeat this process over and over, then the multiple hypothesis correction is no longer valid. In that case, the user should use an appropriate correction method.

Drug Candidate Identification

One of the most important motivations for finding targets is to address them chemically with drugs. To automate this process, we link the targets to the small molecules that are known to target them. If a gene can be linked to drug candidates in this manner, we color it orange and make it clickable. Clicking such a target shows the drugs that target its protein product.

To do this, we first use the Ensembl dataset to link the knockdown target genes to their protein products. Then we use the STITCH database to link these proteins to chemicals known to target them. At this point, we apply a set of rough drug-likeness filters by filtering out molecules that are too small or excessively large.

One major consideration when doing this is the similarity to existing approved drugs. This can be useful in two ways, the first being that we can help users who are interested in repurposing existing drugs for new indications. The second is that the users who seek to patent new drugs might be interested in finding drugs that are not similar to existing drugs. To help both types of users, we show the similarity of each chemical to an existing approved drug.

We used the DrugBank dataset to retrieve a set of “approved” drugs and their chemical structures. We then used the RDKit software to compute the chemical similarity (2D similarity using MACCS fingerprints) between every chemical in STITCH and every approved drug. For each chemical, we stored the approved drug with the highest similarity along with that similarity value. We show this information as well, which can help users by directing them either towards approved drugs or away from them if they are interested in novel chemistries.
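The nearest-approved-drug lookup can be sketched as follows. The text above does not name the similarity coefficient, so this sketch assumes the Tanimoto coefficient, which is a common choice for fingerprint comparison. Fingerprints are represented as plain Python sets of "on" bit indices so that the example stays self-contained; in practice such sets could come from the on-bits of RDKit MACCS key fingerprints. All names and bit sets below are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def most_similar_approved(chemical_bits, approved_drugs):
    """Return (drug_name, similarity) for the approved drug closest to the chemical.

    `approved_drugs` maps a drug name to its set of on-bit indices.
    """
    scored = ((name, tanimoto(chemical_bits, bits))
              for name, bits in approved_drugs.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical fingerprints (bit indices are illustrative, not real MACCS keys).
approved = {"drug_a": {1, 2, 3, 4}, "drug_b": {1, 2}}
name, similarity = most_similar_approved({1, 2, 3}, approved)
print(name, similarity)  # drug_a 0.75
```

A user interested in repurposing would look for values close to 1, while a user seeking novel chemistry would look for low values.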

Overall Workflow

CaVu allows a user to start with the cancer subtype they want to target and end up with a drug. Briefly, the user starts by selecting a small set of cancer cell lines all of which have a particular property that is absent from all other cell lines. Then, we find those genes whose knockdown specifically kills cell lines in this subset. This means that the cancer cell lines with the user-specified characteristic require the function of that gene to survive. We then show drugs that stop that gene’s product from working. Therefore, these drugs are hypothesized to interfere with a process that is essential to the functioning of the cancer subtype the user specified. Using CaVu requires no programming knowledge, and thus it offers bench scientists a chance to test targeted, computer-generated predictions without any special training.

Graphical Abstract

[Figure: graphical abstract pairing each data source (Data) with the step that uses it (Operation): Cancer Cell Line Annotations → User Defines Subtypes; Gene Knockdown on Cancer Cell Lines → Vulnerability Analysis; STITCH v5 → Protein-chemical interaction mapping; DrugBank → Drug Similarity Report.]

Team

Murat Cobanoglu

Vasanth Siruvallur Murali

Meyer Zinn

Venat Malladi

Jonathan Gesell

Mingzhu Nie
