Cancer Vulnerability Explorer

Identify targets and drug candidates for any cancer subtype.

Choose how to define your subtype of interest

Histology Mutation


To use CaVu, the main task for the user is to define a cancer subtype of interest. There are two ways of doing this, histology and mutation, which reflect the most common subtyping strategies. Examples of histology-based subtype definitions are “lung adenocarcinoma” and “squamous cell carcinoma”. Subtyping by mutation enables definitions such as “BRAFV600E melanoma”, “BRAF mutant melanoma”, or “all BRAF mutant pathologies”.

Subtype Definition

To perform this selection, the user enters the desired keyword(s) into the appropriate column header, and the table is filtered automatically on that property. By entering different filters into different column headers (e.g., “V600E” under mutation and “skin” under tissue), the user can construct compound definitions. Once the desired subtype is defined, the user presses the “Select All” button. If necessary, further filters can be added and the “Deselect All” button used to remove any cell lines with undesired characteristics. The user can also toggle the selection of an individual cell line with a single click on its row.

When defining the subtype of interest, make sure that it contains at least 6 cell lines. Our power analysis shows that subtypes with fewer cell lines have insufficient statistical power. One way to increase the cell line count, especially with mutation-based subtype definitions, is to include cell lines from other lineages: for example, ARID1A deleted cell lines of all lineages rather than those from a single tissue. The user should take care that any definition makes biological sense.
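The filtering and size check described above can be sketched in a few lines of Python. This is only an illustration of the logic, not CaVu's actual implementation: the cell line names, field names, and the helper functions `apply_filters`, `select_subtype`, and `has_enough_lines` are all hypothetical.

```python
MIN_CELL_LINES = 6  # minimum subtype size recommended in the text

# Toy annotation rows; names and values are made up for illustration.
cell_lines = [
    {"name": "LINE_A", "gene": "BRAF", "mutation": "V600E", "tissue": "skin"},
    {"name": "LINE_B", "gene": "BRAF", "mutation": "V600E", "tissue": "large_intestine"},
    {"name": "LINE_C", "gene": "BRAF", "mutation": "G469A", "tissue": "skin"},
    {"name": "LINE_D", "gene": "KRAS", "mutation": "G12D", "tissue": "skin"},
]


def apply_filters(rows, filters):
    """Keep rows where every filtered column contains its keyword (case-insensitive)."""
    return [
        row
        for row in rows
        if all(keyword.lower() in row[column].lower() for column, keyword in filters.items())
    ]


def select_subtype(rows, filters, deselect=()):
    """Mimic 'Select All' on the filtered rows, then drop manually deselected lines."""
    matched = {row["name"] for row in apply_filters(rows, filters)}
    return matched - set(deselect)


def has_enough_lines(selection, minimum=MIN_CELL_LINES):
    """Check the selection against the recommended minimum subtype size."""
    return len(selection) >= minimum


# Compound definition: V600E under mutation AND skin under tissue.
selected = select_subtype(cell_lines, {"mutation": "V600E", "tissue": "skin"})
print(selected, has_enough_lines(selected))
```

With this toy table the compound filter matches a single cell line, so the size check fails; dropping the tissue filter is the kind of broadening the text suggests for reaching the minimum.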

The mutation data are from the Cancer Cell Line Encyclopedia.

Vulnerability Identification

When done defining your subtype, click the “Execute” button to identify genes whose knockdown is selectively lethal to the subtype of interest. In the resulting page, the columns are as follows:

The top KD target genes in this page constitute drug target candidates against the subtype of interest because their inhibition is lethal to the user-defined subtype, but not to the other cell lines. For target identification purposes, this is the endpoint of the analysis.

The user can choose to use the knockdown data from Project DRIVE by Novartis or an integration of the DRIVE, Achilles, and Marcotte datasets as provided by Demeter2. DRIVE uses significantly more shRNAs per gene but covers fewer genes and cell lines (around 8k genes and 400 cell lines). Owing to this deeper shRNA coverage, our benchmarking found DRIVE to be more effective than Demeter2 at identifying Cancer Gene Census genes. However, DRIVE is not usable if the disease of interest is not covered by enough cell lines in DRIVE, or if major genes suspected of being vulnerabilities are absent. In those cases, the larger coverage of Demeter2 (17k genes, 500 cell lines) presents a strong alternative.

Interpreting the Results

Here, the strength of the hypothesis that a specific gene is a candidate target is represented by a p-value. However, because thousands of such hypotheses are tested at once, we correct for multiple hypothesis testing using the Benjamini-Hochberg method. The False Discovery Rate (FDR) is represented in the q-value column. We strongly urge the user to base their decisions on the q-value. The p-value is provided mainly for transparency purposes.
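For readers who want to see how p-values become q-values, the following is a minimal sketch of the standard Benjamini-Hochberg adjustment. It is not CaVu's own code, and the example p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as the input p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each q-value is the minimum of
    # p * m / rank over its own rank and all larger ranks (keeps q monotone in p).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Illustrative p-values for five hypothetical genes.
p = [0.001, 0.008, 0.039, 0.041, 0.6]
q = benjamini_hochberg(p)
print(q)
```

Note that the q-value of a gene depends on how many hypotheses were tested in total, which is why it, rather than the raw p-value, is the appropriate basis for decisions.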

This is an exploratory analysis, not a confirmatory one. For exploratory analyses, an FDR threshold of 0.25 (meaning that about 1 in 4 reported discoveries is expected to be false) has previously been recommended in popular bioinformatics tools. We concur that this is a reasonable cutoff for most purposes. However, it should not be viewed as a “set-in-stone” number: the user has to determine the threshold appropriate for their own purpose. When making this decision, the cost of testing a false discovery is a key consideration.

We would like to caution the user that the q-values we report are valid for exactly one run of the method from beginning (subtype definition) to end (vulnerability identification). If the user goes back, changes the subtype definition, then runs again to repeat this process over and over, then the multiple hypothesis correction is no longer valid. In that case, the user should use an appropriate correction method.

Drug Candidate Identification

One of the most important motivations for finding targets is to address them chemically with drugs. To automate this process, we link the targets to the small molecule chemicals that are known to target them. If a gene can be linked to drug candidates in this manner, we color it orange and make it clickable. Clicking any such target shows the drugs that target its protein product.

To do this, we first use the Ensembl dataset to link the knockdown target genes to their protein products. Then we use the STITCH database to link these proteins to chemicals known to target them. At this point, we apply a set of rough drug-likeness filters by filtering out molecules that are too small or excessively large.

One major consideration when doing this is the similarity to existing approved drugs. This can be useful in two ways, the first being that we can help users who are interested in repurposing existing drugs for new indications. The second is that the users who seek to patent new drugs might be interested in finding drugs that are not similar to existing drugs. To help both types of users, we show the similarity of each chemical to an existing approved drug.

We used the DrugBank dataset to retrieve a set of “approved” drugs and their chemical structures. We then used the RDKit software to compute the chemical similarity (2D similarity using MACCS fingerprints) between every chemical in STITCH and every approved drug. For each chemical, we stored the most similar approved drug along with that similarity value. We show this information as well, which can help users by directing them either towards approved drugs or away from them if they are interested in novel chemistries.
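The nearest-approved-drug lookup can be illustrated with a small sketch. It assumes the similarity measure is the Tanimoto coefficient, which is a common choice for fingerprint comparison but is not stated in the text, and it represents each fingerprint as a plain set of on-bit indices rather than calling RDKit. The drug names and bit sets are invented for illustration.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def most_similar_approved(chemical_bits, approved):
    """Return (drug_name, similarity) for the approved drug closest to the chemical."""
    return max(
        ((name, tanimoto(chemical_bits, bits)) for name, bits in approved.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical fingerprints; real MACCS keys would be computed from structures.
approved_drugs = {
    "DRUG_X": {1, 5, 9, 42},
    "DRUG_Y": {2, 5, 77},
}
candidate = {1, 5, 9, 60}

best_name, best_score = most_similar_approved(candidate, approved_drugs)
print(best_name, round(best_score, 2))
```

A user interested in repurposing would look for high scores in this lookup, while a user seeking novel chemistry would favor chemicals whose best score is low.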

Overall Workflow

CaVu allows a user to start with the cancer subtype they want to target and end up with a drug. Briefly, the user starts by selecting a small set of cancer cell lines all of which have a particular property that is absent from all other cell lines. Then, we find those genes whose knockdown specifically kills cell lines in this subset. This means that the cancer cell lines with the user-specified characteristic require the function of that gene to survive. We then show drugs that stop that gene’s product from working. Therefore, these drugs are hypothesized to interfere with a process that is essential to the functioning of the cancer subtype the user specified. Using CaVu requires no programming knowledge, and thus it offers bench scientists a chance to test targeted, computer-generated predictions without any special training.

Graphical Abstract

Cancer Cell Line Annotations

In CaVu, cancer subtypes are defined through collections of cell lines that all share one exclusive property. In other words, the cell lines in this subset should all have one property which is absent from all other cell lines. Typically, cancer subtypes are defined on the basis of one of two major properties: mutation or histology. We use data from the Cancer Cell Line Encyclopedia to offer the user both options.

User Defines Subtypes

The user specifies whether to use a histology or mutation based subtype definition. They then follow through the presented options to define the subtype of interest. This operation is done by selecting the set of cell lines that exclusively hold the defining property of the subtype. By using the provided filters, the users can easily construct compounded subtype definitions. Some examples of allowed mutation-based subtype definitions in increasing specificity are BRAFmut, BRAFV600mut, or BRAFV600E skin cancer; an alternative option is ARID1Adel cancer from all tissues. Histology based subtype definitions are often more traditional with an example being the non-small cell lung cancer subtype of lung cancer.

Example: BRAF mutation selection

Gene Knockdown on Cancer Cell Lines

Recently, there have been multiple large-scale studies of gene knockdown induced lethality on hundreds of different cell lines. Two important ones are Project DRIVE (~8k genes knocked down on ~400 cell lines, with 20 shRNAs per gene on average) and Project Achilles (~17k genes knocked down on ~500 cell lines, with 6.3 shRNAs per gene on average). Achilles was analyzed using Demeter, which has recently been improved to produce Demeter2. This new version incorporates data from DRIVE as well as Achilles, and a third, smaller dataset (Marcotte). We give the user the option to use either this latest iteration, Demeter2, or DRIVE alone because of its high average shRNA count per gene.

Read more: DRIVE

Read more: Demeter2

Vulnerability Analysis

Using the user-defined subtype and the selected gene knockdown dataset, we automatically perform a statistical test to identify the subtype-specific vulnerabilities. Specifically, we use the hypergeometric enrichment test to identify the genes whose knockdown is specifically lethal to the user-defined subtype compared to the rest of the cell lines. Since we test many hypotheses at once, we calculate the FDR (Benjamini-Hochberg method) to correct for multiple hypothesis testing.
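As a rough illustration of a hypergeometric enrichment test for one gene, the sketch below computes the upper-tail probability from four counts. It assumes each cell line has already been labeled as sensitive or not sensitive to the knockdown; how CaVu makes that call is not described here, so that step and the example counts are assumptions.

```python
from math import comb


def hypergeom_enrichment_p(total, sensitive_total, subtype_size, sensitive_in_subtype):
    """P(X >= sensitive_in_subtype) when drawing subtype_size lines without replacement.

    total                -- all cell lines screened for this gene
    sensitive_total      -- lines (of any subtype) sensitive to the knockdown
    subtype_size         -- lines in the user-defined subtype
    sensitive_in_subtype -- sensitive lines that fall inside the subtype
    """
    denom = comb(total, subtype_size)
    upper = min(subtype_size, sensitive_total)
    tail = sum(
        comb(sensitive_total, i) * comb(total - sensitive_total, subtype_size - i)
        for i in range(sensitive_in_subtype, upper + 1)
    )
    return tail / denom


# Hypothetical gene: 20 lines screened, 5 sensitive, and all 5 of those
# fall inside a 6-line subtype.
p_value = hypergeom_enrichment_p(20, 5, 6, 5)
print(p_value)
```

One such p-value is produced per gene, and the full set is then passed through the Benjamini-Hochberg adjustment to obtain q-values.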

Example: BRAFV600E skin cancer vulnerabilities


The STITCH database ‘stitches’ together multiple different protein-chemical interaction data sources. It uses these data sources to calculate a single summary score that estimates the confidence in the validity of the interaction of each chemical-protein pair.

Read more: STITCH

Protein-chemical interaction mapping

We filtered STITCH v5 to identify all the chemicals that interact with a human protein with high confidence (STITCH self identifies high confidence as being 70% or greater) based exclusively on experimental evidence. This means that each protein-chemical interaction we included was supported by high confidence experimental evidence.
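The filter described above amounts to thresholding the experimental evidence score. The sketch below assumes scores are stored on a 0 to 1000 scale, so that 70% corresponds to 700; the field names, identifiers, and score values are hypothetical and do not reproduce the actual STITCH file layout.

```python
HIGH_CONFIDENCE = 700  # assumed to correspond to the 70% cutoff on a 0-1000 scale

# Toy interaction rows with per-channel scores (all values invented).
interactions = [
    {"chemical": "CHEM_1", "protein": "PROT_A", "experimental": 912, "textmining": 300},
    {"chemical": "CHEM_2", "protein": "PROT_A", "experimental": 450, "textmining": 950},
    {"chemical": "CHEM_3", "protein": "PROT_B", "experimental": 700, "textmining": 0},
]


def high_confidence_experimental(rows, cutoff=HIGH_CONFIDENCE):
    """Keep chemical-protein pairs whose experimental score alone meets the cutoff."""
    return [
        (row["chemical"], row["protein"])
        for row in rows
        if row["experimental"] >= cutoff
    ]


kept = high_confidence_experimental(interactions)
print(kept)
```

In this toy input CHEM_2 is dropped even though its text-mining score is high, because only the experimental channel counts toward the filter.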


DrugBank is a database that combines detailed drug data with comprehensive drug target information. DrugBank contains approval status information for chemicals. Specifically, 1,739 small molecule chemicals are annotated as approved.

Read more: DrugBank

Drug Similarity Report

Not all chemicals make drugs. For the users who are interested in repurposing existing drugs in order to expedite the path to clinic, we provide how similar each predicted chemical is to an approved drug. To do this, we retrieved the chemical structures of all drugs annotated as approved in DrugBank. Then we compared each chemical that was identified as an interaction partner for a cancer-specific target to all the approved drugs using RDKit. For each chemical, we present the drug that was found to be the most similar, along with the similarity percentage. Users interested in repurposing can use this to prioritize existing drugs, while users interested in novel chemistries can do exactly the opposite.

Example: BRAF targeting small molecules



Murat Cobanoglu

Principal Investigator


Vasanth Siruvallur Murali

Postdoctoral Scholar


Meyer Zinn

Computational Scientist

Venat Malladi

Web Developer Team Lead

Jonathan Gesell

Web Developer

Mingzhu Nie

Web Developer


UT Southwestern Medical Center
Lyda Hill Department of Bioinformatics
Suite E4.350, Mail Code 9365
5323 Harry Hines Blvd.
Dallas, TX 75390-9077