Step 1: Preparation

If you haven't already done so, download the zip file tutorial_files.zip and extract it. It should contain the following files:

  • Mouse gene database
  • Small pathway set with 8 mouse pathways
  • Experimental data in Excel format

Step 2: Launching PathVisio

Start PathVisio by going to the Download page and clicking the download link. When asked to open or save the file, select open. After this step, when prompted to save a shortcut to your desktop, select cancel.

Step 3: Data import

The data is in the file GCRMA_ES_EB.xls, an Excel spreadsheet. PathVisio can't read this format directly, so you first need to open the file in Excel and save it as tab-delimited text.

While the file is open in Excel, you can examine the dataset. Can you figure out what the columns mean? How many samples have been measured in this dataset, and how many conditions have been compared?

Step by step:

  1. Open GCRMA_ES_EB.xls in Excel.
  2. Click File > Save as. Change the drop down menu next to Save as type so that it lists Text (Tab delimited).
  3. Click Save. A popup will appear warning about incompatible features. Click Yes.
  4. Close Excel. Select No when it asks whether to save again; this is not necessary.
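
If you want to double-check the exported file before importing it, the following minimal Python sketch reads a tab-delimited text file and reports its column names and the number of data rows. The function name summarize_tab_file and the latin-1 encoding are assumptions made for this illustration; they are not part of PathVisio or of the tutorial files.

```python
import csv


def summarize_tab_file(path):
    """Return (column_names, number_of_data_rows) for a tab-delimited file.

    Assumption: the first line holds the column headers. latin-1 is used
    because it can decode any byte sequence; adjust it if your export
    uses a different encoding.
    """
    with open(path, newline="", encoding="latin-1") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        # Count only non-empty rows after the header line.
        row_count = sum(1 for row in reader if row)
    return header, row_count
```

Calling summarize_tab_file on the text file you just saved should list the same column headers you saw in Excel. If everything ends up in a single column, the file was probably not saved as tab-delimited text.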

Now switch back to PathVisio.

  1. Select Data > Import expression data.
  2. For the input file, select the tab delimited text file you just created.
  3. Leave the output file as it is.
  4. Make sure the gene database file name ends with Mm_Lite.pgdb (the file can be found in the tutorial_files folder). Click Next.
  5. Make sure the data delimiter is set to "tab". Click Next.
  6. Make sure the primary identifier column is set to "Probe set".
  7. Make sure the system code column is set to "System Code". Click Next. Data import may take a few minutes.

When the import is finished, you should see a message like this:

1034 genes were added successfully to the expression dataset
273 exceptions occurred

You can safely ignore these 273 exceptions. They are caused by unrecognized genes, which are usually unknown genes; it is unavoidable that a certain percentage of the data is unknown.
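
To put the two numbers from the message in proportion, here is a small arithmetic sketch. It assumes that the 273 exceptions are rows counted separately from the 1034 genes that were added; the tutorial does not state this explicitly.

```python
added = 1034       # genes added successfully, from the import message
exceptions = 273   # exceptions reported, from the import message

# Assumption: exceptions are separate rows, so the total is the sum of both.
total_rows = added + exceptions
unrecognized_percent = round(100 * exceptions / total_rows, 1)

print(total_rows)            # 1307
print(unrecognized_percent)  # 20.9
```

Under that assumption, roughly one row in five could not be recognized.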

When the Finish button becomes clickable, click it. A file named gcrma_ES-EB-expression.pgex has now been created; it stores all the gene expression data.

Step 4: Create a color set

  1. Go to File > Open and select the file Mm_ESC_Pluripotency_Pathways.gpml that you downloaded at the beginning of the tutorial.
  2. Go to Data > Select expression dataset and select the gcrma_ES-EB-expression.pgex that you created in the previous step.
  3. Go to Data > Visualization options. A dialog window pops up.
  4. Enable the check box Text Label. A small panel shows up, but you don't need to change anything there.
  5. Enable the check box Expression as color. A panel shows up.
  6. Scroll down a bit in the table and mark the check box for the item log Fold EB vs ES.
  7. Create a new color set by clicking the tool icon in the bottom right and selecting New.
  8. In the new dialog, select the Gradient check box.

Now the gene boxes on the pathway should have changed color based on their expression values. For example, down-regulated genes are blue, up-regulated genes are yellow, and unchanged genes are grey (the coloring depends on the exact color options you selected).
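
To illustrate the idea behind a gradient color set, the sketch below maps a value to a color by linear interpolation between blue, grey, and yellow. This is not PathVisio's implementation; the function names, the RGB values, and the range from -1 to 1 are all assumptions chosen for this example.

```python
BLUE = (0, 0, 255)
GREY = (128, 128, 128)
YELLOW = (255, 255, 0)


def interpolate(color_a, color_b, t):
    """Blend two RGB colors; t=0 gives color_a, t=1 gives color_b."""
    return tuple(round(a + (b - a) * t) for a, b in zip(color_a, color_b))


def gradient_color(value, low=-1.0, mid=0.0, high=1.0):
    """Map a value to an RGB color on a blue-grey-yellow gradient.

    Values outside [low, high] are clamped to the nearest end of the range.
    """
    value = max(low, min(high, value))
    if value <= mid:
        return interpolate(BLUE, GREY, (value - low) / (mid - low))
    return interpolate(GREY, YELLOW, (value - mid) / (high - mid))
```

With these assumed settings, gradient_color(-1.0) returns pure blue, gradient_color(0.0) returns grey, and any value of 1.0 or higher returns yellow. Values in between get a blend of the two nearest colors.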

Open other pathways. Which pathway has changed the most?

Step 5: Search for regulated pathways

  1. Go to Data > Statistics. In the Expression field, enter the following:

[Fold EB vs ES] > 1.2 AND [rawp] < 0.05

  2. In the Pathway directory field, select the pathway directory that comes with the tutorial files you downloaded in the beginning.
  3. Click Calculate.

After the calculations have finished, you get a list of pathways, ordered by the number of genes that have a fold change > 1.2 (i.e. have increased more than 20% in expression) and a significant rawp value (significance level 0.05).

In the result table you see the values for r, the number of genes that meet your criterion, and n, the total number of genes in the pathway. From these values a percentage and a z-score are calculated. Note how the z-score and percentages are related for different pathways.
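
To see why the same percentage can give different z-scores, here is a sketch of a z-score in the hypergeometric form known from MAPPFinder. Whether PathVisio uses exactly this formula is an assumption here. The totals N (all measured genes) and R (all measured genes that meet the criterion) do not appear in the text above, and the numbers in the usage note are made up.

```python
import math


def pathway_z_score(r, n, R, N):
    """Z-score for a pathway with r of n genes meeting the criterion.

    R of the N genes in the whole dataset meet the criterion
    (R and N are assumed inputs, not values from the tutorial).
    """
    expected = n * R / N
    variance = n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1))
    if variance <= 0:
        return 0.0  # degenerate case, e.g. no gene or every gene meets the criterion
    return (r - expected) / math.sqrt(variance)


def percentage(r, n):
    """Percentage of genes on the pathway that meet the criterion."""
    return 100.0 * r / n
```

For example, with the made-up totals R=100 and N=1000, a pathway with r=5 and n=10 and a pathway with r=50 and n=100 both score 50%, but the larger pathway gets the higher z-score, because the same percentage is less likely to arise by chance in a larger pathway.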

You can do the same calculation for down-regulated genes by using the following expression:

[Fold EB vs ES] < -1.2 AND [rawp] < 0.05
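
To make the two criteria concrete, the following sketch applies them to a few hand-written rows and counts r and n for one pathway. The row layout (a dictionary per gene), the function names, and the example values are hypothetical; only the column names and thresholds come from the expressions above.

```python
def is_up_regulated(row):
    """Mirrors: [Fold EB vs ES] > 1.2 AND [rawp] < 0.05"""
    return row["Fold EB vs ES"] > 1.2 and row["rawp"] < 0.05


def is_down_regulated(row):
    """Mirrors: [Fold EB vs ES] < -1.2 AND [rawp] < 0.05"""
    return row["Fold EB vs ES"] < -1.2 and row["rawp"] < 0.05


def count_r_and_n(rows, criterion):
    """Return (r, n): genes meeting the criterion and total genes."""
    r = sum(1 for row in rows if criterion(row))
    return r, len(rows)


# Hypothetical genes on one pathway (not taken from the tutorial dataset).
example_pathway = [
    {"gene": "A", "Fold EB vs ES": 2.5, "rawp": 0.01},
    {"gene": "B", "Fold EB vs ES": 1.1, "rawp": 0.001},
    {"gene": "C", "Fold EB vs ES": -3.0, "rawp": 0.02},
    {"gene": "D", "Fold EB vs ES": 1.8, "rawp": 0.20},
]

print(count_r_and_n(example_pathway, is_up_regulated))    # (1, 4)
print(count_r_and_n(example_pathway, is_down_regulated))  # (1, 4)
```

Gene B fails the up-regulation test because its fold change is too small, and gene D fails because its rawp value is not significant; both conditions must hold at the same time.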