************
Output files
************

Overview
========

The default output of the program is a heatmap and the
corresponding data as an XLSX spreadsheet.

Heatmap
=======

In the heatmap, x-axis are the gene clusters, and y-axis show
peak-sets expanded to different genomic distances.

The cell color represents -log10(p-value) of enrichment, and
annotation of cells are the number of common genes for that
particular combination of gene cluster (x-axis) and gene-set
derived from the expanded peak-set (y-axis). For example:

.. image:: images/example_heatmap.png
   :width: 400
   :alt: Example heatmap from PEGS

If TADs are used, then the additional subplot at the bottom shows
similar enrichment as above, but without individual distances
for peak-sets (because each genomic interval is expanded to
distance/bounary defined in the TADs BED file). For example:

.. image:: images/example_with_tads_heatmap.png
   :width: 400
   :alt: Example heatmap with TADs from PEGS

By default the heatmap is output as a PNG image, however this
can be changed (along with other properties of the heatmap such
as colours and axis labels) as described in
:ref:`customising_the_heatmap`.

XLSX file
=========

This spreadsheet includes all the data (including p-values, and
common genes) used to generate the heatmap; for example:

.. image:: images/example_results_xlsx.png
   :width: 400
   :alt: Example XLSX output from PEGS

Note that as all the data used in the heatmap is also output to
the XLSX file, the user can use this to build their own custom
heatmaps.

Optional outputs
================

Intersection files
------------------

By default the program removes all working data on successful
completion, however it is possible to keep the intermediate
intersection files by specifying the ``-k``
(``--keep-intersection-files``) option. The intersection files
will be written to the directory ``intersection_beds``.

These files are generated by intersecting the expanded peak-set
BED file (for a given distance) and the ``GENE_INTERVALS`` input
file (reference gene intervals file for all genes). They can be
used for further analysis, for example finding common gene names
and overlapping peaks, which can be used for motif enrichment etc.

Raw p-value and count data
--------------------------

These can be output using the ``--dump-raw-data`` option.