Running analyses with PEGS

Basic usage

To run an analysis the basic command line is:

pegs [options] GENE_INTERVALS --peaks PEAKSET [PEAKSET ...] --genes CLUSTER [CLUSTER ...]

where:

  • GENE_INTERVALS is a set of reference transcription start sites (TSSs) for all genes; it can be either the name of a built-in reference set (for example “mm10”) or a file containing BED interval data

  • PEAKSET is a BED file containing input ChIP-seq peaks data (or other genomic intervals)

  • CLUSTER is a file defining a gene cluster

PEGS will then calculate the enrichments (p-values and counts) using a default set of genomic distances around the input intervals in each of the peak-sets.

By default the program outputs a PNG heatmap called pegs_heatmap.png, and an XLSX file with the p-values and count data called pegs_results.xlsx.

The formats and naming conventions for the various files are described in Input files and Output files.

Warning

This is a change in version 0.6.0 to the previous way of specifying peak-sets and gene clusters via directories. That method is no longer supported, but it can be replicated using a command line of the form:

pegs [options] GENE_INTERVALS --peaks PEAKS_DIR/* --genes CLUSTERS_DIR/*
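For scripted runs, the same directory-style invocation can also be assembled from Python. The sketch below uses hypothetical helpers that are not part of PEGS. `files_in` roughly mimics the shell expansion of `DIR/*` (sorted, hidden files skipped). `build_pegs_command` only uses the positional GENE_INTERVALS argument and the `--peaks`/`--genes` options documented above. The directory names `InputPeaks` and `Clusters` are placeholders.

```python
import glob
import os
import shlex
from typing import List, Sequence


def files_in(directory: str) -> List[str]:
    """Roughly mimic the shell glob DIR/*: sorted, hidden files skipped."""
    return sorted(glob.glob(os.path.join(directory, "*")))


def build_pegs_command(gene_intervals: str,
                       peaksets: Sequence[str],
                       clusters: Sequence[str]) -> List[str]:
    """Assemble a PEGS command line as an argument list (hypothetical helper)."""
    if not peaksets or not clusters:
        raise ValueError("need at least one peak-set and one gene cluster")
    return ["pegs", gene_intervals, "--peaks", *peaksets, "--genes", *clusters]


if __name__ == "__main__":
    # Placeholder directories; equivalent to: --peaks InputPeaks/* --genes Clusters/*
    peaks = files_in("InputPeaks")
    clusters = files_in("Clusters")
    if peaks and clusters:
        cmd = build_pegs_command("mm10", peaks, clusters)
        print(shlex.join(cmd))  # show the equivalent shell command
        # To run it (requires PEGS on the PATH):
        # import subprocess; subprocess.run(cmd, check=True)
```

Passing an argument list rather than a single string to `subprocess.run` avoids shell quoting problems with file names that contain spaces.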

Specifying genomic distances (-d, --distances)

The default set of genomic distances used in the enrichment calculations can be overridden with a custom set of distances specified using the -d or --distances option:

pegs mm10 --peaks PEAKSET [PEAKSET ...] --genes CLUSTER [CLUSTER ...] -d [DISTANCE [DISTANCE ...]]

For example:

pegs mm10 --peaks ./InputPeaks/*.bed --genes ./Clusters/*.txt -d 1000 2000

will calculate enrichments within +/-1kb and +/-2kb of the centre of each interval in the input peak-sets.
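The sketch below computes the +/-d windows for a few peaks to make the window definition concrete. It is an illustration under stated assumptions, not PEGS's own code. The centre is taken as the integer midpoint `(start + end) // 2` of each BED interval (0-based, end-exclusive), window starts are clipped at 0, and chromosome ends are not checked. How PEGS then scores genes against these windows is not shown.

```python
from typing import List, NamedTuple, Sequence


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, as in BED
    end: int    # end-exclusive, as in BED


def centred_windows(peaks: Sequence[Interval], distance: int) -> List[Interval]:
    """Return a window of +/- distance bp around the centre of each peak.

    Assumption: the centre is the integer midpoint (start + end) // 2 and
    window starts are clipped at 0 (chromosome ends are not checked).
    """
    windows = []
    for p in peaks:
        centre = (p.start + p.end) // 2
        windows.append(Interval(p.chrom, max(0, centre - distance), centre + distance))
    return windows


if __name__ == "__main__":
    peaks = [Interval("chr1", 10_000, 10_400), Interval("chr2", 500, 700)]
    for d in (1000, 2000):  # the same distances as "-d 1000 2000"
        print(d, centred_windows(peaks, d))
```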

Warning

This is a change in version 0.6.0 to the previous way of specifying distances at the end of the command line, which is no longer supported.

Specifying TADs (-t, --tads)

In addition to individual distances, enrichments within TADs (Topologically Associated Domains) can be calculated by providing a BED file with TAD definitions using the -t/--tads option.

In this case the heatmap for the TADs will be appended to the heatmap for peaks and distances, and the raw data will be appended to the XLSX file.
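When preparing a TAD file for -t, it can help to check which TAD a given position (for example a peak centre or a TSS) falls into. The sketch below does this for a BED-format TAD list. It assumes the first three columns are chrom, start and end, and that TADs on the same chromosome do not overlap. The function names are hypothetical, and the sketch does not reproduce the enrichment calculation PEGS performs within TADs.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Tad = Tuple[int, int]  # (start, end), 0-based and end-exclusive as in BED


def load_tads(bed_lines: Iterable[str]) -> Dict[str, List[Tad]]:
    """Group TADs from BED lines by chromosome, sorted by start."""
    tads: Dict[str, List[Tad]] = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        chrom, start, end = line.split()[:3]
        tads[chrom].append((int(start), int(end)))
    for intervals in tads.values():
        intervals.sort()
    return dict(tads)


def tad_containing(tads: Dict[str, List[Tad]], chrom: str, pos: int) -> Optional[Tad]:
    """Return the TAD on chrom that contains pos, or None (assumes non-overlapping TADs)."""
    intervals = tads.get(chrom, [])
    starts = [s for s, _ in intervals]
    i = bisect.bisect_right(starts, pos) - 1  # last TAD starting at or before pos
    if i >= 0 and intervals[i][0] <= pos < intervals[i][1]:
        return intervals[i]
    return None
```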

Specifying output file names (--name, -m, -x)

The basename for all the output files can be set using the --name option; by default the basename is pegs and the output files will be called pegs_heatmap.png and pegs_results.xlsx.

The names for these output files can also be set explicitly using the -m and -x options:

  • -m: sets the heatmap file name and image format (the format is based on the file extension; e.g. my_heatmap.png will produce a PNG image, my_heatmap.svg will produce an SVG image, etc.)

  • -x: sets the file name for the XLSX file

Note

Using the -m and -x options to explicitly set the output file names will override the implicit file names generated from the basename.
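The naming rules above can be summarised in a few lines of Python. `resolve_output_names` is a hypothetical model of the documented behaviour, not PEGS's own code. In particular, deriving the image format by lower-casing the file extension is an assumption.

```python
import os
from typing import Optional, Tuple


def resolve_output_names(name: str = "pegs",
                         heatmap: Optional[str] = None,
                         xlsx: Optional[str] = None) -> Tuple[str, str, str]:
    """Model the documented --name, -m and -x rules.

    Returns (heatmap file, image format, XLSX file). Explicit -m/-x values
    override the names generated from the basename.
    """
    heatmap_file = heatmap if heatmap is not None else f"{name}_heatmap.png"
    xlsx_file = xlsx if xlsx is not None else f"{name}_results.xlsx"
    # The image format follows the heatmap file extension, e.g. ".svg" -> "svg"
    image_format = os.path.splitext(heatmap_file)[1].lstrip(".").lower()
    return heatmap_file, image_format, xlsx_file


if __name__ == "__main__":
    print(resolve_output_names())
    print(resolve_output_names(name="run1", heatmap="my_heatmap.svg"))
```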

Specifying where output files are written (-o)

By default the result files are written to the current working directory, but they can be redirected to a different directory by using the -o option to specify the location.

Note

The directory specified by -o will be created if it doesn’t already exist.
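The following is a minimal sketch of the documented -o behaviour. It assumes that nested directories are created as needed (as `os.makedirs` does); whether PEGS also creates missing parent directories is not stated above. `output_path` is a hypothetical helper.

```python
import os
import tempfile


def output_path(output_dir: str, filename: str) -> str:
    """Return the path a result file would be written to, creating output_dir first.

    os.makedirs(..., exist_ok=True) creates the directory (and any missing
    parents) if needed and leaves an existing directory untouched.
    """
    os.makedirs(output_dir, exist_ok=True)
    return os.path.join(output_dir, filename)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = os.path.join(tmp, "results", "run1")  # does not exist yet
        target = output_path(out_dir, "pegs_results.xlsx")
        print(target, os.path.isdir(out_dir))
```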