Running analyses with PEGS¶
Basic usage¶
To run an analysis the basic command line is:
pegs [options] GENE_INTERVALS --peaks PEAKSET [PEAKSET ...] --genes CLUSTER [CLUSTER ...]
where:
GENE_INTERVALS
is a set of reference transcription start sites (TSSs) for all genes; it can either be the name of a built-in reference set (for example “mm10”), or file with BED interval data
PEAKSET
is a BED file containing input ChIP-seq peaks data (or other genomic intervals)
CLUSTER
is a file defing a gene cluster
PEGS
will then calculate the enrichments (p-values and
counts) using a default set of genomic distances around the
input intervals in each of the peak-sets.
By default the program outputs a PNG heatmap called
pegs_heatmap.png
, and an XLSX file with the p-values and
count data called pegs_results.xlsx
.
The formats and naming conventions for the various files are described in Input files and Output files.
Warning
This is a change in version 0.6.0 to the previous way of specifying peaksets and gene cluster via directories, which is no longer supported but can replicated using a command line of the form:
pegs [options] GENE_INTERVALS --peaks PEAKS_DIR/* --genes CLUSTERS_DIR/*
Specifying genomic distances (-d
, --distances
)¶
The default set of genomic distances used in the enrichment
calculations can be overriden with a custom set of intervals
specified using the -d
or --distances
option:
pegs mm10 --peaks PEAKSET [PEAKSET ...] --genes CLUSTER [CLUSTER ...] -d [DISTANCE [DISTANCE ...]]
For example:
pegs mm10 --peaks ./InputPeaks/*.bed ./Clusters/*.txt -d 1000 2000
will calculate enrichments for +/-1KB and +/-2KB from the centre of the input peak-set intervals.
Warning
This is a change in version 0.6.0 to the previous way of specifying distances at the end of the command line, which is no longer supported.
Specifying TADs (-t
, --tads
)¶
In addition to individual distances, enrichments within TADs
(Topologically Associated Domains) can be calculated by
providing a BED file with TAD definitions using the
-t
/--tads
option.
In this case the heatmap for the TADs will be appended to the heatmap for peaks and distances, and the raw data will be appended to the XLSX file.
Specifying output file names (--name
, -m
, -x
)¶
The basename for all the output files can be set using the
--name
option; by default the basename is pegs
and
the output files will be called pegs_heatmap.png
and
pegs_results.xlsx
.
The names for these output files can also be set explicitly
using the -m
and -x
options:
-m
: sets the heatmap file name and image format (based on the file extension, e.g.my_heatmap.png
will produce a PNG image,my_heatmap.svg
will produce an SVG etc)
-x
: sets the file name for the XLSX file
Note
Using the -m
and -x
options to explicitly set
the output file names will override the implicit file
names generated from the basename.
Specifying where output files are written (-o
)¶
By default the result files are written to current working
directory, but they can be redirected to a different directory,
by using the -o
option to specify the location.
Note
The directory specified by -o
will be created if it
doesn’t already exist.