ATAC-Seq analysis

Author: Ji Huang

This is the my note on the ATAC-Seq analysis.

Background

There are many wonderful tutorials/notes on web and I learnt from them. Below to name a few:

Pipeline overview

My scripts is the basic one, including:

Download fast files from EBI. You can use SRA-Explorer to find the URL for downloading fastq files. No need to start from sra files any more.
Adapter trimming with fastp.
Reads mapping with Bowtie2.
Peak calling with Genrich. The nice part of Genrich is Genrich was designed to be able to run all of the post-alignment steps through peak-calling with one command. No need to run samtools, Picard and all kinds of commands. However, Genrich is still not peer-reviewed/published, although it get used many times in papers.
Calculate Fraction of reads in peaks (FRiP).

According to ENCODE term:

Fraction of reads in peaks (FRiP) – Fraction of all mapped reads that fall into the called peak regions, i.e. usable reads in significantly enriched peaks divided by all usable reads. In general, FRiP scores correlate positively with the number of regions. (Landt et al, Genome Research Sept. 2012, 22(9): 1813–1831)

For ATAC-Seq,

The fraction of reads in called peak regions (FRiP score) should be >0.3, though values greater than 0.2 are acceptable. For EN-TEx tissues, FRiP scores will not be enforced as QC metric. TSS enrichment remains in place as a key signal to noise measure.

I mostly follow the ATAC-seq Guidelines from Harvard FAS Informatics.

Pipeline details

You can run step 1,2,3 sequentially, or run submit_all.sh to submit all three jobs.

01_download_fastq.slurm: download fastq files.
02_submit_snake.slurm: run the adapter cleaning and Bowtie2 alignment step.
03_call_peaks.slurm: call peaks with Genrich.
submit_all.sh: if you set up correctly, you can just run this bash script which including the above three jobs.
calc_FRiP.slurm: calculate FRiP.

Misc

To get the maize contigs name that we want to exclude from Genrich peak calling, you can run grep "^B" zmav4.chr.length.txt |cut -f1|sed -z 's/\n/,/g;s/,$/\n/'.
The example code is to re-analyze the maize protoplast ATAC-Seq data in 3D Chromatin Architecture of Large Plant Genomes Determined by Local A/B Compartments, 2017 Molecular Plant.
To get the sum base pair of the peak regions, awk '{$4=$3-$2; sum+=$4} END {print sum}' SRR5748809_10_ATAC_Zma_leaf_mesophyll.narrowPeak. The result is 22856828 or 22.9Mb.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
scripts		scripts
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ATAC-Seq analysis

Background

Pipeline overview

Pipeline details

Misc

About

Releases

Packages

Languages

timedreamer/ATAC-Seq_analysis

Folders and files

Latest commit

History

Repository files navigation

ATAC-Seq analysis

Background

Pipeline overview

Pipeline details

Misc

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages