Pipeline for filtering and cleaning of multiple sequence alignments to identify ideal consensus regions from homologue sequences for primer design.
Publication: https://doi.org/10.3390/biomedinformatics2040041
The pipeline is developed and testet for:
Software | Version |
---|---|
Linux | 20.04 |
Mafft | 7.453 |
Primer3 | 2.5.0 |
On the hardware side, the alignments with mafft are the bottleneck. The length and number of sequences play a decisive role. Alignments with a few hundred sequences of moderate length can be calculated in a few seconds to minutes even on simple laptops. Larger alignments require a system with more RAM.
The duration of the installation is a few minutes and depends among other things on whether various requirements are already installed, such as conda.
The processing time of the sample dataset is about 15 sec on a laptop with Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz and 16 GB RAM.
Install Python3.8 in miniconda. https://docs.conda.io/en/latest/miniconda.html
Download and uncompress all files from the GitHub repository.
wget https://github.com/mcollatz/ConsensusPrime/archive/refs/heads/main.zip
unzip main.zip
Adept the primer3_parameters.txt to your needs. For details see https://primer3.org/manual.html
Install MAFFT, Primer3 and ClustalX (optional for alignment visualzation)
sudo apt-get update -y
sudo apt-get install -y mafft
sudo apt-get install -y primer3
sudo apt-get install -y clustalx
Create and activate new Python environment for ConsensusPrime
conda create -n consensus_prime
conda activate consensus_prime
Install pandas
conda install pandas
Download the example_data.fna and primer3_parameters.txt then run the following command (remember to adjust the file paths accordingly).
python3.8 /path_to/consensus_prime.py --infile /path_to/example_data.fna --primer3 /path_to/primer3_parameters.txt
Options:
command | what it does |
---|---|
-i, --infile | Multi-Fasta file with gene sequences. [required] |
-x, --primer3 | Primer3 parameter file. [required] |
-o, --outdir | Specifies output directory. Default = . |
-t, --threads | Number of threads used by MAFFT. Default = -1 (all) |
-k, --keepduplicates | Keep duplicate sequences. Default = False |
-c, --consensusthreshold | Consensus threshold bitween 0 and 1 with 1 beeing a perfect consensus. Default = 0.95 |
-s, --consensussimilarity | Minimum similarity threshold for sequences in the input alignment when comparing each sequence to the consensus sequence. Default = 0.8 |
--primers | Known primers for visualisation in the final alignment in multifasta format. |
--negativesequences | File with sequences that get their consensus sequence added to the final alignment in multifasta format. |
-h, --help | show this message and exit |
Exit the conda environment when you are done
conda deactivate
Simply pull and run a ready-to-use image from Dockerhub:
Show --help
docker run mcollatz/consensusprime:1.0
Example
Download the example_data.fna and primer3_parameters.txt then run the following command (remember to adjust the file paths accordingly).
docker run -t --rm -v /path/to/dir/with/your/input/files/:/in \
-v /path/to/dir/for/results/:/out \
-u `id -u $USER`:`id -g $USER` \
mcollatz/consensusprime:1.0 \
/consensus_prime.py --infile /in/example_data.fna --primer3 /in/primer3_parameters.txt --outdir /out
The results are stored in the specified /out directory or in the current directory under "/results". The intermediate results of the individual filter steps are also located in this directory. The predicted primers are listed in the "consensus_prime_summary.html" with all related details. The predicted primers can be viewed in "final_alignment.fna" using an alignment visualization program such as ClustalX.
For an example output check the example_results.zip.