Nicotine-Microbiome

IQ Biology Rotation Project with Noah Fierer

Goal

Assess number of nicotine-degrading genes in the oral microbiomes of smokers/vapers vs non-smokers/vapers

Pull shotgun metagenomic data from this paper
- Project IDs: PRJNA548383, PRJNA544061, and PRJNA508385
Create database of sequences for genes involved in nicotine degradation, based on Mu et al., 2020
- MetaCyc degradation pathways tsvs added
- Pull fastas from GenBank
  - conda install -y -c conda-forge -c bioconda -c defaults entrez-direct
  - esearch -db protein -query 'Protein name' | efetch -format fasta
  - But some of the genes only have names, not accession IDs, so they return sequences for unwanted genes...
- Start with a few key genes to get the pipeline running
- Pull from UniProt and/or GenBank
  - UniProt API querying
- Then add others once things are working
- Check for these genes in their hosts in IMG as a sanity check
Use Diamond to find nicotine-degrading genes in the oral microbiolmes
Assess difference in nicotine-degrading genes in the oral microbiomes of smokes/vapers vs controls
Look at MAGs if time?

Get MetaCyc gene details
Search E.C. IDs in UniProt
Download fastas for all matching sequences
- For 3.5.1.3, filtered only bacteria because lots of human and mouse sequences were coming up
- Skipped 1.3.99.- and 1.5.99.- because too many unrelated genes were coming up
- E.C. for nctB was sourced from Uniprot info on (S)-6-hydroxynicotine oxidase
- Skipped 1.5.8.M1 because it wasn't pulling up any results, and searching the name pulled up kdhA/B/C, which I already had

Please cite http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)
tested diamond with diamond blastx -q PRJNA548383/SRR9641933/SRR9641933_1.fastq -d database-building/uniprot-nicotine-db.dmnd -o test.tsv
- Got a couple hundred hits

setup_scripts contains scripts with code to install sra toolkit. I haven't tried these in script format (since I only needed to install it once), but I imagine I will need this code later if using fastq-dump on fiji.

If having issues see the documentation or config guidelines.
Note for future John can try the following instead of vdb-config -i:
- vdb-config --restore-defaults
- vdb-config -s "/repository/user/main/public/root=."
- vdb-config -s "cache-enabled=true"
- vdb-config --report-cloud-identity yes

get_data.sh is a script to loop through the SRA accession values for the projects and download the fastq files in directories named by their run IDs.

while read line; do
wget http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=dload&run_list=${line}&format=fastq;
done<list_of_ids