SPAdes assembler crashed due to odd read correction #188

Closed
oschwengers opened this issue May 15, 2019 · 9 comments

@oschwengers

oschwengers commented May 15, 2019

Hi and thanks a lot for this great tool!
I use Unicycler a lot and so far it almost always did a great job.

I recently QCed SRR1609861 (Illumina PE only) with fastp and assembled it with Unicycler. For some reason, Unicycler crashes in the first k-mer (K27) assembly iteration, right after the read error correction step:

Error: SPAdes failed to produce assemblies. See spades_assembly/assembly/spades.log for more info

The SPAdes log says:

The number of right read-pairs is larger than the number of left read-pairs
Unequal number of read-pairs detected in the following files: /var/scratch/2014C-3598-fastp/spades_assembly/corrected_1.fastq.gz  /var/scratch/2014C-3598-fastp/spades_assembly/corrected_2.fastq.gz

The Unicycler cmd:

$ unicycler -1 1.fastq.gz -2 2.fastq.gz -s se.fastq.gz -o . --verbosity 3 --keep 3 -t 32

se.fastq.gz contains unpaired reads surviving the QC but lacking a valid mate.

Indeed, the SPAdes-corrected read files that Unicycler writes internally are inconsistent: the forward file contains only a fraction of the actual reads (5.5M versus 51M for the reverse file):

$ ll spades_assembly/
total 57M
drwxr-xr-x 4 oschweng cb 4.0K May 15 14:00 ./
drwxr-xr-x 3 oschweng cb 4.0K May 15 13:58 ../
drwxr-xr-x 6 oschweng cb 4.0K May 15 14:00 assembly/
-rw-r--r-- 1 oschweng cb 5.5M May 15 14:00 corrected_1.fastq.gz
-rw-r--r-- 1 oschweng cb  51M May 15 14:00 corrected_2.fastq.gz
-rw-r--r-- 1 oschweng cb 1.3M May 15 14:00 corrected_u.fastq.gz
-rw-r--r-- 1 oschweng cb   42 May 15 14:00 kmer_range
drwxr-xr-x 4 oschweng cb 4.0K May 15 14:00 read_correction/

The fastp PE output seems OK, and both files contain exactly the same number of reads:

zcat 1.fastq.gz | grep -c '@SRR'
246147
zcat 2.fastq.gz | grep -c '@SRR'
246147
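
Counting with `grep -c '@SRR'` works here, but `@` is also a valid quality character, so a count based on the four-line record structure is more robust. A minimal Python sketch of that check, assuming standard four-line FASTQ records (the function names `count_fastq_records` and `pairs_match` are illustrative and not part of Unicycler or SPAdes):

```python
import gzip


def count_fastq_records(path):
    """Count records in a gzipped FASTQ file, assuming four lines per record."""
    with gzip.open(path, "rt") as handle:
        line_count = sum(1 for _ in handle)
    if line_count % 4 != 0:
        raise ValueError(f"{path}: {line_count} lines is not a multiple of 4")
    return line_count // 4


def pairs_match(forward_path, reverse_path):
    """Return True if both files of a paired-end set hold the same number of reads."""
    return count_fastq_records(forward_path) == count_fastq_records(reverse_path)
```

The same check can be pointed at `spades_assembly/corrected_1.fastq.gz` and `spades_assembly/corrected_2.fastq.gz` to confirm the mismatch that SPAdes reports.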

Interestingly, a standalone SPAdes assembly with the same SPAdes version (3.13.0) and its internal read error correction enabled finished without any problems.

I'm running the latest Unicycler (v0.4.7) on a native Ubuntu machine with 64 cores (HT), 256 GB of memory, and local storage, so no VM issues should be involved here. I could also reproduce this issue on a different machine.

I'm both puzzled and curious to know what exactly causes this crash. Any help is much appreciated! Please let me know if you need anything else.
Best regards!

@oschwengers
Author

oschwengers commented May 15, 2019

Hi,
I just realized that running Unicycler without the unpaired reads finishes successfully. Since the unpaired short reads should be unrelated to the paired-end mismatch described above, this seems rather odd to me.

Am I missing something, or might there be an internal issue with unpaired short reads provided via -s when PE data are also used?

So, this works:

unicycler -1 1.fastq.gz -2 2.fastq.gz -o . --verbosity 3 --keep 3 -t 32

but this does not:

unicycler -1 1.fastq.gz -2 2.fastq.gz -s se.fastq.gz -o . --verbosity 3 --keep 3 -t 32

oschwengers changed the title from "SPAdes assembler crashed due to spurious read correction" to "SPAdes assembler crashed due to odd read correction" on May 16, 2019
@dswan

dswan commented May 21, 2019

I don't think Unicycler supports multiple short-read libraries: #64

@oschwengers
Author

Thanks @dswan for the hint. Nevertheless, I think the discussion in #64 is a little different, as it concerns multiple distinct short-read libraries.

My question / bug report deals with a single short-read library. I used fastp (https://github.com/OpenGene/fastp) to QC and stitch the PE reads, so I get PE, merged/stitched, and remaining SE reads from a single short-read library.

As Unicycler offers parameters for both PE and SE short reads, it would be very beneficial if it accepted them together and forwarded them to SPAdes, since newer SPAdes versions accept exactly this kind of short-read file set.

@dswan

dswan commented May 22, 2019

I'm curious as to why you're not just supplying the paired-end library. What's the utility in merging a paired-end library and treating the merged pairs as a single-end library, while retaining the pairs that don't stitch? More than anything, I'm intrigued whether you have a specific use case where this is beneficial!

@oschwengers
Author

oschwengers commented May 22, 2019

It's because you get slightly longer reads and can thus use larger k-mers, so SPAdes is potentially able to compute better assemblies. Newer SPAdes versions (>3.12.0) provide separate parameters for supplying these files (-1/-2, -s, --merged).
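
For illustration, such a file set could be passed to SPAdes directly along the following lines. This is only a sketch: the helper `spades_command`, the file name `merged.fastq.gz`, and the output directory `spades_out` are placeholders, and it assumes a SPAdes version whose `spades.py` supports `--merged`.

```python
import shlex


def spades_command(forward, reverse, unpaired=None, merged=None, out_dir="spades_out"):
    """Build a spades.py argument list for a single short-read library."""
    cmd = ["spades.py", "-1", forward, "-2", reverse]
    if unpaired:
        cmd += ["-s", unpaired]       # reads that lost their mate during QC
    if merged:
        cmd += ["--merged", merged]   # stitched (overlapping) read pairs
    cmd += ["-o", out_dir]
    return cmd


cmd = spades_command("1.fastq.gz", "2.fastq.gz",
                     unpaired="se.fastq.gz", merged="merged.fastq.gz")
print(shlex.join(cmd))
```

Building the command as a list keeps each optional input separate, so the unpaired or merged file can be dropped when comparing runs.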

@dswan

dswan commented May 22, 2019

> It's because you get slightly longer reads and can thus use larger k-mers, so SPAdes is potentially able to compute better assemblies. Newer SPAdes versions (>3.12.0) provide separate parameters for supplying these files (-1/-2, -s, --merged).

Thanks. I've seen this strategy used in eukaryotic genome assembly, but only when supplemented with LMP libraries or other positional information. I hadn't seen that it was supported in the latest SPAdes release, though!

@oschwengers
Author

Does anyone else (@rrwick) have an idea?

@rrwick
Owner

rrwick commented Oct 25, 2019

Oliver shared his reads with me (thanks!) so I could try to reproduce this issue, but I was unable to. When I assemble the same reads on my computer, I get the expected result:

-rw-r--r--  1 ryan  41M Oct 25 15:50 corrected_1.fastq.gz
-rw-r--r--  1 ryan  51M Oct 25 15:50 corrected_2.fastq.gz
-rw-r--r--  1 ryan 1.3M Oct 25 15:50 corrected_u.fastq.gz

So I think this might be a weird platform-specific SPAdes-related bug, and I don't think I can solve it.

A decent workaround is to just turn off read correction by using Unicycler's --no_correct option. In my limited tests this does not have a negative impact on assembly quality.
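
As a rough sketch of that workaround, the command from the original report can be assembled with `--no_correct` appended. The helper `unicycler_command` and its defaults are illustrative only; the flags themselves are the ones used in this thread.

```python
def unicycler_command(forward, reverse, unpaired=None, out_dir=".",
                      threads=32, no_correct=False):
    """Build a Unicycler argument list mirroring the command in this report."""
    cmd = ["unicycler", "-1", forward, "-2", reverse]
    if unpaired:
        cmd += ["-s", unpaired]
    cmd += ["-o", out_dir, "--verbosity", "3", "--keep", "3", "-t", str(threads)]
    if no_correct:
        cmd.append("--no_correct")  # skip the SPAdes read error correction step
    return cmd


workaround = unicycler_command("1.fastq.gz", "2.fastq.gz",
                               unpaired="se.fastq.gz", no_correct=True)
```

The resulting list can be handed to `subprocess.run(workaround, check=True)` on a machine where Unicycler is installed.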

Sorry for the lack of a solid resolution, but since I think there's nothing to be done on Unicycler's end, I'm going to close this issue now!

rrwick closed this as completed on Oct 25, 2019
@oschwengers
Author

Thanks @rrwick for the deep dive into this. As the error seems to be unreproducible elsewhere but persists at our site, I'll just use the --no_correct option, which mitigates it.
