Error running assembly workflow on Ubuntu: Error in rule assembly_megahit: #113

Open
nalbright opened this issue Jul 18, 2018 · 6 comments

Comments

@nalbright

Following the assembly workflow in the Really Quick Copy-And-Paste Quick Start, I copied the specified JSON file and ran the commands below, which produced the following error:
$ export SINGULARITY_BINDPATH="data:/data"
$ snakemake -p --use-singularity --configfile=config/custom_assembly_workflow.json assembly_workflow_all

Error:

Building DAG of jobs...
Pulling singularity image docker://quay.io/biocontainers/spades:3.11.1--py27_zlib1.2.8_0.
Pulling singularity image docker://quay.io/biocontainers/megahit:1.1.2--py35_0.
Using shell: /bin/bash
Provided cores: 1

Rules claiming more threads will be scaled down.
Job counts:
count jobs
4 assembly_megahit
4 assembly_metaspades
1 assembly_workflow_all
9

Job 6: --- Assembling quality trimmed reads with Megahit

rm -rf data/SRR606249_subset25.trim2_megahit && megahit -t 1 --memory 0.20 -1 /data/SRR606249_subset25_1.trim2.fq.gz -2 /data/SRR606249_subset25_2.trim2.fq.gz --out-prefix=SRR606249_subset25.trim2_megahit -o /data/SRR606249_subset25.trim2_megahit && mv /data/SRR606249_subset25.trim2_megahit/SRR606249_subset25.trim2_megahit.contigs.fa /data/SRR606249_subset25.trim2_megahit.contigs.fa
Activating singularity image /home/user/dahak_2018/dahak/workflows/.snakemake/singularity/bfd669a63b585d366276296fdcd11501.simg
7.791Gb memory in total.
Using: 1.558Gb.
MEGAHIT v1.1.2
--- [Tue Jul 17 21:49:41 2018] Start assembly. Number of CPU threads 1 ---
--- [Tue Jul 17 21:49:41 2018] Available memory: 8365150208, used: 1673030041
--- [Tue Jul 17 21:49:41 2018] Converting reads to binaries ---
b' [read_lib_functions-inl.h : 209] Lib 0 (/data/SRR606249_subset25_1.trim2.fq.gz,/data/SRR606249_subset25_2.trim2.fq.gz): pe, 26715952 reads, 101 max length'
b' [utils.h : 126] Real: 71.9618\tuser: 23.4359\tsys: 10.3257\tmaxrss: 155080'
--- [Tue Jul 17 21:50:53 2018] k-max reset to: 119 ---
--- [Tue Jul 17 21:50:53 2018] k list: 21,29,39,59,79,99,119 ---
--- [Tue Jul 17 21:50:53 2018] Extracting solid (k+1)-mers for k = 21 ---
--- [Tue Jul 17 22:13:00 2018] Building graph for k = 21 ---
Error occurs when running "builder build" for k = 21; please refer to /data/SRR606249_subset25.trim2_megahit/SRR606249_subset25.trim2_megahit.log for detail
[Exit code 1]
Error in rule assembly_megahit:
jobid: 6
output: data/SRR606249_subset25.trim2_megahit.contigs.fa
log: data/SRR606249_subset25.trim2_megahit.log

RuleException:
CalledProcessError in line 148 of /home/user/dahak_2018/dahak/workflows/assembly/Snakefile:
Command 'singularity exec --home /home/user/dahak_2018/dahak/workflows /home/user/dahak_2018/dahak/workflows/.snakemake/singularity/bfd669a63b585d366276296fdcd11501.simg bash -c ' set -euo pipefail; rm -rf data/SRR606249_subset25.trim2_megahit && megahit -t 1 --memory 0.20 -1 /data/SRR606249_subset25_1.trim2.fq.gz -2 /data/SRR606249_subset25_2.trim2.fq.gz --out-prefix=SRR606249_subset25.trim2_megahit -o /data/SRR606249_subset25.trim2_megahit && mv /data/SRR606249_subset25.trim2_megahit/SRR606249_subset25.trim2_megahit.contigs.fa /data/SRR606249_subset25.trim2_megahit.contigs.fa '' returned non-zero exit status 1.
File "/home/user/dahak_2018/dahak/workflows/assembly/Snakefile", line 148, in __rule_assembly_megahit
File "/home/user/miniconda3/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/user/dahak_2018/dahak/workflows/.snakemake/log/2018-07-17T214845.783217.snakemake.log

@nalbright
Author

I have a question about the Assembly workflow in the Really Quick Copy-and-Paste Quick Start:

I noticed that the input for the assembly is the same raw data that the read filtering workflow uses. Shouldn't the input for the assembly workflow be the output of the read filtering workflow (i.e. filtered/trimmed reads rather than raw reads)? The same raw data are also used as inputs for the next workflows (comparison, taxonomic classification). Is this something the user should update with each successive workflow? Could this be the source of the error I am seeing above for the assembly?

I appreciate any clarification you can provide!
Thanks,
Nicolette

@ctb
Contributor

ctb commented Jul 20, 2018 via email

@ctb
Contributor

ctb commented Jul 20, 2018

In order to diagnose the error, we might need the file data/SRR606249_subset25.trim2_megahit/SRR606249_subset25.trim2_megahit.log - could you paste that here? thx!
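
For anyone else digging through one of these logs, here is a minimal sketch of how the failing line could be pulled out. It assumes the log tags failures with `[ERROR]` and wraps subprocess output in `b'...'`, as the MEGAHIT v1.1.2 log pasted further down this thread does. The function name `find_megahit_errors` and the `SAMPLE_LOG` excerpt are illustrative only, not part of dahak or MEGAHIT.

```python
import re

# Short excerpt in the same format as the log posted in this thread.
SAMPLE_LOG = """\
--- [Thu Jul 19 10:31:33 2018] Building graph for k = 21 ---
b' [cx1.h : 450] Preparing data...'
b' [ERROR] [cx1_seq2sdbg.cpp : 540]: 1673030041 bytes is not enough for CX1 sorting, please set -m parameter to at least 1759269939'
[Exit code 1]
"""


def find_megahit_errors(log_text):
    """Return log lines tagged [ERROR], with any b'...' wrapper removed."""
    errors = []
    for line in log_text.splitlines():
        if "[ERROR]" not in line:
            continue
        stripped = line.strip()
        # Subprocess output appears as a Python bytes repr: b'...'
        match = re.fullmatch(r"b'(.*)'", stripped)
        errors.append(match.group(1).strip() if match else stripped)
    return errors


if __name__ == "__main__":
    for err in find_megahit_errors(SAMPLE_LOG):
        print(err)
```

To scan a real file, read it with `open(path).read()` and pass the text to the function.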

@nalbright
Author

Thanks for the explanation!
Before seeing your response, I went ahead and ran the assembly, but changed the input files from those in the copy-and-paste instructions to the outputs of read trimming. It has been running successfully overnight and is still running!

Here is the .log file that you requested above:

MEGAHIT v1.1.2
--- [Thu Jul 19 10:06:16 2018] Start assembly. Number of CPU threads 1 ---
--- [Thu Jul 19 10:06:16 2018] Available memory: 8365150208, used: 1673030041
--- [Thu Jul 19 10:06:16 2018] Converting reads to binaries ---
/usr/local/bin/megahit_asm_core buildlib /data/SRR606249_subset25.trim2_megahit/tmp/reads.lib /data/SRR606249_subset25.trim2_megahit/tmp/reads.lib
b' [read_lib_functions-inl.h : 209] Lib 0 (/data/SRR606249_subset25_1.trim2.fq.gz,/data/SRR606249_subset25_2.trim2.fq.gz): pe, 26715952 reads, 101 max length'
b' [utils.h : 126] Real: 72.5175\tuser: 25.3008\tsys: 9.7853\tmaxrss: 155044'
--- [Thu Jul 19 10:07:28 2018] k-max reset to: 119 ---
--- [Thu Jul 19 10:07:28 2018] k list: 21,29,39,59,79,99,119 ---
--- [Thu Jul 19 10:07:28 2018] Extracting solid (k+1)-mers for k = 21 ---
cmd: /usr/local/bin/megahit_sdbg_build count -k 21 -m 2 --host_mem 1673030041 --mem_flag 1 --gpu_mem 0 --output_prefix /data/SRR606249_subset25.trim2_megahit/tmp/k21/21 --num_cpu_threads 1 --num_output_threads 1 --read_lib_file /data/SRR606249_subset25.trim2_megahit/tmp/reads.lib
b' [sdbg_builder.cpp : 112] Host memory to be used: 1673030041'
b' [sdbg_builder.cpp : 113] Number CPU threads: 1'
b' [cx1.h : 450] Preparing data...'
b' [read_lib_functions-inl.h : 256] Before reading, sizeof seq_package: 885946936'
b' [read_lib_functions-inl.h : 260] After reading, sizeof seq_package: 885946936'
b' [cx1_kmer_count.cpp : 136] 26715952 reads, 101 max read length'
b' [cx1.h : 457] Preparing data... Done. Time elapsed: 8.0859'
b' [cx1.h : 464] Preparing partitions and initialing global data...'
b' [cx1_kmer_count.cpp : 227] 2 words per substring, 2 words per edge'
b' [cx1_kmer_count.cpp : 322] Set: 1145227092, 708974305'
b' [cx1.h : 171] Adjusting memory layout: max_lv1_items=283796391, num_sorting_items=418397, mem_sorting_items=10041528, mem_avail=708974305'
b' [cx1_kmer_count.cpp : 356] 174733194, 418397 708974304 708974305'
b' [cx1_kmer_count.cpp : 363] Memory for reads: 906953816'
b' [cx1_kmer_count.cpp : 364] max # lv.1 items = 174733194'
b' [cx1.h : 480] Preparing partitions and initialing global data... Done. Time elapsed: 31.2946'
b' [cx1.h : 486] Start main loop...'
b' [cx1.h : 515] Lv1 scanning from bucket 0 to 1180'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 41.0832'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 58.6954'
b' [cx1.h : 515] Lv1 scanning from bucket 1180 to 2972'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 40.1835'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 57.0502'
b' [cx1.h : 515] Lv1 scanning from bucket 2972 to 5165'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 48.7228'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 63.3104'
b' [cx1.h : 515] Lv1 scanning from bucket 5165 to 7735'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 40.4177'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 66.7352'
b' [cx1.h : 515] Lv1 scanning from bucket 7735 to 10699'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 43.5897'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 68.2777'
b' [cx1.h : 515] Lv1 scanning from bucket 10699 to 14101'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 46.8261'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 65.2864'
b' [cx1.h : 515] Lv1 scanning from bucket 14101 to 18006'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 41.7479'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 66.8297'
b' [cx1.h : 515] Lv1 scanning from bucket 18006 to 22532'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 43.2049'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 68.1576'
b' [cx1.h : 515] Lv1 scanning from bucket 22532 to 27895'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 48.6676'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 70.1291'
b' [cx1.h : 515] Lv1 scanning from bucket 27895 to 34492'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 44.5378'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 69.6958'
b' [cx1.h : 515] Lv1 scanning from bucket 34492 to 43306'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 56.5795'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 77.2814'
b' [cx1.h : 515] Lv1 scanning from bucket 43306 to 61704'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 59.7476'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 81.8533'
b' [cx1.h : 515] Lv1 scanning from bucket 61704 to 65536'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 32.5052'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 4.0216'
b' [cx1.h : 607] Main loop done. Time elapsed: 1405.1385'
b' [cx1.h : 613] Postprocessing...'
b' [cx1_kmer_count.cpp : 860] Total number of candidate reads: 311050(531785)'
b' [cx1_kmer_count.cpp : 871] Total number of solid edges: 183916390'
b' [cx1.h : 621] Postprocess done. Time elapsed: 0.3870'
b' [utils.h : 126] Real: 1444.9483\tuser: 1424.9294\tsys: 8.4105\tmaxrss: 1794920'
--- [Thu Jul 19 10:31:33 2018] Building graph for k = 21 ---
/usr/local/bin/megahit_sdbg_build seq2sdbg --host_mem 1673030041 --mem_flag 1 --gpu_mem 0 --output_prefix /data/SRR606249_subset25.trim2_megahit/tmp/k21/21 --num_cpu_threads 1 -k 21 --kmer_from 0 --num_edge_files 1 --input_prefix /data/SRR606249_subset25.trim2_megahit/tmp/k21/21 --need_mercy
b' [sdbg_builder.cpp : 339] Host memory to be used: 1673030041'
b' [sdbg_builder.cpp : 340] Number CPU threads: 1'
b' [cx1.h : 450] Preparing data...'
b' [cx1_seq2sdbg.cpp : 394] Number edges: 183916390'
b' [cx1_seq2sdbg.cpp : 434] Bases to reserve: 5057700714, number contigs: 0, number multiplicity: 229895487'
b' [cx1_seq2sdbg.cpp : 440] Before reading, sizeof seq_package: 1264425188, multiplicity vector: 229895487'
b' [cx1_seq2sdbg.cpp : 455] Adding mercy edges...'
b' [cx1_seq2sdbg.cpp : 373] Number of reads: 311050, Number of mercy edges: 4387638'
b' [cx1_seq2sdbg.cpp : 462] Done. Time elapsed: 39.0692'
b' [cx1_seq2sdbg.cpp : 529] After reading, sizeof seq_package: 1264425188, multiplicity vector: 229895487'
b' [ERROR] [cx1_seq2sdbg.cpp : 540]: 1673030041 bytes is not enough for CX1 sorting, please set -m parameter to at least 1759269939'
Error occurs when running "builder build" for k = 21; please refer to /data/SRR606249_subset25.trim2_megahit/SRR606249_subset25.trim2_megahit.log for detail
[Exit code 1]
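
The numbers in the log line up with the `--memory 0.20` flag in the failing command: 20% of the 8365150208 bytes reported as available is the 1673030041 bytes MEGAHIT says it used, which is below the 1759269939 bytes the error asks for. A rough sketch of that arithmetic follows, assuming a `--memory` value between 0 and 1 is treated as a fraction of total memory. The variable names are illustrative, and this is not necessarily the change made in #115.

```python
import math

# Values copied from the log above.
total_bytes = 8365150208        # "Available memory"
configured_fraction = 0.20      # from "--memory 0.20" in the failing command
required_bytes = 1759269939     # from "please set -m parameter to at least ..."

# Memory MEGAHIT was actually allowed to use.
granted_bytes = int(total_bytes * configured_fraction)

# Smallest fraction of this machine's memory that satisfies the error message,
# rounded up to two decimals so it can be passed as a fraction.
min_fraction = required_bytes / total_bytes
suggested_fraction = math.ceil(min_fraction * 100) / 100

if __name__ == "__main__":
    print(f"granted:   {granted_bytes} bytes ({granted_bytes / 2**30:.3f} GiB)")
    print(f"required:  {required_bytes} bytes ({required_bytes / 2**30:.3f} GiB)")
    print(f"shortfall: {required_bytes - granted_bytes} bytes")
    print(f"minimum fraction on this machine: {min_fraction:.4f}")
    print(f"suggested --memory value: {suggested_fraction}")
```

On this 8 GB machine the minimum works out to a little over 0.21, so 0.21 itself would still fall short. The required amount depends on the dataset, so a different read set may need more.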

@ctb
Contributor

ctb commented Jul 20, 2018

cool, #115 fixes this! (I had the branch ready but had forgotten to make the PR!)

@nalbright
Author

Re-kicked this off this morning and it seems to be running fine with no errors so far! :)
(It is already past the point where it errored out above!)
