Error during run of smoothxg #2

Open
brettChapman opened this issue Aug 31, 2020 · 3 comments


brettChapman commented Aug 31, 2020

Hi Erik

I tried running smoothxg on my GFA file (generated from Edyeet and Seqwish) and received the following error.

srun -n 1 singularity exec --bind /data/pangenome_paf2vg/edyeet_results:/data/pangenome_paf2vg/edyeet_results /data/smoothxg_builds/smoothxg.sif smoothxg -t 16 -g /data/pangenome_paf2vg/edyeet_results/Morex_v1_5H_vs_Morex_v2_5H.gfa -V
topological sort 89803306 of 89803306 ~ 100.0000%
[path sgd sort]: 3.33% progress: iteration: 1, eta: 448940415309625600.00, delta_max: 594444905.25, number of updates: 1269049259
[path sgd sort]: 6.67% progress: iteration: 2, eta: 94303329111190512.00, delta_max: 368510246.58, number of updates: 1269049348
[path sgd sort]: 10.00% progress: iteration: 3, eta: 19809127399056060.00, delta_max: 381083138.07, number of updates: 1269049302
[path sgd sort]: 13.33% progress: iteration: 4, eta: 4161057006262880.50, delta_max: 383619328.38, number of updates: 1269049567
[path sgd sort]: 16.67% progress: iteration: 5, eta: 874061489967219.12, delta_max: 367479371.61, number of updates: 1269050798
[path sgd sort]: 20.00% progress: iteration: 6, eta: 183603225597205.25, delta_max: 374963091.71, number of updates: 1269049436
[path sgd sort]: 23.33% progress: iteration: 7, eta: 38567245939370.37, delta_max: 367984332.28, number of updates: 1269049396
[path sgd sort]: 26.67% progress: iteration: 8, eta: 8101341654046.21, delta_max: 388226502.63, number of updates: 1269049309
[path sgd sort]: 30.00% progress: iteration: 9, eta: 1701748076561.14, delta_max: 386113946.32, number of updates: 1269049264
[path sgd sort]: 33.33% progress: iteration: 10, eta: 357465052055.07, delta_max: 434327299.29, number of updates: 1269049329
[path sgd sort]: 36.67% progress: iteration: 11, eta: 75088237325.32, delta_max: 382051011.29, number of updates: 1269049158
[path sgd sort]: 40.00% progress: iteration: 12, eta: 15772852065.42, delta_max: 385033556.35, number of updates: 1269049442
[path sgd sort]: 43.33% progress: iteration: 13, eta: 3313206850.23, delta_max: 388481862.24, number of updates: 1269049230
[path sgd sort]: 46.67% progress: iteration: 14, eta: 695964153.27, delta_max: 378469030.30, number of updates: 1269049235
[path sgd sort]: 50.00% progress: iteration: 15, eta: 146192533.26, delta_max: 271011994.39, number of updates: 1269049341
[path sgd sort]: 53.33% progress: iteration: 16, eta: 30708847.11, delta_max: 224731454.76, number of updates: 1269049320
[path sgd sort]: 56.67% progress: iteration: 17, eta: 6450625.56, delta_max: 159636655.50, number of updates: 1269054719
[path sgd sort]: 60.00% progress: iteration: 18, eta: 1355002.68, delta_max: 122656738.47, number of updates: 1269049239
[path sgd sort]: 63.33% progress: iteration: 19, eta: 284628.56, delta_max: 106244339.66, number of updates: 1269049526
[path sgd sort]: 66.67% progress: iteration: 20, eta: 59788.38, delta_max: 92229319.32, number of updates: 1269049249
[path sgd sort]: 70.00% progress: iteration: 21, eta: 12559.00, delta_max: 78504943.86, number of updates: 1269049288
[path sgd sort]: 73.33% progress: iteration: 22, eta: 2638.11, delta_max: 64098323.67, number of updates: 1269049403
[path sgd sort]: 76.67% progress: iteration: 23, eta: 554.16, delta_max: 44395128.51, number of updates: 1269049426
[path sgd sort]: 80.00% progress: iteration: 24, eta: 116.40, delta_max: 24999736.03, number of updates: 1269049253
[path sgd sort]: 83.33% progress: iteration: 25, eta: 24.45, delta_max: 12598686.25, number of updates: 1269049170
[path sgd sort]: 86.67% progress: iteration: 26, eta: 5.14, delta_max: 11266547.88, number of updates: 1269049562
[path sgd sort]: 90.00% progress: iteration: 27, eta: 1.08, delta_max: 9630000.63, number of updates: 1269049232
[path sgd sort]: 93.33% progress: iteration: 28, eta: 0.23, delta_max: 897421.29, number of updates: 1269049308
[path sgd sort]: 96.67% progress: iteration: 29, eta: 0.05, delta_max: 152573.99, number of updates: 1269049455
[path sgd sort]: 100.00% progress: iteration: 30, eta: 0.01, delta_max: 72168.32, number of updates: 1269049556
topological sort 89803306 of 89803306 ~ 100.0000%
mismatch in handle sequence for 164047
srun: error: node-6: task 0: Exited with exit code 1

I built smoothxg in Docker and then converted the local Docker image to a Singularity image. The installation looked OK. My Dockerfile is here:

FROM ubuntu:18.04

RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections

RUN apt-get update && apt-get upgrade -y && apt-get install -y \
  apt-utils \
  dialog \
  build-essential \
  git \
  libssl-dev \
  libffi-dev \
  python3-dev \
  cmake \
  && rm -rf /var/lib/apt/lists/*

RUN git clone --recursive https://github.com/ekg/smoothxg.git

RUN cd smoothxg && cmake -H. -Bbuild && cmake --build build -- -j4

ENV PATH="/smoothxg/bin:${PATH}"

I'm trying to align two versions of the Morex 5H chromosome for comparison, build a genome graph, and deconstruct it to a VCF file. I've tried other alignment tools such as GSAlign and can see differences, so I know the two 5H assemblies are quite different, though mostly in SNPs. I thought this would be a good testing ground for Edyeet and smoothxg. The run did produce a file suffixed with *.prep.gfa; I'm not sure whether that's an intermediate file or the final output. I also tried redirecting all output to a *.smooth.gfa file, which is still empty. The *.prep.gfa file is several times larger than my original *.gfa file. Does that sound correct to you?
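To see where the size difference between the original *.gfa and the *.prep.gfa comes from, it may help to compare record counts and total segment length in the two files. Below is a minimal Python sketch, assuming both files are plain-text GFA 1 with tab-separated fields (H, S, L, P records). The function name `summarize_gfa` is made up for illustration and is not part of smoothxg.

```python
from collections import Counter


def summarize_gfa(path):
    """Return (record_counts, total_segment_bases) for a GFA 1 file.

    record_counts maps the record type (first tab-separated field,
    e.g. "S", "L", "P") to the number of lines of that type.
    total_segment_bases sums the sequence lengths of all S records;
    segments whose sequence is "*" (not stored) are skipped.
    """
    counts = Counter()
    total_bases = 0
    with open(path) as gfa:
        for line in gfa:
            line = line.rstrip("\n")
            if not line:
                continue  # ignore blank lines
            fields = line.split("\t")
            counts[fields[0]] += 1
            # S records: S <name> <sequence> [optional tags...]
            if fields[0] == "S" and len(fields) > 2 and fields[2] != "*":
                total_bases += len(fields[2])
    return counts, total_bases
```

Running it on both files and comparing the S, L and P counts should show whether the larger file has more nodes, more edges, or longer path lines. It reads one line at a time, so memory use stays bounded by the longest line even on a large graph.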

I ran the command like so:

srun -n 1 singularity exec --bind ${PWD}:${PWD} ${SMOOTHXG_IMAGE} smoothxg -t ${SLURM_NTASKS_PER_NODE} -g ${INPUTGFA} -V > ${OUTPUTGFA}

Thanks.

Collaborator

ekg commented Sep 1, 2020

It looks like you've run out of memory. This is on the most-recent git HEAD?
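One way to check the out-of-memory hypothesis is to record the peak resident memory of the run and compare it with what the node or job allocation offers. Below is a minimal Python sketch, assuming Linux and that the wrapper runs on the compute node around the smoothxg command itself (wrapping srun would only measure the launcher). The helper name `run_and_report_peak_rss` is made up for illustration.

```python
import resource
import subprocess
import sys


def run_and_report_peak_rss(cmd):
    """Run cmd (a list of arguments), wait for it, and report memory use.

    Returns (exit_code, peak_rss), where peak_rss is ru_maxrss over the
    waited-for children of this process: kilobytes on Linux, bytes on macOS.
    The report goes to stderr so that stdout can still be redirected
    to the output GFA file.
    """
    completed = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    print(
        f"exit code: {completed.returncode}, peak RSS: {usage.ru_maxrss}",
        file=sys.stderr,
    )
    return completed.returncode, usage.ru_maxrss
```

For example, `run_and_report_peak_rss(["smoothxg", "-t", "16", "-g", "input.gfa", "-V"])`, where `input.gfa` is a placeholder path. The child inherits stdout, so the usual `> output.gfa` redirection still works when this is called from a script.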

I just resolved a problem where the memory usage of the spoa algorithm rose over time. I think it had to do with fragmentation of the allocator that it uses.

There is another way to further reduce the memory consumption by writing the POA graphs out to disk, and then reading them back in when the final graph is built. This could help a fair bit, depending on how big the graph is.
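The general idea of writing intermediate results to disk and reading them back for the final build can be sketched as follows. This is an illustration of the technique in Python, not smoothxg's implementation. The class name `DiskSpool` and the record layout in the usage note are assumptions.

```python
import pickle
import tempfile


class DiskSpool:
    """Append objects to a temporary file and stream them back later.

    Only one object is held in memory at a time while iterating, so peak
    memory no longer grows with the number of stored objects.
    Do not append while an iteration is in progress.
    """

    def __init__(self):
        # Anonymous temporary file, removed automatically when closed.
        self._file = tempfile.TemporaryFile()
        self._count = 0

    def append(self, obj):
        pickle.dump(obj, self._file)
        self._count += 1

    def __len__(self):
        return self._count

    def __iter__(self):
        self._file.flush()
        self._file.seek(0)
        for _ in range(self._count):
            yield pickle.load(self._file)
        # Return to the end so later appends do not overwrite data.
        self._file.seek(0, 2)

    def close(self):
        self._file.close()
```

In use, each per-block result (for example a dict such as `{"block": 0, "consensus": "ACGT"}`, a hypothetical layout) would be passed to `append` as soon as it is computed, and the final graph would be assembled in a single pass with `for graph in spool: ...`. The trade-off is extra disk I/O in exchange for a memory footprint that depends on the largest single block rather than on the whole graph.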

@brettChapman
Author

Ok thanks. I think it's using the most recent version. I pulled from Git only a few days ago (27th August) to create the Docker image. Any way to reduce memory consumption would be great!

If it's only just been updated I'll create another image and try again. Thanks.

Collaborator

ekg commented Sep 2, 2020 via email
