Psiblast against uniref90 on the cluster #17

florianecoulmance · 2019-11-04T21:10:44Z

Hello,

I have tried several times to run psiblast with one query against uniref90 on the cluster.

First, I run:

makeblastdb -in /shared/bank/uniref90/current/flat/uniref90.fasta -dbtype prot -out /shared/projects/meetu2019/fgayrard//Data/Uniref90/uniref90.fa

This gives me several files like this

uniref90.fa.00.phr uniref90.fa.01.psq uniref90.fa.03.pin uniref90.fa.05.phr uniref90.fa.06.psq uniref90.fa.08.pin uniref90.fa.00.pin uniref90.fa.02.phr uniref90.fa.03.psq uniref90.fa.05.pin uniref90.fa.07.phr uniref90.fa.08.psq uniref90.fa.00.psq uniref90.fa.02.pin uniref90.fa.04.phr uniref90.fa.05.psq uniref90.fa.07.pin uniref90.fa.01.phr uniref90.fa.02.psq uniref90.fa.04.pin

etc ...

I do this step as indicated by the documentation, and because I cannot run psiblast directly against the uniref90.fasta file in /shared/bank/uniref90/current/flat/uniref90.fasta (which seems to be a common problem, from what I found on the internet).

Second, I run psiblast as suggested by the doc:

psiblast -query /shared/projects/meetu2019/fgayrard/query1.fasta -db /shared/projects/meetu2019/fgayrard/Data/Uniref90/uniref90.fa -pseudocount 1 -num_iterations 3 -out /shared/projects/meetu2019/fgayrard/1queryMSA.psiblast

The error is an alias error.

Before that, I tried to make the database from the uniref90.fsa file in shared/bank/uniref90/current/fasta/uniref90.fsa and to psiblast against all the files created, as described above, but this does not seem to work either.

Does anyone have an idea?
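A multi-volume protein database such as the one listed above normally also has an alias file (prefix.pal) next to the numbered volumes, and an alias error typically means BLAST cannot find that file or a prefix.pin index at the -db prefix. A minimal sketch to check this; the function name check_blast_db is hypothetical and not part of BLAST:

```shell
# Hypothetical helper (not a BLAST tool): report what exists at a protein db prefix.
# Multi-volume db: PREFIX.pal alias file plus PREFIX.NN.pin/.phr/.psq volumes.
# Single-volume db: PREFIX.pin/.phr/.psq with no alias file.
check_blast_db() {
    prefix="$1"
    if [ -f "$prefix.pal" ]; then
        echo "alias"      # alias file found, volumes should sit next to it
    elif [ -f "$prefix.pin" ]; then
        echo "single"     # single-volume index found
    else
        echo "missing"    # neither found: the -db prefix is wrong or the db is incomplete
        return 1
    fi
}
```

For example, check_blast_db /shared/projects/meetu2019/fgayrard/Data/Uniref90/uniref90.fa would print "missing" if makeblastdb did not finish writing the alias file, or if the prefix does not match the -out value exactly.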

Thank you for your help,

Floriane

annelopes · 2019-11-05T13:10:57Z

Hi Floriane,

Try removing the ".fa" extension, with something like -db /shared/projects/meetu2019/fgayrard/Data/Uniref90/uniref90

Please tell us if it works.

A.

florianecoulmance · 2019-11-05T14:42:12Z

Hello Anne,

I tried this already, but I still have this alias problem even when removing .fa.

What is weird is that it works perfectly on the small subset of uniref I downloaded on my computer!

Floriane

annelopes · 2019-11-05T20:29:14Z

For me it works:

blastp -query /shared/projects/meetu2019/alopes/runtest/query1.fasta -db /shared/bank/uniref50/current/blast/uniref50

You have to provide the complete path where your db is stored + the rootname of the db (here uniref50).
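As a rough sketch of what "complete path + rootname" means in practice, the prefix can be derived from any one of the database files by dropping the extension and, for multi-volume databases, the volume number. The helper name blast_db_root is hypothetical, and it assumes the database name itself does not end in a dot followed by digits:

```shell
# Hypothetical helper: derive the -db prefix from one database file path.
# Strips an optional numeric volume suffix (.00, .01, ...) and the file extension.
blast_db_root() {
    printf '%s\n' "$1" | sed -E 's/(\.[0-9]+)?\.(phr|pin|psq|pal)$//'
}
```

With the files listed earlier in this thread, blast_db_root .../Uniref90/uniref90.fa.00.phr gives .../Uniref90/uniref90.fa, which is the value to pass to -db.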

Anyway, I don't understand why you need to make your blast db with uniref50 since it already exists (in shared/bank/uniref50/current/blast/ you have all the corresponding uniref50 files: *.phr, *.pin, *.psd etc.). This is precisely the purpose of these bank dirs. So you don't have to provide a fasta file but rather the path to the dir containing the db encoded into .phr, .psd etc. files (for uniref, again, this dir already exists and is stored in /shared/bank/).

That said, uniref50 is big (about 18G) and the memory on each node is limited to 2G. So to run your blast on uniref50, you must add the following line at the beginning of your script:

#SBATCH --mem 20GB

Good luck,

Anne

florianecoulmance · 2019-11-05T20:52:02Z

Thank you Anne,

I was trying to do it with uniref90, not uniref50, and for uniref90 there is no formatted blast db.

A week ago I ran makeblastdb on the .fsa file of uniref90; it worked, and then psiblast seemed to work. However, it failed with a memory error, which I tried to solve with the #SBATCH --mem parameter. Again, I had a memory problem. Is uniref90 just too big for psiblast to run on the cluster?

I will try with uniref50 and let you know the outcome!

Thank you for your help,

Floriane

annelopes · 2019-11-05T21:10:05Z

Indeed, you're right: for uniref90, you have to create the db with makeblastdb. Depending on its size (du -h dir_uniref90/), you have to adapt the memory you request with the flag #SBATCH --mem XXXGB. But I don't think it is a good idea on the cluster, since you won't have enough space to store it in your dir. So it is better to run on uniref50 or uniprot, for instance.
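A small sketch of that sizing step, assuming the rule of thumb used in this thread (request roughly the on-disk size of the db). The function names and the 2 GB headroom are hypothetical choices, not cluster recommendations:

```shell
# Hypothetical helper: round a size in KB up to whole GB and add 2 GB of headroom.
# The headroom value is an arbitrary assumption.
mem_gb_for_kb() {
    echo $(( ($1 + 1048575) / 1048576 + 2 ))
}

# Hypothetical helper: suggest a value for "#SBATCH --mem <N>GB" from a db directory.
suggest_mem_gb() {
    size_kb=$(du -sk "$1" | cut -f1)   # total size of the directory in KB
    mem_gb_for_kb "$size_kb"
}
```

Running suggest_mem_gb dir_uniref90/ prints a number to use in place of XXX in the #SBATCH --mem flag.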

annelopes · 2019-11-05T21:12:10Z

I am asking IFB whether they can add uniref90. I'll keep you posted.

florianecoulmance · 2019-11-05T21:14:37Z

The formatted uniref90 dataset is in my directory /shared/projects/meetu2019/fgayrard/Data/Uniref90 and its size is 49GB.

Maybe now that I have it in my own directory, we can just copy it into the shared/bank/uniref90/current/blast folder?

annelopes · 2019-11-05T21:22:56Z

Otherwise, you can use the old version of uniref90 (2018); results will be more or less the same (but it will be much more expensive in terms of computational time):

/shared/bank/uniref90/uniref90_2018-10-10/

annelopes · 2019-11-05T21:23:45Z

No, you can't write in shared/bank. Please use their old version.

florianecoulmance · 2019-11-05T21:25:30Z

OK, great! I will let you know the outcome.

Thank you very much for the advice!
