Psiblast against uniref90 on the cluster #17

Open
florianecoulmance opened this issue Nov 4, 2019 · 10 comments

Comments

@florianecoulmance

Hello,

I have tried several times to run psiblast with a single query against uniref90 on the cluster.

1st, I run:

makeblastdb -in /shared/bank/uniref90/current/flat/uniref90.fasta -dbtype prot -out /shared/projects/meetu2019/fgayrard//Data/Uniref90/uniref90.fa

This gives me several files like this

uniref90.fa.00.phr uniref90.fa.01.psq uniref90.fa.03.pin uniref90.fa.05.phr uniref90.fa.06.psq uniref90.fa.08.pin uniref90.fa.00.pin uniref90.fa.02.phr uniref90.fa.03.psq uniref90.fa.05.pin uniref90.fa.07.phr uniref90.fa.08.psq uniref90.fa.00.psq uniref90.fa.02.pin uniref90.fa.04.phr uniref90.fa.05.psq uniref90.fa.07.pin uniref90.fa.01.phr uniref90.fa.02.psq uniref90.fa.04.pin

etc ...

I do this step as indicated by the documentation, and because I cannot run psiblast directly against the FASTA file /shared/bank/uniref90/current/flat/uniref90.fasta (which seems to be a common problem, from what I found on the internet).

2nd, I run psiblast as suggested by the doc, as follows:

psiblast -query /shared/projects/meetu2019/fgayrard/query1.fasta -db /shared/projects/meetu2019/fgayrard/Data/Uniref90/uniref90.fa -pseudocount 1 -num_iterations 3 -out /shared/projects/meetu2019/fgayrard/1queryMSA.psiblast

The error I get is an alias error.

Before that, I tried to build the database from the uniref90.fsa file in /shared/bank/uniref90/current/fasta/uniref90.fsa and to run psiblast against all the files created (as I described above), but this does not seem to work either.
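One way to narrow down an alias error like this is to check whether the database files that BLAST looks for are actually present. The sketch below assumes the error is of the "No alias or index file found" kind (the exact message is not quoted in this thread). For a multi-volume protein database, makeblastdb writes an alias file `<rootname>.pal` next to the volumes; a single-volume database has `<rootname>.pin` instead. The helper name `blast_db_present` is hypothetical.

```shell
# Hypothetical helper: return 0 if a protein BLAST database with the
# given rootname (path + name, no volume number or extension) looks usable.
blast_db_present() {
    root=$1
    # Multi-volume protein databases get an alias file (<root>.pal);
    # single-volume ones have an index file (<root>.pin) instead.
    if [ -f "${root}.pal" ] || [ -f "${root}.pin" ]; then
        return 0
    fi
    echo "No ${root}.pal or ${root}.pin found" >&2
    return 1
}

# Possible usage with the rootname from this thread (left commented out):
# blast_db_present /shared/projects/meetu2019/fgayrard/Data/Uniref90/uniref90.fa
```

If the volume files (`.00.phr`, `.00.pin`, ...) exist but the `.pal` file does not, one possible explanation is that makeblastdb was interrupted before it finished. Running `blastdbcmd -db <rootname> -info` is another way to ask BLAST+ itself whether it can open the database.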

Does anyone have an idea?

Thank you for your help,

Floriane

@annelopes
Contributor

Hi Floriane,

Try removing the ".fa" extension, with something like -db /shared/projects/meetu2019/fgayrard/Data/Uniref90/uniref90

Please tell us if it works.

A.

@florianecoulmance
Author

Hello Anne,

I already tried this, but I still get the alias problem even when removing ".fa".

What is weird is that it works perfectly on the small subset of uniref I downloaded on my computer!

Floriane

@annelopes
Contributor

For me it works:

blastp -query /shared/projects/meetu2019/alopes/runtest/query1.fasta -db /shared/bank/uniref50/current/blast/uniref50

You have to provide the complete path where your db is stored + the rootname of the db (here uniref50).

Anyway, I don't understand why you need to build your own BLAST db for uniref50, since it already exists: shared/bank/uniref50/current/blast/ contains all the corresponding uniref50 files (*.phr, *.pin, *.psd, etc.). This is precisely the purpose of these bank directories. So you don't have to provide a FASTA file, but rather the path to the directory containing the db encoded into .phr, .psd, etc. files (for uniref, again, this directory already exists and is stored in /shared/bank/).

That said, uniref50 is big (about 18G) and the memory on each node is limited to 2G. So to run your blast on uniref50, you must add the following directive at the beginning of your script:
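To illustrate the "complete path + rootname" convention, here is a minimal sketch that derives the value to pass to -db from the name of any one database file. `blast_db_root` is a hypothetical helper, not part of BLAST+, and it only handles the protein extensions listed in its pattern.

```shell
# Hypothetical helper: turn the path of one database file into the
# rootname expected by -db, by stripping the file-type extension and,
# if present, a two-digit volume number.
blast_db_root() {
    printf '%s\n' "$1" | sed -E 's/\.(phr|pin|psq|pal)$//; s/\.[0-9]{2}$//'
}

blast_db_root /shared/projects/meetu2019/fgayrard/Data/Uniref90/uniref90.fa.00.phr
# prints /shared/projects/meetu2019/fgayrard/Data/Uniref90/uniref90.fa
```

Note that the rootname keeps whatever was given to makeblastdb with -out, so a database built with `-out .../uniref90.fa` has the rootname `.../uniref90.fa`, not `.../uniref90`.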

#SBATCH --mem 20GB
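For context, a minimal sketch of how such a directive fits into a complete batch script. The function `write_psiblast_job` and the file names in the usage note are hypothetical; the psiblast options are the ones used earlier in this thread, and any site-specific setup (for example loading a BLAST module) is left out because it is not described here.

```shell
# Hypothetical helper: print a Slurm batch script that runs psiblast.
# Arguments: query FASTA, database rootname, output file, memory request.
write_psiblast_job() {
    query=$1; db=$2; out=$3; mem=$4
    cat <<EOF
#!/bin/bash
#SBATCH --mem $mem
psiblast -query $query -db $db -pseudocount 1 -num_iterations 3 -out $out
EOF
}
```

A possible use would be `write_psiblast_job query1.fasta /shared/bank/uniref50/current/blast/uniref50 1queryMSA.psiblast 20GB > job.sh`, followed by `sbatch job.sh`.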

Good luck,

Anne

@florianecoulmance
Author

Thank you Anne,

I was trying to do it with uniref90, not uniref50, and for uniref90 there is no formatted BLAST db in the bank.

A week ago I ran makeblastdb on the .fsa file of uniref90; it worked, and then psiblast seemed to work. However, it failed with a memory error, which I tried to solve with the #SBATCH --mem parameter, and I still had a memory problem. Is uniref90 just too big for psiblast to run on the cluster?

I will try with the uniref50 and let you know the outcome !

Thank you for your help,

Floriane

@annelopes
Contributor

Indeed, you're right: for uniref90, you have to create the db with makeblastdb. Depending on its size (du -h dir_uniref90/), you have to adapt the memory you request with the flag #SBATCH --mem XXXGB. But I don't think it is a good idea on the cluster, since you won't have enough space to store it in your directory. So it is better to run on uniref50 or uniprot, for instance.
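The sizing rule above can be sketched as a small shell helper. The function name `mem_directive` and the 10% headroom are assumptions for illustration; the thread only says to adapt --mem to the size of the database.

```shell
# Hypothetical helper: print an "#SBATCH --mem" line for a database of
# db_gb gigabytes. The default 10% headroom is an assumption, not a
# Slurm or BLAST rule.
mem_directive() {
    db_gb=$1
    headroom_pct=${2:-10}
    # Integer ceiling of db_gb * (100 + headroom_pct) / 100.
    mem_gb=$(( (db_gb * (100 + headroom_pct) + 99) / 100 ))
    printf '#SBATCH --mem %sGB\n' "$mem_gb"
}

mem_directive 18
# prints: #SBATCH --mem 20GB
```

The database size itself can be read with `du -sh dir_uniref90/`. Whether the cluster nodes can actually provide the resulting amount of memory is a separate question that this helper does not answer.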

@annelopes
Contributor

I am asking IFB whether they can add a formatted uniref90. I'll keep you posted.

@florianecoulmance
Author

The formatted uniref90 dataset is in my directory /shared/projects/meetu2019/fgayrard/Data/Uniref90 and its size is 49GB.

Now that I have it in my own directory, maybe we could just copy it into the shared/bank/uniref90/current/blast folder?

@annelopes
Contributor

Otherwise, you can use the old version of uniref90 (2018), results will be more or less the same! (but it will be much more expensive in terms of computational time)

/shared/bank/uniref90/uniref90_2018-10-10/

@annelopes
Contributor

No, you can't write in shared/bank. Please use their old version.

@florianecoulmance
Author

OK, great! I will let you know the outcome.

Thank you very much for the advice!
