Installation difficulties; running through tutorials 1-4 #176

Open
tlinjordet opened this issue Sep 20, 2023 · 1 comment

Comments

@tlinjordet
Contributor

Hello,

I have been working to get molSimplify running in my setup while learning the interface through the tutorials.

I have completed tutorials 1-3, but to make tutorial 4 work I had to make some minor changes to the code, and I am still encountering errors. At this stage I believe it is better to share my current status than to go further without asking for advice.

I am preparing a pull request based on my code changes, but my current solution may be too specific to my setup to adopt directly, so I want to document the challenges here, in an issue separate from the proposed partial solution.

Context

I am running Ubuntu 22.04 LTS, with base conda installed with Anaconda3-2023.07-2-Linux-x86_64.sh.

I tried to conda install molSimplify from the hjkgroup channel according to the repo README.md, without success.

I also tried building from source in various ways. I am not sure how well this worked, since the pytest failure rate was rather high. Nevertheless, I went on to tutorials 1-4 with a molSimplify environment that still fails some of the repository's tests.

I managed to do some tutorials with the Docker image, but for tutorial 4 I had to look for other solutions.

The best solution so far to create a molSimplify environment turned out to be using conda-forge and molsimplify-feedstock.

Having previously switched the package solver to mamba, I created the molSimplify environment discussed in the rest of this description as follows:

conda create --name molsimp python=3.9 molsimplify --channel conda-forge
conda activate molsimp

This installs molSimplify v1.7.3 and Open Babel v3.1.1
(in contrast to the Docker image, which has Python v2.7.17, molSimplify v1.4.6 and Open Babel v2.4.1).

I also installed Avogadro2 in this environment by building from source.

Testing Tutorials

Tutorial 1

Runs without a problem:

wget http://hjkgrp.mit.edu/tutorials/2016-06-18-molsimplify-tutorial-1-structure-generation/example-1.in
molsimplify -i example-1.in

Tutorial 2

Runs without a problem:

wget http://hjkgrp.mit.edu/tutorials/2016-12-02-molsimplify-tutorial-2-slab-builder/pd.cif
molsimplify -slab_gen -cif_path pd.cif -slab_size {10,10,5}
molsimplify -slab_gen -cif_path pd.cif -slab_size {10,10,5} -freeze 1

Tutorial 3

Runs without a problem:

wget http://hjkgrp.mit.edu/tutorials/2016-12-25-molsimplify-tutorial-3-custom-core-functionalization/commands.in
molsimplify -i commands.in

Tutorial 4

Note that the Docker image did not work for Tutorial 4, since the latest ChEMBL release, v33, is too large as an .sdf file for Open Babel v2.4.1 to handle, whereas Open Babel v3.x should be able to handle files larger than 4 GB.

wget https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/chembl_33.sdf.gz
gunzip chembl_33.sdf.gz

Note: the need to set the database directory path in the ~/.molSimplify config file was not clear from the instructions I had encountered up to this point.

export TEMPDIR=$(pwd)
echo "CHEMDBDIR=${TEMPDIR}" > ~/.molSimplify
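The echo line above overwrites ~/.molSimplify entirely, because > truncates the file before writing. If the config already holds other settings, a small Python helper can update only the CHEMDBDIR entry. This is a minimal sketch that assumes the file uses one KEY=VALUE entry per line, as in the echo line; set_config_value is a hypothetical helper, not part of molSimplify.

```python
from pathlib import Path


def set_config_value(config_path: Path, key: str, value: str) -> None:
    """Set KEY=VALUE in a line-based config file, keeping all other lines.

    Hypothetical helper; assumes one KEY=VALUE entry per line.
    """
    lines = config_path.read_text().splitlines() if config_path.exists() else []
    prefix = key + "="
    found = False
    for i, line in enumerate(lines):
        # Replace every existing entry for this key in place
        if line.strip().startswith(prefix):
            lines[i] = prefix + value
            found = True
    if not found:
        # Key not present yet: append it rather than overwriting the file
        lines.append(prefix + value)
    config_path.write_text("\n".join(lines) + "\n")
```

With os imported, set_config_value(Path.home() / ".molSimplify", "CHEMDBDIR", os.getcwd()) does the same as the two shell lines above while leaving any other entries in place.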

Tutorial 4a: Similarity search

For the first step of the tutorial, the similarity search, I ran the following:

wget http://hjkgrp.mit.edu/tutorials/2016-12-25-molsimplify-tutorial-4-database-searching/sim-db-search.in
sed -i 's/chembl_21/chembl_33/' sim-db-search.in
molsimplify -i sim-db-search.in
Error 1

However, this throws an error:

FileNotFoundError: [Errno 2] No such file or directory: '{$INSTALLATION_DIR}/anaconda3/envs/molsimp/lib/python3.9/site-packages/molSimplify/plugindefines_reference.txt'

This is solved by downloading the missing file into the installed package:

wget https://raw.githubusercontent.com/hjkgrp/molSimplify/v1.7.3/molSimplify/plugindefines_reference.txt
mv plugindefines_reference.txt $CONDA_PREFIX/lib/python3.9/site-packages/molSimplify/

However, this should probably be fixed properly, either in this repository or in molsimplify-feedstock; I am not sure which. Please advise.
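Before copying files by hand, a quick check can confirm whether the installed package is actually missing data files. This is a minimal sketch; package_dir and missing_data_files are hypothetical helpers, and the only file name taken from this issue is plugindefines_reference.txt.

```python
import importlib.util
from pathlib import Path
from typing import List


def package_dir(name: str) -> Path:
    """Locate an installed package's directory without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"package {name!r} not found")
    # For a regular package, origin points at its __init__.py
    return Path(spec.origin).parent


def missing_data_files(pkg_dir: Path, filenames: List[str]) -> List[str]:
    """Return the names from filenames that are not present in pkg_dir."""
    return [f for f in filenames if not (pkg_dir / f).is_file()]
```

For example, missing_data_files(package_dir("molSimplify"), ["plugindefines_reference.txt"]) should return an empty list once the file is in place.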

Error 2

At this point, molsimplify -i sim-db-search.in runs until it prints the statement "adding atom constraints", then hangs until the process is killed by a timeout.
If the process is interrupted from the keyboard (Ctrl+C), the following traceback results:

^CTraceback (most recent call last):
  File "{$INSTALLATION_DIR}/anaconda3/envs/molsimp/bin/molsimplify", line 10, in <module>
    sys.exit(main())
  File "{$INSTALLATION_DIR}/anaconda3/envs/molsimp/lib/python3.9/site-packages/molSimplify/__main__.py", line 184, in main
    startgen(sys.argv, False, gui)
  File "{$INSTALLATION_DIR}/anaconda3/envs/molsimp/lib/python3.9/site-packages/molSimplify/Scripts/generator.py", line 192, in startgen
    emsg = dbsearch(rundir, args, globs)
  File "{$INSTALLATION_DIR}/anaconda3/envs/molsimp/lib/python3.9/site-packages/molSimplify/Scripts/dbinteract.py", line 540, in dbsearch
    mybash(cmd)
  File "{$INSTALLATION_DIR}/anaconda3/envs/molsimp/lib/python3.9/site-packages/molSimplify/Classes/globalvars.py", line 463, in mybash
    line = p.stdout.readline()
KeyboardInterrupt

See the pull request for my proposed solution, which changes the definition of mybash in molSimplify/Classes/globalvars.py.
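A hang at p.stdout.readline() is typical of a read loop that waits for an end-of-output value that never arrives. One plausible cause in Python 3, not confirmed here, is that a pipe opened without text mode returns bytes, so an end-of-file check such as line == '' never succeeds because b'' != ''. The sketch below is not the code from the pull request; run_shell is a hypothetical replacement that lets subprocess.run collect all output and return it as a single str with its line breaks intact.

```python
import subprocess


def run_shell(cmd: str) -> str:
    """Run a shell command and return its combined stdout/stderr as text.

    Hypothetical stand-in for a mybash-style helper; the exit status is not checked.
    """
    result = subprocess.run(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the captured stdout
        text=True,  # decode to str, so there is no bytes-vs-str comparison
    )
    # subprocess.run waits for the process to exit, so no manual read loop is needed
    return result.stdout
```

Because subprocess.run reads the pipe until the process exits, there is no sentinel comparison that can fail to match.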

Error 3

Open Babel v3.x no longer provides the babel command, which had been deprecated for some time in favor of obabel. A more general solution may be needed if molSimplify is going to keep supporting Open Babel v2.4 in the near term, perhaps at the try/except stage where Open Babel is imported in the Python scripts, or else at a global config level.

For my fix of this Tutorial 4 error with Open Babel v3.1.1, see the pull request's changes to the functions getsimilar and dbsearch in molSimplify/Scripts/dbinteract.py.
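Open Babel 3.0 also moved the Python bindings: code written for 2.x uses import openbabel, while 3.x uses from openbabel import openbabel. A minimal sketch of the try/except approach suggested above, assuming only those two import paths; import_openbabel is a hypothetical helper name, not molSimplify code.

```python
from types import ModuleType
from typing import Optional


def import_openbabel() -> Optional[ModuleType]:
    """Return the Open Babel SWIG bindings module for 3.x or 2.x, or None."""
    try:
        # Open Babel 3.x: bindings live in the openbabel.openbabel submodule
        from openbabel import openbabel as ob
        return ob
    except ImportError:
        pass
    try:
        # Open Babel 2.x: bindings are the top-level openbabel module
        import openbabel as ob
    except ImportError:
        return None
    # Guard against a 3.x package whose submodule failed to load
    return ob if hasattr(ob, "OBMol") else None
```

The rest of the code can then call ob.OBMol() and similar classes without caring which major version is installed.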

Tutorial 4b: Dissimilarity Search

The second step of Tutorial 4 is a dissimilarity search based on the results of the first step. Note: the size of simres.smi produced by running the tutorial differs from the size reported in the tutorial. However, the results here are larger because a more recent, larger ChEMBL database is used, so this is likely not an issue.
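To compare against the tutorial, counting entries is more informative than comparing file sizes. A minimal sketch, assuming simres.smi follows the usual SMILES-file layout of one molecule per line (optionally followed by a title); count_smiles is a hypothetical helper.

```python
from pathlib import Path


def count_smiles(path: Path) -> int:
    """Count non-empty lines, assuming one SMILES entry per line."""
    with open(path) as fh:
        return sum(1 for line in fh if line.strip())
```

For example, count_smiles(Path("simres.smi")) gives the number of hits from the similarity search.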

Note that the ChEMBL database file needed to be added for dissimilarity search.

This part of the tutorial also failed because Open Babel v3.x's obabel expects the output file to be designated explicitly with -O.
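With obabel, the output file is passed through -O, whereas the legacy babel command took it as a second positional argument. The sketch below assumes those two calling conventions; conversion_command is a hypothetical helper that prefers obabel, falls back to babel, and returns an argument list suitable for subprocess.run without a shell.

```python
import shutil
from typing import Callable, List, Optional


def conversion_command(
    infile: str,
    outfile: str,
    extra: Optional[List[str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Build an Open Babel conversion command for whichever executable is on PATH."""
    extra = extra or []
    obabel = which("obabel")
    if obabel is not None:
        # obabel: output file given explicitly with -O
        return [obabel, infile, "-O", outfile, *extra]
    babel = which("babel")
    if babel is not None:
        # legacy babel (Open Babel 2.x): output file as a positional argument
        return [babel, infile, outfile, *extra]
    raise FileNotFoundError("neither obabel nor babel was found on PATH")
```

For example, subprocess.run(conversion_command("in.smi", "out.sdf"), check=True) runs the conversion with whichever executable is available. The which parameter exists only so the lookup can be replaced in tests.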

See the pull request for changes to the dissim function in molSimplify/Scripts/dbinteract.py to use obabel instead of babel, along with further updates to mybash in molSimplify/Classes/globalvars.py so that the stdout string it returns keeps its line breaks.

The process completes without errors, apart from the following message, which is normally hidden:

==============================
*** Open Babel Error  in TetStereoToWedgeHash
  Failed to set stereochemistry as unable to find an available bond

Tutorial 4c: Loop over Dissimilarity

After taking the example-loop.smi file from the tutorial page and fixing its database reference, I found that this part of Tutorial 4 failed for less obvious reasons.

The input commands are simply:

wget http://hjkgrp.mit.edu/tutorials/2016-12-25-molsimplify-tutorial-4-database-searching/example-loop.in
molsimplify -i example-loop.in

Error:

Traceback (most recent call last):
  File "{$INSTALLATION_DIR}/anaconda3/envs/molsimp/bin/molsimplify", line 10, in <module>
    sys.exit(main())
  File "{$INSTALLATION_DIR}/anaconda3/envs/molsimp/lib/python3.9/site-packages/molSimplify/__main__.py", line 184, in main
    startgen(sys.argv, False, gui)
  File "{$INSTALLATION_DIR}/anaconda3/envs/molsimp/lib/python3.9/site-packages/molSimplify/Scripts/generator.py", line 255, in startgen
    emsg = multigenruns(rundir, args, write_files=write_files)
  File "{$INSTALLATION_DIR}/anaconda3/envs/molsimp/lib/python3.9/site-packages/molSimplify/Scripts/rungen.py", line 297, in multigenruns
    emsg = rungen(rundir, args, write_files=write_files)
  File "{$INSTALLATION_DIR}/anaconda3/envs/molsimp/lib/python3.9/site-packages/molSimplify/Scripts/rungen.py", line 421, in rungen
    args.smicat[multidx] = lloc
IndexError: list assignment index out of range
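The last frame shows args.smicat[multidx] = lloc assigning to a position the list does not have. Python lists cannot grow through index assignment, so this fails whenever multidx is at least len(args.smicat). Why the list is shorter than the loop expects is not established here. While debugging, a guarded assignment with a more descriptive message can report both values. checked_assign is a hypothetical helper, not molSimplify code, and it assumes non-negative loop indices.

```python
from typing import Any, List


def checked_assign(items: List[Any], index: int, value: Any, name: str = "list") -> None:
    """Assign items[index] = value, raising an informative IndexError if out of range."""
    if not 0 <= index < len(items):
        # Report the list length and requested index instead of the generic message
        raise IndexError(f"{name} has {len(items)} entries; cannot assign index {index}")
    items[index] = value
```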

Tutorial 4d: Search over SMARTS/human

Again, this part fails for unclear reasons.

Input:

wget http://hjkgrp.mit.edu/tutorials/2016-12-25-molsimplify-tutorial-4-database-searching/example-smarts.in
sed -i 's/chembl_21/chembl_33/' example-smarts.in 
molsimplify -i example-smarts.in

Output with error:

number of smiles strings BEFORE SMARTS filter: 8279

---Test for developer version----
('smart is:', '[#7^3;!+][#6;R0][#6;R0][#7^3;!+]')
('current path:', '{$INSTALLATION_DIR}/Downloads/molsimp_tut4')
('file open:', 'simres.smi')
Traceback (most recent call last):
  File "{$INSTALLATION_DIR}/anaconda3/envs/molsimp/bin/molsimplify", line 10, in <module>
    sys.exit(main())
  File "{$INSTALLATION_DIR}/anaconda3/envs/molsimp/lib/python3.9/site-packages/molSimplify/__main__.py", line 184, in main
    startgen(sys.argv, False, gui)
  File "{$INSTALLATION_DIR}/anaconda3/envs/molsimp/lib/python3.9/site-packages/molSimplify/Scripts/generator.py", line 192, in startgen
    emsg = dbsearch(rundir, args, globs)
  File "{$INSTALLATION_DIR}/anaconda3/envs/molsimp/lib/python3.9/site-packages/molSimplify/Scripts/dbinteract.py", line 615, in dbsearch
    _ = matchsmarts(smistr, outf, catoms, args)
  File "{$INSTALLATION_DIR}/anaconda3/envs/molsimp/lib/python3.9/site-packages/molSimplify/Scripts/dbinteract.py", line 403, in matchsmarts
    max_atoms = int(float_from_str(args.dbatoms))
  File "{$INSTALLATION_DIR}/anaconda3/envs/molsimp/lib/python3.9/site-packages/molSimplify/Scripts/dbinteract.py", line 32, in float_from_str
    float_arr = rx.findall(txt)
TypeError: expected string or bytes-like object
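The TypeError arises because float_from_str runs a regular expression over args.dbatoms when it is not a string. When the option is omitted, it is presumably None, which matches the maintainer's diagnosis later in this thread. A minimal None-safe sketch follows. max_atoms_from_arg, its default argument, and the number-matching pattern are all assumptions, not molSimplify's actual float_from_str.

```python
import re
from typing import Optional

# Hypothetical pattern for the first decimal number in a string, e.g. "50" or "30.0"
_NUMBER = re.compile(r"[-+]?\d*\.?\d+")


def max_atoms_from_arg(dbatoms: Optional[str], default: int) -> int:
    """Return the atom limit from a string option, or default when it is unset."""
    if dbatoms is None:
        # Option not given: fall back instead of passing None to the regex
        return default
    match = _NUMBER.search(dbatoms)
    if match is None:
        raise ValueError(f"no number found in {dbatoms!r}")
    return int(float(match.group()))
```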
tlinjordet added a commit to tlinjordet/molSimplify that referenced this issue Sep 20, 2023
The progress with Tutorial 4 is detailed in issue hjkgrp#176
@ralf-meyer
Member

Hi @tlinjordet,

I want to thank you for the thorough write-up of the issues you encountered and the efforts you made in PR #177 to fix the issues with openbabel versions. I also want to apologize again for taking so long to respond.

Installation issues
I am surprised to hear that the conda install from the hjkgroup channel did not work on Ubuntu 22.04, as this is exactly what we test on. I will look into this and check for potential issues with the feedstock. Unfortunately, we cannot control how the environment is solved without pinning every package (and I cannot reproduce this without knowing exactly how anaconda solved the environment in your case). Also, we have previously observed differences between anaconda and mamba.

As a potential solution/alternative, I have added a purely pip-based installation option. This was enabled by the latest release of openbabel-wheel on PyPI, which makes it possible to install openbabel (including the binaries, not just the Python bindings) using pip.
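After installing, it can help to confirm which Open Babel major version the obabel on PATH belongs to, since 2.x and 3.x behave differently as described in this issue. A minimal sketch, assuming obabel -V prints a line containing "Open Babel X.Y.Z"; both helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_obabel_version(text: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from text containing 'Open Babel X.Y.Z'."""
    match = re.search(r"Open Babel (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    major, minor, patch = (int(g) for g in match.groups())
    return (major, minor, patch)


def installed_obabel_version() -> Optional[Tuple[int, int, int]]:
    """Run 'obabel -V' if obabel is on PATH and parse its version, else None."""
    exe = shutil.which("obabel")
    if exe is None:
        return None
    out = subprocess.run(
        [exe, "-V"], stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ).stdout
    return parse_obabel_version(out)
```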

We absolutely need to either update or retire the Docker image given the fact that it is still on python2...

Openbabel issues
I really want to apologize for the issues arising from our recent switch to supporting openbabel 3. This was obviously not tested enough. We will keep adding more test cases to avoid similar problems going forward. A test case for tutorial 4 was included in PR #186.

Tutorial instructions
We had some of the students go through tutorial 4 last week and will update the website over the course of next week. In addition to implementing all the changes and clarifications you suggested, we will offer a significantly downsized version of the ChEMBL v33 file to ensure consistency between the search results and the results reported in the tutorial.

4a & 4b:
Again, thank you for identifying the error with openbabel 2/3.

4c:
I was not able to reproduce this, but I will keep looking into it.

4d:
I was able to fix this error, which occurred when args.dbatoms was not given (PR #186).
