Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: full ecHumanGEM has multiple usage reactions for the same proteins #353

Closed
3 tasks done
wshao1 opened this issue Aug 29, 2023 · 6 comments
Closed
3 tasks done
Labels
bug fixed in develop Has already been addressed in develop branch

Comments

@wshao1
Copy link

wshao1 commented Aug 29, 2023

Description of the bug:

I have an ecHumanGEM created using the full_ecModel protocol, which I want to reduce to a context-specific GEM using getSubsetEcModel. However, this returns an error due to the ecHumanGEM having multiple protein usage reactions with the same ID (i.e. 'usage_prot_Q0WX57'). This results in the following traceback:

Unable to perform assignment because the left and right sides have a different number of elements.

Error in getIndexes (line 79)
            indexes(i)=index;

Error in removeReactions (line 37)
    indexesToDelete=getIndexes(model,rxnsToRemove,'rxns');

Error in removeGenes (line 84)
            reducedModel = removeReactions(reducedModel,rxnsToRemove,removeUnusedMets,true);

Error in getSubsetEcModel (line 26)
smallEcModel = removeGenes(bigEcModel,genesToRemove,true,true,false);

Reproducing these results:

% STEP 30: Contextualize ecModels
% To exemplify the construction of a context-specific ecModel, a
% conventional GEM of HT-29 cell line is loaded.
HCT116 = load('models/hct116_ftINIT_model_40.mat')

% Make a context-specific ecModel based on the generic Human-GEM.
ecModel = loadEcModel('ecHumanGEM.yml');
ecHCT116 = getSubsetEcModel(ecModel,HCT116.hct116_model);
saveEcModel(ecModel,'ecHCT116.yml');

System information

  • Operating system: Windows 11
  • MATLAB version: 2021a
  • GECKO version: 3.1.1
  • RAVEN version: 2.8.4
  • Solver: Gurobi 10.0

I hereby confirm that:

  • My GECKO installation met all requirements.
  • This bug occurs in the main branch of the repository.
  • A similar issue does not already exist.
@wshao1 wshao1 added the bug label Aug 29, 2023
@edkerk edkerk changed the title getSubsetEcModel multiple rxn index error bug: full ecHumanGEM has multiple usage reactions for the same proteins Aug 29, 2023
@edkerk
Copy link
Member

edkerk commented Aug 29, 2023

The problem is that ecHumanGEM should not have multiple protein usage reactions for the same protein (e.g. usage_prot_Q0WX57). This should normally have been prevented by #349, but ecHumanGEM uses the uniprotConversion.tsv that might still result in duplicate proteins.

@edkerk
Copy link
Member

edkerk commented Aug 29, 2023

@wshao1 How do you generate ecHumanGEM? With the following code a draft model is generated, but the duplicate usage protein reaction cannot be found:

adapterLocation = fullfile(findGECKOroot,'tutorials','light_ecModel','HumanGEMAdapter.m');
adapter = ModelAdapterManager.setDefault(adapterLocation);
model = loadConventionalGEM();
[ecModel, noUniprot] = makeEcModel(model,false);
find(strcmp(ecModel.rxns,'usage_prot_Q0WX57'))

ans =

  0×1 empty double column vector

@wshao1
Copy link
Author

wshao1 commented Aug 29, 2023

The only differences between my adapter and the one provided in the light_ecModel tutorial:

  • I chose to use Human-GEM v1.16 instead of v1.15 as the conventional GEM.
  • I did not run the following code to first simplify Human-GEM (lines 7-17 in HumanGEMAdapter.m):
% The model distributed with the light_ecModel tutorial is Human-GEM
% is version 1.15.0, available from
% https://github.com/SysBioChalmers/Human-GEM/releases/tag/v1.3.0
% In addition, the following lines were run to reduce its size
% before storing it in this GECKO tutorial:
% ihuman = simplifyModel(ihuman,false,false,true,true);
% [ihuman.grRules,skipped] = simplifyGrRules(ihuman.grRules,true);
% ihuman = deleteUnusedGenes(ihuman);
% ihuman = rmfield(ihuman,{'unconstrained','rxnReferences','rxnFrom','metFrom','rxnConfidenceScores'});
% ihuman.name = ihuman.id;
% writeYAMLmodel(ihuman,'human-GEM.yml')

It is unclear to me why this simplification is required, and I could not find the simplifyGrRules function in the RAVEN toolbox repository to figure out what was actually happening, so I skipped this step. I just tried running it right now, and it seems like the function cannot be found: Unrecognized function or variable 'simplifyGrRules'.

@edkerk
Copy link
Member

edkerk commented Aug 29, 2023

I can replicate this with Human-GEM v1.16. The problem is with reaction MAR09490 whose gene association contains multiple isoenzymes that are all annotated to the same UniProt ID.

As a work-around, the duplicated usage_prot_XXXXXX reactions can safely be deleted. The following code keeps only one instance of the duplicated reactions:

% Make a list of unique reactions
[~,uniqueRxns] = unique(ecModel.rxns);
duplRxns = ~ismember(1:numel(ecModel.rxns), uniqueRxns);

% Remove from all reaction-related model fields
ecModel.rxns(duplRxns)          = [];
ecModel.rxnNames(duplRxns)      = [];
ecModel.S(:,duplRxns)           = [];
ecModel.lb(duplRxns)            = [];
ecModel.ub(duplRxns)            = [];
ecModel.rev(duplRxns)           = [];
ecModel.c(duplRxns)             = [];
ecModel.grRules(duplRxns)       = [];
ecModel.rxnGeneMat(duplRxns,:)  = [];
ecModel.rxnFrom(duplRxns)       = [];

The above code should work, but I haven't tested the resulting model with getSubsetEcModel as how you use it.

In the meanwhile, we'll work on changes to makeEcModel to prevent this from happening. It is not super trivial how to fix this, as it also means that there are duplicate entries in ecModel.ec.enzymes (while their corresponding ecModel.ec.genes and ecModel.ec.rxnEnzMat entries are unique). It is therefore probably best to allow for duplicate enzymes (but not genes) in ecModel.ec, but this means it needs to be checked that this does not cause any issues when e.g. running selectKcatValue or constrainEnzConcs.

Regarding simplifyGrRules, this function is from Human-GEM and it's useful to ensure that the grRules are defined in an expanded format so makeEcModel can correctly split reactions by isoenzymes (done by RAVEN's expandModel). RAVEN's standardizeGrRules is also able to do this. We might want to reconsider how makeEcModel handles this as well. However, this is unrelated to the problem encountered in this Issue.

@wshao1
Copy link
Author

wshao1 commented Aug 31, 2023

Thanks @edkerk, removing the duplicated reactions allows getSubsetEcModel to run.

The resulting model shows zero growth rate however.

Growth rate in HCT116-GEM: 86.6224865446 /hour.
Growth rate in ecHuman-GEM: 0.0871287888 /hour.
Growth rate in ecHCT116-GEM: 0.0000000000 /hour.

I have also tried extracting an ecHT29 using the HT29-GEM from the light_ecModel tutorial, but that also yields zero growth rate.

Growth rate in HT29-GEM: 149.9672040992 /hour.
Growth rate in ecHuman-GEM: 0.0871287888 /hour.
Growth rate in ecHT29-GEM: 0.0000000000 /hour.

Integrating proteomics data and then flexibilizing enzyme concentrations results in the following error, suggesting getSubsetEcModel may have returned a non-functional model.

No (more) limiting enzymes have been found. Attempting to increase protein pool exchange...
Protein pool exchange was also not limiting. Inability to reach growth rate is not related to
enzyme constraints. Maximum growth rate is 0.

I will investigate this further.

@edkerk
Copy link
Member

edkerk commented Dec 1, 2023

The context-specific ecModel failing to simulate is likely the result of the large ecModel and the context-specific non-ec GEM that are used as input by getSubsetEcModel are not derived from the same starting GEM. The latter is essential, as only then can a reliable subset of reactions be selected.

This is now clarified in the function documentation, and getSubsetEcModel tries to prevent such cases by at least checking that all the reactions from the context-specific non-ec GEM can be found in the large ecModel. This is not fool-proof, as other curations could have been done while keeping the same reaction identifiers, but a full check on whether both models have derived from the same starting GEM is outside of the scope.

@edkerk edkerk added the fixed in develop Has already been addressed in develop branch label Dec 2, 2023
@edkerk edkerk closed this as completed Dec 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug fixed in develop Has already been addressed in develop branch
Projects
None yet
Development

No branches or pull requests

2 participants