Timeout during property creation on a distributed OrientDB configuration #10318

Open
mikhalov opened this issue Sep 25, 2024 · 4 comments

mikhalov commented Sep 25, 2024

Due to this issue, we cannot launch the production server successfully. The migration stalls and the application fails to initialize correctly in the distributed environment.

We are encountering a problem when starting the application in a distributed database configuration with 5 nodes. The issue occurs specifically when creating a property for a vertex with a large number of records in the database. During this operation, we see the following error:

Caused by: com.orientechnologies.orient.server.distributed.task.ODistributedOperationException: Quorum 5 not reached for request (id=2.15877 task=sql_command_ddl_second_phase). Elapsed=24740ms. No server in conflict. Received: 
- DB-node-A: waiting-for-response
- DB-node-B: waiting-for-response
- DB-node-C: waiting-for-response
- DB-node-D: waiting-for-response
- DB-node-E: waiting-for-response	DB name="db"	DB name="db"
	at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
	at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.handleException(OChannelBinaryAsynchClient.java:355)
	at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.handleStatus(OChannelBinaryAsynchClient.java:303)
	at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.handleStatus(OChannelBinaryAsynchClient.java:325)
	at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:209)
	at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:167)
	at com.orientechnologies.orient.client.remote.OStorageRemote.beginResponse(OStorageRemote.java:2003)
	at com.orientechnologies.orient.client.remote.OStorageRemote.lambda$networkOperationRetryTimeout$2(OStorageRemote.java:435)
	at com.orientechnologies.orient.client.remote.OStorageRemote.baseNetworkOperation(OStorageRemote.java:500)
	at com.orientechnologies.orient.client.remote.OStorageRemote.networkOperationRetryTimeout(OStorageRemote.java:415)
	at com.orientechnologies.orient.client.remote.OStorageRemote.networkOperationNoRetry(OStorageRemote.java:450)
	at com.orientechnologies.orient.client.remote.OStorageRemote.command(OStorageRemote.java:1169)
	at com.orientechnologies.orient.client.remote.db.document.ODatabaseDocumentRemote.command(ODatabaseDocumentRemote.java:430)
	at com.orientechnologies.orient.client.remote.metadata.schema.OClassRemote.addProperty(OClassRemote.java:83)
	at com.orientechnologies.orient.core.metadata.schema.OClassImpl.createProperty(OClassImpl.java:417)
	at com.orientechnologies.orient.core.metadata.schema.OClassAbstractDelegate.createProperty(OClassAbstractDelegate.java:166)
	at com.tinkerpop.blueprints.impls.orient.OrientElementType.access$201(OrientElementType.java:34)
	at com.tinkerpop.blueprints.impls.orient.OrientElementType$3.call(OrientElementType.java:94)
	at com.tinkerpop.blueprints.impls.orient.OrientElementType$3.call(OrientElementType.java:91)
	at com.tinkerpop.blueprints.impls.orient.OrientBaseGraph.executeOutsideTx(OrientBaseGraph.java:1849)
	at com.tinkerpop.blueprints.impls.orient.OrientElementType.createProperty(OrientElementType.java:90)
	at com.tinkerpop.blueprints.impls.orient.OrientVertexType.createProperty(OrientVertexType.java:133)
	at com.tinkerpop.blueprints.impls.orient.OrientVertexType.createProperty(OrientVertexType.java:32)

We tried changing the database configuration, but it didn't help. Here's the configuration we used:

final long distributedResponsesTimeout = 60000L;
final var orientDBConfig = OrientDBConfig.builder()
        .addConfig(DISTRIBUTED_ASYNCH_RESPONSES_TIMEOUT, distributedResponsesTimeout)
        .addConfig(NETWORK_SOCKET_TIMEOUT, distributedResponsesTimeout)
        .addConfig(NETWORK_LOCK_TIMEOUT, distributedResponsesTimeout)
        .addConfig(DISTRIBUTED_PURGE_RESPONSES_TIMER_DELAY, distributedResponsesTimeout)
        .addConfig(DISTRIBUTED_AUTO_REMOVE_OFFLINE_SERVERS, distributedResponsesTimeout)
        .addConfig(DISTRIBUTED_CHECK_HEALTH_CAN_OFFLINE_SERVER, true)
        .addConfig(DISTRIBUTED_CRUD_TASK_SYNCH_TIMEOUT, distributedResponsesTimeout)
        .addConfig(DISTRIBUTED_COMMAND_TASK_SYNCH_TIMEOUT, distributedResponsesTimeout)
        .addConfig(DISTRIBUTED_MAX_STARTUP_DELAY, distributedResponsesTimeout)
        .addConfig(DISTRIBUTED_HEARTBEAT_TIMEOUT, distributedResponsesTimeout)
        .addConfig(DISTRIBUTED_CHECK_HEALTH_EVERY, 5000L)
        .build();
instance.orientDB = new OrientDB(
        databaseUrl,
        databaseSettings.databaseUser,
        databaseSettings.databasePassword,
        orientDBConfig
);

Additionally, in the logs of the master node, we see the following warning:

2024-09-25 09:26:18:430 WARNI {db=db} [DB-node-B] Timeout (24465ms) on waiting for synchronous responses from nodes=[DB-node-A, DB-node-B, DB-node-C, DB-node-D, DB-node-E] responsesSoFar=[DB-node-D] request=(id=1.5390 task=sql_command_ddl_second_phase) [OHazelcastPlugin]
mikhalov changed the title from "Timeout during large migration on a distributed OrientDB configuration" to "Timeout during property creation on a distributed OrientDB configuration" on Sep 25, 2024
Member

tglman commented Sep 25, 2024

Hi,

I think this is due to a data migration that happens while the property is created. You can skip the check and migration of the property by using the UNSAFE option. If you are not sure the property already holds the right values, you can run a data migration beforehand; this is what OrientDB itself does.

Here is an example of creating a property with UNSAFE:

CREATE PROPERTY MyVertex.name STRING UNSAFE

The corresponding code that does the data migration, taken from the OrientDB code:

try (OResultSet result =
    database.query("select from MyVertex where name.type() <> 'STRING'")) {
  while (result.hasNext()) {
    ODocument record = (ODocument) result.next().getElement().get();
    // Re-set the field with an explicit type so the stored value is converted to STRING.
    record.field("name", record.field("name"), OType.STRING);
    database.save(record);
  }
}

Obviously you can edit as you need.
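
As a rough sketch of the order of operations described above (convert mismatched values first, then create the property with UNSAFE), the two statements can be generated from the class, property, and type names. The `UnsafePropertyPlan` class and its method names are hypothetical helpers, not part of OrientDB; the strings they return would be passed to the query and command calls of whichever API is in use.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper (not part of OrientDB): builds the SQL text for the
// "migrate first, then CREATE PROPERTY ... UNSAFE" approach described above.
final class UnsafePropertyPlan {
    // Accept only simple identifiers, since names are concatenated into SQL.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private UnsafePropertyPlan() {}

    private static String checked(String name) {
        if (name == null || !IDENTIFIER.matcher(name).matches()) {
            throw new IllegalArgumentException("Not a simple identifier: " + name);
        }
        return name;
    }

    // Step 1: select the records whose value is not yet of the target type,
    // so they can be converted and saved before the property is created.
    static String selectMismatched(String className, String property, String type) {
        return "SELECT FROM " + checked(className)
                + " WHERE " + checked(property) + ".type() <> '"
                + checked(type).toUpperCase(Locale.ROOT) + "'";
    }

    // Step 2: create the property without the built-in check and migration.
    static String createPropertyUnsafe(String className, String property, String type) {
        return "CREATE PROPERTY " + checked(className) + "." + checked(property)
                + " " + checked(type).toUpperCase(Locale.ROOT) + " UNSAFE";
    }
}
```

Running step 1 and the conversion loop from a single client before issuing step 2 keeps the per-record work out of the distributed DDL request, which is the part that timed out in the report.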

We will check on our side as well. These data migrations need to be done by only one node, and I think that as of today they are re-executed on all nodes in parallel, which can create issues.

@mikhalov
Author

@tglman
Currently, we are using the Tinkerpop API to create properties: com.tinkerpop.blueprints.impls.orient.OrientElementType#createProperty(java.lang.String, com.orientechnologies.orient.core.metadata.schema.OType)

@mikhalov
Author

When we do this via ODatabaseSession.query(), we get the following error:
Caused by: com.orientechnologies.orient.core.exception.OSchemaException: Cannot create property 'property' inside a transaction

Member

tglman commented Sep 25, 2024

Hi,

As of today, DDL cannot run while another transaction is active; you can make sure that no transaction is active by calling the commit or rollback methods first. The Blueprints APIs are still supported in 3.2.x but are deprecated and will be removed in the next major release, so for long-term support I would suggest using Gremlin (the orientdb-gremlin dependency and the tp3 distribution) or the native OrientDB APIs.

Bye
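
The "close any active transaction before DDL" advice above can be sketched roughly as follows. `TxSession`, `DdlRunner`, and `FakeSession` are hypothetical names invented for illustration; they are not OrientDB types, and mapping them onto the real session's transaction check, commit or rollback, and command calls is left as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal view of a database session, used only for illustration.
interface TxSession {
    boolean isTransactionActive();
    void commit();
    void execute(String sql);
}

final class DdlRunner {
    private DdlRunner() {}

    // Close any open transaction first, then run the DDL statement outside of it.
    static void runDdl(TxSession session, String ddl) {
        if (session.isTransactionActive()) {
            session.commit(); // a rollback would also work if pending changes should be discarded
        }
        session.execute(ddl);
    }
}

// In-memory stand-in that rejects statements inside a transaction,
// mimicking the "inside a transaction" error reported above.
final class FakeSession implements TxSession {
    boolean active;
    final List<String> executed = new ArrayList<>();

    @Override
    public boolean isTransactionActive() {
        return active;
    }

    @Override
    public void commit() {
        active = false;
    }

    @Override
    public void execute(String sql) {
        if (active) {
            throw new IllegalStateException("Cannot run DDL inside a transaction");
        }
        executed.add(sql);
    }
}
```

Whether to commit or roll back depends on whether the pending changes should be kept; the sketch commits only to keep the example short.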

tglman added this to the 3.2.x milestone Sep 26, 2024