Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return re-connection failures immediately #159

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion hircluster.c
Original file line number Diff line number Diff line change
Expand Up @@ -1902,7 +1902,10 @@ redisContext *ctx_get_by_node(redisClusterContext *cc, redisClusterNode *node) {
c = node->con;
if (c != NULL) {
if (c->err) {
redisReconnect(c);
if (redisReconnect(c) != REDIS_OK) {
__redisClusterSetError(cc, c->err, c->errstr);
return NULL;
}

if (cc->ssl && cc->ssl_init_fn(c, cc->ssl) != REDIS_OK) {
__redisClusterSetError(cc, c->err, c->errstr);
Comment on lines -1905 to 1911
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw this too but I thought ssl_init_fn is supposed to handle a connection which failed to reconnect.

It seems cleaner to return ASAP though, but I'd like to understand the consequences better.

Expand Down
4 changes: 4 additions & 0 deletions tests/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -220,6 +220,10 @@ add_test(NAME redirect-with-hostname-test
COMMAND "${CMAKE_SOURCE_DIR}/tests/scripts/redirect-with-hostname-test.sh"
"$<TARGET_FILE:clusterclient>"
WORKING_DIRECTORY "${CMAKE_SOURCE_DIR}/tests/scripts/")
add_test(NAME reconnect-failure-test
COMMAND "${CMAKE_SOURCE_DIR}/tests/scripts/reconnect-failure-test.sh"
"$<TARGET_FILE:clusterclient>"
WORKING_DIRECTORY "${CMAKE_SOURCE_DIR}/tests/scripts/")
# This test can't be run on hiredis v1.1.0 due to hiredis issue #1171.
# Disabling the testcase if hiredis contains the issue or if the version is unknown.
add_test(NAME redirect-with-hostname-test-async
Expand Down
25 changes: 16 additions & 9 deletions tests/clusterclient.c
Original file line number Diff line number Diff line change
Expand Up @@ -22,23 +22,30 @@ int main(int argc, char **argv) {
redisClusterSetOptionAddNodes(cc, initnode);
redisClusterSetOptionConnectTimeout(cc, timeout);
redisClusterSetOptionRouteUseSlots(cc);
redisClusterSetOptionMaxRetry(cc, 2);

if (redisClusterConnect2(cc) != REDIS_OK) {
fprintf(stderr, "Connect error: %s\n", cc->errstr);
exit(100);
}

char command[256];
while (fgets(command, 256, stdin)) {
size_t len = strlen(command);
if (command[len - 1] == '\n') // Chop trailing line break
command[len - 1] = '\0';
redisReply *reply = (redisReply *)redisClusterCommand(cc, command);
char cmd[256];
while (fgets(cmd, 256, stdin)) {
size_t len = strlen(cmd);
if (cmd[len - 1] == '\n') // Chop trailing line break
cmd[len - 1] = '\0';

if (cmd[0] == '\0') /* Skip empty lines */
continue;
if (cmd[0] == '#') /* Skip comments */
continue;

redisReply *reply = (redisReply *)redisClusterCommand(cc, cmd);
if (cc->err) {
fprintf(stderr, "redisClusterCommand error: %s\n", cc->errstr);
exit(101);
printf("error: %s\n", cc->errstr);
} else {
printf("%s\n", reply->str);
}
Comment on lines +43 to 48
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So now we can continue to use the cluster context after a reconnect failure? Sending queries still works as long as it's not the failing node? What would happen if we'd do that without this change? Would it crash the program?

How does redisClusterSetOptionMaxRetry affect the behaviour?

I think the PR description lacks some details about behaviour before and after.

printf("%s\n", reply->str);
freeReplyObject(reply);
}

Expand Down
85 changes: 85 additions & 0 deletions tests/scripts/reconnect-failure-test.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,85 @@
#!/bin/bash
#
# Simulate a 2 node cluster.
# - Node 1 will handle slotmap updates and redirects.
# - Node 2 will simulate a short and temporary network issue
# that trigger the client to reconnect.
#
# Usage: $0 /path/to/clusterclient-binary

clientprog=${1:-./clusterclient}
testname=reconnect-failure-test

# Sync process just waiting for server to be ready to accept connection.
perl -we 'use sigtrap "handler", sub{exit}, "CONT"; sleep 1; die "timeout"' &
syncpid1=$!
perl -we 'use sigtrap "handler", sub{exit}, "CONT"; sleep 1; die "timeout"' &
syncpid2=$!

# Start simulated redis node #1
timeout 5s ./simulated-redis.pl -p 7401 -d --sigcont $syncpid1 <<'EOF' &
# Inital slotmap
EXPECT CONNECT
EXPECT ["CLUSTER", "SLOTS"]
SEND [[0, 6000, ["127.0.0.1", 7401, "nodeid1"]],[6001, 16383, ["127.0.0.1", 7402, "nodeid2"]]]
EXPECT CLOSE

EXPECT CONNECT
EXPECT ["SET", "bar", "initial"]
SEND +OK

EXPECT CLOSE
EOF
server1=$!

# Start simulated redis node #2
timeout 5s ./simulated-redis.pl -p 7402 -d --sigcont $syncpid2 <<'EOF' &
EXPECT CONNECT
EXPECT ["SET", "foo", "initial"]
SEND +OK
CLOSE
EOF
server2=$!

# Wait until both nodes are ready to accept client connections
wait $syncpid1 $syncpid2;

# Run client
timeout 4s "$clientprog" 127.0.0.1:7401 > "$testname.out" <<'EOF'
SET foo initial
SET bar initial

SET foo first
SET foo second

EOF
clientexit=$?

# Wait for servers to exit
wait $server1; server1exit=$?
wait $server2; server2exit=$?

# Check exit statuses
if [ $server1exit -ne 0 ]; then
echo "Simulated server #1 exited with status $server1exit"
exit $server1exit
fi
if [ $server2exit -ne 0 ]; then
echo "Simulated server #2 exited with status $server2exit"
exit $server2exit
fi
if [ $clientexit -ne 0 ]; then
echo "$clientprog exited with status $clientexit"
exit $clientexit
fi

# Check the output from clusterclient
expected="OK
OK
error: Server closed the connection
error: Connection refused"

cmp "$testname.out" <(echo "$expected") || exit 99

# Clean up
rm "$testname.out"