Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zenoh 1.0.0 Porting #276

Open
wants to merge 68 commits into
base: rolling
Choose a base branch
from
Open

Conversation

YuanYuYuan
Copy link
Contributor

@YuanYuYuan YuanYuYuan commented Sep 5, 2024

Notes

  • Remove some unused dependencies.
  • Use z_view_keyexpr_t rather z_owned_keyexpr_t if possible. This keeps the reference only and doesn't need to be dropped.
  • The attachment type has been removed. Use the plain z_bytes instead.
  • Attachment is nothing but z_bytes. Use an iterator to get value by key.
  • z_check is now internal. We check zenoh entities' health by the creation functions' return.

Mallets and others added 30 commits August 9, 2024 14:32
Signed-off-by: Luca Cominardi <[email protected]>
Signed-off-by: Luca Cominardi <[email protected]>
Signed-off-by: Luca Cominardi <[email protected]>
chore: checkout dev/1.0.0
Copy link
Member

@Yadunund Yadunund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @YuanYuYuan,

Thanks for this huge effort to port the codebase to Zenoh 1.0. My request is to keep the diff minimal by restricting changes to only lines of code that need to change. There seems to be many changes in the diff caused by newlines in either comments, function signatures or assignment statements. If we could keep the structure of those lines the same and only update those lines that need new API, it would be a lot easier to review.

rmw_zenoh_cpp/src/detail/attachment_helpers.cpp Outdated Show resolved Hide resolved
rmw_zenoh_cpp/src/detail/rmw_data_types.cpp Outdated Show resolved Hide resolved
rmw_zenoh_cpp/src/rmw_zenoh.cpp Outdated Show resolved Hide resolved
rmw_zenoh_cpp/src/rmw_zenoh.cpp Outdated Show resolved Hide resolved
rmw_zenoh_cpp/src/rmw_zenoh.cpp Outdated Show resolved Hide resolved
@YuanYuYuan
Copy link
Contributor Author

Hi @Yadunund, thanks for your reminder. I guess the style was messed up due to multiple uncrustify and IDE built-in formatting. I had manually aligned the style to the one used in rolling.

But I'd highly recommend introducing a formatting tool that produces a consistent format like clang-format. We have many inconsistent styles existing in the codes. For example,

// >>> These are inconsistent
rmw_client_t * rmw_client = static_cast<rmw_client_t *>(allocator->zero_allocate(
    1,
    sizeof(rmw_client_t),
    allocator->state));
rmw_node_t * node =
  static_cast<rmw_node_t *>(allocator->zero_allocate(1, sizeof(rmw_node_t), allocator->state));
context->impl = static_cast<rmw_context_impl_t *>(
  allocator->zero_allocate(1, sizeof(rmw_context_impl_t), allocator->state));

// >>> These are inconsistent
RMW_TRY_PLACEMENT_NEW(
  publisher_data, publisher_data, return nullptr,
  rmw_zenoh_cpp::rmw_publisher_data_t);
RMW_TRY_PLACEMENT_NEW(
  service_data, service_data, return nullptr,
  rmw_zenoh_cpp::rmw_service_data_t);
RMW_TRY_PLACEMENT_NEW(client_data, client_data, return nullptr, rmw_zenoh_cpp::rmw_client_data_t);
RMW_TRY_PLACEMENT_NEW(
  client_data->request_type_support,
  client_data->request_type_support,
  return nullptr,
  rmw_zenoh_cpp::RequestTypeSupport, service_members);
RMW_TRY_PLACEMENT_NEW(
  service_data->response_type_support,
  service_data->response_type_support,
  return nullptr,
  rmw_zenoh_cpp::ResponseTypeSupport, service_members);
RMW_ZENOH_LOG_ERROR_NAMED(
  "rmw_zenoh_cpp",
  "Unexpected type %s for service %s. Report this bug",
  service_type.c_str(), rmw_service->service_name);


// >>> Comment whose length isn't too long is sometimes written in two lines
// Convert the type hash to a string so that it can be included in
// the keyexpr.

client_data->entity = rmw_zenoh_cpp::liveliness::Entity::make(
  z_info_zid(z_loan(node->context->impl->session)),
  std::to_string(node_data->id),                       //  >>> These are inconsistent
  std::to_string(                                      //
    context_impl->get_next_entity_id()),               //
  rmw_zenoh_cpp::liveliness::EntityType::Client,
  rmw_zenoh_cpp::liveliness::NodeInfo{
    node->context->actual_domain_id, node->namespace_, node->name, context_impl->enclave},
  rmw_zenoh_cpp::liveliness::TopicInfo{
    node->context->actual_domain_id,
    rmw_client->service_name,
    std::move(service_type),
    type_hash_c_str,
    client_data->adapted_qos_profile}
);


// >>> These are inconsistent
auto service_members = static_cast<const service_type_support_callbacks_t *>(type_support->data);
auto request_members = static_cast<const message_type_support_callbacks_t *>(
  service_members->request_members_->data);
auto response_members = static_cast<const message_type_support_callbacks_t *>(
  service_members->response_members_->data);


// >>> These are inconsistent
RMW_TRY_DESTRUCTOR(
  service_data->request_type_support->~RequestTypeSupport(), rmw_zenoh_cpp::RequestTypeSupport, );
allocator->deallocate(service_data->request_type_support, allocator->state);
RMW_TRY_DESTRUCTOR(
  service_data->response_type_support->~ResponseTypeSupport(), rmw_zenoh_cpp::ResponseTypeSupport,
);

@clalancette
Copy link
Collaborator

But I'd highly recommend introducing a formatting tool that produces a consistent format like clang-format. We have many inconsistent styles existing in the codes. For example,

Uncrustify should already be doing this. If it is not, then we should revisit the uncrustify configuration.

@Yadunund Yadunund marked this pull request as ready for review September 17, 2024 20:37
Copy link
Member

@Yadunund Yadunund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm halfway thought the review! Left some comments to consider.

The big issue is that when I run colcon test --packages-select rcl with this branch, I'm seeing test_graph__rmw_zenoh_cpp failing. This indicates a regression somewhere. Detailed logs below.

ARN] [1726606953.014733342] [rmw_zenoh_cpp]: Received liveliness token to remove unknown node /test_graph_node from the graph. Ignoring...
[WARN] [1726606953.014809662] [rmw_zenoh_cpp]: Received liveliness token to remove unknown node /test_graph_node from the graph. Ignoring...
[  FAILED  ] NodeGraphMultiNodeFixture.test_node_info_services (14045 ms)
[ RUN      ] NodeGraphMultiNodeFixture.test_node_info_clients
[INFO] [1726606953.046300511] [rmw_zenoh_cpp]: Successfully connected to a Zenoh router with id 60023da01865606de8e91384a1abf6c0.
[INFO] [1726606954.051276780] [rmw_zenoh_cpp]: Successfully connected to a Zenoh router with id 60023da01865606de8e91384a1abf6c0.
[INFO] [1726606955.056553238] [rmw_zenoh_cpp]: Successfully connected to a Zenoh router with id 60023da01865606de8e91384a1abf6c0.
/usr/local/google/home/yadunund/ros2_rolling/src/ros2/rcl/rcl/test/rcl/test_graph.cpp:1067: Failure
Expected: (attempts) <= (max_attempts), actual: 11 vs 10
Unable to attain all required nodes

[WARN] [1726606967.059638300] [rmw_zenoh_cpp]: Received liveliness token to remove unknown node /test_graph_node from the graph. Ignoring...
[WARN] [1726606967.059714050] [rmw_zenoh_cpp]: Received liveliness token to remove unknown node /test_graph_node from the graph. Ignoring...
[  FAILED  ] NodeGraphMultiNodeFixture.test_node_info_clients (14017 ms)
[----------] 4 tests from NodeGraphMultiNodeFixture (56140 ms total)

[----------] Global test environment tear-down
[==========] 22 tests from 2 test suites ran. (96328 ms total)
[  PASSED  ] 18 tests.
[  FAILED  ] 4 tests, listed below:
[  FAILED  ] NodeGraphMultiNodeFixture.test_node_info_subscriptions
[  FAILED  ] NodeGraphMultiNodeFixture.test_node_info_publishers
[  FAILED  ] NodeGraphMultiNodeFixture.test_node_info_services
[  FAILED  ] NodeGraphMultiNodeFixture.test_node_info_clients

@ahcorde you're most familiar with the tests, so I'd appreciate if you could test this branch to check for any other regressions.

/// In peer mode, the period dedicated to scouting remote peers before attempting other operations
delay: 1,
/// In peer mode, the maximum period in milliseconds dedicated to scouting remote peers before attempting other operations.
delay: 500,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain the rationale for increasing this delay?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now the default value in Zenoh config. We aligned it with the current one.
In fact, this is the maximum delay, which means it doesn't always take 500 ms everytime. zenoh::open() can return quicker than this delay if some preconditions are met (configured peers are connected and scouted peers are connected when enabled).

rmw_zenoh_cpp/src/detail/attachment_helpers.hpp Outdated Show resolved Hide resolved

class attachement_data_t final
{
public:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move data members into private scope and provide accessor functions if needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would introduce lots of complexity since we need to do some serializations in attachment_helpers.cpp which need to access the values. To make them private, we need to provide either public serialization functions or public get functions (including returning an array pointer of a known sized array source_gid[RMW_GID_STORAGE_SIZE])

rmw_zenoh_cpp/src/detail/attachment_helpers.hpp Outdated Show resolved Hide resolved
rmw_zenoh_cpp/src/detail/attachment_helpers.hpp Outdated Show resolved Hide resolved
rmw_zenoh_cpp/src/detail/rmw_data_types.cpp Outdated Show resolved Hide resolved
rmw_zenoh_cpp/src/detail/rmw_data_types.cpp Outdated Show resolved Hide resolved
@@ -284,7 +282,7 @@ class ZenohReply final

~ZenohReply();

std::optional<z_sample_t> get_sample() const;
const z_loaned_sample_t * get_sample() const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of this change as it requires the user to check if the returned value is a nullptr.

For consistency, it would be good to retain the std::optional return or another alternative could be to update this function to (similar to how get_query() works above)

Suggested change
const z_loaned_sample_t * get_sample() const;
const z_loaned_reply_t * get_reply() const;

Then users can call z_reply_ok() on this loaned reply and handle the nullptr case accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Included in 54dd96a. Now we consistently keep a pointer to owned data and provide a get function to access the loaned data. Users can fetch the sample and check its validity via z_reply_ok.

rmw_zenoh_cpp/src/rmw_init.cpp Show resolved Hide resolved
rmw_zenoh_cpp/src/rmw_init.cpp Show resolved Hide resolved
@YuanYuYuan
Copy link
Contributor Author

YuanYuYuan commented Sep 18, 2024

Hi @Yadunund, thanks for your comments. I will update the PR accordingly.

Regarding the failures in RCL, we have identified the problem. The issue is the latest zenoh library uses a safer mechanism that would undeclare the liveliness token whenever the session is closed. In the NodeGraphMultiNodeFixture test, it is expected to detect three nodes after the initiation. Investigating the setup for the test, we call a rcl_shutdown after we create the old_node which calls rmw_shutdown and therefore destroys its liveliness token. That's why we failed this series of tests. We only observe two alive nodes instead of three.

The question is do we need to keep the node liveliness after it has been shut down? This is also relevant to our previous discussion on the unsoundness of rmw_shutdown. Again, rmw_cyclonedds_cpp and rmw_fastrtps_cpp don't have this issue since the only thing they do during rmw_shutdown is to change the status rather than kill any runtimes.

@ahcorde
Copy link
Contributor

ahcorde commented Sep 18, 2024

Test summary

package test_name info
rcl test_graph__rmw_zenoh_cpp segfault
rcl test_events__rmw_zenoh_cpp NodeGraphMultiNodeFixture.test_node_info_subscriptions
rcl_action test_action_communication__rmw_zenoh_cpp segfault
rcl_action test_action_interaction__rmw_zenoh_cpp segfault
rcl_action test_graph__rmw_zenoh_cpp Failure
rcl_lifecycle test_rcl_lifecycle timeout ros2/rcl#1185
rcl_yaml_param None
rclcpp_action test_client segfault
test_component test_component_manager ugly backtrace ros2/rclcpp#2633
test_component test_component_manager_api ugly backtrace ros2/rclcpp#2633
rclcpp_lifecycle test_lifecycle_service_client timeout
rclcpp test_allocator_memory_strategy timeout
rclcpp test_intra_process_manager Failed test
rclcpp test_node_interfaces__get_node_interfaces backtrace
rclcpp test_node_interfaces__node_graph failed test
rclcpp test_node_global_args backtrace
rclcpp test_parameter_client failed test
rclcpp test_parameter backtrace
rclcpp test_parameter_event_handler backtrace
rclcpp test_publisher TestPublisher.run_event_handlers C++ exception with description "'data' is empty" thrown in the test body.
rclcpp test_qos Failed tests
rclcpp test_find_weak_nodes backtrace
rclcpp test_externally_defined_services backtrace
rclcpp test_wait_for_message segfault
rclcpp test_executors Failed tests
rclcpp test_multi_threaded_executor backtrace
rclcpp test_events_executor timeout
rclcpp test_guard_condition backtrace
rclcpp test_wait_set backtrace
rclcpp test_subscription_options backtrace
rclcpp test_qos_event__rmw_zenoh_cpp segfault
rclcpp test_generic_pubsub__rmw_zenoh_cpp segfault
rclcpp test_add_callback_groups_to_executor__rmw_zenoh_cpp segfault
rclcpp test_subscription_content_filter__rmw_zenoh_cpp segfault
rmw_zenoh_cpp None
test_cli none
test_cli_remap None
test_communication test_requester_replier__rclpy__rmw_zenoh_cpp
test_communication test_action_client_server__rclpy__rmw_zenoh_cpp
test_communication test_publisher_subscriber__rclpy__rclcpp__rmw_zenoh_cpp
test_communication test_action_client_server__rclpy__rclcpp__rmw_zenoh_cpp
test_communication test_publisher_subscriber__rclcpp__rclpy__rmw_zenoh_cpp
test_communication test_requester_replier__rclcpp__rclpy__rmw_zenoh_cpp
test_communication test_action_client_server__rclcpp__rclpy__rmw_zenoh_cpp
test_communication test_publisher_subscriber__rclcpp__rmw_zenoh_cpp
test_communication test_action_client_server__rclcpp__rmw_zenoh_cpp

rcl_action - test_graph__rmw_zenoh_cpp

13: /home/ahcorde/ros2_rolling/src/ros2/rcl/rcl_action/test/rcl_action/test_graph.cpp:347: Failure
13: Expected: (attempts) <= (max_attempts), actual: 5 vs 4
13: Unable to attain all required nodes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants