
Questions Regarding Your Code (demo .npy files and control points) #7

baotruyenthach opened this issue May 7, 2020 · 2 comments



baotruyenthach commented May 7, 2020

Hi Dr. Mousavian,

I am an undergraduate student very interested in your 6-DOF GraspNet project. I have read your paper and am currently reading through your code. I have a few questions about the code that I hope you can spend some time answering. Thank you very much in advance.

First, in the folder demo/data, you provide a few .npy files for running the demo. Do these .npy files contain the 3D point clouds of the test objects, or something else? If they are point clouds, could you describe how you created them? What vision sensor did you use, and how did you process its output to generate those .npy files? How did you filter the measured point clouds and remove the table plane? I am trying to run your code on my Panda robot, so I hope you can give me some instructions on how a robot, at experiment time, can extract a 3D point cloud of an object and feed it to the decoder in a form compatible with your code.
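(Note on the table-plane question: a common way to drop the table is a RANSAC plane fit on the point cloud. The sketch below is a generic illustration, not the authors' preprocessing; it assumes Open3D is installed and that you already have an (N, 3) point cloud in the camera frame.)

```python
import numpy as np
import open3d as o3d  # assumption: Open3D >= 0.10 is available; it is not part of this repo

# Placeholder cloud; replace with the (N, 3) XYZ points (in meters) from your sensor.
points = np.random.rand(2048, 3)

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)

# RANSAC plane fit: in a tabletop scene the dominant plane is usually the table.
_, table_idx = pcd.segment_plane(distance_threshold=0.01,  # 1 cm tolerance
                                 ransac_n=3,
                                 num_iterations=1000)

# Keep everything that is NOT on the table plane.
object_pcd = pcd.select_by_index(table_idx, invert=True)
object_points = np.asarray(object_pcd.points)
```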

Second, in a lot of places in your code, you mention the Panda gripper control points. Could you help me understand what these control points are? Specifically, in utils/utils.py, what is the purpose of these functions: transform_control_points, transform_control_points_numpy, and control_points_from_rot_and_trans?

I saw that you use one of those functions in grasp_sampling_data.py, where you compute meta['target_cps'] (lines 54, 68) from output_grasps (line 44). I observed that 'target_cps', 'grasp_rt', and 'pc' are data from the dataset that will later be used to train the VAE. Could you tell me what 'target_cps' is, and what the difference is between 'target_cps' and 'grasp_rt'? From my understanding, 'grasp_rt' is the grasp g that we feed to the encoder, and 'target_cps' is the reconstructed g_hat that comes out of the decoder. Shouldn't they then be the same thing? Why is 'grasp_rt' computed directly from output_grasps (line 58), while 'target_cps' has to go through the control points?
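(For context, "control points transformed to the grasp pose" boils down to applying a 4x4 homogeneous transform to a fixed set of points on the gripper. The sketch below is a generic numpy illustration; the coordinates and the helper name are placeholders, not the actual values or functions in utils/utils.py.)

```python
import numpy as np

# Placeholder gripper points; the repo derives the real ones from the Panda gripper model.
gripper_cps = np.array([
    [ 0.000, 0.0, 0.000],   # gripper base (placeholder)
    [ 0.053, 0.0, 0.075],   # left fingertip (placeholder)
    [-0.053, 0.0, 0.075],   # right fingertip (placeholder)
])

def transform_cps(cps, grasp_rt):
    """Apply a 4x4 grasp pose (rotation + translation) to (N, 3) control points."""
    cps_h = np.concatenate([cps, np.ones((cps.shape[0], 1))], axis=1)  # homogeneous coords
    return (grasp_rt @ cps_h.T).T[:, :3]

grasp_rt = np.eye(4)               # example pose: identity rotation ...
grasp_rt[:3, 3] = [0.1, 0.0, 0.3]  # ... with a translation in front of the camera
target_cps = transform_cps(gripper_cps, grasp_rt)  # analogous in spirit to meta['target_cps']
```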

Thanks a lot for your answer. Best regards.


c-keil commented May 18, 2020

I can't give you definitive answers, but the samples were probably captured with the Intel RealSense camera shown in their video.

arsalan-mousavian (Collaborator) commented

@baotruyenthach

  1. Getting a point cloud from depth images: given the intrinsic matrix and the depth image, you can easily compute the point cloud. You can find plenty of code snippets for this, and it is covered in many computer vision lectures on projection and how the 3D world is mapped to the 2D image plane (a minimal sketch is included after this list).

  2. Regarding control points: these are points sampled on the gripper and fingers to represent the gripper; they are used for computing the losses and also in the evaluator model.

  3. grasp_rt is the 4x4 matrix that contains the rotation and translation of a grasp. grasp_cps are the control points of the gripper transformed to the pose of the grasp. I would recommend reading the paper in more detail to understand this; the implementation follows the paper closely, and vice versa.
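(A minimal back-projection sketch for item 1, assuming a pinhole model and a depth image in meters; the intrinsics fx, fy, cx, cy below are placeholder values, not the demo camera's calibration.)

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (meters) into an (N, 3) camera-frame point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no valid depth reading

# Placeholder intrinsics and depth; use your own camera's calibration and frames.
depth = np.random.rand(480, 640).astype(np.float32)
cloud = depth_to_point_cloud(depth, fx=615.0, fy=615.0, cx=320.0, cy=240.0)
```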
