Results are not reproducible on custom datasets - Issues related to mmdetection, CascadeTabNet training on custom dataset #164

inkarar · 2022-05-13T09:55:02Z

The repo needs better, more detailed step-by-step documentation on how to train CascadeTabNet on custom datasets and replicate the reported results.
There are many issues about training this model with mmdetection, and the relative paths to configs and datasets in the code are a mess. The expected directory structure should be documented, and file names and paths should be specified clearly.

Here's what I did:

1. Annotated my custom dataset with labelme and converted the annotations to COCO JSON and VOC format.
2. Trained mmdetection on the custom dataset by following the blog post mentioned in the repo's README (I was unsure which architecture to use).
3. I'm unsure which config and which epoch's .pth file to use for inference/testing with the trained model.
4. I'm unsure how to combine the code from this repo with the model trained in mmdetection to get the desired results.

Here's what I'm asking:

1. Please include a README that lists the exact steps to follow to replicate the results on custom datasets.
2. Please state which config/.pth file to use where, and how to use this repo: which files to execute, when and where.
3. How should the dataset be formatted? This repo needs COCO JSON but mmdetection needs VOC.

The code is neat and documented, and the results achieved are commendable.

Please help me replicate the results on a custom dataset.

@DevashishPrasad @AyanGadpal @kshitijkapadni @ManishDV @francescoperessini @mhmd-azeez @akadirpamukcu @MrZilinXiao @mfproto @iiLaurens @NISH1001

NISH1001 · 2022-05-13T14:49:58Z

@inkarar The best way to approach training CascadeTabNet is to go through the mmdetection framework, treating each table as an object. Once you have the annotations in the correct training format, training and inference for table detection are pretty straightforward.
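To make the annotation step concrete, here is a minimal sketch of turning labelme annotations into a single COCO-style JSON structure, which mmdetection can read through its COCO dataset loader. The function name `labelme_to_coco` and the file name `page_001.png` are made up for illustration. It assumes the labelme records are already loaded as dicts and that every shape can be reduced to its bounding box. The mask it writes is just the box outline, which may be too coarse if your tables are not axis-aligned.

```python
def labelme_to_coco(labelme_records, class_names):
    """Convert already-loaded labelme JSON dicts into one COCO-style dict.

    Hypothetical helper: shapes whose label is not in class_names are skipped.
    """
    categories = [{"id": i, "name": name}
                  for i, name in enumerate(class_names, start=1)]
    name_to_id = {c["name"]: c["id"] for c in categories}
    images, annotations = [], []

    for image_id, record in enumerate(labelme_records, start=1):
        images.append({
            "id": image_id,
            "file_name": record["imagePath"],
            "width": record["imageWidth"],
            "height": record["imageHeight"],
        })
        for shape in record["shapes"]:
            if shape["label"] not in name_to_id:
                continue
            # labelme stores points as [x, y] pairs; reduce them to a box.
            xs = [float(p[0]) for p in shape["points"]]
            ys = [float(p[1]) for p in shape["points"]]
            x, y = min(xs), min(ys)
            w, h = max(xs) - x, max(ys) - y
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": name_to_id[shape["label"]],
                "bbox": [x, y, w, h],  # COCO boxes are [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
                # Box-shaped polygon so mask-based models have a mask to train on.
                "segmentation": [[x, y, x + w, y, x + w, y + h, x, y + h]],
            })

    return {"images": images, "annotations": annotations,
            "categories": categories}


# Example with one in-memory record (normally you would json.load each file).
record = {
    "imagePath": "page_001.png",
    "imageWidth": 800,
    "imageHeight": 1000,
    "shapes": [{"label": "table", "shape_type": "rectangle",
                "points": [[10, 20], [110, 70]]}],
}
coco = labelme_to_coco([record], ["table"])
```

Dump the result with `json.dump` and point the dataset section of your config at that file and your image folder.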

I haven't benchmarked it against the original paper (didn't get time to do it), but I did train it on a custom dataset I had. It was reasonably good for detection (table only, no cells). The vanilla configuration did struggle with tables of very small height (for instance, tables with a single row), so I had to change the anchor box scale in the config, and after that it worked. In fact, I used the same scale change to train a detector for the header region and it was pretty good.
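To see why the anchor settings matter for short, wide tables, here is a small sketch that computes the anchor shapes a given scale/ratio/stride combination produces. It follows my understanding of how mmdetection derives anchor width and height (ratio taken as height/width, base size equal to the stride). The function name `anchor_sizes` and the example values are only illustrative; they are not the exact settings I used.

```python
import math


def anchor_sizes(stride, scales, ratios):
    """Return (width, height) of each anchor at one feature level.

    Assumes ratio = height / width and base size = stride.
    """
    sizes = []
    for ratio in ratios:
        h_ratio = math.sqrt(ratio)
        w_ratio = 1.0 / h_ratio
        for scale in scales:
            sizes.append((stride * scale * w_ratio, stride * scale * h_ratio))
    return sizes


# Commonly seen default-style settings versus a variant with flatter anchors.
default_anchors = anchor_sizes(stride=4, scales=[8], ratios=[0.5, 1.0, 2.0])
flat_anchors = anchor_sizes(stride=4, scales=[8], ratios=[0.1, 0.25, 0.5])

for name, anchors in [("default", default_anchors), ("flat", flat_anchors)]:
    for width, height in anchors:
        print(f"{name}: {width:.1f} x {height:.1f}")
```

A single-row table can easily be twenty or more times wider than it is tall, so none of the default-style shapes overlap it well. In the config these values live in the RPN head's anchor settings. The key names differ between mmdetection versions, so check the config you are actually training with.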

So I recommend you first get training/inference working in mmdet with its train detector. After that it's pretty straightforward.
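On the question of which config and .pth file to use: for inference you pass the same config file you trained with, plus one of the checkpoints written to the training work directory. Below is a rough sketch, assuming checkpoints are named `epoch_<N>.pth` (what I believe mmdetection's default checkpoint hook writes) and that `mmdet.apis` exposes `init_detector` and `inference_detector`. The helper names `newest_epoch_checkpoint` and `detect_tables` are made up for this example.

```python
import re
from pathlib import Path


def newest_epoch_checkpoint(work_dir):
    """Return the epoch_<N>.pth file with the highest N in work_dir."""
    pattern = re.compile(r"^epoch_(\d+)\.pth$")
    candidates = []
    for path in Path(work_dir).glob("epoch_*.pth"):
        match = pattern.match(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        raise FileNotFoundError(f"no epoch_<N>.pth files in {work_dir}")
    # Compare on the epoch number only, so epoch_12 beats epoch_2.
    return max(candidates, key=lambda item: item[0])[1]


def detect_tables(config_file, checkpoint_file, image_path, device="cuda:0"):
    """Run one image through a trained detector (requires mmdet installed)."""
    # Imported lazily so the checkpoint helper works without mmdet.
    from mmdet.apis import init_detector, inference_detector

    model = init_detector(str(config_file), str(checkpoint_file), device=device)
    return inference_detector(model, str(image_path))
```

The newest epoch is not necessarily the best one, so compare a few checkpoints on your validation set before settling on one.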
