
Results are not reproducible on custom datasets - Issues related to mmdetection, CascadeTabNet training on custom dataset #164

Open
inkarar opened this issue May 13, 2022 · 1 comment


inkarar commented May 13, 2022

Better, more detailed step-by-step documentation is needed on how to TRAIN CascadeTabNet on custom datasets and replicate the results reported for CascadeTabNet.
There are many issues related to training this model with mmdetection, and the relative paths to configs and datasets in the code are a huge mess. The directory tree should have been documented, and file names and paths should have been clearly specified.

Here's what I did:

1. Annotated the custom dataset using labelme and converted the annotations to COCO JSON and VOC formats.
2. Trained mmdetection on the custom dataset by following the blog post mentioned in the repo's README (unsure which architecture to use).
3. I'm unsure which config, epoch, and .pth files to use for inference/testing with the trained model.
4. I'm unsure how to combine the code from this repo with the model trained in mmdetection to get the desired results.

Here's what I'm asking:

1. Please include a README listing the exact steps to follow to replicate the results on custom datasets.
2. Please mention which config/.pth file to use where, and how to use this repo: which files to execute, when, and where.
3. Please explain how to format the dataset: this repo needs COCO JSON but mmdetection needs VOC.

The code is neat and documented, and the results achieved are commendable.

Please help me replicate the results on a custom dataset.

@DevashishPrasad @AyanGadpal @kshitijkapadni @ManishDV @francescoperessini @mhmd-azeez @akadirpamukcu @MrZilinXiao @mfproto @iiLaurens @NISH1001


NISH1001 commented May 13, 2022

@inkarar The best way to approach training CascadeTabNet is to go through the mmdetection framework, treating each table as an object. Once you have the training annotations in the correct format, training and inference for table detection are pretty straightforward.
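
For the annotation-format step, here is a minimal sketch of turning labelme-style annotations into a COCO-style dictionary with one object per table. It is not taken from this repo or from mmdetection. The function name `labelme_to_coco`, the file name `page_001.png`, and the choice to store a rectangular polygon as the segmentation are all assumptions made for illustration.

```python
def labelme_to_coco(labelme_docs, category_names):
    """Convert labelme-style annotation dicts into one COCO-style dict.

    Hypothetical helper: each input dict is expected to carry the labelme
    keys imagePath, imageWidth, imageHeight and shapes.
    """
    categories = [
        {"id": i + 1, "name": name} for i, name in enumerate(category_names)
    ]
    name_to_id = {c["name"]: c["id"] for c in categories}
    images, annotations = [], []

    for image_id, doc in enumerate(labelme_docs, start=1):
        images.append({
            "id": image_id,
            "file_name": doc["imagePath"],
            "width": doc["imageWidth"],
            "height": doc["imageHeight"],
        })
        for shape in doc["shapes"]:
            if shape["label"] not in name_to_id:
                continue  # skip labels we are not training on
            xs = [p[0] for p in shape["points"]]
            ys = [p[1] for p in shape["points"]]
            x, y = min(xs), min(ys)
            w, h = max(xs) - x, max(ys) - y
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": name_to_id[shape["label"]],
                "bbox": [x, y, w, h],  # COCO boxes are [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
                # Assumption: tables are treated as rectangles, so the mask
                # is just the four corners of the bounding box.
                "segmentation": [[x, y, x + w, y, x + w, y + h, x, y + h]],
            })

    return {
        "images": images,
        "annotations": annotations,
        "categories": categories,
    }


# Made-up example: one page with a single wide, short table.
example = {
    "imagePath": "page_001.png",
    "imageWidth": 1000,
    "imageHeight": 1400,
    "shapes": [
        {
            "label": "table",
            "shape_type": "rectangle",
            "points": [[100.0, 200.0], [900.0, 260.0]],
        }
    ],
}
coco = labelme_to_coco([example], ["table"])
print(coco["annotations"][0]["bbox"])  # [100.0, 200.0, 800.0, 60.0]
```

Writing the returned dictionary out with `json.dump` gives a single annotation file. Check the key names against whatever the dataset class in your mmdetection config expects before training.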

I haven't actually benchmarked it against the original paper (I didn't have time), but I did train it on my own custom dataset. It was reasonably good for detection (tables only, no cells). The vanilla configuration did struggle with tables of very small height (for instance, tables with a single row), so I had to change the anchor box scale in the config, and that worked. In fact, I used the same scale change to train on the header region, and it was pretty good.
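
To see why short tables can fall outside the default anchors, here is a small sketch of how anchor width and height follow from stride, scale, and ratio. It assumes the common convention in which ratio means height divided by width. The function name `anchor_sizes` and the numbers are illustrative only, not the values I actually changed.

```python
import math


def anchor_sizes(stride, scales, ratios):
    """Return (width, height) of every anchor at one feature level.

    Assumed convention: ratio = height / width, base size = stride.
    """
    sizes = []
    for ratio in ratios:
        for scale in scales:
            height = stride * scale * math.sqrt(ratio)
            width = stride * scale / math.sqrt(ratio)
            sizes.append((width, height))
    return sizes


# Illustrative values: stride 4, a single scale of 8, three ratios.
print(anchor_sizes(4, [8], [0.25, 1.0, 4.0]))
# [(64.0, 16.0), (32.0, 32.0), (16.0, 64.0)]

# A smaller scale shrinks every anchor, which helps with very short boxes.
print(anchor_sizes(4, [4], [0.25]))
# [(32.0, 8.0)]
```

A single-row table that is hundreds of pixels wide but only a few tens of pixels high overlaps poorly with all of the first set of anchors, so adjusting the scale (or the ratios) in the config gives the detector anchors closer to that shape.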

So I recommend trying mmdet first for training/inference, using its train detector. After that, it's pretty straightforward.
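
On the inference side, a rough sketch might look like the following. It assumes an mmdetection version (2.x or earlier) in which `init_detector` and `inference_detector` from `mmdet.apis` return per-class arrays of `[x1, y1, x2, y2, score]` rows. The helper names `keep_confident_boxes` and `detect_tables` are made up here, and which config and checkpoint to pass depends on your own training run.

```python
def keep_confident_boxes(bbox_result, score_thr=0.5):
    """Filter detections by score.

    bbox_result: one sequence per class, each row [x1, y1, x2, y2, score].
    Returns a list of (class_id, (x1, y1, x2, y2), score) tuples.
    """
    kept = []
    for class_id, rows in enumerate(bbox_result):
        for x1, y1, x2, y2, score in rows:
            if score >= score_thr:
                kept.append((class_id, (x1, y1, x2, y2), float(score)))
    return kept


def detect_tables(config_path, checkpoint_path, image_path,
                  score_thr=0.5, device="cuda:0"):
    """Hypothetical wrapper: load a trained model and return confident boxes."""
    # Imported here so the helper above works without mmdetection installed.
    from mmdet.apis import inference_detector, init_detector

    model = init_detector(config_path, checkpoint_path, device=device)
    result = inference_detector(model, image_path)
    # Assumption: mask models return (bbox_result, segm_result),
    # box-only models return bbox_result directly.
    bbox_result = result[0] if isinstance(result, tuple) else result
    return keep_confident_boxes(bbox_result, score_thr)


# The filtering step on made-up detections for a single "table" class.
fake_result = [[[0, 0, 10, 10, 0.9], [0, 0, 5, 5, 0.2]]]
print(keep_confident_boxes(fake_result, score_thr=0.5))
# [(0, (0, 0, 10, 10), 0.9)]
```

You would call `detect_tables` with the same config file you trained with and one of the checkpoint files written to your work directory. The paths are left out here because they depend on your setup.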
