Area Plugin Support #305
base: master
@kavinsharma: I think implementing this feature as a separate plugin is suboptimal and will cause maintenance problems in the long term. You are adding another piece of code that handles regular expressions. Soon someone will ask for field types (integers, floats, dates), and later someone will ask for handling sums. I think support for areas should be added to the standard implementation instead. I'd suggest working on top of syntax like:
@rmilecki Thanks for the review, I totally agree with you. Let me make these changes and update the PR.
@kavinsharma, any update on reworking this as a parser so it can be merged? This is exactly what I'm looking for to extract addresses from invoices.
Any update?
Needs to pass tests and also needs a rebase.
@m3nu, do you still think this would be better as a parser, or is a plugin like lines or tables acceptable?
Makes sense as a plugin, which it already is.
Hey @kavinsharma :) I'm currently working on this too :P
Added some comments. I see you need to rely on existing modules; we don't really have that anywhere else. It would be better to try reusing what's already there.
This PR has been open for a while. What needs to be done to get this merged?
Any news?
Area support has been added in #438.
The invoice2data `area` plugin extracts text based on area coordinates, using the area-cropping options of pdftotext. It is a customization of invoice2data that defines the crop area by coordinates. The coordinates are set in the template, so they can differ from one PDF layout to another. To use it, take a normal YAML template, which already configures fields and plugins such as tables, and add an area plugin entry; it is then applied to every PDF processed with that template.
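As a rough sketch of the cropping step, the code below assumes poppler's `pdftotext` is installed and that the area coordinates are passed straight through as its `-x`, `-y`, `-r`, `-W` and `-H` options. The helper names `build_pdftotext_cmd` and `extract_area_text` are hypothetical, and the `-layout` flag is an assumption rather than something this PR specifies.

```python
import subprocess
from typing import List, Mapping

# Crop keys as named in this PR; each one is passed to pdftotext as "-<key>".
AREA_KEYS = ("x", "y", "r", "W", "H")


def build_pdftotext_cmd(pdf_path: str, area: Mapping[str, float]) -> List[str]:
    """Return a pdftotext command line that only extracts text inside `area`.

    Hypothetical helper: x/y are the top-left corner of the crop box, W/H its
    width and height in pixels, and r the resolution in DPI (pdftotext's
    default is 72). Keys that are absent are simply not passed.
    """
    cmd = ["pdftotext", "-layout"]  # -layout is an assumption, not from the PR
    for key in AREA_KEYS:
        if key in area:
            cmd += [f"-{key}", str(area[key])]
    # An output file of "-" makes pdftotext write the text to stdout.
    cmd += [pdf_path, "-"]
    return cmd


def extract_area_text(pdf_path: str, area: Mapping[str, float]) -> str:
    """Run pdftotext (from poppler-utils) and return the cropped text."""
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path, area),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout
```

For example, `build_pdftotext_cmd("invoice.pdf", {"x": 10, "y": 20, "W": 300, "H": 100})` yields the arguments for `pdftotext -layout -x 10 -y 20 -W 300 -H 100 invoice.pdf -`. Keeping command construction separate from `subprocess.run` makes the coordinate handling testable without a PDF or poppler installed.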
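Specify the multi-line field you want to extract and give the coordinates of its area: x, y, r, H and W. These match pdftotext's crop options: -x and -y set the top-left corner of the crop area, -W and -H its width and height in pixels, and -r the resolution in DPI (72 by default).

Area Plugin Options: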
Name: the field name the extracted text is mapped to
Area: a dict of crop coordinates; only the text inside this area of the PDF is extracted
Regex: optional; a regular expression applied to the cropped text to extract a narrower value
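Once the area text has been cropped, the Name and Regex options could be combined roughly as below. This is a hypothetical helper (`apply_area_field`), not the plugin's actual code. It assumes that a missing regex means "keep the whole multi-line block" and that a regex with a capture group returns its first group.

```python
import re
from typing import Dict, Optional


def apply_area_field(name: str, cropped_text: str,
                     regex: Optional[str] = None) -> Dict[str, str]:
    """Map text cropped from an area onto a field name.

    Hypothetical helper mirroring the Name / Regex options: without a regex
    the whole (stripped) multi-line text is kept; with a regex, the first
    capture group of the first match is used, or the whole match if the
    pattern has no groups. Returns an empty dict when nothing matches.
    """
    text = cropped_text.strip()
    if regex is None:
        return {name: text}
    # MULTILINE lets ^ and $ anchor on each line of the cropped block.
    match = re.search(regex, text, re.MULTILINE)
    if match is None:
        return {}
    value = match.group(1) if match.re.groups else match.group(0)
    return {name: value.strip()} if value is not None else {}
```

With a cropped address block such as `"ACME GmbH\nMain Street 1\n12345 Berlin"`, omitting the regex keeps all three lines under the field name, while `r"^(\d{5})"` narrows the value to the postal code `"12345"`.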
Sample Invoice
Here is a sample invoice template that uses the area plugin to extract a multi-line field.
Output: