Add "priority" support for templates
When multiple templates match a given invoice, choose the one with the
highest "priority" value. To support prioritization while staying
backward compatible with existing templates, a default value of 5 is
assumed when the "priority" property is missing.

This feature can be used for writing more generic as well as more
specific templates. So far all templates were assumed to be
company-specific. With this change we can have:
1. Invoice-generating software specific templates
2. In-company varying templates

This feature may be very useful for:
1. Countries with just a few very popular accounting software applications
2. Big companies with multiple departments adding some invoice details

Signed-off-by: Rafał Miłecki <[email protected]>
Rafał Miłecki authored and bosd committed Feb 25, 2023
1 parent 7f7280e commit b1cdfb7
Showing 3 changed files with 29 additions and 15 deletions.
15 changes: 15 additions & 0 deletions TUTORIAL.md
@@ -301,6 +301,21 @@ options and their defaults are:
different fields, you can supply a list here. The extraction will
fail if not all fields are matched.

+### Priority
+
+In case of multiple templates matching single invoice the one with the
+highest priority will be used. Default `priority` value (assigned if
+missing) is 5.
+
+This property needs to be specified only when designing some generic or
+very specific templates.
+
+Suggested values:
+
+- 0-4: accounting/invoice software specific template
+- 5: company specific template
+- 6-10: company department/unit specific template
+
### Example of template using most options

issuer: Free Mobile
3 changes: 3 additions & 0 deletions src/invoice2data/extract/loader.py
@@ -107,6 +107,9 @@ def read_templates(folder=None):
                 elif type(tpl["exclude_keywords"]) is not list:
                     tpl["exclude_keywords"] = [tpl["exclude_keywords"]]

+                if 'priority' not in tpl.keys():
+                    tpl['priority'] = 5
+
                 output.append(InvoiceTemplate(tpl))

     logger.info("Loaded %d templates from %s", len(output), folder)
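
The default-priority rule added in this hunk can be tried in isolation. The following is a minimal sketch that uses plain dicts instead of the project's `InvoiceTemplate` class; `DEFAULT_PRIORITY` and `apply_default_priority` are hypothetical names introduced for illustration and are not part of invoice2data.

```python
# Hypothetical constant: the commit hard-codes the value 5 in loader.py.
DEFAULT_PRIORITY = 5


def apply_default_priority(tpl):
    """Return a copy of a template dict with "priority" filled in if absent."""
    result = dict(tpl)
    # setdefault only writes the key when it is missing, so an explicit
    # priority in the template is left untouched.
    result.setdefault("priority", DEFAULT_PRIORITY)
    return result


# A template written before this change has no "priority" key.
legacy = apply_default_priority({"issuer": "Free Mobile"})
# A generic, software-specific template sets a low priority explicitly.
generic = apply_default_priority({"issuer": "Some accounting software", "priority": 2})

print(legacy["priority"], generic["priority"])
```

Under these assumptions, templates written before the change behave as company-specific templates, which is what keeps them backward compatible.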
26 changes: 11 additions & 15 deletions src/invoice2data/main.py
@@ -79,10 +79,6 @@ def extract_data(invoicefile, templates=None, input_module=None):
     'currency': 'INR', 'desc': 'Invoice IBZY2087 from OYO'}
     """
-    if templates is None:
-        templates = read_templates()
-
-    # print(templates[0])

     if input_module is None:
         if invoicefile.lower().endswith('.txt'):
@@ -98,18 +94,18 @@ def extract_data(invoicefile, templates=None, input_module=None):
     logger.debug("START pdftotext result ===========================\n" + extracted_str)
     logger.debug("END pdftotext result =============================")

-    for t in templates:
-        optimized_str = t.prepare_input(extracted_str)
-
-        if t.matches_input(optimized_str):
-            logger.info("Using %s template", t["template_name"])
-            # Call extract with entire text and the invoicefile path
-            # The path is used if an area is called as a field option
-            return t.extract(optimized_str, invoicefile, input_module)
-
-    logger.error("No template for %s", invoicefile)
-    return False
+    if templates is None:
+        templates = read_templates()
+    templates = filter(lambda t: t.matches_input(t.prepare_input(extracted_str)), templates)
+    templates = sorted(templates, key=lambda k: k['priority'], reverse=True)
+    if not templates:
+        logger.error("No template for %s", invoicefile)
+        return False
+
+    t = templates[0]
+    logger.info("Using %s template", t["template_name"])
+    optimized_str = t.prepare_input(extracted_str)
+    return t.extract(optimized_str, invoicefile, input_module)

 def create_parser():
     """Returns argument parser """
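
The selection logic in this hunk (keep only matching templates, then take the one with the highest priority) can be sketched as a standalone function. This is an illustration under simplifying assumptions: templates are plain dicts, and matching is reduced to a substring check on every keyword, which is not how the project's `matches_input` is necessarily implemented. `pick_template` and the sample template names are hypothetical.

```python
def pick_template(templates, text):
    """Return the matching template with the highest priority, or None."""
    # Simplified matching (assumption): every keyword must occur in the text.
    matching = [t for t in templates if all(k in text for k in t["keywords"])]
    if not matching:
        return None
    # max() returns the first maximal item, so ties keep load order, the
    # same result as taking element 0 of a stable sorted(..., reverse=True).
    return max(matching, key=lambda t: t.get("priority", 5))


templates = [
    # No "priority" key: treated as 5, a company-specific template.
    {"template_name": "acme.yml", "keywords": ["ACME Corp"]},
    # Generic template for invoices produced by some accounting software.
    {"template_name": "generic-software.yml", "keywords": ["Invoice"], "priority": 2},
    # Department-specific template inside the same company.
    {"template_name": "acme-sales.yml", "keywords": ["ACME Corp", "Sales Dept"], "priority": 8},
]

sales_invoice = "Invoice\nACME Corp\nSales Dept\nTotal: 100"
other_invoice = "Invoice\nOther Ltd\nTotal: 50"

chosen_sales = pick_template(templates, sales_invoice)
chosen_other = pick_template(templates, other_invoice)
chosen_none = pick_template(templates, "unrelated text")

print(chosen_sales["template_name"])
print(chosen_other["template_name"])
print(chosen_none)
```

All three templates match the first invoice, and the department template wins because it has the highest priority. The second invoice matches only the generic template, so the low priority does not prevent it from being used as a fallback.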