
Parse Multiple lines in the same field #377

Open
bosd opened this issue May 15, 2022 · 2 comments

@bosd
Collaborator

bosd commented May 15, 2022

Since #308 it is possible to have multiple line parsers.

In my use case I want several line parsers to write to the same field.
Each additional match should be appended to that field's list.

This is currently not possible.
Instead of being appended, the matches of the later parser replace those of the earlier one.

This is best explained with an example.

Step 1:
For demonstration purposes, first let each parser write its matches to a separate field.

output:

'lines': [{'category': 'FOOD'}], 
'line_items2': [{'barcode': '2231012001992', 'name': 'KROKETBROODJES', 'qty': 2.0, 'uom': 'KG', 'price_unit': 1.0, 'discount': 0.0}],

used template:

fields:
  lines:
    parser: lines
    start: Barcode
    end:  total  
    line: (?P<category>(FOOD))

  line_items2:
    parser: lines
    start: Barcode
    end:  total  
    line: (?P<barcode>(\d{13}))\s+(?P<name>(\w+(?:\s\S+)*))\s+(?P<qty>(\d))\s+(?P<uom>\w+)\s+(?P<price_unit>(\d+.\d+))\s+(?P<discount>\d+.\d+)
    types:
      qty: float
      price_unit: float
      discount: float

Step 2:
Now that we know the template is correct, change it so both parsers use the same field.

fields:
  lines:
    parser: lines
    start: Barcode
    end:  total  
    line: (?P<category>(FOOD))

  lines:
    parser: lines
    start: Barcode
    end:  total  
    line: (?P<barcode>(\d{13}))\s+(?P<name>(\w+(?:\s\S+)*))\s+(?P<qty>(\d))\s+(?P<uom>\w+)\s+(?P<price_unit>(\d+.\d+))\s+(?P<discount>\d+.\d+)
    types:
      qty: float
      price_unit: float
      discount: float

Actual output:
(Hint: the first match, 'category': 'FOOD', is missing.)
'lines': [{'barcode': '2231012001992', 'name': 'KROKETBROODJES', 'qty': 2.0, 'uom': 'KG', 'price_unit': 1.0, 'discount': 0.0}],

Desired output:
'lines': [{'category': 'FOOD'}, {'barcode': '2231012001992', 'name': 'KROKETBROODJES', 'qty': 2.0, 'uom': 'KG', 'price_unit': 1.0, 'discount': 0.0}],

@rmilecki
Collaborator

I think the same feature was requested in #238.

I don't think YAML (or Python's dict) allows using the same key multiple times. Supporting this would also require picking a merge strategy that is generic for all parsers. As commented in #238, we could think about something like:

fields:
  lines:
    parser: lines
    rules:
      - start: Barcode
        end:  total  
        line: (?P<category>(FOOD))
      - start: Barcode
        end:  total  
        line: (?P<barcode>(\d{13}))\s+(?P<name>(\w+(?:\s\S+)*))\s+(?P<qty>(\d))\s+(?P<uom>\w+)\s+(?P<price_unit>(\d+.\d+))\s+(?P<discount>\d+.\d+)
    types:
      qty: float
      price_unit: float
      discount: float
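
To make the "merge strategy" concrete, here is a minimal sketch of one possible strategy: apply every rule to the text and append all matches to a single list. It is not invoice2data's implementation. The function name `parse_rules`, the sample invoice text, and the plain-dict representation of the rules are all assumptions made for illustration, and the `types` conversion is left out, so every value stays a string.

```python
import re


def parse_rules(text, rules):
    """Apply every rule to the text and collect all matches in one list.

    Hypothetical helper: each rule is a dict with 'start', 'end' and 'line'
    regexes, mirroring the keys of the proposed 'rules' list.
    """
    results = []
    for rule in rules:
        pattern = re.compile(rule["line"])
        inside = False
        for line in text.splitlines():
            if not inside:
                # The block begins on the line after the start marker.
                if re.search(rule["start"], line):
                    inside = True
                continue
            if re.search(rule["end"], line):
                break
            match = pattern.search(line)
            if match:
                # Append instead of replace: this is the merge strategy.
                results.append(match.groupdict())
    return results


# Made-up sample text, shaped to fit the regexes from the template.
sample = """\
Barcode        Name            Qty UoM Price Discount
FOOD
2231012001992  KROKETBROODJES  2   KG  1.00  0.00
total 2.00
"""

rules = [
    {"start": "Barcode", "end": "total", "line": r"(?P<category>(FOOD))"},
    {
        "start": "Barcode",
        "end": "total",
        "line": r"(?P<barcode>(\d{13}))\s+(?P<name>(\w+(?:\s\S+)*))\s+"
                r"(?P<qty>(\d))\s+(?P<uom>\w+)\s+"
                r"(?P<price_unit>(\d+.\d+))\s+(?P<discount>\d+.\d+)",
    },
]

lines = parse_rules(sample, rules)
print(lines)
```

With this strategy the list is ordered rule by rule, not by position in the document. Whether that ordering is the right one is exactly the kind of decision a generic merge strategy would have to make.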

@bosd
Collaborator Author

bosd commented Jun 20, 2022

That is correct, that is what we are looking for.
Indeed, YAML syntax does not allow the same key to be used multiple times.
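
A small illustration of the duplicate-key point on the Python side: a dict literal may repeat a key, but only the last value is kept. That would be consistent with the second `lines` parser replacing the first in the output reported above. The assumption here is that the YAML loader behaves like a plain dict when it meets a repeated key; this sketch does not verify that.

```python
# A dict literal with a repeated key is accepted, but the last value wins.
fields = {
    "lines": {"line": r"(?P<category>(FOOD))"},
    "lines": {"line": r"(?P<barcode>(\d{13}))"},
}

print(len(fields))              # only one 'lines' entry remains
print(fields["lines"]["line"])  # the regex of the second entry
```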

(Indeed, this is very similar to that request. However, it is an old request: the fix for it was not complete and did not get merged, and due to other code changes it needed refactoring. Hence PR #378.)

PR #378 supports this when using the following YAML syntax:

lines:
 - start: Barcode
   end:  Netto totaal  
   line: (?P<line_note>(FOOD))
 - start: Barcode
   end:  Netto totaal
   line: (?P<barcode>(\d{13}))\s+(?P<name>(\w+(?:\s\S+)*))\s+(?P<qty>(\d))\s+(?P<uom>\w+)\s+(?P<price_unit>(\d+.\d+))\s+(?P<discount>\d+.\d+)

Hope this makes it clearer :)
