Regex: String joining in case of multiple matches
bosd committed Mar 7, 2023
1 parent e3a5a2d commit a5bdd50
Showing 2 changed files with 6 additions and 4 deletions.
8 changes: 4 additions & 4 deletions TUTORIAL.md
@@ -2,7 +2,7 @@

A template defines which data attributes you wish to retrieve from an
invoice. Each template should work on all invoices of a company or
-subsidiary (e.g. Amazon Germany vs Amazon US).
+subsidiary (e.g. Amazon Germany vs Amazon US).

Adding templates is easy and shouldn't take longer than adding 2-3
invoices by hand. We use a simple YML-format. Many options are optional
@@ -77,7 +77,7 @@ only required property is `regex` that has to contain one or multiple
(specified using array) regexes.

It's not required to put add the whole regex to the capturing group.
-Often we use keywords and only capture part of the match (e.g. the
+Often we use keywords and only capture part of the match (e.g. the
amount).

You will need to understand regular expressions to find the right
@@ -87,7 +87,7 @@ you can learn about them
here](http://www.regexr.com/). We use [Python's regex
engine](https://docs.python.org/2/library/re.html). It won't matter for
the simple expressions we need, but sometimes there are subtle
-differences when e.g. coming from Perl.
+differences when e.g. coming from Perl.

By default `regex` parser removes all duplicated matches. It results a
single value or an array (depending an amount of unique matches found).
@@ -97,7 +97,7 @@ Optional properties:
- `type` (if present must be one of: `int`, `float`, `date`) -results
in parsing every matched value to a specified type
- `group` (if present must be one of: `sum`, `min`, `max`, `first`,
-`last`) - specifies grouping function (defines what value to return in
+`last`, join) - specifies grouping function (defines what value to return in
case of multiple matches)

Example for `regex`:
2 changes: 2 additions & 0 deletions src/invoice2data/extract/parsers/regex.py
@@ -57,6 +57,8 @@ def parse(template, field, settings, content, legacy=False):
     result = result[0]
 elif settings["group"] == "last":
     result = result[-1]
+elif settings["group"] == "join":
+    result = " ".join(str(v) for v in result)
 else:
     logger.warning("Unsupported grouping method: " + settings["group"])
     return None
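
To illustrate what the new `join` branch does with multiple matches, here is a minimal standalone sketch. The helper name `apply_group` and the sample values are hypothetical and not part of invoice2data. The `sum`, `min`, and `max` branches are assumed to map onto Python's built-ins, since the hunk above does not show them. Only the `first`, `last`, and `join` expressions are taken from the diff.

```python
def apply_group(values, group):
    """Collapse a list of matched values into one result (hypothetical helper)."""
    if group == "sum":
        return sum(values)  # assumption: not shown in the hunk
    if group == "min":
        return min(values)  # assumption: not shown in the hunk
    if group == "max":
        return max(values)  # assumption: not shown in the hunk
    if group == "first":
        return values[0]
    if group == "last":
        return values[-1]
    if group == "join":
        # Same expression as the line added in this commit:
        # every value is converted to str and joined with a single space.
        return " ".join(str(v) for v in values)
    # The real parser logs a warning before returning None.
    return None


matches = ["INV-001", "INV-002", 3]
print(apply_group(matches, "join"))  # INV-001 INV-002 3
```

Because each value passes through `str()`, `join` also works when a `type` such as `int` or `float` has already converted the matches, and the result is always a single string.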
