safer stripping, fail on too many tabs
jindrahelcl committed Oct 31, 2023
1 parent 50eb41f commit 28ada09
Showing 1 changed file with 5 additions and 5 deletions.
10 changes: 5 additions & 5 deletions opuscleaner/filters/normalize_whitespace.py
@@ -17,17 +17,17 @@ def collapse_whitespace(s):
 def clean(collapse):
     """Runs the filter."""

-    for line in sys.stdin:
-        fields = line.strip().split("\t")
+    for i, line in enumerate(sys.stdin):
+        fields = line.split("\t")

         if len(fields) == 1:
             src = fields[0].strip()
             trg = None
-        else:
-            # Similar to max_length filter, here we throw away potential
-            # newlines.
+        elif len(fields) == 2:
             src = fields[0].strip()
             trg = fields[1].strip()
+        else:
+            raise ValueError(f"Too many tabs on input line {i + 1}")

         if collapse:
             src = collapse_whitespace(src)
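
The effect of the new field handling on a single input line can be sketched in isolation. This is a minimal illustration, not code from the repository: `split_fields` is a hypothetical helper that copies only the splitting and stripping logic from the updated `clean()`, and it takes the 1-based line number as an argument instead of using `enumerate`.

```python
def split_fields(line, line_number):
    """Hypothetical helper mirroring the field handling in the updated clean().

    Returns (src, trg); trg is None when the line has no tab at all.
    """
    # Split first, strip each field afterwards, so that a leading or
    # trailing tab is still treated as a column separator.
    fields = line.split("\t")

    if len(fields) == 1:
        return fields[0].strip(), None
    elif len(fields) == 2:
        return fields[0].strip(), fields[1].strip()
    else:
        # More than one tab means more than two columns: fail loudly.
        raise ValueError(f"Too many tabs on input line {line_number}")


if __name__ == "__main__":
    print(split_fields("hello  world\n", 1))   # monolingual line, trg is None
    print(split_fields("hello\tworld\n", 2))   # two columns
    print(split_fields("\tworld\n", 3))        # empty source column is kept
    try:
        split_fields("a\tb\tc\n", 4)
    except ValueError as err:
        print(err)
```

With the previous `line.strip().split("\t")`, the leading tab in `"\tworld\n"` was stripped before splitting, so the line was read as a single source field, and any columns past the second were silently ignored.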