Sentence segmentation error case #1130
Labels: bug, implemented
This is an error case worth remembering, because it causes trouble for sentence segmentation.
The document is not CC-BY licensed. It is referenced here: https://dx.doi.org/10.1063/1.1874292
Here is the offending paragraph:

With version 0.8.0 and the current master, the process fails:
There are two problems. In grobid/grobid-core/src/main/java/org/grobid/core/document/TEIFormatter.java (line 2028 in 694f0ed), the following code:

    String local_text_chunk = text.substring(pos+posInSentence, theSentences.get(i).end);

may crash when the sentence extends beyond the length of the text, because String.substring throws a StringIndexOutOfBoundsException when its end index is greater than the string length.
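One possible stopgap is to clamp the offsets before calling substring. The sketch below is a minimal illustration, not GROBID's actual fix: `SentenceChunks` and `safeChunk` are hypothetical names, and it assumes the sentence offsets (such as `theSentences.get(i).end`) are character offsets into the same `text` string, which may occasionally overshoot its length.

```java
// Hypothetical helper (not part of GROBID): a defensive variant of the
// substring call quoted above. Assumes sentence start/end values are
// character offsets into `text` that may exceed text.length().
public final class SentenceChunks {

    private SentenceChunks() {}

    /**
     * Returns text[start, end) with both bounds clamped into [0, text.length()]
     * instead of throwing StringIndexOutOfBoundsException when `end` overshoots.
     * Returns an empty string when the clamped range is empty or inverted.
     */
    public static String safeChunk(String text, int start, int end) {
        int len = text.length();
        // Clamp both bounds into the valid index range of the string.
        int from = Math.max(0, Math.min(start, len));
        int to = Math.max(0, Math.min(end, len));
        if (from >= to) {
            return "";
        }
        return text.substring(from, to);
    }

    public static void main(String[] args) {
        String text = "Sentence one. Sentence two."; // length 27
        // The unguarded call text.substring(14, 40) would throw here, since 40 > 27.
        System.out.println(safeChunk(text, 14, 40)); // prints: Sentence two.
    }
}
```

At the quoted call site this would read `safeChunk(text, pos + posInSentence, theSentences.get(i).end)`. Clamping only masks the symptom; if the segmenter produces offsets past the end of the text, the mismatch between the sentence offsets and `text` is still worth tracking down.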