ebeeb8bc48
* Change the regular expression for the Bert tokenizer. The previous regular expression causes StackOverflowErrors when given a document with a large amount of whitespace. I believe the one I've provided is equivalent.
* Add a test for the new BertWordPieceTokenizer regex. This test should cause a StackOverflowError with the previous version.
* Fix an off-by-one error in an assert.

| Name |
|---|
| src |
| pom.xml |
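
The commit message above does not show either regular expression, so the following is only a sketch of the general idea, not the pattern used in BertWordPieceTokenizer. In `java.util.regex`, a repeated group that contains an alternation (a shape like `(A|B)*`) is matched recursively, which is a common source of `StackOverflowError` on long inputs; whether that was the cause here is an assumption. The hypothetical class `WhitespaceSafeSplit` and its `TOKEN` pattern keep the alternation at the top level and repeat only a character class, so long runs of whitespace are skipped by `find()` without deep recursion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not the code from this commit.
final class WhitespaceSafeSplit {

    // Assumed token definition: either a run of characters that are neither
    // whitespace nor ASCII punctuation, or a single punctuation character.
    // Only a character class is repeated, never a group with alternation.
    private static final Pattern TOKEN =
            Pattern.compile("[^\\s\\p{Punct}]+|\\p{Punct}");

    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        // find() advances iteratively through the input, so whitespace
        // between tokens is skipped one position at a time.
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Build one million spaces without relying on String.repeat (Java 11+).
        String padding = new String(new char[1_000_000]).replace('\0', ' ');
        List<String> tokens = split("hello," + padding + "world");
        System.out.println(tokens.size() + " tokens: " + tokens);
    }
}
```

A regression test in the spirit of the second bullet would feed a similarly padded document to the tokenizer and check that it returns normally with the expected tokens.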