cavis/deeplearning4j/deeplearning4j-nlp-parent/deeplearning4j-nlp
Eduardo Gonzalez ebeeb8bc48 Fix BERT word piece tokenizer stack overflow error (#205)
* Change the regular expression for the Bert tokenizer.

The previous regular expression causes a StackOverflowError
when given a document with a large amount of whitespace. I
believe that the one I've provided is equivalent.

* Add test for new BertWordPieceTokenizer RegEx.

This test should cause a StackOverflowError with the previous version.

* Fix assert off by one.
2020-02-10 14:33:04 +11:00
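The commit message does not quote either regular expression, so the following is only a minimal sketch of the general failure mode it describes. It assumes the problem is of the common `java.util.regex` kind, where an alternation inside a repeated group is matched recursively and the stack depth grows with the input length. The two patterns, the class name `WhitespaceRegexSketch`, and its helper methods are hypothetical and are not taken from `BertWordPieceTokenizer`.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class WhitespaceRegexSketch {
    // Hypothetical patterns for illustration only; these are NOT the
    // tokenizer's actual expressions.
    // Alternation inside a repeated group: java.util.regex typically
    // matches this recursively, nesting deeper with every repetition.
    static final Pattern ALTERNATION = Pattern.compile("(?:\\s|,)+");
    // A character class accepting the same characters, which the engine
    // can match with a plain loop.
    static final Pattern CHAR_CLASS = Pattern.compile("[\\s,]+");

    /** Returns "match", "no match", or "stack overflow". */
    static String tryMatch(Pattern pattern, CharSequence input) {
        try {
            return pattern.matcher(input).matches() ? "match" : "no match";
        } catch (StackOverflowError e) {
            // The recursion ran out of stack before the match finished.
            return "stack overflow";
        }
    }

    /** Builds a string of the given length consisting only of spaces. */
    static String whitespace(int length) {
        char[] chars = new char[length];
        Arrays.fill(chars, ' ');
        return new String(chars);
    }

    public static void main(String[] args) {
        String big = whitespace(1_000_000);
        System.out.println("alternation: " + tryMatch(ALTERNATION, big));
        System.out.println("char class:  " + tryMatch(CHAR_CLASS, big));
    }
}
```

With default JVM stack sizes the alternation form is likely to report a stack overflow on the large input, while the character-class form should match it. The exact threshold depends on the JVM version and the `-Xss` setting. A regression test in this spirit feeds a whitespace-heavy document to the new pattern and checks that it completes.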
src Fix BERT word piece tokenizer stack overflow error (#205) 2020-02-10 14:33:04 +11:00
pom.xml Various fixes (#143) 2020-01-04 13:45:07 +11:00