cavis/deeplearning4j/deeplearning4j-nlp-parent/deeplearning4j-nlp/src
Eduardo Gonzalez ebeeb8bc48 Fix BERT word piece tokenizer stack overflow error (#205)
* Change the regular expression for the BERT tokenizer.

The previous regular expression causes StackOverflowErrors
when given a document that contains a large amount of whitespace.
I believe the replacement is equivalent.

* Add test for new BertWordPieceTokenizer RegEx.

This test should cause a StackOverflowError with the previous version.

* Fix off-by-one error in assert.
2020-02-10 14:33:04 +11:00
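
The commit message does not show the old or new pattern, so the following is only a minimal sketch of the general failure mode it describes, not the tokenizer's actual code. It assumes two hypothetical patterns, `GROUPED` and `FLAT`, that match the same runs of whitespace and punctuation. `java.util.regex` typically matches a repeated group containing an alternation recursively, so stack depth can grow with the length of the run, while a character class with a quantifier is matched in a loop. Whether `GROUPED` actually overflows depends on the JVM stack size, so the sketch reports the outcome instead of assuming it.

```java
import java.util.regex.Pattern;

class WhitespaceSplitSketch {
    // Hypothetical "risky" shape: alternation inside a repeated group.
    // Not the pattern from the commit; chosen only to illustrate the problem.
    static final Pattern GROUPED = Pattern.compile("(\\s|\\p{Punct})+");

    // Hypothetical equivalent: a single character class with a quantifier.
    static final Pattern FLAT = Pattern.compile("[\\s\\p{Punct}]+");

    // Builds "hello", then the requested number of spaces, then "world".
    static String documentWithGap(int spaces) {
        StringBuilder sb = new StringBuilder("hello");
        for (int i = 0; i < spaces; i++) {
            sb.append(' ');
        }
        return sb.append("world").toString();
    }

    // Returns true if splitting the text with the pattern exhausts the stack.
    static boolean overflows(Pattern pattern, String text) {
        try {
            pattern.split(text);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String doc = documentWithGap(200_000);
        // The result of the first line depends on the JVM's thread stack size.
        System.out.println("grouped overflows: " + overflows(GROUPED, doc));
        System.out.println("flat tokens: " + String.join(",", FLAT.split(doc)));
    }
}
```

A regression test in the spirit of the one mentioned above would feed a document like `documentWithGap(200_000)` to the tokenizer and check that the expected tokens come back without an error.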
main Fix BERT word piece tokenizer stack overflow error (#205) 2020-02-10 14:33:04 +11:00
test Fix BERT word piece tokenizer stack overflow error (#205) 2020-02-10 14:33:04 +11:00