cavis/deeplearning4j/deeplearning4j-nlp-parent
Eduardo Gonzalez ebeeb8bc48 Fix BERT word piece tokenizer stack overflow error (#205)
* Change the regular expression for the Bert tokenizer.

The previous regular expression causes a StackOverflowError
when given a document containing a large amount of whitespace.
I believe the one I've provided is equivalent.

* Add test for new BertWordPieceTokenizer RegEx.

This test should cause a StackOverflowError with the previous version.

* Fix off-by-one error in assert.
2020-02-10 14:33:04 +11:00
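The commit message does not show either regular expression, so the following is only a minimal sketch of one well-known way a Java regex overflows the stack on long whitespace runs. It assumes, without confirmation from the source, that the old pattern used alternation inside a repeated group. The class `WhitespaceRegexDemo` and both patterns are hypothetical and are not the tokenizer's actual expressions. The sketch compares a repeated alternation with a single character class that matches the same set of strings.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhitespaceRegexDemo {
    // HYPOTHETICAL "before" pattern: alternation inside a repeated group.
    // java.util.regex can match each repetition of such a group recursively,
    // so a long run of matching characters can exhaust the thread stack.
    static final Pattern ALTERNATION = Pattern.compile("(?:\\s|\\p{Punct})+");

    // HYPOTHETICAL "after" pattern: the same characters as one character class,
    // which the engine repeats in a loop rather than recursing per character.
    static final Pattern CHAR_CLASS = Pattern.compile("[\\s\\p{Punct}]+");

    // Returns the first match of p in input, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    // Builds a string of n spaces (Java 8 compatible; avoids String.repeat).
    static String spaces(int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, ' ');
        return new String(chars);
    }

    public static void main(String[] args) {
        String sample = "word ,  . next";
        // On short input both patterns find the same run of spaces and punctuation.
        System.out.println("[" + firstMatch(ALTERNATION, sample) + "]");
        System.out.println("[" + firstMatch(CHAR_CLASS, sample) + "]");

        String longRun = spaces(1_000_000);
        System.out.println("char class matched "
                + firstMatch(CHAR_CLASS, longRun).length() + " chars");
        try {
            System.out.println("alternation matched "
                    + firstMatch(ALTERNATION, longRun).length() + " chars");
        } catch (StackOverflowError e) {
            // Whether this branch is reached depends on the JVM version and stack size.
            System.out.println("alternation: StackOverflowError");
        }
    }
}
```

Whether the alternation version actually overflows depends on the JVM and its thread stack size, so the sketch catches the error instead of relying on it. Rewriting an alternation of single characters as one character class is a common way to keep such a pattern iterative; the source itself only states that the new expression is believed to be equivalent.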
deeplearning4j-nlp Fix BERT word piece tokenizer stack overflow error (#205) 2020-02-10 14:33:04 +11:00
deeplearning4j-nlp-chinese Various fixes (#143) 2020-01-04 13:45:07 +11:00
deeplearning4j-nlp-japanese Unit/integration test split + test speedup (#166) 2020-01-22 22:27:01 +11:00
deeplearning4j-nlp-korean Various fixes (#143) 2020-01-04 13:45:07 +11:00
deeplearning4j-nlp-uima Test fixes (#218) 2020-02-07 16:25:02 +11:00
pom.xml Add support for CUDA 10.2 (#89) 2019-11-29 16:31:03 +11:00