ebeeb8bc48
* Change the regular expression for the Bert tokenizer. The previous regular expression causes StackOverflowErrors if given a document with a large amount of whitespace. I believe the one I've provided is equivalent.
* Add a test for the new BertWordPieceTokenizer regex. This test would cause a StackOverflowError with the previous version.
* Fix an off-by-one assert.
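The actual before/after patterns are not shown in this summary, so the snippet below is only an illustrative sketch of the failure mode: in Java's regex engine, a repeated alternation group such as `(\s|\p{Punct})+` recurses once per repetition, so a document containing a very long run of whitespace can overflow the stack, whereas an equivalent character class such as `[\s\p{Punct}]+` is matched iteratively and is safe on the same input. The class and pattern names here are hypothetical, not the ones in the DL4J source.

```java
import java.util.regex.Pattern;

public class RegexStackOverflowDemo {
    public static void main(String[] args) {
        // A long run of whitespace, similar to the documents that triggered the bug.
        String whitespaceRun = " ".repeat(100_000);

        // Hypothetical "before"-style pattern: repeating an alternation group makes
        // the engine recurse once per matched character, which can end in a
        // StackOverflowError on long inputs.
        Pattern fragile = Pattern.compile("(\\s|\\p{Punct})+");

        // Hypothetical "after"-style pattern: a single character class covers the
        // same characters but is matched with a simple loop, so the stack depth
        // stays constant regardless of input length.
        Pattern robust = Pattern.compile("[\\s\\p{Punct}]+");

        try {
            System.out.println("fragile matches: " + fragile.matcher(whitespaceRun).matches());
        } catch (StackOverflowError e) {
            System.out.println("fragile pattern overflowed the stack");
        }

        System.out.println("robust matches: " + robust.matcher(whitespaceRun).matches());
    }
}
```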
Changed modules/files:

* deeplearning4j-nlp
* deeplearning4j-nlp-chinese
* deeplearning4j-nlp-japanese
* deeplearning4j-nlp-korean
* deeplearning4j-nlp-uima
* pom.xml