* Change the regular expression used by the BERT tokenizer.
The previous regular expression throws a StackOverflowError
when given a document containing a large amount of whitespace.
I believe the replacement expression is equivalent.
* Add a test for the new BertWordPieceTokenizer regex.
With the previous regex, this test should fail with a StackOverflowError.
* Fix an off-by-one error in an assertion.
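This commit does not include either pattern, so both expressions in the sketch below are hypothetical stand-ins. They illustrate one common way this kind of failure arises in java.util.regex. A repeated group that contains an alternation, such as `(?:\s|\p{Punct})+`, may be matched recursively, using stack frames for every repetition, so a long run of whitespace can exhaust the thread stack. A single character class covering the same characters, `[\s\p{Punct}]+`, matches the same strings but is repeated iteratively. The class name `WhitespaceRegexDemo` and the helper `sameResult` are illustrative only.

```java
import java.util.regex.Pattern;

public class WhitespaceRegexDemo {
    // Hypothetical "before" shape: a repeated group containing an alternation.
    // java.util.regex may recurse once per repetition here, so very long
    // runs of whitespace can overflow the thread stack.
    static final Pattern ALTERNATION = Pattern.compile("(?:\\s|\\p{Punct})+");

    // Hypothetical "after" shape: the same characters as one character class,
    // which the engine repeats in a loop instead of recursing.
    static final Pattern CHAR_CLASS = Pattern.compile("[\\s\\p{Punct}]+");

    // True when both patterns agree on whether the whole input matches.
    static boolean sameResult(String input) {
        return ALTERNATION.matcher(input).matches()
                == CHAR_CLASS.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Short inputs: both patterns should give the same answer.
        String[] samples = {" ", " \t\n", "...,", "a", " a ", ""};
        for (String s : samples) {
            System.out.println("[" + s + "] same result: " + sameResult(s));
        }

        // Long input: only the character-class form is exercised here, since
        // the alternation form may throw StackOverflowError on it.
        String hugeWhitespace = " ".repeat(1_000_000); // String.repeat needs Java 11+
        System.out.println("char class matches 1,000,000 spaces: "
                + CHAR_CLASS.matcher(hugeWhitespace).matches());
    }
}
```

Whether the alternation form actually overflows depends on the JDK version and the thread stack size (`-Xss`), which is why the sketch only runs the character-class form on the large input.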