Character-level models like ByT5 are a proof of concept: if architected carefully, character-level models come at a relatively modest extra cost, and they are both simpler than their subword counterparts and often higher-performing.
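Much of the "simpler" part comes from the tokenizer. A byte-level model needs no learned vocabulary: every UTF-8 byte is its own token. The sketch below assumes the convention of reserving ids 0-2 for pad, end-of-sequence, and unknown tokens and shifting byte values past them, which gives a fixed vocabulary of 259 ids. The helper names `encode` and `decode` are hypothetical and are not any library's API.

```python
# Minimal sketch of byte-level tokenization in the style of ByT5.
# Assumption: ids 0, 1, 2 are reserved for <pad>, </s> (EOS), <unk>,
# so each raw byte value b (0-255) maps to token id b + 3.
SPECIAL_OFFSET = 3
EOS_ID = 1


def encode(text: str) -> list[int]:
    """Hypothetical helper: one token id per UTF-8 byte, then EOS."""
    return [b + SPECIAL_OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def decode(ids: list[int]) -> str:
    """Hypothetical helper: drop special ids and shift back to raw bytes."""
    raw = bytes(i - SPECIAL_OFFSET for i in ids if i >= SPECIAL_OFFSET)
    return raw.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    ids = encode("héllo")
    print(ids)          # "é" becomes two byte tokens
    print(decode(ids))  # round-trips to the original string
```

The trade-off shows up in the output: "héllo" becomes six byte tokens plus EOS, so sequences are longer than with a subword vocabulary. That added length is where the modest extra cost comes from.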