Part of the magic of LanguageModels is the way they encode words and sentences in the first place.

AFAICT, the important point is that the encoding:

a) places similar things "close" to each other in a high-dimensional vector space.

b) makes all vectors the same length (the same number of dimensions), which allows efficient processing.
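
To make both points concrete, here is a minimal Python sketch. The three 4-dimensional vectors are made up by hand purely for illustration (real models learn their vectors, typically with hundreds or thousands of dimensions), and `TOY_EMBEDDINGS`, `cosine_similarity` and `most_similar` are hypothetical names, not part of any library. Cosine similarity is used here as one common way to measure "closeness"; it is an assumption that any particular model or database uses it.

```python
import math

# Hand-written toy vectors, NOT output from a real model.
# Every vector has the same length (4), as in point b).
TOY_EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1, 0.0],
    "dog": [0.8, 0.9, 0.2, 0.0],
    "car": [0.1, 0.0, 0.9, 0.8],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word):
    """Return the other toy word whose vector is closest to `word`."""
    target = TOY_EMBEDDINGS[word]
    others = [w for w in TOY_EMBEDDINGS if w != word]
    return max(others, key=lambda w: cosine_similarity(target, TOY_EMBEDDINGS[w]))


if __name__ == "__main__":
    for other in ("dog", "car"):
        score = cosine_similarity(TOY_EMBEDDINGS["cat"], TOY_EMBEDDINGS[other])
        print(f"cat vs {other}: {score:.3f}")
    print("closest to cat:", most_similar("cat"))
```

With these toy numbers "cat" scores much higher against "dog" than against "car", which is point a). Because every vector has the same length, one function can compare any pair without special cases, which is point b).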

See also VectorDatabase

