A widely cited line of research (Albert Mehrabian's studies of emotional communication) suggests that words account for only about 7% of what we convey; the remaining 93% is nonverbal, carried by body language, facial expressions, and tone of voice.
Limitations of Language Models
Despite all the text data available for training today, machines (LLMs) will never reach human-level AI without learning from high-bandwidth sensory inputs such as vision and touch.
Sensory inputs such as vision and touch carry far more information per unit of time than language or text.
The data bandwidth of visual perception is roughly 16 million times higher than the data bandwidth of written (or spoken) language.
In a mere four years, a child has taken in roughly 50 times more data than the biggest LLMs, which are trained on essentially all the text publicly available on the internet.
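To make that "50 times" figure concrete, here is a back-of-envelope sketch of the estimate. The constants below (roughly 1e13 training tokens at about 2 bytes per token, about 16,000 waking hours by age four, and an optic-nerve throughput on the order of 20 MB/s) are commonly cited assumptions rather than figures stated above, so treat the result as an order-of-magnitude illustration.

```python
# Back-of-envelope comparison: text seen by a large LLM vs. visual data
# taken in by a four-year-old child. All constants are rough assumptions.

# Text side: a large LLM is often said to be trained on ~1e13 tokens,
# at roughly 2 bytes per token.
LLM_TRAINING_TOKENS = 1e13            # assumed token count
BYTES_PER_TOKEN = 2                   # assumed average bytes per token
llm_bytes = LLM_TRAINING_TOKENS * BYTES_PER_TOKEN   # ~2e13 bytes

# Vision side: a four-year-old has been awake roughly 16,000 hours,
# and the optic nerves deliver on the order of 20 MB/s to the brain.
WAKING_HOURS_BY_AGE_4 = 16_000        # assumed total waking time
SECONDS_PER_HOUR = 3_600
OPTIC_NERVE_BYTES_PER_SEC = 20e6      # assumed visual throughput (~20 MB/s)
child_bytes = WAKING_HOURS_BY_AGE_4 * SECONDS_PER_HOUR * OPTIC_NERVE_BYTES_PER_SEC

print(f"LLM text data:     {llm_bytes:.1e} bytes")           # ~2e13
print(f"Child visual data: {child_bytes:.1e} bytes")         # ~1.2e15
print(f"Ratio:             {child_bytes / llm_bytes:.0f}x")  # ~58
```

Under these particular assumptions the ratio comes out near 58, consistent with the "about 50 times" figure quoted above; changing any constant shifts the number but not the order of magnitude.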
Most of human knowledge (and almost all animal knowledge) comes from sensory experience of the physical world.