#MultimodalLearning

#CodeX: AI & Humans —> Bridging Worlds Beyond Text by Ajit Minhas

By one widely cited estimate, the words themselves (text) account for only about 7% of human communication. The remaining 93% or so is nonverbal: body language, facial expressions, and tone of voice.

Limitations of Language Models

Despite the vast amount of text available as training data today, machines (LLMs) will never reach human-level AI without also learning from high-bandwidth sensory inputs such as vision or touch.

Sensory inputs (vision and touch) carry far more bandwidth than language or text:

  • The data bandwidth of visual perception is roughly 16 million times higher than the data bandwidth of written (or spoken) language.

  • In just 4 years, a child has seen 50 times more data than the biggest LLMs, which are trained on all the text publicly available on the internet.

  • Most of human knowledge (and almost all of animal knowledge) comes from our sensory experience of the physical world.
