#CodeX: Generative AI – Beyond the hype / by Ajit Minhas

Artificial Intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines to accelerate innovation and operational efficiency.

Artificial Intelligence will be the next technology platform shift — following cloud, social, and mobile, with GENERATIVE AI leading the rapid charge.

Generative AI is a subset of Machine Learning (ML) and refers to technology that can take human inputs and create completely new content (text, video, code, images, and more). It has the potential to open an exciting horizon of new ways computers can help people get things done and to transform user experiences drastically.

Most past ML work has focused on models that deal with a single modality of data (e.g., language models, image models or speech recognition models). While there has been plenty of amazing progress in these areas, the future is even more exciting as we look forward to Multi-modal Models that can flexibly handle many different modalities simultaneously, both as model inputs and as model outputs.

Landmark AI models from OpenAI and DeepMind (a wholly owned subsidiary of Alphabet Inc.) have been implemented, cloned, and improved upon by the open-source community at an ever faster pace over the last few years. The figure below depicts the evolution of LLMs (Large Language Models) over the last few years; natural language processing (NLP) models with as many as 280 billion parameters exist today. AI researchers have found that scaling the number of training tokens (that is, the amount of text data the model is fed) is as important as scaling model size. The hyperscalers and challenger AI compute providers are forging key partnerships, notably Microsoft’s investment in OpenAI. There will be more to come as AI becomes core, including exciting developments from Google (Generative AI APIs) and in the way AI is used for conversations and in search.
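To make the scaling point a bit more concrete, here is a minimal sketch that applies the widely cited rule of thumb from DeepMind's Chinchilla work of roughly 20 training tokens per parameter for a compute-optimal model; the ratio, the helper function, and the example model sizes are illustrative assumptions, not a description of how any particular model was actually trained.

```python
# Rough illustration of the finding that training tokens matter as much as parameters.
# Assumes the commonly cited Chinchilla heuristic of ~20 training tokens per parameter;
# real compute-optimal ratios depend on architecture, data quality, and compute budget.

TOKENS_PER_PARAM = 20  # assumed rule of thumb, not an exact constant


def compute_optimal_tokens(num_params: float) -> float:
    """Approximate number of training tokens for a compute-optimal model of a given size."""
    return num_params * TOKENS_PER_PARAM


for params in (1e9, 70e9, 280e9):  # 1B, 70B, and 280B parameter models
    tokens = compute_optimal_tokens(params)
    print(f"{params / 1e9:>5.0f}B params -> ~{tokens / 1e9:,.0f}B training tokens")
```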

Although language models are trained on surprisingly simple objectives, like predicting the next token in a sequence of text given the preceding tokens, when large models are trained on sufficiently large and diverse corpora of text, the models can generate coherent, contextual, natural-sounding responses, and can be used for a wide range of tasks, such as generating creative content, translating languages, helping with coding tasks, and answering questions.
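To make that training objective concrete, here is a minimal sketch of next-token prediction in PyTorch; the tiny embedding-plus-linear stand-in for a model, the vocabulary size, and the random token batch are illustrative assumptions, since a real LLM would be a large transformer trained on an enormous corpus.

```python
# Minimal sketch of the next-token prediction objective described above.
import torch
import torch.nn.functional as F

vocab_size, seq_len, batch = 50_000, 128, 4
token_ids = torch.randint(0, vocab_size, (batch, seq_len))  # stand-in for tokenized text

# A real LLM would be a large transformer; an embedding plus a linear head is
# enough to show the shape of the training objective.
embed = torch.nn.Embedding(vocab_size, 256)
head = torch.nn.Linear(256, vocab_size)

hidden = embed(token_ids[:, :-1])   # model reads tokens 0..n-2
logits = head(hidden)               # predicts a distribution over the vocabulary
targets = token_ids[:, 1:]          # the "label" is simply the next token in the text

loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
print(f"next-token prediction loss: {loss.item():.3f}")
```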

One of the key challenges in AI is to build systems that can perform multi-step reasoning: learning to break complex problems down into smaller tasks and combining the solutions to those tasks to address the larger problem. Google has done a great deal of research and development in this space by developing and evaluating models with “Chain-of-Thought Prompting” as opposed to “Standard Prompting”.

With “Chain-of-Thought Prompting”, the model is encouraged to follow a logical chain of thought (multiple steps of reasoning) and to generate more structured, organized, and accurate responses when solving problems.
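As a concrete illustration of the difference, the sketch below builds a standard prompt and a chain-of-thought prompt for the same arithmetic question; the example problem and prompt wording are assumptions in the style of published chain-of-thought work, not Google's actual prompts.

```python
# Illustrative contrast between standard prompting and chain-of-thought prompting.
# The worked example and wording are assumptions for illustration only; either string
# would be sent to a text-generation model.

QUESTION = (
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?"
)

# Standard prompting: ask for the answer directly.
standard_prompt = f"Q: {QUESTION}\nA:"

# Chain-of-thought prompting: show a worked example with intermediate reasoning
# and invite the model to reason step by step before answering.
chain_of_thought_prompt = (
    "Q: A juggler has 16 balls. Half of the balls are golf balls. "
    "How many golf balls are there?\n"
    "A: There are 16 balls in total. Half of 16 is 8. So there are 8 golf balls. "
    "The answer is 8.\n\n"
    f"Q: {QUESTION}\n"
    "A: Let's think step by step."
)

print(standard_prompt)
print("---")
print(chain_of_thought_prompt)
```

With the chain-of-thought version, the model tends to produce its intermediate reasoning before giving the final answer, which is exactly the behavior described above.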

The AI platform shift, led by Generative AI models, will create as much value as the cloud platform shift, completely transforming the broader knowledge economy and bringing us closer to unlocking the potential to outperform humans on many tasks by multiple orders of magnitude.

The figure below is a condensed version of Base10's market map of the Generative AI universe across emerging segments. The companies in this space are capable of automating large portions of content creation of all kinds.

Many hot technology trends get over-hyped long before the market catches up. But the Generative AI boom has been accompanied by real traction from real companies.

ChatGPT User Growth

ChatGPT reached 1 million users in just 5 days, faster than any other popular social media platform or streaming service.

The Generative AI technology stack can be divided into three distinct layers. The figure below represents a high-level tech stack for a Generative AI system architecture:

  1. Applications — Apps that integrate Generative AI models into a user-facing product, either building on their own models or relying on third-party APIs (see the sketch after this list).

  2. Models — Hosted models that power Generative AI applications, available for integration as APIs or as open source.

  3. Infrastructure — Cloud platforms and hardware providers that run training and inference workloads for Generative AI models.
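To show how the three layers fit together from an application's point of view, here is a minimal sketch of a user-facing app calling a hosted model over HTTP; the endpoint, request and response schemas, and credential handling are hypothetical placeholders rather than any specific provider's API.

```python
# Minimal sketch of the application layer (1) calling a hosted model (2) that runs
# on cloud infrastructure (3). The endpoint URL, JSON fields, and MODEL_API_KEY
# environment variable are hypothetical placeholders; a real application would use
# its chosen provider's documented client library or REST API.
import os

import requests

API_URL = "https://api.example-model-provider.com/v1/generate"  # hypothetical endpoint
API_KEY = os.environ.get("MODEL_API_KEY", "")                    # hypothetical credential


def generate_text(prompt: str) -> str:
    """Send a prompt to the hosted model and return the generated text."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"prompt": prompt, "max_tokens": 200},  # assumed request schema
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["text"]  # assumed response schema


if __name__ == "__main__":
    print(generate_text("Write a two-line product description for a travel app."))
```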

Nearly everything in Generative AI relies on cloud-hosted GPUs or TPUs at some point. Infrastructure providers (cloud: AWS, GCP, and Azure; hardware: Nvidia and Google) are likely to be the biggest winners, capturing the majority of the dollars flowing through the stack. Application providers will scale up rapidly but will struggle with retention and product differentiation because they use similar underlying models. Model providers deliver much of the value by training the generative models themselves, yet they have not reached expansive commercial scale, and differentiation is a challenge because their models are trained on similar datasets with similar architectures.

The key is to understand which parts of the stack are truly differentiated, defensible, and capable of profitable growth driven by network effects and data. This will have a major impact on market structure (i.e., horizontal vs. vertical) and the drivers of long-term value (e.g., margins and retention). Vertically integrated apps that deliver end-to-end solutions have an advantage in driving differentiation.

But overall, the infrastructure layer is an outlier: it is defensible (though differentiation is limited because providers run the same GPUs), and the big three cloud providers, Microsoft Azure, Google Cloud Platform, and Amazon Web Services, will be the long-term winners.

It is important to recognize that any innovation in AI must be pursued responsibly. Many Silicon Valley practitioners have flagged that artificial intelligence, if not pursued responsibly, is mankind's “biggest existential threat”.

One such scenario is AI achieving “superintelligence”: machines that have advanced beyond human-level intelligence and may pursue objectives not in line with humanity's.

Powerful language models can help execute mundane machine tasks, leaving humans free to pursue more creative work. But without appropriate safety controls they can also generate misinformation, leading to catastrophic conflicts and major AI mishaps with disastrous consequences. As AI becomes mainstream, there is a strong need to govern these models and to understand their impact on society and the economy more carefully. Not only do we have to consider the unintended consequences of deploying AI, we must also investigate its malicious use and develop counterstrategies to prevent harm.

Thus, AI SAFETY is a key area of focus for institutions working to ensure that AI is deployed in ways that do not harm humanity. Discussions about the ethical design and development of technology contain a lot of empty rhetoric. But for AI, ethical design principles have to be broadly adopted to harness the real potential, or else it will just lead to the death of humanity without dignity.

An ethical, human-centric AI must be designed and developed in a manner that is aligned with the values and ethical principles of a society or the community it affects. Ethics is based on well-founded standards of right and wrong that prescribe what humans ought to do, usually in terms of rights, obligations, benefits to society, fairness, or specific virtues.
— Markkula Center for Applied Ethics

In conclusion, Generative AI will be a force multiplier for technological progress in our increasingly digital and data-driven world. One thing is certain: Generative AI is a game changer, and there is tremendous value to be unlocked as the world navigates such a huge transition.


Below is a synopsis of two very pragmatic and interesting Generative AI Use Cases that I find powerful and valuable.

Use Case 1: Vubbing (synchronizing mouth movements when films are dubbed in other languages)

Scott Mann is the founder of the neural network lab Flawless and has delivered films such as “Heist”, “Final Score”, and “The Tournament”. As a director, Mann saw a problem with the visuals of films dubbed into other languages.

In response, his company developed in-house proprietary software called TrueSync. Through a process Flawless calls "vubbing", it makes an actor's mouth movements match the dubbed dialogue when a film is translated into another language.

Here, Generative AI is applied to video to make the dubbed speech look and sound authentic, creating footage that did not exist before, while relying on a large amount of ground-truth training data from the original performance to make the result as genuine as the original.

Use Case 2: GIS Arta Command and Control System

Ukraine's geographic information system, GIS Arta, is geospatial intelligence software and a sign of things to come.

GIS Arta is a homegrown application developed prior to Russia’s invasion based on lessons learned from the conflict in the Donbas. It’s a guidance command and control system for drone, artillery, or mortar strikes. 

The use of this geospatial intelligence application has reportedly reduced the decision chain around artillery from 20 minutes to under 1 minute.