The generative AI revolution has begun. How did we get here?

This image was partially AI-generated with the prompt “a pair of robot hands holding crayons drawing a pair of human hands, oil paint, colored,” inspired by the classic drawing by M.C. Escher. Watching the AI mangle drawing hands helps us feel superior to the machines… for now.

Aurich Lawson | Stable Diffusion

Progress in AI systems often seems to come in cycles. Every few years, computers can suddenly do something they’ve never been able to do before. “Behold!” proclaim the AI true believers, “the age of artificial general intelligence is at hand!” “Nonsense!” say the skeptics. “Remember self-driving cars?”

The truth usually lies somewhere in between.

We’re in another one of those cycles, this time with generative AI. AI art dominates media headlines, but there is also unprecedented progress in many widely disparate fields. Everywhere you look, from video to biology, programming, writing, translation, and more, AI is advancing at the same incredible pace.

Why is all this happening now?

You may already be familiar with the latest happenings in the world of AI. You’ve seen the award-winning artwork, heard the interviews with dead people, and read about the protein-folding breakthroughs. But these new AI systems aren’t just producing cool demos in research labs. They are quickly being turned into practical tools and real commercial products that anyone can use.

There’s a reason all of this has come at once. The breakthroughs are all underpinned by a new class of AI models that are more flexible and powerful than anything that has come before. Because they were first used for language tasks like answering questions and writing essays, they’re often known as large language models (LLMs). OpenAI’s GPT-3, Google’s BERT, and so on are all LLMs.

But these models are extremely flexible and adaptable. The same mathematical structures have proven so useful in computer vision, biology, and more that some researchers have taken to calling them “foundation models” to better articulate their role in modern AI.

Where did these foundation models come from, and how have they broken out beyond language to drive so much of what we see in AI today?

The foundation of foundation models

There is a holy trinity in machine learning: models, data, and compute. Models are algorithms that take inputs and produce outputs. Data refers to the examples the algorithms are trained on. To learn something, there must be enough data with enough richness that the algorithms can produce useful output. Models must be flexible enough to capture the complexity in the data. And finally, there has to be enough computing power to run the algorithms.
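To make that trinity concrete, here is a minimal sketch (my illustration, not something from the article): the model is a line with two free parameters, the data are noisy samples of a known function, and the computation is a plain gradient-descent loop.

```python
# A minimal illustration (mine, not the article's) of the trinity:
# a model (a line with two parameters), data (noisy samples of a
# known function), and computation (a gradient-descent loop).
import numpy as np

rng = np.random.default_rng(0)

# Data: 100 noisy examples of y = 3x + 1
x = rng.uniform(-1, 1, size=100)
y = 3 * x + 1 + rng.normal(scale=0.1, size=100)

# Model: flexible enough to capture the pattern in the data
w, b = 0.0, 0.0

# Computation: repeatedly nudge the parameters to shrink the error
for _ in range(500):
    pred = w * x + b
    w -= 0.1 * 2 * np.mean((pred - y) * x)  # gradient of mean squared error w.r.t. w
    b -= 0.1 * 2 * np.mean(pred - y)        # gradient w.r.t. b

print(f"learned w={w:.2f}, b={b:.2f}")      # close to the true 3 and 1
```

With too little data, too rigid a model, or too few iterations of compute, the learned line never approaches the truth; all three ingredients have to be sufficient at once.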

The first modern AI revolution took place with deep learning in 2012, when solving computer vision problems with convolutional neural networks (CNNs) took off. CNNs are similar in structure to the brain’s visual cortex. They had been around since the 1990s but weren’t yet practical due to their intense computing power requirements.

In 2006, however, Nvidia released CUDA, a programming language that allowed GPUs to be used as general-purpose supercomputers. In 2009, Stanford AI researchers unveiled ImageNet, a collection of labeled images used to train computer vision algorithms. In 2012, AlexNet combined CNNs trained on GPUs with ImageNet data to create the best visual classifier the world had ever seen. Deep learning and AI exploded from there.
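For a sense of what such a model looks like, here is a toy sketch, assuming PyTorch is installed. The layer sizes are illustrative and far smaller than AlexNet’s, but the ingredients are the same: stacked convolution and pooling layers that learn visual features, feeding a linear classifier, run on a GPU when one is available.

```python
# A toy CNN sketch, assuming PyTorch. Layer sizes are illustrative,
# far smaller than AlexNet's, but the structure is the same kind:
# convolution + pooling feature layers feeding a linear classifier.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # learn local filters (edges, textures)
            nn.ReLU(),
            nn.MaxPool2d(2),                             # downsample 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

device = "cuda" if torch.cuda.is_available() else "cpu"  # CUDA is what made this practical
model = TinyCNN().to(device)
batch = torch.randn(4, 3, 32, 32, device=device)  # a dummy batch of 32x32 RGB images
print(model(batch).shape)                         # torch.Size([4, 10])
```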

CNNs, the ImageNet dataset, and GPUs were a magic combination that unlocked enormous progress in computer vision. 2012 set off a boom of excitement around deep learning and spawned whole industries, like those involved in autonomous driving. But we quickly realized there were limits to that generation of deep learning. CNNs were great for vision, but other areas didn’t have their model breakthrough. One huge gap was in natural language processing (NLP), that is, getting computers to understand and work with normal human language rather than code.

The problem of understanding and working with language is fundamentally different from that of working with images. Processing language requires working with sequences of words, where order matters. A cat is a cat no matter where it appears in an image, but there is a big difference between “this reader is learning about AI” and “AI is learning about this reader”.
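A toy snippet (mine, not the article’s) makes the point: treated as unordered bags of words the two sentences are indistinguishable, while as sequences they differ.

```python
# A toy illustration (mine, not the article's): as unordered bags of
# words the two sentences look identical; as sequences, they differ.
a = "this reader is learning about AI".split()
b = "AI is learning about this reader".split()

print(sorted(a) == sorted(b))  # True  -> same words, order ignored
print(a == b)                  # False -> order changes the meaning
```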

Until recently, researchers relied on models like recurrent neural networks (RNNs) and long short-term memory (LSTM) networks to process and analyze data in time. These models were effective at recognizing short sequences, like spoken words in short sentences, but they struggled with longer sentences and paragraphs. The memory of these models was simply not sophisticated enough to capture the complexity and richness of the ideas and concepts that arise when sentences are combined into paragraphs and essays. They were great for simple Siri- and Alexa-style voice assistants, but not much else.
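As a rough sketch (assuming PyTorch; the vocabulary and layer sizes are made up for illustration), this is how an LSTM consumes a sentence: one word at a time, folding everything it has seen so far into a fixed-size hidden state. That fixed-size state is the memory bottleneck described above.

```python
# A rough LSTM sketch, assuming PyTorch; vocabulary and sizes are
# made up. The LSTM reads a sentence one word at a time, folding
# everything seen so far into a fixed-size hidden state. Long
# paragraphs must be squeezed into that same small memory.
import torch
import torch.nn as nn

embed = nn.Embedding(num_embeddings=1000, embedding_dim=32)      # toy 1,000-word vocabulary
lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)

token_ids = torch.randint(0, 1000, (1, 12))  # one sentence of 12 word IDs
outputs, (h, c) = lstm(embed(token_ids))

print(outputs.shape)  # torch.Size([1, 12, 64]) - one output per word
print(h.shape)        # torch.Size([1, 1, 64])  - the whole sentence packed into 64 numbers
```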

Getting the training data right was another challenge. ImageNet was a collection of more than a million labeled images, which required significant human effort to generate, mostly by graduate students and Amazon Mechanical Turk workers. And ImageNet was actually inspired by and modeled on an older project called WordNet, which attempted to create a labeled dataset for English vocabulary. While there is no shortage of text on the Internet, creating a meaningful dataset to teach a computer to work with human language beyond individual words is incredibly time-consuming. And the labels you create for one application on the same data may not apply to another task.
