April 19, 2024


How chatbots and large language models, or LLMs, actually work

In the second installment of our five-part series, I will explain how the technology actually works.

The AI powering ChatGPT, Microsoft’s Bing chatbot, and Google Bard can conduct human-like conversations and write smooth, natural prose on an endless variety of topics. They can also perform complex tasks, from writing code to planning a children’s birthday party.

But how does all this work? To answer that, we need to take a peek at something called the large language model — the kind of artificial intelligence that drives these systems.

Large language models, or LLMs, are relatively new to the AI scene. The first ones appeared only about five years ago, and they weren’t very good. But today they can draft emails, presentations, and memos, and tutor you in a foreign language. More uses are sure to emerge in the coming months and years as the technology improves and Silicon Valley races to capitalize on it.

I’m going to walk you through setting up a large language model from scratch, keeping things simple and leaving out a lot of the hard math. Let’s pretend we’re trying to build an LLM to help you answer your emails. We’ll call it MailBot.

Every AI system needs a goal. Researchers call this an objective function. It can be simple – for example, “Win as many games of chess as possible” – or complex, such as “Predict the 3D shapes of proteins, using only their amino acid sequences.”

Most large language models share the same basic objective function: given a sequence of text, guess what comes next. We’ll give MailBot more specific goals later, but let’s stick with that one for now.
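
To make that concrete, here is a toy sketch in Python of what “guess what comes next” means during training. The word probabilities are invented for illustration; a real model computes them over a vocabulary of tens of thousands of tokens.

```python
# Toy illustration of the core LLM objective: given a sequence of
# tokens, assign probabilities to what comes next. These numbers are
# invented for illustration.
import math

context = ["Thank", "you", "for", "your"]

# A hypothetical model's guesses for the next token.
predicted = {"email": 0.55, "help": 0.25, "time": 0.15, "banana": 0.05}

actual_next = "email"  # what actually followed in the training text

# Training minimizes the model's "surprise" (negative log-likelihood)
# at the token that really came next.
loss = -math.log(predicted[actual_next])
print(" ".join(context), "-> ?")
print(f"Loss when the true next word is '{actual_next}': {loss:.3f}")
```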

Next, we need to compile training data that will teach MailBot how to write. Ideally, we’d accumulate a huge repository of text, which usually means billions of pages pulled from the Internet — like blog posts, tweets, Wikipedia articles, and news stories.

To get started, we’ll use some free, publicly available databases, such as the Common Crawl repository of web data. But we’ll also want to add our own secret sauce, in the form of proprietary or specialized data. Perhaps we’ll license some foreign-language text, so that MailBot learns to compose emails in French or Spanish as well as English. In general, the more data we have, and the more diverse the sources, the better our model will be.
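
If you want a feel for what that data gathering looks like in code, here is a minimal sketch, assuming the Hugging Face datasets library and its allenai/c4 dataset, a cleaned snapshot of Common Crawl. Streaming lets us peek at the data without downloading terabytes.

```python
# A minimal sketch of sampling web text for training. Assumes the
# Hugging Face "datasets" library and the allenai/c4 dataset, a
# cleaned snapshot of Common Crawl.
from datasets import load_dataset

web_text = load_dataset("allenai/c4", "en", split="train", streaming=True)

# Peek at the first three pages without downloading the whole corpus.
for i, page in enumerate(web_text):
    print(page["url"])
    print(page["text"][:80], "...")
    if i == 2:
        break
```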

Before we can feed the data into our model, we need to break it into units called tokens, which can be words, word fragments, or even individual characters. Converting text into small pieces helps the model parse it more easily.
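
Here is a toy word-level tokenizer in Python to show the idea. Real systems use subword schemes like byte-pair encoding, but the principle is the same: turn text into a list of integer IDs.

```python
# A toy word-level tokenizer. Real LLMs use subword schemes such as
# byte-pair encoding, but the idea is the same: map text to integer IDs.
text = "Dear Maria, thank you for your email."

# Split into tokens (real tokenizers handle punctuation more carefully).
tokens = text.replace(",", " ,").replace(".", " .").split()

# Assign each distinct token an integer ID, in order of first appearance.
vocab = {tok: i for i, tok in enumerate(dict.fromkeys(tokens))}
token_ids = [vocab[tok] for tok in tokens]

print(tokens)     # ['Dear', 'Maria', ',', 'thank', 'you', ...]
print(token_ids)  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```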

Once our data has been converted into tokens, we need to put together the AI’s “brain” – a type of system known as a neural network. This is a complex network of interconnected nodes (or “neurons”) that process and store information.

For MailBot, we’ll want to use a relatively new type of neural network known as a transformer model. Transformer models can parse many pieces of text at the same time, which makes them faster and more efficient. (Transformer models are key to systems like ChatGPT, whose acronym GPT stands for “Generative Pretrained Transformer.”)
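
The mechanism that lets a transformer look at a whole sequence at once is called self-attention. Here is a stripped-down sketch of that single step in Python with NumPy; the vectors are random placeholders, and a real model stacks many such layers with learned values.

```python
# A stripped-down sketch of self-attention, the step that lets a
# transformer weigh every token against every other token at once.
# The numbers here are random placeholders, not learned values.
import numpy as np

rng = np.random.default_rng(0)
seq_len, dim = 4, 8                      # four tokens, 8-dimensional vectors

x = rng.normal(size=(seq_len, dim))      # token embeddings
Wq, Wk, Wv = (rng.normal(size=(dim, dim)) for _ in range(3))

q, k, v = x @ Wq, x @ Wk, x @ Wv         # queries, keys, values

scores = q @ k.T / np.sqrt(dim)          # how strongly tokens attend to one another
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax

output = weights @ v                     # context-aware token representations
print(weights.round(2))                  # each row sums to 1
```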


Next, the model will analyze the data, token by token, identifying patterns and relationships. It may notice that “Dear” is often followed by a name, or that “Best regards” usually comes before your name. By identifying these patterns, the AI learns how to construct messages that make sense.
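
The simplest version of this pattern-finding is just counting which token follows which. Here is a toy Python example (the emails are made up); this is essentially a bigram model, a distant ancestor of today’s LLMs.

```python
# A toy illustration of pattern-finding: counting which token tends to
# follow which. This is a bigram model, the simplest "language model".
from collections import Counter, defaultdict

emails = [
    "Dear Maria thank you for your note",
    "Dear Sam thank you for your patience",
]

follows = defaultdict(Counter)
for email in emails:
    words = email.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1

# The model "learns" that "Dear" is followed by a name, and that
# "thank" is reliably followed by "you".
print(follows["Dear"])   # Counter({'Maria': 1, 'Sam': 1})
print(follows["thank"])  # Counter({'you': 2})
```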

The system also develops a sense of context. For example, it may learn that “bank” can refer to a financial institution or to the side of a river, depending on the surrounding words.

As the transformer model learns these patterns, it builds a map: an enormously complex mathematical representation of human language. It tracks these relationships using numerical values known as parameters. Many of today’s best LLMs have hundreds of billions of parameters or more.
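
Where do those parameter counts come from? Here is some back-of-the-envelope arithmetic in Python for a hypothetical small transformer; all the sizes are illustrative choices, not MailBot’s actual design.

```python
# Back-of-the-envelope parameter counting for a hypothetical small
# transformer. All sizes here are illustrative, not a real design.
vocab_size = 50_000
dim = 768        # width of each token vector
n_layers = 12    # number of transformer layers

embedding = vocab_size * dim               # token-embedding table
per_layer = 4 * dim**2 + 8 * dim**2        # attention matrices + feed-forward
total = embedding + n_layers * per_layer

print(f"{total:,} parameters")  # about 123 million, roughly GPT-2-small scale
```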

Training could take days or even weeks and would require an enormous amount of computing power. But once it’s done, the model will be almost ready to start writing your emails.
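
For the curious, the training loop itself is surprisingly compact. Here is a minimal sketch, assuming PyTorch, with a trivial stand-in network instead of a full transformer and random token IDs instead of real data; a real run repeats this over billions of tokens on many machines.

```python
# A minimal sketch of next-token training, assuming PyTorch. The model
# here is a trivial stand-in for a transformer, and the "data" is
# random token IDs; a real run uses billions of real tokens.
import torch
import torch.nn as nn

vocab_size, dim = 1000, 64
model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):                                # toy number of steps
    tokens = torch.randint(0, vocab_size, (32, 16))    # a batch of sequences
    inputs, targets = tokens[:, :-1], tokens[:, 1:]    # shift by one: predict next
    logits = model(inputs)                             # (32, 15, vocab_size)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```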

Oddly enough, it may develop other skills as well. As LLMs learn to predict the next word in a sequence, over and over, they can pick up unexpected abilities, such as knowing how to write code. AI researchers call these emergent behaviors, and they are still sometimes baffled by them.

Once a large language model is trained, it needs to be fine-tuned for its specific job. A chatbot used by a hospital, for example, might need to understand medical terminology.

To fine-tune MailBot, we could ask it to generate a batch of emails, hire people to rate them on accuracy, and then feed the ratings back into the model until it improves.

This is a rough approximation of the approach used with ChatGPT, which is known as reinforcement learning with human feedback.
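
Here is a toy sketch in Python of just the data-collection half of that loop: generate candidates, gather human ratings, and turn them into preference pairs. Real RLHF goes further, training a separate reward model on those pairs and using it to fine-tune the LLM; the emails and ratings below are invented.

```python
# A toy sketch of turning human ratings into a training signal. Real
# RLHF trains a reward model on such preference pairs and uses it to
# fine-tune the LLM; the emails and ratings here are invented.
candidates = [
    "Dear Maria, thanks for the update. Best, Alex",
    "maria. got it. bye",
]

# Hypothetical 1-to-5 ratings collected from human reviewers.
human_ratings = {candidates[0]: 5, candidates[1]: 2}

# Preference pairs (better, worse) become the training signal.
preference_pairs = [
    (a, b)
    for a in candidates
    for b in candidates
    if human_ratings[a] > human_ratings[b]
]
print(preference_pairs)  # [(the polite email, the curt one)]
```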


Congratulations! Once MailBot is trained and fine-tuned, it’s ready to use. After you build some kind of user interface for it – like a Chrome extension that plugs into your email app – it can start writing your emails.

But no matter how good it gets, you’ll still want to keep an eye on your new assistant. As companies like Microsoft and Meta have learned the hard way, AI systems can be erratic and unpredictable, or even scary and dangerous.

Tomorrow, we’ll hear more about how things can go wrong in unexpected and sometimes annoying ways.

Let’s explore one of the most creative abilities of LLMs: combining disparate concepts and formats into something strange and new. For example, colleagues at Well asked ChatGPT to “write a song in Taylor Swift’s voice that uses themes from a Dr. Seuss book.”

For today’s homework, try mixing and matching format, style, and subject matter — like, “Write a Snoop Dogg-style limerick about global warming.”

Don’t forget to share your creation in the comments.




  • transformer model: A useful neural network architecture for language understanding, which does not have to parse words one by one but can look at an entire sentence at once. A technique called self-attention allows the model to focus on specific words that are important in understanding the meaning of a sentence.

  • parameters: Numerical values that define the structure and behavior of a large language model, like clues that help it guess which words come next. Modern systems such as GPT-4 are believed to contain hundreds of billions of parameters.

  • reinforcement learning: A technique that teaches an AI model to find the best outcome by trial and error, receiving rewards or penalties from an algorithm based on its results. This system can be improved by having humans give feedback on its performance.
