How ChatGPT actually works (and why it's been so game-changing)

3 months ago 160

Back in the day (and by "in the day," I mean late 2022, before AI chatbots exploded on the scene), tools like Google and Wolfram Alpha interacted with users via a single-line text entry field and provided text results. Google returned search results -- a list of web pages and articles that would (hopefully) provide information related to the search queries. Wolfram Alpha generally provided answers that were mathematical and data analysis-related.

ChatGPT, by contrast, provides a response based on the context and intent behind a user's question. Google, of course, has changed up its response mode. It now provides AI-based responses before search results, and it's likely to continue to do so. Wolfram Alpha, on the other hand, uses AI behind the scenes to help it with its calculations but does not provide AI-based answers.

Also: How to use ChatGPT: A beginner's guide to the most popular AI chatbot

Fundamentally, Google's searching power is its ability to do enormous database lookups and provide a series of matches. Wolfram Alpha's power is its ability to parse data-related questions and perform calculations.

ChatGPT's power (and that of almost any other AI chatbot, like Claude, Copilot, Perplexity, and Google Gemini) is the ability to parse queries and produce fully fleshed-out answers and results based on most of the world's digitally accessible text-based information. Some chatbots have restrictions based on when they stopped scanning information, but most can now access the live Internet to factor current data into their answers.

In this article, we'll see how ChatGPT can produce those fully fleshed-out answers using a technology called generative artificial intelligence. We'll start by looking at the main phases of ChatGPT operation, then cover some core AI architecture components that make it all work.

The two main phases of ChatGPT operation

Let's use Google Search (as distinguished from Google Gemini AI) as an analogy again. When you ask Google Search to look up something, you probably know that it doesn't -- at the moment you ask -- go out and scour the entire web for answers. Instead, Google searches its database for pages that match that request. Google search has two main phases: the spidering and data-gathering phase, and the user interaction/lookup phase.

Also: The best AI chatbots: ChatGPT and other fun alternatives to try

Roughly speaking, ChatGPT and the other AI chatbots work the same way. The data-gathering phase is called pre-training, while the user responsiveness phase is known as inference. The magic behind generative AI and the reason it has exploded is that the way pre-training works has proven to be enormously scalable. That scalability has been made possible by recent innovations in affordable hardware technology and cloud computing.

How pre-training AI works

Generally speaking (because getting into specifics would take volumes), AIs pre-train using two principal approaches: supervised and non-supervised. Most AI projects until the current crop of generative AI systems like ChatGPT used the supervised approach.

Also: How to make ChatGPT provide sources and citations

Supervised pre-training is a process where a model is trained on a labeled dataset, where each input is associated with a corresponding output.

For example, an AI could be trained on a dataset of customer service conversations, where the user's questions and complaints are labeled with the appropriate responses from the customer service representative. To train the AI, questions like, "How can I reset my password?" would be provided as user input, and answers like, "You can reset your password by visiting the account settings page on our website and following the prompts," would be provided as output.

In a supervised training approach, the overall model is trained to learn a mapping function that can map inputs to outputs accurately. This process is often used in supervised learning tasks, such as classification, regression, and sequence labeling.

As you might imagine, there are limits to how this can scale. Human trainers would have to go pretty far in anticipating all the inputs and outputs. Training could take a very long time and be limited in subject matter expertise.

Also: My two favorite ChatGPT Plus features and the remarkable things I can do with them

But as we've come to realize, ChatGPT has very few limits in subject matter expertise. You can ask it to write a resume for the character Chief Miles O'Brien from Star Trek, have it explain quantum physics, write a piece of code, produce a short piece of fiction, and compare the governing styles of former presidents of the United States.

It would be impossible to anticipate all the questions that would ever be asked, so there is no way that ChatGPT could have been trained with a supervised model. Instead, ChatGPT uses non-supervised pre-training -- and this is the game-changer.

Non-supervised pre-training is the process by which a model is trained on data where no specific output is associated with each input. Instead, the model is trained to learn the underlying structure and patterns in the input data without any task in mind. This process is often used in unsupervised learning tasks, such as clustering, anomaly detection, and dimensionality reduction. In language modeling, non-supervised pre-training can train a model to understand the syntax and semantics of natural language so the model can generate coherent and meaningful text in a conversational context.

Also: Is ChatGPT Plus really worth $20 when the free version offers so many premium features?

It's here where ChatGPT's apparently limitless knowledge becomes possible. Because the developers don't need to know the outputs that come from the inputs, all they have to do is dump more and more information into the ChatGPT pre-training mechanism, which is called transformer-based language modeling.

Also: How AI companies are secretly collecting training data from the web (and why it matters)

It's also here, in the dumping of data into the AI, that modern chatbot makers have started to find themselves in trouble. AI companies have been training their AIs on copyrighted information from other companies without permission. In fact, some publishers, like Ziff Davis (ZDNET's parent company) and the New York Times, are suing OpenAI for copyright infringement. You've probably seen the disclaimer on ZDNET that says, "Disclosure: Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems."

This universal training approach does make the chatbots more capable. But the side effect is they are taking traffic away from the companies and writers who wrote the original content. Expect this aspect of generative AI to be fought in the courts for years to come.

But this article is about technology, so let's move on to a key technology that makes generative AI possible...

Transformer architecture

Transformer architecture is a type of neural network that is used for processing natural language data. A neural network simulates how a human brain works by processing information through layers of interconnected nodes. You can think of a neural network like a hockey team. Each player has a role, but they pass the puck back and forth among players with specific positions, all working together to score the goal.

The transformer architecture processes sequences of words by using "self-attention" to weigh the importance of different words in a sequence when making predictions. Self-attention is similar to how a reader might look back at a previous sentence or paragraph for the context needed to understand a new word in a book. The transformer looks at all the words in a sequence to understand the context and the relationships between them.

Also: How I used ChatGPT to quickly fix a critical plugin - without touching a line of code

The transformer is made up of several layers, each with multiple sub-layers. The two main sub-layers are the self-attention layer and the feedforward layer. The self-attention layer computes the importance of each word in the sequence, while the feedforward layer applies non-linear transformations to the input data. These layers help the transformer learn and understand the relationships between the words in a sequence.

During training, the transformer is given input data, such as a sentence, and is asked to make a prediction based on that input. The model is updated based on how well its prediction matches the actual output. Through this process, the transformer learns to understand the context and relationships between words in a sequence, making it a powerful tool for natural language processing tasks such as language translation and text generation.

One thing to remember is that there are issues around the potential for these models to generate harmful or biased content, as they may learn patterns and biases present in the training data. The companies implementing these models are trying to provide "guard rails," but those guard rails may themselves cause issues. Those concerns are because different people have different perspectives. An attempt to prevent bias based on one school of thought may be claimed as bias by another school of thought. This situation makes the design of a universal chatbot difficult because society is complex.

Also: 7 advanced ChatGPT prompt-writing tips you need to know

Let's discuss the data that gets fed into ChatGPT first, and then the user-interaction phase of ChatGPT and natural language.

ChatGPT's training datasets

The dataset used to train ChatGPT is huge. ChatGPT is based on something called a large language model, or LLM. Let's take a moment to clarify chatbot vs. LLM. A chatbot is essentially an app with a user interface. It takes in questions or prompts, feeds those to an LLM, and then retrieves the answers, formats them, and presents them to a user. Essentially, a chatbot is a UI shell. It's the LLM that provides the AI capability itself.

LLMs come in a wide variety of names and versions. Right now, the main ChatGPT LLM is GPT-4o. When ChatGPT burst onto the scene back in early 2023, the LLM was GPT-3. There are some LLMs, like OpenAI's o3, that spend more time reasoning, while others are better at interacting with human communication styles. Over time, the LLMs get better, and as a result, the chatbots themselves get more capable as well.

GPT is an acronym that covers three areas: it's generative (G), meaning it generates results; it's pre-trained (P), meaning it's based on all the data it ingests; and it uses the transformer architecture (T), which weighs text inputs to understand context.

GPT-3 was trained on a dataset called WebText2, a library of over 45 terabytes of text data. When you can buy a 16-terabyte hard drive for under $300, a 45-terabyte corpus may not seem that large. But text takes up a lot less storage space than pictures or video.

Also: How to subscribe to ChatGPT Plus (and 7 reasons why you should)

This massive amount of data allowed ChatGPT to learn patterns and relationships between words and phrases in natural language at an unprecedented scale, which is one of the reasons why it is so effective at generating coherent and contextually relevant responses to user queries.

While ChatGPT is based on the GPT architecture, it has been fine-tuned on multiple datasets and optimized for conversational use cases. This process allows it to provide a more personalized and engaging experience for users who interact with the technology via a chat interface.

For example, OpenAI (developers of ChatGPT) has released a dataset called Persona-Chat that is specifically designed for training conversational AI models like ChatGPT. This dataset consists of over 160,000 dialogues between two human participants, with each participant assigned a unique persona that describes their background, interests, and personality. This process allows ChatGPT to learn how to generate responses that are personalized to the specific context of the conversation.

Cornell Movie Dialogs Corpus: A dataset containing conversations between characters in movie scripts. It includes over 200,000 conversational exchanges between more than 10,000 movie character pairs, covering diverse topics and genres.
Ubuntu Dialogue Corpus: A collection of multi-turn dialogues between users seeking technical support and the Ubuntu community support team. It contains over one million dialogues, making it one of the largest publicly available datasets for research on dialog systems.
DailyDialog: A collection of human-to-human dialogues on multiple topics, ranging from daily life conversations to discussions about social issues. Each dialogue in the dataset consists of several turns and is labeled with a set of emotion, sentiment, and topic information.

In addition to these datasets, ChatGPT was trained on lots of unstructured data found on the internet, including websites, books, and other text sources. This allowed ChatGPT to learn about the structure and patterns of language in a more general sense, which could then be fine-tuned for specific applications like dialogue management or sentiment analysis.

ChatGPT is a distinct model trained using a similar approach to the GPT series but with some differences in architecture and training data.

Also: The best AI image generators of 2025: Gemini, ChatGPT, Midjourney, and more

Overall, the training data used to fine-tune ChatGPT is typically conversational and specifically curated to include dialogues between humans, allowing ChatGPT to learn how to generate natural and engaging responses in a conversational format.

Here's how to think of ChatGPT's unsupervised training: it was fed a lot of data and left to its own devices to find patterns and make sense of it all. This mechanism allowed the new generative AI systems to scale up so quickly.

While the pre-training process does the heavy lifting for ChatGPT's generative AI, the technology also has to understand questions and construct answers from data. That part is done by the inference phase, which consists of natural language processing and dialogue management.

What about human involvement in pre-training?

Despite the inherent scalability of non-supervised pre-training, there is some evidence that human assistance may have been involved in the preparation of ChatGPT for public use.

The big reveal was in an article in TIME Magazine that discussed human "data labelers" earning between $1.32 and $2/hour in Kenya. According to the TIME report, it was the responsibility of these workers to scan horrifying and sexually explicit internet content to flag it for ChatGPT training.

Also: Your next job? Managing a fleet of AI agents

Another article, posted in Martechpost, an AI newsletter, states that the large language model was trained using a process called Reinforcement Learning from Human Feedback (RLHF): "The training process involved a fine-tuned initial model using supervised learning, with human trainers playing both the role of the user and an AI assistant."

There are some subtleties around what the word "training" means. According to ChatGPT itself, "OpenAI did not use reinforcement learning with human feedback to train me. Instead, I was pre-trained using a combination of unsupervised and supervised learning techniques, such as language modeling, auto-encoding, and sequence prediction. My training involved processing massive amounts of text data from the internet, which allowed me to learn patterns and relationships between words and phrases."

The AI continued, "However, once I was pre-trained, researchers and developers can use reinforcement learning with human feedback to fine-tune me for specific tasks or domains, such as answering questions or generating text. In these cases, humans can provide feedback in the form of rewards or penalties, which can be used to update my parameters and improve my performance on similar tasks in the future."

Also: How I used ChatGPT and AI art tools to launch my Etsy business fast

This answer seems to fit with the Marktechpost and TIME reports, in that the initial pre-training was non-supervised, allowing a tremendous amount of data to be fed into the system. But in building the dialogue responses that communicate with users (more on that below), the response engines were apparently trained both on the types of responses and to filter out inappropriate material—and that training seems to have been human-assisted.

Natural language processing

Natural language processing (NLP) focuses on enabling computers to understand, interpret, and generate human language. With the exponential growth of digital data and the increasing use of natural language interfaces, NLP has become a crucial technology for many businesses.

NLP technologies can be used for many applications, including sentiment analysis, chatbots, speech recognition, and translation. By leveraging NLP, businesses can automate tasks, improve customer service, and gain valuable insights from customer feedback and social media posts.

Also: How to write better ChatGPT prompts

One of the key challenges in implementing NLP is dealing with the complexity and ambiguity of human language. NLP algorithms need to be trained on large amounts of data to recognize patterns and learn the nuances of language. They also need to be continually refined and updated to keep up with changes in language use and context.

The technology works by breaking down language inputs, such as sentences or paragraphs, into smaller components and analyzing their meanings and relationships to generate insights or responses. NLP technologies use multiple techniques, including statistical modeling, machine learning, and deep learning, to recognize patterns and learn from large amounts of data to accurately interpret and generate language.

Dialogue management

You may have noticed that ChatGPT can ask follow-up questions to clarify your intent or better understand your needs, and provide personalized responses that consider the entire conversation history.

This approach is how ChatGPT can have multi-turn conversations with users that feel natural and engaging. The process involves using algorithms and machine learning techniques to understand the context of a conversation and maintain it over multiple exchanges with the user.

Also: How to use ChatGPT to write code - and my top trick for debugging what it generates

Dialogue management is an important aspect of natural language processing because it allows computer programs to interact with people in a way that feels more like a conversation than a series of one-off interactions. This approach can help build trust and engagement with users and lead to better outcomes for both the user and the organization using the program.

Marketers, of course, want to expand how trust is built up, but this is also an area that could prove scary because it's one way an AI might be able to manipulate the people who use it.

A look inside the hardware that runs ChatGPT

Microsoft released a video that discusses how Azure is used to create a network to run all the computation and storage required by ChatGPT. It's a fascinating watch for its discussion of Azure and how AI is architected in real hardware.

Traditional chatbots operate on predefined rules and decision trees, responding to specific user inputs with predetermined answers. ChatGPT, on the other hand, utilizes generative AI, allowing it to produce unique responses by understanding context and intent, making interactions more dynamic and human-like.

Non-supervised pre-training allows AI models to learn from vast amounts of unlabeled data. This approach helps the model grasp the nuances of language without being restricted to specific tasks, enabling it to generate more diverse and contextually relevant responses.

Yes. ChatGPT relies on the data it was trained on, which means it might not always have information on recent topics or niche subjects. Additionally, its responses are generated based on patterns in the data, so it might occasionally produce factually incorrect answers or lack context. Plus, the data it's trained on may be wrong or even weaponized to be outright misleading.

And now you know

Even though we're over 3,200 words, this is still a rudimentary overview of all that happens inside ChatGPT. That said, perhaps now you understand more about why this technology has exploded over the past few years. The key to success is that the data itself isn't "supervised" and the AI can take what it's been fed and make sense of it.

Also: 6 new ways ChatGPT Projects supercharges your AI chats - how to try it

What do you think? Are you using ChatGPT? What questions do you still have about how it works? Share your opinions with us in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, and on YouTube at YouTube.com/DavidGewirtzTV.

Want more stories about AI? Sign up for Innovation, our weekly newsletter.

Read Entire Article