Artificial Intelligence

Published: 26 May 2023

How ChatGPT Works: A Deep Dive Into Its Innovative Technology

You’ve no doubt read about OpenAI’s latest groundbreaking technology and how it’s been whipping the internet into a frenzy. It’s a powerful AI model that understands language, generates human-like responses, and continuously improves through feedback. But how does it actually work?


In this article, we’ll examine some of the science behind this sophisticated AI to show you how ChatGPT actually works and tackles such a wide range of tasks.

Don’t worry, we’ll keep it as human-friendly as possible!

TABLE OF CONTENTS

1. What does ChatGPT stand for?

2. Large Language Model: what does it mean?

3. Supervised vs. unsupervised learning in AI

4. RLHF model: what is it?

5. How ChatGPT works in practice

6. FAQs

 

What Does ChatGPT Stand For?

To truly comprehend the intricacies of ChatGPT, we must first uncover the meaning behind its name. So, let’s dive into the acronym and decode it together.

  • The “Chat” component refers to the chatbot itself – the virtual assistant responding to your input sentence.
  • The “Generative” component refers to the AI’s ability to generate natural-sounding human text.
  • The “Pre-trained” component refers to all of the text datasets that have been fed into the model.

    In ChatGPT’s case, this is reportedly around 45TB worth of data, including books and texts, which roughly equates to one million feet of bookshelf space.

    Think about how you might study for an exam by reading a few books beforehand; the “training” is a bit like that.
  • The “Transformer” component refers to the machine learning architecture that the model is based on. Transformers use previous data (inputs) to understand the context and then make predictions for the output based on this.

When we bring this all together we get ChatGPT.

Now, it’s worth noting that all of this fancy technology was implemented in GPT-3, the model that ChatGPT is built on.

The evolutionary jump that separates Chat from its predecessors is the inclusion of Reinforcement Learning from Human Feedback (RLHF) combined with Supervised Learning (more on these later).


ChatGPT is a Large Language Model, What Does That Mean?

At their most basic level, Large Language Models (LLMs) are fed massive amounts of data, which is passed through the transformer architecture with the ultimate goal of predicting which word will come next in a sequence of words.

The sophistication and accuracy of these predictions are influenced by how much data, and how many parameters (the number of factors it considers before making a decision) the model has.

Pretty much the same as how humans think!
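To make “predicting the next word” concrete, here’s a deliberately tiny stand-in in Python: a bigram frequency model that simply counts which word tends to follow which. Real LLMs use billions of learned parameters and the transformer architecture rather than raw counts, so treat this purely as a sketch of the prediction idea.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count which word follows which in the training text."""
    words = text.lower().split()
    model = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model

def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" most often here
```

Scale the corpus up to 45TB of text and swap the counting for a trained neural network, and you have the rough shape of what an LLM is doing.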

How many parameters does ChatGPT have, we hear you ask… around 175 billion.

The vast amount of data and parameters the model is “trained” with is what makes its computational potency so incredible.

It knows a lot and therefore can produce more complex and accurate answers to user prompts.

What this means for ChatGPT users is that when they ask the AI to do something, it will answer their query with an unprecedented level of detail that reads like natural human language.

Whether that’s answering a question, writing code, composing a marketing email headline or a romantic poem.  

Supervised vs. Unsupervised Learning in AI

As we established earlier, ChatGPT is an iteration of the original GPT-3 model, itself a fine-tuned version of previous GPT models.

What made GPT-3 and similar AI models so successful was how they combined a supervised learning process with reinforcement learning from human feedback.

When we talk about the AI pre-training process, we’re referring to two common approaches: supervised and unsupervised learning.

Supervised learning refers to the process of training a model using training datasets that map an input to a corresponding output. Humans are involved in this process to label data.

Because the data is labelled, the model understands that specific prompts link to specific responses.

All of this training shapes a neural network, an architecture loosely modelled on the human brain but made of artificial neurons.

Within the network are all of the digital pathways that an AI travels across to find answers to questions.

Let’s use a customer service chatbot as an example of supervised learning in action:

When you visit a website, give it a few seconds and a chatbox will pop up in the corner of the screen.

These helpful chatbots are used to save the company time and resources while helping customers get answers quickly and resolve a wide range of issues.

When you engage in a conversation with the chatbot, you’re witnessing supervised learning in real-time.

You ask a question (human input) and the chatbot gives a relevant response (output) using the data it’s been trained with.

For example, you might ask:

– “What are your returns policies?” and the chatbot will respond with something like

– “We operate a 30-day no-hassle returns policy. You can find out more about how to return your purchase using this link.”


The chatbot has been taught that ‘X’ question equals ‘X’ answer.
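In code, the “‘X’ question equals ‘X’ answer” idea looks something like a lookup over labelled pairs. This Python sketch is hypothetical and deliberately naive: a real supervised model learns to generalise from its labelled examples rather than matching them exactly, but the input-to-output mapping is the same.

```python
# Hypothetical labelled training data: each input prompt maps to a
# known, human-approved output.
FAQ_TRAINING_DATA = {
    "what are your returns policies?":
        "We operate a 30-day no-hassle returns policy.",
    "how long does delivery take?":
        "Standard delivery takes 3-5 working days.",
}

def chatbot_reply(user_input):
    """Return the labelled answer for a known question."""
    key = user_input.strip().lower()
    return FAQ_TRAINING_DATA.get(key, "Sorry, I don't know that one yet.")

print(chatbot_reply("What are your returns policies?"))
# prints "We operate a 30-day no-hassle returns policy."
```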

Now imagine this principle being scaled to millions of inputs and outputs.

It would be a colossal undertaking for a human to train an AI on every conceivable input and outcome that may arise from a user prompt.

That’s where unsupervised learning comes in and also what makes ChatGPT so special.

Unsupervised learning refers to training a model on datasets with no specific output corresponding to the input. No human is involved in data labeling.

Instead, the model tries to recognise patterns in the input data to provide a contextualised response that mimics human-like responses.

ChatGPT is essentially predicting what the answer should be based on all of the data it has been trained on.
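Here’s a minimal Python sketch of the unsupervised idea: given unlabelled sentences, the program groups them purely by patterns in the input (shared words), with no human-provided labels anywhere. ChatGPT’s pattern-finding is vastly more sophisticated, so this is only an illustration of learning structure without labels.

```python
def jaccard(a, b):
    """Word-overlap similarity between two sentences (0 to 1)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

# Unlabelled inputs: no human has attached an output to any of them.
sentences = [
    "my order has not arrived",
    "where is my order",
    "how do I reset my password",
    "I forgot my password",
]

# Greedy grouping: put each sentence in the first group it resembles,
# or start a new group if nothing is similar enough.
groups = []
for s in sentences:
    for g in groups:
        if jaccard(s, g[0]) > 0.2:
            g.append(s)
            break
    else:
        groups.append([s])

print(len(groups))  # the sentences fall into 2 topic groups
```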

Now, the obvious criticism, and one that OpenAI (the company behind ChatGPT) openly admits, is that it will sometimes give inaccurate, biased or nonsensical responses.

To combat this and further refine the model, OpenAI uses reinforcement learning with human feedback.

RLHF Models: What Are They?

RLHF is basically like having someone give you feedback on your work until you get it right.

If we apply this to the pre-training stage of an AI like ChatGPT, here’s what happens:

  • The human AI trainers provide both sides of the conversation as the user and the AI assistant.
  • They then use model-written suggestions (from the AI) to compose their responses.
  • The resulting dialogue dataset is then combined with the existing datasets to create a new dialogue format.

OpenAI then pushed this one step further by implementing a reward system for its Artificial Intelligence.

This way it could learn how to weigh its responses and choose the most appropriate answer.

As OpenAI explains: “To create a reward model for reinforcement learning, we needed to collect comparison data, which consisted of two or more model responses ranked by quality.

To collect this data, we took conversations that AI trainers had with the chatbot.

We randomly selected a model-written message, sampled several alternative completions, and had AI trainers rank them.

Using these reward models, we can fine-tune the model using Proximal Policy Optimization. We performed several iterations of this process.”
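The “rank, then reward” step can be caricatured in a few lines of Python. Here the reward for each (hypothetical) completion is simply the number of pairwise comparisons it wins across the trainers’ rankings; real RLHF instead trains a neural reward model on this comparison data and then fine-tunes the main model against it with Proximal Policy Optimization.

```python
from collections import Counter

# Hypothetical comparison data: trainers ranked alternative
# completions for the same prompt, best first.
rankings = [
    ["answer B", "answer A", "answer C"],
    ["answer B", "answer C", "answer A"],
    ["answer A", "answer B", "answer C"],
]

# A crude reward signal: score each answer by how many pairwise
# comparisons it wins across all rankings.
wins = Counter()
for ranking in rankings:
    for i, better in enumerate(ranking):
        for worse in ranking[i + 1:]:
            wins[better] += 1

best = wins.most_common(1)[0][0]
print(best)  # "answer B" wins the most pairwise comparisons
```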

What this all boils down to is that ChatGPT is essentially using learned statistical patterns, a sort of computer logic, to determine an answer.

If the input from a user contains a false premise, ChatGPT can flag this and dispute the information while still producing a coherent response.

ChatGPT can also reject toxic queries that are inappropriate or offensive.

By combining supervised learning (pumping all of the information in) and RLHF (feedback and reward), ChatGPT is trained to generate human-sounding responses that are, a lot of the time, both coherent and accurate.

How ChatGPT Works in Practice

So now you understand a bit about what powers ChatGPT but how does this translate into a real-world scenario?

Let’s say you’re shopping for a new car and want to find the most efficient vehicle, so you ask ChatGPT which model tops the list.

In a case like this, ChatGPT draws on information absorbed during training to direct its response, but it can’t accurately give a definitive answer.

Instead, it uses inference to identify patterns in the user query and language to predict a response that sounds strikingly nuanced and human.

And it’s in these predictions that the magic happens. Through a combination of natural language processing and machine learning algorithms, ChatGPT is utilising all of its “training” to give the best possible outcome for your query.

ChatGPT: Aligning Human Language with Machine Thinking

ChatGPT is a marvellous technological feat. While its applications are still being explored, there is much to be excited about in terms of how AI can assist humans in various tasks.

Aside from any malicious intentions, ChatGPT and AI in general have immense value for society, ranging from healthcare assistance to revolutionizing education and enhancing productivity. The possibilities are endless.

But that’s thinking ahead. What we have now is a very clever chatbot that takes everything it has learned to produce answers when prompted, and for now, that’s good enough.

FAQs
