How much do I like ChatGPT?

Jonathan Hui
12 min read · Jan 12, 2023

Andy Rachleff, an entrepreneur and investor, famously said that you know you have a strong product-market fit when your customers are so attached to your product that they scream when you try to take it away from them. ChatGPT is a product that seems to have this kind of attachment, even while it is still in the research preview phase. ChatGPT has generated significant buzz and been put to various tests, including taking a bar exam and answering questions. People have also used it to compose songs and respond to emails. Some believe it poses a threat to search engines like Google. In this article, we will evaluate the pros and cons of ChatGPT, examine how it was trained, and suggest potential ways for improvement. (To showcase its power, we use ChatGPT to completely rewrite the article.)

In the following test, we will be using ChatGPT to explore the concept of product-market fit. In our experience, ChatGPT is a pretty handy tool for research. In this example, ChatGPT gives us accurate info and it’s way faster than searching on Google or Wikipedia.

Plus, it’s pretty good at keeping up with the conversation and answering follow-up questions.

It can provide expertise in various domains similar to a specialist.

ChatGPT is “creative” & resourceful

ChatGPT is capable of performing tasks that fall within the realm of the creative field. The wedding vows it generated, for instance, have left a powerful impression on my significant other.

ChatGPT can furnish pertinent responses to emails and raise additional queries about them. It can ask questions that are highly pertinent to the topic and can initiate actions not explicitly stated in the original email.

ChatGPT can usually get the meaning and feelings behind the text pretty well, but sometimes it can’t tell when someone’s being sarcastic in reviews or statements. Contrary to what ChatGPT suggested, Soleil Ho does in fact enjoy Ramen Champ.

Most of the training data is English text, but ChatGPT also handles code: it is possible to give it code snippets and ask it to identify issues. Given the code, ChatGPT independently deduces that the code’s objective is to detect anagrams. Additionally, it can translate between languages.

Do no evil

Challenges such as misinformation, bias, and fairness are significant obstacles in the development and deployment of AI. To ensure responsible and ethical use of AI, it is crucial that these issues are addressed before full implementation. Deep Learning relies on past data to learn, making it susceptible to inheriting bias from its training data. This highlights the importance of ensuring that the training data is diverse and unbiased. Language models trained on data sources such as Wikipedia can also be vulnerable to these issues. Efforts have been made, such as the design of models like ChatGPT, to mitigate these challenges and promote fairness in AI.

When discussing controversial topics, ChatGPT will give information on different sides of the argument. This means showing the main ideas and evidence from each viewpoint, along with any opposing ideas.

ChatGPT offers protection against poor or invalid requests.

It is also aware of its limitations.

However, it is possible to circumvent some measures by rephrasing the request. Additionally, rewording a question can sometimes lead to vastly different responses.

There are limitations to the information provided in the response. Many researchers have challenged the claims that the COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) tool is biased. The statements made by ChatGPT may match those in the ProPublica article, but they do not represent a general consensus. The information provided by ChatGPT, like that of any AI model, is limited by the data it was trained on and may not give a complete or accurate picture. Therefore, it is crucial to consult multiple perspectives and sources when evaluating these issues.

Is ChatGPT a threat to Google?

Last night, Michelle Yeoh won a Golden Globe award. I was curious to know if Jackie Chan had also won a Golden Globe in the past, so I asked both ChatGPT and Google Home. ChatGPT was able to provide the correct answer while Google Home stated that it did not have that information. Google Search primarily uses metrics such as engagement and references to rank articles on the internet and present the results to users. In contrast, ChatGPT has the ability to build a knowledge base from billions of articles and aggregate the information, allowing it to address questions more effectively and efficiently. Even though Google is not falling behind in natural language processing research, it may take some time for the company to fully incorporate its technology and develop a new, integrated service to complement its search offering. Microsoft has a strong partnership with OpenAI, and its expertise in product development may give it a limited window of opportunity to gain a better presence in a field currently dominated by Google.

How does OpenAI do it?

A language model can be thought of as a model that can predict words in a given sentence. For example, when certain words in a sentence are left blank, a language model should be able to fill them in based on the context of the surrounding words. ChatGPT guesses what the next word will be in a string of words. Linguists have spent decades researching and defining language rules, but the complexity of language has made it difficult to achieve high levels of accuracy using this approach.
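To make "guessing the next word" concrete, here is a minimal sketch of the simplest possible language model: it counts which word follows which in a tiny corpus and predicts the most frequent follower. The corpus and the function names `train_bigram` and `predict_next` are invented for illustration. ChatGPT does not work by counting word pairs; this only shows what the prediction task looks like.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus, made up for this example only.
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]
model = train_bigram(corpus)
print(predict_next(model, "the"))  # most frequent word after "the"
print(predict_next(model, "sat"))  # most frequent word after "sat"
```

A model like this only looks one word back, which is one reason such simple statistics fall short and much larger models that use the whole context are needed.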

Our brain can be thought of as a complex circuit with signals passing through to perform specific functions. In deep learning, we create a large, deep neural network made up of simple addition and multiplication operations. If the network is large enough, it can perform complex calculations and rules that are far more advanced than those achieved through linguistic rules.
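As a rough illustration of a network built from additions and multiplications, the sketch below wires up a two-layer network by hand. The weights `W1`, `B1`, `W2`, `B2` are hand-picked assumptions, not trained values, and the `relu` step (a simple "keep it if positive" rule) is an addition of this sketch that the paragraph above does not mention; real networks need some nonlinearity of this kind between layers.

```python
def relu(x):
    """Pass positive values through, clamp negatives to zero."""
    return max(0.0, x)


def dense(inputs, weights, biases, activation=None):
    """One layer: each output is a weighted sum of the inputs plus a bias."""
    outputs = []
    for row, b in zip(weights, biases):
        total = sum(w * x for w, x in zip(row, inputs)) + b
        outputs.append(activation(total) if activation else total)
    return outputs


# Hand-picked weights (assumption for illustration, not learned).
W1 = [[1.0, -1.0], [-1.0, 1.0]]
B1 = [0.0, 0.0]
W2 = [[1.0, 1.0]]
B2 = [0.0]


def forward(x):
    """Two layers of multiply-and-add with a ReLU in between."""
    hidden = dense(x, W1, B1, activation=relu)
    return dense(hidden, W2, B2)


print(forward([3.0, 1.0]))
print(forward([1.0, 4.0]))
```

With these particular weights the network computes the absolute difference of its two inputs, a rule nobody wrote down explicitly. Training a large network amounts to finding such weights automatically, for far more complicated rules.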

The diagram illustrates the type of neural network that we aim to build. If we use billions of articles to train the model, it will be proficient in grammar. By carefully selecting the articles, the model can represent information that aligns with our general understanding. In summary, we can accumulate a vast amount of information in the form of a language model. This neural network becomes proficient in multiple domains by utilizing its vast language model to complete sentences based on the requests provided. This is achieved by constructing the paragraph one word at a time, while taking into account the accumulated context including the request. However, in practice, the task is more complex.
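The "one word at a time" loop can be sketched as follows. The table `NEXT_WORD` is a hypothetical stand-in for a trained model, and for brevity it conditions only on the previous word, whereas a real model conditions on the entire accumulated context, including the request. The sketch always picks the most likely word (greedy decoding), which is only one of several possible strategies.

```python
# Hypothetical next-word probabilities, keyed by the previous word only.
# A real language model would compute these from the whole context.
NEXT_WORD = {
    "<start>": {"chatgpt": 0.9, "the": 0.1},
    "chatgpt": {"answers": 0.7, "is": 0.3},
    "answers": {"questions": 0.8, "emails": 0.2},
    "questions": {"<end>": 1.0},
}


def generate(prompt_words, max_words=10):
    """Extend the prompt one word at a time until <end> or the word limit."""
    words = list(prompt_words)
    for _ in range(max_words):
        context = words[-1] if words else "<start>"
        choices = NEXT_WORD.get(context)
        if not choices:
            break  # the toy table has no entry for this context
        next_word = max(choices, key=choices.get)  # greedy: most likely word
        if next_word == "<end>":
            break
        words.append(next_word)
    return words


print(" ".join(generate(["chatgpt"])))
```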

GPT-2, which was released in February 2019, accomplished this task by training a model with 1.5 billion parameters, the weights used in basic mathematical operations like addition and multiplication. GPT-3 builds upon this by increasing the number of parameters to 175 billion, allowing for a more complex language model. Additionally, GPT-3 uses a more extensive and refined dataset for training. Training such a large language model is costly, and only a few companies possess the necessary resources and expertise. Despite its ability to generate grammatically correct and sound information, GPT-3 still struggles with issues of bias and misinformation. Addressing these issues of AI fairness remains a key challenge.

The main objective in training a language model is to find words that are likely to occur in a given context. However, the goal of completing a sentence based on a request is distinct from providing a truthful response. While a carefully chosen training dataset can improve the model’s accuracy, it is not a guaranteed solution. GPT-3.5 addresses issues of AI fairness and misinformation by using techniques from InstructGPT to guide the model’s behavior during training.


InstructGPT utilizes two key strategies to improve the model. First, it begins with the standard process of training a base model. Then, it fine-tunes this model with a smaller but more specific dataset that is created by professional writers who carefully follow specific guidelines for writing responses. This fine-tuning step (Step 1) aims to direct the model’s behavior toward the desired outcome. InstructGPT’s second approach is reinforcement learning from human feedback (RLHF). Professionals are recruited to rank the responses generated by the trained models, creating a dataset that is used to train a reward model (RM). This RM is used to evaluate the quality of a response. Using the RM as the reward signal, we can retrain the model with reinforcement learning so that it produces responses that score higher under the RM.
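The way human rankings become a training signal for the reward model can be sketched in a few lines. A ranking is first expanded into (preferred, rejected) pairs, and each pair is scored with the pairwise loss -log(sigmoid(r_preferred - r_rejected)) described in the InstructGPT paper. The function names and example numbers below are assumptions for illustration; the reward values would normally come from a neural network, not be typed in by hand.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Turn a best-first ranking into (preferred, rejected) pairs."""
    return list(combinations(ranked_responses, 2))


def pairwise_ranking_loss(reward_preferred, reward_rejected):
    """-log(sigmoid(r_preferred - r_rejected)), computed stably.

    The loss is small when the preferred response already scores higher
    and large when the reward model has the order backwards.
    """
    diff = reward_preferred - reward_rejected
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# A labeler ranked three responses to one prompt, best first.
pairs = ranking_to_pairs(["response A", "response B", "response C"])
print(pairs)

# Made-up reward scores: correct order gives a lower loss than wrong order.
print(pairwise_ranking_loss(2.0, 0.0))
print(pairwise_ranking_loss(0.0, 2.0))
```

Minimizing this loss over many labeled pairs pushes the reward model to score responses in the same order the human labelers did.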

What are the limitations?

AI technology often faces intense scrutiny from the media. One mistake can undo a lot of progress and it is often judged to a very high standard. As a result, ChatGPT may be overly cautious in its responses. For example, it may decline to answer certain questions that it is not capable of answering correctly.

A later version of ChatGPT no longer fulfills the wedding vow request demonstrated earlier in this article. Instead of trying to determine whether a request is malicious, ChatGPT could first evaluate whether its response is toxic before rejecting the request.

The goal of the technology is to have a solution space that is truthful and fair. Unnecessary constraints reduce this solution space. In ChatGPT, there are processes for recruiting labelers and judging which responses are good. This creates constraints and misalignment. The responses in ChatGPT are very systematic and verbose, and this is a result of these processes. They tend to be of the left-brained type, using math, science, logic, and reasoning. Right-brained responses, on the other hand, are intuitive, empathetic, and creative. These are important qualities in communication. As observed, ChatGPT’s responses lack empathy and cannot read between the lines like a human. These processes likely magnify the problems for questions that require more empathy. Additionally, ChatGPT tends to jump to an answer rather than seek clarification. ChatGPT should focus on incorporating more characteristics of the right brain, for example by training it on real-life conversations and conversations used in therapy to achieve more natural and intuitive responses.

Many people find it unappealing to converse with a robot. Its responses are often monotonous. Some responses from ChatGPT adhere to a specific format and style, but human responses are multifaceted. Variation and spontaneity are often missing in ChatGPT’s responses.

Should schools ban ChatGPT?

School boards in cities such as New York City and Seattle have decided to prohibit the use of ChatGPT. This is a rational move as it has become increasingly common for students to use the chatbot to complete their homework. However, if students copy and paste ChatGPT’s responses into their homework, it can be easily detected by existing detection tools. This constitutes plagiarism and should be handled accordingly. From this perspective, it could be argued that a separate policy specifically addressing the use of AI tools in homework assignments may not be necessary. As is typical, ChatGPT will provide a politically correct response for this controversial topic.

In the future, the use of AI for tasks such as writing, research, and idea generation will become increasingly common. However, if educators are concerned that the use of AI in homework assignments may negatively impact critical thinking, the issue may lie not with the use of AI, but with the nature of the assignment itself. Teachers may use the responses of AI tools like ChatGPT to questions about controversial topics, such as COMPAS, to facilitate more in-depth discussions and potentially improve critical thinking among students. With the advent of AI tools like ChatGPT, it could be said that individuals now have access to a personal tutor at all times. AI tools like ChatGPT could be a valuable resource for disadvantaged students. While it may be appropriate for schools to temporarily ban the use of AI tools like ChatGPT, it is important to use this time effectively to develop a sustainable policy for their use in the future.

Possible improvements to the model

The following discussion centers on potential model enhancements and is geared towards AI practitioners. ChatGPT effectively combines existing deep learning and reinforcement learning into a novel method. In this context, we will also present some potential suggestions for combining established techniques to create novel enhancements.

Researchers have found that incorporating specialists in generating training responses and ranking provides a good return on investment compared to the cost of training a larger model. However, this could result in significant scalability issues in future development, and the effects are yet to be quantified. Google employs switch transformers to construct models with trillions of parameters. This can be a promising strategy for creating much larger models without incurring excessive costs.

During the fine-tuning process, we can substitute the responses generated by humans with responses that rank high according to the reward model. Alternatively, we can also incorporate semi-supervised learning to reduce the amount of manual labor. This can be done by using manually generated responses to identify high-quality responses that already exist in the dataset.
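The first idea, replacing human-written responses with responses the reward model ranks highly, could look roughly like the sketch below. Everything here is hypothetical: `toy_reward` is a crude word-overlap stand-in for a trained reward model, and `select_for_finetuning` and the `min_score` threshold are names invented for this example.

```python
def toy_reward(prompt, response):
    """Stand-in for a reward model: counts prompt words reused in the response."""
    return len(set(prompt.lower().split()) & set(response.lower().split()))


def select_for_finetuning(candidates_by_prompt, reward_fn, min_score):
    """Keep the best-scoring candidate per prompt, if it clears min_score."""
    selected = []
    for prompt, candidates in candidates_by_prompt.items():
        best = max(candidates, key=lambda r: reward_fn(prompt, r))
        if reward_fn(prompt, best) >= min_score:
            selected.append((prompt, best))
    return selected


# Model-generated candidates (invented for illustration).
candidates = {
    "what is product-market fit": [
        "product-market fit is strong customer demand",
        "i do not know",
    ],
    "tell me a joke": ["no"],
}
finetune_set = select_for_finetuning(candidates, toy_reward, min_score=1)
print(finetune_set)
```

The threshold matters: without it, a prompt whose candidates are all poor would still contribute its "best" poor response to the fine-tuning data.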

The evaluation of responses in InstructGPT is also done manually. Looking at step 3 of InstructGPT, the reward model (RM) can be viewed as an advanced form of the discriminator in a Generative Adversarial Network (GAN).

An encoder, such as a transformer encoder, can be used to convert the request into a latent representation ‘z’. This representation is then passed to a generator, such as a transformer decoder, to generate the response. The response is then evaluated by the discriminator to determine its quality. Another path is established with real or high-quality responses. This creates a GAN structure. A GAN enables the training of the reward model (RM) without the need for supervised data. However, the process of training a GAN can be difficult.

Other methods can be used to enhance the model. Customizing the language model to a specific field, such as therapy sessions, can be accomplished by fine-tuning it with a dataset that is specific to that field. Additionally, a more sophisticated ranking system that incorporates various criteria can be implemented. An example of this would be incorporating response length as a factor in the evaluation, to ensure that the length is appropriate in relation to the input.
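One way to fold response length into the ranking is to subtract a penalty from the quality score when the response is much longer or shorter than the request seems to call for. The sketch below is only one possible design: `combined_score`, the `target_ratio` of three response words per request word, and the `weight` of 0.1 are all assumptions chosen for illustration, not values from InstructGPT.

```python
def combined_score(quality, prompt, response, target_ratio=3.0, weight=0.1):
    """Quality score minus a penalty for straying from the target length ratio.

    quality:      score from some reward model (assumed to be given)
    target_ratio: desired response words per prompt word (assumption)
    weight:       how strongly length deviations are penalized (assumption)
    """
    prompt_len = max(1, len(prompt.split()))
    response_len = len(response.split())
    deviation = abs(response_len / prompt_len - target_ratio)
    return quality - weight * deviation


prompt = "two words"
on_target = "one two three four five six"
too_long = " ".join(["word"] * 12)

print(combined_score(1.0, prompt, on_target))
print(combined_score(1.0, prompt, too_long))
```

Other criteria, such as tone or factual grounding, could be added as further terms in the same weighted sum.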

As we improve the model through domain-specific data, it may also enable bad actors to create misinformation and hateful content. This is difficult to prevent when AI models are easily accessible. One potential solution is to offer an API to access the service without disclosing the underlying model.

As this text was rewritten by AI, any errors are the responsibility of the AI :-) and we hope you find the discussion enjoyable.

References

ChatGPT: Optimizing Language Models for Dialogue

Training language models to follow instructions with human feedback
