ChatGPT — Search Accuracy, Hallucinations, and Prompt Engineering

David Such
10 min read · Mar 18, 2023

With OpenAI releasing the latest iteration of its large language model, GPT-4 (Generative Pre-trained Transformer 4), we thought we would take its free little brother for a spin. ChatGPT currently uses GPT-3.5, a smaller and faster version of the GPT-3 model.

In this article we will investigate using large language models (LLMs) for search applications, illustrate some of the issues with this approach, including hallucinations, and finally explain how you can use prompt engineering to fine-tune answer style, context, and content. In part 2 of this series, we will explore using LLMs to develop software.

A large language model is a type of machine learning model that is trained on a huge corpus of data to generate outputs for various language-processing tasks, such as text generation, question answering, and machine translation. ChatGPT is fine-tuned using Reinforcement Learning from Human Feedback (RLHF).

Source: DALL-E, Prompt: ChatGPT and artificial intelligence replacing programmers, digital art

We love the progress that is being made in Machine Learning (ML) and predictive Large Language Models (LLMs), but believe users need to be cautious in their application and aware of the pitfalls. LLMs are being spruiked as a replacement for search engines (*cough* Microsoft *cough*), but until they stop hallucinating and making up facts, caution is warranted. It is interesting that Google have not yet released their LLM, Bard, for general consumption. We suspect it is because they haven't stopped the hallucinations either.

We do find LLMs useful for idea generation, image construction (we can't draw), and boilerplate code. There is a lot of breathless commentary about ChatGPT replacing programmers, which is definitely not the case. While you can use models like ChatGPT and Copilot to develop simple apps, we would argue that you still need someone to architect the software and test/confirm that it meets its objectives. In order to do this you need to understand the code generated. We can't see ChatGPT being used to develop the flight control firmware for the next mission to Mars, but more on this in the next article.

LLMs depend on being trained on relevant example code or text; if this doesn't exist, then they struggle. You could, however, put together parts of your code using these models, and this is where prompt engineering…
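To give a flavour of what prompt engineering looks like in practice, here is a minimal sketch. The `build_prompt()` helper is hypothetical (it is not from this article or any library); it simply assembles the message list you would send to a chat-style API such as OpenAI's, using a system message to pin down answer style and supplying context to constrain content — one common tactic for reducing hallucinated facts.

```python
# Minimal sketch of prompt engineering for a chat-style LLM.
# build_prompt() is a hypothetical helper: it assembles the message
# list that would be passed to a chat completion endpoint.

def build_prompt(question, style="concise", context=None):
    """Assemble chat messages: a system message fixes answer style,
    and optional context grounds the answer in supplied source text."""
    system = (
        f"You are a helpful assistant. Answer in a {style} style. "
        "If the answer is not in the provided context, say you don't know."
    )
    messages = [{"role": "system", "content": system}]
    if context:
        # Supplying source text is a common way to constrain content
        # and discourage the model from inventing facts.
        messages.append({"role": "system", "content": f"Context: {context}"})
    messages.append({"role": "user", "content": question})
    return messages

# Example: the same question, with style and context pinned down.
prompt = build_prompt(
    "When was GPT-4 released?",
    style="one-sentence",
    context="OpenAI announced GPT-4 on 14 March 2023.",
)
```

The point is less the code than the structure: style, context, and content are all steered by what you put in the prompt, not by changing the model.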


Written by David Such

Reefwing Software · Embedded Systems Engineer · iOS & AI Development · Robotics · Drones · Arduino · Raspberry Pi · Flight Control
