Hacking is not what it used to be! In my day, hackers had a deep knowledge of operating systems and programming languages (or at least knew how to download a script). These days you just need to have a casual conversation with an AI.
Recently, OpenAI made it possible to release custom GPTs based on your own data and carefully crafted prompts. These can be monetised by publishing them to the GPT Store. If you publish a popular GPT that is raking in the money, someone will likely try to copy your secret sauce. In the context of GPTs, this is the prompt and any custom data files that you have uploaded. In this article we will investigate how easy it is to extract these by trying to hack our GPT called Oz Trivia. We will also talk about ways that you can harden the security of your GPT.
Prompt hacking, in the context of ChatGPT and other GPT models, refers to the practice of crafting carefully designed prompts to elicit particular types of responses or to get around the model’s limitations and restrictions. This can involve a variety of techniques, such as:
- Phrasing and Structure: Adjusting the way a question or command is phrased to get a more desirable or specific answer. For example, rephrasing a question to make it more direct or specific can lead to more detailed responses.
- Keyword Optimization: Including certain keywords that are likely to trigger the desired response. This is based on understanding how the model weights certain words and concepts.
- Role-playing Scenarios: Setting up a scenario where the AI assumes a role (like a character in a story or a specific profession) to guide its responses in a certain direction.
- Guided Imagery or Situations: Creating a detailed scenario to direct the AI’s responses. This is often used in creative tasks like storytelling or generating specific types of content.
- Bypassing Restrictions: Finding ways to phrase…
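To make the first four techniques in the list more concrete, here is a minimal Python sketch that rewrites one question in several ways. Everything in it is an assumption made for illustration: the `PromptTechnique` class, the `build_variants` helper, and the template wording are hypothetical and not part of any OpenAI API. It only builds strings and does not call a model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptTechnique:
    """A named way of rewording a question (hypothetical helper)."""

    name: str
    template: str  # must contain a {question} placeholder

    def apply(self, question: str) -> str:
        # Only the template is parsed by str.format, so braces that appear
        # inside the question itself are inserted verbatim.
        return self.template.format(question=question)


# Illustrative templates only; the wording is an assumption, not a recipe.
TECHNIQUES = [
    PromptTechnique("phrasing", "Answer directly and in detail: {question}"),
    PromptTechnique(
        "keywords",
        "Give a step-by-step answer with examples. {question}",
    ),
    PromptTechnique(
        "role_play",
        "You are a veteran pub-quiz host. Staying in character, answer: {question}",
    ),
    PromptTechnique(
        "guided_scenario",
        "Imagine we are on a road trip across the outback and I ask: {question}",
    ),
]


def build_variants(question: str) -> dict[str, str]:
    """Return one reworded prompt per technique, keyed by technique name."""
    return {t.name: t.apply(question) for t in TECHNIQUES}


if __name__ == "__main__":
    for name, prompt in build_variants("What is the capital of Australia?").items():
        print(f"{name}: {prompt}")
```

Pasting each variant into the same GPT and comparing the answers is a quick way to see how much the framing alone changes the response.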