A few months ago, I created an LLM Prompt Organizer that reduces token counts while providing a framework for a more disciplined approach to prompt engineering. It's simply an HTML form that provides default prompt elements such as context, constraints, audience, etc. Users can disable elements or add custom ones. Once users populate the form fields, they can drop the JSON prompt into the LLM of their choice.

Prompt Organizer
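
I haven't reproduced the organizer's source here, but the core idea is simple enough to sketch. The snippet below is only an assumption about how the form might be serialized; the element names (context, constraints, audience, and so on) come from the form's field names, and disabled or empty elements are simply skipped so they add no tokens.

TypeScript

// Sketch: serialize the organizer's form fields into a JSON prompt.
// Field names and the disabled-element handling are assumptions; the
// actual tool's markup and logic may differ.
function buildJsonPrompt(form: HTMLFormElement): string {
  const prompt: Record<string, string> = {};
  for (const el of Array.from(form.elements)) {
    const field = el as HTMLInputElement | HTMLTextAreaElement;
    // Skip buttons, disabled (toggled-off) elements, and empty fields.
    if (!field.name || field.disabled || field.type === "submit" || !field.value?.trim()) {
      continue;
    }
    prompt[field.name] = field.value.trim();
  }
  return JSON.stringify(prompt);
}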

Was this JSON approach to prompt construction effective for overall token conservation?

After several months of use, I give it a strong maybe, with caveats...

It stands to reason that concise prompts pare down the number of request tokens, leading to longer sessions with fewer cutoffs than submitting requests in paragraph form. I wasn't sure whether an LLM would also use fewer tokens in its response, so I ran an ad hoc, not-so-scientific test in Gemini to quantify the total token delta between JSON and paragraph formatting.

Here's the JSON prompt and a semantically similar paragraph, submitted to Gemini for the token comparison:

JSON

{ 
"instructions": "Process and respond to this prompt.", 
"role": "Computer science educator for AI concepts", 
"context": "User seeks basic transformer model understanding", 
"task": "Explain architecture, key components, importance for LLMs", 
"format": "Introduction, three main components, brief conclusion", 
"examples": "Attention mechanism like highlighting key words for context", 
"constraints": "No heavy math, use analogies, under 400 words", 
"audience": "Basic tech literacy, no ML background" 
}

Paragraph

I'm a computer science educator who specializes in making complex AI concepts accessible to beginners. This explanation is intended for someone with basic tech literacy but no deep machine learning background. Explain the transformer architecture, its key components, and why it's important for LLMs. Structure your response with an introduction, followed by three main components (attention mechanism, architecture overview, and practical significance), and end with a brief conclusion. Keep the explanation clear, avoid heavy mathematical notation, and limit your response to under 400 words. Use analogies to illustrate concepts, for example, think of the attention mechanism like highlighting important words in a sentence to understand context better.

I used OpenAI's Tokenizer, in which one token corresponds to approximately four characters of text, to generate token counts for the initial prompt request and its response, running it against both the JSON structure and the paragraph version.

OpenAI's Tokenizer Tool
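
For a quick back-of-the-envelope check without the web tool, that roughly-four-characters-per-token rule of thumb can be scripted. This is only a heuristic, not the tokenizer's actual byte-pair encoding; the measured ratios in the results below run closer to 4.6 to 6 characters per token.

TypeScript

// Heuristic estimate only: assumes ~4 characters per token, per the rule
// of thumb above. Real tokenizers use byte-pair encoding, so actual counts
// will differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

const example = "Explain the transformer architecture in under 400 words.";
console.log(`~${estimateTokens(example)} tokens`); // ~14 by this heuristic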

Results

               JSON                 Paragraph
               Chars    Tokens      Chars    Tokens
Request        522      114         853      143
Response       3006     585         3460     671
Total tokens            699                  814

The JSON request and subsequent LLM response used a total of 115 fewer tokens (roughly a 14% reduction) than the natural-language counterpart. Rate limits are based, in part, on tokens-per-minute (TPM) usage, so any token efficiency is likely to lengthen LLM sessions.

Postmortem

A fatal flaw of my test, and of counting tokens in general, is that I was not able to quantify other factors contributing to the duration of an LLM session, such as the computational resources used, or how those factors are weighted. Moreover, each LLM has its own algorithm for this purpose. One of the principal downsides of the organizer itself is that, in trying to conserve tokens, the tool kneecaps a key strength of LLMs by favoring JSON-formatted prompts over natural-language ones. The organizer is also overkill for simple prompts like 'Is Brookings, Oregon really in a "banana belt," and what's a banana belt anyway?'

A virtuous byproduct of the LLM Prompt Organizer, as its name implies, is that it provides a framework for assembling prompts in a consistent, repeatable way while remaining flexible enough to accommodate custom elements. The tool also helps tighten the language within the prompt to avoid ambiguity and serves as a solid base for follow-up prompts.

In sum, the prompt organizer is a blunt tool for gaming the rate limits of LLMs. On the other hand, it still holds up as a prompt templating tool, which was my primary reason for creating it.

Currently, users must copy JSON-formatted prompts into their LLM manually. If I were to improve the tool, I'd pass the JSON prompt and other necessary parameters to the LLM API and handle the response received from the API within the prompt organizer.
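
A rough sketch of that integration follows, assuming OpenAI's Chat Completions endpoint as the target. The organizer is model-agnostic, so the URL, model name, and response handling below are placeholders, and in practice the API key would need to be proxied server-side rather than exposed in the page.

TypeScript

// Sketch: send the organizer's JSON prompt to an LLM API and return the
// reply for display in the page. Endpoint and model are example choices.
async function submitPrompt(prompt: Record<string, string>, apiKey: string): Promise<string> {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // placeholder model name
      messages: [{ role: "user", content: JSON.stringify(prompt) }],
    }),
  });
  if (!response.ok) {
    throw new Error(`LLM API request failed: ${response.status}`);
  }
  const data = await response.json();
  // The assistant's reply text lives in the first choice's message content.
  return data.choices[0].message.content;
}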
