Tokens

A token is a small chunk of text (often a word, part of a word, or a punctuation mark) that the AI model uses to process your message and generate a reply. Both what you enter and what the model sends back count toward your token limit. When you write short, focused prompts and ask for concise answers, you use fewer tokens overall. That means your sessions can last longer, responses come faster, and you're less likely to hit token limits or get cut off mid-reply.

Conserve tokens = longer sessions + faster responses + fewer cutoffs
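As a rough illustration of budgeting tokens in practice, here is a minimal sketch that estimates a message's token cost and trims it to fit a budget. It uses the common ~4-characters-per-token rule of thumb, which is only an approximation; real tokenizers vary by model, and the function names here are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token (varies by model)."""
    return max(1, len(text) // 4)

def trim_to_budget(text: str, max_tokens: int) -> str:
    """Drop whole words from the end until the rough estimate fits the budget."""
    words = text.split()
    while words and estimate_tokens(" ".join(words)) > max_tokens:
        words.pop()
    return " ".join(words)

message = "word " * 400          # a long, repetitive example message
print(estimate_tokens(message))  # rough cost of the full message
trimmed = trim_to_budget(message, 100)
print(estimate_tokens(trimmed))  # now at or under the 100-token budget
```

A real application would trim at sentence boundaries or summarize instead of cutting words, but the budgeting idea is the same.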

1. Prompt Design

Do

Don’t


2. Conversation Management


3. Handling Large Text


4. LLM-Specific Tips

LLM — Efficiency Tactics
ChatGPT — The GPT-4 tokenizer is efficient. Reuse previous messages and cap output length.
Claude — Handles long input well; summarize documents instead of feeding in the full text.
Gemini — Large context window; the main limit is requests per day. Keep prompts precise and outputs short.

5. Output Control


Approximate Token Use

Word Count ≈ Tokens
75 words ~100 tokens
500 words ~650 tokens
1,000 words ~1,300 tokens
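The table above implies a ratio of roughly 1.3 tokens per word. A quick back-of-the-envelope converter (an approximation only; real tokenizers differ, and this helper name is hypothetical):

```python
def words_to_tokens(word_count: int) -> int:
    """Approximate token count from word count using the ~1.3 tokens/word ratio."""
    return round(word_count * 1.3)

for words in (75, 500, 1000):
    print(f"{words} words ≈ {words_to_tokens(words)} tokens")
```

This gives ballpark figures in line with the table; for exact counts you would use the specific model's tokenizer.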
