Introduction
The ChatGPT API is the easiest way to add real AI capability to your Python projects — chatbots, summarizers, classifiers, code assistants, agents. The hard part isn't making a single call. It's understanding the patterns that turn that one call into a production-ready app.
In this tutorial, we'll build four working examples, each more advanced than the last:
- Your first API call
- Streaming responses in real time
- Multi-turn conversations with memory
- Function calling for tool-using agents
We'll finish with a complete CLI assistant you can actually use.
Setup: Install and Authenticate
First, install the official OpenAI Python SDK:
pip install openai python-dotenv
Get your API key from platform.openai.com/api-keys. Never commit it to git. Store it in a .env file at the root of your project:
OPENAI_API_KEY=sk-proj-your-key-here
Load it once in your script and you're authenticated:
import os
from dotenv import load_dotenv
from openai import OpenAI
load_dotenv()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
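That client object is all you need for the rest of this tutorial. The later examples call OpenAI() with no arguments, which reads OPENAI_API_KEY from the environment, so when you run one on its own, call load_dotenv() first or export the variable in your shell.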
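A missing key otherwise shows up as a bare KeyError or an authentication failure on the first request. One way to make that failure readable is a small check before the client is created. This is a minimal sketch; `require_api_key` is a hypothetical helper written for this tutorial, not part of the OpenAI SDK.

```python
import os


def require_api_key(env=os.environ, name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from `env`, or raise with a readable message.

    Hypothetical helper for this tutorial; not an SDK function.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return key
```

After load_dotenv(), you could then write client = OpenAI(api_key=require_api_key()) so a misconfigured environment fails early with a clear message.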
1. Your First API Call
The simplest possible call — send a prompt, get a response:
from openai import OpenAI

client = OpenAI()

def ask(prompt: str, model: str = "gpt-4o-mini") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a concise, helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    answer = ask("Explain Python decorators in two sentences.")
    print(answer)
The three things to understand about every API call:
| Parameter | What it does |
|---|---|
| model | Which model to use. gpt-4o-mini is cheap and fast; gpt-4o is smarter and more expensive. |
| messages | A list of role-tagged messages. system sets behavior, user is the prompt, assistant is past replies. |
| temperature | Controls randomness on a 0–2 scale. 0 is the most predictable (close to deterministic, though not guaranteed); higher values give more varied output. Use 0–0.3 for factual tasks, 0.7+ for writing. |
That's the whole foundation. Everything else is variations on this.
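To build some intuition for temperature, here is a toy sketch of the textbook technique: divide the model's raw scores by the temperature before converting them to probabilities. The scores in `toy_logits` are made up, and treating this as what happens inside OpenAI's models is an assumption; the point is only to show why low values concentrate probability on the top choice.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 0.7, 1.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the lowest temperature nearly all of the probability lands on the first candidate, which is why low settings suit factual tasks where you want the same answer each time.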
2. Streaming Responses in Real Time
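Without streaming, users can wait several seconds staring at a blank screen while the full response is generated. With streaming, tokens appear as they're produced, so the first words show up almost immediately and the app feels much faster even though total generation time is about the same: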
from openai import OpenAI

client = OpenAI()

def ask_streaming(prompt: str, model: str = "gpt-4o-mini"):
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    full_response = ""
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)
        full_response += delta
    print()  # final newline
    return full_response

if __name__ == "__main__":
    ask_streaming("Write a haiku about debugging Python at 3am.")
Two things to notice: stream=True in the request, and flush=True in print so output appears immediately instead of being buffered. Use streaming for any user-facing interface. Skip it for background jobs where you just need the full response.
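The accumulation loop is easy to try without spending tokens. The sketch below fakes a stream with objects that only mimic the chunk.choices[0].delta.content shape used above; `fake_stream` and `collect` are hypothetical names, and the stand-in objects are not the SDK's real chunk type. The empty-choices guard is a defensive assumption, for setups where a chunk arrives without any choices.

```python
from types import SimpleNamespace


def fake_stream(pieces):
    """Yield objects shaped like streaming chunks (a stand-in, not the SDK type)."""
    for text in pieces:
        delta = SimpleNamespace(content=text)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def collect(stream) -> str:
    """Join the text deltas from a stream, treating missing content as empty."""
    parts = []
    for chunk in stream:
        if not chunk.choices:  # defensive: skip chunks that carry no choices
            continue
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)


text = collect(fake_stream(["Hel", "lo", None, "!"]))
print(text)  # prints "Hello!"
```

The None piece stands in for a chunk whose delta has no text, which is why the original loop uses `or ""` before concatenating.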
3. Multi-Turn Conversations With Memory
The ChatGPT API is stateless — it does not remember previous messages. You remember them, by passing the entire conversation history with every call:
from openai import OpenAI

client = OpenAI()

class Chatbot:
    def __init__(self, system_prompt: str, model: str = "gpt-4o-mini"):
        self.model = model
        self.messages = [{"role": "system", "content": system_prompt}]

    def send(self, user_message: str) -> str:
        self.messages.append({"role": "user", "content": user_message})
        response = client.chat.completions.create(
            model=self.model,
            messages=self.messages,
        )
        reply = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": reply})
        return reply

    def reset(self):
        self.messages = self.messages[:1]  # keep only system prompt

if __name__ == "__main__":
    bot = Chatbot("You are a friendly Python tutor.")
    print(bot.send("What is a list comprehension?"))
    print(bot.send("Show me an example with filtering."))
    print(bot.send("Now rewrite it as a for-loop."))
The bot now "remembers" because each call sees the full history. Be careful with long conversations: every message is re-sent on every turn, so the tokens in each request grow linearly with the length of the conversation, and the cumulative cost across all turns grows roughly quadratically. For long chats, summarize older messages or truncate to the last N turns.
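Truncating to the last N turns can be sketched in a few lines, alongside a back-of-the-envelope estimate of how re-sent history adds up. Both `trim_history` and `cumulative_tokens` are hypothetical helpers, and the estimate assumes every message is the same size and ignores the system prompt, which real conversations will not match exactly.

```python
def trim_history(messages, max_turns):
    """Keep the system prompt plus the last `max_turns` user/assistant pairs."""
    system = [m for m in messages[:1] if m["role"] == "system"]
    rest = messages[len(system):]
    # rest[-0:] would return everything, so handle zero turns explicitly.
    keep = rest[-2 * max_turns:] if max_turns > 0 else []
    return system + keep


def cumulative_tokens(turns, tokens_per_message):
    """Total input tokens over `turns` calls, assuming fixed-size messages."""
    # Call k re-sends the 2 * (k - 1) earlier messages plus the new user message.
    return sum((2 * (k - 1) + 1) * tokens_per_message for k in range(1, turns + 1))


print(cumulative_tokens(10, 100))  # 10000
print(cumulative_tokens(20, 100))  # 40000
```

Under those assumptions, doubling the conversation from 10 to 20 turns quadruples the total input tokens. Call trim_history after the assistant reply has been appended so the kept window starts on a user message; a history that includes tool calls would need extra care to keep each call together with its result.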
4. Function Calling: Let the Model Use Your Code
This is where the API stops being a chatbot and starts being an agent. You define Python functions; the model decides when to call them and with what arguments:
import json
from openai import OpenAI

client = OpenAI()

# 1. Define the actual Python functions
def get_weather(city: str) -> dict:
    # In real code, call a weather API. Here, just mock it.
    return {"city": city, "temp_c": 22, "condition": "sunny"}

def calculate(expression: str) -> float:
    # WARNING: eval is unsafe for untrusted input. Use a real parser.
    return eval(expression, {"__builtins__": {}}, {})

# 2. Describe them to the model as tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Evaluate a math expression like '23 * 4 + 9'.",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"],
            },
        },
    },
]

# 3. Map names back to real functions
AVAILABLE_FUNCTIONS = {"get_weather": get_weather, "calculate": calculate}

def run_agent(user_message: str, model: str = "gpt-4o-mini") -> str:
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = client.chat.completions.create(
            model=model, messages=messages, tools=tools,
        )
        msg = response.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:
            return msg.content  # done
        # Execute each tool call the model asked for
        for call in msg.tool_calls:
            fn = AVAILABLE_FUNCTIONS[call.function.name]
            args = json.loads(call.function.arguments)
            result = fn(**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            })

if __name__ == "__main__":
    print(run_agent("What's the weather in Lahore, and what's 12% of 850?"))
The flow: model receives the question → decides it needs get_weather("Lahore") and calculate("0.12 * 850") → you run them → send results back → model writes the final answer. That's the foundation of every "AI agent" framework on the market.
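The calculate tool above relies on eval, which its own comment warns against. One possible replacement is to parse the expression with Python's ast module and walk only the node types you allow. This is a minimal sketch; `safe_calculate` is a hypothetical name, and the supported operators (+, -, *, /, % and unary signs) are an arbitrary choice. Exponentiation is left out on purpose because a short input can produce an enormous computation.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression: str) -> float:
    """Evaluate basic arithmetic on numeric literals only."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        # Names, calls, attribute access, etc. all end up here.
        raise ValueError("unsupported expression")

    return float(walk(ast.parse(expression, mode="eval")))


print(safe_calculate("23 * 4 + 9"))  # 101.0
```

Swapping it in would mean pointing the "calculate" entry of AVAILABLE_FUNCTIONS at safe_calculate. Since the model can send malformed input, the agent loop may also want to catch ValueError, SyntaxError, and ZeroDivisionError and return the error text as the tool result.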
Putting It All Together: A CLI Assistant
Here's a complete, working terminal assistant that combines streaming, memory, and the chat interface:
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
client = OpenAI()

SYSTEM_PROMPT = """You are a senior Python developer.
Answer questions clearly, give code examples, and be honest
when you don't know something."""

def run_assistant():
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    print("Python Assistant — type 'exit' to quit, 'reset' to clear context.\n")
    while True:
        try:
            user_input = input("you ▸ ").strip()
        except (EOFError, KeyboardInterrupt):
            print()
            break
        if user_input.lower() == "exit":
            break
        if user_input.lower() == "reset":
            messages = messages[:1]
            print("(context cleared)\n")
            continue
        if not user_input:
            continue
        messages.append({"role": "user", "content": user_input})
        print("ai ▸ ", end="", flush=True)
        stream = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            stream=True,
        )
        reply = ""
        for chunk in stream:
            delta = chunk.choices[0].delta.content or ""
            print(delta, end="", flush=True)
            reply += delta
        print("\n")
        messages.append({"role": "assistant", "content": reply})

if __name__ == "__main__":
    run_assistant()
Save it as assistant.py, run python assistant.py, and you have a working AI assistant in your terminal. Under 50 lines of code.
Production Checklist
Before you ship anything using the ChatGPT API, run through this:
| Concern | What to do |
|---|---|
| Secrets | Use environment variables, never hardcode keys |
| Cost | Set hard usage limits in the OpenAI dashboard |
| Errors | Wrap calls in try/except and retry with exponential backoff |
| Latency | Stream responses for any user-facing UI |
| Token limits | Truncate or summarize long conversation history |
| Safety | Validate user input; don't blindly eval model output |
| Logging | Log prompts + responses for debugging (scrub PII) |
The official SDK has built-in retry logic for some transient failures, but those retries are limited, so you should still handle RateLimitError and APIError explicitly so failures are visible, not silent.
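The "retry with exponential backoff" row can be sketched without touching the network. `with_backoff` is a hypothetical helper, and the attempt count, base delay, and jitter range are illustrative assumptions rather than recommended values.

```python
import random
import time


def with_backoff(fn, retry_on, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a `retry_on` exception, wait and try again, then re-raise."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the failure
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
```

With the ask function from section 1, a call might look like with_backoff(lambda: ask("hi"), retry_on=openai.RateLimitError) after import openai. Errors that a retry cannot fix, such as an invalid request, are better left out of retry_on so they surface immediately.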
Install Everything
pip install openai python-dotenv
That's the entire stack. The OpenAI SDK gives you streaming, function calling, file uploads, vision, and embeddings out of the box.
Final Thought
The four patterns in this tutorial — single call, streaming, memory, and tool use — are the building blocks of every AI product you'll ever build. Chatbots, customer support agents, code reviewers, research assistants, internal tools: all of them are remixes of these four ideas.
The API itself takes ten minutes to learn. The interesting work is everything that wraps around it — prompt design, evaluation, cost control, and UX. Start with the CLI assistant above, then build the version that solves a real problem in your own workflow. That's how you actually learn this.