OpenAI Compatible API: Implementation Strategy for /v1/chat/completions and /v1/responses
Use this OpenAI Compatible API to connect existing OpenAI SDKs/clients directly to your Antigravity Tools local gateway. Focus on making /v1/chat/completions and /v1/responses work, and learn to troubleshoot quickly using response headers.
What You'll Learn
- Connect directly to Antigravity Tools' local gateway using OpenAI SDK (or curl)
- Make /v1/chat/completions (including stream: true) and /v1/responses work
- Understand the model list from /v1/models and the X-Mapped-Model response header
- Know where to start troubleshooting when you hit 401/404/429 errors
Your Current Struggles
Many clients/SDKs only recognize OpenAI's interface shape: fixed URLs, fixed JSON fields, fixed SSE streaming format. Antigravity Tools' goal is not to make you change clients, but to make clients think they are calling OpenAI: the gateway transforms each request into an internal upstream call, then converts the result back into OpenAI format.
When to Use This Approach
- You already have a bunch of tools that only support OpenAI (IDE plugins, scripts, bots, SDKs), and don't want to write a new integration for each one
- You want to standardize on a single base_url pointing at a local (or LAN) gateway, and let the gateway handle account scheduling, retries, and monitoring
🎒 Prerequisites
- You have already started the reverse proxy service on Antigravity Tools' "API Proxy" page, and noted the port (e.g., 8045)
- You have added at least one available account; otherwise the reverse proxy cannot obtain upstream tokens
How to pass authentication?
When proxy.auth_mode is enabled and proxy.api_key is configured, requests must carry an API key.
Antigravity Tools' middleware reads Authorization first, and also accepts x-api-key and x-goog-api-key. (Implementation in src-tauri/src/proxy/middleware/auth.rs)
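A minimal sketch of the accepted auth headers, using Python requests. The key value sk-antigravity, the model name, and port 8045 are placeholders for your configured proxy.api_key, model, and port:

```python
import requests

BASE = "http://127.0.0.1:8045"   # your gateway address/port
KEY = "sk-antigravity"           # placeholder for your configured proxy.api_key

body = {"model": "gemini-3-flash", "messages": [{"role": "user", "content": "ping"}]}

# Standard OpenAI-style header (read first by the auth middleware)
r1 = requests.post(f"{BASE}/v1/chat/completions",
                   headers={"Authorization": f"Bearer {KEY}"}, json=body)

# Alternative header name the middleware also accepts (x-goog-api-key works the same way)
r2 = requests.post(f"{BASE}/v1/chat/completions",
                   headers={"x-api-key": KEY}, json=body)

print(r1.status_code, r2.status_code)  # both should be 200 when the key matches
```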
What is OpenAI Compatible API?
OpenAI Compatible API is a set of "looks like OpenAI" HTTP routes and JSON/SSE protocols. Clients send requests to the local gateway in OpenAI's request format; the gateway transforms them into internal upstream calls and converts the upstream responses back into OpenAI's response structure, so existing OpenAI SDKs work with essentially no modification.
Compatible Endpoints Overview (relevant to this lesson)
| Endpoint | Purpose | Code Evidence |
|---|---|---|
| POST /v1/chat/completions | Chat Completions (including streaming) | Route registration in src-tauri/src/proxy/server.rs; src-tauri/src/proxy/handlers/openai.rs |
| POST /v1/completions | Legacy Completions (reuses the same handler) | Route registration in src-tauri/src/proxy/server.rs |
| POST /v1/responses | Responses / Codex CLI compatible (reuses the same handler) | Route registration in src-tauri/src/proxy/server.rs (comment: compatible with Codex CLI) |
| GET /v1/models | Returns the model list (custom mappings + dynamically generated entries) | src-tauri/src/proxy/handlers/openai.rs + src-tauri/src/proxy/common/model_mapping.rs |
Follow Along
Step 1: Confirm Service is Running with curl (/healthz + /v1/models)
Why: Rule out basic issues first, such as the service not being started, the wrong port, or a firewall blocking access.
# 1) Health check
curl -s http://127.0.0.1:8045/healthz
# 2) Pull model list
curl -s http://127.0.0.1:8045/v1/models

What you should see: /healthz returns something like {"status":"ok"}; /v1/models returns {"object":"list","data":[...]}.
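If you prefer Python over curl, here is a rough equivalent of the two checks (assuming auth is off; add an Authorization header if you enabled a key):

```python
import requests

base = "http://127.0.0.1:8045"

health = requests.get(f"{base}/healthz", timeout=5)
print(health.status_code, health.text)                       # expect 200 and {"status":"ok"}

models = requests.get(f"{base}/v1/models", timeout=5).json()
print(models["object"], [m["id"] for m in models["data"]])   # "list" plus the model IDs
```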
Step 2: Call /v1/chat/completions with OpenAI Python SDK
Why: This step proves that the entire chain "OpenAI SDK → local gateway → upstream → OpenAI response transformation" is working.
import openai
client = openai.OpenAI(
api_key="sk-antigravity",
base_url="http://127.0.0.1:8045/v1",
)
response = client.chat.completions.create(
model="gemini-3-flash",
messages=[{"role": "user", "content": "你好,请自我介绍"}],
)
print(response.choices[0].message.content)

What you should see: The terminal prints the model's reply text.
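Optionally, you can catch the most common gateway errors explicitly. This sketch reuses the client from above and the SDK's standard exception classes (a 401 usually means the proxy key/auth mode is wrong, a 404 usually means base_url does not end with /v1):

```python
import openai

try:
    response = client.chat.completions.create(
        model="gemini-3-flash",
        messages=[{"role": "user", "content": "ping"}],
    )
    print(response.choices[0].message.content)
except openai.AuthenticationError as e:   # 401 from the gateway's auth middleware
    print("Check proxy.api_key / auth_mode:", e)
except openai.NotFoundError as e:         # 404, most often a wrong base_url
    print("Check that base_url ends with /v1:", e)
```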
Step 3: Enable stream and Confirm the SSE Streaming Response
Why: Many clients rely on OpenAI's SSE protocol (Content-Type: text/event-stream). This step confirms the streaming chain and event format are available.
curl -N http://127.0.0.1:8045/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-3-flash",
"stream": true,
"messages": [
{"role": "user", "content": "用三句话解释一下什么是本地反代网关"}
]
  }'

What you should see: The terminal continuously prints lines starting with data: { ... }, ending with data: [DONE].
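The same streaming call through the OpenAI Python SDK, reusing the client from Step 2; a minimal sketch that concatenates the delta.content pieces yourself:

```python
stream = client.chat.completions.create(
    model="gemini-3-flash",
    messages=[{"role": "user", "content": "Explain in three sentences what a local reverse-proxy gateway is"}],
    stream=True,
)
for chunk in stream:
    if not chunk.choices:          # some chunks (e.g., a trailing usage chunk) carry no choices
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```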
Step 4: Make a request with /v1/responses (Codex/Responses style)
Why: Some tools call /v1/responses or send fields like instructions and input in the request body. This project normalizes such requests into messages and then reuses the same transformation logic. (Handler in src-tauri/src/proxy/handlers/openai.rs)
curl -s http://127.0.0.1:8045/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-3-flash",
"instructions": "你是一个严谨的代码审查员。",
"input": "请指出下面这段代码最可能的 bug:\n\nfunction add(a, b) { return a - b }"
  }'

What you should see: The response body is an OpenAI-style response object (this project converts Gemini responses into OpenAI choices[].message.content).
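The same /v1/responses call from Python using requests, kept as raw JSON so you can inspect exactly what the gateway returns; per this lesson it is an OpenAI-style object with choices[0].message.content:

```python
import requests

resp = requests.post(
    "http://127.0.0.1:8045/v1/responses",
    json={
        "model": "gemini-3-flash",
        "instructions": "You are a rigorous code reviewer.",
        "input": "Point out the most likely bug in: function add(a, b) { return a - b }",
    },
)
data = resp.json()
print(data["choices"][0]["message"]["content"])
```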
Step 5: Confirm Model Routing Works (Check X-Mapped-Model Response Header)
Why: The model name you write in the client may not be the physical model actually called. The gateway first applies model mapping (including custom mappings and wildcards; see Model Routing: Custom Mapping, Wildcard Priority, and Preset Strategies), then exposes the final result in a response header for easy troubleshooting.
curl -i http://127.0.0.1:8045/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "hi"}]
  }'

What you should see: The response headers include X-Mapped-Model: ... (e.g., mapped to gemini-2.5-flash) and may also include X-Account-Email: ....
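Reading the routing headers from Python (requests); the header names are the ones shown above, and the model and port are just examples:

```python
import requests

r = requests.post(
    "http://127.0.0.1:8045/v1/chat/completions",
    json={"model": "gpt-4o", "messages": [{"role": "user", "content": "hi"}]},
)
print("X-Mapped-Model: ", r.headers.get("X-Mapped-Model"))
print("X-Account-Email:", r.headers.get("X-Account-Email"))
```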
Checkpoint ✅
- GET /healthz returns {"status":"ok"} (or equivalent JSON)
- GET /v1/models returns object=list and data is an array
- A non-streaming /v1/chat/completions request returns choices[0].message.content
- With stream: true, you receive SSE chunks ending with [DONE]
- curl -i shows the X-Mapped-Model response header
Common Pitfalls
1) Base URL wrong → 404 (most common)
- In OpenAI SDK examples, base_url needs to end with /v1 (see the Python example in the project README, and the sketch after this list).
- Some clients "stack" paths. For example, the README explicitly mentions that Kilo Code in OpenAI mode may build non-standard paths like /v1/chat/completions/responses, triggering 404.
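A quick sketch of the base_url pitfall with the OpenAI Python SDK; the SDK appends paths like /chat/completions to whatever base_url you give it, so only the /v1 form works:

```python
import openai

# Correct: the SDK will POST to http://127.0.0.1:8045/v1/chat/completions
good = openai.OpenAI(base_url="http://127.0.0.1:8045/v1", api_key="sk-antigravity")

# Wrong: missing /v1 -> POST to /chat/completions -> 404
# bad = openai.OpenAI(base_url="http://127.0.0.1:8045", api_key="sk-antigravity")

# Wrong: full endpoint as base_url -> stacked path -> 404
# bad = openai.OpenAI(base_url="http://127.0.0.1:8045/v1/chat/completions", api_key="sk-antigravity")
```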
2) 401: Not the upstream being down, but a missing key or wrong auth mode
When the auth strategy's effective mode is not off, the middleware validates request headers: Authorization: Bearer <proxy.api_key>, with x-api-key and x-goog-api-key also accepted. (Implementation in src-tauri/src/proxy/middleware/auth.rs)
Auth mode hint
When auth_mode = auto, the effective mode is decided automatically by allow_lan_access (see the sketch after this list):
- allow_lan_access = true → effective mode is all_except_health (auth required for everything except /healthz)
- allow_lan_access = false → effective mode is off (no auth required for local access)
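A small Python sketch of the decision described above; the real logic lives in src-tauri/src/proxy/middleware/auth.rs (Rust), and this is illustrative only:

```python
def effective_auth_mode(auth_mode: str, allow_lan_access: bool) -> str:
    """Mirror of the documented auto-mode decision (illustrative, not the actual implementation)."""
    if auth_mode != "auto":
        return auth_mode                                       # explicitly configured mode wins
    return "all_except_health" if allow_lan_access else "off"  # auto: decided by LAN access

assert effective_auth_mode("auto", True) == "all_except_health"
assert effective_auth_mode("auto", False) == "off"
```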
3) 429/503/529: The proxy retries and rotates accounts, but the pool can still be exhausted
The OpenAI handler makes at most 3 attempts (further limited by the account pool size); on certain errors it waits and/or rotates to another account before retrying. (Implementation in src-tauri/src/proxy/handlers/openai.rs)
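If the pool is exhausted, the gateway ultimately returns the error to the client, so you may still want a client-side backoff. Here is a sketch (not part of Antigravity Tools), using the SDK's standard exception classes:

```python
import time
import openai

def chat_with_backoff(client: openai.OpenAI, attempts: int = 3, **kwargs):
    """Retry 429/5xx responses with exponential backoff: 1s, 2s, 4s..."""
    for i in range(attempts):
        try:
            return client.chat.completions.create(**kwargs)
        except (openai.RateLimitError, openai.InternalServerError):
            if i == attempts - 1:
                raise
            time.sleep(2 ** i)
```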
Lesson Summary
- /v1/chat/completions is the most universal entry point; stream: true goes through SSE
- /v1/responses and /v1/completions use the same compatible handler; the core idea is to first normalize requests into messages
- X-Mapped-Model helps you confirm the "client model name → final physical model" mapping result
Next Lesson Preview
In the next lesson, we continue with the Anthropic Compatible API: /v1/messages and Claude Code's Key Contracts (corresponding chapter: platforms-anthropic).
Appendix: Source Code Reference
Last updated: 2026-01-23
| Feature | File Path | Lines |
|---|---|---|
| OpenAI route registration (including /v1/responses) | src-tauri/src/proxy/server.rs | 120-194 |
| Chat Completions handler (including Responses format detection) | src-tauri/src/proxy/handlers/openai.rs | 70-462 |
| /v1/completions and /v1/responses handler (Codex/Responses normalization + retry/rotation) | src-tauri/src/proxy/handlers/openai.rs | 464-1080 |
| /v1/models return (dynamic model list) | src-tauri/src/proxy/handlers/openai.rs | 1082-1102 |
| OpenAI request data structure (messages/instructions/input/size/quality) | src-tauri/src/proxy/mappers/openai/models.rs | 7-38 |
| Model mapping and wildcard priority (exact > wildcard > default) | src-tauri/src/proxy/common/model_mapping.rs | 180-228 |
Key Constants:
- MAX_RETRY_ATTEMPTS = 3: Max attempts for the OpenAI protocol (including rotation) (see src-tauri/src/proxy/handlers/openai.rs)
Key Functions:
- transform_openai_request(...): Converts the OpenAI request body into an internal upstream request (see src-tauri/src/proxy/mappers/openai/request.rs)
- transform_openai_response(...): Converts the upstream response into OpenAI choices/usage (see src-tauri/src/proxy/mappers/openai/response.rs)