Inference‑Only Copilot Bridge (VS Code Desktop)
Scope & Goals
Expose a local, OpenAI‑compatible chat endpoint inside a running VS Code Desktop session that forwards requests to GitHub Copilot Chat via the VS Code Chat provider. No workspace tools (no search/edit), no VS Code Server.
- Endpoints:
  - POST /v1/chat/completions (supports streaming via SSE)
  - GET /v1/models (synthetic listing)
  - GET /healthz (status)
- Local only (127.0.0.1), single user, opt‑in via VS Code settings/command.
- Minimal state: one Copilot session per request; no history persisted by the bridge.
Non‑goals: multi‑tenant proxying, private endpoint scraping, file I/O tools, function/tool calling emulation.
Architecture (Desktop‑only, in‑process)
VS Code Desktop (running)
┌────────────────────────────────────────────────────────────┐
│ Bridge Extension (TypeScript, Extension Host)              │
│   - HTTP server on 127.0.0.1:<port>                        │
│   - POST /v1/chat/completions → Copilot Chat provider      │
│   - GET /v1/models (synthetic)                             │
│   - GET /healthz                                           │
│ Copilot pipe:                                              │
│   vscode.chat.requestChatAccess('copilot')                 │
│     → access.startSession().sendRequest({ prompt, ... })   │
└────────────────────────────────────────────────────────────┘
Data flow
Client (OpenAI API shape) → Bridge HTTP → normalize messages → requestChatAccess('copilot') → startSession().sendRequest → stream chunks → SSE to client.
API Contract (subset, OpenAI‑compatible)
POST /v1/chat/completions
Accepted fields
- model: string (ignored internally; echoed back as a synthetic id).
- messages: array of {role, content}; roles: system, user, assistant.
- stream: boolean (default true). If false, return a single JSON completion.
Ignored fields
tools, function_call/tool_choice, temperature, top_p, logprobs, seed, penalties, response_format, stop, n.
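A minimal sketch of enforcing this subset on an already-parsed body. `parseChatRequest`, `ChatRequest`, and `RequestError` are hypothetical names, and falling back to `gpt-4o-copilot` when `model` is absent is an assumption. Ignored fields are dropped simply by never being read; anything that fails validation throws `RequestError`, which the HTTP layer would turn into a 400.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: unknown; // string, array of parts, or { text }; coerced to text later
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  stream: boolean;
}

// Hypothetical error type; the HTTP layer maps it to a 400 response.
class RequestError extends Error {}

const ROLES: ReadonlySet<string> = new Set(["system", "user", "assistant"]);

// Validate a parsed JSON body and keep only the accepted fields.
// Ignored fields (tools, temperature, stop, n, ...) are never read.
function parseChatRequest(body: unknown): ChatRequest {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    throw new RequestError("request body must be a JSON object");
  }
  const b = body as Record<string, unknown>;
  if (!Array.isArray(b.messages) || b.messages.length === 0) {
    throw new RequestError("messages must be a non-empty array");
  }
  const messages = b.messages.map((m: unknown, i: number): ChatMessage => {
    const role = typeof m === "object" && m !== null ? (m as { role?: unknown }).role : undefined;
    if (typeof role !== "string" || !ROLES.has(role)) {
      throw new RequestError(`messages[${i}].role must be system, user, or assistant`);
    }
    return { role: role as Role, content: (m as { content?: unknown }).content };
  });
  return {
    model: typeof b.model === "string" ? b.model : "gpt-4o-copilot", // assumed fallback
    messages,
    stream: b.stream !== false, // this bridge defaults to streaming
  };
}
```

Returning a fresh object, rather than passing the parsed body through, guarantees that fields such as `tools` or `temperature` cannot reach the Copilot call by accident.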
Prompt normalization
- Keep the last system message and the last N user/assistant turns (one turn counts as up to two messages, so at most 2N are kept; configurable, default 3) to bound prompt size.
- Render into a single text prompt:
[SYSTEM]
<system text>
[DIALOG]
user: ...
assistant: ...
user: ...
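The rendering can be sketched as a pure function. This assumes, as the skeleton's `normalizeMessages` does, that N turns means at most 2N user/assistant messages, and that content has already been coerced to plain strings; `renderPrompt` and `Msg` are hypothetical names.

```typescript
interface Msg {
  role: "system" | "user" | "assistant";
  content: string; // assumed already coerced to plain text
}

// Render the last system message plus the last `turns` user/assistant turns.
// One turn is counted as two messages, so at most 2 * turns are kept.
function renderPrompt(messages: Msg[], turns = 3): string {
  const system = messages.filter((m) => m.role === "system").pop()?.content;
  const dialog = messages.filter((m) => m.role !== "system");
  // slice(-0) would return everything, so compute the start index explicitly.
  const kept = dialog.slice(Math.max(0, dialog.length - turns * 2));
  const header = system !== undefined ? `[SYSTEM]\n${system}\n\n` : "";
  return `${header}[DIALOG]\n${kept.map((m) => `${m.role}: ${m.content}`).join("\n")}`;
}
```

Because the window counts messages rather than strict pairs, it can begin on an assistant reply: with N = 1 and a history ending `user: u2, assistant: a2, user: u3`, the rendered dialog keeps `assistant: a2` and `user: u3`.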
Streaming response (SSE)
- For each Copilot content chunk:
data: {"id":"cmp_<uuid>","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"<chunk>"}}]}
- Terminate with:
data: [DONE]
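Each event is a single `data:` line followed by a blank line. A minimal sketch of the framing, assuming the chunk shape above; `sseEvent`, `sseChunk`, and `SSE_DONE` are hypothetical helpers.

```typescript
// One SSE event: "data: <payload>" terminated by a blank line.
function sseEvent(payload: string): string {
  return `data: ${payload}\n\n`;
}

// Frame one Copilot content chunk as an OpenAI-style chat.completion.chunk.
function sseChunk(id: string, content: string): string {
  return sseEvent(
    JSON.stringify({
      id,
      object: "chat.completion.chunk",
      choices: [{ index: 0, delta: { content } }],
    }),
  );
}

// Sentinel telling OpenAI-style clients that the stream is complete.
const SSE_DONE = sseEvent("[DONE]");
```

Because `JSON.stringify` escapes newlines inside strings, a chunk containing `\n` still fits on one `data:` line, so neither side needs multi-line `data:` handling.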
Non‑streaming response
{
  "id": "cmpl_<uuid>",
  "object": "chat.completion",
  "choices": [
    { "index": 0, "message": { "role": "assistant", "content": "<full text>" }, "finish_reason": "stop" }
  ]
}
GET /v1/models
{
  "data": [
    { "id": "gpt-4o-copilot", "object": "model", "owned_by": "vscode-bridge" }
  ]
}
GET /healthz
{ "ok": true, "copilot": "ok", "version": "<vscode.version>" }
Error envelope (OpenAI‑style)
{ "error": { "message": "Copilot unavailable", "type": "server_error", "code": "copilot_unavailable" } }
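The statuses used by this design (400, 401, 429, 500, 503) can share one envelope builder. A minimal sketch; only `copilot_unavailable` and `server_error` come from this design, while the other `code` strings, the `type` values for 400/401/429, and the names `errorResponse` and `ErrorKind` are assumptions.

```typescript
type ErrorKind =
  | "invalid_request"
  | "unauthorized"
  | "rate_limited"
  | "copilot_unavailable"
  | "internal_error";

interface ErrorEnvelope {
  error: { message: string; type: string; code: string };
}

// Assumed status and OpenAI-style `type` per kind; the kind doubles as `code`.
const ERROR_TABLE: Record<ErrorKind, { status: number; type: string }> = {
  invalid_request: { status: 400, type: "invalid_request_error" },
  unauthorized: { status: 401, type: "invalid_request_error" },
  rate_limited: { status: 429, type: "rate_limit_error" },
  copilot_unavailable: { status: 503, type: "server_error" },
  internal_error: { status: 500, type: "server_error" },
};

function errorResponse(kind: ErrorKind, message: string): { status: number; body: ErrorEnvelope } {
  const { status, type } = ERROR_TABLE[kind];
  return { status, body: { error: { message, type, code: kind } } };
}
```

Keeping the table in one place means the HTTP status and the envelope `type` cannot drift apart between routes.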
Extension Design
package.json (relevant)
- activationEvents: onStartupFinished (and commands).
- contributes.commands: bridge.enable, bridge.disable, bridge.status.
- contributes.configuration (under bridge.*):
  - enabled (bool; default false)
  - host (string; default "127.0.0.1")
  - port (int; default 0 = random ephemeral port)
  - token (string; optional bearer; empty means no auth, still loopback only)
  - historyWindow (int; default 3)
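Settings arrive untyped, so applying defaults and bounds in one place keeps the server code simple. A minimal sketch, assuming settings are read through an injected getter (in the extension it would wrap `cfg.get`, which keeps the function runnable outside VS Code); `resolveConfig`, `BridgeConfig`, and `intInRange` are hypothetical, and falling back to the default on out-of-range values (rather than rejecting them) is a design assumption.

```typescript
interface BridgeConfig {
  enabled: boolean;
  host: string;
  port: number; // 0 = OS-assigned ephemeral port
  token: string; // "" = no bearer auth (still loopback only)
  historyWindow: number;
}

const DEFAULTS: BridgeConfig = { enabled: false, host: "127.0.0.1", port: 0, token: "", historyWindow: 3 };

// Integer within [min, max], else undefined.
function intInRange(v: unknown, min: number, max: number): number | undefined {
  return typeof v === "number" && Number.isInteger(v) && v >= min && v <= max ? v : undefined;
}

// `get` stands in for WorkspaceConfiguration.get (assumption, so this runs outside VS Code).
function resolveConfig(get: (key: string) => unknown): BridgeConfig {
  const host = get("host");
  const token = get("token");
  return {
    enabled: get("enabled") === true,
    host: typeof host === "string" && host.trim() !== "" ? host.trim() : DEFAULTS.host,
    port: intInRange(get("port"), 0, 65535) ?? DEFAULTS.port,
    token: typeof token === "string" ? token.trim() : DEFAULTS.token,
    // At least 1: a window of 0 would also drop the user's latest message.
    historyWindow: intInRange(get("historyWindow"), 1, Number.MAX_SAFE_INTEGER) ?? DEFAULTS.historyWindow,
  };
}
```

When the defaults are declared in package.json, `cfg.get` already returns them for unset keys; the fallbacks here mainly catch hand-edited invalid values such as a port above 65535.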
Lifecycle
- On activate:
  - Check bridge.enabled; if false, return.
  - Attempt vscode.chat.requestChatAccess('copilot'); cache access if granted.
  - Start the HTTP server bound to loopback.
  - Status bar item: Copilot Bridge: OK/Unavailable @ <host>:<port>.
- On deactivate/disable: close the server, dispose listeners.
Copilot Hook
const access = await vscode.chat.requestChatAccess('copilot'); // per enable or per request
const session = await access.startSession();
const stream = await session.sendRequest({ prompt, attachments: [] });
// stream.onDidProduceContent(text => ...)
// stream.onDidEnd(() => ...)
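Because the hook is callback-based, the non-streaming path needs a promise around onDidEnd, and the streaming path has to track disposables by hand. A minimal sketch of adapting it to an `AsyncIterable<string>`, assuming only the two events shown above, each returning a disposable; `ContentStream`, `Disposable`, and `toAsyncIterable` are hypothetical stand-ins, and no error or cancellation event is modeled.

```typescript
interface Disposable {
  dispose(): void;
}

// Assumed shape of the object returned by session.sendRequest(...) in the hook above.
interface ContentStream {
  onDidProduceContent(listener: (chunk: string) => void): Disposable;
  onDidEnd(listener: () => void): Disposable;
}

function toAsyncIterable(stream: ContentStream): AsyncGenerator<string> {
  const queue: string[] = [];
  let ended = false;
  let wake: (() => void) | undefined;
  const notify = () => {
    const w = wake;
    wake = undefined;
    w?.();
  };
  // Subscribe eagerly so chunks produced before iteration starts are buffered.
  const subs: Disposable[] = [
    stream.onDidProduceContent((chunk) => {
      queue.push(chunk);
      notify();
    }),
    stream.onDidEnd(() => {
      ended = true;
      notify();
    }),
  ];
  async function* drain(): AsyncGenerator<string> {
    try {
      while (true) {
        const next = queue.shift();
        if (next !== undefined) {
          yield next;
        } else if (ended) {
          return;
        } else {
          // Park until the next chunk or the end event arrives.
          await new Promise<void>((resolve) => {
            wake = resolve;
          });
        }
      }
    } finally {
      // Runs on normal completion and when the consumer stops early (break/return).
      subs.forEach((s) => s.dispose());
    }
  }
  return drain();
}
```

With this adapter the non-streaming branch reduces to `for await (const c of toAsyncIterable(stream)) buf += c;`, and a `break` (for example on client disconnect) runs the `finally` block that disposes both subscriptions. If the iterable is never consumed, the subscriptions are not disposed.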
Implementation Notes
- HTTP server: Node http or a tiny express router. Keep it minimal to reduce dependencies.
- Auth: optional Authorization: Bearer <token>; recommended for local automation. Reject mismatches with 401.
- Backpressure: serialize requests or cap concurrency (configurable). If Copilot throttles, return 429 with Retry-After.
- Message normalization:
  - Coerce content variants (string, arrays, objects with text) into plain strings.
  - Join multi‑part content with \n.
- Streaming:
  - Set headers: Content-Type: text/event-stream, Cache-Control: no-cache, Connection: keep-alive.
  - Flush after each chunk; handle client disconnect by disposing stream subscriptions.
- Non‑stream: buffer chunks; return a single completion object.
- Errors: 503 when Copilot access is unavailable; 400 for invalid payloads; 500 for unexpected failures.
- Logging: VS Code Output channel: start/stop, port, errors (no prompt bodies unless the user enables verbose logging).
- UX: bridge.status shows availability, bound address/port, and whether a token is required; the status bar indicator toggles on availability.
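For the backpressure note, one option is a fixed cap that refuses overflow with 429 instead of queueing. A minimal sketch, assuming a synchronous slot counter; `ConcurrencyGate`, `throttledHeaders`, the cap value, and the default 1-second Retry-After are illustrative assumptions rather than settings defined above.

```typescript
// Counts in-flight Copilot requests; refuses new ones once the cap is reached.
class ConcurrencyGate {
  private active = 0;
  private readonly max: number;

  constructor(maxConcurrent: number) {
    this.max = maxConcurrent;
  }

  // Returns a release function if a slot was taken, or undefined if full.
  tryAcquire(): (() => void) | undefined {
    if (this.active >= this.max) return undefined;
    this.active++;
    let released = false;
    return () => {
      // Idempotent: end + client disconnect must free only one slot.
      if (!released) {
        released = true;
        this.active--;
      }
    };
  }

  get inFlight(): number {
    return this.active;
  }
}

// Headers for the 429 answer when the gate is full (Retry-After is in seconds).
function throttledHeaders(retryAfterSeconds = 1): Record<string, string> {
  return { "Content-Type": "application/json", "Retry-After": String(retryAfterSeconds) };
}
```

With a cap of 1 this serializes requests by rejection; queueing instead would need a list of waiting resolvers. Calling the release function from the same place the stream subscriptions are disposed means a client disconnect also frees the slot.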
Security & Compliance
- Local only: default bind to 127.0.0.1; no remote exposure.
- Single user: relies on the user’s authenticated VS Code Copilot session; the bridge never handles Copilot credentials (the optional bridge.token only guards the local endpoint).
- No scraping/private endpoints: all calls go through the VS Code Chat provider.
- No multi‑tenant/proxying: do not expose to others; treat as a personal developer convenience.
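Since host is a user setting, nothing in the design stops it from being set to 0.0.0.0. A minimal sketch of a guard the extension could run before server.listen, assuming the bridge should refuse non-loopback hosts; `isLoopbackHost` is hypothetical and deliberately narrow (it accepts localhost, ::1, and dotted IPv4 in 127.0.0.0/8, and nothing else, such as IPv4-mapped IPv6).

```typescript
// Returns true only for loopback hosts the bridge may bind to.
function isLoopbackHost(host: string): boolean {
  const h = host.trim().toLowerCase();
  if (h === "localhost" || h === "::1" || h === "[::1]") return true;
  // 127.0.0.0/8: exactly four dotted decimal octets, the first one 127.
  const parts = h.split(".");
  return (
    parts.length === 4 &&
    parts[0] === "127" &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255)
  );
}
```

localhost is accepted on the assumption that it resolves to a loopback address, which is the usual configuration but not guaranteed.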
Testing Plan
- Health: curl http://127.0.0.1:<port>/healthz. Expect { ok: true, copilot: "ok" } when signed in.
- Streaming completion:
  curl -N -H "Content-Type: application/json" \
    -d '{"model":"gpt-4o-copilot","stream":true,"messages":[{"role":"user","content":"hello"}]}' \
    http://127.0.0.1:<port>/v1/chat/completions
  Expect multiple data: chunks and [DONE].
- Non‑stream ("stream": false) → single JSON completion.
- Bearer (when configured): missing/incorrect token → 401. With a token set, the other requests need -H "Authorization: Bearer <token>".
- Unavailable: sign out of Copilot → /healthz shows unavailable; POST returns 503.
- Concurrency/throttle: fire two requests; verify the cap or serialized handling.
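The streaming check can be automated by parsing a captured body (for example, the curl output above redirected to a file). A minimal sketch, assuming every event is a single data: line as the bridge emits them; `parseSseBody` is a hypothetical helper and ignores the rest of the SSE format (event:, id:, multi-line data:).

```typescript
interface SseResult {
  chunks: string[]; // delta.content values, in order
  done: boolean; // whether the [DONE] sentinel was seen
}

function parseSseBody(body: string): SseResult {
  const chunks: string[] = [];
  let done = false;
  // Events are separated by a blank line; tolerate CRLF line endings.
  for (const event of body.replace(/\r\n/g, "\n").split("\n\n")) {
    const line = event.trim();
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") {
      done = true;
      continue;
    }
    const content = JSON.parse(payload)?.choices?.[0]?.delta?.content;
    if (typeof content === "string") chunks.push(content);
  }
  return { chunks, done };
}
```

A passing streaming test then means `chunks.length > 1` and `done === true`; joining the chunks should equal the non-streaming answer for the same prompt only up to model nondeterminism.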
Minimal Code Skeleton
src/extension.ts
import * as vscode from 'vscode';
import * as http from 'http';

let server: http.Server | undefined;
let access: vscode.ChatAccess | undefined;

export async function activate(ctx: vscode.ExtensionContext) {
  const cfg = vscode.workspace.getConfiguration('bridge');
  if (!cfg.get<boolean>('enabled')) return;

  // Request Copilot access once and cache it; undefined means "unavailable".
  try { access = await vscode.chat.requestChatAccess('copilot'); }
  catch { access = undefined; }

  const host = cfg.get<string>('host') ?? '127.0.0.1';
  const portCfg = cfg.get<number>('port') ?? 0;
  const token = (cfg.get<string>('token') ?? '').trim();
  const hist = cfg.get<number>('historyWindow') ?? 3;

  server = http.createServer(async (req, res) => {
    try {
      // Optional bearer token; when set, every route requires it.
      if (token && req.headers.authorization !== `Bearer ${token}`) {
        sendJson(res, 401, { error: { message: 'unauthorized', type: 'invalid_request_error', code: 'unauthorized' } });
        return;
      }
      // Route on the path only, ignoring any query string.
      const path = (req.url ?? '').split('?')[0];

      if (req.method === 'GET' && path === '/healthz') {
        sendJson(res, 200, { ok: !!access, copilot: access ? 'ok' : 'unavailable', version: vscode.version });
        return;
      }
      if (req.method === 'GET' && path === '/v1/models') {
        sendJson(res, 200, { data: [{ id: 'gpt-4o-copilot', object: 'model', owned_by: 'vscode-bridge' }] });
        return;
      }
      if (req.method === 'POST' && path === '/v1/chat/completions') {
        if (!access) {
          sendJson(res, 503, { error: { message: 'Copilot unavailable', type: 'server_error', code: 'copilot_unavailable' } });
          return;
        }
        // Invalid JSON or a missing messages array is a client error (400), not a 500.
        let body: any;
        try { body = await readJson(req); }
        catch {
          sendJson(res, 400, { error: { message: 'invalid JSON body', type: 'invalid_request_error', code: 'invalid_request' } });
          return;
        }
        if (!Array.isArray(body?.messages) || body.messages.length === 0) {
          sendJson(res, 400, { error: { message: 'messages must be a non-empty array', type: 'invalid_request_error', code: 'invalid_request' } });
          return;
        }
        const prompt = normalizeMessages(body.messages, hist);
        const streamMode = body.stream !== false; // default = true

        // One Copilot session per request; nothing is kept between requests.
        const session = await access.startSession();
        const chatStream = await session.sendRequest({ prompt, attachments: [] });

        if (streamMode) {
          res.writeHead(200, {
            'Content-Type': 'text/event-stream',
            'Cache-Control': 'no-cache',
            'Connection': 'keep-alive'
          });
          const id = `cmp_${Math.random().toString(36).slice(2)}`;
          const subs: vscode.Disposable[] = [];
          let finished = false;
          // Runs at most once: on Copilot end (send [DONE]) or on client disconnect (cleanup only).
          const finish = (clientGone: boolean) => {
            if (finished) return;
            finished = true;
            subs.forEach(d => d.dispose());
            if (!clientGone) { res.write('data: [DONE]\n\n'); res.end(); }
          };
          subs.push(chatStream.onDidProduceContent((chunk) => {
            if (finished) return;
            res.write(`data: ${JSON.stringify({
              id, object: 'chat.completion.chunk',
              choices: [{ index: 0, delta: { content: chunk } }]
            })}\n\n`);
          }));
          subs.push(chatStream.onDidEnd(() => finish(false)));
          // Watch res, not req: in recent Node versions req emits 'close' once the body has been read.
          res.on('close', () => finish(true));
          return;
        }

        // Non-streaming: buffer every chunk, then answer with one completion object.
        let buf = '';
        const h1 = chatStream.onDidProduceContent((chunk) => { buf += chunk; });
        await new Promise<void>(resolve => {
          const h2 = chatStream.onDidEnd(() => { h1.dispose(); h2.dispose(); resolve(); });
        });
        sendJson(res, 200, {
          id: `cmpl_${Math.random().toString(36).slice(2)}`,
          object: 'chat.completion',
          choices: [{ index: 0, message: { role: 'assistant', content: buf }, finish_reason: 'stop' }]
        });
        return;
      }
      res.writeHead(404).end();
    } catch (e: any) {
      // Once SSE headers are out, a JSON error can no longer be sent; just close.
      if (res.headersSent) { res.end(); return; }
      sendJson(res, 500, { error: { message: e?.message ?? 'internal_error', type: 'server_error', code: 'internal_error' } });
    }
  });

  // e.g. EADDRINUSE for a fixed port: report it instead of leaving the error event unhandled.
  server.on('error', (e) => vscode.window.showErrorMessage(`Copilot Bridge: ${e.message}`));
  server.listen(portCfg, host, () => {
    const addr = server!.address();
    const shown = typeof addr === 'object' && addr ? `${addr.address}:${addr.port}` : `${host}:${portCfg}`;
    vscode.window.setStatusBarMessage(`Copilot Bridge: ${access ? 'OK' : 'Unavailable'} @ ${shown}`);
  });
  ctx.subscriptions.push({ dispose: () => server?.close() });
}

export function deactivate() {
  server?.close();
}

function readJson(req: http.IncomingMessage): Promise<any> {
  return new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];
    req.on('data', (c: Buffer) => chunks.push(c));
    req.on('end', () => {
      // Decode once at the end so multi-byte UTF-8 characters split across chunks survive.
      const data = Buffer.concat(chunks).toString('utf8');
      try { resolve(data ? JSON.parse(data) : {}); } catch (e) { reject(e); }
    });
    req.on('error', reject);
  });
}

function sendJson(res: http.ServerResponse, status: number, body: unknown) {
  res.writeHead(status, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify(body));
}

function normalizeMessages(messages: any[], histWindow: number): string {
  const system = messages.filter((m: any) => m.role === 'system').pop()?.content;
  const dialogMsgs = messages.filter((m: any) => m.role === 'user' || m.role === 'assistant');
  // Keep the last histWindow turns (2 messages each); slice(-0) would keep everything.
  const turns = dialogMsgs.slice(Math.max(0, dialogMsgs.length - histWindow * 2));
  const dialog = turns.map((m: any) => `${m.role}: ${asText(m.content)}`).join('\n');
  return `${system ? `[SYSTEM]\n${asText(system)}\n\n` : ''}[DIALOG]\n${dialog}`;
}

// Coerce OpenAI content variants (string, array of parts, { text }) into plain text.
function asText(content: any): string {
  if (typeof content === 'string') return content;
  if (Array.isArray(content)) return content.map(asText).join('\n');
  if (content?.text) return content.text;
  try { return JSON.stringify(content); } catch { return String(content); }
}
Delivery Checklist
- Extension skeleton with settings + commands.
- HTTP server (loopback): /healthz, /v1/models, /v1/chat/completions.
- Copilot access + session streaming.
- Prompt normalization (system + last N turns).
- SSE mapping and non‑stream fallback.
- Optional bearer token check.
- Status bar + Output channel diagnostics.
- Tests: health, streaming, non‑stream, auth, unavailability.