# Inference-Only Copilot Bridge (VS Code Desktop)
## Scope & Goals
Expose a **local, OpenAI-compatible chat endpoint** inside a **running VS Code Desktop** session that forwards requests to **GitHub Copilot Chat** via the **VS Code Chat provider**. No workspace tools (no search/edit), no VS Code Server.

- Endpoints:
  - `POST /v1/chat/completions` (supports streaming via SSE)
  - `GET /v1/models` (synthetic listing)
  - `GET /healthz` (status)
- Local only (`127.0.0.1`), single user, opt-in via VS Code settings/command.
- Minimal state: one Copilot session per request; no history persisted by the bridge.

**Non-goals:** multi-tenant proxying, private endpoint scraping, file I/O tools, function/tool-calling emulation.

---
## Architecture (Desktop-only, in-process)
```
VS Code Desktop (running)
┌──────────────────────────────────────────────────────────────┐
│ Bridge Extension (TypeScript, Extension Host)                │
│  - HTTP server on 127.0.0.1:<port>                           │
│  - POST /v1/chat/completions → Copilot Chat provider         │
│  - GET /v1/models (synthetic)                                │
│  - GET /healthz                                              │
│ Copilot pipe:                                                │
│   vscode.chat.requestChatAccess('copilot')                   │
│     → access.startSession().sendRequest({ prompt, ... })     │
└──────────────────────────────────────────────────────────────┘
```
### Data flow
Client (OpenAI API shape) → Bridge HTTP → normalize messages → `requestChatAccess('copilot')` → `startSession().sendRequest` → stream chunks → SSE to client.

---
## API Contract (subset, OpenAI-compatible)
### POST `/v1/chat/completions`
**Accepted fields**
- `model`: string (ignored internally, echoed back as a synthetic id).
- `messages`: array of `{role, content}`; roles: `system`, `user`, `assistant`.
- `stream`: boolean (default `true`). If `false`, return a single JSON completion.

**Ignored fields**
`tools`, `function_call`/`tool_choice`, `temperature`, `top_p`, `logprobs`, `seed`, penalties, `response_format`, `stop`, `n`.

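For reference, a TypeScript shape for the accepted subset (a sketch; field names follow the OpenAI request format, and ignored fields may still arrive but are simply dropped):

```ts
// Sketch of the request subset the bridge actually reads.
// Content may be a plain string or a richer structure that is
// coerced to text during normalization.
type ChatRole = 'system' | 'user' | 'assistant';

interface BridgeChatMessage {
  role: ChatRole;
  content: string | unknown[] | { text?: string };
}

interface BridgeChatRequest {
  model?: string;              // echoed back as a synthetic id
  messages: BridgeChatMessage[];
  stream?: boolean;            // default true
  // Anything else (tools, temperature, top_p, ...) is accepted but ignored.
  [key: string]: unknown;
}
```
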
**Prompt normalization**
- Keep the last **system** message and the last **N** user/assistant turns (configurable, default 3) to bound prompt size.
- Render into a single text prompt:
```
[SYSTEM]
<system text>
[DIALOG]
user: ...
assistant: ...
user: ...
```
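As a concrete example (a sketch; the exact whitespace follows `normalizeMessages` in the skeleton below), a request with one system message and a few turns would be rendered like this:

```ts
// Hypothetical input to the bridge.
const messages = [
  { role: 'system', content: 'You are terse.' },
  { role: 'user', content: 'hi' },
  { role: 'assistant', content: 'hello' },
  { role: 'user', content: 'sum 2+2?' }
];

// Expected rendered prompt (history window large enough to keep all turns):
const prompt =
  '[SYSTEM]\nYou are terse.\n\n' +
  '[DIALOG]\n' +
  'user: hi\n' +
  'assistant: hello\n' +
  'user: sum 2+2?';
```
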
**Streaming response (SSE)**
- For each Copilot content chunk:
```
data: {"id":"cmp_<uuid>","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"<chunk>"}}]}
```
- Terminate with:
```
data: [DONE]
```
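A minimal client-side sketch for consuming this stream from Node 18+ (the port value and the absence of a bearer token are assumptions for the example):

```ts
// Read the SSE stream and print content deltas as they arrive.
async function streamChat(port: number): Promise<void> {
  const res = await fetch(`http://127.0.0.1:${port}/v1/chat/completions`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'gpt-4o-copilot',
      stream: true,
      messages: [{ role: 'user', content: 'hello' }]
    })
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    let idx: number;
    while ((idx = buffer.indexOf('\n\n')) !== -1) {
      const line = buffer.slice(0, idx).trim();
      buffer = buffer.slice(idx + 2);
      if (!line.startsWith('data:')) continue;
      const payload = line.slice(5).trim();
      if (payload === '[DONE]') return;
      const chunk = JSON.parse(payload);
      process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
    }
  }
}
```
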
**Non-streaming response**
```json
{
  "id": "cmpl_<uuid>",
  "object": "chat.completion",
  "choices": [
    { "index": 0, "message": { "role": "assistant", "content": "<full text>" }, "finish_reason": "stop" }
  ]
}
```
### GET `/v1/models`
```json
{
  "data": [
    { "id": "gpt-4o-copilot", "object": "model", "owned_by": "vscode-bridge" }
  ]
}
```
### GET `/healthz`
```json
{ "ok": true, "copilot": "ok", "version": "<vscode.version>" }
```
### Error envelope (OpenAI-style)
```json
{ "error": { "message": "Copilot unavailable", "type": "server_error", "code": "copilot_unavailable" } }
```
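A small helper sketch for emitting this envelope from the bridge's HTTP handlers (`sendError` is a hypothetical name, not part of the skeleton below):

```ts
import type { ServerResponse } from 'http';

// Write an OpenAI-style error envelope and close the response.
function sendError(res: ServerResponse, status: number, message: string, code: string): void {
  res.writeHead(status, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify({ error: { message, type: 'server_error', code } }));
}

// Example: sendError(res, 503, 'Copilot unavailable', 'copilot_unavailable');
```
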
---
## Extension Design
### `package.json` (relevant)
- `activationEvents`: `onStartupFinished` (and commands).
- `contributes.commands`: `bridge.enable`, `bridge.disable`, `bridge.status`.
- `contributes.configuration` (under `bridge.*`; see the sketch below):
  - `enabled` (bool; default `false`)
  - `host` (string; default `"127.0.0.1"`)
  - `port` (int; default `0` = random ephemeral)
  - `token` (string; optional bearer; empty means no auth, still loopback only)
  - `historyWindow` (int; default `3`)
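A sketch of reading these settings with their documented defaults (assumes the `bridge.*` keys listed above):

```ts
import * as vscode from 'vscode';

interface BridgeSettings {
  enabled: boolean;
  host: string;
  port: number;          // 0 = random ephemeral port
  token: string;         // empty string = no auth (loopback only)
  historyWindow: number;
}

function readSettings(): BridgeSettings {
  const cfg = vscode.workspace.getConfiguration('bridge');
  return {
    enabled: cfg.get<boolean>('enabled', false),
    host: cfg.get<string>('host', '127.0.0.1'),
    port: cfg.get<number>('port', 0),
    token: cfg.get<string>('token', '').trim(),
    historyWindow: cfg.get<number>('historyWindow', 3)
  };
}
```
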
### Lifecycle
- On activate:
  1. Check `bridge.enabled`; if false, return.
  2. Attempt `vscode.chat.requestChatAccess('copilot')`; cache access if granted.
  3. Start HTTP server bound to loopback.
  4. Status bar item: `Copilot Bridge: OK/Unavailable @ <host>:<port>`.
- On deactivate/disable: close the server, dispose listeners (command wiring sketched below).
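A sketch of wiring the contributed commands into this lifecycle (`startServer`, `stopServer`, and `describeStatus` are hypothetical helpers standing in for the server code in the skeleton below):

```ts
import * as vscode from 'vscode';

// Hypothetical helpers: start/stop the loopback server and summarize state.
declare function startServer(): Promise<void>;
declare function stopServer(): void;
declare function describeStatus(): string;

export function registerCommands(ctx: vscode.ExtensionContext): void {
  ctx.subscriptions.push(
    vscode.commands.registerCommand('bridge.enable', async () => {
      await startServer();
      vscode.window.showInformationMessage('Copilot Bridge enabled.');
    }),
    vscode.commands.registerCommand('bridge.disable', () => {
      stopServer();
      vscode.window.showInformationMessage('Copilot Bridge disabled.');
    }),
    vscode.commands.registerCommand('bridge.status', () => {
      vscode.window.showInformationMessage(describeStatus());
    })
  );
}
```
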
### Copilot Hook
```ts
const access = await vscode.chat.requestChatAccess('copilot'); // per enable or per request
const session = await access.startSession();
const stream = await session.sendRequest({ prompt, attachments: [] });
// stream.onDidProduceContent(text => ...)
// stream.onDidEnd(() => ...)
```
---
## Implementation Notes
- **HTTP server:** Node `http` or a tiny `express` router. Keep it minimal to reduce dependencies.
- **Auth:** optional `Authorization: Bearer <token>`; recommended for local automation. Reject mismatches with 401.
- **Backpressure:** serialize requests or cap concurrency (configurable). If Copilot throttles, return 429 with `Retry-After` (see the sketch after this list).
- **Message normalization:**
  - Coerce content variants (`string`, arrays, objects with `text`) into plain strings.
  - Join multipart content with `\n`.
- **Streaming:**
  - Set headers: `Content-Type: text/event-stream`, `Cache-Control: no-cache`, `Connection: keep-alive`.
  - Flush after each chunk; handle client disconnect by disposing stream subscriptions.
- **Non-stream:** buffer chunks; return a single completion object.
- **Errors:** `503` when Copilot access is unavailable; `400` for invalid payloads; `500` for unexpected failures.
- **Logging:** VS Code Output channel: start/stop, port, errors (no prompt bodies unless the user enables verbose logging).
- **UX:** `bridge.status` shows availability, bound address/port, and whether a token is required; the status bar indicator toggles on availability.
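A concurrency-cap sketch for the backpressure note above (the `MAX_CONCURRENT` value and the `Retry-After` seconds are assumptions):

```ts
import type { ServerResponse } from 'http';

const MAX_CONCURRENT = 1;   // serialize by default; make configurable if needed
let inFlight = 0;

// Returns true if the request may proceed; otherwise answers 429 itself.
function admit(res: ServerResponse): boolean {
  if (inFlight >= MAX_CONCURRENT) {
    res.writeHead(429, { 'Content-Type': 'application/json', 'Retry-After': '2' });
    res.end(JSON.stringify({ error: { message: 'busy', type: 'server_error', code: 'rate_limited' } }));
    return false;
  }
  inFlight++;
  return true;
}

// Call when a completion request finishes (success, error, or disconnect).
function release(): void {
  inFlight = Math.max(0, inFlight - 1);
}
```
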
---
## Security & Compliance
- Local only: default bind to `127.0.0.1`; no remote exposure.
- Single user: relies on the user's authenticated VS Code Copilot session; the bridge does not handle tokens.
- No scraping/private endpoints: all calls go through the VS Code Chat provider.
- No multi-tenant/proxying: do not expose the bridge to others; treat it as a personal developer convenience.
---
## Testing Plan
1. **Health**
   ```bash
   curl http://127.0.0.1:<port>/healthz
   ```
   Expect `{ ok: true, copilot: "ok" }` when signed in.
2. **Streaming completion**
   ```bash
   curl -N -H "Content-Type: application/json" \
     -d '{"model":"gpt-4o-copilot","stream":true,"messages":[{"role":"user","content":"hello"}]}' \
     http://127.0.0.1:<port>/v1/chat/completions
   ```
   Expect multiple `data:` chunks and `[DONE]`.
3. **Non-stream** (`"stream": false`) → single JSON completion.
4. **Bearer** (when configured): missing/incorrect token → `401` (see the sketch below).
5. **Unavailable**: sign out of Copilot → `/healthz` shows `unavailable`; POST returns `503`.
6. **Concurrency/throttle**: fire two requests; verify cap or serialized handling.
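For tests 3 and 4, a small Node 18+ script sketch (the port and token values are placeholders):

```ts
// Exercise the non-streaming path and the bearer check.
async function runChecks(port: number, token: string): Promise<void> {
  const url = `http://127.0.0.1:${port}/v1/chat/completions`;
  const body = JSON.stringify({
    model: 'gpt-4o-copilot',
    stream: false,
    messages: [{ role: 'user', content: 'hello' }]
  });

  // 3. Non-stream: expect a single chat.completion object.
  const ok = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
    body
  });
  console.log('non-stream status:', ok.status, (await ok.json()).object);

  // 4. Bearer: a wrong token should be rejected with 401.
  const bad = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: 'Bearer wrong-token' },
    body
  });
  console.log('bad-token status:', bad.status); // expect 401
}
```
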
---
## Minimal Code Skeleton
### `src/extension.ts`
```ts
import * as vscode from 'vscode';
import * as http from 'http';
let server: http.Server | undefined;
let access: vscode.ChatAccess | undefined;
export async function activate(ctx: vscode.ExtensionContext) {
  const cfg = vscode.workspace.getConfiguration('bridge');
  if (!cfg.get<boolean>('enabled')) return;

  try { access = await vscode.chat.requestChatAccess('copilot'); }
  catch { access = undefined; }

  const host = cfg.get<string>('host') ?? '127.0.0.1';
  const portCfg = cfg.get<number>('port') ?? 0;
  const token = (cfg.get<string>('token') ?? '').trim();
  const hist = cfg.get<number>('historyWindow') ?? 3;

  server = http.createServer(async (req, res) => {
    try {
      // Optional bearer auth; reject mismatches with 401.
      if (token && req.headers.authorization !== `Bearer ${token}`) {
        res.writeHead(401, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify({ error: { message: 'unauthorized' } }));
        return;
      }
      if (req.method === 'GET' && req.url === '/healthz') {
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify({ ok: !!access, copilot: access ? 'ok' : 'unavailable', version: vscode.version }));
        return;
      }
      if (req.method === 'GET' && req.url === '/v1/models') {
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify({ data: [{ id: 'gpt-4o-copilot', object: 'model', owned_by: 'vscode-bridge' }] }));
        return;
      }
      if (req.method === 'POST' && req.url?.startsWith('/v1/chat/completions')) {
        if (!access) {
          res.writeHead(503, { 'Content-Type': 'application/json' });
          res.end(JSON.stringify({ error: { message: 'Copilot unavailable', type: 'server_error', code: 'copilot_unavailable' } }));
          return;
        }
        const body = await readJson(req);
        const prompt = normalizeMessages(body?.messages ?? [], hist);
        const streamMode = body?.stream !== false; // default = true
        const session = await access.startSession();
        const chatStream = await session.sendRequest({ prompt, attachments: [] });

        if (streamMode) {
          res.writeHead(200, {
            'Content-Type': 'text/event-stream',
            'Cache-Control': 'no-cache',
            'Connection': 'keep-alive'
          });
          const id = `cmp_${Math.random().toString(36).slice(2)}`;
          const h1 = chatStream.onDidProduceContent((chunk) => {
            res.write(`data: ${JSON.stringify({
              id, object: 'chat.completion.chunk',
              choices: [{ index: 0, delta: { content: chunk } }]
            })}\n\n`);
          });
          // Guard against double termination: onDidEnd and the client
          // disconnect ('close') can both fire.
          let ended = false;
          const endAll = () => {
            if (ended) return;
            ended = true;
            res.write('data: [DONE]\n\n'); res.end();
            h1.dispose(); h2.dispose();
          };
          const h2 = chatStream.onDidEnd(endAll);
          req.on('close', endAll);
          return;
        } else {
          let buf = '';
          const h1 = chatStream.onDidProduceContent((chunk) => { buf += chunk; });
          await new Promise<void>(resolve => {
            const h2 = chatStream.onDidEnd(() => { h1.dispose(); h2.dispose(); resolve(); });
          });
          res.writeHead(200, { 'Content-Type': 'application/json' });
          res.end(JSON.stringify({
            id: `cmpl_${Math.random().toString(36).slice(2)}`,
            object: 'chat.completion',
            choices: [{ index: 0, message: { role: 'assistant', content: buf }, finish_reason: 'stop' }]
          }));
          return;
        }
      }
      res.writeHead(404).end();
    } catch (e: any) {
      res.writeHead(500, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: { message: e?.message ?? 'internal_error', type: 'server_error', code: 'internal_error' } }));
    }
  });

  server.listen(portCfg, host, () => {
    const addr = server!.address();
    const shown = typeof addr === 'object' && addr ? `${addr.address}:${addr.port}` : `${host}:${portCfg}`;
    vscode.window.setStatusBarMessage(`Copilot Bridge: ${access ? 'OK' : 'Unavailable'} @ ${shown}`);
  });

  ctx.subscriptions.push({ dispose: () => server?.close() });
}
export function deactivate() {
  server?.close();
}

function readJson(req: http.IncomingMessage): Promise<any> {
  return new Promise((resolve, reject) => {
    let data = '';
    req.on('data', c => data += c);
    req.on('end', () => { try { resolve(data ? JSON.parse(data) : {}); } catch (e) { reject(e); } });
    req.on('error', reject);
  });
}

// Keep the last system message and the last N user/assistant turns,
// rendered as [SYSTEM]/[DIALOG] sections (see "Prompt normalization").
function normalizeMessages(messages: any[], histWindow: number): string {
  const system = messages.filter((m: any) => m.role === 'system').pop()?.content;
  const turns = messages.filter((m: any) => m.role === 'user' || m.role === 'assistant').slice(-histWindow * 2);
  const dialog = turns.map((m: any) => `${m.role}: ${asText(m.content)}`).join('\n');
  return `${system ? `[SYSTEM]\n${asText(system)}\n\n` : ''}[DIALOG]\n${dialog}`;
}

// Coerce string / array / { text } content variants into plain text.
function asText(content: any): string {
  if (typeof content === 'string') return content;
  if (Array.isArray(content)) return content.map(asText).join('\n');
  if ((content as any)?.text) return (content as any).text;
  try { return JSON.stringify(content); } catch { return String(content); }
}
```
---
## Delivery Checklist
- Extension skeleton with settings + commands.
- HTTP server (loopback), `/healthz`, `/v1/models`, `/v1/chat/completions`.
- Copilot access + session streaming.
- Prompt normalization (system + last N turns).
- SSE mapping and non-stream fallback.
- Optional bearer token check.
- Status bar + Output channel diagnostics.
- Tests: health, streaming, non-stream, auth, unavailability.