Natural Language to SQL with Claude Haiku: Schema Grounding, Validation, and a Read-Only DuckDB Connection

The analytics platform needed a self-serve query interface — analysts ask questions in English, get SQL results back. The constraints: no open-ended generation, no writes to the database, no hallucinated tables, and latency under two seconds.

I wired Claude Haiku through Azure AI Foundry into a FastAPI endpoint that translates natural language to DuckDB SQL. The system uses full schema grounding in the system prompt, regex-based SQL validation, and a read-only database connection as a three-layer defense against both hallucination and injection.

The Architecture

The flow:

User question
  → FastAPI /api/v1/query/sql
    → Claude Haiku (system prompt with full schema)
      → Raw SQL string
    → validate_sql() — regex blocks DML/DDL
    → DuckDB read-only connection
      → Execute query
    → JSON results
  → Frontend renders table

Each layer is a defense boundary. If the LLM generates something dangerous, validation catches it. If validation misses something, the read-only connection prevents writes. The LLM itself is constrained by the system prompt to only produce SELECT queries.

Connecting to Claude via Azure AI Foundry

The Anthropic Python SDK supports custom base URLs. Azure AI Foundry exposes Claude models through a compatible endpoint:

import anthropic

ANTHROPIC_KEY = os.getenv("ANTHROPIC_API_KEY", "")
ANTHROPIC_BASE_URL = os.getenv("ANTHROPIC_BASE_URL", "")
HAIKU_MODEL = os.getenv("ANTHROPIC_HAIKU_MODEL", "claude-haiku-4-5-20251001")

client_kwargs = {"api_key": ANTHROPIC_KEY}
if ANTHROPIC_BASE_URL:
    client_kwargs["base_url"] = ANTHROPIC_BASE_URL

client = anthropic.Anthropic(**client_kwargs)

The Kubernetes deployment passes these as environment variables:

env:
- name: ANTHROPIC_API_KEY
  valueFrom:
    secretKeyRef:
      name: anthropic-secret
      key: api-key
- name: ANTHROPIC_BASE_URL
  value: "https://<client>-dev.services.ai.azure.com/anthropic"
- name: ANTHROPIC_HAIKU_MODEL
  value: "claude-haiku-4-5"

The API key is stored in a Kubernetes Secret — not hardcoded in the manifest. The base URL points to Azure Foundry’s Anthropic-compatible endpoint. From the SDK’s perspective, it’s the same API — just a different host.

Schema Grounding

The system prompt contains the complete DuckDB schema. Every table, every column, every data type:

SCHEMA_CONTEXT = """
You have access to a DuckDB database with the following tables:

TABLE: gold_subscriber_features
- subscriber_id (VARCHAR) — e.g. 'ZAM-12345'
- total_events (INTEGER) — total events across all types
- bss_events (INTEGER) — billing/CRM events
- oss_events (INTEGER) — network telemetry events
- iot_events (INTEGER) — device/IoT events
- avg_signal_dbm (DOUBLE) — average signal strength (negative, higher is better)
- avg_latency_ms (DOUBLE) — average network latency in milliseconds
- avg_packet_loss_pct (DOUBLE) — average packet loss percentage
- avg_bill_zmw (DOUBLE) — average bill amount in Zambian Kwacha
- avg_data_used_mb (DOUBLE) — average data usage in MB
- max_days_overdue (INTEGER) — maximum payment days overdue
- overdue_count (INTEGER) — number of overdue payment events
- avg_battery_pct (DOUBLE) — average device battery percentage
- avg_quality_score (DOUBLE) — average data quality score (0-100)
- churn_label (INTEGER) — 0 = retained, 1 = at risk of churning

TABLE: silver_clean_events
- event_id (VARCHAR)
- event_type (VARCHAR) — 'BSS', 'OSS', or 'IoT'
- source_system (VARCHAR) — 'billing', 'crm', 'network_telemetry', 'device_sensors'
- subscriber_id (VARCHAR)
- ...

TABLE: bronze_raw_events
- event_id (VARCHAR)
- raw_payload (VARCHAR) — JSON string
- quality_score (DOUBLE)
- ingested_at (TIMESTAMP)
"""

This is not a summary or description — it’s the actual column definitions with example values and domain semantics. The annotations are critical:

avg_signal_dbm (DOUBLE) — average signal strength (negative, higher is better) prevents the LLM from generating WHERE avg_signal_dbm > 90 when it should be WHERE avg_signal_dbm > -70.
churn_label (INTEGER) — 0 = retained, 1 = at risk prevents WHERE churn_label = 'at_risk' (string vs integer).
avg_bill_zmw (DOUBLE) — average bill amount in Zambian Kwacha grounds currency context.

Without these semantic annotations, the LLM hallucinates column semantics. With them, it generates correct WHERE clauses on the first try.

The System Prompt

The full system prompt wraps the schema with behavioral constraints:

SYSTEM_PROMPT = f"""You are a SQL assistant for <client>, a telecom company.
# REVIEW: redacted — confirm ("Zambian telecom company" narrowed to "telecom company" to avoid geographic client identifier)
You translate natural language questions into DuckDB SQL queries.

{SCHEMA_CONTEXT}

Rules:
- ONLY generate SELECT queries. Never generate INSERT, UPDATE, DELETE, DROP, CREATE, or ALTER.
- ONLY query the tables listed above. Do not reference any other tables.
- Return ONLY the raw SQL query, no explanation, no markdown, no backticks.
- Use LIMIT 20 by default unless the user specifies a different limit.
- For churn questions, churn_label=1 means at-risk, churn_label=0 means retained.
- Currency is ZMW (Zambian Kwacha).
- Signal strength is in dBm (negative values, closer to 0 is better).
- Round numeric outputs to 2 decimal places where appropriate.
- If the question cannot be answered with the available data, respond with exactly: CANNOT_ANSWER
"""

Key design decisions:

“Return ONLY the raw SQL query, no explanation, no markdown, no backticks.” — Claude Haiku sometimes wraps SQL in markdown code blocks. Without this instruction, the response comes back as ```sql\nSELECT...\n```. The extraction code handles this defensively anyway:

def extract_sql(raw: str) -> str:
    text = raw.strip()
    if text.startswith("```"):
        lines = text.split("\n")
        lines = [l for l in lines if not l.startswith("```")]
        text = "\n".join(lines).strip()
    return text

“LIMIT 20 by default” — prevents the LLM from generating unbounded queries that return the entire table. Analysts can ask “show me all subscribers” and get a manageable result set.

“CANNOT_ANSWER” — a sentinel value for out-of-scope questions. The API handler checks for it:

if sql == "CANNOT_ANSWER":
    raise HTTPException(
        422,
        "I can't answer that with the available subscriber data. "
        "Try asking about churn risk, billing, signal quality, "
        "overdue accounts, or subscriber profiles."
    )

SQL Validation

Even with prompt constraints, the LLM might produce dangerous SQL. The validation layer uses word-boundary regex to block DML and DDL:

def validate_sql(sql: str) -> bool:
    normalized = sql.strip().upper()
    
    if not normalized.startswith("SELECT"):
        return False
    
    dangerous = [
        "INSERT", "UPDATE", "DELETE", "DROP", "CREATE",
        "ALTER", "TRUNCATE", "EXEC", "GRANT", "REVOKE"
    ]
    for keyword in dangerous:
        if re.search(rf'\b{keyword}\b', normalized):
            return False
    return True

The \b word boundary anchors prevent false positives — a column named created_at contains “CREATE” as a substring but \bCREATE\b won’t match it.

If validation fails:

if not validate_sql(sql):
    raise HTTPException(422, "Generated query was not a valid SELECT statement.")

Read-Only Database Connection

The third layer: DuckDB opened in read-only mode.

con = duckdb.connect(DB_PATH, read_only=True)
rows = con.execute(sql).fetchdf()
con.close()

Even if the LLM generates an INSERT that passes regex validation (theoretically possible with obfuscation), the read-only connection refuses the write. DuckDB raises an exception, the endpoint returns 500, and no data is modified.

Calling the LLM

The actual API call is minimal:

message = client.messages.create(
    model=HAIKU_MODEL,
    max_tokens=512,
    system=SYSTEM_PROMPT,
    messages=[{"role": "user", "content": req.question}],
)
raw = message.content[0].text

max_tokens=512 is sufficient for any single SELECT query. Haiku’s response time through Azure Foundry is 200-500ms — fast enough for interactive use.

The full endpoint:

@router.post("/sql")
def query_sql(req: QueryRequest):
    if not LLM_AVAILABLE:
        raise HTTPException(503, "LLM not available — ANTHROPIC_API_KEY not set")
    
    message = client.messages.create(
        model=HAIKU_MODEL,
        max_tokens=512,
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": req.question}],
    )
    raw = message.content[0].text
    sql = extract_sql(raw)
    
    if sql == "CANNOT_ANSWER":
        raise HTTPException(422, "I can't answer that with the available subscriber data.")
    
    if not validate_sql(sql):
        raise HTTPException(422, "Generated query was not a valid SELECT statement.")
    
    con = duckdb.connect(DB_PATH, read_only=True)
    rows = con.execute(sql).fetchdf()
    con.close()
    
    results = rows.fillna(0).to_dict(orient="records")
    for row in results:
        for k, v in row.items():
            if isinstance(v, float) and (np.isnan(v) or np.isinf(v)):
                row[k] = 0
    
    return {"sql": sql, "results": results, "model": HAIKU_MODEL}

NaN and Inf Serialization

DuckDB’s aggregation functions can produce NaN and Inf — AVG() on an empty group, division by zero in calculated columns. Python’s json.dumps() chokes on these values because they’re not valid JSON.

The fix is two-stage: pandas fillna(0) catches NaN in the DataFrame, then an explicit loop catches any remaining float anomalies:

results = rows.fillna(0).to_dict(orient="records")
for row in results:
    for k, v in row.items():
        if isinstance(v, float) and (np.isnan(v) or np.isinf(v)):
            row[k] = 0

This was a production incident — the subscribers endpoint was returning 500s because one subscriber had zero events, producing NaN in the averaging columns. The fix took three lines but the diagnosis took thirty minutes of tracing through FastAPI’s JSON serialization.

The Frontend

The query interface is a chat-style UI with structured response cards:

async function submitQuery() {
    const res = await fetch('/api/v1/query/sql', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ question: q })
    });
    
    const data = await res.json();
    addResponse(buildResponseCard(data.sql, data.results, data.model));
}

Responses are rendered as cards with two sections: the generated SQL (in a monospace code block) and the results (in a scrollable table). Column headers replace underscores with spaces. Numbers are right-aligned with tabular numerals. Strings are highlighted in blue.

All rendering uses DOM methods — createElement, textContent — not innerHTML. This was enforced by a pre-commit hook that flagged any innerHTML assignment with API data as a potential XSS vector.

What Works and What Doesn’t

The system handles most analytical questions well:

“Show me subscribers with the worst signal” → SELECT * FROM gold_subscriber_features ORDER BY avg_signal_dbm ASC LIMIT 20
“What’s the average bill for at-risk subscribers?” → SELECT ROUND(AVG(avg_bill_zmw), 2) FROM gold_subscriber_features WHERE churn_label = 1
“Count events by source type” → SELECT event_type, COUNT(*) as count FROM silver_clean_events GROUP BY event_type

It correctly refuses out-of-scope questions:

“What’s the weather in Lusaka?” → CANNOT_ANSWER
“Drop the subscribers table” → Blocked by both the system prompt and regex validation

Edge cases that need production hardening:

Multi-statement injection: SELECT 1; DROP TABLE gold_subscriber_features — the regex catches DROP but a more sophisticated attack might encode it. The read-only connection is the real safeguard.
Subquery complexity: Users can ask questions that generate deeply nested subqueries. No query complexity limit is enforced.
Context window: The system prompt consumes ~800 tokens. Each user question adds more. For multi-turn conversations, context management would be needed. Currently each query is stateless.

Deployment Details

The LLM module is a FastAPI router mounted at /api/v1/query:

app.include_router(query_router, prefix="/api/v1/query", tags=["Self-Serve"])

Environment variables for the Kubernetes deployment:

- name: ANTHROPIC_API_KEY
  valueFrom:
    secretKeyRef:
      name: anthropic-secret
      key: api-key
- name: ANTHROPIC_BASE_URL
  value: "https://<client>-dev.services.ai.azure.com/anthropic"
- name: ANTHROPIC_HAIKU_MODEL
  value: "claude-haiku-4-5"

The secret was created manually:

kubectl create secret generic anthropic-secret \
  --from-literal=api-key="$ANTHROPIC_API_KEY" \
  -n <namespace>

Design Rules

Schema grounding beats few-shot examples for structured databases

The full schema in the system prompt gives the LLM complete information about what columns exist and what they mean. Few-shot examples only cover specific patterns and leave the LLM guessing for novel queries. With 15 columns across 3 tables, the schema fits comfortably in the context window.

Semantic annotations in the schema prevent logic errors

“Signal strength is negative, higher is better” is domain knowledge that no amount of SQL examples can teach. The annotation in the schema description is more reliable than demonstrations.

Three layers of defense is the minimum for LLM-generated SQL

Prompt constraints reduce attack surface. Regex validation catches obvious violations. Read-only connections prevent damage from anything that slips through. No single layer is sufficient.

Azure AI Foundry’s Anthropic endpoint is API-compatible

The only code change is setting base_url on the Anthropic client. No SDK fork, no custom HTTP handler, no wrapper library. This makes it trivial to switch between direct Anthropic and Azure-hosted models.

NaN/Inf values in aggregation are silent killers

They don’t fail at query time — they fail at JSON serialization time, producing opaque 500 errors. Always sanitize numeric results from analytical databases before returning JSON.