Benchmark a stock with Equity Strategist

Run the Equity Strategist agent over a list of tickers in parallel and collect each ticker's rationale and latency into a single table. The same result shape works for an A/B test against another provider.

benchmark.ts

import { Chaos, EQUITY_MODEL } from '@chaoslabs/ai-sdk';
 
const chaos = new Chaos({ apiKey: process.env.CHAOS_API_KEY! });
const tickers = ['NVDA', 'AAPL', 'MSFT', 'TSLA'];
 
const results = await Promise.all(
  tickers.map(async (t) => {
    const start = Date.now();
    const r = await chaos.chat.responses.create({
      model: EQUITY_MODEL,
      input: [
        {
          type: 'message',
          role: 'user',
          content: `Is ${t} fairly valued at the current price? Give a rationale.`,
        },
      ],
      metadata: {
        user_id: 'benchmark-runner',
        session_id: `benchmark-${t}-${start}`,
      },
    });
 
    // info blocks are emitted at runtime but aren't yet in the public Block helper union;
    // widen locally to read the discriminator + content.
    const blocks = (r.messages ?? [])
      .filter((m): m is Extract<typeof m, { type: 'block' }> => m.type === 'block')
      .map((m) => m.data.block as { type: string; content?: string });
    const rationale = blocks
      .filter((b) => b.type === 'info' && typeof b.content === 'string')
      .map((b) => b.content)
      .join('\n\n');
 
    return { ticker: t, rationale, latencyMs: Date.now() - start };
  })
);
 
console.table(results);
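
Once the table exists, a few aggregate numbers make runs easier to compare than eyeballing rows. The following is a minimal sketch, not part of the SDK: `Row` mirrors the objects returned above, and `summarize` is a hypothetical helper name introduced here for illustration.

```typescript
// Hypothetical helper (not part of @chaoslabs/ai-sdk): summarise benchmark rows.
// Row mirrors the { ticker, rationale, latencyMs } objects built in benchmark.ts.
interface Row {
  ticker: string;
  rationale: string;
  latencyMs: number;
}

interface Summary {
  count: number;
  minMs: number;
  medianMs: number;
  maxMs: number;
  meanRationaleChars: number;
  emptyRationales: number;
}

function summarize(rows: Row[]): Summary {
  if (rows.length === 0) {
    throw new Error('summarize() needs at least one row');
  }
  // Sort a copy of the latencies in ascending order to read min, median, and max.
  const latencies = rows.map((r) => r.latencyMs).sort((a, b) => a - b);
  const mid = Math.floor(latencies.length / 2);
  const medianMs =
    latencies.length % 2 === 1
      ? latencies[mid]
      : (latencies[mid - 1] + latencies[mid]) / 2;
  const totalChars = rows.reduce((sum, r) => sum + r.rationale.length, 0);
  return {
    count: rows.length,
    minMs: latencies[0],
    medianMs,
    maxMs: latencies[latencies.length - 1],
    meanRationaleChars: totalChars / rows.length,
    // An empty rationale usually means no info block came back for that ticker.
    emptyRationales: rows.filter((r) => r.rationale.length === 0).length,
  };
}
```

Calling `console.table([summarize(results)])` after the per-ticker table prints the summary as a single row. Rationale length is only a rough proxy and says nothing about answer quality on its own.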

Adapting for benchmarking

Swap the chaos.chat.responses.create(...) call for your incumbent provider's client to compare latency, rationale length, and answer quality side by side. Keep the Date.now() deltas around each call so the measurement stays end-to-end, including network and streaming time.
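
One way to keep the comparison fair is to put every provider behind the same function signature, so the timing code is identical for each. The sketch below assumes exactly that. `Provider`, `timeCall`, and `compareProviders` are hypothetical names, and the two stub providers stand in for real clients; replace their bodies with the Chaos call above and with your incumbent provider's call.

```typescript
// Hypothetical shape: a provider maps a ticker to a rationale string.
type Provider = (ticker: string) => Promise<string>;

interface BenchmarkRow {
  provider: string;
  ticker: string;
  rationaleChars: number;
  latencyMs: number;
  error?: string;
}

// Time one call. Failures are recorded as rows rather than thrown,
// so one failing request does not discard the rest of the run.
async function timeCall(
  provider: string,
  ticker: string,
  ask: Provider,
): Promise<BenchmarkRow> {
  const start = Date.now();
  try {
    const rationale = await ask(ticker);
    return {
      provider,
      ticker,
      rationaleChars: rationale.length,
      latencyMs: Date.now() - start,
    };
  } catch (err) {
    return {
      provider,
      ticker,
      rationaleChars: 0,
      latencyMs: Date.now() - start,
      error: err instanceof Error ? err.message : String(err),
    };
  }
}

// Run every provider against every ticker in parallel.
async function compareProviders(
  providers: Record<string, Provider>,
  tickers: string[],
): Promise<BenchmarkRow[]> {
  const jobs = Object.entries(providers).flatMap(([name, ask]) =>
    tickers.map((ticker) => timeCall(name, ticker, ask)),
  );
  return Promise.all(jobs);
}

// Stub providers for illustration only; swap in real SDK calls.
const stubChaos: Provider = async (t) => `Stub rationale for ${t}`;
const stubIncumbent: Provider = async (t) => `Another stub rationale for ${t}`;

compareProviders(
  { chaos: stubChaos, incumbent: stubIncumbent },
  ['NVDA', 'AAPL'],
).then((rows) => console.table(rows));
```

Because all requests start at once, each provider is measured under concurrent load. If one provider rate-limits more aggressively than the other, running the tickers sequentially may give a cleaner per-request latency.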
