Optimization
Ladger’s optimization engine analyzes your AI usage and provides actionable recommendations with estimated savings and quality confidence scores.
Optimization Strategies
Model Switching
Switch to cheaper models for tasks that don’t require full capability:
```
┌─────────────────────────────────────────────────────────────────┐
│ 💡 OPTIMIZATION: Model Switch                                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│ Flow: customer-support → classify-intent                        │
│                                                                 │
│ Current Model:    GPT-4o         $0.030/request                 │
│ Suggested Model:  GPT-3.5-turbo  $0.002/request                 │
│                                                                 │
│ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ │
│                                                                 │
│ Estimated Savings: $2,340/month (93% reduction)                 │
│ Quality Confidence: 97.2% (based on 500 replays)                │
│ Requests Analyzed: 45,230                                       │
│                                                                 │
│ [Simulate] [Apply to 10%] [Apply to All] [Dismiss]              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
Model Switching Matrix
| Task Type | Low Complexity | Medium | High |
|---|---|---|---|
| Code Generation | GPT-3.5 | GPT-4o-mini | GPT-4o |
| Q&A/Retrieval | Haiku | Sonnet | Sonnet |
| Summarization | Haiku | GPT-3.5 | GPT-4o-mini |
| Data Extraction | GPT-3.5 | GPT-3.5 | GPT-4o-mini |
| Classification | Fine-tuned | GPT-3.5 | GPT-4o-mini |
| Planning | Sonnet | GPT-4o | Opus/o1 |
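The matrix above can be encoded as a plain lookup table. A minimal sketch — the task keys and model identifiers below are illustrative, not a Ladger API:

```typescript
// Illustrative encoding of a few rows of the model-switching matrix.
type Complexity = 'low' | 'medium' | 'high';

const MODEL_MATRIX: Record<string, Record<Complexity, string>> = {
  'classification':  { low: 'fine-tuned',    medium: 'gpt-3.5-turbo', high: 'gpt-4o-mini' },
  'summarization':   { low: 'claude-haiku',  medium: 'gpt-3.5-turbo', high: 'gpt-4o-mini' },
  'code-generation': { low: 'gpt-3.5-turbo', medium: 'gpt-4o-mini',   high: 'gpt-4o' },
};

// Fall back to a capable default when the task type is unknown.
function pickModel(taskType: string, complexity: Complexity): string {
  return MODEL_MATRIX[taskType]?.[complexity] ?? 'gpt-4o';
}
```

Keeping the mapping in data rather than branching logic makes it easy to update as recommendations change.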
Prompt Compression
Reduce token usage by optimizing prompts:
| Technique | Savings | Implementation |
|---|---|---|
| Remove redundant context | 10-30% | Low effort |
| Shorten system prompts | 5-15% | Low effort |
| Use examples efficiently | 10-20% | Medium effort |
| Dynamic context injection | 20-40% | Higher effort |
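As a sketch of the last row, dynamic context injection can be as simple as including only the context sections whose topic appears in the query. The `ContextSection` shape and tag matching here are hypothetical; a real system would use retrieval scores:

```typescript
// Hypothetical dynamic context injection: instead of sending a fixed
// boilerplate block, include only sections relevant to this query.
interface ContextSection {
  tag: string;   // topic keyword the section covers
  text: string;  // the section body
}

function buildPrompt(query: string, sections: ContextSection[]): string {
  const q = query.toLowerCase();
  const relevant = sections.filter((s) => q.includes(s.tag));
  return [...relevant.map((s) => s.text), query].join('\n\n');
}
```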
```
┌─────────────────────────────────────────────────────────────────┐
│ 💡 OPTIMIZATION: Prompt Compression                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│ Flow: qa-pipeline → generate-answer                             │
│                                                                 │
│ Current Avg Input:   2,450 tokens                               │
│ Potential Avg Input: 1,715 tokens (30% reduction)               │
│                                                                 │
│ System Prompt Analysis:                                         │
│ • 45% of prompt is repeated boilerplate                         │
│ • Context section averages 800 tokens (often unused)            │
│                                                                 │
│ Estimated Savings: $680/month                                   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
Semantic Caching
Cache responses for semantically similar queries:
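Under the hood this amounts to comparing query embeddings against cached entries. A self-contained sketch with an injected embedding function — a real deployment would call an embedding model and use a vector index:

```typescript
// Minimal semantic cache sketch. The embedder is injected so the logic
// stays self-contained; the 0.92 threshold is an assumed default.
type Embedder = (text: string) => number[];

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class SemanticCache {
  private entries: { vec: number[]; response: string }[] = [];
  constructor(private embed: Embedder, private threshold = 0.92) {}

  // Return a cached response if any stored query is similar enough.
  get(query: string): string | undefined {
    const vec = this.embed(query);
    const hit = this.entries.find((e) => cosine(e.vec, vec) >= this.threshold);
    return hit?.response;
  }

  set(query: string, response: string): void {
    this.entries.push({ vec: this.embed(query), response });
  }
}
```

A linear scan is fine for a sketch; at the cluster counts shown below, an approximate nearest-neighbor index would replace `entries.find`.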
```
┌─────────────────────────────────────────────────────────────────┐
│ 💡 OPTIMIZATION: Semantic Caching                               │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│ Flow: faq-bot                                                   │
│                                                                 │
│ Cache Hit Potential: 35% (based on query similarity)            │
│ Similar Query Groups: 1,245 clusters identified                 │
│                                                                 │
│ Example Cluster:                                                │
│ • "What are your hours?"                                        │
│ • "When are you open?"                                          │
│ • "Business hours please"                                       │
│ → Same response, 3 API calls → 1 API call + 2 cache hits        │
│                                                                 │
│ Estimated Savings: $1,200/month (35% reduction)                 │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
Request Batching
Group compatible requests into single API calls:
```typescript
// Before: 10 separate API calls
for (const item of items) {
  await classify(item); // 10 requests × $0.002 = $0.02
}

// After: 1 batched call
await classifyBatch(items); // 1 request × $0.008 = $0.008 (60% savings)
```
Context Trimming
Optimize RAG context retrieval:
| Issue | Solution | Savings |
|---|---|---|
| Irrelevant chunks | Better retrieval | 20-40% |
| Redundant context | Deduplication | 10-20% |
| Overly large chunks | Smart chunking | 15-25% |
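The deduplication row can be sketched as a normalization pass over retrieved chunks. Exact match after whitespace and case folding stands in here for embedding-based near-duplicate detection:

```typescript
// Drop retrieved chunks that duplicate ones already kept,
// comparing a normalized form (lowercased, whitespace collapsed).
function dedupeChunks(chunks: string[]): string[] {
  const seen = new Set<string>();
  const kept: string[] = [];
  for (const chunk of chunks) {
    const key = chunk.toLowerCase().replace(/\s+/g, ' ').trim();
    if (!seen.has(key)) {
      seen.add(key);
      kept.push(chunk);
    }
  }
  return kept;
}
```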
Optimization Dashboard
```
┌────────────────────────────────────────────────────────────────────┐
│ OPTIMIZATION OPPORTUNITIES                                         │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│ Total Potential Savings: $4,890/month                              │
│                                                                    │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Priority │ Strategy        │ Flow           │ Savings │ Conf. │ │
│ ├──────────┼─────────────────┼────────────────┼─────────┼───────┤ │
│ │ 1        │ Model Switch    │ classify-intent│ $2,340  │ 97%   │ │
│ │ 2        │ Semantic Cache  │ faq-bot        │ $1,200  │ 92%   │ │
│ │ 3        │ Prompt Compress │ qa-pipeline    │ $680    │ 89%   │ │
│ │ 4        │ Model Switch    │ summarizer     │ $450    │ 94%   │ │
│ │ 5        │ Batching        │ classifier     │ $220    │ 96%   │ │
│ └────────────────────────────────────────────────────────────────┘ │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
```
Quality Validation
Before recommending changes, Ladger validates quality:
Validation Process
1. Baseline Capture: store original model outputs
2. Replay Subset: run 5-10% of historical requests through the new model
3. Quality Scoring: compare outputs using semantic similarity
4. Threshold Check: only recommend if quality > 95%
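The Threshold Check step reduces to a simple gate over the replay scores. A minimal sketch:

```typescript
// Gate a recommendation on the average similarity score of replayed
// requests; with no replays there is no evidence, so recommend nothing.
function shouldRecommend(similarityScores: number[], threshold = 0.95): boolean {
  if (similarityScores.length === 0) return false;
  const avg = similarityScores.reduce((sum, s) => sum + s, 0) / similarityScores.length;
  return avg > threshold;
}
```

An average is the simplest aggregate; a production gate might also require a minimum per-request score so one bad output cannot hide behind a good mean.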
Quality Metrics
| Metric | Description | Threshold |
|---|---|---|
| Semantic Similarity | Embedding distance | > 0.95 |
| Structure Match | JSON schema compliance | 100% |
| Key Information | Critical data preserved | 100% |
| Human Eval Sample | Manual review sample | > 90% approval |
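The Structure Match row can be approximated by comparing top-level JSON keys between a baseline output and a candidate output — a stand-in for full schema validation, with hypothetical names:

```typescript
// Check that a candidate model's JSON output has the same top-level
// keys as the baseline. Non-JSON output counts as a structure failure.
function structureMatches(baseline: string, candidate: string): boolean {
  try {
    const a = JSON.parse(baseline);
    const b = JSON.parse(candidate);
    const keys = (o: unknown) => Object.keys(o as object).sort().join(',');
    return keys(a) === keys(b);
  } catch {
    return false;
  }
}
```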
Implementing Recommendations
Gradual Rollout
Apply changes safely with gradual rollout:
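A deterministic traffic split keeps each request in a stable bucket as the percentage grows. One common sketch hashes a stable request or user ID — the hash and bucketing below are illustrative, not how Ladger assigns traffic:

```typescript
// Decide whether a request falls inside the rollout percentage using a
// stable string hash, so the same ID always lands in the same bucket.
function inRollout(requestId: string, percent: number): boolean {
  let hash = 0;
  for (const ch of requestId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // unsigned 32-bit
  }
  return hash % 100 < percent;
}
```

Because the bucket depends only on the ID, raising the percentage from 10 to 50 keeps every request that was already in the rollout inside it.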
```
┌─────────────────────────────────────────────────────────────────┐
│ ROLLOUT: classify-intent Model Switch                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│ Phase 1: 10% of traffic   ████░░░░░░░░░░░░░░░░  Running         │
│ Phase 2: 50% of traffic   ░░░░░░░░░░░░░░░░░░░░  Pending         │
│ Phase 3: 100% of traffic  ░░░░░░░░░░░░░░░░░░░░  Pending         │
│                                                                 │
│ Current Stats (Phase 1):                                        │
│ • Requests: 4,523 / 45,230                                      │
│ • Quality Score: 97.8% ✓                                        │
│ • Cost Saved: $234                                              │
│ • Errors: 0                                                     │
│                                                                 │
│ [Proceed to Phase 2] [Rollback] [View Details]                  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
Code Implementation
Apply model switching in your code:
```typescript
async function classifyIntent(message: string) {
  // Get recommendation from Ladger
  const config = await ladger.getConfig('classify-intent');

  return tracer.trace('classify-intent', async (span) => {
    const model = config.recommendedModel || 'gpt-4o';

    span.setAttributes({
      'optimization.applied': config.optimizationId,
      'model.original': 'gpt-4o',
      'model.actual': model,
    });

    const result = await openai.chat.completions.create({
      model,
      messages: [{ role: 'user', content: message }],
    });

    span.recordCost({
      provider: 'openai',
      model,
      inputTokens: result.usage?.prompt_tokens,
      outputTokens: result.usage?.completion_tokens,
    });

    return result.choices[0].message.content;
  });
}
```
Dismissing Recommendations
Dismiss recommendations that don’t fit your use case:
```
Why dismiss?
○ Quality requirements are higher than shown
○ Already tried, didn't work
○ Business constraints prevent this change
○ Will implement later
○ Other: _______________

[Dismiss] [Dismiss & Don't Show Similar]
```
ROI Tracking
Track the impact of implemented optimizations:
```
┌────────────────────────────────────────────────────────────────────┐
│ OPTIMIZATION ROI                                                   │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│ Total Savings (Last 30 Days): $3,240                               │
│                                                                    │
│ Implemented Optimizations:                                         │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Optimization       │ Status  │ Savings │ Quality │ Impl Date  │ │
│ ├────────────────────┼─────────┼─────────┼─────────┼────────────┤ │
│ │ Model: classify    │ Live    │ $2,100  │ 98.2%   │ Jan 5      │ │
│ │ Cache: faq-bot     │ Live    │ $890    │ 99.1%   │ Jan 12     │ │
│ │ Batch: classifier  │ Rolling │ $250    │ 100%    │ Jan 15     │ │
│ └────────────────────────────────────────────────────────────────┘ │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
```
Next Steps
- Test changes safely with Simulations
- Set up alerts in Cost Analysis