Optimization

Ladger’s optimization engine analyzes your AI usage and provides actionable recommendations with estimated savings and quality confidence scores.

Optimization Strategies

Model Switching

Switch to cheaper models for tasks that don’t require full capability:

┌─────────────────────────────────────────────────────────────────┐
│ 💡 OPTIMIZATION: Model Switch │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Flow: customer-support → classify-intent │
│ │
│ Current Model: GPT-4o $0.030/request │
│ Suggested Model: GPT-3.5-turbo $0.002/request │
│ │
│ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ │
│ │
│ Estimated Savings: $2,340/month (93% reduction) │
│ Quality Confidence: 97.2% (based on 500 replays) │
│ Requests Analyzed: 45,230 │
│ │
│ [Simulate] [Apply to 10%] [Apply to All] [Dismiss] │
│ │
└─────────────────────────────────────────────────────────────────┘

Model Switching Matrix

Task Type        | Low Complexity | Medium      | High
-----------------|----------------|-------------|------------
Code Generation  | GPT-3.5        | GPT-4o-mini | GPT-4o
Q&A/Retrieval    | Haiku          | Sonnet      | Sonnet
Summarization    | Haiku          | GPT-3.5     | GPT-4o-mini
Data Extraction  | GPT-3.5        | GPT-3.5     | GPT-4o-mini
Classification   | Fine-tuned     | GPT-3.5     | GPT-4o-mini
Planning         | Sonnet         | GPT-4o      | Opus/o1
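
The matrix can be expressed as a small lookup that each flow consults when choosing a model. A minimal sketch in TypeScript; the task keys, complexity levels, and model identifiers are illustrative placeholders, not values Ladger defines:

type Complexity = 'low' | 'medium' | 'high';

// Illustrative lookup mirroring the matrix above; substitute the exact model
// identifiers your providers expect ('fine-tuned-classifier' is a placeholder).
const MODEL_MATRIX: Record<string, Record<Complexity, string>> = {
  'code-generation': { low: 'gpt-3.5-turbo', medium: 'gpt-4o-mini', high: 'gpt-4o' },
  'summarization': { low: 'claude-3-haiku', medium: 'gpt-3.5-turbo', high: 'gpt-4o-mini' },
  'classification': { low: 'fine-tuned-classifier', medium: 'gpt-3.5-turbo', high: 'gpt-4o-mini' },
};

function modelFor(task: string, complexity: Complexity): string {
  return MODEL_MATRIX[task]?.[complexity] ?? 'gpt-4o'; // fall back to the default model
}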

Prompt Compression

Reduce token usage by optimizing prompts:

Technique                 | Savings | Implementation
--------------------------|---------|---------------
Remove redundant context  | 10-30%  | Low effort
Shorten system prompts    | 5-15%   | Low effort
Use examples efficiently  | 10-20%  | Medium effort
Dynamic context injection | 20-40%  | Higher effort

┌─────────────────────────────────────────────────────────────────┐
│ 💡 OPTIMIZATION: Prompt Compression │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Flow: qa-pipeline → generate-answer │
│ │
│ Current Avg Input: 2,450 tokens │
│ Potential Avg Input: 1,715 tokens (30% reduction) │
│ │
│ System Prompt Analysis: │
│ • 45% of prompt is repeated boilerplate │
│ • Context section averages 800 tokens (often unused) │
│ │
│ Estimated Savings: $680/month │
│ │
└─────────────────────────────────────────────────────────────────┘
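
The "dynamic context injection" row above can be implemented by attaching the large context block only when a query actually needs it. A minimal sketch; the prompt contents and the relevance heuristic are illustrative, and a real check might use a lightweight classifier instead:

// Attach the large (~800-token) context block only when the query needs it,
// instead of sending it with every request.
const SYSTEM_CORE = 'You answer customer questions concisely.'; // short, always sent
const POLICY_CONTEXT = '...full policy document...';            // large, often unused

function needsPolicyContext(question: string): boolean {
  // Placeholder heuristic; swap in a classifier or retrieval-based check.
  return /refund|policy|terms|warranty/i.test(question);
}

function buildMessages(question: string) {
  const system = needsPolicyContext(question)
    ? `${SYSTEM_CORE}\n\n${POLICY_CONTEXT}`
    : SYSTEM_CORE;
  return [
    { role: 'system', content: system },
    { role: 'user', content: question },
  ];
}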

Semantic Caching

Cache responses for semantically similar queries:

┌─────────────────────────────────────────────────────────────────┐
│ 💡 OPTIMIZATION: Semantic Caching │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Flow: faq-bot │
│ │
│ Cache Hit Potential: 35% (based on query similarity) │
│ Similar Query Groups: 1,245 clusters identified │
│ │
│ Example Cluster: │
│ • "What are your hours?" │
│ • "When are you open?" │
│ • "Business hours please" │
│ → Same response, 3 API calls → 1 API call + 2 cache hits │
│ │
│ Estimated Savings: $1,200/month (35% reduction) │
│ │
└─────────────────────────────────────────────────────────────────┘
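
A minimal sketch of the caching idea, assuming OpenAI embeddings and an in-memory store (a production setup would use a vector database, TTLs, and invalidation); the similarity threshold is a tunable placeholder:

import OpenAI from 'openai';

const openai = new OpenAI();
const cache: { embedding: number[]; response: string }[] = [];
const SIMILARITY_THRESHOLD = 0.92; // tune per flow

// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function embed(text: string): Promise<number[]> {
  const res = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
  });
  return res.data[0].embedding;
}

// Return a cached answer for a semantically similar query, or null on a miss.
async function lookup(query: string): Promise<string | null> {
  const queryEmbedding = await embed(query);
  for (const entry of cache) {
    if (cosine(queryEmbedding, entry.embedding) >= SIMILARITY_THRESHOLD) {
      return entry.response;
    }
  }
  return null;
}

async function store(query: string, response: string): Promise<void> {
  cache.push({ embedding: await embed(query), response });
}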

Request Batching

Group compatible requests into single API calls:

// Before: 10 separate API calls
for (const item of items) {
  await classify(item); // 10 requests × $0.002 = $0.02
}

// After: 1 batched call
await classifyBatch(items); // 1 request × $0.008 = $0.008 (60% savings)
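
classify and classifyBatch above are placeholders. One way a batched classifier might look, assuming the OpenAI Node SDK; the labels and prompt wording are illustrative:

import OpenAI from 'openai';

const openai = new OpenAI();

// Pack all items into one prompt and ask for one label per line, in order.
async function classifyBatch(items: string[]): Promise<string[]> {
  const prompt = [
    'Classify each numbered item as "billing", "technical", or "other".',
    'Return exactly one label per line, in the same order.',
    ...items.map((item, i) => `${i + 1}. ${item}`),
  ].join('\n');

  const result = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: prompt }],
  });

  return (result.choices[0].message.content ?? '')
    .split('\n')
    .map((line) => line.trim())
    .filter(Boolean);
}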

Context Trimming

Optimize RAG context retrieval:

Issue               | Solution         | Savings
--------------------|------------------|--------
Irrelevant chunks   | Better retrieval | 20-40%
Redundant context   | Deduplication    | 10-20%
Overly large chunks | Smart chunking   | 15-25%
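
A minimal sketch of trimming retrieved context before it reaches the prompt; the Chunk shape, score field, and thresholds are illustrative and depend on your retriever:

interface Chunk {
  text: string;
  score: number; // relevance score from your retriever
}

// Drop low-relevance and duplicate chunks, then cap the total count.
function trimContext(chunks: Chunk[], minScore = 0.75, maxChunks = 5): Chunk[] {
  const seen = new Set<string>();
  return chunks
    .filter((c) => c.score >= minScore) // drop irrelevant chunks
    .filter((c) => {
      const key = c.text.trim().toLowerCase();
      if (seen.has(key)) return false; // drop exact duplicates
      seen.add(key);
      return true;
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, maxChunks); // cap total context size
}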

Optimization Dashboard

┌────────────────────────────────────────────────────────────────────┐
│ OPTIMIZATION OPPORTUNITIES │
├────────────────────────────────────────────────────────────────────┤
│ │
│ Total Potential Savings: $4,890/month │
│ │
│ ┌────────────────────────────────────────────────────────────────┐│
│ │ Priority │ Strategy │ Flow │ Savings │ Conf. ││
│ ├──────────┼─────────────────┼────────────────┼─────────┼───────┤│
│ │ 1 │ Model Switch │ classify-intent│ $2,340 │ 97% ││
│ │ 2 │ Semantic Cache │ faq-bot │ $1,200 │ 92% ││
│ │ 3 │ Prompt Compress │ qa-pipeline │ $680 │ 89% ││
│ │ 4 │ Model Switch │ summarizer │ $450 │ 94% ││
│ │ 5 │ Batching │ classifier │ $220 │ 96% ││
│ └────────────────────────────────────────────────────────────────┘│
│ │
└────────────────────────────────────────────────────────────────────┘

Quality Validation

Before recommending changes, Ladger validates quality:

Validation Process

  1. Baseline Capture: Store original model outputs
  2. Replay Subset: Run 5-10% of historical requests through the new model
  3. Quality Scoring: Compare outputs using semantic similarity
  4. Threshold Check: Only recommend if quality > 95%
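
A minimal sketch of steps 3-4, scoring replayed outputs against the baseline with embedding similarity and a 0.95 threshold; the embedding model and the generateWithCandidate callback are illustrative assumptions:

import OpenAI from 'openai';

const openai = new OpenAI();
const THRESHOLD = 0.95;

// Embed both outputs in one request and compare them with cosine similarity.
async function similarity(a: string, b: string): Promise<number> {
  const res = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: [a, b],
  });
  const [ea, eb] = [res.data[0].embedding, res.data[1].embedding];
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < ea.length; i++) {
    dot += ea[i] * eb[i];
    na += ea[i] * ea[i];
    nb += eb[i] * eb[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Replay a sample of baseline requests through the candidate model and
// report the fraction whose output stays above the similarity threshold.
async function replayQuality(
  samples: { input: string; baselineOutput: string }[],
  generateWithCandidate: (input: string) => Promise<string>,
): Promise<number> {
  let passed = 0;
  for (const sample of samples) {
    const candidate = await generateWithCandidate(sample.input);
    if ((await similarity(sample.baselineOutput, candidate)) >= THRESHOLD) passed++;
  }
  return passed / samples.length;
}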

Quality Metrics

Metric              | Description                      | Threshold
--------------------|----------------------------------|---------------
Semantic Similarity | Embedding-based similarity score | > 0.95
Structure Match     | JSON schema compliance           | 100%
Key Information     | Critical data preserved          | 100%
Human Eval Sample   | Manual review of sampled outputs | > 90% approval

Implementing Recommendations

Gradual Rollout

Apply changes safely with gradual rollout:

┌─────────────────────────────────────────────────────────────────┐
│ ROLLOUT: classify-intent Model Switch │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Phase 1: 10% of traffic ████░░░░░░░░░░░░░░░░ Running │
│ Phase 2: 50% of traffic ░░░░░░░░░░░░░░░░░░░░ Pending │
│ Phase 3: 100% of traffic ░░░░░░░░░░░░░░░░░░░░ Pending │
│ │
│ Current Stats (Phase 1): │
│ • Requests: 4,523 / 45,230 │
│ • Quality Score: 97.8% ✓ │
│ • Cost Saved: $234 │
│ • Errors: 0 │
│ │
│ [Proceed to Phase 2] [Rollback] [View Details] │
│ │
└─────────────────────────────────────────────────────────────────┘
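
Traffic splitting for a rollout can be done with deterministic bucketing so the same user always lands in the same phase. A minimal sketch, mirroring the classify-intent switch above; the bucketing key and model names are assumptions:

import { createHash } from 'crypto';

// Hash the request key into a stable 0-99 bucket so a fixed percentage of
// traffic uses the new model during each phase.
function inRollout(requestKey: string, percentage: number): boolean {
  const hash = createHash('sha256').update(requestKey).digest();
  return hash.readUInt32BE(0) % 100 < percentage;
}

// Phase 1: 10% of users get the cheaper model; the rest keep the original.
function modelForRequest(userId: string, rolloutPercent: number): string {
  return inRollout(userId, rolloutPercent) ? 'gpt-3.5-turbo' : 'gpt-4o';
}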

Code Implementation

Apply model switching in your code:

async function classifyIntent(message: string) {
  // Get recommendation from Ladger
  const config = await ladger.getConfig('classify-intent');

  return tracer.trace('classify-intent', async (span) => {
    const model = config.recommendedModel || 'gpt-4o';

    span.setAttributes({
      'optimization.applied': config.optimizationId,
      'model.original': 'gpt-4o',
      'model.actual': model,
    });

    const result = await openai.chat.completions.create({
      model,
      messages: [{ role: 'user', content: message }],
    });

    span.recordCost({
      provider: 'openai',
      model,
      inputTokens: result.usage?.prompt_tokens,
      outputTokens: result.usage?.completion_tokens,
    });

    return result.choices[0].message.content;
  });
}

Dismissing Recommendations

Dismiss recommendations that don’t fit your use case:

Why dismiss?
○ Quality requirements are higher than shown
○ Already tried, didn't work
○ Business constraints prevent this change
○ Will implement later
○ Other: _______________
[Dismiss] [Dismiss & Don't Show Similar]

ROI Tracking

Track the impact of implemented optimizations:

┌────────────────────────────────────────────────────────────────────┐
│ OPTIMIZATION ROI │
├────────────────────────────────────────────────────────────────────┤
│ │
│ Total Savings (Last 30 Days): $3,240 │
│ │
│ Implemented Optimizations: │
│ ┌────────────────────────────────────────────────────────────────┐│
│ │ Optimization │ Status │ Savings │ Quality │ Impl Date ││
│ ├────────────────────┼─────────┼─────────┼─────────┼────────────┤│
│ │ Model: classify │ Live │ $2,100 │ 98.2% │ Jan 5 ││
│ │ Cache: faq-bot │ Live │ $890 │ 99.1% │ Jan 12 ││
│ │ Batch: classifier │ Rolling │ $250 │ 100% │ Jan 15 ││
│ └────────────────────────────────────────────────────────────────┘│
│ │
└────────────────────────────────────────────────────────────────────┘

Next Steps