Optimization
Ladger’s optimization engine analyzes your AI usage and provides actionable recommendations with estimated savings and quality confidence scores.
Optimization Strategies
Model Switching
Switch to cheaper models for tasks that don’t require full capability:
```
┌─────────────────────────────────────────────────────────────────┐
│ 💡 OPTIMIZATION: Model Switch                                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│ Flow: customer-support → classify-intent                        │
│                                                                 │
│ Current Model:    GPT-4o         $0.030/request                 │
│ Suggested Model:  GPT-3.5-turbo  $0.002/request                 │
│                                                                 │
│ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ │
│                                                                 │
│ Estimated Savings: $2,340/month (93% reduction)                 │
│ Quality Confidence: 97.2% (based on 500 replays)                │
│ Requests Analyzed: 45,230                                       │
│                                                                 │
│ [Simulate] [Apply to 10%] [Apply to All] [Dismiss]              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
Model Switching Matrix
| Task Type | Low Complexity | Medium | High |
|---|---|---|---|
| Code Generation | GPT-3.5 | GPT-4o-mini | GPT-4o |
| Q&A/Retrieval | Haiku | Sonnet | Sonnet |
| Summarization | Haiku | GPT-3.5 | GPT-4o-mini |
| Data Extraction | GPT-3.5 | GPT-3.5 | GPT-4o-mini |
| Classification | Fine-tuned | GPT-3.5 | GPT-4o-mini |
| Planning | Sonnet | GPT-4o | Opus/o1 |
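The matrix above can be encoded as a plain lookup table. A minimal sketch — the task keys and model identifiers below are illustrative, not a Ladger API:

```typescript
// Illustrative encoding of a few rows of the model-switching matrix.
type Complexity = 'low' | 'medium' | 'high';

const MODEL_MATRIX: Record<string, Record<Complexity, string>> = {
  'classification':  { low: 'fine-tuned',    medium: 'gpt-3.5-turbo', high: 'gpt-4o-mini' },
  'summarization':   { low: 'claude-haiku',  medium: 'gpt-3.5-turbo', high: 'gpt-4o-mini' },
  'code-generation': { low: 'gpt-3.5-turbo', medium: 'gpt-4o-mini',   high: 'gpt-4o' },
};

// Fall back to a capable default when the task type is unknown.
function pickModel(taskType: string, complexity: Complexity): string {
  return MODEL_MATRIX[taskType]?.[complexity] ?? 'gpt-4o';
}
```

Keeping the mapping in data rather than branching logic makes it easy to update as recommendations change.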
Prompt Compression
Reduce token usage by optimizing prompts:
| Technique | Savings | Implementation |
|---|---|---|
| Remove redundant context | 10-30% | Low effort |
| Shorten system prompts | 5-15% | Low effort |
| Use examples efficiently | 10-20% | Medium effort |
| Dynamic context injection | 20-40% | Higher effort |
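As a sketch of the last row, dynamic context injection can be as simple as including only the context sections whose topic appears in the query. The `ContextSection` shape and tag matching here are hypothetical; a real system would use retrieval scores:

```typescript
// Hypothetical dynamic context injection: instead of sending a fixed
// boilerplate block, include only sections relevant to this query.
interface ContextSection {
  tag: string;   // topic keyword the section covers
  text: string;  // the section body
}

function buildPrompt(query: string, sections: ContextSection[]): string {
  const q = query.toLowerCase();
  const relevant = sections.filter((s) => q.includes(s.tag));
  return [...relevant.map((s) => s.text), query].join('\n\n');
}
```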
```
┌─────────────────────────────────────────────────────────────────┐
│ 💡 OPTIMIZATION: Prompt Compression                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│ Flow: qa-pipeline → generate-answer                             │
│                                                                 │
│ Current Avg Input:   2,450 tokens                               │
│ Potential Avg Input: 1,715 tokens (30% reduction)               │
│                                                                 │
│ System Prompt Analysis:                                         │
│ • 45% of prompt is repeated boilerplate                         │
│ • Context section averages 800 tokens (often unused)            │
│                                                                 │
│ Estimated Savings: $680/month                                   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
Semantic Caching
Cache responses for semantically similar queries:
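Under the hood this amounts to comparing query embeddings against cached entries. A self-contained sketch with an injected embedding function — a real deployment would call an embedding model and use a vector index:

```typescript
// Minimal semantic cache sketch. The embedder is injected so the logic
// stays self-contained; the 0.92 threshold is an assumed default.
type Embedder = (text: string) => number[];

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class SemanticCache {
  private entries: { vec: number[]; response: string }[] = [];
  constructor(private embed: Embedder, private threshold = 0.92) {}

  // Return a cached response if any stored query is similar enough.
  get(query: string): string | undefined {
    const vec = this.embed(query);
    const hit = this.entries.find((e) => cosine(e.vec, vec) >= this.threshold);
    return hit?.response;
  }

  set(query: string, response: string): void {
    this.entries.push({ vec: this.embed(query), response });
  }
}
```

A linear scan is fine for a sketch; at the cluster counts shown below, an approximate nearest-neighbor index would replace `entries.find`.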
```
┌─────────────────────────────────────────────────────────────────┐
│ 💡 OPTIMIZATION: Semantic Caching                               │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│ Flow: faq-bot                                                   │
│                                                                 │
│ Cache Hit Potential: 35% (based on query similarity)            │
│ Similar Query Groups: 1,245 clusters identified                 │
│                                                                 │
│ Example Cluster:                                                │
│ • "What are your hours?"                                        │
│ • "When are you open?"                                          │
│ • "Business hours please"                                       │
│ → Same response, 3 API calls → 1 API call + 2 cache hits        │
│                                                                 │
│ Estimated Savings: $1,200/month (35% reduction)                 │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
Request Batching
Group compatible requests into single API calls:
```typescript
// Before: 10 separate API calls
for (const item of items) {
  await classify(item); // 10 requests × $0.002 = $0.02
}

// After: 1 batched call
await classifyBatch(items); // 1 request × $0.008 = $0.008 (60% savings)
```
Context Trimming
Optimize RAG context retrieval:
| Issue | Solution | Savings |
|---|---|---|
| Irrelevant chunks | Better retrieval | 20-40% |
| Redundant context | Deduplication | 10-20% |
| Overly large chunks | Smart chunking | 15-25% |
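The deduplication row can be sketched as a normalization pass over retrieved chunks. Exact match after whitespace and case folding stands in here for embedding-based near-duplicate detection:

```typescript
// Drop retrieved chunks that duplicate ones already kept,
// comparing a normalized form (lowercased, whitespace collapsed).
function dedupeChunks(chunks: string[]): string[] {
  const seen = new Set<string>();
  const kept: string[] = [];
  for (const chunk of chunks) {
    const key = chunk.toLowerCase().replace(/\s+/g, ' ').trim();
    if (!seen.has(key)) {
      seen.add(key);
      kept.push(chunk);
    }
  }
  return kept;
}
```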
Optimization Dashboard
```
┌────────────────────────────────────────────────────────────────────┐
│ OPTIMIZATION OPPORTUNITIES                                         │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│ Total Potential Savings: $4,890/month                              │
│                                                                    │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Priority │ Strategy        │ Flow           │ Savings │ Conf. │ │
│ ├──────────┼─────────────────┼────────────────┼─────────┼───────┤ │
│ │ 1        │ Model Switch    │ classify-intent│ $2,340  │ 97%   │ │
│ │ 2        │ Semantic Cache  │ faq-bot        │ $1,200  │ 92%   │ │
│ │ 3        │ Prompt Compress │ qa-pipeline    │ $680    │ 89%   │ │
│ │ 4        │ Model Switch    │ summarizer     │ $450    │ 94%   │ │
│ │ 5        │ Batching        │ classifier     │ $220    │ 96%   │ │
│ └────────────────────────────────────────────────────────────────┘ │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
```
Quality Validation
Before recommending changes, Ladger validates quality:
Validation Process
1. Baseline Capture: store original model outputs
2. Replay Subset: run 5-10% of historical requests through the new model
3. Quality Scoring: compare outputs using semantic similarity
4. Threshold Check: only recommend if quality > 95%
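The Threshold Check step reduces to a simple gate over the replay scores. A minimal sketch:

```typescript
// Gate a recommendation on the average similarity score of replayed
// requests; with no replays there is no evidence, so recommend nothing.
function shouldRecommend(similarityScores: number[], threshold = 0.95): boolean {
  if (similarityScores.length === 0) return false;
  const avg = similarityScores.reduce((sum, s) => sum + s, 0) / similarityScores.length;
  return avg > threshold;
}
```

An average is the simplest aggregate; a production gate might also require a minimum per-request score so one bad output cannot hide behind a good mean.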
Quality Metrics
| Metric | Description | Threshold |
|---|---|---|
| Semantic Similarity | Embedding distance | > 0.95 |
| Structure Match | JSON schema compliance | 100% |
| Key Information | Critical data preserved | 100% |
| Human Eval Sample | Manual review sample | > 90% approval |
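The Structure Match row can be approximated by comparing top-level JSON keys between a baseline output and a candidate output — a stand-in for full schema validation, with hypothetical names:

```typescript
// Check that a candidate model's JSON output has the same top-level
// keys as the baseline. Non-JSON output counts as a structure failure.
function structureMatches(baseline: string, candidate: string): boolean {
  try {
    const a = JSON.parse(baseline);
    const b = JSON.parse(candidate);
    const keys = (o: unknown) => Object.keys(o as object).sort().join(',');
    return keys(a) === keys(b);
  } catch {
    return false;
  }
}
```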
Implementing Recommendations
Gradual Rollout
Apply changes safely with gradual rollout:
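A deterministic traffic split keeps each request in a stable bucket as the percentage grows. One common sketch hashes a stable request or user ID — the hash and bucketing below are illustrative, not how Ladger assigns traffic:

```typescript
// Decide whether a request falls inside the rollout percentage using a
// stable string hash, so the same ID always lands in the same bucket.
function inRollout(requestId: string, percent: number): boolean {
  let hash = 0;
  for (const ch of requestId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // unsigned 32-bit
  }
  return hash % 100 < percent;
}
```

Because the bucket depends only on the ID, raising the percentage from 10 to 50 keeps every request that was already in the rollout inside it.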
```
┌─────────────────────────────────────────────────────────────────┐
│ ROLLOUT: classify-intent Model Switch                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│ Phase 1: 10% of traffic   ████░░░░░░░░░░░░░░░░  Running         │
│ Phase 2: 50% of traffic   ░░░░░░░░░░░░░░░░░░░░  Pending         │
│ Phase 3: 100% of traffic  ░░░░░░░░░░░░░░░░░░░░  Pending         │
│                                                                 │
│ Current Stats (Phase 1):                                        │
│ • Requests: 4,523 / 45,230                                      │
│ • Quality Score: 97.8% ✓                                        │
│ • Cost Saved: $234                                              │
│ • Errors: 0                                                     │
│                                                                 │
│ [Proceed to Phase 2] [Rollback] [View Details]                  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
Code Implementation
Apply model switching in your code:
```typescript
async function classifyIntent(message: string) {
  // Get recommendation from Ladger
  const config = await ladger.getConfig('classify-intent');

  return tracer.trace('classify-intent', async (span) => {
    const model = config.recommendedModel || 'gpt-4o';

    span.setAttributes({
      'optimization.applied': config.optimizationId,
      'model.original': 'gpt-4o',
      'model.actual': model,
    });

    const result = await openai.chat.completions.create({
      model,
      messages: [{ role: 'user', content: message }],
    });

    span.recordCost({
      provider: 'openai',
      model,
      inputTokens: result.usage?.prompt_tokens,
      outputTokens: result.usage?.completion_tokens,
    });

    return result.choices[0].message.content;
  });
}
```
Dismissing Recommendations
Dismiss recommendations that don’t fit your use case:
```
Why dismiss?
○ Quality requirements are higher than shown
○ Already tried, didn't work
○ Business constraints prevent this change
○ Will implement later
○ Other: _______________

[Dismiss] [Dismiss & Don't Show Similar]
```
ROI Tracking
Track the impact of implemented optimizations:
```
┌────────────────────────────────────────────────────────────────────┐
│ OPTIMIZATION ROI                                                   │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│ Total Savings (Last 30 Days): $3,240                               │
│                                                                    │
│ Implemented Optimizations:                                         │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Optimization       │ Status  │ Savings │ Quality │ Impl Date  │ │
│ ├────────────────────┼─────────┼─────────┼─────────┼────────────┤ │
│ │ Model: classify    │ Live    │ $2,100  │ 98.2%   │ Jan 5      │ │
│ │ Cache: faq-bot     │ Live    │ $890    │ 99.1%   │ Jan 12     │ │
│ │ Batch: classifier  │ Rolling │ $250    │ 100%    │ Jan 15     │ │
│ └────────────────────────────────────────────────────────────────┘ │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
```
Next Steps
- Test changes safely with Simulations
- Set up alerts in Cost Analysis