🎯 Executive Summary
Last Updated: 2025-09-15 10:26 UTC
Best Framework by Metric
Metric | Winner | Winner Score | Runner-up | Runner-up Score |
Speed (files/sec) | kreuzberg_sync | 1.33 | extractous | 1.33 |
Memory Efficiency | kreuzberg_sync | 62.5 MB | extractous | 62.5 MB |
Quality Score | kreuzberg_sync | 85.0% | extractous | 85.0% |
Success Rate | kreuzberg_sync | 100.0% | extractous | 100.0% |
How we calculate these metrics
- Best Speed: Framework with lowest average extraction time for this file type
- Best Memory: Framework with lowest average peak memory usage (RSS) for this file type
- Best Quality: Framework with highest average quality score (0-100 scale)
- Avg Time: Mean extraction time across all frameworks
Python pseudocode for calculations
| # Best Speed calculation
# fw_results is assumed to map each framework name to its list of results
best_speed_fw = min(frameworks, key=lambda fw:
    sum(result.extraction_time for result in fw_results[fw]) / len(fw_results[fw]))
# Best Memory calculation
best_memory_fw = min(frameworks, key=lambda fw:
    sum(result.peak_memory_mb for result in fw_results[fw]) / len(fw_results[fw]))
# Best Quality calculation (missing quality scores count as 0)
best_quality_fw = max(frameworks, key=lambda fw:
    sum(result.quality_score or 0 for result in fw_results[fw]) / len(fw_results[fw]))
# Average time across all frameworks
avg_time = sum(all_extraction_times) / total_test_count
|
File Type | Files | Best Speed | Best Memory | Best Quality | Avg Time (s) |
DOCX | 9 | kreuzberg_sync | kreuzberg_sync | kreuzberg_sync | 0.75 |
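The per-file-type columns can be reproduced with a small self-contained sketch. The `Result` dataclass, the `winners` helper, and the framework names `fw_a` and `fw_b` are hypothetical stand-ins chosen for illustration; they are not taken from the benchmark's code, and the numbers are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Result:  # hypothetical stand-in for the benchmark's result record
    framework: str
    extraction_time: float          # seconds
    peak_memory_mb: float           # peak RSS in MB
    quality_score: Optional[float]  # 0-100, None when not assessed


def winners(results):
    """Return the best framework per metric for one file type."""
    by_fw = defaultdict(list)
    for r in results:
        by_fw[r.framework].append(r)

    def avg(attr):
        # A missing value (None) is counted as 0, mirroring `quality_score or 0`.
        return {fw: mean(getattr(r, attr) or 0 for r in rs) for fw, rs in by_fw.items()}

    times, memory, quality = avg("extraction_time"), avg("peak_memory_mb"), avg("quality_score")
    return {
        "best_speed": min(times, key=times.get),        # lowest mean time
        "best_memory": min(memory, key=memory.get),     # lowest mean peak memory
        "best_quality": max(quality, key=quality.get),  # highest mean quality
        "avg_time": mean(r.extraction_time for r in results),
    }


# Made-up sample data for two hypothetical frameworks.
sample = [
    Result("fw_a", 0.5, 80.0, 90.0),
    Result("fw_a", 0.7, 60.0, None),
    Result("fw_b", 0.8, 50.0, 70.0),
    Result("fw_b", 0.8, 50.0, 80.0),
]
summary = winners(sample)
print(summary)
```

In this sample `fw_a` wins on speed while `fw_b` wins on memory and quality, because the unassessed file pulls the `fw_a` quality average down to 45.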
Calculation Details
- Avg Speed: Files per second throughput for this size category
- Avg Memory: Average peak memory usage across all files in category
- Success Rate: Percentage of successful extractions vs total attempts
- Best Framework: Framework with best combined performance score
Python code for file size metrics
| # Average speed (throughput) calculation
def calc_avg_speed(results):
    total_files = len(results)
    total_time = sum(result.extraction_time for result in results)
    # Guard against empty input or a zero total time
    return total_files / total_time if total_time > 0 else 0  # files per second

# Average memory calculation
def calc_avg_memory(results):
    return sum(result.peak_memory_mb for result in results) / len(results) if results else 0

# Success rate calculation
def calc_success_rate(results):
    successful = sum(1 for result in results if result.status == ExtractionStatus.SUCCESS)
    return (successful / len(results) if results else 0) * 100

# Best framework calculation
def find_best_framework(frameworks_results):
    scores = {}
    for framework, results in frameworks_results.items():
        success_rate = calc_success_rate(results) / 100  # normalize to 0-1
        throughput = calc_avg_speed(results)
        scores[framework] = success_rate * throughput  # combined score
    return max(scores.items(), key=lambda x: x[1])[0]  # framework with max score
|
Size Category | Files | Avg Speed (f/s) | Avg Memory (MB) | Success Rate | Best Framework |
Tiny | 9 | 1.33 | 62.5 | 100.0% | kreuzberg_sync |
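To see how the combined score (success rate multiplied by throughput) behaves, here is a minimal runnable sketch. The `ExtractionStatus` enum and `Result` dataclass below are simplified stand-ins, and the two framework names and their timings are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ExtractionStatus(Enum):  # simplified stand-in, not the benchmark's enum
    SUCCESS = "success"
    FAILED = "failed"


@dataclass
class Result:  # hypothetical minimal result record
    extraction_time: float  # seconds
    status: ExtractionStatus


def combined_score(results):
    """Success rate (0-1) multiplied by throughput in files per second."""
    total_time = sum(r.extraction_time for r in results)
    if not results or total_time <= 0:
        return 0.0
    ok = sum(1 for r in results if r.status is ExtractionStatus.SUCCESS)
    return (ok / len(results)) * (len(results) / total_time)


OK, BAD = ExtractionStatus.SUCCESS, ExtractionStatus.FAILED
runs = {
    # 4 files in 0.8 s, half of them failing
    "fast_but_flaky": [Result(0.2, OK), Result(0.2, BAD), Result(0.2, BAD), Result(0.2, OK)],
    # 4 files in 2.0 s, all succeeding
    "slow_but_reliable": [Result(0.5, OK) for _ in range(4)],
}
scores = {name: combined_score(results) for name, results in runs.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Because the score is a plain product, a framework that fails half its files can still rank first if it is more than twice as fast, as the made-up data here shows. That is worth keeping in mind when reading the Best Framework column.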
Framework Comparison Matrix
Grading System
Grade Scale: A+ (95-100), A (90-94), B+ (85-89), B (80-84), C+ (75-79), C (70-74), D (60-69), F (<60)
Overall Score is a weighted average (Speed 30% + Memory 20% + Quality 30% + Success 20%), converted to a 0-5 scale.
Python code for grading system
| def score_to_grade(score):
    '''Convert 0-100 score to letter grade'''
    if score >= 95: return "A+"
    elif score >= 90: return "A"
    elif score >= 85: return "B+"
    elif score >= 80: return "B"
    elif score >= 75: return "C+"
    elif score >= 70: return "C"
    elif score >= 60: return "D"
    else: return "F"

def calculate_overall_score(results):
    # Speed: files per second (normalized to 0-100)
    total_time = sum(r.extraction_time for r in results)
    speed = len(results) / total_time if total_time > 0 else 0
    speed_score = min(speed * 10, 100)  # cap at 100
    # Memory: lower is better (inverted score)
    avg_memory = sum(r.peak_memory_mb for r in results) / len(results) if results else 0
    memory_score = max(0, 100 - min(avg_memory, 100))
    # Quality: direct average of quality scores
    quality_score = sum(r.quality_score or 0 for r in results) / len(results) if results else 0
    # Success rate: percentage successful
    success_rate = (sum(1 for r in results if r.status == ExtractionStatus.SUCCESS) / len(results) * 100) if results else 0
    # Weighted composite score (0-5 scale)
    overall = (
        speed_score * 0.3 +
        memory_score * 0.2 +
        quality_score * 0.3 +
        success_rate * 0.2
    ) / 100 * 5  # convert to 0-5 scale
    return {
        "speed_grade": score_to_grade(speed_score),
        "memory_grade": score_to_grade(memory_score),
        "quality_grade": score_to_grade(quality_score),
        "success_rate": success_rate,
        "overall_score": overall
    }
|
Framework | Formats | Speed Grade | Memory Grade | Quality Grade | Success % | Overall Score |
extractous | 52 | F | F | B+ | 100.0% | 3.6 |
kreuzberg_sync | 25 | F | F | B+ | 100.0% | 3.6 |
markitdown | 19 | F | F | B+ | 100.0% | 3.6 |
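As a worked example of the letter grades, the sketch below feeds the Executive Summary figures (1.33 files/sec, 62.5 MB, 85.0% quality) through the sub-score formulas and the grade scale. The `GRADE_FLOORS` table and the `sub_scores` helper are illustrative names, not part of the benchmark's code.

```python
# Lower bound of each grade band, highest first (from the Grade Scale above).
GRADE_FLOORS = [(95, "A+"), (90, "A"), (85, "B+"), (80, "B"), (75, "C+"), (70, "C"), (60, "D")]


def score_to_grade(score):
    """Convert a 0-100 score to a letter grade."""
    for floor, grade in GRADE_FLOORS:
        if score >= floor:
            return grade
    return "F"


def sub_scores(files_per_sec, avg_memory_mb, quality):
    """Normalize raw metrics to 0-100 scores using the formulas shown above."""
    return {
        "speed": min(files_per_sec * 10, 100),           # 10 files/sec or more scores 100
        "memory": max(0, 100 - min(avg_memory_mb, 100)),  # 100 MB or more scores 0
        "quality": quality,                               # already on a 0-100 scale
    }


scores = sub_scores(files_per_sec=1.33, avg_memory_mb=62.5, quality=85.0)
grades = {name: score_to_grade(value) for name, value in scores.items()}
print(scores)
print(grades)
```

The speed score comes out at about 13.3 and the memory score at 37.5, both in the F band, while a quality score of 85.0 sits exactly on the B+ floor. Under these formulas a framework needs at least 6 files/sec to leave the F band for speed, and an average peak memory of at most 40 MB to leave it for memory.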
🔍 Key Findings
- Fastest Framework: kreuzberg_sync (1.33 files/sec)
- Most Memory Efficient: kreuzberg_sync (62.5 MB avg)
- Best Quality: kreuzberg_sync (85.0% score)
- Most Challenging Format: DOCX (100.0% success)
📈 Methodology
Our benchmarks test 3 frameworks across 1 file type with 9 total test runs.
- Quality Assessment: Enabled by default
- Performance Profiling: CPU and memory tracked at 50ms intervals
- Timeout Protection: 300 seconds per file
- Test Categories: All file sizes from <100KB to >50MB
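The profiler itself is not shown in this report. As a rough sketch of what tracking peak memory at 50 ms intervals can look like, the snippet below polls a reader function on a background thread and keeps the largest value seen. `PeakSampler` and `read_mb` are hypothetical names, and the fake reader stands in for a real function that would return the process RSS in MB.

```python
import threading
import time


class PeakSampler:
    """Poll `read_mb` on a background thread and remember the largest value seen."""

    def __init__(self, read_mb, interval_s=0.05):
        self.read_mb = read_mb        # callable returning current memory in MB (assumed)
        self.interval_s = interval_s  # 0.05 s matches the 50 ms interval above
        self.peak_mb = 0.0
        self.samples = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while True:
            self.peak_mb = max(self.peak_mb, self.read_mb())
            self.samples += 1
            # Event.wait returns True as soon as stop is requested.
            if self._stop.wait(self.interval_s):
                break

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        return False


# Fake readings for demonstration; a real reader would query the OS.
readings = iter([10.0, 70.0, 30.0])


def fake_reader():
    return next(readings, 20.0)


with PeakSampler(fake_reader, interval_s=0.01) as sampler:
    time.sleep(0.2)  # stands in for the extraction being measured

print(sampler.peak_mb, sampler.samples)
```

Sampling at a fixed interval can miss spikes shorter than the interval, so a peak measured this way is a lower bound on the true peak.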