From Manual Metadata to Intelligent Video Search
Turning millions of videos into searchable, usable assets with AI.
David Trumpey • January 30, 2026 • AI

Summary
Kurator has one of the largest video libraries on the market, but like most media archives, much of its value was locked behind manual work: tagging, transcription, and metadata all required hands-on effort.
AccelOne helped make over 2.5 million videos easy to search, reference, and purchase. Videos are automatically transcribed, tagged, and summarized, so teams can jump straight to the right moment instead of scrubbing through hours of footage.
Manual tagging is nearly eliminated, the results are reliable enough for day-to-day use, and the cost is dramatically lower than traditional cloud-based approaches.
About Kurator
Kurator is a video licensing and discovery platform within Nimia, serving major media and entertainment buyers with high-value archival and broadcast footage.
Kurator manages a catalog of over 2.5 million videos, representing 30,000+ hours of long-form content, including news, interviews, and historical footage.
The challenge
Kurator’s value lies in helping customers find the right moment inside long video assets, then letting them purchase those assets with confidence in rights management. With millions of videos, search and tagging had become a serious bottleneck.
Several cloud-first and vendor-based approaches were explored early on, but were ultimately rejected due to cost, accuracy, and data transfer trade-offs that made them impractical at Kurator’s scale.
Key challenges included:
- Manual tagging did not scale. Teams spent hours per batch entering transcripts, metadata, keywords, and compliance flags, often with inconsistent results.
- Incomplete metadata hurt discovery. When hundreds of videos were uploaded at once, only partial tagging was applied, limiting search quality and buyer confidence.
- Critical moments were hard to find. Valuable clips, such as short interviews embedded in hour-long broadcasts, required time-consuming manual review.
- Cloud-only AI was cost-prohibitive. Vendor and cloud-only pipelines created unsustainable compute and data transfer costs at Kurator’s scale.
The solution
AccelOne designed and built a hybrid AI video intelligence pipeline optimized for scale, accuracy, and cost control. The system analyzes each video, extracts meaningful signals, and makes long-form content searchable down to the exact moment.
Rather than relying on a single cloud service, AccelOne engineered a multi-model hybrid execution architecture that balances performance with economics. The system was designed to enable easy search and purchase from large, long-tail media libraries, where cost, accuracy, and operational feasibility must be balanced at scale.
AccelOne was the right partner for Kurator, bringing deep experience designing AI systems that are economically viable at production scale, not just technically impressive in isolation.
How the AI Pipeline Works
Hybrid execution model
Heavy video inference runs on on-prem GPU machines, while AWS handles orchestration, staging, and delivery of inputs and outputs. This avoids runaway cloud compute and data transfer costs at multi-million–video scale.
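The article doesn't disclose the exact orchestration mechanism, but conceptually each on-prem machine can run a simple worker loop against AWS-staged jobs. A minimal sketch, assuming an SQS job queue and S3 buckets (all names hypothetical):

```python
import json
import boto3

# Hypothetical names -- the article describes the split (AWS orchestrates,
# on-prem GPUs do the heavy inference) but not the exact services used.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/video-jobs"
RESULTS_BUCKET = "pipeline-results"

sqs = boto3.client("sqs")
s3 = boto3.client("s3")

def process_video(local_path: str) -> dict:
    """Placeholder for the on-prem GPU work: transcription, frame
    sampling, vision analysis, and so on."""
    return {"status": "processed", "source": local_path}

while True:
    # Long-poll the AWS-staged job queue from the on-prem worker.
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        job = json.loads(msg["Body"])  # e.g. {"bucket": "...", "key": "..."}
        local = "/tmp/input.mp4"
        s3.download_file(job["bucket"], job["key"], local)  # stage input

        outputs = process_video(local)  # heavy inference stays on-prem

        # Deliver outputs back to AWS for indexing, then ack the job.
        s3.put_object(Bucket=RESULTS_BUCKET,
                      Key=job["key"] + ".json",
                      Body=json.dumps(outputs))
        sqs.delete_message(QueueUrl=QUEUE_URL,
                           ReceiptHandle=msg["ReceiptHandle"])
```

Bulk video bytes only cross the network once per job, which is what keeps transfer costs bounded.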
Speech-to-text transcription
Audio is extracted from each video and processed using Whisper (large), configured for English-only transcription to maximize accuracy. The output is a time-coded VTT file, enabling keyword search and frame-accurate navigation.
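A minimal sketch of this step using the open-source whisper package (file names are placeholders); each segment's timestamps map directly to WebVTT cues:

```python
import whisper

# Load Whisper large once; inference runs on the local GPU.
model = whisper.load_model("large")

# Pinning the language to English skips detection and, as noted above,
# maximizes accuracy for this catalog.
result = model.transcribe("audio.wav", language="en")

def vtt_ts(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

# Emit time-coded cues so search results can deep-link into the video.
with open("audio.vtt", "w", encoding="utf-8") as f:
    f.write("WEBVTT\n\n")
    for seg in result["segments"]:
        f.write(f"{vtt_ts(seg['start'])} --> {vtt_ts(seg['end'])}\n")
        f.write(seg["text"].strip() + "\n\n")
```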
Cost-aware frame sampling
Instead of analyzing every frame, the pipeline samples one frame every two seconds. This interval was selected through testing to balance coverage, accuracy, and feasibility across millions of videos.
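In OpenCV terms, the sampling loop might look like the sketch below; the read-and-skip decoding strategy is illustrative rather than a description of the production implementation:

```python
import os
import cv2

def sample_frames(video_path: str, interval_s: float = 2.0):
    """Yield (timestamp_seconds, frame) roughly every interval_s seconds."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0   # fall back if FPS is unknown
    step = max(1, round(fps * interval_s))    # frames between samples
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            yield idx / fps, frame
        idx += 1
    cap.release()

# A one-hour broadcast at 30 fps yields ~1,800 sampled frames
# instead of 108,000 -- a 60x reduction in vision workload.
os.makedirs("frames", exist_ok=True)
for t, frame in sample_frames("broadcast.mp4"):
    cv2.imwrite(f"frames/{t:09.1f}.jpg", frame)
```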
On-prem vision analysis
Sampled frames are analyzed using a fast multimodal vision-language model (Gemma 3) running locally on GPUs, selected for its speed and cost efficiency at scale. The model generates concise on-screen descriptions that feed metadata and summaries.
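The article names the model but not the serving stack. A sketch assuming sampled frames are sent to a locally hosted Gemma 3 instance behind an Ollama server; the model tag and prompt are illustrative, not Kurator's actual configuration:

```python
import base64
import requests

# Assumes a local Ollama server hosting a Gemma 3 variant (hypothetical setup).
OLLAMA_URL = "http://localhost:11434/api/generate"

def describe_frame(jpeg_path: str) -> str:
    """Ask the local vision-language model for a concise scene description."""
    with open(jpeg_path, "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode()
    resp = requests.post(OLLAMA_URL, json={
        "model": "gemma3",
        "prompt": "Describe what is shown on screen in one short sentence.",
        "images": [img_b64],
        "stream": False,
    })
    resp.raise_for_status()
    return resp.json()["response"].strip()

print(describe_frame("frames/0000120.0.jpg"))
```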
Gated celebrity detection
Face detection runs first using OpenCV-based computer vision, a lightweight real-time approach that prevents unnecessary downstream processing. Only when faces are present does the system invoke AWS Rekognition for celebrity identification, ensuring high-confidence results while minimizing costly external calls.
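A sketch of the gating logic, combining OpenCV's bundled Haar-cascade face detector with boto3's RecognizeCelebrities call; the 90% confidence threshold is an assumption, not a published figure:

```python
import cv2
import boto3

# Cheap local gate: OpenCV's bundled Haar-cascade face detector.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
rekognition = boto3.client("rekognition")

def identify_celebrities(frame, min_confidence: float = 90.0) -> list:
    """Invoke Rekognition only when the local detector finds a face."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return []  # no faces: skip the paid external call entirely

    ok, jpeg = cv2.imencode(".jpg", frame)
    if not ok:
        return []
    resp = rekognition.recognize_celebrities(Image={"Bytes": jpeg.tobytes()})
    # Keep only high-confidence matches, per the design goal above.
    return [c["Name"] for c in resp["CelebrityFaces"]
            if c["MatchConfidence"] >= min_confidence]
```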
Inference optimization
Frames are resized and combined into mosaic batches before being sent to Rekognition, cutting external API calls by up to 50× while preserving detection accuracy.
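A minimal mosaic builder might look like this; the 7×7 grid and tile size are assumptions, chosen because 49 frames per call roughly matches the stated "up to 50×" reduction:

```python
import cv2
import numpy as np

def build_mosaic(frames, grid: int = 7, tile: int = 320):
    """Tile up to grid*grid frames into one image so a single external
    call covers many frames (7x7 = 49, roughly the 50x figure above)."""
    canvas = np.zeros((grid * tile, grid * tile, 3), dtype=np.uint8)
    for i, frame in enumerate(frames[: grid * grid]):
        r, c = divmod(i, grid)
        canvas[r * tile:(r + 1) * tile,
               c * tile:(c + 1) * tile] = cv2.resize(frame, (tile, tile))
    return canvas
```

In practice the caller also has to map each detection's bounding box back to its tile, recovering which sampled frame, and therefore which timestamp, produced the match.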
Quality and reliability controls
The pipeline filters blank or single-color frames, removes blurred images that can trigger hallucinations, and normalizes transcription artifacts such as repeated phrases. Modular components, retries, and validation steps improve production reliability.
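Two of these controls are easy to sketch, with thresholds shown as illustrative assumptions: variance-based blank-frame detection plus the standard variance-of-Laplacian blur check, and a de-duplication pass over consecutive repeated transcript segments:

```python
import cv2

def is_blank(frame, std_threshold: float = 8.0) -> bool:
    """Near-uniform frames (black frames, slates) have tiny pixel variance."""
    return frame.std() < std_threshold

def is_blurred(frame, blur_threshold: float = 100.0) -> bool:
    """Low variance of the Laplacian is a standard sharpness heuristic."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var() < blur_threshold

def dedupe_segments(segments):
    """Drop consecutive transcript segments with identical text -- a common
    speech-to-text artifact on silence or background music."""
    out = []
    for seg in segments:
        if not out or seg["text"].strip() != out[-1]["text"].strip():
            out.append(seg)
    return out
```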
Outputs include transcripts, detections, structured metadata, and summaries, all indexed back into Kurator’s platform for search and playback.
Results and Impact
Near-Elimination of Manual Tagging
Manual metadata entry is now almost non-existent.
Instead of spending hours tagging each batch of uploads, Kurator’s team:
- Performs a quick spot check
- Adds only information that requires human judgment
This reduced metadata tagging efforts from hours to minutes per batch, freeing teams to focus on quality review rather than manual entry, while improving consistency and completeness across the entire catalog.
Production-Grade Transcription at Scale
Transcription consistently exceeds 95% word accuracy in spot-checked samples, approaching human-level performance under good audio conditions.
High transcription accuracy is critical because transcripts power:
- Keyword search across millions of assets
- Time-based navigation inside long videos
- Downstream metadata extraction and summaries
Cost Efficiency at Enterprise Scale
The hybrid architecture delivered:
- ~100× overall cost reduction compared to cloud-only or vendor pipelines
- Up to ~1000× reduction in specific high-volume processing paths
Instead of six-figure processing contracts, Kurator can scale throughput by adding low-cost GPU machines (approximately $2.5K–$3K each). This shifts video intelligence from a capital-intensive project into a scalable, repeatable operational capability.
Under the Hood
The pipeline combines open-source models (Whisper, Gemma 3, OpenCV) with selective cloud services (AWS Rekognition) and runs primarily on on-prem GPU infrastructure.
This hybrid approach delivers production-grade accuracy while avoiding the cost, lock-in, and unpredictability of cloud-only architectures, making large-scale video intelligence economically sustainable.
The Outcome
Kurator transformed its video library into a searchable, time-addressable catalog of over 2.5 million videos.
What was previously locked behind sparse tagging and manual review is now discoverable by:
- What’s said
- What’s shown
- Who appears
- When it happens
All while balancing cost efficiency with production-grade reliability.
About the Author
David Trumpey, Chief Operating Officer of AccelOne.