From Manual Metadata to Intelligent Video Search
Turning 2.5 million videos into searchable, time-addressable assets with a cost-efficient hybrid AI pipeline.
A video discovery platform with millions of high-value assets
Kurator is a video licensing and discovery platform within Nimia, serving major media and entertainment buyers with high-value archival and broadcast footage — including news, interviews, and historical content.
The platform's core value is helping customers find the right moment inside long video assets, then enabling easy purchase with confidence in rights management. But at scale, that promise depended entirely on metadata quality.
Search and tagging had become a serious bottleneck
With millions of videos, search and tagging had become a serious bottleneck. Several cloud-first and vendor-based approaches were explored but rejected due to cost, accuracy, and data transfer trade-offs that made them impractical at Kurator's scale.
Manual tagging didn't scale
Teams spent hours per batch entering transcripts, metadata, keywords, and compliance flags — often with inconsistent results across the catalog.
A hybrid AI video intelligence pipeline built for scale
AccelOne designed and built a multi-model hybrid execution architecture that balances performance with economics — running heavy inference on-premises while using cloud services selectively and only when necessary.
Intelligent video processing platform
A hybrid architecture balancing performance, security, and cost.
7-step AI pipeline
Each component is modular and optimized for cost and reliability at multi-million-video scale. The system analyzes every video, extracts meaningful signals, and makes long-form content searchable down to the exact moment.
Hybrid execution model
Heavy video inference runs on on-prem GPU machines, while AWS handles orchestration, staging, and delivery. Avoids runaway cloud costs at scale.
Cost-aware frame sampling
Instead of analyzing every frame, the pipeline samples one frame every two seconds — selected through testing to balance coverage, accuracy, and cost.
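As a rough sketch of the sampling step, assuming a helper like `sample_frame_indices` (an illustrative name, not from the actual codebase): given a clip's frame rate and duration, it returns the frame indices to extract, one every two seconds.

```python
def sample_frame_indices(fps: float, duration_s: float,
                         interval_s: float = 2.0) -> list[int]:
    """Frame indices to extract: one frame every `interval_s` seconds."""
    step = max(1, round(fps * interval_s))   # frames between samples
    total_frames = int(fps * duration_s)     # frames in the whole clip
    return list(range(0, total_frames, step))

# A 10-second clip at 30 fps yields five sampled frames: 0, 60, 120, 180, 240.
```

The two-second interval keeps per-video inference cost roughly proportional to duration rather than frame count, which is what makes the economics work at catalog scale.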
On-prem vision analysis
Sampled frames are analyzed using Gemma 3 running locally on GPUs. The model generates concise on-screen descriptions that feed metadata and summaries.
Gated celebrity detection
Face detection runs first using OpenCV. Only when faces are present does the system invoke AWS Rekognition for celebrity identification.
Inference optimization
Frames are resized and combined into mosaic batches before being sent to Rekognition, cutting external API calls by up to 50× while preserving detection accuracy.
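The mosaic idea can be sketched as follows, with a hypothetical `make_mosaic` helper: resize each sampled frame to a small tile and pack the tiles into a single grid image, so one API call covers many frames (a 4×4 grid gives 16×; larger grids approach the 50× figure). The nearest-neighbour resize here is a stand-in for `cv2.resize`.

```python
import numpy as np

def make_mosaic(frames: list[np.ndarray], grid: int = 4,
                tile: int = 160) -> np.ndarray:
    """Pack up to grid*grid frames, each resized to tile x tile, into one image."""
    canvas = np.zeros((grid * tile, grid * tile, 3), dtype=np.uint8)
    for i, frame in enumerate(frames[: grid * grid]):
        h, w = frame.shape[:2]
        # Nearest-neighbour downscale via index sampling (stand-in for cv2.resize)
        ys = np.arange(tile) * h // tile
        xs = np.arange(tile) * w // tile
        small = frame[ys][:, xs]
        r, c = divmod(i, grid)  # row-major placement in the grid
        canvas[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile] = small
    return canvas
```

The trade-off is resolution per frame: tiles must stay large enough for the detector to work, which is why the grid size and tile size were tuned rather than maximized.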
Quality & reliability controls
The pipeline filters blank frames, removes blurred images, normalizes transcription artifacts, and includes modular retries and validation for production reliability.
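Two of these filters can be sketched with standard techniques (the thresholds below are illustrative, not the production values): a near-uniform frame is detected by its low pixel variance, and blur by the classic variance-of-Laplacian measure.

```python
import numpy as np

def is_blank(gray: np.ndarray, std_threshold: float = 5.0) -> bool:
    """Near-uniform frames (black leader, fades) carry no visual signal."""
    return float(gray.std()) < std_threshold

def is_blurred(gray: np.ndarray, var_threshold: float = 100.0) -> bool:
    """Variance of the Laplacian: sharp frames have strong edges, blurred don't."""
    lap = (
        -4.0 * gray[1:-1, 1:-1].astype(np.float64)
        + gray[:-2, 1:-1] + gray[2:, 1:-1]   # vertical neighbours
        + gray[1:-1, :-2] + gray[1:-1, 2:]   # horizontal neighbours
    )
    return float(lap.var()) < var_threshold
```

Dropping blank and blurred frames before inference saves GPU time and keeps junk descriptions out of the metadata.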
Real outcomes, measurable impact
Manual tagging is nearly eliminated, results are reliable enough for day-to-day production use, and the cost is dramatically lower than any traditional cloud-based approach.
Costs are roughly 100× lower than comparable cloud-only or vendor pipelines. Throughput scales by adding low-cost GPU machines at $2.5K–$3K each — shifting video intelligence from a capital project into a repeatable operational capability.
Transcription accuracy consistently exceeds 95% in spot-checked samples, approaching human-level performance under good audio conditions. It powers keyword search, time-based navigation, and downstream metadata extraction.
Manual metadata entry reduced from hours to minutes per batch. Teams perform a quick spot-check and add only information requiring human judgment, freeing them to focus on quality.
External API calls to AWS Rekognition are reduced by up to 50× in specific high-volume processing paths, via the mosaic batching optimization.
Cost comparison by architecture approach
Relative cost normalized to cloud-only baseline (100%)
Open-source intelligence, cloud applied selectively
The pipeline combines open-source models with selective cloud services, running primarily on on-prem GPU infrastructure to deliver production-grade accuracy while avoiding cost, lock-in, and unpredictability.
Whisper
Speech-to-text transcription. Time-coded VTT output for keyword search and frame-accurate navigation.
Open Source
Gemma 3
On-prem vision-language model for frame analysis. Generates on-screen descriptions that feed metadata and summaries.
Open Source
OpenCV
Lightweight real-time face detection. Acts as a gate to prevent unnecessary stream processing and API costs.
Open Source
Rekognition
Celebrity identification. Used selectively — only invoked after face detection confirms a face is present in frame.
Cloud · Selective
This hybrid approach delivers production-grade accuracy while avoiding the cost, lock-in, and unpredictability of cloud-only architectures — making large-scale video intelligence economically sustainable at 2.5M+ video scale.
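The time-coded VTT output from the transcription step can be sketched as a small formatter, assuming the segment dicts produced by openai-whisper's `model.transcribe(...)["segments"]` (each carrying `start`, `end`, and `text`); the helper names are illustrative.

```python
def _ts(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

def segments_to_vtt(segments: list[dict]) -> str:
    """Turn Whisper-style segments into a WebVTT cue list for keyword search."""
    cues = [
        f"{_ts(seg['start'])} --> {_ts(seg['end'])}\n{seg['text'].strip()}"
        for seg in segments
    ]
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"
```

Because each cue keeps its original timecodes, a keyword hit in the text maps directly back to a playable position in the source video.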
A searchable, time-addressable catalogue spanning millions of hours
What was previously locked behind sparse tagging and manual review is now discoverable at scale — while balancing cost efficiency with production-grade reliability.
Hours per batch of manual tagging with inconsistent results across the catalog
Incomplete metadata limited search quality and buyer confidence at point of purchase
Manual scrubbing required to find short clips inside hour-long broadcast footage
Six-figure vendor contracts for cloud-only AI at Kurator's scale — cost-prohibitive
Minutes of spot-check per batch. Tagging is automatic, consistent, and complete.
95%+ transcription accuracy powering search, navigation, and metadata across the full catalog
Frame-accurate navigation — jump directly to any moment in millions of hours of content
~100× cost reduction. GPU on-prem at $2.5K–$3K per unit for scalable, repeatable throughput
The catalogue is now discoverable by:
What's said
Keyword search via time-coded VTT transcription
What's shown
Frame-level descriptions from Gemma 3 vision analysis
Who appears
Celebrity identification via gated AWS Rekognition
When it happens
Timecode-level navigation inside any long-form asset
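The "what's said" and "when it happens" axes combine in practice: a keyword lookup over the time-coded VTT cues returns jump-to timestamps. A minimal sketch, assuming standard WebVTT cue formatting (the `find_keyword` helper is illustrative):

```python
import re

# Matches a WebVTT cue: a timestamp line followed by one line of cue text.
_CUE_RE = re.compile(
    r"(\d\d:\d\d:\d\d\.\d{3}) --> \d\d:\d\d:\d\d\.\d{3}\n(.+)"
)

def find_keyword(vtt_text: str, keyword: str) -> list[str]:
    """Return start timestamps of cues containing `keyword` (case-insensitive)."""
    return [
        m.group(1)
        for m in _CUE_RE.finditer(vtt_text)
        if keyword.lower() in m.group(2).lower()
    ]
```

Each returned timestamp is a direct seek target, which is what turns an hour-long broadcast into a moment-addressable asset.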