Case Study AI • Media & Entertainment

From Manual Metadata to Intelligent Video Search

Turning 2.5 million videos into searchable, time-addressable assets with a cost-efficient hybrid AI pipeline.

AI Pipeline · Hybrid Architecture · Video Intelligence
client: Kurator • Nimia Jan 2026
2.5M
Videos processed & indexed
~100×
Cost reduction vs. cloud-only
95%+
Transcription word accuracy
30K+
Hours of long-form content
~0
Manual tagging remaining
Near-complete elimination.
hrs → min
Time saved per upload batch
From hours of manual tagging to minutes of quality review.
$2.5K
Cost per additional GPU unit
Scalable throughput by adding on-prem GPUs vs. six-figure contracts.
Kurator
Media & Entertainment · Broadcast Footage · Video Licensing
Kurator • Nimia
Video licensing & discovery

A video discovery platform with millions of high-value assets

Kurator is a video licensing and discovery platform within Nimia, serving major media and entertainment buyers with high-value archival and broadcast footage — including news, interviews, and historical content.

The platform's core value is helping customers find the right moment inside long video assets, then enabling easy purchase with confidence in rights management. But at scale, that promise depended entirely on metadata quality.

2.5M+
Videos in catalog
30K+
Hours of long-form content

Search and tagging had become a serious bottleneck

Several cloud-first and vendor-based approaches were explored, but each was rejected: the cost, accuracy, and data-transfer trade-offs made them impractical at Kurator's scale.

challenge 01

Manual tagging didn't scale

Teams spent hours per batch entering transcripts, metadata, keywords, and compliance flags — often with inconsistent results across the catalog.

Operations

A hybrid AI video intelligence pipeline built for scale

AccelOne designed and built a multi-model hybrid execution architecture that balances performance with economics — running heavy inference on-premises while using cloud services selectively and only when necessary.

Intelligent video processing platform

A hybrid architecture balancing performance, security, and cost.

Up to ~1000× cost reduction
in high-volume processing paths
Video library
Millions of videos · Legacy to 8K
On-premises · GPU machines
Heavy Inference
Gemma 3
OpenCV
Transcription · Video analysis
~$2.5K–$3K per GPU unit
SELECTIVE
Cloud · AWS
Orchestration & delivery
AWS Rekognition
Selective API usage only
Rekognition only when faces detected
▼ OUTPUT
Transcript
VTT · Time-coded
Tags & metadata
Auto-generated · Structured
Celebrity detection
High-confidence · Gated
Searchable index
Frame-accurate navigation
Key Results
Millions of videos processed · Scalable · Secure
Up to ~1000×
Cost reduction (high-volume paths)
95%+
Transcription accuracy
Millions
Videos processed

Six-step AI pipeline

Each component is modular and optimized for cost and reliability at multi-million-video scale. The system analyzes every video, extracts meaningful signals, and makes long-form content searchable down to the exact moment.

Step 01

Hybrid execution model

Heavy video inference runs on on-prem GPU machines, while AWS handles orchestration, staging, and delivery. Avoids runaway cloud costs at scale.

Architecture
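The split can be sketched as a simple routing rule. This is a minimal illustration only; the task names and destinations are assumptions, not Kurator's actual scheduler:

```python
# Hybrid execution routing sketch: heavy inference stays on on-prem GPUs,
# lightweight orchestration and delivery tasks go to the cloud.
# Task names and destinations below are illustrative assumptions.

HEAVY_INFERENCE = {"transcription", "vision_analysis", "face_detection"}
CLOUD_SERVICES = {"orchestration", "staging", "delivery", "celebrity_recognition"}

def route(task: str) -> str:
    """Return the execution target for a pipeline task."""
    if task in HEAVY_INFERENCE:
        return "on_prem_gpu"
    if task in CLOUD_SERVICES:
        return "aws"
    raise ValueError(f"unknown task: {task}")
```

The point of the design is that the expensive, per-frame work never leaves the building; only coordination and selective API calls touch the cloud.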
Step 02

Cost-aware frame sampling

Instead of analyzing every frame, the pipeline samples one frame every two seconds — selected through testing to balance coverage, accuracy, and cost.

1 frame / 2 sec
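For 30 fps footage, sampling one frame every two seconds cuts the analysis volume by ~60×. A minimal sketch of the index math (the two-second default mirrors the case study; fps and interval are parameters):

```python
def sample_frame_indices(total_frames: int, fps: float, interval_s: float = 2.0) -> list[int]:
    """Indices of frames to analyze: one frame every `interval_s` seconds."""
    step = max(1, round(fps * interval_s))
    return list(range(0, total_frames, step))

# A 10-second clip at 30 fps yields 5 sampled frames instead of 300.
indices = sample_frame_indices(total_frames=300, fps=30.0)
```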
Step 03

On-prem vision analysis

Sampled frames are analyzed using Gemma 3 running locally on GPUs. The model generates concise on-screen descriptions that feed metadata and summaries.

Gemma 3
Step 04

Gated celebrity detection

Face detection runs first using OpenCV. Only when faces are present does the system invoke AWS Rekognition for celebrity recognition.

OpenCV
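The gating logic can be sketched with the detector and the cloud recognizer injected as callables. This is a hypothetical structure; in production the detector would be OpenCV face detection and the recognizer an AWS Rekognition client call:

```python
from typing import Callable, Optional

def recognize_celebrities(
    frame,
    detect_faces: Callable,      # e.g. an OpenCV face detector (assumed)
    call_rekognition: Callable,  # e.g. an AWS Rekognition call (assumed)
) -> Optional[list]:
    """Only pay for the cloud API when local detection finds faces."""
    faces = detect_faces(frame)
    if not faces:
        return None  # no faces -> no Rekognition call, no cost
    return call_rekognition(frame)
```

Because most archival footage contains long stretches without faces, the cheap local check absorbs the bulk of frames and the paid API is invoked only for the remainder.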
Step 05

Inference optimization

Frames are resized and combined into mosaic batches before being sent to Rekognition, cutting external API calls by up to 50× while preserving detection accuracy.

50× fewer API calls
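Mosaic batching tiles many resized frames into one composite image, so a single API call covers dozens of frames. A simplified sketch using 2D grayscale arrays (the grid size is illustrative; a 7×7 grid would give roughly the 50× call reduction cited above):

```python
def make_mosaic(frames: list, cols: int) -> list:
    """Tile equally-sized 2D grayscale frames into a cols-wide grid image."""
    h, w = len(frames[0]), len(frames[0][0])
    rows_of_frames = [frames[i:i + cols] for i in range(0, len(frames), cols)]
    mosaic = []
    for row in rows_of_frames:
        # pad the last row with blank frames so every row is `cols` wide
        row = row + [[[0] * w for _ in range(h)]] * (cols - len(row))
        for y in range(h):
            mosaic.append([px for frame in row for px in frame[y]])
    return mosaic

# 49 frames per 7x7 mosaic -> one API call instead of 49.
```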
Step 06

Quality & reliability controls

The pipeline filters blank frames, removes blurred images, normalizes transcription artifacts, and includes modular retries and validation for production reliability.

Production-grade
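Blank and blurred frames can be filtered cheaply before any inference runs. A common heuristic for this (an assumption here; the case study does not name the exact method) is the variance of the Laplacian, sketched on 2D grayscale arrays:

```python
def laplacian_variance(img: list) -> float:
    """Variance of a 4-neighbour Laplacian; low values suggest blank or blurred frames."""
    h, w = len(img), len(img[0])
    vals = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x]
                   + img[y][x - 1] + img[y][x + 1]
                   - 4 * img[y][x])
            vals.append(lap)
    if not vals:
        return 0.0
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def keep_frame(img, threshold: float = 10.0) -> bool:
    """Drop frames with almost no edge energy (blank or heavily blurred)."""
    return laplacian_variance(img) > threshold
```

The threshold is illustrative and would be tuned against the catalog; the same gate also catches fully blank frames, whose variance is exactly zero.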

Real outcomes, measurable impact

Manual tagging is nearly eliminated, results are reliable enough for day-to-day production use, and the cost is dramatically lower than any traditional cloud-based approach.

~100×
Overall cost reduction

Compared to cloud-only or vendor pipelines. Throughput scales by adding low-cost GPU machines at $2.5K–$3K each — shifting video intelligence from a capital project into a repeatable operational capability.

95%+
Transcription word accuracy

Consistently exceeds 95% in spot-checked samples, approaching human-level performance under good audio conditions. Powers keyword search, time-based navigation, and downstream metadata extraction.
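Time-coded transcripts are what make individual moments addressable. A hypothetical sketch of keyword search over WebVTT cues (deliberately minimal, not a full VTT parser):

```python
import re

# Matches a VTT cue line "HH:MM:SS.mmm --> HH:MM:SS.mmm" followed by its text.
CUE = re.compile(r"(\d\d:\d\d:\d\d\.\d\d\d) --> (\d\d:\d\d:\d\d\.\d\d\d)\n(.+)")

def find_moments(vtt: str, keyword: str) -> list:
    """Return (start, end) timestamps of cues whose text contains the keyword."""
    return [
        (start, end)
        for start, end, text in CUE.findall(vtt)
        if keyword.lower() in text.lower()
    ]

# Illustrative transcript fragment (invented content, not Kurator data).
vtt = """WEBVTT

00:01:02.000 --> 00:01:05.500
The launch was delayed by weather.

00:02:10.000 --> 00:02:14.000
Interview with the mission commander.
"""
```

Searching `find_moments(vtt, "launch")` returns the start and end timestamps of the matching cue, which is exactly the jump-to-moment behavior the platform exposes.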

hrs→min
Reduction in tagging time

Manual metadata entry reduced from hours to minutes per batch. Teams perform a quick spot-check and add only information requiring human judgment, freeing them to focus on quality.

~1000×
Cost reduction in high-volume paths

Achieved in specific high-volume processing paths, driven in part by the 50× reduction in external AWS Rekognition calls from mosaic batching.

Cost comparison by architecture approach

Relative cost normalized to cloud-only baseline (100%)

Cloud-only pipeline: 100%
Vendor contract: ~100%
Hybrid pipeline: ~1% of baseline (~100× overall reduction)

Looking for a relevant example or similar engagement?
We’re happy to walk through comparable work in a short conversation.