Case Study · 18 min read

From 6-Hour Manual Content Review to 17 Minutes: A Vertex AI Build for a Streaming Platform

Step-by-step walkthrough of a production entertainment deployment — Gemini Vision content tagging pipeline, Vertex AI Search index design, real-time update architecture, and A/B test measurement, including the decisions we got right and the ones we had to redo.

Average order value: +28%
Recommendation click-through rate: 4.1x
Annual revenue uplift: $1.8M
Content tagging time: 17 min (from 6 hrs)

The Client and the Problem

A global streaming platform with 85 million subscribers and a catalogue of 240,000 hours of content was operating on a manual content tagging and cataloguing process. Each new title required human editorial review — assigning genre tags, content descriptors, mood labels, and thematic keywords — taking an average of 6 hours per title.

The result: 22% of the catalogue was effectively undiscoverable via their search and recommendation systems because it lacked sufficient tag coverage. New content took days to become discoverable after upload, affecting engagement on new releases during their critical first-week window.

The engineering team had evaluated traditional computer vision classification models but found them too narrow: each model could classify only one taxonomy dimension (genre, mood, age rating) and required separate training data for each. They needed a single system that could cover the full tag taxonomy in one pass.

Why Gemini Vision and Vertex AI Search

Gemini Vision's multimodal capability was the decisive factor: a single Gemini Vision call can analyse video frames, audio transcripts, and textual metadata simultaneously — generating a comprehensive tag set across all taxonomy dimensions in one pass. Traditional CV models would have required separate models and separate inference pipelines for each tag category.
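
To make the one-pass claim concrete, here is a minimal sketch of that call using the Vertex AI Python SDK: a single request carrying sampled video, the transcript, and a taxonomy prompt. The project, bucket path, model name, and taxonomy keys are illustrative, not the production values.

```python
# Minimal sketch (illustrative IDs): one multimodal Gemini request that
# returns the full tag set. The taxonomy keys in the prompt are examples.
import vertexai
from vertexai.generative_models import GenerativeModel, Part

vertexai.init(project="my-project", location="us-central1")
model = GenerativeModel("gemini-1.5-pro")

prompt = (
    "You are a content cataloguer. From the video and transcript, return JSON "
    "with keys: genres, moods, themes, content_descriptors, age_rating."
)

response = model.generate_content([
    Part.from_uri("gs://catalogue-media/title-123/episode.mp4", mime_type="video/mp4"),
    Part.from_text("Transcript: ..."),
    Part.from_text(prompt),
])
print(response.text)  # one pass, every taxonomy dimension
```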

Vertex AI Search provided the discovery layer: once titles were tagged, Vertex AI Search indexed the enriched metadata with semantic embeddings, enabling the recommendation engine and search interface to surface content based on meaning rather than keyword matching. A user searching for "intense psychological drama" finds relevant content even when those exact words don't appear in the title or description.
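
On the query side, a semantic lookup is a short Discovery Engine call. This is a sketch assuming a data store holding the enriched metadata; project, location, and data store IDs are placeholders.

```python
# Sketch of the discovery layer (placeholder project/data store IDs):
# a semantic query that needs no keyword overlap with titles or descriptions.
from google.cloud import discoveryengine_v1 as discoveryengine

client = discoveryengine.SearchServiceClient()
serving_config = client.serving_config_path(
    project="my-project",
    location="global",
    data_store="catalogue-datastore",   # hypothetical data store ID
    serving_config="default_search",
)

request = discoveryengine.SearchRequest(
    serving_config=serving_config,
    query="intense psychological drama",
    page_size=10,
)

for result in client.search(request):
    print(result.document.id)           # titles ranked by semantic relevance
```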

How We Built It — Phase by Phase

01

Discovery & Architecture Design (Week 1)

  • Audit existing recommendation logic and data signals
  • Map user journey touchpoints where personalisation applies (homepage, product detail pages, search results, cart, email)
  • Design Gemini embedding pipeline and Bigtable signal store schema
  • Select Vertex AI Matching Engine configuration for sub-100ms ANN retrieval
  • Define A/B testing framework and primary success metrics
02

Signal Pipeline & Embedding Index (Weeks 2–3)

  • Build real-time user signal ingestion pipeline with Pub/Sub and Dataflow
  • Generate Gemini text embeddings for the full product catalogue (name, description, attributes, category); see the embedding sketch after this list
  • Configure Vertex AI Matching Engine index with approximate nearest-neighbour parameters tuned to the catalogue size
  • Implement Bigtable user feature store with sub-millisecond read latency
  • Build cold-start fallback using session signals and contextual features
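
The embedding step looks roughly like the sketch below: concatenate item metadata into one text, embed it, and write vectors in the JSONL shape that Matching Engine batch ingestion expects. The model name, fields, and output path are assumptions for illustration.

```python
# Sketch of the catalogue embedding job (assumed model name and fields).
import json

from vertexai.language_models import TextEmbeddingModel

model = TextEmbeddingModel.from_pretrained("text-embedding-004")

def embed_items(items: list[dict]) -> list:
    # One text per item: name, description, attributes, category concatenated.
    texts = [
        f"{i['name']}. {i['description']}. {' '.join(i['attributes'])}. {i['category']}"
        for i in items
    ]
    return model.get_embeddings(texts)  # batched in production; tiny list here

items = [{"id": "item-1", "name": "...", "description": "...",
          "attributes": ["..."], "category": "..."}]

with open("embeddings.jsonl", "w") as f:
    for item, emb in zip(items, embed_items(items)):
        # Matching Engine batch ingestion reads {"id": ..., "embedding": [...]}.
        f.write(json.dumps({"id": item["id"], "embedding": emb.values}) + "\n")
```
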
03

Recommendation Serving Layer (Week 4)

  • Deploy recommendation serving API on Cloud Run with autoscaling
  • Implement pre-fetching and session-level caching for homepage and high-traffic pages (sketched after this list)
  • Integrate serving API with e-commerce platform (Shopify, Magento, custom) via REST
  • Build email personalisation pipeline triggered by Pub/Sub recommendation events
  • Configure Looker Studio dashboard for real-time recommendation metrics
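
The caching pattern from this phase, sketched as a thin FastAPI service: a session-level TTL cache sits in front of the feature store and the ANN index. The endpoint shape is illustrative; the two helpers are placeholders for the Bigtable read and the Matching Engine lookup.

```python
# Sketch of the Cloud Run serving layer (placeholder helpers).
from cachetools import TTLCache
from fastapi import FastAPI

app = FastAPI()
session_cache: TTLCache = TTLCache(maxsize=100_000, ttl=1800)  # ~30-min sessions

def load_user_vector(user_id: str) -> list[float]:
    return [0.0] * 768                        # placeholder: Bigtable feature read

def query_index(vector: list[float], k: int) -> list[str]:
    return [f"title-{i}" for i in range(k)]   # placeholder: ANN retrieval

@app.get("/recommendations/{user_id}")
def recommendations(user_id: str, k: int = 20) -> list[str]:
    # Warm sessions are served from cache; cold ones pay one full retrieval
    # (feature-store read + ANN lookup) and cache the top-100 for the session.
    if user_id not in session_cache:
        session_cache[user_id] = query_index(load_user_vector(user_id), k=100)
    return session_cache[user_id][:k]
```
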
04

A/B Test & Measurement (Weeks 5–6)

  • Configure A/B test with 50/50 traffic split: control (existing logic) vs. treatment (Vertex AI personalisation)
  • Instrument CTR, conversion rate, average order value, and revenue per session event tracking
  • Run the test for a minimum of 2 weeks to reach statistical significance
  • Present a full results report with lift metrics, confidence intervals, and a go/no-go recommendation (see the significance sketch after this list)
  • Optional: deploy multi-armed bandit for continuous optimisation post-experiment
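
The read-out behind the go/no-go call can be as simple as a two-proportion z-test with a confidence interval on the lift. A minimal sketch, with illustrative counts:

```python
# Two-proportion z-test on conversion rate, plus relative lift and a 95% CI.
from math import sqrt
from statistics import NormalDist

def lift_report(conv_c: int, n_c: int, conv_t: int, n_t: int) -> dict:
    p_c, p_t = conv_c / n_c, conv_t / n_t
    pooled = (conv_c + conv_t) / (n_c + n_t)
    z = (p_t - p_c) / sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Unpooled standard error for the CI on the absolute difference.
    se = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    ci = (p_t - p_c - 1.96 * se, p_t - p_c + 1.96 * se)
    return {"relative_lift": (p_t - p_c) / p_c, "p_value": p_value, "ci_95": ci}

# Illustrative counts: 3.0% control vs 3.3% treatment conversion.
print(lift_report(conv_c=3_000, n_c=100_000, conv_t=3_300, n_t=100_000))
```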

Key Architecture Decisions Explained

Why Gemini embeddings over collaborative filtering?

Collaborative filtering (item-item or user-item matrix factorisation) requires dense interaction history — it fails for new products and new users. Gemini embeddings capture semantic meaning from product attributes, so new products are immediately recommendable and cross-category discovery is natural. For a catalogue of this size with frequent new arrivals, Gemini embeddings dramatically outperform collaborative filtering on catalogue coverage.
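
A toy illustration of the difference: similarity comes from the content embedding, not from interaction history, so an item with zero interactions already has ranked neighbours. The vectors below are random stand-ins for real Gemini embeddings.

```python
# New-item discovery with zero interaction history (random stand-in vectors).
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
catalogue = {f"item-{i}": rng.standard_normal(768) for i in range(1_000)}
new_item = rng.standard_normal(768)   # just ingested: no views, no purchases

# Ranked neighbours exist the moment the embedding is written.
nearest = sorted(catalogue, key=lambda k: cosine(new_item, catalogue[k]), reverse=True)[:10]
print(nearest)
```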

Why Vertex AI Matching Engine over a self-managed vector database?

Vertex AI Matching Engine is Google's managed ANN service — it handles index updates, scaling, and serving infrastructure automatically. For a production streaming platform at 85M users, the managed infrastructure matters: we don't want to run and scale a vector database cluster. Matching Engine supports brute-force (exact) and ScaNN (approximate) retrieval, with configurable accuracy/latency tradeoffs, and integrates natively with Vertex AI and Gemini under the same IAM and VPC perimeter.
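
For reference, creating the ScaNN-backed index through the Vertex AI SDK looks roughly like this. The parameter values are assumptions that show where the accuracy/latency knobs live, not our production settings; create_brute_force_index is the exact-retrieval alternative.

```python
# Sketch: ScaNN (tree-AH) index creation with tunable recall/latency knobs.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholder IDs

index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name="catalogue-ann-index",
    contents_delta_uri="gs://catalogue-embeddings/",  # JSONL vectors from the batch job
    dimensions=768,                    # must match the embedding model
    approximate_neighbors_count=100,
    leaf_nodes_to_search_percent=7,    # higher = better recall, more latency
)
```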

Why Bigtable for the user feature store?

The user feature store needs sub-millisecond read latency at millions of QPS — that's Bigtable's design point. Redis can work at smaller scale but adds operational complexity. Bigtable is managed, scales horizontally without resharding, and integrates with Dataflow for streaming writes. We use a row key design that prefixes user_id with a short hash so keys spread evenly across tablets and avoid hotspots, and we set TTLs on historical interaction features to control storage costs.
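
A minimal sketch of that design, with placeholder instance and table names: a short hash prefix on the row key spreads sequential user IDs across tablets, and a max-age garbage-collection rule expires old interaction cells.

```python
# Row-key salting plus TTL via a GC rule (placeholder instance/table names).
import datetime
import hashlib

from google.cloud import bigtable
from google.cloud.bigtable import column_family

client = bigtable.Client(project="my-project", admin=True)
table = client.instance("reco-instance").table("user_features")

# Cells in this family older than 30 days are garbage-collected automatically.
table.create(column_families={
    "interactions": column_family.MaxAgeGCRule(datetime.timedelta(days=30)),
})

def row_key(user_id: str) -> bytes:
    prefix = hashlib.md5(user_id.encode()).hexdigest()[:4]  # spreads hot key ranges
    return f"{prefix}#{user_id}".encode()

row = table.direct_row(row_key("user-42"))
row.set_cell("interactions", b"last_watched", b"title-123")
row.commit()
```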

What We Learned

Cold-start is the hardest problem — plan for it upfront

Don't assume you can launch with collaborative filtering for warm users and add cold-start handling later. Design the cold-start fallback as a first-class feature: session signals, entry source, device, time-of-day, and category-level popularity need to be wired from day one.
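
In code terms, the fallback is a first-class branch rather than a patch. A sketch with hypothetical helper and signal names:

```python
# Cold-start fallback chain (hypothetical helpers and signal names).
from dataclasses import dataclass

@dataclass
class Session:
    entry_source: str   # e.g. "email", "search", "direct"
    device: str
    hour_of_day: int

def lookup_history(user_id: str):
    return None                                # placeholder: feature-store read

def personalised_ann(history) -> list[str]:
    return []                                  # placeholder: embedding retrieval

def popular_for_context(source: str, device: str, hour: int) -> list[str]:
    return ["title-1", "title-2"]              # placeholder: popularity table

def recommend(user_id: str, session: Session) -> list[str]:
    history = lookup_history(user_id)
    if history:                                # warm path: personalised ANN
        return personalised_ann(history)
    # Cold path, wired from day one: context-conditioned popularity.
    return popular_for_context(session.entry_source, session.device, session.hour_of_day)

print(recommend("new-user", Session("email", "mobile", 20)))
```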

Pre-fetch recommendations at session start — don't generate on demand

Generating recommendations on every page render adds 100–300ms to load time. We pre-fetch the top-100 recommendations for each user at session initialisation, cache them in the browser session, and refresh asynchronously. The serving latency the user experiences is effectively zero.
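
A sketch of the pattern, with fetch_top_k standing in for a call to the serving API:

```python
# Session-start pre-fetch: pay retrieval once, render pages from the cache.
import asyncio

async def fetch_top_k(user_id: str, k: int) -> list[str]:
    return [f"title-{i}" for i in range(k)]    # placeholder: serving API call

class SessionRecs:
    def __init__(self, user_id: str):
        self.user_id = user_id
        self.recs: list[str] = []

    async def start(self) -> None:
        self.recs = await fetch_top_k(self.user_id, k=100)  # one-time session cost

    def for_slot(self, n: int = 10) -> list[str]:
        return self.recs[:n]                   # page render: slice the cache

    async def refresh(self) -> None:
        # Runs off the render path; never blocks a page.
        self.recs = await fetch_top_k(self.user_id, k=100)

async def main():
    session = SessionRecs("user-42")
    await session.start()
    print(session.for_slot())

asyncio.run(main())
```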

Run A/B tests for at least 2 weeks — not 3 days

Day-of-week effects, email campaign traffic, and weekend browsing patterns mean 3-day tests produce misleading results. We always run for a minimum of 2 weeks covering at least 2 full business cycles, and we pre-register our primary metric and minimum detectable effect before the test starts to avoid p-hacking.
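
Pre-registering the minimum detectable effect also fixes the sample size, and therefore the duration, before launch. A minimal sketch of that calculation, assuming a two-sided test at alpha 0.05 with 80% power:

```python
# Required users per arm for a relative MDE on a conversion-style metric.
from statistics import NormalDist

def required_n_per_arm(p_base: float, mde: float,
                       alpha: float = 0.05, power: float = 0.8) -> int:
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_t = p_base * (1 + mde)                   # treatment rate under the MDE
    p_bar = (p_base + p_t) / 2
    num = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5 +
           z_b * (p_base * (1 - p_base) + p_t * (1 - p_t)) ** 0.5) ** 2
    return int(num / (p_base - p_t) ** 2) + 1

# e.g. 3% baseline conversion, 5% relative lift -> ~208k users per arm.
print(required_n_per_arm(p_base=0.03, mde=0.05))
```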

Business stakeholders need to own the success metrics

The ML team wanted to optimise for recommendation CTR. The business team cared about revenue per session. These are different objectives and can pull in different directions. We align on one primary metric before building anything, and make the business team sign off on the A/B test design before launch.

Build This for Your Organisation

Whether you're tagging content, personalising recommendations, or building discovery — we can implement the same Vertex AI architecture for your use case. Fixed-price. 2-week risk-free pilot.