Step-by-step walkthrough of a production entertainment deployment — Gemini Vision content tagging pipeline, Vertex AI Search index design, real-time update architecture, and A/B test measurement. Including the decisions we got right, and the ones we had to redo.
A global streaming platform with 85 million subscribers and a catalogue of 240,000 hours of content relied on a manual tagging and cataloguing process. Each new title required human editorial review — assigning genre tags, content descriptors, mood labels, and thematic keywords — averaging 6 hours per title.
The result: 22% of the catalogue was effectively undiscoverable via their search and recommendation systems because it lacked sufficient tag coverage. New content took days to become discoverable after upload, affecting engagement on new releases during their critical first-week window.
The engineering team had evaluated traditional computer vision classification models but found them too narrow — each model could classify one taxonomy dimension (genre, mood, age rating) and required separate training data for each. They needed a single system that could cover the full tag taxonomy in a single pass.
Gemini Vision's multimodal capability was the decisive factor: a single Gemini Vision call can analyse video frames, audio transcripts, and textual metadata simultaneously — generating a comprehensive tag set across all taxonomy dimensions in one pass. Traditional CV models would have required separate models and separate inference pipelines for each tag category.
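A minimal sketch of what that single-pass call can look like, assuming the video master and transcript are accessible from the pipeline; the model name, prompt wording, and taxonomy fields here are illustrative, not the production values:

```python
import vertexai
from vertexai.generative_models import GenerativeModel, Part

vertexai.init(project="my-project", location="us-central1")  # placeholder project

# Any Gemini model with video input works here; the exact model is an assumption.
model = GenerativeModel("gemini-1.5-pro")

video = Part.from_uri("gs://my-bucket/titles/12345/master.mp4", mime_type="video/mp4")
transcript = Part.from_text(open("transcript_12345.txt").read())

prompt = (
    "You are a content cataloguer. From the video and transcript, return JSON with: "
    "genres, mood_labels, content_descriptors, thematic_keywords, suggested_age_rating."
)

response = model.generate_content([video, transcript, prompt])
print(response.text)  # validate and parse the JSON before it reaches the index
```

One call, one pass over the title, every taxonomy dimension in the output: the orchestration problem becomes prompt and schema management rather than model management.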
Vertex AI Search provided the discovery layer: once titles were tagged, Vertex AI Search indexed the enriched metadata with semantic embeddings, enabling the recommendation engine and search interface to surface content based on meaning rather than keyword matching. A user searching for "intense psychological drama" finds relevant content even when those exact words don't appear in the title or description.
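For illustration, here is a query against the enriched index through the Vertex AI Search (Discovery Engine) client; the project, data store, and serving config IDs below are placeholders, and the exact serving config path depends on how the data store is configured:

```python
from google.cloud import discoveryengine_v1 as discoveryengine

client = discoveryengine.SearchServiceClient()

serving_config = (
    "projects/my-project/locations/global/collections/default_collection/"
    "dataStores/catalogue-store/servingConfigs/default_config"
)

request = discoveryengine.SearchRequest(
    serving_config=serving_config,
    query="intense psychological drama",  # no title needs to contain these words
    page_size=10,
)

for result in client.search(request):
    print(result.document.id)
```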
Collaborative filtering (item-item or user-item matrix factorisation) requires dense interaction history — it fails for new titles and new users. Gemini embeddings capture semantic meaning from title metadata alone, so new titles are immediately recommendable and cross-category discovery comes naturally. For a catalogue of this size with frequent new releases, Gemini embeddings dramatically outperform collaborative filtering on catalogue coverage.
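A sketch of why cold-start stops being special: a brand-new title can be embedded from its metadata alone and upserted into the index before anyone has watched it. The model name and attribute string are assumptions, and `vertexai.init` is assumed to have run as in the tagging sketch:

```python
from vertexai.language_models import TextEmbeddingModel

model = TextEmbeddingModel.from_pretrained("text-embedding-004")

# Metadata-only description of a title with zero interaction history.
new_title = (
    "Title: The Long Quiet. Genre: psychological drama, thriller. "
    "Mood: tense, slow-burn. Themes: isolation, memory, guilt."
)

embedding = model.get_embeddings([new_title])[0]
vector = embedding.values  # 768-dim vector, ready to upsert into the ANN index
```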
Vertex AI Matching Engine is Google's managed ANN service — it handles index updates, scaling, and serving infrastructure automatically. For a production streaming platform at 85M users, the managed infrastructure matters: we don't want to run and scale a vector database cluster. Matching Engine supports brute-force (exact) and ScaNN (approximate) retrieval, with configurable accuracy/latency tradeoffs, and integrates natively with Vertex AI and Gemini under the same IAM and VPC perimeter.
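A hypothetical shape of that setup using the Vertex AI SDK; every name, ID, and tuning value below is a placeholder:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# ScaNN-backed (tree-AH) index with streaming updates, so new titles become
# searchable without a batch rebuild.
index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name="catalogue-embeddings",
    dimensions=768,
    approximate_neighbors_count=150,
    distance_measure_type="DOT_PRODUCT_DISTANCE",
    index_update_method="STREAM_UPDATE",
)

# Query time, against an already-deployed endpoint.
endpoint = aiplatform.MatchingEngineIndexEndpoint(
    "projects/123/locations/us-central1/indexEndpoints/456"
)
neighbors = endpoint.find_neighbors(
    deployed_index_id="catalogue_embeddings_v1",
    queries=[[0.01] * 768],  # stand-in for a real query embedding
    num_neighbors=20,
)
```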
The user feature store needs sub-millisecond read latency at millions of QPS — that's Bigtable's design point. Redis can work at smaller scale but adds operational complexity. Bigtable is managed, scales horizontally without resharding, and integrates with Dataflow for streaming writes. We use a row key design that prefixes user_id with a short hash so sequential IDs don't hotspot a single tablet, and set TTLs on historical interaction features to control storage costs.
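A sketch of that row key and a streaming feature write; the instance, table, column family, and feature names are placeholders:

```python
import hashlib
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=False)
table = client.instance("features-prod").table("user_features")

def row_key(user_id: str) -> bytes:
    # A short hash prefix spreads sequential user IDs across tablets.
    prefix = hashlib.md5(user_id.encode()).hexdigest()[:4]
    return f"{prefix}#{user_id}".encode()

row = table.direct_row(row_key("user_8675309"))
# TTL on this column family is enforced by a MaxAgeGCRule set when the table
# is created, not per write.
row.set_cell("session", b"last_genre_watched", b"psych_drama")
row.set_cell("session", b"device", b"tv")
row.commit()
```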
Don't assume you can launch with collaborative filtering for warm users and add cold-start handling later. Design the cold-start fallback as a first-class feature: session signals, entry source, device, time-of-day, and category-level popularity need to be wired from day one.
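One way to make that concrete is to treat routing between the warm and cold paths as explicit code rather than an implicit fallback; the threshold and signals below are illustrative:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable

@dataclass
class Session:
    user_id: str
    interaction_count: int   # lifetime interactions we know about for this user
    entry_source: str        # e.g. "email_campaign", "search", "homepage"
    device: str
    start_time: datetime

COLD_START_THRESHOLD = 20  # assumed cutoff, tuned offline

def recommend(session: Session,
              warm_path: Callable[[str], list[str]],
              cold_path: Callable[..., list[str]]) -> list[str]:
    """Route to the history-based recommender only when there is enough signal."""
    if session.interaction_count >= COLD_START_THRESHOLD:
        return warm_path(session.user_id)
    # First-class cold-start path: session signals plus category-level popularity.
    return cold_path(
        entry_source=session.entry_source,
        device=session.device,
        hour_of_day=session.start_time.hour,
    )
```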
Generating recommendations on every page render adds 100–300ms to load time. We pre-fetch the top-100 recommendations for each user at session initialisation, cache them in the browser session, and refresh asynchronously. The serving latency the user experiences is effectively zero.
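In rough server-side form (our production version caches in the browser session; the in-memory dict and helper below are stand-ins):

```python
import threading

def fetch_top_100(user_id: str) -> list[str]:
    """Stand-in for the real ranking call (vector lookup plus re-rank)."""
    return []

_session_cache: dict[str, list[str]] = {}  # session_id -> top-100 title IDs

def init_session(session_id: str, user_id: str) -> list[str]:
    """Called once at session start; the client keeps the result for the session."""
    recs = fetch_top_100(user_id)
    _session_cache[session_id] = recs
    return recs

def refresh_async(session_id: str, user_id: str) -> None:
    """Background refresh so later renders still read from cache at zero cost."""
    threading.Thread(
        target=lambda: _session_cache.update({session_id: fetch_top_100(user_id)}),
        daemon=True,
    ).start()
```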
Day-of-week effects, email campaign traffic, and weekend browsing patterns mean 3-day tests produce misleading results. We always run for a minimum of 2 weeks covering at least 2 full business cycles, and we pre-register our primary metric and minimum detectable effect before the test starts to avoid p-hacking.
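Pre-registering the minimum detectable effect also tells you whether two weeks of traffic is even enough. A worked example with illustrative numbers (not the platform's): detecting a 2% relative lift on a 30% baseline click-through rate at alpha = 0.05 and 80% power needs roughly 46,000 users per arm.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.30          # illustrative baseline CTR
mde_relative = 0.02      # 2% relative lift we want to be able to detect

effect = proportion_effectsize(baseline * (1 + mde_relative), baseline)

n_per_arm = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.80, alternative="two-sided"
)
print(f"{n_per_arm:,.0f} users per arm")  # roughly 46,000 for these inputs
```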
The ML team wanted to optimise for recommendation CTR. The business team cared about revenue per session. These are different objectives and can pull in different directions. We align on one primary metric before building anything, and make the business team sign off on the A/B test design before launch.