Step-by-step walkthrough of a production entertainment deployment — Gemini Vision content tagging pipeline, Vertex AI Search index design, real-time update architecture, and A/B test measurement. Including the decisions we got right, and the ones we had to redo.
A global streaming platform with 85 million subscribers and a catalogue of 240,000 hours of content relied on a manual tagging and cataloguing process. Each new title required human editorial review — assigning genre tags, content descriptors, mood labels, and thematic keywords — averaging 6 hours per title.
The result: 22% of the catalogue was effectively undiscoverable via their search and recommendation systems because it lacked sufficient tag coverage. New content took days to become discoverable after upload, affecting engagement on new releases during their critical first-week window.
The engineering team had evaluated traditional computer vision classification models but found them too narrow — each model could classify one taxonomy dimension (genre, mood, age rating) and required separate training data for each. They needed a single system that could cover the full tag taxonomy in a single pass.
Gemini Vision's multimodal capability was the decisive factor: a single Gemini Vision call can analyse video frames, audio transcripts, and textual metadata simultaneously — generating a comprehensive tag set across all taxonomy dimensions in one pass. Traditional CV models would have required separate models and separate inference pipelines for each tag category.
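A minimal sketch of what that single-pass call can look like, assuming the video master and transcript are accessible from the pipeline; the model name, prompt wording, and taxonomy fields here are illustrative, not the production values:

```python
import vertexai
from vertexai.generative_models import GenerativeModel, Part

vertexai.init(project="my-project", location="us-central1")  # placeholder project

# Any Gemini model with video input works here; the exact model is an assumption.
model = GenerativeModel("gemini-1.5-pro")

video = Part.from_uri("gs://my-bucket/titles/12345/master.mp4", mime_type="video/mp4")
transcript = Part.from_text(open("transcript_12345.txt").read())

prompt = (
    "You are a content cataloguer. From the video and transcript, return JSON with: "
    "genres, mood_labels, content_descriptors, thematic_keywords, suggested_age_rating."
)

response = model.generate_content([video, transcript, prompt])
print(response.text)  # validate and parse the JSON before it reaches the index
```

One call, one pass over the title, every taxonomy dimension in the output: the orchestration problem becomes prompt and schema management rather than model management.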
Vertex AI Search provided the discovery layer: once titles were tagged, Vertex AI Search indexed the enriched metadata with semantic embeddings, enabling the recommendation engine and search interface to surface content based on meaning rather than keyword matching. A user searching for "intense psychological drama" finds relevant content even when those exact words don't appear in the title or description.
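For illustration, here is a query against the enriched index through the Vertex AI Search (Discovery Engine) client; the project, data store, and serving config IDs below are placeholders, and the exact serving config path depends on how the data store is configured:

```python
from google.cloud import discoveryengine_v1 as discoveryengine

client = discoveryengine.SearchServiceClient()

serving_config = (
    "projects/my-project/locations/global/collections/default_collection/"
    "dataStores/catalogue-store/servingConfigs/default_config"
)

request = discoveryengine.SearchRequest(
    serving_config=serving_config,
    query="intense psychological drama",  # no title needs to contain these words
    page_size=10,
)

for result in client.search(request):
    print(result.document.id)
```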
Collaborative filtering (item-item or user-item matrix factorisation) requires dense interaction history — it fails for new titles and new users. Gemini embeddings capture semantic meaning from title metadata alone, so new titles are immediately recommendable and cross-category discovery comes naturally. For a catalogue of this size with frequent new releases, Gemini embeddings dramatically outperform collaborative filtering on catalogue coverage.
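A sketch of why cold-start stops being special: a brand-new title can be embedded from its metadata alone and upserted into the index before anyone has watched it. The model name and attribute string are assumptions, and `vertexai.init` is assumed to have run as in the tagging sketch:

```python
from vertexai.language_models import TextEmbeddingModel

model = TextEmbeddingModel.from_pretrained("text-embedding-004")

# Metadata-only description of a title with zero interaction history.
new_title = (
    "Title: The Long Quiet. Genre: psychological drama, thriller. "
    "Mood: tense, slow-burn. Themes: isolation, memory, guilt."
)

embedding = model.get_embeddings([new_title])[0]
vector = embedding.values  # 768-dim vector, ready to upsert into the ANN index
```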
Vertex AI Matching Engine is Google's managed ANN service — it handles index updates, scaling, and serving infrastructure automatically. For a production streaming platform at 85M users, the managed infrastructure matters: we don't want to run and scale a vector database cluster. Matching Engine supports brute-force (exact) and ScaNN (approximate) retrieval, with configurable accuracy/latency tradeoffs, and integrates natively with Vertex AI and Gemini under the same IAM and VPC perimeter.
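A hypothetical shape of that setup using the Vertex AI SDK; every name, ID, and tuning value below is a placeholder:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# ScaNN-backed (tree-AH) index with streaming updates, so new titles become
# searchable without a batch rebuild.
index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name="catalogue-embeddings",
    dimensions=768,
    approximate_neighbors_count=150,
    distance_measure_type="DOT_PRODUCT_DISTANCE",
    index_update_method="STREAM_UPDATE",
)

# Query time, against an already-deployed endpoint.
endpoint = aiplatform.MatchingEngineIndexEndpoint(
    "projects/123/locations/us-central1/indexEndpoints/456"
)
neighbors = endpoint.find_neighbors(
    deployed_index_id="catalogue_embeddings_v1",
    queries=[[0.01] * 768],  # stand-in for a real query embedding
    num_neighbors=20,
)
```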
The user feature store needs sub-millisecond read latency at millions of QPS — that's Bigtable's design point. Redis can work at smaller scale but adds operational complexity. Bigtable is managed, scales horizontally without resharding, and integrates with Dataflow for streaming writes. We use a row key design that prefixes user_id with a short hash so sequential IDs don't hotspot a single tablet, and set TTLs on historical interaction features to control storage costs.
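A sketch of that row key and a streaming feature write; the instance, table, column family, and feature names are placeholders:

```python
import hashlib
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=False)
table = client.instance("features-prod").table("user_features")

def row_key(user_id: str) -> bytes:
    # A short hash prefix spreads sequential user IDs across tablets.
    prefix = hashlib.md5(user_id.encode()).hexdigest()[:4]
    return f"{prefix}#{user_id}".encode()

row = table.direct_row(row_key("user_8675309"))
# TTL on this column family is enforced by a MaxAgeGCRule set when the table
# is created, not per write.
row.set_cell("session", b"last_genre_watched", b"psych_drama")
row.set_cell("session", b"device", b"tv")
row.commit()
```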
Don't assume you can launch with collaborative filtering for warm users and add cold-start handling later. Design the cold-start fallback as a first-class feature: session signals, entry source, device, time-of-day, and category-level popularity need to be wired from day one.
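One way to make that concrete is to treat routing between the warm and cold paths as explicit code rather than an implicit fallback; the threshold and signals below are illustrative:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable

@dataclass
class Session:
    user_id: str
    interaction_count: int   # lifetime interactions we know about for this user
    entry_source: str        # e.g. "email_campaign", "search", "homepage"
    device: str
    start_time: datetime

COLD_START_THRESHOLD = 20  # assumed cutoff, tuned offline

def recommend(session: Session,
              warm_path: Callable[[str], list[str]],
              cold_path: Callable[..., list[str]]) -> list[str]:
    """Route to the history-based recommender only when there is enough signal."""
    if session.interaction_count >= COLD_START_THRESHOLD:
        return warm_path(session.user_id)
    # First-class cold-start path: session signals plus category-level popularity.
    return cold_path(
        entry_source=session.entry_source,
        device=session.device,
        hour_of_day=session.start_time.hour,
    )
```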
Generating recommendations on every page render adds 100–300ms to load time. We pre-fetch the top-100 recommendations for each user at session initialisation, cache them in the browser session, and refresh asynchronously. The serving latency the user experiences is effectively zero.
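In rough server-side form (our production version caches in the browser session; the in-memory dict and helper below are stand-ins):

```python
import threading

def fetch_top_100(user_id: str) -> list[str]:
    """Stand-in for the real ranking call (vector lookup plus re-rank)."""
    return []

_session_cache: dict[str, list[str]] = {}  # session_id -> top-100 title IDs

def init_session(session_id: str, user_id: str) -> list[str]:
    """Called once at session start; the client keeps the result for the session."""
    recs = fetch_top_100(user_id)
    _session_cache[session_id] = recs
    return recs

def refresh_async(session_id: str, user_id: str) -> None:
    """Background refresh so later renders still read from cache at zero cost."""
    threading.Thread(
        target=lambda: _session_cache.update({session_id: fetch_top_100(user_id)}),
        daemon=True,
    ).start()
```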
Day-of-week effects, email campaign traffic, and weekend browsing patterns mean 3-day tests produce misleading results. We always run for a minimum of 2 weeks covering at least 2 full business cycles, and we pre-register our primary metric and minimum detectable effect before the test starts to avoid p-hacking.
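Pre-registering the minimum detectable effect also tells you whether two weeks of traffic is even enough. A worked example with illustrative numbers (not the platform's): detecting a 2% relative lift on a 30% baseline click-through rate at alpha = 0.05 and 80% power needs roughly 46,000 users per arm.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.30          # illustrative baseline CTR
mde_relative = 0.02      # 2% relative lift we want to be able to detect

effect = proportion_effectsize(baseline * (1 + mde_relative), baseline)

n_per_arm = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.80, alternative="two-sided"
)
print(f"{n_per_arm:,.0f} users per arm")  # roughly 46,000 for these inputs
```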
The ML team wanted to optimise for recommendation CTR. The business team cared about revenue per session. These are different objectives and can pull in different directions. We align on one primary metric before building anything, and make the business team sign off on the A/B test design before launch.