Millions of automotive parts and part numbers exist online — but the data is inconsistent, incomplete, and poorly described. A bare SKU with no description is effectively invisible to the mechanic searching for it. Kovil AI's embedded AI Engineer built an end-to-end cataloguing pipeline that ingested 2.1 million raw parts, deduplicated them to 1.4 million unique components, and generated technically accurate, LLM-enriched descriptions for every one — cutting returns by 27% and improving search-to-purchase conversion by 38%.
1.4M parts catalogued (deduplicated from 2.1M raw)
95.8% auto-published (without manual intervention)
38% search conversion lift (within 60 days of launch)
27% fewer wrong-part returns (quarter post-launch)
The automotive aftermarket parts industry operates on a fundamental information problem. Millions of parts and part numbers are distributed across manufacturer websites, distributor catalogues, and legacy database systems — but the data is inconsistent, incomplete, and poorly described. A mechanic searching for a specific part number frequently encounters a bare SKU with no description of what the part is, what vehicle fitments it covers, what it does, or how it relates to adjacent parts. Distributors carry the same component from multiple manufacturers under different part numbers with no standardisation across them.
The client — an automotive parts platform serving both trade (workshops) and retail (DIY mechanics) customers — had access to a large catalogue of parts and part numbers scraped from manufacturer websites and supplier feeds. The data existed. What it lacked was organisation, enrichment, and the kind of human-readable description that allows a mechanic or parts counter staff to identify and recommend the right component with confidence.
The scale of the problem: the client's working catalogue contained approximately 2.1 million parts across 23 manufacturer families. Of these, fewer than 180,000 had even basic human-readable descriptions. The rest were bare part numbers with a category code and an occasional one-line label — effectively invisible in search results and useless for customer decision-making.
Automotive parts cataloguing is a domain requiring deep technical specificity. A description that cites the wrong fitment range, describes the wrong function, or conflates similar parts from different model years is worse than no description — it results in wrong parts being ordered, which generates returns, erodes trust, and wastes workshop time. Specific challenges the solution had to address:
Kovil AI embedded an AI Engineer into the client's team to design and build the cataloguing system end-to-end. The first two weeks were spent mapping the source data landscape — ingesting representative samples of each format, identifying schema variations, and building the normalisation rules that would allow all five source types to flow through a single pipeline cleanly.
For description generation, we evaluated several approaches before settling on a structured prompting strategy with GPT-4o. Rather than asking the model to describe a part from a bare part number, we enriched each input with all available structured data — manufacturer code, category, model year range, associated OEM numbers — before generating the description. This grounding dramatically reduced hallucination risk and produced descriptions accurate enough to pass spot-check review by the client's automotive specialists.
Quality was validated at scale using a separate classification model that scored each generated description on three accuracy indicators: correct fitment language, consistent use of automotive terminology, and the absence of hedging markers that signalled the model was guessing. Descriptions that scored below the threshold were flagged for manual review rather than published automatically.
The ingestion pipeline processed all five source data formats. Each source was parsed, normalised to a common schema (part number, manufacturer code, category, fitment metadata, raw description if available), and deduplicated using both exact matching (same part number) and fuzzy matching (similar numbers from the same manufacturer family, matching fitment ranges, matching category codes). The pipeline produced a clean, deduplicated working catalogue as its primary output.
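The two matching passes described above can be sketched roughly as follows. This is a minimal illustration, not the client's pipeline: the `Part` fields, the separator-stripping rule, the use of `difflib.SequenceMatcher` as the similarity measure, and the 0.9 threshold are all assumptions made for the example.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Part:
    # Hypothetical common schema; field names are illustrative.
    part_number: str
    manufacturer_code: str
    category: str
    year_from: int
    year_to: int


def normalise_number(raw: str) -> str:
    """Uppercase and drop separators so 'ab 1234' and 'AB-1234' compare equal."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())


def similar(a: str, b: str, threshold: float) -> bool:
    """Exact match first, then a fuzzy ratio (assumed similarity measure)."""
    return a == b or SequenceMatcher(None, a, b).ratio() >= threshold


def deduplicate(parts, threshold: float = 0.9):
    """Keep the first part seen in each group of near-identical records.

    Parts are only compared within a bucket that shares manufacturer,
    category and fitment range, which mirrors the matching criteria in
    the text and avoids comparing every part against every other part.
    """
    buckets = defaultdict(list)
    for part in parts:
        key = (part.manufacturer_code, part.category, part.year_from, part.year_to)
        number = normalise_number(part.part_number)
        bucket = buckets[key]
        if not any(similar(number, normalise_number(kept.part_number), threshold)
                   for kept in bucket):
            bucket.append(part)
    return [part for bucket in buckets.values() for part in bucket]
```

Bucketing by manufacturer, category, and fitment before comparing numbers keeps the fuzzy pass from merging parts that merely have similar-looking numbers but fit different vehicles.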
Deduplication yielded an immediate result: the 2.1 million parts in the raw catalogue reduced to 1.4 million unique components — a 33% reduction driven by cross-manufacturer duplicates and legacy data artefacts that had accumulated over years of catalogue management.
For each unique part, the description engine assembled a structured context block — all available factual data about the component — and submitted it to GPT-4o with an engineered prompt specifying: the target audience (trade mechanics and parts counter staff), the required description format (fitment range, function, installation notes, OEM equivalents where available), and explicit constraints (no speculative language, no unsupported fitment claims, metric and imperial specifications where applicable).
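As a rough sketch of how such a context block and prompt might be assembled: the audience, format sections, and constraints below are taken from the description above, while the field names, the exact prompt wording, and the sample values are hypothetical. The call to the model itself is deliberately left out.

```python
AUDIENCE = "trade mechanics and parts counter staff"
SECTIONS = [
    "fitment range",
    "function",
    "installation notes",
    "OEM equivalents where available",
]
CONSTRAINTS = [
    "no speculative language",
    "no unsupported fitment claims",
    "metric and imperial specifications where applicable",
]

# Hypothetical field names for the normalised catalogue record.
FIELD_LABELS = {
    "part_number": "Part number",
    "manufacturer_code": "Manufacturer code",
    "category": "Category",
    "model_years": "Model year range",
    "oem_numbers": "Associated OEM numbers",
}


def build_context_block(part: dict) -> str:
    """List only the facts that are actually present for this part."""
    lines = []
    for key, label in FIELD_LABELS.items():
        value = part.get(key)
        if not value:
            continue  # a missing field is omitted, never filled in
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(v) for v in value)
        lines.append(f"{label}: {value}")
    return "\n".join(lines)


def build_prompt(part: dict) -> str:
    """Combine audience, format, constraints and the grounded part data."""
    return "\n".join([
        f"Audience: {AUDIENCE}.",
        "Describe the part below using only the facts provided.",
        "Cover, in order: " + "; ".join(SECTIONS) + ".",
        "Constraints: " + "; ".join(CONSTRAINTS) + ".",
        "",
        "Part data:",
        build_context_block(part),
    ])
```

Omitting absent fields, rather than passing empty placeholders, gives the model nothing to elaborate on where the catalogue has no data.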
Descriptions were generated in batches of 500, with parallel processing to manage throughput. The full catalogue of 1.4 million unique parts was described in 18 days of pipeline operation (roughly 78,000 descriptions per day), work that would have taken a human cataloguing team years at equivalent quality.
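One way to picture the batching is the sketch below. The batch size of 500 comes from the text; the thread pool, the worker count, and the `describe_batch` placeholder (which stands in for the real model call) are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

BATCH_SIZE = 500  # from the text


def batched(items, size=BATCH_SIZE):
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def describe_batch(batch):
    """Placeholder for the real generation step (hypothetical)."""
    return [f"description for {part}" for part in batch]


def describe_all(parts, batch_size=BATCH_SIZE, workers=8):
    """Run batches in parallel; pool.map preserves the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(describe_batch, batched(parts, batch_size))
        return [description for batch in results for description in batch]


# Back-of-envelope figures implied by the numbers in the text.
total_batches = 1_400_000 // BATCH_SIZE   # 2800 batches overall
batches_per_day = total_batches / 18      # about 155.6 batches per day
```

Threads suit this kind of workload when each batch spends most of its time waiting on a remote call; a different executor would be the better choice if the work were CPU-bound.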
Every generated description passed through a validation model before publication. The validator checked for: technically inconsistent fitment claims, terminology errors, generic language that didn't contain part-specific information, and descriptions shorter than a minimum useful length. The reject rate across the full catalogue was 4.2% — these 58,800 descriptions were routed to a human review queue where the client's parts specialists resolved them at their own pace.
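The checks listed above can be illustrated with a simple rule-based stand-in. The source describes a validation model, so the code below is not that validator: the hedging phrases, the 80-character minimum, the year-matching regex, and the record fields are all assumptions chosen to show how a failed check routes a description to review.

```python
import re

# Illustrative values only; the real thresholds are not stated in the text.
HEDGING_MARKERS = ("probably", "possibly", "may fit", "might fit", "should fit")
MIN_LENGTH = 80
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def validate(description: str, part: dict) -> list:
    """Return the names of the failed checks; an empty list means it passed."""
    reasons = []
    text = description.lower()

    if len(description) < MIN_LENGTH:
        reasons.append("too_short")

    if any(marker in text for marker in HEDGING_MARKERS):
        reasons.append("hedging")

    # Any model year mentioned must fall inside the known fitment range.
    years = [int(y) for y in YEAR_PATTERN.findall(description)]
    if any(not part["year_from"] <= y <= part["year_to"] for y in years):
        reasons.append("fitment_out_of_range")

    # Treat text with neither the part number nor any year as generic.
    if part["part_number"].lower() not in text and not years:
        reasons.append("generic")

    return reasons


def route(description: str, part: dict) -> str:
    """Publish clean descriptions; send anything flagged to human review."""
    return "review" if validate(description, part) else "publish"


# The review-queue size quoted in the text follows from the reject rate.
review_queue = round(1_400_000 * 0.042)  # 58800
```

Returning the list of failed checks, rather than a bare pass or fail, gives reviewers a starting point for each item in the queue.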
The final component was a search layer built on the enriched catalogue: a full-text search index combining structured catalogue data with the AI-generated descriptions, enabling trade and retail customers to find parts by function, fitment, OEM number, or competitor cross-reference. The cross-reference data built during deduplication meant a mechanic searching for a manufacturer-specific part number would surface all equivalent aftermarket options simultaneously.
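The cross-reference behaviour can be sketched with a small in-memory inverted index. This is a toy illustration under stated assumptions: the `CatalogueIndex` class, its method names, and the idea of storing a deduplication group identifier per part are hypothetical, and a production search layer would use a dedicated search engine rather than Python dictionaries.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set:
    """Lowercase alphanumeric tokens; 'BP-1042' becomes {'bp', '1042'}."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class CatalogueIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # token -> part numbers
        self.groups = defaultdict(set)    # equivalence group -> part numbers
        self.group_of = {}                # part number -> equivalence group

    def add(self, part_number, description, oem_numbers=(), group=None):
        """Index structured fields and the generated description together."""
        text = " ".join([part_number, description, *oem_numbers])
        for token in tokenize(text):
            self.postings[token].add(part_number)
        if group is not None:
            self.groups[group].add(part_number)
            self.group_of[part_number] = group

    def search(self, query):
        """Match all query tokens, then add every equivalent part."""
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        expanded = set(hits)
        for part_number in hits:
            group = self.group_of.get(part_number)
            if group is not None:
                expanded |= self.groups[group]
        return sorted(expanded)
```

For example, if `BP-1042` and `XQ-77` share a group assigned during deduplication, a search for either part number returns both, while a free-text query such as "brake pad" matches on the description tokens.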
The cataloguing pipeline now runs continuously, processing new parts arriving from supplier feeds within 24 hours of ingestion — keeping the catalogue current without ongoing manual effort.