Gen AI Review Summary

2024, 2025

Designing the company's first AI product — and saving millions in returns

Role

Design Lead

Timeline

6 months

Impact

$10M annualized value from reduced returns across platforms, reflecting increased user trust in review-informed purchase decisions.

Successful launch of Gap's first customer-facing AI product


The Highlight

Reviews are a critical factor in any online clothing purchase decision: users who interact with reviews convert at a 10% higher rate. However, searching through thousands of reviews can be tough.


As a solution, I designed the company’s first customer-facing generative AI feature to build trust through transparency, saving the company over $10M across platforms through increased user confidence in their purchases.

The Problem

User reviews help drive purchases. AI-generated review summaries? Not so much.

The business problem

  • Users who interact with reviews are 10% more likely to convert.

  • However, when products have 5,000+ reviews, users struggle to read them, creating decision fatigue and cart abandonment.

Brand goal

Leadership wanted to pilot an AI summary to help users make faster decisions. Brand stakeholders also wanted to mask negative feedback.

Success metrics

  • Primary: Add-to-Bag

  • Secondary: Customer trust, conversion, decreased returns

The real risk identified

Users don't trust AI summaries — they trust other customers. If an AI summary omitted negatives and users discovered flaws after delivery, this would cause a returns spike AND destroy customer trust in both the feature and the brand.

What was at stake:

10%

A 10-point drop in conversion rate (from 18% to 8%, per metric data)

Trust

We lose customer trust and retention if the summary is inaccurate or untrustworthy

Returns

Increased returns due to fragmented customer decision-making, costing millions

User behavior

Users don't want universal praise. They want to know whether a product is the right fit.

Users rely on negative reviews

When evaluating products, users don't just look for praise; they actively seek negative reviews to understand how a product might fail to meet their specific needs.

A product that's "not durable" matters differently for workout gear vs. casual wear.

The trust problem

Users have learned through experience that brands tend to hide flaws. When they see only positive information:

  • They assume the company is hiding something

  • They distrust the AI as being biased toward making sales

  • They second-guess their purchase decision

“I feel like this brand really is being transparent…I don’t get the feeling they’re only publishing certain reviews, but that they’re literally taking customers’ voices and putting it here.”

The core insight: Transparency doesn't reduce trust; omission does.

The design

How I turned AI review summaries into a trusted decision-making tool

Trust through transparency, scaled to user behavior

Guided ‘sentiment analysis’ keyword buckets

I prioritized highlighting keywords to help users quickly identify what mattered most to reviewers.


Trust-building design decisions

I advocated for showing the negatives despite brand resistance and added a disclaimer, since transparency is critical to user trust.

Layer in additional context

I introduced moderate-length review summaries for additional info, highlighting key points for scannability.


Results

Transparency earned user trust, and that trust lowered returns

While headline metrics were nearly flat, a deeper look at returns revealed the real win

A/B testing results

Two placement options (above and below the review star ratings & slider scale) were tested against the current experience over a 30-day window to account for the return policy.

Challenger B showed the most impact, with the summary adding context to the product's ratings and overall sentiment.

Challenger A

Summary placed above the review star rating, histogram, & sliders

Challenger B

Summary placed below the review star rating, histogram, & sliders

+0.7%

+0.7% lift in conversion & add-to-bag

≈ Little incremental impact

$2.5M

0.9% reduction in returns

≈ $2.5M Annualized Value

From usability testing:

"Right away, I feel that this paints a more comprehensive and accurate picture simply because there are dislikes...I think it offers nuance and highlights the fact that one size doesn't really fit all."

Impact

How my GenAI feature work became the company standard for AI

Long-term strategic impact:

Framework that became standard:

  • The 3-bucket transparency approach set the design pattern for AI summaries across all brands

    • Currently live for Athleta and Old Navy

  • The design was updated for flexibility of use across brands and implemented into the design system.

  • Rolled out to mobile apps (native iOS and Android) with comparable A/B test performance.

Organizational influence:

The success of this project led to the formation of a dedicated AI team and "Office of AI" within the tech org. Gap's first customer-facing AI became the blueprint.

The images below are screenshots taken from the live online experience across mobile web and native mobile app.

Live screenshot from Athleta website with redesigned module

Live screenshot from Athleta mobile app (Android)

The best AI experiences don't hide imperfections,
they help users navigate them.


The real lesson: strategic questioning about trust and verification, grounded in an understanding of users' current sentiment toward AI, shaped a design that prioritized user confidence over brand perfection.
