Scoring Web Pages for Usability at Scale

Reduced UX audit time from 2–3 hours per page to about 10 minutes by building an AI-powered agent that scores web pages against a structured usability rubric.

THE PROBLEM

UX audits were:

  • Time-consuming (2–3 hours per page)

  • Inconsistent depending on who performed them

  • Hard to scale across multiple pages

  • Often delayed or deprioritized because of effort required

For my team, the cost was:

  • Missed optimization opportunities

  • Reliance on subjective opinions instead of structured evaluation

My Role

As the Lead UX Designer, I…

  • Defined an evaluation framework (heuristics, structure, scoring)

  • Designed prompt architecture + agent workflow

  • Built and tested the agent in Microsoft Copilot Studio

  • Iterated on output quality + usefulness

The Solution

1. User submits a URL to the agent

“I need to quickly understand where potential usability weaknesses exist on this page.”


2. Agent evaluates the page

Using the rubric provided in its instructions and topic, the agent evaluates and scores the page in the following categories:

  • Content Quality & Information Design (max 30 points)

  • Visual Layout & Hierarchy (max 20 points)

  • Interaction & Task Support (max 25 points)

  • Navigation & Wayfinding (max 15 points)

  • Accessibility & UX Fundamentals (max 10 points)
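
The category weights above can be sketched as a small data structure. This is a minimal illustration, not the agent's actual configuration: the `RUBRIC` dictionary and the `total_score` helper are hypothetical names, and the sample scores are made up to show the arithmetic. The only details taken from the rubric are the category names and their maximum points, which sum to 100.

```python
# Category names and maximum points come from the rubric.
# Everything else (names, helper, sample scores) is a hypothetical illustration.
RUBRIC = {
    "Content Quality & Information Design": 30,
    "Visual Layout & Hierarchy": 20,
    "Interaction & Task Support": 25,
    "Navigation & Wayfinding": 15,
    "Accessibility & UX Fundamentals": 10,
}


def total_score(category_scores: dict[str, int]) -> int:
    """Validate per-category scores against the rubric and return the total."""
    missing = set(RUBRIC) - set(category_scores)
    if missing:
        raise ValueError(f"Missing categories: {sorted(missing)}")
    for category, score in category_scores.items():
        if category not in RUBRIC:
            raise ValueError(f"Unknown category: {category}")
        if not 0 <= score <= RUBRIC[category]:
            raise ValueError(f"{category}: {score} is outside 0-{RUBRIC[category]}")
    return sum(category_scores.values())


# Made-up scores for a single page, purely to show the arithmetic.
sample = {
    "Content Quality & Information Design": 22,
    "Visual Layout & Hierarchy": 15,
    "Interaction & Task Support": 18,
    "Navigation & Wayfinding": 12,
    "Accessibility & UX Fundamentals": 6,
}
print(f"{total_score(sample)} / {sum(RUBRIC.values())}")  # prints "73 / 100"
```

Capping each category at its maximum keeps totals comparable from page to page, which is what makes one page's score meaningful next to another's.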


3. Agent outputs a scorecard

The scorecard is organized by the categories above and includes the following for each category:

  • Category score

  • Key issues

  • Prioritized recommendations
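
One way to picture the scorecard structure is the sketch below. It is an assumption about shape only, not the agent's real output format: `CategoryResult` and `render_scorecard` are hypothetical names, and the issues, recommendations, and scores are invented placeholders. Only the three fields per category (score, key issues, prioritized recommendations) and the category names and maximum points come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class CategoryResult:
    """One scorecard section (hypothetical structure for illustration)."""
    name: str
    score: int
    max_points: int
    key_issues: list[str] = field(default_factory=list)
    # Stored in priority order, highest priority first.
    recommendations: list[str] = field(default_factory=list)


def render_scorecard(results: list[CategoryResult]) -> str:
    """Format the category sections as plain text, followed by a total line."""
    lines = []
    for r in results:
        lines.append(f"{r.name}: {r.score}/{r.max_points}")
        lines.append("  Key issues:")
        lines.extend(f"    - {issue}" for issue in r.key_issues)
        lines.append("  Prioritized recommendations:")
        lines.extend(
            f"    {i}. {rec}" for i, rec in enumerate(r.recommendations, start=1)
        )
    total = sum(r.score for r in results)
    maximum = sum(r.max_points for r in results)
    lines.append(f"Total: {total}/{maximum}")
    return "\n".join(lines)


# Invented placeholder content for two of the five categories.
results = [
    CategoryResult(
        name="Navigation & Wayfinding",
        score=12,
        max_points=15,
        key_issues=["Placeholder issue A"],
        recommendations=["Placeholder fix A1", "Placeholder fix A2"],
    ),
    CategoryResult(
        name="Accessibility & UX Fundamentals",
        score=6,
        max_points=10,
        key_issues=["Placeholder issue B"],
        recommendations=["Placeholder fix B1"],
    ),
]
print(render_scorecard(results))
```

Keeping recommendations as an ordered list means the priority order survives whatever format the scorecard is later exported to.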

THE IMPACT

  • Reduced audit time: 2–3 hours → ~10 minutes

  • Increased consistency across evaluations

  • Enabled faster, data-driven iteration for websites

  • Scaled across multiple pages without additional effort

Final thoughts

Automating this process shifted UX evaluation from ad hoc reviews to a scalable system. The scorecards became a consistent starting point for discovery and testing, allowing us to quickly identify opportunities and support recommendations with GA4 and Clarity data.
