AI-Enhanced Remote Proctoring · Proposal

Lyzr × Firstsource × ETS

Lower cost. Better experience. Full governance. The question is how to rebuild ETS's proctoring model — not whether.

Proposed by Lyzr + Firstsource

Audience: Head of Ops · CIO
Deployment: AWS VPC
Term: 3 Years
Model: AI + Human in the Loop

Case for Change

ETS's current proctoring model creates cost, experience, and competitive exposure.

Current-state pain points

Source: ETS-shared materials

Candidate onboarding is labor-heavy

Manual script delivery, room/desk/item checks
Manual ID and documentation validation
No consistent guided workflow for the candidate

Live monitoring is human-intensive

Heavy reliance on human judgment with high cognitive load
Variable consistency across sessions
Limited ability to surface the right anomalies in real time

Post-exam QA is expensive and slow

6–8% of sessions re-reviewed manually
Evidence gathering is fragmented
Compliance review not consistently standardized

PSI Bridge transition creates risk and opportunity

Migration work already underway
ETS cannot absorb another high-touch transformation
Any AI initiative must run as a sidecar, with minimal ETS lift

21M

exams/yr

Pearson VUE delivering with 2nd-gen AI sensors

73%

AI-assisted (2025)

US proctoring adoption is AI-assisted

$988M

market (2025)

Global remote proctoring market

Peer competitors are shipping AI proctoring at production scale. The market has converged on AI + Human models — ETS's current model is the outlier.

The Proctoring Inflection Point

ETS's measurement science is unmatched. The proctoring delivery model is the bottleneck — and every peer competitor has already moved.

The Solution

AI + Human in the Loop. Not AI-only. Not human-only.

AI handles repetitive checks, continuous monitoring, and QA triage. Humans supervise, adjudicate, and train the system. This is where market leaders are converging — and where ETS's proposed model is strategically strongest.

Design principles

Deploy in ETS cloud / VPC

ETS owns the runtime, storage, and data boundary.

Integrate, don't disrupt

Work in parallel with PSI Bridge, systems of record (SORs), and related systems. No disruption to V-box migration.

Automate only what should be automated

Deterministic tasks and strong-signal detection go to AI. Ambiguity stays with humans.

Make every alert explainable

Every AI decision has trace, evidence, and rationale. No black-box flagging.

Use low-cost CV by default

MediaPipe and open-source vision for continuous monitoring. Commercial APIs (Rekognition) only for high-value checks.

Human feedback improves the system

Supervisor corrections, overrides, and adjudications become training inputs. The system learns from use.

Three-agent architecture

01 Detection

Onboarding Assistant

Candidate greeting and script orchestration

Sponsor / exam rule retrieval

ID and liveness workflow orchestration

Environment verification and readiness summary

02 Vigilance

Vigilance Core

Fuse CV detections with session context

Distinguish low-value anomalies from real risk

Prioritize alerts (low / medium / high)

Prepare intervention context for the proctor

03 Adjudication

Forensic QA

Re-run checks on sampled sessions (6–8%)

Organize evidence by timestamp, issue, severity

Compare onboarding baseline vs. exam conditions

Produce QA report and disposition suggestions

Then → Human Proctor

Reviews AI alerts and intervenes when needed. Clears false positives, approves or rejects QA findings, and provides feedback that retrains the system.
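As an illustration of how the Vigilance Core might fuse a CV detection with session context and assign a low / medium / high priority, here is a minimal sketch. The signal names, weights, and thresholds are hypothetical placeholders, not values from ETS, Lyzr, or any sponsor rulebook; real values would come from sponsor rules and pilot tuning.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    signal: str        # hypothetical signal name, e.g. "second_person"
    confidence: float  # 0.0-1.0 score from the CV layer


# Hypothetical weights: how much each signal matters on its own.
SIGNAL_WEIGHT = {"second_person": 1.0, "phone_visible": 0.9, "gaze_away": 0.4}


def prioritize(detection: Detection, allowed_items: set, repeat_count: int) -> str:
    """Return 'low', 'medium', or 'high' for one detection in its session context."""
    # Session context: something the sponsor explicitly allows stays low priority.
    if detection.signal in allowed_items:
        return "low"
    weight = SIGNAL_WEIGHT.get(detection.signal, 0.5)
    # Repeated occurrences in the same session raise the score, capped at +0.3.
    score = detection.confidence * weight + min(0.1 * repeat_count, 0.3)
    if score >= 0.8:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"
```

In this sketch a single glance away stays low, repeated glances rise to medium, and a confident second-person detection goes straight to high for the human proctor to review.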

What AI does vs. what humans do

AUTOMATED

AI handles

Repetitive script handling and check-in flows
First-pass validation and anomaly surfacing
Confidence scoring and structured summarization
Suggested actions and batch QA review
Evidence assembly and audit preparation

SUPERVISED

Humans handle

Ambiguous case review and intervention
Policy exceptions and exam termination
Adjudication and final QA approval
Feedback correction and training input

AI does not make final punitive decisions. Alerts are reviewed by humans. The evidence path is transparent. Quality improves through reviewer feedback. The system is tunable by sponsor, program, and test type. This is why the human-in-the-loop (HITL) model is strategically stronger than either fully manual or fully automated proctoring, and why ProctorU/Meazure discontinued AI-only services in favor of this approach.

What Lyzr brings

Seven platform capabilities that make the three-agent architecture production-ready, governed, and continuously improving.

Govern & Trust

Responsible AI / Guardrails

Automated safety checks on every agent interaction — security, privacy, brand risk, content quality. Each check independently configurable.

Prevent the AI proctor from leaking test content, surfacing biased alerts, or making decisions outside its defined scope. Integrates with AWS Bedrock Guardrails.

GitAgent

Every agent lives in a git repo — identity, rules, memory, tools are version-controlled files. Every decision is a commit. Full audit trail out of the box.

Know exactly which agent version ran during any exam session. Roll back a bad update instantly. Compliance teams review agent config like a code PR — SOX, GDPR, FERPA aligned.

Observability

Real-time visibility into every agent action, following the OpenTelemetry standard. Detailed per-step traces, aggregated overview, and downloadable reports.

Trace exactly how the Vigilance Core reached a flagging decision — which CV signal triggered, what confidence score, what context was evaluated. Every alert is explainable.
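To make "every alert is explainable" concrete, the sketch below shows one possible shape for an alert trace record that carries the triggering signal, confidence, evaluated context, rationale, and evidence pointers. The field names are assumptions for illustration only; they are not the OpenTelemetry schema or Lyzr's actual trace format.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class AlertTrace:
    session_id: str
    agent_version: str   # e.g. the commit of the agent config that ran (assumed)
    signal: str          # which CV signal triggered the alert
    confidence: float    # score attached to that signal
    context: dict        # session context that was evaluated
    rationale: str       # human-readable reason for the flag
    evidence: list = field(default_factory=list)  # e.g. clip timestamps

    def is_explainable(self) -> bool:
        """An alert counts as explainable only with a signal, rationale, and evidence."""
        return bool(self.signal and self.rationale and self.evidence)


def to_audit_json(trace: AlertTrace) -> str:
    """Serialize the trace deterministically so it can be stored and diffed."""
    return json.dumps(asdict(trace), sort_keys=True)
```

A record that fails `is_explainable()` could be blocked from reaching the proctor console, which is one way to enforce the "no black-box flagging" principle at runtime.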

Test & Improve

Agent Evaluation (Simulation Engine)

Test agents against synthetic conversations before deployment. Scores across task completion, hallucination, faithfulness, toxicity, bias, and tool accuracy.

Before deploying to live exams, run the Onboarding Assistant against 1,000 synthetic check-in scenarios — edge cases, accessibility needs, unusual environments. Fix failures before candidates see them.

Agent Improvement Engine

Monitors live agents continuously, detects quality issues from production traces, and auto-generates prompt improvements to address them.

If the Forensic QA agent starts producing lower-quality evidence reports, the engine detects the drift, diagnoses the cause, and suggests a fix — before ETS's QA team notices.

Scale & Own

Control Plane

Centralized registry and management across all agents, all frameworks, all clouds. One view of your entire agent workforce — monitoring, guardrails, access controls, deployment.

One dashboard showing Onboarding, Vigilance, and Forensic QA agents across all exam programs. Performance, cost, alert volume, and compliance status — per program, per region.

ShadowLM

Progressively moves agent workloads from frontier models to small, owned models trained on ETS's approved decision traces. Up to 90% inference cost reduction.

Routine proctoring decisions — clear environments, standard ID matches, normal behavior — shift to a model ETS owns and runs in-VPC. Frontier handles edge cases only. Cost bends down permanently.

7

platform capabilities included

additional vendors to procure

100%

of agent actions traceable and explainable

Why this combination wins

ETS/PSI testing domain and platform ownership. Firstsource operating muscle and trained proctors. Lyzr agent orchestration, explainability, and deployability. A cloud-native, sponsor-aware, feedback-learning design — with minimal disruption to the current migration.

Delivery Roadmap

Lyzr + Firstsource own end-to-end delivery. ETS provides approvals.

The goal is zero disruption to the current PSI Bridge migration. Lyzr + Firstsource absorb the build burden. ETS involvement is limited to 1 weekly governance meeting, named approvers, 2–3 focused workshops, and UAT sign-off.

Phase 0

Mobilization & Access

2–3 weeks

Named counterparts and architecture sign-off
Access plan and deployment blueprint
Sponsor rulebook and current process pack
KPI baseline

Phase 1

Foundation Deployment

2–4 weeks

Deployment in ETS cloud / VPC
Logging, observability, security controls
Connectivity to PSI Bridge, SORs, DBs

Phase 2

Onboarding MVP

4–6 weeks

AI-guided check-in flow
Liveness + identity verification
Rule-based item/room scan
Readiness summary and proctor handoff

Phase 3

Live Proctoring Build

4–6 weeks

Anomaly detection and proctor alert console
Threshold tuning and event trace
Intervention and escalation workflows

Phase 4

Post-Exam QA

3–4 weeks

Sample re-review engine (6–8% of sessions)
Evidence package and audit-ready reports
Issue classification and trend view
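One way the sample re-review engine could select its 6–8% of sessions is deterministic hash-based sampling, so the same session is always in or out of the sample and the choice can be audited later. This is a sketch under assumptions: the 7% default rate, the salt value, and the function name are illustrative, not part of the proposal.

```python
import hashlib


def in_qa_sample(session_id: str, rate: float = 0.07, salt: str = "qa-sample") -> bool:
    """Return True if this session falls in the QA re-review sample.

    The session ID is hashed with a salt and mapped to a number in [0, 1);
    sessions below `rate` are sampled. The result is reproducible for audits.
    """
    digest = hashlib.sha256(f"{salt}:{session_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate
```

Changing the salt draws a fresh sample without changing the rate, and the rate itself could be set per sponsor or per program.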

Phase 5

Pilot, Hardening & Enablement

4–6 weeks

Pilot rollout with precision/recall tuning
False-positive reduction
Training, SOPs, runbooks for proctors/supervisors
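Precision/recall tuning during the pilot can be grounded in proctor adjudications. The sketch below assumes each reviewed event is recorded as a pair (AI flagged, proctor confirmed); that record shape is an assumption for illustration.

```python
def precision_recall(adjudications):
    """Compute (precision, recall) from (ai_flagged, proctor_confirmed) pairs."""
    tp = fp = fn = 0
    for ai_flagged, confirmed in adjudications:
        if ai_flagged and confirmed:
            tp += 1   # AI alert upheld by the proctor
        elif ai_flagged:
            fp += 1   # false positive cleared by the proctor
        elif confirmed:
            fn += 1   # issue the AI missed but a human found
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

False-positive reduction then means raising precision while keeping recall at or above an agreed floor for each exam program.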

Timeline

Total: 4–6 months to first scaled pilot

Lyzr + Firstsource own

Solution deployment in ETS cloud
API integration to PSI Bridge and internal systems
Agent design, build, testing, hardening
Prompt/rules/knowledge base setup
UAT support, enablement, runbooks, hypercare

ETS provides

Cloud/security approvals
Access to named systems
Sample rules, scripts, SOPs
1 product owner / business lead
Periodic review and UAT sign-off

Architecture

A layer above the existing stack, not another tool inside it.

ETS keeps its LLM contracts, cloud infrastructure, and assessment systems. Lyzr sits between — orchestrating, governing, and auditing. Nothing replaced.

Agent Experiences

What ETS's candidates and staff experience

Live AI Proctoring · Identity Verification · Environment Monitor · Chat AI Proctor · Session Review · Program Analytics

Lyzr Platform Layer

Lyzr Agentic Workbench — the control plane

Agent orchestration, identity verification, behavior detection, chat engine, guardrails, observability, ShadowLM, RBAC — all governed, all auditable, all in ETS's VPC.

Agent Studio · Multi-Agent Mesh · Vision Pipeline · Identity Agents · Behavior Models · Chat Engine · ShadowLM · Guardrails · Observability · Decision Inbox · Audit Trail

ETS Infrastructure

What ETS already pays for — unchanged

Lyzr reads from and writes to these systems. It holds no canonical state.

AWS / Azure · Bedrock / OpenAI · PSI Bridge · Assessment Platform · Scoring Engine · Candidate DB · Content Management

Technology pattern

Use low-cost computer vision by default. Use commercial APIs selectively.

MediaPipe + Open-Source CV

Continuous monitoring

Face landmarks and gaze estimation
Head pose and body posture
Multiple-person detection
Object emergence and item classification
No per-frame commercial API cost

Amazon Rekognition

Selective, high-value

Face liveness verification
Face compare / identity persistence
Optional spot verification events
Priced per check; used selectively, not continuously
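The split between continuous low-cost CV and selective paid checks can be sketched as a simple gate. The code below does not call MediaPipe or Rekognition; it assumes a local CV layer already produces a per-frame face count, and the trigger rules (onboarding baseline, return after an absence, periodic spot check) and their thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FrameSummary:
    t: float          # seconds since exam start
    face_count: int   # produced by the local, open-source CV layer (assumed)


def paid_check_times(frames, absence_s=10.0, spot_interval_s=1800.0):
    """Return timestamps at which a per-check commercial API call would be made."""
    checks, last_check, absent_since = [], None, None
    for f in frames:
        if f.face_count == 0:
            if absent_since is None:
                absent_since = f.t  # start of an absence
            continue
        # Face is present: did the candidate just return from a long absence?
        returned = absent_since is not None and f.t - absent_since >= absence_s
        absent_since = None
        # First frame with a face is the baseline; afterwards, periodic spot checks.
        due = last_check is None or f.t - last_check >= spot_interval_s
        if returned or due:
            checks.append(f.t)
            last_check = f.t
    return checks
```

With these placeholder settings, a one-hour session sampled once per second yields a handful of paid checks rather than thousands, which is the cost behavior the pattern is designed for.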

Data sovereignty

FEATURE

Single-tenant VPC, multi-region

Deployed inside ETS's AWS account with region-isolated instances. Candidate data generated in the EU stays in the EU. GDPR, CCPA, and jurisdiction-specific requirements are enforced at the infrastructure level.

FEATURE

Private inference

ShadowLM models run in-VPC per region. Zero external API calls for candidate data.

FEATURE

FERPA · GDPR · SOC 2 · Biometric

Compliant with FERPA (US), GDPR (EU), SOC 2, and state privacy laws covering biometric data (e.g., Illinois BIPA, California CCPA). Data residency is enforced per region as architecture, not policy.

FEATURE

Calibrated autonomy

Assist → Copilot → Semi-autonomous → Autonomous. Set per exam program and per region. Enforced at runtime and recorded in an immutable audit trail.
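A minimal sketch of how calibrated autonomy could be enforced at runtime follows. The program names, the action names, the level each action requires, and the in-memory list standing in for the immutable audit trail are all hypothetical. Exam termination is deliberately absent from the action table, so the agent can never perform it at any level, consistent with punitive decisions staying with humans.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    ASSIST = 1
    COPILOT = 2
    SEMI_AUTONOMOUS = 3
    AUTONOMOUS = 4


# Hypothetical policy: autonomy level set per (exam program, region).
POLICY = {("PROGRAM_A", "eu"): Autonomy.ASSIST, ("PROGRAM_A", "us"): Autonomy.SEMI_AUTONOMOUS}

# Hypothetical minimum level each agent action requires.
REQUIRED = {
    "suggest_alert": Autonomy.ASSIST,
    "send_candidate_warning": Autonomy.SEMI_AUTONOMOUS,
}

AUDIT_LOG = []  # append-only stand-in for an immutable audit trail


def authorize(program: str, region: str, action: str) -> bool:
    """Allow an agent action only if the configured level permits it; log every decision."""
    level = POLICY.get((program, region), Autonomy.ASSIST)  # default: most restrictive
    required = REQUIRED.get(action)  # unknown actions are never allowed
    allowed = required is not None and level >= required
    AUDIT_LOG.append((program, region, action, level.name, allowed))
    return allowed
```

Raising a program from Assist to Copilot would then be a one-line policy change that is itself reviewable and versioned.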

5-Year Cost Model

Unit economics normalize as volume scales. Lyzr stays flat.

Lyzr platform is flat at $1.5M/yr. LLM and AWS scale with volume. ShadowLM bends inference cost down. Per-exam cost drops 72% as volume grows nearly 5×.

$1.71

Per-exam cost, Year 1 · 1.2M exams

$0.48

Per-exam cost, Year 5 · 5.8M exams

72%

Unit cost reduction over 5 years

$2.68M

5-yr LLM savings via ShadowLM

Executive takeaway: total cost rises modestly with scale, but unit cost drops 72% by Y5 due to platform leverage, volume growth, and ShadowLM optimization.

Unit Cost per Exam Declines as Exam Volume Scales

Total annual cost remains controlled with ShadowLM while per-exam cost falls from $1.71 to $0.48 across Y1–Y5

Legend: Lyzr platform · LLM with ShadowLM · AWS infrastructure · Per-exam cost

Cost line	Y1	Y2	Y3	Y4	Y5	5-Yr Total
Exam volume	1.2M	1.7M	2.6M	3.9M	5.8M	—
Lyzr platform (flat)	$1,500K	$1,500K	$1,500K	$1,500K	$1,500K	$7,500K
LLM (without ShadowLM)	$350K	$496K	$758K	$1,138K	$1,692K	$4,434K
LLM (with ShadowLM)	$350K	$347K	$379K	$341K	$338K	$1,755K
AWS infrastructure	$200K	$283K	$433K	$650K	$967K	$2,533K
Total with ShadowLM	$2,050K	$2,130K	$2,312K	$2,491K	$2,805K	$11,788K
Per-exam with ShadowLM	$1.71	$1.25	$0.89	$0.64	$0.48	—
ShadowLM cumulative savings	$0	$149K	$528K	$1,325K	$2,679K	$2,679K
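The table can be reproduced from the stated assumptions: a flat $1,500K platform fee, LLM and AWS costs scaling proportionally with volume from their Year 1 base, and ShadowLM reductions of 0/30/50/70/80%. The sketch below assumes each line is rounded to the nearest $K before totals and savings are computed, which is the convention that matches the published figures.

```python
from fractions import Fraction
from math import floor

VOLUMES_K = [1200, 1700, 2600, 3900, 5800]   # exams per year, in thousands (Y1..Y5)
SHADOW_REDUCTION_PCT = [0, 30, 50, 70, 80]   # ShadowLM reduction in LLM cost per year
LYZR_K, LLM_BASE_K, AWS_BASE_K = 1500, 350, 200  # $K; bases are at Y1 volume


def round_half_up(x):
    """Round to the nearest integer, with .5 rounding up (exact on Fractions)."""
    return floor(x + Fraction(1, 2))


def cost_model():
    rows, cumulative_savings = [], 0
    for volume, pct in zip(VOLUMES_K, SHADOW_REDUCTION_PCT):
        scale = Fraction(volume, VOLUMES_K[0])           # volume relative to Y1
        llm_plain = round_half_up(LLM_BASE_K * scale)    # LLM without ShadowLM
        llm_shadow = round_half_up(llm_plain * Fraction(100 - pct, 100))
        aws = round_half_up(AWS_BASE_K * scale)
        total = LYZR_K + llm_shadow + aws
        cumulative_savings += llm_plain - llm_shadow
        rows.append({
            "total_k": total,
            "per_exam": round(total / volume, 2),        # $K / K exams = $ per exam
            "cumulative_savings_k": cumulative_savings,
        })
    return rows
```

Changing any single assumption, for example a lower ShadowLM reduction in Y4 and Y5, shows immediately how sensitive the $0.48 per-exam figure is to it.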

ShadowLM — a small model you own

The frontier model starts the work. ETS's model finishes it, and keeps it.

Frontier

Run the task. Log every run.

Distill

Fine-tune a small model on the logs.

Reinforce

RL until it matches the teacher.

04 Optional

Mid-train

Inject domain knowledge.

Each rung is entered only when the eval and the economics agree. Every human correction retrains the model. Currently being built with enterprise design partners.
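The rule that a rung is entered only when both the eval and the economics agree can be sketched as a two-part gate, followed by confidence-based routing once a workload is promoted. The agreement threshold, confidence threshold, and function names are illustrative assumptions; ShadowLM's actual criteria are not specified in this proposal.

```python
def ready_to_promote(agreement_rate: float, small_cost: float, frontier_cost: float,
                     min_agreement: float = 0.98) -> bool:
    """Promote a workload to the owned model only if it is both good enough and cheaper.

    agreement_rate: share of held-out decisions where the small model matches
    the approved (teacher or human-corrected) decision.
    """
    eval_ok = agreement_rate >= min_agreement
    economics_ok = small_cost < frontier_cost
    return eval_ok and economics_ok


def route(confidence: float, promoted: bool, threshold: float = 0.9) -> str:
    """Send routine, high-confidence decisions to the owned model; everything else to frontier."""
    return "owned" if promoted and confidence >= threshold else "frontier"
```

Under this sketch an unpromoted workload always stays on the frontier model, so a weak eval result can never be overridden by cost alone.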

Up to 90% lower cost

Routine proctoring decisions shift to owned model. Frontier handles edge cases only.

Data stays home

Model runs inside ETS's VPC. Candidate data never leaves the perimeter.

IP you own

Fine-tuned weights are an asset on ETS's balance sheet, not a subscription.

Volume grows nearly 5×. Total cost grows 37%. Per-exam cost drops 72%. ShadowLM saves $2.68M over 5 years.

What the alternative costs

Without Lyzr, ETS stitches together 9 vendor tools and 5 dedicated resources. The vendor stack is usage-based — costs scale with exam volume. Lyzr is flat. Server and LLM costs are common to both paths and excluded from this comparison.

Beyond licensing, 9 vendors means 9 NDAs, 9 security reviews, 9 SLAs, 9 integration maintenance cycles, and 9 renewal negotiations annually.

Cost line	Y1 (1.2M)	Y2 (1.7M)	Y3 (2.6M)	Y4 (3.9M)	Y5 (5.8M)	5-Yr Total
Software (9 vendors)	$1,255K	$1,770K	$2,460K	$3,575K	$5,180K	$14,240K
Team (5 resources)	$558K	$558K	$558K	$558K	$558K	$2,790K
Subtotal — what Lyzr replaces	$1,813K	$2,328K	$3,018K	$4,133K	$5,738K	$17,030K
Lyzr platform	$1,500K	$1,500K	$1,500K	$1,500K	$1,500K	$7,500K
Annual savings	$313K	$828K	$1,518K	$2,633K	$4,238K	$9,530K

Server and LLM costs (about $550K in Year 1, scaling with volume) are common to both paths and excluded. Total savings reflect only the components Lyzr replaces: software licensing and dedicated team.
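The savings row follows directly from the other rows of the comparison table. As a quick check, using only the figures given there:

```python
SOFTWARE_K = [1255, 1770, 2460, 3575, 5180]  # 9-vendor software cost, Y1..Y5 ($K)
TEAM_K = 558                                 # 5 dedicated resources, per year ($K)
LYZR_K = 1500                                # flat Lyzr platform fee, per year ($K)

# What Lyzr replaces each year: software plus team.
replaced_k = [software + TEAM_K for software in SOFTWARE_K]

# Annual savings: replaced cost minus the flat platform fee.
savings_k = [replaced - LYZR_K for replaced in replaced_k]

five_year_savings_k = sum(savings_k)

# Year 5 ratio of the replaced cost (software + team) to the platform fee.
y5_ratio = round(replaced_k[-1] / LYZR_K, 1)
```

Because the team line and the platform fee are flat, the growth in savings comes entirely from the usage-based software line.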

Tool-level breakdown

LangChain + LangSmith: Y1 $400K → Y5 $1,800K (5-yr: $4,800K)
Mem0: Y1 $250K → Y5 $1,100K (5-yr: $2,950K)
Arize AI: Y1 $180K → Y5 $750K (5-yr: $2,030K)
N8N: Y1 $100K → Y5 $400K (5-yr: $1,170K)
Weaviate: Y1 $90K → Y5 $400K (5-yr: $1,060K)
Guardrails AI: Y1 $80K → Y5 $350K (5-yr: $980K)
Composio: Y1 $75K → Y5 $300K (5-yr: $850K)
Lovable: $50K/yr flat (5-yr: $250K)
Langfuse: $30K/yr flat (5-yr: $150K)

⚑ Pricing based on enterprise tier estimates scaled to exam volume. Revalidation in progress.

Annual cost of what Lyzr replaces — 9 vendors + team vs. Lyzr platform

$9.5M

5-year savings on software + team

3.8×

Y5 vendor + team cost vs. Lyzr platform

9 → 1

Vendor relationships eliminated

Note: the $9.5M reflects savings on software and team that Lyzr replaces. ShadowLM's $2.68M savings (shown above) are additional — reducing LLM inference cost within the Lyzr path. These are separate, non-overlapping benefits.

Assumptions

Parameter	Value
Lyzr license	$1.5M/yr flat · 3-year term · includes build, no cap on agent actions
LLM base cost	$350K/yr at 1.2M exams · scales proportionally with volume
AWS base cost	$200K/yr at 1.2M exams · scales proportionally with volume
ShadowLM reduction	Y1: 0% · Y2: 30% · Y3: 50% · Y4: 70% · Y5: 80%
Volume model	10% penetration by Y5 · Y3-Y5 growth ~50% YoY
Per-exam unit cost	(Lyzr + LLM + AWS) ÷ exam volume
Penetration benchmark	Enterprise AI handles 30% of customer interactions today (Zendesk/McKinsey). At 10% by Y5, ETS's model is conservative.

⚑ ShadowLM reduction rates illustrative. Currently being built with enterprise design partners.
⚑ Per-exam cost excludes Firstsource human supervision/QA. Separate line item.
⚑ At 5.8M sessions, Lyzr flat pricing may warrant review. Flag for internal discussion.
⚑ Software pricing based on enterprise tier estimates. Revalidation in progress.

The Question

ETS's defensibility is measurement science, not platform engineering. Lyzr handles the platform. ETS ships the science.

Closing Directive

Next Steps

STEP

Discovery Workshop

Validate pain points, map sponsor rules, confirm integration architecture.

2 weeks

STEP

Pilot Scope

Select 1–2 exam programs, define success metrics, set autonomy levels.

2 weeks

STEP

Production

Deploy, test, harden, enable.

4–6 months to scaled pilot

Lyzr × Firstsource × ETS · AI-Enhanced Remote Proctoring Proposal · Confidential · June 2026