A. Candidate onboarding is labor-heavy
- Manual script delivery, room/desk/item checks
- Manual ID and documentation validation
- No consistent guided workflow for the candidate
AI-Enhanced Remote Proctoring · Proposal
Proposed by Lyzr + Firstsource
Source: ETS-shared materials
- 21M exams/yr: Pearson VUE delivering with 2nd-gen AI sensors
- 73% AI-assisted (2025): share of US proctoring adoption that is AI-assisted
- $988M market (2025): global remote proctoring market
Peer competitors are shipping AI proctoring at production scale. The market has converged on AI + Human models — ETS's current model is the outlier.
The Proctoring Inflection Point
ETS's measurement science is unmatched. The proctoring delivery model is the bottleneck — and every peer competitor has already moved.
AI handles repetitive checks, continuous monitoring, and QA triage. Humans supervise, adjudicate, and train the system. This is where market leaders are converging — and where ETS's proposed model is strategically strongest.
01. ETS owns the runtime, storage, and data boundary.
02. Work in parallel with PSI Bridge, SORs, and related systems. No disruption to V-box migration.
03. Deterministic tasks and strong-signal detection go to AI. Ambiguity stays with humans.
04. Every AI decision has trace, evidence, and rationale. No black-box flagging.
05. MediaPipe and open-source vision for continuous monitoring. Commercial APIs (Rekognition) only for high-value checks.
06. Supervisor corrections, overrides, and adjudications become training inputs. The system learns from use.
Onboarding Assistant
- Candidate greeting and script orchestration
- Sponsor / exam rule retrieval
- ID and liveness workflow orchestration
- Environment verification and readiness summary
Vigilance Core
- Fuse CV detections with session context
- Distinguish low-value anomalies from real risk
- Prioritize alerts (low / medium / high)
- Prepare intervention context for the proctor
Forensic QA
- Re-run checks on sampled sessions (6–8%)
- Organize evidence by timestamp, issue, severity
- Compare onboarding baseline vs. exam conditions
- Produce QA report and disposition suggestions
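The alert-handling tasks above (fuse a CV detection with session context, separate low-value anomalies from real risk, assign low / medium / high priority) can be illustrated with a minimal sketch. The signal names, weights, and thresholds below are hypothetical placeholders, not values from the proposal; a real deployment would tune them per sponsor and program.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Detection:
    signal: str          # hypothetical label, e.g. "second_face"
    confidence: float    # detector confidence in [0, 1]
    timestamp_s: float   # seconds since exam start


@dataclass
class Alert:
    priority: str        # "low", "medium", or "high"
    rationale: str       # plain-language explanation for the proctor
    evidence: Detection  # the detection that produced the alert


# Hypothetical severity weights per signal (assumption, not from the source).
SIGNAL_WEIGHT = {"second_face": 1.0, "phone_visible": 0.9, "gaze_away": 0.4}


def triage(det: Detection, allowed_signals: set) -> Optional[Alert]:
    """Fuse one CV detection with session context and assign a priority."""
    # Session context: sponsor rules may explicitly permit some signals.
    if det.signal in allowed_signals:
        return None
    score = det.confidence * SIGNAL_WEIGHT.get(det.signal, 0.5)
    if score >= 0.7:
        priority = "high"
    elif score >= 0.35:
        priority = "medium"
    else:
        priority = "low"
    rationale = (f"{det.signal} at {det.timestamp_s:.0f}s, "
                 f"confidence {det.confidence:.2f}, score {score:.2f}")
    return Alert(priority, rationale, det)
```

Each alert carries its evidence and a rationale string, so the reviewing proctor sees why it was raised. The function only prioritizes; it takes no action against the candidate.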
The human proctor reviews AI alerts and intervenes when needed, clears false positives, approves or rejects QA findings, and provides feedback that retrains the system.
AUTOMATED
SUPERVISED
AI does not make final punitive decisions. Alerts are reviewed by humans, the evidence path is transparent, and quality improves through reviewer feedback. The system is tunable by sponsor, program, and test type. This is why the human-in-the-loop (HITL) model is strategically stronger than either fully manual or fully automated proctoring, and why ProctorU/Meazure discontinued AI-only services in favor of this approach.
Seven platform capabilities that make the three-agent architecture production-ready, governed, and continuously improving.
Govern & Trust
Automated safety checks on every agent interaction — security, privacy, brand risk, content quality. Each check independently configurable.
Prevent the AI proctor from leaking test content, surfacing biased alerts, or making decisions outside its defined scope. Integrates with AWS Bedrock Guardrails.
Every agent lives in a Git repo: identity, rules, memory, and tools are version-controlled files. Every decision is a commit, giving a full audit trail out of the box.
Know exactly which agent version ran during any exam session. Roll back a bad update instantly. Compliance teams review agent config like a code PR — SOX, GDPR, FERPA aligned.
Real-time visibility into every agent action, following the OpenTelemetry standard. Detailed per-step traces, an aggregated overview, and downloadable reports.
Trace exactly how the Vigilance Core reached a flagging decision — which CV signal triggered, what confidence score, what context was evaluated. Every alert is explainable.
Test & Improve
Test agents against synthetic conversations before deployment. Scores across task completion, hallucination, faithfulness, toxicity, bias, and tool accuracy.
Before deploying to live exams, run the Onboarding Assistant against 1,000 synthetic check-in scenarios — edge cases, accessibility needs, unusual environments. Fix failures before candidates see them.
Monitors live agents continuously, detects quality issues from production traces, and auto-generates prompt improvements to address them.
If the Forensic QA agent starts producing lower-quality evidence reports, the engine detects the drift, diagnoses the cause, and suggests a fix — before ETS's QA team notices.
Scale & Own
Centralized registry and management across all agents, all frameworks, all clouds. One view of your entire agent workforce — monitoring, guardrails, access controls, deployment.
One dashboard showing Onboarding, Vigilance, and Forensic QA agents across all exam programs. Performance, cost, alert volume, and compliance status — per program, per region.
Progressively moves agent workloads from frontier models to small, owned models trained on ETS's approved decision traces. Up to 90% inference cost reduction.
Routine proctoring decisions — clear environments, standard ID matches, normal behavior — shift to a model ETS owns and runs in-VPC. Frontier handles edge cases only. Cost bends down permanently.
7
platform capabilities included
0
additional vendors to procure
100%
of agent actions traceable and explainable
Why this combination wins
ETS/PSI testing domain and platform ownership. Firstsource operating muscle and trained proctors. Lyzr agent orchestration, explainability, and deployability. A cloud-native, sponsor-aware, feedback-learning design — with minimal disruption to the current migration.
The goal is zero disruption to the current PSI Bridge migration. Lyzr + Firstsource absorb the build burden. ETS involvement is limited to 1 weekly governance meeting, named approvers, 2–3 focused workshops, and UAT sign-off.
Timeline
- Phase 0: 2–3 weeks
- Phase 1: 2–4 weeks
- Phase 2: 4–6 weeks
- Phase 3: 4–6 weeks
- Phase 4: 3–4 weeks
- Phase 5: 4–6 weeks
Total: 4–6 months to first scaled pilot
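As a quick arithmetic check on the phase durations, the sketch below sums the minimum and maximum weeks and converts them to months. It assumes the phases run strictly back to back, which the source does not state.

```python
# Phase durations in weeks (min, max), as listed in the timeline.
PHASES = {
    "Phase 0": (2, 3),
    "Phase 1": (2, 4),
    "Phase 2": (4, 6),
    "Phase 3": (4, 6),
    "Phase 4": (3, 4),
    "Phase 5": (4, 6),
}
WEEKS_PER_MONTH = 52 / 12  # average calendar month


def total_weeks(phases):
    """Return (min_total, max_total) weeks if phases run sequentially."""
    lo = sum(a for a, _ in phases.values())
    hi = sum(b for _, b in phases.values())
    return lo, hi


lo, hi = total_weeks(PHASES)
print(f"{lo}-{hi} weeks = "
      f"{lo / WEEKS_PER_MONTH:.1f}-{hi / WEEKS_PER_MONTH:.1f} months")
# prints "19-29 weeks = 4.4-6.7 months"
```

Run sequentially, the upper bound lands slightly above six months. That would be consistent with some overlap between phases in the 4–6 month total, though the source does not say so.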
Lyzr + Firstsource own
ETS provides
ETS keeps its LLM contracts, cloud infrastructure, and assessment systems. Lyzr sits between — orchestrating, governing, and auditing. Nothing replaced.
Agent Experiences
What ETS's candidates and staff experience
Lyzr Platform Layer
Lyzr Agentic Workbench — the control plane
Agent orchestration, identity verification, behavior detection, chat engine, guardrails, observability, ShadowLM, RBAC — all governed, all auditable, all in ETS's VPC.
ETS Infrastructure
What ETS already pays for — unchanged
Lyzr reads from and writes to these systems. It holds no canonical state.
Use low-cost computer vision by default. Use commercial APIs selectively.
MediaPipe + Open-Source CV
Continuous monitoring
Amazon Rekognition
Selective, high-value
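The "low-cost by default, commercial selectively" rule above amounts to a routing decision per check. Here is a minimal sketch, assuming a hypothetical set of high-value check names and using stub functions in place of the real MediaPipe and Amazon Rekognition integrations, whose APIs are deliberately not called here.

```python
# Hypothetical names for checks treated as high-value; the source does not list them.
HIGH_VALUE_CHECKS = {"id_face_match", "liveness"}


def route_check(check, local_cv, commercial_api):
    """Return (backend_name, score) for one check.

    local_cv and commercial_api are callables taking the check name and
    returning a score; both are stand-ins for the real integrations.
    """
    if check in HIGH_VALUE_CHECKS:
        return "commercial", commercial_api(check)
    return "local", local_cv(check)


# Stub backends for illustration only.
def fake_local(check):
    return 0.8


def fake_commercial(check):
    return 0.99


print(route_check("gaze_tracking", fake_local, fake_commercial))
print(route_check("liveness", fake_local, fake_commercial))
```

Keeping the routing table in one place makes the cost trade-off easy to audit: the paid API is reached only for check names in `HIGH_VALUE_CHECKS`.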
- Deployed inside ETS's AWS account with region-isolated instances. Candidate data generated in the EU stays in the EU. GDPR, CCPA, and jurisdiction-specific requirements are enforced at the infrastructure level.
- ShadowLM models run in-VPC per region. Zero external API calls for candidate data.
- Compliant with FERPA (US), GDPR (EU), SOC 2, and state privacy and biometric laws (CCPA, BIPA). Data residency is enforced per region as architecture, not policy.
- Autonomy levels: Assist → Copilot → Semi-autonomous → Autonomous. Set per exam program and per region, enforced at runtime, and recorded in an immutable audit trail.
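One way to picture the per-program, per-region autonomy setting is an ordered enum plus a policy lookup that is checked before every agent action and logged. The program names, policy table, and in-memory log below are hypothetical illustrations, not the platform's actual configuration format.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    ASSIST = 1
    COPILOT = 2
    SEMI_AUTONOMOUS = 3
    AUTONOMOUS = 4


# Hypothetical policy table keyed by (exam program, region).
POLICY = {
    ("PROGRAM_A", "eu"): Autonomy.ASSIST,
    ("PROGRAM_A", "us"): Autonomy.COPILOT,
}
DEFAULT_LEVEL = Autonomy.ASSIST  # most restrictive when nothing is configured

AUDIT_LOG = []  # stand-in for an immutable audit trail


def is_allowed(program, region, required, action):
    """Check an action against the granted level and record the decision."""
    granted = POLICY.get((program, region), DEFAULT_LEVEL)
    allowed = granted >= required
    AUDIT_LOG.append(
        (program, region, action, required.name, granted.name, allowed)
    )
    return allowed
```

Defaulting to the most restrictive level means an unconfigured program or region never runs with more autonomy than intended.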
The Lyzr platform fee is flat at $1.5M/yr. LLM and AWS costs scale with volume, and ShadowLM bends inference cost down. Per-exam cost drops 72% (from $1.71 to $0.48) as volume grows nearly 5× (1.2M to 5.8M exams).
Total annual cost remains controlled with ShadowLM while per-exam cost falls from $1.71 to $0.48 across Y1–Y5
| Cost line | Y1 (1.2M) | Y2 (1.7M) | Y3 (2.6M) | Y4 (3.9M) | Y5 (5.8M) | 5-Yr Total |
|---|---|---|---|---|---|---|
| Exam volume | 1.2M | 1.7M | 2.6M | 3.9M | 5.8M | — |
| Lyzr platform (flat) | $1,500K | $1,500K | $1,500K | $1,500K | $1,500K | $7,500K |
| LLM (without ShadowLM) | $350K | $496K | $758K | $1,138K | $1,692K | $4,434K |
| LLM (with ShadowLM) | $350K | $347K | $379K | $341K | $338K | $1,755K |
| AWS infrastructure | $200K | $283K | $433K | $650K | $967K | $2,533K |
| Total with ShadowLM | $2,050K | $2,130K | $2,312K | $2,491K | $2,805K | $11,788K |
| Per-exam with ShadowLM | $1.71 | $1.25 | $0.89 | $0.64 | $0.48 | — |
| ShadowLM cumulative savings | $0 | $149K | $528K | $1,325K | $2,679K | $2,679K |
The frontier model starts the work. ETS's model finishes it, and keeps it.
1. Run the task. Log every run.
2. Fine-tune a small model on the logs.
3. Apply reinforcement learning (RL) until it matches the teacher model.
4. Inject domain knowledge.
Each rung is entered only when the eval and the economics agree. Every human correction retrains the model. Currently being built with enterprise design partners.
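The "eval and economics agree" gate can be sketched as a simple two-condition check. The accuracy-gap and saving thresholds are hypothetical defaults chosen for illustration; the source does not specify how either condition is measured.

```python
from dataclasses import dataclass


@dataclass
class RungReport:
    student_accuracy: float  # owned model's agreement with approved decisions
    teacher_accuracy: float  # frontier model on the same eval set
    student_cost: float      # cost per 1,000 decisions, owned model
    teacher_cost: float      # cost per 1,000 decisions, frontier model


def ready_to_advance(r, max_gap=0.01, min_saving=0.30):
    """Advance only if quality is close to the teacher AND cost falls enough."""
    eval_ok = (r.teacher_accuracy - r.student_accuracy) <= max_gap
    saving = 1 - r.student_cost / r.teacher_cost
    econ_ok = saving >= min_saving
    return eval_ok and econ_ok
```

Requiring both conditions means a cheaper model that underperforms, or an accurate model that saves little, stays on its current rung.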
Routine proctoring decisions shift to owned model. Frontier handles edge cases only.
Model runs inside ETS's VPC. Candidate data never leaves the perimeter.
Fine-tuned weights are an asset on ETS's balance sheet, not a subscription.
Without Lyzr, ETS stitches together 9 vendor tools and 5 dedicated resources. The vendor stack is usage-based — costs scale with exam volume. Lyzr is flat. Server and LLM costs are common to both paths and excluded from this comparison.
| Cost line | Y1 (1.2M) | Y2 (1.7M) | Y3 (2.6M) | Y4 (3.9M) | Y5 (5.8M) | 5-Yr Total |
|---|---|---|---|---|---|---|
| Software (9 vendors) | $1,255K | $1,770K | $2,460K | $3,575K | $5,180K | $14,240K |
| Team (5 resources) | $558K | $558K | $558K | $558K | $558K | $2,790K |
| Subtotal — what Lyzr replaces | $1,813K | $2,328K | $3,018K | $4,133K | $5,738K | $17,030K |
| Lyzr platform | $1,500K | $1,500K | $1,500K | $1,500K | $1,500K | $7,500K |
| Annual savings | $313K | $828K | $1,518K | $2,633K | $4,238K | $9,530K |
Server and LLM costs (about $550K in Y1, scaling with exam volume) are common to both paths and excluded. Total savings reflect only the components Lyzr replaces: software licensing and dedicated team.
⚑ Pricing based on enterprise tier estimates scaled to exam volume. Revalidation in progress.
Note: the $9.5M reflects savings on software and team that Lyzr replaces. ShadowLM's $2.68M savings (shown above) are additional — reducing LLM inference cost within the Lyzr path. These are separate, non-overlapping benefits.
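
The annual savings row in the comparison table is software plus team minus the flat Lyzr fee. The short sketch below recomputes it from the table's own figures; the variable names are illustrative.

```python
# Figures in $K, taken from the comparison table (Y1..Y5).
SOFTWARE_K = [1255, 1770, 2460, 3575, 5180]  # 9-vendor software stack
TEAM_K = 558                                 # 5 dedicated resources, flat
LYZR_K = 1500                                # Lyzr platform, flat


def annual_savings(software_k, team_k=TEAM_K, lyzr_k=LYZR_K):
    """Savings per year: what Lyzr replaces minus what Lyzr costs."""
    return [s + team_k - lyzr_k for s in software_k]


savings = annual_savings(SOFTWARE_K)
print(savings, sum(savings))
# prints "[313, 828, 1518, 2633, 4238] 9530"
```

Because the team and Lyzr lines are flat, all growth in savings comes from the usage-based software line.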
| Parameter | Value |
|---|---|
| Lyzr license | $1.5M/yr flat · 3-year term · includes build, no cap on agent actions |
| LLM base cost | $350K/yr at 1.2M exams · scales proportionally with volume |
| AWS base cost | $200K/yr at 1.2M exams · scales proportionally with volume |
| ShadowLM reduction | Y1: 0% · Y2: 30% · Y3: 50% · Y4: 70% · Y5: 80% |
| Volume model | 10% penetration by Y5 · Y3–Y5 growth ~50% YoY |
| Per-exam unit cost | (Lyzr + LLM + AWS) ÷ exam volume |
| Penetration benchmark | Enterprise AI handles 30% of customer interactions today (Zendesk/McKinsey). At 10% by Y5, ETS's model is conservative. |
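
The parameters above are enough to recompute the ShadowLM cost table. The sketch below is one reading of them: LLM and AWS scale linearly from the Y1 base, the ShadowLM reduction applies to the LLM line only, and each line is rounded half-up to the nearest $K. The rounding rule is an assumption.

```python
VOLUME_K = [1200, 1700, 2600, 3900, 5800]  # exams per year in thousands, Y1..Y5
SHADOW_CUT_PCT = [0, 30, 50, 70, 80]       # ShadowLM reduction per year
LYZR_K, LLM_BASE_K, AWS_BASE_K = 1500, 350, 200  # $K at 1.2M exams


def half_up(x):
    """Round a non-negative value half-up to the nearest integer."""
    return int(x + 0.5)


def cost_table():
    rows, cumulative = [], 0
    for vol, cut in zip(VOLUME_K, SHADOW_CUT_PCT):
        llm_raw = LLM_BASE_K * vol / VOLUME_K[0]   # frontier-only LLM cost
        llm = half_up(llm_raw)
        llm_shadow = half_up(llm_raw * (100 - cut) / 100)
        aws = half_up(AWS_BASE_K * vol / VOLUME_K[0])
        total = LYZR_K + llm_shadow + aws
        cumulative += llm - llm_shadow             # ShadowLM savings to date
        rows.append({
            "total_k": total,
            # $K divided by thousands of exams gives dollars per exam.
            "per_exam": round(total / vol, 2),
            "cum_savings_k": cumulative,
        })
    return rows


for year, row in enumerate(cost_table(), start=1):
    print(f"Y{year}", row)
```

Under these assumptions the function reproduces the table's totals, per-exam costs ($1.71 down to $0.48, a drop of about 72%), and the $2,679K cumulative ShadowLM savings.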
ETS's defensibility is measurement science, not platform engineering. Lyzr handles the platform. ETS ships the science.
Closing Directive
Next Steps
- Step 1 (2 weeks): Validate pain points, map sponsor rules, confirm integration architecture.
- Step 2 (2 weeks): Select 1–2 exam programs, define success metrics, set autonomy levels.
- Step 3 (4–6 months to scaled pilot): Deploy, test, harden, enable.
Lyzr × Firstsource × ETS · AI-Enhanced Remote Proctoring Proposal · Confidential · June 2026