Why 93 out of 100 users never finish KYC — and what to do about it
A deep-dive into 720,000 KYC events across 100,000 users reveals a funnel so leaky that only 0.6% of users are ever fully verified. Here's exactly what's breaking, who it's hitting hardest, and the interventions that will actually move the number.
Data Science · FinTech Product Analytics
The activation gate nobody's watching
In most fintech products, KYC is not optional. It sits between your acquisition spend and your first active user. You cannot fund a wallet, send money, or access credit without clearing it. Which means every user who drops out of the KYC funnel is acquisition budget spent with nothing to show for it — no revenue, no engagement, no return.
We ran a full end-to-end analysis on 100,000 user journeys through a 10-step KYC funnel, pulling together event telemetry (720,012 events), session data, device profiles, network logs, and user demographics. What we found was not a single catastrophic failure. It was something harder to fix: a slow, steady bleed at every step, driven by systemic friction that no one intervention will fully solve.
Only 605 out of 100,000 users — 0.6% — successfully completed the KYC process. A further 13.1% ended in an unrecoverable failure state. 22.2% abandoned before reaching a final outcome. The other ~64% are stuck in partial completion or dropped off at early stages without a recorded final status.
To put that in business terms: if you're spending $12 to acquire each user, you are spending roughly $1.1 million on users who never activate. Every percentage point of improvement in KYC completion rate is worth, conservatively, $12,000 in recovered acquisition spend at that scale.
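The arithmetic behind those figures is worth making explicit. The $12 CAC is the assumption stated above; everything else follows from it:

```python
# Back-of-envelope acquisition economics for the KYC funnel.
# Figures from the analysis: 100,000 starters, ~93% never finishing,
# and an assumed $12 cost per acquired user (CAC).
starters = 100_000
cac_usd = 12
never_finish_rate = 0.93  # share of starters who never reach approval

wasted_spend = starters * never_finish_rate * cac_usd
per_pp_value = starters * 0.01 * cac_usd  # value of +1pp completion

print(f"Wasted acquisition spend: ${wasted_spend:,.0f}")
print(f"Each +1pp of completion recovers: ${per_pp_value:,.0f}")
```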
A waterfall that never stops falling
The KYC funnel runs 10 steps from first click to approval. Most product teams focus on the document and biometric steps — the ones that feel "technical." But the data tells a more uncomfortable story: the biggest absolute losses happen before users ever reach a document upload screen.
Users reaching each step as a % of total starters (n=100,000). Note: the 7,067 reaching kyc_approved includes partial-progress users; fully verified completers = 605.
The three steepest drops by absolute volume land at personal_information (10,382 users, −11.7%), phone_verification (10,207 users, −10.8%), and document_upload (2,712 users). These are not technical failures. They are UX and process failures — places where the product is making a reasonable person give up.
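For readers who want to reproduce the waterfall, the computation is just step-over-step differencing of reach counts. The counts below are illustrative placeholders, not the real per-step figures (only the drop totals are quoted above):

```python
# Waterfall math behind the funnel chart: per-step reach and drop rates
# computed from raw step counts. Counts are hypothetical for illustration.
steps = [
    ("landing", 100_000),
    ("personal_information", 88_700),
    ("phone_verification", 78_500),
    ("document_upload", 70_900),
    # ... six further steps down to kyc_approved
]
starters = steps[0][1]
for (prev_name, prev_n), (name, n) in zip(steps, steps[1:]):
    drop = prev_n - n
    print(f"{name}: {n:,} reached ({n / starters:.1%} of starters), "
          f"-{drop:,} ({drop / prev_n:.1%} of {prev_name})")
```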
"The funnel doesn't collapse at one catastrophic step. It bleeds continuously from the first screen to the last — which means there is no single fix, but there are several high-leverage ones."
From the analysis notebook

What makes the manual_review step particularly alarming is its rate rather than its volume. By the time a user reaches manual review, they have cleared nine steps and invested significant time. Yet 54.2% of them never advance from that point. That is almost entirely an operational and communication failure: users are placed in a queue with no status update, no estimated wait time, and no way to know if something went wrong. They simply stop checking.
Four error codes are doing all the damage
One of the most striking findings in the data is just how few distinct failure modes exist. The entire error distribution across 720,000 events collapses to exactly four error codes — and they appear at near-identical frequencies, around 144,000 occurrences each. That uniformity is itself a signal: this is not random noise. These are structural, repeatable failure patterns.
Look at what these four errors have in common: every single one is a post-submission failure that could be caught pre-submission. face_mismatch could be reduced with a live alignment guide before the shutter fires; document_rejected could be caught with a format validator before the upload button enables; network_timeout disappears with chunked, resumable uploads; blurry_document is solved by a real-time blur-detection pass before the user hits submit.
None of these require changes to the backend KYC verification logic. They are all front-of-the-pipe, client-side interventions.
The four friction buckets
Stepping back from individual error codes, the failures cluster into four categories of friction:
| Friction Type | Steps Affected | Rate | Primary Driver |
|---|---|---|---|
| Document | document_upload, document_validation | ~15.1% fail | blurry_document + document_rejected |
| Technical | Upload-heavy steps | ~15.0% fail | network_timeout + camera quality |
| Process | personal_info, address_verification | ~10.0% abandon | Form complexity, UX friction |
| Operational | manual_review | 54.2% drop-off | No comms, no SLA, no transparency |
Status proportions (success / fail / abandon) are remarkably consistent across all 10 funnel steps. This tells us the problem is systemic — embedded in the onboarding experience as a whole — rather than isolated to any single step. Fixing one step in isolation will improve that step's metrics but will not resolve the underlying pattern.
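That consistency claim is testable rather than just eyeball-able. A chi-square test of homogeneity on the step-by-status contingency table makes it concrete; the counts below are illustrative, not the real event data:

```python
# Chi-square test of homogeneity: are success/fail/abandon proportions
# the same across funnel steps? A small statistic relative to the
# critical value supports the "systemic, not step-specific" reading.
def chi2_homogeneity(table):
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    grand = sum(row_tot)
    return sum(
        (table[i][j] - row_tot[i] * col_tot[j] / grand) ** 2
        / (row_tot[i] * col_tot[j] / grand)
        for i in range(len(table)) for j in range(len(table[0]))
    )

# 3 steps x 3 statuses (success, fail, abandon), near-identical proportions
table = [[750, 150, 100], [748, 152, 100], [752, 148, 100]]
stat = chi2_homogeneity(table)
print(f"chi2 = {stat:.3f}")
# df = (3-1)*(3-1) = 4; the 95th percentile of chi2(4) is ~9.49,
# so a tiny statistic like this one fails to reject homogeneity.
```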
Segment intelligence: modest gaps, important patterns
One of the more nuanced findings is that segment differences are relatively small in proportional terms. Most groups fall within ±0.3pp of the 7.09% average rate of reaching kyc_approved. This is actually good news: it means the funnel problem is structural, not audience-specific. You don't need to fix it differently for different users. You need to fix the experience itself.
That said, some segments are consistently below average and deserve targeted attention:
Three patterns worth flagging. First, organic traffic underperforms paid channels: the 0.56pp gap between organic (6.85%) and affiliate (7.41%) suggests paid traffic arrives with clearer intent or better device quality. Second, the medium fraud-risk bucket is a meaningful outlier at 6.73%, significantly below both low-risk and high-risk users; this warrants its own investigation, as it may be experiencing additional friction steps not visible in the current schema. Third, 5G users underperform both WiFi and 4G users (6.81% vs 7.07–7.15%), which is counter-intuitive and likely reflects demographic or geographic confounders rather than a network-quality effect.
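Whether a 0.56pp gap is signal or noise depends on segment sizes, which the text does not give. A quick two-proportion z-test sketch, with a hypothetical 20,000 users per channel, shows the gap would clear conventional significance at that scale:

```python
from math import sqrt
from statistics import NormalDist

# Two-proportion z-test for the organic vs. affiliate gap (6.85% vs 7.41%).
# The per-segment n of 20,000 is an assumption for illustration only.
def two_prop_z(p1, n1, p2, n2):
    p_pool = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value

z, p = two_prop_z(0.0741, 20_000, 0.0685, 20_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```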
Ten interventions, three tracks, one sprint to start
The analysis points clearly toward three tracks of intervention. The product and UX track addresses the human experience of onboarding. The technical track handles the infrastructure behind the errors. The operational track deals with the processes and instrumentation that currently make measurement and improvement impossible.
Track 1: Product & UX
Before the user can tap "upload," run a real-time client-side check for blur, lighting, completeness, and file format. Show a pass/fail indicator with specific corrective guidance. This eliminates the two largest document error codes — blurry_document and document_rejected — at the point of capture rather than after server-side validation. No backend changes required.
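One plausible core for such a pre-check is the classic variance-of-the-Laplacian blur measure. Here is a minimal pure-Python sketch; the 50.0 threshold is a made-up value that would need tuning on real document captures:

```python
def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over a 2-D grid of pixel values."""
    h, w = len(gray), len(gray[0])
    vals = [
        -4 * gray[y][x] + gray[y - 1][x] + gray[y + 1][x]
        + gray[y][x - 1] + gray[y][x + 1]
        for y in range(1, h - 1) for x in range(1, w - 1)
    ]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def passes_blur_gate(gray, threshold=50.0):
    # Low Laplacian variance = few sharp edges = likely blurry capture.
    return laplacian_variance(gray) >= threshold

# Synthetic check: a high-contrast checkerboard vs. a flat grey frame
sharp = [[255 * ((x + y) % 2) for x in range(64)] for y in range(64)]
blurry = [[128] * 64 for _ in range(64)]
print(passes_blur_gate(sharp), passes_blur_gate(blurry))  # True False
```

In production this would run on the camera preview frames, with the upload button enabled only once the gate passes.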
Add a face alignment overlay, distance indicator, and lighting quality check to the selfie capture screen. The shutter should not fire until the quality check passes. This directly targets face_mismatch — the single most common error code at 144,792 occurrences. Guardrail: monitor liveness failure rate to ensure no increase in spoofing attempts.
A 54.2% drop-off at manual review is not a technical problem — it is a communication problem. Users placed in a queue with no visibility will disengage. Implement in-app push, email, and SMS status updates that tell users their review is in progress, provide an estimated timeframe, and notify them immediately when a decision is made.
personal_information is the highest-volume drop-off step (10,382 users). Break it into 2–3 shorter screens with a within-step progress indicator. The abandonment here has no associated error code — it is pure UX friction. Users encounter a lengthy form and leave before submitting.
The user base spans Nigeria, Kenya, India, UK, and South Africa — five countries with meaningfully different address formats. The current address form does not adapt to local conventions. Add format-aware input with autocomplete, and explicitly list accepted proof-of-address documents for each market.
Track 2: Technical
The 143,822 network_timeout errors represent nearly 144,000 upload attempts where a dropped connection forced the user to restart entirely. Implement chunked uploads with client-side resume tokens. A connection failure should resume from the last successful chunk, not restart from zero. This is especially impactful for 3G users.
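A minimal sketch of the resume logic, with an in-memory stub standing in for the server so it runs standalone. All names here are illustrative, not a real SDK API; in production the offset would come from a status call and each chunk would be an HTTP PUT carrying the resume token:

```python
CHUNK_SIZE = 4  # tiny for the demo; real uploads might use 256 KB+

class StubServer:
    def __init__(self):
        self.received = {}  # resume_token -> bytes received so far
    def offset(self, token):
        return len(self.received.get(token, b""))
    def put_chunk(self, token, data):
        self.received[token] = self.received.get(token, b"") + data

def upload(server, token, payload, fail_after=None):
    """Upload from the last acknowledged offset; optionally drop mid-way."""
    sent = 0
    while True:
        start = server.offset(token)          # resume point, not zero
        if start >= len(payload):
            return True
        if fail_after is not None and sent >= fail_after:
            return False                      # simulated connection drop
        server.put_chunk(token, payload[start:start + CHUNK_SIZE])
        sent += 1

server = StubServer()
doc = b"passport-scan-bytes-0123456789"
upload(server, "tok-1", doc, fail_after=3)    # connection drops after 3 chunks
upload(server, "tok-1", doc)                  # retry resumes, not restarts
print(server.received["tok-1"] == doc)        # True
```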
Apply lossy compression to document and selfie images on the client before upload, reducing payload size without sacrificing the quality needed for verification. This reduces upload duration and timeout probability for users on slower connections, with no backend changes required.
Track 3: Instrumentation (the unlocks)
The most important thing to understand about the instrumentation recommendations is that without them, you cannot measure the impact of anything else. They are not nice-to-have additions — they are prerequisites for running a proper experiment programme.
The current schema has no review start or resolution timestamp. This means you cannot measure how long reviews take, you cannot set or monitor an SLA, and you cannot A/B test changes to the review process. Add review_started_at and review_resolved_at to the event schema immediately.
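Once those two fields exist, turnaround and SLA-breach reporting become straightforward. A sketch with made-up timestamps and an assumed 24-hour SLA target:

```python
from datetime import datetime, timedelta

# SLA reporting from the proposed review_started_at / review_resolved_at
# fields. Timestamps and the 24h SLA are illustrative assumptions.
SLA = timedelta(hours=24)
reviews = [
    (datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 15, 30)),   # 6.5h
    (datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 3, 10, 0)),   # 48h
]
durations = [resolved - started for started, resolved in reviews]
breaches = sum(d > SLA for d in durations)
print(f"mean turnaround: {sum(durations, timedelta()) / len(durations)}")
print(f"SLA breaches: {breaches}/{len(reviews)}")
```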
document_rejected is the second most common error code, but the schema does not record whether the rejected document was a passport, national ID, or driver's licence — or which provider rejected it. Without this, it is impossible to tell whether the problem is one specific document type, one specific vendor integration, or the flow in general.
What to test, and how to test it
Two experiments are ready to launch now. Two more require the instrumentation fixes first.
| Experiment | Hypothesis | Primary Metric | Guardrail | Status |
|---|---|---|---|---|
| E1: Doc Quality Gate | Client-side pre-check reduces document_rejected + blurry_document | doc_validation completion rate | Fraud rejection rate | Ready |
| E2: Selfie Coach | Live alignment prompts reduce face_mismatch failures | face_match completion rate | Liveness failure rate | Ready |
| E3: MR Comms | Status notifications reduce manual review abandonment | MR → approved rate | Support ticket volume | In Design |
| E4: Address Reflow | Format-aware address input reduces drop-off for NG/KE/IN | address verification rate | Data completeness score | Planned |
All experiments should be designed at a 95% confidence level and 80% power, with a minimum detectable effect of 2pp absolute improvement in the target step's completion rate. Assignment must be at the user_id level to avoid session-level contamination. Guardrail metrics must be monitored in real time throughout each experiment: a meaningful increase in fraud rejection rate or liveness failure rate should trigger a pause regardless of primary metric performance.
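Under those parameters, the required sample size per arm follows from the standard two-proportion formula. Taking the current 7.1% approval-step rate as the baseline:

```python
from math import sqrt, ceil

# Per-arm sample size for a 2pp absolute lift at alpha = 0.05 (two-sided)
# and 80% power. z_alpha and z_beta are the standard normal quantiles
# for those settings (1.96 and 0.8416).
def n_per_arm(p1, p2, z_alpha=1.96, z_beta=0.8416):
    p_bar = (p1 + p2) / 2
    num = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

n = n_per_arm(0.071, 0.091)  # baseline 7.1%, target 9.1%
print(f"~{n:,} users per arm")
```

At roughly 3,000 users per arm, a 100,000-user funnel can support several concurrent experiments without starving any of them.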
How far can we realistically move the number?
Taking a conservative view — assuming the top interventions produce 30–50% improvement in affected step completion rates — the estimated recovery potential by cluster is:
- Doc quality gate (R1 + R7): +2.1pp
- Selfie guidance (R2): +1.8pp
- MR comms (R3): +2.8pp
- Chunked uploads (R4): +1.4pp
- Form redesign (R5 + R6): +1.2pp
All combined (upper bound): approximately 7.1% → 19%. Industry benchmark for digital-first KYC is 25–35%, so a fully optimised funnel could eventually reach that range. But the first sprint alone — R1, R2, R3, R4 — is estimated to recover 8–10pp, more than doubling the current activation rate.
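As a sanity check, the per-cluster estimates can be stacked additively (a simplification that ignores overlap between fixes and takes each estimate at face value):

```python
# Additive stack of the per-cluster recovery estimates from the text.
gains_pp = {"doc_gate": 2.1, "selfie": 1.8, "mr_comms": 2.8,
            "chunked_uploads": 1.4, "form_redesign": 1.2}
baseline = 7.1  # current approval-step rate, %

total = baseline + sum(gains_pp.values())
sprint1 = sum(gains_pp[k] for k in
              ("doc_gate", "selfie", "mr_comms", "chunked_uploads"))
print(f"All clusters combined: {baseline}% -> {total:.1f}%")
print(f"Sprint 1 (R1-R4): +{sprint1:.1f}pp")
```

The simple additive sum lands around 16.4%; the 19% upper bound quoted above presumably takes the top of the 30–50% assumption range for the biggest clusters.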
The key caveat is that these estimates assume the error rate distributions in the data reflect true production variation. Some uniformities in the dataset — notably the near-identical frequency of all four error codes and the very similar step durations (~62 seconds per step) — are statistically unusual and should be validated against live production data before committing to specific targets.
The instrumentation gaps blocking future progress
Perhaps the most honest section of any analysis is the one that lists what it cannot tell you. The current data schema has six gaps that will constrain future measurement until they are resolved:
| Missing Field | What It Blocks | Priority |
|---|---|---|
| Manual review timestamps | Cannot measure review SLA, cannot A/B test MR process changes | HIGH |
| Document type field | Cannot attribute failures to passport vs. national ID vs. licence | HIGH |
| KYC provider name | Cannot identify underperforming vendor integrations | HIGH |
| Experiment assignment field | Cannot run A/B tests or measure product changes cleanly | HIGH |
| Screen/view-level events | No granular UX instrumentation inside individual steps | MEDIUM |
| Post-KYC activation events | Cannot link KYC completion to funded, transacted, or churned state | MEDIUM |
The experiment assignment field deserves special attention. Without a reliable, user-level assignment identifier in the event schema, every A/B test we run will produce ambiguous results. This is not a data science ask — it is an engineering requirement. It should be in Sprint 1 alongside the P1 product fixes.
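Until that field lands in the schema, a deterministic hash of user_id plus experiment name is the standard stopgap for stable user-level assignment. All names below are illustrative:

```python
import hashlib

# Deterministic user-level experiment assignment: hashing user_id together
# with the experiment name means every session for a given user lands in
# the same arm, with no assignment state to store or lose.
def assign(user_id: str, experiment: str,
           arms=("control", "treatment")) -> str:
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % len(arms)
    return arms[bucket]

# Same user, same experiment -> same arm, every time
assert assign("user-42", "E1_doc_gate") == assign("user-42", "E1_doc_gate")
print(assign("user-42", "E1_doc_gate"))
```

Salting the hash with the experiment name also keeps assignments independent across experiments, so E1's treatment group is not systematically E2's treatment group.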
A recoverable problem, not an intractable one
A 0.6% KYC completion rate sounds catastrophic. In some ways it is. But the encouraging reading of this data is that the causes are known, the interventions are well-defined, and the technology to execute them already exists. None of the top four recommendations require new product categories, new vendor relationships, or multi-quarter engineering efforts.
The document quality pre-check is a few hundred lines of client-side JavaScript. The selfie guidance overlay is a camera library with an alignment mask. The manual review notification system is a webhook to a messaging service. The chunked upload implementation is a well-understood pattern available in most mobile SDKs.
What has been missing is not capability — it is visibility. This analysis provides the visibility. The question is how quickly the organisation moves from insight to implementation.
The KYC funnel is losing 93% of users to a combination of four fixable error codes, two UX design decisions that need rethinking, and one operational communication gap — and the first four interventions alone are estimated to more than double the completion rate.
The full analysis notebook, BI dashboard, and detailed documentation are available internally. Methodology, data sources, and all segment breakdowns are documented in the accompanying project brief.