JWT Dual-Token Hardening Postmortem: From Stateless Refresh to Revocable Redis Sessions

This post documents a practical JWT dual-token (AT/RT) hardening effort.
The real question was simple: if a refresh token is replayed in an abnormal scenario, can the system detect it quickly and contain the impact?

Note: all domains, IPs, account identifiers, session IDs, and log samples are desensitized placeholders.

System boundary:

Frontend: Web SPA (AT stays in memory)
Backend: Go + Gin (auth and refresh)
Session layer: Redis (token state index)
Gateway: HTTPS reverse proxy

Symptom and Trigger

In a reproduction test covering device ingress and web application authentication, we simulated an extreme case in which an old RT was intercepted and replayed.
The previous design had AT/RT separation, but the refresh path did not fully enforce state transitions, so old RTs could still be abused in narrow concurrency windows.

Desensitized event sample:

{
  "event": "auth.refresh.reuse_detected",
  "user_id_masked": "u-***39",
  "session_id": "s-***b1",
  "ip_masked": "10.**.**.21",
  "action": "session_revoked"
}

Strategy: Keep AT Stateless, Strengthen RT Governance

We kept the existing AT/RT architecture and focused on hardening the RT refresh path to make it revocable, auditable, and operationally manageable.

Layer 1: Keep Token Responsibilities Clear

AT handles high-frequency access via Authorization: Bearer
RT is delivered only through HttpOnly Cookie
AT validation remains stateless for performance and scalability

This keeps the access path lightweight while concentrating control in the refresh path.
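To make the split concrete, here is a minimal sketch of the two delivery channels using only net/http. The cookie name, path, lifetime, and the SameSite setting are illustrative assumptions, not values taken from the production system.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"
)

// newRefreshCookie builds the cookie that carries the RT.
// Name, Path, and SameSite are assumptions made for this sketch.
func newRefreshCookie(rt string, ttl time.Duration) *http.Cookie {
	return &http.Cookie{
		Name:     "rt",
		Value:    rt,
		Path:     "/auth/refresh", // hypothetical refresh endpoint
		MaxAge:   int(ttl.Seconds()),
		HttpOnly: true, // not readable from JavaScript
		Secure:   true, // only sent over HTTPS
		SameSite: http.SameSiteStrictMode,
	}
}

// bearerToken extracts the AT from an Authorization header value.
func bearerToken(header string) (string, bool) {
	const prefix = "Bearer "
	if len(header) <= len(prefix) || !strings.EqualFold(header[:len(prefix)], prefix) {
		return "", false
	}
	return header[len(prefix):], true
}

func main() {
	c := newRefreshCookie("opaque-rt-value", 7*24*time.Hour)
	fmt.Println(c.String())

	tok, ok := bearerToken("Bearer abc.def.ghi")
	fmt.Println(tok, ok)
}
```

In a Gin handler the same cookie can be written with http.SetCookie(c.Writer, cookie), since the Gin response writer implements http.ResponseWriter.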

Layer 2: Make RT Refresh a Single-Use Traceable Flow

Refresh follows a strict sequence: validate old RT -> mint new RT -> update Redis state -> invalidate old RT.

Redis key model (desensitized):

rt:active:{jti}           -> { user_id, session_id, exp }  // primary state, TTL follows RT expiry
rt:session:{session_id}   -> Set<jti>                      // secondary index for one-shot session cleanup
rt:deny:{jti}             -> 1 (TTL=remaining lifetime)    // deny-list of rotated-out RTs, used to detect replay
user:sessions:{user_id}   -> Set<session_id>               // user-level control, supports "sign out other devices"

Controls:

one RT can refresh successfully only once
old RT is deny-listed immediately after rotation
short refresh lock prevents race-based double refresh
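The rotation sequence and the single-use rule can be sketched with an in-memory stand-in for the rt:active and rt:deny keys. This is an illustration under stated assumptions: the type and function names are hypothetical, TTL handling is omitted, and a mutex plays the role of the short refresh lock. A Redis-backed version would need the same steps to run atomically.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var (
	ErrUnknown = errors.New("refresh token unknown or expired")
	ErrReplay  = errors.New("refresh token replayed")
)

// rtState mirrors the value stored under rt:active:{jti} (exp omitted).
type rtState struct {
	UserID    string
	SessionID string
}

// store is an in-memory stand-in for rt:active:{jti} and rt:deny:{jti}.
type store struct {
	mu     sync.Mutex
	active map[string]rtState
	deny   map[string]bool
}

func newStore() *store {
	return &store{active: map[string]rtState{}, deny: map[string]bool{}}
}

// issue registers a freshly minted RT as active.
func (s *store) issue(jti string, st rtState) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.active[jti] = st
}

// rotate validates oldJTI, registers newJTI, then deny-lists oldJTI.
// All steps run under one lock, so two racing calls cannot both succeed.
func (s *store) rotate(oldJTI, newJTI string) (rtState, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.deny[oldJTI] {
		return rtState{}, ErrReplay
	}
	st, ok := s.active[oldJTI]
	if !ok {
		return rtState{}, ErrUnknown
	}
	s.active[newJTI] = st
	delete(s.active, oldJTI)
	s.deny[oldJTI] = true
	return st, nil
}

func main() {
	s := newStore()
	s.issue("jti-1", rtState{UserID: "u-1", SessionID: "s-1"})

	_, err := s.rotate("jti-1", "jti-2")
	fmt.Println("first use:", err)

	_, err = s.rotate("jti-1", "jti-3")
	fmt.Println("second use:", err)
}
```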

Concurrency Debounce Design (Result Reuse Window)

Modern SPA clients often emit multiple requests in parallel. Without coordination, the same old RT can trigger several refresh attempts at once, and valid traffic can be mistaken for replay. We introduced a 5-second reuse window:

the first request acquires the lock and completes rotation
the generated AT/RT pair is cached for 5 seconds
concurrent requests presenting the same old RT reuse that exact pair within the window, with no second rotation
after the window expires, the flow returns to strict single-use rotation
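The reuse window can be sketched as follows. This is a simplified in-memory model, not the production code: the refresher type, the injectable clock, and the mint callback are hypothetical, and validation of the old RT is assumed to happen inside mint. The retained entry doubles as the marker that an RT has already been rotated.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var errReplay = errors.New("refresh token reused outside the window")

type tokenPair struct{ AT, RT string }

// rotation remembers the pair minted for an old RT and when reuse ends.
type rotation struct {
	pair    tokenPair
	expires time.Time
}

type refresher struct {
	mu      sync.Mutex // stands in for the short refresh lock
	window  time.Duration
	now     func() time.Time
	mint    func(oldRT string) tokenPair
	rotated map[string]rotation
}

func newRefresher(window time.Duration, now func() time.Time, mint func(string) tokenPair) *refresher {
	return &refresher{window: window, now: now, mint: mint, rotated: map[string]rotation{}}
}

// refresh rotates on first use, replays the cached pair inside the
// window, and reports reuse once the window has closed.
func (r *refresher) refresh(oldRT string) (tokenPair, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if prev, ok := r.rotated[oldRT]; ok {
		if r.now().Before(prev.expires) {
			return prev.pair, nil
		}
		return tokenPair{}, errReplay
	}
	pair := r.mint(oldRT)
	r.rotated[oldRT] = rotation{pair: pair, expires: r.now().Add(r.window)}
	return pair, nil
}

// fakeClock makes the window testable without sleeping.
type fakeClock struct{ t time.Time }

func (c *fakeClock) now() time.Time          { return c.t }
func (c *fakeClock) advance(d time.Duration) { c.t = c.t.Add(d) }

func main() {
	clk := &fakeClock{t: time.Unix(0, 0)}
	n := 0
	r := newRefresher(5*time.Second, clk.now, func(string) tokenPair {
		n++
		return tokenPair{AT: fmt.Sprintf("at-%d", n), RT: fmt.Sprintf("rt-%d", n)}
	})

	first, _ := r.refresh("rt-0")
	second, _ := r.refresh("rt-0")
	fmt.Println("same pair:", first == second, "mint calls:", n)

	clk.advance(6 * time.Second)
	_, err := r.refresh("rt-0")
	fmt.Println("after window:", err)
}
```

With Redis, the cached pair would likely live under its own short-TTL key. The key name and its layout are left open here because the key model above does not define one.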

Layer 3: Handling Replay Events

When RT replay is detected, the system revokes the related session, writes an audit event, and drives the client back to login.
The goal is to terminate risky sessions quickly instead of trying to patch them in place.
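A replay handler along these lines could use the rt:session index to clear every RT in the affected session in one pass. The sketch below is in-memory and single-goroutine for brevity. The type and method names are hypothetical, and it assumes the session ID can be recovered from the replayed token. The event and action strings match the desensitized sample shown earlier.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type auditEvent struct {
	Event     string `json:"event"`
	SessionID string `json:"session_id"`
	Action    string `json:"action"`
}

// rtIndex is an in-memory stand-in for three of the Redis keys.
type rtIndex struct {
	active    map[string]string          // rt:active:{jti} -> session_id
	bySession map[string]map[string]bool // rt:session:{session_id} -> set of jti
	deny      map[string]bool            // rt:deny:{jti}
}

func newRTIndex() *rtIndex {
	return &rtIndex{
		active:    map[string]string{},
		bySession: map[string]map[string]bool{},
		deny:      map[string]bool{},
	}
}

// track records an active RT and adds it to its session's index.
func (x *rtIndex) track(jti, sid string) {
	x.active[jti] = sid
	if x.bySession[sid] == nil {
		x.bySession[sid] = map[string]bool{}
	}
	x.bySession[sid][jti] = true
}

// handleReplay revokes every RT in the session and returns the audit
// event to be written. The caller is expected to clear the RT cookie
// and answer with an error that sends the client back to login.
func (x *rtIndex) handleReplay(replayedJTI, sid string) auditEvent {
	for jti := range x.bySession[sid] {
		delete(x.active, jti)
		x.deny[jti] = true
	}
	delete(x.bySession, sid)
	x.deny[replayedJTI] = true
	return auditEvent{
		Event:     "auth.refresh.reuse_detected",
		SessionID: sid,
		Action:    "session_revoked",
	}
}

func main() {
	x := newRTIndex()
	x.track("jti-new", "s-1")
	ev := x.handleReplay("jti-old", "s-1")
	b, _ := json.Marshal(ev)
	fmt.Println(string(b))
}
```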

Layer 4: Tie Logout and High-Risk Operations to Revocation

Session invalidation is linked to:

user logout
admin password reset
account lock/disable transitions

This upgrades logout from browser-only cookie clearing to actual server-side session invalidation.
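One way to wire these triggers to the user:sessions index is sketched below. The revocation scope per trigger is an assumption made for illustration: logout revokes only the current session, while password reset and account lock revoke every session of the user. The trigger names and the sessions type are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

type trigger string

const (
	userLogout         trigger = "user_logout"
	adminPasswordReset trigger = "admin_password_reset"
	accountLocked      trigger = "account_locked"
)

// sessions is an in-memory stand-in for user:sessions:{user_id}.
type sessions struct {
	byUser  map[string]map[string]bool // user_id -> set of session_id
	revoked map[string]bool            // session_id -> revoked
}

func newSessions() *sessions {
	return &sessions{byUser: map[string]map[string]bool{}, revoked: map[string]bool{}}
}

func (s *sessions) add(userID, sid string) {
	if s.byUser[userID] == nil {
		s.byUser[userID] = map[string]bool{}
	}
	s.byUser[userID][sid] = true
}

// revoke invalidates sessions according to the trigger and returns the
// revoked session IDs in sorted order.
func (s *sessions) revoke(t trigger, userID, currentSID string) []string {
	var out []string
	switch t {
	case userLogout:
		if s.byUser[userID][currentSID] {
			out = append(out, currentSID)
		}
	case adminPasswordReset, accountLocked:
		for sid := range s.byUser[userID] {
			out = append(out, sid)
		}
	}
	sort.Strings(out)
	for _, sid := range out {
		delete(s.byUser[userID], sid)
		s.revoked[sid] = true
	}
	return out
}

func main() {
	s := newSessions()
	s.add("u-1", "s-1")
	s.add("u-1", "s-2")
	fmt.Println(s.revoke(userLogout, "u-1", "s-1"))
	fmt.Println(s.revoke(adminPasswordReset, "u-1", ""))
}
```

Each revoked session would then go through the same per-session RT cleanup used for replay events.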

Root Cause

This was a lifecycle governance gap rather than a single coding mistake:

Early implementation focused on issuance and verification.
Session-state capability existed technically, but was not fully integrated into the refresh critical path.

Outcomes

Three practical improvements after hardening:

sessions can be revoked quickly server-side
RT replay becomes detectable and attributable
security incidents are easier to triage by user and session scope

User experience remains stable: normal traffic keeps silent refresh, while risky paths degrade to explicit re-login.

Process Improvements

1) Add Token Lifecycle Checks to Release Gate

Pre-release checks now include:

RT single-use validation
refresh concurrency consistency
session invalidation checks after logout/password reset

2) Standardize Security Audit Fields

Audit logs use desensitized fields consistently: user_id_masked, session_id, jti_prefix, ip_masked, ua_hash, risk_level.
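A record type with these six fields might look like the sketch below. The masking helpers are illustrative assumptions: the IPv4 mask follows the pattern in the sample event, the jti prefix length of 8 is arbitrary, and the UA hash is a truncated SHA-256.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"strings"
)

// auditRecord carries only desensitized fields.
type auditRecord struct {
	UserIDMasked string `json:"user_id_masked"`
	SessionID    string `json:"session_id"`
	JTIPrefix    string `json:"jti_prefix"`
	IPMasked     string `json:"ip_masked"`
	UAHash       string `json:"ua_hash"`
	RiskLevel    string `json:"risk_level"`
}

// maskIPv4 keeps the first and last octet, as in "10.**.**.21".
func maskIPv4(ip string) string {
	p := strings.Split(ip, ".")
	if len(p) != 4 {
		return "masked"
	}
	return p[0] + ".**.**." + p[3]
}

// jtiPrefix keeps at most the first 8 characters of the token ID.
func jtiPrefix(jti string) string {
	if len(jti) > 8 {
		return jti[:8]
	}
	return jti
}

// uaHash returns the first 8 bytes of SHA-256(ua), hex encoded.
func uaHash(ua string) string {
	sum := sha256.Sum256([]byte(ua))
	return hex.EncodeToString(sum[:8])
}

func main() {
	rec := auditRecord{
		UserIDMasked: "u-***39",
		SessionID:    "s-***b1",
		JTIPrefix:    jtiPrefix("0123456789abcdef"),
		IPMasked:     maskIPv4("10.1.2.21"),
		UAHash:       uaHash("ExampleBrowser/1.0"),
		RiskLevel:    "high",
	}
	b, _ := json.Marshal(rec)
	fmt.Println(string(b))
}
```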

3) Validate Replay Protection Regularly

Regular RT replay validation checks that detection, revocation, and recovery paths still work end to end.

Optional Follow-Up Controls

The current design already includes Redis-backed session revocation and concurrency-safe anti-replay controls. The refresh path enforces real-time user status checks and blocks token issuance immediately for locked or disabled accounts.

If broader hijack scenarios need to be covered later, the following controls can be added:

Environment fingerprint comparison: IP range and UA hash can be compared so that abnormal RT location jumps trigger stricter token invalidation and additional verification.
Rule calibration: in complex network environments, legitimate 4G/5G switching and egress drift should be validated before stricter decision rules are enabled.
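As a rough illustration of both points, the sketch below compares the fingerprint recorded at issuance with the one seen at refresh, and uses an enforce flag so the rule can run in observe-only mode during calibration. The /24 range, the decision names, and the fingerprint type are assumptions. The example is IPv4 only.

```go
package main

import (
	"fmt"
	"net/netip"
)

type fingerprint struct {
	IP     string
	UAHash string
}

type decision string

const (
	allow   decision = "allow"
	observe decision = "observe" // mismatch logged, request still served
	stepUp  decision = "step_up" // invalidate and require re-verification
)

// sameRange reports whether b falls inside the prefix of the given
// length that contains a. Unparseable input counts as a mismatch.
func sameRange(a, b string, bits int) bool {
	pa, err := netip.ParseAddr(a)
	if err != nil {
		return false
	}
	pb, err := netip.ParseAddr(b)
	if err != nil {
		return false
	}
	pfx, err := pa.Prefix(bits)
	if err != nil {
		return false
	}
	return pfx.Contains(pb)
}

// evaluate compares the fingerprint stored at issuance with the one
// seen on refresh. With enforce=false a mismatch is only observed,
// which allows calibration against mobile network switching.
func evaluate(issued, seen fingerprint, enforce bool) decision {
	if issued.UAHash == seen.UAHash && sameRange(issued.IP, seen.IP, 24) {
		return allow
	}
	if !enforce {
		return observe
	}
	return stepUp
}

func main() {
	issued := fingerprint{IP: "10.0.0.5", UAHash: "abc"}
	fmt.Println(evaluate(issued, fingerprint{IP: "10.0.0.99", UAHash: "abc"}, true))
	fmt.Println(evaluate(issued, fingerprint{IP: "10.0.1.5", UAHash: "abc"}, false))
	fmt.Println(evaluate(issued, fingerprint{IP: "10.0.1.5", UAHash: "abc"}, true))
}
```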

Closing

One focus of authentication resilience is containment during abnormal events.
This hardening work adds session revocation and concurrency-safe anti-replay controls to the RT refresh path.
