Health

The Health module provides uptime and probe endpoints for external monitoring (Kubernetes, load balancers, Sentry, etc.) and one admin endpoint that toggles system-wide maintenance mode. The probe endpoints are @Public() and bypass the global ResponseInterceptor: the controller returns the body as-is via res.json(...). The admin endpoint uses the standard response envelope and is guarded by an ACL permission.

Property	Value
Base URL	`{HOST}/v1`
Auth	Public for probe endpoints · Bearer JWT for admin maintenance toggle
Content-Type	`application/json`
Error envelope	Probe endpoints use the raw shape (see per-endpoint) · Admin endpoint uses the standard `{ "message", "statusCode", "error" }`
Validation	Global `ValidationPipe` · `whitelist: true`, `forbidNonWhitelisted: true`
Related modules	infrastructure, monitoring (Sentry, k8s probes)
Document version	v1 · 2026-05-25
Source synced	`backend-mvp` @ `e0afbefe` · 2026-05-25
Audience	Internal FE devs (status page widget) + ops/infra

Summary

Public endpoints do not require a Bearer token and are used for uptime monitoring and readiness probes. The admin endpoint POST /admin/maintenance/toggle enables or disables the in-memory maintenanceMode flag in HealthService. While the flag is active, every probe endpoint except /live returns a maintenance response with HTTP 503; /live keeps returning 200.

Method	Path	Auth	Summary
GET	`/v1/health`	public	Full status (DB, Mongo, uptime, version)
GET	`/v1/status`	public	Simple status (ok / maintenance / degraded)
GET	`/v1/ping`	public	Simple ping-pong
GET	`/v1/ready`	public	Kubernetes readiness probe
GET	`/v1/live`	public	Kubernetes liveness probe
POST	`/v1/admin/maintenance/toggle`	bearer	Enable/disable maintenance mode (admin)

The user-facing probe endpoints are not wrapped in the { status, statusCode, message, data } envelope — the controller calls res.json(...) directly. FE should parse the raw shape exactly as shown in each card.
The admin POST /admin/maintenance/toggle endpoint still passes through the global interceptor → the response is wrapped in the standard envelope.
Maintenance mode is in-memory per-instance — if the backend is running multi-replica, the toggle needs to be propagated to each instance.
HTTP status reflects the system condition: 200 ok, 503 maintenance, 503 degraded (a critical service — DB or Mongo — is unhealthy).
Only database + mongodb are critical and drive the status/HTTP code. smtp + objectStorage are optional — if they fail the system stays 200 ok and only the message changes.
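
The split between raw probe bodies and the enveloped admin response can be handled on the FE side with one small helper. This is a minimal sketch, not code from backend-mvp: the names Envelope, isEnvelope, and unwrap are hypothetical, and the envelope fields are assumed from the { status, statusCode, message, data } shape described above.

```typescript
// Hypothetical FE-side type; field names follow the envelope shape in this document.
interface Envelope<T> {
  status: string;
  statusCode: number;
  message: string;
  data: T;
}

// Returns true when the body looks like the standard success envelope.
function isEnvelope(body: unknown): body is Envelope<unknown> {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  return (
    typeof b["status"] === "string" &&
    typeof b["statusCode"] === "number" &&
    typeof b["message"] === "string" &&
    "data" in b
  );
}

// Probe endpoints return the payload directly; the admin endpoint wraps it in `data`.
function unwrap(body: unknown): unknown {
  return isEnvelope(body) ? body.data : body;
}
```

A probe body such as { "status": "ok", "message": "..." } has no statusCode or data field, so it passes through unchanged.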

`GET /v1/health` public

Full system status: MySQL + MongoDB connectivity, uptime in seconds, version, and timestamp. Suitable for a status dashboard or in-depth check.

Response — `200 OK` (all services healthy)

{
  "status": "ok",
  "message": "All systems operational",
  "timestamp": "2026-05-25T10:30:00.000Z",
  "version": "1.0.0",
  "environment": {
    "nodeEnv": "production",
    "appEnv": "prod"
  },
  "services": {
    "database": { "status": "healthy", "responseTime": 15, "message": "SQL database reachable" },
    "mongodb": { "status": "healthy", "responseTime": 8, "message": "MongoDB reachable" },
    "smtp": { "status": "healthy", "message": "SMTP server reachable" },
    "objectStorage": { "status": "healthy", "message": "Object storage reachable" }
  },
  "uptime": 3600,
  "responseTime": 24,
  "checkTimeoutMs": 1000
}

Response — `503 Service Unavailable` (maintenance mode active)

{
  "status": "maintenance",
  "message": "Service is currently under maintenance. Please try again later.",
  "timestamp": "2026-05-25T10:30:00.000Z",
  "version": "1.0.0",
  "environment": {
    "nodeEnv": "production",
    "appEnv": "prod"
  },
  "services": {
    "database": { "status": "healthy", "message": "Skipped during maintenance mode" },
    "mongodb": { "status": "healthy", "message": "Skipped during maintenance mode" },
    "smtp": { "status": "disabled", "message": "Skipped during maintenance mode" },
    "objectStorage": { "status": "disabled", "message": "Skipped during maintenance mode" }
  },
  "uptime": 3600,
  "responseTime": 0,
  "checkTimeoutMs": 1000
}

Response — `503 Service Unavailable` (degraded — a critical service is unhealthy)

{
  "status": "degraded",
  "message": "Some critical services are experiencing issues",
  "timestamp": "2026-05-25T10:30:00.000Z",
  "version": "1.0.0",
  "environment": {
    "nodeEnv": "production",
    "appEnv": "prod"
  },
  "services": {
    "database": { "status": "unhealthy", "message": "Timed out after 1000ms" },
    "mongodb": { "status": "healthy", "responseTime": 8, "message": "MongoDB reachable" },
    "smtp": { "status": "healthy", "message": "SMTP server reachable" },
    "objectStorage": { "status": "healthy", "message": "Object storage reachable" }
  },
  "uptime": 3600,
  "responseTime": 1001,
  "checkTimeoutMs": 1000
}

The response shape is not wrapped in the global envelope — parse it directly as the object above.

status is driven only by the two critical dependencies, database and mongodb. Both healthy → ok (HTTP 200); either unhealthy → degraded (HTTP 503).
smtp and objectStorage are optional. If they are unhealthy the system stays ok (HTTP 200) but message becomes "Core systems operational; optional dependencies are experiencing issues". An optional service reads disabled when it isn't wired/configured in the current runtime, or while maintenance mode is active. In the maintenance example above, the critical services are still reported as healthy, with the message "Skipped during maintenance mode".
environment echoes the runtime split: nodeEnv is the Node mode (always production on any deployed environment), appEnv is the deployment target (local / dev / integration / staging / demo / sandbox / prod).
uptime is seconds since process boot · responseTime is the total check time in ms · checkTimeoutMs is the per-dependency timeout (env HEALTH_CHECK_TIMEOUT_MS, default 1000).
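
The rules above can be condensed into one decision function. This is a sketch of the documented behavior, not the actual HealthService code: summarize, ServiceStates, and Summary are hypothetical names, and treating a disabled critical service as non-degrading is an assumption the document does not confirm.

```typescript
type ServiceStatus = "healthy" | "unhealthy" | "disabled";
type OverallStatus = "ok" | "maintenance" | "degraded";

interface ServiceStates {
  database: ServiceStatus;
  mongodb: ServiceStatus;
  smtp: ServiceStatus;
  objectStorage: ServiceStatus;
}

interface Summary {
  status: OverallStatus;
  httpCode: number;
  message: string;
}

function summarize(services: ServiceStates, maintenanceMode: boolean): Summary {
  // Maintenance takes precedence over any dependency check.
  if (maintenanceMode) {
    return {
      status: "maintenance",
      httpCode: 503,
      message: "Service is currently under maintenance. Please try again later.",
    };
  }
  // Only the two critical dependencies can degrade the overall status.
  // Assumption: only "unhealthy" degrades; "disabled" is not treated as a failure.
  if (services.database === "unhealthy" || services.mongodb === "unhealthy") {
    return {
      status: "degraded",
      httpCode: 503,
      message: "Some critical services are experiencing issues",
    };
  }
  // Optional dependencies change the message only, never the status or HTTP code.
  if (services.smtp === "unhealthy" || services.objectStorage === "unhealthy") {
    return {
      status: "ok",
      httpCode: 200,
      message: "Core systems operational; optional dependencies are experiencing issues",
    };
  }
  return { status: "ok", httpCode: 200, message: "All systems operational" };
}
```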

`GET /v1/status` public

Simple status containing only status + message. Useful for monitors that only need to know whether the API is up, without per-service detail.

Response — `200 OK`

{ "status": "ok", "message": "API is healthy and operational" }

Response — `503 Service Unavailable` (maintenance)

{ "status": "maintenance", "message": "API is currently under maintenance. Please try again later." }

Response — `503 Service Unavailable` (degraded)

{ "status": "degraded", "message": "API is experiencing issues" }

`GET /v1/ping` public

Basic ping-pong. Returns 503 during maintenance; otherwise it always returns 200 with "message": "pong" and a timestamp, regardless of dependency health.

Response — `200 OK`

{ "message": "pong", "timestamp": "2026-05-20T10:30:00.000Z" }

Response — `503 Service Unavailable` (maintenance)

{ "message": "API is under maintenance", "timestamp": "2026-05-20T10:30:00.000Z" }

`GET /v1/ready` public

Kubernetes-style readiness probe. 200 if the system is ready to accept traffic, 503 if not (maintenance or degraded).

Response — `200 OK`

{ "status": "ready", "message": "API is ready to accept traffic" }

Response — `503 Service Unavailable`

{
  "status": "not-ready",
  "message": "API is not ready to accept traffic",
  "reason": "API is currently under maintenance. Please try again later."
}

The reason field only appears on the 503 response and contains the message from getSimpleStatus().

`GET /v1/live` public

Kubernetes-style liveness probe. Always returns 200 as long as the process is still running — even in maintenance mode (the service is still “alive”).

Response — `200 OK`

{ "status": "alive", "message": "API is alive" }
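
The five probes react differently to the same system state, which matters when wiring them into monitors. The lookup below is a sketch of the behavior described in the probe sections above; expectedHttpCode, Probe, and SystemState are hypothetical names, not identifiers from the backend.

```typescript
type Probe = "health" | "status" | "ping" | "ready" | "live";
type SystemState = "ok" | "maintenance" | "degraded";

function expectedHttpCode(probe: Probe, state: SystemState): number {
  // Liveness only reports that the process is running.
  if (probe === "live") return 200;
  // Ping ignores dependency health and fails only during maintenance.
  if (probe === "ping") return state === "maintenance" ? 503 : 200;
  // health, status, and ready follow the overall system state.
  return state === "ok" ? 200 : 503;
}
```

For example, a liveness monitor should never alert during maintenance, while a readiness monitor is expected to see 503 until the flag is cleared.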

Admin endpoint

`POST /v1/admin/maintenance/toggle` bearer

Toggles system-wide maintenance mode. While enabled is true, the public probe endpoints /health, /status, /ping, and /ready return the maintenance status with HTTP 503 until it is toggled off. /live is unaffected.

Auth: bearer · Permission: can-manage-system

Request body — `MaintenanceModeDto`

Field	Type	Required	Notes
`enabled`	boolean	✓	`true` to enable, `false` to disable
`message`	string	optional	Custom message returned in the `data.message` field. Default follows `enabled` (`"System is under maintenance"` / `"System is operational"`)

Example request

{ "enabled": true, "message": "Backend upgrade for 30 minutes" }

Response — `200 OK`

{
  "status": "success",
  "statusCode": 200,
  "message": "Maintenance mode enabled successfully",
  "data": {
    "maintenanceMode": true,
    "timestamp": "2026-05-20T10:30:00.000Z",
    "message": "Backend upgrade for 30 minutes"
  }
}

Errors

Status	When it occurs
`400 Bad Request`	`enabled` is missing or not a boolean, or the body contains an unknown field
`401 Unauthorized`	Bearer/cookie token is missing or invalid
`403 Forbidden`	Caller lacks `can-manage-system` permission
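
The 400 cases follow from the whitelist rules of the global ValidationPipe. The backend presumably enforces them through decorators on MaintenanceModeDto; the hand-rolled check below only mirrors the documented rules so a client can pre-validate a payload. validateMaintenanceBody and the error strings are illustrative assumptions, not the server's actual output.

```typescript
interface MaintenanceModeInput {
  enabled: boolean;
  message?: string;
}

interface ValidationResult {
  ok: boolean;
  errors: string[];
  value?: MaintenanceModeInput;
}

const ALLOWED_FIELDS = ["enabled", "message"];

function validateMaintenanceBody(body: unknown): ValidationResult {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return { ok: false, errors: ["body must be a JSON object"] };
  }
  const input = body as Record<string, unknown>;
  const errors: string[] = [];
  // Mirrors forbidNonWhitelisted: any unknown field is rejected.
  for (const key of Object.keys(input)) {
    if (ALLOWED_FIELDS.indexOf(key) === -1) errors.push(`property ${key} should not exist`);
  }
  // `enabled` is required and must be a real boolean, not "true" or 1.
  if (typeof input["enabled"] !== "boolean") errors.push("enabled must be a boolean value");
  // `message` is optional, but must be a string when present.
  if (input["message"] !== undefined && typeof input["message"] !== "string") {
    errors.push("message must be a string");
  }
  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    errors: [],
    value: {
      enabled: input["enabled"] as boolean,
      message: input["message"] as string | undefined,
    },
  };
}
```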

Side effects

Modifies the in-memory maintenanceMode flag in HealthService. Not persistent — restarting the process resets it to the value of env MAINTENANCE_MODE.
Does not affect DB data; purely gates health probe responses.
On multi-replica deployments, the toggle must be sent to each instance. No external propagation mechanism exists today.
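
The per-instance behavior is easy to see in a toy model. The class below is a simplified stand-in for HealthService, written for illustration only: HealthServiceSketch and its methods are hypothetical, and parsing MAINTENANCE_MODE by comparing against the string "true" is an assumption.

```typescript
class HealthServiceSketch {
  private maintenanceMode: boolean;

  // The flag starts from the MAINTENANCE_MODE value seen at boot.
  // Assumption: only the exact string "true" enables it.
  constructor(env: Record<string, string | undefined>) {
    this.maintenanceMode = env["MAINTENANCE_MODE"] === "true";
  }

  setMaintenanceMode(enabled: boolean): void {
    this.maintenanceMode = enabled;
  }

  isInMaintenance(): boolean {
    return this.maintenanceMode;
  }
}

// Two replicas booted from the same environment hold independent flags.
const bootEnv: Record<string, string | undefined> = { MAINTENANCE_MODE: "false" };
const replicaA = new HealthServiceSketch(bootEnv);
const replicaB = new HealthServiceSketch(bootEnv);

// A toggle request that lands on replica A does not reach replica B.
replicaA.setMaintenanceMode(true);

// A restart is modeled as a fresh instance, which falls back to the env value.
const restartedA = new HealthServiceSketch(bootEnv);
```

Behind a load balancer this means probe results can differ from request to request until every replica has received the toggle.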

Reference

Enum: `HealthStatus.status`

ok — both critical services healthy → HTTP 200
maintenance — maintenance flag active → HTTP 503
degraded — at least one critical service (DB or Mongo) unhealthy → HTTP 503

Enum: `services.*.status`

Reported for each of the four dependencies — database, mongodb (critical), smtp, objectStorage (optional):

healthy — check succeeded
unhealthy — check failed (timeout / exception)
disabled — dependency not wired/configured in this runtime (e.g. SMTP or object storage absent), or skipped while maintenance mode is active

Env vars

API_VERSION — appears in the version field (default 1.0.0)
MAINTENANCE_MODE=true — initial flag at boot
HEALTH_CHECK_TIMEOUT_MS — per-dependency check timeout in ms, surfaced as checkTimeoutMs (default 1000)
NODE_ENV / APP_ENV — populate environment.nodeEnv / environment.appEnv (runtime mode vs deployment target)
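
A configuration loader that applies the documented defaults might look like the sketch below. loadHealthConfig and HealthConfig are hypothetical names; the fallback for a non-numeric or non-positive timeout and the exact-string check for MAINTENANCE_MODE are assumptions, since the document only states the defaults.

```typescript
interface HealthConfig {
  version: string;
  maintenanceMode: boolean;
  checkTimeoutMs: number;
  nodeEnv: string | undefined;
  appEnv: string | undefined;
}

// Takes the environment as a plain map so it can be exercised without a real process.
function loadHealthConfig(env: Record<string, string | undefined>): HealthConfig {
  const parsedTimeout = Number(env["HEALTH_CHECK_TIMEOUT_MS"]);
  return {
    // Documented default for the version field.
    version: env["API_VERSION"] ?? "1.0.0",
    // Assumption: only the exact string "true" enables maintenance at boot.
    maintenanceMode: env["MAINTENANCE_MODE"] === "true",
    // Fall back to the documented default when unset, non-numeric, or not positive.
    checkTimeoutMs: isFinite(parsedTimeout) && parsedTimeout > 0 ? parsedTimeout : 1000,
    nodeEnv: env["NODE_ENV"],
    appEnv: env["APP_ENV"],
  };
}
```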

Standard error envelope (admin endpoint)

{
  "message": "Forbidden resource",
  "statusCode": 403,
  "error": "Forbidden"
}

The user-facing probe endpoints do not use this envelope — the per-endpoint shape is described in each card.

Common HTTP codes

200 system ok
503 system degraded (DB/Mongo unhealthy) or maintenance mode active
401 admin endpoint without token
403 admin endpoint without permission
