Module resty.healthcheck
Healthcheck library for OpenResty.
Some notes on the usage of this library:
Each target will have 4 counters, 1 success counter and 3 failure counters (‘http’, ‘tcp’, and ‘timeout’). Any failure will only reset the success counter, but a success will reset all three failure counters.
All targets are uniquely identified by their IP address and port number combination; most functions take those as arguments.
All keys in the SHM will be namespaced by the healthchecker name as provided to the new function. Hence no collisions will occur on shm-keys as long as the name is unique.
Active healthchecks will be synchronized across workers, such that only a single active healthcheck runs.
Events will be raised in every worker, see lua-resty-worker-events for details.
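The counter rules in the notes above can be illustrated with a small model. This is a sketch in plain Lua, not the library's implementation; new_counters and record are hypothetical names introduced only for this example.

```lua
-- Hypothetical model of the four per-target counters (not library code).
local function new_counters()
  return { success = 0, http = 0, tcp = 0, timeout = 0 }
end

-- Record one health event; kind is "success", "http", "tcp" or "timeout".
local function record(counters, kind)
  if kind == "success" then
    -- A success resets all three failure counters.
    counters.success = counters.success + 1
    counters.http, counters.tcp, counters.timeout = 0, 0, 0
  elseif kind == "http" or kind == "tcp" or kind == "timeout" then
    -- A failure resets only the success counter; other failure counters are kept.
    counters[kind] = counters[kind] + 1
    counters.success = 0
  else
    error("unknown event kind: " .. tostring(kind))
  end
  return counters
end

local c = new_counters()
record(c, "success")
record(c, "tcp")      -- success counter is reset, tcp is incremented
record(c, "timeout")  -- tcp is kept, timeout is incremented
record(c, "success")  -- all three failure counters are reset
```

Because failures of different kinds do not reset each other, a target can accumulate TCP failures and timeouts side by side until a single success clears them all.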
Info:
- Copyright: 2017-2023 Kong Inc.
- License: Apache 2.0
- Author: Hisham Muhammad, Thijs Schreijer
Functions
run_locked (self, key, fn, ...) | Acquire a lock and run a function. The function call itself is wrapped with pcall to protect against exceptions. |
Tables
checker.events | The list of potential events generated. |
Node management
checker:add_target (ip, port, hostname, is_healthy, hostheader) | Add a target to the healthchecker. |
checker:clear () | Clear all healthcheck data. |
checker:delayed_clear (delay) | Clear all healthcheck data after a period of time. |
checker:get_target_status (ip, port, hostname) | Get the current status of the target. |
checker:remove_target (ip, port, hostname) | Remove a target from the healthchecker. |
Health management
checker:report_failure (ip, port, hostname, check) | Report a health failure. |
checker:report_http_status (ip, port, hostname, http_status, check) | Report an HTTP response code. |
checker:report_success (ip, port, hostname, check) | Report a health success. |
checker:report_tcp_failure (ip, port, hostname, operation, check) | Report a failure on TCP level. |
checker:report_timeout (ip, port, hostname, check) | Report a timeout failure. |
checker:set_all_target_statuses_for_hostname (hostname, port, is_healthy) | Sets the current status of all targets with the given hostname and port. |
checker:set_target_status (ip, port, hostname, is_healthy) | Sets the current status of the target. |
Initializing
checker:start () | Start the background health checks. |
checker:stop () | Stop the background health checks. |
new (opts) | Creates a new health-checker instance. |
Functions
- run_locked (self, key, fn, ...)
-
Acquire a lock and run a function.
The function call itself is wrapped with pcall to protect against exceptions.
This function exhibits some special behavior when called during a non-yieldable phase such as init_worker or log:
- The lock timeout is set to 0 to ensure that resty.lock does not attempt to sleep/yield.
- If acquiring the lock fails due to a timeout, run_locked (this function) is re-scheduled to run in a timer. In this case, the function returns "scheduled".
Parameters:
- self The checker object
- key the key/identifier to acquire a lock for
- fn The function to execute
- ... arguments that will be passed to fn
Returns:
-
The results of the function; or nil and an error message
in case it fails locking.
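The lock-then-pcall pattern described above might look roughly like the following. This is a minimal sketch in plain Lua under stated assumptions: the held table is a hypothetical in-memory stand-in for resty.lock, run_locked_sketch is not the library function, and returning nil plus the error when fn raises is an assumption made for this example.

```lua
-- Hypothetical in-memory lock table standing in for resty.lock.
local held = {}

local unpack = table.unpack or unpack  -- Lua 5.1/LuaJIT and 5.2+ compatible

-- Collect varargs together with their count, so nil results are preserved.
local function pack(...)
  return { n = select("#", ...), ... }
end

local function run_locked_sketch(key, fn, ...)
  if held[key] then
    return nil, "failed acquiring lock for '" .. key .. "': already held"
  end
  held[key] = true

  -- pcall ensures the lock is released even when fn raises an error.
  local results = pack(pcall(fn, ...))
  held[key] = nil

  if not results[1] then
    return nil, results[2]  -- assumption: errors surface as nil + message
  end
  return unpack(results, 2, results.n)
end
```

The sketch leaves out the timer-based re-scheduling used in non-yieldable phases, since that depends on the OpenResty runtime.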
Tables
- checker.events
-
The list of potential events generated.
The checker.EVENT_SOURCE field can be used to subscribe to the events; see the example below. Each of the events will get a table passed containing the target details ip, port, and hostname. See lua-resty-worker-events.
Fields:
- remove Event raised when a target is removed from the checker.
- healthy This event is raised when the target status changed to healthy (and when a target is added as healthy).
- unhealthy This event is raised when the target status changed to unhealthy (and when a target is added as unhealthy).
). - mostly_healthy This event is raised when the target status is still healthy but it started to receive “unhealthy” updates via active or passive checks.
- mostly_unhealthy This event is raised when the target status is still unhealthy but it started to receive “healthy” updates via active or passive checks.
Usage:
    -- Register for all events from my_checker
    local worker_events = require("resty.worker.events")

    local event_callback = function(target, event, source, source_PID)
      -- Describe the target the event refers to
      local t = target.ip .. ":" .. target.port .. " by name '" ..
                target.hostname .. "' "
      if event == my_checker.events.remove then
        print(t .. "has been removed")
      elseif event == my_checker.events.healthy then
        print(t .. "is now healthy")
      elseif event == my_checker.events.unhealthy then
        print(t .. "is now unhealthy")
      end
    end

    worker_events.register(event_callback, my_checker.EVENT_SOURCE)
Node management
- checker:add_target (ip, port, hostname, is_healthy, hostheader)
-
Add a target to the healthchecker.
When the ip + port + hostname combination already exists, it will simply return success (without updating the is_healthy status).
Parameters:
- ip IP address of the target to check.
- port the port to check against.
- hostname (optional) hostname to set as the host header in the HTTP probe request
- is_healthy (optional) a boolean value indicating the initial state, default is true.
- hostheader (optional) a value to use for the Host header on active healthchecks.
Returns:
true on success, or nil + error on failure.
- checker:clear ()
-
Clear all healthcheck data.
Returns:
true on success, or nil + error on failure.
- checker:delayed_clear (delay)
-
Clear all healthcheck data after a period of time.
Useful for keeping target status between configuration reloads.
Parameters:
- delay delay in seconds before purging target state.
Returns:
true on success, or nil + error on failure.
- checker:get_target_status (ip, port, hostname)
-
Get the current status of the target.
Parameters:
- ip IP address of the target being checked.
- port the port being checked against.
- hostname the hostname of the target being checked.
Returns:
true if healthy, false if unhealthy, or nil + error on failure.
- checker:remove_target (ip, port, hostname)
-
Remove a target from the healthchecker.
The target not existing is not considered an error.
Parameters:
- ip IP address of the target being checked.
- port the port being checked against.
- hostname (optional) hostname of the target being checked.
Returns:
true on success, or nil + error on failure.
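The node management calls above can be combined as in the following sketch. It assumes checker is an object returned by new(); probe_target is a hypothetical helper name, and the address, port and hostname passed to it are placeholders.

```lua
-- Register a target, read back its status, and remove it again.
-- `checker` is assumed to be an object returned by new().
local function probe_target(checker, ip, port, hostname)
  -- Add the target with an initial state of healthy.
  local ok, err = checker:add_target(ip, port, hostname, true)
  if not ok then
    return nil, "add_target failed: " .. tostring(err)
  end

  -- true means healthy, false means unhealthy, nil means an error occurred.
  local healthy, status_err = checker:get_target_status(ip, port, hostname)
  if healthy == nil then
    return nil, "get_target_status failed: " .. tostring(status_err)
  end

  -- Removing a target that does not exist is not an error.
  ok, err = checker:remove_target(ip, port, hostname)
  if not ok then
    return nil, "remove_target failed: " .. tostring(err)
  end

  return healthy
end
```

Callers should compare the first return value against nil rather than test it for truthiness, because false is a valid "unhealthy" result.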
Health management
- checker:report_failure (ip, port, hostname, check)
-
Report a health failure.
Reports a health failure which will count against the number of occurrences required to make a target “fall”. The type of healthchecker, “tcp” or “http” (see new), determines against which counter the occurrence goes.
If unhealthy.tcp_failures (for TCP failures) or unhealthy.http_failures is set to zero in the configuration, this function is a no-op and returns true.
Parameters:
- ip IP address of the target being checked.
- port the port being checked against.
- hostname (optional) hostname of the target being checked.
- check (optional) the type of check, either “passive” or “active”, default “passive”.
Returns:
true on success, or nil + error on failure.
- checker:report_http_status (ip, port, hostname, http_status, check)
-
Report an HTTP response code.
How the code is interpreted is based on the configuration for healthy and unhealthy statuses. If it is in neither list, it will be ignored.
If healthy.successes (for healthy HTTP status codes) or unhealthy.http_failures (for unhealthy HTTP status codes) is set to zero in the configuration, this function is a no-op and returns true.
Parameters:
- ip IP address of the target being checked.
- port the port being checked against.
- hostname (optional) hostname of the target being checked.
- http_status the HTTP status code, or nil to report an invalid HTTP response.
- check (optional) the type of check, either “passive” or “active”, default “passive”.
Returns:
true on success, nil if the status was ignored (not in active or passive health check lists), or nil + error on failure.
- checker:report_success (ip, port, hostname, check)
-
Report a health success.
Reports a health success which will count against the number of occurrences
required to make a target “rise”.
If healthy.successes is set to zero in the configuration, this function is a no-op and returns true.
Parameters:
- ip IP address of the target being checked.
- port the port being checked against.
- hostname (optional) hostname of the target being checked.
- check (optional) the type of check, either “passive” or “active”, default “passive”.
Returns:
true on success, or nil + error on failure.
- checker:report_tcp_failure (ip, port, hostname, operation, check)
-
Report a failure on TCP level.
If unhealthy.tcp_failures is set to zero in the configuration, this function is a no-op and returns true.
Parameters:
- ip IP address of the target being checked.
- port the port being checked against.
- hostname hostname of the target being checked.
- operation The socket operation that failed: “connect”, “send” or “receive”. TODO check what kind of information we get from the OpenResty layer in order to tell these error conditions apart https://github.com/openresty/lua-resty-core/blob/master/lib/ngx/balancer.md#get_last_failure
- check (optional) the type of check, either “passive” or “active”, default “passive”.
Returns:
true on success, or nil + error on failure.
- checker:report_timeout (ip, port, hostname, check)
-
Report a timeout failure.
If unhealthy.timeouts is set to zero in the configuration, this function is a no-op and returns true.
Parameters:
- ip IP address of the target being checked.
- port the port being checked against.
- hostname (optional) hostname of the target being checked.
- check (optional) the type of check, either “passive” or “active”, default “passive”.
Returns:
true on success, or nil + error on failure.
- checker:set_all_target_statuses_for_hostname (hostname, port, is_healthy)
-
Sets the current status of all targets with the given hostname and port.
Parameters:
- hostname hostname being checked.
- port the port being checked against
- is_healthy boolean: true for healthy, false for unhealthy.
Returns:
true on success, or nil + error on failure.
- checker:set_target_status (ip, port, hostname, is_healthy)
-
Sets the current status of the target.
This will immediately set the status and clear its counters.
Parameters:
- ip IP address of the target being checked
- port the port being checked against
- hostname (optional) hostname of the target being checked.
- is_healthy boolean: true for healthy, false for unhealthy.
Returns:
true on success, or nil + error on failure.
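The reporting functions above could be driven from the outcome of a proxied request, as in this sketch of passive reporting. It assumes checker is an object returned by new(); report_outcome, the target table and the outcome table with its timed_out, tcp_error and status fields are hypothetical and exist only for this example.

```lua
-- Map the outcome of one proxied request to a passive health report.
-- `checker` is assumed to be an object returned by new().
local function report_outcome(checker, target, outcome)
  local ip, port, hostname = target.ip, target.port, target.hostname

  if outcome.timed_out then
    return checker:report_timeout(ip, port, hostname, "passive")
  elseif outcome.tcp_error then
    -- outcome.tcp_error is expected to be "connect", "send" or "receive".
    return checker:report_tcp_failure(ip, port, hostname,
                                      outcome.tcp_error, "passive")
  end

  -- A nil status reports an invalid HTTP response.
  return checker:report_http_status(ip, port, hostname,
                                    outcome.status, "passive")
end
```

A nil first return value from report_http_status without an error message means the status code was in neither configured list and was ignored, so callers that log failures should check for the error message as well.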
Initializing
- checker:start ()
-
Start the background health checks.
Returns:
true, or nil + error.
- checker:stop ()
-
Stop the background health checks.
The timers will be flagged to exit, but will not exit immediately. They will be marked as stopped only after the current timers have expired.
Returns:
true
- new (opts)
-
Creates a new health-checker instance.
It will be started upon creation.
NOTE: the returned checker object must be anchored; if not, it will be removed by Lua’s garbage collector and the healthchecks will cease to run.
Parameters:
- opts table with checker options. Options are:
  - name: name of the health checker
  - shm_name: the name of the lua_shared_dict specified in the Nginx configuration to use
  - ssl_cert: certificate for mTLS connections (string or parsed object)
  - ssl_key: key for mTLS connections (string or parsed object)
  - checks.active.type: “http”, “https” or “tcp” (default is “http”)
  - checks.active.timeout: socket timeout for active checks (in seconds)
  - checks.active.concurrency: number of targets to check concurrently
  - checks.active.http_path: path to use in the GET HTTP request run on active checks
  - checks.active.https_sni: SNI server name in case of HTTPS
  - checks.active.https_verify_certificate: boolean indicating whether to verify the HTTPS certificate
  - checks.active.headers: one or more lists of values indexed by header name
  - checks.active.healthy.interval: interval between checks for healthy targets (in seconds)
  - checks.active.healthy.http_statuses: which HTTP statuses to consider a success
  - checks.active.healthy.successes: number of successes to consider a target healthy
  - checks.active.unhealthy.interval: interval between checks for unhealthy targets (in seconds)
  - checks.active.unhealthy.http_statuses: which HTTP statuses to consider a failure
  - checks.active.unhealthy.tcp_failures: number of TCP failures to consider a target unhealthy
  - checks.active.unhealthy.timeouts: number of timeouts to consider a target unhealthy
  - checks.active.unhealthy.http_failures: number of HTTP failures to consider a target unhealthy
  - checks.passive.type: “http”, “https” or “tcp” (default is “http”; for passive checks, “http” and “https” are equivalent)
  - checks.passive.healthy.http_statuses: which HTTP statuses to consider a success
  - checks.passive.healthy.successes: number of successes to consider a target healthy
  - checks.passive.unhealthy.http_statuses: which HTTP statuses to consider a failure
  - checks.passive.unhealthy.tcp_failures: number of TCP failures to consider a target unhealthy
  - checks.passive.unhealthy.timeouts: number of timeouts to consider a target unhealthy
  - checks.passive.unhealthy.http_failures: number of HTTP failures to consider a target unhealthy
If any of the health counters above (e.g. checks.passive.unhealthy.timeouts) is set to zero, the corresponding category of checks is not taken into account. This way active or passive health checks can be disabled selectively.
Returns:
-
checker object, or nil + error
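An options table for new() might be assembled as follows. This is a sketch under stated assumptions: the names "my_checker" and "my_healthcheck_shm" and all numeric values are placeholders, the shared dict must be declared with lua_shared_dict in the Nginx configuration, and worker events are assumed to have been set up beforehand as described by lua-resty-worker-events. The passive counters are set to zero to show how a category of checks can be disabled.

```lua
-- Option table for new(); names and numbers are placeholders.
local opts = {
  name = "my_checker",
  shm_name = "my_healthcheck_shm",  -- must match a lua_shared_dict entry
  checks = {
    active = {
      type = "http",
      http_path = "/status",
      timeout = 1,       -- seconds
      concurrency = 10,
      healthy = {
        interval = 2,    -- seconds between checks of healthy targets
        successes = 2,
        http_statuses = { 200, 302 },
      },
      unhealthy = {
        interval = 1,    -- seconds between checks of unhealthy targets
        http_failures = 3,
        tcp_failures = 2,
        timeouts = 3,
        http_statuses = { 429, 500, 503 },
      },
    },
    passive = {
      type = "http",
      -- Zero disables the corresponding counter, so passive checks are off.
      healthy = { successes = 0 },
      unhealthy = { http_failures = 0, tcp_failures = 0, timeouts = 0 },
    },
  },
}

-- Only create the checker when the library can be loaded, that is,
-- when this code runs inside OpenResty.
local ok, healthcheck = pcall(require, "resty.healthcheck")
local checker, err
if ok then
  checker, err = healthcheck.new(opts)
end
```

In a real deployment the checker would be stored somewhere long-lived, such as a field of a loaded module, so that it stays anchored and is not garbage collected.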