Monitoring#

This section covers details on monitoring the state of your JupyterHub installation.

JupyterHub exposes a /metrics endpoint that returns text describing its current operational state, formatted in a way Prometheus understands.

Prometheus is a separate open source tool that can be configured to repeatedly poll JupyterHub’s /metrics endpoint to parse and save its current state.

By doing so, Prometheus can describe JupyterHub’s evolving state over time. This history can then be queried through Prometheus, which exposes its underlying storage to those allowed to access it, and presented in dashboards by a tool like Grafana.
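To get a feel for what Prometheus reads on each poll, the following minimal sketch fetches the endpoint once and parses the text into a dictionary. It is not how Prometheus itself works internally. The URL, the token, and the `Authorization: token ...` header are assumptions about your deployment, and the parser is deliberately simplified: it assumes no trailing timestamps on sample lines.

```python
import urllib.request
from typing import Optional


def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text-format samples into {name_with_labels: value}.

    Simplification: assumes each sample line ends with its value
    (no optional trailing timestamp).
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def fetch_metrics(url: str, token: Optional[str] = None) -> dict[str, float]:
    """Fetch and parse a metrics endpoint once.

    `url` and `token` are hypothetical; adapt them to your deployment.
    """
    headers = {"Authorization": f"token {token}"} if token else {}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return parse_metrics(response.read().decode("utf-8"))


# Hypothetical sample of what the endpoint might return.
SAMPLE = """\
# TYPE jupyterhub_running_servers gauge
jupyterhub_running_servers 3.0
jupyterhub_active_users{period="24h"} 12.0
"""

if __name__ == "__main__":
    for name, value in parse_metrics(SAMPLE).items():
        print(name, value)
```

In practice you would let Prometheus do the polling and storage; a one-off fetch like this is mainly useful for checking that the endpoint is reachable and returns what you expect.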

List of Prometheus Metrics

Customizing the metrics prefix#

JupyterHub metrics all have a jupyterhub_ prefix. As of JupyterHub 5.0, this can be overridden with the $JUPYTERHUB_METRICS_PREFIX environment variable in the Hub’s environment.

For example,

export JUPYTERHUB_METRICS_PREFIX=jupyterhub_prod

would result in the metric jupyterhub_prod_active_users, etc.
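The effect of the prefix on metric names can be illustrated with a short sketch. The helper `metric_name` is hypothetical and only mirrors the naming rule described above; it is not JupyterHub's own code.

```python
import os
from typing import Mapping, Optional


def metric_name(base: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the full metric name for `base` under the configured prefix.

    Hypothetical helper: falls back to the default "jupyterhub" prefix
    when JUPYTERHUB_METRICS_PREFIX is not set.
    """
    env = os.environ if environ is None else environ
    prefix = env.get("JUPYTERHUB_METRICS_PREFIX", "jupyterhub")
    return f"{prefix}_{base}"


if __name__ == "__main__":
    print(metric_name("active_users", {"JUPYTERHUB_METRICS_PREFIX": "jupyterhub_prod"}))
    print(metric_name("active_users", {}))
```

Keep in mind that dashboards and alert rules written against the default names need to be updated if you change the prefix.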

Customizing bucket sizes#

As of JupyterHub 5.3, the following environment variables can be set in the Hub’s environment to customize histogram bucket sizes. The defaults are:

Variable	Default
`JUPYTERHUB_SERVER_SPAWN_DURATION_SECONDS_BUCKETS`	`0.5,1,2.5,5,10,15,30,60,120,180,300,600,inf`
`JUPYTERHUB_SERVER_STOP_DURATION_SECONDS_BUCKETS`	`0.005,0.01,0.025,0.05,0.075,0.1,0.25,0.5,0.75,1,2.5,5,7.5,10,inf`

For example,

export JUPYTERHUB_SERVER_SPAWN_DURATION_SECONDS_BUCKETS="1,2,4,6,12,30,60,120,inf"
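The value is a comma-separated list of upper bounds in seconds, ending in `inf`. As a rough illustration of how such a string maps to numeric bucket boundaries, here is a small sketch. The `parse_buckets` helper is hypothetical and not JupyterHub's parser; the sortedness check reflects the general Prometheus requirement that histogram buckets be in increasing order.

```python
def parse_buckets(value: str) -> list[float]:
    """Turn a string like "1,2,4,inf" into a list of float bucket bounds.

    Hypothetical helper for illustration only.
    """
    buckets = [float(part) for part in value.split(",") if part.strip()]
    # Histogram bucket bounds must be strictly increasing.
    if any(a >= b for a, b in zip(buckets, buckets[1:])):
        raise ValueError(f"bucket bounds must be increasing: {value!r}")
    return buckets


if __name__ == "__main__":
    print(parse_buckets("1,2,4,6,12,30,60,120,inf"))
```

Checking a candidate value this way before exporting it can catch typos such as an out-of-order bound.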

Configuring metrics#

class jupyterhub.metrics.PeriodicMetricsCollector(*args: t.Any, **kwargs: t.Any)#

Collect metrics to be calculated periodically

active_users_enabled c.PeriodicMetricsCollector.active_users_enabled = Bool(True)#

Enable active_users prometheus metric.

Populates an active_users Prometheus metric with a period label that indicates the time period over which that many users were active. Periods are 24h (24 hours), 7d (7 days) and 30d (30 days).

active_users_update_interval c.PeriodicMetricsCollector.active_users_update_interval = Int(3600)#

Number of seconds between updating active_users metrics.

To avoid extra load on the database, this is only calculated periodically rather than at per-minute intervals. Defaults to once an hour.
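As a configuration fragment, the two active_users options above might be set in `jupyterhub_config.py` like this. The 30-minute interval is only an illustrative value, not a recommendation.

```python
# jupyterhub_config.py (fragment)
# get_config() is provided by JupyterHub when it loads this file.
c = get_config()  # noqa: F821

# Keep the active_users metric enabled (this is the default).
c.PeriodicMetricsCollector.active_users_enabled = True

# Recalculate every 30 minutes instead of the default 3600 seconds.
# Illustrative value: shorter intervals mean more database queries.
c.PeriodicMetricsCollector.active_users_update_interval = 1800
```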

event_loop_interval_enabled c.PeriodicMetricsCollector.event_loop_interval_enabled = Bool(True)#

Enable event_loop_interval_seconds metric.

Measures event-loop responsiveness.
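One common way to measure event-loop responsiveness is to sleep for a fixed interval repeatedly and record how long each wakeup actually took. The sketch below shows that general technique with plain asyncio. It is an assumption about the approach, not JupyterHub's actual implementation, and all function names are hypothetical.

```python
import asyncio
import time


async def measure_intervals(resolution: float, stop: asyncio.Event) -> list[float]:
    """Record the real time between wakeups while sleeping `resolution` seconds."""
    intervals: list[float] = []
    last = time.perf_counter()
    while not stop.is_set():
        await asyncio.sleep(resolution)
        now = time.perf_counter()
        # On an idle loop this is close to `resolution`;
        # it grows when something blocks the loop.
        intervals.append(now - last)
        last = now
    return intervals


async def demo() -> list[float]:
    stop = asyncio.Event()
    task = asyncio.create_task(measure_intervals(0.05, stop))
    await asyncio.sleep(0.06)  # let the monitor take a first sample
    time.sleep(0.2)            # deliberately block the event loop
    await asyncio.sleep(0.06)  # let the monitor observe the delay
    stop.set()
    return await task


if __name__ == "__main__":
    for interval in asyncio.run(demo()):
        print(f"{interval:.3f}s")
```

The interval that spans the blocking `time.sleep` call comes out at roughly 0.2 seconds or more, which is the kind of delay this metric is meant to surface.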

event_loop_interval_log_threshold c.PeriodicMetricsCollector.event_loop_interval_log_threshold = Float(1)#

Log when the event loop blocks for at least this many seconds.

event_loop_interval_resolution c.PeriodicMetricsCollector.event_loop_interval_resolution = Float(0.05)#

Interval (in seconds) on which to measure the event loop interval.

This is the _sensitivity_ of the event_loop_interval metric. Setting it too low (e.g. below 20ms) can slow down the whole event loop by measuring too often, while setting it too high (e.g. above a few seconds) may limit its resolution and usefulness. The Prometheus Histogram populated by this metric doesn’t resolve differences below 25ms, so setting this below ~20ms won’t increase the resolution of the histogram itself, only that of the average value, computed by:

event_loop_interval_seconds_sum / event_loop_interval_seconds_count
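The same ratio can be computed from a single scrape of the endpoint. In this sketch the `samples` mapping, the sample values, and the helper name are hypothetical; the metric names assume the default jupyterhub prefix.

```python
import math
from typing import Mapping


def mean_event_loop_interval(samples: Mapping[str, float], prefix: str = "jupyterhub") -> float:
    """Average event loop interval in seconds: histogram _sum divided by _count."""
    total = samples[f"{prefix}_event_loop_interval_seconds_sum"]
    count = samples[f"{prefix}_event_loop_interval_seconds_count"]
    # No observations yet: the average is undefined.
    if count == 0:
        return math.nan
    return total / count


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    scraped = {
        "jupyterhub_event_loop_interval_seconds_sum": 5.25,
        "jupyterhub_event_loop_interval_seconds_count": 100.0,
    }
    print(mean_event_loop_interval(scraped))
```

Note that this gives the average since the Hub process started. To see the average over a recent window, compare the sum and count between two scrapes instead.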