Getting started with JupyterHub

This document describes some of the basics of configuring JupyterHub to do what you want. JupyterHub is highly customizable, so there’s a lot to cover.

Installation

See the readme for help installing JupyterHub.

Overview

JupyterHub is a set of processes that together provide a multiuser Jupyter Notebook server. There are three main categories of processes run by the jupyterhub command line program:

  • Single User Server: a dedicated, single-user Jupyter Notebook server is started for each user on the system when they log in. The object that starts these processes is called a Spawner.
  • Proxy: the public-facing part of the server that uses a dynamic proxy to route HTTP requests to the Hub and Single User Servers.
  • Hub: manages user accounts and authentication, and coordinates Single User Servers using a Spawner.

JupyterHub’s default behavior

IMPORTANT: In its default configuration, JupyterHub requires SSL encryption (HTTPS) to run. You should not run JupyterHub without SSL encryption on a public network. See the Security documentation for how to configure JupyterHub to use SSL and how, in certain cases (e.g. behind SSL termination in nginx), to let the Hub run without SSL by passing --no-ssl (as of version 0.5).

To start JupyterHub in its default configuration, type the following at the command line:

sudo jupyterhub

The default Authenticator that ships with JupyterHub authenticates users with their system name and password (via PAM). Any user on the system with a password will be allowed to start a single-user notebook server.

The default Spawner starts servers locally as each user, one dedicated server per user. These servers listen on localhost, and start in the given user’s home directory.

By default, the Proxy listens on all public interfaces on port 8000. Thus you can reach JupyterHub through either:

http://localhost:8000

or any other public IP or domain pointing to your system.

In their default configuration, the other services, the Hub and Single-User Servers, all communicate with each other on localhost only.

By default, starting JupyterHub will write two files to disk in the current working directory:

  • jupyterhub.sqlite is the sqlite database containing all of the state of the Hub. This file allows the Hub to remember what users are running and where, as well as other information enabling you to restart parts of JupyterHub separately.
  • jupyterhub_cookie_secret is the encryption key used for securing cookies. This file must persist across restarts so that restarting the Hub server does not invalidate cookies. Conversely, deleting this file and restarting the server effectively invalidates all login cookies. The cookie secret file is discussed in the Cookie Secret documentation.

The location of these files can be specified via configuration, discussed below.

How to configure JupyterHub

JupyterHub is configured in two ways:

  1. Configuration file
  2. Command-line arguments

Configuration file

By default, JupyterHub will look for a configuration file (which may not be created yet) named jupyterhub_config.py in the current working directory. You can generate a default configuration file with:

jupyterhub --generate-config

The generated configuration file has descriptions of all configuration variables and their default values. You can load a specific config file with:

jupyterhub -f /path/to/jupyterhub_config.py

See also: general docs on the config system Jupyter uses.

Command-line arguments

Type the following for brief information about the command-line arguments:

jupyterhub -h

or:

jupyterhub --help-all

for the full command line help.

All configurable options are technically configurable on the command-line, even if some are really inconvenient to type. Just replace the desired option, c.Class.trait, with --Class.trait. For example, to configure c.Spawner.notebook_dir = '~/assignments' from the command-line:

jupyterhub --Spawner.notebook_dir='~/assignments'
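
The naming rule above can be sketched in a few lines of Python. This is only an illustration of how a config-file assignment maps to a flag; the helper name to_cli_flag is hypothetical and not part of JupyterHub, and the quoting assumes a POSIX shell.

```python
import shlex


def to_cli_flag(class_name, trait, value):
    """Build the --Class.trait=value flag matching c.Class.trait = value.

    Hypothetical helper for illustration; not part of JupyterHub.
    """
    # Quote the value so the shell passes it through unchanged.
    return "--{}.{}={}".format(class_name, trait, shlex.quote(str(value)))


print("jupyterhub", to_cli_flag("Spawner", "notebook_dir", "~/assignments"))
```

Values made up only of shell-safe characters, such as a port number, are left unquoted by shlex.quote.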

Networking

Configuring the Proxy’s IP address and port

The Proxy’s main IP address setting determines where JupyterHub is available to users. By default, JupyterHub is configured to be available on all network interfaces ('') on port 8000. Note: Use of '*' is discouraged for IP configuration; instead, use of '0.0.0.0' is preferred.

Changing the IP address and port can be done with the following command line arguments:

jupyterhub --ip=192.168.1.2 --port=443

Or by placing the following lines in a configuration file:

c.JupyterHub.ip = '192.168.1.2'
c.JupyterHub.port = 443

Port 443 is used as an example since 443 is the default port for SSL/HTTPS.

Configuring only the main IP and port of JupyterHub should be sufficient for most deployments of JupyterHub. However, more customized scenarios may need additional networking details to be configured.

Configuring the Proxy’s REST API communication IP address and port (optional)

The Hub service talks to the proxy via a REST API on a secondary port, whose network interface and port can be configured separately. By default, this REST API listens on port 8081 of localhost only.

If running the Proxy separate from the Hub, configure the REST API communication IP address and port with:

# ideally a private network address
c.JupyterHub.proxy_api_ip = '10.0.1.4'
c.JupyterHub.proxy_api_port = 5432

Configuring the Hub if Spawners or Proxy are remote or isolated in containers

The Hub service also listens only on localhost (port 8080) by default. The Hub needs to be accessible from both the Proxy and all Spawners. When spawning local servers, an IP address setting of localhost is fine. If either the Proxy or (more likely) the Spawners will be remote or isolated in containers, the Hub must listen on an IP that is accessible.

c.JupyterHub.hub_ip = '10.0.1.4'
c.JupyterHub.hub_port = 54321
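
One quick way to confirm that the Hub's address is reachable from the machine or container where a Spawner or the Proxy runs is to attempt a TCP connection to it. The following is a minimal sketch using only the Python standard library; the function name can_connect is hypothetical, and the address and port in the usage comment are the example values from above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection resolves the host and applies the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example (run from the Spawner's machine or container):
#   can_connect('10.0.1.4', 54321)
```

A successful connection only shows that something is listening at that address; it does not verify that the listener is the Hub.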

Security

IMPORTANT: In its default configuration, JupyterHub requires SSL encryption (HTTPS) to run. You should not run JupyterHub without SSL encryption on a public network.

Security is the most important aspect of configuring JupyterHub. There are three main aspects of the security configuration:

  1. SSL encryption (to enable HTTPS)
  2. Cookie secret (a key for encrypting browser cookies)
  3. Proxy authentication token (used for the Hub and other services to authenticate to the Proxy)

SSL encryption

Since JupyterHub includes authentication and allows arbitrary code execution, you should not run it without SSL (HTTPS). This will require you to obtain an official, trusted SSL certificate or create a self-signed certificate. Once you have obtained and installed a key and certificate you need to specify their locations in the configuration file as follows:

c.JupyterHub.ssl_key = '/path/to/my.key'
c.JupyterHub.ssl_cert = '/path/to/my.cert'

It is also possible to use letsencrypt (https://letsencrypt.org/) to obtain a free, trusted SSL certificate. If you run letsencrypt using the default options, the needed configuration is (replace your.domain.com by your fully qualified domain name):

c.JupyterHub.ssl_key = '/etc/letsencrypt/live/your.domain.com/privkey.pem'
c.JupyterHub.ssl_cert = '/etc/letsencrypt/live/your.domain.com/fullchain.pem'

Some cert files also contain the key, in which case only the cert is needed. It is important that these files be put in a secure location on your server, where they are not readable by regular users.
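
A simple way to check that a key or certificate file is not accessible to other accounts is to inspect its permission bits. The sketch below assumes a POSIX system; the function name is_private is hypothetical, and the path in the usage comment is the placeholder from the example above.

```python
import os
import stat


def is_private(path):
    """Return True if neither group nor other users have any access to path."""
    mode = os.stat(path).st_mode
    # S_IRWXG and S_IRWXO are the read/write/execute bits for group and others.
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


# Example:
#   is_private('/path/to/my.key')
# If this returns False, tighten the permissions, e.g. os.chmod(path, 0o600).
```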

Note: In certain cases, e.g. behind SSL termination in nginx, it may be desirable to run the Hub without SSL. To do so, you must opt in explicitly by passing the --no-ssl option, added in version 0.5.

Proxy authentication token

The Hub authenticates its requests to the Proxy using a secret token that the Hub and Proxy agree upon. The token should be a random string (for example, generated by openssl rand -hex 32). You can pass this value to the Hub and Proxy using the CONFIGPROXY_AUTH_TOKEN environment variable:

export CONFIGPROXY_AUTH_TOKEN=`openssl rand -hex 32`

This environment variable needs to be visible to the Hub and Proxy.

Or you can set the value in the configuration file:

c.JupyterHub.proxy_auth_token = '0bc02bede919e99a26de1e2a7a5aadfaf6228de836ec39a05a6c6942831d8fe5'

If you don’t set the Proxy authentication token, the Hub will generate a random key itself, which means that any time you restart the Hub you must also restart the Proxy. If the proxy is a subprocess of the Hub, this should happen automatically (this is the default configuration).

Another time you must set the Proxy authentication token yourself is if you want other services, such as nbgrader, to also be able to connect to the Proxy.
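
If openssl is not available, a token of the same shape (32 random bytes, hex-encoded) can be generated with Python's standard secrets module (Python 3.6 or later). This is a sketch of one alternative, not the method JupyterHub itself uses; the function name new_proxy_token is hypothetical.

```python
import secrets


def new_proxy_token(nbytes=32):
    """Return nbytes of cryptographically strong randomness as a hex string."""
    return secrets.token_hex(nbytes)


token = new_proxy_token()
# Print a line that can be pasted into the shell before starting JupyterHub.
print("export CONFIGPROXY_AUTH_TOKEN={}".format(token))
```

Each call produces a new value, so generate the token once and make the same value visible to both the Hub and the Proxy.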

Configuring authentication

The default Authenticator uses PAM to authenticate system users with their username and password. The default behavior of this Authenticator is to allow any user with an account and password on the system to log in. You can restrict which users are allowed to log in with Authenticator.whitelist:

c.Authenticator.whitelist = {'mal', 'zoe', 'inara', 'kaylee'}

Admin users of JupyterHub have the ability to take actions on users’ behalf, such as stopping and restarting their servers, and adding users to and removing them from the whitelist. Any users in the admin list are automatically added to the whitelist, if they are not already present. The set of initial Admin users can be configured as follows:

c.Authenticator.admin_users = {'mal', 'zoe'}

If JupyterHub.admin_access is True (not default), then admin users have permission to log in as other users on their respective machines, for debugging. You should make sure your users know if admin_access is enabled.

Adding and removing users

Users can be added to and removed from the Hub via the admin panel or REST API. These users will be added to the whitelist and database. Restarting the Hub will not require manually updating the whitelist in your config file, as the users will be loaded from the database. This means that after starting the Hub once, it is not sufficient to remove users from the whitelist in your config file. You must also remove them from the database, either by discarding the database file, or via the admin UI.
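
The interaction between the config file and the database can be pictured with a toy model. This is not JupyterHub's actual code; it only assumes, as described above, that the users allowed after startup are the union of the whitelist in the config file and the users already stored in the database. The function name effective_whitelist is hypothetical, and the user names are the ones from the whitelist example.

```python
def effective_whitelist(config_whitelist, db_users):
    """Toy model: users allowed after startup = config whitelist + database users."""
    return set(config_whitelist) | set(db_users)


# First start: the config lists four users and the database is empty.
db_users = set()
allowed = effective_whitelist({'mal', 'zoe', 'inara', 'kaylee'}, db_users)
db_users |= allowed  # assumption of the model: the Hub stores these users

# Later, 'kaylee' is removed from the config file only.
allowed = effective_whitelist({'mal', 'zoe', 'inara'}, db_users)
print('kaylee' in allowed)  # she is still in the database

# Removing her from the database as well (e.g. via the admin UI) takes effect.
db_users.discard('kaylee')
allowed = effective_whitelist({'mal', 'zoe', 'inara'}, db_users)
print('kaylee' in allowed)
```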

The default PAMAuthenticator is one case of a special kind of authenticator, called a LocalAuthenticator, indicating that it manages users on the local system. When you add a user to the Hub, a LocalAuthenticator checks if that user already exists. Normally, there will be an error telling you that the user doesn’t exist. If you set the configuration value

c.LocalAuthenticator.create_system_users = True

however, adding a user to the Hub that doesn’t already exist on the system will result in the Hub creating that user via the system adduser command line tool. This option is typically used on hosted deployments of JupyterHub, to avoid the need to manually create all your users before launching the service. It is not recommended when running JupyterHub in situations where JupyterHub users map directly onto UNIX users.

Configuring single-user servers

Since the single-user server is an instance of jupyter notebook, an entire separate multi-process application, there are many aspects of that server that you can configure, and a lot of ways to express that configuration.

At the JupyterHub level, you can set some values on the Spawner. The simplest of these is Spawner.notebook_dir, which lets you set the root directory for a user’s server. This root notebook directory is the highest level directory users will be able to access in the notebook dashboard. In this example, the root notebook directory is set to ~/notebooks, where ~ is expanded to the user’s home directory.

c.Spawner.notebook_dir = '~/notebooks'

You can also specify extra command-line arguments to the notebook server with:

c.Spawner.args = ['--debug', '--profile=PHYS131']

This could be used to set the user’s default page for the single-user server:

c.Spawner.args = ['--NotebookApp.default_url=/notebooks/Welcome.ipynb']

Since the single-user server extends the notebook server application, it still loads configuration from the ipython_notebook_config.py config file. Each user may have one of these files in $HOME/.ipython/profile_default/. IPython also supports loading system-wide config files from /etc/ipython/, which is the place to put configuration that you want to affect all of your users.

External services

JupyterHub has a REST API that can be used to run external services. More detail on this API will be added in the future.

File locations

It is recommended to put all of the files used by JupyterHub into standard UNIX filesystem locations.

  • /srv/jupyterhub for all security and runtime files
  • /etc/jupyterhub for all configuration files
  • /var/log for log files

Example

In the following example, we show a configuration file for a fairly standard JupyterHub deployment with the following assumptions:

  • JupyterHub is running on a single cloud server
  • Using SSL on the standard HTTPS port 443
  • You want to use GitHub OAuth for login
  • You need the users to exist locally on the server
  • You want users’ notebooks to be served from ~/assignments to allow users to browse for notebooks within other users’ home directories
  • You want the landing page for each user to be a Welcome.ipynb notebook in their assignments directory.
  • All runtime files are put into /srv/jupyterhub and log files in /var/log.

Let’s start out with jupyterhub_config.py:

# jupyterhub_config.py
c = get_config()

import os
pjoin = os.path.join

runtime_dir = '/srv/jupyterhub'
ssl_dir = pjoin(runtime_dir, 'ssl')
if not os.path.exists(ssl_dir):
    os.makedirs(ssl_dir)


# https on :443
c.JupyterHub.port = 443
c.JupyterHub.ssl_key = pjoin(ssl_dir, 'ssl.key')
c.JupyterHub.ssl_cert = pjoin(ssl_dir, 'ssl.cert')

# put the JupyterHub cookie secret and state db
# in /srv/jupyterhub
c.JupyterHub.cookie_secret_file = pjoin(runtime_dir, 'cookie_secret')
c.JupyterHub.db_url = pjoin(runtime_dir, 'jupyterhub.sqlite')
# or `--db=/path/to/jupyterhub.sqlite` on the command-line

# put the log file in /var/log
c.JupyterHub.log_file = '/var/log/jupyterhub.log'

# use GitHub OAuthenticator for local users

c.JupyterHub.authenticator_class = 'oauthenticator.LocalGitHubOAuthenticator'
c.GitHubOAuthenticator.oauth_callback_url = os.environ['OAUTH_CALLBACK_URL']
# create system users that don't exist yet
c.LocalAuthenticator.create_system_users = True

# specify users and admin
c.Authenticator.whitelist = {'rgbkrk', 'minrk', 'jhamrick'}
c.Authenticator.admin_users = {'jhamrick', 'rgbkrk'}

# start single-user notebook servers in ~/assignments,
# with ~/assignments/Welcome.ipynb as the default landing page
# this config could also be put in
# /etc/ipython/ipython_notebook_config.py
c.Spawner.notebook_dir = '~/assignments'
c.Spawner.args = ['--NotebookApp.default_url=/notebooks/Welcome.ipynb']

Using the GitHub Authenticator requires a few additional environment variables, which we will need to set when we launch the server:

export GITHUB_CLIENT_ID=github_id
export GITHUB_CLIENT_SECRET=github_secret
export OAUTH_CALLBACK_URL=https://example.com/hub/oauth_callback
export CONFIGPROXY_AUTH_TOKEN=super-secret
jupyterhub -f /path/to/aboveconfig.py
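
Because the configuration file above reads OAUTH_CALLBACK_URL with os.environ[...], loading it with that variable unset raises a KeyError. A small pre-flight check can report everything that is missing at once. This is a sketch; the function name missing_env and the list name EXPECTED_VARS are hypothetical, and the list is simply the four variables exported above.

```python
import os

# The environment variables exported in the launch example above.
# CONFIGPROXY_AUTH_TOKEN can be dropped from this list if the Hub is
# left to generate its own token, as described in the Security section.
EXPECTED_VARS = [
    'GITHUB_CLIENT_ID',
    'GITHUB_CLIENT_SECRET',
    'OAUTH_CALLBACK_URL',
    'CONFIGPROXY_AUTH_TOKEN',
]


def missing_env(names, environ=None):
    """Return the names that are unset or empty in environ (default: os.environ)."""
    if environ is None:
        environ = os.environ
    return [name for name in names if not environ.get(name)]


missing = missing_env(EXPECTED_VARS)
if missing:
    print('Missing environment variables: ' + ', '.join(missing))
```

Run the check in the same shell session that will launch jupyterhub, so that it sees the same environment.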