The Hub’s Database#
JupyterHub uses a database to store information about users, services, and other data needed for operating the Hub.
Default SQLite database#
The default database for JupyterHub is a SQLite database. We have chosen SQLite as JupyterHub’s default for its lightweight simplicity in certain uses such as testing, small deployments and workshops.
For production systems, SQLite has some disadvantages when used with JupyterHub:
upgrade-dbmay not work, and you may need to start with a fresh database
downgrade-dbwill not work if you want to rollback to an earlier version, so backup the
jupyterhub.sqlitefile before upgrading
The sqlite documentation provides a helpful page about when to use SQLite and where traditional RDBMS may be a better choice.
Using an RDBMS (PostgreSQL, MySQL)#
Notes and Tips#
The SQLite database should not be used on NFS. SQLite uses reader/writer locks
to control access to the database. This locking mechanism might not work
correctly if the database file is kept on an NFS filesystem. This is because
fcntl() file locking is broken on many NFS implementations. Therefore, you
should avoid putting SQLite database files on NFS since it will not handle well
multiple processes which might try to access the file at the same time.
We recommend using PostgreSQL for production if you are unsure whether to use MySQL or PostgreSQL or if you do not have a strong preference. There is additional configuration required for MySQL that is not needed for PostgreSQL.
MySQL / MariaDB#
You should use the
pymysqlsqlalchemy provider (the other one, MySQLdb, isn’t available for py3).
You also need to set
pool_recycleto some value (typically 60 - 300) which depends on your MySQL setup. This is necessary since MySQL kills connections serverside if they’ve been idle for a while, and the connection from the hub will be idle for longer than most connections. This behavior will lead to frustrating ‘the connection has gone away’ errors from sqlalchemy if
pool_recycleis not set.
If you use
utf8mb4collation with MySQL earlier than 5.7.7 or MariaDB earlier than 10.2.1 you may get an
1709, Index column size too largeerror. To fix this you need to set
innodb_large_prefixto enabled and
Barracudato allow for the index sizes jupyterhub uses.
row_formatwill be set to
DYNAMICas long as those options are set correctly. Later versions of MariaDB and MySQL should set these values by default, as well as have a default
row_formatand pose no trouble to users.