Change Timezone in Databricks Spark


A Databricks cluster uses UTC as its default timezone, so time-related code displays UTC rather than your local time, which is often not what you want. This post shows how to change the timezone setting for a Databricks cluster.

Change the system timezone

In a notebook cell, run the following to check which Linux distribution Databricks is using:

%sh
lsb_release -a

The output shows that the underlying system is Ubuntu. On Ubuntu, the timedatectl command-line tool can change the timezone.

List timezones:

%sh
timedatectl list-timezones

Set the timezone you want:

%sh
timedatectl set-timezone Asia/Shanghai

Note that this command does not change Spark's timezone setting, because Spark was already running when the system timezone changed.

Change the timezone for Databricks Spark

For the current session

If you only want to set the timezone for the current Spark session, run the following statement:

spark.conf.set('spark.sql.session.timeZone', 'Asia/Shanghai')

The explanation for spark.sql.session.timeZone:

The ID of session local timezone in the format of either region-based zone IDs or zone offsets. Region IDs must have the form ‘area/city’, such as ‘America/Los_Angeles’. Zone offsets must be in the format ‘(+|-)HH’, ‘(+|-)HH:mm’ or ‘(+|-)HH:mm:ss’, e.g. ‘-08’, ‘+01:00’ or ‘-13:33:33’. Also ‘UTC’ and ‘Z’ are supported as aliases of ‘+00:00’. Other short names are not recommended to use because they can be ambiguous.
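The formats described above can be checked before a value is passed to Spark. The following is a minimal sketch in plain Python: the helper name looks_like_spark_timezone and the regular expressions are my own transcription of the description, not Spark's validation logic, and the check covers only the shape of the string, not whether the zone actually exists.

```python
import re

# Patterns transcribed from the description of spark.sql.session.timeZone.
# This is an illustrative shape check, not Spark's own validation.
_REGION = re.compile(r"[A-Za-z_]+(?:/[A-Za-z0-9_+\-]+)+")   # e.g. area/city
_OFFSET = re.compile(r"[+-]\d{2}(?::\d{2}(?::\d{2})?)?")    # (+|-)HH[:mm[:ss]]
_ALIASES = {"UTC", "Z"}                                      # aliases of +00:00


def looks_like_spark_timezone(value: str) -> bool:
    """Return True if value has one of the documented shapes."""
    return (
        value in _ALIASES
        or _REGION.fullmatch(value) is not None
        or _OFFSET.fullmatch(value) is not None
    )
```

With this sketch, 'Asia/Shanghai', '-08' and '+01:00' pass, while an ambiguous short name such as 'CST' is rejected.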

To verify the change, run display(spark.sql("select current_timezone()")) in a Databricks notebook cell. Note that the setting applies only to the current session: if you create a new notebook on the same cluster, the timezone setting does not carry over.
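Setting the value and reading it back can be wrapped in one small helper. This is a sketch only: the function name set_session_timezone is hypothetical, and it assumes the object passed in behaves like the SparkSession that Databricks notebooks predefine as spark.

```python
def set_session_timezone(spark, tz: str) -> str:
    """Set the session timezone and return the value Spark reports back."""
    spark.conf.set("spark.sql.session.timeZone", tz)
    # Reading the key back confirms what the session now holds.
    return spark.conf.get("spark.sql.session.timeZone")


# In a Databricks notebook, where `spark` is already defined:
# set_session_timezone(spark, "Asia/Shanghai")
# display(spark.sql("select current_timezone(), current_timestamp()"))
```

Because the helper only touches spark.conf, it affects the session of the notebook that calls it and nothing else.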

In the cluster config

In the Advanced options section of the cluster settings page, under the Spark tab, add the following config:

spark.sql.session.timeZone Asia/Shanghai

This makes sure that every notebook attached to this cluster uses the correct timezone.
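If the cluster is defined programmatically rather than through the UI, the same key/value pair would go into the spark_conf map of the cluster specification. The fragment below is a sketch that shows only the timezone-related part; the variable name cluster_spec_fragment is mine, and a real cluster spec needs further fields that are not shown here.

```python
import json

# Only the timezone-related fragment of a cluster spec (assumption: the
# rest of the spec, such as node type and runtime version, is defined
# elsewhere and omitted here).
cluster_spec_fragment = {
    "spark_conf": {
        "spark.sql.session.timeZone": "Asia/Shanghai",
    }
}

print(json.dumps(cluster_spec_fragment, indent=2))
```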

In the cluster init script

You can also set the timezone in the cluster's init script. Add something like the following to the script:

timedatectl set-timezone Asia/Shanghai

Precedence of different settings

The three methods above take effect in the following order of precedence, from highest to lowest:

setting in notebook session > cluster config > init script
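The ordering can be modeled as a simple fallback chain. The function below is an illustrative sketch of the precedence stated in this post, not of Spark's internals; the name effective_timezone and its parameters are hypothetical, and the UTC default reflects the cluster default mentioned at the start.

```python
from typing import Optional


def effective_timezone(
    session_tz: Optional[str] = None,
    cluster_conf_tz: Optional[str] = None,
    init_script_tz: Optional[str] = None,
    default: str = "UTC",
) -> str:
    """Return the first timezone that is set, highest precedence first."""
    # Order mirrors: notebook session > cluster config > init script.
    for candidate in (session_tz, cluster_conf_tz, init_script_tz):
        if candidate is not None:
            return candidate
    return default
```

For example, if only the init script sets Asia/Shanghai, that value applies; once a notebook sets its own session timezone, the session value wins for that notebook.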
