Running/importing Python code/module in Databricks


Databricks is centered around notebooks for much of its workflow. As a result, it is often not obvious how to run or re-use code from another notebook or Python module. In this post, I want to share how to do this in Databricks.

Run/import code from another notebook
#

Suppose that we have the following notebook1:

# Read the two widget parameters passed to this notebook.
foo = dbutils.widgets.get("foo")
bar = dbutils.widgets.get("bar")

result = foo + '&' + bar


def show():
    print(result)

# Return result to a caller that uses dbutils.notebook.run().
dbutils.notebook.exit(result)

In another notebook, we have two ways to “run” the code in notebook1.

%run command
#

We can use the %run magic to run the code in notebook1:

%run ./path/to/notebook1 $foo="hello" $bar="world"

The path of notebook1 is relative to the current notebook. We can pass parameters to notebook1 via the $ assignments. In notebook1, you can then retrieve the values via the dbutils.widgets.get() method.

It will "import" the functions and variables from notebook1 into the current notebook, so you can use those functions directly. In this sense, it is like from some_module import * in Python.
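For example, after the %run above, the calling notebook can use the names defined in notebook1 directly (a minimal sketch, assuming notebook1 was run with foo="hello" and bar="world"):

show()           # prints "hello&world"
print(result)    # result is now a variable in this notebook too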

Using dbutils.notebook.run()
#

We can also run notebook1 like this:

dbutils.notebook.run("path/to/notebook1", 30, {"foo": "fooVal", "bar": "barVal"})

The parameters for notebook1 are provided in the third argument.

This will run notebook1 in an ephemeral job. No functions or variables from that notebook are exposed to your current notebook.

To return results from the called notebook, we can use dbutils.notebook.exit("result_str"). This is rather limited: it seems only string results are currently supported. You can be creative in how you interpret the returned string, though, e.g., as the name of a table.
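A minimal sketch of the caller side (the path and parameter values here are placeholders):

# Run notebook1 as an ephemeral job with a 30-second timeout and capture its exit string.
returned = dbutils.notebook.run("path/to/notebook1", 30, {"foo": "hello", "bar": "world"})
print(returned)  # "hello&world" for the notebook1 above

# If a called notebook exits with a table name instead, you could read it back, e.g.:
# df = spark.table(returned)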

Import a Python module
#

Since Databricks Runtime 11.3, we can import Python modules stored in the workspace. Create a normal Python module like you usually would, and you can then import that module in a notebook. This conforms better to the Python philosophy of "explicit is better than implicit".

I created a demo directory to test the feature, with the following structure.

.
|____hello.py
|____my-notebook
|____utility
| |____math_ops.py

Content of math_ops.py:

def power(x, y):
    return x**y

Content of hello.py:

print("hello")

In the notebook my-notebook, we can run the following code without error:

import hello

import utility.math_ops as m_op

m_op.power(2, 3)
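If a module lives somewhere other than the notebook's directory, one common workaround (a hedged sketch; the path below is hypothetical) is to append its parent directory to sys.path before importing:

import sys

# Hypothetical workspace path; replace with the directory that contains your module.
sys.path.append("/Workspace/Users/me@example.com/libs")

import utility.math_ops as m_op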

autoreload
#

Since Databricks Runtime 11.0, notebooks use the IPython kernel under the hood (source here). Note that, as in a typical IPython notebook, an imported Python module is not reloaded by default when its code changes. For example, if you add a new function or change existing code in the module, the notebook will keep using the old version.

Instead, we need to load the autoreload extension for the notebook (also mentioned in the official doc):

%load_ext autoreload
%autoreload 2

To check the documentation of autoreload, you can run %autoreload?.

If you don't want to use the autoreload magic, then to pick up module updates you have to manually detach and re-attach the notebook to the cluster.
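Another option, if you only need to refresh a single module once, is plain Python's importlib.reload (not Databricks-specific):

import importlib
import utility.math_ops as m_op

# Pick up the latest edits to math_ops.py without detaching from the cluster.
importlib.reload(m_op)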

