Skip to main content
  1. Posts/

LATERAL VIEW EXPLODE in Spark

·285 words·2 mins·
Table of Contents

For array type column, explode() will convert it to n rows, where n is the number of elements in the array. For map/dictionary type column, explode() will convert it to nx2 shape, i.e., n rows, 2 columns (for key and value).

Explode for array
#

CREATE TABLE IF NOT EXISTS my_table (name STRING, books ARRAY<STRING>, grades ARRAY<FLOAT>);
INSERT INTO my_table VALUES
("Tom", array('abc', 'efs'), array(85, 70, 91)),
('Alice', array('wer', 'mzdf'), array(70, 80, 90))

LATERAL VIEW explode will generate the different combinations of exploded columns. In the above case, column books has 2 elements, and column grades has 3 elements. So for each name, you now have 2x3=6 rows.

SELECT
  name, book_name, grade
FROM my_table
LATERAL VIEW explode(books) AS book_name
LATERAL VIEW explode(grades) AS grade

The generated table is like this:

namebook_namegrade
Alicewer70
Alicewer80
Alicewer90
Alicemzdf70
Alicemzdf80
Alicemzdf90
Tomabc85
Tomabc70
Tomabc91
Tomefs85
Tomefs70
Tomefs91

Explode for map
#

CREATE TABLE IF NOT EXISTS demo_table (name STRING, cnt MAP<STRING, INT>);
INSERT INTO demo_table VALUES
("ABC", map_from_entries(array(("k1", 1), ("k2", 2)))),
("BCD", map_from_entries(array(("k1", 3), ("K2", 5)))),
("YUV", map_from_entries(array(("k1", 9), ("k2", 8), ("k3", 7))))

The demo_table:

namecnt
YUV{“k1”:9,“k2”:8,“k3”:7}
ABC{“k1”:1,“k2”:2}
BCD{“k1”:3,“K2”:5}

We can run LATERAL VIEW explode on the column cnt:

SELECT
  name, key, val
FROM nn_ecp.demo_table
LATERAL VIEW explode(cnt) AS key, val

The generated table is like this:

namekeyval
YUVk19
YUVk28
YUVk37
ABCk11
ABCk22
BCDk13
BCDK25

references
#

Related

How to Read Local CSV File to Table in MySQL
··435 words·3 mins
Creating A Trigger in PostgreSQL
··366 words·2 mins
Change Timezone in Databricks Spark
··382 words·2 mins