Skip to main content

LATERAL VIEW EXPLODE in Spark

·285 words·2 mins

For array type column, explode() will convert it to n rows, where n is the number of elements in the array. For map/dictionary type column, explode() will convert it to nx2 shape, i.e., n rows, 2 columns (for key and value).

Explode for array
#

CREATE TABLE IF NOT EXISTS my_table (name STRING, books ARRAY<STRING>, grades ARRAY<FLOAT>);
INSERT INTO my_table VALUES
("Tom", array('abc', 'efs'), array(85, 70, 91)),
('Alice', array('wer', 'mzdf'), array(70, 80, 90))

LATERAL VIEW explode will generate the different combinations of exploded columns. In the above case, column books has 2 elements, and column grades has 3 elements. So for each name, you now have 2x3=6 rows.

SELECT
  name, book_name, grade
FROM my_table
LATERAL VIEW explode(books) AS book_name
LATERAL VIEW explode(grades) AS grade

The generated table is like this:

name book_name grade
Alice wer 70
Alice wer 80
Alice wer 90
Alice mzdf 70
Alice mzdf 80
Alice mzdf 90
Tom abc 85
Tom abc 70
Tom abc 91
Tom efs 85
Tom efs 70
Tom efs 91

Explode for map
#

CREATE TABLE IF NOT EXISTS demo_table (name STRING, cnt MAP<STRING, INT>);
INSERT INTO demo_table VALUES
("ABC", map_from_entries(array(("k1", 1), ("k2", 2)))),
("BCD", map_from_entries(array(("k1", 3), ("K2", 5)))),
("YUV", map_from_entries(array(("k1", 9), ("k2", 8), ("k3", 7))))

The demo_table:

name cnt
YUV {“k1”:9,“k2”:8,“k3”:7}
ABC {“k1”:1,“k2”:2}
BCD {“k1”:3,“K2”:5}

We can run LATERAL VIEW explode on the column cnt:

SELECT
  name, key, val
FROM nn_ecp.demo_table
LATERAL VIEW explode(cnt) AS key, val

The generated table is like this:

name key val
YUV k1 9
YUV k2 8
YUV k3 7
ABC k1 1
ABC k2 2
BCD k1 3
BCD K2 5

references
#