
In Hive, how do you specify delimiters in the CREATE TABLE statement when map, array, and struct columns all appear in the same table?

Hive supports the complex types map, array, and struct. When a single table uses all three, which delimiters should the CREATE TABLE statement declare?

I. The answer first

The table definition looks like this:

create table test(
	name string,
	friends array<string>,
	children map<string, int>,
	address struct<street:string, city:string>
)
row format delimited
fields terminated by ','
collection items terminated by '_'
map keys terminated by ':'
lines terminated by '\n';

What each clause means:

row format delimited fields terminated by ','   /* column (field) delimiter */
collection items terminated by '_'         /* delimiter between the elements of a MAP, STRUCT, or ARRAY */
map keys terminated by ':'    /* delimiter between a MAP key and its value */
lines terminated by '\n';       /* row (line) delimiter */

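Only two of these really need explanation:

①. collection items terminated by '_': in Hive, map, array, and struct all take their element delimiter from the collection items terminated by clause, so the three types have to share a single delimiter.

②. lines terminated by '\n': this clause can be omitted, because the row delimiter defaults to \n.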
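To make point ① concrete, here is a minimal Python sketch of how one text row is split under these delimiters. It is only an illustrative model, not Hive's actual SerDe code; the function name parse_row and the sample values are made up for this example.

```python
# Illustrative model of the delimiters declared in the DDL above.
# This is NOT Hive's implementation; parse_row is a hypothetical helper.
FIELD_DELIM = ","       # fields terminated by ','
COLLECTION_DELIM = "_"  # collection items terminated by '_'
MAP_KEY_DELIM = ":"     # map keys terminated by ':'


def parse_row(line: str) -> dict:
    """Split one text row into the four columns of the test table."""
    name, friends, children, address = line.rstrip("\n").split(FIELD_DELIM)

    # array<string>: elements are separated by the collection delimiter.
    friends_list = friends.split(COLLECTION_DELIM)

    # map<string,int>: entries use the SAME collection delimiter,
    # and each entry is then split into key and value.
    children_map = {}
    for entry in children.split(COLLECTION_DELIM):
        key, value = entry.split(MAP_KEY_DELIM)
        children_map[key] = int(value)

    # struct<street,city>: members also use the collection delimiter.
    street, city = address.split(COLLECTION_DELIM)

    return {
        "name": name,
        "friends": friends_list,
        "children": children_map,
        "address": {"street": street, "city": city},
    }


# Made-up sample row, used only to exercise the sketch.
row = parse_row("tom,ann_bob,kid1:3_kid2:5,main street_springfield")
print(row)
```

Because all three types are split on the same character, a value that itself contains '_' would be cut in the wrong place, so the collection delimiter should be a character that never occurs in the data.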

II. An example

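Suppose the following record needs to be loaded into a Hive table. It is shown first as JSON, then as the delimited text row Hive will actually read: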

{
	"name": "张三",
	"friends": ["李四" , "王五"] , // list: array
	"children": { // key-value pairs: map
		"小李四": 18 ,
		"小王五": 19
	},
	"address": { // structure: struct
		"street": "大兴" ,
		"city": "北京"
	}
}

张三,李四_王五,小李四:18_小王五:19,大兴_北京

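Note the delimiters in the text row: ',' separates the columns, '_' separates the items inside each collection, and ':' separates each map key from its value.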
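The conversion from the JSON record to the text row can be sketched in Python as follows. This is a minimal illustration under the delimiters used in this article; to_hive_row is a hypothetical helper, and the JSON comments from the listing above are dropped so that the standard json module can parse it.

```python
import json

# The sample record from above, written as strictly valid JSON.
record = json.loads("""
{
    "name": "张三",
    "friends": ["李四", "王五"],
    "children": {"小李四": 18, "小王五": 19},
    "address": {"street": "大兴", "city": "北京"}
}
""")


def to_hive_row(rec: dict) -> str:
    """Serialize one record using ',' / '_' / ':' as in the DDL (hypothetical helper)."""
    # array: items joined by the collection delimiter
    friends = "_".join(rec["friends"])
    # map: key:value pairs, joined by the same collection delimiter
    children = "_".join(f"{k}:{v}" for k, v in rec["children"].items())
    # struct: member values in declaration order (street, then city)
    address = "_".join([rec["address"]["street"], rec["address"]["city"]])
    # columns joined by the field delimiter
    return ",".join([rec["name"], friends, children, address])


line = to_hive_row(record)
print(line)
```

Only the struct's values are written, not its member names; the names street and city come from the column type in the table definition, so the order of values in the file has to match the declared order.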

create table test(
	name string,
	friends array<string>,
	children map<string, int>,
	address struct<street:string, city:string>
)
row format delimited
fields terminated by ','
collection items terminated by '_'
map keys terminated by ':'
lines terminated by '\n';

load data local inpath
"/home/software/data/test.txt" into table test;

Accessing the array, map, and struct columns:

select 	friends[1], /* array element, by zero-based index */
	children['小李四'], /* map value, by key */
	address.city /* struct member, by name */
	from test;
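The lookup rules behind this query can be modeled with a short Python sketch. It assumes Hive's default behavior as I understand it: array indexes start at 0, and an out-of-range index or a missing map key yields NULL rather than an error. The helpers array_get and map_get are hypothetical names, not Hive functions.

```python
# The sample row, already split into its complex-typed columns.
row = {
    "friends": ["李四", "王五"],
    "children": {"小李四": 18, "小王五": 19},
    "address": {"street": "大兴", "city": "北京"},
}


def array_get(arr: list, index: int):
    """Model of friends[i]: zero-based; None stands in for NULL when out of range."""
    return arr[index] if 0 <= index < len(arr) else None


def map_get(mapping: dict, key: str):
    """Model of children['key']: None stands in for NULL when the key is absent."""
    return mapping.get(key)


second_friend = array_get(row["friends"], 1)        # second element of the array
missing_friend = array_get(row["friends"], 5)       # index past the end
known_child = map_get(row["children"], "小李四")     # key present in the row
unknown_child = map_get(row["children"], "xiaosong")  # key not in this row
city = row["address"]["city"]                       # struct member access

print(second_friend, missing_friend, known_child, unknown_child, city)
```

So friends[1] refers to the second friend, not the first, and a map lookup only returns a value when the key matches one that is actually stored in the row.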