Bootstrap

OpenLookeng连接器-Clickhouse connector性能测试报告

测试背景

OpenLooKeng是一款开源的高性能数据虚拟化引擎,提供统一SQL接口,可以实现对多个数据库的跨源异构和跨域跨DC查询。

ClickHouse是Yandex开源的一个用于实时数据分析的基于列存储的数据库,其工作速度比传统方法快100-1000倍,性能超过了目前市场上的列式存储数据库4。

ClickHouse connector是为OpenLookeng开发的用于访问Clickhouse数据源的连接器,截止测试时最新版本更新至2020-12-24日的,已支持对常见数据类型和函数的映射,详细支持列表可见

测试目的

使用SSB测试基准提供的数据和SQL查询语句,对OpenLookeng的ClickHouse connector的性能进行评估。

测试环境

系统信息

硬件信息

部署情况

数据集

规模

按照Clickhouse官方文档中提到的Star Schema测试集生成数据[1]。

通过ssb-dbgen工具设置参数和分别生成了两种规模的测试集,其规模如下,分别称为和。

的数据占用空间为5.18G,包含59986052条数据。

占用的空间为395.76G,数据量为4398761522。

测试语句

SSB测试语句总共包含13条测试语句,如下

Q1.1: SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toYear(LO_ORDERDATE) = 1993 AND LO_DISCOUNT BETWEEN 1 AND 3 AND LO_QUANTITY < 25;

Q1.2: SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toYYYYMM(LO_ORDERDATE) = 199401 AND LO_DISCOUNT BETWEEN 4 AND 6 AND LO_QUANTITY BETWEEN 26 AND 35;

Q1.3: SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toISOWeek(LO_ORDERDATE) = 6 AND toYear(LO_ORDERDATE) = 1994 AND LO_DISCOUNT BETWEEN 5 AND 7 AND LO_QUANTITY BETWEEN 26 AND 35;

Q2.1: SELECT sum(LO_REVENUE) toYear(LO_ORDERDATE) AS year P_BRAND FROM lineorder_flat WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA' GROUP BY year P_BRAND ORDER BY year P_BRAND;

Q2.2: SELECT sum(LO_REVENUE) toYear(LO_ORDERDATE) AS year P_BRAND FROM lineorder_flat WHERE P_BRAND >= 'MFGR#2221' AND P_BRAND <= 'MFGR#2228' AND S_REGION = 'ASIA' GROUP BY year P_BRAND ORDER BY year P_BRAND;

Q2.3: SELECT sum(LO_REVENUE) toYear(LO_ORDERDATE) AS year P_BRAND FROM lineorder_flat WHERE P_BRAND = 'MFGR#2239' AND S_REGION = 'EUROPE' GROUP BY year P_BRAND ORDER BY year P_BRAND;

Q3.1: SELECT C_NATION S_NATION toYear(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND year >= 1992 AND year <= 1997 GROUP BY C_NATION S_NATION year ORDER BY year ASC revenue DESC;

Q3.2: SELECT C_CITY S_CITY toYear(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND year >= 1992 AND year <= 1997 GROUP BY C_CITY S_CITY year ORDER BY year ASC revenue DESC;

Q3.3: SELECT C_CITY S_CITY toYear(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND year >= 1992 AND year <= 1997 GROUP BY C_CITY S_CITY year ORDER BY year ASC revenue DESC;

Q3.4: SELECT C_CITY S_CITY toYear(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND toYYYYMM(LO_ORDERDATE) = 199712 GROUP BY C_CITY S_CITY year ORDER BY year ASC revenue DESC;

Q4.1: SELECT toYear(LO_ORDERDATE) AS year C_NATION sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BY year C_NATION ORDER BY year ASC C_NATION ASC;

Q4.2: SELECT toYear(LO_ORDERDATE) AS year S_NATION P_CATEGORY sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (year = 1997 OR year = 1998) AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BY year S_NATION P_CATEGORY ORDER BY year ASC S_NATION ASC P_CATEGORY ASC;

Q4.3: SELECT toYear(LO_ORDERDATE) AS year S_CITY P_BRAND sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE S_NATION = 'UNITED STATES' AND (year = 1997 OR year = 1998) AND P_CATEGORY = 'MFGR#14' GROUP BY year S_CITY P_BRAND ORDER BY year ASC S_CITY ASC P_BRAND ASC;

根据OpenLookeng的语法,在功能不变的情况下对Clickhouse语法的sql语句进行一定的修改,主要为一下几点:

  • 将函数统一修改为

  • 将函数修改为

  • 将替换为

  • 将语法为 等修改为,即去掉对AS的使用

修改后的语法为

Q1.1 : SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE year(LO_ORDERDATE) = 1993 AND LO_DISCOUNT BETWEEN 1 AND 3 AND LO_QUANTITY < 25;

Q1.2 : SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE year(LO_ORDERDATE)*100+month(LO_ORDERDATE) = 199401 AND LO_DISCOUNT BETWEEN 4 AND 6 AND LO_QUANTITY BETWEEN 26 AND 35;

Q1.3 : SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE week(LO_ORDERDATE) = 6 AND year(LO_ORDERDATE) = 1994 AND LO_DISCOUNT BETWEEN 5 AND 7 AND LO_QUANTITY BETWEEN 26 AND 35;

Q2.1 : SELECT sum(LO_REVENUE) year(LO_ORDERDATE) AS year P_BRAND FROM lineorder_flat WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA' GROUP BY year(LO_ORDERDATE) P_BRAND ORDER BY year(LO_ORDERDATE) P_BRAND;

Q2.2 : SELECT sum(LO_REVENUE) year(LO_ORDERDATE) AS year P_BRAND FROM lineorder_flat WHERE P_BRAND >= 'MFGR#2221' AND P_BRAND <= 'MFGR#2228' AND S_REGION = 'ASIA' GROUP BY year(LO_ORDERDATE) P_BRAND ORDER BY year(LO_ORDERDATE) P_BRAND;

Q2.3 : SELECT sum(LO_REVENUE) year(LO_ORDERDATE) AS year P_BRAND FROM lineorder_flat WHERE P_BRAND = 'MFGR#2239' AND S_REGION = 'EUROPE' GROUP BY year(LO_ORDERDATE) P_BRAND ORDER BY year(LO_ORDERDATE) P_BRAND;

Q3.1 : SELECT C_NATION S_NATION year(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND year(LO_ORDERDATE) >= 1992 AND year(LO_ORDERDATE) <= 1997 GROUP BY C_NATION S_NATION year(LO_ORDERDATE) ORDER BY year(LO_ORDERDATE) ASC revenue DESC;

Q3.2 : SELECT C_CITY S_CITY year(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND year(LO_ORDERDATE) >= 1992 AND year(LO_ORDERDATE) <= 1997 GROUP BY C_CITY S_CITY year(LO_ORDERDATE) ORDER BY year(LO_ORDERDATE) ASC revenue DESC;

Q3.3 : SELECT C_CITY S_CITY year(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND year(LO_ORDERDATE) >= 1992 AND year(LO_ORDERDATE) <= 1997 GROUP BY C_CITY S_CITY year(LO_ORDERDATE) ORDER BY year(LO_ORDERDATE) ASC revenue DESC;

Q3.4 : SELECT C_CITY S_CITY year(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND year(LO_ORDERDATE)*100+month(LO_ORDERDATE) = 199712 GROUP BY C_CITY S_CITY year(LO_ORDERDATE) ORDER BY year(LO_ORDERDATE) ASC revenue DESC;

Q4.1 : SELECT year(LO_ORDERDATE) AS year C_NATION sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BY year(LO_ORDERDATE) C_NATION ORDER BY year(LO_ORDERDATE) ASC C_NATION ASC;

Q4.2 : SELECT year(LO_ORDERDATE) AS year S_NATION P_CATEGORY sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (year(LO_ORDERDATE) = 1997 OR year(LO_ORDERDATE) = 1998) AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BY year(LO_ORDERDATE) S_NATION P_CATEGORY ORDER BY year(LO_ORDERDATE) ASC S_NATION ASC P_CATEGORY ASC;

Q4.3 : SELECT year(LO_ORDERDATE) AS year S_CITY P_BRAND sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE S_NATION = 'UNITED STATES' AND (year(LO_ORDERDATE) = 1997 OR year(LO_ORDERDATE) = 1998) AND P_CATEGORY = 'MFGR#14' GROUP BY year(LO_ORDERDATE) S_CITY P_BRAND ORDER BY year(LO_ORDERDATE) ASC S_CITY ASC P_BRAND ASC;

测试方法

将数据到192.168.40.152节点上,将数据导入到192.168.40.223节点上

使用python脚本分别在Clickhouse和OpenLookeng上执行脚本,每条SQL语句执行间隔为10秒,多次执行取平均值。

其中Clickhouse的驱动使用的是开源库clickhouse-driver[2],OpenLookeng的驱动来源于对web端API的调用,代码已发布于github[3]

测试结果

ssb-10

在OpenLookeng上的执行结果

在Clickhouse上的执行结果

ssb-1000

在OpenLookeng上的执行结果

后续测试可能因为缓存原因,导致执行速度大幅度加快,部分查询时间短于Clickhouse

在Clickhouse上的执行结果

结果分析

ssb-10

对sql语句的平均时间进行对比,如下表,单位为秒

ssb-1000

对sql语句的平均时间进行对比,如下表

对可能使用缓存的测试结果的平均时间进行对比,如下表

结论后续

上述SQL查询结果显示通过ClickHouse connector来查询数据的方式相比直接使用CH的驱动仍有性能损耗,我们会和社区进一步交流,对原因进行分析并进行后续改进。

说明:OpenLooKeng开源不久,仍然持续发展中,该测试目的是针对新开发的ck connector的测试,后续会进行olk内核和connector的深度联调,我们会持续关注OLK社区的发展,对新版本的OLK作出更全面的分析。

[1]  

[2]  

[3]  

[4]