We’re often engaged in consulting projects where we are asked about a range of different database options for scalability, query performance and reliability.
We’re commonly asked about ClickHouse as an option, likely because it’s free and queries are supposedly fast. Although both are true, it’s important to think about scalability, reliability and architecture change, like needing to join several tables. SingleStore is a distributed relational database known for speed, scale and its ability to join several tables. It is suited to many of the same use cases as ClickHoue, so it’s a good comparison.
In my line of consulting work, it’s not enough to offer anecdotes and opinions — the data is necessary to support my observations. Below you’ll find benchmark results against TPCH standard data.
The benchmarks for ClickHouse and SingleStore Cluster In a Box (CIAB) was performed on a 64GB, 8CPU, 200GB SSD Disk (similar to a r5.2xlarge EC2 instance). The dataset was stored on a 250 GB Digital Ocean Volume attached to the droplet. Data was ingested locally from the attached storage using the TPC-H benchmarks files — the largest file being 75GB (lineitem) for 600 MB rows.
SingleStore vs. ClickHouse Ingestion – 3 points
|Table Name||Total Rows||Total File Size||ClickHouse||SingleStore|
|lineitem||600,037,902||75 GB||5m 4s||11m 38s|
|orders||150,000,000||17GB||1m 18s||2m 49s|
|partsupp||80,000,000||12 GB||47s||1m 34s|
Data Loading was done using 8 files (1 file per table for TPC-H), residing on the attached storage, using a bulk load method (see file at end of article). We did not test ingestion using Singlestore Pipelines which performed better in another test (in AWS). For the large tables, ClickHouse performed much better on the data load, twice as fast for largest tables.
Although data load time is important, it’s not the most critical point. Our main goal was to show case how fast queries are against large tables in ClickHouse Vs Singlestore when using joins.
Note: While ingesting using Load Data Infile in SingleStore, querying the table (select count(*)) does not return records until the load is completed. This result is different when using SingleStore’s Pipelines, which allows you to query the tables as data loaded. The record count will update each time the SingleStore Pipelines commit the batch of records (which can be specified).
When it comes to ingestion, ClickHouse was twice faster on average then SingleStore. Singlestore gets one point because it’s possible to run a query against a table where a large amount of data is being ingested into, nolocking occurring using pipeline.
SingleStore pipeline ingestion is quite powerful. Not only can it connect to S3, Kafka, Azube Blog and HDFS, it can also support various formats including Parquet, CSV, TSV, JSON and more. SingleStore also offers transformational capabilities — Pipelines can be stopped and started again, without losing data. Lastly, because Pipelines are created with SQL, you can dynamically create and start them.
Points: SingleStore 1.5, ClickHouse 1.5
SingleStore vs. ClickHouse Queries – 3 points
Queries were performed on the same Digital Ocean instance. ClickHouse was installed to first perform the query test, then to perform the shutdown. Then, SingleStore was installed and set up as SingleStore-in-a-box (1 primary aggregator, and 1 leaf node).
|1||0s 0ms||0s 20ms||0|
|2||0s 322ms||0s 40ms||0|
|3||7s 727ms||2s 960ms||3|
|4||81s 626ms||0s 440ms||186|
|5||6s 470ms||0s 170ms||38|
|6||6s 359ms||0s 710ms||9|
|7||16s 397ms||18s 110ms||1|
|8||148s 0ms||3s 610ms||41|
|9||41s 135ms||4s 300ms||10|
|10||21s 876ms||8s 370ms||3|
|11||600s 0ms||21s 630ms||28|
Although ClickHouse ingests faster as seen in previous tests, results show that SingleStore clearly outperforms ClickHouse — especially when joining tables. Queries 1, 2 and 3 are simple queries against a single table: lineitem. As you can see, there are no major, notable differences between the two databases. These queries are handled within microsecond differences, which is not noticeable when manually running queries.
Performance starts to quickly degrade when ClickHouse starts joining tables (query 4 to 11 in the graph). Query 4 (joining 2 tables and doing a limit) takes 440 milliseconds in SingleStore, and 81 seconds in ClickHouse. Queries 8-11 were actually failing in ClickHouse until we increased the amount of available memory. Additionally, ClickHouse was unable to complete query 11 — even after assigning 50GB of memory. SingleStore completed the query in 21 seconds.
The shorter the bar, the better!
When it comes to queries, ClickHouse can quickly query a single table, with SingleStore closely matching performance. When ClickHouse must join tables, performance degrades considerably. This is why the benchmarks listed on ClickHouse’s website are always against single (flattened) tables.
Points: SingleStore 2.5, ClickHouse 0.5
We would like to give ClickHouse one-half point because queries against single tables are very fast. But that is where ClickHouse performance stops. We have not even tested queries with Common Table Expressions (CTE), which ClickHouse seems to support.
GUI Administration & Monitoring – 3 points
Administering and monitoring your database is critical. Clickhouse has some open source GUI but they seem pretty limited, mostly running SQL select queries. Monitoring is possible via Grafana.
SingleStore comes with SingleStore Studio, which allows you to monitor and get a great overview of the cluster’s overall health:
- The dashboard shows Cluster Health, pipeline status, cluster usage and database usage.
- It looks at the difference in host CPU consumption, disk space used and how much memory is consumed.
- Database Metadata: Users can look at each database and dive in to see the stats about each table (total rows, compression, how much memory / disk space is consumed).
- Active Queries: Similar to “show process list” in MySQL, this allows users to see running queries.
- Workload Monitoring: You can start workload monitoring which profiles the activities running on a cluster, tracking all queries being executed — and quickly identify those that are most resource intensive.
- Visual Explain: A query profile can be saved, then loaded, into Visual Explain to see a detailed query plan
- SQL Editor: One of the most popular features, this allows users to run queries within the browser (just like Snowflake)
- Pipelines: Shows pipeline running
Points: SingleStore 2.5, ClickHouse 0.5
Advanced Features – 3 points
SingleStore provides full redundancy out of the box when using a cluster with at least 2 aggregators and 2 leaf nodes. Leaf nodes can use the High Availability feature, allowing data to be copied on each leaf to provide full redundancy. If a leaf goes down, the cluster can still be used. ClickHouse can be used also as a cluster but the implementation, configuration and administration is not as simple as SingleStore.
SingleStore supports stored procedures. Pipelines can ingest into stored procedures, allowing you to transform data or maintain aggregates (for example, a materialized view).
Both SingleStore and ClickHouse support S3 as a storage engine, although SingleStore has implemented a more robust solution. In SingleStore, the S3 storage is created at the database level, meaning all tables created in that database will use S3 storage. In ClickHouse, the storage is at the table level. SingleStore has also a memory / disk caching layer for hot data when using S3 storage, enabling great performance. When using S3 as a storage layer for a database, data spills over to S3 if the disk gets full.
UDF, Time Series, Geospatial, etc.
SingleStore supports many advanced analytical functionalities including JSON extract, time series (time bucket), geospatial functions and more. The SingleStore database is really built for analytics.
PTR (Point-in-Time Recovery)
PTR enables system-of-record capabilities. Customers can now operate their SingleStore databases with the peace of mind that they can go back in time to any specific point, and restore any data lost from user error or failure.
Points: SingleStore 2.5, ClickHouse 0.5
Cost – 3 points
ClickHouse is free open source software, although there are now some paid options too. SingleStore provides a fully-featured free version for production that you can run up to 4 nodes in any environment you choose. For the cloud, SingleStore provides $500 in free credits if you prefer the managed service. ClickHouse does not provide redundancies — deploying a ClickHouse system in production is risky unless you have an in-house expert standing by. SingleStore’s support is excellent, and they’ll answer questions in their forum (even if you’re not a customer).  [LF2]
Points: SingleStore 1, ClickHouse 2
As DBAs with some Data Engineering experience, we can conclude that SingleStore offers a much stronger solution than ClickHouse. The performance of the queries when joining tables is obvious — queries were 3-186x faster.
In many cases, memory had to be increased using SET max_memory_usage = 40000000000 before running the query or it would fail. ClickHouse is another memory database that seems to heavily rely on scanning rows quickly in memory to generate results. Performance takes a big hit when tables need to be joined, which SingleStore handles without issue.
Furthermore, SingleStore consistently adds new features, improves its admin and monitoring tools and now supports S3 storage. The number of features available in SingleStore for analytics surpasses those of ClickHouse. SingleStore also supports modern data engineering ingestion, allowing ingestion from Kafka, S3 and more by just using a few lines of SQL code.