Clickhouse Database Origin
ClickHouse DB was created by Yandex, the Russian search engine company, and was first released in 2016. The original purpose of ClickHouse was to address the growing needs for a database management system that could handle massive amounts of data and provide fast and efficient processing capabilities for online analytical processing (OLAP) workloads. To compete in the industry, Clickhouse engine had to excel at storing data, data loading, and process complex analytical queries with minimal and query execution latency.
Yandex itself needed a system that could store and analyze large volumes of data generated by its search engine and advertising business in real-time. The company recognized that existing database technologies were not able to meet its performance and scalability requirements, so it decided to create its own system. At the time Snowflake and Singlestore did not exist.
What is ClickHouse?
ClickHouse is an open-source columnar database management system designed for online analytical processing (OLAP) of large volumes of data. It was developed by the Russian company Yandex and is now maintained by the ClickHouse community.
ClickHouse is designed for high performance, allowing for real-time data analysis on terabytes or petabytes of data. It uses a column-oriented data storage format, which allows for fast and efficient data compression, and efficient data processing. It’s designed to handle heavy analytical workloads, many users, batch ingestion, supports time series data, faster queries than regular OLTP databases, compressed storage, and more. ClickHouse also supports a wide range of data formats and query languages, including SQL, JSON, and CSV. It can be used for real time analytics, although most queries perform well for single tables type of queries. When joining multiple tables, the performance degrades. (as seen in the benchmark below)
ClickHouse is commonly used in data warehousing, big data analytics, ad hoc analytics, and business intelligence applications, where fast and efficient complex aggregations processing is critical. It is also used by companies in industries such as finance, advertising, and telecommunications to process and analyze large volumes of data in real-time.
Overall, ClickHouse is a powerful and flexible database management system that support very fast data ingestion and can handle massive amounts of data, can handle specific workloads, and provide efficient query processing and capabilities for a wide range of use cases.
What were the reasons ClickHouse became so successful?
1. Very fast OLAP queries
Analytical queries – High performance analytical queries
2. Highly efficient storage resources
Disruptive data compression – 10-100x storage efficiency over alternatives
3. Ease of use
Analyst-friendly SQL syntax, easy to get started, large ecosystem of integrations.
Evaluating ClickHouse Managed Solution
The idea of offering a managed solution came from several organizations who were using ClickHouse and had to carry the burden of setting up as well as providing maintenance for ClickHouse.
In the ClickHouse open-source version, you need to pick the instance type, the memory/disk size, setup multiple nodes yourself, decides on the number of replicas and configure sharding and replication, configure secure settings and then load our data into ClickHouse and run your queries. With ClickHouse managed solution version, we just need to click on few buttons to create a service then load our data and run our queries.
Note: Don’t forget that a series of additional work must be done after setting up Clickhouse using the open-source version. Monitoring the resource usage, scaling the server, backing up manually and upgrading manually. You might need to make your Clickhouse cluster fault tolerant, you may need three nodes or more, upgrade to the latest version, monitor the cluster resources, disk space … The list is very long. And this does not include the infrastructure costs.
Database As A Service
Most Cloud providers have been moving towards DAAS (Database as a Service) to avoid having to setup, manage, and maintain such database server. Customers want to load data and analyze the same data themselves, not hire a ClickHouse DBA to maintain the system. However, for large corporation, having ClickHouse in house might be more economical.
Installing ClickHouse Managed Service
The simplicity of installation is very attractive when you first encounter ClickHouse managed solution. Interestingly it is one of the unique features of this tool compared to other most distributed databases tools. When you want to run your Business Intelligence system at scale it’s hard to figure out exactly how to do it in a distributed setup. ClickHouse managed solution handles it all.
Installing ClickHouse Managed Service Steps
Start by going to ClickHouse Cloud website and creating an account. You need to fill out the details and verify your email address you can start working on ClickHouse.
Once you verified your email address, you can start and create a ClickHouse Cloud Service.
2. You must choose the Cloud Provider. In this example we selected AWS. You then specify the AWS region and choose between the development service and Production service. Fill out the Service name.
3. Next Step, you are asked to select where this cluster should be, and from where this service should be accessible from. This step is to make sure that your endpoints are not exposed directly to the internet.
For our use case I’m going to select specific locations and as you see my IP address is automatically detected at this point and then I just click continue and this service will only be accessible from where I am. If you want to allow more people or external services to connect to your service you can click on Add Entry and add their IP address as well.
4. Next, our Database user is created. Remember to copy your database credentials (username, password) because you’ll need these later.
Your ClickHouse Cloud Service is now created. You’re now presented with a screen allowing you to load data or to connect to the SQL Console. ClickHouse Cloud has kept their interface clean, simple and fast.
What is ClickHouse Cloud Service?
It is a managed ClickHouse cluster but it’s also a lot more.
Part of the service solution offering is ClickHouse’s team aims to abstract the complexity in the infrastructure and its architecture away from the user. There is no notion here of shards or any need to really understand the complexity ClickHouse replication. It’s all taken care of for you.
Note: At the moment of this evaluation, ClickHouse Cloud only support AWS but, in the future, as you can see in the screenshot, they will support GCP and Azure.
What’s the difference between DEVELOPMENT & PRODUCTION service
Loading Data into ClickHouse Cloud:
There are multiple ways to load data into ClickHouse Cloud. If you see the pictures below in the load data section, we can load data by using Native clients and HTTPS or uploading files right from ClickHouse cloud UI.
There are lots of ways to load data in Clickhouse. Native clients and HTTPS will cover most cases but for our test we decided to load data via Amazon S3:
1. First we created a new database and called it as tpch. So, in the query editor I run this code:
create database tpch;
Note: if you see the above picture, you can choose your database as well.
2. Using our DDL script, we created our tables into our tpch database.
3. We can load the data from S3 into ClickHouse database:
SELECT c1 as c_custkey, c2 as c_name, c3 as c_address, c4 as c_nationkey, c5 as c_phone, c6 as c_acctbal, c7 as c_mktsegment, c8 as c_comment FROM s3('https://xxx-datasets.s3.us-west-1.amazonaws.com/tpch100/lineitem.tbl.gz', 'xxxx', 'xxxx','CSV' ) SETTINGS format_csv_delimiter = '|';
If you want to know how you can load S3 data into Clickhouse, I’d suggest you read the ClickHouse documents. But for summary ClickHouse supports external tables, the ability to query data directly from S3 using the function described below:
s3(path, [aws_access_key_id, aws_secret_access_key,] [format, [structure, [compression]]])
Benchmarking ClickHouse Cloud
We have used a set of 10 queries, from simple table queries to more sophisticated query patterns, one joining multiple tables. To give you an idea of the size of the table, you’ll find below the total rows that were loaded into Clickhouse Cloud TPCH database.
The queries that were used are standard SQL, not using complex functions, and were not optimized. You can find out more about these queries here.
1. Query 1
It just took 0.0004s to execute. Excellent.
2. The second query return the min and max value of l_shipdate column for our biggest table and it took 1.213s to execute.
3. The 3rd query is combination of some SQL aggregation functions on our biggest table, group by 1 column order by and uses a WHERE clause. Query took 25.102s to execute.
4. For the 4th query we are joining tables and as you can see in the picture we join (part ,partsupp) table and it took 17.412s to execute.
5. Another join query with three table (part, partstupp, supplier) and it took 8.452s to execute.
6. In the 6th query we just remove p.p_partkey =1 from the above join and ran the query and it took 7.632s to execute.
7. Query 7 : Using join + Aggregation function and it face with memory allocation error!
“Code: 241. DB::Exception: Memory limit (total) exceeded: would use 7.20 GiB (attempt to allocate chunk of 4373900 bytes), maximum: 7.20 GiB. OvercommitTracker decision: Query was selected to stop by OvercommitTracker.: While executing SourceFromNativeStream. (MEMORY_LIMIT_EXCEEDED) (version 184.108.40.20601 (official build))n”
8. Query 8 we use join + aggregation + filter join condition and it face with memory allocation error again:
9. Query 9th and 10th and 11th query we face with the same error:
Clickhouse Running Out of Memory
The result of this test confirms a benchmark between Singlestore and Clickhouse we did last year and what was said earlier. Clickhouse performs very well on single tables. However, Clickhouse query performance degrades when you start joining tables.
Evaluating ClickHouse with Aiven Services
What is Aiven?
Aiven is a managed cloud database service of the next generation that houses your software infrastructure services. Customers’ peace of mind, high fault resilience, and advanced features at competitive prices are their primary priorities.
Aiven provides addional high quality managed services such as:
MySQL and more
Depending on the requirements of the customer, a selection of plans with varying features, performance, and HA characteristics are available to the user. Aiven is a self-healing service that takes care of your database’s health and backups automatically. Aiven also takes care of software upgrades on your behalf, ensuring that you always have access to the most recent version of your database software.
To evaluate Clickhouse services on Aiven, we followed the same steps as we have with ClickHouse Cloud.
Installing ClickHouse Development Service on Aiven:
A managed cloud data warehouse called Aiven for ClickHouse® is based on the open-source ClickHouse, which is a fast, resource-efficient columnar database that lets you create real-time analytical data reports with advanced SQL queries.
CreatING a new ClickHouse service in Aiven:
1. Open the Aiven web console and sign in.
2. Click “Create a new service” on the Services page. A new page with the available service options appears as a result.
3. Choose the region and cloud provider where you want to run your service.
Note: For the same service, prices may vary between providers and locations. You can see the prices for the options you’ve chosen in the service summary on the right side of the console.
4. Choose a service plan package. Your service’s memory, CPU, and disk resources are all determined by this, as are the number of servers.
Please note that Aiven has service plan for ClickHouse that starts at $200/month (Hobbyist) and $520/month for Startup. These plans use smaller ClickHouse instance size, so you will need to test performance to figure out which plan is best for you.
5. Enter a name for your service. To distinguish it from other services, you can enter a more recognizable name instead of the random name that is provided by default.
6. Under the summary on the right side of the console, select Create Service. You are brought back to the Services view by this. A status indicator indicates that your brand-new service is currently being created.
7. Select the service’s name title. The service’s Overview page opens.
Your service’s connection parameters, current status, and configuration options are all displayed in this view.
While the service is being created for you, the status is Rebuilding. The status changes to Running when the service is finished. Services start up in a few minutes on average, but performance varies by cloud provider and location, and it may take longer in some cases. As you can see in the screenshot above, you have access to integrations, metrics, logs, DB and Tables, user roles, … in 1 tab which make the access to your ClickHouse instance configuration and feactures easy.
Loading Data into Aiven Clickhouse:
First, we need to create our tables in Aiven, The steps to do the work are very simple and clear, just create your database first in the database & tables section, then create tables one by one in the Query Editor.
1. Creating database:
create table tpch ;
2. Creating tables:
We use the same script used to create the tables.
3. Load data into Aiven Clickhouse:
Since our data is stored in S3, we need to upload our data into our local server. Because Aiven does not support S3 at this time (they will be adding this feature soon ) we had to upload all the data in our server and inserting our data into ClickHouse via Clickhouse-Client:
gunzip -c lineitem.tbl.gz | clickhouse-client
–query “SET format_csv_delimiter = ‘|’; INSERT INTO tpch.lineitem FORMAT CSV”
Note: for running multi query into ClickHouse you need to include this multiquery command as well.
Performing the different queries in Aiven Clickhouse:
1. The 1st query took 0.054s to execute.
2. The 2nd query took 11.773s to execute.
3. The 3rd took 58.711s to execute.
4. the 4th query took 50.631s to execute.
5. the 5th query took 53.746s to execute.
6. the 6th query took 39.75s to execute.
7, 8, 9, 10, 11: All queries No. 7-8-9-10-11 had a memory problem and were not executed.
Clickhouse vs Aiven Query performance Results
Note : All queries were executed both in ClickHouse Cloud and Aiven in a similar instance size.
the instance size is :
16 GB RAM
ClickHouse Cloud vs. Aiven ClickHouse Conclusion
We appreciate the ease to load data in ClickHouse Cloud from S3 object storage. Aiven should be supporting that feature soon. There is clear evidence that ClickHouse Cloud queries ran faster that ClickHouse Aiven. We did not perform any tuning on either Cloud service. Some additional tunning might be required. Aiven let us know that the configuration we selected uses network attach disk (EBS), which is definitely going to be slower than ClickHouse, which uses a massive caching layer on top of storage.
In both cases however, Clickhouse was unable to run query 7,8,9,10, and 11. This clearly shows some ClickHouse limitation when it comes to performing some aggregated queries and joining multiple large tables.
ClickHouse Cloud vs. Aiven ClickHouse UI Ease of setup:
Both ClickHouse Cloud and Aiven ClickHouse offer easy setup and user-friendly interfaces, but there are some differences between the two.
ClickHouse Cloud is a fully managed service that provides a simple and intuitive interface for managing your ClickHouse instances. It offers an easy setup process with a quick start guide and automated deployment. The UI is user-friendly and provides a dashboard to manage your instances, monitor usage and query performance, and perform backups and restores.
Aiven ClickHouse, on the other hand, is also a managed service that offers a web-based interface for managing your ClickHouse instances. The setup process is easy, and Aiven offers automated deployment and scaling of your instances. The UI is user-friendly and offers a dashboard to monitor your instances, configure settings, and manage backups and restores.
Both services offer similar features and ease of setup, but ClickHouse Cloud may be a better choice for those who want a more streamlined experience and do not need as much control over their instances. Aiven ClickHouse may be a better choice for those who want more advanced customization options and control over their instances. Ultimately, the choice between the two will depend on your specific needs and preferences.
ClickHouse Cloud vs. Aiven ClickHouse – Ease of Data Ingestion
Both ClickHouse Cloud and Aiven ClickHouse offer easy ways to load data into ClickHouse instances, but there are some differences between the two.
ClickHouse Cloud provides an easy-to-use interface for loading and exporting data into your instances, especially from S3. You can upload data files directly through the dashboard or use a variety of integrations, including JDBC, ODBC, and HTTP. ClickHouse Cloud also supports various data formats, including CSV, JSON, and Avro.
Aiven ClickHouse also offers an easy way to load data into your instances. You can upload data files through the web interface, use integrations such as JDBC, ODBC, or HTTP, or use a variety of data connectors such as Kafka or PostgreSQL. Aiven also supports various data formats, including CSV, JSON, and Parquet.
Both services offer similar features for loading data, but the ease of use may vary depending on your specific use case. ClickHouse Cloud may be a better choice if you want a simple and straightforward way to load data into your instances, while Aiven ClickHouse may be a better choice if you need more advanced options or want to integrate with other data sources.
Ultimately, the choice between the two will depend on your specific needs and preferences for loading data into ClickHouse.
ClickHouse Cloud vs. Aiven ClickHouse – Support:
Both ClickHouse Cloud and Aiven ClickHouse offer strong customer support options to help users resolve any issues they may encounter.
ClickHouse Cloud offers a variety of support options, including email support, a knowledge base, and community forums. ClickHouse Cloud also provides users with access to a dedicated support team to help troubleshoot issues and provide guidance. Additionally, ClickHouse Cloud provides proactive monitoring and maintenance to ensure the performance and reliability of your instances.
Aiven ClickHouse also offers a variety of support options, including email support, 24/7 chat support, and community forums. Aiven also provides users with access to a dedicated support team to help resolve any issues that arise. Additionally, Aiven offers proactive monitoring and alerting to help prevent issues and ensure the performance of your instances.
Both services provide strong customer support options, but the level of support may vary depending on your specific needs and the plan you choose. ClickHouse Cloud may be a better choice if you need more personalized support and prefer email support, while Aiven ClickHouse may be a better choice if you need more immediate support and prefer chat support.
Ultimately, the choice between the two will depend on your specific support needs and preferences.
ClickHouse Cloud vs. Aiven ClickHouse – Price:
ClickHouse Cloud and Aiven ClickHouse offer different pricing models, with varying features and costs.
ClickHouse Cloud offers a pay-as-you-go pricing model, with prices based on the resources you consume, including CPU, memory, and storage. ClickHouse Cloud offers a free trial with $200 in credit to use within 30 days. After the free trial, pricing varies based on usage and the number of instances you deploy.
Aiven ClickHouse offers a similar pay-as-you-go pricing model, with prices based on the resources you consume, including CPU, memory, and storage. Aiven ClickHouse also offers a free trial with $300 in credit to use within 30 days. After the free trial, pricing varies based on usage and the number of instances you deploy. Aiven ClichHouse cheapest plan starts at $200/month, a great way to get your feet with ClickHouse without breaking the bank.
Overall, the pricing of both services is relatively comparable, with both offering similar pricing structures based on resource consumption. The choice between the two may depend on your specific needs and preferences for pricing, as well as the features and services included with each plan.
It’s worth noting that additional costs may apply for certain features, such as data transfer or backups, so it’s important to carefully review the pricing details for each service before deciding.
Regardless, unless you have a Clickhouse DBA on staff “database as a service” is the solution to go. Managing databases yourself plus infrastructure costs is not worth the time and money unless you are a large corporation and Cloud solution is more expensive than running your own cluster.
Clickhouse supports lots of features : External tables, time series data, real time analytics, and more. I’m sure many features will be added.
In the end, columnar databases are the best engine for analytical queries. Whether you decided to run Clickhouse on a single instance or multiple nodes it’s up to you. Just make sure to test your query performance before you go into Production.
Published On : 05/22/2023