Incident Lake Interview: ClickHouse Edition (Part 1)

Why Did SIGQ Migrate from BigQuery to ClickHouse?

June 3, 2026

SIGQ Inc. provides "Incident Lake," an AI agent specialized in incident management. The company originally used BigQuery as its data platform, but migrated to ClickHouse about six months ago and is now running production workloads for multiple customers on ClickHouse.

In late April, Alexey Milovidov (hereinafter “Alexey”), co-founder and CTO of ClickHouse, and Takaaki Kanetsuki (hereinafter “Kanetsuki”), CEO of SIGQ Inc., sat down for a discussion in Tokyo. This article explores how ClickHouse addressed the challenges SIGQ was facing and examines the background behind the migration.

This is the first part of a two-part interview series. For the second part, please see “The Role of SRE in the AI Era: Insights from Data and Incident Management.”

How a C++ Engineer Created a Database to Analyze Some of the World’s Highest Volumes of Traffic

Q. Alexey, you’ve been developing ClickHouse since 2009. What initially led you to work with databases? Could you tell us about that, including the challenges you faced at the beginning?

Alexey: I wasn’t originally a database developer; I was a C++ engineer at a large company. I was in charge of a service that provided visitor analytics reports to website operators. That’s where I ran into a major challenge. On a platform generating hundreds of billions of events every day, we had to offer virtually unlimited report customization and let users retrieve any kind of aggregate data they wanted. What’s more, we needed real-time performance: results had to appear the moment a user defined a report.

Naturally, MySQL couldn’t handle this use case, so I began testing various databases. In the process, I learned about column-oriented databases and big data platforms, and compared existing products in terms of performance, compression rates, and cost. As a result, I began to think that I might be able to develop a column-oriented storage system specialized for aggregation, sorting, and filtering on my own. This led to the development of a prototype called “OLAPServer,” which I found performed better than existing systems.

*OLAP (Online Analytical Processing): A technology that enables the immediate multidimensional analysis of vast amounts of data stored in data warehouses and other systems. For example, it is used to analyze sales data across multiple dimensions, such as region and time.

Q. So that’s how it led to the current version of ClickHouse.

Alexey: At that point, I didn’t have a clear vision for ClickHouse, but I had a vague idea that I wanted to make the system more general-purpose so it could handle a wide variety of queries. I started working on it in my spare time, and eventually began working on it during office hours as well. The first version of ClickHouse was released in 2012. It was used not only by my team but also by other departments within the company, such as e-commerce and business analytics.

Later, as I attended technical conferences, I realized there was enormous demand for this kind of database. Engineers at many companies were building their own prototypes similar to mine. I was convinced we should open-source it, so I persuaded the company to release it in 2016. This June marks the 10th anniversary of ClickHouse becoming open-source.

The BigQuery limitations Incident Lake was facing

Q. SIGQ recently migrated Incident Lake’s data platform from BigQuery to ClickHouse Cloud. What challenges did this migration address?

Kanetsuki: First, let me briefly introduce the key features of Incident Lake. Incident Lake is an incident management platform that integrates AI. While many existing tools are designed for developers, our tool is focused on managers who oversee incident response.

In the midst of an incident response, requests for troubleshooting and report generation fly back and forth, but these tasks can be extremely burdensome. That’s where Incident Lake’s AI agents step in to assist with tasks such as timeline generation and report creation. As a result, when an incident occurs, Incident Lake collects not only error logs but also communication logs from platforms like Slack and Teams.

Alexey: So, you’re saying it automatically generates a summary using the collected data.

Kanetsuki: That’s right. And the most critical factor in implementing this system is data freshness. If we can’t determine whether the current data is accurate or what the latest information is, the support provided by AI agents simply cannot be trusted.

We store two types of data: raw data, such as user messages, and vectorized data. We vectorize all generated data and reports to enable a search experience similar to semantic search.

Data freshness is critical for tracking the constantly evolving situation during incident response. Therefore, it is necessary to update the collected data in line with the latest timeline.

However, at the time, BigQuery had a limitation that prevented us from updating data immediately after insertion, which posed a challenge in maintaining data freshness. So, we decided to take this opportunity to migrate to ClickHouse Cloud.

Alexey: Many ClickHouse users are companies that have been using BigQuery for years. As with SIGQ, many are migrating due to latency issues. Many others are migrating because ClickHouse offers a richer feature set.

Another key factor is cost. BigQuery uses a pay-as-you-go pricing model where you’re charged per query and per volume of data processed. This is fine for occasional manual queries, but it’s not well-suited for users who execute a large volume of queries, especially AI agents. AI agents often execute 10 or 20 queries in a single request, which causes costs to skyrocket in BigQuery.

Q. How does Incident Lake utilize ClickHouse’s features? In particular, could you tell us about any features that Mr. Kanetsuki finds particularly valuable?

Kanetsuki: To give one example, there is ClickHouse’s approximate vector search feature. At Incident Lake, we use this feature to retrieve similar incidents.

When responding to incidents, it is crucial to determine whether similar cases have occurred in the past. However, finding similar incidents through simple vector search is actually very difficult. That is why Incident Lake leverages ClickHouse’s approximate vector search to provide a numerical representation of similarity. This similarity display and the similar incident search feature have been very well received by our customers.

Before migrating to ClickHouse, we had implemented our own similarity calculation, but it was a very resource-intensive process. By leveraging ClickHouse’s approximate vector search, we feel we’ve been able to significantly reduce the load while enhancing the value of Incident Lake.
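*Editor’s sketch: To illustrate why a hand-rolled similarity calculation becomes resource-intensive, the following minimal Python example scores a query embedding against every stored incident using cosine similarity. It is not SIGQ’s implementation and does not show ClickHouse’s API; the incident IDs, the three-dimensional vectors, and the function names are all hypothetical. An exact scan like this touches every stored vector on each search, which is the cost that an approximate vector index is designed to avoid.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # similarity is undefined for a zero vector; treat as no match
    return dot / (norm_a * norm_b)


def most_similar(query, incidents, top_k=3):
    """Exact search: score the query against every incident, keep the best top_k."""
    scored = [
        (cosine_similarity(query, vector), incident_id)
        for incident_id, vector in incidents.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(incident_id, round(score, 3)) for score, incident_id in scored[:top_k]]


# Hypothetical toy embeddings; real embeddings have hundreds of dimensions.
past_incidents = {
    "INC-001": [0.9, 0.1, 0.0],
    "INC-002": [0.0, 1.0, 0.0],
    "INC-003": [0.7, 0.3, 0.1],
}

results = most_similar([1.0, 0.0, 0.0], past_incidents, top_k=2)
print(results)
```

The score returned for each incident is the kind of numerical representation of similarity described above; the work grows linearly with the number of stored incidents.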

Q. I heard that SIGQ was the first company in Japan to utilize ClickHouse’s approximate vector search feature.

Kanetsuki: Yes. We started using this new feature the very day it became available on ClickHouse Cloud. In fact, we had been eagerly awaiting its release—so much so that we even asked the ClickHouse team, “When will it be available?”

Alexey: We’re truly delighted to have customers like this. We’re always looking for technical partners who can test new features using real-world workloads. Mr. Kanetsuki’s enthusiasm may have been the deciding factor in our decision to release this feature on ClickHouse Cloud.

Why Do Some of the World’s Largest AI Companies Choose ClickHouse?

Q. ClickHouse is known as a high-performance analytical database capable of returning results in sub-second (less than one second) response times for queries on petabyte-scale data and complex telemetry. How is it able to handle such massive data volumes while maintaining real-time performance?

Alexey: First and foremost, with ClickHouse, there is no need to separate transactional (OLTP) data from analytical (OLAP) data. One of the common misconceptions about big data is the idea that transactional data and the fresh data used for analysis should be managed separately.

*OLTP (Online Transaction Processing): A technology designed to accurately read and write small amounts of data (transactions), such as bank transfers and orders on e-commerce sites.

A typical example is the belief that “all historical data should be stored in S3 or an RDBMS, and only a small amount of fresh data necessary for real-time analysis should be placed in the analytics database.” In reality, by combining multiple technologies, it is possible to achieve near-real-time data ingestion and sub-second response times for analytical queries. Specifically, this involves technologies such as column-oriented storage, which enables high-speed query processing, and the MergeTree data structure, which facilitates rapid data ingestion.

Furthermore, ClickHouse stores all data in shared storage and combines it with a hierarchical caching scheme. By placing caches on the local machine’s SSD and in memory, it delivers in-memory-level performance for frequently accessed data, NVMe SSD performance for the majority of data, and the cost efficiency of object storage for the system as a whole.

ClickHouse’s strength lies in its ability to provide all of this as a single, integrated system.
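*Editor’s sketch: The hierarchical caching idea Alexey describes (memory, then local SSD, then shared object storage) can be pictured as a read-through cache. The Python example below is a conceptual illustration only, not ClickHouse’s implementation: the class name, the dictionary stand-ins for each tier, the tiny memory capacity, and the simple oldest-first eviction are all assumptions made for the sake of the example.

```python
class TieredStore:
    """Read-through cache: memory first, then local SSD, then object storage."""

    def __init__(self, object_storage, memory_capacity=2):
        self.object_storage = object_storage  # authoritative copy of all data
        self.ssd_cache = {}                   # stand-in for a local SSD cache
        self.memory_cache = {}                # stand-in for an in-memory cache
        self.memory_capacity = memory_capacity

    def read(self, key):
        """Return (value, tier), where tier names the level that served the read."""
        if key in self.memory_cache:
            return self.memory_cache[key], "memory"
        if key in self.ssd_cache:
            value, tier = self.ssd_cache[key], "ssd"
        else:
            value, tier = self.object_storage[key], "object_storage"
            self.ssd_cache[key] = value  # keep a local copy for later reads
        self._promote(key, value)
        return value, tier

    def _promote(self, key, value):
        # When memory is full, evict the oldest entry (dicts keep insertion order).
        if len(self.memory_cache) >= self.memory_capacity:
            oldest = next(iter(self.memory_cache))
            del self.memory_cache[oldest]
        self.memory_cache[key] = value


store = TieredStore({"part_1": b"a", "part_2": b"b", "part_3": b"c"})
tiers = [store.read(k)[1] for k in ["part_1", "part_1", "part_2", "part_3", "part_1"]]
print(tiers)
```

In this toy run, the first read of each part falls through to object storage, a repeated read is served from memory, and a part that has been evicted from memory is still served from the SSD tier rather than from object storage.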

Q. Many of the world’s largest AI companies, such as OpenAI and Anthropic, use ClickHouse for observability. Why did they choose ClickHouse?

Alexey: The answer is simple. It’s because the massive amounts of data that AI companies need to handle are beyond the capabilities of any other technology. In fact, they tried many different technologies, but they all broke down. ClickHouse, however, worked surprisingly well, so they have continued to use it.

ClickHouse’s strength lies in its scalability. AI companies must handle tens of petabytes of data that come in every day. With ClickHouse, you can build clusters comprising thousands of nodes and execute distributed queries on them. You can also efficiently create text indexes for massive amounts of log data. ClickHouse handles these tasks far more effectively than existing systems.

In addition, its high data compression ratio is a key feature. Companies such as Comcast and eBay have achieved compression ratios of 20 to 30 times. Since other systems do not offer this level of storage efficiency, ClickHouse is the clear choice for environments that store tens of petabytes of data. Furthermore, when combined with object storage, it becomes a nearly perfect solution.

In fact, leading AI companies such as OpenAI, Anthropic, Tesla, and xAI are leveraging ClickHouse at scale for observability, analyzing fresh data at its natural scale.

Kanetsuki: That explanation really made sense to me. How the nodes scale up and how the data is distributed—once you understand that mechanism, it becomes intuitively clear why ClickHouse can continue to handle petabyte-scale data.

The future of databases lies in the integration of OLTP and OLAP

Q. Earlier, Alexey mentioned that “there is no need to separate transactional data from analytical data in ClickHouse.” In the era of AI and large-scale data infrastructure, do you think these two types of data will eventually be integrated? Please share your thoughts on the future of next-generation databases.

Kanetsuki: Actually, in addition to my business work, I’m also conducting research in the database industry, specifically on an approach called HTAP. In my previous job, I was involved in operating large-scale data analysis systems, but I struggled with latency management when transferring data from OLTP to OLAP. That’s why I became very interested in the HTAP approach, which eliminates the need for data integration altogether, and began my research. In March of this year, I also presented a paper at DOLAP, a conference held in conjunction with EDBT/ICDT (one of Europe’s most renowned database conferences).

*HTAP: A technology that runs both transactional processing (OLTP) and analytical processing (OLAP) on a single platform

The current challenge with HTAP is data synchronization. Internally, data written to the OLTP system is transferred to the OLAP system, but this process is resource-intensive, and maintaining the data flow is difficult. Even though we want to analyze the written data immediately, if it takes as long as an hour for that data to become available for queries, we ultimately cannot grasp the latest situation.

Alexey: One of the challenges in building an HTAP database is that the internal data structures required for each workload are fundamentally different. An OLAP database requires columnar storage and sparse indexes on compressed blocks to enable high-speed processing. However, this data structure makes it difficult to efficiently read or update individual records. On the other hand, an OLTP database requires fine-grained indexes for each record and a data structure that allows records to be replaced quickly.

There are two approaches to this. One involves finding a compromise within a single data structure. In-memory databases are a prime example of this, but they have not been particularly successful in practice. This is because, when dealing with large-scale data, a unified data structure is not well-suited for analytical processing; it breaks down as soon as the data exceeds the capacity of memory.

Another approach involves synchronizing two different data structures quickly. This is currently the more common method. ClickHouse Cloud also employs this approach. Specifically, we provide both ClickHouse and ClickHouse Postgres on high-speed NVMe storage and keep them in sync. This allows ClickHouse to access data stored in Postgres with reasonable latency.
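*Editor’s sketch: The structural difference Alexey mentions, sparse indexes over blocks for OLAP versus fine-grained per-record indexes for OLTP, can be illustrated with a toy example. The Python code below is a simplified model, not ClickHouse’s or Postgres’s actual data structures; the block size, the sample rows, and the function name are hypothetical, and compression is omitted. In a real columnar engine the candidate block would also have to be decompressed before it could be scanned.

```python
from bisect import bisect_right

BLOCK_SIZE = 4  # hypothetical block size, kept tiny for illustration

# Twenty rows sorted by key (keys 0, 2, 4, ..., 38).
rows = [(key, f"value-{key}") for key in range(0, 40, 2)]

# OLAP-style layout: rows grouped into blocks, one index entry per block.
blocks = [rows[i:i + BLOCK_SIZE] for i in range(0, len(rows), BLOCK_SIZE)]
sparse_index = [block[0][0] for block in blocks]  # first key of each block

# OLTP-style layout: one index entry per record.
dense_index = {key: value for key, value in rows}


def sparse_lookup(key):
    """Locate the candidate block via the sparse index, then scan inside it.

    Returns (value_or_None, number_of_rows_scanned).
    """
    block_no = bisect_right(sparse_index, key) - 1
    if block_no < 0:
        return None, 0  # key is smaller than every stored key
    scanned = 0
    for row_key, value in blocks[block_no]:
        scanned += 1
        if row_key == key:
            return value, scanned
    return None, scanned


value, rows_scanned = sparse_lookup(14)
print(value, rows_scanned, len(sparse_index), len(dense_index))
```

The sparse index stays small (one entry per block), which suits scans over large ranges, but a single-record lookup has to read part or all of a block. The dense index answers the same lookup directly at the cost of one entry per record.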

Kanetsuki: At Incident Lake, we also use both Postgres and ClickHouse. While we aren’t currently synchronizing the two, we’d like to consolidate as much data as possible into ClickHouse. Unique indexes on IDs and SQL compatibility are currently posing some challenges—do you have any plans to address these in future development?

Alexey: The answer is yes, and it is already in development. As for SQL compatibility, tests using SQL Logic Test have shown over 97% compatibility, indicating broad support for standard SQL features.

What Is Needed for a Data Infrastructure in the AI Era

What has become clear through our discussion so far is that SIGQ’s decision to migrate to ClickHouse was not driven solely by cost savings. The more fundamental motivation was to meet the data infrastructure requirements for incident management in the age of AI—requirements that aligned closely with ClickHouse’s history and technical characteristics.

In the second part, we will explore how incident management and SRE practices will evolve in the age of AI on this data infrastructure.
