What is Cassandra’s compaction strategy?
Cassandra compaction is the process of reconciling multiple copies of data spread across different SSTables. Cassandra compacts SSTables as a background activity. Because compaction leaves fewer SSTables and fewer copies of each row to maintain, it improves read performance.
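The reconciliation idea can be sketched in a few lines of Python. This is a toy model, not Cassandra's implementation: an "SSTable" is assumed to be a plain dict from row key to a (write timestamp, value) pair, the function name compact is made up for illustration, and tombstones, TTLs, and per-column reconciliation are ignored.

```python
from typing import Dict, List, Tuple

# Toy model (assumption): an "SSTable" is a dict of row key -> (write timestamp, value).
SSTable = Dict[str, Tuple[int, str]]


def compact(sstables: List[SSTable]) -> SSTable:
    """Merge several tables into one, keeping only the newest copy of each row."""
    merged: SSTable = {}
    for table in sstables:
        for key, (timestamp, value) in table.items():
            # Keep this copy only if it is newer than the copy seen so far.
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    return merged


older = {"user:1": (100, "alice"), "user:2": (100, "bob")}
newer = {"user:1": (200, "alice2")}
result = compact([older, newer])
print(result)
```

After the merge, a read for user:1 has one table and one copy to consult instead of two, which is where the read-performance benefit comes from.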
How do you trigger a compaction in Cassandra?
Process
- Update a table to set the compaction strategy using the ALTER TABLE statement.
- Change the compaction strategy property to SizeTieredCompactionStrategy and specify the minimum number of SSTables to trigger a compaction using the min_threshold CQL attribute.
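The two steps above can be sketched as a small Python helper that builds the CQL statement. The helper name stcs_alter_statement and the keyspace and table names (demo, users) are hypothetical; the resulting statement would still have to be run in cqlsh or through a driver.

```python
def stcs_alter_statement(keyspace: str, table: str, min_threshold: int) -> str:
    """Build an ALTER TABLE statement that selects STCS with the given min_threshold."""
    return (
        f"ALTER TABLE {keyspace}.{table} "
        "WITH compaction = {"
        "'class': 'SizeTieredCompactionStrategy', "
        f"'min_threshold': {min_threshold}"
        "};"
    )


# Hypothetical keyspace and table names, used only for illustration.
statement = stcs_alter_statement("demo", "users", 6)
print(statement)
```

With min_threshold set to 6, size-tiered compaction waits until at least six SSTables of similar size exist before compacting them.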
What is anti-compaction in Cassandra?
Since an SSTable can contain data from any token range, the ranges that were actually repaired have to be split out from the rest; this is called anti-compaction. It means that an SSTable is split into two: one containing repaired data and one containing unrepaired data.
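A minimal sketch of that split, under simplifying assumptions: an SSTable is modeled as a list of (token, value) rows, repaired ranges are half-open (start, end] token intervals, and the function names are invented for this example. Real anti-compaction rewrites SSTable files on disk and tracks repair state in their metadata.

```python
from typing import List, Tuple

Row = Tuple[int, str]          # (token, value), a toy stand-in for a partition
TokenRange = Tuple[int, int]   # half-open token range (start, end]


def in_any_range(token: int, ranges: List[TokenRange]) -> bool:
    """Return True if the token falls inside at least one of the ranges."""
    return any(start < token <= end for start, end in ranges)


def anticompact(
    sstable: List[Row], repaired_ranges: List[TokenRange]
) -> Tuple[List[Row], List[Row]]:
    """Split one SSTable into a repaired part and an unrepaired part."""
    repaired = [row for row in sstable if in_any_range(row[0], repaired_ranges)]
    unrepaired = [row for row in sstable if not in_any_range(row[0], repaired_ranges)]
    return repaired, unrepaired


sstable = [(5, "a"), (15, "b"), (25, "c"), (35, "d")]
repaired, unrepaired = anticompact(sstable, [(10, 30)])
print(repaired, unrepaired)
```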
Does Cassandra compress data?
Cassandra lets operators configure compression per table. Compression reduces the size of data on disk by compressing each SSTable in chunks, with the chunk size set by the user-configurable chunk_length_in_kb option.
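Why the chunk size matters can be illustrated with a short Python sketch. It uses zlib from the standard library as a stand-in compressor (Cassandra's default is LZ4), and the function names compress_in_chunks and read_byte are made up for this example. The point is that a read only has to decompress the chunk that holds the requested offset, not the whole file.

```python
import zlib
from typing import List


def compress_in_chunks(data: bytes, chunk_length_in_kb: int) -> List[bytes]:
    """Compress data in independent fixed-size chunks."""
    chunk_size = chunk_length_in_kb * 1024
    return [
        zlib.compress(data[start:start + chunk_size])
        for start in range(0, len(data), chunk_size)
    ]


def read_byte(chunks: List[bytes], chunk_length_in_kb: int, offset: int) -> int:
    """Read one byte by decompressing only the chunk that contains it."""
    chunk_size = chunk_length_in_kb * 1024
    index, within = divmod(offset, chunk_size)
    return zlib.decompress(chunks[index])[within]


data = bytes(range(256)) * 1024          # 256 KiB of sample data
chunks = compress_in_chunks(data, 64)    # 64 KiB chunks
print(len(chunks), read_byte(chunks, 64, 70000) == data[70000])
```

Smaller chunks mean less data to decompress per read but usually a lower compression ratio, so the right value depends on the table's read pattern.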
When to use leveled compaction in Cassandra 1.0?
The leveled compaction strategy was introduced in Cassandra 1.0 to address the shortcomings of the size-tiered compaction strategy for some use cases. Unfortunately, it is not always clear which strategy to choose. This post will provide some guidance in choosing one compaction strategy over the other.
How can TWCS be used in Cassandra 2.x?
SSTables that fall into different time windows will not be compacted together. TWCS can be used with Cassandra 2.x by adding a jar file. Compaction options are set at the table level through CQL, for example in cqlsh. This allows each table to be optimized based on how it will be used. If no compaction strategy is specified, SizeTieredCompactionStrategy will be used.
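TWCS (TimeWindowCompactionStrategy) groups SSTables by time window, and only SSTables within the same window are candidates for compaction with each other. The sketch below shows that bucketing in Python. It assumes each SSTable is described by a name and a maximum write timestamp in seconds, and it uses hypothetical function names; it is not the strategy's actual code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def window_start(timestamp_s: int, window_size_s: int) -> int:
    """Round a timestamp down to the start of its time window."""
    return timestamp_s - (timestamp_s % window_size_s)


def group_by_window(
    sstables: List[Tuple[str, int]], window_size_s: int
) -> Dict[int, List[str]]:
    """Bucket SSTables by window; only members of one bucket compact together."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for name, max_timestamp_s in sstables:
        buckets[window_start(max_timestamp_s, window_size_s)].append(name)
    return dict(buckets)


HOUR = 3600
# (name, max write timestamp in seconds); values are made up for illustration.
sstables = [("a", 100), ("b", 3500), ("c", 3700), ("d", 7300)]
buckets = group_by_window(sstables, HOUR)
print(buckets)
```

Here a and b share the first one-hour window and could be compacted together, while c and d each sit alone in later windows.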
Why do large partitions cause extra work in Cassandra?
Large partitions under STCS and LCS generate significant additional work during compaction. When partition data is spread across a number of windows or buckets, partitions can grow significantly larger before they have the heap and CPU impact during compaction that large partitions have today.
What is the JMX metric for Apache Cassandra?
Additionally, we contributed CASSANDRA-13015 to expose JMX metrics on failed compactions, as well as compactions that had to drop SSTables due to limited disk space. This will help operators know exactly how much of an impact their disks have had on Cassandra’s ability to perform compaction.