Cassandra Interview Questions

A list of frequently asked Cassandra interview questions and answers is given below.

1) What is Cassandra?

Cassandra is a distributed NoSQL database system. It is an open-source data store designed to store and manage large volumes of data with no single point of failure.


2) In which language is Cassandra written?

Cassandra is written in Java. It was originally developed at Facebook, supports flexible schemas, and is highly scalable for big data.


3) Who were the original authors of Cassandra?

The original authors of Cassandra are Avinash Lakshman and Prashant Malik. It was initially developed at Facebook to power the Facebook inbox search feature.


4) Which query language is used in Cassandra database?

Cassandra introduced its own Cassandra Query Language (CQL). CQL is a simple interface for accessing Cassandra, as an alternative to the traditional Structured Query Language (SQL).


5) What are the benefits/ advantages of Cassandra?

Advantages/ Benefits of Cassandra:

  • Cassandra was designed to handle big data workloads across multiple nodes without any single point of failure.

  • Cassandra delivers near real-time performance, simplifying the work of developers, administrators, data analysts, and software engineers.

  • It provides elastic scalability and can easily be scaled up or down as requirements change.

  • It is fault tolerant and offers tunable consistency.

  • It is a wide-column (column-family) database.

  • It has no single point of failure.

  • There is no need for a separate caching layer.

  • It has a flexible schema design.

  • It has flexible data storage, easy data distribution, and fast writes.

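  • It offers atomic, isolated, and durable writes at the row (partition) level with tunable consistency, although it does not provide full ACID (Atomicity, Consistency, Isolation, and Durability) transactions in the way relational databases do.

  • It is multi-data-center and cloud capable.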


6) How does Cassandra store data?

Cassandra stores all data as bytes. When you specify a validator, Cassandra ensures that those bytes are encoded as required, and a comparator then orders the columns according to the ordering specific to that encoding.
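
The effect of a comparator can be illustrated with a toy sketch. This is not Cassandra's code: the helper names utf8_comparator and long_comparator are hypothetical stand-ins, and the point is only that the same values sort differently depending on how their bytes are interpreted.

```python
# Toy illustration: column names are stored as raw bytes and a comparator
# decides their order. Both helpers are hypothetical, not Cassandra APIs.

def utf8_comparator(raw: bytes) -> str:
    # Order by the decoded text (lexicographic order).
    return raw.decode("utf-8")

def long_comparator(raw: bytes) -> int:
    # Order by the number the bytes encode (8-byte big-endian, signed).
    return int.from_bytes(raw, byteorder="big", signed=True)

text_columns = [s.encode("utf-8") for s in ("9", "10", "2")]
long_columns = [n.to_bytes(8, byteorder="big", signed=True) for n in (9, 10, 2)]

sorted_as_text = [c.decode("utf-8") for c in sorted(text_columns, key=utf8_comparator)]
sorted_as_long = [long_comparator(c) for c in sorted(long_columns, key=long_comparator)]

print(sorted_as_text)  # ['10', '2', '9']
print(sorted_as_long)  # [2, 9, 10]
```

Treated as text, "10" sorts before "2"; treated as numbers, the columns come back in numeric order.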


7) What was the design goal of Cassandra?

The main design goal of Cassandra was to handle big data workloads across multiple nodes without a single point of failure.


8) How many types of NoSQL databases are there? Give some examples.

There are mainly 4 types of NoSQL databases:

  • Document stores (MongoDB and CouchDB)

  • Key-value stores (Redis and Voldemort)

  • Column stores (Cassandra)

  • Graph stores (Neo4j and Giraph)


9) What are the main components of the Cassandra data model?

The main components of the Cassandra data model are:

  • Cluster

  • Keyspace

  • Column

  • Column Family


10) What are the other components of Cassandra?

Some other components of Cassandra are:

  • Node

  • Data Center

  • Commit log

  • Mem-table

  • SSTable

  • Bloom Filter


11) What is a keyspace in Cassandra?

In Cassandra, a keyspace is a namespace that determines how data is replicated on nodes. A cluster typically contains one keyspace per application.


12) What are the different composite keys in Cassandra?

In Cassandra, composite keys are used to define a key or a column name as a concatenation of values of different types. There are two types of composite key in Cassandra:

  • Row Key

  • Column Name


13) What is data replication in Cassandra?

Data replication is the copying of data from a database on one computer or server to a database on another, so that all users share the same information. Cassandra stores replicas on multiple nodes to ensure reliability and fault tolerance. The replication strategy decides the nodes where replicas are placed.
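
Replica placement can be sketched as a walk around a token ring. The code below is a simplified illustration and not Cassandra's implementation: the function names are hypothetical, MD5 stands in for the real partitioner hash, and each node owns a single token.

```python
import bisect
import hashlib

def token(value: str) -> int:
    # Stand-in hash for illustration; Cassandra's default partitioner uses Murmur3.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

def replicas_for(key: str, nodes: list, replication_factor: int) -> list:
    """Walk the ring clockwise from the key's token and pick distinct nodes."""
    ring = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in ring]
    # The first node whose token is >= the key's token owns the key (wrapping around).
    start = bisect.bisect_left(tokens, token(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = replicas_for("user:42", nodes, replication_factor=3)
print(placement)
```

With a replication factor of 3 on a four-node ring, every key lands on three distinct nodes, so the data survives the loss of any single node.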


14) What is a node in Cassandra?

In Cassandra, a node is a place where data is stored.


15) What do you mean by a data center in Cassandra?

In Cassandra, a data center is a collection of related nodes.


16) What do you mean by the commit log in Cassandra?

In Cassandra, the commit log is a crash-recovery mechanism. Every write operation is written to the commit log.


17) What do you mean by a column family in Cassandra?

In Cassandra, a column family is a container for an ordered collection of rows.


18) What do you mean by consistency in Cassandra?

Consistency in Cassandra describes how synchronized and up to date a row of data is across all of its replicas.


19) How many types of tunable consistency are supported in Cassandra?

Cassandra supports two kinds of consistency: eventual consistency and strong consistency.

Eventual consistency guarantees that, when no new updates are made to a given data item, all reads eventually return the last updated value. Systems with eventual consistency are said to achieve replica convergence.

Cassandra provides strong consistency when the following condition holds:

R + W > N

Here

N: Number of replicas

W: Number of nodes that need to agree for a successful write

R: Number of nodes that need to agree for a successful read
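
The condition can be checked with a few lines of code. This is a minimal sketch; the function name is_strongly_consistent is hypothetical and only evaluates the inequality above.

```python
def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    """Return True when read and write replica sets must overlap (R + W > N)."""
    return r + w > n

# N = 3 replicas, 2 nodes per write and 2 per read: 2 + 2 > 3
print(is_strongly_consistent(n=3, w=2, r=2))  # True
# N = 3 replicas, 1 node per write and 1 per read: 1 + 1 > 3 is false
print(is_strongly_consistent(n=3, w=1, r=1))  # False
```

When the inequality holds, every read set shares at least one node with every write set, so a read always reaches a replica that saw the latest successful write.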

20) What is tunable consistency in Cassandra?

Tunable consistency is a distinguishing feature of Cassandra and one reason it is a popular choice. Consistency refers to how up to date and synchronized a row of data is across all of its replicas. With tunable consistency, users select the consistency level best suited to each use case.
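
As a rough sketch of what choosing a level means, the helper below maps three common consistency levels (ONE, QUORUM, ALL) to the number of replicas that must respond. The function name replicas_required is hypothetical, and the other levels Cassandra offers are left out.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replica responses needed for a few common consistency levels."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A quorum is a majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"level not covered by this sketch: {level}")

for level in ("ONE", "QUORUM", "ALL"):
    print(level, replicas_required(level, replication_factor=3))
```

Lower levels favor latency and availability, while higher levels favor up-to-date reads.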


21) What is the syntax to create keyspace in Cassandra?

  CREATE KEYSPACE <keyspace_name> WITH <properties>
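
For a concrete statement, the sketch below builds a CREATE KEYSPACE command as a string. The helper name create_keyspace_cql and the keyspace name demo_ks are made up for illustration. The replication map uses SimpleStrategy, which suits a single data center, and the name is interpolated directly, so this should not be used with untrusted input.

```python
def create_keyspace_cql(name: str, replication_factor: int) -> str:
    """Build a CREATE KEYSPACE statement that uses SimpleStrategy replication."""
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        f"WITH replication = {{'class': 'SimpleStrategy', "
        f"'replication_factor': {replication_factor}}};"
    )

statement = create_keyspace_cql("demo_ks", 3)
print(statement)
```

The resulting statement can be pasted into cqlsh or passed to a driver session.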


22) What is a column family in Cassandra?

In Cassandra, a collection of rows is referred to as a "column family".


23) How does Cassandra perform a write?

Cassandra performs a write in two steps:

  • The write is first appended to the commit log on disk and then applied to an in-memory structure known as the memtable.

  • When both steps succeed, the write is acknowledged as successful.

  • When the memtable fills up, its contents are flushed to disk as an SSTable (sorted string table).
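
The write path can be sketched with a toy class. This is a simplified model, not Cassandra's storage engine: ToyNode and memtable_limit are hypothetical names, the "disk" structures are plain Python lists, and the flush threshold is a row count instead of a memory size.

```python
class ToyNode:
    """Toy write path: commit log, then memtable, then flush to an SSTable."""

    def __init__(self, memtable_limit: int = 2):
        self.commit_log = []   # stands in for the append-only log on disk
        self.memtable = {}     # in-memory writes, keyed by row key
        self.sstables = []     # flushed snapshots that are never modified
        self.memtable_limit = memtable_limit

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # 1. record the write durably first
        self.memtable[key] = value            # 2. then apply it in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Freeze the memtable as a key-sorted tuple and start a fresh one.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

node = ToyNode(memtable_limit=2)
node.write("b", "2")
node.write("a", "1")  # the second write fills the memtable and triggers a flush
node.write("c", "3")
print(node.sstables)  # [(('a', '1'), ('b', '2'))]
print(node.memtable)  # {'c': '3'}
```

Writes stay fast because the commit log is append-only and the memtable lives in memory; sorting happens only once, at flush time.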


24) What is a memtable?

A memtable is an in-memory write-back cache that holds content in key and column format. Data in a memtable is sorted by key, and each column family has its own memtable, from which column data is retrieved by key. The memtable stores writes until it is full and is then flushed to disk.


25) What is an SSTable?

SSTable is short for 'Sorted String Table'. It is an important data file in Cassandra that receives the contents of memtables when they are flushed. SSTables are stored on disk and exist for each Cassandra table.


26) How is an SSTable different from a relational table?

SSTables are immutable: once written, no data items can be added or removed. For each SSTable, Cassandra creates three additional structures: a partition index, a partition summary, and a Bloom filter.


27) What are the management tools in Cassandra?

DataStax OpsCenter: A web-based management and monitoring solution for Cassandra clusters and DataStax. It is free to download, and an additional edition of OpsCenter is also available.

SPM: SPM primarily monitors Cassandra metrics along with various OS and JVM metrics. Besides Cassandra, it also monitors Hadoop, Spark, Solr, Storm, ZooKeeper, and other big data platforms.


28) What are the main features of SPM in Cassandra?

The main features of SPM are:

  • Correlation of events and metrics

  • Distributed transaction tracing

  • Creating real-time graphs with zooming

  • Detection and heartbeat alerting


29) What is a cluster in Cassandra?

In Cassandra, the cluster is the outermost container for keyspaces. It arranges the nodes in a ring and assigns data to them. Data is replicated across nodes, so a replica on another node can take over if a node fails.


30) When can you use ALTER KEYSPACE?

ALTER KEYSPACE can be used to change keyspace properties such as the number of replicas and the durable_writes setting.


31) What is Cassandra-Cqlsh?

cqlsh is the command-line shell for the Cassandra Query Language (CQL) and is used to communicate with the database. With cqlsh you can do the following things:

  • Define a schema

  • Insert data

  • Execute a query


32) What are the differences between a node, a cluster, and datacenter in Cassandra?

Node: A node is a single machine running Cassandra.

Cluster: A cluster is a collection of nodes that together store related data.

Datacenter: A datacenter is a useful component when serving customers in different geographical areas. Different nodes of a cluster can be grouped into different data centers.


33) What is a CQL collection in Cassandra?

A CQL collection is used to store multiple values in a single column. CQL provides the following collection types:

  • List: Used when the order of the elements must be maintained and a value may be stored multiple times (duplicates are allowed).

  • SET: Used to store a group of unique elements, which are returned in sorted order.

  • MAP: Used to store key-value pairs.
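
The three collection types behave much like familiar Python containers. The snippet below is only an analogy with made-up values, not CQL: a list keeps order and duplicates, a set keeps unique elements (sorted here to mimic how CQL returns them), and a map holds key-value pairs.

```python
# Python analogues of the three CQL collection types (illustration only).
tags_list = ["red", "blue", "red"]                    # list: keeps order, allows duplicates
tags_set = sorted({"red", "blue", "red"})             # set: unique elements, returned sorted
phone_map = {"home": "555-0100", "work": "555-0101"}  # map: key-value pairs

print(tags_list)          # ['red', 'blue', 'red']
print(tags_set)           # ['blue', 'red']
print(phone_map["work"])  # 555-0101
```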


34) What is the use of Bloom Filter in Cassandra?

A Bloom filter is a space-efficient probabilistic data structure used to test whether an SSTable may contain data for a particular row. It can return false positives but never false negatives. In Cassandra, Bloom filters save disk I/O when performing a key lookup.
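
A minimal Bloom filter can be sketched as follows. This is a generic textbook version and not Cassandra's implementation: the class, the bit-array size, and the use of salted SHA-256 hashes are all assumptions chosen for readability.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting a SHA-256 hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))

sstable_filter = BloomFilter()
for row_key in ("user:1", "user:2", "user:3"):
    sstable_filter.add(row_key)

print(sstable_filter.might_contain("user:2"))  # True
```

When might_contain returns False, the SSTable can be skipped without touching disk, which is where the I/O saving comes from.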


35) How does Cassandra delete data?

SSTables are immutable, so a row cannot be removed from an SSTable in place. When a row needs to be deleted, Cassandra writes a special marker called a tombstone. When the data is read, anything covered by a tombstone is treated as deleted.
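
The idea can be sketched with dictionaries standing in for SSTables. This is a simplification under stated assumptions: TOMBSTONE and read are hypothetical names, and newer tables simply take priority, whereas Cassandra resolves conflicts using write timestamps.

```python
TOMBSTONE = object()  # marker meaning "this key was deleted"

def read(key, sstables):
    """Check tables from newest to oldest; a tombstone hides older values."""
    for table in reversed(sstables):
        if key in table:
            value = table[key]
            return None if value is TOMBSTONE else value
    return None

older = {"user:1": "alice", "user:2": "bob"}
newer = {"user:1": TOMBSTONE}  # the delete is recorded as a new write
sstables = [older, newer]      # oldest first

print(read("user:1", sstables))  # None
print(read("user:2", sstables))  # bob
```

The old value for user:1 still exists in the older table; it is only hidden by the tombstone until the tables are later merged.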


36) What is a SuperColumn in Cassandra?

In Cassandra, a SuperColumn is a special element that groups a collection of related columns. It is a key-value pair whose value is a map of columns.


37) What is the difference between Column and SuperColumn?

Difference between Column and SuperColumn:

  • The value of a Column is a string, while the value of a SuperColumn is a map of Columns with different data types.

  • Unlike Columns, Super Columns do not contain the third component of timestamp.


38) What are Hadoop, HBase, Hive and Cassandra? Specify similarities and differences among them.

Hadoop, HBase, Hive and Cassandra are all Apache products.

Apache Hadoop supports file storage and grid compute processing via MapReduce. Apache Hive is a SQL-like interface on top of Hadoop. Apache HBase follows column family storage modeled on Bigtable. Apache Cassandra also follows column family storage modeled on Bigtable, combined with Dynamo-style topology and consistency.


39) What is the usage of "void close()" method?

In Cassandra, the void close() method is used to close the current session instance.


40) Which command is used to start the cqlsh prompt?

The cqlsh command is used to start the cqlsh prompt.


41) What is the usage of the "cqlsh --version" command?

The "cqlsh --version" command prints the version of cqlsh you are using.


42) Does Cassandra work on Windows?

Yes. Cassandra has been available in both Linux-compatible and Windows-compatible versions and runs well on Windows. Note that Windows support was dropped in Cassandra 4.0, so this applies to earlier releases.


43) What is Kundera in Cassandra?

In Cassandra, Kundera is an object-relational mapping (ORM) implementation which is written using Java annotations.


44) What do you mean by Thrift in Cassandra?

Thrift is the RPC framework that older Cassandra clients used to communicate with the Cassandra server.


45) What is Hector in Cassandra?

Hector was one of the early Cassandra clients. It is an open-source project written in Java and released under the MIT license.