Apache Cassandra is a distributed storage system for managing very large amounts of structured data. It provides a highly available service with no single point of failure. Cassandra aims to run on top of an infrastructure of hundreds of nodes, possibly spread across different data centers, in which small and large components fail continuously. The way Cassandra manages persistent state in the face of these failures drives the reliability and scalability of the software systems that rely on it. Although Cassandra resembles a database in many ways and shares many design and implementation strategies with one, it does not support a full relational data model. This paper discusses an implementation of Cassandra as a Hotel Management System application. The Cassandra system was designed to run on cheap commodity hardware, and it provides high write throughput and efficient reads.
Cassandra, Data model.
Apache Cassandra is an open source, distributed, highly available, decentralized, elastically scalable, fault-tolerant, tunably consistent, column-oriented database. Cassandra’s distribution design is based on Amazon’s Dynamo and its data model on Google’s Bigtable. Cassandra was introduced at Facebook; it is now used at some of the most popular sites on the Web [1]. Apache Cassandra is a type of NoSQL database designed to handle large amounts of data across many servers, providing high availability with no single point of failure. Some important points about Apache Cassandra: (1) it is scalable, tunably consistent, and fault-tolerant; (2) it is a key-value as well as a column-oriented database; (3) its data model is based on Google’s Bigtable and its distribution design on Amazon’s Dynamo; (4) introduced at Facebook, it differs sharply from relational database management systems; (5) it implements a Dynamo-style replication model and adds a more powerful “column family” data model; and (6) it is used by some of the biggest companies, such as Facebook, Twitter, Cisco, Rackspace, eBay, and Netflix. Cassandra has become popular because of its technical features. Given below are some of the features of Cassandra:
Relational databases remain as popular as NoSQL databases, but they have various drawbacks, particularly when an application must scale. Teams typically address these problems in one or more of the following ways, sometimes in this order:
Throw hardware at the problem by adding more memory, adding faster processors, and upgrading disks. This is known as vertical scaling.
Unlike Cassandra, which does not offer full ACID transactions, a relational database supports the ACID properties. ACID is an acronym for Atomic, Consistent, Isolated, Durable, which are the gauges we can use to assess that a transaction has executed properly and that it was successful:
Atomic
Atomic means “all or nothing”; that is, when a statement is executed, every update within the transaction must succeed in order for the transaction to be called successful. There is no partial failure in which one update succeeded and another related update failed. The common example here is a monetary transfer at an ATM: the transfer requires subtracting money from one account and adding it to another account. This operation cannot be subdivided; both steps must succeed.
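The transfer example can be sketched with Python's built-in `sqlite3` module, standing in for any relational database. This is a minimal illustration, not part of the system described in this paper: the `accounts` table, the account names, and the helper functions are hypothetical. The credit is applied first so that a failing debit visibly rolls it back.

```python
import sqlite3

def open_bank():
    """Create an in-memory bank with two accounts (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)",
        [("checking", 100), ("savings", 50)],
    )
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one unit; return True if both applied."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # Overdrawing violates the CHECK constraint and raises here,
            # which also undoes the credit applied just above.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return {account id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))

if __name__ == "__main__":
    bank = open_bank()
    print(transfer(bank, "checking", "savings", 30), balances(bank))
    print(transfer(bank, "checking", "savings", 500), balances(bank))
```

The second transfer fails as a whole: the balances after it are the same as before it, even though the credit statement had already run.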
Consistent
Consistent means that data moves from one correct state to another correct state, with no possibility that readers could view different values that don’t make sense together. For example, if a transaction attempts to delete a Customer and her Order history, it cannot leave Order rows that reference the deleted customer’s primary key; this is an inconsistent state that would cause errors if someone tried to read those Order records.
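The Customer and Order example can be sketched the same way, again using `sqlite3` as a stand-in relational database. The table and function names are hypothetical; the point is only that a foreign-key constraint refuses a delete that would leave orphaned order rows.

```python
import sqlite3

def open_shop():
    """Create one customer with one order (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customer(id))"
    )
    conn.execute("INSERT INTO customer VALUES (1, 'Alice')")
    conn.execute("INSERT INTO orders VALUES (10, 1)")
    conn.commit()
    return conn

def delete_customer_only(conn, customer_id):
    """Try to delete a customer while her orders still exist."""
    try:
        with conn:
            conn.execute("DELETE FROM customer WHERE id = ?", (customer_id,))
        return True
    except sqlite3.IntegrityError:
        return False  # rejected: order rows would reference a missing customer

def delete_customer_with_orders(conn, customer_id):
    """Delete the orders and the customer in a single transaction."""
    with conn:
        conn.execute("DELETE FROM orders WHERE customer_id = ?", (customer_id,))
        conn.execute("DELETE FROM customer WHERE id = ?", (customer_id,))

def counts(conn):
    """Return (number of customers, number of orders)."""
    customers = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
    orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    return customers, orders
```

Calling `delete_customer_only` leaves both rows in place, while `delete_customer_with_orders` moves the database from one correct state to another.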
Isolated
Isolated means that transactions executing concurrently will not become entangled with each other; they each execute in their own space. That is, if two different transactions attempt to modify the same data at the same time, then one of them will have to wait for the other to complete.
Durable
Once a transaction has succeeded, the changes will not be lost. This doesn’t imply another transaction won’t later modify the same data; it just means that writers can be confident that the changes are available for the next transaction to work with as necessary.
A NoSQL database (also called Not Only SQL) is a database that provides a mechanism to store and retrieve data other than the tabular relations used in relational databases. These databases are schema-free, support easy replication, have simple APIs, are eventually consistent, and can handle huge amounts of data.
The primary objective of a NoSQL database is to have
NoSQL databases use different data structures from relational databases, which makes some operations faster in NoSQL. The suitability of a given NoSQL database depends on the problem it must solve.
The design goal of Cassandra is to handle big data workloads across multiple nodes without any single point of failure. Cassandra has a peer-to-peer distributed architecture, and data is distributed among all the nodes in a cluster [2].
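One common way to spread data over peer nodes without a coordinator is a hash ring. The sketch below illustrates the idea only and is not Cassandra's actual code: the `Ring` class, the node names, and the key `hotel:AZ123` are hypothetical, and MD5 is used as a stand-in hash rather than Cassandra's own partitioner.

```python
import bisect
import hashlib

def token(key: str) -> int:
    """Map a key to a position on the ring (stand-in hash, an assumption)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

class Ring:
    """Toy hash ring: each node owns the keys up to and including its token."""

    def __init__(self, nodes):
        self._ring = sorted((token(name), name) for name in nodes)
        self._tokens = [t for t, _ in self._ring]

    def _start(self, key):
        # First node whose token is >= the key's token, wrapping to index 0.
        return bisect.bisect_left(self._tokens, token(key)) % len(self._ring)

    def owner(self, key):
        """Return the node responsible for the key."""
        return self._ring[self._start(key)][1]

    def replicas(self, key, rf):
        """Return the owner plus the next rf - 1 distinct nodes on the ring."""
        start = self._start(key)
        count = min(rf, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]

if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c"])
    print(ring.owner("hotel:AZ123"), ring.replicas("hotel:AZ123", 2))
```

Because every node can compute the same placement from the same node list, any node can route a request, which is what removes the single point of failure.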
4.1. Data Replication in Cassandra
In Cassandra, one or more of the nodes in a cluster act as replicas for a given piece of data. If a read detects that some of the replicas responded with an out-of-date value, Cassandra returns the most recent value to the client. After returning the most recent value, Cassandra performs a read repair in the background to update the stale values.
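The read path described above can be sketched as follows. This is a simplified model under stated assumptions, not Cassandra's implementation: the `Cell` and `Replica` classes and the key `room:101` are hypothetical, the most recent value is chosen by comparing write timestamps, and the repair runs inline rather than in the background.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # write time; the larger timestamp wins

@dataclass
class Replica:
    name: str
    store: dict = field(default_factory=dict)

    def read(self, key):
        return self.store.get(key)

    def write(self, key, cell):
        # Keep the incoming cell only if it is newer than what is stored.
        current = self.store.get(key)
        if current is None or cell.timestamp > current.timestamp:
            self.store[key] = cell

def read_with_repair(replicas, key):
    """Return the newest value and bring stale replicas up to date."""
    responses = [(replica, replica.read(key)) for replica in replicas]
    found = [cell for _, cell in responses if cell is not None]
    if not found:
        return None
    newest = max(found, key=lambda cell: cell.timestamp)
    for replica, cell in responses:
        if cell is None or cell.timestamp < newest.timestamp:
            replica.write(key, newest)  # read repair (inline in this sketch)
    return newest.value

if __name__ == "__main__":
    a, b, c = Replica("a"), Replica("b"), Replica("c")
    for replica in (a, b, c):
        replica.write("room:101", Cell("available", 1))
    a.write("room:101", Cell("booked", 2))  # only replica a sees the update
    print(read_with_repair([a, b, c], "room:101"))
    print(b.read("room:101"), c.read("room:101"))
```

After the read, replicas `b` and `c` hold the newer cell, so a later read answered by them alone would still return the current value.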
Figure 1 shows a schematic view of how Cassandra uses data replication among the nodes in a cluster to ensure no single point of failure. Cassandra uses the Gossip Protocol to allow the nodes to communicate with each other and detect any faulty nodes in the cluster.
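A gossip-style failure detector can be illustrated with a small simulation. The sketch below is a toy model under several assumptions and does not reproduce Cassandra's protocol: the `Node` class is hypothetical, time advances in discrete rounds, a peer is suspected after a fixed number of silent rounds, and the default peer choice is a deterministic ring neighbor so the run is repeatable (`random_peer` shows the randomized choice that gossip normally uses).

```python
import random

class Node:
    def __init__(self, name):
        self.name = name
        self.alive = True
        # peer name -> (heartbeat counter, round in which it last increased)
        self.view = {name: (0, 0)}

    def beat(self, now):
        """Advance this node's own heartbeat."""
        heartbeat, _ = self.view[self.name]
        self.view[self.name] = (heartbeat + 1, now)

    def merge(self, remote_view, now):
        """Adopt any heartbeat that is higher than the one already known."""
        for peer, (heartbeat, _) in remote_view.items():
            known = self.view.get(peer)
            if known is None or heartbeat > known[0]:
                self.view[peer] = (heartbeat, now)

    def suspects(self, now, timeout):
        """Peers whose heartbeat has not advanced for more than `timeout` rounds."""
        return sorted(
            peer for peer, (_, seen) in self.view.items()
            if peer != self.name and now - seen > timeout
        )

def ring_neighbor(node, nodes):
    """Deterministic peer choice: the next node in the list."""
    return nodes[(nodes.index(node) + 1) % len(nodes)]

def random_peer(rng=None):
    """Build a randomized peer chooser, closer to real gossip."""
    rng = rng or random.Random()
    def pick(node, nodes):
        return rng.choice([n for n in nodes if n is not node])
    return pick

def gossip_round(nodes, now, pick_peer=ring_neighbor):
    """Each live node beats, then swaps views with one chosen peer."""
    for node in nodes:
        if not node.alive:
            continue
        node.beat(now)
        peer = pick_peer(node, nodes)
        if peer.alive:  # a dead peer never answers
            snapshot = dict(node.view)
            node.merge(peer.view, now)
            peer.merge(snapshot, now)

def run_demo(timeout=3):
    """Run 5 healthy rounds, stop node d, run 10 more, report suspicions."""
    nodes = [Node(name) for name in ("a", "b", "c", "d")]
    for now in range(1, 6):
        gossip_round(nodes, now)
    nodes[3].alive = False
    for now in range(6, 16):
        gossip_round(nodes, now)
    return {n.name: n.suspects(15, timeout) for n in nodes if n.alive}

if __name__ == "__main__":
    print(run_demo())
```

In this run every surviving node ends up suspecting only `d`, although `b` never contacts `d` directly: it learns about `d` second-hand through the views it exchanges with `a` and `c`. Passing `random_peer(random.Random(0))` as `pick_peer` switches to randomized peer selection.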