International Journal of Computer Networks & Communications (IJCNC)




Apache Cassandra is a distributed storage system for managing very large amounts of structured data. Cassandra provides a highly available service with no single point of failure. It is designed to run on top of an infrastructure of hundreds of nodes, possibly spread across different data centers, in which small and large components fail continuously. The way Cassandra manages persistent state in the face of these failures drives the reliability and scalability of the software systems that rely on it. Although Cassandra resembles a database and shares many of its design and implementation strategies, it does not support a full relational data model. In this paper, we discuss an implementation of a Hotel Management System application on Cassandra. Cassandra was designed to run on cheap commodity hardware, and it provides high write throughput without sacrificing read efficiency.


Keywords: Cassandra, Data model.


  1. Introduction


Apache Cassandra is an open-source, distributed, highly available, decentralized, elastically scalable, fault-tolerant, tunably consistent, column-oriented database. Cassandra’s distribution design is based on Amazon’s Dynamo and its data model on Google’s Bigtable. Cassandra was created at Facebook and is now used at some of the most popular sites on the Web [1]. Apache Cassandra is a NoSQL database designed to handle large amounts of data across many servers, providing high availability with no single point of failure. Some of the important points about Apache Cassandra are: (1) it is scalable, tunably consistent, and fault-tolerant; (2) it is a key-value as well as a column-oriented database; (3) its data model is based on Google’s Bigtable and its distribution design on Amazon’s Dynamo; (4) it was introduced at Facebook and differs sharply from relational database management systems; (5) it implements a Dynamo-style replication model with no single point of failure, but adds a more powerful “column family” data model; and (6) it is used by some of the biggest companies, such as Facebook, Twitter, Cisco, Rackspace, eBay, and Netflix. Cassandra owes much of its popularity to its technical features, some of which are listed below:

  • Elastic scalability: Cassandra allows more hardware to be added to accommodate more customers and more data as required.
  • Always-on architecture: Cassandra has no single point of failure, so it remains continuously available for business-critical applications that cannot afford downtime.
  • Fast linear-scale performance: Cassandra is linearly scalable; its throughput increases as nodes are added to the cluster, so it can maintain quick response times as the load grows.
  • Flexible data storage: Cassandra handles structured, semi-structured, and unstructured data, and it allows data structures to be changed dynamically as user needs evolve.
  • Easy data distribution: Cassandra provides the flexibility to distribute data where it is needed by replicating data across multiple data centers.
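  • Transaction support: Cassandra offers a limited form of the Atomicity, Consistency, Isolation, and Durability (ACID) properties: writes are atomic at the partition level and isolated at the row level, writes are made durable through the commit log, consistency is tunable per request, and lightweight transactions provide compare-and-set semantics. It does not offer the multi-row, multi-table transactions of a relational database.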
  • Fast writes: Cassandra was designed to run on cheap commodity hardware. It performs fast writes and can store hundreds of terabytes of data without sacrificing read efficiency.

The rest of this paper is organized as follows. Section 2 reviews existing relational databases. Section 3 discusses NoSQL databases. Section 4 presents the Cassandra architecture. Section 5 describes the data model of Cassandra. Section 6 describes the implementation details of the Hotel Management System. The conclusion is given in Section 7.

  2. Existing Relational Databases

Relational databases are widely used, like NoSQL databases, but they have drawbacks at scale. When a relational database runs into performance problems as its data grows, the usual responses are one or more of the following, sometimes in this order:

  • Throw hardware at the problem by adding more memory, adding faster processors, and upgrading disks. This is known as vertical scaling.
  • When the problems arise again, the answer is similar: now that one box is maxed out, add hardware in the form of additional boxes in a database cluster. This introduces new problems: data replication and consistency during regular usage and in failover scenarios.
  • Next, tune the configuration of the database management system. This might mean optimizing the channels the database uses to write to the underlying filesystem, or turning off logging or journaling, which frequently is not a desirable (or, depending on the situation, legal) option.
  • Having given the database system as much attention as possible, turn to the application: try to improve indexes and optimize the queries. At this scale, however, the team was presumably not wholly ignorant of index and query optimization and already had them in good shape. So this becomes a painful process of picking through the data access code to find any opportunities for fine-tuning. This might include reducing or reorganizing joins, throwing out resource-intensive features such as XML processing within a stored procedure, and so forth. Of course, that XML processing was presumably being done for a reason, so if it must be done somewhere, the problem moves to the application layer, in the hope of solving it there without breaking something else in the meantime.
  • Employ a caching layer. For larger systems, this might include distributed caches such as memcached, EHCache, Oracle Coherence, or other related products. Now we have a consistency problem between updates in the cache and updates in the database, which is exacerbated over a cluster.
  • It is possible to duplicate some of the data to make it look more like the queries that access it. This process, called denormalization, is antithetical to the five normal forms that characterize the relational model and violates Codd’s 12 rules for relational data.
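
The consistency problem that a caching layer introduces can be seen in a minimal sketch of the common cache-aside pattern. Plain Python dictionaries stand in for the database and the cache; the class and method names are hypothetical and do not model any specific product such as memcached. A write that updates the database but not the cache leaves readers seeing stale data until the entry is invalidated, and in a cluster every cached copy has to be invalidated, which is why the problem grows worse there.

```python
class CacheAsideStore:
    """Hypothetical cache-aside store: reads check the cache first, then the database."""

    def __init__(self, rows):
        self.db = dict(rows)  # stand-in for the database table
        self.cache = {}       # stand-in for the caching layer in front of it

    def read(self, key):
        if key in self.cache:         # cache hit: the database is not consulted
            return self.cache[key]
        value = self.db.get(key)      # cache miss: load from the database
        self.cache[key] = value       # and populate the cache for later reads
        return value

    def write_db_only(self, key, value):
        # Updates the database but forgets the cache, leaving a stale entry behind.
        self.db[key] = value

    def write_and_invalidate(self, key, value):
        # Updates the database and drops the cached copy so the next read reloads it.
        self.db[key] = value
        self.cache.pop(key, None)


store = CacheAsideStore({"room:101": "free"})
print(store.read("room:101"))  # free (loaded from the database and cached)
store.write_db_only("room:101", "booked")
print(store.read("room:101"))  # free (stale: the cache and the database now disagree)
store.write_and_invalidate("room:101", "booked")
print(store.read("room:101"))  # booked
```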

Relational databases support ACID transactions. ACID is an acronym for Atomic, Consistent, Isolated, and Durable, the gauges we can use to assess whether a transaction has executed properly and successfully:


Atomic means “all or nothing”; that is, when a statement is executed, every update within the transaction must succeed in order for the transaction to be called successful. There is no partial failure in which one update succeeded and another related update failed. The common example here is a monetary transfer at an ATM: the transfer requires subtracting money from one account and adding it to another account. This operation cannot be subdivided; both steps must succeed.
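
The ATM example can be sketched with Python’s built-in `sqlite3` module. This is a relational illustration of atomicity, not Cassandra code, and the `accounts` table, account names, and helper functions are hypothetical. The debit and credit run inside one transaction; if the transfer fails halfway, the rollback undoes the debit as well.

```python
import sqlite3


def open_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    with conn:
        conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    # `with conn` commits if the block completes and rolls back if an exception escapes.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
        if balance < 0:
            # The rollback undoes the debit above, so no half-finished transfer remains.
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


conn = open_bank()
transfer(conn, "alice", "bob", 30)
print(balances(conn))  # {'alice': 70, 'bob': 30}
try:
    transfer(conn, "alice", "bob", 500)
except ValueError:
    pass
print(balances(conn))  # {'alice': 70, 'bob': 30}, unchanged by the failed transfer
```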


Consistent means that data moves from one correct state to another correct state, with no possibility that readers could view different values that don’t make sense together. For example, if a transaction attempts to delete a Customer and her Order history, it cannot leave Order rows that reference the deleted customer’s primary key; this is an inconsistent state that would cause errors if someone tried to read those Order records.
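
The Customer and Order example maps onto a foreign-key constraint. The following is a minimal sketch using Python’s `sqlite3` module with hypothetical `customers` and `orders` tables; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`. Deleting the customer alone is rejected because it would leave orphaned orders, while deleting the orders first moves the data from one valid state to another.

```python
import sqlite3


def open_shop():
    """In-memory database with one hypothetical customer who has one order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customers(id))"
    )
    with conn:
        conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
        conn.execute("INSERT INTO orders VALUES (10, 1)")
    return conn


def delete_customer_only(conn, customer_id):
    # Would leave orders pointing at a missing customer, so the database rejects it.
    with conn:
        conn.execute("DELETE FROM customers WHERE id = ?", (customer_id,))


def delete_customer_with_orders(conn, customer_id):
    # Removes the dependent orders first, so no statement leaves a dangling reference.
    with conn:
        conn.execute("DELETE FROM orders WHERE customer_id = ?", (customer_id,))
        conn.execute("DELETE FROM customers WHERE id = ?", (customer_id,))


def count(conn, table):
    # `table` is only ever one of the fixed names above, never user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = open_shop()
try:
    delete_customer_only(conn, 1)
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
print(count(conn, "customers"), count(conn, "orders"))  # 1 1
delete_customer_with_orders(conn, 1)
print(count(conn, "customers"), count(conn, "orders"))  # 0 0
```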


Isolated means that transactions executing concurrently will not become entangled with each other; they each execute in their own space. That is, if two different transactions attempt to modify the same data at the same time, then one of them will have to wait for the other to complete.
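
Waiting for a concurrent writer can be observed with two `sqlite3` connections to the same database file. This is a sketch under stated assumptions: the `rooms` table and the function name are hypothetical, and SQLite serializes writers at the level of the whole database rather than per row, but the effect matches the paragraph above. Setting `timeout=0` makes the second writer report the conflict immediately instead of silently waiting, so the blocking is visible.

```python
import os
import sqlite3
import tempfile


def demo_concurrent_writers():
    """Two connections try to update the same row; the second must wait its turn."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE rooms (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
        setup.execute("INSERT INTO rooms VALUES (101, 'free')")
        setup.commit()
        setup.close()

        # isolation_level=None: transactions are started and ended explicitly below.
        # timeout=0: a lock conflict raises at once instead of waiting for the lock.
        first = sqlite3.connect(path, timeout=0, isolation_level=None)
        second = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            first.execute("BEGIN IMMEDIATE")  # first writer takes the write lock
            first.execute("UPDATE rooms SET status = 'booked' WHERE id = 101")
            try:
                second.execute("BEGIN IMMEDIATE")  # second writer cannot start yet
                second_blocked = False
            except sqlite3.OperationalError:  # "database is locked"
                second_blocked = True
            first.execute("COMMIT")  # first writer finishes and releases the lock

            second.execute("BEGIN IMMEDIATE")  # now the second writer may proceed
            (status,) = second.execute("SELECT status FROM rooms WHERE id = 101").fetchone()
            second.execute("COMMIT")
        finally:
            first.close()
            second.close()
        return second_blocked, status
    finally:
        os.remove(path)


print(demo_concurrent_writers())  # (True, 'booked')
```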


Durable means that once a transaction has succeeded, the changes will not be lost. This doesn’t imply another transaction won’t later modify the same data; it just means that writers can be confident that the changes are available for the next transaction to work with as necessary.

  3. NoSQL Databases

A NoSQL database (often expanded as “Not Only SQL”) provides a mechanism to store and retrieve data in forms other than the tabular relations used in relational databases. NoSQL databases are typically schema-free, support easy replication, have a simple API, are eventually consistent, and can handle huge amounts of data.

The primary objective of a NoSQL database is to have

  • simplicity of design,
  • horizontal scaling, and
  • finer control over availability.

NoSQL databases use data structures different from those of relational databases, such as key-value pairs, wide columns, documents, or graphs, and this makes some operations faster. The suitability of a given NoSQL database depends on the problem it must solve.
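
As a rough illustration of why a different data structure can make a read faster, the sketch below stores the same hypothetical hotel bookings twice with plain Python dictionaries: once normalized into two “tables” that must be joined, and once keyed by the value the query looks up. It is not Cassandra’s API; it only shows the trade-off in which a join becomes a single key lookup, at the price of keeping the query-shaped copy up to date on every write.

```python
# Relational-style layout: normalized "tables" linked by customer_id (hypothetical data).
customers = {1: {"name": "Ada"}, 2: {"name": "Grace"}}
bookings = [
    {"booking_id": 7, "customer_id": 1, "room": 101},
    {"booking_id": 8, "customer_id": 2, "room": 102},
    {"booking_id": 9, "customer_id": 1, "room": 205},
]


def rooms_relational(customer_id):
    # A join: look up the customer, then scan the bookings table for matching rows.
    name = customers[customer_id]["name"]
    rooms = [b["room"] for b in bookings if b["customer_id"] == customer_id]
    return name, rooms


# Key-value layout: everything this query needs is stored under the key it is read by.
bookings_by_customer = {
    1: {"name": "Ada", "rooms": [101, 205]},
    2: {"name": "Grace", "rooms": [102]},
}


def rooms_key_value(customer_id):
    # A single key lookup and no join; writes must keep this copy current.
    row = bookings_by_customer[customer_id]
    return row["name"], row["rooms"]


print(rooms_relational(1))  # ('Ada', [101, 205])
print(rooms_key_value(1))   # ('Ada', [101, 205])
```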

  4. Cassandra Architecture

The design goal of Cassandra is to handle big data workloads across multiple nodes without any single point of failure. Cassandra is a peer-to-peer distributed system in which data is distributed among all the nodes in a cluster [2].

  • All the nodes in a cluster play the same role. Each node is independent and at the same time interconnected to other nodes.
  • Each node in a cluster can accept read and write requests, regardless of where the data is actually located in the cluster.
  • When a node goes down, read/write requests can be served from other nodes in the network.
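
The peer-to-peer behaviour described above can be sketched with a simplified token ring, the consistent-hashing scheme Cassandra uses to spread data across nodes. This is a minimal sketch under several assumptions: each node gets a single token (Cassandra typically assigns many virtual-node tokens per node), MD5 stands in for Cassandra’s default Murmur3 partitioner, replicas are simply the next nodes clockwise (similar to `SimpleStrategy`), and the class and node names are hypothetical. Any live node can act as coordinator, and a key stays available while at least one of its replicas is up.

```python
import bisect
import hashlib


def token(key):
    # Stand-in hash; Cassandra's default partitioner uses Murmur3, not MD5.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class TokenRing:
    """Simplified ring: one token per node; replicas are the next nodes clockwise."""

    def __init__(self, nodes, replication_factor):
        if not 0 < replication_factor <= len(nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        self.rf = replication_factor
        self.ring = sorted((token(name), name) for name in nodes)
        self.tokens = [t for t, _ in self.ring]
        self.down = set()

    def replicas(self, key):
        # The first node whose token is >= the key's token owns the key (wrapping
        # around the ring); the following rf - 1 nodes clockwise hold the other copies.
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

    def coordinate(self, coordinator, key):
        # Any live node may accept the request and forward it to the live replicas.
        if coordinator in self.down:
            raise RuntimeError(f"{coordinator} is down; the client should try another node")
        live = [n for n in self.replicas(key) if n not in self.down]
        if not live:
            raise RuntimeError("no live replica holds this key")
        return live


nodes = ["node1", "node2", "node3", "node4", "node5"]
ring = TokenRing(nodes, replication_factor=3)
owners = ring.replicas("room:101")
print(owners)  # three distinct nodes, determined by the hash values
failed = owners[0]
ring.down.add(failed)  # one replica goes down
coordinator = next(n for n in nodes if n != failed)
print(ring.coordinate(coordinator, "room:101"))  # the two remaining replicas
```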

 4.1. Data Replication in Cassandra

In Cassandra, one or more of the nodes in a cluster act as replicas for a given piece of data. If, during a read, Cassandra detects that some of the replicas responded with an out-of-date value, it returns the most recent value to the client. After returning the most recent value, Cassandra performs a read repair in the background to update the stale values.
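
The read-repair idea can be sketched in a few lines, assuming a last-write-wins rule based on write timestamps. This simplified version is hypothetical in its names and data: it queries every replica, compares whole values rather than individual cells, and repairs stale replicas inline before returning, whereas the text describes Cassandra doing the repair in the background after answering the client.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # write timestamp; on conflict the newest write wins


def read_with_repair(replicas: Dict[str, Dict[str, Cell]], key: str) -> Optional[str]:
    """Query every replica, pick the newest value, and overwrite stale copies."""
    responses = {name: store.get(key) for name, store in replicas.items()}
    found = [cell for cell in responses.values() if cell is not None]
    if not found:
        return None
    newest = max(found, key=lambda cell: cell.timestamp)
    # Read repair: bring replicas that returned nothing or an older cell up to date.
    # (Done inline in this sketch rather than in the background.)
    for name, cell in responses.items():
        if cell is None or cell.timestamp < newest.timestamp:
            replicas[name][key] = newest
    return newest.value


replicas = {
    "node1": {"room:101": Cell("booked", timestamp=2)},
    "node2": {"room:101": Cell("free", timestamp=1)},  # stale replica
    "node3": {},                                        # replica that missed the write
}
print(read_with_repair(replicas, "room:101"))  # booked
print(replicas["node2"]["room:101"])           # Cell(value='booked', timestamp=2)
```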

Figure 1 shows a schematic view of how Cassandra uses data replication among the nodes in a cluster to ensure no single point of failure. Cassandra uses the Gossip Protocol to allow the nodes to communicate with each other and detect any faulty nodes in the cluster.
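
The core of gossip-based failure detection can be sketched as nodes exchanging heartbeat counters and suspecting any peer whose heartbeat stops advancing. Cassandra’s gossiper uses a phi accrual failure detector and random peer selection; this simplified sketch instead uses a fixed timeout and rotates through peers deterministically, and all names (`Node`, `simulate`, `suspects`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    name: str
    alive: bool = True
    heartbeat: int = 0
    # view[peer] = (highest heartbeat seen for peer, round in which it last increased)
    view: Dict[str, Tuple[int, int]] = field(default_factory=dict)


def merge(dst: Node, src: Node, now: int) -> None:
    """Copy any newer heartbeats that src knows about into dst's view."""
    for peer, (hb, _) in src.view.items():
        if hb > dst.view.get(peer, (-1, now))[0]:
            dst.view[peer] = (hb, now)


def simulate(nodes: List[Node], rounds: int, fail: str, fail_at: int) -> None:
    n = len(nodes)
    for now in range(rounds):
        for node in nodes:
            if node.name == fail and now == fail_at:
                node.alive = False  # this node crashes and stops gossiping
            if node.alive:
                node.heartbeat += 1  # every live node bumps its own heartbeat
                node.view[node.name] = (node.heartbeat, now)
        for i, node in enumerate(nodes):
            if not node.alive:
                continue
            # Rotate through peers to keep the sketch deterministic (real gossip picks at random).
            peer = nodes[(i + 1 + now % (n - 1)) % n]
            if peer.alive:  # a crashed peer never answers
                merge(peer, node, now)  # push what this node knows...
                merge(node, peer, now)  # ...and pull what the peer knows


def suspects(node: Node, now: int, timeout: int) -> List[str]:
    """Peers whose heartbeat has not advanced in node's view for more than `timeout` rounds."""
    return sorted(p for p, (_, seen) in node.view.items()
                  if p != node.name and now - seen > timeout)


cluster = [Node(name) for name in "ABCD"]
simulate(cluster, rounds=30, fail="D", fail_at=10)
print(suspects(cluster[0], now=29, timeout=5))  # ['D']
```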


