A Forensic Perspective on Recovering Deleted Data from Big Data Systems

Synopsis

There is an increasing trend of companies moving away from investments in physical infrastructure in favor of cloud-based computing 1. This shift has had a huge impact on businesses and IT alike. Cloud computing delivers speed, agility, and cost reduction to IT and other functional areas in the enterprise 2. Big Data systems take advantage of commodity hardware that enables them to scale out 3, employ parallel processing techniques and non-relational data storage capabilities to process unstructured and semi-structured data, and apply advanced analysis and visualization to the data 4. Commodity hardware refers to low-performance, inexpensive, and easy-to-obtain hardware 5. The ability of Big Data systems to scale out on commodity hardware makes them a good fit for cloud computing for storing and processing data 6.

While Cloud computing and Big Data have their benefits to businesses, they pose a unique challenge to traditional Digital Forensic techniques, specifically those involved with Data Acquisition. Data Acquisition is the task of collecting digital evidence from electronic media 7. Based on the Data Acquisition method used, an investigator may be able to retrieve deleted data from the storage media (for example, in the case of "disk-to-image") 8. This deleted data may be potentially incriminating evidence against the suspect. Since Big Data systems usually scale horizontally (scale out) by adding more hardware and benefit from the cloud infrastructure, it is impractical to use traditional Data Acquisition techniques for Digital Forensic analysis.

This paper explores the various technical as well as legal and contextual challenges around data acquisition in Big Data systems, focusing specifically on recovering deleted data.

Scope

Digital Forensics and its crossroads with Cloud computing and Big Data are largely unexplored areas in research. This paper explores the possibility of recovering deleted data at the database and storage-engine level. It may, however, be fruitful for an investigator to work at a different layer altogether – for example, going through a suspect's Dropbox, which would map to the SaaS layer of the cloud and would require little or no knowledge on the investigator's part of the underlying data representation and storage engine used by Dropbox. Such SaaS-level investigations and recoveries are beyond the scope of this paper.

Key Concepts

This section covers some key concepts and terminologies that are frequently associated with Cloud and Big Data systems.

Cloud SPI Model

At the end of the day, cloud providers are offering their users a service, and SPI is an acronym for the most common cloud computing service models, namely SaaS, PaaS, and IaaS 9.

SaaS

SaaS, or Software as a Service, refers to applications designed for end users and delivered over the web 10. Examples include Google Apps, Cisco GoToMeeting, and Cisco WebEx 11.

PaaS

PaaS, or Platform as a Service, is a set of tools and services designed to make coding and deployment of applications quick and efficient 12. Pivotal's Cloud Foundry is an example of PaaS 13.

IaaS

IaaS, or Infrastructure as a Service, is the hardware and software that powers servers, storage, networks, and operating systems 14. Examples include Amazon's AWS, Microsoft Azure, Google Compute Engine 15, and Rackspace Managed Cloud 16.

Note, however, that many cloud providers offer multiple services, which can blur the lines between SaaS, PaaS, and IaaS.

Cloud Models or Options

Depending on the type and sensitivity of the data, the level of IT investment and budget, security and privacy concerns, and the presence of existing infrastructure such as VPNs, an organization can choose between Public, Private, and Hybrid cloud models.

  • Public clouds are services offered off-site over the internet.

  • Private clouds are ones where services are maintained on a private network.

  • Hybrid clouds combine Public and Private options, potentially with multiple vendors involved 17.

NoSQL and Big Data

NoSQL stands for "Not Only SQL" 18, implying alternatives to, or complements of, RDBMS 19 and traditional SQL systems, with a new breed of databases suited to specific requirements. NoSQL is one of the cornerstone groups of technologies of Big Data, and it has led to a new breed of databases that address the challenges of traditional databases 20. Almost all Big Data systems sacrifice some features of traditional RDBMS systems for scale and performance. For example, traditional joins (and Foreign Keys) are not supported in most NoSQL systems, and almost all NoSQL databases are not fully ACID 21 compliant.

Types of NoSQL Databases

While not a complete list, most NoSQL databases can be classified as one or more of the following types,

Key-Value Database

One of the simplest kinds of NoSQL database, allowing users only to read, add, update, and delete key-value pairs. Examples include Amazon's DynamoDB (not FOSS 22), Redis (a data structure server), and Memcached (an in-memory distributed cache server).
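As a minimal illustration of how small this API surface is, here is a sketch using the third-party redis-py client (the local server and the key name are assumptions, for illustration only),

import redis  # third-party Python client for the Redis data structure server

r = redis.Redis(host="localhost", port=6379)  # assumes a local Redis instance
r.set("case:42:status", "open")               # add or update a key-value pair
print(r.get("case:42:status"))                # read -> b'open'
r.delete("case:42:status")                    # delete the pair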

Document Database

These databases work with “documents”, which could be in various formats like XML, JSON, BSON, etc. Examples include MongoDB, CouchDB, and OrientDB.

Column Family Store

Column Families are groups of related data that are often accessed together. The rows of data do not necessarily have the same columns. Examples include Apache Cassandra and Apache HBase.

Graph Database

Allows storage of entities and the relationships between them. Examples include Neo4J and Infinite Graph.

Deleting Data from NoSQL Big Data Systems

Data in a NoSQL database can be deleted in many ways; it is difficult to prepare an exhaustive list since there is huge variation in the way data is organized and stored across NoSQL Big Data systems. Here are some examples (the first few cases are sketched in code after the list),

  1. An entire database or schema may be deleted – Apache Cassandra has a concept similar to schemas called a KeySpace 23.

  2. A table may be dropped (the structure itself deleted), or some or all of its records may be deleted. In MongoDB the structure closest to a table is a Collection 24.

  3. Data could be overwritten by an update, which effectively deletes the older data.

  4. Some systems allow deleting specific cells of a row; for example, this is possible in Apache Cassandra 25.

  5. Errors and disasters could cause data loss.
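To make the first three cases concrete, here is a minimal PyMongo sketch (a sketch only: the connection string and the database and collection names are assumptions, and delete_one/update_one require PyMongo 3.x),

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local MongoDB instance
db = client["trial_db"]                            # hypothetical database name

db["old_evidence"].drop()                          # case 2: drop an entire collection
db["records"].delete_one({"name": "AAA"})          # case 2: delete a single document
db["records"].update_one({"name": "BBB"},          # case 3: overwrite a field,
                         {"$set": {"marks": 0}})   # effectively deleting the old value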

There could be other forms of deleting data, and each NoSQL Big Data system uses its own specific techniques. Many of these systems do not even document how exactly the system deletes records internally. One such case was MongoDB – as part of this research, a ticket was opened with the MongoDB team requesting clarification in the documentation and standardization – https://jira.mongodb.org/browse/DOCS-5151. Not all possible cases of deletion across Big Data systems are explored in detail in this paper.

Why is more research needed on recovering Data from Big Data systems?

Cloud computing poses some unique challenges that demand new techniques for forensic research. For example, in traditional disk acquisition, a computer, disk, or cell phone is usually owned by a suspect (or can be traced to one); but a cloud is essentially a computing and storage "service" that could be shared by a large number of cloud customers and an even larger number of end users, especially in the case of a Public Cloud. While a typical suspect's storage media might run to several Terabytes, a cloud could store data in several Petabytes or even Exabytes. The physical nodes constituting the cloud could be geographically dispersed, and physically acquiring them is unlikely to be a feasible option. These differences between a cloud and traditional media make it necessary to invest more in research in these areas.

Forensic Recovery and Analysis Concerns

Technical Concerns

Size and Scale

Perhaps the most obvious and greatest technical concern is the size and scale of data spread across a cloud. As discussed above, the data could span Exabytes. To put this in perspective, here is an extract from Facebook's engineering blog from 2014,

At Facebook, we have unique storage scalability challenges when it comes to our data warehouse. Our warehouse stores upwards of 300 PB of Hive data, with an incoming daily rate of about 600 TB. In the last year, the warehouse has seen a 3x growth in the amount of data stored. Given this growth trajectory, storage efficiency is and will continue to be a focus for our warehouse infrastructure. 26

So searching for deleted data in a pile this size could be like searching for a needle across many haystacks!

Flexible Provisioning and Elasticity

Provisioning refers to the deployment of a company's cloud computing strategy 27. In many public clouds, user "self-provisioning" can take place using the web application provided by the cloud provider 28.

Elasticity in the context of cloud computing refers to the system's ability to adapt to workload by automatically provisioning and de-provisioning resources 29. This means the number of nodes involved in the cloud could grow or shrink based on the load.

Now, if this were a storage cloud and data was deleted and the node de-provisioned, the node might be provisioned to a different cloud customer. This would make it even more difficult to perform any recovery from the cloud.

Encryption, Compression and Proprietary or Undocumented Encoding

Since public cloud storage moves an organization's data outside the safety of its firewall, there is greater concern for the data's security and privacy. This encourages encryption of the data. Encryption makes recovery of deleted data far more difficult even if the encryption keys are available, especially if one is using third-party scripts or writing tools to recover deleted data.

Another area that could complicate recovery is compression. While it is certainly easier to recover data that is compressed than data that is encrypted, the basic issue remains: if you are looking for a certain byte pattern to indicate that a record has been deleted, that pattern will not be present if the data is either encrypted or compressed.
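A tiny Python sketch illustrates the problem: a deletion marker that is trivially visible in raw bytes disappears once the same bytes are compressed. (The marker value here anticipates the MongoDB case discussed later; the record layout is made up for illustration.)

import zlib

# A raw record whose first four bytes carry a deletion marker.
raw = b"\xee\xee\xee\xee" + b"\x07_id\x00" + b"payload" * 4

print(b"\xee\xee\xee\xee" in raw)                 # True: the marker is visible in raw bytes
print(b"\xee\xee\xee\xee" in zlib.compress(raw))  # almost certainly False: compression
                                                  # destroys the byte pattern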

Finally, NoSQL systems are an evolving technology, and there are many FOSS and proprietary implementations available. All have their own formats for storing and encoding data. Moreover, a database's storage format may not even be compatible across its own versions. For example, while MongoDB has a clear specification stating that data is stored in BSON format, there is little information available on how data is marked for deletion. This is discussed in detail in a later section.

Legal and other Contextual Concerns

Geo-Location Transparency and impact on Jurisdiction

The Cloud and its services are location transparent, meaning the location of the cloud is irrelevant and anyone can tap into the power of the cloud from anywhere 30.

The EFF’s Workshop Report – Cloudy Jurisdiction sums up this problem accurately,

While the Internet is technically borderless, in reality, state actors impose their sovereignty onto online environments with increasing frequency. The operating of sovereignty over shared spaces can subject individuals to the laws of another country without any realization of having done so. This in effect transforms the surveillance efforts of one country into privacy risks for all the world’s citizens, as an interconnected network places their personal data at the whims of many states. The cloud, which by its nature exists in multiple jurisdictions at once, exacerbates these jurisdictional problems which are generally inherent in online interactions. 31

Multi-Tenancy and Innocent data

Multi-tenancy is an architecture in which a single instance of software serves multiple customers. In cloud computing, for example, a SaaS provider can run a single application on one database and server while ensuring that each tenant's data remains isolated and invisible to other tenants 32.

For example, each multi-tenant instance of Salesforce.com supports 5000 tenants, who share the same database schema 33.

Now, if the same database schema is shared by several corporate tenants and an investigation involves recovering certain deleted entries, there is a risk to the privacy of the innocent data of other tenants who are not linked to the investigation but simply happen to share the database schema.

The Good News – How Forensics Could Benefit from Big Data

While Big Data and NoSQL technologies do bring in new challenges they also bring in new opportunities for investigators to recover and analyze deleted data.

Support for Versioning

Some Big Data systems support versioning, which means overwritten data can be accessed in certain cases, provided version support was configured. For example, Apache HBase, a NoSQL database in the Apache Hadoop ecosystem, supports "versions" for data stored in it. This feature of Apache HBase is explored in detail in a later section.

Soft Deletes

Deletes tend to be expensive and complicated in Big Data systems that replicate data for High Availability (HA) 34. Usually, when a delete is requested, the Big Data system only marks the data for deletion instead of actually deleting it immediately. This means there is a window of opportunity to recover deleted data before it is permanently removed. For example,

  • MongoDB marks deleted documents with \xEE\xEE\xEE\xEE at the start of the document (explored in detail later).

  • Apache Cassandra has an elaborate process that permanently deletes the data only at a point after the data has been tomb-stoned (logically deleted) 35.

Both of the above cases are explored in detail in a later section. It is worth noting, however, that how a NoSQL system marks data for deletion is not documented in detail, since the majority of the user base (developers and DBAs) does not need to know the internal representation of data or how it is marked for deletion. The soft-delete technique and the delete marker differ from system to system, and even within the same NoSQL Big Data system across versions.

Replication

Most popular Big Data systems support some form of High Availability and Fault Tolerance by replicating data across multiple nodes. From a forensic recovery standpoint, there is a good chance that a certain node will still be available for investigation even when others have been completely destroyed or wiped out.

Examples,

  • HDFS (Hadoop Distributed File System), the core component of Hadoop on which HBase is based, supports data replication for fault tolerance 36.

  • Apache Cassandra supports a concept of “Replication Factor” and “Consistency Level” to adjust the level of HA and Fault Tolerance needed by an application 37.

  • MongoDB supports a concept of "Replica Sets" to replicate data 38. Replica Sets support a "slave delay" feature, which deliberately makes a slave node lag behind the primary node 39. This can be beneficial in a forensic scenario, since deletes on the primary node will not propagate to a slave node with slave delay configured until the configured time elapses.

Benefiting from technical advancement

The pace of technological change in Cloud, Big Data, NoSQL, and related technologies can make it very difficult for forensic examiners and digital investigators to catch up. Yet embracing these very technologies can enable investigators to solve complex forensic problems with ease. For example, digital forensic analysis requires indexing data, and Big Data has driven huge improvements in the ability to index large amounts of data – consider Google's pursuit of indexing the internet for its search engine. One example of a forensic tool making good use of this technical progress is Autopsy, which uses Apache Solr internally to index its findings from the disks it analyzes. According to the Apache Solr website,

"Solr is highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated fail-over and recovery, centralized configuration and more. Solr powers the search and navigation features of many of the world's largest internet sites." 40

So here we see a tool used by large internet sites being put to use to index the findings from a suspect system.

Recover Deleted Data – A Deep Dive Exercise

For this paper, an attempt has been made to research and deep-dive into three specific cases,

  • Recovering Overwritten Data in a Versioned Column Family in Apache HBase.

  • Partial Recovery of Deleted Data from Apache Cassandra along with Delete Metadata.

  • Full Recovery of a Deleted Document from MongoDB.

Needless to say, these are neither the only means by which data can be deleted or recovered, nor the only NoSQL-based Big Data systems in use.

Note that the following details are more technical in nature and assume a certain basic understanding of the underlying systems. This section does not cover the basics and focuses only on the areas that further the interest of this paper and related research.

Recovering Overwritten Data in a Versioned Column Family in Apache HBase

HBase supports versioning of data. Here is an example (adapted from the one in the MapR HDP-100 online course) of creating a table with a column family that supports up to 3 versions.

Enter the HBase shell in the virtual machine,

hbase shell

To create a table with a column family that supports up to 3 versions, enter the following command,

create 'demo_voter_table', {NAME => 'cf1'}, {NAME => 'cf2', VERSIONS => 3}, {NAME => 'cf3'}

Obviously, in a forensic analysis you would not be creating tables or column families yourself; rather, you would hope that the column family under investigation had already been configured to support versions.

Now, to read the versioned data from the above table for column family 2 (prefixed with cf2) for the record with the key "john", enter the following command at the hbase shell,

get 'demo_voter_table', 'john', {COLUMN => 'cf2:party', VERSIONS => 2}

In this case, if the cf2:party cell for "john" had been overwritten, the command would return up to two versions of the cell, each with its timestamp, allowing the previous (overwritten) value to be recovered.
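The same inspection can be scripted. Here is a minimal sketch using the third-party happybase client (an assumption: it requires HBase's Thrift gateway to be running, and the host name is hypothetical),

import happybase  # third-party Python client for HBase

conn = happybase.Connection("localhost")  # assumed Thrift gateway location
table = conn.table("demo_voter_table")

# Fetch up to 3 stored versions of the cell, newest first, with timestamps.
for value, ts in table.cells(b"john", b"cf2:party", versions=3, include_timestamp=True):
    print(ts, value)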

Partial Recovery of Deleted Data from Apache Cassandra along with Delete Metadata

Apache Cassandra comes with a set of tools designed with DBAs and developers in mind. While not specifically designed for forensic investigators, these tools can be used to retrieve partially deleted data and even some metadata associated with the delete, such as the deletion time.

One such tool is sstable2json, which ships with Apache Cassandra. An SSTable is an internal storage structure for the data in Apache Cassandra tables. SSTables are stored in *Data.db files, for example trial_keyspace-trial_table1-ka-1-Data.db; this file represents the data stored in trial_table1 under trial_keyspace.

Interestingly, this command was able to recover the keys of the records that were deleted, along with some metadata. Here are some examples; first, a case where a row was deleted.

Enter the following command on the Linux console,

./sstable2json /home/yazad/apache-cassandra-2.1.4-src/data/data/trial_keyspace/trial_table1-d7abfe70dab011e4bf20e3263299699d/trial_keyspace-trial_table-ka-1-Data.db

(Modify the path based on the location of the Cassandra data file of interest.)

Here is a sample output,

[
  {"key": "df1ae5ce-c44e-4ea2-82e1-a96c43fc9190",
   "metadata": {"deletionInfo": {"markedForDeleteAt": 1428056177336059, "localDeletionTime": 1428056177}},
   "cells": []}
]

Here is another sample output from the same table, where the deleted row's tombstone appears alongside rows that were not deleted,

[
  {"key": "dd0ffea9-f3ae-410c-bf3b-4bd2e27accc8",
   "cells": [["", "", 1428056116510066],
             ["age", "400", 1428056116510066],
             ["fn", "DDDDDDD", 1428056116510066],
             ["ln", "DDDDDD", 1428056116510066],
             ["score", "400.5", 1428056116510066]]},
  {"key": "ecc83774-895f-4d72-b49c-a82db922419f",
   "cells": [["", "", 1428056094582210],
             ["age", "400", 1428056094582210],
             ["fn", "EEEEEEE", 1428056094582210],
             ["ln", "EEEEEE", 1428056094582210],
             ["score", "400.5", 1428056094582210]]},
  {"key": "df1ae5ce-c44e-4ea2-82e1-a96c43fc9190",
   "metadata": {"deletionInfo": {"markedForDeleteAt": 1428056177336059, "localDeletionTime": 1428056177}},
   "cells": []}
]

As we can see, the row with the key "df1ae5ce-c44e-4ea2-82e1-a96c43fc9190" was deleted at epoch time 1428056177336059 (microseconds), which is Fri Apr 03 2015 03:16:17 GMT-0700 (Pacific Time).
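Converting these values is straightforward: markedForDeleteAt is microseconds since the Unix epoch, while localDeletionTime is in seconds. For example, in Python,

from datetime import datetime, timezone

marked_for_delete_at = 1428056177336059  # microseconds since the epoch
print(datetime.fromtimestamp(marked_for_delete_at / 1_000_000, tz=timezone.utc))
# 2015-04-03 10:16:17.336059+00:00 (03:16:17 US Pacific time)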

So we see that we could recover the key of the data that was deleted. Now, if the key of the data were something more useful than a random UUID, and cell values were deleted instead of an entire row, we can use the same command on the appropriate Data file to recover those details. Here is a sample output for a row with a cell value deleted,

[
  {"key": "Mr ABC:XYZ",
   "cells": [["53dce6d0-dab1-11e4-bf20-e3263299699d:", "", 1428141587793487],
             ["53dce6d0-dab1-11e4-bf20-e3263299699d:age", "300", 1428141587793487],
             ["53dce6d0-dab1-11e4-bf20-e3263299699d:score", "300.5", 1428141587793487],
             ["56017480-dab1-11e4-bf20-e3263299699d:_", "56017480-dab1-11e4-bf20-e3263299699d:!", 1428141854500102, "t", 1428141854],
             ["5a9ef170-dab1-11e4-bf20-e3263299699d:", "", 1428141599239194],
             ["5a9ef170-dab1-11e4-bf20-e3263299699d:age", "3000", 1428141599239194],
             ["5a9ef170-dab1-11e4-bf20-e3263299699d:score", "30000.5", 1428141599239194]]},
  {"key": "Mr PQR:LMN",
   "cells": [["7e3b7590-dab1-11e4-bf20-e3263299699d:", "", 1428141658984925],
             ["7e3b7590-dab1-11e4-bf20-e3263299699d:age", "3000", 1428141658984925],
             ["7e3b7590-dab1-11e4-bf20-e3263299699d:score", "30000.5", 1428141658984925],
             ["82b9d1c0-dab1-11e4-bf20-e3263299699d:", "", 1428141666523689],
             ["82b9d1c0-dab1-11e4-bf20-e3263299699d:age", "300000", 1428141666523689],
             ["82b9d1c0-dab1-11e4-bf20-e3263299699d:score", "300000.5", 1428141666523689],
             ["8732fdd0-dab1-11e4-bf20-e3263299699d:", "", 1428141674028327],
             ["8732fdd0-dab1-11e4-bf20-e3263299699d:age", "600000", 1428141674028327],
             ["8732fdd0-dab1-11e4-bf20-e3263299699d:score", "3600000.5", 1428141674028327],
             ["8e04bfe0-dab1-11e4-bf20-e3263299699d:", "", 1428141685469457],
             ["8e04bfe0-dab1-11e4-bf20-e3263299699d:age", "60", 1428141685469457],
             ["8e04bfe0-dab1-11e4-bf20-e3263299699d:score", "3600000.5", 1428141685469457]]}
]

Understanding the structure of the table helps in interpreting the above output.

Enter the CQLSH console,

./cqlsh

Switch to the schema of interest,

use trial_keyspace;

(use the appropriate keyspace name).

Describe the table of interest,

desc table trial_table1;

For the table used in this research, the output is as follows,

CREATE TABLE trial_keyspace.trial_table1 (
    fn text,
    ln text,
    doj timeuuid,
    age int,
    score float,
    PRIMARY KEY ((fn, ln), doj)
)

As we can see, the key columns are fn (First Name), ln (Last Name), and doj (Date of Joining).

Now, if we take another look at the SSTable output, we notice that we have partially recovered data: the first name, last name, and date of joining for the cells that were deleted (age and score). The first name is "Mr ABC", the last name is "XYZ", and the date of joining (in timeuuid format) is "56017480-dab1-11e4-bf20-e3263299699d". The cell values were deleted at epoch time 1428141854500102, which is Sat Apr 04 2015 03:04:14 GMT-0700 (Pacific Time).
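For larger outputs this inspection can be automated. Here is a minimal sketch (an assumption: it relies on the sstable2json output format shown above, where whole-row tombstones carry a deletionInfo block and deleted cells appear as 5-element entries flagged with "t", and it expects the file to contain a single JSON array),

import json

def report_deletions(path):
    """Scan sstable2json output for rows and cells marked as deleted."""
    with open(path) as f:
        rows = json.load(f)
    for row in rows:
        # Whole-row tombstone: metadata carries a deletionInfo block.
        info = row.get("metadata", {}).get("deletionInfo")
        if info:
            print("row", row["key"], "deleted at", info["markedForDeleteAt"])
        # Deleted cells: a 5-element cell whose fourth field is "t".
        for cell in row.get("cells", []):
            if len(cell) == 5 and cell[3] == "t":
                print("row", row["key"], "cell range", cell[0], "deleted at", cell[2])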

An alternative would be to write a script or tool to read the Apache Cassandra commit logs, which may contain the entries for the original data being inserted and subsequently deleted.

Full Recovery of a Deleted Document from MongoDB

For recovering deleted documents from MongoDB, various unofficial scripts are available online. However, since the delete mechanism may have differed across MongoDB versions, these scripts don't seem to work out of the box. One such script that this research focused on is available on GitHub – https://gist.github.com/egguy/2788955.

This script takes a MongoDB database file and recovers the records in that file. The script could read the records in the raw database file; however, it was not able to read deleted records. On closer inspection, it was noted that documents stored inside a MongoDB collection in BSON format begin with the size of the document. In the case of deleted records, the first 4 bytes, where the document size is stored, are replaced by \xEE\xEE\xEE\xEE in the current version, 3.0.1 on Linux.
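Locating candidate deleted documents is straightforward once the marker is known. Here is a small sketch that scans a raw collection file for the \xEE\xEE\xEE\xEE marker (note that, as a sketch, it can return false positives if the same four bytes happen to occur inside live data),

def find_deleted_offsets(path):
    """Return the byte offset of every \\xEE\\xEE\\xEE\\xEE marker in a raw file."""
    with open(path, "rb") as f:
        data = f.read()
    offsets = []
    i = data.find(b"\xee\xee\xee\xee")
    while i != -1:
        offsets.append(i)
        i = data.find(b"\xee\xee\xee\xee", i + 4)
    return offsets

print(find_deleted_offsets("/home/yazad/trial.2"))  # e.g. [20800, ...]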

This means that the size of a deleted record is not known, and recovery requires some trial and error: we know where a deleted record starts (the four \xEE bytes) but not where it ends. MongoDB places restrictions on the maximum size of a record, which have changed across versions; this restriction can be used to iteratively "guess" the size of the record and attempt to recover it.
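As a sketch of that trial-and-error loop (assuming Python 3 and the bson package that ships with PyMongo; the 16 MB cap reflects MongoDB's current document size limit, and the helper name is hypothetical),

import bson  # the BSON package bundled with PyMongo

def guess_and_decode(data, start, max_size=16 * 1024 * 1024):
    """Try successive lengths for the deleted document at `start` until one
    decodes as valid BSON. BSON documents end in a null byte, so only
    lengths where that holds are attempted."""
    limit = min(max_size, len(data) - start)
    for size in range(5, limit + 1):  # 5 bytes is the minimum BSON document size
        if data[start + size - 1] != 0:
            continue
        # Re-create the length prefix that the delete marker destroyed.
        candidate = size.to_bytes(4, "little") + data[start + 4 : start + size]
        try:
            return bson.decode_all(candidate)[0]
        except Exception:
            continue
    return None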

Here is a manual example where the size of the document, 51 bytes, is known. The location of the deleted record is known to be at offset 20800, and an attempt is made to recover the data using Python.

>>> f = open("/home/yazad/trial.2", "rb")
>>> x = f.read()
>>> x[20800]
'\xee'
>>> x[20800:20800+51]
'\xee\xee\xee\xee\x07_id\x00U\x19\xa6g\x9f\xdf\x19\xc1\xads\xdb\xa8\x02name\x00\x04\x00\x00\x00AAA\x00\x01marks\x00\x00\x00\x00\x00\x00@\x9f@\x00'
# replacing the first 4 chars with 51 – the size of the document
>>> y = "3\x00\x00\x00" + x[20804:20800+51]
>>> y
'3\x00\x00\x00\x07_id\x00U\x19\xa6g\x9f\xdf\x19\xc1\xads\xdb\xa8\x02name\x00\x04\x00\x00\x00AAA\x00\x01marks\x00\x00\x00\x00\x00\x00@\x9f@\x00'
>>> import struct
>>> struct.unpack("<I", y[0:4])
(51,)
>>> struct.unpack("<I", y[0:4])[0]
51
>>> import bson
>>> bson.decode_all(y)
[{u'_id': ObjectId('5519a6679fdf19c1ad73dba8'), u'name': u'AAA', u'marks': 2000.0}]

The above session first reads a raw MongoDB data file, then replaces the \xEE bytes with the number 51 encoded as a little-endian 32-bit integer, "3\x00\x00\x00" (the byte 0x33 is the ASCII character '3').

Finally, using the BSON API, an attempt is made to read the deleted record and dump it to the screen. We see that the deleted document had an id, a name (AAA), and marks (2000.0).

Conclusion

While there are many challenges with Cloud, Big Data, and the new breed of NoSQL databases, it is still possible for forensic investigators to recover deleted data from these databases. Full data recovery, along with the required metadata, may not always be possible; but with deletes usually being soft, there is certainly hope of recovering evidence from deleted data in Big Data systems.

Bibliography

1“Forecast 2015: IT Spending on an Upswing | Network World,” accessed April 8, 2015, http://www.networkworld.com/article/2842418/infrastructure-management/forecast-2015-it-spending-on-an-upswing.html.

2Ian Hancock, “Business Impacts of Cloud, Cloud Computing,” KPMG (Australia), accessed April 8, 2015, https://www.kpmg.com/au/en/topics/cloud/pages/default.aspx.

3Scale-out is to scale an application by adding more (typically commodity) hardware. Scale-up on the other hand is replacing existing hardware with a more powerful and usually centralized/non-distributed and expensive hardware.

4“Big Data Manifesto | Hadoop, Business Analytics and Beyond – Wikibon,” accessed April 8, 2015, http://wikibon.org/wiki/v/Big_Data:_Hadoop,_Business_Analytics_and_Beyond.

5“What Is Commodity Hardware? A Definition From the Webopedia Computer Dictionary,” accessed April 8, 2015, http://www.webopedia.com/TERM/C/commodity_hardware.html.

6“Big Data Cloud Computing & Cloud Database | Qubole,” accessed April 8, 2015, http://www.qubole.com/resources/articles/big-data-cloud-database-computing/.

7Amelia Philips, “Data Acquisition (Opening),” in Guide to Computer Forensics and Investigations, 4th Ed (Boston, MA: Cengage Learning, 2009), 100.

8Amelia Philips, “Data Acquisition (Determining the Best Acquisition Method),” in Guide to Computer Forensics and Investigations, 4th Ed (Boston, MA: Cengage Learning, 2009), 103.

9Margaret Rouse, “What Is SPI Model (SaaS, PaaS, IaaS)?,” Definition from WhatIs.com, accessed April 10, 2015, http://searchcloudcomputing.techtarget.com/definition/SPI-model.

10Rackspace Support, “Understanding the Cloud Computing Stack: SaaS, PaaS, IaaS,” Rackspace Hosting (Knowledge Center), accessed April 9, 2015, http://www.rackspace.com/knowledge_center/whitepaper/understanding-the-cloud-computing-stack-saas-paas-iaas.

11“IaaS, PaaS, SaaS (Explained and Compared),” Apprenda, accessed April 10, 2015, http://apprenda.com/library/paas/iaas-paas-saas-explained-compared/.

12Rackspace Support, “Understanding the Cloud Computing Stack: SaaS, PaaS, IaaS.”

13“Pivotal Cloud Foundry | Platform as a Service,” Pivotal, accessed April 10, 2015, http://pivotal.io/platform-as-a-service/pivotal-cloud-foundry.

14Rackspace Support, “Understanding the Cloud Computing Stack: SaaS, PaaS, IaaS.”

15“IaaS, PaaS, SaaS (Explained and Compared).”

16John Engates, “Where Does IaaS Fit Into The Managed Cloud?,” Rackspace Blog, accessed April 10, 2015, http://www.rackspace.com/blog/where-does-iaas-fit-into-the-managed-cloud/.

17“Comparing Public, Private, and Hybrid Cloud Computing Options,” For Dummies, accessed April 9, 2015, http://www.dummies.com/how-to/content/comparing-public-private-and-hybrid-cloud-computin.html.

18Pramod Sadalage, “NoSQL Databases: An Overview,” ThoughtWorks, accessed April 9, 2015, http://www.thoughtworks.com/insights/blog/nosql-databases-overview.

19RDBMS stands for Relational Database Management System which is a DBMS based on a relational model.

20Scott Hirleman, “Big Data or Big Hype or Why NoSQL Matters,” DataStax Blog, accessed April 10, 2015, http://www.datastax.com/2014/05/big-data-or-big-hype-or-why-nosql-matters.

21ACID stands for "Atomicity, Consistency, Isolation, Durability" and refers to properties of database reliability.

22FOSS stands for Free and Open Source Software.

23“Create a Keyspace and Table,” Planet Cassandra, accessed April 12, 2015, http://planetcassandra.org/create-a-keyspace-and-table/.

24“Glossary — MongoDB Manual 3.0.2 – Collection,” MongoDB, accessed April 12, 2015, http://docs.mongodb.org/manual/reference/glossary/#term-collection.

25“Deleting Columns and Rows,” DataStax CQL 3.0 Documentation, accessed April 12, 2015, http://docs.datastax.com/en/cql/3.0/cql/cql_using/use_delete.html.

26Pamela Vagata, Kevin Wilfong, “Scaling the Facebook Data Warehouse to 300 PB,” Facebook Code – Engineering Blog, accessed April 11, 2015, https://code.facebook.com/posts/229861827208629/scaling-the-facebook-data-warehouse-to-300-pb/.

27Vangie Beal, “What Is Cloud Provisioning?,” Webopedia Definition, accessed April 11, 2015, http://www.webopedia.com/TERM/C/cloud_provisioning.html.

28Margaret Rouse, “What Is User Self-Provisioning?,” Definition from WhatIs.com, accessed April 11, 2015, http://searchcloudprovider.techtarget.com/definition/User-self-provisioning.

29Herbst, Nikolas Roman, Samuel Kounev, and Ralf Reussner, “Elasticity in Cloud Computing: What It Is, and What It Is Not.,” ICAC, 2013, pp. 23–27.

30Paul T. Jaeger, Jimmy Lin, Justin M. Grimes and Shannon N. Simmons, “Where Is the Cloud? Geography, Economics, Environment, and Jurisdiction in Cloud Computing,” First Monday, accessed April 11, 2015, http://pear.accc.uic.edu/ojs/index.php/fm/article/view/2456/2171#p4.

31“Cloudy Jurisdiction: Addressing the Thirst for Cloud Data in Domestic Legal Processes,” Electronic Frontier Foundation, accessed April 11, 2015, https://www.eff.org/document/cloudy-jurisdiction-addressing-thirst-cloud-data-domestic-legal-processes.

32Margaret Rouse, “What Is Multi-Tenancy?,” Definition from WhatIs.com, accessed April 11, 2015, http://whatis.techtarget.com/definition/multi-tenancy.

33Sreedhar Kajeepeta, “Multi-Tenancy in the Cloud: Why It Matters,” Computerworld, accessed April 11, 2015, http://www.computerworld.com/article/2517005/data-center/multi-tenancy-in-the-cloud–why-it-matters.html.

34Computing components that continue to be available in the event of failure.

35“About Deletes,” DataStax Cassandra 2.0 Documentation, accessed April 11, 2015, http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_about_deletes_c.html.

36“HDFS Architecture Guide,” accessed April 12, 2015, https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#Data+Replication.

37“About Replication in Cassandra — Apache Cassandra 1.0.x Documentation,” accessed April 12, 2015, http://docs.datastax.com/en/archived/cassandra/1.0/docs/cluster_architecture/replication.html.

38“Replication — MongoDB Manual 3.0.2,” accessed April 12, 2015, http://docs.mongodb.org/manual/replication/.

39“replSetGetConfig — MongoDB Manual 3.0.2,” accessed April 12, 2015, http://docs.mongodb.org/manual/reference/command/replSetGetConfig/#replSetGetConfig.members%5Bn%5D.slaveDelay.

40“Apache Solr,” Apache, accessed April 12, 2015, http://lucene.apache.org/solr/.
