Forrester defines big data as “the techniques and technologies that make capturing value from data at extreme scales economical”.
Wikipedia defines it as “a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
The challenges include capture, curation, storage, search, sharing, analysis and visualization”. Many use the 3Vs to describe the characteristics of Big Data: Volume, Variety and Velocity. In essence, Big Data refers to number crunching of epic proportions, accomplishing in minutes what might have taken weeks several years ago.
So what does this have to do with cloud computing? Certainly the notion of Big Data can exist without cloud computing. The question to ponder is whether the notion of Big Data would have been conceived without cloud computing. The average teenager spends an inordinate amount of time sharing thoughts (text), videos, and photos with their friends via Facebook, Instagram, Google+, Twitter, Pinterest, etc. In the US in 2011, retail shopping websites earned $162 billion, and the number of online shoppers is expected to grow from 137 million in 2010 to 175 million in 2016. The average number of Google searches per day went from 60 million in 2000 to 4.717 billion in 2011. These applications all exist in the cloud, and their providers take an Orwellian interest in every transaction that is made. How else would Facebook know who we might want to friend, or Amazon know which books to recommend?
So cloud computing is certainly an enabling technology for Big Data and has led to vast amounts of data being collected and stored. Add to this the vast amounts of data collected from other sources through applications and devices designed to collect and transmit data. Now consider that this data is being collected in many formats: text, video, still images, audio, sensor readings, GPS coordinates, radio frequency identification (RFID) reader output, and so on are all thrown into the pot. Big Data comprises the tools and techniques that make it possible to process these large amounts of data in varying formats at lightning speed.
This brings us full circle back to cloud computing. A recent Red Hat report indicates that many businesses implemented cloud-based environments last year as a way to manage the influx of structured and unstructured data. The cloud not only provides storage for the vast amounts of data being collected but also provides enough computing power to make analysis and visualization of that data possible.