What is Big Data and what does it matter? How will Big Data improve your business and impact your life? Well, it’s all about capturing, analyzing, and driving efficiencies. This extreme focus on data science has yielded a new wrinkle in “datalogy,” spawning an emerging academic field that requires new tools, new skill sets, statistical modelers, text-mining professionals, and people who specialize in sentiment analysis.
The Good: There is time still to learn about Big Data and apply the Four V’s of Big Data: Volume, Velocity, Variety, and Veracity! (Source: IBM)
Volume: Enterprises are awash with ever-growing data of all types, easily amassing terabytes—even petabytes—of information. Turn 12 terabytes of Tweets created each day into improved product sentiment analysis. Convert 350 billion annual meter readings to better predict power consumption.
Velocity: Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value. Scrutinize 5 million trade events created each day to identify potential fraud. Analyze 500 million daily call detail records in real-time to predict customer churn faster.
Variety: Big data is any type of data – structured and unstructured data such as text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together. Monitor hundreds of live video feeds from surveillance cameras to target points of interest. Exploit the 80% data growth in images, video, and documents to improve customer satisfaction.
Veracity: 1 in 3 business leaders don’t trust the information they use to make decisions. How can you act upon information if you don’t trust it? Establishing trust in big data presents a huge challenge as the variety and number of sources grows.
The Bad: Coping with the Learning Curve and Getting the Big Wheels Turning. In July of 2012, the University of Virginia’s Jefferson Trust gave its president, Teresa Sullivan, $29 million to fund one of her newest initiatives: establishing new courses and curricula for the study of Big Data, so that students can “handle the analysis and translation of massive amounts of data into creative opportunities for business and research.” The University of Louisiana, George Mason, and many other universities are following suit. But will their curricula draw the combined left-brain and right-brain talent from mathematically and actuarially minded students?
The Ugly: Big Shortages of Big Data Talent! By 2018, 140,000 to 190,000 data specialists—as well as 1.5 million managers and analysts with the know-how to use big data—will be needed! IT managers will need to manage 100+ servers.
First step: Start collecting data! Worry about how you will use it to best effect later. You may not know what you are looking for until you cull the data (Jack Norris). Think in simple terms of capturing, analyzing, and driving efficiencies.
Just to see where you fit in on the learning curve, try a few qualifying questions.
What is a petabyte? 1 Quadrillion Bytes
How many zeroes in a Quadrillion? (15)
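If you want to check the arithmetic yourself, the two answers above reduce to one line of math—a petabyte is 10^15 bytes, and 10^15 is one quadrillion. A quick Python sketch:

```python
# A (decimal) petabyte is 10**15 bytes -- one quadrillion, i.e. 15 zeroes.
petabyte = 10 ** 15
print(petabyte)                 # 1000000000000000
print(len(str(petabyte)) - 1)   # 15 zeroes after the leading 1

# For comparison, the binary unit (a pebibyte) is 2**50 bytes,
# which is about 12.6% larger than the decimal petabyte.
pebibyte = 2 ** 50
print(pebibyte > petabyte)      # True
```

(Storage vendors quote decimal petabytes; operating systems often report the binary unit, which is why the two never quite agree.)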
What is a Hadoop? An open-source distributed data processing platform.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thus delivering a highly available service on top of a cluster of computers, each of which may be prone to failures (hadoop.apache.org).
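The "simple programming model" Hadoop refers to is MapReduce: a map step that emits key–value pairs, a shuffle that groups them by key, and a reduce step that combines each group. The toy sketch below runs those three phases in a single Python process purely to show the shape of the computation—it is not Hadoop's actual API, and in a real cluster each phase would run in parallel across many machines:

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework
    does between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine each group -- here, sum the counts per word."""
    return {word: sum(counts) for word, counts in grouped.items()}

# Example input standing in for a stream of tweets or log lines.
tweets = ["big data is big", "data streams in"]
counts = reduce_phase(shuffle(map_phase(tweets)))
print(counts["big"])   # 2
print(counts["data"])  # 2
```

Because the map and reduce functions are pure and operate on independent keys, the framework can rerun any failed piece of work on another machine—which is exactly the application-layer failure handling the Hadoop description above is talking about.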