
Big data is defined as any kind of data source that has at least three shared characteristics:
- Extremely large Volumes of data (How much data)
- Extremely high Velocity of data (How fast that data is processed)
- Extremely wide Variety of data (The various types of data)
Big data is important because it enables organizations to gather, store, manage, and manipulate vast amounts of data at the right speed, at the right time, to gain the right insights.
With big data, it is now possible to virtualize data so that it can be stored efficiently and, with cloud-based storage, more cost-effectively as well. In addition, improvements in network speed and reliability have removed other physical limitations on managing massive amounts of data at an acceptable pace.
Data must be verifiable in terms of both accuracy and context. An innovative business may want to analyze massive amounts of data in real time to quickly assess the value of a customer and the potential to provide additional offers to that customer. It is necessary to identify the right amount and types of data that can be analyzed to impact business outcomes. Big data incorporates all data, including structured data and unstructured data from e-mail, social media, text streams, and more. This kind of data management requires that companies leverage both their structured and unstructured data.
The figure below illustrates that data must first be captured, and then organized and integrated. After this phase is successfully implemented, data can be analyzed based on the problem being addressed. Finally, management takes action based on the outcome of that analysis.
The cycle of Big Data Management
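The cycle described above can be sketched as a chain of small functions, one per stage. This is only an illustrative sketch: the function names, the sample records, and the 50.0 threshold are assumptions made for the example, not part of any particular big data product.

```python
from collections import defaultdict
from statistics import mean

def capture():
    # Hypothetical raw events from two sources; a real system would read
    # from streams, logs, or files instead of a hard-coded list.
    return [
        {"source": "web", "customer": "a1", "amount": "120.50"},
        {"source": "store", "customer": "a1", "amount": "30"},
        {"source": "web", "customer": "b2", "amount": "15.25"},
    ]

def organize(raw_events):
    # Normalize types so later stages can rely on a consistent shape.
    return [{**e, "amount": float(e["amount"])} for e in raw_events]

def integrate(events):
    # Combine records from all sources into one view per customer.
    by_customer = defaultdict(list)
    for e in events:
        by_customer[e["customer"]].append(e["amount"])
    return dict(by_customer)

def analyze(by_customer):
    # Address one specific question: average spend per customer.
    return {c: mean(amounts) for c, amounts in by_customer.items()}

def act(averages, threshold=50.0):
    # Decision step: which customers should receive an additional offer.
    return sorted(c for c, avg in averages.items() if avg >= threshold)

averages = analyze(integrate(organize(capture())))
offers = act(averages)
print(offers)
```

Keeping each stage as a separate function mirrors the order in the cycle: analysis only runs on data that has already been organized and integrated, and the action depends only on the outcome of the analysis.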
Big data analytics
The capability to manage and analyze petabytes of data enables companies to deal with clusters of information that could have an impact on the business. This requires analytical engines that can manage this highly distributed data and provide results that can be optimized to solve a business problem. Analytics can get quite complex with big data. For example, some organizations are using predictive models that couple structured and unstructured data together to predict fraud. Social media analytics, text analytics, and new kinds of analytics are being utilized by organizations looking to gain insight into big data.
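To make the idea of coupling structured and unstructured data more concrete, here is a deliberately simple sketch. It uses a hand-weighted point score as a stand-in for a trained predictive model, and every field name, phrase, weight, and threshold in it is a hypothetical choice made for illustration.

```python
# Hypothetical phrases an analyst might flag in free-text claim notes.
SUSPICIOUS_TERMS = ("urgent", "cash", "lost receipt")

def text_signals(note):
    # Unstructured side: count how many flagged phrases appear in the text.
    lowered = note.lower()
    return sum(1 for term in SUSPICIOUS_TERMS if term in lowered)

def fraud_points(claim):
    # Structured side: amount and prior claim count (assumed fields).
    points = 0
    if claim["amount"] > 5000:
        points += 4
    if claim["prior_claims"] >= 3:
        points += 3
    # Couple both sides: each flagged phrase adds one point, capped at 10.
    points += text_signals(claim["note"])
    return min(points, 10)

def is_flagged(claim, threshold=6):
    # Claims at or above the (assumed) threshold go to a human reviewer.
    return fraud_points(claim) >= threshold

risky = {"amount": 7200, "prior_claims": 1,
         "note": "Urgent: customer wants cash, lost receipt"}
routine = {"amount": 200, "prior_claims": 0,
           "note": "Routine windshield repair"}

print(fraud_points(risky), is_flagged(risky))
print(fraud_points(routine), is_flagged(routine))
```

Neither signal alone would flag the first claim at this threshold: the structured fields contribute 4 points and the text contributes 3. A production system would learn such weights from labeled data rather than fixing them by hand.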
Big data applications
Traditionally, the business expected that data would be used to answer questions about what to do and when to do it. Data was often integrated as fields into general-purpose business applications. With the advent of big data, this is changing. Now, we are seeing the development of applications that are designed specifically to take advantage of the unique characteristics of big data.
Some of the emerging applications are in areas such as healthcare, manufacturing management, and traffic management. What all these big data applications have in common is that they rely on huge volumes, velocities, and varieties of data to transform the behavior of a market. In healthcare, a big data application might monitor premature infants to determine when the data indicates that intervention is needed. In manufacturing, a big data application can be used to prevent a machine from shutting down during a production run. A big data traffic management application can reduce the number of traffic jams on busy city highways to decrease accidents, save fuel, and reduce pollution.
The Big Data Journey
Companies have always had to deal with lots of data in lots of forms. The change that big data brings is what you can do with that information. If you have the right technology in place, you can use big data to anticipate and solve business problems and react to opportunities. With big data, you can analyze data patterns to change everything from the way you manage cities, prevent failures, conduct experiments, and manage traffic to the way you improve customer satisfaction and enhance product quality, just to name a few examples.
How Big Is Big? How Big Will It Become?
The creation of data and the notion of Big Data have been enabled by the development of computers, the advancement of digital data over analog data, and the rate at which we process and store data. These are a function of technological innovation and the continued advances to create, process, and store data digitally. We should look at advancements in computing and data storage to glean some relative measure of growth and size in data. In the early years of technology, data was expensive to store, so analog data storage devices such as microfilm, photographs, and print media were used instead. It quickly became evident that recall and reuse of such analog-stored data was very laborious and generally hard to do systematically. As the cost of data storage decreased, and as computing platforms could create, process, and store more types of data, the use of and reliance on digital data proliferated. Digital data allowed for easy and inexpensive recall of data, reprocessing, and systematic searching of the data for specific nuggets of information.
The computing environment overcame many obstacles that had plagued society in terms of storing analog data. Digital data in a computing environment offers some powerful features. The data is perfectly remembered. Copies of the data are easy to make. Data is highly accessible, and storage costs are dramatically lower than those of physical storage. These features mean that users of a computing environment could (and do) create more data. Interestingly, data creation worldwide seems to be increasing at a faster rate than data processing and data consumption. Our computing systems, mobile devices, and hosts of sensors in our daily lives create so much data that it might even be considered an exhaust or byproduct of other primary activities. Given the reduction in data storage costs, it is convenient, tempting, and valuable to store all data created, even the incidental data that is the exhaust of our digital lives. Let’s examine data scale in terms of data creation, data storage, and data processing and relate it to the rate of data consumption by humans.
Big Data is no longer just a database or a set of digits, but can be, and will be, data involving all of the senses. As data is gathered more frequently, its dynamic nature becomes more valuable. The realm of Big Data provides not just snapshots of data in time but a constant and linked stream of data, more like a 360-degree movie of data capture.
Velocity: Leveraging Data within Its Window of Opportunity
The rise of digital platforms in the information age has brought a new norm for data capture. Just a few decades ago, data capture was largely limited to manual processes. For instance, stock prices might be available as quickly as a person could write them down or process them visually on a screen or monitor. With the advent of digital and automated processes for data measurement, the slow human element is no longer the bottleneck. Stock prices can be measured in milliseconds and presumably even more frequently than that. High velocity in financial markets is emblematic of the massive data creation possible by automated processes.
Data is being created more quickly than ever, owing in part to advances in computing capabilities but also to the rise of social networks and the widespread distribution of information via mobile devices.
Velocity in Big Data is seen in many areas other than marketing. Operations are being transformed through automation and machine-to-machine (M2M) communication. Many complex systems are managed by algorithms and rules that respond to data that is automatically captured and created by machine sensors. This M2M form of data capture and consumption is limited by the speed of the processors involved.
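As a rough illustration of rules that respond to automatically captured sensor data, the sketch below pairs each condition with an action and applies them to incoming readings with no human in the loop. The sensor fields, thresholds, and action names are all hypothetical and chosen only for the example.

```python
def evaluate(reading, rules):
    # Return the actions whose condition holds for this sensor reading.
    return [action for condition, action in rules if condition(reading)]

# Each rule is a (condition, action) pair; thresholds are assumed values.
RULES = [
    (lambda r: r["temperature_c"] > 90, "reduce_speed"),
    (lambda r: r["vibration_mm_s"] > 7.0, "schedule_maintenance"),
]

# Hypothetical readings as they might arrive from a machine sensor.
readings = [
    {"machine": "press-1", "temperature_c": 72, "vibration_mm_s": 2.1},
    {"machine": "press-1", "temperature_c": 95, "vibration_mm_s": 7.4},
]

for reading in readings:
    # A real M2M system would send these actions to a controller
    # rather than print them.
    print(reading["machine"], evaluate(reading, RULES))
```

Because each reading is evaluated as soon as it arrives, the throughput of such a loop depends on how fast the machine can run the rules, which matches the point that processing speed is the limiting factor.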
Where Is Big Data Being Created?
With the recent development of large databases, the technical ability to amass large and complete data sets has been increasing. The functions of an enterprise are dramatically different in their ability to create data assets and the type of data assets created. The creation of large data sets and the deployment of analytics to mine that data for business insights come from four major domains: (1) Customers, (2) Operations, (3) Knowledge Sets, and (4) Mass Markets.