Data: The quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.
Big Data is a collection of data that is huge in volume, yet growing exponentially with time. It is a data with so large size and complexity that none of traditional data management tools can store it or process it efficiently. Big data is also a data but with huge size.
Following are some of the Big Data examples-
1. The New York Stock Exchange is an example of Big Data that generates about one terabyte of new trade data per day.
2. The statistic shows that 500+terabytes of new data get ingested into the databases of social media site Facebook, every day. This data is mainly generated in terms of photo and video uploads, message exchanges, putting comments etc.
3. A single Jet engine can generate 10+terabytes of data in 30 minutes of flight time. With many thousand flights per day, generation of data reaches up to many Petabytes.
Concepts
Doug Laney introduced this concept of 3 Vs of Big Data, viz. Volume, Variety, and Velocity.
Volume refers to the amount of data that is being collected. The data could be structured or unstructured.
Velocity refers to the rate at which data is coming in.
Variety refers to the different kinds of data (data types, formats, etc.) that is coming in for analysis.
Over the last few years, 2 additional Vs of data have also emerged – value and veracity.
Value refers to the usefulness of the collected data.
Veracity refers to the quality of data that is coming in from different sources.
Ability to process Big Data in DBMS brings in multiple benefits, such as-
Businesses can utilize outside intelligence while taking decisions
Access to social data from search engines and sites like facebook, twitter are enabling organizations to fine tune their business strategies.
Improved customer service
Traditional customer feedback systems are getting replaced by new systems designed with Big Data technologies. In these new systems, Big Data and natural language processing technologies are being used to read and evaluate consumer responses.
Early identification of risk to the product/services, if any
Better operational efficiency
Big Data technologies can be used for creating a staging area or landing zone for new data before identifying what data should be moved to the data warehouse. In addition, such integration of Big Data technologies and data warehouse helps an organization to offload infrequently accessed data..
Managing datasets having terabytes of information can be a big challenge for companies. As datasets grow in size, storing them not only becomes a challenge but also becomes an expensive affair for companies.
To overcome this, companies are now starting to pay attention to data compression and de-duplication. Data compression reduces the number of bits that the data needs, resulting in a reduction in space being consumed. Data de-duplication is the process of making sure duplicate and unwanted data does not reside in our database.
2. Data security
Data security is often prioritized quite low in the Big Data workflow, which can backfire at times. With such a large amount of data being collected, security challenges are bound to come up sooner or later.
Mining of sensitive information, fake data generation, and lack of cryptographic protection (encryption) are some of the challenges businesses face when trying to adopt Big Data techniques.
Companies need to understand the importance of data security, and need to prioritize it. To help them, there are professional Big Data consultants nowadays, that help businesses move from traditional data storage and analysis methods to Big Data.
3. Data integration
Data is coming in from a lot of different sources (social media applications, emails, customer verification documents, survey forms, etc.). It often becomes a very big operational challenge for companies to combine and reconcile all of this data.
There are several Big Data solution vendors that offer ETL (Extract, Transform, Load) and data integration solutions to companies that are trying to overcome data integration problems. There are also several APIs that have already been built to tackle issues related to data integration.
Impact of Big Data
Recommendation Engines
– Amazon, Netflix, Spotify
– Recommendations based on customer preferences and historical behavior.