
Big Data Basic Questions | Codewindow.in


Data: The quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.

 

Big Data is a collection of data that is huge in volume and keeps growing exponentially over time. Its size and complexity are so great that no traditional data management tool can store or process it efficiently. In short, Big Data is still data, only at a much larger scale.

The following are some examples of Big Data:

1. The New York Stock Exchange generates about one terabyte of new trade data per day.

2. Statistics show that more than 500 terabytes of new data are ingested into the databases of the social media site Facebook every day. This data mainly comes from photo and video uploads, message exchanges, comments, etc.

3. A single jet engine can generate more than 10 terabytes of data in 30 minutes of flight time. With many thousands of flights per day, the data generated reaches many petabytes.
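
As a rough back-of-the-envelope check of the jet engine example, the sketch below multiplies the quoted rate by a daily flight count. Only the rate of 10 terabytes per 30 minutes comes from the text; the flight count, engines per aircraft, and flight duration are illustrative assumptions, not measured values.

```python
# Rate quoted in the text: about 10 TB per engine per 30 minutes of flight.
TB_PER_HALF_HOUR = 10

# Illustrative assumptions (not from the text).
FLIGHTS_PER_DAY = 5_000
ENGINES_PER_AIRCRAFT = 2
FLIGHT_HOURS = 2


def daily_volume_tb(flights, engines, hours, tb_per_half_hour=TB_PER_HALF_HOUR):
    """Estimate the terabytes generated per day by all engines combined."""
    half_hours = hours * 2
    return flights * engines * half_hours * tb_per_half_hour


total_tb = daily_volume_tb(FLIGHTS_PER_DAY, ENGINES_PER_AIRCRAFT, FLIGHT_HOURS)
total_pb = total_tb / 1000  # decimal units: 1 PB = 1000 TB
print(f"{total_tb} TB per day = {total_pb} PB per day")
```

Under these assumed numbers the estimate lands in the hundreds of petabytes per day, which is consistent with the "many petabytes" claim.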

 

Concepts

Doug Laney introduced the concept of the 3 Vs of Big Data: Volume, Velocity, and Variety.

Volume refers to the amount of data that is being collected. The data could be structured or unstructured.

Velocity refers to the rate at which data is coming in.

Variety refers to the different kinds of data (data types, formats, etc.) that are coming in for analysis.

Over the last few years, two additional Vs have also emerged: Value and Veracity.

Value refers to the usefulness of the collected data.

Veracity refers to the quality of data that is coming in from different sources.

 

Advantages Of Big Data Processing

The ability to process Big Data in a DBMS brings multiple benefits, such as:

Businesses can use outside intelligence when making decisions

Access to social data from search engines and sites like Facebook and Twitter enables organizations to fine-tune their business strategies.

Improved customer service

Traditional customer feedback systems are being replaced by new systems designed with Big Data technologies. In these new systems, Big Data and natural language processing technologies are used to read and evaluate consumer responses.

Early identification of risks to products or services, if any

Better operational efficiency

Big Data technologies can be used to create a staging area or landing zone for new data before deciding which data should be moved to the data warehouse. In addition, integrating Big Data technologies with the data warehouse helps an organization offload infrequently accessed data.

 
 

Challenges

1. Data growth

Managing datasets that hold terabytes of information can be a major challenge for companies. As datasets grow, storing them becomes not only harder but also more expensive.

To overcome this, companies are now starting to pay attention to data compression and de-duplication. Data compression reduces the number of bits needed to represent the data, so it takes up less storage space. Data de-duplication is the process of making sure duplicate and unwanted data does not reside in the database.
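
The two techniques can be illustrated with a minimal sketch using only the Python standard library. The sample records and the `deduplicate` helper are hypothetical and chosen for illustration; production systems typically de-duplicate at the block or file level rather than per record.

```python
import hashlib
import zlib


def deduplicate(records):
    """Keep the first occurrence of each record, identified by its SHA-256 hash."""
    seen = set()
    unique = []
    for record in records:
        digest = hashlib.sha256(record).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique


# Hypothetical log records: 1000 identical lines plus one different line.
records = [b"user=1,action=login"] * 1000 + [b"user=2,action=logout"]

# De-duplication: store each distinct record only once.
unique = deduplicate(records)
print(len(records), "records ->", len(unique), "unique")

# Compression: encode the same bytes in fewer bits (lossless, so it can be reversed).
raw = b"\n".join(records)
packed = zlib.compress(raw, 9)
print(len(raw), "bytes ->", len(packed), "bytes compressed")
restored = zlib.decompress(packed)
```

Highly repetitive data such as this compresses very well; real datasets usually see smaller gains.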

2. Data security

Data security is often prioritized quite low in the Big Data workflow, which can backfire at times. With such a large amount of data being collected, security challenges are bound to come up sooner or later.

Mining of sensitive information, fake data generation, and lack of cryptographic protection (encryption) are some of the challenges businesses face when trying to adopt Big Data techniques.

Companies need to understand the importance of data security and prioritize it. To help them, there are now professional Big Data consultants who help businesses move from traditional data storage and analysis methods to Big Data.

3. Data integration

Data is coming in from a lot of different sources (social media applications, emails, customer verification documents, survey forms, etc.). It often becomes a very big operational challenge for companies to combine and reconcile all of this data.

There are several Big Data solution vendors that offer ETL (Extract, Transform, Load) and data integration solutions to companies that are trying to overcome data integration problems. There are also several APIs that have already been built to tackle issues related to data integration.
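
As a minimal sketch of the ETL idea, the example below extracts customer records from two hypothetical sources with different field names (a CSV export and a JSON feed), transforms them into a common shape, and loads them into an in-memory SQLite table. The source data, field names, and table name are all assumptions made for illustration; vendor ETL tools do the same thing at far larger scale.

```python
import csv
import io
import json
import sqlite3

# Hypothetical inputs: the same kind of data arriving in two different formats.
CSV_SOURCE = "email,name\nAda@Example.com,Ada\nbob@example.com,Bob\n"
JSON_SOURCE = (
    '[{"mail": "ada@example.com", "full_name": "Ada L."},'
    ' {"mail": "cy@example.com", "full_name": "Cy"}]'
)


def extract():
    """Extract: read both sources and map them to one set of field names."""
    for row in csv.DictReader(io.StringIO(CSV_SOURCE)):
        yield {"email": row["email"], "name": row["name"]}
    for item in json.loads(JSON_SOURCE):
        yield {"email": item["mail"], "name": item["full_name"]}


def transform(records):
    """Transform: normalize emails so the same customer matches across sources."""
    for record in records:
        yield (record["email"].strip().lower(), record["name"].strip())


def load(rows, conn):
    """Load: insert rows, ignoring any email that is already present."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (email TEXT PRIMARY KEY, name TEXT)"
    )
    conn.executemany("INSERT OR IGNORE INTO customers VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
result = conn.execute("SELECT email, name FROM customers ORDER BY email").fetchall()
print(result)
```

Here the customer "Ada" appears in both sources with differently cased emails; normalizing the email in the transform step lets the load step reconcile the two into one row.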

 
 

Impact of Big Data

Recommendation Engines

– Amazon, Netflix, Spotify

– Recommendations based on customer preferences and historical behavior.

Virtual personal assistants

– Siri, Alexa, Google Now

Analysis of user data

– Clickstream analysis

Fraud detection

Intrusion detection

Science

Large Hadron Collider, LHC (Geneva)

Sloan Digital Sky Survey

DNA sequencing

Internet of Things (IoT)

 

Data sources

People-generated data

Machine-generated data

Business-generated data

Data Formats

Structured data

– organized, labeled, strict model

Unstructured data

– no organization (e.g. plain text, images, …)

Semi-structured data

– may have an organized structure, but no strictly-defined model (e.g. JSON, XML, log files, …)
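
The difference between the three formats shows up in how much work is needed to read a value. The sketch below is a rough illustration with made-up sample data: a structured CSV row is read by column name, a semi-structured JSON document needs checks for optional fields, and unstructured text has to be searched with a pattern.

```python
import csv
import io
import json
import re

# Structured: every row follows the same strict model (fixed columns).
structured = "id,name,age\n1,Ada,36\n"
row = next(csv.DictReader(io.StringIO(structured)))
age = int(row["age"])

# Semi-structured: organized, but fields may be nested or missing.
semi = '{"id": 2, "name": "Bob", "tags": ["new"], "address": {"city": "Pune"}}'
doc = json.loads(semi)
city = doc.get("address", {}).get("city")
phone = doc.get("phone")  # not present in this document, so None

# Unstructured: no model at all; information must be extracted from raw text.
unstructured = "Bob called on 2021-03-04 to complain about a late delivery."
dates = re.findall(r"\d{4}-\d{2}-\d{2}", unstructured)

print(age, city, phone, dates)
```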
