What’s Big Data? And what’s so big about it?

At first glance, you might try to guess the definition from the term itself and then think, “Hey, it surely can’t be what it sounds like”.

Actually, it is exactly what it sounds like.

Let’s start with a quick reminder of what “data” is:
Data is synonymous with information. In computer science, data is a representation of information, and it can take many forms and structures (e.g. tables, trees, graphs, etc.).

So what exactly is Big Data?

Like we said, it is exactly what it sounds like. Big Data is a collection of data sets so large and complex that traditional data processing methods, like database management systems and file systems, just can’t cope with them anymore.

How big is it, you might ask?

Well, let’s say your hard drive holds 1 Terabyte (1000 Gigabytes). If you’re one of the majority of people who use a PC just for browsing the internet, playing multimedia, or playing video games, you’ll probably be quite satisfied with how much free space you have, and you’ll pretty much never need to delete anything from it.

Well, Big Data can reach sizes measured in Exabytes, and one Exabyte is 1 billion Gigabytes.
As of 2012, 2.5 exabytes of data were being created every day, a volume that even the most advanced information management systems weren’t designed to handle.
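
To put those numbers side by side, here is a minimal sketch of the arithmetic in Python. It assumes decimal units (1 Gigabyte = 10^9 bytes), and the helper name `drives_needed` is made up for this illustration.

```python
# Decimal storage units, expressed in bytes (an assumption of this sketch).
GB = 10**9
TB = 10**12
EB = 10**18

def drives_needed(total_bytes, drive_bytes=TB):
    """Hypothetical helper: number of drives needed to hold total_bytes."""
    # Ceiling division, so a partly filled drive still counts as one drive.
    return -(-total_bytes // drive_bytes)

# How many Gigabytes fit in one Exabyte?
gigabytes_per_exabyte = EB // GB

# The 2012 figure quoted above: 2.5 exabytes created per day.
daily_bytes = 25 * EB // 10
drives_per_day = drives_needed(daily_bytes)

print(f"1 EB = {gigabytes_per_exabyte:,} GB")
print(f"2.5 EB/day would fill {drives_per_day:,} one-terabyte drives per day")
```

In other words, the daily volume from 2012 would fill about two and a half million of the 1 Terabyte hard drives from the example above, every single day.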

Popular Open Source Tools for Big Data

Thanks to the incredible success of the MapReduce architecture, an implementation of the framework was adopted by Hadoop, an Apache open source project.

1- Big Data Analysis Platforms and Tools:
Hadoop, MapReduce, GridGain, Storm
2- Databases/Data Warehouses:
Cassandra, HBase, MongoDB, CouchDB, Redis
3- Business Intelligence:
Talend
4- Data Mining:
RapidMiner/RapidAnalytics, Mahout, Orange
5- Big Data Search:
Lucene, Solr

“Information is the oil of the 21st century, and analytics is the combustion engine.”

In 2004, Google published a paper describing MapReduce, a parallel processing model in which queries are split, distributed across nodes and processed (Map). The results are then gathered and delivered (Reduce).
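
To make the Map and Reduce steps more concrete, here is a minimal single-machine sketch in plain Python that counts words. It only imitates the idea; it is not Google’s or Hadoop’s implementation, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this illustration. In a real cluster, each chunk would be handled by a different node.

```python
from collections import defaultdict
from itertools import chain

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group all emitted values by key, so each key is reduced once."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(chunks):
    # Each chunk stands in for the share of input given to one node.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = word_count(["big data is big", "data is information"])
print(counts)
```

The point of splitting the work this way is that every call to `map_phase` is independent of the others, so the chunks could be processed in parallel on as many machines as are available before the results are gathered.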
