BIG DATA AND HADOOP
Why Big Data? And what’s so big about it?
At first glance, you might think the term defines itself, and then second-guess yourself: “Hey, it definitely isn’t what it sounds like.”
Actually, it is exactly what it sounds like.
Let’s start with a quick reminder of what “data” is:
In everyday usage, data is treated as a synonym for information. In computer science, data is a representation of information, and it can take many forms and structures (e.g., tables, trees, and graphs).
So what exactly is Big Data?
As we said, it is exactly what it sounds like: Big Data is a collection of data sets so large and complex that traditional data-processing tools, such as database management systems and file systems, can no longer handle them.
How big is it, you might ask?
Suppose your hard drive holds 1 terabyte (1,000 gigabytes). If, like most people, you use your PC mainly for browsing the internet, viewing multimedia, or playing video games, you will probably be very satisfied with how much free space you have, and you will rarely, if ever, need to delete anything to make room.
Big Data, on the other hand, can reach the scale of exabytes, and one exabyte is 1 billion gigabytes.
As of 2012, 2.5 exabytes of data were being created every day, a volume that even the most advanced information management systems were not designed to handle.
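To put these numbers side by side, here is a small Python sketch that converts the 2012 daily figure into the 1 TB hard drives from the example above. It assumes decimal (SI) units as used in the text (1 TB = 1,000 GB, 1 EB = 1,000,000,000 GB). The function name `drives_filled` is purely illustrative.

```python
# Decimal (SI) units, matching the text: 1 TB = 1,000 GB, 1 EB = 1,000,000,000 GB.
GB_PER_TB = 1_000
GB_PER_EB = 1_000_000_000


def drives_filled(volume_eb: float, drive_tb: float = 1) -> float:
    """Hypothetical helper: how many drives of drive_tb terabytes hold volume_eb exabytes."""
    return volume_eb * GB_PER_EB / (drive_tb * GB_PER_TB)


if __name__ == "__main__":
    daily_eb = 2.5  # daily volume quoted for 2012
    print(f"{daily_eb * GB_PER_EB:,.0f} GB per day")                 # 2,500,000,000 GB per day
    print(f"{drives_filled(daily_eb):,.0f} 1 TB drives per day")      # 2,500,000 1 TB drives per day
```

In other words, a single day's worth of data at that rate would fill about two and a half million of the "roomy" 1 TB drives described earlier.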
How to handle Big Data?
In 2004, Google published a paper describing MapReduce, a programming model for processing data in parallel across many nodes. The work (for example, a query) is split up, distributed to the nodes, and processed there (the Map step). The partial results are then gathered and combined into the final answer (the Reduce step).
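The Map and Reduce steps above can be illustrated with the classic word-count task. Below is a minimal, single-machine Python sketch, not the code from Google's paper. Threads stand in for cluster nodes, and all function names are hypothetical. It also makes explicit the "shuffle" step that groups intermediate results by key before they are reduced, which the description above folds into "gathered".

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_phase(split: str) -> List[Tuple[str, int]]:
    """Map: turn one input split into intermediate (word, 1) pairs."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: gather every intermediate value under its key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(splits: List[str]) -> Dict[str, int]:
    # Each split is mapped independently; threads play the role of worker nodes.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, splits))
    intermediate = (pair for split_pairs in mapped for pair in split_pairs)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(map_reduce(["big data is big", "data about data"]))
    # {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```

Because each call to `map_phase` touches only its own split, and each call to `reduce_phase` touches only one key, both phases can be spread over as many machines as are available. That independence is what lets the model scale.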
Popular Open Source Tools for Big Data
Thanks to the success of the MapReduce architecture, an open-source implementation of its framework was built as part of an Apache project named Hadoop.
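Hadoop includes a utility called Hadoop Streaming. It lets the map and reduce steps be any program that reads lines from standard input and writes tab-separated key/value lines to standard output. Hadoop sorts the mapper output by key before it reaches the reducer. Below is a minimal word-count sketch written to that convention. It assumes a hypothetical script name `wc.py` and the default tab separator, and it is an illustration rather than code from the Hadoop project.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' line per word, the text format Streaming expects."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per word; relies on the input being sorted by key, as Hadoop guarantees."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__":
    # Hypothetical invocation: "python wc.py map" or "python wc.py reduce", reading stdin.
    stage = sys.argv[1] if len(sys.argv) > 1 else "map"
    step = mapper if stage == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Outside a cluster, the same data flow can be imitated with a shell pipeline such as `cat input.txt | python wc.py map | sort | python wc.py reduce`, where `sort` plays the role of Hadoop's shuffle-and-sort phase. On a cluster, the two stages would instead be handed to the Streaming jar through its `-mapper` and `-reducer` options.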
1- Big Data Analysis Platforms and Tools
2- Databases/Data Warehouses
3- Business Intelligence
4- Data Mining
5- Big Data Search