What’s Big Data? And what’s so big about it?

1 May 2018


At first glance, you might guess what the term means and then second-guess yourself: “Hey, it definitely isn’t what it sounds like.”

Actually, it is exactly what it sounds like.

Let’s start with a quick reminder of what “data” is:
Data is synonymous with information. In computer science, data is a representation of information, and it can take many forms and structures (e.g., tables, trees, and graphs).

So what exactly is Big Data?

As we said, it is exactly what it sounds like: Big Data is a collection of data sets so large and complex that traditional data-processing methods, such as database management systems and file systems, can no longer cope with them.

How big is it, you might ask?

Well, let’s say your hard drive holds 1 terabyte (1,000 gigabytes). If you are among the majority of people who use a PC just for browsing the internet, playing multimedia, or playing video games, you will probably be satisfied with how much free space you have and may never need to delete anything from it.

Big Data, by contrast, can reach the scale of exabytes; one exabyte is 1 billion gigabytes.
As of 2012, an estimated 2.5 exabytes of data were created every day, a volume that even the most advanced information management systems were not designed to handle.
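
To put those numbers side by side, here is a small Python sketch of the arithmetic. It assumes decimal (SI) units, matching the "1 terabyte = 1,000 gigabytes" convention used above; the names `DAILY_DATA_BYTES` and `drives_needed` are made up for this illustration.

```python
# Decimal (SI) storage units, expressed in bytes.
GIGABYTE = 10**9
TERABYTE = 10**12
EXABYTE = 10**18

# Figure quoted in the article: 2.5 exabytes created per day (as of 2012).
# Written as 25 * EXABYTE // 10 to stay in exact integer arithmetic.
DAILY_DATA_BYTES = 25 * EXABYTE // 10


def drives_needed(total_bytes: int, drive_bytes: int = TERABYTE) -> int:
    """Return how many drives of size drive_bytes are needed to hold total_bytes."""
    # Ceiling division, so a partially filled drive still counts as one drive.
    return -(-total_bytes // drive_bytes)


# Gigabytes in one exabyte.
print(EXABYTE // GIGABYTE)
# Number of 1 TB hard drives that one day of data would fill.
print(drives_needed(DAILY_DATA_BYTES))
```

In other words, under these assumptions, one day of data at the 2012 rate would fill about two and a half million of the 1 TB drives from the example above.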

Popular Open Source Tools for Big Data

Following the success of the MapReduce architecture, an open source implementation of the framework was developed as the Apache project Hadoop.

1- Big Data Analysis Platforms and Tools:
Hadoop, MapReduce, GridGain, Storm
2- Databases/Data Warehouses:
Cassandra, HBase, MongoDB, CouchDB, Redis
3- Business Intelligence:
Talend, Jaspersoft
4- Data Mining:
RapidMiner/RapidAnalytics, Mahout, Orange
5- Big Data Search:
Lucene, Solr

“Information is the oil of the 21st century, and analytics is the combustion engine.”

Peter Sondergaard, Senior Vice President and Global Head of Research at Gartner, Inc.

In 2004, Google published a paper describing a programming model called MapReduce, which processes data in parallel across a cluster of nodes. A job is split into pieces that are distributed to the nodes and processed there (Map); the results are then gathered and combined (Reduce).
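
The Map and Reduce steps can be illustrated with a minimal word-count sketch in plain Python. This is only a single-machine illustration of the idea, not Hadoop's or Google's actual API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are invented for this example, and everything runs sequentially, whereas a real cluster would run each map call on a different node.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values that share a key, as happens between Map and Reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks: List[str]) -> Dict[str, int]:
    # Each chunk stands in for the slice of input one node would receive.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


chunks = ["big data is big", "data is information"]
counts = word_count(chunks)
print(counts)
```

Because each call to `map_phase` depends only on its own chunk, the chunks could be processed on separate machines at the same time, which is what makes the model suitable for very large data sets.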
