Big Data for Dummies

By Cecilia Caselli
Published in Fundamentals
February 11, 2024

Big data can be classified into two main categories:

  1. Structured data
  2. Unstructured data

Structured data have a predetermined structure, which makes them easy to store in databases. Because they are already “structured”, these data sit in fixed “spots” (for example, the columns of a table), which makes them easier to store, query, and analyze.

By contrast, unstructured data have no predetermined structure, which is why they are stored in their original form rather than split into fields.
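To make the distinction concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, columns, and sample values are hypothetical, chosen only for illustration: the structured orders fit fixed columns and can be aggregated with one query, while the free-text comment is stored as it is and can only be matched as raw text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Structured data (hypothetical sample): every record has the same fields,
# so it fits a table with a fixed schema, and each value lives in a known column.
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Alice", 120.0), ("Bob", 35.5), ("Alice", 60.0)],
)
# Aggregating over a known column takes a single query.
totals = dict(
    conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
)

# Unstructured data (hypothetical sample): a free-text comment is stored
# as it is, in a single text column.
conn.execute("CREATE TABLE comments (body TEXT)")
conn.execute(
    "INSERT INTO comments VALUES (?)",
    ("Great service, but delivery took longer than expected.",),
)
# Without further processing, the database can only match raw text,
# not answer questions such as "was the customer satisfied?".
matches = conn.execute(
    "SELECT COUNT(*) FROM comments WHERE body LIKE ?", ("%delivery%",)
).fetchone()[0]

print(totals)   # {'Alice': 180.0, 'Bob': 35.5}
print(matches)  # 1
```

Answering qualitative questions about the comment would require an additional text-analysis step on top of the database.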

It is often estimated that around 80% of today’s data flows are unstructured, in the form of text, audio, video, or images, but also web pages, comments, and surveys.

Take a YouTube video as an example. It comes with some metadata, such as the time and date of upload, the number of views (partial or complete), and the number of likes and dislikes. The title, the description, and the video itself, however, are not structured: they have a qualitative aspect that numbers alone cannot capture.
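The YouTube example can be sketched as a Python record that keeps the two kinds of data apart. The `Video` class, its field names, and the sample values are hypothetical and do not reflect YouTube's actual data model or API; the point is only that questions about the metadata reduce to comparisons and arithmetic, while questions about the content have no field to look up.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Video:
    # Hypothetical structured metadata: fixed fields with well-defined types.
    uploaded_at: datetime
    views: int
    likes: int
    dislikes: int
    # Unstructured content: free text with no predefined fields.
    # (The video stream itself would be unstructured too; it is omitted here.)
    title: str
    description: str


videos = [
    Video(datetime(2024, 1, 5, 14, 30), 12_000, 900, 30,
          "Sourdough for beginners", "My first loaf, step by step!"),
    Video(datetime(2024, 2, 1, 9, 0), 3_500, 120, 40,
          "Why my bread failed", "Lessons learned after three flat loaves."),
]

# Questions about the metadata are simple comparisons and arithmetic.
popular = [v.title for v in videos if v.views > 10_000]
like_ratio = {v.title: round(v.likes / (v.likes + v.dislikes), 2) for v in videos}

# A question such as "is this video a tutorial?" has no field to look up:
# answering it requires analyzing the title and description text.
print(popular)     # ['Sourdough for beginners']
print(like_ratio)  # {'Sourdough for beginners': 0.97, 'Why my bread failed': 0.75}
```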

Cloud computing

Storing and processing all these flows of data usually requires the cloud. But what is the cloud, and why is it important?

Let’s start with the basics. The cloud can be thought of as an “ecosystem” of interconnected remote servers. We often call it cloud computing because it offers a range of computing services. Users do not need to install any special software to access their data: a good internet connection is enough.

Cloud computing relies on a technology known as virtualization, which lets a single physical server run several isolated virtual machines, and it offers three types of services:

  1. IaaS (Infrastructure as a Service): the company outsources its IT infrastructure. IaaS provides all the necessary hardware resources, such as servers, RAM, storage, and networking. As a result, the company has to neither buy physical servers nor maintain the IT infrastructure, which is handled by the cloud provider.

    Aimed at: network architects, IT administrators

    Examples: Google Compute Engine, Amazon EC2 (AWS), Oracle Cloud Infrastructure

  2. PaaS (Platform as a Service): the cloud provider offers a platform with all the hardware and software tools needed to build an application. Hence, software developers “only” have to develop the cloud application itself.

    Aimed at: software developers

    Examples: Google App Engine, Microsoft Azure App Service

  3. SaaS (Software as a Service): the most popular of the three, because the software is ready to use on the web: the provider runs the entire application, and users simply access it.

    Aimed at: end users

    Examples: Salesforce, Slack, Mailchimp, Gmail, Trello, Office 365
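One way to compare the three models is to list which layers of the IT stack each one hands over to the provider. The sketch below assumes a common, simplified nine-layer breakdown; real offerings draw these lines differently, and the layer names, counts, and helper functions are illustrative, not a standard API. The hypothetical `most_outsourced_model` helper picks the model that outsources the most layers while keeping a given set of layers in house.

```python
# Assumption: a common, simplified view of the IT stack, from the bottom up.
# Real providers draw the boundaries between layers differently.
LAYERS = [
    "networking", "storage", "servers", "virtualization",
    "operating system", "middleware", "runtime",
    "data", "applications",
]

# How many layers, counted from the bottom, each model hands to the provider.
PROVIDER_MANAGED = {"on-premises": 0, "IaaS": 4, "PaaS": 7, "SaaS": 9}


def split_responsibility(model: str) -> tuple[list[str], list[str]]:
    """Return (layers managed by the provider, layers managed by the company)."""
    n = PROVIDER_MANAGED[model]
    return LAYERS[:n], LAYERS[n:]


def most_outsourced_model(keep_in_house: set[str]) -> str:
    """Pick the model that outsources the most layers while leaving every
    layer in `keep_in_house` under the company's control."""
    candidates = [
        model for model, n in PROVIDER_MANAGED.items()
        if not keep_in_house & set(LAYERS[:n])
    ]
    return max(candidates, key=PROVIDER_MANAGED.__getitem__)


provider, company = split_responsibility("PaaS")
print(company)                                      # ['data', 'applications']
print(most_outsourced_model({"operating system"}))  # IaaS
print(most_outsourced_model(set()))                 # SaaS
```

Since, in this simplified view, each model outsources every layer of the previous one plus a few more, choosing a model amounts to finding the highest one that does not take over a layer the company wants to keep.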

Common questions

  • Which cloud model should we choose? It depends. When moving from an on-premises solution to the cloud, a company should evaluate how many “layers” of its IT it can, or wants to, outsource.
  • What’s the main difference between the cloud model and the traditional one? The Internet has always been made of servers, clients, and the infrastructure connecting them. In the traditional model, clients send requests to servers, and the servers send back responses. In the cloud computing model, cloud servers do much more than this, from running programs to storing data on behalf of users.
  • What advantages does cloud computing bring to users? Primarily, access from any device anywhere in the world and, often, a higher level of security.
  • And to businesses? The most evident ones are lower maintenance costs, better scalability, and increased data security.
  • How do companies use cloud computing? Common uses include HR processes, expense management, accounting software, project management, online meetings, and CRM (customer relationship management).

A brief history of cloud computing

In 1961, computer scientist John McCarthy gave a speech at MIT suggesting that computing could one day be sold as a utility, like water and electricity.

In the following years, American computer scientist J.C.R. Licklider, who is often regarded as the father of cloud computing, imagined a world where people could connect regardless of their location. His vision helped lead to the creation of ARPANET (Advanced Research Projects Agency Network), which allowed physically distant computers to share digital resources.

The term “cloud computing” is credited to Professor Ramnath Chellappa, who used it in 1997 in a talk on a “new computing paradigm”.

Salesforce is often cited as the first global company to offer cloud computing services. Since then, demand for cloud computing has grown enormously, and large organizations such as Amazon, Microsoft, and Google all provide cloud computing services.
