How Google Big Table Works Part 2

Google File System

The Big Table applications 

This article continues our previous one, in which we discussed the theoretical and technical aspects of Big Table: what it is, how it functions, what it comprises, and more. If you are not familiar with Big Table, please read the previous article first. In this article, we discuss well-known applications that have been built on Big Table. As mentioned, Big Table is a Google technology, so Google’s own applications were the first to be built on it. Most of you probably use these applications already, so don’t be surprised. Let us have a look at them!

Application Number 1: Google Analytics 

Google Analytics is a well-known application for analyzing web-traffic patterns. It matters a great deal in social media marketing and other forms of online marketing, and it is widely used by webmasters and others. Google Analytics reports statistics such as the number of visitors, the number of page views per URL, site-tracking reports, the percentage of users who made a purchase, and the number of visits from a particular region or country. To enable the service, webmasters usually embed a small JavaScript program in their web pages. Tracking can also be set up by other means. The program is invoked every time someone visits the page. It records information about the request, such as a user identifier and details of the page being fetched. Google Analytics then summarizes this data and makes it available to the webmasters.

Google Analytics uses two tables: the raw click table and the summary table. The raw click table maintains a row for each end-user session. Each row name includes the website’s name and the time at which the session was created. This naming scheme keeps sessions that visit the same website contiguous and sorted in chronological order. The table compresses to 14% of its original size. The summary table holds various predefined summaries for each website. It is generated from the raw click table by periodically scheduled MapReduce jobs. So what is MapReduce? We will discuss it in the next article. The overall throughput of the system is limited by the throughput of GFS. The summary table compresses to 29% of its original size.
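The row-naming idea can be sketched in a few lines of Python. This is only an illustration, not Google’s implementation: the `site#timestamp` key format, the helper names, and the sample sessions are all assumptions. The only thing taken from the description above is that a row name combines the website’s name with the session time, so that rows for one website sit together in time order.

```python
from collections import Counter
from datetime import datetime, timezone


def raw_click_row_key(site: str, session_start: datetime) -> str:
    """Build a hypothetical row key: website name, then session start time.

    A fixed-width UTC timestamp makes plain string order match
    chronological order within one website.
    """
    stamp = session_start.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{site}#{stamp}"


def summarize_sessions(row_keys):
    """Count sessions per website, a stand-in for one summary-table entry."""
    return Counter(key.split("#", 1)[0] for key in row_keys)


# Made-up sessions, deliberately out of order.
sessions = [
    ("example.org", datetime(2020, 8, 15, 9, 30, tzinfo=timezone.utc)),
    ("example.com", datetime(2020, 8, 15, 12, 0, tzinfo=timezone.utc)),
    ("example.com", datetime(2020, 8, 14, 8, 0, tzinfo=timezone.utc)),
]

# Sorting the keys as strings mimics rows being kept in order by row name.
sorted_keys = sorted(raw_click_row_key(site, ts) for site, ts in sessions)
session_counts = summarize_sessions(sorted_keys)
```

After sorting, both `example.com` sessions are adjacent and in time order, which is what lets a periodic summarization job read one website’s recent sessions as a single contiguous range.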

Application Number 2: Google Earth

Most of us know this application because we use it quite often. Google Earth gives users access to high-resolution satellite imagery of the world’s surface, both through the web-based Google Maps interface and through the custom Google Earth client software. Users can pan, view, and annotate satellite imagery at many different levels of resolution. To make this possible, one table is dedicated to preprocessing the data, and a different set of tables serves client requests. The preprocessing pipeline stores raw imagery in one table, and during preprocessing the imagery is cleaned and consolidated into the final serving data. Because this is an image-based application, the table holds approximately 70 terabytes of data, which is served from disk. The images are already efficiently compressed, so Big Table compression is not needed.

Each row in the imagery table corresponds to a single geographic segment. Rows are named so that adjacent geographic segments are stored near each other. The table also has a column family that keeps track of the data sources for each segment. This column family has a large number of columns, essentially one for each raw data image. Since each segment is built from only a few images, the column family is very sparse. MapReduce plays a big role here: the preprocessing pipeline relies heavily on it to transform data. During some of these MapReduce jobs, the system processes over 1 MB/sec of data per tablet server. The serving system uses one table to index data stored in GFS. This table is relatively small, approximately 500 GB, but it must serve tens of thousands of queries per second per datacenter with low latency. Hence, it is hosted across hundreds of tablet servers and uses in-memory column families.
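To make the row naming and the sparse column family more concrete, here is a small Python sketch. The article does not say how Google actually names the rows, so the Z-order (Morton) key used below is purely an assumption, chosen because it is a common way to give nearby map tiles nearby keys. The function names, the `sources` family name, and the image ids are hypothetical as well.

```python
def morton_key(x: int, y: int, bits: int = 16) -> str:
    """Interleave the bits of tile coordinates x and y (Z-order curve).

    Tiles that are close on the map tend to get nearby keys. This is one
    possible naming scheme, assumed here for illustration only.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    # Zero-padded hex keeps string order identical to numeric order.
    width = (2 * bits + 3) // 4
    return f"{code:0{width}x}"


# imagery_table[row_key]["sources"][image_id] -> True
# Only images that contributed to a segment get a column, so the
# "sources" family stays sparse however many raw images exist overall.
imagery_table = {}


def add_source(x: int, y: int, image_id: str) -> None:
    """Record that a raw image contributed to the segment at (x, y)."""
    row = imagery_table.setdefault(morton_key(x, y), {"sources": {}})
    row["sources"][image_id] = True


add_source(0, 0, "img_0017")
add_source(0, 0, "img_0042")
add_source(1, 0, "img_0042")
```

Each row ends up with only one or two columns in its `sources` family, even though the family as a whole could have a column for every raw image.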

Application Number 3: Google My Activity

This application is an optional service that records user queries and clicks across numerous Google properties such as web search, images, news, videos, and maps. Users can browse their history to revisit old queries and clicks, and they can ask for results personalized according to their past Google usage patterns. The user data is stored in Big Table. Each user has a unique user id and is assigned a row named by that id. All user actions are stored in one table, with a separate column family reserved for each type of action. For example, one column family stores all web queries. Each data element uses as its Big Table timestamp the time at which the corresponding user action occurred. User profiles are generated by running MapReduce over Big Table. The My Activity data is replicated across several Big Table clusters, which increases availability and reduces latency for clients. Initially, a client-side replication mechanism was built, which ensured eventual consistency of all the replicas. The current system uses a replication subsystem that is built into the servers.
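The layout described above (one row per user, one column family per type of action, cell timestamp equal to the time of the action) can be mimicked with nested dictionaries. This is a minimal sketch for intuition, not the real storage engine. The function names, the family names `web_query` and `image_click`, and the sample data are assumptions.

```python
from collections import defaultdict


def new_table():
    """table[row_key][column_family][column] -> list of (timestamp, value)."""
    return defaultdict(lambda: defaultdict(lambda: defaultdict(list)))


def record_action(table, user_id, action_type, column, value, action_time):
    """Store one user action; the cell timestamp is the time of the action."""
    cells = table[user_id][action_type][column]
    cells.append((action_time, value))
    # Keep versions newest first, so recent activity is read first.
    cells.sort(key=lambda cell: cell[0], reverse=True)


def latest_values(table, user_id, action_type, column, limit=1):
    """Return up to `limit` most recent values for one column."""
    return [value for _, value in table[user_id][action_type][column][:limit]]


table = new_table()
record_action(table, "user-123", "web_query", "query", "bigtable paper", 1597480000)
record_action(table, "user-123", "web_query", "query", "mapreduce", 1597483600)
record_action(table, "user-123", "image_click", "url", "https://example.com/a.png", 1597481000)

recent = latest_values(table, "user-123", "web_query", "query", limit=2)
```

Because every action type lives in its own column family, a job that only needs web queries can read that family and skip the rest of the row.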

The design of the My Activity storage system allows other groups to add new per-user information in their own columns, so the system can be used by other Google properties that need to store per-user data such as configuration options and settings. This is how we get customized settings. Sharing one table among many groups results in a large number of column families. To make sharing easier, a simple quota mechanism was added to Big Table. This quota system limits the storage that any particular client can consume in a shared table. It also provides some isolation between the product groups that use the system for per-user information storage.
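A storage quota of this kind can be pictured as a per-client byte counter that is checked before every write. The sketch below is a guess at the general shape only; the article gives no details of the real mechanism. The class name, method names, client names, and limits are all hypothetical.

```python
class QuotaExceeded(Exception):
    """Raised when a write would push a client past its storage limit."""


class SharedTableQuota:
    """Track bytes stored per client in one shared table (illustrative only)."""

    def __init__(self, limits):
        self._limits = dict(limits)
        self._used = {client: 0 for client in self._limits}

    def charge(self, client: str, num_bytes: int) -> None:
        """Account for a write, or reject it if it would exceed the limit."""
        if client not in self._limits:
            raise KeyError(f"unknown client: {client}")
        if self._used[client] + num_bytes > self._limits[client]:
            raise QuotaExceeded(f"{client} would exceed {self._limits[client]} bytes")
        self._used[client] += num_bytes

    def used(self, client: str) -> int:
        return self._used[client]


quota = SharedTableQuota({"search-settings": 1000, "news-settings": 500})
quota.charge("search-settings", 800)

try:
    quota.charge("search-settings", 300)  # 800 + 300 > 1000, so rejected
    rejected = False
except QuotaExceeded:
    rejected = True

quota.charge("news-settings", 500)  # other clients are unaffected
```

The rejected write leaves the first client’s usage unchanged, and one group filling its quota does not stop another group from writing, which is the isolation the quota system is meant to provide.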

Have you ever thought, “Is Big Table only limited to Google?” or “Is it available for everyone?”

Initially, Big Table was available only inside Google. But as we all know, Google tries to make day-to-day life simpler. Google used Big Table in many of its applications and got promising results. The good news is that on May 6, 2015, Google launched a public version of Big Table under the name Cloud Bigtable. A few other related services now exist as well. One of them is Google Cloud Datastore, which is part of the Google Cloud Platform.

Conclusion

Discussing real-world applications is fun because we can connect them directly to our own experience, especially when the applications are widely used. In this article, we discussed three major Google applications that use Big Table. Since Big Table is Google’s in-house technology, it is natural that Google implemented it first in its own applications. The first application is Google Analytics, which webmasters use widely for a precise analysis of their web traffic. Here, we saw how the raw click table and the summary table each play an important role. The second application is Google Earth. Most of us know it and have tried to see how our home looks from above, right? If you haven’t, try it right now! Here, we saw how the raw images stored in one table are preprocessed into the imagery served to us, giving us a real feel of watching the Earth from above.

After this, we discussed the third application, Google My Activity. It gives users a personalized experience: your searches, your preferences, and your recent activity are all analyzed by Google to make your searches easier. We concluded with the good news that Big Table is now publicly available as Cloud Bigtable, part of the Google Cloud Platform. With such a well-tested technology in hand, miracles are meant to happen. So let us make good use of it and come up with something helpful to us and to society. Good luck!

Here is the link to the previous article from this series.

Tao
Tao is a passionate software engineer who works at a leading big data analysis company in Silicon Valley. Previously, Tao worked at large IT companies such as IBM and Cisco. Tao has an MS degree in Computer Science from McGill University and many years of experience as a teaching assistant for various computer science classes.
