We've all been there. You start watching a video on YouTube. Before you realize it, it's 1 a.m., and you are watching videos about Greek philosophers and their influence on the modern world.
This is known as the "YouTube rabbit hole": the process of watching YouTube videos nonstop. Most of these videos are presented by YouTube's recommendation algorithm, which determines what to suggest you watch based on your and other users' watch histories.
TikTok, Netflix, Twitter, Facebook, Instagram, Snapchat, and all services that present content have an underlying algorithm that distributes and determines the material presented to users. This is what drives YouTube's rabbit hole.
For TikTok, an investigation by the Wall Street Journal found that the app needs only one important piece of information to figure out what a user wants: the total amount of time a user lingers on a piece of content.* Through that powerful signal, TikTok can learn people's interests and drive users into rabbit holes of content. YouTube's and TikTok's algorithms are both engagement-based, but according to Guillaume Chaslot, TikTok's algorithms learn much faster.*
These services drive engagement by recommending content that users are likely to watch, but Netflix went a step further, personalizing the thumbnail images of its shows to increase the click-through rate and total watch time. Netflix figured out that the thumbnail image most likely to attract a click depends on the types of movies that person likes to watch. For example, if a user watches a lot of romance movies, the thumbnail should show an image of a romantic scene.
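To make the idea concrete, here is a toy sketch of genre-matched thumbnail selection. It is not Netflix's actual system, and the show names, genres, and file names are all made up; it simply picks the thumbnail variant that matches the genre the user watches most often.

```python
from collections import Counter

# Hypothetical data: each title offers several thumbnail variants tagged by genre.
THUMBNAILS = {
    "good_will_hunting": {"romance": "kiss_scene.jpg", "comedy": "robin_williams.jpg"},
}

def pick_thumbnail(show_id, watch_history):
    """Choose the thumbnail variant that best matches the user's favorite genres."""
    genre_counts = Counter(genre for _, genre in watch_history)
    variants = THUMBNAILS[show_id]
    for genre, _ in genre_counts.most_common():
        if genre in variants:
            return variants[genre]
    return next(iter(variants.values()))   # fall back to any available variant

history = [("the_notebook", "romance"), ("titanic", "romance"), ("superbad", "comedy")]
print(pick_thumbnail("good_will_hunting", history))   # -> kiss_scene.jpg
```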
Let's dive into one of these recommendation systems. We'll look at YouTube's system, as it has been discussed publicly. Others' systems work similarly.
The YouTube recommendation system works in two different stages. The first is for candidate generation, which selects videos that are possible options to be presented to users. The second stage is for ranking, which determines which videos are at the top and which are at the bottom of users' feeds.*
Candidate generation takes a user's YouTube watch history as input. The ranking network operates a little differently: it assigns a score to each video using a rich set of features describing the video and the user. Let's go over both stages.
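The sketch below illustrates the two-stage structure in simplified form. It is not YouTube's code; it assumes we already have an embedding vector for every video and a separate score from a richer ranking model, and it only shows how the stages fit together.

```python
import numpy as np

def generate_candidates(user_vector, video_embeddings, k=200):
    """Stage 1: pick the k videos whose embeddings are most similar to the user's vector."""
    similarities = video_embeddings @ user_vector      # dot-product similarity
    return np.argsort(-similarities)[:k]               # indices of the top-k videos

def rank_feed(candidate_ids, ranking_scores):
    """Stage 2: order the candidates by the score a richer ranking model gave them."""
    return sorted(candidate_ids, key=lambda vid: -ranking_scores[vid])
```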
The first stage's model is inspired by the architecture of a continuous bag of words language model.* The continuous bag of words model is a way of representing words as data points. It tries to predict the current target word (the center word) based on the source context words (the surrounding words). That means it uses only a small window of context around the target word to represent it.
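Here is a toy illustration of the continuous bag of words setup, assuming a small window of two words on each side: the surrounding context words are what the model would use to predict the center word.

```python
# Toy example: extract the context words around a target word.
sentence = ["the", "cat", "sat", "on", "the", "mat"]
window = 2
target_index = 2                      # target word: "sat"

context = [sentence[i] for i in range(len(sentence))
           if i != target_index and abs(i - target_index) <= window]
print(context)    # ['the', 'cat', 'on', 'the']  -> used to predict "sat"
```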
The model generates a representation of each video called an embedding. The neural network is then fed these embeddings, which are learned for every video in a fixed vocabulary.
Data about each user's viewing history is transformed into variable-length sequences of video IDs and mapped into a dense vector representation. With that, YouTube's algorithm uses training data of past videos and their watch times to train its neural network to predict the expected viewing time for other videos.
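A minimal sketch of that mapping follows. The vocabulary, embedding size, and video IDs are made up; the point is that every video ID in a fixed vocabulary has a learned embedding, and a variable-length watch history is collapsed into one fixed-length dense vector (here, by averaging) that the network can map to a prediction such as expected watch time.

```python
import numpy as np

VOCAB = {"video_a": 0, "video_b": 1, "video_c": 2}      # fixed vocabulary of video IDs
EMBED_DIM = 4
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(VOCAB), EMBED_DIM))   # learned during training

def history_to_vector(watched_ids):
    """Average the embeddings of watched videos into a single dense vector."""
    return np.mean([embeddings[VOCAB[v]] for v in watched_ids], axis=0)

print(history_to_vector(["video_a", "video_c", "video_c"]).shape)  # (4,), however long the history
```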
Models are typically biased toward the past because they make predictions based on historical data. But recent, relevant content is vital to YouTube as a platform, as it helps keep users engaged and up to date. To correct for this, YouTube uses the age of each training example as a feature and optimizes for it, so that more recent videos are more likely to show up as candidates and at the top of the list.
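Here is a minimal sketch of that "example age" idea under stated assumptions: the age of each training example is fed to the model as a feature so the network can learn that freshness matters, and at serving time the feature can be set to zero to nudge fresh uploads upward. The function and field names are illustrative, not YouTube's actual schema.

```python
from datetime import datetime, timezone

def example_age_days(upload_time, training_time):
    """Age of a training example in days, used as an input feature to the model."""
    return (training_time - upload_time).total_seconds() / 86400

upload = datetime(2016, 5, 1, tzinfo=timezone.utc)
trained = datetime(2016, 6, 1, tzinfo=timezone.utc)
print(example_age_days(upload, trained))   # 31.0 during training; set near 0 at serving time
```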
The second part of the recommendation system involves ranking videos. In order to recommend quality content, YouTube needs a way to determine which content users are watching and enjoying.
The authors observed that previous interactions with a particular video or ones similar to it were very important when predicting recommendations. This is intuitive because if a viewer enjoys particular types of content, they are likely to view many videos in that niche. They also noticed that videos coming from particular channels were also very important in deciding what to recommend next. So they used these features and others for the neural network to predict the ranking of a video.
Videos that retain the viewer's attention are usually regarded as higher quality. In order to recommend quality videos, the model is trained so that it can predict how long a viewer will watch a video. This aspect also plays into how the algorithm ranks the videos.
With all of that, the team trained a neural network that takes inputs like the video ID, the watched video IDs, the video language, the user language, time since last watch, number of previous impressions, and other features to predict the expected watch time. With these recommendations, YouTube's click-through rate and the total time spent per user increased.
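The sketch below shows the shape of that ranking step under stated assumptions. The feature names are illustrative rather than YouTube's schema, and a simple linear score stands in for the trained network; the point is that per-video and per-user features are turned into a predicted watch time used to sort the feed.

```python
import numpy as np

def featurize(video, user):
    """Build a small feature vector from a candidate video and the user."""
    return np.array([
        1.0 if video["language"] == user["language"] else 0.0,  # language match
        user["hours_since_last_watch"],
        video["previous_impressions"],
    ])

def predicted_watch_time(video, user, weights, bias):
    """Stand-in for the trained ranking network: a linear score, clipped at zero."""
    return max(0.0, float(featurize(video, user) @ weights + bias))

weights, bias = np.array([2.0, -0.05, -0.1]), 3.0
video = {"language": "en", "previous_impressions": 4}
user = {"language": "en", "hours_since_last_watch": 6.0}
print(predicted_watch_time(video, user, weights, bias))   # ~4.3, e.g. predicted minutes
```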
YouTube's algorithm is based on neural networks that aim to maximize engagement. Time spent watching might be a good proxy for whether the user is enjoying those videos, but there is little understanding of exactly what these neural networks are optimizing.
There is a risk that because these algorithms serve such a large percentage of the views, they can be controlled by a small group of people. For example, most of the social media platforms in China do not allow Chinese citizens to post images of Winnie the Pooh because the character is said to resemble the Chinese dictator, Xi Jinping.*
In the next section, I go over how researchers are trying to understand what these neural networks are doing under the hood.
"By the help of microscopes, there is nothing so small, as to escape our inquiry; hence there is a new visible world discovered to the understanding."
Robert Hooke*
Mary spent the whole morning on TikTok watching videos about how lamps work. Her TikTok feed is mostly that and cute videos of dogs. Like many people who use TikTok or other social media apps, she never noticed that most of her feed is determined by algorithms that tell her what to watch next.
This isn't a problem when she is watching videos of dogs, but one day she was browsing around and started watching depressing videos, and the algorithm just reinforced that.