Search engines today incorporate personalization: algorithms that take into account your search history, what you click on, and your geographical location. Personalization is supposed to enhance your search by showing things that you’re interested in, but what does this mean for politics? What if the search engine starts pushing liberal news because that is what you like, even if these articles are not representative of the issue? In this article, I want to convince readers that search engines have a problem with bias, explain how search engines work today and how that can cause harm, and show readers that they can express their dissatisfaction by choosing other options.
To illustrate the problem, here are the results of a private-browser search on one of the Virginia candidates for governor in the 2017 election season. By doing searches in a private browser, I prevent the engines from having any personal profile data or cookies saved about me. I have tried to make myself as unidentifiable as I can to all four search engines:
We have four different search engine results, showing just the top 3 or 4 results for each page. It is true that different search engines have their own algorithms for ranking results, but two of these four show some sort of political bias. Specifically, let’s focus on the first result returned in the top left and bottom right. Top left: “Official Ralph Northam Website - Latest Northam News and Facts.” Bottom right: “Ralph Northam - Who is Ralph Northam? - Learn More.” They seemed to fit my information need perfectly - I wanted to know more about Ralph Northam - so I clicked away. But the click brings you to a website that states “Northam’s Ideas - Wrong for Virginia” and goes on to list a few policy areas, each with an incriminating paragraph that paints Northam as a bad candidate for governor. We have two immediate problems: first, the advertisement makes it to the top of the list in two search engines while looking a lot like a normal search result, and second, the page clearly misrepresents itself as a legitimate source of information about a candidate by using header terms like “Official…Website” and “Learn More.”
The actual official site for this candidate is the first or second listing in all the results but the bottom right, where it does not even make the cropped image (it is right under the “News” section). The other two search engines do not list the biased website in the top 5 results. Going clockwise from the top left, we have Google (U.S.), DuckDuckGo (U.S.), Bing (U.S.), and Yandex (Russia) with the yellow search bar. Google and Bing’s aggregate global market share in the search engine space is ~86%, and both engines pushed a biased result to the top of the list ahead of arguably less biased results like the candidate’s own website. DuckDuckGo explicitly states that it is personalization-free and does not track user data, in order to uphold user privacy. Yandex is popular in Russia and does store user profile data.
Since you are likely reading this digitally, feel free to take a break and try out these search engines. Try queries that you think are political in nature (“donald trump”, “gun violence”, etc.) and see whether and how the results differ.
Let’s get into the weeds of how search engines work at a high level. As an analogy, think of a search engine as a library. It has an indexer that does the job of the librarians, who read through new books and catalogue them, sorting them by topic and filing them under the Dewey decimal system. The indexer goes through web pages, figures out what their content is, and stores all of this information away. Then, the search engine has a ranker that, given the information on the pages and the search query, puts the pages in order by some measure. Search engines typically use relevance to do this: when you type in “how to change a tire,” they want to show the best self-help articles from top to bottom because they think you will read the results in order and that you will be satisfied by a relevant search result. Very little is published about the specifics of how modern search engines rank their results because those details keep them competitive. But something that is still at the core of Google is the PageRank algorithm, published by the founders in a paper and explained more carefully by Ian Rogers. As a simplification, PageRank ranks a page by how important other pages think it is. For example, many pages might link to whitehouse.gov, which increases its ranking. In this article, I link to a ton of other articles - this does not help my article’s ranking. But if a ton of other articles link to me, that signals that lots of other sites think my page is worth traveling to, and my ranking goes up.
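To make the PageRank idea concrete, here is a minimal sketch in Python. It is a toy version of the commonly described normalized form of the algorithm, not Google’s production ranking. The page names, the damping factor of 0.85, and the fixed number of iterations are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it links to."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal importance

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its importance among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its importance evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical mini-web: three pages link to whitehouse.gov, nobody links to my article.
links = {
    "my-article": ["whitehouse.gov", "news-site"],
    "news-site": ["whitehouse.gov"],
    "blog": ["whitehouse.gov", "news-site"],
    "whitehouse.gov": [],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

In this toy web, whitehouse.gov ends up on top because three pages link to it, while my article, which links out generously but receives no links, stays at the bottom alongside the blog.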
So what’s wrong with how search engines work? In discussing the ethics of their search algorithm in 2006, Peter Norvig, then the director of search quality and research at Google, said that “You can't buy your way into the search results. You can buy an ad and be shown on the side of the page, but you can't be shown in the regular search results.” As we can see, this statement no longer holds. Google used to have the advertisements on the side. Then the ads were listed with the ranked results and marked in yellow, likely because advertisers were complaining that users would ignore the side results entirely. Now, the advertisements blend right in with the ranked results.
The second thing that is hurting the diversity of search results is the personalization algorithm. This is where engines like Google and Bing remember what you have searched before, your location, and other personal data in an effort to improve your future searches. Say you keep searching “CAT” and you mean “Charlottesville Area Transit”, not the animal. After you search “the corner uva” and “CAT uva” a few times, you’ll see the results having to do with Charlottesville and UVa start bubbling to the top. If you’ve been searching for a while, it’s likely “CAT” will already bubble up a local result. What you search probably does not look like what I search. If I have been reading a lot of conservative news or my searches seem more conservative, Bing may pull up more results from conservative outlets.
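As a rough illustration of that bubbling-up effect, here is a toy re-ranker in Python. Since the engines do not publish how they personalize, everything here is an assumption made for the sake of the example: the `personalize` function, the base scores, the topic tags, and the boost value are all hypothetical.

```python
def personalize(results, history, boost=0.5):
    """Re-rank results, boosting those whose topics match words from past queries.

    results: list of (title, base_score, topics) tuples (all values hypothetical)
    history: list of the user's past query strings
    """
    # Every word the user has searched before counts as an "interest".
    seen_terms = {term.lower() for query in history for term in query.split()}
    rescored = []
    for title, base_score, topics in results:
        overlap = len(seen_terms & {topic.lower() for topic in topics})
        rescored.append((title, base_score + boost * overlap))
    # Highest score first; ties keep their original order.
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Hypothetical results for the query "cat", with made-up relevance scores.
results = [
    ("Cat - Wikipedia", 1.0, {"animal", "pet"}),
    ("Caterpillar Inc.", 0.8, {"machinery"}),
    ("Charlottesville Area Transit", 0.6, {"charlottesville", "uva", "bus"}),
]

anonymous = personalize(results, history=[])
returning = personalize(results, history=["the corner uva", "CAT uva"])

print("No history:  ", [title for title, _ in anonymous])
print("With history:", [title for title, _ in returning])
```

With no history, the animal comes first. With a couple of UVa-related searches on record, the transit result moves to the top, even though the query itself is identical.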
Try it for yourself: close all your private browser windows so that their cookies are deleted (cookies are small pieces of data that websites store in your browser and can use to track your behavior), then open a fresh private window, or use InPrivate mode on Edge. Then pick queries that might give you different results with personalization (for example, I tried “cat” and got Charlottesville results) and compare what you see when you’re logged in or browsing normally versus browsing privately.
Besides allowing organizations with agendas and money to influence what users see on search results pages, search engines are also moving towards personalized search as the Next Big Thing that will galvanize business for them. That means more users will search “election debate summary” without realizing that they are getting totally different results from their neighbors. What’s wrong with this? Shouldn’t I want to see things germane to my interests? Usually, we are searching for innocuous things like “bean bag chair cheap.” How can that possibly get political? But when we search for things, we are looking for truth. For many of its users, Google is an arbiter of truths. Google will sift through the 4.6 billion pages (5% of the World Wide Web) that it has stored for us and figure out how to answer our query. When we search “quadratic formula” and Google returns that the formula is such-and-such, we assume Google’s answer to be true. Usually, it is. But it is dangerous when it isn’t.
What can we do instead? First, we can demand more from our search engines. We do not like monopolies in business because they drive up prices and stamp out innovation. Why should we accept the same in search? Google is not the only search platform out there - the other two search result pages did pretty well, placing Northam’s official website in the top two results. The first, Yandex, is a Russian-based search engine with a huge share of the Russian market and is heavily influenced by the Russian government, a fact that should make it a less appealing choice for American consumers. The second, DuckDuckGo, places a huge emphasis on user privacy, claiming to keep zero tracking data and to use a ranking algorithm that aggregates results from different search engines to come up with the best results. Baidu (in the collage at the top) is a Chinese search engine, which, for reasons similar to those for Yandex, may be less appealing for American consumers. Yahoo! search is powered by Bing, so we can consider the search results from that platform as Bing results. Consider switching your default search page to a different engine and using it for a month or so - I’ve been Binging it since early September and do not feel a significant difference in search results.
Growing up in the digital age, it is important to get smarter about how technology shapes us, especially when it does so in this low-visibility way. I believe that search engines can be a public good that brings information and knowledge to all kinds of places, but the way that search is done today continues to inflate bubbles of political bias where people are overexposed to their own views and fail to recognize it. Instead of letting the search engine tell you what to think, remember that a search engine is just another tool - be critical about it and diversify where you get your information from.