December 30, 2024

Westside People

Complete News World

Google mistakenly published internal Search documentation on GitHub

Large Google logo at a trade show.

Getty Images | Alexander Corner

Google apparently published a large set of internal technical documents to GitHub by accident, and they partly explain how the search engine ranks web pages. For most of us, the question regarding search rankings is just “Are my web results good or bad,” but the SEO community is both thrilled to peek behind the curtain and up in arms, since the docs seem to contradict some of what Google has told them in the past. Most of the commentary on the leak comes from SEO experts Rand Fishkin and Mike King.

Google confirmed the authenticity of the documents to The Verge, saying, “We caution against making inaccurate assumptions about Search based on out-of-context, outdated, or incomplete information. We have shared comprehensive information about how Search works and the types of factors our systems measure, while also working to protect the integrity of our results from manipulation.”

The fun thing about accidentally publishing to the GoogleAPI GitHub is that even though these are sensitive internal documents, Google has technically released them under the Apache 2.0 license. This means that anyone who found the documents received a “perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license,” so they are now freely available online.

One of the leaked documents.

The leak contains a lot of API documentation for Google’s “ContentWarehouse,” which sounds a lot like a search index. As you might expect, even this incomplete view of how Google ranks web pages is impossibly complex. King wrote that there are “2,596 modules represented in the API documentation with 14,014 attributes.” These are all documents written by programmers for programmers, and they rely on a lot of background knowledge that you probably wouldn’t have unless you worked on the Search team. The SEO community is still studying the documents and using them to build assumptions about how Google Search works.

Both Fishkin and King accuse Google of “lying” to SEO experts in the past. One of the revelations in the docs is that the click-through rate (CTR) of a search results listing affects its ranking, which is something Google has denied goes into the results “soup” on several occasions. The click-tracking system is called “Navboost”; in other words, it boosts the websites that users navigate to. Naturally, much of this click data comes from Chrome, even after you leave search. For example, some results can show a small set of “Sitemap” results below the main listing, and part of what appears to power this is the most popular subpages as determined by Chrome’s click tracking.
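To make the idea concrete, here is a toy sketch of how a click signal could nudge a ranking. It is not Navboost and is not taken from the leaked documents: the `Listing` fields, the smoothing constants, and the `click_weight` parameter are all hypothetical, chosen only to illustrate that CTR is clicks divided by impressions and that a listing with a high CTR can overtake one with a slightly higher base score.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    url: str
    base_score: float  # hypothetical relevance score before any click signal
    impressions: int   # how many times the listing was shown
    clicks: int        # how many times it was clicked


def ctr(listing, prior_clicks=1.0, prior_impressions=20.0):
    """Smoothed click-through rate: clicks / impressions, with a small
    made-up prior so listings with no impressions do not divide by zero."""
    return (listing.clicks + prior_clicks) / (listing.impressions + prior_impressions)


def rerank(listings, click_weight=0.5):
    """Sort listings by base score multiplied by an assumed CTR boost.

    click_weight=0 ignores clicks entirely; larger values let CTR
    move listings further from their base order.
    """
    def boosted(listing):
        return listing.base_score * (1.0 + click_weight * ctr(listing))

    return sorted(listings, key=boosted, reverse=True)


if __name__ == "__main__":
    results = [
        Listing("https://example.com/a", base_score=1.0, impressions=1000, clicks=50),
        Listing("https://example.com/b", base_score=0.9, impressions=1000, clicks=400),
    ]
    for listing in rerank(results):
        print(listing.url, round(ctr(listing), 3))
```

In this made-up example, the second listing starts with a lower base score but is clicked far more often, so it ends up first once the click signal is applied. How Google actually weighs clicks, if at all in this form, is not something the documents spell out.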

The documents also indicate that Google had whitelists that would artificially boost certain websites for certain topics. The two mentioned are “isElectionAuthority” and “isCovidLocalAuthority”.

Much of the documentation describes exactly how you would expect a search engine to work. Sites have a “SiteAuthority” value that ranks well-known sites higher than less popular ones. Authors also have their own rankings, but as with everything here, it’s impossible to know how everything interacts with everything else.

The commentary from both SEO experts makes them seem upset that Google would ever mislead them, but doesn’t the company need to maintain at least a slightly adversarial relationship with people who try to manipulate search results? One recent study found that “search engines appear to be losing the cat-and-mouse game of SEO spam” and found “an inverse relationship between the level of page optimization and its perceived expertise, suggesting that SEO may hurt at least subjective page quality.” None of this additional documentation is likely to benefit users or the quality of Google’s results. For example, now that people know that CTR affects search rankings, couldn’t you boost your website’s listing with a click farm?