Tuesday, June 30, 2015

Big Data, Big Problems: 4 Major Link Indexes Compared

Posted by russangular

Given this blog's readership, chances are good you will spend some time this week looking at backlinks in one of the growing number of link data tools. We know backlinks continue to be one of the most important parts of Google's ranking algorithm, if not the most important. We tend to take these link data sets at face value, though, in part because they are all we have. But when your rankings are on the line, is there a better way to determine which data set is best? How should we go about assessing the quality of link indexes like Moz, Majestic, Ahrefs and SEMrush? Historically, there have been four common approaches to this question of index quality...

  • Breadth: We might choose to look at the number of linking root domains any given service reports. We know that the number of referring domains correlates strongly with search rankings, so it makes sense to judge a link index by how many unique domains it has discovered and indexed.
  • Depth: We also might choose to look at how deep the web has been crawled, looking more at the total number of URLs in the index, rather than the diversity of referring domains.
  • Link Overlap: A more sophisticated approach might count the number of links an index has in common with Google Webmaster Tools.
  • Freshness: Finally, we might choose to look at the freshness of the index. What percentage of links in the index are still live?

There are a number of really good studies (some newer than others) using these techniques that are worth checking out when you get a chance:

  • BuiltVisible analysis of Moz, Majestic, GWT, Ahrefs and Search Metrics
  • SEOBook comparison of Moz, Majestic, Ahrefs, and Ayima
  • MatthewWoodward study of Ahrefs, Majestic, Moz, Raven and SEO Spyglass
  • Marketing Signals analysis of Moz, Majestic, Ahrefs, and GWT
  • RankAbove comparison of Moz, Majestic, Ahrefs and Link Research Tools
  • StoneTemple study of Moz and Majestic

While these are all excellent at addressing the methodologies above, there is a particular limitation with all of them. They miss one of the most important metrics we need to determine the value of a link index: proportional representation to Google's link graph. So here at Angular Marketing, we decided to take a closer look.

Proportional representation to Google Search Console data

So, why is it important to determine proportional representation? Many of the most important and valued metrics we use are built on proportional models. PageRank, MozRank, CitationFlow and Ahrefs Rank are proportional in nature. The score of any one URL in the data set is relative to the other URLs in the data set. If the data set is biased, the results are biased.
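To make the point about biased data sets concrete, here is a minimal sketch of a PageRank-style score computed on a toy link graph and then on a biased sample of that same graph. The graph, the page names and the `pagerank` helper are all made up for illustration; this is a textbook power iteration, not the actual implementation behind PageRank, MozRank, CitationFlow or Ahrefs Rank.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank on a dict of page -> outlinks."""
    pages = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = graph.get(page, [])
            if targets:
                # Split this page's score evenly across its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its score evenly over every page.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical "full" web: x has three inbound links, y has one.
full_web = {"a": ["x"], "b": ["x"], "c": ["x"], "d": ["y"], "x": [], "y": []}
# Hypothetical biased sample: the crawler never found a, b or c.
biased_sample = {"d": ["y"], "x": [], "y": []}

full_rank = pagerank(full_web)
sample_rank = pagerank(biased_sample)
print("full web:      x > y ?", full_rank["x"] > full_rank["y"])
print("biased sample: x > y ?", sample_rank["x"] > sample_rank["y"])
```

In the full graph, x outranks y. In the biased sample the order flips, even though nothing about x or y changed. Only the sample did.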

A Visualization

Link graphs are biased by their crawl prioritization. Because there is no full representation of the Internet, every link graph, even Google's, is a biased sample of the web. Imagine for a second that the picture below is of the web. Each dot represents a page on the Internet, and the dots surrounded by green represent a fictitious index by Google of certain sections of the web.

Of course, Google isn't the only organization that crawls the web. Other organizations like Moz, Majestic, Ahrefs, and SEMrush have their own crawl prioritizations which result in different link indexes.

In the example above, you can see different link providers trying to index the web like Google. Link data provider 1 (purple) does a good job of building a model that is similar to Google. It isn't very big, but it is proportional. Link data provider 2 (blue) has a much larger index, and likely has more links in common with Google than link data provider 1, but it is highly disproportional. So, how would we go about measuring this proportionality? And which data set is the most proportional to Google?

Methodology

The first step is to determine a measurement of relativity for analysis. Google doesn't give us very much information about their link graph. All we have is what is in Google Search Console. The best source we can use is referring domain counts. In particular, we want to look at what we call referring domain link pairs. A referring domain link pair would be something like ask.com->mlb.com: 9,444 which means that ask.com links to mlb.com 9,444 times.
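As a rough illustration of what a referring domain link pair looks like as data, the sketch below counts links per (source domain, target domain) pair from a list of link records. The `root_domain` and `link_pairs` helpers and the sample URLs are assumptions made for this example. In practice the counts would come from each tool's export rather than from raw URLs.

```python
from collections import Counter
from urllib.parse import urlparse

def root_domain(url):
    """Naive root-domain extraction: keep the last two labels of the host.

    This is a simplification and is wrong for suffixes such as .co.uk.
    """
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])

def link_pairs(links):
    """Count links per (source root domain, target root domain) pair."""
    pairs = Counter()
    for source_url, target_url in links:
        pairs[(root_domain(source_url), root_domain(target_url))] += 1
    return pairs

# Made-up link records: (page containing the link, page being linked to).
sample_links = [
    ("http://www.ask.com/a", "http://mlb.com/"),
    ("http://www.ask.com/b", "http://mlb.com/scores"),
    ("http://example.org/x", "http://mlb.com/"),
]

pair_counts = link_pairs(sample_links)
for (source, target), count in pair_counts.items():
    print(f"{source}->{target}: {count}")
```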

Steps

  1. Determine the root linking domain pairs and values to 100+ sites in Google Search Console
  2. Determine the same for Ahrefs, Moz, Majestic Fresh, Majestic Historic, SEMrush
  3. Compare the referring domain link pairs of each data set to Google, assuming a Poisson Distribution
  4. Run simulations of each data set's performance against each other (e.g., Moz vs. Majestic, Ahrefs vs. SEMrush, Moz vs. SEMrush, and so on)
  5. Analyze the results
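The post does not spell out the exact statistical procedure, so the following is only one plausible reading of step 3, not the authors' method. It rescales a provider's pair counts to Google's total and then scores how likely Google's observed counts would be if each pair followed a Poisson distribution with the rescaled count as its mean. The `poisson_log_likelihood` function, the `floor` parameter and all of the counts are assumptions for illustration.

```python
import math

def poisson_log_likelihood(google, provider, floor=0.5):
    """Log-likelihood of Google's pair counts under provider-derived rates.

    google, provider: dicts mapping (source, target) -> link count.
    floor: assumed minimum expected count, so pairs the provider
           never saw do not produce log(0).
    """
    pairs = list(google)
    google_total = sum(google[pair] for pair in pairs)
    provider_total = sum(provider.get(pair, 0) for pair in pairs)
    # Rescale so index size alone neither helps nor hurts a provider.
    scale = google_total / provider_total if provider_total else 0.0
    total = 0.0
    for pair in pairs:
        observed = google[pair]
        expected = max(provider.get(pair, 0) * scale, floor)
        total += observed * math.log(expected) - expected - math.lgamma(observed + 1)
    return total

# Made-up counts for three referring domains linking to one site.
google = {("a.com", "site.com"): 100, ("b.com", "site.com"): 50, ("c.com", "site.com"): 10}
small_proportional = {("a.com", "site.com"): 10, ("b.com", "site.com"): 5, ("c.com", "site.com"): 1}
large_skewed = {("a.com", "site.com"): 200, ("b.com", "site.com"): 900, ("c.com", "site.com"): 500}

small_score = poisson_log_likelihood(google, small_proportional)
large_score = poisson_log_likelihood(google, large_skewed)
print("small but proportional:", round(small_score, 1))
print("large but skewed:      ", round(large_score, 1))
```

A higher (less negative) score means the provider's proportions look more like Google's. In this toy case the small index wins, because its counts keep Google's ratios while the larger index does not.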

Results

When placed head-to-head, there seem to be some clear winners at first glance. In head-to-head, Moz edges out Ahrefs, but across the board, Moz and Ahrefs fare quite evenly. Moz, Ahrefs and SEMrush seem to be far better than Majestic Fresh and Majestic Historic. Is that really the case? And why?

It turns out there is an inversely proportional relationship between index size and proportional relevancy. This might seem counterintuitive: shouldn't the bigger indexes be closer to Google? Not exactly.

What does this mean?

Each organization has to create a crawl prioritization strategy. When you discover millions of links, you have to prioritize which ones you might crawl next. Google has a crawl prioritization, and so do Moz, Majestic, Ahrefs and SEMrush. There are lots of different things you might choose to prioritize...

  • You might prioritize link discovery. If you want to build a very large index, you could prioritize crawling pages on sites that have historically provided new links.
  • You might prioritize content uniqueness. If you want to build a search engine, you might prioritize finding pages that are unlike any you have seen before. You could choose to crawl domains that historically provide unique data and little duplicate content.
  • You might prioritize content freshness. If you want to keep your search engine recent, you might prioritize crawling pages that change frequently.
  • You might prioritize content value, crawling the most important URLs first based on the number of inbound links to that page.
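Here is a small sketch of how two of the priorities above could steer the same crawler toward different indexes. It uses a priority queue as the crawl frontier and swaps in different scoring functions. The toy web, the `crawl_order` function and both scoring functions are hypothetical. A real crawler would also not know inbound link counts ahead of time, so `by_inlinks` cheats by reading the whole toy graph.

```python
import heapq

def crawl_order(graph, seed, priority, limit):
    """Return the first `limit` pages crawled, starting from `seed`.

    graph: dict mapping page -> list of pages it links to (a toy web).
    priority: function(page, graph) -> number; lower is crawled sooner.
    """
    frontier = [(priority(seed, graph), seed)]
    seen = {seed}
    crawled = []
    while frontier and len(crawled) < limit:
        _, page = heapq.heappop(frontier)
        crawled.append(page)
        for target in graph.get(page, []):
            if target not in seen:
                seen.add(target)
                heapq.heappush(frontier, (priority(target, graph), target))
    return crawled

def by_outlinks(page, graph):
    # "Link discovery": prefer pages that expose the most new links.
    return -len(graph.get(page, []))

def by_inlinks(page, graph):
    # "Content value": prefer pages with the most inbound links.
    return -sum(page in targets for targets in graph.values())

toy_web = {
    "home": ["hub", "popular"],
    "hub": ["a", "b", "c"],
    "popular": [],
    "a": ["popular"],
    "b": ["popular"],
    "c": ["popular"],
}

print(crawl_order(toy_web, "home", by_outlinks, 3))
print(crawl_order(toy_web, "home", by_inlinks, 3))
```

Both crawls start at the same page, but with a budget of three pages they already end up holding different slices of the toy web.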

Chances are, an organization's crawl priority will blend some of these features, but it's difficult to design one exactly like Google. Imagine for a moment that instead of crawling the web, you want to climb a tree. You have to come up with a tree climbing strategy.

  • You decide to climb the longest branch you see at each intersection.
  • One friend of yours decides to climb the first new branch he reaches, regardless of how long it is.
  • Your other friend decides to climb the first new branch she reaches only if she sees another branch coming off of it.

Despite having different climb strategies, everyone chooses the same first branch, and everyone chooses the same second branch. There are only so many different options early on.

But as the climbers go further and further along, their choices eventually produce differing results. This is exactly the same for web crawlers like Google, Moz, Majestic, Ahrefs and SEMrush. The bigger the crawl, the more the crawl prioritization will cause disparities. This is not a deficiency; this is just the nature of the beast. However, we aren't completely lost. Once we know how index size is related to disparity, we can make some inferences about how similar a crawl priority may be to Google.

Unfortunately, we have to be careful in our conclusions. We only have a few data points with which to work, so it is very difficult to be certain regarding this part of the analysis. In particular, it seems strange that Majestic would get better relative to its index size as it grows, unless Google holds on to old data (which might be an important discovery in and of itself). It is most likely that at this point we can't make this level of conclusion.

So what do we do?

Let's say you have a list of domains or URLs for which you would like to know their relative values. Your process might look something like this...

  • Check Open Site Explorer to see if all URLs are in their index. If so, you are looking at metrics most likely to be proportional to Google's link graph.
  • If any of the links do not occur in the index, move to Ahrefs and use their Ahrefs Rank if all you need is a single PageRank-like metric.
  • If any of the links are missing from Ahrefs's index, or you need something related to trust, move on to Majestic Fresh.
  • Finally, use Majestic Historic for (by leaps and bounds) the largest coverage available.
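The fallback process above can be sketched as a simple lookup: walk through the indexes in order of preference and stop at the first one that contains every URL on your list. The `pick_index` function, the URLs and the scores below are all invented for illustration. Real data would come from each provider's own export or API, which is not shown here.

```python
def pick_index(urls, indexes):
    """Return (name, scores) for the first index that covers every URL.

    indexes: list of (name, {url: score}) in order of preference.
    Returns (None, {}) if no single index covers the whole list.
    """
    for name, scores in indexes:
        if all(url in scores for url in urls):
            return name, {url: scores[url] for url in urls}
    return None, {}

# Invented coverage and scores; scales differ between providers, which is
# why the sketch insists on taking every score from one index.
indexes = [
    ("Moz", {"a.example/1": 4.2}),
    ("Ahrefs", {"a.example/1": 31, "b.example/2": 12}),
    ("Majestic Fresh", {"a.example/1": 40, "b.example/2": 18, "c.example/3": 5}),
    ("Majestic Historic", {"a.example/1": 44, "b.example/2": 20, "c.example/3": 9}),
]

chosen, scores = pick_index(["a.example/1", "b.example/2"], indexes)
print(chosen, scores)
```

Taking all scores from a single index keeps the values relative to one another, which is the whole point of a proportional metric.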

It is important to point out that the likelihood that all the URLs you want to check are in a single index increases as the accuracy of the metric decreases. Considering the size of Majestic's data, you can't ignore them because you are less likely to get null value answers from their data than the others. If anything rings true, it is that once again it makes sense to get data from as many sources as possible. You won't get the most proportional data without Moz, the broadest data without Majestic, or everything in-between without Ahrefs.

What about SEMrush? They are making progress, but they don't publish any relative statistics that would be useful in this particular case. Maybe we can hope to see more from them soon given their already promising index!

Recommendations for the link graphing industry

All we hear about these days is big data; we almost never hear about good data. I know that the teams at Moz, Majestic, Ahrefs, SEMrush and others are interested in mimicking Google, but I would love to see some organization stand up against the allure of more data in favor of better data—data more like Google's. It could begin with testing various crawl strategies to see if they produce a result more similar to that of data shared in Google Search Console. Having the most Google-like data is certainly a crown worth winning.

Credits

Thanks to Diana Carter at Angular for assistance with data acquisition and Andrew Cron with statistical analysis. Thanks also to the representatives from Moz, Majestic, Ahrefs, and SEMrush for answering questions about their indices.


Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!

Russian officials ban yoga because it's too much like a religious cult


I mean, technically they're not wrong. But come on, what's the point of life without appropriation! via Pocket http://ift.tt/1R2oEgq

Court rules NSA can (temporarily) resume bulk collection of phone records.


Court rules NSA can (temporarily) resume bulk collection of phone records. (reuters.com) Anyone here? via Pocket http://ift.tt/1FOa0gW

Greece becomes first developed nation to default on international obligations


By the beard of Zeus! Buy the beard of Zeus, you ask? Must be pretty cheap by now. via Pocket http://ift.tt/1LTojWC

ISIS executes women by beheading for the first time for 'sorcery' in Syria


Sorcery is serious business. You don't have to deal with it in America because the NSA is protecting you. Thanks Obama! via Pocket http://ift.tt/1LFCemj

Worldnews Rules


Cuba first to eliminate mother-to-baby HIV transmission: World Health Organization hails 'one of the greatest public health achievements possible', five years into regional initiative (theguardian.com) via Pocket http://ift.tt/1HsElau

South Sudan army raped and then burned girls alive, says UN


Rarely does good news come from this area. I was just thinking the same thing..... Sucks that the governments are so corrupt the war never ends. via Pocket http://ift.tt/1LzHMOi

Australia bans 220 videogames in four months as Government adopts new classification model


It seems to be a great year for Australia. Maybe the AU.gov can turn it back to a prison to the end of the year. via Pocket http://ift.tt/1FMP4a1

ISIS just executed its top official in Mosul for planning a coup


Good guy ISIS killing an ISIS top official. You live by the sword, you die by the sword. via Pocket http://ift.tt/1BU5bYf

At least 116 feared dead after Indonesian military plane crashes into a major city


From experience here... -Either that thing was overloaded or the cargo snapped free -It was shot down -Or the Cockpit had a fire. C-130's don't just fall out of the sky like this, they are insanely overbuilt and overpowered. via Pocket http://ift.tt/1g6FFFG

Saudi comedian gets death threats for satirizing IS


Good for him. Those asshats don't like being shown how big an idiot they really are. It's already been shown we can't wage war on an idea, only combat it with another idea. via Pocket http://ift.tt/1FMK1q2

Man's father dies in South African hospital's waiting room after being refused because he didn't have the R20 ($1.60) admission fee


Man's father dies in South African hospital's waiting room after being refused because he didn't have the R20 ($1.60) admission fee (ewn.co.za) via Pocket http://ift.tt/1BSvLkv

Worldnews Rules


The remains of an unidentified sea animal with fur on its tail have been washed ashore in the Far East. Found near the airport at Shakhtersk, on Sakhalin Island, its appearance is unlike anything ever found in Russia. (siberiantimes.com) via Pocket http://ift.tt/1NuOBz0

Second Wealthiest Man Carlos Slim scraps project with Donald Trump after Mexico insults


Man, Donald Trump is really getting shafted it seems. via Pocket http://ift.tt/1IpcBUM

Worldnews Rules


CIA photos of ‘black sites’ could complicate Guantanamo trials | Military prosecutors this year learned about a massive cache of CIA photographs of its former overseas “black sites” while reviewing material collected for the Senate investigation of the agency’s interrogation program, offic via Pocket http://ift.tt/1g368nx

1,000 runners get norovirus after French mud run


And this is why I don't go outside. Norovirus is transmitted primarily through shit and diarrhea. I played in a lot of mud as a child and never got sick. via Pocket http://ift.tt/1NriBfL

New WikiLeaks Documents Reveal NSA Spied On Top French Companies


State-sponsored French hackers are probably the most “capable” of stealing the business secrets of American companies, after China, according to former CIA director and defense secretary, Robert Gates. via Pocket http://ift.tt/1CF6XYp

Uber managers arrested in France over 'illicit' taxi service


So after being told that their business is operating illegally they kept running it? Sounds like a helluva legal department at Uber. I'm really torn about the whole Uber thing. People seem to really hate taxi-groups, but Uber is fucking a lot of taxi drivers that had to go through the process. via Pocket http://ift.tt/1Hqa38g

Help Us Improve the Moz Blog: 2015 Reader Survey

Posted by Trevor-Klein

In late 2013, we asked you all about your experience with the Moz Blog. It was the first time we'd collected direct feedback from our readers in more than three years—an eternity in the marketing industry. With the pace of change in our line of work (not to mention your schedules and reading habits) we didn't want to wait that long again, so we're taking this opportunity to ask you how well we're keeping up.

Our mission is to help you all become better marketers, and to do that, we need to know more about you. What challenges do you all face? What are your pain points? Your day-to-day frustrations? If you could learn more about one or two (or three) topics, what would those be?

If you'll help us out by taking this five-minute survey, we can make sure we're offering the most useful and valuable content we possibly can. When we're done looking through the responses, we'll follow up with a post about what we learned.

Thanks, everyone; we're excited to see what you have to say!

Monday, June 29, 2015

Saudi Officials Linked to Jihadist Group in WikiLeaks Cables


The documents, which couldn’t be independently verified, say the Saudi ambassador to Pakistan met in 2012 with Nasiruddin Haqqani, the chief fundraiser for the jihadist group who has been on a United Nations terrorism watch list since 2010. In the meeting, Mr. via Pocket http://ift.tt/1C1UFPh

via Pocket http://ift.tt/1LEA6LI

Worldnews Rules



Legislators in Brazil's largest city, Sao Paulo, have banned the production and sale of foie gras, a delicacy made from the fatty liver of force-fed ducks and geese. City councillors said animals go through a great deal of suffering for the production of the pâté. (bbc.com) Good for them!

from Pocket http://ift.tt/1IGiQzn
