It took a little longer than I hoped to get back numbers on the relative performance of Bing and other search engines. Partly this was because I made the horrible mistake of renting an RV and driving to the Grand Canyon. RVs are not at all what you think. I have never been so shaken, rattled, and deafened.
The Grand Canyon was impressive, and I did get the benefit of the quintessential American experience of small towns that are “World Capital” of something. Boron comes to mind (most of the world’s borax comes from there).
The other delay was in calculating significance values correctly. The right thing to use is the Student’s t-test, but in Ireland this is always taught in concert with drinking Guinness (as that is what the test was designed to measure—whether two batches of Guinness are significantly different), and thus one’s memory can be slightly less accurate than usual.
First, some caveats. There are two kinds of search engine rankers: those that are machine learned from training data, and those that are built by hand. The machine learned ones often outperform the hand built ones when you measure them against training data, but when tested by other metrics (like click-through, or time on landing page) they turn out to be worse.
This is because they learn what people say they like, not what people actually like. The classic example of this is Wikipedia. People rank Wikipedia higher than they should (as measured by how often they click on it). I suppose it seems worthy to them.
This effect is more pronounced on common queries, as these over-weighted brands are more frequent in short common queries, so it is not uncommon to see Yahoo (a machine learned engine) outperform Google (a hand tuned engine) on head (common) queries, at least by a metric that asks users to rate queries.
The second major caveat is users’ behavior. There are three ways to arrive at Bing. Firstly, you can accidentally type a query in the wrong box in your browser and end up at Bing. Secondly, you can have Bing as your default engine. Or thirdly, you go to Bing having first tried a search on Google (or Yahoo! I suppose).
Bing’s growth is driven by the last case, where there is a straight-up comparison between Google and Bing. But, and this is a critical issue, the queries where people leave Google are ones where they did not find what they were looking for. Things are always in the last place you look, because then you stop looking.
Therefore, a critical test is whether Bing shows results that Google doesn’t, as people have already seen the results Google showed, and they did not like those. The overlap in result sets is therefore a critical measure, with less overlap being better (for this use case).
Finally, the queries that people try when they switch are usually the harder non-navigational ones. The easy or navigational queries are answered correctly (for the most part) by Google, so there is no need to Bing and decide.
So, in order to test Bing fairly, we took a set of 400 queries that are the kind people would switch and try on another engine (our logs are an easy source of these). We made sure there were no porn or foreign queries. We asked raters to rate each of the top 5 URLs returned by each search engine on a scale of 0 to 4. Raters just saw the query and the page returned, with no evidence of what engine generated the URL. Each URL-query pair was rated 3 times, and major disagreements were triaged by a trusted rater. We then summed the scores weighting higher positions more (position i gets a weight of 0.8 to the power of i, starting at 0 of course).
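A minimal sketch of that position weighting, assuming one final 0 to 4 rating per result after the three ratings have been reconciled. The function name and the sample ratings are illustrative, and how the totals were scaled to the scores reported below is not stated, so the sketch stops at the raw weighted sum for a single query.

```python
def weighted_score(ratings, decay=0.8):
    """Sum the ratings, weighting position i (counting from 0) by decay**i."""
    return sum(rating * decay ** i for i, rating in enumerate(ratings))


# Made-up ratings for the top 5 results of one query, best position first.
example = [4, 3, 2, 1, 0]
score = weighted_score(example)

# The best possible total for five results, every one rated 4.
best = weighted_score([4] * 5)
```

With a decay of 0.8 the fifth result carries about 41% of the weight of the first, so a bad result at the top costs far more than one at the bottom.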
First the bad news for Bing. It overlaps Google too much. On our test queries it overlapped Google 29% of the time, more than Yahoo (25%). Kendall’s tau was much closer however (0.105 vs. 0.102).
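One plausible way to compute an overlap figure like this is sketched below. It assumes overlap means the share of an engine's top 5 URLs that also appear in the other engine's top 5, averaged over queries, with URLs compared as exact strings. The exact definition behind the numbers above is not spelled out, and the function names and sample URLs are made up.

```python
def overlap_at_k(results_a, results_b, k=5):
    """Fraction of the top-k slots whose URL appears in both engines' top k."""
    top_a, top_b = set(results_a[:k]), set(results_b[:k])
    return len(top_a & top_b) / k


def mean_overlap(runs_a, runs_b, k=5):
    """Average overlap over the queries present in both result sets."""
    queries = runs_a.keys() & runs_b.keys()
    if not queries:
        raise ValueError("no queries in common")
    return sum(overlap_at_k(runs_a[q], runs_b[q], k) for q in queries) / len(queries)


# Made-up result lists for a single query.
engine_a = {"slane castle": ["u1", "u2", "u3", "u4", "u5"]}
engine_b = {"slane castle": ["u2", "x", "y", "u5", "z"]}
shared = mean_overlap(engine_a, engine_b)
```

In practice the URLs would need normalizing first (trailing slashes, "www." prefixes, escaping), otherwise the same page can be counted as two different results.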
The good news is that Bing is pretty close on ranking. On this query set Bing gets a 65, Yahoo a 66.5 and Google a 67. These differences are significant at the 99% confidence level.
The ratio of spam in the results was the biggest factor. Bing had 2.9% spam, Google had 2.56% spam, while Yahoo had 4.9%. Having nearly double the spam of Google must be holding Yahoo back from gaining search share.
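For readers who want to run this kind of significance check themselves, here is a minimal sketch of a t statistic. It assumes a paired test, since every engine is scored on the same queries; whether the figures above came from a paired or unpaired test is not stated. The function name and the per-query scores are made up for illustration.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for per-query score differences a - b."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both engines must be scored on the same queries")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two queries")
    spread = statistics.stdev(diffs)  # sample standard deviation
    if spread == 0:
        raise ValueError("all differences are identical; t is undefined")
    return statistics.mean(diffs) / (spread / math.sqrt(n)), n - 1


# Made-up per-query scores for two engines on four queries.
t, dof = paired_t_statistic([3, 5, 7, 9], [2, 3, 4, 5])
```

The resulting t is then compared against the critical value from a t table for the chosen confidence level and degrees of freedom.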
The other noticeable difference is that Bing’s distribution of scores is wider than Google’s. Yahoo’s lies somewhere in between. Bing had more very high scoring, and more very low scoring results.
Often, people will claim that search engines are sufficiently similar in ranking that you cannot tell the difference. In practice, a well designed test can tell the difference, and traffic growth is strongly correlated with whoever is ahead, i.e. when Yahoo! has been ahead on tests like this (and has had their spam problem in hand), it has seen growth.
To put this in perspective, Live.com has been as much as 10 points back, so this amounts to a very significant closing of the gap. This query set is a very tail set, so it should not over-weight machine learned rankers as much as a common set, though there is doubtless some over-weighting.
When search engines are this close, traffic growth generally depends on other factors as well as quality. So, Bing may have a chance at solid growth, so long as it can remain differentiated, avoid being spammed, and can resist the temptation to put up too many ads.
A second major question is whether Bing is using any new signals, beyond what are usually used in search engines. Search engines use traditional signals like title, URL, emphasis/heading, document length and number of occurrences on page to generate an on-page score. They then make an off page score from a count of matching anchors, possibly weighted by the quality of the source page. They combine these scores with some proximity information, and some notion of page popularity (e.g. PageRank). Finally they demote spam and they promote pages that are clicked on for this query more than one would expect.
Do we see any examples where Bing’s ranking cannot be explained by these signals? In the examples I have looked at, no. It seems there is no new magic signal that Bing has that no one else is using.
That does not mean there are not significant differences between Bing’s ranking and others. A few things stand out.
Bing prefers URL matches more than others. For example, the query “heston farm” returns the URL chestonfarm.co.uk, and Bing marks the occurrence of “heston farm” in the URL in bold.
Bing seems to prefer pages where the term occurs with its first letter capitalized.
Bing does less term-rewriting than Google. For “Hubble telescope” you can see Google bolding “the Hubble Space telescope”, even though the query did not contain “the” or “Space.” Bing also replaces titles less than Google – in this query we can see Google use “The Hubble Space Telescope Project” for hubble.nasa.gov/, whereas the actual title is “Main Hubble Page”, which Bing uses.
Interestingly, Bing’s page popularity seems to be weighted differently than Google’s PageRank. There is a strong bias for pages from large sites. There are cases when amazon.com results appear where signals would point otherwise. A good example is the page “www.bbc.co.uk/languages/spanish/lj” appearing for the query “spanish steps”. This suggests that www.bbc.co.uk is over-weighted, as the page does not really have enough anchors to overcome the obvious travel sites.
Bing is weaker than Google where proximity is important.
Some bug reports: Bing’s paging is sometimes off. For the query “Slane Castle” Bing repeats the last two results on the first page as the first two on their second page. For “Hazel O’Connor” Bing does not recognize the character escaping in “en.wikipedia.org/wiki/Hazel_O%27Connor” and thus fails to bold the match.
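The escaping issue is easy to reproduce. The sketch below percent-decodes the URL before looking for the query in it. The helper names are hypothetical, and this is not a claim about how Bing's highlighter works; `unquote` is the standard-library decoder in `urllib.parse`.

```python
from urllib.parse import unquote


def normalize_url(url):
    """Percent-decode, treat underscores as spaces, and ignore case."""
    return unquote(url).replace("_", " ").casefold()


def query_in_url(query, url):
    """True if the query text appears in the decoded, normalized URL."""
    return " ".join(query.casefold().split()) in normalize_url(url)


url = "en.wikipedia.org/wiki/Hazel_O%27Connor"
query = "Hazel O'Connor"

found_after_decoding = query_in_url(query, url)
found_without_decoding = query.casefold() in url.replace("_", " ").casefold()
```

Here `found_after_decoding` is true and `found_without_decoding` is false, because `%27` only becomes an apostrophe once the URL is decoded.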
As one final thought, I have to address the issue of the “Google Brand”. It is common for people to say that to gain share from Google it is necessary to be much better, as people, when shown the same results, will prefer the page if it is branded Google, or will say they would not switch, even though they are shown benefits in another engine.
This is a simple matter of people having an intuitive notion of Bayes’ theorem. When asked which of two things is better, given some evidence, people take into account all the prior experiences they have had. If in the past Google has been reliably better, one or two good experiences will not sway a user.
However, this does not mean that a user will not be swayed by a longer sequence of good experiences. Bing would be in big trouble if there were evidence that users were not trying it more than a few times. It seems that many users are repeatedly trying Bing, so any preference that Google enjoyed will be swamped by the actual experience of Bing.
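The Bayesian argument can be made concrete with a toy model. The sketch below treats the user's belief that the new engine will give the better result as a Beta distribution and updates it with each experience. The prior counts are invented purely for illustration and are not measurements of anything.

```python
def posterior_mean(prior_good, prior_bad, good, bad):
    """Mean of a Beta(prior_good, prior_bad) belief after new good/bad outcomes."""
    return (prior_good + good) / (prior_good + prior_bad + good + bad)


# Hypothetical prior: past experience is worth 20 searches, of which
# the challenger gave the better result only 2 times.
prior_good, prior_bad = 2, 18

after_two = posterior_mean(prior_good, prior_bad, good=2, bad=0)
after_thirty = posterior_mean(prior_good, prior_bad, good=30, bad=0)
```

Under these made-up numbers, two good experiences move the belief from 0.10 to about 0.18, which is not enough to switch, while thirty move it to 0.64.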

Meow · Jun 27, 12:02 AM
So how is Cuil doing?
Is Cuil still attractive to users?
Does Cuil perform better than Bing?
How could Cuil earn money?
When will Cuil care about non-English users?
I don’t like to Bing, either. (In Chinese, Bing means illness) However, less people want to Cuil, so that’s silly.
Leandro · Jun 27, 06:57 AM
All these tests and numbers are great, for those in the know; however, I think that, perhaps the biggest problem for the average Joe and Joanne, is that neither one of them really have a clue on how search engines work. That Yahoo or Bing are 2.5% better or worse than Google is almost trivial.
I should probably point out that I followed a link found in my weblogs, which led me to the Cuil’s robot page and from there, a link to this blog. It always amuses me how, on quiet mornings such as this one, most of the traffic received is by little search engines automatons, all trying to achieve the same goal.
Good luck.
Leo
George · Jun 30, 09:32 AM
Tom, your subject on search engines is fascinating. We almost take it for granted that Google, Yahoo & MSN are the major engines. How does one optimise for all or most of them?
Yerasoft · Jun 30, 10:00 AM
I think the problem with Google is privacy and hyper-indexing. Opinion doesn’t make for great or accurate search results. That would be like the old days of yellow pages only to be flooded with notes and letters to non-connected people. Having the social media control search is a bad idea. It’s easy as an SEM professional to get higher ranks in google by simply adding your site to a dozen or so social media sites and blog a little. Does this prove accurate data? I don’t think so. I’ve published bogus sites with the intention of proving that Google is looking for numbers with cause or concern about accurate data. Search needs to get back to something that makes sense… Search real data not social media sites or moment by Tweets.
Social media is flooding Google with non-sense and it will cause harm to that search engine. I hope Bing does a better job of separating data from sites and social media environments.
Lee Craven · Jul 1, 07:21 AM
Cuil crawler bot has not left my site for 2 weeks now lol – I personally hope it does shortly because google pulls a lot of data a week, and I can’t imagine if Cuil is on my site 24/7 how much data it’s taking. GULP
John Clifford · Jul 1, 05:03 PM
I work with professional services companies to increase their visibility in local search. So I do lots of local searches at various search engines and directories. The user experience, spam, and data quality issues are quite distinct if you do local search on Google, Bing, and Yahoo. Bing often shows yellowpages.com links above their 8 pack for local search and often they are for firms that are not in the geotargeted location. For example I searched on “personal injury lawyer seattle” and Bing listed 2 links from yellowpages.com that are about a 1 hr drive south of Seattle. This kind of serp page hurts conversion rates, consumer experience, and the credibility of Bing. Plus you have to click 2 times just to get to the law firm’s website. If I wanted to search at YellowPages.com I would go there not Bing. Any comments? How is Cuil approaching the challenges of local geotargeted search?
Domino · Jul 7, 05:52 PM
I have often equated search and chess engines. Both tend to transcend the click-through myopia and yield a smooth satisfying stream of data, like Guinness or Cider. It looks as though it is only a matter of time before Cuil bursts through the ranks encumbered by all the crawled information masses it consumed to get there.
Thomas · Jul 10, 08:24 AM
Re the bing bug…“Bing’s paging is sometimes off. For the query “Slane Castle” Bing repeats the last two results on the first page as the first two on their second page.”
Actually it repeats all the results on both pages, with weird numbering.
Try searching “meridias in albany ny”. There are 5 results, but search returns 2 pages, with same 5 results on each, numbered 1-5 on page 1, 11-15 on page 2.
We’re finding similar behavior using it via the API.