Bing launched a little more than a week ago, so it’s probably too early to give a definitive review, but I thought it might be interesting to look at where it is now.
Search is a very strange business in marketing terms. Unlike most other businesses, a search engine has many, many interactions with its users. The only others with interaction this fine-grained are media outlets and food vendors (Starbucks comes to mind, but even there I have never bought more than 5 lattes in a day).
The second factor that changes things is how much user satisfaction varies with results. 25% of the time search engines nail things exactly (especially for navigational results), but 25% of the time the results are not at all what the user wanted, either because what they wanted doesn’t exist (natural-looking toupees) or because it isn’t available on the web.
These two factors mean that users have a very large number of data points, where they know how a search engine has performed. In the course of a month, a user knows how often the search engine has succeeded or failed. This makes marketing search engines more difficult than normal consumer products as you can’t argue with people’s recent, direct experience. This is one reason why there has not been a marketing campaign that has significantly changed search market share.
For any search engine starting out, traffic growth is going to come primarily from getting people to switch from Google. People who have not switched from Yahoo! or MSN in the past 10 years are less likely to decide that today is the day they will try something new. Therefore, the challenge is to look better than Google when the user tries you.
There are two common scenarios. The first is the test drive. A news story comes out that mentions your product, so a user tries out some queries, comparing you and Google. As they are not actually looking for something, the queries are short, and as they do not have something in mind they are less judgmental about the results. In this scenario, both engines look similar. Most of the reviews of Bing that I have seen so far are in this mode. It is almost impossible to convert this casual user to a new search engine based on this kind of “side-by-side”, as there is no reason to switch—both engines do well enough.
The other scenario is when a user is looking for something and cannot find it on Google. They then try another search engine, expecting that the other engine will fail too; after all, Google did. If the other engine can find what the user was looking for, or even show results that are different from what Google showed, then the user has a positive experience and is more likely to go back. Habits being what they are, one good experience is probably not enough to get a user to switch, but with every successful attempt, the user is more likely to try again.
This explains what kinds of search queries an engine has to beat Google at (at least sometimes) in order to grow traffic.
- Different than Google
- Good when Google is bad. Being slightly better when Google is good does not help
- Tail queries
How do you beat Google on these queries?
- Different signals than Google
- Different ranking than Google
- More or different content
The critical notion here is that you cannot best Google by having a good small index that performs well on popular queries. People will only try you on the tail (when Google has failed), so that is where the battle is joined.
Winning on the tail takes more content or more signals (or, if someone messes up and is slow, you can be faster, but Google is fast enough right now).
What has Microsoft done with Bing? It seems that they have gone with more signals, but they have cut the number of pages in their index (presumably keeping the total size constant).
The old Live search engine came in at between 30% and 40% of the size of Google’s index. Where is Bing? Search engine index size (notwithstanding Andrei Broder and others who have come up with many very subtle methods) is pretty easy to measure. You take rare queries and see how many pages are returned. (You need to check that each page actually contains the query, as some engines return pages that don’t contain the term but contain a misspelling of it, or that are linked to with the term.)
A fairly quick test shows that Bing is now around 20% of the size of Google’s index, so it has replaced half its index with additional ranking signals. This means Bing is indexing fewer than 10B pages, which is pretty sparse. To put this in perspective, Google has not been that small since 2003.
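For readers who want to repeat this kind of measurement, here is a rough sketch of the rare-query comparison in Python. It is not the exact test I ran: the function names, the input shape (a dict mapping each rare term to the texts of the pages an engine returned), and the choice to pool counts across all queries are assumptions made for illustration. Fetching the results from each engine is left out entirely.

```python
def verified_hits(pages, term):
    """Count returned pages whose text actually contains the query term.

    This filters out pages an engine returned only because of a
    misspelling match or because of anchor text on an inbound link.
    """
    needle = term.lower()
    return sum(1 for text in pages if needle in text.lower())


def relative_index_size(results_a, results_b):
    """Estimate size(A) / size(B) from results for the same rare queries.

    results_a and results_b are hypothetical inputs: each maps a rare
    query term to the list of page texts that engine returned for it.
    Counts are pooled over all queries (an assumption; per-query ratios
    could be averaged instead).
    """
    total_a = 0
    total_b = 0
    for term, pages_b in results_b.items():
        total_a += verified_hits(results_a.get(term, []), term)
        total_b += verified_hits(pages_b, term)
    if total_b == 0:
        raise ValueError("baseline engine returned no verified hits")
    return total_a / total_b
```

The estimate is only as good as the query set: the terms need to be rare enough that every engine returns all the matching pages it has rather than a truncated list.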
Another way of thinking about this is in terms of crawling. It is reasonable to crawl the web at about 2 billion pages a day. This reduces to a little more than a billion a day after de-duping and removal of cruft, so Bing is indexing about a week’s worth of crawl. A quick look at the dates on cached pages shows they are refreshing a considerable number of pages on a weekly basis, though there are some outliers, going back months.
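The arithmetic behind "a week's worth of crawl" is simple enough to write down. In this sketch the 2 billion raw pages a day and the 10B upper bound come from the figures above, while 1.1 billion net pages a day is my stand-in for "a little more than a billion" and should be read as an assumption.

```python
def days_of_crawl(index_pages, net_pages_per_day):
    """How many days of net crawl output would fill an index of this size."""
    if net_pages_per_day <= 0:
        raise ValueError("net_pages_per_day must be positive")
    return index_pages / net_pages_per_day


RAW_PAGES_PER_DAY = 2e9    # crawl rate quoted above
NET_PAGES_PER_DAY = 1.1e9  # assumed: "a little more than a billion" after de-duping
INDEX_UPPER_BOUND = 10e9   # "fewer than 10B pages"

survival_rate = NET_PAGES_PER_DAY / RAW_PAGES_PER_DAY
days = days_of_crawl(INDEX_UPPER_BOUND, NET_PAGES_PER_DAY)
print(f"{survival_rate:.0%} of crawled pages survive; index is at most {days:.1f} days of crawl")
```

With these inputs the upper bound comes out at roughly nine days, and since the index is smaller than 10B pages the real figure sits closer to a week.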
Yahoo! moved close to Live’s size a few years ago, so Bing is now significantly behind Yahoo! in size.
Looking at some logs, it seems that Bing is crawling as much as Google or us (though Yahoo! is crawling more than GoogleBot or Twiceler in the logs I see). There are two Msnbots (msnbot/1.1 and msnbot/2) out in the wild, with 1.1 crawling 50% more. Judging by crawl times and cached page dates, 1.1’s crawl is the one being served by Bing.
Interestingly, neither pretends to be Mozilla (which is what regular crawlers do).
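A comparison like this can be made from any web server's access logs. The sketch below is one way to do it, not the script I used: it assumes logs in the common "combined" format, where the user agent is the last double-quoted field, and the substrings used to recognise each crawler are my own guesses at what to match.

```python
from collections import Counter

# Substrings (lowercased) used to recognise each crawler's user agent.
# These tokens are assumptions; adjust them to what your logs contain.
CRAWLER_TOKENS = {
    "msnbot/1.1": "msnbot/1.1",
    "msnbot/2": "msnbot/2",
    "Googlebot": "googlebot",
    "Yahoo! Slurp": "slurp",
    "Twiceler": "twiceler",
}


def user_agent(line):
    """Return the last double-quoted field of a combined-format log line."""
    parts = line.rsplit('"', 2)
    return parts[1] if len(parts) == 3 else ""


def count_crawlers(lines):
    """Count requests per known crawler; other traffic is ignored."""
    counts = Counter()
    for line in lines:
        agent = user_agent(line).lower()
        for name, token in CRAWLER_TOKENS.items():
            if token in agent:
                counts[name] += 1
                break
    return counts


def claims_mozilla(agent):
    """True if the user agent string opens with the Mozilla/ prefix."""
    return agent.startswith("Mozilla/")
```

Feeding it a log is a matter of `count_crawlers(open("access.log"))`, where the file name is a placeholder. Dividing one crawler's count by another's gives the kind of relative crawl volume quoted above, for that one site only.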
In the first week of launch it is not surprising to see that crawling has gotten ahead of indexing, and that some pages that were crawled pre-launch are not yet in the index. In the past Microsoft has always had a very fast pipeline, so presumably this is a transient effect.
This puts a lot of pressure on Bing’s ranking as they cannot match Google on recall.
Search engines can be judged in several ways. The two easiest metrics are speed and size. Bing seems to be OK on speed, but behind on size. Presumably they did this to improve relevance, trading size for quality. Relevance is more difficult to measure. It is best measured by taking a meaningful query stream (the kind of stream that switchers from Google would produce) and getting raters to rate each URL. I have a set of raters rating Bing right now, and I will post on what those results are later.
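To give a sense of how rater judgments turn into a single number per engine, here is a minimal sketch. The scoring scheme is an assumption chosen for simplicity (average the ratings of the top few URLs for each query, then average across queries, counting a query with no results as zero); it is not necessarily the scheme my raters' results will be reported in, and the names and the numeric rating scale are hypothetical.

```python
from statistics import mean


def engine_score(ratings_by_query, top_k=10):
    """Average rater score over a query stream.

    ratings_by_query maps each query to the rater scores of the URLs the
    engine returned, in rank order (for example 0 = useless, 3 = exactly
    right; the scale is an assumption). Only the top_k results count.
    """
    per_query = []
    for ratings in ratings_by_query.values():
        top = ratings[:top_k]
        # A query with no results scores zero: the engine failed the user.
        per_query.append(mean(top) if top else 0.0)
    if not per_query:
        raise ValueError("no queries were rated")
    return mean(per_query)
```

Running the same query stream through two engines and comparing the two scores gives a first-order relevance comparison. Averaging per query first keeps queries with many results from dominating the score.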