Bing launched a little more than a week ago, so it’s probably too early to give a definitive review, but I thought it might be interesting to look at where it is now.
Search is a very strange business in marketing terms. Unlike most other businesses, a search engine has many, many interactions with its users. The only others who have such fine-grained interaction are media outlets and food vendors (Starbucks comes to mind, but even there I have never bought more than 5 lattes in a day).
The second factor that changes things is how much user satisfaction varies with results. 25% of the time search engines nail things exactly (especially for navigational results), but 25% of the time the results are not at all what the user wanted, either because what they wanted doesn’t exist (natural looking toupees), or isn’t available on the web.
These two factors mean that users have a very large number of data points, where they know how a search engine has performed. In the course of a month, a user knows how often the search engine has succeeded or failed. This makes marketing search engines more difficult than normal consumer products as you can’t argue with people’s recent, direct experience. This is one reason why there has not been a marketing campaign that has significantly changed search market share.
For any search engine starting out, traffic growth is going to come primarily from getting people to switch from Google. People who have not switched from Yahoo! or MSN in the past 10 years are less likely to decide that today is the time that they will try something new. Therefore, the challenge is to look better than Google, when the user tries you.
There are two common scenarios. The first is the test drive. A news story comes out that mentions your product, so a user tries out some queries, comparing you and Google. As they are not actually looking for something, the queries are short, and as they do not have something in mind they are less judgmental about the results. In this scenario, both engines look similar. Most of the reviews of Bing that I have seen so far are in this mode. It is almost impossible to convert this casual user to a new search engine based on this kind of “side-by-side”, as there is no reason to switch—both engines do well enough.
The other scenario is when a user is looking for something and cannot find it on Google. They then try another search engine, expecting that the other engine will fail; after all, Google did. If the other engine can find what the user was looking for, or even show results that are different from what Google showed, then the user has a positive experience, and is more likely to go back. Habits being what they are, one good experience is probably not enough to get a user to switch, but with every successful attempt, the user is more likely to try again.
This explains what kinds of search queries an engine has to beat Google at (at least sometimes) in order to grow traffic.
- Different than Google
- Good when Google is bad. Being slightly better when Google is good does not help
- Tail queries
How do you beat Google on these queries?
- Different signals than Google
- Different ranking than Google
- More or different content
The critical notion here is that you cannot best Google by having a good small index that performs well on popular queries. People will only try you on the tail (when Google has failed), so that is where the battle is joined.
Winning on the tail requires more content or more signals (or, if someone messes up and is slow, you can be faster, but Google is fast enough right now).
What has Microsoft done with Bing? It seems that they have gone with more signals, but they have cut the number of pages in their index (presumably keeping the total size constant).
The old Live search engine came in at between 30% and 40% of the size of Google’s index. Where is Bing? Search engine index size (notwithstanding Andrei Broder and others who have come up with many very subtle methods) is pretty easy to measure. You take rare queries, and see how many pages are returned. (You need to check that the page actually contains the query, as some engines return pages that don’t contain the term but contain a misspelling, or that are the target of a link containing the term.)
A fairly quick test shows that Bing is now around 20% the size of Google, so it has replaced half its index with additional ranking signals. This means Bing is indexing less than 10B pages, which is pretty sparse. To put this in perspective, Google has not been that small since 2003.
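A minimal sketch of that measurement, assuming you have already collected the results for a set of rare queries from two engines. The function names and the sample counts are hypothetical and purely illustrative; they are not the queries or numbers behind the 20% figure. The median is used here so that one odd query does not skew the estimate, which is a choice of this sketch rather than anything the test above prescribes.

```python
from statistics import median


def verified_count(result_texts: list[str], term: str) -> int:
    """Count only the results whose page text actually contains the term.

    This drops pages that matched on a misspelling or on link text alone.
    """
    needle = term.lower()
    return sum(1 for text in result_texts if needle in text.lower())


def relative_index_size(counts_a: dict[str, int], counts_b: dict[str, int]) -> float:
    """Estimate size(A) / size(B) as the median per-query ratio of result counts.

    Queries missing from either engine, or with zero results on B, are skipped.
    """
    ratios = [
        counts_a[query] / counts_b[query]
        for query in counts_b
        if query in counts_a and counts_b[query] > 0
    ]
    if not ratios:
        raise ValueError("no comparable queries")
    return median(ratios)


# Hypothetical verified result counts for three rare queries.
engine_a = {"rare query 1": 7, "rare query 2": 20, "rare query 3": 9}
engine_b = {"rare query 1": 50, "rare query 2": 100, "rare query 3": 30}

estimate = relative_index_size(engine_a, engine_b)
print(f"Engine A is roughly {estimate:.0%} the size of engine B")
```

The queries need to be rare enough that each engine returns its complete result list rather than an estimate, otherwise the counts are not comparable.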
Another way of thinking about this is in terms of crawling. It is reasonable to crawl the web at about 2 billion pages a day. This reduces to a little more than a billion a day after de-duping and removal of cruft, so Bing is indexing about a week’s worth of crawl. A quick look at the dates on cached pages shows they are refreshing a considerable number of pages on a weekly basis, though there are some outliers, going back months.
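The arithmetic behind "about a week" can be written out as a quick sketch. The raw crawl rate and the 10B ceiling come from the text; the 1.1 billion figure is an assumed reading of "a little more than a billion", and the 7B index size is only a hypothetical value below the ceiling.

```python
RAW_CRAWL_PER_DAY = 2_000_000_000     # pages crawled per day, from the text
USEFUL_PER_DAY = 1_100_000_000        # assumed: pages left after de-duping and cruft removal
INDEX_UPPER_BOUND = 10_000_000_000    # "less than 10B pages", from the text


def days_of_crawl(index_pages: int, useful_pages_per_day: int) -> float:
    """How many days of useful crawl it takes to fill an index of this size."""
    return index_pages / useful_pages_per_day


# Fraction of the raw crawl that survives de-duping under the assumption above.
print(f"kept per day: {USEFUL_PER_DAY / RAW_CRAWL_PER_DAY:.0%}")

# The 10B ceiling gives an upper bound of about nine days.
print(f"upper bound: {days_of_crawl(INDEX_UPPER_BOUND, USEFUL_PER_DAY):.1f} days")

# A hypothetical 7B-page index would be a little over six days of crawl.
print(f"at 7B pages: {days_of_crawl(7_000_000_000, USEFUL_PER_DAY):.1f} days")
```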
Yahoo! moved close to Live’s size a few years ago, so Bing is now significantly behind Yahoo! in size.
Looking at some logs, it seems that Bing is crawling as much as Google or us (though Yahoo! is crawling more than GoogleBot or Twiceler in the logs I see). There are two Msnbots (msnbot/1.1 and msnbot/2) out in the wild, with 1.1 crawling 50% more. Judging by crawl times and cached page dates, Bing is serving pages from 1.1’s crawl.
Interestingly, neither pretends to be Mozilla (which is what regular crawlers do).
In the first week of launch it is not surprising to see that crawling has gotten ahead of indexing, and that some pages that were crawled pre-launch are not yet in the index. In the past Microsoft has always had a very fast pipeline, so presumably this is a transient effect.
This puts a lot of pressure on Bing’s ranking as they cannot match Google on recall.
Search engines can be judged in several ways. The two easiest metrics are speed and size. Bing seems to be ok on speed, but behind on size. Presumably they did this to improve relevancy, trading size for quality. Relevancy is more difficult to measure. Relevance is best measured by taking a meaningful query stream (the kind of stream that switchers from Google would produce) and getting raters to rate each URL. I have a set of raters rating Bing right now, and I will post on what those results are later.
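One common way to turn rater judgments into a number is precision at k: the fraction of the top k results that raters marked relevant, averaged over the query stream. The sketch below assumes simple relevant/not-relevant labels and uses made-up judgments; it is an illustration of the general idea, not the rating scheme my raters are using.

```python
def precision_at_k(labels: list[bool], k: int) -> float:
    """Fraction of the top-k results that were rated relevant."""
    top = labels[:k]
    if not top:
        return 0.0
    return sum(top) / len(top)


def mean_precision_at_k(judged: dict[str, list[bool]], k: int) -> float:
    """Average precision@k over every query in the rated stream."""
    if not judged:
        raise ValueError("no rated queries")
    return sum(precision_at_k(labels, k) for labels in judged.values()) / len(judged)


# Hypothetical rater labels for the top four results of two queries,
# listed in ranked order (True means the rater judged the URL relevant).
judged = {
    "query one": [True, True, False, False],
    "query two": [True, False, False, False],
}

print(mean_precision_at_k(judged, k=4))
```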

Ram · Jun 9, 06:01 PM
The number returned by search engine usually turns out to be a bogus one. Only 1000 results are returned by actual engine. Beyond that, no one knows what that number is.
I think your prediction of < 10 B URLs is definitely wrong :) Also using recall of results to determine relevance is such a bad predictor.
George Mc Keon · Jun 10, 07:01 AM
Hi Tom
I find it amusing and interesting that whilst one native of Drogheda is developing the largest search engine in the world at cuil.com, another native of Drogheda is trying to develop the smallest search facility in the world at ghq.com.
Regards
George Mc Keon
Tom Costello · Jun 10, 10:27 AM
@Ram:
To measure the size of a search engine you do queries that return less than 1000 results. For example, Glycobiosciences (don’t do the search, you do not want to know), returns 72 results on Google – they say 3,360 on the front page but if you set advanced search to 100 results there are only 72 returned (or 114 including duplicates).
On Bing, you get 34 results (down from 35 yesterday). If you collect a set of queries like these, you see that in general Bing returns 1/5th of Google.
I am fairly confident of the estimate of Bing’s size. My wife Anna built Google’s large search engine, so she knows how many pages it indexes: there is a count printed by the indexer at the end, so it is not really guesswork.
I completely agree that using recall to measure precision is wrong. I think you should use number of results returned to measure recall. Precision is measured in other ways, which I will deal with anon.
Tom
Tom Costello · Jun 10, 10:27 AM
@George:
It is great to see a Drogheda man working in search. I’m sorry I am missing Ireland’s descent into bankruptcy. I really enjoyed the eighties in Ireland and I wish I could be back for the arrival of the IMF.
Tom
bill · Jun 11, 07:00 AM
To measure the size of a search engine…
I agree with that, but why do search engines estimate a much higher number of results on the first page, especially for one-term queries? The exact number is already stored in the inverted index (because it is used in ranking calculations). With tests like yours, Google returns many more results than Cuil, but Cuil claims to index more than Google. Is Cuil indexing only small parts of each page?
Imad Jomaa · Jun 11, 09:30 AM
I really like the way you guys have your goals set in terms of indexing. Great results and all, however, it does lack a couple things.
1. Twiceler is all over the place indexing, but it takes a while for the results to appear in the actual search. I do remember reading something on your site before launch about you guys manually filtering the results? (I hope I’m wrong) If so, it’s a slow process and pages take a while to reach up-to-date, but by the time you get there, the information becomes outdated. With that said, Google indexing and returning the results varies between website popularity. For instance, you add a news snippet to Digg, minutes later, it’s in Google. Furthermore, I definitely think you should return the results immediately for searching in order to keep up to date with all the new data.
If I may, here’s a better example. The news updates everyday as it should, Google quickly captures them and they’re ready for searching while for Cuil, it may take a couple of weeks or so for it to appear for searching. If you guys can fix this issue, you’ll definitely be one step closer to grabbing a bigger market share.
2. The website layout seems very uninviting to be honest. Your website needs pizazz, it needs a modern touch of color that is inviting and welcoming. Currently, the black home page with a very tiny unnoticeable search box is basically saying “If you want to use us, find us” while a nice white page with a noticeable search box and maybe even a different twist to the white will be saying “Hey! We’re here! Come right in!”.
If you guys can definitely fix those up, you’ll definitely see me and many others using your search engine over the others.
Best Regards.
Erik · Jun 12, 07:28 AM
Hi Tom
Most conventional search engines will play it safe. As a result they all do things more or less the same. Incremental changes are allowed by the marketing people who like to play it safe, and prefer to offer the user what he/she already knows. Maybe a few ‘bells and whistles’ here and there, but that is it. The number game of total indexed pages is likely a marketing tool. For most users it is important that the search engine indexes quality pages only. Bing is an example of an incremental change (with an element of exploratory search).
You mentioned that for any conventional search engine starting out, traffic growth is going to come primarily from getting people to switch from the present leader, Google. The challenge is to look better than Google, when the user tries you.
But wouldn’t it be better to offer the user a unique type of search? A type of search even Google does not have?
New search engines have to break the mold and come up with something that sets them apart, something unique, to provide them with an edge, a reason for users to have to come to the new search engine, and to return to the new search engine since no other search engine has a similar search capability. This something would not substitute conventional search, but rather would complement it. This integration rather than substitution would keep the marketing people happy.
You mentioned that one important factor that can change the popularity of a search engine is the result based user satisfaction, and that only 25% of the time search engines nail things exactly. I suggest that in most of these cases the user knows exactly what he/she is looking for, rather than performing an exploratory search. Various studies have shown though that between 20-30% of all web queries are exploratory in nature. Some even argue that this number is closer to 80%. Web queries that are exploratory in nature are not well served by conventional search.
As you know exploratory search includes situations where:
So when do we use exploratory search?
I understand that Cuil’s goal is to guide users towards answers to the questions they’re not even sure how to ask. This goal equates to offering exploratory search.
As part of its philosophy Cuil is in the pursuit of indexing the whole Internet. Cuil believes that information is only useful when it is sorted. Cuil aims to index and organize the whole Internet so users can find the information they want. These efforts equate to the creation of a directory to support exploratory search.
Directories are arranged like an outline, hierarchically, with major topics divided into smaller related topics to whatever level of detail is desired, with the web sites listed at appropriate points along the path.
There are several advantages to this type of organization, especially if the user is searching on an unfamiliar topic:
Task based usability studies clearly show that directory based exploratory search provides extremely high levels of satisfaction versus conventional search. It is faster and results in a higher completion and conversion rate. Considering this high satisfaction rate, any search engine that offers directory based, exploratory search would see increased user returns (whether originally ‘test driving’ or ‘Google fatigued’). The question that seems obvious is why have the big search engines failed to pick up on this? I believe this is simply because nobody has figured out how to successfully display directory search.
Significant improvements in user satisfaction are realized by offering dedicated exploratory search. This could be an experience that justifies a marketing campaign that has the product and power to significantly change existing search market shares.
Numerous UI features have been introduced to offer some degree of exploratory search support, including ‘Explore by Category’ feature, ‘Query suggestions’ in the search box, Images, ‘Roll-over’ definitions, and the recently introduced Google ‘Wonder Wheel’.
All of these features provide a level of preview. How do generalized query previews support search?
So what type of directory search interface features should be included to support exploratory search?
- A dynamic categorized Index with Infinite Levels
- Query previews before final selection is required
- Stable and complete bread crumbs
- Mainstream search in addition to directory search
- Alternate search result Interfaces
- Cross-referencing capability between domains/databases
- Targeted advertisement as required
- Cross-Browser & Cross-OS Compatibility
- Preferably using DHTML so it can be used on all devices including iPhone (not true for Google’s Wonder Wheel)
Would Cuil benefit from the integration of a unique user interface that provides stable, categorized overviews to support exploratory web search? Could Cuil become the world leader in directory based exploratory search? I can already imagine what such an interface could look like and I can’t wait for such a useful interface to become readily available for day to day search.
Let me know what your thoughts are!
Sarah · Jun 12, 09:42 AM
This is Sarah at Cuil – These comments are really interesting and I know Tom will want to reply to them. He’s on a few days leave and will be back on Monday – so standby – he’s not ignoring you!
Daniel · Jun 16, 08:17 PM
Thanks for the first well-thought-through analysis of Bing. Any rival must focus on attacking Google, and the only approaches are for finding early adopters who willingly try new things, or through disparate people who are dissatisfied with Google’s results to a particular query.
The first search is pretty important. I’m sure quite a few people make a split-second, permanent decision based on a single Cuil or Bing query. Is this fair? No. But they’ll only try it again when they hear enough positive things through word of mouth. (And every time they hear something good, they’re sure to chime in with their story of how it didn’t work.)
Good luck! New competitors always create concern, and you seem to be maintaining a clear head and just wrote a better review than anything else I’ve read.
Meow · Jun 25, 01:26 PM
I know Cuil is good, but it is only for English environment.
As I switch Cuil to other languages like German, I will only get English results though the interface is German. Otherwise, I search 台北, but all results are still in English.
Going typing in other languages and getting English results is really really silly. I prefer quality more.
Rob Abdul · Jun 26, 03:46 AM
Hi Tom,
I don’t think that Bing will be successful until Microsoft sort out their indexing issues.
For example, sites that have almost all their pages indexed in Google have barely 20% indexed in Bing.
Therefore Bing is not seeing most of the web.
I wish Microsoft would sort this out – they have millions at their disposal and the brightest people working for them.