What is Faceted Ad Search?
Newspaper ads contain two kinds of attributes: attributes of an ad order and attributes of the ad content. Attributes of an ad order include the ad number, classification, customer id, name, and address, run schedule, cost of the ad, and so on. The relational database system allows ads to be indexed for quick search on these attributes. Such searches are useful to newspaper personnel for providing management information through queries and reports.
Attributes of the ad content depend on the category of the ad. For example, auto ads have a make, model, year, mileage, price, and so on. Real estate ads have a price or rent, number of bedrooms, location, and so on. The Spice system allows each newspaper to specify rules for recognizing such attributes in textual ad content. Once attributes are recognized (by a program called, oddly enough, a recognizer), ads are added as documents to a text indexing system that indexes them for quick search on those attributes.
These attributes can be added manually, either by newspaper staff or by advertisers in a self-service mode, but one of the real strengths of the SCS search offering is that our recognizer adds them automatically using a technique called “named entity recognition” or NER.
We consider the automatic assignment of ad content attributes a significant innovation. Not having to enter fielded data to know that, for example, an ad describes a car that has two doors and costs between $7,000 and $8,000 makes supporting faceted search practical for newspapers, where the labor costs of manually entering such data are prohibitive.
Because we use this technique, we need only a classified extract (as one might send to a pagination system) to have the raw data needed to build a newspaper-owned and newspaper-controlled faceted search web site for classified ads.
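As a rough illustration of what a recognizer does, here is a minimal Python sketch that pulls a few attributes out of the free text of an auto ad using regular-expression rules. It is not the SCS recognizer or its rule format: the function names, the list of makes, and the $1,000 price buckets are all assumptions made for this example.

```python
import re

# Hypothetical rule set for auto ads; a real recognizer would use rules
# configured per newspaper and per classification.
MAKES = ("chevrolet", "ford", "toyota", "honda")

def price_bucket(price, width=1000):
    """Return the (low, high) range of the given width that contains price."""
    low = (price // width) * width
    return (low, low + width)

def recognize_auto_ad(text):
    """Extract a few content attributes from the free text of an auto ad."""
    attrs = {}
    lowered = text.lower()

    # Make: first known make that appears as a whole word.
    for make in MAKES:
        if re.search(r"\b" + re.escape(make) + r"\b", lowered):
            attrs["make"] = make.title()
            break

    # Model year: a four-digit number from 1950 through 2099.
    year = re.search(r"\b(19[5-9]\d|20\d\d)\b", text)
    if year:
        attrs["year"] = int(year.group(1))

    # Price: a dollar sign followed by digits, with optional thousands commas.
    price = re.search(r"\$\s?(\d{1,3}(?:,\d{3})+|\d+)", text)
    if price:
        amount = int(price.group(1).replace(",", ""))
        attrs["price"] = amount
        attrs["price_range"] = price_bucket(amount)

    # Doors: "2 dr", "2-door", "4 door", and similar.
    doors = re.search(r"\b(\d)[- ]?(?:door|dr)\b", lowered)
    if doors:
        attrs["doors"] = int(doors.group(1))

    # Air conditioning: "A/C" or "air cond...".
    if re.search(r"\b(a/c|air cond\w*)\b", lowered):
        attrs["air_conditioning"] = True

    return attrs

ad = "2004 Chevrolet Cavalier, 2 dr, A/C, 88K mi, runs great. $7,500 obo."
print(recognize_auto_ad(ad))
```

Given only the ad text, the sketch yields fielded data such as a make, a door count, and a price range of $7,000 to $8,000, which is the kind of attribute set a text index can then search on.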
Searching ads by content is especially useful for users of an on-line advertising system. It allows them to specify criteria that narrow down the list of available autos, homes, apartments, or jobs to those that are of interest to the reader. The search technique that SCS uses for content searches is known as faceted search.
Let’s use as an example a search for a used Chevrolet (any model) with air conditioning, for sale in the Lehigh Valley for less than $10,000.
The first step in a faceted search of classified ad content is to get to the right classification. In general, getting to the right classification is called hierarchical or taxonomy searching. Yahoo Directory uses this method. With it you can iteratively narrow your search to get to the desired information.
For example, we can start at the top of the directory and then select Business & Economy, Shopping and Services, ... until, eventually, we reach car listings.
The second step is to indicate search criteria that make sense for ads in that classification. This step resembles the direct keyword text search that Google supports, but with several advantages. With Google, you just type a string of words into a text box:
Chevrolet cars in Lehigh Valley with air conditioning less than $10,000
The user interface provides no guidance on what to type to specify ad search criteria. In fact, there is really nothing you can type into a text box that will, for example, find all ads for autos in a certain price range. And you can’t be sure all of your criteria are used.
Following the first non-sponsored link returned by that search, for instance, led to a list of 5,051 cars, and the first two were in Tennessee and Florida. The listed prices were under $10,000, but one of them was only the current bid in an eBay auction that had not yet ended.
With an attribute-based search, the recognizer adds to the text index an attribute that indicates the price range. And since the classification has already been determined by the first step in the search, the user interface for the second step can guide the user in specifying search criteria (e.g., price, make, mileage) that make sense for that classification.
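To make the contrast with keyword search concrete, here is a minimal Python sketch of an attribute index. It assumes the recognizer has already produced an attribute dictionary for each ad. The `AdIndex` class and its methods are hypothetical names for this example and do not describe the text indexing system SCS uses. For simplicity the price test compares the numeric price directly rather than a stored price-range attribute.

```python
from collections import defaultdict

class AdIndex:
    """Toy attribute index: maps (attribute, value) pairs to sets of ad ids."""

    def __init__(self):
        self.ads = {}                      # ad id -> attribute dict
        self.postings = defaultdict(set)   # (attribute, value) -> ad ids

    def add(self, ad_id, attrs):
        """Store an ad and index every one of its attribute values."""
        self.ads[ad_id] = attrs
        for name, value in attrs.items():
            self.postings[(name, value)].add(ad_id)

    def search(self, max_price=None, **criteria):
        """Return ids of ads matching every criterion, in sorted order."""
        matches = set(self.ads)
        for name, value in criteria.items():
            matches &= self.postings.get((name, value), set())
        if max_price is not None:
            # Ads with no recognized price are excluded from price searches.
            matches = {ad_id for ad_id in matches
                       if self.ads[ad_id].get("price", float("inf")) < max_price}
        return sorted(matches)

index = AdIndex()
index.add(1, {"make": "Chevrolet", "region": "Lehigh Valley",
              "air_conditioning": True, "price": 7500})
index.add(2, {"make": "Chevrolet", "region": "Lehigh Valley",
              "air_conditioning": True, "price": 12000})
index.add(3, {"make": "Chevrolet", "region": "Tennessee",
              "air_conditioning": True, "price": 6000})
index.add(4, {"make": "Ford", "region": "Lehigh Valley",
              "air_conditioning": False, "price": 5000})

hits = index.search(make="Chevrolet", region="Lehigh Valley",
                    air_conditioning=True, max_price=10000)
print(hits)
```

Every criterion is applied as a filter on a known attribute, so an ad in the wrong region or above the price limit cannot slip into the results the way it can with free-text matching.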
Thus faceted search combines the hierarchical and direct search techniques in such a way as to provide a superior classified search experience. Faceted search enables users to navigate a large, online database of classified ads by combining a progressive narrowing of choices (i.e., by classification) with text search. Faceted search has become the prevailing user interaction mechanism in e-commerce sites.
SCS provides faceted search using new technology that allows full text searching along with attribute-based searching. With it you can, for example, find ads for apartments for rent in a given neighborhood, in a given price range, with the right number of bedrooms and baths. Combined with the Ajax-enabled interactivity that SCS uses in its on-line applications, faceted search provides a unique and wonderful user experience.
We can also use the SCS system to look for our used Chevrolet with air conditioning that costs less than $10,000. We start by selecting the category Transportation and then Cars.
At this point, sub-topics that are appropriate only for cars are presented. We can select Auto Makes and then Chevrolet.
Notice the “breadcrumb” menu that shows how we got here and also how many ads currently meet our search criteria.
We can now check off our price criteria and our request for air conditioning.
Finally, we can see that a small number of ads meet our criteria, and we can ask to see them. If we hover over a specific ad, we see it as it appeared in the newspaper.
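The drill-down just described can be sketched in a few lines of Python. This is an illustration only: the sample ads, the field names, and the helper functions `narrow`, `facet_counts`, and `breadcrumb` are assumptions for the example, not part of the SCS system.

```python
from collections import Counter

# Hypothetical ads with attributes already assigned by a recognizer.
ADS = [
    {"id": 1, "category": "Transportation", "subcategory": "Cars",
     "make": "Chevrolet", "price": 7500, "air_conditioning": True},
    {"id": 2, "category": "Transportation", "subcategory": "Cars",
     "make": "Chevrolet", "price": 12000, "air_conditioning": True},
    {"id": 3, "category": "Transportation", "subcategory": "Cars",
     "make": "Ford", "price": 6000, "air_conditioning": False},
    {"id": 4, "category": "Transportation", "subcategory": "Trucks",
     "make": "Chevrolet", "price": 9000, "air_conditioning": True},
    {"id": 5, "category": "Real Estate", "subcategory": "Apartments",
     "price": 950},
]

def narrow(ads, **selected):
    """Keep only the ads whose attributes equal every selected value."""
    return [ad for ad in ads
            if all(ad.get(name) == value for name, value in selected.items())]

def facet_counts(ads, facet):
    """Count how many ads carry each value of the given facet."""
    return Counter(ad[facet] for ad in ads if facet in ad)

def breadcrumb(ads, selections):
    """Show the path taken so far and the number of ads still matching."""
    trail = " > ".join(str(value) for value in selections.values())
    return f"{trail} ({len(narrow(ads, **selections))} ads)"

# Step 1: pick the classification.
selections = {"category": "Transportation", "subcategory": "Cars"}
cars = narrow(ADS, **selections)
print(facet_counts(cars, "make"))      # choices offered for the next step

# Step 2: pick a make, then check off air conditioning and a price limit.
selections["make"] = "Chevrolet"
print(breadcrumb(ADS, selections))
selections["air_conditioning"] = True
final = [ad for ad in narrow(ADS, **selections) if ad["price"] < 10000]
print([ad["id"] for ad in final])
```

Because the counts are computed from the ads that remain after each selection, the interface can show only choices that still lead somewhere, along with how many ads each choice would leave.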
Never has there been a way to browse content so rapidly while getting acquainted with the scope and nature of the content. You will never feel lost no matter how much data there is. More than a search interface, faceted search is an information navigation and discovery tool.
The Spice formula language’s selection command allows developers to build a query that selects ads by combining ad content criteria, as specified in a faceted search, with ad order criteria from the relational database. This seamlessly integrates information retrieval and relational database functionality. The rows (ads) resulting from such a query can include a join of ad order attribute columns from the relational database table and ad content attributes derived by the content recognizer.
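The Spice formula language itself is not shown here, so the following Python sketch only illustrates the general idea of such a combined selection. It uses the standard `sqlite3` module for the ad order side and a plain dictionary for recognized content attributes. The table layout, the column names, and the `select_ads` function are hypothetical.

```python
import sqlite3

# Ad order attributes live in a relational table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ad_orders ("
    "ad_number INTEGER PRIMARY KEY, customer TEXT, "
    "classification TEXT, cost REAL)"
)
conn.executemany(
    "INSERT INTO ad_orders VALUES (?, ?, ?, ?)",
    [
        (101, "Smith Motors", "Cars", 42.50),
        (102, "J. Doe", "Cars", 18.00),
        (103, "Smith Motors", "Trucks", 37.25),
    ],
)

# Ad content attributes as a recognizer might have derived them,
# keyed by ad number (hypothetical data).
CONTENT = {
    101: {"make": "Chevrolet", "price": 7500},
    102: {"make": "Ford", "price": 6000},
    103: {"make": "Chevrolet", "price": 9000},
}

def select_ads(conn, content, classification, **content_criteria):
    """Select by an order criterion in SQL, filter by content criteria,
    and return rows that join both kinds of attributes."""
    rows = conn.execute(
        "SELECT ad_number, customer, cost FROM ad_orders "
        "WHERE classification = ? ORDER BY ad_number",
        (classification,),
    )
    joined = []
    for ad_number, customer, cost in rows:
        attrs = content.get(ad_number, {})
        if all(attrs.get(k) == v for k, v in content_criteria.items()):
            joined.append({"ad_number": ad_number, "customer": customer,
                           "cost": cost, **attrs})
    return joined

result = select_ads(conn, CONTENT, "Cars", make="Chevrolet")
print(result)
```

Each resulting row carries order columns (customer, cost) alongside content attributes (make, price), which is the shape of joined result the paragraph above describes.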
Perhaps the earliest reference for a system supporting faceted search of classified advertising and similarly organized information is patent number 4429385, “Method and apparatus for digital serial scanning with hierarchical and relational access,” filed Dec. 31, 1981, and issued Jan. 31, 1984, to inventors Richard J. Cichelli and Michael O. Thompson, then with the American Newspaper Publishers Association.
Articles in the SCS Blog are written by SCS employees and associated news outlets.