Newspaper ads contain two kinds of attributes: attributes of an ad order and attributes of the ad content. Attributes of an ad order include the ad number, classification, run schedule, and cost of the ad, along with the customer's ID, name, and address. The relational database system allows ads to be indexed for quick search on these attributes. Such searches are useful to newspaper personnel for providing management information through queries and reports.
Attributes of the ad content depend on the category of the ad. For example, auto ads have a make, model, year, mileage, price, and so on. Real estate ads have a price or rent, number of bedrooms, location, and so on. The Spice system allows each newspaper to specify rules for recognizing such attributes in textual ad content. Once attributes are recognized (by a program called, oddly enough, a recognizer), ads are added as documents to a text indexing system that indexes them for quick search on those attributes.
These attributes can be added manually, either by newspaper staff or by advertisers in a self-service mode, but one of the real strengths of the SCS search offering is that our recognizer will add them automatically using a technique called “named entity recognition” or NER.
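As a rough illustration of rule-driven attribute recognition, here is a minimal Python sketch. It is not the Spice recognizer, and it uses plain regular-expression rules as a much simpler stand-in for full named entity recognition. The rule table, attribute names, and sample ad text are all hypothetical.

```python
import re

# Hypothetical rules for the "auto" category: attribute name -> pattern with one capture group.
AUTO_RULES = {
    "year": re.compile(r"\b((?:19|20)\d{2})\b"),
    "make": re.compile(r"\b(Chevrolet|Ford|Honda|Toyota)\b", re.IGNORECASE),
    "price": re.compile(r"\$\s?(\d{1,3}(?:,\d{3})*|\d+)"),
    "mileage": re.compile(r"\b(\d{1,3}(?:,\d{3})*|\d+)\s*(?:mi\b|miles\b)", re.IGNORECASE),
}

# Attributes whose captured text should be stored as integers.
NUMERIC = {"year", "price", "mileage"}


def recognize(text, rules):
    """Return the attributes found in the ad text, using the first match for each rule."""
    attrs = {}
    for name, pattern in rules.items():
        match = pattern.search(text)
        if match is None:
            continue  # attribute not present in this ad
        value = match.group(1)
        if name in NUMERIC:
            value = int(value.replace(",", ""))
        attrs[name] = value
    return attrs


ad_text = "2004 Chevrolet Malibu, 4 dr, A/C, 62,000 miles, $7,500 obo"
attrs = recognize(ad_text, AUTO_RULES)
print(attrs)
```

In a sketch like this, each newspaper would supply its own rule table per category, and the resulting attribute dictionary is what would be handed to the text indexing system alongside the ad text.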
We consider the automatic assignment of ad content attributes a significant innovation. Not having to enter fielded data to know that, for example, an ad describes a car that has two doors and costs between $7,000 and $8,000 makes supporting faceted search practical for newspapers, where the labor costs of manually entering such data are prohibitive.
Because we use this technique, we need only a classified extract (as one might send to a pagination system) to have the raw data needed to build a newspaper-owned and newspaper-controlled faceted search web site for classified ads.
Searching ads by content is especially useful for users of an on-line advertising system. It allows them to specify criteria that narrow down the list of available autos, homes, apartments, or jobs to those that are of interest to them. The search technique that SCS uses for content searches is known as faceted search.
The first step in a faceted search of classified ad content is to get to the right classification. In general, getting to the right classification is called hierarchical or taxonomy searching. Yahoo Directory uses this method. With it you can iteratively narrow your search to get to the desired information.
For example, we can start at the top of the directory and then select Business & Economy, Shopping and Services, ... until, eventually, we reach car listings.
Alternatively, we could type a query into a text search box, for example: Chevrolet cars in Lehigh Valley with air conditioning less than $10,000
The user interface provides no guidance on what to type to specify ad search criteria. In fact, there is really nothing you can type into a text box that will, for example, find all ads for autos in a certain price range. And you can’t be sure all of your criteria are used.
Following the first non-sponsored link returned by that last search, for instance, produced a list of 5,051 cars, and the first two were in Tennessee and Florida. The listed prices were under $10,000, but one of them represented the current bid in an incomplete eBay auction.
With an attribute-based search, the recognizer adds to the text index an attribute that indicates the price range. And since the classification has already been determined by the first step in the search, the user interface for the second step can guide the user in specifying search criteria (e.g., price, make, mileage) that make sense for that classification.
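The idea of indexing a derived price-range attribute can be sketched as follows. This is an illustrative Python example only: the $1,000 bucket width, the `price_range` attribute name, and the in-memory index are assumptions, not details of the SCS text indexing system.

```python
from collections import defaultdict


def price_range(price, width=1000):
    """Label the bucket a price falls into, e.g. 7500 -> '7000-8000'."""
    low = (price // width) * width
    return f"{low}-{low + width}"


def build_index(ads):
    """Map each (attribute, value) pair to the set of ad ids that carry it."""
    index = defaultdict(set)
    for ad_id, attrs in ads.items():
        for name, value in attrs.items():
            index[(name, value)].add(ad_id)
        if "price" in attrs:
            # Derived attribute: index the price bucket as well as the exact price.
            index[("price_range", price_range(attrs["price"]))].add(ad_id)
    return index


# Hypothetical recognized attributes, keyed by ad number.
ads = {
    101: {"make": "Chevrolet", "price": 7500},
    102: {"make": "Ford", "price": 7900},
    103: {"make": "Chevrolet", "price": 12000},
}

index = build_index(ads)
matches = index[("make", "Chevrolet")] & index[("price_range", "7000-8000")]
print(sorted(matches))
```

Because each criterion resolves to a set of ad ids, combining criteria is a set intersection, which is one reason a range attribute is easier to search on than a price buried in free text.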
SCS provides faceted search using new technology that allows full-text searching along with attribute-based searching. With it you can, for example, find ads for apartments for rent in a given neighborhood, in a given price range, with the right number of bedrooms and baths. Combined with the Ajax-enabled interactivity that SCS uses in its on-line applications, faceted search provides a unique and wonderful user experience.
We can also use the SCS system to look for our used Chevrolet with air conditioning that costs less than $10,000. We start by selecting the category Transportation and then Cars.
We can now check off our price criteria and our request for air conditioning.
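The check-off step can be pictured with a small Python sketch. The sample ads, the attribute names, and the `select` and `facet_counts` helpers are hypothetical and only illustrate how checked facets narrow a result list; they do not describe the SCS implementation.

```python
from collections import Counter

# Hypothetical car ads with attributes already assigned by a recognizer.
ADS = [
    {"id": 1, "make": "Chevrolet", "price": 7500, "air_conditioning": True},
    {"id": 2, "make": "Chevrolet", "price": 12500, "air_conditioning": True},
    {"id": 3, "make": "Chevrolet", "price": 6900, "air_conditioning": False},
    {"id": 4, "make": "Ford", "price": 8200, "air_conditioning": True},
]


def select(ads, make=None, max_price=None, air_conditioning=None):
    """Keep the ads that satisfy every facet the user has checked (None = not checked)."""
    result = []
    for ad in ads:
        if make is not None and ad["make"] != make:
            continue
        if max_price is not None and ad["price"] >= max_price:
            continue
        if air_conditioning is not None and ad["air_conditioning"] != air_conditioning:
            continue
        result.append(ad)
    return result


def facet_counts(ads, attribute):
    """Count how many of the given ads carry each value of an attribute."""
    return Counter(ad[attribute] for ad in ads)


hits = select(ADS, make="Chevrolet", max_price=10000, air_conditioning=True)
# Counts a user interface could show next to each make once price and A/C are checked.
remaining_makes = facet_counts(select(ADS, max_price=10000, air_conditioning=True), "make")
print([ad["id"] for ad in hits], dict(remaining_makes))
```

Showing counts beside each remaining choice is a common way for a faceted interface to guide the user toward criteria that still return results.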
The Spice formula language’s selection command allows developers to build a query that combines ad content criteria specified in a faceted search with ad order criteria from the relational database, thus seamlessly integrating information retrieval and relational database functionality. The rows (ads) resulting from such a query can join ad order attribute columns from the relational database table with ad content attributes derived by the content recognizer.
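The Spice formula language itself is not shown here, so the following is only a Python analogy of the same idea: filter on content attributes, filter on order attributes, and return joined rows. The `ad_orders` table, its columns, the `content_attrs` dictionary, and the `select_ads` function are all hypothetical.

```python
import sqlite3

# Order attributes live in a relational table (an in-memory SQLite table for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ad_orders (ad_number INTEGER PRIMARY KEY, customer_name TEXT, cost REAL)"
)
conn.executemany(
    "INSERT INTO ad_orders VALUES (?, ?, ?)",
    [(101, "A. Smith", 42.50), (102, "B. Jones", 38.00), (103, "C. Lee", 55.25)],
)

# Content attributes as a recognizer might have derived them, keyed by ad number.
content_attrs = {
    101: {"make": "Chevrolet", "price": 7500},
    102: {"make": "Chevrolet", "price": 9000},
    103: {"make": "Ford", "price": 6000},
}


def select_ads(conn, content_attrs, content_filter, min_cost):
    """Apply content criteria first, then order criteria, and return the joined rows."""
    ids = [ad for ad, attrs in content_attrs.items() if content_filter(attrs)]
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT ad_number, customer_name, cost FROM ad_orders "
        f"WHERE ad_number IN ({placeholders}) AND cost >= ? ORDER BY ad_number",
        (*ids, min_cost),
    ).fetchall()
    # Join: order columns plus the content attributes for the same ad.
    return [
        {"ad_number": n, "customer_name": name, "cost": cost, **content_attrs[n]}
        for n, name, cost in rows
    ]


rows = select_ads(
    conn,
    content_attrs,
    lambda a: a["make"] == "Chevrolet" and a["price"] < 10000,
    min_cost=40.0,
)
print(rows)
```

Each returned row carries both kinds of attributes, which is the property the selection command is described as providing.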
Perhaps the earliest reference for a system supporting faceted search of classified advertising and similarly organized information is Patent number 4429385, “Method and apparatus for digital serial scanning with hierarchical and relational access.” It was filed Dec. 31, 1981, and issued Jan. 31, 1984, to inventors Richard J. Cichelli and Michael O. Thompson, then with the American Newspaper Publishers Association.
Originally published 2009-07-01.