One of the most important features in a search engine is query auto-completion (QAC). QAC is the first service through which users interact with a search engine to input their search intent. In 2014, global users of Yahoo search saved more than 50% of keystrokes when submitting English queries by selecting QAC suggestions.
QAC provides and updates a query suggestion list based on each new character typed by a user in the search box. The suggestion list is ranked by considering different factors, such as most popular completion (based on historical frequency counts from query logs), time (giving more weight to breaking news or recent popular queries), location, context (based on a user’s previous queries), personalization (based on a user’s profile), and click modeling (based on a user’s past click behaviors).
The aforementioned approaches use only certain relevance features and do not fully take advantage of users’ preferences such as user-QAC interactions. Suppose a user dwells on a suggestion list for a long time without selecting the top-ranked query, it indicates that the user intent might not be satisfied by the provided query suggestions. That wealth of implicit negative feedback has not yet been fully exploited for designing QAC models.
In a research paper to be published in the proceedings of next week’s 38th Annual ACM SIGIR Conference called “adaQAC: Adaptive Query Auto-Completion via Implicit Negative Feedback,” we study implicit negative feedback during user-QAC interactions. Our findings suggest more accurate search results when redesigning QAC to include a more general “(static) relevance-(adaptive) implicit negative feedback” framework.
Consider a user who wants to query Apple Inc.’s “facetime” with a popularity-based QAC. When the user types “fac,” “facebook” is ranked at the top in the suggestion list because it is most popular in historical query logs. The user dwells for a long time to examine the suggested query “facebook,” but does not select it because it is not “facetime.” With the next keystroke “e,” popularity-based QAC still makes “facebook” top in the list because it is still the most popular query that matches the prefix “face.” Figure 1 depicts our interactions with the Yahoo search engine QAC that depends on relevance scores only. Here the user implicitly expresses negative feedback to the suggestion list by dwelling on it for a long time without selecting any query. Hence, based on such implicit negative feedback, the user may not favor these unselected queries.
Figure 1: Given prefixes “fac” and “face,” popular “facebook”-related queries are returned by Yahoo to users after being ranked by certain relevance scores.
In our paper, we propose a novel adaptive model called adaQAC (adaptive QAC) that adapts query auto-completion to take into account users’ implicit negative feedback toward unselected query suggestions. Figure 2 explains the system design and data flow of adaQAC. In general, adaQAC has two steps. In Step 1, the suggestion list of the top-N queries for a given prefix is pre-indexed by static QAC. For instance, for the prefix “face,” suppose the top two queries “facebook” and “facetime” are pre-indexed by static QAC based on historical frequency count. In Step 2, adaQAC re-ranks these top-N queries based on the negative feedback from the user’s previous interactions with the system in the same search session. With a fixed N value (for example, N=10), adaQAC is fast and can be used in real-time.
Figure 2: The system design and data flow of adaQAC
We define a search session as the time from the very first keystroke until the last keystroke in formulating a single query. To address the problem of algorithm design, adaQAC makes use of both the bias information of a query and negative feedback information of a query during a search session. Since negative feedback depends on both the query and session, it can be adaptive in different sessions where different negative feedback can be received even for the same query. Bias information of a query can be computed by any existing static QAC, like ranking queries by historical frequency count. Since bias information of a query can be any existing static QAC, adaQAC is highly flexible.
Negative feedback is composed of a list of signals (such as the dwell time of each keystroke, ranking of the query in the suggestion list, etc.) and the impact weightage associated with these signals. To maximize the probability of the total observations of what users’ final queries are after expressing negative feedback via signals (such as dwell time and position of the query suggestions), the regularized log-likelihood of a softmax function becomes the objective function. Model and signal details can be found in our SIGIR 2015 paper.
Of note, adaQAC performs personalized learning; adaQAC infers different impact weightages of signals for different users because different users have different typing speeds. adaQAC infers impact weightage of signals for each individual user independently and is therefore highly parallel and extremely scalable on large query logs data. We performed a large-scale evaluation of adaQAC and static QAC in five-month Yahoo query logs in the Hadoop framework; adaQAC significantly outperforms static QAC for metrics such as mean reciprocal rank (MRR) and success rate (SR) when evaluating the top suggested search results.
Our research concludes that QAC for search should be designed in a more general framework for adapting to implicit negative feedback using personalized learning. As we continue to improve QAC, our scientific research at Yahoo Labs will focus on leveraging all users and the contextual information available on mobile devices to provide better search suggestions. That will further help Yahoo search users save even more keystrokes, and ultimately more time, by giving them what they want as soon as possible.