Watson At Work
IBM’s Jeopardy! champion Watson computer is a technology triumph, capable of understanding human language and broad knowledge topics – not just facts and trivia, but ambiguous language including puns, double entendres and idioms.
Big Blue has set its sights on many commercial applications for the technology in healthcare, financial services and customer service operations. But the question remains: is it practical? Does Watson embody an approach that enterprises can exploit, or learn from? How readily can a “Watson” be applied to the knowledge and content access problems of the typical enterprise?
The 25-person IBM team spent millions on research over the four years it took to develop the core technology. Few organizations have the resources that Watson required: $3 million worth of hardware (off-the-shelf servers with almost 3,000 processors and a terabyte of RAM). Watson team members have also discussed additional challenges, including the nature of knowledge access.
Some principles that Watson exploited:
-- Watson used multiple algorithms to process information. These included the usual keyword-matching algorithms of run-of-the-mill search; “temporal” (time-based) reasoning, which understands dates and relative time calculations; “statistical paraphrasing,” an approach to conveying ideas using different words; “geospatial reasoning,” a way of interpreting locations and geographies; and various approaches to unstructured information processing.
-- At one level, Watson can be characterized as “semantic search” or natural language search. That is, questions are asked in plain English rather than as a structured query, and each question is parsed into its semantic and syntactic (meaning and grammatical structure) components. The components are then processed in a number of ways by the system.
-- The system consumed 200 million pages of information for processing (“corpora” of information), including Wikipedia, various news sources, dictionaries, thesauri, databases, taxonomies, literary works, and specialized knowledge representations called ontologies, including two that have been developed over a number of years: WordNet and DBpedia.
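The idea of applying several algorithms to the same question can be illustrated with a small sketch. This is not Watson's actual implementation: the two scorers, their weights and the sample passages are all hypothetical, chosen only to show how independent signals (here, keyword overlap and a crude year match standing in for temporal reasoning) might be combined into one ranking.

```python
import re

YEAR_PATTERN = r"\b(?:1[0-9]{3}|20[0-9]{2})\b"  # four-digit years 1000-2099


def tokens(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_score(question, passage):
    """Fraction of the question's words that also appear in the passage."""
    q = tokens(question)
    if not q:
        return 0.0
    return len(q & tokens(passage)) / len(q)


def temporal_score(question, passage):
    """1.0 if the passage mentions a year named in the question, else 0.0."""
    q_years = set(re.findall(YEAR_PATTERN, question))
    if not q_years:
        return 0.0
    p_years = set(re.findall(YEAR_PATTERN, passage))
    return 1.0 if q_years & p_years else 0.0


# Hypothetical weights; a real system would learn these from data.
SCORERS = [(keyword_score, 0.6), (temporal_score, 0.4)]


def rank(question, passages):
    """Return (score, passage) pairs, best first, using the weighted sum."""
    scored = [
        (sum(weight * scorer(question, p) for scorer, weight in SCORERS), p)
        for p in passages
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    question = "Which company built a computer that won Jeopardy! in 2011?"
    passages = [
        "A computer was built by a company in 1997 to play chess.",
        "IBM built Watson, a computer that won Jeopardy! in 2011.",
    ]
    for score, passage in rank(question, passages):
        print(f"{score:.2f}  {passage}")
```

Each scorer knows nothing about the others, so a new kind of evidence (geospatial, say) could be added as one more entry in the list without touching the rest.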
What does this mean for an organization attempting to exploit this approach in order to make information easier to consume? Two major points stand out.
The first is that a core framework for structuring information is needed in order for any algorithm to make sense of data. Keyword matching simply parses terms and processes them against a dumb bag of words; more complex and powerful approaches require an underlying structure to the information. These structures take the form of taxonomies and ontologies, which tell the system how concepts relate to one another. Many organizations are beginning to build these taxonomy frameworks for e-commerce, document management, intranet and knowledge base applications. The message here is not to stop those efforts in the hope that technology will obviate the need for them. Technology is getting better, but having a map of the specific and unique knowledge of the enterprise will improve the performance of search, business intelligence, and content management tools.
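To make the contrast with keyword matching concrete, here is a minimal sketch, assuming a tiny hand-built taxonomy. The terms, documents and function names are hypothetical. A search for a broad concept finds nothing by plain keyword matching, but succeeds once the taxonomy tells the system which narrower terms belong to that concept.

```python
# Hypothetical taxonomy: each broader term maps to its narrower terms.
TAXONOMY = {
    "footwear": ["boots", "sandals", "sneakers"],
    "sneakers": ["running shoes"],
}

DOCS = [
    "Waterproof hiking boots",
    "Lightweight running shoes",
    "Leather wallet",
]


def expand(term, taxonomy):
    """Return the term plus every narrower term beneath it, at any depth."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in found:
            continue
        found.add(current)
        stack.extend(taxonomy.get(current, []))
    return found


def keyword_search(term, docs):
    """Plain keyword matching: the literal term must appear in the text."""
    return [d for d in docs if term in d.lower()]


def taxonomy_search(term, docs, taxonomy):
    """Match a document if it mentions the term or any narrower term."""
    terms = expand(term, taxonomy)
    return [d for d in docs if any(t in d.lower() for t in terms)]


if __name__ == "__main__":
    print(keyword_search("footwear", DOCS))
    print(taxonomy_search("footwear", DOCS, TAXONOMY))
```

The search code itself is trivial; the value is in the taxonomy, which is exactly the part that has to be built and maintained by people who know the enterprise's content.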
If you don’t already use and apply enterprise taxonomies, it is important to start developing them now. While the initial time to value for siloed projects can be short, fully leveraging semantics across the enterprise can take years to refine, deploy and exploit across business units and applications. Data architects have part of the solution, but semantic architects are needed to make sense of knowledge. Developing a semantic architecture will benefit the organization by making technology investments more productive, improving search and enabling better reuse of intellectual assets. Taxonomies and the semantic architecture built on them form the foundation of knowledge systems that are finally becoming practical.
The second point is that Watson demonstrates key elements of solutions that do not assume users know exactly how to frame questions about what they want. As much research on search shows, users frequently ask ambiguous questions and expect precise results. We therefore need to build solutions that help them with their queries. The structures that do this are the same ones used to organize the information in the first place: what the tools require to make sense of the data is also what guides users in their choices. Think of the navigation and search approaches used on e-commerce sites, where choosing color, size, brand, price and so on helps users find what they need and navigate precisely to specific information.
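The e-commerce style of navigation described above is often called faceted search. The following is a minimal sketch under stated assumptions: the product records, facet names and helper functions are hypothetical, and a production system would normally rely on a search engine rather than in-memory lists.

```python
from collections import Counter

# Hypothetical catalog; each key other than "name" acts as a facet.
PRODUCTS = [
    {"name": "Trail Runner", "color": "blue", "brand": "Acme", "size": 10},
    {"name": "City Sneaker", "color": "blue", "brand": "Globex", "size": 9},
    {"name": "Court Classic", "color": "white", "brand": "Acme", "size": 10},
]


def filter_by_facets(items, selections):
    """Keep only the items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selections.items())
    ]


def facet_counts(items, facet):
    """Count how many items carry each value of the given facet."""
    return Counter(item[facet] for item in items if facet in item)


if __name__ == "__main__":
    blue = filter_by_facets(PRODUCTS, {"color": "blue"})
    print([p["name"] for p in blue])
    # Remaining choices to offer the user after the first selection.
    print(dict(facet_counts(blue, "brand")))
```

Showing the counts for the remaining facets after each selection is what lets a user who started with a vague request narrow it down step by step, without ever having to phrase a precise query.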
Bottom line: tools like Watson are a great leap forward in capabilities, but there is no free lunch – Watson’s power comes from organizing content. Tools for gaining insights and finding answers will get better as time goes on, but human judgment needs to be applied to information to develop a foundation of meaning and structure.
Seth Earley, CEO of Earley & Associates, is an expert on content management and knowledge management practices.