IBM's Jeopardy-playing Watson computer has been hailed as a technology triumph: a computer that understands human language and has broad knowledge of topics, covering not just facts and trivia but also ambiguous language, including puns and idioms.
The technology is impressive, and IBM has set its sights on many commercial applications in health care, financial services and customer service operations. But the question remains: Is it practical? Does Watson embody a solution approach that enterprises can exploit or learn from? How readily can a Watson-like computer be applied to the knowledge- and content-access problems of the typical enterprise?
Few organizations have the resources that developing Watson required: $3 million in hardware. Some additional clues lie in the nature of knowledge access and the challenges that the Watson team discussed in articles and interviews. Here are some principles that the Watson team exploited:
• Watson used multiple algorithms to process information. These included the usual keyword-matching algorithms; temporal reasoning that understands dates and relative time calculations; statistical paraphrasing, an approach to conveying ideas using different words; geospatial reasoning, a way of interpreting locations and geographies; and approaches to unstructured information processing.
• Watson can be characterized as a "semantic search" or natural language search processor. That is, a question is asked in plain English rather than as a structured query, and is parsed into its semantic and syntactic (meaning and grammatical structure) components, which are processed by the system.
• The system consumed 200 million pages of information for processing, including Wikipedia, various news sources, dictionaries, thesauri, databases, taxonomies, literary works and specialized knowledge representations called ontologies.
Making Information Consumable
What does this mean for an organization attempting to exploit this approach to make information easier to consume? Two major points stand out. The first is that a core framework for structuring information is needed for any algorithm to make sense of data. Other than keyword matching, which parses terms and processes them against a dumb bag of words, more complex and powerful approaches require an underlying structure to the information. These structures are in the form of taxonomies and ontologies that tell the system how concepts relate to one another.
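To make the contrast concrete, here is a minimal sketch of keyword matching versus a taxonomy-aware lookup. It is an illustration only, not how Watson works: the taxonomy, the documents and the function names (`narrower_terms`, `keyword_search`, `taxonomy_search`) are all hypothetical, and a real enterprise taxonomy would be far larger and carry more relationship types.

```python
# Hypothetical toy taxonomy: each concept maps to its narrower concepts.
TAXONOMY = {
    "vehicle": ["car", "truck"],
    "car": ["sedan", "coupe"],
}

# Hypothetical documents, keyed by an invented ID.
DOCUMENTS = {
    "doc1": "quarterly sedan sales rose",
    "doc2": "truck maintenance schedule",
    "doc3": "office furniture catalog",
}


def narrower_terms(term, taxonomy):
    """Return the term plus every concept beneath it in the taxonomy."""
    found = [term]
    for child in taxonomy.get(term, []):
        found.extend(narrower_terms(child, taxonomy))
    return found


def keyword_search(query, documents):
    """Bag-of-words match: a document matches only if it contains the query word."""
    return sorted(
        doc_id for doc_id, text in documents.items() if query in text.split()
    )


def taxonomy_search(query, documents, taxonomy):
    """Match documents that contain the query term or any narrower concept."""
    terms = set(narrower_terms(query, taxonomy))
    return sorted(
        doc_id for doc_id, text in documents.items() if terms & set(text.split())
    )


print(keyword_search("vehicle", DOCUMENTS))             # []
print(taxonomy_search("vehicle", DOCUMENTS, TAXONOMY))  # ['doc1', 'doc2']
```

The keyword search finds nothing because no document contains the literal word "vehicle." The taxonomy-aware search finds two documents because the structure tells it that sedans and trucks are kinds of vehicles.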
Many organizations are beginning to build these taxonomy frameworks for purposes of e-commerce, document management, intranet and knowledge-base applications. The message here is, don't stop those efforts in the hope that technology will obviate the need for them. Technology is getting better, but having a map of the specific and unique knowledge of the enterprise will improve the performance of search, business intelligence and content management tools.
If you don't already use and apply enterprise taxonomies, it is important to start developing them now. While the initial time-to-value for siloed projects can be short, fully leveraging semantics across the enterprise can take years to refine, deploy and exploit across business units and applications.
Data architects have part of the solution, but semantic architects are needed to make sense of knowledge. Developing a semantic architecture will benefit the organization by making technology investments more productive and will pay off via improved search and better reuse of intellectual assets. Semantic architectures form the foundation of knowledge systems that are finally becoming practical.
The second point: Watson demonstrates key elements of solutions that do not assume users know how to frame their questions correctly. As much research on how people search shows, users frequently ask ambiguous questions but expect precise results. Therefore, we need to build solutions that help them with the queries.
This is the same approach used to structure the information: The structures the tools require to make sense of the data are the same ones that help guide users in their choices. They resemble the new navigation/search approaches used in e-commerce sites, which let potential buyers search by color, size, brand, price, etc. to help them navigate precisely to specific information.
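The e-commerce style of navigation described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the product records, the facet names and the helper functions (`filter_by_facets`, `facet_counts`) are invented for the example and do not come from any particular search product.

```python
from collections import Counter

# Hypothetical catalog: each record carries the attributes used as facets.
PRODUCTS = [
    {"name": "Trail Jacket", "color": "red", "size": "M", "brand": "Acme", "price": 120},
    {"name": "City Jacket", "color": "blue", "size": "M", "brand": "Acme", "price": 90},
    {"name": "Rain Shell", "color": "red", "size": "L", "brand": "Borealis", "price": 150},
]


def filter_by_facets(items, **selected):
    """Keep only the items whose attributes match every selected facet value."""
    return [
        item
        for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(items, facet):
    """Count how many items fall under each value of a facet."""
    return Counter(item[facet] for item in items)


# A vague starting point ("jackets") is narrowed by choosing a facet value.
red_items = filter_by_facets(PRODUCTS, color="red")
print([item["name"] for item in red_items])  # ['Trail Jacket', 'Rain Shell']

# The remaining choices are shown back to the user to guide the next step.
print(dict(facet_counts(red_items, "size")))  # {'M': 1, 'L': 1}
```

The same attributes that organize the catalog also drive the choices offered to the user, which is the point the paragraph makes: the structure serves both the tool and the person searching.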
Tools such as IBM's Watson are a great leap forward in capabilities, but Watson's power comes from organizing content. Tools for gaining insights and finding answers will get better as time goes on, but human judgment needs to be applied to information to develop a foundation of meaning and structure.
Seth Earley is president & CEO of Earley & Associates, which specializes in content and knowledge management practices. He has developed search, content and knowledge strategies for Fortune 500 companies.