Text Mining Tools: Don’t Let Data Confuse You

Human analysts, not computers, bear most of the responsibility for spotting competitive threats, trends and opportunities. Technologists can arm them with “text mining” software that analyzes news stories, patent filings, customer-service notes and the like to find mentions of a company and its products and activities. The tools try to determine whether a company is seen in a negative or positive light, and to spot themes that might provide insight into what customers want the company to do next.

PROBLEM: The goal of discovering unexpected patterns through text mining can be elusive.

RESOLUTION: Take the time to properly define what you’re looking for. Text mining aims to identify specific entities, such as people or customers, related facts and attributes, and events such as product launches or company acquisitions. But doing it effectively often requires a substantial investment in defining synonymous terms, such as scientific versus trade names for a given drug, or establishing relationships between products and companies or between parent companies and their subsidiaries or acquisitions (the startup that’s suddenly relevant because it’s been bought by your biggest competitor).

The “machine learning” capabilities of text mining software can accelerate this process by guessing at relationships—for example, by using computational linguistics to identify the relationship between a company and an action it has taken, such as acquiring another firm.
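The hand-built definitions described above amount to lookup tables that the software consults before it counts anything. What follows is a minimal sketch of that idea in Python, not a description of how any vendor's product works. The company names are hypothetical, the function names are invented for illustration, and a real deployment would curate far larger tables for its own domain.

```python
import re

# Assumed synonym table: trade and alternate names mapped to one canonical term.
SYNONYMS = {
    "acetaminophen": "acetaminophen",
    "paracetamol": "acetaminophen",
    "tylenol": "acetaminophen",
}

# Assumed ownership table (hypothetical companies): subsidiary -> parent.
PARENT_OF = {
    "nimblelabs": "megacorp",
}


def canonical_term(term):
    """Return the canonical name for a term, or the lowercased term if unknown."""
    key = term.lower()
    return SYNONYMS.get(key, key)


def ultimate_parent(company):
    """Follow the ownership chain upward until no further parent is recorded."""
    current = company.lower()
    seen = set()
    while current in PARENT_OF and current not in seen:
        seen.add(current)  # guards against accidental cycles in the table
        current = PARENT_OF[current]
    return current


def tag_mentions(text):
    """Return (canonical drug names, ultimate parent companies) found in text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    drugs = {SYNONYMS[t] for t in tokens if t in SYNONYMS}
    companies = {ultimate_parent(t) for t in tokens if t in PARENT_OF}
    return drugs, companies
```

With these tables, a story about the startup's Tylenol rival is filed under the canonical drug name and under the competitor that owns the startup, which is the link an analyst would otherwise have to make by hand.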

PROBLEM: Some forms of text mining, such as “sentiment mining,” which focuses on identifying positive or negative comments posted online about your company or its products, don’t work equally well for every company.

RESOLUTION: Know the limits of the tools you choose to use. Sentiment mining vendors such as Intelliseek, with its BrandPulse suite and BlogPulse product, provide a way of continuously monitoring the online buzz about your company and its products. If you’re Apple Computer, it’s likely that a significant slice of your customer base is buzzing about the latest Mac or iPod product. But what if you’re Mattel? Are there enough young girls keeping blogs for BlogPulse to discover meaningful trends in attitudes toward Barbie?
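One way to make that limit explicit is to refuse to report a sentiment figure when too few posts were found. The Python sketch below assumes a tiny hand-picked word list and an arbitrary minimum of 30 posts. Both are placeholders for illustration and say nothing about how BrandPulse or BlogPulse actually score text.

```python
import re

# Assumed, deliberately tiny word lists; real tools use far richer models.
POSITIVE = {"love", "great", "excellent"}
NEGATIVE = {"hate", "broken", "terrible"}

# Assumed cutoff: below this many posts, treat any trend as unreliable.
MIN_POSTS = 30


def score_post(text):
    """Count positive words minus negative words in one post."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(1 for t in tokens if t in POSITIVE)
    negatives = sum(1 for t in tokens if t in NEGATIVE)
    return positives - negatives


def summarize(posts, min_posts=MIN_POSTS):
    """Average the per-post scores, but only when the sample is large enough."""
    if len(posts) < min_posts:
        return {"posts": len(posts), "reliable": False, "net_sentiment": None}
    scores = [score_post(p) for p in posts]
    return {
        "posts": len(posts),
        "reliable": True,
        "net_sentiment": sum(scores) / len(scores),
    }
```

A brand with thousands of posts gets a number. A brand with a handful gets a flag saying the sample is too small, which is more useful to an analyst than a confident-looking average built on three blog entries.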

PROBLEM: The analysis often produces too much information for users to easily digest.

RESOLUTION: Provide analysts with a toolkit that contains both visualization software, such as mapping packages, and spreadsheet-like displays. For example, MicroPatent’s Aureka ThemeScape, an analysis tool for patent data, will display the intersections between, say, a new chemical and its applications as if they were elements in a topographic map. Mountain peaks represent clusters of similar applications; isolated applications appear as islands.

PROBLEM: Computer analysis does not equal insight.

RESOLUTION: Don’t just read the data; study it and talk about it. Leaders tend to do what worked for them in the past, ignoring changes that render that strategy obsolete, according to competitive intelligence expert Ben Gilad, author of the book Business Blindspots. “Blind spots are immune to any text mining or visualization tool on the market,” he says. Gilad advises firms that want to avoid this pitfall to establish an early warning system in which executives and analysts meet twice a year to discuss what he terms “faint signals” of customer dissatisfaction and competitive threats.