Mitigating the Risk of Bad Research: Aggregate, Search, Cleanse and Repeat

Over the last several years, vast numbers of companies and organizations have taken their first steps toward analyzing social media data. Suddenly, everything from product development and message optimization to relationship management and crisis mitigation is being fueled by the real-time voice of the stakeholder or customer. And, clearly, the potential value to organizations of understanding this sort of data is immense when it comes to building stronger relationships, more salable products and more effective messaging.

This rise in attention to social media analysis has brought on a chaotic rush of vendors and agencies competing to provide insight. Unfortunately, much of the listening research being rushed into client hands is of poor quality, with data and analysis that rest on shaky methodologies and are confusing and not actionable. Just as unfortunately, organizations are using this faulty data as the basis for critical, strategic PR and marketing decisions, decisions that can have multimillion-dollar budget impacts and still lead to bad bottom-line results.

For the social media space as a whole to prosper, communications pros must understand what constitutes sound social media research, be able to identify the pitfalls that compromise its predictive value and effectively evaluate vendors and their measurement offerings by asking the following five questions:

Q1: Is Data Aggregation Working?

Good data is the foundation of all trustworthy social media measurement and monitoring. Many platforms that aggregate social media data combine home-built search engines with a number of third-party aggregators, one or more for each channel. There may be a “forums” aggregator and several “blog” aggregators that (they report) crawl millions upon millions of results daily. That does not mean the data the platform or provider is getting is any good. It may be incomplete, rarely updated or full of duplicate content. Data quality also varies widely across providers, and any provider can say that it indexes millions of blogs.

Instead of blindly trusting the claims of your social media metrics or analysis provider, perform your own testing. To test, run the same Boolean search (see question 2) on multiple providers. Be sure you test across multiple channels for the same period. Then compare the results by looking at how the volumes stack up against each other and checking whether the data is clean (see question 3).
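As a rough illustration of that test, here is a minimal Python sketch that compares two providers' results for the same search and period. The provider names, the sample posts and the summarize helper are all hypothetical, and treating posts as duplicates when they match after lowercasing and collapsing whitespace is an assumption, not an industry standard.

```python
from collections import Counter

def summarize(posts):
    """Return total volume, per-channel volume and duplicate share for one provider.

    `posts` is a list of (channel, text) pairs exported for the same
    Boolean search and the same date range (hypothetical export format).
    """
    channels = Counter(channel for channel, _ in posts)
    # Lowercase and collapse whitespace so trivially different copies match.
    normalized = [" ".join(text.lower().split()) for _, text in posts]
    unique = len(set(normalized))
    duplicate_share = 1 - unique / len(posts) if posts else 0.0
    return {
        "total": len(posts),
        "by_channel": dict(channels),
        "duplicate_share": duplicate_share,
    }

# Made-up results from two hypothetical providers, A and B.
provider_a = [
    ("twitter", "I love washing with Shiny"),
    ("twitter", "I love washing  with shiny"),  # near-copy of the line above
    ("blog", "Shiny got my pans clean"),
    ("forum", "Is Shiny safe for glasses?"),
]
provider_b = [
    ("twitter", "I love washing with Shiny"),
    ("blog", "Shiny got my pans clean"),
]

report = {
    name: summarize(posts)
    for name, posts in {"A": provider_a, "B": provider_b}.items()
}
for name, stats in report.items():
    print(name, stats)
```

In this made-up sample, provider A returns more volume but part of it is duplicated, while provider B returns nothing from forums. Those are the kinds of gaps the side-by-side comparison is meant to surface.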

Q2: How Are Boolean Searches Set Up?

Understanding the Boolean search being used in a social media measurement tool to bring back content for your brand is critical, as this search defines the group of things that will be analyzed in every subsequent step of your monitoring or analysis.

At its simplest, a Boolean search works much like an ordinary Google search, except that you explicitly define which words you want included or excluded with the operators AND, OR and NOT. For example, for a dishwasher detergent brand named “Shiny,” you might want to mandate that at least one term like “dish,” “pot,” “pan,” “glasses” or “dishes” also appears in any result that you examine, by joining those terms with OR and tying the group to the brand name with AND. If you don’t, you’ll get results that include just about any shiny thing you can imagine.

Requiring that such terms appear alongside your brand name narrows your field of vision, allowing you to focus on the tweets, blog posts or comments most relevant to your business. At the same time, restricting a search too much could invalidate your research by giving you a distorted picture of what people are saying about the product. If you only searched for “Shiny dishwasher detergent,” for example, you’d miss all the times when customers talk about your brand casually (think “I love washing with Shiny”). This is particularly true on Twitter, where shortening is common because of the 140-character limit.
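To make the trade-off concrete, here is a minimal Python sketch of a Boolean filter for the hypothetical “Shiny” brand. It is not the query syntax of any particular monitoring tool. The matches helper, the sample posts, the extra context terms “washing” and “dishwasher” and the excluded term “car” are all assumptions added for illustration. Matching is case-insensitive, so the sketch cannot tell the brand from the adjective on its own.

```python
import re

def tokens(text):
    """Split text into a set of lowercase words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def matches(text, must, any_of=(), none_of=()):
    """Boolean match: every `must` term AND at least one `any_of` term
    (if any are given) AND NOT any `none_of` term."""
    words = tokens(text)
    return (
        all(term in words for term in must)
        and (not any_of or any(term in words for term in any_of))
        and not any(term in words for term in none_of)
    )

posts = [
    "Shiny left my glasses spotless",
    "Look at my shiny new car",
    "I love washing with Shiny",
    "Shiny dishwasher detergent is on sale",
]

# shiny AND (dish OR dishes OR pot OR pan OR glasses OR ...) NOT car
context = ("dish", "dishes", "pot", "pan", "glasses", "washing", "dishwasher")
broad = [
    p for p in posts
    if matches(p, must=("shiny",), any_of=context, none_of=("car",))
]

# shiny AND dishwasher AND detergent: too restrictive
narrow = [
    p for p in posts
    if matches(p, must=("shiny", "dishwasher", "detergent"))
]

print(len(broad), "posts match the broad search")
print(len(narrow), "posts match the narrow search")
```

On this sample, the broad search keeps the casual mention and drops the shiny car, while the narrow search keeps only the post that spells out the full product description.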

Q3: Is Data Properly Cleansed?

Once you have a mountain of data resulting from a well-constructed Boolean search, you generally should count on discounting about 40% to 60% of it as irrelevant or spam content before running any mission-critical analysis. After all, how useful is a graph of customer conversation composed of 40% spam blogs?

Not removing spam before displaying results data is a core weakness of most analytics platforms. This is also the root of the misunderstanding between clients and vendors or agencies around the difficulty of deriving social media analytics. A channel breakdown is misleading if 60% of its Twitter content is retweeting by robots, or if a large percentage of it comes from people talking about a place that shares the name of your product.

If you don’t have a spam problem, chances are you aren’t getting enough data from your provider. Some providers take such an aggressive stance against spam that they blacklist whole parts of the Internet or keywords that often appear with spam content en masse. Although on the face of it this appears to solve the problem, it actually cripples the results as a data set for listening or decision-making because so many valid results are discounted.
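One way to check whether a spam rule is too aggressive is to run it against a small, hand-labeled sample and count how many valid posts it throws away. The Python sketch below does this for a blanket keyword blacklist. The sample posts, their labels and the banned keywords are all hypothetical.

```python
# Hand-labeled sample: (text, is_spam). Texts and labels are made up.
sample = [
    ("Shiny got the lasagna pan clean", False),
    ("Free coupon for Shiny detergent, worth it", False),
    ("FREE coupon click here shiny shiny shiny", True),
    ("RT RT RT win free shiny prizes", True),
    ("Shiny left spots on my glasses", False),
]

def keyword_blacklist(text, banned=("free", "coupon", "win")):
    """Blanket rule: flag any post containing a banned keyword."""
    words = text.lower().replace(",", " ").split()
    return any(word in banned for word in words)

def evaluate(rule, labeled):
    """Measure how much a rule removes and how much of that was valid."""
    removed = [(text, spam) for text, spam in labeled if rule(text)]
    return {
        "removed_share": len(removed) / len(labeled),
        "spam_caught": sum(1 for _, spam in removed if spam),
        "valid_lost": sum(1 for _, spam in removed if not spam),
    }

result = evaluate(keyword_blacklist, sample)
print(result)
```

Here the blacklist catches both spam posts but also discards a genuine customer comment that happens to mention a coupon. That is the kind of loss a blanket rule can hide.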

Q4: How Does a Platform Present Data?

A flashy platform may offer an endless feature list of technical bells and whistles and beautiful visualizations, but none will matter if you can’t get a trustworthy view of your social media in the first place. Data aggregation, Boolean search and data cleansing are the critical pieces of any such platform. Everything else is secondary.

As with data, you must examine your platform and any insights, as well as how those insights are generated, with a critical eye. For example, consider dashboards. Be wary of the trend to communicate social media analytics to clients or leadership via dashboards. Like platforms without cleansed data, dashboards are prone to offering misleading, spam-clogged graphs instead of actionable knowledge.

Q5: Who Is the Analyst?

An involved analyst is the difference between research that will unlock true value for you and a lot of pretty graphs.

Each stage of social media analysis—from constructing Boolean searches to identifying true insights and optimization opportunities—must be informed with an intimate understanding of your brand, your unique challenges and your goals. Thus informed, a good analyst will help break down conversations on owned and earned channels in meaningful ways.

This cannot be done by a person who is not part of your planning process. Social media analytics is best executed in-country and in close communication with the local account team and the local client. Keep your enemies close, and your analysts closer.

[Adapted from PR News’ Digital PR Guidebook, Vol. 4.]


This article was written by Israel Mirsky, EVP of emerging media and technology for Porter Novelli.