Today the Sunlight Foundation unveils Churnalism, which will compare any text online to a corpus of corporate, government, and other promotional content.
We live in an age of information, it is said again and again. But that doesn’t mean we live in an age of good information, as last week seemed intent on bearing out. In fact, quite the opposite. Countless times each day, we have to weigh the credibility of a piece of information, and decide whether to put our faith in it.
It’s not really feasible for each of us to track each piece of information to its source (nor would it be efficient), so, instead, we use clues — who wrote this, where is this published, does this square with other information we know. But the trouble is that these clues aren’t perfect indicators, at least in part because even credible publications and professional journalists sometimes regurgitate information without giving it a careful vetting, a process often referred to as churnalism (just as gross as it sounds).
Today, the Sunlight Foundation has unveiled a tool that will help us all with this work. “The tool is, essentially, an open-source plagiarism detection engine,” web developer Kaitlin Devine explained to me. It will scan any text (a news article, for example) and compare it with a corpus of press releases and Wikipedia entries. If it finds similar language, you’ll get a notification of a detected “churn” and you’ll be able to take a look at the two sources side by side. You can also use it to check Wikipedia entries for information that may have come from corporate press releases. The tool is based on a similar project released in the United Kingdom two years ago, which the Sunlight Foundation supported with a grant to make it open source. Churnalism will be available both on the website and as a browser extension. Its database of press releases includes those from EurekAlert! in addition to PR Newswire, PR News Web, Fortune 500 companies, and government sources.
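For readers curious what "comparing a text with a corpus" can look like in practice, here is a minimal Python sketch. It is not the Sunlight Foundation's actual engine, whose internals are not described here. It assumes one common approach, word "shingling": break each text into overlapping runs of n words and measure how many of the article's runs also appear in a press release. The function names, the sample texts, the run length, and the threshold are all hypothetical choices made for illustration.

```python
import re


def shingles(text, n=3):
    """Return the set of overlapping n-word sequences found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def find_churn(article, corpus, n=3, threshold=0.2):
    """Return (doc_id, overlap) pairs, highest overlap first.

    overlap is the fraction of the article's shingles that also occur
    in the corpus document. The threshold is an arbitrary assumption.
    """
    article_shingles = shingles(article, n)
    if not article_shingles:
        return []
    hits = []
    for doc_id, doc_text in corpus.items():
        shared = article_shingles & shingles(doc_text, n)
        overlap = len(shared) / len(article_shingles)
        if overlap >= threshold:
            hits.append((doc_id, overlap))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up sample data, purely for illustration.
corpus = {
    "release-1": "Acme Corp today announced a new widget that is "
                 "faster and cheaper than ever before",
    "release-2": "The city council will meet on Tuesday to discuss the budget",
}
article = ("In local news, Acme Corp today announced a new widget "
           "that is faster and cheaper, officials said")

hits = find_churn(article, corpus)
for doc_id, overlap in hits:
    print(f"possible churn: {doc_id} ({overlap:.0%} of article phrases shared)")
```

A real system would need to handle a far larger corpus, and would likely index the shingles rather than rescan every document for each query. The sketch only shows the basic idea of flagging shared phrasing so two sources can be read side by side.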
One byproduct of this method is that Churnalism will find text that has been quoted from speeches. Although such quotations are not examples of churnalism per se, Devine says this information will be helpful to readers too, showing them the context a quote appeared in, and giving them the chance to think about why a reporter selected a particular passage from all of the others.
In general, according to Devine, “science press releases seem to get more plagiarized than others.” For example, the Sunlight Foundation points to a CBS News article from last fall that shares several phrases (typically information-laced descriptions such as the list “found in hard plastics, linings of canned food, dental sealants”) with a press release from EurekAlert!, as the Churnalism tool’s results show. Devine speculates that science journalism may run into this problem more frequently because “the language around the findings in those is so specific that it becomes very hard to reinterpret it.”
Eventually, Devine says, she’d like to build on the underlying technology so that researchers can track similarities not just between news articles but also between pieces of legislation, seeing, for example, how wording in state legislation may wind up in Congress. Down the road, the team might add an email address to which people can forward press releases to be added to the corpus, something they hope would help catch some of the corporate PR from smaller companies not already included.
The quote-unquote marketplace of ideas is not a level one — not before the Internet and not since — but seeing the contours of it is remarkably tough. Where does a phrase or an argument come from? What nodes promoted it? With the Sunlight Foundation’s Churnalism tool we’ll be able to see that path a bit, and hopefully that knowledge will make us just a pinch more savvy as we proceed through the daily game of who, and what, to trust.