CloudTweaks | Does Proprietary Data Hinder Research?

Does Proprietary Data Hinder Research?

A widely discussed article at Newsweek about the ‘data problem in medicine’ sheds light on the fact that doctors don’t have access to data about the very medicines they prescribe. In fact, as many as half of all clinical trials are never published, leaving doctors in the dark and patients at risk.

In perhaps the most extreme example, an antiarrhythmic drug called lorcainide was tested in the 1980s, and nine people in the lorcainide group died, compared with just one in the placebo group. This study could have prevented thousands of deaths during the decade but was, for some reason, not published until 1993, when the researchers apologized for the fact.

This is, in part, a problem of the community: journals are more likely to publish positive results because they can, well, sell more copies this way. On the other hand, the manufacture of pharmaceuticals is a multi-billion-dollar industry, so money tends to slip into the equation when it shouldn’t.

For there is a real commercial benefit in not sharing all trial data: on the surface, pharmaceutical companies can lose a great deal to their competitors if published studies reveal the way the drugs work. On another level, there’s also the incentive to push a drug that has swallowed a lot of money, for fear of not recouping the investment. This, of course, extends beyond the pharmaceutical industry.

Proprietary data in economics

An intriguing article at Quartz acknowledges that more and more studies are based on proprietary data. In fact, of the studies published in the American Economic Review, a very prestigious economics journal, the share that use proprietary (either government or private) data has risen from 8% in 2006 to 46% in 2014. That is, researchers have asked either the government or private companies for data, and because outsiders cannot obtain the same data, more and more of the studies being published are virtually impossible to replicate.

Here’s the rub. Companies like Amazon, Facebook and Google, which have hoarded petabytes upon petabytes of data, have a) no real incentive to do anything noncommercial with the data and b) a negative incentive to share it; and c) even if they do share it, they are likely to share it with those who would paint a rosy picture of them. A recent study on proprietary data states just that: “To obtain those data, academic economists have to develop a reputation to treat their sources nicely.” And treat them nicely they will, because such data sets are too rare and (one might imagine) too interesting to pass up.

Conclusions

Proprietary data, then, can be seen to be plainly bad in some fields, as in the case of drug trials. In other fields, the effect cannot be measured, but there’s a very real danger of a publication bias towards praising the company that released the data set to the researcher. It’s inevitably a trade-off, but for now it’s the only way scientists can access that data.