How top health websites are sharing sensitive data with advertisers

Some of the UK's most popular health websites are sharing people's sensitive data – including medical symptoms, diagnoses, drug names and menstrual and fertility information – with dozens of companies around the world, ranging from ad-targeting giants such as Google, Amazon, Facebook and Oracle, to lesser-known data brokers and adtech firms like Scorecard and OpenX.

Using open-source tools to analyse 100 health websites, which include WebMD, Healthline, BabyCentre and Bupa, an FT investigation found that 79 per cent of the sites dropped "cookies" – little bits of code that, when embedded in your browser, allow third-party companies to track individuals around the internet. This was done without the consent that is a legal requirement in the UK.

Google's advertising arm DoubleClick was by far the most common destination for data, showing up on 78 per cent of the sites tested, followed by Amazon, which was present in 48 per cent of cases, Facebook, Microsoft and adtech firm AppNexus.

“These findings are quite remarkable, and very concerning,” said Wolfie Christl, a technologist and researcher who has been investigating the adtech industry. “From my perspective, this kind of data is clearly sensitive, has special protections under the [General Data Protection Regulation] and transmitting this data most likely violates the law.”

Health for sale

For centuries, physicians have sworn the Hippocratic oath, to keep secret “whatever I see or hear in the lives of my patients”.

But hundreds of millions of people now turn to the web each day to allay their medical worries, which range from the mundane to the grave. Despite the illusion of privacy that exists between users and their computers, the reality is starkly different.

Digging deeper into 10 of the sites, chosen to reflect the different types of health information they offer to users, the FT looked at the types of data they were sharing.

The investigation excluded data sent to analytics companies to improve the performance of a website, and consent was given for cookies on all the websites that requested it. The privacy policies the FT reporters consented to did not adequately outline that this sensitive data would be shared with third parties, however, or for what purposes.

The data shared included: drug names entered into Drugs.com were sent to Google's ad unit DoubleClick; symptoms inputted into WebMD's symptom checker, and diagnoses received, including "drug overdose", were shared with Facebook; menstrual and ovulation cycle information from BabyCentre ended up with Amazon Marketing, among others; keywords such as "heart disease" and "considering abortion" were shared from sites like the British Heart Foundation, Bupa and Healthline to companies including Scorecard Research and BlueKai (owned by software giant Oracle).

In eight cases (with the exception of Healthline and Mind), a specific identifier linked to the web browser was also transmitted – potentially allowing the information to be tied to an individual – and tracker cookies were dropped before consent was given. Healthline confirmed that it also shared unique identifiers with third parties.

‘Data silos of undesirables’

Since the adoption of the Europe-wide General Data Protection Regulation in May 2018, the EU online advertising industry, which generates $200 billion in annual sales, has been subject to tighter rules around the collection and processing of data.

It is now illegal for advertisers to share the most sensitive data, including on health and sexual orientation, without explicit consent, where the user agrees to the specific sharing of their “special category” data, and is told how it will be used and by whom.

None of the websites tested asked for this type of explicit and detailed consent.

The ultimate destinations of the personal and sensitive data collected and shared by the websites were opaque, as they were not visible via an internet browser.

Research into the “data broker” industry shows that dozens of companies profit from buying and selling data to multiple clients who want to better understand users.

Experts believe that the predictive models built by the plethora of advertising and data-targeting companies may use ill health to profile and prey on users.

Knowledge of an individual’s medical ailments allows companies to try to sell specific treatments, services or financial products that desperate users might turn to.

“There is a whole system that will seek to take advantage of you because you’re in a compromised state. I find that morally repugnant,” said Tim Libert, a computer scientist at Carnegie Mellon University, who built the open-source WebXray tool used by the FT, and specialises in the social and legal implications of online ad tracking.

Previous research in which Mr Libert analysed 80,000 unique pages relating to common diseases found that more than 91 per cent contacted third parties in the US. The paper explains that holding such sensitive data on a person can result in discriminatory marketing, even without marketers knowing their identity.

“As medical expenses leave many with less to spend on luxuries, these users may be segregated into ‘data silos’ of undesirables who are then excluded from favourable offers and prices,” Mr Libert wrote. “This forms a subtle, but real, form of discrimination against those perceived to be ill.”

In the UK, the online advertising industry was put on watch in June by the regulator, the Information Commissioner’s Office. It gave the industry until December to clean up its data practices, or face further probes.

“This investigation by the Financial Times further highlights the ICO’s concerns about the processing of special category data in online advertising, as well as the role that site owners and publishers play in this ecosystem,” said Simon McDougall, the ICO’s executive director for technology policy and innovation.

“Special category data – such as health information – requires greater protection because of its sensitivity and the increased risk of harm to or discrimination against individuals. We will be assessing the information provided by the FT before considering our next steps,” he added.

The advertisers’ defence

Google, which powers the online advertising industry, said that it “does not build advertising profiles from sensitive data ... and has strict policies preventing advertisers from using such data to target ads”.

It told the FT that the named sites investigated had been marked as “sensitive” internally, meaning the information that we found being sent to Google was specifically excluded from the database used for personalised advertising. It said that its technology might be used to serve “contextual” ads, based on the contents of the page, but not user information.

The company explained that if a publisher chose to include information like the date of its visitor’s last period in the URL, it could be sent to Google as part of an ad request from that page. But Google’s ads systems would not understand what that URL data represents, nor use it to create profiles of users.
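To illustrate the mechanism Google describes, here is a minimal sketch of how data placed in a page URL could travel inside an ad request. Everything in it is hypothetical: the domains, the `url` parameter name, the `last_period` and `cycle_days` fields, and both helper functions are assumptions made for illustration, not a description of any real ad system.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_ad_request(ad_endpoint, page_url):
    """Build an ad request that carries the full page URL.

    Assumption: the ad tag passes the page address in a parameter
    named "url"; real ad tags vary.
    """
    return ad_endpoint + "?" + urlencode({"url": page_url})

def leaked_params(ad_request_url):
    """Recover the page's own query parameters from the ad request."""
    outer = parse_qs(urlsplit(ad_request_url).query)
    page_url = outer["url"][0]
    inner = parse_qs(urlsplit(page_url).query)
    return {key: values[0] for key, values in inner.items()}

# Hypothetical page address in which the publisher has put user input.
page = ("https://www.example-health.com/ovulation"
        "?last_period=2019-08-01&cycle_days=28")
ad_request = build_ad_request("https://ads.example-adtech.com/request", page)
leaked = leaked_params(ad_request)
print(leaked)
```

In this sketch the receiving server sees whatever the publisher placed in the page address, whether or not it goes on to interpret it.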

The sensitive data could be used for a variety of other reasons, including protecting against fraud and abuse and measuring the engagement with an advert, Google said.

Facebook, another frequent tracker across the sites we surveyed, which also received data on highly sensitive symptoms and diagnoses, was not able to confirm what it does with this information. “We don’t want websites sharing people’s personal health information with us – it’s a violation of our rules, and we enforce against sites we find doing this,” a company spokesperson said. “We’re conducting an investigation and will take action against those sites in violation of our terms.”

Methodology

We carried out our analysis on August 29 2019, using a list of the top 100 health sites compiled by SimilarWeb from average UK monthly traffic as of July 2019. We ran this list through WebXray, an open-source tool that opens each site and records all the subsequent “requests” made to third parties, and also used HTTP Toolkit to look more closely at exactly what data was being received by third parties.

It should be noted that our investigation represents only a limited view, since we could not see what happened to data beyond the user’s browser, and that it is a snapshot in time: if the experiment were repeated, even on the same computer in the same location, it is likely the results would vary.
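The core idea of the methodology, separating requests to third parties from requests to the site itself, can be sketched roughly as follows. This is not WebXray's code: the function names, the example domains and the short suffix list are all assumptions, and a real analysis would rely on the full Public Suffix List rather than the four entries shown.

```python
from urllib.parse import urlsplit

# Hypothetical short list of two-part suffixes; a real tool would use
# the full Public Suffix List.
TWO_PART_SUFFIXES = {"co.uk", "org.uk", "ac.uk", "gov.uk"}

def site_of(url):
    """Return an approximate registrable domain for a URL."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_PART_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])

def third_party_sites(page_url, request_urls):
    """Return the sorted third-party domains contacted by a page."""
    first_party = site_of(page_url)
    contacted = {site_of(u) for u in request_urls}
    return sorted(contacted - {first_party})

# Hypothetical requests recorded while loading one page.
requests_seen = [
    "https://www.example-health.co.uk/static/app.js",
    "https://ads.example-adtech.com/pixel?id=123",
    "https://cdn.example-health.co.uk/logo.png",
    "https://track.example-broker.net/collect",
]
found = third_party_sites(
    "https://www.example-health.co.uk/symptoms", requests_seen
)
print(found)
```

Counting how many of the 100 sites contact a given domain would then yield prevalence figures of the kind reported above.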

Amazon said: “We do not use the information from publisher websites to inform advertising audience segments,” but it did not confirm what it did with the sensitive data it received, such as user-input fertility information.

It was unclear if either Facebook or Amazon also received personal identifiers, such as an IP address or a unique ID, alongside health data.

The companies also emphasised that the publishers of the websites were required to manage user consent and the type of data sent to third parties.

“It’s like a bar saying we don’t like to serve people that are underage, they shouldn’t come here to drink,” Mr Libert said. “They are being negligent and it’s deeply disingenuous.”

Meanwhile the website publishers themselves did not provide details of why the data was being shared or what would be done with it once it left their hands. A WebMD spokesperson said: “[W]e only use, collect or share user information to the extent disclosed in our privacy policy.” The policy reviewed by the FT did not appear to provide clear answers about the fate of the data.

Condensed comments from other companies that responded are published at the end of this article. Others contacted, including Lotame, ComScore, AppNexus, Drugs.com, Health.com and Bounty, did not respond to requests for comment.

As the ICO’s deadline for online ad auction firms to audit themselves approaches, it will be a time of reckoning for many in the industry that was until recently self-regulated.

“The internet has turned into a privacy wasteland. But there’s a suspension of disbelief in the [ad] industry. Companies say they are GDPR-compliant, there’s a codependency where everybody pretends everything is OK, but the deep technical architecture is fundamentally incompatible with the right to privacy,” Mr Libert said.

“Ultimately it’s going to be the ICO that decides, and based on early guidance, I suspect they may not be a willing participant in this fictional world built by online advertisers.”

Further company responses

Bupa: “Advertising cookies are used on our site but we have set them so that no personal data about visitors to our website, including our health information pages, is passed on to third parties.

“Unique IDs are shared with some third parties in order to measure website performance and engagement. This is anonymised data and is not personally identifiable. No health information of visitors to our website is shared with third parties.”

BabyCenter: "Our privacy and consent statements clearly indicate that we may use data including due date to personalise content and ads. BabyCenter would only pass personal data to third parties after consent is given. As of August 19, 2019, BabyCenter is under new ownership and will be rolled under Everyday Health Group's GDPR and data privacy consent policies and practices in line with the digital properties in its portfolio."

British Heart Foundation: “The data captured by the cookies on our website is protected (pseudonymised) so it doesn’t directly identify individuals. We don’t sell data and we don’t share sensitive personal data on areas such as ethnic origin and health that could directly identify people.

“To reflect recent changes in guidelines we are reviewing how we use cookies and how we seek consent for their use when people visit our website. In the coming months, we will be implementing a new version of our cookies model.

“We don’t share or sell sensitive personal information that could directly identify an individual. We only share information about pages that devices have visited, for example the URL.”

Healthline: "Of the eight platforms you referenced, five are service providers and are not used by Healthline.com for advertising. Facebook, Pinterest and Trade Desk are platforms we may use for re-marketing. However, the data we pass to these platforms is for our use only and is subject to data protection agreements."

Mind: "Since the Information Commissioner published updated guidance about cookies in July, we have been reviewing our practice, including an audit of tracking across our website. As a result of a report published by Privacy International in September, we have removed marketing trackers from our site and won't reinstate them until our review has finished and we are satisfied we're using them appropriately.

“No data is being explicitly shared with Google DoubleClick through the Mind.org.uk site, however Google sets its own DoubleClick cookies via other Google products to support development and optimisation.

“We have never sold or shared, and will never sell or share, any of our website users’ personal information with organisations so that they can be contacted for any marketing activities. Nor do we sell any information about our website users’ web browsing activity.”

Oracle: “Regarding BlueKai, any site setting a BlueKai cookie is required to collect data in accordance with applicable legal requirements. In addition, Oracle Data Cloud has implemented processes designed to prevent the ingest of third-party data from EU-based users for the sites you reference. Finally, Oracle Data Cloud does not create or offer any sensitive third-party audience segments on consumers in the EU.”

These responses have been condensed.

– Copyright The Financial Times Limited 2019