NBC Report Wrongly Implies Machine Learning Companies Skim Photos from Flickr, SmugMug Chief and Creative Commons CEO Respond

Earlier this week, NBC News released a report on how Creative Commons images were being used for machine learning, specifically to train AI to recognize faces.

The report ensnared the photography community Flickr and led some users to mistakenly believe that the company had given tech firms permission to use community members' photos for machine learning.

Photo by Matt Botsford

That’s not the case, and Don MacAskill, chief of Flickr's parent company SmugMug, had to clarify what was going on: “The issue isn't that Flickr is handing over your photos for free to corporations looking to train their artificial intelligence algorithms. It's that users are sharing their photos under various Creative Commons licenses without fully comprehending what those licenses entail.”

In other words, users sometimes unknowingly give these companies permission to do this through the Creative Commons license they choose for their photos. This is why it is so important to read the fine print on everything.

DPReview reports that this mistaken notion might have arisen from NBC News’ less-than-clear reporting on the matter.

From the report, it would be easy to assume that these companies were doing something illicit rather than something that is totally permitted.

SmugMug’s Don MacAskill further clarified in a tweet responding to Olivia Solon: “Photos were not ‘scraped … from @Flickr’. IBM is very clear that their dataset was not ‘scraped’ but originates from opt-in @CreativeCommons licensed photos supplied in the @Flickr public research dataset. Factually incorrect. Your article needs corrections. /cc @NBCNews.”

Creative Commons CEO Ryan Merkley even responded to the controversy in an official blog post, saying: “While we do not have all the facts regarding the IBM dataset, we are aware that fair use allows all types of content to be used freely, and that all types of content are collected and used every day to train and develop AI. CC licenses were designed to address a specific constraint, which they do very well: unlocking restrictive copyright. But copyright is not a good tool to protect individual privacy, to address research ethics in AI development, or to regulate the use of surveillance tools employed online. Those issues rightly belong in the public policy space, and good solutions will consider both the law and the community norms of CC licenses and content shared online in general.”