More than a Million Pro-Repeal Net Neutrality Comments were Likely Faked
I used natural language processing techniques to analyze net neutrality comments submitted to the FCC from April to October 2017, and the results were disturbing.
NY Attorney General Schneiderman estimated that hundreds of thousands of Americans’ identities were stolen and used in spam campaigns supporting the repeal of net neutrality. My research found at least 1.3 million fake pro-repeal comments, with suspicions about many more. In fact, the total number of fake pro-repeal comments in the proceeding may run into the millions. In this post, I will point out one particularly egregious spambot submission, make the case that there are likely many more pro-repeal spambots yet to be confirmed, and estimate the public position on net neutrality in the “organic” public submissions.¹
Key Findings:²
Breaking Down the Submissions
Given the well-documented irregularities throughout the comment submission process, it was clear from the start that the data would be duplicative and messy. To do the analysis without setting up the tools and infrastructure typically used for “big data,” I needed to break down the 22M+ comments and 60GB+ of text data and metadata into smaller pieces.⁴
Thus, I tallied up the many duplicate comments⁵ and arrived at 2,955,182 unique comments and their respective duplicate counts. I then mapped each unique comment to a vector in semantic space⁶ and ran clustering algorithms on those vectors, grouping comments by their meaning.⁷ This method identified nearly 150 clusters of comment submission texts of various sizes.⁸
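To make the two steps above concrete, here is a minimal sketch in Python of tallying duplicates and then grouping the unique texts by similarity. It is not the pipeline used for the analysis: the word-count vectors and the greedy threshold clustering are simple stand-ins I am assuming for illustration, in place of the semantic vectors and clustering algorithms described in the footnotes. The function names, the sample comments, and the 0.6 threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text.strip().lower())


def tally_duplicates(comments):
    """Map each unique (normalized) comment to its number of submissions."""
    return Counter(normalize(c) for c in comments)


def bag_of_words(text):
    """Crude stand-in for a semantic vector: raw word counts."""
    return Counter(re.findall(r"[a-z']+", text))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_clusters(unique_comments, threshold=0.6):
    """Put each comment in the first cluster whose seed is similar enough.

    Assumed simplification: a one-pass greedy scheme, not the clustering
    algorithms used in the original analysis.
    """
    clusters = []  # list of (seed_vector, member_texts)
    for text in unique_comments:
        vec = bag_of_words(text)
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(text)
                break
        else:
            clusters.append((vec, [text]))
    return [members for _, members in clusters]


# Hypothetical sample data, not taken from the FCC docket.
comments = [
    "I support net neutrality.",
    "i support   net neutrality.",
    "I strongly support net neutrality rules.",
    "Please repeal the Title II regulations.",
    "Please repeal Title II regulations now.",
]
counts = tally_duplicates(comments)
clusters = greedy_clusters(list(counts))
```

Counting duplicates first means the more expensive similarity step only runs over the unique texts, which is what makes a dataset of this size manageable without “big data” infrastructure.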
Tags: faked comments, fcc, Jeff Kao, medium, net neutrality