Saul Pwanson
How a File Format Led to a Crossword Scandal

Discussion: Issue #218

In 2016 I designed a plain-text file format for crossword puzzle data, and then spent a couple of months building a micro-data-pipeline, scraping tens of thousands of crosswords from various sources. Then, having all those crosswords in a simple format, I wanted to see if there were any common grid patterns, and discovered egregious plagiarism by a major crossword editor that had gone on for years. This talk covers the file format, the data pipeline, and the design choices that aided rapid exploration; the evidence for the scandal, from the initial anomalies to the final damning visualization; and what it’s like for a data project to get 15 minutes of fame.
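The abstract does not spell out the file format or how grids were compared, so the following is only a minimal sketch under stated assumptions. It assumes a hypothetical layout of "Key: Value" header lines, a blank line, then one grid row per line with `#` marking black squares. It then scores two puzzles by the fraction of open cells holding the same letter when their block patterns match. The names `parse_puzzle`, `block_pattern` and `fill_similarity` are illustrative and are not the author's tools.

```python
"""Sketch: parse a hypothetical plain-text crossword layout and compare grids.

Assumptions (not from the talk): headers are 'Key: Value' lines, a blank line
separates headers from the grid, and '#' marks a black square.
"""
import re
from typing import Dict, List, Tuple

BLOCK = "#"  # assumed black-square marker


def parse_puzzle(text: str) -> Tuple[Dict[str, str], List[str]]:
    """Split the text into a header dict and a list of grid rows."""
    sections = [s for s in re.split(r"\n\s*\n", text.strip()) if s.strip()]
    headers: Dict[str, str] = {}
    for line in sections[0].splitlines():
        key, _, value = line.partition(":")
        headers[key.strip()] = value.strip()
    grid = [row.strip() for row in sections[1].splitlines()] if len(sections) > 1 else []
    return headers, grid


def block_pattern(grid: List[str]) -> Tuple[str, ...]:
    """Reduce a grid to its shape: '#' for blocks, '.' for open cells."""
    return tuple("".join(BLOCK if c == BLOCK else "." for c in row) for row in grid)


def fill_similarity(a: List[str], b: List[str]) -> float:
    """Fraction of open cells with identical letters; 0.0 if shapes differ."""
    if block_pattern(a) != block_pattern(b):
        return 0.0
    open_cells = matches = 0
    for row_a, row_b in zip(a, b):
        for ca, cb in zip(row_a, row_b):
            if ca == BLOCK:
                continue  # black squares carry no fill to compare
            open_cells += 1
            matches += ca == cb
    return matches / open_cells if open_cells else 0.0


# Hypothetical example puzzles that differ in a single cell.
headers_a, grid_a = parse_puzzle("Title: Example A\n\nCAT\nA#O\nBOW")
headers_b, grid_b = parse_puzzle("Title: Example B\n\nCAT\nA#O\nBOX")
print(fill_similarity(grid_a, grid_b))  # 0.875 (7 of 8 open cells match)
```

Scoring only grids with identical block patterns keeps each comparison cheap. A pipeline over tens of thousands of puzzles could first group puzzles by `block_pattern` and then compare fills only within each group. Whether the original analysis did this is not stated in the abstract.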

Slides from Saul Pwanson’s Presentation (https://zenodo.org/record/2836892#.XNyBaUXUB1Y)

Discussion

We discussed our favorite stories of how basic shell tools like AWK and friends can process ridiculous amounts of data ridiculously fast; simple solutions often yield the best results.

Join us any time and get to be a better coder!
