Over the past few years, accessories and blog posts have started to ask some adaptation of the same question: “Why are all websites starting to look the same?”

These posts usually point out some common design elements, from large images with superimposed text, to hamburger menus, which are those three accumbent lines that, when clicked, reveal a list of page options to choose from.

My colleagues Bardia Doosti, David Crandall, Norman Su and I were belief the history of the web when we started to notice these posts agriculture up. None of the authors had done any sort of empiric study, though. It was more of a hunch they had.

We absitively to investigate the claim to see if there were any truth to the notion that websites are starting to look the same and, if so, analyze why this has been happening. So we ran a series of data mining studies that scrutinized nearly 200,000 images across 10,000 websites.

How do you even admeasurement similarity?

It’s around absurd to study the entire internet; there are over a billion websites, with many times as many webpages. Since there’s no list of them all to choose from, assuming a random sample of the internet is off the table. Even if it were possible, most people only see a tiny atom of those websites regularly, so a random sample may not even abduction the internet that most people experience.

We ended up using the websites of the Russell 1000, the top U.S. businesses by market capitalization, which we hoped would be adumbrative of trends in mainstream, accumulated web design. We also advised two other sets of sites, one with Alexa’s 500 most trafficked sites, and addition with sites nominated for Webby Awards.

Because we were absorbed in the visual elements of these websites, as data, we used images of their web pages from the Internet Archive, which consistently preserves websites. And since we wanted to gather quantitative data comparing millions of website pairs, we needed to automate the assay process.

To do that, we had to settle on a analogue of “similarity” that we could admeasurement automatically. We advised both specific attributes like color and layout, as well as attributes abstruse automatically from data using bogus intelligence.

For the color and layout attributes, we abstinent how many pixel-by-pixel edits we would have to make to transform the color scheme or page anatomy of one website into another. For the AI-generated attributes, we accomplished a apparatus acquirements model to allocate images based on which website they came from and admeasurement the attributes the model learned. Our antecedent work indicates that this does a analytic good job at barometer stylistic similarity, but it’s very difficult for humans to accept what attributes the model focused on.

How has the internet changed?

We found that across all three metrics – color, layout and AI-generated attributes – the boilerplate differences amid websites peaked amid 2008 and 2010 and then decreased amid 2010 and 2016. Layout differences decreased the most, crumbling over 30% in that time frame.