January is not yet over and 2019 has already brought us the second biggest collection of stolen data in history. Unlike traditional data breaches, Collection #1 is actually a massive collection of smaller credential stuffing lists containing username and password combinations. These lists, some of which were pulled from previous data breaches, were stored in databases on a cloud service, Mega, and advertised on a popular hacking forum. According to multiple users on this forum, Collection #1 is part of an original one terabyte upload that included seven collections: Collections #1-5, Anti-Public, and Zabagur.

Credential stuffing lists like these are treasure troves for hackers, who use them as fuel for large-scale attacks against websites. Using automated tools, hackers can test thousands or millions of credential pairs to find reused email address and password combinations and gain access to any accounts these combinations happen to unlock. Even if this method unlocks only one account in a hundred, attackers working from a list of hundreds of thousands of credential combinations will be able to exploit thousands of accounts.
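To make that arithmetic concrete, here is a minimal sketch of the calculation. The one-in-a-hundred success rate is the illustrative figure used above, not a measured value, and the function name and the sample list sizes are assumptions made up for this example.

```python
def expected_takeovers(list_size: int, one_in: int = 100) -> int:
    """Estimate how many accounts a credential list could unlock.

    list_size: number of email/password pairs the attacker tests.
    one_in:    hypothetical success rate, expressed as "one account
               unlocked per `one_in` pairs tried" (100 means 1%).
    """
    if list_size < 0:
        raise ValueError("list_size must not be negative")
    if one_in <= 0:
        raise ValueError("one_in must be a positive integer")
    # Integer division keeps the estimate exact and conservative.
    return list_size // one_in


# Illustrative list sizes in the "hundreds of thousands" range.
for size in (100_000, 500_000, 900_000):
    unlocked = expected_takeovers(size)
    print(f"{size:,} pairs at 1 in 100 -> {unlocked:,} accounts")
```

Even at this low assumed success rate, every list size in that range yields a result in the thousands, which is why lists of this kind are worth trading.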

This collection was first reported on and analyzed by security researcher Troy Hunt. Based on Hunt’s analysis, Collection #1 contains nearly 2.7 billion rows of data from almost three thousand separate databases. This breaks down to 773 million unique email addresses but only 21 million unique passwords, which indicates that the majority of these passwords were associated with multiple accounts. Popular (and weak) passwords like “password” or “12345” tend to make multiple appearances on credential lists. Of these unique email addresses, 82% were exposed in previously indexed breaches, as were 50% of the unique passwords.
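The gap between unique email addresses and unique passwords is easier to see on a toy list. The sketch below is not Hunt's method; it simply assumes rows in an "email:password" layout and uses invented sample data to show how a handful of weak passwords ends up shared across several accounts.

```python
# Invented sample rows; the "email:password" layout is an assumption
# for illustration and the addresses are not from any real list.
SAMPLE_ROWS = [
    "alice@example.com:password",
    "bob@example.com:password",
    "carol@example.com:12345",
    "dave@example.com:12345",
    "alice@example.com:password",  # exact duplicate row
    "erin@example.com:T7#long-unique-phrase",
]


def summarize(rows):
    """Count rows, unique emails, unique passwords, and shared passwords."""
    emails = set()
    password_to_emails = {}
    for row in rows:
        # Split on the first colon only, so passwords may contain colons.
        email, _, password = row.partition(":")
        email = email.strip().lower()
        emails.add(email)
        password_to_emails.setdefault(password, set()).add(email)
    # A password is "shared" when more than one distinct email uses it.
    shared = {
        pw: len(users)
        for pw, users in password_to_emails.items()
        if len(users) > 1
    }
    return {
        "rows": len(rows),
        "unique_emails": len(emails),
        "unique_passwords": len(password_to_emails),
        "shared_passwords": shared,
    }


summary = summarize(SAMPLE_ROWS)
print(summary)
```

In this sample, six rows collapse to five unique email addresses but only three unique passwords, the same pattern that appears at a far larger scale in Collection #1.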

Though Collection #1 seems daunting in size and scope, further analysis of the data shows that this exposure may not be as serious as it seems. Despite the January 2019 discovery date, none of the data was first exposed in 2019 or even in the second half of 2018. The most recent database is dated June 2018, and of the remaining data, 58.8% of the databases were exposed in 2018 and 37.4% of the databases were exposed in 2017. The two oldest databases date back to 2010 and 2008.

With new breaches being discovered on a weekly basis, one would hope that the passwords associated with most of the accounts included in this dataset have already been changed. However, statistics about password reuse and breach fatigue suggest that a number of these accounts may still be at risk of account takeover. Individuals exposed in Collection #1 should take care to choose strong, unique passwords going forward and not be tempted to reuse old passwords.

This particular collection of credentials does not contain any other personal information (like payment data, identity information, phone numbers, or addresses), but the appearance of databases from as early as 2008 demonstrates how criminals can repackage and circulate stolen data for more than a decade. The risks associated with Collection #1 will not disappear just because the breach has been identified and publicized. As these screenshots (below) show, Terbium Labs has already identified a listing advertising this dataset for sale and multiple forum users reposting the original data. As quickly as researchers identify data breaches, cyber criminals, who treat data as a commodity, repackage, re-market, and resell it to meet the market’s high demand.

We may never know how many accounts were compromised as a result of this massive collection. Collection #1 highlights the dangers of password reuse and the importance of continuous monitoring. The danger of stolen data does not end with the first exposure and discovery – as the exposure of Collection #1 shows, stolen data will be distributed multiple times and in multiple ways.