I think I’ve finally caught my breath after dealing with those 23 billion rows of stealer logs last week. That was a bit intense, as is usually the way after any large incident goes into HIBP. But the confusing nature of stealer logs, coupled with an overly long blog post explaining them and the conflation of which services needed a subscription versus which were easily accessible by anyone, made for a very intense last 6 days. And there were the issues around source data integrity on top of everything else, but I’ll come back to that.
When we launched the ability to search through stealer logs last month, that wasn’t the first corpus of data from an info stealer we’d loaded; it was just the first time we’d made the website domains they expose searchable. Now that we have an actual model around this, we’re going to start going back through those prior incidents and backfilling the new searchable attributes. We’ve just completed that with the 26M unique email address corpus from August last year and added a bunch of previously unseen instances of an email address mapped against a website domain. We’ve also now flagged that incident as “IsStealerLog”, so if you’re using the API, you’ll see that attribute now set to true.
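If you’re consuming breach data programmatically, here’s a minimal sketch of how you might pick out incidents carrying that flag. It only assumes that each breach record is a JSON object with a “Name” and an optional “IsStealerLog” attribute; the sample records and their names are made up for illustration, and no real API call is made.

```python
import json

# Hypothetical sample shaped like a list of breach records; the names are
# invented and only the two fields used below are shown.
SAMPLE_RESPONSE = """
[
  {"Name": "ExampleStealerCorpus", "IsStealerLog": true},
  {"Name": "ExampleForumBreach", "IsStealerLog": false},
  {"Name": "ExampleOlderBreach"}
]
"""


def stealer_log_breaches(breaches):
    """Return the names of breaches flagged as stealer logs."""
    # .get() treats a missing attribute the same as "not a stealer log".
    return [b["Name"] for b in breaches if b.get("IsStealerLog") is True]


breaches = json.loads(SAMPLE_RESPONSE)
print(stealer_log_breaches(breaches))
```

Treating a missing attribute as false is a defensive choice for this sketch rather than documented behaviour.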
For the most part, that data is all handled just the same as the existing stealer log data: we map email addresses to the domains they’ve appeared against in the logs, then make all that searchable by full email address, email address domain or website domain (read last week’s really, really long blog post if you need an explainer on that). But there’s one crucial difference that we’re applying both to the backfilling and the existing data, and that’s related to a bit of cleaning up.
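To make the three search modes concrete, here’s a rough sketch of the mapping as in-memory lookups. This is purely illustrative and not how HIBP stores the data: the function name, the lowercasing and the sample rows are all assumptions, and every email is assumed to contain an “@”.

```python
from collections import defaultdict


def build_indexes(rows):
    """Build three lookups from (email, website_domain) pairs."""
    by_email = defaultdict(set)           # full email -> website domains
    by_email_domain = defaultdict(set)    # email's own domain -> emails
    by_website_domain = defaultdict(set)  # website domain -> emails
    for email, website in rows:
        email = email.strip().lower()
        website = website.strip().lower()
        # Assumes a well-formed address; everything after the last "@".
        email_domain = email.rsplit("@", 1)[1]
        by_email[email].add(website)
        by_email_domain[email_domain].add(email)
        by_website_domain[website].add(email)
    return by_email, by_email_domain, by_website_domain


# Hypothetical rows for illustration only.
rows = [
    ("jane@example.com", "shop.example.net"),
    ("jane@example.com", "forum.example.org"),
    ("sam@example.com", "shop.example.net"),
]
by_email, by_email_domain, by_website_domain = build_indexes(rows)
print(sorted(by_email["jane@example.com"]))
print(sorted(by_email_domain["example.com"]))
print(sorted(by_website_domain["shop.example.net"]))
```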
A theme that emerged last week was that there were email addresses that only appeared against one domain, and that was the domain the address itself was on. If john@gmail.com is in there and the only domain he appears against is gmail.com, what’s up with that? At face value, John’s details have been snared whilst logging on to Gmail, but it doesn’t make sense that someone infected with an info stealer only has one website they’ve logged into captured by the malware. It should be many. This seems to be due to a combination of the source data containing credential stuffing rows (just email and password pairs) amidst info stealer data, and somewhere in our processing pipeline, those odd inputs introducing integrity issues. Garbage in, garbage out, as they say.
So, we’ve decided to apply some Occam’s razor to the situation and go with the simplest explanation: a single entry for an email address on the domain of that email address is unlikely to indicate an info stealer infection, so we’re removing those rows. And we’re not adding any more that meet that criterion. But there’s no doubt the email address itself existed in the source; there is no level of integrity issues or parsing errors that causes john@gmail.com to appear out of thin air, so we’re not removing the email addresses in the breach, just their mapping to the domain in the stealer log. I’d already explained such a condition in Jan, where there might be an email address in the breach but no corresponding stealer log entry:
The gap is explained by a combination of email addresses that appeared against invalidly formed domains and, in some cases, addresses that only appeared with a password and not a domain. Criminals aren’t exactly renowned for dumping perfectly formed data sets we can seamlessly work with, and I hope folks that fall into that few percent gap understand this limitation.
FWIW, entries that matched this pattern accounted for 13.6% of all rows in the stealer log table, so this hasn’t made a lot of difference in terms of outright volume.
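For anyone who wants to see the clean-up rule spelled out, here’s a minimal sketch of it. It’s an illustration of the logic rather than our actual pipeline: the function name and sample rows are hypothetical, and it assumes an exact, case-insensitive match between the website domain and the email’s own domain (so a subdomain wouldn’t count as a match).

```python
from collections import defaultdict


def cleanse(rows):
    """Split (email, website_domain) rows into (kept, removed).

    A row is removed when the email appears against exactly one website
    domain and that domain is the one the email address itself is on.
    """
    domains_by_email = defaultdict(set)
    for email, website in rows:
        domains_by_email[email.lower()].add(website.lower())

    kept, removed = [], []
    for email, website in rows:
        key = email.lower()
        own_domain = key.rsplit("@", 1)[-1]
        if domains_by_email[key] == {own_domain}:
            removed.append((email, website))
        else:
            kept.append((email, website))
    return kept, removed


# Hypothetical rows for illustration only.
rows = [
    ("john@gmail.com", "gmail.com"),           # only ever seen on own domain
    ("jane@example.com", "example.com"),       # own domain, but not the only one
    ("jane@example.com", "shop.example.net"),
    ("sam@example.org", "forum.example.net"),
]
kept, removed = cleanse(rows)
print(removed)
print(len(kept))
```

Note that this only drops the email-to-domain mapping; the address itself would still be recorded as appearing in the breach.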
This takes away a lot of confusion regarding the infection status of the address owner. As part of this revision, we’ve updated all the stealer log counts seen on domain search dashboards, so if you’re using that feature, you may see the number drop based on the purged data or increase based on the backfilled data. And we’re not sending out any additional notifications for backfilled data either; there’s a threshold at which comms becomes more noise than signal, and I have a strong suspicion that’s how it would be received if we started sending emails saying “hey, that stealer log breach from ages ago now has more data”.
And that’s it. We’ll keep backfilling data, and the entire corpus within HIBP is now cleaner and more succinct. And we’ll definitely clean up all the UX and website copy as part of our impending rebrand to make sure everything is a lot clearer in future.
I’ll leave you with a bit of levity related to subscription costs and value. As I recently lamented, resellers can be a nightmare to deal with, and we’re seriously considering banning them altogether. But occasionally, they inadvertently share more than they should, and we get an insight into how the outside world views the service. Like a recent case where a reseller accidentally sent us the invoice they’d intended to send the customer who wanted to purchase from us, complete with a 131% price markup. It was an annual Pwned 4 subscription that’s meant to be $1,370, and simply to buy this on that customer’s behalf and then hand them over to us, the reseller was charging $3,165. They can do this because we make the service dirt cheap. How do we know it’s dirt cheap? Because another reseller inadvertently sent us this internal communication today:
FWIW, we do have credit cards in Australia, and they work just the same as everywhere else. I still vehemently dislike resellers, but at least our customers are getting a good deal, especially when they buy direct.