Why are there duplicate entries and related hits in Managed Data Leaks Reporting?

Written by Abi Tyas Tunggal

A single digital asset, such as a database or webpage, can be picked up by multiple searches.

For example, a configuration file that contains the credentials for example.com will likely match both "example.com" and "example password". At first glance it may seem sensible to deduplicate these results, but multiple matches give our analysts a strong signal that the resource should be examined carefully.
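To illustrate why one resource can produce several entries, the sketch below runs a few search terms over some resources and records one hit per matching term, without deduplicating. It is a minimal, hypothetical illustration, not UpGuard's search engine: the resource names, contents, plain substring matching, and the `run_searches` and `match_counts` helpers are all assumptions made for the example.

```python
from collections import Counter

def run_searches(resources: dict[str, str], terms: list[str]) -> list[tuple[str, str]]:
    """Hypothetical search: return one (resource, term) hit per term that matches.

    Hits are deliberately NOT deduplicated, so a resource matched by several
    terms appears several times.
    """
    hits = []
    for term in terms:
        for name, content in resources.items():
            if term in content:  # plain substring match (an assumption)
                hits.append((name, term))
    return hits

def match_counts(hits: list[tuple[str, str]]) -> Counter:
    """Count how many searches matched each resource."""
    return Counter(name for name, _ in hits)

# Hypothetical resources and search terms.
resources = {
    "config.yml": "host: example.com\npassword: example password",
    "readme.md": "Welcome to example.com",
}
terms = ["example.com", "example password"]

hits = run_searches(resources, terms)
counts = match_counts(hits)
print(counts.most_common())
```

Here `config.yml` is matched by both terms and so appears twice, which is the kind of repetition the article treats as a signal rather than noise.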

Read our article on managed data leaks reporting for additional context.

What are related hits?

Building on this idea, a container that holds one piece of sensitive data (in this example, a configuration file) is likely to hold other sensitive data as well.

We use the word container here to refer to any type of file storage, such as a cloud-hosted bucket, git repository, or any other publicly accessible file storage method.

To help your team remediate sensitive data exposures found through our Managed Data Leaks feature, our analysts and algorithms group these files as related hits. Related hits are investigated as one unit of work and produce a single finding in the Managed Data Leaks section of the UpGuard platform. In Managed Data Leaks reporting, however, we have chosen to show every hit along with the relationships between them.
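As a rough illustration of the grouping idea, the sketch below collects hits by the container they were found in, so each group corresponds to one unit of work while every individual hit is still kept for reporting. This is a hypothetical simplification: the `Hit` fields, the `group_related_hits` helper, the example container names, and the choice to group by container alone are assumptions; the article notes that analysts are also involved in deciding which hits are related.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    container: str    # bucket, repository, etc. (hypothetical identifier)
    path: str         # file within the container
    search_term: str  # the search that produced this hit

def group_related_hits(hits: list[Hit]) -> dict[str, list[Hit]]:
    """Group hits by container; each group stands for one finding.

    Every hit is preserved inside its group, so a report can still list
    each hit and show which other hits it is related to.
    """
    groups: dict[str, list[Hit]] = defaultdict(list)
    for hit in hits:
        groups[hit.container].append(hit)
    return dict(groups)

# Hypothetical hits from two containers.
hits = [
    Hit("github.com/example/app", "config/prod.yml", "example.com"),
    Hit("github.com/example/app", "config/prod.yml", "example password"),
    Hit("github.com/example/app", ".env", "example.com"),
    Hit("s3://example-backups", "dump.sql", "example.com"),
]

findings = group_related_hits(hits)
for container, related in findings.items():
    print(container, len(related))
```

Four hits collapse into two findings here, yet none of the hits is discarded, mirroring the difference between the single finding in the platform and the per-hit view in reporting.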

Why do some things appear repeatedly?

Hits can reappear for several reasons. One is a change to a file's contents: when the contents change, the file's hash changes, and we treat it as a new hit because the change may have exposed sensitive data that was not there before.

For example, imagine we found a publicly exposed GitHub repo for example.com that contained the configuration file described above, and we produced a hit. At this point, the configuration file is the only sensitive data exposed.

The next day, however, an engineer pushes a change to the repo that exposes a second configuration file containing another piece of sensitive information.

A situation like this clearly warrants attention.

While these cases are relatively rare, we have seen them often enough to believe they require monitoring, since repeated changes often indicate that the project is under active development.
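The hash-based rule described in this section can be sketched as follows: a file is reported as a new hit whenever its content hash has not been seen before for that location. This is a minimal sketch under stated assumptions, not UpGuard's implementation: the `HitTracker` class, keying on (container, path, hash), the use of SHA-256, and the example file contents are all hypothetical.

```python
import hashlib

def content_hash(data: bytes) -> str:
    """SHA-256 digest of a file's contents (the hash algorithm is an assumption)."""
    return hashlib.sha256(data).hexdigest()

class HitTracker:
    """Hypothetical tracker: a file is a new hit whenever its content hash is unseen."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, str, str]] = set()

    def observe(self, container: str, path: str, data: bytes) -> bool:
        """Return True if this (container, path, contents) combination is a new hit."""
        key = (container, path, content_hash(data))
        if key in self._seen:
            return False  # identical contents were already reported
        self._seen.add(key)
        return True

tracker = HitTracker()
repo = "github.com/example/app"

day1 = tracker.observe(repo, "config.yml", b"password: hunter2")    # first sighting
rescan = tracker.observe(repo, "config.yml", b"password: hunter2")  # unchanged file
day2 = tracker.observe(repo, "config.yml", b"password: hunter2\napi_key: abc123")  # contents changed
second = tracker.observe(repo, "config.secondary.yml", b"token: xyz")  # newly exposed file
print(day1, rescan, day2, second)
```

An unchanged file is not reported again, while both a modified file and a newly exposed file produce new hits, which is why the same repository can appear repeatedly while it is under active development.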
