Industry Insights

When Does Unusual Data Activity Become Dangerous Data Activity?

7 min read

Christian Wimpelmann

Unusual data activity is the earliest warning sign of Insider Risk and of a potentially damaging data leak or data breach. Whether malicious or unintentional, unusual data access, or data traversing networks and apps where it generally doesn’t, is often a precursor to employees doing something they shouldn’t, with data ending up somewhere much more problematic (as in, outside the organization). Unfortunately, the worst-case scenario is also an all-too-common occurrence: The first sign of a data leak is a competitor showing up with your stolen trade secrets.

But why is it so hard to detect unusual data activity? Let’s start with the basics:

What is unusual data or unusual data activity?

“Unusual” is analogous to the statistical concept of “outliers” — data points that fall outside the norm, far from the core population of data values, or that otherwise diverge from generally consistent, well-structured data.
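To make the outlier idea concrete, here is a minimal sketch that flags data points far from the core population using a robust modified z-score (median plus median absolute deviation). The counts, the 3.5 cutoff, and the scenario are illustrative, not taken from any Code42 product:

```python
from statistics import median

def outliers(values, cutoff=3.5):
    """Return values whose modified z-score exceeds `cutoff`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return []  # no spread: nothing can be judged an outlier
    return [v for v in values if 0.6745 * abs(v - med) / mad > cutoff]

# A user who normally moves ~20 files a day suddenly moves 500:
daily_file_moves = [18, 22, 19, 25, 21, 17, 23, 20, 500]
print(outliers(daily_file_moves))  # [500]
```

A median-based score is used here rather than a plain mean/standard-deviation z-score because a single extreme value inflates the standard deviation and can mask itself; the median and MAD stay anchored to the “core population” the definition above describes.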

Most security software isn’t designed to detect unusual data activity and insider risk

Most conventional data security tools, like DLP and CASB, use rules defined by security teams to block risky activity. These tools take a black-and-white view of data activity: An action is either allowed or not, and there’s not much consideration beyond that. But the reality is that many things fall into the “not allowed” category that are nevertheless quite common; people (inadvertently or otherwise) butt up against the rules constantly in everyday work. And plenty of things that are “allowed” can nonetheless turn out to be quite risky. What you want is to see the true outliers — whichever side of the rules they fall on.

This lack of visibility into unusual data activity is a significant factor in why a recent report found that more than three-quarters of organizations with a DLP solution in place still end up suffering a data breach.

First, you need to understand “usual”

If you’re going to recognize unusual data activity, you first need to define “usual” or normal data activity. Analytics offers a way to do that: Sophisticated analytics tools can do a great job of homing in on the trends and patterns in data, helping security teams establish a baseline of what data is moving through which vectors, and by whom, on an everyday basis. Unusual data activity can then be more clearly defined by comparing a given action against 1) the usual activity patterns of the user and 2) the usual activity patterns of that specific file or piece of data. That second point is critical: Far too many security teams focus solely on the user. But it’s the data you care about, so taking a data-centric approach to monitoring for unusual data activity helps you guard what really matters.
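The two-baseline comparison above can be sketched in a few lines. Everything here is hypothetical structure for illustration — the class, names, and “vector” strings are assumptions, not Incydr’s API:

```python
from collections import defaultdict

class Baseline:
    """Tracks which movement vectors are 'usual' for each user and each file."""
    def __init__(self):
        self.user_vectors = defaultdict(set)  # user -> vectors they normally use
        self.file_vectors = defaultdict(set)  # file -> vectors it normally moves on

    def observe(self, user, file, vector):
        """Record everyday activity to build the baseline."""
        self.user_vectors[user].add(vector)
        self.file_vectors[file].add(vector)

    def unusual(self, user, file, vector):
        """An event is unusual if it breaks either the user's or the file's baseline."""
        return (vector not in self.user_vectors[user]
                or vector not in self.file_vectors[file])

baseline = Baseline()
baseline.observe("alice", "roadmap.xlsx", "corporate-gdrive")

print(baseline.unusual("alice", "roadmap.xlsx", "personal-dropbox"))  # True
print(baseline.unusual("alice", "roadmap.xlsx", "corporate-gdrive"))  # False
```

Note that the check consults the file’s history as well as the user’s — the data-centric half of the comparison that the paragraph argues is too often skipped.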

The problem of “noise” — even among the unusual

Security teams are increasingly leveraging UEBA and other analytics-powered tools to separate the unusual from the usual. However, today, the bulk of unusual data activity amounts to the everyday work of everyday employees who unintentionally put data at risk. And “normal” is getting more and more diverse, as people are working remotely and using more cloud-based apps to connect, collaborate and move data — and companies are directly encouraging employees to explore their ingenuity to foster speed, agility, and innovation. It’s no surprise, really, that security professionals say they’re more concerned about inadvertent or negligent data leaks than malicious ones — and that 80% of insider data breaches stem from unintentional actions.

Unintentional employee actions can be incredibly damaging, but the real problem is that the vast (vast) majority of unusual data activity is ultimately harmless. So, even some of the most sophisticated security teams — using advanced analytics to flag unusual data activity — end up buried in alerts and plagued by alert fatigue. In fact, the 2021 Code42 Data Exposure Report found that nearly two-thirds of IT security leaders don’t know which data risk to prioritize.

What are the best signals of unusual data activity?

Beyond the basic concept of outliers, experienced security professionals know there are several red flags that signify the kinds of unusual data activity most likely to lead to data risk. Today, leading security technologies are taking a more innovative approach to defining the unusual data activity that warrants attention: They’re machining that experience-driven intuition about what matters into the concept of Insider Risk Indicators, or IRIs. Here are a few examples of the most common and relevant IRIs:

  • Off-hour activity: When a given user’s endpoint file activity takes place at unusual times.
  • Untrusted domains: When files are emailed or uploaded to domains and URLs that are not considered trusted, as established by the company.
  • Suspicious file mismatch: When the MIME/Media type of a high-value file, such as a spreadsheet, is disguised with the extension of a low-value file type, such as a JPEG — typically indicative of attempts to conceal data exfiltration.
  • Remote activity: Activity taking place off-network may indicate increased risk.
  • User attributes: Attributes like name, title, department, manager, and employment type (full-time, part-time, contractor) from a company’s identity management system.
  • File categories: Categories, as determined by analyzing file contents and extensions, that help signify a file’s sensitivity and value.
  • Employee departure: Employees who are leaving the organization — voluntarily or otherwise.
  • Employee risk factors: Risk factors such as contract employees, high-impact employees, flight risk, performance concerns, and elevated access privileges.
  • ZIP / compressed file movement: File activity involving .zip files, which may indicate an employee is attempting to take many files at once or to hide files using encrypted zip folders.
  • Shadow IT apps: Unusual data activity happening via web browsers, Slack, AirDrop, FileZilla, FTP, or cURL, as well as commonly unauthorized shadow IT apps like WeChat, WhatsApp, Zoom, and Amazon Chime.
  • Public cloud sharing links: When files are shared with untrusted domains or made publicly available in corporate Google Drive, OneDrive, and Box systems.
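The “suspicious file mismatch” indicator above lends itself to a small sketch: compare a file’s leading “magic bytes” against its claimed extension. The signature table is a tiny, illustrative subset (real detectors use full MIME sniffing), and the function name is hypothetical:

```python
# Map of leading byte signatures to the extensions they legitimately carry.
MAGIC = {
    b"PK\x03\x04": {".zip", ".xlsx", ".docx"},  # ZIP container (Office files too)
    b"\xff\xd8\xff": {".jpg", ".jpeg"},         # JPEG image
    b"%PDF": {".pdf"},                          # PDF document
}

def extension_mismatch(header: bytes, extension: str) -> bool:
    """True when the content signature doesn't match the claimed extension."""
    for magic, extensions in MAGIC.items():
        if header.startswith(magic):
            return extension.lower() not in extensions
    return False  # unknown signature: can't judge either way

# A spreadsheet (ZIP-based .xlsx content) renamed to look like a photo:
print(extension_mismatch(b"PK\x03\x04" + b"rest-of-file", ".jpeg"))  # True
print(extension_mismatch(b"PK\x03\x04" + b"rest-of-file", ".xlsx"))  # False
```

This is the pattern the IRI describes: a high-value file type (spreadsheet) disguised behind a low-value extension (JPEG) to slip past casual inspection.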

Leveraging Insider Risk Indicators to surface the riskiest unusual data activities

One of the forward-thinking security tools that is “machining the intuition” of security teams is the Code42 Incydr solution. Incydr builds on the concept of Insider Risk Indicators, creating a highly automated process for identifying and correlating unusual data and unusual behaviors to create a high-fidelity signal of the real risks. Here’s a look at how Incydr uses IRIs to surface the riskiest unusual data activities:

Incydr detects risk across all data activity — computers, cloud, and email

Incydr starts from the premise that all data matters — and builds comprehensive visibility into all data activity, including:

  • Sync activity to cloud applications like Dropbox and iCloud
  • Uploads to personal email and other sites through web browsers
  • Files sent through AirDrop or accessed by web apps like Slack
  • Sharing from corporate cloud services like Google Drive, OneDrive, and Box
  • Email attachments from corporate Office 365 or Gmail
  • File deletions from user computers

1. Cumulative risk scores determine event severity

Incydr uses a robust set of IRIs, categorized as file, vector, and user risk indicators. Every IRI detected gets a numerical risk score. These scores are totaled to determine the overall risk of a detected event: critical, high, moderate, or low. Events are prioritized by severity, and users are prioritized by the number of critical events they trigger. Code42 determines these scores by combining its own product telemetry data on the highest risk IRIs with qualitative research on security practitioner experiences.
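Mechanically, cumulative scoring looks something like the sketch below: each detected IRI carries a numeric score, and the sum maps to a severity band. The individual scores and band boundaries here are hypothetical — Code42’s actual values come from its own telemetry and research, as noted above:

```python
# Illustrative per-indicator scores (not Code42's real values).
IRI_SCORES = {
    "off_hours": 1,
    "untrusted_domain": 3,
    "zip_movement": 3,
    "file_mismatch": 5,
    "departing_employee": 7,
}

def severity(detected_iris):
    """Sum the scores of all detected IRIs and map the total to a band."""
    total = sum(IRI_SCORES[iri] for iri in detected_iris)
    if total >= 13:
        return "critical"
    if total >= 9:
        return "high"
    if total >= 5:
        return "moderate"
    return "low"

# A departing employee zipping files and sending them to an untrusted domain:
print(severity(["departing_employee", "zip_movement", "untrusted_domain"]))  # critical
print(severity(["off_hours"]))  # low
```

The design point is that no single indicator has to be damning: several moderate signals stacking on one event is what pushes it into the critical band.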

2. Prioritization settings are easily adapted based on risk tolerance

If needed, administrators can adjust the prioritization model to fit their own risk tolerance. Risk settings adjust the risk scores and modify how users and events are prioritized. Trust settings tell Incydr what activity it should de-prioritize. This includes trusted domain information that allows Incydr to distinguish between sanctioned and personal or unsanctioned activity and prevents approved file activity from triggering alerts or cluttering dashboard views. The result is an accurate, prioritized list of the highest risk users and events.

3. A simple Risk Exposure Dashboard brings it all together

Incydr distills all information on unusual data activity and insider risk into a Risk Exposure Dashboard. The dashboard provides security teams with a single-pane, company-wide view of suspicious file movement, sharing, and exfiltration activities by vector and file type. Perhaps more importantly, it focuses attention on the top employees whose file activity needs investigation as well as concerning remote employee activity. And it serves as the springboard into Incydr’s robust investigation and response capabilities — empowering security teams to execute a rapid, right-sized response to unusual data activity before the damage is done.

Learn more about how Code42 Incydr can help you detect and respond to unusual data activity and unusual data traversing your devices, systems, and networks.

Christian Wimpelmann

Christian Wimpelmann, CISSP, CCSP is an IAM Manager at Code42, focused on identity strategy and implementation. Christian spent the first 14 years of his career at Target Corporation, building and maintaining identity management and directory platforms.