EDC-Metadata Details

Our tagging mechanism employs metadata to classify documents and preserve essential internal information needed for its operation.

Because different file types store metadata in different ways, the agent adapts its write mechanism to each type. The information written stays the same, however, and follows formats that apply on both macOS and Windows.

Here is an example of metadata written to a .png file, illustrating the structured way in which information is stored:

{
    "GVData": "ew0KICAidGFnc2V0X2YxNGZjMWYxXzg5NTBfNDBkNV84YTI5XzQ1OTA5ZGE5NDdkNl9nZHByL3BpaSI6ICJGYWxzZSIsDQogICJ0YWdzZXRfZjE0ZmMxZjFfODk1MF80MGQ1XzhhMjlfNDU5MDlkYTk0N2Q2X3NlbnNpdGl2ZSI6ICJGYWxzZSIsDQogICJ0YWdzZXRf",
    "GVData0": "MDA0ZGVhMzNfODc1MV80Mzk5X2E3NmVfOTVmMzcxY2I0MTE5X2Rpc3RyaWJ1dGlvbiI6ICJJbnRlcm5hbCIsDQogICJ0YWdzZXRfZTE2NDA5YTdfMTcwMF80MTUzXzkwOTBfMzk1NWJjMmYwYWU4X2NsYXNzaWZpY2F0aW9uIjogIkdlbmVyYWwgQnVzaW5lc3MiDQp9",
    "GVData1": "(end)",
    "Classification": "General Business",
    "ClassificationTagSetId": "e16409a7-1700-4153-9090-3955bc2f0ae8",
    "ClassificationValue": "General Business",
    "DistributionTagSetId": "004dea33-8751-4399-a76e-95f371cb4119",
    "DistributionValue": "Internal / \u0645\u0631\u062d\u0628\u0627 \u0628\u0627\u0644\u0639\u0627\u0644\u0645",
    "FileId": "e0481ca0-a9e0-e307-07fa-6189581762a8",
    "MSIP_Label_d2ee4459-2afa-4136-9d18-4f2ebca102cf_ActionId": "95871ebc-c143-40b9-9b42-ad7bd6bc77df",
    "MSIP_Label_d2ee4459-2afa-4136-9d18-4f2ebca102cf_ContentBits": "3",
    "MSIP_Label_d2ee4459-2afa-4136-9d18-4f2ebca102cf_Enabled": "true",
    "MSIP_Label_d2ee4459-2afa-4136-9d18-4f2ebca102cf_Method": "Priviledged",
    "MSIP_Label_d2ee4459-2afa-4136-9d18-4f2ebca102cf_Name": "General Business",
    "MSIP_Label_d2ee4459-2afa-4136-9d18-4f2ebca102cf_SetDate": "2023-08-02T11:17:28Z",
    "MSIP_Label_d2ee4459-2afa-4136-9d18-4f2ebca102cf_SiteId": "ed86fd3a-ab24-4113-a9f8-6cb38f63c190",
    "TagDateTime": "2023-08-02T11:17:28Z",
    "UserId": "trzec"
}

This example demonstrates how the agent builds the metadata that gets attached to a classified file.

Metadata Entries

The agent is designed to write various types of metadata entries, each serving a specific purpose:

  1. GVData, GVData0, GVData1, etc. - These metadata entries hold internal data used by the agent, such as the document ID and the visual labels that were written. This metadata is Base64-encoded, is integral to the agent's operation, and is not configurable. The agent always writes it.

  2. MSIP_Label_* - This category of metadata is configurable and is written to ensure compatibility with Microsoft Azure Information Protection (AIP). It allows the agent to align with Microsoft's security and protection frameworks.

  3. Configurable Tags - Apart from the above fixed metadata entries, the agent supports configurable tags. These can be tailored according to specific needs within the agent's configuration, providing flexibility in handling and storing additional information.

This design allows for a high degree of customization in how metadata is written, catering to diverse requirements and integration scenarios.
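As an illustration of the GVData entries, the following is a minimal Python sketch of how the Base64 chunks could be reassembled and decoded. It assumes, based only on the .png example above, that the chunks are concatenated in the order GVData, GVData0, GVData1, and so on, that a chunk with the value "(end)" closes the sequence, and that the decoded payload is UTF-8 JSON. The function name decode_gvdata is hypothetical and is not part of the agent.

```python
import base64
import json

END_MARKER = "(end)"  # terminator value seen in the .png example (assumed meaning)


def decode_gvdata(metadata):
    """Join the GVData* chunks and decode the Base64 payload into a dict.

    Assumption: chunk order is GVData, GVData0, GVData1, ... and the
    "(end)" value marks the end of the sequence.
    """
    keys = ["GVData"]
    index = 0
    while f"GVData{index}" in metadata:
        keys.append(f"GVData{index}")
        index += 1

    chunks = []
    for key in keys:
        value = metadata.get(key)
        if value is None:
            continue  # tolerate a missing unnumbered GVData entry
        if value == END_MARKER:
            break
        chunks.append(value)

    payload = base64.b64decode("".join(chunks))
    return json.loads(payload.decode("utf-8"))


# Round-trip demo with made-up internal data (not real agent output).
internal = {"tagset_example_classification": "General Business"}
encoded = base64.b64encode(json.dumps(internal).encode("utf-8")).decode("ascii")
sample = {"GVData": encoded[:24], "GVData0": encoded[24:], "GVData1": END_MARKER}
print(decode_gvdata(sample))
```

The chunks are joined before decoding, so the sketch does not depend on where the agent splits the Base64 string.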

AIP Metadata Configuration

The agent is able to write metadata that aligns with AIP's standards, allowing seamless interaction with Microsoft's security framework.

Below is an example of a configuration that demonstrates how the agent can be configured to write AIP metadata.

"aip": {
    "siteId": "ed86fd3a-ab24-4113-a9f8-6cb38f63c190",
    "labels": [
      {
        "id": "734f255c-faeb-4316-b0c0-3d88dfc5bbef",
        "name": "Public",
        "method": "Priviledged",
        "contentBits":  3
      },
      {
        "id": "d2ee4459-2afa-4136-9d18-4f2ebca102cf",
        "name": "General Business",
        "method": "Priviledged",
        "contentBits": 3
      },
      {
        "id": "65650f8c-b1ca-43b3-aad5-b4fa5ffcfb95",
        "name": "Confidential",
        "method": "Priviledged",
        "contentBits": 3
      },
      {
        "id": "574e5463-b443-4182-9781-3fc620ed259c",
        "name": "Restricted",
        "method": "Priviledged",
        "contentBits": 3
      }
    ],
    "mappings": [
      {
        "classification": "Public",
        "aipLabelId": "734f255c-faeb-4316-b0c0-3d88dfc5bbef"
      },
      {
        "classification": "General Business",
        "aipLabelId": "d2ee4459-2afa-4136-9d18-4f2ebca102cf"
      },
      {
        "classification": "Confidential",
        "aipLabelId": "65650f8c-b1ca-43b3-aad5-b4fa5ffcfb95"
      },
      {
        "classification": "Restricted",
        "aipLabelId": "574e5463-b443-4182-9781-3fc620ed259c"
      }
    ]
  },

This configuration includes specific AIP labels and their corresponding classifications, defining how the agent translates its internal categorizations into a format that AIP can recognize.
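To make the translation concrete, here is a hedged Python sketch of how a classification could be turned into MSIP_Label_* entries from the aip configuration. The set of key suffixes and the "true" value for Enabled are taken from the .png example earlier on this page. Generating ActionId as a random UUID and the function name build_msip_entries are assumptions made for illustration, not documented agent behaviour.

```python
import uuid
from datetime import datetime, timezone


def build_msip_entries(aip, classification, now=None):
    """Return the MSIP_Label_* key/value pairs for one classification.

    Returns an empty dict when the classification has no mapping.
    """
    label_id = next(
        (m["aipLabelId"] for m in aip["mappings"]
         if m["classification"] == classification),
        None,
    )
    if label_id is None:
        return {}

    # Look up the label definition that the mapping points to.
    label = next(item for item in aip["labels"] if item["id"] == label_id)
    now = now or datetime.now(timezone.utc)
    prefix = f"MSIP_Label_{label_id}_"
    return {
        prefix + "Enabled": "true",
        prefix + "SetDate": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        prefix + "Method": label["method"],
        prefix + "Name": label["name"],
        prefix + "SiteId": aip["siteId"],
        prefix + "ActionId": str(uuid.uuid4()),  # assumption: random GUID
        prefix + "ContentBits": str(label["contentBits"]),
    }


# Trimmed copy of the configuration above, limited to one label.
aip_config = {
    "siteId": "ed86fd3a-ab24-4113-a9f8-6cb38f63c190",
    "labels": [
        {
            "id": "d2ee4459-2afa-4136-9d18-4f2ebca102cf",
            "name": "General Business",
            "method": "Priviledged",
            "contentBits": 3,
        }
    ],
    "mappings": [
        {
            "classification": "General Business",
            "aipLabelId": "d2ee4459-2afa-4136-9d18-4f2ebca102cf",
        }
    ],
}

entries = build_msip_entries(aip_config, "General Business")
```

A classification without an entry in mappings yields no MSIP_Label_* keys in this sketch. How the agent itself handles that case is not stated here.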

How to get MIP labels from Azure

The labels configured in the customer's Azure tenant can be retrieved with the Fetch MIP labels from Azure utility in power tools.

It takes the required details from the user as inputs (tenantId, appName, clientId, clientSecret, emailId) and prints the available MIP labels as output.

The output format:

{
  "siteId": "tenant_id",
  "labels": [
    {
      "id": "label_id",
      "name": "label_name",
      "method": "label_assignment_method", // always Priviledged
      "contentBits": content_marking_type // type of content marking applied, always 3
    }
  ]
}

This utility uses application permissions of an Azure app. The app must be granted the following API permissions (with admin consent):

  • Microsoft Graph → InformationProtectionPolicy.Read.All

  • Microsoft Information Protection Sync Service → UnifiedPolicy.Tenant.Read

Configurable Tags

The agent also provides the functionality to write fully customized metadata entries.

Below is an example of a configuration that defines a variety of custom metadata tags:

{
  "id": "8e808d02-fdc4-4fc9-8938-853b0a4a5d3d",
  ...
  "defaultSettings": {
    ....
    "writeMetadataTags": [
      {
        "tagName": "ClassificationTagSetId",
        "tagHandle": "e16409a7-1700-4153-9090-3955bc2f0ae8"
      },
      {
        "tagName": "ClassificationValue",
        "tagHandle": "{classification}"
      },
      {
        "tagName": "DistributionTagSetId",
        "tagHandle": "004dea33-8751-4399-a76e-95f371cb4119"
      },
      {
        "tagName": "DistributionValue",
        "tagHandle": "{distribution}"
      },
      {
        "tagName": "FileId",
        "tagHandle": "{fileid}"
      },
      {
        "tagName": "UserId",
        "tagHandle": "{user}"
      },
      {
        "tagName": "TagDateTime",
        "tagHandle": "{datetime}"
      },
      {
        "tagName": "{name:e16409a7-1700-4153-9090-3955bc2f0ae8:0}",
        "tagHandle": "{value:e16409a7-1700-4153-9090-3955bc2f0ae8:0}"
      },
      {
        "tagName": "classification_guid",
        "tagHandle": "{classification_guid}"
      },
      {
        "tagName": "compliance_guid",
        "tagHandle": "{compliance_guid}"
      },
      {
        "tagName": "distribution_guid",
        "tagHandle": "{distribution_guid}"
      }
    ]
  },
  ...
}

The tagHandle field within this configuration supports various placeholders that facilitate dynamic tagging:

  • {classification} - current classification value of the document

  • {distribution} - current distribution value of the document

  • {compliance} - current compliance value of the document

  • {datetime} - current date and time

  • {email} - email of the current user (only works when the Outlook plugin is installed)

  • {user} - id of the current user

  • {machineid} - id of the current machine

  • {fileid} - unique file id

  • {classification_raw} - classification tag value without the tag alias

  • {compliance_raw} - compliance tag value without the tag alias

  • {distribution_raw} - distribution tag value without the tag alias

  • {classification_guid} - unique ID generated from the classification tagset ID and tag name

  • {compliance_guid} - unique ID generated from the compliance tagset ID and tag name

  • {distribution_guid} - unique ID generated from the distribution tagset ID and tag name
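The placeholder substitution can be pictured with the short Python sketch below. It is an illustration only: the function resolve_tags and the context dictionary are hypothetical, and only simple placeholders such as {classification} or {user} are handled. Forms like {name:...:0} and {value:...:0} are not described on this page, so the sketch leaves them untouched.

```python
import re

# Matches simple placeholders such as {classification} or {classification_guid}.
PLACEHOLDER = re.compile(r"\{([a-z_]+)\}")


def resolve_tags(write_metadata_tags, context):
    """Expand the placeholders in each tagHandle using values from context.

    Placeholders without a value in context are left as written.
    """
    def expand(handle):
        return PLACEHOLDER.sub(
            lambda match: str(context.get(match.group(1), match.group(0))),
            handle,
        )

    return {tag["tagName"]: expand(tag["tagHandle"]) for tag in write_metadata_tags}


tags = [
    {"tagName": "ClassificationTagSetId",
     "tagHandle": "e16409a7-1700-4153-9090-3955bc2f0ae8"},
    {"tagName": "ClassificationValue", "tagHandle": "{classification}"},
    {"tagName": "UserId", "tagHandle": "{user}"},
]
context = {"classification": "General Business", "user": "trzec"}
resolved = resolve_tags(tags, context)
```

Literal handles, such as a fixed tagset ID, pass through unchanged, which matches how ClassificationTagSetId appears in the .png example.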

Writing Metadata to files

As mentioned at the beginning of this page, the agent uses a slightly different approach to write and read metadata for each file type, because file types differ in the mechanisms they offer for storing metadata. The following sections outline the approach used for each file category:

.zip

For .zip files, the agent creates a file named GV_metadata.json within the classified zip file. The format is analogous to the example provided earlier.
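For inspection purposes, the embedded file can be read with Python's standard zipfile module. This is a minimal sketch that assumes GV_metadata.json sits at the root of the archive and contains JSON. The function name read_zip_metadata is hypothetical.

```python
import json
import zipfile

METADATA_NAME = "GV_metadata.json"  # file name given above; root location is assumed


def read_zip_metadata(path_or_file):
    """Return the parsed GV_metadata.json, or None if the archive has none."""
    with zipfile.ZipFile(path_or_file) as archive:
        if METADATA_NAME not in archive.namelist():
            return None
        with archive.open(METADATA_NAME) as handle:
            return json.load(handle)
```

Calling read_zip_metadata("classified.zip") on a classified archive would then return a dictionary comparable to the .png example.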

.avi, .wav

ID3v2 tags are used for .avi and .wav files. These tags are variable in size and are placed at the beginning of the file, so metadata can be loaded immediately, even while the file is being streamed incrementally. An ID3v2 tag comprises multiple optional frames, each holding up to 16 MB of metadata.

The written metadata can be accessed via the tag comment-pol.

.mp4, .m4p, .m4v

The written metadata can be accessed via the tag comment.

.mov

These files leverage XMP tags. XMP, or Extensible Metadata Platform, is an Adobe-created standard for embedding metadata within digital files. This approach ensures standardized, structured metadata embedding that's readily extendable, encompassing information such as creator, copyright, editing data, and more.

The written metadata can be accessed via the tag classification.

.gif, .jpg, .jpeg, .png, .tiff, .tif

These files also leverage XMP tags to write metadata. The written metadata can be accessed via the tags description and user_comment.

.vsdx, .docx, .xlsx, .xlsm, .pptx

Microsoft Office allows Custom Properties to be written to these documents, stored internally in XML format. The metadata is written as separate key/value pairs instead of JSON.
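In Office Open XML packages, custom properties are stored in the docProps/custom.xml part, so they can be listed with the Python standard library. The sketch below is for inspection only. The function name read_custom_properties is hypothetical, and the sketch returns the text of each property's value element without distinguishing value types.

```python
import zipfile
import xml.etree.ElementTree as ET

CUSTOM_PROPS_PART = "docProps/custom.xml"
NS = {
    "cp": "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"
}


def read_custom_properties(path_or_file):
    """Return {property name: value text} for an Office Open XML document."""
    with zipfile.ZipFile(path_or_file) as package:
        if CUSTOM_PROPS_PART not in package.namelist():
            return {}
        root = ET.fromstring(package.read(CUSTOM_PROPS_PART))

    properties = {}
    for prop in root.findall("cp:property", NS):
        # Each property wraps one typed value element, e.g. vt:lpwstr.
        value = prop[0].text if len(prop) else None
        properties[prop.get("name")] = value or ""
    return properties
```

For a classified .docx, the result should contain the same key/value pairs as the JSON example near the top of this page, one property per key.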

.doc, .xls, .ppt

The older binary format also supports Custom Properties but without XML storage. The metadata is written as separate key/value pairs instead of JSON.

Currently not supported on macOS.

.dxf, .dwg

For CAD files, custom summary info is used to store metadata.

Currently not supported on macOS.

.pdf

For PDF files, the metadata is written into the Document Information Dictionary. The metadata is written as separate key/value pairs instead of JSON.


Last updated 12 months ago


Reference:

The .mp4, .m4p, and .m4v file types make use of the ILST tag format. Further details can be found on the Apple metadata page.

For MSIP_Label_* entries, see Concepts - Label metadata in the MIP SDK.