Mapped Fields
This guide provides an overview of the Mapped Fields available in Canopy Processing. These fields are automatically extracted from documents and mapped during the processing stage. Users can view, search, filter, and report on these fields throughout the review workflow.
Field Name | Type of Field | Normalizer | Description | Example Value |
---|---|---|---|---|
Alt Workflow Reviewer | Keyword | | Names of reviewers who review documents in the Alt Workflow batch type. | Jane Doe; John Kim |
Alt Workflow Reviewer Email | Keyword | | The email addresses of the reviewers who review documents in the Alt Workflow batch type. | Gareth Keenan keenan.gareth@gmail.com |
Alt Workflow Tags | Keyword | Lowercase | Tags created by users and applied to batches in the Alt Workflow batch type. | Large Document |
Archival Created Date/Time | Date | | The file created date/time as recorded by the file system and preserved within the Zip or archive container. | 2019-12-13T19:53:10Z |
Archival Modified Date/Time | Date | | The file last modified date/time as recorded by the file system and preserved within the Zip or archive container. | 2019-12-13T19:53:10Z |
Audio Duration | Long | | The length of audio or video files in seconds. | 120 |
Batch | Keyword | | An individual batch name within a batch set. | prefix-1, TXT-1, A01-2 |
Custodian | Keyword | Lowercase | All custodians, de-duplicated and primary, associated with a document. | Jane Doe |
Custom Detection | Keyword | Lowercase | Labels or tags for each custom detection rule that returns a hit on the document. | US_SSN_VLC; US_Passport_LC |
Document ID | Keyword | Lowercase | Canopy’s unique Search ID associated with a document. | 2FG2G55FGF |
Email BCC | Keyword | Lowercase | The names, when available, and email addresses of the Blind Carbon Copy recipients of an email message. | Gareth Keenan keenan.gareth@gmail.com |
Email CC | Keyword | Lowercase | The names, when available, and email addresses of the Carbon Copy recipients of an email message. | Gareth Keenan keenan.gareth@gmail.com |
Email Conversation Index | Keyword | | The thread index created by the email system. It is a hidden metadata field in an email, found especially in Microsoft Outlook or Exchange Server environments. | AQHc5fUAEuRWmZ2a2k6c7FyCkdK6R6kB |
Email Created Date/Time | Date | | The date/time at which an email was created by the user. | 2019-12-13T19:53:10Z |
Email From | Keyword | Lowercase | The name, when available, and email address of the sender of an email message. | Gareth Keenan keenan.gareth@gmail.com |
Email Message ID | Keyword | | The message number created by an email application and extracted from the email’s metadata. | 1ee10ea6-d9c0-aab2-1940-f05f0deef8d8@cu.edu |
Email Modified Date/Time | Date | | The date/time an email was last modified. | 2019-12-13T19:53:10Z |
Email Provider Submit Date/Time | Date | | The date/time the email server sent the email. | 2019-12-13T19:53:10Z |
Email Delivery Date/Time | Date | | The timestamp that a recipient’s mail server records when it successfully accepts an email from the previous mail server in the delivery chain. | 2019-12-13T19:53:10Z |
Email Recipient Count | Long | Lowercase | The number of recipients in an email. | 4 |
Email Report Date/Time | Date | | The date/time that the recipient’s mail server reported the user likely opened the email. | 2019-12-13T19:53:10Z |
Email Send Date/Time | Date | | The timestamp recorded by the sender’s email client (e.g., Outlook, Gmail in a web browser, Apple Mail) at the exact moment the sender hits the “Send” button. | 2019-12-13T19:53:10Z |
Email To | Keyword | Lowercase | The name, when available, and email address of the recipient/recipients of an email message. | Gareth Keenan keenan.gareth@gmail.com |
Family ID | Keyword | | The search ID of the first file in a file family: an email or a loose file (Word, PPT, PDF, etc.). This file will never be a container file. | 2FG2G55FGF |
File Created Date/Time | Date | | The date/time the file was created. | 2019-12-13T19:53:10Z |
File Modified Date/Time | Date | | The date/time the file was last saved. | 2019-12-13T19:53:10Z |
File Size | Long | | The size of the file. | 10.92 KB, 853 Bytes |
File Type | Keyword | Lowercase | The text extension of the file. | doc, pdf |
Image Classification | Keyword | Lowercase | The text identification of the image type. | Social Security Cards |
Image Dimension | Long | | The dimensions of the image in pixels (width x height). | 1000 x 1000 |
Language | Keyword | | The predominant language contained in the document. | English, French |
Language Confidence (in %) | Long | | The confidence level of the language detection, as a percentage. | 80 |
MD5 Hash | Keyword | | The MD5 hash value of the file. NOTE: The Canopy application calculates and uses the SHA256 hash, which is our recommended standard for data integrity. For compatibility with some client tools and processes, Canopy also provides MD5 and SHA1 hashes. WARNING: MD5 and SHA1 are cryptographically broken and should not be used for security-sensitive purposes. They are highly susceptible to collision vulnerabilities, meaning an attacker can create two entirely different files that produce the exact same hash. Relying on these hashes can expose you to significant security risks, including data tampering and impersonation. | 5d41402abc4b2a76b9719d911017c592 |
Master Created Date/Time | Date | | The Master Created Date/Time is derived from the other date fields collected from the document. It is populated by the first date present in this prioritized list: 1. meta.eml_CreationTime 2. earliest eml date/time field from all available 3. meta.metadata_created_datetime 4. parent file Master Created Date/Time 5. meta.archive_created_datetime (date stored for file inside the archive) 6. meta.uploaded | 2019-12-13T19:53:10Z |
Master Created Date/Time Source | Keyword | Lowercase | The name of the field used to populate the Master Created Date/Time. | Email Created Date/Time |
Master Created Date/Time Source ID | Keyword | Lowercase | The document search ID associated with the Master Created Date/Time Source. | 2FG2G55FGF |
Master Modified Date/Time | Date | | The Master Modified Date/Time is derived from the other date fields collected from the document. It is populated by the first date present in this prioritized list: 1. meta.eml_ClientSubmitTime 2. meta.eml_LastModificationTime 3. oldest eml date/time field from all available 4. meta.metadata_modified_datetime 5. parent file Master Modified Date/Time 6. meta.archive_lastmodified_datetime (date stored for file inside the archive) 7. meta.uploaded | 2019-12-13T19:53:10Z |
Master Modified Date/Time Source | Keyword | Lowercase | The name of the field used to populate the Master Modified Date/Time. | Email Modified Date/Time |
Master Modified Date/Time Source ID | Keyword | Lowercase | The document search ID associated with the Master Modified Date/Time Source Field. | 2FG2G55FGF |
Name | Text | | The file name (file_name) or, in the case of emails, the email subject (subject). | team_meeting_report |
Page Count | Long | | The number of pages contained within the document. | 10 |
Parent ID | Keyword | | The search ID of the file from which a file was extracted. The extracted file can be an attachment, an embedded file, or a file inside a container file. | 2FG2G55FGF |
PII Elements | Keyword | Lowercase | A list of PII element types detected in the file. | Name; Phone Number; SSN |
QA Change Reason | Keyword | | The change reason tagged by QA. | |
QA Created Date/Time | Date | | The date/time the QA batch was created. | 2019-12-13T19:53:10Z |
QA Reviewer | Keyword | | Names of reviewers who review documents in the QA batch type. | Jane Doe |
QA Reviewer Email ID | Keyword | | The email addresses of the reviewers who review documents in the QA batch type. | Gareth Keenan keenan.gareth@gmail.com |
QA Status | Text | | The QA status of the document. The status is one of “reviewed,” “QA pending,” or “not QA batched.” | “Reviewed”, “QA pending”, or “not QA batched” |
Reviewer | Keyword | | Names of reviewers who review documents in the Review batch type. | Jane Doe |
Reviewer Email ID | Keyword | | The email addresses of the reviewers who review documents in the Review batch type. | Gareth Keenan keenan.gareth@gmail.com |
SHA1 Hash | Keyword | | The SHA1 hash value of the file. NOTE: The Canopy application calculates and uses the SHA256 hash, which is our recommended standard for data integrity. For compatibility with some client tools and processes, Canopy also provides MD5 and SHA1 hashes. WARNING: MD5 and SHA1 are cryptographically broken and should not be used for security-sensitive purposes. They are highly susceptible to collision vulnerabilities, meaning an attacker can create two entirely different files that produce the exact same hash. Relying on these hashes can expose you to significant security risks, including data tampering and impersonation. | aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d |
SHA256 Hash | Keyword | | The SHA256 hash value of the file. | c604a6840d44c89df5ff8b5a5c5e943be565735f4bbeb3ddb692ff58bbf6993c |
Source | Keyword | | The original source container of the file, as uploaded from the UI. | Master Demo.zip |
Source Path | Keyword | Lowercase | The path of the file within the container. | Master Demo.zip/Master Demo/Demo Files/Long Thread/Threading.pst/Top of Outlook data file/Inbox/09A2AC2A5A11124AAFDF2BD331820B3757AFF8FCffxexmps04FFXCOFAIRFAXVAUS-2098500.eml |
Tags | Keyword | Lowercase | The list of user-created tags applied to the file. | Sensitive; Public; Private |
Text Length | Long | | The length of the document, in text characters. | 989 |
Total PII Count | Long | | The total number of PII elements detected in a document. | 10 |
Uploaded End Date/Time | Date | | The date/time when the document upload is completed in Canopy. | 2020-01-30 00:00:00.000 |
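
The prioritized lists in the Master Created Date/Time and Master Modified Date/Time rows describe a first-match fallback: the first populated date in the list wins, and its origin is what the corresponding Source field records. The sketch below illustrates that selection logic only and is not Canopy's implementation. The function `resolve_master_date` and the keys `earliest_eml` and `parent_master_created` are hypothetical stand-ins for the list entries that are not named metadata fields.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

# Priority order for Master Created Date/Time, copied from the table row.
CREATED_PRIORITY: List[str] = [
    "meta.eml_CreationTime",
    "earliest_eml",                  # hypothetical key: earliest eml date/time field
    "meta.metadata_created_datetime",
    "parent_master_created",         # hypothetical key: parent file's Master Created Date/Time
    "meta.archive_created_datetime",
    "meta.uploaded",
]


def resolve_master_date(
    candidates: Dict[str, Optional[datetime]], priority: List[str]
) -> Tuple[Optional[datetime], Optional[str]]:
    """Return (date, source_key) for the first populated entry in priority order."""
    for key in priority:
        value = candidates.get(key)
        if value is not None:
            return value, key
    return None, None


# Example document with no email dates: the metadata created date is used.
doc = {
    "meta.metadata_created_datetime": datetime(2019, 12, 13, 19, 53, 10, tzinfo=timezone.utc),
    "meta.uploaded": datetime(2020, 1, 30, tzinfo=timezone.utc),
}
master_created, source = resolve_master_date(doc, CREATED_PRIORITY)
print(master_created, source)
```

Passing a different priority list, such as the seven entries given for Master Modified Date/Time, reuses the same function unchanged.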
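
To compare an exported hash value against a local copy of a file, the three digests in the MD5 Hash, SHA1 Hash, and SHA256 Hash rows can be recomputed with Python's standard `hashlib` module. This is a minimal sketch, assuming the hashes are taken over the raw file bytes; `hash_stream` is a hypothetical helper name, not part of Canopy.

```python
import hashlib
import io
from typing import BinaryIO, Dict


def hash_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Compute MD5, SHA1, and SHA256 hex digests in a single pass over a binary stream."""
    digests = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    # Read in chunks so large files are not loaded into memory at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        for digest in digests.values():
            digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}


# In-memory example; for a real file, pass open(path, "rb") instead.
result = hash_stream(io.BytesIO(b"hello"))
print(result["sha256"])
```

For integrity checks, compare the `sha256` value; the `md5` and `sha1` values are useful only for matching against tools that still report them.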