Document Comparison

Business scenario description

Contract review is a key risk-control task for any company. Enterprises and institutions with strict risk-control requirements, such as trust, fund, and securities firms, review contracts especially rigorously, because an error that slips through review can cause losses that are difficult to estimate.

The general contract signing process is:

  1. Party A and Party B confirm the electronic version of the contract online.
  2. Party B prints the electronic contract, stamps it, and sends it to Party A.
  3. Party A receives the paper contract, compares it with the electronic version to confirm that the content and terms have not been changed, then stamps it and sends it back to Party B.
  4. Party B receives the paper contract, checks it against the electronic version again, and archives it.

Contract comparison is an important step in this process. Traditionally, legal personnel compare the contracts word by word, which is inefficient and spends most of their energy on non-professional work. Factors such as professional competence, physical stamina, and mental state also make it difficult to guarantee an error-free review.

The Document Comparison AI capability can be used in the legal department's contract review scenario to assist the manual review of contract terms, quickly determine whether a contract has been maliciously tampered with, and improve the efficiency of legal personnel.

Features

  • Multiple document formats: PDF, DOC/DOCX, and images (JPEG, JPG, PNG, BMP, TIFF).
  • Full-text comparison: supports comparing documents with different page counts.
  • Simple operation: changes are marked in the documents in different colours, so results can be located quickly.
  • Intelligent intervention: an intelligent intervention model merges differences and removes redundant information such as spaces based on semantic information. An option to ignore punctuation marks is also available.
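
As an illustration of the supported formats listed above, the following minimal sketch checks a file name before it is uploaded. The function name `is_supported` is hypothetical, and checking by file extension is an assumption: the platform may validate files differently, for example by inspecting their content.

```python
from pathlib import Path

# Extensions taken from the feature list above (assumption: the platform
# accepts exactly these spellings; ".tif" is not listed, so it is excluded).
SUPPORTED_EXTENSIONS = {
    ".pdf", ".doc", ".docx",
    ".jpeg", ".jpg", ".png", ".bmp", ".tiff",
}


def is_supported(filename: str) -> bool:
    """Return True if the file extension is in the documented format list."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ("contract.PDF", "scan.tiff", "notes.txt"):
        print(name, is_supported(name))
```

A client-side check like this only avoids obviously unsupported uploads; it does not replace whatever validation the platform performs.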

Concepts

A single document comparison is called a comparison task. Within a comparison task, the reference document is the document used as the baseline, and the comparison document is the document in which differences are to be found.

A difference (diff) is a change in the content of the comparison document relative to the content of the reference document. Differences fall into three categories:

  • Addition: content that is present in the comparison document but not in the reference document.
  • Deletion: content that is present in the reference document but missing from the comparison document.
  • Modification: content from the reference document that appears in changed form in the comparison document.
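
The three categories can be illustrated with a small word-level diff. The sketch below uses Python's standard `difflib` module; it is not the platform's comparison algorithm, and the function name `classify_differences` and the category labels are hypothetical names chosen for this example.

```python
from difflib import SequenceMatcher

# Map difflib opcode tags to the three categories described above.
CATEGORY = {"insert": "added", "delete": "deleted", "replace": "modified"}


def classify_differences(reference: str, comparison: str):
    """Return (category, reference_text, comparison_text) for each difference.

    Differences are computed word by word, relative to the reference text.
    """
    ref_words = reference.split()
    cmp_words = comparison.split()
    matcher = SequenceMatcher(a=ref_words, b=cmp_words, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged content is not a difference
        diffs.append(
            (CATEGORY[tag], " ".join(ref_words[i1:i2]), " ".join(cmp_words[j1:j2]))
        )
    return diffs


if __name__ == "__main__":
    reference = "Party A pays 100 USD within 30 days"
    comparison = "Party A pays 200 USD within 30 calendar days"
    for diff in classify_differences(reference, comparison):
        print(diff)
```

For an addition the reference text is empty, and for a deletion the comparison text is empty, which mirrors the way each category is defined relative to the reference document.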

Instructions

Build new model

1 Log in to the platform and go to the AI capability page via the path Document Understand/Document Comparison.

2 Click the “Create Model” button, enter the model name, select the OCR engine version, and submit.

Model setting

1 Click the settings button of the created model to open the model configuration page.

2 Configure the intelligent intervention strategies:

  • Choose whether to ignore punctuation: tick the option and click Save for the configuration to take effect.
  • Ignoring punctuation means that a difference is ignored when its content consists entirely of punctuation. If the difference contains any other readable characters, it is not treated as a punctuation difference.
  • The system applies intervention to the results only after each comparison has completed.
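
The punctuation rule above can be sketched as a simple predicate. This is an assumption about how such a rule might be implemented, not the platform's actual logic: the function names are hypothetical, and "punctuation" is taken here to mean characters in the Unicode punctuation categories, so symbols such as currency signs are not counted.

```python
import unicodedata


def is_punctuation_only(text: str) -> bool:
    """True when text is non-empty and every character is Unicode punctuation."""
    return bool(text) and all(
        unicodedata.category(ch).startswith("P") for ch in text
    )


def should_ignore(reference_text: str, comparison_text: str) -> bool:
    """Ignore a difference only if all of its changed content is punctuation.

    Either side may be empty (for an addition or a deletion).
    """
    return is_punctuation_only(reference_text + comparison_text)


if __name__ == "__main__":
    print(should_ignore(",", ";"))        # comma replaced by semicolon
    print(should_ignore("days.", "days"))  # contains readable characters
```

Because the check runs on the content of a finished difference, it fits the note that intervention is applied to the results after the comparison has completed.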

Comparison test

1 Select the created model and click the test button in the upper right corner.

2 Upload the reference document and the comparison document, then click to start the comparison.

Note: When the files being compared have many pages, the comparison can take a while. Progress is displayed on the page.

Visualized results display

The visualized comparison result shows the following information:

  • Overall comparison results
    • Difference in page count between the two documents
    • Total number of differences found in the comparison (ignored differences are not included)
    • Total number of ignored differences
  • Existing differences
    • All differences between the two documents are displayed, one entry per difference.
    • Colours indicate the type of difference: red for deletion, orange for modification, and green for addition.
    • Clicking a difference scrolls both documents to the location where it occurs and highlights the differing content.
  • Ignored differences
    • All ignored differences between the two documents are displayed, one entry per difference.
    • Clicking a difference scrolls both documents to the location where it occurs and highlights the differing content.
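
To make the overall figures concrete, the sketch below shows one way such a summary could be computed from a list of differences. The `Difference` class, the `summarize` function, and the field names are hypothetical and do not represent the platform's data model; only the counting rules and the colours come from the description above.

```python
from dataclasses import dataclass

# Highlight colours as described in the list above.
COLOURS = {"deleted": "red", "modified": "orange", "added": "green"}


@dataclass
class Difference:
    kind: str              # "added", "deleted" or "modified"
    ignored: bool = False  # True if intelligent intervention ignored it


def summarize(reference_pages: int, comparison_pages: int, differences: list) -> dict:
    """Build the overall figures: page difference, existing and ignored counts."""
    existing = [d for d in differences if not d.ignored]
    return {
        "page_difference": abs(comparison_pages - reference_pages),
        # Ignored differences are excluded from the total.
        "total_differences": len(existing),
        "ignored_differences": len(differences) - len(existing),
        "colours": [COLOURS[d.kind] for d in existing],
    }


if __name__ == "__main__":
    diffs = [
        Difference("added"),
        Difference("modified", ignored=True),
        Difference("deleted"),
    ]
    print(summarize(10, 12, diffs))
```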

Comparison task list

1 After logging in to the platform, go to the AI capability page via the path Document Understand/Document Comparison.

2 Open the created model to enter the task list page of that model.

3 All comparison tasks, whether created with New Task or through API calls, appear in this task list.

Submit a comparison task

1 Click New Task to open the new comparison task pop-up window.

2 Select the files to be compared.

3 Confirm the reference document and the comparison document.

The reference document can be changed with the Select as reference document button under the document type.

View the results of the comparison task

1 In the task list, click Details in the operation column to view the result of that comparison task.

2 The visualized result is the same as the one shown in the comparison test.

Download comparison results

1 In the task list, click More/Download the comparison result in the operation column to download the result of that comparison task.

2 Contents of the downloaded comparison result

The comparison result downloaded for each comparison task contains three files:

  • Overview of comparison results.txt
    • Explanation of the overall comparison result
    • Detailed comparison results (ignored differences not included)
  • Reference document
    • PDF file with highlights
  • Comparison document
    • PDF file with highlights

Batch operations

1 Click Batch operation on the task list page.

2 Select multiple comparison tasks.

3 Select the corresponding batch operation.

Note: Tasks to which the selected operation cannot be applied are skipped automatically. For example, a batch download skips any task that is still being compared.
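
The skipping behaviour in the note can be pictured as a filter over the selected tasks. The sketch below is only an illustration: the `split_batch` function, the task dictionaries, and the status names `"completed"` and `"comparing"` are hypothetical and are not taken from the platform.

```python
def split_batch(tasks, operable_statuses):
    """Partition selected tasks into those the operation applies to and those skipped."""
    applied = [t for t in tasks if t["status"] in operable_statuses]
    skipped = [t for t in tasks if t["status"] not in operable_statuses]
    return applied, skipped


if __name__ == "__main__":
    selected = [
        {"id": 1, "status": "completed"},
        {"id": 2, "status": "comparing"},  # still running, cannot be downloaded
        {"id": 3, "status": "completed"},
    ]
    # Assumption: only completed tasks can be downloaded.
    to_download, skipped = split_batch(selected, {"completed"})
    print([t["id"] for t in to_download], [t["id"] for t in skipped])
```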