SanteSuite Help Portal

Matching Engine

SanteMPI uses the SanteDB matching engine. This page may be moved in the future to a common page as the SanteDB matching engine supports more than just Patient resources.

The matching engine in SanteDB is a multi-stage process whereby an inbound record (the target record) is compared with the current dataset within the CDR. The matching process occurs in three stages:

  • Blocking : In the blocking phase, candidate records are queried from the CDR's database infrastructure. Blocking reduces the total number of records that must be scored, since scoring can be more computationally expensive.

  • Scoring : In the scoring stage, the target record is compared to the records retrieved in the blocking phase. Each configured attribute is transformed/normalized and then scored. There are also API hooks for implementing partial weighting/partial scoring of an attribute (based on frequency of data in the database, NLP, or other future methods).

  • Classification : In the classification stage, each record's score (the sum of its attribute scores) is used to group the record as Match, NonMatch, or ProbableMatch. The thresholds for these groups are configurable.

SanteDB allows multiple match configurations to be defined, and allows one of them to be designated as the "default" match configuration used for regular operation (whenever data is inserted, updated, etc.).

Blocking

In the blocking phase of the matching execution, the candidate record (named $input) is compared with records persisted in the database using one or more blocking configurations.

A blocking configuration is expressed using the HDSI Query Syntax which is translated into SQL. Blocking instructions can use any supported Filter Functions for the selected database.

Records are read from the database (as blocks) with multiple blocks being combined with either an intersect or union function. Blocks can be loaded as either:

  • SOURCE Records: The source records are loaded from the database as they were sent. Blocking in this mode is less CPU intensive and less database intensive; however, it relies on source information as a "picture" of what data is available for a patient.

  • MASTER Records: The blocks are loaded using the MDM layer and are computed based on existing known and suspected links. This method of blocking more closely resembles what users see in the UI when MDM is enabled; however, it slows down matching performance because each record must be cross-referenced with the master data record. It also allows for matching based on records of truth.

Blocks from each statement are combined to form a result set (in C#, an IEnumerable<T>) which is passed into the scoring stage.

In the example below, the SanteDB matching engine will load records for an $input record by:

  • If the input record contains an SSN identifier, it will filter records in the database by matching SSN. It will then perform an MDM gather (i.e. the matching mode is performed on MASTER records). These records will be combined (UNION) with:

  • The results of a local query whereby:

    • If the $input has Given name, then the given name must match, AND

    • If the $input has a Family name, then the family name must match, AND

    • If the $input has a dateOfBirth then the date of birth must match, AND

    • If the $input has a gender concept then the gender must match

In pseudocode terms, the blocking query for an $input of John Smith, Male, SSN 203-204-3049 born 1980-03-02 would resemble:

SELECT * 
FROM 
    masters AS master
    LEFT JOIN locals AS local
WHERE
    local.identifier[SSN].value = '203-204-3049'
UNION ALL 
SELECT * 
FROM 
    locals AS local
WHERE 
    local.name.component[Given].value = 'John'
    AND local.name.component[Family].value = 'Smith'
    AND local.dateOfBirth = '1980-03-02' 
    AND local.gender = 'Male'

The actual SQL generated by the SanteDB iCDR is much more complex; this example only illustrates the concept.
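The conditional filters and the union of blocks described above can be sketched in Python. This is only an illustration: the in-memory `DATABASE` list, the record dictionaries, and the `block_by_ssn`, `block_by_demographics` and `union_blocks` helpers are hypothetical names invented for this sketch, not SanteDB APIs. The real engine expresses blocks in HDSI query syntax and executes them as SQL. De-duplicating by record id and returning nothing when no demographic attribute is present are also assumptions of the sketch.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[str]]

# Hypothetical in-memory stand-in for records persisted in the CDR.
DATABASE: List[Record] = [
    {"id": "a", "ssn": "203-204-3049", "given": "Jon", "family": "Smith",
     "dateOfBirth": "1980-03-02", "gender": "Male"},
    {"id": "b", "ssn": None, "given": "John", "family": "Smith",
     "dateOfBirth": "1980-03-02", "gender": "Male"},
    {"id": "c", "ssn": None, "given": "John", "family": "Smyth",
     "dateOfBirth": "1980-03-02", "gender": "Male"},
]


def block_by_ssn(candidate: Record) -> List[Record]:
    """First block: only applies when $input carries an SSN."""
    if not candidate.get("ssn"):
        return []
    return [r for r in DATABASE if r.get("ssn") == candidate["ssn"]]


def block_by_demographics(candidate: Record) -> List[Record]:
    """Second block: every attribute present on $input must match (AND)."""
    present = [k for k in ("given", "family", "dateOfBirth", "gender")
               if candidate.get(k)]
    if not present:
        return []  # assumption: an empty filter should not load every record
    return [r for r in DATABASE
            if all(r.get(k) == candidate[k] for k in present)]


def union_blocks(candidate: Record,
                 blocks: List[Callable[[Record], List[Record]]]) -> List[Record]:
    """Combine blocks with a union, keeping each record id once."""
    combined: Dict[Optional[str], Record] = {}
    for block in blocks:
        for record in block(candidate):
            combined.setdefault(record["id"], record)
    return list(combined.values())


john = {"ssn": "203-204-3049", "given": "John", "family": "Smith",
        "dateOfBirth": "1980-03-02", "gender": "Male"}
candidates = union_blocks(john, [block_by_ssn, block_by_demographics])
print([r["id"] for r in candidates])  # ['a', 'b']
```

Record `a` is found only through the SSN block (its given name differs), record `b` only through the demographic block, and record `c` is never loaded, so it is never scored.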

Scoring

During the scoring phase, the records from the blocking stage are compared to the $input record on an attribute-by-attribute basis using a collection of assertions about each attribute. If all assertions on an attribute evaluate to TRUE, then the matchWeight score is added to that record's total score. If the assertions are FALSE, then the nonMatchWeight score (a negative score) is added to the record's total score.

Overall, the process of comparing a blocked record (named $block) with the $input record is:

  1. The scoring attribute may declare that it depends on another attribute being scored (e.g. don't evaluate the city attribute unless the state attribute has passed). If the attribute it depends on was not scored as a positive (match), then the current attribute is assigned the whenNull score.

  2. The attribute path on the configuration is read from both $block and $input.

  3. The matching engine determines if either the $block or $input attribute value extracted is null. If either is null, then the configured When Null action is taken (see: When Null Actions).

  4. The matching engine then determines if any transforms have been configured (see: Transforming Data). This is a process whereby data is extracted, tokenized, shifted, padded, etc. on both the $input and $block variables. The result of each transform is stored as the new attribute value in memory and the next transform is applied against the output of the previous.

  5. Finally, the actual assertion is applied. The assertion is usually a binary logical operator (less than, equal, etc.), and its result determines whether the matchWeight or the nonMatchWeight is applied.
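The five steps above can be sketched as a small Python routine. This is a minimal sketch under stated assumptions, not SanteDB's implementation: `AttributeRule`, `score_attribute` and `score_record` are hypothetical names, records are plain dictionaries, and null handling is reduced to a single fixed `when_null_score`.

```python
import operator
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class AttributeRule:
    """Hypothetical stand-in for one configured scoring attribute."""
    path: str
    match_weight: float
    non_match_weight: float                      # expected to be negative
    assertion: Callable[[Any, Any], bool]
    transforms: List[Callable[[Any], Any]] = field(default_factory=list)
    when_null_score: float = 0.0
    depends_on: Optional[str] = None


def score_attribute(rule: AttributeRule, block: Dict[str, Any],
                    input_: Dict[str, Any], passed: Dict[str, bool]) -> float:
    # Step 1: skip the attribute when the one it depends on did not pass.
    if rule.depends_on is not None and not passed.get(rule.depends_on, False):
        return rule.when_null_score
    # Step 2: read the attribute path from both records.
    a, b = input_.get(rule.path), block.get(rule.path)
    # Step 3: null on either side triggers the when-null behaviour.
    if a is None or b is None:
        return rule.when_null_score
    # Step 4: apply each transform to the output of the previous one.
    for transform in rule.transforms:
        a, b = transform(a), transform(b)
    # Step 5: the assertion selects the match or non-match weight.
    return rule.match_weight if rule.assertion(a, b) else rule.non_match_weight


def score_record(rules: List[AttributeRule], block: Dict[str, Any],
                 input_: Dict[str, Any]) -> float:
    passed: Dict[str, bool] = {}
    total = 0.0
    for rule in rules:
        score = score_attribute(rule, block, input_, passed)
        passed[rule.path] = score > 0   # "scored as a positive (match)"
        total += score
    return total


rules = [
    AttributeRule("state", 2.0, -1.0, operator.eq),
    AttributeRule("city", 3.0, -2.0, operator.eq,
                  transforms=[str.lower], depends_on="state"),
]
block = {"state": "ON", "city": "Hamilton"}
print(score_record(rules, block, {"state": "ON", "city": "hamilton"}))  # 5.0
print(score_record(rules, block, {"state": "QC", "city": "hamilton"}))  # -1.0
```

In the second call the state attribute fails, so the city attribute is not evaluated and contributes only its when-null score of 0.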

When Null Actions

There can be instances where either the inbound record or the source records from the database are missing the specified attribute. When this is the case, the attribute's whenNull setting controls how that attribute is evaluated. The behaviors are:

| @whenNull | Description |
| --- | --- |
| match | When the value is null in $block or $input, treat the attribute as a "match" (i.e. assume the missing data matches). |
| nonmatch | When the value is null in either $block or $input, treat the attribute as a non-match (i.e. assume the missing data would not match). |
| ignore | When the value is null in $block or $input, don't evaluate the attribute (i.e. neither the non-match nor the match score is considered). This is similar to zero; however, the entire attribute is removed from consideration: nothing is added to the absolute score, and the total possible score is reduced. |
| disqualify | When the value is null in $block or $input, disqualify the entire record from consideration (i.e. it doesn't matter what the other attribute scores are, the record is considered not a match). |
| zero | When the value is null in $block or $input, the attribute is scored as 0. This is different from applying the match or non-match weight, which will be a positive or negative number respectively. This is similar to the ignore setting, except that zero does not impact the denominator of the score. |
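The difference between the five settings, in particular between ignore and zero, is easiest to see with numbers. The sketch below is an assumption-laden illustration rather than SanteDB code: it assumes that the "total possible score" (the denominator) is the sum of the matchWeight values of the attributes under consideration, and `WhenNull`, `AttributeResult`, `score_null_attribute` and `normalized_score` are hypothetical names.

```python
from enum import Enum
from typing import List, NamedTuple, Optional


class WhenNull(Enum):
    MATCH = "match"
    NONMATCH = "nonmatch"
    IGNORE = "ignore"
    DISQUALIFY = "disqualify"
    ZERO = "zero"


class AttributeResult(NamedTuple):
    score: float                 # contribution to the absolute score
    possible: float              # contribution to the total possible score
    disqualified: bool = False


def score_null_attribute(action: WhenNull, match_weight: float,
                         non_match_weight: float) -> AttributeResult:
    """Result for an attribute that is null in $block or $input."""
    if action is WhenNull.MATCH:
        return AttributeResult(match_weight, match_weight)
    if action is WhenNull.NONMATCH:
        return AttributeResult(non_match_weight, match_weight)
    if action is WhenNull.IGNORE:
        return AttributeResult(0.0, 0.0)           # removed from the denominator
    if action is WhenNull.ZERO:
        return AttributeResult(0.0, match_weight)  # stays in the denominator
    return AttributeResult(0.0, match_weight, disqualified=True)


def normalized_score(results: List[AttributeResult]) -> Optional[float]:
    """Absolute score divided by total possible score; None if disqualified."""
    if any(r.disqualified for r in results):
        return None
    possible = sum(r.possible for r in results)
    return sum(r.score for r in results) / possible if possible else 0.0


# One attribute matched (weight 4); a second attribute (4 / -2) is null.
matched = AttributeResult(4.0, 4.0)
for action in WhenNull:
    missing = score_null_attribute(action, 4.0, -2.0)
    print(action.value, normalized_score([matched, missing]))
```

With these hypothetical weights, ignore yields 4/4 = 1.0 while zero yields 4/8 = 0.5, which is the "denominator" distinction the table describes.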

Transforming Data

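Transforms are applied to the attribute value from both records before the assertion is evaluated. For example, a date_extract transform can be applied to both records, after which an "eq" assertion is applied to the extracted values. The following data transforms are available in SanteDB.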

| Transform | Applies | Action |
| --- | --- | --- |
| addresspart_extract | Entity Address | Extracts a portion of an address for consideration. |
| date_difference | Date | Extracts the difference (in weeks, months, years, etc.) between the A record and B record. |
| date_extract | Date | Extracts a portion of the date from both records. |
| name_alias | Entity Name | Considers any of the configured aliases for the name meeting a particular threshold of relevance (i.e. Will = Bill is stronger than Tess = Theresa). |
| abs | Number | Returns the absolute value of a number. |
| dmetaphone | Text | Returns the double metaphone code (with configured code length) from the input string. |
| length | Text | Returns the length of a text string. |
| levenshtein | Text | Returns the Levenshtein string difference between the A and B values. |
| sorensen_dice | Text | Returns the Sorensen-Dice coefficient of the A and B text values. |
| jaro_winkler | Text | Returns the Jaro-Winkler similarity (with default threshold of 0.7) between the A and B values. |
| metaphone | Text | Returns the metaphone phonetic code for the attribute. |
| similarity | Text | Returns a percentage (based on Levenshtein difference) of how similar string A is to string B (i.e. 80% similar or 90% similar). |
| soundex | Text | Extracts the soundex codification of the input values. |
| substr | Text | Extracts a portion of the input values. |
| tokenize | Text | Tokenizes the input string based on splitting characters; the tokenized values can then be transformed independently using any of the transforms listed in this table. |
| timespan_extract | TimeSpan | Extracts a portion of a timespan such as minutes, hours, seconds. |
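To make a few of the text transforms concrete, the sketch below implements the textbook versions of the Levenshtein distance, a Levenshtein-based similarity ratio, and the Sorensen-Dice coefficient over character bigrams. These are the standard algorithms written from their usual definitions; whether SanteDB normalizes or tokenizes in exactly this way is an assumption, and the function names simply mirror the transform names in the table.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Levenshtein distance rescaled to 0.0 (different) .. 1.0 (identical)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def sorensen_dice(a: str, b: str) -> float:
    """Dice coefficient over character bigrams (multiset overlap)."""
    bigrams_a = Counter(a[i:i + 2] for i in range(len(a) - 1))
    bigrams_b = Counter(b[i:i + 2] for i in range(len(b) - 1))
    total = sum(bigrams_a.values()) + sum(bigrams_b.values())
    if total == 0:
        return 1.0 if a == b else 0.0
    overlap = sum((bigrams_a & bigrams_b).values())
    return 2.0 * overlap / total


print(levenshtein("kitten", "sitting"))     # 3
print(round(similarity("kitten", "sitting"), 3))  # 0.571
print(sorensen_dice("night", "nacht"))      # 0.25
```

A scoring assertion would then compare the transform output with a configured value, for example asserting that the similarity is at least 0.8.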

Classification

The scores for each of the scored attributes are then summed for each $block record and the block is classified as:

  • Match: The two records are determined to agree with one another according to configuration. The matching engine is instructing whatever called it (the MDM, MPI, etc.) that the two should be linked/merged/combined/etc.

  • Possible: The two records are not "non-matches"; however, there is insufficient confidence that the two are definite matches. Whatever called the matching operation should flag the two for manual reconciliation.

  • Non-Match: The two records are definite non-matches.
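Classification reduces to comparing the summed score against two configured thresholds. The sketch below is a minimal illustration: the parameter names `match_threshold` and `non_match_threshold` are hypothetical, and treating a score exactly on a boundary as the stronger classification is an assumption of this sketch.

```python
def classify(score: float, match_threshold: float,
             non_match_threshold: float) -> str:
    """Map a record's total score onto the three classifications."""
    if non_match_threshold > match_threshold:
        raise ValueError("non-match threshold must not exceed match threshold")
    if score >= match_threshold:
        return "Match"
    if score >= non_match_threshold:
        return "Possible"
    return "Non-Match"


for total in (5.0, 2.0, 0.5):
    print(total, classify(total, match_threshold=4.0, non_match_threshold=1.0))
```

With these example thresholds, a total of 5.0 is a Match, 2.0 is flagged as Possible for manual reconciliation, and 0.5 is a Non-Match.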

Bulk / Batch Matching

Remove / Delete:

Keep:

  • Suspected Client Links (MDM-Candidate links)

  • Automatic Master Links

  • Verified Ignore

  • Verified Matches

  • System Master Links

After the suspected truth is cleared, the job will begin the process of re-matching the registered dataset for SanteDB. The matching process is multi-threaded and is designed to use the machine on which the match process runs as efficiently as possible. To do this, the following pattern is used:

The batch matching process registers 4 primary threads on the actively configured thread pool to control the match process:

  • Gather Thread: This worker is responsible for querying data from the source dataset in 1,000 record batches. The rate at which the records are loaded will depend on the speed of the persistence layer (SanteDB 2.1.x or 2.3.x) as well as the disk performance of the storage technology.

  • Match Thread: This worker is responsible for breaking the 1,000 record batches into smaller partitions of records (depending on CPU of host machine). The configured matching algorithm is then launched for each record in the batch on independent threads (i.e. matching is done concurrently on the host machine).

  • Writer Thread: Once the match thread workers have completed their matching task, the results are queued for the writer thread. The writer thread is responsible for committing the matches to the read/write database.

  • Monitor Thread: The monitoring thread is responsible for updating the status of the job.
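The four-thread pattern above is a staged producer/consumer pipeline. The Python sketch below shows only the shape of that pattern and is not the SanteDB job: the function names, the queues, the in-memory `store` list standing in for the read/write database, and the `match_one` callback are all hypothetical, and Python threads behave differently from the .NET thread pool the job actually uses.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Sequence

BATCH_SIZE = 1000   # batch size stated in the text
_DONE = object()    # sentinel marking the end of a queue


def gather(source: Sequence[Any], batches: queue.Queue) -> None:
    """Gather thread: read the source dataset in fixed-size batches."""
    for start in range(0, len(source), BATCH_SIZE):
        batches.put(source[start:start + BATCH_SIZE])
    batches.put(_DONE)


def match(batches: queue.Queue, results: queue.Queue,
          match_one: Callable[[Any], Any], workers: int) -> None:
    """Match thread: fan each batch out to concurrent workers."""
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            while (batch := batches.get()) is not _DONE:
                results.put(list(pool.map(match_one, batch)))
    finally:
        results.put(_DONE)   # always release the writer, even on error


def write(results: queue.Queue, store: List[Any],
          progress: Dict[str, int]) -> None:
    """Writer thread: commit completed match results."""
    while (chunk := results.get()) is not _DONE:
        store.extend(chunk)
        progress["written"] = len(store)


def monitor(progress: Dict[str, int], finished: threading.Event,
            report: Callable[[int], None]) -> None:
    """Monitor thread: periodically publish the job status."""
    while not finished.wait(timeout=0.05):
        report(progress["written"])
    report(progress["written"])


def run_batch_match(source: Sequence[Any], match_one: Callable[[Any], Any],
                    workers: int = 4,
                    report: Callable[[int], None] = print) -> List[Any]:
    batches, results = queue.Queue(), queue.Queue()  # unbounded for simplicity
    store: List[Any] = []
    progress = {"written": 0}
    finished = threading.Event()
    stages = [
        threading.Thread(target=gather, args=(source, batches)),
        threading.Thread(target=match, args=(batches, results, match_one, workers)),
        threading.Thread(target=write, args=(results, store, progress)),
    ]
    watcher = threading.Thread(target=monitor, args=(progress, finished, report))
    for thread in stages + [watcher]:
        thread.start()
    for thread in stages:
        thread.join()
    finished.set()
    watcher.join()
    return store
```

Calling `run_batch_match(records, my_matcher)` with any list of records and a per-record matching function returns the match results in source order, while the monitor callback receives the running count of written results.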

The performance of the batch matching will depend on the speed of the host machine as well as the version of SanteDB that is being used.

SanteSuite's community server was used for testing in the following configuration:

  • Application Server:

    • 4 VCPU

    • 4 GB RAM

    • Non-persistent (ram-only) REDIS Cache

  • Database Server:

    • 12 VCPU

    • 12 GB RAM

    • RAID 1+0 SSD disk infrastructure (4+4 SSD)

The versions of SanteDB tested yielded the following results:

  • Version < 2.1.160 of SanteDB: ~28,000 records per hour

  • Version > 2.1.165 of SanteDB: ~50,000 records per hour

  • Version 2.3.x of SanteDB (internal alpha): ~100,000 records per hour
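These rates translate directly into a rough duration estimate for a full re-match. The sketch below applies them to a hypothetical dataset of 1,000,000 records; the dataset size and the `estimate_hours` helper are illustrative assumptions, and the rates only hold for hardware comparable to the test configuration listed above.

```python
def estimate_hours(record_count: int, records_per_hour: float) -> float:
    """Rough batch-matching duration at a constant throughput."""
    if records_per_hour <= 0:
        raise ValueError("records_per_hour must be positive")
    return record_count / records_per_hour


# Approximate throughput figures quoted in the list above.
RATES = {
    "< 2.1.160": 28_000,
    "> 2.1.165": 50_000,
    "2.3.x (internal alpha)": 100_000,
}

for version, rate in RATES.items():
    print(f"{version}: {estimate_hours(1_000_000, rate):.1f} h")
# < 2.1.160: 35.7 h
# > 2.1.165: 20.0 h
# 2.3.x (internal alpha): 10.0 h
```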

It is important to ensure that your host system is configured such that the thread pool (accessed through the Probes administrative panel) has, at minimum, 5 available worker threads to complete batch matching.


Last updated 3 years ago


Each data type which is registered in the Master Data Management pattern has a corresponding Match Job registered. This job resets the suspected ground truth using the Remove / Delete and Keep rules listed under Bulk / Batch Matching.