From 7f0f85f8573dbf66311bf9b12b8a2a5b07cda318 Mon Sep 17 00:00:00 2001
From: Katy Sadowski You may inspect the failing rows using the following SQL: You may build upon this query by joining the relevant
- You may build upon this query by joining the
+ Also note that when updating the OMOP vocabularies, previously
+standard concepts could have become non-standard and need
+remapping. Often this remapping can be done programmatically, by
+following the ‘Maps to’ relationship to the new standard concept. This check failure means that the failing rows will not be picked up
-in a standard OHDSI analysis. It is highly recommended to work with your
-ETL team or data provider, if possible, to resolve this issue. However, you may work around it at your own risk by determining
whether or not the affected rows are relevant for your analysis. Here’s
an example query you could run to inspect failing rows in the
diff --git a/docs/articles/checks/measureConditionEraCompleteness.html b/docs/articles/checks/measureConditionEraCompleteness.html
index e8ce2a9f..2689e33c 100644
--- a/docs/articles/checks/measureConditionEraCompleteness.html
+++ b/docs/articles/checks/measureConditionEraCompleteness.html
@@ -180,7 +180,7 @@
Level: TABLE Level: TABLE Various ETL issues may result in persons missing records in a given
+event table: Note that in some cases, the failure threshold for this check may
-need to be adjusted according to completeness expectations for a given
-data source. Failures of this check on fields required in the CDM specification
-are redundant with failures of Failures of this check on required fields are redundant with failures
+of ETL developers have 2 main options for the use of this check on
non-required fields: Unexpectedly missing values should be investigated for a potential
-root cause in the ETL. For expected missingness, rows that violate this
-check in non-required fields are acceptable but should be clearly
-communicated to data users so that they can know when and when not to
-expect data to be present in each field. To avoid confusion for users,
-however, thresholds should be modified to avoid check failures at
-expected levels. This check informs you of the level of missing data in each column of
-the CDM. If data is missing in a required column, see the isRequired
-documentation for more information. The interpretation of a check failure on a non-required column will
depend on the context. In some cases, the threshold for this check will
have been very deliberately set, and any failure should be cause for
diff --git a/docs/articles/checks/plausibleAfterBirth.html b/docs/articles/checks/plausibleAfterBirth.html
index 610bc1e4..cacfbb0a 100644
--- a/docs/articles/checks/plausibleAfterBirth.html
+++ b/docs/articles/checks/plausibleAfterBirth.html
@@ -181,7 +181,7 @@ Recall that the Recall that the A failure of this check usually indicates a failure to map a source
value to an OMOP concept. In some cases, such a failure can and should
be remediated in the concept-mapping step of the ETL. In other cases, it
@@ -259,25 +253,24 @@ The query results will give you a summary of the source codes which
failed to map to an OMOP concept. Inspecting this data should give you
an initial idea of what might be going on. If source values return legitimate matches on concept_code, it’s
possible that there is an error in the concept mapping step of your ETL.
-Please note that while the Add a New Data Quality Check
Don Torok
- 2024-06-29
+ 2024-07-11
Source: vignettes/AddNewCheck.rmd
AddNewCheck.rmd
Check Status Definitions
Dmitry Ilyn,
Maxim Moinat
- 2024-06-29
+ 2024-07-11
Source: vignettes/CheckStatusDefinitions.rmd
CheckStatusDefinitions.rmd
Data Quality Check Type Definitions
Clair
Blacketer
- 2024-06-29
+ 2024-07-11
Source: vignettes/CheckTypeDescriptions.rmd
CheckTypeDescriptions.rmd
Getting Started
Clair
Blacketer
- 2024-06-29
+ 2024-07-11
Source: vignettes/DataQualityDashboard.rmd
DataQualityDashboard.rmd
Running the DQD on a Cohort
Clair
Blacketer
- 2024-06-29
+ 2024-07-11
Source: vignettes/DqdForCohorts.rmd
DqdForCohorts.rmd
Running the DQD in SqlOnly mode
Maxim
Moinat
- 2024-06-29
+ 2024-07-11
Source: vignettes/SqlOnly.rmd
SqlOnly.rmd
Failure Thresholds and How to Change Them
Clair
Blacketer
- 2024-06-29
+ 2024-07-11
Source: vignettes/Thresholds.rmd
Thresholds.rmd
Index
Katy
Sadowski
- 2024-06-29
+ 2024-07-11
Source: vignettes/checkIndex.Rmd
checkIndex.Rmd
cdmDatatype
Katy
Sadowski
- 2024-06-29
+ 2024-07-11
Source: vignettes/checks/cdmDatatype.Rmd
cdmDatatype.Rmd
cdmField
Heidi Schmidt,
Katy Sadowski
- 2024-06-29
+ 2024-07-11
Source: vignettes/checks/cdmField.Rmd
cdmField.Rmd
cdmTable
John Gresh,
Katy Sadowski
- 2024-06-29
+ 2024-07-11
Source: vignettes/checks/cdmTable.Rmd
cdmTable.Rmd
fkClass
Clair
Blacketer, Katy Sadowski
- 2024-06-29
+ 2024-07-11
Source: vignettes/checks/fkClass.Rmd
fkClass.Rmd
fkDomain
Clair
Blacketer, Katy Sadowski
- 2024-06-29
+ 2024-07-11
Source: vignettes/checks/fkDomain.Rmd
fkDomain.Rmd
isForeignKey
Dmytry
Dymshyts, Katy Sadowski
- 2024-06-29
+ 2024-07-11
Source: vignettes/checks/isForeignKey.Rmd
isForeignKey.Rmd
User GuidanceViolated rows query
--- @cdmTableName.@cdmFieldName is the x_concept_id or x_source_concept_id field in a CDM table
--- Inspect the contents of the x_source_value field to investigate the source of the error
+-- @cdmTableName.@cdmFieldName is the _concept_id or _source_concept_id field in a CDM table
+-- Inspect the contents of the _source_value field to investigate the source of the error
SELECT
'@cdmTableName.@cdmFieldName' AS violating_field,
diff --git a/docs/articles/checks/isPrimaryKey.html b/docs/articles/checks/isPrimaryKey.html
index 8fdef03a..cc5d3275 100644
--- a/docs/articles/checks/isPrimaryKey.html
+++ b/docs/articles/checks/isPrimaryKey.html
@@ -181,7 +181,7 @@
isPrimaryKey
John Gresh,
Katy Sadowski
- 2024-06-29
+ 2024-07-11
Source: vignettes/checks/isPrimaryKey.Rmd
isPrimaryKey.Rmd
isRequired
Katy
Sadowski
- 2024-06-29
+ 2024-07-11
Source: vignettes/checks/isRequired.Rmd
isRequired.Rmd
ETL DevelopersFill in the missing values:
isStandardValidConcept
Stephanie Hong,
Katy Sadowski
- 2024-06-29
+ 2024-07-11
Source: vignettes/checks/isStandardValidConcept.Rmd
isStandardValidConcept.Rmd
Definitionhttps://ohdsi.github.io/CommonDataModel/dataModelConventions.html#Mapping.
X_concept_id
column, and all X_concept_id
-columns in those tables._concept_id
) columns in all event tables.
ETL DevelopersX_concept_id
-column with 0. See the Book of OHDSI for additional guidance on the
-concept mapping process: https://ohdsi.github.io/TheBookOfOhdsi/ExtractTransformLoad.html#step-2-create-the-code-mappings
+for a source code, you MUST populate its
_concept_id
column
+with 0. See the Book of OHDSI for additional guidance on the concept
+mapping process: https://ohdsi.github.io/TheBookOfOhdsi/ExtractTransformLoad.html#step-2-create-the-code-mappings
SELECT
'@cdmTableName.@cdmFieldName' AS violating_field,
- cdmTable.*
-FROM @schema.@cdmTableName cdmTable
- JOIN @vocabDatabaseSchema.concept co ON cdmTable.@cdmFieldName = co.concept_id
-WHERE co.concept_id != 0
- AND (co.standard_concept != 'S' OR co.invalid_reason IS NOT NULL)
X_concept_id
and X_source_concept_id
columns
-to the concept table and inspecting their names and vocabularies. If the
-X_source_concept_id
correctly represents the source code in
-X_source_value
, the fix will be a matter of ensuring your
-ETL is correctly using the concept_relationship table to map the source
-concept ID to a standard concept via the ‘Maps to’ relationship. If you
-are not populating the X_source_concept_id
column and/or
-are using an intermediate concept mapping table, you may need to inspect
-the mappings in your mapper table to ensure they’ve been generated
-correctly using the ‘Maps to’ relationship for your CDM’s vocabulary
-version._source_concept_id
column to the concept table and
+inspecting the source concepts from which the failing non-standard
+concepts were mapped. If the _source_concept_id
correctly
+represents the source code in _source_value
, the fix will
+be a matter of ensuring your ETL is correctly using the
+concept_relationship table to map the source concept ID to a standard
+concept via the ‘Maps to’ relationship. If you are not populating the
+_source_concept_id
column and/or are using an intermediate
+concept mapping table, you may need to inspect the mappings in your
+mapper table to ensure they’ve been generated correctly using the ‘Maps
+to’ relationship for your CDM’s vocabulary version.Data Users
measureConditionEraCompleteness
- 2024-06-29
+ 2024-07-11
Source: vignettes/checks/measureConditionEraCompleteness.Rmd
measureConditionEraCompleteness.Rmd
measurePersonCompleteness
Katy
Sadowski
- 2024-06-29
+ 2024-07-11
Source: vignettes/checks/measurePersonCompleteness.Rmd
measurePersonCompleteness.Rmd
2024-06-29
Summary
-
Context: Validation
Category: Completeness
Subcategory:
Severity: CDM convention ⚠ Characterization ✔
Context: Validation
Category: Completeness
Subcategory:
Severity: CDM convention ⚠ (for observation period),
+Characterization ✔ (for all other tables)Description
@@ -220,8 +221,12 @@
DefinitionAll other tablesAction on persons missing records in other clinical event tables will
depend on the characteristics of the source database. In certain cases,
missingness is expected – some persons may just not have a given type of
-data available in the source. In others, various ETL issues may result
-in persons missing records in a given event table:
+data available in the source. For instance, in most data sources, one
+would expect most patients to have at least one visit, diagnosis, and
+drug, while one would not expect every single patient to have
+had a medical device.
+
@@ -286,9 +295,6 @@ All other tablesperson_source_value to trace back to these
persons’ source data to identify source data records potentially missed
by the ETL.
-
measureValueCompleteness
Katy
Sadowski
- 2024-06-29
+ 2024-07-11
Source: vignettes/checks/measureValueCompleteness.Rmd
measureValueCompleteness.Rmd
Definition
@@ -248,13 +252,13 @@
Violated rows query
ETL Developers
-isRequired
. See isRequired documentation for more
-information.isRequired
. See isRequired
+documentation for more information.
-
ETL Developers
Data Users
isRequired
documentation for more information.
plausibleAfterBirth
Maxim Moinat,
Katy Sadowski
- 2024-06-29
+ 2024-07-11
Source: vignettes/checks/plausibleAfterBirth.Rmd
plausibleAfterBirth.Rmd
plausibleBeforeDeath
Maxim
Moinat
- 2024-06-29
+ 2024-07-11
Source: vignettes/checks/plausibleBeforeDeath.Rmd
plausibleBeforeDeath.Rmd
plausibleGender
- 2024-06-29
+ 2024-07-11
Source: vignettes/checks/plausibleGenderUseDescendants.Rmd
plausibleGenderUseDescendants.Rmd
plausibleStartBeforeEnd
Maxim
Moinat
- 2024-06-29
+ 2024-07-11
Source: vignettes/checks/plausibleStartBeforeEnd.Rmd
plausibleStartBeforeEnd.Rmd
plausibleTemporalAfter
- 2024-06-29
+ 2024-07-11
Source: vignettes/checks/plausibleTemporalAfter.Rmd
plausibleTemporalAfter.Rmd
plausibleUnitConceptIds
- 2024-06-29
+ 2024-07-11
Source: vignettes/checks/plausibleUnitConceptIds.Rmd
plausibleUnitConceptIds.Rmd
plausibleValueHigh
Dymytry
Dymshyts
- 2024-06-29
+ 2024-07-11
Source: vignettes/checks/plausibleValueHigh.Rmd
plausibleValueHigh.Rmd
Definition
diff --git a/docs/articles/checks/plausibleValueLow.html b/docs/articles/checks/plausibleValueLow.html
index 9cdbe4b0..40d10b8a 100644
--- a/docs/articles/checks/plausibleValueLow.html
+++ b/docs/articles/checks/plausibleValueLow.html
@@ -181,7 +181,7 @@
plausibleValueLow
Dymytry
Dymshyts
- 2024-06-29
+ 2024-07-11
Source: vignettes/checks/plausibleValueLow.Rmd
plausibleValueLow.Rmd
Definition
diff --git a/docs/articles/checks/sourceConceptRecordCompleteness.html b/docs/articles/checks/sourceConceptRecordCompleteness.html
index b5f59d0b..2d4effb4 100644
--- a/docs/articles/checks/sourceConceptRecordCompleteness.html
+++ b/docs/articles/checks/sourceConceptRecordCompleteness.html
@@ -181,7 +181,7 @@
sourceConceptRecordCompleteness
Katy
Sadowski
- 2024-06-29
+ 2024-07-11
Source: vignettes/checks/sourceConceptRecordCompleteness.Rmd
sourceConceptRecordCompleteness.Rmd
DefinitionSource
concept mapping
X_source_concept_id
) columns in all event tables._source_concept_id
) columns in all event tables.
-
User Guidance
ETL Developers
-X_source_concept_id
columns should
+_source_concept_id
columns should
contain the OMOP concept representing the exact code used in the source
-data for a given record: “If the ETL DevelopersTo investigate the failure, run the following query:
SELECT
concept.concept_name AS standard_concept_name,
- cdmTable.X_concept_id, -- standard concept ID field for the table
+ cdmTable._concept_id, -- standard concept ID field for the table
c2.concept_name AS source_value_concept_name,
- cdmTable.X_source_value, -- source value field for the table
+ cdmTable._source_value, -- source value field for the table
COUNT(*)
FROM @cdmDatabaseSchema.@cdmTableName cdmTable
-LEFT JOIN @vocabDatabaseSchema.concept ON concept.concept_id = cdmTable.X_concept_id
+LEFT JOIN @vocabDatabaseSchema.concept ON concept.concept_id = cdmTable._concept_id
-- WARNING this join may cause fanning if a source value exists in multiple vocabularies
-LEFT JOIN @vocabDatabaseSchema.concept c2 ON concept.concept_code = cdmTable.X_source_value
+LEFT JOIN @vocabDatabaseSchema.concept c2 ON c2.concept_code = cdmTable._source_value
AND c2.domain_id = <Domain of cdmTable>
WHERE cdmTable.@cdmFieldName = 0
--- AND cdmTable.value_as_number IS NOT NULL -- uncomment for unit_concept_id checks
-GROUP BY 1,2,3
-ORDER BY 4 DESC
X_source_concept_id
fields are
+Please note that while the _source_concept_id
fields are
technically not required, it is highly recommended to populate them with
OMOP concepts whenever possible. This will greatly aid analysts in
understanding the provenance of the data.
Data Users
Jared
Houghtaling, Clair Blacketer
- 2024-06-29
+ 2024-07-11
Source: vignettes/checks/sourceValueCompleteness.Rmd
sourceValueCompleteness.Rmd
Definition
Definition
Katy
Sadowski
- 2024-06-29
+ 2024-07-11
Source: vignettes/checks/standardConceptRecordCompleteness.Rmd
standardConceptRecordCompleteness.Rmd
DefinitionDefinitionETL DevelopersTo investigate the failure, run the following query:
SELECT
concept_name,
- cdmTable.X_source_concept_id, -- source concept ID field for the table
- cdmTable.X_source_value, -- source value field for the table
+ cdmTable._source_concept_id, -- source concept ID field for the table
+ cdmTable._source_value, -- source value field for the table
COUNT(*)
FROM @cdmDatabaseSchema.@cdmTableName cdmTable
-LEFT JOIN @vocabDatabaseSchema.concept ON concept.concept_id = cdmTable.X_source_concept_id
+LEFT JOIN @vocabDatabaseSchema.concept ON concept.concept_id = cdmTable._source_concept_id
WHERE cdmTable.@cdmFieldName = 0
-- AND cdmTable.value_as_number IS NOT NULL -- uncomment for unit_concept_id checks
GROUP BY 1,2,3
@@ -327,11 +328,12 @@ ETL Developers
Finally, if the investigation query returns no source value, you
must trace the relevant record(s) back to their source and confirm if
-the missing value is expected. If not, identify and fix the related
-issue in your ETL. If the record legitimately has no value/code in the
-source data, then the standard concept ID may be left as 0. However, in
-some cases these “code-less” records represent junk data which should be
-filtered out in the ETL. The proper approach will be context-dependent
+the missing source value is expected. If not, identify and fix the
+related issue in your ETL. If the record legitimately has no value/code
+in the source data, then the standard concept ID may be left as 0.
+However, in some cases these “code-less” records represent junk data
+which should be filtered out in the ETL. The proper approach will be
+context-dependent
- Note in the special case of unitless measurements/observations, the
unit_concept_id field should NOT be coded as 0 and rather should be left
diff --git a/docs/articles/checks/withinVisitDates.html b/docs/articles/checks/withinVisitDates.html
index 9ff9e5ba..1ca9c9b6 100644
--- a/docs/articles/checks/withinVisitDates.html
+++ b/docs/articles/checks/withinVisitDates.html
@@ -181,7 +181,7 @@
withinVisitDates
Clair
Blacketer
- 2024-06-29
+ 2024-07-11
Source: vignettes/checks/withinVisitDates.Rmd
withinVisitDates.Rmd
@@ -237,8 +237,13 @@ Definition
diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml
index a3657ed9..ed074627 100644
--- a/docs/pkgdown.yml
+++ b/docs/pkgdown.yml
@@ -34,7 +34,7 @@ articles:
standardConceptRecordCompleteness: checks/standardConceptRecordCompleteness.html
Thresholds: Thresholds.html
withinVisitDates: checks/withinVisitDates.html
-last_built: 2024-06-29T20:33Z
+last_built: 2024-07-12T01:44Z
urls:
reference: https://ohdsi.github.io/DataQualityDashboard/reference
article: https://ohdsi.github.io/DataQualityDashboard/articles
diff --git a/vignettes/checks/isForeignKey.Rmd b/vignettes/checks/isForeignKey.Rmd
index fd69461f..12e527f1 100644
--- a/vignettes/checks/isForeignKey.Rmd
+++ b/vignettes/checks/isForeignKey.Rmd
@@ -37,7 +37,7 @@ This check failure must be resolved. Failures in various fields could impact ana
Many CDM columns are foreign keys to the `concept_id` column in the `CONCEPT` table. See below for suggested investigation steps for concept ID-related foreign key check failures:
-- An `x_concept_id` missing from the CONCEPT table might be the result of an error in `SOURCE_TO_CONCEPT_MAP`; you may check it this way:
+- An `_concept_id` missing from the CONCEPT table might be the result of an error in `SOURCE_TO_CONCEPT_MAP`; you may check it this way:
### Violated rows query
```sql
@@ -50,8 +50,8 @@ WHERE concept.concept_id IS NULL;
- Other types of concept-related errors can be investigated by inspecting the source values for impacted rows as follows:
```sql
--- @cdmTableName.@cdmFieldName is the x_concept_id or x_source_concept_id field in a CDM table
--- Inspect the contents of the x_source_value field to investigate the source of the error
+-- @cdmTableName.@cdmFieldName is the _concept_id or _source_concept_id field in a CDM table
+-- Inspect the contents of the _source_value field to investigate the source of the error
SELECT
'@cdmTableName.@cdmFieldName' AS violating_field,
diff --git a/vignettes/checks/isRequired.Rmd b/vignettes/checks/isRequired.Rmd
index a2cea729..3beec0c2 100644
--- a/vignettes/checks/isRequired.Rmd
+++ b/vignettes/checks/isRequired.Rmd
@@ -46,7 +46,7 @@ Recommended actions:
- To catch this issue further upstream, consider adding a not-null constraint on the column in your database (if possible)
- Fill in the missing values:
- - In some columns, placeholder values are acceptable to replace missing values. For example, in rows for which there is no x_source_value or no standard concept mapping, the value 0 should be placed in the x_concept_id column
+ - In some columns, placeholder values are acceptable to replace missing values. For example, in rows for which there is no _source_value or no standard concept mapping, the value 0 should be placed in the _concept_id column
- Similarly, the CDM documentation suggests derivation/imputation strategies for certain columns. For example, the visit_end_date column is required but several options for deriving a placeholder are provided: https://ohdsi.github.io/CommonDataModel/cdm54.html#VISIT_OCCURRENCE. Consult the documentation for similar conventions on other columns
- For missing values in columns in which it is not acceptable to add a placeholder or derived value (i.e. primary & foreign keys other than concept IDs), there is likely a corresponding ETL error which needs to be fixed
- If you are unable to fill in the missing value for a record according to the CDM conventions, it is best to remove the record from your database. It is recommended to document this action for data users, especially if you need to do this for more than a handful of records and/or if there is a pattern to the missing data
diff --git a/vignettes/checks/isStandardValidConcept.Rmd b/vignettes/checks/isStandardValidConcept.Rmd
index bce6180c..319743d2 100644
--- a/vignettes/checks/isStandardValidConcept.Rmd
+++ b/vignettes/checks/isStandardValidConcept.Rmd
@@ -23,10 +23,10 @@ The number and percent of records that do not have a standard, valid concept in
## Definition
-- *Numerator*: The number of rows with an `X_concept_id` that exists in `CONCEPT.concept_id` but does not equal zero, and has `CONCEPT.standard_concept` != ‘S’ or non-NULL `CONCEPT.invalid_reason`.
+- *Numerator*: The number of rows with an `_concept_id` that exists in `CONCEPT.concept_id` but does not equal zero, and has `CONCEPT.standard_concept` != ‘S’ or non-NULL `CONCEPT.invalid_reason`.
- *Denominator*: The total number of rows in the table.
-- *Related CDM Convention(s)*: All `X_concept_id` columns should contain a standard, valid concept, or 0: https://ohdsi.github.io/CommonDataModel/dataModelConventions.html#Mapping.
-- *CDM Fields/Tables*: All standard concept ID (`X_concept_id`) columns in all event tables.
+- *Related CDM Convention(s)*: All `_concept_id` columns should contain a standard, valid concept, or 0: https://ohdsi.github.io/CommonDataModel/dataModelConventions.html#Mapping.
+- *CDM Fields/Tables*: All standard concept ID (`_concept_id`) columns in all event tables.
- *Default Threshold Value*: 0%
@@ -35,21 +35,24 @@ Failures of this check represent a violation of the fundamental CDM convention r
### ETL Developers
-A failure of this check indicates an issue with the concept mapping portion of your ETL, and must be resolved. Ensure that your ETL is only mapping source codes to standard, valid concepts (via the ‘Maps to’ relationship). Note as well that if no standard concept mapping exists for a source code, you MUST populate its `X_concept_id` column with 0. See the Book of OHDSI for additional guidance on the concept mapping process: https://ohdsi.github.io/TheBookOfOhdsi/ExtractTransformLoad.html#step-2-create-the-code-mappings
+A failure of this check indicates an issue with the concept mapping portion of your ETL, and must be resolved. Ensure that your ETL is only mapping source codes to standard, valid concepts (via the ‘Maps to’ relationship). Note as well that if no standard concept mapping exists for a source code, you MUST populate its `_concept_id` column with 0. See the Book of OHDSI for additional guidance on the concept mapping process: https://ohdsi.github.io/TheBookOfOhdsi/ExtractTransformLoad.html#step-2-create-the-code-mappings
You may inspect the failing rows using the following SQL:
```sql
SELECT
'@cdmTableName.@cdmFieldName' AS violating_field,
- cdmTable.*
+ cdmTable.*,
+ co.*
FROM @schema.@cdmTableName cdmTable
JOIN @vocabDatabaseSchema.concept co ON cdmTable.@cdmFieldName = co.concept_id
WHERE co.concept_id != 0
AND (co.standard_concept != 'S' OR co.invalid_reason IS NOT NULL)
```
-You may build upon this query by joining the relevant `X_concept_id` and `X_source_concept_id` columns to the concept table and inspecting their names and vocabularies. If the `X_source_concept_id` correctly represents the source code in `X_source_value`, the fix will be a matter of ensuring your ETL is correctly using the concept_relationship table to map the source concept ID to a standard concept via the ‘Maps to’ relationship. If you are not populating the `X_source_concept_id` column and/or are using an intermediate concept mapping table, you may need to inspect the mappings in your mapper table to ensure they’ve been generated correctly using the ‘Maps to’ relationship for your CDM’s vocabulary version.
+You may build upon this query by joining the `_source_concept_id` column to the concept table and inspecting the source concepts from which the failing non-standard concepts were mapped. If the `_source_concept_id` correctly represents the source code in `_source_value`, the fix will be a matter of ensuring your ETL is correctly using the concept_relationship table to map the source concept ID to a standard concept via the ‘Maps to’ relationship. If you are not populating the `_source_concept_id` column and/or are using an intermediate concept mapping table, you may need to inspect the mappings in your mapper table to ensure they’ve been generated correctly using the ‘Maps to’ relationship for your CDM’s vocabulary version.
+
+Also note that when updating the OMOP vocabularies, previously standard concepts could have become non-standard and need remapping. Often this remapping can be done programmatically, by following the 'Maps to' relationship to the new standard concept.
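
As an illustration of the programmatic remapping described above, here is a minimal sketch run against a toy in-memory SQLite database. It is not part of DataQualityDashboard: the table contents, the concept IDs, and the names `build_toy_vocab`, `REMAP_SQL` and `remap_to_standard` are all made up for the example, and only `condition_occurrence` is handled. It assumes each non-standard concept has exactly one valid 'Maps to' target; concepts with several targets are deliberately left untouched, because they would require duplicating rows.

```python
import sqlite3


def build_toy_vocab():
    """Create a toy vocabulary and one event table (all IDs are made up)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE concept (
            concept_id INTEGER PRIMARY KEY,
            standard_concept TEXT,
            invalid_reason TEXT
        );
        CREATE TABLE concept_relationship (
            concept_id_1 INTEGER,
            concept_id_2 INTEGER,
            relationship_id TEXT,
            invalid_reason TEXT
        );
        CREATE TABLE condition_occurrence (
            condition_occurrence_id INTEGER PRIMARY KEY,
            condition_concept_id INTEGER
        );
        INSERT INTO concept VALUES (100, NULL, 'U'), (200, 'S', NULL), (300, 'S', NULL);
        INSERT INTO concept_relationship VALUES (100, 200, 'Maps to', NULL);
        INSERT INTO condition_occurrence VALUES (1, 100), (2, 300), (3, 0);
    """)
    return conn


# Follow 'Maps to' from a non-standard (or invalid) concept to its single
# valid standard target. Concepts with zero or several targets are skipped.
REMAP_SQL = """
UPDATE condition_occurrence
SET condition_concept_id = (
    SELECT cr.concept_id_2
    FROM concept_relationship cr
    JOIN concept target ON target.concept_id = cr.concept_id_2
    WHERE cr.concept_id_1 = condition_occurrence.condition_concept_id
      AND cr.relationship_id = 'Maps to'
      AND cr.invalid_reason IS NULL
      AND target.standard_concept = 'S'
      AND target.invalid_reason IS NULL
)
WHERE condition_concept_id IN (
    SELECT cr.concept_id_1
    FROM concept_relationship cr
    JOIN concept src ON src.concept_id = cr.concept_id_1
    JOIN concept target ON target.concept_id = cr.concept_id_2
    WHERE cr.relationship_id = 'Maps to'
      AND cr.invalid_reason IS NULL
      AND (src.standard_concept IS NULL
           OR src.standard_concept != 'S'
           OR src.invalid_reason IS NOT NULL)
      AND target.standard_concept = 'S'
      AND target.invalid_reason IS NULL
    GROUP BY cr.concept_id_1
    HAVING COUNT(*) = 1
)
"""


def remap_to_standard(conn):
    """Apply the remapping in place on the toy condition_occurrence table."""
    conn.execute(REMAP_SQL)
    conn.commit()


if __name__ == "__main__":
    conn = build_toy_vocab()
    remap_to_standard(conn)
    print(conn.execute(
        "SELECT * FROM condition_occurrence ORDER BY 1").fetchall())
```

In a real ETL the same logic would normally live in the concept-mapping step rather than in a post-hoc update, and rows left unmapped would still need manual review.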
### Data Users
This check failure means that the failing rows will not be picked up in a standard OHDSI analysis. Especially when participating in network research, where only standard concepts are used, this might result in invalid results. It is highly recommended to work with your ETL team or data provider, if possible, to resolve this issue.
diff --git a/vignettes/checks/measurePersonCompleteness.Rmd b/vignettes/checks/measurePersonCompleteness.Rmd
index 7c7e9735..7e1531c9 100644
--- a/vignettes/checks/measurePersonCompleteness.Rmd
+++ b/vignettes/checks/measurePersonCompleteness.Rmd
@@ -56,7 +56,9 @@ WHERE cdmTable2.person_id IS NULL
All persons in the CDM must have an observation period; OHDSI analytics tools only operate on persons with observable time, as represented by one or more observation periods. Persons missing observation periods may represent a bug in the ETL code which generates observation periods. Alternatively, some persons may have no observable time in the source data. These persons should be removed from the CDM.
#### All other tables
-Action on persons missing records in other clinical event tables will depend on the characteristics of the source database. In certain cases, missingness is expected – some persons may just not have a given type of data available in the source. In others, various ETL issues may result in persons missing records in a given event table:
+Action on persons missing records in other clinical event tables will depend on the characteristics of the source database. In certain cases, missingness is expected – some persons may just not have a given type of data available in the source. For instance, in most data sources, one would expect most patients to have at least one visit, diagnosis, and drug, while one would *not* expect every single patient to have had a medical device.
+
+Various ETL issues may result in persons missing records in a given event table:
- Mis-mapping of domains, resulting in the placement of records in the incorrect table
- Incorrect parsing of source data, resulting in loss of valid records
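
To get a feel for which tables deviate from the expectations described above, the share of persons without any record can be computed per table. The following is a rough sketch against a toy in-memory SQLite database, not the check's actual implementation; the helper names `build_toy_cdm` and `persons_missing_records`, the allow-list of tables, and the data are all assumptions made for illustration.

```python
import sqlite3

# Table names are interpolated into SQL below, so restrict them to a
# known list (an assumption of this sketch, not an exhaustive CDM list).
ALLOWED_TABLES = {"visit_occurrence", "device_exposure"}


def build_toy_cdm():
    """Create a toy CDM with four persons and two event tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (person_id INTEGER PRIMARY KEY);
        CREATE TABLE visit_occurrence (person_id INTEGER);
        CREATE TABLE device_exposure (person_id INTEGER);
        INSERT INTO person VALUES (1), (2), (3), (4);
        INSERT INTO visit_occurrence VALUES (1), (1), (2), (3);
        INSERT INTO device_exposure VALUES (1);
    """)
    return conn


def persons_missing_records(conn, table_name):
    """Return (n_missing, n_persons, pct_missing) for one event table."""
    if table_name not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table name: {table_name}")
    n_missing = conn.execute(f"""
        SELECT COUNT(*)
        FROM person p
        LEFT JOIN (SELECT DISTINCT person_id FROM {table_name}) t
            ON t.person_id = p.person_id
        WHERE t.person_id IS NULL
    """).fetchone()[0]
    n_persons = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
    pct_missing = 100.0 * n_missing / n_persons if n_persons else 0.0
    return n_missing, n_persons, pct_missing


if __name__ == "__main__":
    conn = build_toy_cdm()
    for table in sorted(ALLOWED_TABLES):
        print(table, persons_missing_records(conn, table))
```

In this toy data, a high share of persons without a device is unremarkable, whereas a similar share without a visit would usually merit investigation.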
diff --git a/vignettes/checks/measureValueCompleteness.Rmd b/vignettes/checks/measureValueCompleteness.Rmd
index 1983d3dd..9cd61f76 100644
--- a/vignettes/checks/measureValueCompleteness.Rmd
+++ b/vignettes/checks/measureValueCompleteness.Rmd
@@ -56,7 +56,7 @@ ETL developers have 2 main options for the use of this check on non-required fie
- The check threshold may be left on 100% for non-required fields such that the check will never fail. The check result can be used simply to understand completeness for these fields
- The check threshold may be set to an appropriate value corresponding to completeness expectations for each field given what’s available in the source data. The check may be disabled for fields known not to exist in the source data. Other fields may be set to whichever threshold is deemed worthy of investigation
-Unexpectedly missing values should be investigated for a potential root cause in the ETL. For expected missingness, rows that violate this check in non-required fields are acceptable but should be clearly communicated to data users so that they can know when and when not to expect data to be present in each field. To avoid confusion for users, however, thresholds should be modified to avoid check failures at expected levels.
+Unexpectedly missing values should be investigated for a potential root cause in the ETL. If a threshold has been adjusted to account for expected missingness, this should be clearly communicated to data users so that they can know when and when not to expect data to be present in each field.
### Data Users
This check informs you of the level of missing data in each column of the CDM. If data is missing in a required column, see the `isRequired` documentation for more information.
diff --git a/vignettes/checks/sourceConceptRecordCompleteness.Rmd b/vignettes/checks/sourceConceptRecordCompleteness.Rmd
index 51627af4..2d06d39d 100644
--- a/vignettes/checks/sourceConceptRecordCompleteness.Rmd
+++ b/vignettes/checks/sourceConceptRecordCompleteness.Rmd
@@ -23,10 +23,10 @@ The number and percent of records with a value of 0 in the source concept field
## Definition
-- *Numerator*: The number of rows with a value of 0 in the `X_source_concept_id` source concept field. In the case of `MEASUREMENT.unit_source_concept_id` and `OBSERVATION.unit_source_concept_id`, the number of rows with a value of 0 in the `X_source_concept_id` source concept field AND a non-NULL `value_as_number`.
-- *Denominator*: The total number of rows in the table. In the case of `MEASUREMENT.unit_source_concept_id` and `OBSERVATION.unit_source_concept_id`, the number of rows with non-NULL `value_as_number`.
+- *Numerator*: The number of rows with a value of 0 in the `_source_concept_id` source concept field.
+- *Denominator*: The total number of rows in the table.
- *Related CDM Convention(s)*: [Source concept mapping](https://ohdsi.github.io/CommonDataModel/dataModelConventions.html#Fields)
-- *CDM Fields/Tables*: All source concept ID (`X_source_concept_id`) columns in all event tables.
+- *CDM Fields/Tables*: All source concept ID (`_source_concept_id`) columns in all event tables.
- *Default Threshold Value*:
- 10% for source concept ID columns in condition, drug, measurement, procedure, device, and observation tables
- 100% for all other source concept ID columns
@@ -36,7 +36,7 @@ The number and percent of records with a value of 0 in the source concept field
Source concept mapping is an important part of the OMOP concept mapping process which allows data users insight into the provenance of the data they are analyzing. It’s important to populate the source concept ID field for all source values that exist in the OMOP vocabulary. Failures of this check should be well-understood and documented so that data users can plan accordingly in the case missing data might impact their analysis.
### ETL Developers
-Recall that the `X_source_concept_id` columns should contain the OMOP concept representing the exact code used in the source data for a given record: “If the is coded in the source data using an OMOP supported vocabulary put the concept id representing the source value here.”
+Recall that the `_source_concept_id` columns should contain the OMOP concept representing the exact code used in the source data for a given record: “If the <_source_value> is coded in the source data using an OMOP supported vocabulary put the concept id representing the source value here.”
A failure of this check usually indicates a failure to map a source value to an OMOP concept. In some cases, such a failure can and should be remediated in the concept-mapping step of the ETL. In other cases, it may represent a mapping that currently is not possible to implement.
@@ -45,24 +45,23 @@ To investigate the failure, run the following query:
```sql
SELECT
concept.concept_name AS standard_concept_name,
- cdmTable.X_concept_id, -- standard concept ID field for the table
+ cdmTable._concept_id, -- standard concept ID field for the table
c2.concept_name AS source_value_concept_name,
- cdmTable.X_source_value, -- source value field for the table
+ cdmTable._source_value, -- source value field for the table
COUNT(*)
FROM @cdmDatabaseSchema.@cdmTableName cdmTable
-LEFT JOIN @vocabDatabaseSchema.concept ON concept.concept_id = cdmTable.X_concept_id
+LEFT JOIN @vocabDatabaseSchema.concept ON concept.concept_id = cdmTable._concept_id
-- WARNING this join may cause fanning if a source value exists in multiple vocabularies
-LEFT JOIN @vocabDatabaseSchema.concept c2 ON concept.concept_code = cdmTable.X_source_value
+LEFT JOIN @vocabDatabaseSchema.concept c2 ON c2.concept_code = cdmTable._source_value
AND c2.domain_id =
WHERE cdmTable.@cdmFieldName = 0
--- AND cdmTable.value_as_number IS NOT NULL -- uncomment for unit_concept_id checks
GROUP BY 1,2,3
ORDER BY 4 DESC
```
The query results will give you a summary of the source codes which failed to map to an OMOP concept. Inspecting this data should give you an initial idea of what might be going on.
-If source values return legitimate matches on concept_code, it’s possible that there is an error in the concept mapping step of your ETL. Please note that while the `X_source_concept_id` fields are technically not required, it is highly recommended to populate them with OMOP concepts whenever possible. This will greatly aid analysts in understanding the provenance of the data.
+If source values return legitimate matches on concept_code, it’s possible that there is an error in the concept mapping step of your ETL. Please note that while the `_source_concept_id` fields are technically not required, it is highly recommended to populate them with OMOP concepts whenever possible. This will greatly aid analysts in understanding the provenance of the data.
If source values do NOT return matches on concept_code and you are NOT handling concept mapping locally for a non-OMOP source vocabulary, then you likely have a malformed source code or one that does not exist in the OMOP vocabulary. Please see the documentation in the [standardConceptRecordCompleteness](standardConceptRecordCompleteness.html) page for instructions on how to handle this scenario.
diff --git a/vignettes/checks/sourceValueCompleteness.Rmd b/vignettes/checks/sourceValueCompleteness.Rmd
index 06c4f1ca..fa04db09 100644
--- a/vignettes/checks/sourceValueCompleteness.Rmd
+++ b/vignettes/checks/sourceValueCompleteness.Rmd
@@ -23,10 +23,10 @@ The number and percent of distinct source values in the @cdmFieldName field of t
## Definition
-- *Numerator*: Distinct `X_source_value` entries where the corresponding standard `X_concept_id` field is 0.
-- *Denominator*: Total distinct `X_source_value` entries, including NULL, in the respective event table.
+- *Numerator*: Distinct `_source_value` entries where the corresponding standard `_concept_id` field is 0.
+- *Denominator*: Total distinct `_source_value` entries, including NULL, in the respective event table.
- *Related CDM Convention(s)*: The OMOP Common Data Model specifies that codes that are present in a native database should be mapped to standard concepts using either the intrinsic mappings defined in the standard vocabularies or extrinsic mappings defined by the data owner or ETL development team. Note also that variations of this check logic are also used in the [EHDEN CDM Inspection Report](https://github.com/EHDEN/CdmInspection) package, as well as the [AresIndexer](https://github.com/OHDSI/AresIndexer) package for generating indices of unmapped codes.
-- *CDM Fields/Tables*: Runs on all event tables that have `X_source_value` fields.
+- *CDM Fields/Tables*: Runs on all event tables that have `_source_value` fields.
- *Default Threshold Value*:
- 10% for `_source_value` fields in condition, measurement, procedure, drug, visit.
- 100% for all other fields
diff --git a/vignettes/checks/standardConceptRecordCompleteness.Rmd b/vignettes/checks/standardConceptRecordCompleteness.Rmd
index 80f77bf7..8f487ac3 100644
--- a/vignettes/checks/standardConceptRecordCompleteness.Rmd
+++ b/vignettes/checks/standardConceptRecordCompleteness.Rmd
@@ -23,14 +23,14 @@ The number and percent of records with a value of 0 in the standard concept fiel
## Definition
-- *Numerator*: The number of rows with a value of 0 in the `X_concept_id` standard concept field. In the case of `MEASUREMENT.unit_concept_id` and `OBSERVATION.unit_concept_id`, the number of rows with a value of 0 in the `X_concept_id` standard concept field AND a non-NULL `value_as_number`.
+- *Numerator*: The number of rows with a value of 0 in the `_concept_id` standard concept field. In the case of `MEASUREMENT.unit_concept_id` and `OBSERVATION.unit_concept_id`, the number of rows with a value of 0 in the `_concept_id` standard concept field AND a non-NULL `value_as_number`.
- *Denominator*: The total number of rows in the table. In the case of `MEASUREMENT.unit_concept_id` and `OBSERVATION.unit_concept_id`, the number of rows with a non-NULL `value_as_number`.
- *Related CDM Convention(s)*: [Standard concept mapping](https://ohdsi.github.io/CommonDataModel/dataModelConventions.html#Fields)
-- *CDM Fields/Tables*: All standard concept ID (`X_concept_id`) columns in all event tables.
+- *CDM Fields/Tables*: All standard concept ID (`_concept_id`) columns in all event tables.
- *Default Threshold Value*:
- 0% for type concept fields and standard concept fields in era tables
- 5% for most standard concept fields in clinical event tables
- - 100% for fields more susceptible to specific ETL implementation context
+ - 100% for fields more susceptible to specific ETL implementation context (e.g. `place_of_service_concept_id`, `modifier_concept_id`)
## User Guidance
@@ -45,11 +45,11 @@ To investigate the failure, run the following query:
```sql
SELECT
concept_name,
- cdmTable.X_source_concept_id, -- source concept ID field for the table
- cdmTable.X_source_value, -- source value field for the table
+ cdmTable._source_concept_id, -- source concept ID field for the table
+ cdmTable._source_value, -- source value field for the table
COUNT(*)
FROM @cdmDatabaseSchema.@cdmTableName cdmTable
-LEFT JOIN @vocabDatabaseSchema.concept ON concept.concept_id = cdmTable.X_source_concept_id
+LEFT JOIN @vocabDatabaseSchema.concept ON concept.concept_id = cdmTable._source_concept_id
WHERE cdmTable.@cdmFieldName = 0
-- AND cdmTable.value_as_number IS NOT NULL -- uncomment for unit_concept_id checks
GROUP BY 1,2,3