diff --git a/_pkgdown.yml b/_pkgdown.yml index e1dc440b..822c5d5f 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -68,8 +68,30 @@ navbar: href: articles/checks/fkDomain.html - text: fkClass href: articles/checks/fkClass.html + - text: measurePersonCompleteness + href: articles/checks/measurePersonCompleteness.html + - text: measureValueCompleteness + href: articles/checks/measureValueCompleteness.html + - text: isStandardValidConcept + href: articles/checks/isStandardValidConcept.html + - text: standardConceptRecordCompleteness + href: articles/checks/standardConceptRecordCompleteness.html + - text: sourceConceptRecordCompleteness + href: articles/checks/sourceConceptRecordCompleteness.html + - text: sourceValueCompleteness + href: articles/checks/sourceValueCompleteness.html - text: plausibleAfterBirth href: articles/checks/plausibleAfterBirth.html + - text: plausibleBeforeDeath + href: articles/checks/plausibleBeforeDeath.html + - text: plausibleStartBeforeEnd + href: articles/checks/plausibleStartBeforeEnd.html + - text: plausibleValueHigh + href: articles/checks/plausibleValueHigh.html + - text: plausibleValueLow + href: articles/checks/plausibleValueLow.html + - text: withinVisitDates + href: articles/checks/withinVisitDates.html hades: text: hadesLogo href: https://ohdsi.github.io/Hades diff --git a/docs/404.html b/docs/404.html index bf6f0a21..7c89b76d 100644 --- a/docs/404.html +++ b/docs/404.html @@ -6,7 +6,7 @@
vignettes/AddNewCheck.rmd
AddNewCheck.rmd
vignettes/CheckStatusDefinitions.rmd
CheckStatusDefinitions.rmd
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
diff --git a/docs/articles/CheckTypeDescriptions.html b/docs/articles/CheckTypeDescriptions.html index 442bbeb5..a58ea494 100644 --- a/docs/articles/CheckTypeDescriptions.html +++ b/docs/articles/CheckTypeDescriptions.html @@ -6,7 +6,7 @@vignettes/CheckTypeDescriptions.rmd
CheckTypeDescriptions.rmd
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
diff --git a/docs/articles/DataQualityDashboard.html b/docs/articles/DataQualityDashboard.html index 011ad993..d22f577a 100644 --- a/docs/articles/DataQualityDashboard.html +++ b/docs/articles/DataQualityDashboard.html @@ -6,7 +6,7 @@vignettes/DataQualityDashboard.rmd
DataQualityDashboard.rmd
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
diff --git a/docs/articles/DqdForCohorts.html b/docs/articles/DqdForCohorts.html index 695886ac..d9530035 100644 --- a/docs/articles/DqdForCohorts.html +++ b/docs/articles/DqdForCohorts.html @@ -6,7 +6,7 @@vignettes/DqdForCohorts.rmd
DqdForCohorts.rmd
vignettes/SqlOnly.rmd
SqlOnly.rmd
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
diff --git a/docs/articles/Thresholds.html b/docs/articles/Thresholds.html index a33a1685..7bcce30c 100644 --- a/docs/articles/Thresholds.html +++ b/docs/articles/Thresholds.html @@ -6,7 +6,7 @@vignettes/Thresholds.rmd
Thresholds.rmd
vignettes/checkIndex.Rmd
checkIndex.Rmd
vignettes/checks/cdmDatatype.Rmd
cdmDatatype.Rmd
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
diff --git a/docs/articles/checks/cdmField.html b/docs/articles/checks/cdmField.html index c1fe7eb6..3464b260 100644 --- a/docs/articles/checks/cdmField.html +++ b/docs/articles/checks/cdmField.html @@ -6,7 +6,7 @@vignettes/checks/cdmField.Rmd
cdmField.Rmd
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
diff --git a/docs/articles/checks/cdmTable.html b/docs/articles/checks/cdmTable.html index c1c3df5d..03d2cd40 100644 --- a/docs/articles/checks/cdmTable.html +++ b/docs/articles/checks/cdmTable.html @@ -6,7 +6,7 @@vignettes/checks/cdmTable.Rmd
cdmTable.Rmd
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
diff --git a/docs/articles/checks/fkClass.html b/docs/articles/checks/fkClass.html index 38018115..bbfbd439 100644 --- a/docs/articles/checks/fkClass.html +++ b/docs/articles/checks/fkClass.html @@ -6,7 +6,7 @@vignettes/checks/fkClass.Rmd
fkClass.Rmd
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
diff --git a/docs/articles/checks/fkDomain.html b/docs/articles/checks/fkDomain.html index bdc8ede4..bc151de1 100644 --- a/docs/articles/checks/fkDomain.html +++ b/docs/articles/checks/fkDomain.html @@ -6,7 +6,7 @@vignettes/checks/fkDomain.Rmd
fkDomain.Rmd
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
diff --git a/docs/articles/checks/isForeignKey.html b/docs/articles/checks/isForeignKey.html index c27c07eb..49b25eeb 100644 --- a/docs/articles/checks/isForeignKey.html +++ b/docs/articles/checks/isForeignKey.html @@ -6,7 +6,7 @@vignettes/checks/isForeignKey.Rmd
isForeignKey.Rmd
-- @cdmTableName.@cdmFieldName is the x_concept_id or x_source_concept_id field in a CDM table
--- Inspect the contents of the x_source_value field to investigate the source of the error
+-- @cdmTableName.@cdmFieldName is the _concept_id or _source_concept_id field in a CDM table
+-- Inspect the contents of the _source_value field to investigate the source of the error
SELECT
'@cdmTableName.@cdmFieldName' AS violating_field,
@@ -292,7 +325,7 @@ Data Users
-Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
diff --git a/docs/articles/checks/isPrimaryKey.html b/docs/articles/checks/isPrimaryKey.html
index 2f6cd952..cc5d3275 100644
--- a/docs/articles/checks/isPrimaryKey.html
+++ b/docs/articles/checks/isPrimaryKey.html
@@ -6,7 +6,7 @@
isPrimaryKey • DataQualityDashboard
-
+
@@ -112,9 +112,42 @@
fkClass
+
+ measurePersonCompleteness
+
+
+ measureValueCompleteness
+
+
+ isStandardValidConcept
+
+
+ standardConceptRecordCompleteness
+
+
+ sourceConceptRecordCompleteness
+
+
+ sourceValueCompleteness
+
plausibleAfterBirth
+
+ plausibleBeforeDeath
+
+
+ plausibleStartBeforeEnd
+
+
+ plausibleValueHigh
+
+
+ plausibleValueLow
+
+
+ withinVisitDates
+
vignettes/checks/isPrimaryKey.Rmd
isPrimaryKey.Rmd
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
diff --git a/docs/articles/checks/isRequired.html b/docs/articles/checks/isRequired.html index 694d065f..3da6dfaa 100644 --- a/docs/articles/checks/isRequired.html +++ b/docs/articles/checks/isRequired.html @@ -6,7 +6,7 @@vignettes/checks/isRequired.Rmd
isRequired.Rmd
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
diff --git a/docs/articles/checks/isStandardValidConcept.html b/docs/articles/checks/isStandardValidConcept.html index 002d32ad..7f40000a 100644 --- a/docs/articles/checks/isStandardValidConcept.html +++ b/docs/articles/checks/isStandardValidConcept.html @@ -6,7 +6,7 @@vignettes/checks/isStandardValidConcept.Rmd
isStandardValidConcept.Rmd
Level: FIELD
Context: Verification
Category: Conformance
Subcategory: Value
Severity:
Level: FIELD
Context: Verification
Category: Conformance
Subcategory: Value
Severity: CDM convention ⚠
_concept_id
) columns in all event tables.
Failures of this check represent a violation of the fundamental CDM +convention requiring all concept IDs to belong to the OMOP standard +vocabulary. This is an essential convention in enabling standard +analytics. If source codes have not been properly mapped to OMOP +standard concepts in a CDM, studies designed using the OMOP standard +vocabulary will return inaccurate results for that database.
A failure of this check indicates an issue with the concept mapping
+portion of your ETL, and must be resolved. Ensure that your ETL is only
+mapping source codes to standard, valid concepts (via the ‘Maps to’
+relationship). Note as well that if no standard concept mapping exists
+for a source code, you MUST populate its _concept_id
column
+with 0. See the Book of OHDSI for additional guidance on the concept
+mapping process: https://ohdsi.github.io/TheBookOfOhdsi/ExtractTransformLoad.html#step-2-create-the-code-mappings
You may inspect the failing rows using the following SQL:
+SELECT
+ '@cdmTableName.@cdmFieldName' AS violating_field,
+ cdmTable.*,
+ co.*
+FROM @schema.@cdmTableName cdmTable
+ JOIN @vocabDatabaseSchema.concept co ON cdmTable.@cdmFieldName = co.concept_id
+WHERE co.concept_id != 0
+ AND (co.standard_concept != 'S' OR co.invalid_reason IS NOT NULL)
You may build upon this query by joining the
+_source_concept_id
column to the concept table and
+inspecting the source concepts from which the failing non-standard
+concepts were mapped. If the _source_concept_id
correctly
+represents the source code in _source_value
, the fix will
+be a matter of ensuring your ETL is correctly using the
+concept_relationship table to map the source concept ID to a standard
+concept via the ‘Maps to’ relationship. If you are not populating the
+_source_concept_id
column and/or are using an intermediate
+concept mapping table, you may need to inspect the mappings in your
+mapper table to ensure they’ve been generated correctly using the ‘Maps
+to’ relationship for your CDM’s vocabulary version.
Also note that when updating the OMOP vocabularies, previously standard concepts may have become non-standard and need remapping. Often this remapping can be done programmatically, by following the 'Maps to' relationship to the new standard concept.
This check failure means that the failing rows will not be picked up +in a standard OHDSI analysis. Especially when participating in network +research, where only standard concepts are used, this might result in +invalid results. It is highly recommended to work with your ETL team or +data provider, if possible, to resolve this issue.
+However, you may work around it at your own risk by determining +whether or not the affected rows are relevant for your analysis. Here’s +an example query you could run to inspect failing rows in the +condition_occurrence table:
+SELECT
+ condition_concept_id AS violating_concept,
+ c1.concept_name AS violating_concept_name,
+ condition_source_concept_id AS source_concept,
+ c2.concept_name AS source_concept_name,
+ c2.vocabulary_id AS source_vocab,
+ condition_source_value,
+ COUNT(*)
+FROM @cdmDatabaseSchema.condition_occurrence
+ JOIN @vocabDatabaseSchema.concept c1 ON condition_occurrence.condition_concept_id = c1.concept_id
+ LEFT JOIN @vocabDatabaseSchema.concept c2 ON condition_occurrence.condition_source_concept_id = c2.concept_id
+WHERE c1.concept_id != 0
+ AND (c1.standard_concept != 'S' OR c1.invalid_reason IS NOT NULL)
+GROUP BY 1,2,3,4,5,6
+ORDER BY 7 DESC
If you can confirm by inspecting the source concept and/or source +value that the affected rows are not relevant for your analysis, you can +proceed with your work and ignore the issue. However, especially if a +large number of rows are impacted it’s recommended to act upon these +failures as there could potentially be deeper issues with the ETL +concept mapping process that need to be fixed.
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
vignettes/checks/measureConditionEraCompleteness.Rmd
measureConditionEraCompleteness.Rmd
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
diff --git a/docs/articles/checks/measurePersonCompleteness.html b/docs/articles/checks/measurePersonCompleteness.html index 57da0042..3d68ce34 100644 --- a/docs/articles/checks/measurePersonCompleteness.html +++ b/docs/articles/checks/measurePersonCompleteness.html @@ -6,7 +6,7 @@vignettes/checks/measurePersonCompleteness.Rmd
measurePersonCompleteness.Rmd
Level: TABLE
Context: Validation
Category: Completeness
Subcategory:
Severity:
Level: TABLE
Context: Validation
Category: Completeness
Subcategory:
Severity: CDM convention ⚠ (for observation period),
+Characterization ✔ (for all other tables)
The number and percent of persons in the CDM that do not have at least one record in the @cdmTableName table.
All persons must have at least one OBSERVATION_PERIOD record; otherwise, CDM conventions do not dictate any rules for person completeness. The check is evaluated against all persons in the PERSON table.
+For most tables, this check is a characterization of the completeness
+of various data types in the source data. However, in the case of
+OBSERVATION_PERIOD
, this check should actually be
+considered a CDM convention check as it is used to enforce the
+requirement that all persons have at least one observation period. A
+failure of this check on the OBSERVATION_PERIOD
table is a
+serious issue as persons without an OBSERVATION_PERIOD
+cannot be included in any standard OHDSI analysis.
Run the following query to obtain a list of persons who had no data +in a given table. From this list of person_ids you may join to other +tables in the CDM to understand trends in these individuals’ data which +may provide clues as to the root cause of the issue.
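The exact violated-rows SQL is generated by the package; a minimal sketch of the idea, assuming the usual @cdmDatabaseSchema and @cdmTableName parameters, is:
-- Sketch: persons with no record in the table being checked
SELECT
  p.person_id,
  p.person_source_value
FROM @cdmDatabaseSchema.person p
  LEFT JOIN @cdmDatabaseSchema.@cdmTableName cdmTable ON p.person_id = cdmTable.person_id
WHERE cdmTable.person_id IS NULL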
All persons in the CDM must have an observation period; OHDSI +analytics tools only operate on persons with observable time, as +represented by one or more observation periods. Persons missing +observation periods may represent a bug in the ETL code which generates +observation periods. Alternatively, some persons may have no observable +time in the source data. These persons should be removed from the +CDM.
+Action on persons missing records in other clinical event tables will +depend on the characteristics of the source database. In certain cases, +missingness is expected – some persons may just not have a given type of +data available in the source. For instance, in most data sources, one +would expect most patients to have at least one visit, diagnosis, and +drug, while one would not expect every single patient to have +had a medical device.
+Various ETL issues may result in persons missing records in a given +event table:
+If more persons than expected are missing data in a given table, run
+the violated rows SQL snippet to retrieve these persons’ person_ids, and
+inspect these persons’ other clinical event data in the CDM for trends.
+You may also use person_source_value
to trace back to these
+persons’ source data to identify source data records potentially missed
+by the ETL.
Severe failures, such as unexpected nearly empty tables, must be +fixed by the ETL team before a dataset can be used. Note as well that +any person missing an observation period will not be able to be included +in any analysis using OHDSI tools.
+Failures with a result close to the specified failure threshold may +be accepted, at your own risk and only if the result matches your +understanding of the source data. The violated rows SQL may be used to +inspect the full records for persons missing data in a given table in +order to validate your expectations or point to potential issues in the +ETL which need to be resolved.
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
vignettes/checks/measureValueCompleteness.Rmd
measureValueCompleteness.Rmd
Level: FIELD
Context: Verification
Category: Completeness
Subcategory:
Severity:
Level: FIELD
Context: Verification
Category: Completeness
Subcategory:
Severity: Characterization ✔
This check’s primary purpose is to characterize completeness of +non-required fields in the OMOP CDM. It is most useful when the failure +threshold for each non-required field is customized to expectations +based on the source data being transformed into OMOP. In this case, the +check can be used to catch unexpected missingness due to ETL errors. +However, in all cases, this check will serve as a useful +characterization to help data users understand if a CDM contains the +right data for a given analysis.
+While the failure threshold is set to 0 for required fields, note
+that this is duplicative with the isRequired
check - and
+fixing one failure will resolve the other!
Failures of this check on required fields are redundant with failures
+of isRequired
. See isRequired
+documentation for more information.
ETL developers have 2 main options for the use of this check on +non-required fields:
+Unexpectedly missing values should be investigated for a potential +root cause in the ETL. If a threshold has been adjusted to account for +expected missingness, this should be clearly communicated to data users +so that they can know when and when not to expect data to be present in +each field.
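An inspection query for a given field can be as simple as the following sketch (parameter placeholders follow the convention used by the other field-level checks):
-- Sketch: records missing a value in the checked field
SELECT
  '@cdmTableName.@cdmFieldName' AS violating_field,
  cdmTable.*
FROM @cdmDatabaseSchema.@cdmTableName cdmTable
WHERE cdmTable.@cdmFieldName IS NULL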
This check informs you of the level of missing data in each column of
+the CDM. If data is missing in a required column, see the
+isRequired
documentation for more information.
The interpretation of a check failure on a non-required column will +depend on the context. In some cases, the threshold for this check will +have been very deliberately set, and any failure should be cause for +concern unless justified and explained by your ETL provider. In other +cases, even if the check fails it may not be worrisome if the check +result is in line with your expectations given the source of the data. +When in doubt, utilize the inspection query above to ensure you can +explain the missing values.
+Of course, if there is a failure on a non-required field you know +that you will not need in your analysis (for example, missing drug +quantity in an analysis not utilizing drug data), the check failure may +be safely ignored.
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
vignettes/checks/plausibleAfterBirth.Rmd
plausibleAfterBirth.Rmd
This check verifies that events happen after birth. This check is
-only run on fields where the PLAUSIBLE_AFTER_BIRTH
-parameter is set to Yes. The birthdate is taken from
-the person
table, either the birth_datetime
or
-composed from year_of_birth
, month_of_birth
,
-day_of_birth
(taking 1st month/1st day if missing).
This check verifies that events happen after birth. The birthdate is
+taken from the person
table, either the
+birth_datetime
or composed from year_of_birth
,
+month_of_birth
, day_of_birth
(taking 1st
+month/1st day if missing).
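A sketch of the violated-rows logic (simplified to assume birth_datetime is populated; the real check falls back to the year/month/day fields as described above):
-- Sketch: events dated before the person's birth
SELECT
  '@cdmTableName.@cdmFieldName' AS violating_field,
  cdmTable.*
FROM @cdmDatabaseSchema.@cdmTableName cdmTable
  JOIN @cdmDatabaseSchema.person p ON cdmTable.person_id = p.person_id
WHERE CAST(cdmTable.@cdmFieldName AS DATE) < CAST(p.birth_datetime AS DATE)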
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
diff --git a/docs/articles/checks/plausibleBeforeDeath.html b/docs/articles/checks/plausibleBeforeDeath.html index c1fdc510..a2d11704 100644 --- a/docs/articles/checks/plausibleBeforeDeath.html +++ b/docs/articles/checks/plausibleBeforeDeath.html @@ -6,7 +6,7 @@vignettes/checks/plausibleBeforeDeath.Rmd
plausibleBeforeDeath.Rmd
Level: FIELD
Context: Verification
Category: Plausibility
Subcategory: Temporal
Severity:
Level: FIELD
Context: Verification
Category: Plausibility
Subcategory: Temporal
Severity: Characterization ✔
The number and percent of records with a date value in the @cdmFieldName field of the @cdmTableName table that occurs after death.
+The number and percent of records with a date value in the
+cdmFieldName field of the cdmTableName
+table that occurs more than 60 days after death. Note that this check
+replaces the previous plausibleDuringLife
check.
A record violates this check if its date is more than 60 days after the person's death date; the 60-day buffer allows for administrative records entered shortly after death.
Events are expected to occur between birth and death. The check
+plausibleAfterBirth
checks for the former, this check for
+the latter. The 60-day period is a conservative estimate of the time it
+takes for administrative records to be updated after a person’s death.
+By default, both start and end dates are checked.
SELECT
+ '@cdmTableName.@cdmFieldName' AS violating_field,
+ cdmTable.*
+FROM @cdmDatabaseSchema.@cdmTableName cdmTable
+JOIN @cdmDatabaseSchema.death de
+ ON cdmTable.person_id = de.person_id
+WHERE cdmTable.@cdmFieldName IS NOT NULL
+ AND CAST(cdmTable.@cdmFieldName AS DATE) > DATEADD(day, 60, de.death_date)
Start dates after death are likely to be source data issues, and failing this check should trigger investigation of the source data quality. End dates after death can occur due to derivation logic; for example, a drug exposure might be recorded as continuing long after death. In such cases, it is recommended to update the derivation logic to end the exposure at the death date.
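As an illustration only (not logic shipped with the package; T-SQL-style syntax that may need adjusting for your database), ending drug exposures at death could look roughly like this:
-- Sketch: cap drug_exposure end dates at the death date during ETL post-processing
UPDATE de
SET drug_exposure_end_date = d.death_date
FROM @cdmDatabaseSchema.drug_exposure de
  JOIN @cdmDatabaseSchema.death d ON de.person_id = d.person_id
WHERE de.drug_exposure_end_date > d.death_date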
For most studies, a low number of violating records will have limited impact on data use, as it could be caused by lagging administrative records. However, it might signify a larger data quality issue. Note that the percentage of violating records reported is calculated among records from deceased persons only, and as such might appear slightly inflated compared to a percentage of the overall population.
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
vignettes/checks/plausibleDuringLife.Rmd
plausibleDuringLife.Rmd
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
diff --git a/docs/articles/checks/plausibleGender.html b/docs/articles/checks/plausibleGender.html index c7f28f82..a5e62088 100644 --- a/docs/articles/checks/plausibleGender.html +++ b/docs/articles/checks/plausibleGender.html @@ -6,7 +6,7 @@vignettes/checks/plausibleGender.Rmd
plausibleGender.Rmd
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
diff --git a/docs/articles/checks/plausibleGenderUseDescendants.html b/docs/articles/checks/plausibleGenderUseDescendants.html new file mode 100644 index 00000000..2e92d60d --- /dev/null +++ b/docs/articles/checks/plausibleGenderUseDescendants.html @@ -0,0 +1,268 @@ + + + + + + + +vignettes/checks/plausibleGenderUseDescendants.Rmd
+ plausibleGenderUseDescendants.Rmd
Level: CONCEPT
Context: Validation
Category: Plausibility
Subcategory: Atemporal
Severity:
For a CONCEPT_ID @conceptId (@conceptName), the number and percent of records +associated with patients with an implausible gender (correct gender = +@plausibleGender).
+vignettes/checks/plausibleStartBeforeEnd.Rmd
plausibleStartBeforeEnd.Rmd
Level: FIELD
Context: Verification
Category: Plausibility
Subcategory: Temporal
Severity:
Level: FIELD
Context: Verification
Category: Plausibility
Subcategory: Temporal
Severity: CDM convention ⚠
The number and percent of records with a value in the @cdmFieldName field of the @cdmTableName that occurs after the date in the -@plausibleStartBeforeEndFieldName.
+The number and percent of records with a value in the
+cdmFieldName field of the cdmTableName
+that occurs after the date in the
+plausibleStartBeforeEndFieldName. Note that this check
+replaces the previous plausibleTemporalAfter
check.
This check applies temporal rules within a table, specifically checking that all start dates are on or before the corresponding end dates. For example, in the VISIT_OCCURRENCE table it checks that visit_start_date is not after visit_end_date (the start date may equal the end date). The check is applied to the start date field and takes the end date field as a parameter. Both date and datetime fields are checked.
Similarly, in the CDM_SOURCE table it checks that the source_release_date is before the cdm_release_date.
If the start date is after the end date, it is likely that the data +is incorrect or the dates are unreliable.
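A minimal sketch of a query to find violating rows (parameter names are illustrative, matching those in the description above):
-- Sketch: records whose start date falls after the paired end date
SELECT
  '@cdmTableName.@cdmFieldName' AS violating_field,
  cdmTable.*
FROM @cdmDatabaseSchema.@cdmTableName cdmTable
WHERE cdmTable.@cdmFieldName > cdmTable.@plausibleStartBeforeEndFieldName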
The main reason for this check to fail is that the source data itself is incorrect. If the end date is derived from other data, the derivation logic might also fail to account for some edge cases.
+Any violating records should either be removed or corrected. In most cases this can be done by adjusting the end date:
- With a few exceptions, the end date is not mandatory and can be left empty.
- If the end date is mandatory (notably visit_occurrence and drug_exposure), the end date can be set to the start date of the event. Make sure to document this, as it leads to loss of duration information.
- If this check fails for the observation_period, it might signify a bigger underlying issue. Please investigate all records for this person in the CDM and source.
- If neither the start nor the end date can be trusted, please remove the record from the CDM.
+Make sure to clearly document the choices in your ETL +specification.
A start date after the end date gives negative event durations,
+which might break analyses. Especially take note if this check fails for
+the observation_period
table. This means that there are
+persons with negative observation time. If these persons are included in
+a cohort, it will potentially skew e.g. survival analyses.
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
vignettes/checks/plausibleTemporalAfter.Rmd
plausibleTemporalAfter.Rmd
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
diff --git a/docs/articles/checks/plausibleUnitConceptIds.html b/docs/articles/checks/plausibleUnitConceptIds.html index d37e8a73..cf0df5fb 100644 --- a/docs/articles/checks/plausibleUnitConceptIds.html +++ b/docs/articles/checks/plausibleUnitConceptIds.html @@ -6,7 +6,7 @@vignettes/checks/plausibleUnitConceptIds.Rmd
plausibleUnitConceptIds.Rmd
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
diff --git a/docs/articles/checks/plausibleValueHigh.html b/docs/articles/checks/plausibleValueHigh.html index 6d82ddf0..7879501f 100644 --- a/docs/articles/checks/plausibleValueHigh.html +++ b/docs/articles/checks/plausibleValueHigh.html @@ -6,7 +6,7 @@vignettes/checks/plausibleValueHigh.Rmd
plausibleValueHigh.Rmd
Level: FIELD
Context: Verification
Category: Plausibility
Subcategory: Atemporal
Severity:
Level: FIELD
Context: Verification
Category: Plausibility
Subcategory: Atemporal
Severity: Characterization ✔
This check counts the number of records that have a value in the +specified field that is higher than some expected value. Failures of +this check might represent true data anomalies, but especially in the +case when the failure percentage is high, something may be afoot in the +ETL pipeline.
+Use this query to inspect rows with an implausibly high value:
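The generated SQL is not reproduced here; a minimal sketch of it, mirroring the plausibleValueLow query with the comparison reversed, is:
-- Sketch: rows with a value above the plausible upper bound
SELECT
  '@cdmTableName.@cdmFieldName' AS violating_field,
  cdmTable.*
FROM @schema.@cdmTableName cdmTable
WHERE cdmTable.@cdmFieldName > @plausibleValueHigh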
The investigation approach may differ by the field being checked. For
+example, for CONDITION_OCCURRENCE.condition_start_date
you
+might look at how much it differs on average, to find a clue as to what
+happened:
SELECT
+ MEDIAN(DATEDIFF(day, condition_start_date, current_date))
+FROM condition_occurrence
+WHERE condition_start_date > current_date
+;
Or check whether the discrepancy is associated with specific attributes:
+SELECT
+ co.condition_concept_id,
+ co.condition_type_concept_id,
+ co.condition_status_concept_id,
+ COUNT(1)
+FROM condition_occurrence co
+WHERE condition_start_date > current_date
+GROUP BY co.condition_concept_id, co.condition_type_concept_id, co.condition_status_concept_id
+ORDER BY COUNT(1) DESC
+;
There might be several different causes of future dates: typos in the +source data, wrong data format used in the conversion, timezone issues +in the ETL environment and/or database, etc.
+For the DRUG_EXPOSURE
 values, there might be typos,
+data processing bugs (for example, if days supply is calculated), or
+rare true cases when a prescription deviated from standard industry
+practices.
If the issue is determined to be related to ETL logic, it must be
+fixed. If it’s a source data issue, work with your data partners and
+users to determine the best remediation approach. PERSON
+rows with invalid birth dates should be removed from the CDM, as any
+analysis relying on age will be negatively impacted. Other implausible
+values should be explainable based on your understanding of the source
+data if they are to be retained. In some cases event rows may need to be
+dropped from the CDM if the implausible value is unexplainable and could
+cause downstream quality issues. Be sure to clearly document any data
+removal logic in your ETL specification.
The implication of a failure of this check depends on the count of +errors and your need for the impacted columns. If it’s a small count, it +might just be noise in the data which will unlikely impact an analysis. +If the count is large, however, proceed carefully - events with future +dates will likely be excluded from your analysis, and drugs with +inflated supply values could throw off any analysis considering duration +or patterns of treatment.
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
vignettes/checks/plausibleValueLow.Rmd
plausibleValueLow.Rmd
Level: FIELD
Context: Verification
Category: Plausibility
Subcategory: Atemporal
Severity:
Level: FIELD
Context: Verification
Category: Plausibility
Subcategory: Atemporal
Severity: Characterization ✔
This check counts the number of records that have a value in the +specified field that is lower than some expected value. Failures of this +check might represent true data anomalies, but especially in the case +when the failure percentage is high, something may be afoot in the ETL +pipeline.
+Use this query to inspect rows with an implausibly low value:
- -SELECT
+ '@cdmTableName.@cdmFieldName' AS violating_field,
+ cdmTable.*
+FROM @schema.@cdmTableName cdmTable
+WHERE cdmTable.@cdmFieldName < @plausibleValueLow
See guidance for plausibleValueHigh for detailed +investigation instructions (swapping out “high” for “low” and “>” for +“<” where appropriate).
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
vignettes/checks/sourceConceptRecordCompleteness.Rmd
sourceConceptRecordCompleteness.Rmd
Level: FIELD
Context: Verification
Category: Completeness
Subcategory:
Severity:
Level: FIELD
Context: Verification
Category: Completeness
Subcategory:
Severity: CDM convention ⚠
_source_concept_id
) columns in all event tables.
Source concept mapping is an important part of the OMOP concept +mapping process which allows data users insight into the provenance of +the data they are analyzing. It’s important to populate the source +concept ID field for all source values that exist in the OMOP +vocabulary. Failures of this check should be well-understood and +documented so that data users can plan accordingly in the case missing +data might impact their analysis.
Recall that the _source_concept_id
columns should
+contain the OMOP concept representing the exact code used in the source
+data for a given record: “If the <_source_value> is coded in the
+source data using an OMOP supported vocabulary put the concept id
+representing the source value here.”
A failure of this check usually indicates a failure to map a source +value to an OMOP concept. In some cases, such a failure can and should +be remediated in the concept-mapping step of the ETL. In other cases, it +may represent a mapping that currently is not possible to implement.
+To investigate the failure, run the following query:
+SELECT
+ concept.concept_name AS standard_concept_name,
+ cdmTable._concept_id, -- standard concept ID field for the table
+ c2.concept_name AS source_value_concept_name,
+ cdmTable._source_value, -- source value field for the table
+ COUNT(*)
+FROM @cdmDatabaseSchema.@cdmTableName cdmTable
+LEFT JOIN @vocabDatabaseSchema.concept ON concept.concept_id = cdmTable._concept_id
+-- WARNING this join may cause fanning if a source value exists in multiple vocabularies
+LEFT JOIN @vocabDatabaseSchema.concept c2 ON concept.concept_code = cdmTable._source_value
+AND c2.domain_id = <Domain of cdmTable>
+WHERE cdmTable.@cdmFieldName = 0
+GROUP BY 1,2,3
+ORDER BY 4 DESC
The query results will give you a summary of the source codes which +failed to map to an OMOP concept. Inspecting this data should give you +an initial idea of what might be going on.
+If source values return legitimate matches on concept_code, it’s
+possible that there is an error in the concept mapping step of your ETL.
+Please note that while the _source_concept_id
fields are
+technically not required, it is highly recommended to populate them with
+OMOP concepts whenever possible. This will greatly aid analysts in
+understanding the provenance of the data.
If source values do NOT return matches on concept_code and you are +NOT handling concept mapping locally for a non-OMOP source vocabulary, +then you likely have a malformed source code or one that does not exist +in the OMOP vocabulary. Please see the documentation in the standardConceptRecordCompleteness +page for instructions on how to handle this scenario.
Since most standard OHDSI analytic workflows rely on the standard +concept field and not the source concept field, failures of this check +will not necessarily impact your analysis. However, having the source +concept will give you a better understanding of the provenance of the +code and highlight potential issues where meaning is lost due to mapping +to a standard concept.
+Utilize the investigation queries above to understand the scope and +impact of the mapping failures on your specific analytic use case. If +none of the affected codes seem to be relevant for your analysis, it may +be acceptable to ignore the failure. However, since it is not always +possible to understand exactly what a given source value represents, you +should proceed with caution and confirm any findings with your ETL +provider if possible.
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
vignettes/checks/sourceValueCompleteness.Rmd
sourceValueCompleteness.Rmd
Level: FIELD
Context: Verification
Category: Completeness
Subcategory:
Severity:
Level: FIELD
Context: Verification
Category: Completeness
Subcategory:
Severity: CDM convention ⚠
_source_value
fields.
_source_value
fields in condition, measurement,
+procedure, drug, visit. This check will look at all distinct source values in the specified
+field and calculate how many are mapped to a standard concept of 0. This
+check should be used in conjunction with the standardConceptRecordCompleteness check
+to identify potential mapping issues in the ETL.
This check is a good measure of the overall mapping rate within each +domain. For example, a table may have high +standardConceptRecordCompleteness (that is, a large percentage of +records with a non-zero standard concept) but a low score on this check. +This would indicate that the “long tail” of rarer codes have not been +mapped while more common codes have good mapping coverage. It is always +important to interrogate the results of these two checks together to +ensure complete understanding of vocabulary mapping in your CDM.
+The following SQL can be used to summarize unmapped source values by +record count in a given CDM table:
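The original snippet is not reproduced here; a sketch along these lines achieves the same (here @standardConceptFieldName stands for the _concept_id column paired with the source value field - the actual parameter name used by the package may differ):
-- Sketch: unmapped source values ranked by record count
SELECT
  cdmTable.@cdmFieldName AS source_value,
  COUNT(*) AS record_count
FROM @cdmDatabaseSchema.@cdmTableName cdmTable
WHERE cdmTable.@standardConceptFieldName = 0
GROUP BY cdmTable.@cdmFieldName
ORDER BY COUNT(*) DESC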
Failures of this check are most often related directly to semantic mapping. First, the ETL developer should investigate whether a source vocabulary is present in the native data that was not accounted for in the ETL document and/or code. This is most likely if the unmapped source values are codes rather than text values. Second, the source-to-concept-map file or table should be updated to link the unmapped source values to domain-appropriate concepts.
When this check fails, source data granularity is being lost; not all +of the information related to a particular event or modifier is being +captured in OMOP CDM format. Although the information about an event may +exist in the source value field, it cannot easily be used in downstream +analytics processes that rely on standard OMOP concepts.
+Please see the standardConceptRecordCompleteness page +for a much more detailed overview of handling mapping quality issues in +your OMOP CDM.
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
vignettes/checks/standardConceptRecordCompleteness.Rmd
standardConceptRecordCompleteness.Rmd
Level: FIELD
Context: Verification
Category: Completeness
Subcategory:
Severity:
Level: FIELD
Context: Verification
Category: Completeness
Subcategory:
Severity: CDM convention ⚠
_concept_id
) columns in all event tables.
place_of_service_concept_id
,
+modifier_concept_id
)Standard concept mapping is one of the most fundamental conventions +of the OMOP CDM. It enables standardized analysis across diverse data +sources and allows users to abstract away the tedium of traversing +source vocabularies when building phenotypes. As such, it is highly +recommended to map as many concepts in your source as possible. Failures +of this check should be well-understood and documented so that data +users can plan accordingly in the case missing data might impact their +analysis.
A failure of this check usually indicates a failure to map a source +value to a standard OMOP concept. In some cases, such a failure can and +should be remediated in the concept-mapping step of the ETL. In other +cases, it may represent a mapping that currently is not possible to +implement.
+To investigate the failure, run the following query:
+SELECT
+ concept_name,
+ cdmTable._source_concept_id, -- source concept ID field for the table
+ cdmTable._source_value, -- source value field for the table
+ COUNT(*)
+FROM @cdmDatabaseSchema.@cdmTableName cdmTable
+LEFT JOIN @vocabDatabaseSchema.concept ON concept.concept_id = cdmTable._source_concept_id
+WHERE cdmTable.@cdmFieldName = 0
+-- AND cdmTable.value_as_number IS NOT NULL -- uncomment for unit_concept_id checks
+GROUP BY 1,2,3
+ORDER BY 4 DESC
This will give you a summary of the source codes which failed to map +to an OMOP standard concept. Inspecting this data should give you an +initial idea of what might be going on.
+SELECT
+ concept_id AS standard_concept_mapping
+FROM @vocabDatabaseSchema.concept_relationship
+JOIN @vocabDatabaseSchema.concept ON concept.concept_id = concept_relationship.concept_id_2
+ AND relationship_id = 'Maps to'
+WHERE concept_relationship.concept_id_1 = <source concept ID>
If no results are returned, consider whether the source concept +ID is part of the OMOP vocabularies. If it is, then there is likely a +vocabulary issue which should be reported. If it is not (i.e., it is a +local concept), then there is likely an issue with your local +source-to-concept mapping
If the investigation query returns a source value and source +concept ID but no concept name, this indicates the source concept ID +does not exist in your concept table. This may be expected if your ETL +includes local source-to-concept mappings. If not, then your ETL has +assigned a malformed source concept ID and will need to be +debugged
If the investigation query returns a source value but no source +concept ID (or a source concept ID of 0), run the following query to +search for the source value in the OMOP vocabulary (note that if your +ETL includes local mappings and the code in question is known not to +exist in OMOP, you should search your local mapping table/config +instead):
-- may return false positives if the same value exists in multiple vocabularies
+-- only applicable in the case where the source value column is populated only with a vocabulary code
+SELECT
+ *
+FROM @vocabDatabaseSchema.concept
+WHERE concept_code = <source value>
It is important to note that records with a 0 standard concept ID +field will be unusable in standard OHDSI analyses and thus should only +be preserved if there is truly no standard concept ID for a given +record. Depending on the significance of the records in question, one +should consider removing them from the dataset; however, this choice +will depend on a variety of context-specific factors and should be made +carefully. Either way, the presence/absence of these unmappable records +and an explanation for why they could not be mapped should be clearly +documented in the ETL documentation.
Since unmapped records will not be picked up in standard OHDSI +analytic workflows, this is an important check failure to understand. +Utilize the investigation queries above to understand the scope and +impact of the mapping failures on your specific analytic use case. If +none of the affected codes seem to be relevant for your analysis, it may +be acceptable to ignore the failure. However, since it is not always +possible to understand exactly what a given source value represents, you +should proceed with caution and confirm any findings with your ETL +provider if possible.
+In the case where the source concept ID column is populated with a +legitimate OMOP concept, it will be possible to query this column +instead of the standard concept column in your analyses. However, doing +so will require building source concept sets and as such losing the +power of the OMOP standard vocabularies in defining comprehensive, +generalizable cohort definitions.
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
vignettes/checks/withinVisitDates.Rmd
withinVisitDates.Rmd
Level: FIELD
Context: Verification
Category: Conformance
Subcategory:
Severity:
Level: FIELD
Context: Verification
Category: Conformance
Subcategory:
Severity: Characterization ✔
The number and percent of records not within one week on either side -of the corresponding visit occurrence start and end date
+The number and percent of records that occur more than one week before the
+corresponding visit_start_date
 or more than one week after the
+corresponding visit_end_date
visit_occurrence_id
)
visit_occurrence_id
foreign key in the event tables as “The
+visit during which the CONDITION_OCCURRENCE
, PROCEDURE_OCCURRENCE
,
+DRUG_EXPOSURE
, DEVICE_EXPOSURE
,
+MEASUREMENT
, NOTE
, OBSERVATION
,
+and VISIT_DETAIL
. It will check either the
+X_date
or X_start_date
fields for alignment
+with corresponding VISIT_OCCURRENCE
dates by linking on the
+visit_occurrence_id
. (Note: For
+VISIT_DETAIL it will check both the visit_detail_start_date and
+visit_detail_end_date. The default threshold for these two checks is
+1%.)
VISIT_DETAIL
+There is no explicit convention that describes how events should +align temporally with the visits they correspond to. This check is meant +to identify egregious mismatches in dates that could signify an +incorrect date field was used in the ETL or that the data should be used +with caution if there is no reason for the mismatch (history of a +condition, for example).
+If this check fails the first action should be to investigate the +failing rows for any patterns. The main query to find failing rows is +below:
SELECT
+ '@cdmTableName.@cdmFieldName' AS violating_field,
+ vo.visit_start_date, vo.visit_end_date, vo.person_id,
+ cdmTable.*
+FROM @cdmDatabaseSchema.@cdmTableName cdmTable
+JOIN @cdmDatabaseSchema.visit_occurrence vo
+ ON cdmTable.visit_occurrence_id = vo.visit_occurrence_id
+WHERE cdmTable.@cdmFieldName < dateadd(day, -7, vo.visit_start_date)
+ OR cdmTable.@cdmFieldName > dateadd(day, 7, vo.visit_end_date)
The first step is to investigate whether visit and event indeed +should be linked - e.g., do they belong to the same person; how far are +the dates apart; is it possible the event was recorded during the visit. +If they should be linked, then the next step is to investigate which of +the event date and visit date is accurate.
+One suggestion would be to identify if all of the failures are due to +many events all having the same date. In some institutions there is a +default date given to events in the case where a date is not given. +Should this be the problem, the first step should be to identify if +there is a different date field in the native data that can be used. If +not, considerations should be made to determine if the rows should be +dropped. Without a correct date it is challenging to use such events in +health outcomes research.
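For example, a sketch like the following (using the same placeholders as the main query above) counts violating rows per event date and makes such default dates stand out:
-- Sketch: frequency of event dates among violating rows, to spot default/placeholder dates
SELECT
  cdmTable.@cdmFieldName AS event_date,
  COUNT(*) AS record_count
FROM @cdmDatabaseSchema.@cdmTableName cdmTable
  JOIN @cdmDatabaseSchema.visit_occurrence vo ON cdmTable.visit_occurrence_id = vo.visit_occurrence_id
WHERE cdmTable.@cdmFieldName < dateadd(day, -7, vo.visit_start_date)
  OR cdmTable.@cdmFieldName > dateadd(day, 7, vo.visit_end_date)
GROUP BY cdmTable.@cdmFieldName
ORDER BY COUNT(*) DESC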
+Another reason for the discrepancy could be that the wrong date has +been used for events. For instance, in some systems a diagnosis could +have both an ‘observation date’ and an ‘administration date’. If the +physician is updating records at a later date, the administration date +can be later than the actual date the diagnosis was observed. In those +cases, the observation date has to be used. If there is only an +administration date, it is in some cases arguable to use the visit date +for the diagnosis date.
+Another suggestion would be to investigate if the failures are +related to ‘History of’ conditions. It is often the case that a +patient’s history is recorded during a visit, in which case they may +report a diagnosis date prior to the given visit. In some cases it may +be appropriate to conserve these records; the decision to do so will +depend on the reliability of the recorded date in your source data.
If the failure percentage of withinVisitDates is high, a data user +should be careful with using the data. This check might indicate a +larger underlying conformance issue with either the event dates or +linkage with visits. At the same time, there might be a valid reason why +events do not happen within 7 days of the linked visit.
+Make sure to understand why this check fails. Specifically, be +careful when using such data in outcomes research. Without specific +dates for an event, it is challenging to determine if an adverse event +occurred after a drug exposure, for example.
+Note that this check specifically compares event dates to
+VISIT_OCCURRENCE
dates. There is no equivalent check for
+VISIT_DETAIL
that verifies whether the event date is within
+(a week of) the visit detail start and end dates.
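If such a comparison is needed, an ad-hoc query along these lines (a sketch, not part of the package) can be run instead:
-- Sketch: events more than a week outside their linked visit_detail dates
SELECT
  '@cdmTableName.@cdmFieldName' AS violating_field,
  vd.visit_detail_start_date, vd.visit_detail_end_date,
  cdmTable.*
FROM @cdmDatabaseSchema.@cdmTableName cdmTable
  JOIN @cdmDatabaseSchema.visit_detail vd ON cdmTable.visit_detail_id = vd.visit_detail_id
WHERE cdmTable.@cdmFieldName < dateadd(day, -7, vd.visit_detail_start_date)
  OR cdmTable.@cdmFieldName > dateadd(day, 7, vd.visit_detail_end_date)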
Site built with pkgdown 2.0.7.
+Site built with pkgdown 2.0.9.
The description of the data quality check
A dataframe containing the table checks
A dataframe containing the field checks
Should the SQLs be executed (FALSE) or just returned (TRUE)?
A dataframe containing the table checks