Healthcare has a long history of attempts to collect data on the frequency of different conditions and causes of death. A major difficulty is fitting clinical conditions, which can be complex, into logical categories that are useful for large-scale population data while still describing those conditions adequately. Clinical coding is one way to address this.
Clinical coding uses a consistent set of terms to describe different conditions. For example, the haematological malignancy multiple myeloma is also called plasma cell myeloma, myeloma, myelomatosis or Kahler’s disease, and has foreign-language variants as well. Furthermore, a lay reader may confuse this condition with myeloid leukaemia, myelomalacia or myelodysplastic syndrome, which are different conditions entirely. Recording these terms accurately, and tracking which of them are interchangeable, is a challenge for computers, because computers do not understand the context of the terms or why they refer to different conditions.
Clinical coding gets around this problem. Instead of being recorded as multiple myeloma, the condition is given the code C90.0 in ICD-10, one of a number of clinical coding systems. Using a single code to describe a condition allows statistics from different sites, which may use different terminology, to be compared and compiled. Coding data consistently allows large datasets to be created from multiple sites, so that epidemiological data can be collected and analysed. This is important for public health, as it allows conditions to be monitored accurately and policies to be developed to reduce their incidence. A further benefit of coding in this context is that the data are computer readable, meaning they can be processed by data analysis pipelines.
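As a rough illustration of how a single code lets differently worded records be compiled, the following Python sketch maps the synonyms mentioned earlier to C90.0 and counts the coded diagnoses from two sites. The lookup table, the function names and the site records are all hypothetical and exist only for this example; a real system would draw on a maintained terminology rather than a hand-written dictionary.

```python
from collections import Counter
from typing import Optional

# Hypothetical synonym table: every free-text variant maps to one ICD-10 code.
# Only the code C90.0 (multiple myeloma) comes from the text; the table is illustrative.
SYNONYM_TO_CODE = {
    "multiple myeloma": "C90.0",
    "plasma cell myeloma": "C90.0",
    "myeloma": "C90.0",
    "myelomatosis": "C90.0",
    "kahler's disease": "C90.0",
}


def normalise(term: str) -> str:
    """Trim, lower-case, and replace typographic apostrophes with plain ones."""
    return term.strip().lower().replace("\u2019", "'")


def code_for(term: str) -> Optional[str]:
    """Return the code for a recorded term, or None if the term is not in the table."""
    return SYNONYM_TO_CODE.get(normalise(term))


def count_codes(records: list[str]) -> Counter:
    """Count coded diagnoses across records, skipping unrecognised terms."""
    codes = (code_for(record) for record in records)
    return Counter(code for code in codes if code is not None)


# Made-up records from two hypothetical sites that use different terminology.
site_a = ["Multiple myeloma", "Myelomatosis"]
site_b = ["Kahler\u2019s disease", "myelomalacia"]

totals = count_codes(site_a + site_b)
print(totals)  # Counter({'C90.0': 3})
```

The three differently named myeloma records are counted under one code, while myelomalacia, a different condition with a similar name, is not matched. The computer gains no understanding of the terms; the context is supplied entirely by the table a human has written.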
The main issues with clinical coding arise from the fact that computers and humans assimilate information differently. Humans read information and form meaning from it in the context of the other information they are reading, so a number read alongside a street name is inferred to be an address. Computers do not assimilate information this way. They only store and process data, which is information without any context. The only context a computer has for a piece of information is what a human has programmed the software to do with it.
This difference in how humans and computers process information means that clinical coding systems that are logical for humans may not be logical for computers, and vice versa. The SNOMED clinical coding system attempts to assign a code to each treatment, condition, investigation or object in a healthcare setting. This means that there are codes for concepts that are never encountered. For example, “Judicial Execution by Guillotine” (something that hasn’t taken place for over 40 years) has the code 23791009, but there is no code for post radiation thyroiditis.
ICD-10 takes an alternative approach and builds codes from components, so that a code contains descriptors of relevant information that can be combined to describe the clinical condition. Unfortunately, this means the number of possible codes can expand massively, often with illogical outcomes. For example, the code V31.22 describes the incident “Occupant of three-wheeled motor vehicle injured in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income”, which sounds silly but is nevertheless a valid clinical code.
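To make the idea of a code built from components more concrete, the Python sketch below splits a code into its leading letter, its three-character category and any subdivision after the decimal point. The pattern is a deliberate simplification assumed for this example (one letter, two digits, optional digits after a point); actual ICD-10 editions and national variants permit other forms, and the names CODE_PATTERN, CodeParts and split_code are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern assumed for illustration only:
# one letter, two digits, then an optional decimal subdivision.
CODE_PATTERN = re.compile(r"^([A-Z])(\d{2})(?:\.(\d+))?$")


class CodeParts(NamedTuple):
    letter: str                 # leading letter of the code
    category: str               # letter plus two digits, e.g. "V31"
    subdivision: Optional[str]  # digits after the point, if any


def split_code(code: str) -> CodeParts:
    """Split a code into its components, or raise ValueError if it does not fit."""
    match = CODE_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not in the simplified pattern: {code!r}")
    letter, digits, subdivision = match.groups()
    return CodeParts(letter, letter + digits, subdivision)


print(split_code("V31.22"))
print(split_code("C90.0"))
```

Under this simplified pattern, one digit after the point allows 10 subdivisions per category and two digits allow 100, which gives a sense of how quickly the number of possible codes grows as components are added.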
Ultimately, no clinical coding system is perfect. The key is consistency in the coding system used, so that data can be collected from multiple sites and used for the benefit of public health services as a whole.