ANALYTICS FOR THE HEALTHCARE ENVIRONMENT

 

TEXTUAL ETL – PROCESSING MEDICAL RECORDS

Doctors have always communicated their thoughts, prescriptions and diagnoses in the form of text, even in electronic medical records (EMR).

The problem with narrative text is that it cannot be easily analyzed and understood by a computer.

For a computer to analyze text easily, the text must first be read and transformed into a standard data base format.

Healthinfomap uses a proprietary technology known as “textual ETL”. Textual ETL reads medical records in a narrative format and transforms those records into a standard data base format. A unique feature of textual ETL is its ability to find and identify the context of text. As a result, healthinfomap stores both the text and its context in its data bases. For all practical purposes, textual ETL “normalizes” text.

Textual ETL also takes the “normalized” text produced by reading the medical document and restructures it into a relational data base in a form the analyst can recognize. This makes it much easier for the analyst to understand the data quickly and to use it analytically inside a computer.
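
The idea of pairing text with its context and loading the result into a relational table can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not healthinfomap's proprietary method: the pattern list, the table name text_context, and the function names extract_rows and load_rows are all hypothetical, and simple keyword matching stands in for the real context-identification algorithms.

```python
import re
import sqlite3

# Hypothetical context patterns, for illustration only.
# They are NOT the rules used by textual ETL.
CONTEXT_PATTERNS = {
    "medication": re.compile(r"\b(aspirin|metformin|lisinopril)\b", re.IGNORECASE),
    "diagnosis": re.compile(r"\b(hypertension|diabetes|asthma)\b", re.IGNORECASE),
}

def extract_rows(record_id, narrative):
    """Return (record_id, char_pos, term, context) tuples, ordered by position."""
    rows = []
    for context, pattern in CONTEXT_PATTERNS.items():
        for match in pattern.finditer(narrative):
            rows.append((record_id, match.start(), match.group(0).lower(), context))
    return sorted(rows, key=lambda row: row[1])

def load_rows(conn, rows):
    """Store the extracted terms and their context in a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS text_context ("
        "record_id TEXT, char_pos INTEGER, term TEXT, context TEXT)"
    )
    conn.executemany("INSERT INTO text_context VALUES (?, ?, ?, ?)", rows)
    conn.commit()

# A made-up narrative note, used only to exercise the sketch.
note = "Patient has hypertension and diabetes. Started on Lisinopril and metformin."
conn = sqlite3.connect(":memory:")
load_rows(conn, extract_rows("rec-001", note))

# Once the text is in a table, the analyst can query it like any other data.
for term, context in conn.execute(
    "SELECT term, context FROM text_context ORDER BY char_pos"
):
    print(term, context)
```

Each row keeps the term together with the context in which it was found, which is what allows an ordinary SQL query to answer questions such as "which records mention a diagnosis of diabetes".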

Textual ETL can:

  • Operate on any form of text – formal, slang, doctor’s notes, etc.;

  • Operate in many languages – English, Spanish, Portuguese, and many others;

  • Operate on output from OCR and voice transcription;

  • Produce output in any standard DBMS – Oracle, Teradata, SQL Server, DB2, Netezza, Hadoop, and others;

  • Operate in a parallel manner, so that the output is not constrained by the capacity of a machine.

Textual ETL uses 67 (and counting) internal algorithms to read text, interpret it, and identify its context. These algorithms can be controlled dynamically by the operator and by the software itself.

Textual ETL makes use of externally created taxonomies. In addition, textual ETL has its own set of taxonomies and a suite of tools for building and maintaining custom-built taxonomies.
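
To make the role of a taxonomy concrete, the sketch below shows one simple way a taxonomy could be used to classify terms found in a note. It is a minimal illustration under stated assumptions: the TAXONOMY entries and the function names build_index and classify are hypothetical, and the sketch does not represent the taxonomies or tools that ship with textual ETL.

```python
# Hypothetical taxonomy mapping a category to its terms. Real taxonomies
# would be externally created or custom built; these are illustrative only.
TAXONOMY = {
    "cardiovascular": {"hypertension", "angina", "arrhythmia"},
    "endocrine": {"diabetes", "hypothyroidism"},
}

def build_index(taxonomy):
    """Invert the taxonomy so that each term maps to its category."""
    return {term: category for category, terms in taxonomy.items() for term in terms}

def classify(text, index):
    """Return (term, category) pairs for known terms, in order of appearance."""
    words = [word.strip(".,;:()").lower() for word in text.split()]
    return [(word, index[word]) for word in words if word in index]

index = build_index(TAXONOMY)
print(classify("History of hypertension; new onset diabetes.", index))
```

Keeping the taxonomy as data rather than code means that terms can be added or a custom taxonomy swapped in without touching the classification logic.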


Bill Inmon, Chief Data Structuring Officer, healthinfomap

Bill is a serial entrepreneur, writer, and computer scientist, recognized by many as the father of the data warehouse. In July 2007, Computerworld named Inmon one of the ten people who most influenced the first 40 years of the computer industry. In 1991 he founded Prism Solutions, which he took public. In 1995 he founded Pine Cone Systems (later renamed Ambeo). Inmon created the Government Information Factory as well as Data Warehousing 2.0. More recently, Inmon has developed the technology for including unstructured textual data in the data warehouse: the world's first “textual ETL” and “textual disambiguation”.