Natural Language Processing for Healthcare

NLP is now making a difference

Humanity produces terabytes of text every single day, but what is it worth? Machines understand only numbers and bits, so Natural Language Processing techniques bridge the gap between human-written free text and machine-readable structured data.

Healthcare organizations have collected Electronic Health Records for years in free-text format. They are sitting on a huge amount of data but cannot extract value from it, because in this form it is useful neither for clinical practice nor for research.

Data from the past can feed future decision-making. Let’s make it a reality!

How we do it

Natural Language Processing (NLP) is the field of science that deals with modelling, processing and interpreting text written in natural language so that a computer can understand it.

We focus on extracting concepts such as diseases, treatments, drugs and examinations from medical texts.
We work with the Italian language, even though most current research is done on English.
For this task, called Named Entity Recognition (NER), we use state-of-the-art Machine Learning tools written in Python (spaCy/gensim), with both pre-trained language models and newly developed ones.
Our initial challenge is to create a labelled dataset in Italian for training new models, so that algorithms can infer concepts from the linguistic structure of the text.
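To make the idea of a labelled NER dataset concrete, here is a minimal sketch of one common representation: character-offset spans over the raw text, converted to per-token BIO tags. The span format, the label names (DISEASE, DRUG), the example sentence and the whitespace tokenizer are all assumptions made for illustration; they are not the project's actual annotation scheme, and a real pipeline would rely on a proper tokenizer such as spaCy's.

```python
from typing import List, Tuple

# Hypothetical annotation format: (start_char, end_char, label) over the raw text.
Span = Tuple[int, int, str]


def whitespace_tokens(text: str) -> List[Tuple[int, int, str]]:
    """Split on whitespace, keeping the character offsets of each token."""
    tokens = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        tokens.append((start, end, word))
        pos = end
    return tokens


def to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Tag each token B-<label>, I-<label> or O according to the annotated spans."""
    tagged = []
    for start, end, word in whitespace_tokens(text):
        tag = "O"
        for span_start, span_end, label in spans:
            # A token belongs to a span when it lies fully inside it.
            if start >= span_start and end <= span_end:
                tag = ("B-" if start == span_start else "I-") + label
                break
        tagged.append((word, tag))
    return tagged


# Illustrative Italian sentence with two hand-labelled spans (assumed labels).
text = "Paziente con diabete mellito in terapia con metformina"
spans = [(13, 28, "DISEASE"), (44, 54, "DRUG")]

for word, tag in to_bio(text, spans):
    print(f"{word}\t{tag}")
```

Token-level BIO tags are a format that many sequence-labelling tools can consume, which is why annotations made at character level are often converted this way before training.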

The dataset is created with open-source, web-based software tools that are easy to use and require no prior knowledge.
The new dataset will be integrated with the semantic database DBpedia and released as open source to the scientific community, to foster research and enable a higher level of accuracy in medical text processing.

How we engineer it for daily practice and healthcare analytics

Reports surveillance

To ensure that the information entered as free text matches the structured data, and to check for errors and inconsistencies
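A minimal sketch of such a consistency check is shown here, assuming that concepts have already been normalized to shared names. The lexicon, the function names and the example record are hypothetical, and the keyword lookup stands in for a trained NER model; English text is used only for readability.

```python
import re
from typing import Dict, List, Set

# Hypothetical lexicon mapping surface forms to normalized concepts.
LEXICON: Dict[str, str] = {
    "diabetes": "diabetes",
    "diabetic": "diabetes",
    "hypertension": "hypertension",
    "pneumonia": "pneumonia",
}


def concepts_in_text(text: str) -> Set[str]:
    """Keyword lookup standing in for a real entity-recognition model."""
    words = re.findall(r"[a-z]+", text.lower())
    return {LEXICON[w] for w in words if w in LEXICON}


def find_incongruences(report_text: str, structured: List[str]) -> Dict[str, List[str]]:
    """Report concepts that appear on only one side of the record."""
    in_text = concepts_in_text(report_text)
    coded = set(structured)
    return {
        "only_in_text": sorted(in_text - coded),
        "only_in_structured": sorted(coded - in_text),
    }


report = "Diabetic patient admitted with pneumonia."
structured_diagnoses = ["diabetes", "hypertension"]
print(find_incongruences(report, structured_diagnoses))
# {'only_in_text': ['pneumonia'], 'only_in_structured': ['hypertension']}
```

Reporting both directions of the mismatch separates concepts that may be missing from the structured fields from structured entries that the narrative does not support.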

ICD check

To guarantee that the ICD code associated with a clinical episode is consistent with its documentation
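As a rough illustration, an ICD check can compare the assigned code against the code categories expected for the concepts extracted from the text. The sketch below assumes ICD-10 categories and a tiny hand-written mapping; the ICD revision, the mapping table and the function name are all assumptions, and a production system would use a complete, curated terminology.

```python
from typing import Dict, List

# Illustrative mapping from extracted concepts to ICD-10 category prefixes.
# A toy table for demonstration only, not a clinical resource.
EXPECTED_ICD_PREFIX: Dict[str, str] = {
    "type 2 diabetes": "E11",
    "essential hypertension": "I10",
    "pneumonia": "J18",
}


def icd_is_consistent(extracted_concepts: List[str], assigned_icd: str) -> bool:
    """True if the assigned code falls under a category expected for any concept."""
    code = assigned_icd.upper().replace(".", "")
    return any(
        code.startswith(EXPECTED_ICD_PREFIX[concept])
        for concept in extracted_concepts
        if concept in EXPECTED_ICD_PREFIX
    )


print(icd_is_consistent(["pneumonia"], "J18.9"))  # True
print(icd_is_consistent(["pneumonia"], "E11.9"))  # False
```

Matching on the category prefix rather than the full code tolerates differences in sub-classification while still flagging codes from an unrelated category.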

Complications discovery

To discover unreported complications in surgery reports and suggest additions to the patient’s electronic medical record
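One simple way to picture this is keyword spotting with a naive negation filter, so that "no signs of infection" does not trigger a suggestion. The term list, the negation cues, the three-token window and the function names below are assumptions for illustration; real clinical negation handling is considerably more involved, and this sketch does not even respect sentence boundaries.

```python
import re
from typing import List, Set

# Hypothetical vocabularies; a real system would use curated terminologies.
COMPLICATION_TERMS = {"bleeding", "infection", "perforation"}
NEGATION_CUES = {"no", "without", "denies"}


def mentioned_complications(report: str, window: int = 3) -> Set[str]:
    """Return complication terms not preceded by a negation cue within `window` tokens."""
    tokens = re.findall(r"[a-z]+", report.lower())
    found = set()
    for i, token in enumerate(tokens):
        if token in COMPLICATION_TERMS:
            preceding = tokens[max(0, i - window):i]
            if not NEGATION_CUES.intersection(preceding):
                found.add(token)
    return found


def suggest_additions(report: str, recorded: List[str]) -> List[str]:
    """Complications affirmed in the report but missing from the structured record."""
    return sorted(mentioned_complications(report) - set(recorded))


report = (
    "Procedure completed. Intraoperative bleeding controlled with sutures. "
    "No signs of infection."
)
print(suggest_additions(report, recorded=[]))  # ['bleeding']
```

The output is a list of suggestions for a clinician to review, in line with the goal of proposing additions rather than changing the record automatically.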

Mining of legacy data

To unlock the value of the vast corpus of high-quality, unstructured free-text data held in legacy databases and systems

NLP for Clinical Surveillance

Real-time alerts on semantic inconsistencies