Ingredio application

Enhancing the food & cosmetics OpenAIRE Research Graph for consumer health

The Ingredio application is a natural language processing (NLP) application that offers a pipeline of three services for biomedical text. It can classify biomedical text based on certain features of its content, extract compound names, and infer causal relations from the text. It is experimental and is not meant to replace human curation. Its main use is to showcase how such a pipeline can serve as high-throughput, high-precision language filtering software for large-scale biomedical data. The codebase of the application can be found here.

Usage

While each stage can be used independently, the application is designed so that its three stages (Classification, Entity Extraction and Causality Inference) are used in sequence. Each stage involves submitting text. If the query returns results, the text is forwarded to the next stage, and all the information required for that stage's submission is filled in.
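The sequential flow described above can be sketched in a few lines of Python. This is only an illustration, not the application's code: the function `run_pipeline` and the three callables `classify`, `extract` and `infer` are hypothetical stand-ins for the three stages, and their signatures are assumptions.

```python
from typing import Callable, List, Optional, Tuple


def run_pipeline(
    text: str,
    adverse_effects: List[str],
    classify: Callable[[str], bool],
    extract: Callable[[str], List[str]],
    infer: Callable[[str, List[str], List[str]], List[Tuple[str, str]]],
) -> Optional[List[Tuple[str, str]]]:
    """Run the three stages in order; stop as soon as a stage returns nothing.

    All three callables are hypothetical placeholders for the real stages.
    """
    # Stage 1: Classification. Irrelevant text is not forwarded.
    if not classify(text):
        return None

    # Stage 2: Entity Extraction. Without compounds there is nothing to forward.
    compounds = extract(text)
    if not compounds:
        return None

    # Stage 3: Causality Inference, fed with the text and the extracted compounds.
    return infer(text, compounds, adverse_effects)
```

Passing the stages in as callables keeps the sketch independent of any particular model, which also mirrors the fact that each stage can be used on its own.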

Classification

The classification stage uses several independently trained machine learning models to classify biomedical text according to its relevance to the toxicity of compounds found in foods and cosmetics. It combines four different ML algorithms to reach a consensus on the classification of the text.
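How the four models reach a consensus is not specified here. One common option is a vote with an agreement threshold, sketched below. The function `consensus` and its `min_agree` parameter are hypothetical, and the default of three out of four is an assumption chosen only to avoid 2-2 ties.

```python
from typing import Sequence


def consensus(votes: Sequence[bool], min_agree: int = 3) -> bool:
    """Return True when at least `min_agree` models label the text as relevant.

    `votes` holds one boolean per model (True means "relevant").
    The threshold is an assumption, not the application's documented rule.
    """
    return sum(votes) >= min_agree
```

For example, `consensus([True, True, True, False])` accepts the text, while a 2-2 split is rejected under the default threshold.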

Extract compound names from biomedical text

The entity extraction stage finds names of chemical compounds in biomedical text by classifying stretches of characters as entities that denote compound names. This stage is based on the BERT model. As in the first stage, the user enters the text in the text area input field and submits it to the model.

If compound names are identified:

The compounds are shown in the output below the input field.
The text is fed into the next stage text area input field.
The compound names are fed into the compounds field of the next stage.
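The hand-off in the list above could look roughly like the following sketch. It assumes the model's predictions are available as `(start, end)` character offsets into the input text. The function name `forward_to_causality` and the field names `text` and `compounds` are hypothetical.

```python
from typing import Dict, List, Tuple


def forward_to_causality(text: str, spans: List[Tuple[int, int]]) -> Dict[str, object]:
    """Build the next stage's form fields from predicted character spans.

    `spans` is assumed to hold (start, end) offsets into `text`, end exclusive.
    """
    names: List[str] = []
    for start, end in spans:
        name = text[start:end]
        # Keep the first occurrence of each name and drop exact repeats.
        if name and name not in names:
            names.append(name)

    if not names:
        # No compound names identified, so nothing is forwarded.
        return {}

    return {"text": text, "compounds": names}
```

An empty result signals that the text should not be passed on, matching the condition that forwarding happens only if compound names are identified.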

Causality Inference

The Causality Inference stage is able to determine if compounds found in a biomedical text cause any adverse effect defined by the user. This stage is also based on the BERT model. There are three inputs in this stage:

The input text.
A list of compound names.
A list of adverse effects.

The lists of compound names and adverse effects can be filled in automatically by running the second stage. The user can add entries to the adverse effects list or remove entries from it.
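One way to picture how the three inputs fit together is to enumerate every compound and adverse effect combination that the model would be asked about for a given text. Whether the application actually queries the model pair by pair is an assumption, and `candidate_pairs` is a hypothetical helper.

```python
from itertools import product
from typing import List, Tuple


def candidate_pairs(compounds: List[str], effects: List[str]) -> List[Tuple[str, str]]:
    """Pair every compound with every adverse effect, preserving input order."""
    return list(product(compounds, effects))
```

Adding or removing an adverse effect then simply grows or shrinks the set of pairs to check against the input text.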