
Recently, I worked on a project as an information systems consultant. One of its tasks was to collect and analyze information on emergencies that can have a major impact, including on biosafety, such as outbreaks of infectious diseases, poisonings, and so on. As you know, the severity of the consequences depends on how quickly and accurately the relevant services react.
If you are short on time, you can go straight to the project site. You will find the link at the end of this post.
Where can I get the information I need?
It goes without saying that information plays a key role. It is important to get objective data as early as possible in order to analyze it and make an adequate decision. So the key points are the timeliness and accuracy of operational data. The question is where to get it. Most likely, such data is available in medical institutions.
Unfortunately, there is currently no way to collect such information from medical institutions in real time, since no information system providing this function has been implemented. What remains is the Internet and the professional contacts of the project staff.
Indeed, several employees monitor the Internet every day, searching for news by keyword in search engines (mainly Google).
Human factor
This seemed extremely inefficient to me for a number of reasons, primarily the human factor. It is hard for an average person to spend the whole day searching for many keywords, reading all the news, analyzing it and, where relevant, saving each item with a brief description. Such an employee would have to never get tired and be highly qualified to assess each event correctly on their own. By the way, I did not find clear criteria for unambiguously assigning an emergency to the required category. This means that each employee decides, at their own discretion, how important each news item returned by the search engine is.
Obviously, the effectiveness of such an approach falls short of what is required. As a result, the necessary information may not arrive in time, and the measures taken may not match the scale of the problem.
Technology comes to the rescue
Clearly, this is where technologies that eliminate the human factor as far as possible become useful, namely Artificial Intelligence (AI).
The first step was to divide the process of obtaining and analyzing information into stages, in order to know exactly where each kind of AI technology would be most appropriate. The range of keywords had to be expanded. For this purpose I used normative documents that give a detailed list of dangerous infectious diseases. Offline collection of information for these keywords was then configured. News aggregators and online news resources in each region were selected as sources. This gave about a hundred sources, including some from the territories of countries bordering the region.
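The keyword stage described above can be sketched roughly as follows. This is only a minimal illustration, not the project's actual code: the keyword list, the item layout (a dict with "title" and "body") and the function names are all assumptions made for the example.

```python
import re

# Hypothetical keyword list; in the project the real list is taken
# from normative documents on dangerous infectious diseases.
KEYWORDS = ["cholera", "anthrax", "measles", "botulism", "poisoning"]


def find_keywords(text, keywords=KEYWORDS):
    """Return the sorted keywords that occur in the text as whole words."""
    lowered = text.lower()
    return sorted(
        k for k in keywords
        if re.search(r"\b" + re.escape(k.lower()) + r"\b", lowered)
    )


def filter_items(items, keywords=KEYWORDS):
    """Keep only news items that mention at least one keyword.

    Each item is assumed to be a dict with 'title' and 'body' keys;
    the matched keywords are attached under a new 'keywords' key.
    """
    selected = []
    for item in items:
        hits = find_keywords(item["title"] + " " + item["body"], keywords)
        if hits:
            selected.append({**item, "keywords": hits})
    return selected
```

Plain whole-word matching like this misses inflected forms and synonyms, which is one reason the later stages rely on classification rather than on keywords alone.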
The Internet as a source of data
In Ukraine, Internet penetration already exceeds 75% and continues to grow quickly. This means that we will find information on the Web much faster than by waiting for official notifications. I have already listed the problems that prevent relevant information from being collected effectively in medical institutions. Therefore, today the Internet should be treated as an important source of information for primary monitoring.
At the next stage, the large amount of collected information had to be analyzed. At least two tasks had to be solved: classifying the text and summarizing the articles to remove unnecessary information.
About natural language processing
Text summarization is a natural language processing (NLP) task in which we try to produce a short summary of a document. Given the type of information being processed, there is no room for interpretation or creativity. The essence of the message must therefore be conveyed as accurately as possible.
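One way to summarize without adding any interpretation is extractive summarization, where the summary consists only of sentences copied from the original text. The sketch below is an assumption about how this could be done, not a description of the method used in the project: it scores sentences by average word frequency, and the stop-word list and function name are invented for the example.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not a real resource).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "was",
             "were", "on", "for", "at", "by", "with"}


def _tokens(text):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", text.strip())
                 if s.strip()]
    freq = Counter(_tokens(text))

    def score(sentence):
        # Average frequency of the sentence's words across the whole text.
        words = _tokens(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```

Because every output sentence is taken verbatim from the source, the summary cannot introduce facts that the article did not state.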
I did not find ready-made corpora on this topic in the public domain, so I had to build them myself. Once the database holds enough information, the neural network will be trained on that data; in the meantime, I had to form a synthetic dataset and train the neural network on it. For better validation, I also used topic modeling. The information is thus analyzed by two models: one based on topic modeling and one that classifies text with a neural network. If both give a positive result, the emergency is assigned a high degree of risk. If only one method indicates a biohazard, it is assigned medium risk. In all other cases, minimum risk is indicated. Thus, all the information that matches the search criteria is divided into three categories: high, medium and low risk.
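The rule for combining the two models can be written down in a few lines. The function and parameter names here are hypothetical; each flag simply stands for "this model considers the item a biohazard".

```python
def risk_level(topic_model_positive: bool, classifier_positive: bool) -> str:
    """Combine the two model verdicts into a risk category.

    Both positive -> "high", exactly one positive -> "medium",
    neither positive -> "low".
    """
    votes = int(topic_model_positive) + int(classifier_positive)
    return {2: "high", 1: "medium", 0: "low"}[votes]
```

For example, risk_level(True, False) gives "medium", because only one of the two methods flagged the item.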
Neural networks
In the initial stages of operation, this information will contain a lot of “noise”. This will continue until a sufficient base has been collected for training the neural network.
Neural networks are used to solve complex problems that require analytical computation similar to what the human brain performs. Text classification is one of their most common applications, and it is what we use to solve one of the tasks set.
At the current rate of filling the database, this is expected to take about 5-6 months. Until then, the participation of an operator is required, who will, if necessary, specify the category of the emergency. In addition, I made it possible to enter information about an emergency and its attributes manually.
Project participants and employees of medical institutions can enter information into the system regardless of where they are located. This means that, in addition to Internet monitoring, the system can obtain the most reliable data from specialized sources.
Results visualization
The last stage is visualization. The goal is to view events for a given period in specific areas as conveniently as possible, and this is best done on an interactive map. I will not describe the process in detail, since there are many visualization options and the choice is largely a matter of preference. In my opinion, what is “under the hood” matters more here than the picture. At a minimum, the system lets us identify other important objects that fall within an emergency's zone of influence. The size of that zone depends on the category the emergency belongs to.
In fact, there is no limit to perfection, and everything depends on the customer's requirements; in this case I was my own customer. Since there were no specific requirements, I had to design the system's behavior algorithms myself. I have probably gotten ahead of myself, though, and we still have to wait a while.
Well, while we are waiting, you can look at the working model of the system described above. To do so, follow the link: Public Health AI
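To illustrate the zone of influence, here is a minimal sketch that finds the objects lying within a given radius of an emergency. The radii per category, the data layout and the function names are assumptions for the example; the article does not give the real values. Distance is computed with the standard haversine formula.

```python
import math

# Hypothetical radii in kilometres for each risk category.
ZONE_RADIUS_KM = {"high": 50.0, "medium": 20.0, "low": 5.0}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def objects_in_zone(event, objects):
    """Return names of objects inside the event's zone of influence.

    'event' is assumed to be a dict with 'lat', 'lon' and 'risk' keys,
    each object a dict with 'name', 'lat' and 'lon' keys.
    """
    radius = ZONE_RADIUS_KM[event["risk"]]
    return [
        o["name"] for o in objects
        if haversine_km(event["lat"], event["lon"], o["lat"], o["lon"]) <= radius
    ]
```

A higher risk category simply widens the radius, so the same event location can pull in more schools, hospitals or other objects as its category is raised.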