Jan Sevcik, Co-Founder & CEO/CTO, Medical Search Technologies
More often than not, a company begins its journey with a simple idea, one that grows from mitigating a problem or two into a more structured form that caters to market needs. Medical Search Technologies (MST), a company that crunches data to offer actionable insights to healthcare providers, followed the same path. MST is the brainchild of Jan Sevcik, a technologist and a visionary. While working as CIO of a public company, he developed an illness and was not responding well to treatment. Looking for other options, he set out to find additional information that might improve his treatment. “I knew that there are databases like PubMed that hold a wealth of information in the form of medical journals. However, when one searches for a query, there are thousands of results that pop up, making it almost impossible to read all the articles,” says Sevcik. Drawing on his experience of building robust algorithms to solve specific challenges, he started a personal text-mining project on medical journal data. The task was to develop a search algorithm that could interpret the query and return results in a more intelligent, relational way, rather than simply matching the keywords of the query. What started as a project to improve his treatment resulted in a search algorithm that was able to find the relevant data, helping him restore his health in the process. Realizing the potential of the algorithm and the gap in the market for this kind of solution, Sevcik decided to establish MST.
The company offers ARE4, an AI-backed natural language processing (NLP) and information extraction (IE) technology for the healthcare industry that creates structured data from raw, unstructured text files of the kind often found in electronic medical records, pathology reports, PubMed, and many other medical data warehouses. ARE4 can process unstructured data such as patient progress notes and radiology reports and transform it into a usable data format. It can also tackle problems such as extracting information from handwritten physician prescriptions, which are often incomplete and hard to read, or retrieving the history of a patient population’s response to various therapies. ARE4 handles the many irregularities contained in raw data, such as headers, incomplete and run-on sentences, misspellings, and punctuation issues. Furthermore, the AI-backed platform goes beyond simple keyword search: a query for tumor size, for example, is answered quickly and returns granular details, enabling researchers, physicians, and scientists to gain insights that enhance precision medicine.
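To make the idea of turning free text into structured data concrete, here is a minimal sketch of pulling tumor-size measurements out of a report sentence. It is only an illustration built on regular expressions; the function name `extract_tumor_sizes`, the keyword list, and the output fields are assumptions made for this example and do not describe how ARE4 works internally.

```python
import re

# Hypothetical pattern: one or more numbers joined by "x", followed by cm or mm.
SIZE_PATTERN = re.compile(
    r"(?P<dims>\d+(?:\.\d+)?(?:\s*x\s*\d+(?:\.\d+)?)*)\s*(?P<unit>cm|mm)\b",
    re.IGNORECASE,
)

# Hypothetical keyword list used to decide whether a sentence is about a tumor.
TUMOR_WORDS = re.compile(r"\b(tumou?r|mass|lesion|nodule)\b", re.IGNORECASE)


def extract_tumor_sizes(text):
    """Return a list of {"dimensions": [...], "unit": ...} records."""
    records = []
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if not TUMOR_WORDS.search(sentence):
            continue  # skip sentences that do not mention a tumor-like term
        for match in SIZE_PATTERN.finditer(sentence):
            dims = [
                float(d)
                for d in re.split(r"\s*x\s*", match.group("dims"), flags=re.IGNORECASE)
            ]
            records.append({"dimensions": dims, "unit": match.group("unit").lower()})
    return records


report = (
    "Patient seen for follow up. "
    "The tumor measures 2.3 x 1.8 cm in the left lobe. "
    "Next visit in 6 months."
)
print(extract_tumor_sizes(report))
```

A keyword search for "tumor size" would only locate the sentence; the sketch returns the measurement itself as a record that can be filtered or aggregated.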
“What really sets us apart is that our solution is a combination of traditional NLP and machine learning versus just pure machine learning, which helps us deal with the messy sentence structures,” points out Sevcik.
The company has invested time and effort in training its solution on millions of medical records, which helps the AI understand the complexity inherent in unstructured medical text. Once the data is fed into the algorithm, it resolves abbreviations, issues with named entities such as genes, and much more that is not found in typical lay text. Consider an analogy: to predict who will be the next President based on tweets, one would run Twitter sentiment analysis. The first step is to remove the special characters (@, #, &, $) from each tweet so that the data is clean and the machine can process the text. ARE4 performs the equivalent step for medical text; without any manual labor, it cleans and interprets the text. The next step, once the text is processed, is to connect the dots, that is, to establish relationships within the data. Here, instead of a typical relational SQL system, MST uses a graph database. A relational database stores data in tables, which works well, but as the number of tables multiplies, the growing number of keys and joins (which are expensive) lengthens the time taken to process queries. A graph database, on the other hand, stores the relationships along with the data. Each node is directly linked to its related nodes in the database, which makes traversing relationships fast. These connected relationships are expressed using the Resource Description Framework (RDF) and stored in a triplestore, a specialized database for storing “triples” (data entities composed of a subject, a predicate, and an object), to enable lightning-fast and accurate search results.
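The subject, predicate, object structure can be illustrated with a small in-memory sketch. This is not an RDF library or a production triplestore, and it says nothing about MST’s actual schema; the class `TinyTripleStore`, the identifiers such as `patient:1`, and the predicate names are all hypothetical and chosen only for the example.

```python
from collections import defaultdict


class TinyTripleStore:
    """Toy in-memory store for (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()
        # Indexes so lookups by subject or object avoid scanning every triple.
        self.by_subject = defaultdict(set)
        self.by_object = defaultdict(set)

    def add(self, subject, predicate, obj):
        triple = (subject, predicate, obj)
        self.triples.add(triple)
        self.by_subject[subject].add(triple)
        self.by_object[obj].add(triple)

    def match(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the given fields; None is a wildcard."""
        if subject is not None:
            candidates = self.by_subject.get(subject, set())
        elif obj is not None:
            candidates = self.by_object.get(obj, set())
        else:
            candidates = self.triples
        return sorted(
            t for t in candidates
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


# Hypothetical facts, as they might look after extraction from reports.
store = TinyTripleStore()
store.add("patient:1", "has_diagnosis", "lung_adenocarcinoma")
store.add("patient:1", "received_therapy", "drug:A")
store.add("patient:2", "received_therapy", "drug:A")
store.add("patient:2", "responded_to", "drug:A")

# Which patients received drug:A?
treated = store.match(predicate="received_therapy", obj="drug:A")
print([s for s, _, _ in treated])
```

Because each relationship is stored as its own record and indexed by its endpoints, a question such as "who received this therapy" is a direct lookup rather than a join across several tables.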
The icing on the cake is the Confidence Scoring Engine (CSE) layer in the ARE4 solution. The CSE uses a machine-learning algorithm to analyze the physician’s level of confidence in each aspect of a report. For example, a phrase like ‘for sure’ in a report results in a confirmed yes and a high confidence score, whereas phrases like ‘maybe’ or ‘most likely’ are scored low by the algorithm. Going the extra mile with the underlying technology gives users a robust system that enables them to unlock the potential of unstructured data.
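The scoring behavior described above can be approximated with a simple sketch. Note the substitution: the CSE is described as using a machine-learning algorithm, whereas this example uses a hand-written cue lexicon purely to show the input and output. The cue weights, the default score, and the function name `confidence_score` are assumptions for illustration only.

```python
import re

# Hypothetical cue weights (1.0 = fully confident). Not taken from ARE4.
CUE_SCORES = {
    "for sure": 1.0,
    "most likely": 0.4,
    "maybe": 0.3,
}

# Hypothetical score for statements that contain no known cue.
DEFAULT_SCORE = 0.5


def confidence_score(statement):
    """Score a statement by its most cautious hedging cue."""
    text = statement.lower()
    found = [
        score
        for cue, score in CUE_SCORES.items()
        if re.search(r"\b" + re.escape(cue) + r"\b", text)
    ]
    # If several cues appear, let the least confident one decide.
    return min(found) if found else DEFAULT_SCORE


for sentence in [
    "This is for sure a malignant lesion.",
    "Maybe a small nodule in the right lung.",
    "Findings are most likely benign.",
]:
    print(sentence, "->", confidence_score(sentence))
```

Taking the minimum when several cues appear is a deliberately conservative choice for this sketch; a learned model could weigh context instead of relying on fixed phrases.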
At a time when technological advancement is necessary to stay ahead of the pack, MST continually works on enhancements, be it to its data ingestion or its analytics solution. Among the projects in the workshop, one the company is keenly working on is Adverse Event Reporting, which receives, tracks, and mines a patient’s prescribed medication and analytically arrives at a conclusion as to what needs to be reported and what is irrelevant. Although the big picture revolves around sophisticated search, the company, with its various add-ons, is set to disrupt many systems and processes currently used in the healthcare industry.