


Analysis and Strategy of Tools to Improve Remote Interactions and Document Management

CERSI Collaborators: Kunpeng Zhang, Jim Polli, Bill Bentley, Keiasia Robinson

FDA Collaborators: Ralph Bernstein, Mihir Jaiswal

Project Start: January 27, 2023

Regulatory Science Challenge

Although the COVID-19 pandemic made in-person interactions difficult, FDA needed to continue its mission of approving drug applications and ensuring the quality and availability of drugs in the U.S. market. Inspections and meetings with the pharmaceutical industry and other stakeholders are essential for FDA to assess and surveil drug products and manufacturing facilities. FDA and CDER regularly employ remote interactions to supplement these in-person operations and inform regulatory decision-making. Remote interactions include audio and video connections, as well as both internal and external document sharing.

The COVID-19 pandemic prompted FDA to expand the use of remote interactions to continue engaging with industry and stakeholders. To conduct remote interactive evaluations (RIEs), such as remote livestreaming video of operations, teleconferences, screen sharing, and other remote interactions, FDA has been using non-integrated technologies without a common interface; these components are not able to communicate with one another. FDA needs better strategies, tools, and technologies to improve the quality of remote and post-remote interaction activities. FDA will use artificial intelligence (AI) to improve four identified areas that support remote interactions, specifically RIEs: transcription, translation, document and evidence management, and co-working spaces.

FDA uses the term AI to describe a branch of computer science, statistics, and engineering that uses algorithms or models to perform tasks and exhibit behaviors such as learning, making decisions, and making predictions. FDA has been developing and adopting many tools that utilize AI. These tools include language-based approaches such as natural language processing and the development of MedDRA encoders for patient narratives, as well as machine learning image analysis techniques to help ensure and predict drug quality. However, those tools and technologies may have limitations when applied to contexts specific to FDA’s scientific and regulatory activities. To overcome this problem, the project uses transfer learning, which teaches computers to recognize patterns and features in one setting and apply them to a new setting, with the help of human feedback. This approach will be used to build one or more AI models that can organize and understand the different types of regulatory and scientific documents and evidence used in remote interactions, such as audio, video, images, and documents, all in one place.
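As an illustrative sketch of transfer learning (not the project's actual system), the example below adapts a general-purpose pretrained language model to a new, domain-specific task. The base model and document categories are hypothetical placeholders.

```python
# Illustrative transfer-learning sketch (not FDA's actual system).
# A general-purpose pretrained language model is adapted to a new,
# domain-specific task: classifying regulatory documents into
# hypothetical categories.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

labels = ["inspection_report", "meeting_minutes", "drug_application"]  # hypothetical

# The pretrained model's general language knowledge transfers to the new
# setting; only the small classification head starts untrained.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(labels)
)

# Expert-labeled examples from the new setting supply the human feedback
# that guides fine-tuning (the training loop itself is omitted here).
encoded = tokenizer("Summary of a remote livestream of a packaging line.",
                    return_tensors="pt")
print(model(**encoded).logits.shape)  # one score per hypothetical category
```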

Project Description and Goals

This project aims to improve four major tools used in remote interactions, as identified by FDA. These include transcription, translation, document and evidence management, and co-working spaces.

Automatic speech recognition has been widely used in many applications. The current state of the art is a transformer-based sequence-to-sequence (seq2seq) model, which is trained to generate transcripts autoregressively and can be fine-tuned on specific datasets. Researchers typically use seq2seq models in machine translation to convert a sequence of text from one language to another: the original text is first encoded into a representation a computer can work with, and the model then decodes that representation to generate the text in the target language. It works much like an interpreter who listens to someone speak in one language and then repeats what was said in another.
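The brief sketch below shows how a transformer-based seq2seq speech model can be used for transcription and speech translation; it uses the open-source Whisper model as a stand-in for the technology described above, and the audio file name is hypothetical.

```python
# Sketch of transformer-based seq2seq speech recognition, using the
# open-source Whisper model as a stand-in; the file name is hypothetical.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# Transcription: the encoder turns audio into an internal numerical
# representation, and the decoder generates the transcript token by token.
print(asr("remote_inspection_clip.wav")["text"])

# Speech translation: the same seq2seq model can decode into English
# regardless of the language being spoken.
print(asr("remote_inspection_clip.wav",
          generate_kwargs={"task": "translate"})["text"])
```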

Using pre-trained language models directly, however, may not be suitable for remote interaction tools because the models may not work properly with different accents and specialized regulatory and scientific terminology. This is because the models are trained on a specific type of data and may not be able to handle data that differs significantly from what they were trained on. To address this, researchers plan to manually transcribe a set of video/audio recordings to obtain ground-truth transcripts. Researchers will then fine-tune the model so that it adapts to this new domain.
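A minimal sketch of this domain-adaptation step is shown below, assuming a Whisper-style pretrained seq2seq model and the open-source Hugging Face Transformers library; the recordings, transcripts, and training settings are placeholders rather than the project's actual configuration.

```python
# Sketch of fine-tuning a pretrained seq2seq speech model on expert
# transcripts; model choice, paths, and settings are hypothetical.
from transformers import (
    WhisperForConditionalGeneration,
    WhisperProcessor,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

def preprocess(example):
    # Pair each recording with its manually prepared "true" transcript.
    audio = example["audio"]
    features = processor(audio["array"], sampling_rate=audio["sampling_rate"])
    example["input_features"] = features.input_features[0]
    example["labels"] = processor.tokenizer(example["transcript"]).input_ids
    return example

# train_dataset would be built from the expert-transcribed recordings
# (data loading and collation are omitted from this sketch).
args = Seq2SeqTrainingArguments(
    output_dir="whisper-domain-adapted",  # hypothetical output location
    per_device_train_batch_size=8,
    learning_rate=1e-5,
    max_steps=1000,
)
# trainer = Seq2SeqTrainer(model=model, args=args, train_dataset=train_dataset)
# trainer.train()
```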

Additionally, it is not appropriate to directly apply existing pre-trained seq2seq models in this FDA setting because (a) some languages used in the FDA context might not exist in current models; and (b) domain-specific terms used at FDA are very different from general colloquial language. To tackle these challenges, the project trains new models for some specialized languages and fine-tunes pre-trained models for major/common languages. For both situations, researchers prepare high-quality training sets labeled by experts. The University of Maryland CERSI (M-CERSI) plans to build a system to manage different documents and evidence by implementing three sub-systems: (a) a document classifier, (b) a video/audio classifier, and (c) an interactive middleware layer that connects the trained models at the backend with the input at the frontend. With this, all documents created during co-working can be shared and accessed by all participants.
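The sketch below illustrates how these three sub-systems could fit together; the classifier functions are placeholders for the trained models, and all names are hypothetical.

```python
# Structural sketch of the three sub-systems; the classifiers are
# placeholders for trained models, and all names are hypothetical.
import mimetypes

def classify_document(path: str) -> str:
    """Sub-system (a): document classifier (placeholder for a trained model)."""
    return "document: unlabeled"

def classify_audio_video(path: str) -> str:
    """Sub-system (b): video/audio classifier (placeholder for a trained model)."""
    return "audio/video: unlabeled"

def middleware_route(path: str) -> str:
    """Sub-system (c): middleware connecting frontend input to the appropriate
    backend model, so results can be shared in the co-working space."""
    media_type, _ = mimetypes.guess_type(path)
    if media_type and media_type.startswith(("audio/", "video/")):
        return classify_audio_video(path)
    return classify_document(path)

# Example: evidence shared during a remote interactive evaluation.
for item in ["inspection_notes.pdf", "line_walkthrough.mp4"]:
    print(item, "->", middleware_route(item))
```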
