Macrogen: Genetic Counseling Chatbot (GenTok AI; chatGENE AI)

Tag: Macrogen, LLM, RAG, ENG
Tech Stack: OpenAI, Python, Streamlit, LangChain, ChromaDB, AWS

🧑 Example of AX from the Customer Perspective: Genetic Counseling Chatbot (GenTok AI; chatGENE AI)

Executive Summary

With the rapid growth of direct-to-consumer (DTC) genetic testing, services like GenTok AI and chatGENE AI deliver genetic information and test results directly to customers. This expansion, however, has created new challenges: a dramatic increase in counselor workload, the need for specialized knowledge to interpret complex results, and difficulties in searching for information within lengthy PDF reports. To address these challenges, I designed and deployed an LLM-based genetic counseling chatbot that uses Retrieval-Augmented Generation (RAG) to answer customers' follow-up questions about their test results.

Key Contributions

  • A. Problem Identification and Project Planning: Identified the core challenge of scaling genetic counseling services to meet rising DTC demand and led the overall project planning and strategy.
  • B. LLM Chatbot Development: Designed and developed a domain-specific chatbot using the OpenAI API, ensuring accurate, accessible, and domain-restricted explanations of genetic test results.
  • C. Web Interface Implementation: Built an interactive web application using Streamlit, enabling seamless user interaction and rapid deployment.
  • D. Cloud Deployment and Infrastructure: Deployed the solution on AWS, utilizing AMI, ASG, and ALB for scalability, reliability, and maintainability, while also managing cloud security and monitoring.

Achievements

  • Successfully launched the AI service, enabling customers to ask follow-up questions about their genetic test results and receive clear, domain-accurate answers.
  • Significantly reduced the workload on human counselors through automation.
  • Improved overall customer experience and satisfaction.
  • Established a scalable solution essential for maintaining service quality as the business expanded into the DTC market.

Introduction, Problem, and Goal

Introduction

Advancements in sequencing and array technologies have led to a consistent decrease in the cost of genetic testing each year. As genetic testing becomes more affordable and our understanding of genetics deepens, the concept of utilizing genetic information to achieve personalized medicine and healthcare is rapidly gaining traction. This trend has fueled the growth of direct-to-consumer (DTC) genetic testing, which is now expanding beyond traditional clinical DNA testing. At Macrogen, we provide genetic counseling support for individuals who have undergone genetic testing. However, the increasing popularity of DTC genetic testing has introduced new challenges, particularly in managing the growing demand for counseling and ensuring that customers can understand the often complex information contained in their genetic test results.

Problem

  • The surge in DTC genetic testing has significantly increased the workload for genetic counselors, making it difficult to provide timely and high-quality support to all clients.
  • Genetic test result reports often contain complex information that requires specialized knowledge to interpret, which many customers find difficult to understand without expert guidance.

Goal

  • To automate and enhance the genetic counseling process using Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) technologies, thereby reducing the burden on human counselors.
  • To provide clear, accurate, and accessible explanations of genetic test results, empowering customers to understand their genetic information regardless of their background knowledge.
  • To maintain or improve the quality and accuracy of information provided to customers while scaling counseling services to meet increasing demand.

Technical Overview

  • Large Language Models (LLMs)
    • OpenAI GPT models (primary, production, via API)
    • Llama (evaluated at PoC stage)
    • Gemma (evaluated at PoC stage)
  • Frameworks
    • Streamlit
  • Data Processing & Storage
    • LangChain (for document embedding and retrieval, initial phase)
    • ChromaDB (vector database, initial phase)
      • Parquet format (for data storage in ChromaDB)
    • OpenAI native file search (current phase)
  • Cloud Infrastructure (AWS)
    • AWS AMI (Amazon Machine Image, for easy updates)
    • AWS ASG (Auto Scaling Group, for scalability)
    • AWS ALB (Application Load Balancer, for scalability and routing)
    • AWS WAF (Web Application Firewall, for security)
    • AWS CloudWatch (for logging and monitoring)

Problem-Solving in Action: Insights from Overcoming Project Hurdles

1. Framework Selection and Frontend Development

Problem:
My limited experience with frontend development made it difficult to quickly build an interactive web application for LLM-based services. I needed a solution that would allow me to leverage my Python expertise without getting bogged down by frontend complexities.
How I Solved It:
I systematically evaluated several frameworks—Streamlit, Dash, and Gradio—by building small prototypes with each. Through this process, I identified that Streamlit provided the most seamless integration with Python and the fastest path to a functional, interactive interface. This allowed me to focus my efforts on LLM integration and backend logic, rather than spending excessive time on UI development.

2. Ensuring Domain Accuracy and Restriction in LLMs

Problem:
In the biology and healthcare domain, it is critical that LLM outputs are both accurate and restricted to the appropriate context. LLMs can sometimes generate off-topic or imprecise responses, which is unacceptable in this field.
How I Solved It:
I implemented a Retrieval-Augmented Generation (RAG) approach, embedding user test results and internal documentation directly into the model’s context. I also invested significant effort in prompt engineering, iteratively refining prompts to guide the model’s responses. To further safeguard accuracy, I considered adding a secondary “verdict” model (an LLM-as-a-judge) to validate outputs before they reach the user. This multi-layered approach kept the LLM’s responses both accurate and domain-specific.
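In spirit, the RAG pipeline above works like the following minimal sketch: retrieve the report sections most relevant to the question, then embed them in a domain-restricted prompt. Everything here is simplified for illustration: retrieval is naive keyword overlap rather than the embedding-based search used in production (LangChain + ChromaDB, later OpenAI's native file search), and the example report chunks and prompt wording are hypothetical.

```python
# rag_sketch.py — toy Retrieval-Augmented Generation pipeline (illustrative only).
# Production used embedding-based retrieval; here retrieval is plain word overlap.

SYSTEM_PROMPT = (
    "You are a genetic counseling assistant. Answer ONLY using the report "
    "excerpts below. If the question is outside genetics or the excerpts, "
    "say you cannot answer."
)


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank report chunks by word overlap with the question (toy scorer)."""
    q_words = set(question.lower().split())
    scored = sorted(chunks, key=lambda c: -len(q_words & set(c.lower().split())))
    return scored[:k]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble the domain-restricted prompt sent to the LLM."""
    context = "\n---\n".join(retrieve(question, chunks))
    return f"{SYSTEM_PROMPT}\n\n[Report excerpts]\n{context}\n\n[Question]\n{question}"


# Hypothetical report chunks for demonstration.
report_chunks = [
    "Variant rs429358 in APOE is associated with lipid metabolism.",
    "Lactase persistence markers indicate normal lactose tolerance.",
    "Caffeine metabolism: CYP1A2 fast-metabolizer genotype detected.",
]

prompt = build_prompt("What does my caffeine metabolism result mean?", report_chunks)
```

Grounding the prompt in retrieved excerpts and instructing the model to refuse out-of-scope questions is what keeps responses both accurate and domain-restricted.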

3. Addressing the “Lost in the Middle” Problem

Problem:
When processing long documents, LLMs sometimes overlook or forget information presented in the middle sections, a phenomenon known as “lost in the middle.” This risked missing critical details in user reports or lengthy inputs.
How I Solved It:
I tackled this by chunking documents into smaller, logically organized sections and carefully structuring the context fed to the model. I also prioritized the placement of key information to ensure it was always within the model’s attention window. Additionally, I kept up with advancements in LLM architectures, adopting newer models as they improved context handling.
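The chunk-and-reorder mitigation can be sketched as follows: split the document into chunks, rank them by relevance, then interleave the ranking so the highest-ranked chunks sit at the beginning and end of the context, where models attend most reliably (similar in spirit to "long-context reorder" utilities). The splitter and ordering below are simplified illustrations, not the production logic.

```python
# reorder_sketch.py — "lost in the middle" mitigation (illustrative).
# The most relevant chunks are placed at the edges of the context window,
# where LLMs attend most reliably; the least relevant land in the middle.

def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split a long document into fixed-size word chunks (toy splitter)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def edge_order(ranked_chunks: list[str]) -> list[str]:
    """Reorder chunks ranked best-first so the best sit at both edges.

    e.g. ranked [1, 2, 3, 4, 5] -> context order [1, 3, 5, 4, 2]
    """
    front, back = [], []
    for i, chunk in enumerate(ranked_chunks):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]
```

With this ordering, the top-ranked chunk opens the context and the second-ranked chunk closes it, so the model's weakest attention region in the middle only ever holds the least critical material.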

Sources

  • Yonhap News — “Macrogen begins open beta of genetic AI counseling service”
  • THE AI — “[5th Anniversary Special] Macrogen CSO Lee Seung-bin: ‘AI reads 3 billion genes, precision medicine answers’”
  • IT Business Today — “KEAN Health Launches AI Search for Genetic Testing”
  • PR TIMES — “A first in Japan: AI search built into genetic testing — chatGENE Pro, the all-in-one genetic test”