Using the genomeagent Image

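An AI agent is an AI program designed to think and act independently. Given only a research goal and the data paths, it adaptively generates a sequence of tasks and executes the analysis.

A bioinformatics AI agent can handle the rapidly growing demand for multi-omics data analysis, reduce the high cost of bioinformatics training, and meet the personalized needs of different application scenarios. It provides efficient, customized data analysis support for life science research across many fields.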

The detailed steps for using the genomeagent image are as follows:

Step 1: Subscribe to the image
Step 2: Create a notebook
Step 3: Use the genomeagent agent framework in JupyterLab

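Step 1: Subscribe to the image

Before using the genomeagent image, subscribe to it from the Asset Market into your project.

Log in to EIHealth and open the genome platform.
In "Asset Market", search for the "genomeagent" image.
Click the "Subscribe" icon on the right side of the page to subscribe to the image.
The subscribed image appears in the image list on the "Project Management > Images" page.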

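Step 2: Create a notebook

On the "Project Management > project name > Development" page, click "Create Notebook" and fill in the parameters as described in Table 1.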

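Table 1 Parameter description
Parameter	Description
Name	Name of the notebook.
Description	Brief description of the notebook.
Image type	Select "Custom".
Work environment	Select the "genomeagent" image.
CPU	Set to 4.0 cores.
GPU	Set to 0.
Memory	Set to more than 8 GB.
Storage path	Click the folder icon to the right of "Storage path" and set the OBS path used to store notebook data. To use existing files or data directly, upload them to this OBS path in advance. All file reads and writes in the notebook list operate on content under the selected OBS path of the current project; data paths of referenced projects are not supported yet.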

After checking that the parameters are correct, click "Create Now" to create the notebook.

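Step 3: Use the genomeagent agent framework in JupyterLab

Select the notebook instance created in Step 2 whose status is "Running", and click "Open" in the "Operation" column to access it. For details, see Opening a Notebook.

The "genomeagent_examples" folder in the notebook root directory contains example analyses with genomeagent for reference. The walkthrough below uses the example.ipynb tutorial.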

Table 2 genomeagent examples
Example	Description
example.ipynb	Code example in Jupyter notebook form.
example_with_user_modify.py	Essentially the same content as example.ipynb, as a Python script.

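Configure eihealth-toolkit (follow the configuration prompts in example.ipynb: add the configuration and switch to the current project).
- For the initial configuration, see the Initial Configuration section in "EIHealth CLI Quick Reference".
- To switch projects, see the Switching Projects section in "EIHealth CLI Quick Reference". On success, output like the following is printed: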
```
health client config file url:   /home/health-user/.health/config.ini  
add ak successfully!  
add sk successfully!  
add region successfully!  
add platform-id successfully!  
add user-name successfully!  
add password successfully!  
add domain-name successfully!  
add obs-endpoint successfully!  
add obs_install_path successfully!  
switch to project xxxxx successfully! 
```
The image ships with the health-linux-x86_64 version preinstalled. For other versions, see the eihealth-toolkit usage guide.
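To confirm that switching projects wrote the expected settings, you can inspect the config file path printed above. The sketch below assumes that config.ini is an INI file with section headers readable by Python's configparser and that its option names match the "add ... successfully!" messages; neither is documented here, and `missing_health_keys` is a hypothetical helper, not part of eihealth-toolkit. It reports key names only and never prints values such as ak, sk, or password.

```python
import configparser
import os

# Option names taken from the "add ... successfully!" messages above.
# Assumption: config.ini is an INI file with section headers that configparser can read.
EXPECTED_KEYS = {
    "ak", "sk", "region", "platform-id", "user-name", "password",
    "domain-name", "obs-endpoint", "obs_install_path",
}


def missing_health_keys(path="~/.health/config.ini"):
    """Return expected option names that are absent from every section (values are never printed)."""
    # interpolation=None so that '%' characters in secrets are not interpreted
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(os.path.expanduser(path), encoding="utf-8"):
        raise FileNotFoundError(f"cannot read {path}")
    found = set(parser.defaults())
    for section in parser.sections():
        found.update(parser.options(section))
    return sorted(EXPECTED_KEYS - found)
```

An empty list from `missing_health_keys()` means every expected key name was found.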

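Run the example:

Your own data, for example under /path/to/examples, should follow a similar layout. The image ships with sample data in ~/work/genomeagent_examples/case1, organized as follows: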

```
$ ls ~/work/genomeagent_examples
case1/
├── data
│   ├── mm39.fa
│   ├── mm39.ncbiRefSeq.gtf
│   ├── SRR1374921.fastq.gz
│   ├── SRR1374922.fastq.gz
│   ├── SRR1374923.fastq.gz
│   ├── SRR1374924.fastq.gz
│   └── TruSeq3-SE.fa
└── output
```
The task settings are stored in config_data; see the code cell in example.ipynb.

```python
# Task parameters are provided as a key-value dict.
# data_list: paths of data files stored in OBS of the platform project, each mapped to a
#   natural-language description of the data. The project must be the same as the current
#   development environment's project; relative paths reachable from the working directory
#   of the current Python process are recommended.
# output_dir: OBS output path for the results, under the same constraints as data_list.
# goal_description: natural-language description of the bioinformatics task goal.
# meta: optional extra information for the model. It is passed to the model only with
#   GenomeAgent(use_meta=True); the default is False (meta is ignored).
config_data = {
    # Required: data paths and what each file contains
    'data_list': {
        "./case1/data/SRR1374921.fastq.gz": "single-end mouse rna-seq reads, replicate 1 in LoGlu group",
        "./case1/data/SRR1374922.fastq.gz": "single-end mouse rna-seq reads, replicate 2 in LoGlu group",
        "./case1/data/SRR1374923.fastq.gz": "single-end mouse rna-seq reads, replicate 1 in HiGlu group",
        "./case1/data/SRR1374924.fastq.gz": "single-end mouse rna-seq reads, replicate 2 in HiGlu group",
        "./case1/data/TruSeq3-SE.fa": "trimming adapter",
        "./case1/data/mm39.fa": "mouse mm39 genome fasta",
        "./case1/data/mm39.ncbiRefSeq.gtf": "mouse mm39 genome annotation"},
    # Required: output directory for this task
    'output_dir': './case1/output',
    # Required: goal of this task, e.g. find the differentially expressed genes
    'goal_description': 'find the differentially expressed genes',
    # Optional: auxiliary meta information for the LLM; may be left empty to let the model plan freely
    'meta': (
        'step1, fastqc to check reads; '
        'step2, cutadapt with single-end mode; '
        'step3, hisat2 build genome index and then align reads to mouse genome; '
        'step4, samtools view and sort for converting SAM to sorted BAM, and then samtools index sorted BAM files; '
        'step5, htseq to get gene expression for each biological sample '
        '(the output of htseq: column 1 is gene id, column 2 is gene expression value, no header line); '
        'step6, calculate the differentially expressed genes using DESeq2 '
        '(conda install bioconda::bioconductor-deseq2 -y)'
    )
}
```
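Before starting a run, a quick local check of config_data can catch missing keys early. The helper below is a hypothetical sketch, not part of genomeagent; it only encodes the rules stated in the comments above (three required keys, a description for every data file, relative paths recommended).

```python
import os

# Keys marked as required in the config_data comments
REQUIRED_TASK_KEYS = ("data_list", "output_dir", "goal_description")


def check_task_config(cfg):
    """Return a list of problems in the task part of a config_data-like dict (hypothetical helper)."""
    problems = []
    for key in REQUIRED_TASK_KEYS:
        if key not in cfg:
            problems.append(f"missing required key: {key}")

    data_list = cfg.get("data_list")
    if not isinstance(data_list, dict) or not data_list:
        problems.append("data_list must be a non-empty dict of {path: description}")
    else:
        for path, desc in data_list.items():
            if not str(desc).strip():
                problems.append(f"no description for {path}")
            if os.path.isabs(path):
                # Relative paths are recommended, not strictly required
                problems.append(f"absolute path (relative path recommended): {path}")

    if "goal_description" in cfg and not str(cfg["goal_description"]).strip():
        problems.append("goal_description is empty")
    return problems
```

For example, `check_task_config(config_data)` should return an empty list for the sample configuration.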

Configure the automated job: the genome platform job information and the LLM access information; see the code cell in example.ipynb.

```python
Agent_data = {
    # [Required] GenomeAgent job parameters:
    #   job_name: any name you choose
    #   project_name: name of the current project
    # [Optional] For the other parameters, see:
    #   https://support.huaweicloud.com/clilist-eihealth/eihealth_31_0032.html
    'job': {
        # [Required]
        'job_name': 'case1',
        'project_name': 'project_name',
        # [Optional]
        'nodeLabels': '',
        'priority': '',
        'health_tool': ''
    },
    # [Required]
    # Used together with GenomeAgent(model='fromconfig'): inference parameters of the model.
    # This is a reserved interface for calling a self-deployed model.
    # Note: third-party LLM API services you call may transmit and process data on third-party servers.
    # This application and platform accept no responsibility for the content, accuracy, completeness, or security of third-party services.
    # Do not enter sensitive, personal, or confidential information. By using this service, you agree to bear the risks arising from third-party API use, including but not limited to data leakage, service interruption, or inaccurate results.
    # We accept no legal liability for any direct or indirect loss caused by the use of third-party services.
    # If the third-party LLM API requires an API key or headers, see
    #   https://platform.openai.com/docs/quickstart/step-2-set-up-your-api-key
    # By default, genomeagent reads them from the environment variables USER_APIKEY / USER_HEADERS.

    # If the LLM inference service is deployed in OpenAI mode, set the model attributes
    # and the network proxy according to your model service, for example:
    'model_params': {
        'type': 'openai', 'url': 'https://api.deepseek.com/v1',
        'model': 'deepseek-chat',
        # 'proxy': "http://xxx.xxx.xxx.xxx:port",
        'attributes': {'max_tokens': 4096}},
    # If the LLM inference service is deployed in request mode, set the request headers,
    # model attributes, and network proxy according to your model service, for example:
    # 'model_params': {
    #     'type': 'request', 'url': 'https://api.deepseek.com/chat/completions',
    #     'model': 'deepseek-chat',
    #     'proxy': {"http": "http://xxx.xxx.xxx.xxx:port", "https": "http://xxx.xxx.xxx.xxx:port"},
    #     'attributes': {'max_tokens': 4096}},

    # APP parameters of the genome platform workflow; usage:
    #   https://support.huaweicloud.com/clilist-eihealth/eihealth_31_0020.html
    # [Required] 'image': set it to the genomeagent image you subscribed to, in the form
    #   'source_project_name/image_name:version'
    # [Optional] adjust the values under resources (e.g. cpu, memory, gpu) as needed
    # Leave the other parameters unchanged.
    'app': {
        'image': 'project_name/genomeagent:1.0.0',
        # Adjust resources (e.g. cpu, memory, gpu) as needed
        'resources': {'cpu_type': 'X86', 'cpu': '4C', 'memory': '8G', 'gpu_type': '', 'gpu': '0'},
        # Leave the following parameters unchanged
        'name': 'ga-app', 'version': 'v1.0', 'summary': '', 'description': '', 'labels': [],
        # [Do not modify] the parameters under commands and inputs
        'commands': [ 'python3 .shell.py --cpu 0 --times 360 --interval 5 --script ${input-sh} &> ${input-sh}.log ' ],
        'inputs': [
            {'enum': [], 'name': 'input-sh', 'description': '', 'required': False, 'concurrent': '', 'type': 'FILE', 'pattern': '', 'values': []},
            {'enum': [], 'name': 'input-dir', 'description': '', 'required': False, 'concurrent': '', 'type': 'DIRECTORY', 'pattern': '', 'values': []},
            {'enum': [], 'name': 'result-path', 'description': '', 'required': False, 'concurrent': '', 'type': 'DIRECTORY', 'pattern': '', 'values': []}
        ],
        'outputs': [], 'node_labels': []
    },
    # Workflow parameters of the genome platform; usage:
    #   https://support.huaweicloud.com/clilist-eihealth/eihealth_31_0026.html
    # [Required] 'app_id': 'ga-app::v1.0' must match 'name': 'ga-app' and 'version': 'v1.0'
    #   in the APP parameters; they match by default, so no change is needed
    # [Optional] only adjust tasks-resources (e.g. cpu, memory, gpu) as needed; leave the other
    #   parameters unchanged, and do not modify the parameters under tasks-inputs
    'workflow': {
        'tasks': [{
            'app_id': 'ga-app::v1.0',
            # Adjust resources (e.g. cpu, memory, gpu) as needed
            'resources': {'cpu': '4C', 'cpu_type': 'X86', 'memory': '8G', 'gpu_type': '', 'gpu': '0'},
            # Leave the following parameters unchanged
            'task_name': 'ga',
            'display_name': '',
            'output_dir': '',
            'inputs': [
                {'name': 'input-sh', 'source': 'MANUAL', 'values': []},
                {'name': 'input-dir', 'source': 'MANUAL', 'values': []},
                {'name': 'result-path', 'source': 'MANUAL', 'values': []}
            ]}],
        'name': 'ga-wf', 'version': 'v1.0', 'summary': '', 'description': '', 'labels': [], 'timeout': 10080, 'output_dir': ''
    }}
config_data.update(Agent_data)
```
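The comments in Agent_data say that each task's app_id must match the APP's name and version (ga-app::v1.0) and that image takes the form 'source_project_name/image_name:version'. The sketch below checks these rules on the merged config_data; `check_agent_config` is a hypothetical helper, and the regular expression is an assumption about what a well-formed image reference looks like.

```python
import re

# Assumed shape of the image reference: 'source_project_name/image_name:version'
IMAGE_PATTERN = re.compile(r"^[^/\s]+/[^/:\s]+:[^/:\s]+$")


def check_agent_config(cfg):
    """Return a list of problems in the job/app/workflow part of config_data (hypothetical helper)."""
    problems = []
    if not cfg.get("job", {}).get("project_name"):
        problems.append("job.project_name is empty")

    app = cfg["app"]
    if not IMAGE_PATTERN.match(app.get("image", "")):
        problems.append(f"unexpected image format: {app.get('image')!r}")

    # Every workflow task must point at '<app name>::<app version>'
    expected_id = f"{app['name']}::{app['version']}"
    for task in cfg["workflow"]["tasks"]:
        if task.get("app_id") != expected_id:
            problems.append(
                f"task {task.get('task_name', '?')!r}: app_id {task.get('app_id')!r} != {expected_id!r}"
            )
    return problems
```

Running `check_agent_config(config_data)` after the merge should return an empty list if these rules hold.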

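Note: third-party LLM API services you call may transmit and process data on third-party servers. This application and platform accept no responsibility for the content, accuracy, completeness, or security of third-party services. Do not enter sensitive, personal, or confidential information. By using this service, you agree to bear the risks arising from third-party API use, including but not limited to data leakage, service interruption, or inaccurate results. We accept no legal liability for any direct or indirect loss caused by the use of third-party services.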
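As the comments in Agent_data note, genomeagent reads the API key from the USER_APIKEY environment variable by default. One way to keep the key out of the notebook code is to prompt for it and set the variable in the current Python process before calling ga.run(). This is a sketch: `set_user_apikey` is a hypothetical helper, and because the expected format of USER_HEADERS is not described here, it does not set that variable.

```python
import getpass
import os


def set_user_apikey(key=None):
    """Store the LLM API key in USER_APIKEY for the current process (hypothetical helper)."""
    if key is None:
        # Prompt instead of hard-coding the key in the notebook
        key = getpass.getpass("LLM API key: ")
    if not key:
        raise ValueError("empty API key")
    os.environ["USER_APIKEY"] = key
```

Call `set_user_apikey()` in a cell before creating GenomeAgent; the key is typed at the prompt and is not echoed.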

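Copy the genomeagent_examples sample files (including case1) from the image into the OBS bucket of this development environment's project; see the code cell in example.ipynb.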

```python
# Copy the 'genomeagent_examples' folder into the OBS bucket directory of this
# development environment's project, then switch to that directory
import os
import shutil

project_name = config_data['job']['project_name']
src = os.path.expanduser("~/work/genomeagent_examples")
dst = os.path.expanduser(f"~/work/{project_name}/genomeagent_examples")
shutil.copytree(src, dst, dirs_exist_ok=True)  # dirs_exist_ok requires Python 3.8+
os.chdir(dst)
os.getcwd()
```
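After switching directories, the relative paths in data_list should resolve from the new working directory. A minimal sketch (`check_paths` is a hypothetical helper, not part of genomeagent) that lists the data files it cannot find and reports whether output_dir already exists:

```python
import os


def check_paths(cfg, base_dir="."):
    """Return (data_list paths missing under base_dir, whether output_dir exists)."""
    missing = [
        path for path in cfg["data_list"]
        if not os.path.isfile(os.path.join(base_dir, path))
    ]
    output_ok = os.path.isdir(os.path.join(base_dir, cfg["output_dir"]))
    return missing, output_ok
```

For example, `missing, output_ok = check_paths(config_data)` should give an empty `missing` list once the sample files have been copied.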

Load GenomeAgent; see the code cell in example.ipynb.

```python
# If needed, save the configuration above as a YAML file
# import yaml
# with open('./case1/config.yaml', 'w') as outfile:
#     yaml.dump(config_data, outfile, default_flow_style=False, sort_keys=False)
from genomeagent import GenomeAgent

# model='fromconfig': take the model, endpoint, and related settings from model_params in config
# execute=True: generate and execute the code
# user_modify=False: no user edits (interactive editing is currently supported only when
#   running in a Python interpreter or as a Python script)
# local_mode=False: submit jobs through EIFlow
# use_meta=True: pass the meta information to the model
ga = GenomeAgent(config=config_data, model='fromconfig', execute=True, user_modify=False, local_mode=False, use_meta=True)
# Alternatively, read the config from the saved file
# ga = GenomeAgent(config='./case1/config.yaml', model='fromconfig', execute=True, user_modify=False, local_mode=False, use_meta=True)
# Start the automated task
ga.run()
```