Python 如何快速识别 Snakemake 中的规则是否需要输入函数

发布于06月29日

我正在关注他们文档页面上的snakemake教程，并真正理解了输入函数https://snakemake.readthedocs.io/en/stable/tutorial/advanced.html#step-3-input-functions的概念

基本上，他们对config.yaml的定义如下:

samples:
  A: data/samples/A.fastq
  B: data/samples/B.fastq

和Snakefile，无任何输入功能:

configfile: "config.yaml"

rule all:
    input:
        "plots/quals.svg"

rule bwa_map:
    input:
        "data/genome.fa",
        "data/samples/{sample}.fastq"
    output:
        "mapped_reads/{sample}.bam"
    threads: 12
    shell:
        "bwa mem -t {threads} {input} | samtools view -Sb - > {output}"

rule samtools_sort:
    input:
        "mapped_reads/{sample}.bam"
    output:
        "sorted_reads/{sample}.bam"
    shell:
        "samtools sort -T sorted_reads/{wildcards.sample} -O bam {input} > {output}"

rule samtools_index:
    input:
        "sorted_reads/{sample}.bam"
    output:
        "sorted_reads/{sample}.bam.bai"
    shell:
        "samtools index {input}"

rule bcftools_call:
    input:
        fa = "data/genome.fa",
        bam = expand("sorted_reads/{sample}.bam",sample=config['samples']),
        bai = expand("sorted_reads/{sample}.bam.bai",sample=config['samples'])
    output:
        "calls/all.vcf"
    shell:
        "bcftools mpileup -f {input.fa} {input.bam} | "
        "bcftools call -mv - > {output}"

rule plot_quals:
    input:
        "calls/all.vcf"
    output:
        "plots/quals.svg"
    script:
        "scripts/plot-quals.py"

在本教程中，他们提到此扩展发生在初始化步骤中:

bam = expand("sorted_reads/{sample}.bam",sample=config['samples']),
bai = expand("sorted_reads/{sample}.bam.bai",sample=config['samples'])

在此阶段，无法确定规则bwa_map的FASTQ路径.然而，如果我们按原样运行，代码会工作，为什么会这样？

然后，他们建议使用输入函数将bwa_map延迟到下一阶段(DAG阶段)，如下所示:

def get_bwa_map_input_fastqs(wildcards):
    return config["samples"][wildcards.sample]

rule bwa_map:
    input:
        "data/genome.fa",
        get_bwa_map_input_fastqs
    output:
        "mapped_reads/{sample}.bam"
    threads: 8
    shell:
        "bwa mem -t {threads} {input} | samtools view -Sb - > {output}"

我真的很困惑，输入函数什么时候有意义，什么时候没有？

def get_genome(wildcards): if wildcards.name in ['bob', 'alice']: return 'human_genome.fa', if wildcards.name in ['mickey', 'jerry'], return 'mouse_genome.fa', rule align: input: reads= '{name}.fastq', genome= get_genome, output: bam= '{name}.bam', shell: ...

Python 如何快速识别 Snakemake 中的规则是否需要输入函数

推荐答案

Python相关问答推荐

滚动和，句号来自Pandas列

将jit与numpy linSpace函数一起使用时出错

使用索引列表列表对列进行切片并获取行方向的向量长度

numpy卷积与有效

如何根据一列的值有条件地 Select 前N个组，然后按两列分组？

将JSON对象转换为Dataframe

如何使用scipy的curve_fit与约束，其中拟合的曲线总是在观测值之下？

如何并行化/加速并行numba代码？

isinstance()在使用dill.dump和dill.load后，对列表中包含的对象失败

如何从需要点击/切换的网页中提取表格？

Tkinter菜单自发添加额外项目

解决调用嵌入式函数的XSLT中表达式的语法移位/归约冲突

在单次扫描中创建列表

使用Python TCP套接字发送整数并使用C#接收—接收正确数据时出错

PySpark：如何最有效地读取不同列位置的多个CSV文件

正在try 让Python读取特定的CSV文件

普洛特利express 发布的人口普查数据失败

401使用有效的OAuth令牌向Google Apps脚本Web App发出POST请求时出现未经授权的错误("；

搜索结果未显示.我的URL选项卡显示：http：//127.0.0.1：8000/search？"；，而不是这个："；http：//127.0.0.1：8000/search？q=name"；

如何在开始迭代自定义迭代器类时重置索引属性？