This document outlines best practices for creating and managing Nextflow configuration files, including nextflow_schema.json, modules.json, nextflow.config, HPC cluster configurations, AWS Batch configurations, parameter files, and testing strategies.

Creating nextflow_schema.json

1. File Structure

The nextflow_schema.json file defines all pipeline parameters in JSON Schema format:

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://raw.githubusercontent.com/nf-core/pipeline/master/nextflow_schema.json",
    "title": "nf-core/pipeline pipeline parameters",
    "description": "Brief description of the pipeline",
    "type": "object",
    "$defs": {
        "input_output_options": {
            "title": "Input/output options",
            "type": "object",
            "fa_icon": "fas fa-terminal",
            "description": "Define where the pipeline should find input data and save output data.",
            "required": ["input", "outdir"],
            "properties": {
                "input": {
                    "type": "string",
                    "format": "file-path",
                    "exists": true,
                    "schema": "assets/schema_input.json",
                    "pattern": "^\\S+\\.(csv|tsv|json|yaml|yml)$",
                    "description": "Path to input samplesheet file.",
                    "help_text": "Detailed help text explaining the parameter.",
                    "fa_icon": "fas fa-file-csv"
                }
            }
        }
    },
    "allOf": [
        { "$ref": "#/$defs/input_output_options" },
        { "$ref": "#/$defs/reference_genome_options" }
    ]
}

2. Parameter Organization

Organize parameters into logical groups using $defs:

"$defs": {
    "input_output_options": { ... },
    "reference_genome_options": { ... },
    "read_trimming_options": { ... },
    "alignment_options": { ... },
    "analysis_options": { ... }
}

Best Practices:

  • Group related parameters together
  • Use descriptive group titles
  • Include Font Awesome icons (fa_icon)
  • Add clear descriptions

3. Parameter Properties

Each parameter should include:

{
    "parameter_name": {
        "type": "string",           // string, integer, number, boolean, array, object
        "format": "file-path",      // file-path, directory-path, uri, email, etc.
        "exists": true,             // For file paths
        "pattern": "^\\S+\\.csv$", // Regex pattern for validation
        "default": "value",         // Default value (optional)
        "description": "Brief description",
        "help_text": "Detailed help text with examples",
        "fa_icon": "fas fa-icon",
        "enum": ["option1", "option2"],  // For restricted choices
        "minimum": 0,               // For numeric types
        "maximum": 100
    }
}

4. Parameter Types

String:

{
    "input": {
        "type": "string",
        "format": "file-path",
        "exists": true,
        "description": "Input file path"
    }
}

Integer:

{
    "min_read_length": {
        "type": "integer",
        "default": 25,
        "minimum": 1,
        "maximum": 1000,
        "description": "Minimum read length"
    }
}

Boolean:

{
    "skip_trimming": {
        "type": "boolean",
        "description": "Skip read trimming step"
        // Note: Don't include "default": false for booleans (redundant)
    }
}

Enum (Restricted Choices):

{
    "trimmer": {
        "type": "string",
        "default": "trimgalore",
        "enum": ["trimgalore", "fastp"],
        "description": "Tool to use for read trimming"
    }
}

5. Required Parameters

Mark required parameters in the group definition:

{
    "input_output_options": {
        "required": ["input", "outdir"],
        "properties": { ... }
    }
}

6. Conditional Requirements

Use help_text to document conditional requirements:

{
    "gff": {
        "type": "string",
        "format": "file-path",
        "description": "Path to GFF3 annotation file.",
        "help_text": "This parameter must be specified if neither --genome nor --gtf are specified."
    }
}
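
JSON Schema can also enforce such constraints directly rather than only documenting them. A sketch using anyOf (whether this is enforced at run time depends on the validation plugin in use):

```json
{
    "anyOf": [
        { "required": ["genome"] },
        { "required": ["gtf"] },
        { "required": ["gff"] }
    ]
}
```

Placed at the top level of the schema, this requires at least one of --genome, --gtf, or --gff to be provided.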

7. Default Values

Set defaults appropriately:

{
    "min_read_length": {
        "type": "integer",
        "default": 25  // Explicit default
    },
    "transcript_fasta": {
        "type": "string",
        "default": null  // Explicit null for optional parameters
    },
    "skip_trimming": {
        "type": "boolean"
        // No default for boolean (defaults to false)
    }
}

8. Validation Patterns

Use regex patterns for validation:

{
    "input": {
        "pattern": "^\\S+\\.(csv|tsv|json|yaml|yml)$"
    },
    "email": {
        "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$"
    }
}
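
Patterns can be sanity-checked locally before wiring them into the schema. A quick sketch with GNU grep (the filenames are made up):

```shell
# Check candidate filenames against the samplesheet pattern from the schema
pattern='^\S+\.(csv|tsv|json|yaml|yml)$'

for f in samplesheet.csv samples.yaml 'bad name.csv' notes.txt; do
    if printf '%s\n' "$f" | grep -Eq "$pattern"; then
        echo "$f: matches"
    else
        echo "$f: rejected"
    fi
done
```

Note that \S inside grep -E is a GNU extension, and the schema validator uses its own regex engine, so treat this only as a rough local check.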

9. Schema References

Reference external schemas for complex validation:

{
    "input": {
        "schema": "assets/schema_input.json",
        "description": "Input samplesheet validated against schema"
    }
}

10. Best Practices Summary

  • Organize parameters into logical groups
  • Use descriptive titles and descriptions
  • Include helpful help_text with examples
  • Mark required parameters
  • Use appropriate types and formats
  • Set sensible defaults
  • Use validation patterns where appropriate
  • Document conditional requirements
  • Include Font Awesome icons for UI
  • Avoid redundant default: false for booleans

Creating modules.json

1. File Structure

The modules.json file tracks installed modules and subworkflows from nf-core/modules:

{
    "name": "nf-core/pipeline",
    "homePage": "https://github.com/nf-core/pipeline",
    "repos": {
        "https://github.com/nf-core/modules.git": {
            "modules": {
                "nf-core": {
                    "module_name/submodule": {
                        "branch": "master",
                        "git_sha": "abc123def456...",
                        "installed_by": ["subworkflow_name", "modules"]
                    }
                }
            },
            "subworkflows": {
                "nf-core": {
                    "subworkflow_name": {
                        "branch": "master",
                        "git_sha": "abc123def456...",
                        "installed_by": ["subworkflows"]
                    }
                }
            }
        }
    }
}

2. Module Entries

Each module entry includes:

{
    "fastqc": {
        "branch": "master",
        "git_sha": "41dfa3f7c0ffabb96a6a813fe321c6d1cc5b6e46",
        "installed_by": ["fastq_fastqc_umitools_fastp", "fastq_fastqc_umitools_trimgalore", "modules"]
    }
}

Fields:

  • branch: Git branch name (usually “master”)
  • git_sha: Full commit SHA of the module version
  • installed_by: List of subworkflows/modules that use this module

3. Subworkflow Entries

Subworkflow entries follow the same structure:

{
    "fastq_qc_trim_filter_setstrandedness": {
        "branch": "master",
        "git_sha": "d9ec4ef289ad39b8a662a7a12be50409b11df84b",
        "installed_by": ["subworkflows"]
    }
}
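
Because modules.json is plain JSON, dependency questions can be answered with a few lines of scripting. A sketch that lists which subworkflows pull in a module (the file contents here are a minimal made-up example):

```shell
# Create a minimal modules.json for illustration
cat > modules.json <<'EOF'
{
    "repos": {
        "https://github.com/nf-core/modules.git": {
            "modules": {
                "nf-core": {
                    "fastqc": {
                        "branch": "master",
                        "git_sha": "abc123",
                        "installed_by": ["modules", "fastq_fastqc_umitools_fastp"]
                    }
                }
            }
        }
    }
}
EOF

# Print the installed_by list for fastqc
python3 - <<'PY'
import json

with open("modules.json") as fh:
    data = json.load(fh)

entry = data["repos"]["https://github.com/nf-core/modules.git"]["modules"]["nf-core"]["fastqc"]
for consumer in entry["installed_by"]:
    print(consumer)
PY
```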

4. Tools for Managing modules.json

The modules.json file should be managed using nf-core CLI tools. Here are the available commands:

Module Management Commands

Install a module:

nf-core modules install <module_name>
# Example: nf-core modules install fastqc

Install modules one at a time (the CLI accepts a single module per invocation):

nf-core modules install trimgalore
nf-core modules install samtools/sort

Install into a pipeline located in a different directory:

nf-core modules install <module_name> --dir /path/to/pipeline

Update a specific module:

nf-core modules update <module_name>
# Example: nf-core modules update fastqc

Update all modules:

nf-core modules update --all

Pin a module to a specific commit:

nf-core modules update <module_name> --sha <commit_sha>

Remove a module:

nf-core modules remove <module_name>
# Example: nf-core modules remove fastqc

List installed modules:

nf-core modules list local

List modules available in the remote repository:

nf-core modules list remote

Show module information:

nf-core modules info <module_name>
# Example: nf-core modules info fastqc

Subworkflow Management Commands

Install a subworkflow:

nf-core subworkflows install <subworkflow_name>
# Example: nf-core subworkflows install fastq_qc_trim_filter_setstrandedness

Update a subworkflow:

nf-core subworkflows update <subworkflow_name>

Update all subworkflows:

nf-core subworkflows update --all

List installed subworkflows:

nf-core subworkflows list local

Remove a subworkflow:

nf-core subworkflows remove <subworkflow_name>

Additional Tools

Note that modules.json is created and kept in sync automatically by the install, update, and remove commands; there is no separate command to generate it from scratch.

Lint installed modules (this also checks that modules.json is consistent):

nf-core modules lint --all

Preview available module updates without applying them:

nf-core modules update --all --preview

Install nf-core CLI:

# Using pip
pip install nf-core

# Using conda
conda install -c bioconda nf-core

# Using mamba
mamba install -c bioconda nf-core

5. Maintenance

When to update:

  • After installing new modules: nf-core modules install <module_name>
  • After updating modules: nf-core modules update <module_name> or --all
  • After adding new subworkflows: nf-core subworkflows install <subworkflow_name>
  • After module version changes: nf-core modules update --all
  • When checking for updates: nf-core modules update --all --preview

Best Practices:

  • Don’t manually edit modules.json - Always use nf-core CLI tools
  • Commit modules.json to version control after changes
  • Review installed_by fields to understand dependencies
  • Keep git SHAs accurate for reproducibility
  • Preview pending updates regularly with nf-core modules update --all --preview
  • Test after updating modules to ensure compatibility
  • Document why specific module versions are pinned (if needed)

6. Module Installation Examples

# Install a single module
nf-core modules install fastqc

# Install modules one at a time (one module per command)
nf-core modules install trimgalore
nf-core modules install samtools/sort

# Install a module and update modules.json
nf-core modules install star/align

# Update all modules to their latest versions
nf-core modules update --all

# Update a specific module
nf-core modules update fastqc

# Preview which modules have updates available
nf-core modules update --all --preview

# List all installed modules
nf-core modules list local

# Install a subworkflow
nf-core subworkflows install fastq_qc_trim_filter_setstrandedness

# Update all subworkflows
nf-core subworkflows update --all

Creating nextflow.config

1. File Structure

Organize nextflow.config in clear sections:

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Pipeline Name Nextflow config file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Default config options for all compute environments
----------------------------------------------------------------------------------------
*/

// Global default params
params {
    // Parameter definitions
}

// Load base.config by default
includeConfig 'conf/base.config'

// Profiles
profiles {
    docker { ... }
    singularity { ... }
    test { includeConfig 'conf/test.config' }
}

// Load custom configs
includeConfig params.custom_config_base ? "${params.custom_config_base}/nfcore_custom.config" : "/dev/null"

// Load igenomes.config if required
includeConfig !params.igenomes_ignore ? 'conf/igenomes.config' : 'conf/igenomes_ignored.config'

// Environment variables
env {
    PYTHONNOUSERSITE = 1
    R_PROFILE_USER   = "/.Rprofile"
    R_ENVIRON_USER   = "/.Renviron"
    JULIA_DEPOT_PATH = "/usr/local/share/julia"
}

// Process shell options
process.shell = [
    "bash",
    "-C",      // No clobber
    "-e",      // Exit on error
    "-u",      // Unset variables error
    "-o",
    "pipefail" // Pipe failure handling
]

// Timeline, report, trace, DAG
timeline { enabled = true; file = "${params.outdir}/pipeline_info/execution_timeline.html" }
report { enabled = true; file = "${params.outdir}/pipeline_info/execution_report.html" }
trace { enabled = true; file = "${params.outdir}/pipeline_info/execution_trace.txt" }
dag { enabled = true; file = "${params.outdir}/pipeline_info/pipeline_dag.html" }

// Manifest
manifest {
    name            = 'nf-core/pipeline'
    homePage        = 'https://github.com/nf-core/pipeline'
    description     = "Pipeline description"
    mainScript      = 'main.nf'
    defaultBranch   = 'master'
    nextflowVersion = '!>=25.04.8'
    version         = '1.0.0'
}

// Plugins
plugins {
    id 'nf-schema@2.5.1'
}

// Validation
validation {
    defaultIgnoreParams = ["genomes"]
    monochromeLogs = params.monochrome_logs
}

// Load modules.config
includeConfig 'conf/modules.config'
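
The nf-schema plugin declared above is what consumes nextflow_schema.json at run time. A sketch of the typical hook-up in main.nf (validateParameters and paramsSummaryLog are functions provided by the nf-schema plugin):

```groovy
// main.nf: validate params against nextflow_schema.json via the nf-schema plugin
include { validateParameters; paramsSummaryLog } from 'plugin/nf-schema'

workflow {
    validateParameters()                 // fails fast on missing or invalid params
    log.info paramsSummaryLog(workflow)  // logs parameters that differ from defaults
}
```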

2. Parameter Definitions

Define all parameters with defaults:

params {
    // Input options
    input      = null
    contrasts  = null
    outdir     = null

    // Reference genome
    genome     = null
    fasta      = null
    gtf        = null
    gff        = null

    // Analysis options
    skip_trimming = false
    skip_alignment = false
    trimmer = 'trimgalore'

    // Tool-specific options
    extra_star_align_args = null
    extra_fastqc_args = null

    // Boilerplate
    email = null
    help = false
    version = false
}

Best Practices:

  • Group related parameters
  • Use descriptive names
  • Set appropriate defaults
  • Use null for optional parameters
  • Document complex parameters

3. Profiles

Define profiles for different execution environments:

profiles {
    docker {
        docker.enabled = true
        conda.enabled = false
        singularity.enabled = false
        docker.runOptions = '-u $(id -u):$(id -g)'
    }
    
    // Docker with AMD64 emulation (for macOS ARM64)
    docker_amd64 {
        docker.enabled = true
        docker.runOptions = '-u $(id -u):$(id -g) --platform=linux/amd64'
        conda.enabled = false
        singularity.enabled = false
    }
    
    singularity {
        singularity.enabled = true
        singularity.autoMounts = true
        singularity.cacheDir = "${workDir}/singularity"
        conda.enabled = false
        docker.enabled = false
    }
    
    conda {
        conda.enabled = true
        conda.channels = ['conda-forge', 'bioconda']
        conda.cacheDir = "${workDir}/conda"
        docker.enabled = false
        singularity.enabled = false
    }
    
    mamba {
        conda.enabled = true
        conda.useMamba = true
        conda.cacheDir = "${workDir}/mamba"
        docker.enabled = false
        singularity.enabled = false
    }
    
    // ARM64 profile with Wave (for automatic container conversion)
    arm64 {
        process.arch = 'arm64'
        apptainer.ociAutoPull = true
        singularity.ociAutoPull = true
        wave.enabled = true
        wave.freeze = true
        wave.strategy = 'conda,container'
    }
    
    test {
        includeConfig 'conf/test.config'
    }
    
    test_full {
        includeConfig 'conf/test_full.config'
    }
    
    debug {
        dumpHashes = true
        process.beforeScript = 'echo $HOSTNAME'
        cleanup = false
    }
    
    gpu {
        docker.runOptions = '-u $(id -u):$(id -g) --gpus all'
        apptainer.runOptions = '--nv'
        singularity.runOptions = '--nv'
    }
}

Profile Selection Guidelines:

  • Docker: Use for local development, CI/CD, and production (when Docker is available)
  • Docker with AMD64 emulation (docker_amd64): Use on macOS ARM64 for compatibility with AMD64-only images
  • Singularity/Apptainer: Use on HPC clusters where Docker is not available
  • Conda/Mamba: Use when containers are unavailable or for development (slower but more flexible)
  • ARM64 profile: Use on ARM64 systems with Wave for automatic platform handling

Note: Some tools may not be available in all environments. For example, RibORF 2.0 requires a custom Docker image and is not available via conda/mamba. See Container Management Best Practices for detailed guidance.
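
At run time, profiles are selected with -profile and can be combined with commas; hypothetical invocations:

```shell
# Local development with Docker
nextflow run nf-core/pipeline -profile docker --input samplesheet.csv --outdir results

# Small test dataset on an HPC with Singularity
nextflow run nf-core/pipeline -profile test,singularity --outdir results
```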

4. Container Registry Configuration

Set default registries:

apptainer.registry    = 'quay.io'
docker.registry       = 'quay.io'
podman.registry       = 'quay.io'
singularity.registry  = 'quay.io'
charliecloud.registry = 'quay.io'

Best Practices:

  • Use quay.io/biocontainers/ prefix for biocontainers images
  • Verify image availability before committing to pipeline
  • Document custom Docker images (e.g., RibORF 2.0)
  • Check platform compatibility (AMD64 vs ARM64)

For detailed container management guidance, see the Container Management Best Practices documentation.

5. Environment Variables

Export variables to prevent conflicts:

env {
    PYTHONNOUSERSITE = 1
    R_PROFILE_USER   = "/.Rprofile"
    R_ENVIRON_USER   = "/.Renviron"
    JULIA_DEPOT_PATH = "/usr/local/share/julia"
}

6. Process Shell Options

Configure safe shell behavior:

process.shell = [
    "bash",
    "-C",      // No clobber - prevent overwriting files
    "-e",      // Exit on error
    "-u",      // Unset variables error
    "-o",
    "pipefail" // Return error if any command in pipe fails
]

7. Manifest

Define pipeline metadata:

manifest {
    name            = 'nf-core/pipeline'
    homePage        = 'https://github.com/nf-core/pipeline'
    description     = "Pipeline description"
    mainScript      = 'main.nf'
    defaultBranch   = 'master'
    nextflowVersion = '!>=25.04.8'
    version         = '1.0.0'
    doi             = 'https://doi.org/10.5281/zenodo.xxxxx'
    contributors    = [
        [
            name: 'Author Name',
            affiliation: 'Institution',
            email: 'email@example.com',
            github: '@username',
            contribution: ['author'],
            orcid: '0000-0000-0000-0000'
        ]
    ]
}

8. Best Practices Summary

  • Clear section headers with separators
  • All parameters defined with defaults
  • Profiles for all execution environments
  • Environment variables to prevent conflicts
  • Safe shell options configured
  • Manifest with complete metadata
  • Plugins properly configured
  • Validation settings appropriate
  • Include configs in logical order

HPC Cluster Configurations

1. SLURM Configuration

Create conf/slurm.config:

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    SLURM cluster configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

process {
    executor = 'slurm'
    queue    = 'normal'
    clusterOptions = '-A myaccount'
    
    // Default resource limits
    cpus   = { 1      * task.attempt }
    memory = { 6.GB   * task.attempt }
    time   = { 4.h    * task.attempt }
    
    // Process-specific resources
    withLabel:process_single {
        cpus   = { 1 }
        memory = { 6.GB * task.attempt }
        time   = { 4.h  * task.attempt }
    }
    
    withLabel:process_low {
        cpus   = { 2     * task.attempt }
        memory = { 12.GB * task.attempt }
        time   = { 4.h   * task.attempt }
    }
    
    withLabel:process_medium {
        cpus   = { 6     * task.attempt }
        memory = { 36.GB * task.attempt }
        time   = { 8.h   * task.attempt }
    }
    
    withLabel:process_high {
        cpus   = { 12    * task.attempt }
        memory = { 72.GB * task.attempt }
        time   = { 16.h  * task.attempt }
    }
    
    withLabel:process_long {
        time = { 48.h * task.attempt }
    }
    
    withLabel:process_high_memory {
        memory = { 200.GB * task.attempt }
    }
}

executor {
    name = 'slurm'
    queueSize = 100
    pollInterval = '30 sec'
    submitRateLimit = '10/1min'
}

Key SLURM Options:

  • executor = 'slurm': Use SLURM executor
  • queue: Default queue name
  • clusterOptions: Additional SLURM options (e.g., account, partition)
  • queueSize: Maximum concurrent jobs
  • pollInterval: How often to check job status
  • submitRateLimit: Rate limit for job submission
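
Scheduler options can also be scoped to individual labels when some processes need a different partition or account; a sketch (the queue and account names are placeholders):

```groovy
process {
    executor = 'slurm'

    // Route only the high-memory processes to a dedicated partition
    withLabel:process_high_memory {
        queue          = 'bigmem'
        clusterOptions = '-A myaccount --constraint=highmem'
    }
}
```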

2. SGE (Sun Grid Engine) Configuration

Create conf/sge.config:

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    SGE cluster configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

process {
    executor = 'sge'
    queue    = 'all.q'
    clusterOptions = '-l h_vmem=6G'
    
    cpus   = { 1      * task.attempt }
    memory = { 6.GB   * task.attempt }
    time   = { 4.h    * task.attempt }
    
    withLabel:process_single {
        cpus   = { 1 }
        memory = { 6.GB * task.attempt }
        time   = { 4.h  * task.attempt }
    }
    
    withLabel:process_medium {
        cpus   = { 6     * task.attempt }
        memory = { 36.GB * task.attempt }
        time   = { 8.h   * task.attempt }
    }
    
    withLabel:process_high {
        cpus   = { 12    * task.attempt }
        memory = { 72.GB * task.attempt }
        time   = { 16.h  * task.attempt }
    }
}

executor {
    name = 'sge'
    queueSize = 100
    pollInterval = '30 sec'
}

3. PBS/Torque Configuration

Create conf/pbs.config:

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    PBS/Torque cluster configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

process {
    executor = 'pbs'
    queue    = 'batch'
    clusterOptions = '-l walltime=4:00:00'
    
    cpus   = { 1      * task.attempt }
    memory = { 6.GB   * task.attempt }
    time   = { 4.h    * task.attempt }
    
    withLabel:process_single {
        cpus   = { 1 }
        memory = { 6.GB * task.attempt }
        time   = { 4.h  * task.attempt }
    }
    
    withLabel:process_medium {
        cpus   = { 6     * task.attempt }
        memory = { 36.GB * task.attempt }
        time   = { 8.h   * task.attempt }
    }
}

executor {
    name = 'pbs'
    queueSize = 100
    pollInterval = '30 sec'
}

4. LSF Configuration

Create conf/lsf.config:

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    LSF cluster configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

process {
    executor = 'lsf'
    queue    = 'normal'
    clusterOptions = '-M 6000 -R "rusage[mem=6000]"'
    
    cpus   = { 1      * task.attempt }
    memory = { 6.GB   * task.attempt }
    time   = { 4.h    * task.attempt }
    
    withLabel:process_single {
        cpus   = { 1 }
        memory = { 6.GB * task.attempt }
        time   = { 4.h  * task.attempt }
    }
}

executor {
    name = 'lsf'
    queueSize = 100
    pollInterval = '30 sec'
}

5. HPC Best Practices

  1. Resource Allocation:
    • Match resources to process labels
    • Use task.attempt for retry scaling
    • Set appropriate time limits
  2. Queue Management:
    • Use appropriate queue names
    • Set queueSize to limit concurrent jobs
    • Configure submitRateLimit to avoid overwhelming scheduler
  3. Cluster-Specific Options:
    • Use clusterOptions for account, partition, etc.
    • Test resource requests match cluster limits
    • Document cluster-specific requirements
  4. Container Support:
    • Ensure Singularity/Apptainer is available
    • Configure container paths if needed
    • Test container execution
  5. Storage Considerations:
    • Use shared filesystems for work directory
    • Configure scratch space if available
    • Set appropriate workDir location
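
The task.attempt multipliers used throughout these configs only take effect if failed tasks are actually retried. A sketch of a matching retry policy (the exit-status list and caps are illustrative; resourceLimits requires a recent Nextflow version):

```groovy
process {
    // Retry resource-related failures with escalated requests; stop on real errors
    errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' }
    maxRetries    = 2

    // Cap escalated requests so retries never exceed cluster limits
    resourceLimits = [ cpus: 32, memory: 256.GB, time: 72.h ]
}
```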

AWS Batch Configurations

1. Basic AWS Batch Configuration

Create conf/awsbatch.config:

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    AWS Batch configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

process {
    executor = 'awsbatch'
    queue    = 'my-batch-queue'
    
    cpus   = { 1      * task.attempt }
    memory = { 6.GB   * task.attempt }
    time   = { 4.h    * task.attempt }
    
    withLabel:process_single {
        cpus   = { 1 }
        memory = { 6.GB * task.attempt }
        time   = { 4.h  * task.attempt }
    }
    
    withLabel:process_medium {
        cpus   = { 6     * task.attempt }
        memory = { 36.GB * task.attempt }
        time   = { 8.h   * task.attempt }
    }
    
    withLabel:process_high {
        cpus   = { 12    * task.attempt }
        memory = { 72.GB * task.attempt }
        time   = { 16.h  * task.attempt }
    }
}

aws {
    region = 'us-east-1'
    batch {
        cliPath = '/home/ec2-user/miniconda3/envs/nextflow/bin/aws'
        maxParallelTransfers = 4
    }
}

executor {
    name = 'awsbatch'
    queueSize = 100
    pollInterval = '30 sec'
}

2. AWS Batch with S3 Storage

process {
    executor = 'awsbatch'
    queue    = 'my-batch-queue'
    
    // Stage task inputs/outputs via S3; disable node-local scratch
    scratch = false
}

aws {
    region = 'us-east-1'
    batch {
        cliPath = '/home/ec2-user/miniconda3/envs/nextflow/bin/aws'
    }
    
    // S3 configuration
    s3 {
        storageClass = 'STANDARD'
        storageEncryption = 'AES256'
        maxParallelTransfers = 4
        maxTransferAttempts = 6
    }
}

// Use S3 for work directory
workDir = 's3://my-bucket/work'

// Use S3 for output
params.outdir = 's3://my-bucket/results'

3. AWS Batch with EFS

process {
    executor = 'awsbatch'
    queue    = 'my-batch-queue'
    
    // Use EFS for work directory (faster than S3)
    scratch = '/mnt/efs/work'
}

aws {
    region = 'us-east-1'
    batch {
        cliPath = '/home/ec2-user/miniconda3/envs/nextflow/bin/aws'
    }
}

// Use EFS for work directory
workDir = '/mnt/efs/work'

// Use S3 for output
params.outdir = 's3://my-bucket/results'

4. Mapping Process Labels to Batch Queues

Map process labels to AWS Batch queues backed by different compute environments:

process {
    executor = 'awsbatch'
    
    // queue is a process directive, so it can be set per label
    withLabel:process_single {
        queue = 'single-queue'
    }
    
    withLabel:process_high {
        queue = 'high-memory-queue'
    }
}

aws {
    region = 'us-east-1'
    batch {
        cliPath = '/home/ec2-user/miniconda3/envs/nextflow/bin/aws'
        // The IAM job role is configured once, at the aws.batch scope
        jobRole = 'arn:aws:iam::account:role/BatchJobRole'
    }
}

5. AWS Batch Best Practices

  1. Queue Configuration:
    • Create separate queues for different resource needs
    • Use compute environments with appropriate instance types
    • Configure job definitions with correct resources
  2. Storage Strategy:
    • Use EFS for work directory (faster I/O)
    • Use S3 for final outputs (cost-effective)
    • Configure appropriate storage classes
  3. IAM Roles:
    • Use IAM roles for Batch jobs (not access keys)
    • Grant minimal required permissions
    • Use separate roles for different job types
  4. Container Images:
    • Push container images to ECR
    • Use appropriate image tags
    • Test container execution in Batch
  5. Cost Optimization:
    • Use Spot instances where possible
    • Right-size compute resources
    • Clean up work directories regularly
    • Use appropriate S3 storage classes
  6. Monitoring:
    • Enable CloudWatch logging
    • Monitor Batch queue metrics
    • Set up alerts for failures
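
For Spot-based compute environments, reclaimed instances surface as task failures; a sketch of settings that make runs resilient to that (values are illustrative):

```groovy
aws {
    region = 'us-east-1'
    batch {
        // Let AWS Batch itself retry jobs reclaimed from Spot capacity
        maxSpotAttempts = 3
    }
}

process {
    // Also retry at the Nextflow level in case a job fails after starting
    errorStrategy = 'retry'
    maxRetries    = 2
}
```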

Creating Parameter Files

1. Using nf-core launch (Interactive Web Interface)

Launch an interactive web interface to configure parameters:

nf-core launch nf-core/pipeline

Features:

  • Opens a web browser with an interactive parameter configuration interface
  • Shows all available parameters with descriptions and help text
  • Validates inputs in real-time
  • Provides parameter grouping and search functionality
  • Allows downloading a params.json file with your configuration
  • Supports loading existing parameter files for editing

Usage:

# Launch for a specific pipeline
nf-core launch nf-core/riboseq

# Launch and specify a tag/version
nf-core launch nf-core/riboseq --revision 1.2.0

# Launch with an existing parameter file to edit
nf-core launch nf-core/riboseq --params-in params.json

Workflow:

  1. Run nf-core launch nf-core/pipeline
  2. Web browser opens with parameter interface
  3. Configure parameters interactively
  4. Click “Download” to save params.json
  5. Use the downloaded file: nextflow run nf-core/pipeline -params-file params.json

2. Using nf-core pipelines create-params-file

Generate a parameter file template from the pipeline schema:

nf-core pipelines create-params-file <pipeline_directory>

Features:

  • Creates a params.yaml file with all pipeline parameters
  • Includes default values and descriptions as comments
  • Organized by parameter groups
  • Ready for editing and use with -params-file

Usage:

# Create params.yaml in current directory for a local pipeline
nf-core pipelines create-params-file /path/to/pipeline

# Create params.yaml with hidden options included
nf-core pipelines create-params-file /path/to/pipeline --show-hidden

# Create params.yaml for a specific pipeline version
nf-core pipelines create-params-file /path/to/pipeline --revision 1.2.0

Example output (params.yaml):

# Input/output options
input: null  # Path to comma-separated file containing information about the samples
outdir: null  # The output directory where the results will be saved

# Reference genome options
genome: null  # Name of iGenomes reference
fasta: null  # Path to FASTA genome file
gtf: null  # Path to GTF annotation file

# Trimming options
trimmer: 'trimgalore'  # Tool to use for read trimming
skip_trimming: false  # Skip read trimming step
save_trimmed: false  # Save trimmed reads to output directory

# Analysis options
skip_ribocode: false  # Skip RiboCode analysis
skip_riboorf: false  # Skip RibORF analysis

Best Practices:

  • Uncomment and modify parameters you want to change
  • Keep default values for parameters you don’t need to customize
  • Use --show-hidden to include advanced/hidden parameters
  • Commit example parameter files (without sensitive data) to version control

3. Using nextflow run --help

Generate parameter template from command-line help:

nextflow run nf-core/pipeline --help > params_template.txt

Note: This generates a text file with parameter descriptions, but not a directly usable parameter file. Use nf-core pipelines create-params-file for a ready-to-use YAML file.

4. Manual Parameter File Creation

Create parameter files manually if needed:

# Input/Output Options
input: '/path/to/samplesheet.csv'
contrasts: '/path/to/contrasts.csv'
outdir: '/path/to/results'

# Reference Genome Options
genome: 'GRCh38'
# OR
fasta: '/path/to/genome.fasta'
gtf: '/path/to/annotation.gtf'

# Trimming Options
trimmer: 'trimgalore'
skip_trimming: false
save_trimmed: false

# Alignment Options
aligner: 'star'
skip_alignment: false

# Analysis Options
skip_ribocode: false
skip_riboorf: false
skip_ribotish: false

# Tool-Specific Options
extra_star_align_args: '--outFilterMismatchNmax 2'
extra_fastqc_args: '--quiet'

# MultiQC Options
multiqc_title: 'My Ribo-seq Analysis'
skip_multiqc: false

5. JSON Parameter File

Create params.json (typically generated by nf-core launch):

{
    "input": "/path/to/samplesheet.csv",
    "contrasts": "/path/to/contrasts.csv",
    "outdir": "/path/to/results",
    "genome": "GRCh38",
    "trimmer": "trimgalore",
    "skip_trimming": false,
    "aligner": "star",
    "skip_ribocode": false,
    "multiqc_title": "My Ribo-seq Analysis"
}

6. Using Parameter Files

YAML (from create-params-file):

# Edit params.yaml, then run
nextflow run nf-core/pipeline -profile docker -params-file params.yaml

JSON (from launch):

# Download params.json from nf-core launch, then run
nextflow run nf-core/pipeline -profile docker -params-file params.json

Override parameters:

# Parameters in file can be overridden on command line
nextflow run nf-core/pipeline -profile docker -params-file params.yaml --skip_ribocode

7. Comparison of Methods

Method                                 Best For                    Output Format   Interactive   Notes
nf-core launch                         Interactive configuration   JSON            Yes           Web interface, validation, download
nf-core pipelines create-params-file   Template generation         YAML            No            Includes defaults and comments
nextflow run --help                    Documentation               Text            No            Parameter descriptions only
Manual creation                        Custom needs                YAML/JSON       No            Full control, more error-prone

Recommended workflow:

  1. First time: Use nf-core launch for interactive setup
  2. Template creation: Use nf-core pipelines create-params-file for team templates
  3. Quick edits: Edit YAML/JSON files directly
  4. Documentation: Use --help for parameter reference

8. Parameter File Best Practices

  1. Organization:
    • Group related parameters
    • Use comments in YAML files
    • Keep file structure logical
  2. Documentation:
    • Include comments explaining choices
    • Document conditional parameters
    • Note required vs. optional parameters
  3. Version Control:
    • Don’t commit parameter files with sensitive data
    • Use .gitignore for local parameter files
    • Create example parameter files for documentation
  4. Validation:
    • Validate parameter files before running
    • Use --help to check parameter names
    • Test with -profile test first

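One way to keep local parameter files out of version control (best practice 3 above) is a .gitignore entry; the file names below are hypothetical examples, not an nf-core convention:

```gitignore
# Local parameter files (may contain private paths or credentials)
params.yaml
params.json
params-*.yaml

# Keep a sanitized example file for documentation
!params.example.yaml
```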
Testing Modules

1. Test File Structure

Create test files in modules/nf-core/tool/process/tests/:

modules/nf-core/tool/process/
├── main.nf
├── meta.yml
├── environment.yml
└── tests/
    ├── main.nf.test
    ├── main.nf.test.snap
    └── nextflow.config
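A minimal tests/main.nf.test for a module follows the same pattern as the subworkflow example later in this document; the TOOL_PROCESS name, tags, and test-data path below are placeholders:

```groovy
// tests/main.nf.test — hypothetical module test sketch
nextflow_process {

    name "Test Process TOOL_PROCESS"
    script "../main.nf"
    process "TOOL_PROCESS"

    tag "modules"
    tag "modules_nfcore"
    tag "tool"

    test("test data - single_end") {
        when {
            process {
                """
                input[0] = [
                    [ id: 'test', single_end: true ],
                    [ file(params.modules_testdata_base_path + 'path/to/test.fastq.gz', checkIfExists: true) ]
                ]
                """
            }
        }
        then {
            assertAll(
                { assert process.success },
                { assert snapshot(process.out).match() }
            )
        }
    }
}
```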

2. Running Module Tests

Basic test:

cd modules/nf-core/tool/process/tests
nf-test test main.nf.test

With specific profile:

nf-test test main.nf.test -profile docker

Update snapshots:

nf-test test main.nf.test --update-snapshot

Stub tests:

nf-test itself has no stub flag; stub mode is requested inside the test file by passing Nextflow's -stub option:

test("tool - stub") {
    options "-stub"
    // ...
}

3. Test Configuration

Create tests/nextflow.config:

process {
    withName: '.*' {
        publishDir = [
            path: { "${params.outdir}/${task.process.tokenize(':')[-1]}" },
            mode: 'copy'
        ]
    }
}

params {
    outdir = 'test_results'
    modules_testdata_base_path = 'https://raw.githubusercontent.com/nf-core/test-datasets/'
}

4. Test Coverage

Test all scenarios:

  • Single-end inputs
  • Paired-end inputs
  • Optional inputs
  • Custom prefixes
  • Stub runs
  • Edge cases

See NF_TEST_BEST_PRACTICES.md for detailed guidance.


Testing Subworkflows

1. Test File Structure

Create test files in subworkflows/nf-core/subworkflow_name/tests/:

subworkflows/nf-core/subworkflow_name/
├── main.nf
├── meta.yml
└── tests/
    ├── main.nf.test
    ├── main.nf.test.snap
    └── nextflow.config

2. Subworkflow Test Example

nextflow_workflow {

    name "Test Subworkflow SUBWORKFLOW_NAME"
    script "../main.nf"
    workflow "SUBWORKFLOW_NAME"

    tag "subworkflows"
    tag "subworkflows_nfcore"
    tag "subworkflow_name"

    test("basic test") {
        when {
            workflow {
                """
                input[0] = channel.of([
                    [ id: 'test', single_end: true ],
                    [ file(params.modules_testdata_base_path + 'path/to/test.fastq.gz', checkIfExists: true) ]
                ])
                """
            }
        }

        then {
            assertAll (
                { assert workflow.success },
                { assert workflow.out.output_name[0][1] ==~ ".*/expected.*" },
                { assert snapshot(workflow.out.versions).match() }
            )
        }
    }
}

3. Running Subworkflow Tests

cd subworkflows/nf-core/subworkflow_name/tests
nf-test test main.nf.test -profile docker

Testing Workflows

1. Test Configuration Files

Create test configs in conf/:

conf/test.config:

process {
    resourceLimits = [
        cpus: 4,
        memory: '15.GB',
        time: '1.h'
    ]
}

params {
    config_profile_name        = 'Test profile'
    config_profile_description = 'Minimal test dataset to check pipeline function'

    // Input data
    input = 'https://raw.githubusercontent.com/nf-core/test-datasets/pipeline/samplesheet.csv'
    contrasts = 'https://raw.githubusercontent.com/nf-core/test-datasets/pipeline/contrasts.csv'
    
    // Reference data
    fasta = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome.fasta'
    gtf = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome.gtf'
    
    // Test-specific overrides
    min_trimmed_reads = 1000
    skip_ribotricer = true
}

conf/test_full.config:

// Full test with all modules enabled
// includeConfig paths resolve relative to the including file
includeConfig 'test.config'

params {
    config_profile_name        = 'Full test profile'
    config_profile_description = 'Full test dataset with all modules enabled'
    
    // Enable all analysis modules
    skip_ribotricer = false
    skip_ribocode = false
    skip_riboorf = false
}

2. Running Workflow Tests

Minimal test:

nextflow run . -profile test,docker --outdir test_results

Full test:

nextflow run . -profile test_full,docker --outdir test_results

Test with custom parameters:

nextflow run . -profile test,docker --outdir test_results --skip_ribocode
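Workflow-level runs can also be wrapped in nf-test using its nextflow_pipeline block, so pipeline tests live alongside module and subworkflow tests; a minimal sketch (paths and parameter names are illustrative):

```groovy
// tests/main.nf.test at the pipeline root — hypothetical sketch
nextflow_pipeline {

    name "Test pipeline"
    script "../main.nf"

    test("default parameters") {
        when {
            params {
                input  = "${projectDir}/tests/data/samplesheet.csv"
                outdir = "$outputDir"   // outputDir is provided by nf-test
            }
        }
        then {
            assert workflow.success
            assert workflow.trace.succeeded().size() > 0
        }
    }
}
```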

3. CI/CD Testing

GitHub Actions example:

name: Test Pipeline
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      # Nextflow requires Java; ubuntu-latest does not guarantee a suitable JDK
      - uses: actions/setup-java@v4
        with:
          distribution: 'temurin'
          java-version: '17'

      - name: Install Nextflow
        run: |
          wget -qO- https://get.nextflow.io | bash
          sudo mv nextflow /usr/local/bin/
      
      - name: Run test profile
        run: |
          nextflow run . -profile test,docker --outdir test_results
      
      - name: Run test_full profile
        run: |
          nextflow run . -profile test_full,docker --outdir test_results_full

4. Test Data Management

Using nf-core test datasets:

params {
    pipelines_testdata_base_path = 'https://raw.githubusercontent.com/nf-core/test-datasets/'
    modules_testdata_base_path = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/'
}
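These base-path parameters are then joined with repository-relative paths wherever test inputs are declared, for example in a test profile (the file paths below are hypothetical):

```groovy
// Composing full test-data URLs from the base paths defined above
params {
    input = params.pipelines_testdata_base_path + 'pipeline/samplesheet.csv'
    fasta = params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/genome.fasta'
}
```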

Local test data:

params {
    input = "${projectDir}/tests/data/samplesheet.csv"
    fasta = "${projectDir}/tests/data/genome.fasta"
    gtf = "${projectDir}/tests/data/genome.gtf"
}

5. Test Best Practices

  1. Test Profiles:
    • Create minimal test profile (test.config)
    • Create full test profile (test_full.config)
    • Use small test datasets
    • Set resource limits for CI/CD
  2. Test Coverage:
    • Test all major workflow paths
    • Test conditional execution
    • Test with different input types
    • Test error handling
  3. Test Data:
    • Use publicly available test datasets
    • Keep test data small but representative
    • Document test data sources
    • Version test data
  4. CI/CD Integration:
    • Run tests on every commit
    • Test with multiple profiles (docker, singularity)
    • Test on multiple platforms if possible
    • Fail fast on errors

Test Data Management

1. Test Data Sources

nf-core test datasets:

  • Publicly available on GitHub
  • Organized by pipeline and module
  • Versioned and tagged
  • URL: https://raw.githubusercontent.com/nf-core/test-datasets/

Local test data:

  • Store in tests/data/
  • Keep files small
  • Document data sources
  • Version control test data

2. Test Data Organization

tests/
├── data/
│   ├── samplesheet.csv
│   ├── genome.fasta
│   ├── genome.gtf
│   └── fastq/
│       ├── sample1_R1.fastq.gz
│       └── sample1_R2.fastq.gz
└── configs/
    └── test_local.config
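The tests/configs/test_local.config file shown in the tree is not prescribed by nf-core; one plausible sketch, pointing the pipeline at the local data above with modest resource limits:

```groovy
// tests/configs/test_local.config — hypothetical local test profile
params {
    input  = "${projectDir}/tests/data/samplesheet.csv"
    fasta  = "${projectDir}/tests/data/genome.fasta"
    gtf    = "${projectDir}/tests/data/genome.gtf"
    outdir = 'test_results'
}

process {
    resourceLimits = [ cpus: 2, memory: '6.GB', time: '1.h' ]
}
```

Run it with: nextflow run . -c tests/configs/test_local.config -profile docker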

3. Test Data Best Practices

  1. Size:
    • Keep test data minimal but representative
    • Use chromosome subsets for genomes
    • Use small FASTQ files (1000-10000 reads)
  2. Availability:
    • Use publicly accessible URLs
    • Ensure test data is stable
    • Document data sources
  3. Versioning:
    • Tag test data versions
    • Document test data changes
    • Keep test data compatible with pipeline versions

Summary Checklists

nextflow_schema.json

  • All parameters defined
  • Parameters organized into logical groups
  • Required parameters marked
  • Appropriate types and formats
  • Help text included
  • Validation patterns where needed
  • Defaults set appropriately
  • Icons included for UI

modules.json

  • Generated using nf-core CLI tools
  • All modules tracked
  • Git SHAs accurate
  • installed_by fields correct
  • Committed to version control

nextflow.config

  • All parameters defined with defaults
  • Profiles for all execution environments
  • Base config included
  • Modules config included
  • Environment variables set
  • Shell options configured
  • Manifest complete
  • Plugins configured

HPC Configurations

  • Executor configured correctly
  • Queue names appropriate
  • Resource limits match cluster
  • Container support configured
  • Storage paths correct
  • Cluster-specific options set

AWS Batch Configurations

  • Batch queue configured
  • IAM roles set up
  • Storage strategy defined (S3/EFS)
  • Container images in ECR
  • Resource mapping correct
  • Cost optimization considered

Parameter Files

  • Generated using nf-core launch or manually
  • Well-organized and documented
  • Validated before use
  • Sensitive data excluded from version control

Testing

  • Module tests created
  • Subworkflow tests created
  • Workflow test profiles created
  • Test data available
  • CI/CD integration configured
  • Tests run successfully

References