POS-ISP: Pipeline Optimization at the Sequence Level for Task-aware ISP

Abstract

Task-aware ISP optimization models image signal processing (ISP) as a composition of predefined operations and adapts it to task-specific objectives, yet jointly optimizing the module sequence and its parameters remains challenging. Recent methods employ neural architecture search (NAS) or step-wise reinforcement learning (RL), but NAS introduces a training-inference mismatch, and step-wise RL suffers from unstable training and high computational overhead because it makes a decision at every stage. In this paper, we propose POS-ISP, a sequence-level RL framework that reformulates modular ISP optimization as a global sequence prediction problem. It predicts the entire module sequence and its parameters in a single forward pass and optimizes the pipeline using a terminal task reward, removing intermediate supervision and redundant executions to enhance stability and efficiency. Extensive experiments across multiple downstream tasks demonstrate that POS-ISP consistently improves task performance while reducing computational cost and memory usage. These results highlight sequence-level joint optimization as a stable and efficient paradigm for task-aware ISP design.

Key Idea

💡 Direct task reward learning at the sequence level

▶ Predict the entire ISP pipeline at once → Improves computational efficiency

▶ Optimize with the final task reward → Eliminates unstable reward estimation and stabilizes training
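
One way to make these two points concrete is a REINFORCE-style objective over whole module sequences. The page does not state which estimator POS-ISP uses, so the symbols below ($\pi_\theta$, $m_t$, $R$, $b$) are assumptions for illustration rather than the authors' formulation.

```latex
\begin{align}
\pi_\theta(s) &= \prod_{t=1}^{T} \pi_\theta(m_t \mid m_{<t}),
  \qquad s = (m_1, \dots, m_T) \\
J(\theta) &= \mathbb{E}_{s \sim \pi_\theta}\big[ R(s) \big] \\
\nabla_\theta J(\theta) &= \mathbb{E}_{s \sim \pi_\theta}\Big[ \big( R(s) - b \big)
  \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(m_t \mid m_{<t}) \Big]
\end{align}
```

Here $R(s)$ stands for the task reward obtained after the complete pipeline has run once, and $b$ is an optional baseline. Under this reading, no per-step reward or value estimate appears, which is the contrast with step-wise RL that the key idea describes.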

Highlights

POS-ISP achieves state-of-the-art performance across multiple downstream tasks while maintaining low computational overhead.

Detection Gain: +0.6 mAP@0.5:0.95 improvement over the previous best method on LOD-Dark.

Segmentation Gain: +5.0 mAP@0.5:0.95 improvement over the previous best method on LIS-Dark.

Model Size: 0.53M. Lightweight predictor with only 0.53M parameters.

Predictor Runtime: 1.55 ms. Runtime of the full ISP pipeline prediction, measured on a single NVIDIA RTX 2080 Ti.

Framework

POS-ISP constructs a task-adaptive ISP pipeline using two predictors: a sequence predictor that determines the ordered module sequence, and a parameter predictor that estimates image-adaptive parameters for the selected pipeline.

Sequence Predictor

The sequence predictor autoregressively models the probability of ISP module sequences and predicts the next module conditioned on the previously selected ones. It is implemented with a GRU-based recurrent architecture, which captures inter-module dependencies while enabling efficient sequence-level prediction.
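
The autoregressive decoding described above can be sketched in plain Python. This is a minimal illustration with an untrained, bias-free GRU cell; the module names, hidden size, `max_len` cap, and the `TinyGRUPolicy` and `sample_sequence` helpers are all hypothetical and are not taken from the POS-ISP implementation.

```python
import math
import random

# Hypothetical candidate modules; the page does not list the actual module set.
MODULES = ["exposure", "white_balance", "gamma", "denoise", "sharpen"]
EOS = len(MODULES)        # output index meaning "stop"
START = len(MODULES) + 1  # input-only index that begins decoding
N_IN = len(MODULES) + 2   # one-hot input size: modules + EOS + START
HIDDEN = 8                # assumed hidden size


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinyGRUPolicy:
    """Untrained GRU cell (biases omitted) plus a linear head over modules and EOS."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.Wz = rand_matrix(HIDDEN, N_IN + HIDDEN, rng)  # update gate
        self.Wr = rand_matrix(HIDDEN, N_IN + HIDDEN, rng)  # reset gate
        self.Wh = rand_matrix(HIDDEN, N_IN + HIDDEN, rng)  # candidate state
        self.Wo = rand_matrix(len(MODULES) + 1, HIDDEN, rng)  # output logits

    def step(self, token, h):
        x = [1.0 if i == token else 0.0 for i in range(N_IN)]
        z = [sigmoid(v) for v in matvec(self.Wz, x + h)]
        r = [sigmoid(v) for v in matvec(self.Wr, x + h)]
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(v) for v in matvec(self.Wh, x + gated)]
        h_new = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
        return matvec(self.Wo, h_new), h_new


def sample_sequence(policy, rng, max_len=5):
    """Sample one module per step until EOS or max_len; return names and log-prob."""
    h = [0.0] * HIDDEN
    token, names, log_prob = START, [], 0.0
    for _ in range(max_len):
        logits, h = policy.step(token, h)
        probs = softmax(logits)
        token = rng.choices(range(len(probs)), weights=probs)[0]
        log_prob += math.log(probs[token])
        if token == EOS:
            break
        names.append(MODULES[token])
    return names, log_prob


names, log_prob = sample_sequence(TinyGRUPolicy(seed=0), random.Random(1))
print(names, round(log_prob, 3))
```

The returned log-probability is the sum of the per-step log-probabilities, which is the quantity a sequence-level policy-gradient update would scale by the terminal reward. The sketch lets a module repeat within a sequence; whether POS-ISP allows that is not stated on this page.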

Parameter Predictor

The parameter predictor extracts a compact image representation with a lightweight CNN encoder and predicts parameter sets for all candidate ISP modules. During pipeline construction, only the parameters corresponding to the selected module sequence are applied.
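
A small sketch of the "predict for all candidates, apply only the selected ones" pattern follows. The three toy operations, the `predict_all_params` stand-in (which replaces the CNN encoder with a mean-brightness statistic), and every parameter name are hypothetical; they only illustrate the data flow.

```python
def apply_gain(pixels, p):
    return [min(1.0, x * p["gain"]) for x in pixels]


def apply_gamma(pixels, p):
    return [x ** p["gamma"] for x in pixels]


def apply_offset(pixels, p):
    return [min(1.0, max(0.0, x + p["offset"])) for x in pixels]


# Hypothetical candidate set; each entry maps a module name to its operation.
CANDIDATE_OPS = {"gain": apply_gain, "gamma": apply_gamma, "offset": apply_offset}


def predict_all_params(pixels):
    """Stand-in for the CNN encoder and heads: one pass yields parameters
    for every candidate module, whether or not it ends up in the pipeline."""
    mean = sum(pixels) / len(pixels)
    return {
        "gain": {"gain": 0.5 / max(mean, 1e-6)},  # push mean brightness toward 0.5
        "gamma": {"gamma": 0.5},
        "offset": {"offset": 0.05},
    }


def run_pipeline(pixels, sequence):
    params = predict_all_params(pixels)  # computed once from the input image
    out = list(pixels)
    for name in sequence:                # only the selected modules are applied
        out = CANDIDATE_OPS[name](out, params[name])
    return out


dark = [0.1, 0.2, 0.3]  # pixel intensities in [0, 1]
print(run_pipeline(dark, ["gain", "gamma"]))
```

Because the parameters are computed once from the input rather than re-predicted after each module, changing the sequence changes which entries of `params` are read but not how many predictor passes are needed.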

Detailed architecture of the sequence predictor. It predicts the ISP pipeline autoregressively until the end-of-sequence token is produced.

Quantitative Results

POS-ISP consistently outperforms prior task-aware ISP methods on object detection and instance segmentation.

Object Detection

Method	LOD-Dark mAP@0.5:0.95	LOD-Dark mAP@0.5	LOD-Dark mAP@0.75	LOD-All mAP@0.5:0.95	LOD-All mAP@0.5	LOD-All mAP@0.75
Input RAW	44.1	67.7	47.5	54.3	71.4	57.1
Camera ISP	37.6	55.4	41.6	49.6	65.5	53.2
DRL-ISP	44.2	67.8	48.4	54.5	72.1	58.5
ReconfigISP	43.7	66.7	47.8	51.0	68.5	54.4
AdaptiveISP	47.2	71.4	51.7	56.8	73.5	61.4
POS-ISP (Ours)	47.8	72.1	52.8	57.2	73.9	61.7

Instance Segmentation

Method	LIS-Dark mAP@0.5:0.95	LIS-Dark mAP@0.5	LIS-Dark mAP@0.75	LIS-All mAP@0.5:0.95	LIS-All mAP@0.5	LIS-All mAP@0.75
Input RAW	27.8	45.6	27.9	32.6	52.3	33.0
Camera ISP	20.1	35.1	20.0	30.4	48.9	31.0
DRL-ISP	27.1	44.7	27.4	23.6	40.1	23.8
ReconfigISP	24.2	40.8	24.5	31.1	51.2	31.0
AdaptiveISP	25.2	42.3	25.2	32.4	52.3	32.5
POS-ISP (Ours)	32.1	51.8	32.1	34.9	55.9	34.9

Quantitative comparisons on object detection and instance segmentation benchmarks. POS-ISP achieves the best results on all reported metrics.
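
The headline gains can be recomputed from the mAP@0.5:0.95 columns of the two tables. The sketch below assumes that "previous best method" refers to the prior task-aware ISP methods (DRL-ISP, ReconfigISP, AdaptiveISP) and not to the Input RAW or Camera ISP baselines; the helper name `gain_over_best_prior` is hypothetical.

```python
# mAP@0.5:0.95 on the dark splits, copied from the tables above.
lod_dark = {"DRL-ISP": 44.2, "ReconfigISP": 43.7, "AdaptiveISP": 47.2, "POS-ISP": 47.8}
lis_dark = {"DRL-ISP": 27.1, "ReconfigISP": 24.2, "AdaptiveISP": 25.2, "POS-ISP": 32.1}


def gain_over_best_prior(scores, ours="POS-ISP"):
    """Return the strongest prior method and our margin over it, to one decimal."""
    prior = {name: value for name, value in scores.items() if name != ours}
    best_name = max(prior, key=prior.get)
    return best_name, round(scores[ours] - prior[best_name], 1)


print("LOD-Dark:", gain_over_best_prior(lod_dark))
print("LIS-Dark:", gain_over_best_prior(lis_dark))
```

Under that assumption the strongest prior method is AdaptiveISP on LOD-Dark and DRL-ISP on LIS-Dark. If Input RAW (27.8) were counted as the reference on LIS-Dark, the margin would be 4.3 instead.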

Efficiency

Since ISP operates as a preprocessing stage before downstream vision models, it must run under strict computational and memory constraints. POS-ISP minimizes prediction overhead while maintaining strong task performance.

Method	Params (M)	MACs (M)	Peak GPU Memory (MB)	Runtime (ms)	FPS
DRL-ISP	6.57	155.3	1013.9	15.71	63.65
AdaptiveISP	7.18	70.2	39.6	12.72	78.62
POS-ISP (Ours)	0.53	15.1	14.4	1.55	645.16

All results are measured on a single NVIDIA RTX 2080 Ti with an input resolution of 512 × 512. Runtime excludes the execution time of the ISP modules.

On-device Runtime

Method	Latency (ms)	FPS
AdaptiveISP	29.0	34.48
POS-ISP	7.21	138.70

On-device runtime measured on a Galaxy S10 CPU. POS-ISP runs about 4× faster than AdaptiveISP, enabling real-time mobile deployment with negligible overhead.
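
The FPS columns and the speedup figure follow directly from the reported runtimes. The short check below recomputes them from the two efficiency tables; the dictionary layout and the helper names `fps` and `ratio` are illustrative only.

```python
# Values copied from the GPU efficiency table (RTX 2080 Ti, 512 x 512 input).
gpu = {
    "DRL-ISP":     {"params_m": 6.57, "macs_m": 155.3, "mem_mb": 1013.9, "runtime_ms": 15.71},
    "AdaptiveISP": {"params_m": 7.18, "macs_m": 70.2,  "mem_mb": 39.6,   "runtime_ms": 12.72},
    "POS-ISP":     {"params_m": 0.53, "macs_m": 15.1,  "mem_mb": 14.4,   "runtime_ms": 1.55},
}
# Values copied from the on-device table (Galaxy S10 CPU).
mobile_latency_ms = {"AdaptiveISP": 29.0, "POS-ISP": 7.21}


def fps(runtime_ms):
    """Frames per second implied by a per-frame runtime in milliseconds."""
    return 1000.0 / runtime_ms


def ratio(table, key, baseline, ours="POS-ISP"):
    """How many times larger the baseline's cost is than ours for one metric."""
    return table[baseline][key] / table[ours][key]


for name, row in gpu.items():
    print(name, "FPS:", round(fps(row["runtime_ms"]), 2))

for key in ("params_m", "macs_m", "mem_mb", "runtime_ms"):
    print(key, "AdaptiveISP / POS-ISP:", round(ratio(gpu, key, "AdaptiveISP"), 1))

mobile_speedup = mobile_latency_ms["AdaptiveISP"] / mobile_latency_ms["POS-ISP"]
print("On-device speedup:", round(mobile_speedup, 2))
```

The FPS entries in both tables equal 1000 divided by the runtime in milliseconds, and the on-device ratio works out to roughly 4, matching the "about 4×" statement.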

Why is POS-ISP efficient?

📌 Fixed ISP sequence at inference — only the parameter predictor runs
📌 Lightweight CNN architecture — compact and efficient model
📌 Single-pass parameter prediction — all module parameters estimated in one forward pass

Qualitative Results on Object Detection

Representative comparisons on low-light object detection scenes. POS-ISP improves visibility and produces more reliable downstream detections than prior task-aware ISP methods.

Qualitative Results on Instance Segmentation

Representative comparisons on low-light instance segmentation. POS-ISP preserves object structure more clearly and yields higher-quality masks under challenging illumination.

BibTeX

@inproceedings{won2026posisp,
  title     = {POS-ISP: Pipeline Optimization at the Sequence Level for Task-aware ISP},
  author    = {Won, Jiyun and Yang, Heemin and Kim, Woohyeok and Ok, Jungseul and Cho, Sunghyun},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
  year      = {2026}
}