MODF-SIR: a Multi-agent Omni-modal Distilled Framework for Social Intelligence Reasoning

MODF-SIR is a lightweight, MLLM-based, distillation-augmented multi-agent framework for social intelligence reasoning. It coordinates five agents:
ELT Retriever Agent
AKD Router Agent
GRPO Grounder Agent
OMLT Reasoner Agent
TTA Reviser Agent

ELT Retriever Agent

Retrieves broad visual evidence from the full clip.

AKD Router Agent

Decides whether focused temporal grounding is needed.

GRPO Grounder Agent

Locates the most relevant temporal span.
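
One simple way to picture temporal grounding is a sliding window over per-segment relevance scores. The sketch below is only an illustration of that generic idea, not the GRPO Grounder Agent's actual method: the function name `best_span`, the score list, and the fixed window length are all assumptions introduced here.

```python
from typing import Sequence, Tuple


def best_span(scores: Sequence[float], window: int) -> Tuple[int, int]:
    """Return (start, end) of the fixed-length window with the highest total score.

    `scores` holds one hypothetical relevance score per video segment.
    `end` is exclusive. Ties keep the earliest window.
    """
    if window <= 0 or window > len(scores):
        raise ValueError("window must be between 1 and len(scores)")

    # Score of the first window, then slide one segment at a time.
    current = sum(scores[:window])
    best_sum, best_start = current, 0
    for start in range(1, len(scores) - window + 1):
        # Add the segment entering the window, drop the one leaving it.
        current += scores[start + window - 1] - scores[start - 1]
        if current > best_sum:
            best_sum, best_start = current, start
    return best_start, best_start + window
```

A real grounder would predict the span boundaries directly rather than assume a fixed window length; the fixed length here just keeps the example short.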

OMLT Reasoner Agent

Produces the reasoning result from the selected evidence.

TTA Reviser Agent

Performs self-checking and revises confidence.
This demo only illustrates the workflow; the full model is not loaded.
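
The control flow of the five agents can be sketched with stub functions, in the same spirit as this demo (no model is loaded). This is a minimal sketch under stated assumptions: the agents are assumed to run in the order they are listed, with the grounder invoked only when the router asks for it. The `State` class, every function name, the routing rule, and the placeholder values are hypothetical and are not taken from the MODF-SIR code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class State:
    """Hypothetical container passed from agent to agent."""
    question: str
    evidence: str = ""
    needs_grounding: bool = False
    span: Optional[Tuple[float, float]] = None
    answer: str = ""
    confidence: float = 0.0
    trace: List[str] = field(default_factory=list)


def retriever(state: State) -> State:
    # Stub: stands in for broad visual evidence from the full clip.
    state.evidence = "broad evidence from the full clip"
    state.trace.append("retriever")
    return state


def router(state: State) -> State:
    # Placeholder rule (an assumption): ground only for "when" questions.
    state.needs_grounding = "when" in state.question.lower()
    state.trace.append("router")
    return state


def grounder(state: State) -> State:
    # Stub: a fixed span stands in for the located temporal span.
    state.span = (0.0, 1.0)
    state.evidence = f"evidence from span {state.span}"
    state.trace.append("grounder")
    return state


def reasoner(state: State) -> State:
    # Stub: produces a result from whichever evidence was selected.
    state.answer = f"answer based on: {state.evidence}"
    state.confidence = 0.5
    state.trace.append("reasoner")
    return state


def reviser(state: State) -> State:
    # Stub: a real reviser would self-check and adjust the confidence.
    state.trace.append("reviser")
    return state


def run_pipeline(question: str) -> State:
    state = router(retriever(State(question=question)))
    if state.needs_grounding:  # grounding is optional
        state = grounder(state)
    return reviser(reasoner(state))
```

Calling `run_pipeline("When does she smile?").trace` lists all five agents, while a question without "when" skips the grounder.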