process:
- trained:
  - swint_mcm
  - swint_cb_mcm
  - swint_cb_scm
  - resdcn_scm
  - resdcn_cb_scm
  - resdcn_cb_mcm
- training:
- evaluated:
  - swint_mcm
  - swint_cb_scm
  - swint_cb_mcm
  - resdcn_cb_scm
  - resdcn_cb_mcm
  - resdcn_scm
- evaluating:
  - resdcn_mcm

Evaluated on 1000 samples.

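The run names above appear to encode the configuration columns of the results table. The sketch below is one way to decode them. The mapping is an assumption inferred from the names and table headers, not something these notes confirm: `swint`/`resdcn` as the backbone, `cb` as class balance, `mcm` as "MCM", and `scm` as "Single". `RunConfig` and `parse_run_name` are hypothetical helpers, not part of the project.

```python
from typing import NamedTuple


class RunConfig(NamedTuple):
    backbone: str
    in_aug: str
    class_balance: bool


# Assumed mappings, inferred from the run names and the table headers.
BACKBONES = {"swint": "SwinT", "resdcn": "ResDCN"}
IN_AUG = {"mcm": "MCM", "scm": "Single"}


def parse_run_name(name: str) -> RunConfig:
    """Split '<backbone>[_cb]_<aug>' into its three assumed fields."""
    parts = name.split("_")
    if len(parts) not in (2, 3):
        raise ValueError(f"unrecognized run name: {name!r}")
    if parts[0] not in BACKBONES or parts[-1] not in IN_AUG:
        raise ValueError(f"unrecognized run name: {name!r}")
    if len(parts) == 3 and parts[1] != "cb":
        raise ValueError(f"unrecognized run name: {name!r}")
    # A middle 'cb' token is read as "class balance: yes"; its absence as "no".
    return RunConfig(BACKBONES[parts[0]], IN_AUG[parts[-1]], len(parts) == 3)
```

For example, `parse_run_name("swint_cb_mcm")` would correspond to the SwinT / MCM / class balance "yes" row under these assumptions.
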
KAF-Net: Part-level Kinematic Relation Graph Generation For Robot Manipulation

| backbone | In. Aug | class balance | $mAP_{50}$ | $R@10/20/40$ | $mR@10/20/40$ |
|---|---|---|---|---|---|
| SwinT | MCM | yes | 24.2 | 33.4/53.8/73.9 | 25.1/52.9/69.2 |
| SwinT | MCM | no | 23.4 | 43.5/62.3/78.7 | 26.9/55.3/67.9 |
| SwinT | Single | yes | 23.5 | 32.3/51.6/69.2 | 34.4/52.9/65.4 |
| SwinT | Single | no | 24.2 | 32.7/51.4/68.6 | 25.2/53.5/66.3 |
| ResDCN | MCM | yes | 23.1 | 32.8/52.8/70.1 | 24.8/52.8/67.7 |
| ResDCN | MCM | no | 23.4 | 36.9/52.5/69.9 | 23.9/47.8/62.4 |
| ResDCN | Single | yes | 20.5 | 39.9/54.3/69.3 | 37.7/51.2/65.1 |
| ResDCN | Single | no | 22.3 | 33.8/52.4/69.4 | 35.7/55.6/66.7 |

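
For readers unfamiliar with the $R@K$ and $mR@K$ columns, the sketch below shows how these metrics are commonly defined for relation graph evaluation. It is a simplified illustration, not this project's evaluation code. It assumes a relation counts as a hit on exact `(subject, predicate, object)` label match among the top-K ranked predictions and ignores box or mask overlap, which a real evaluator would also check. The function names and the predicate labels `rel_a`/`rel_b` are hypothetical.

```python
from collections import defaultdict
from typing import List, Sequence, Set, Tuple

Triplet = Tuple[str, str, str]  # (subject, predicate, object)


def recall_at_k(gt: Sequence[Set[Triplet]],
                pred: Sequence[List[Triplet]], k: int) -> float:
    """Average over images of the fraction of GT triplets in the top-k."""
    per_image = []
    for gt_i, pred_i in zip(gt, pred):
        if not gt_i:
            continue  # images without ground-truth relations are skipped
        top_k = set(pred_i[:k])
        per_image.append(len(gt_i & top_k) / len(gt_i))
    return sum(per_image) / len(per_image) if per_image else 0.0


def mean_recall_at_k(gt: Sequence[Set[Triplet]],
                     pred: Sequence[List[Triplet]], k: int) -> float:
    """Recall computed per predicate class, then averaged over classes."""
    per_predicate = defaultdict(list)
    for gt_i, pred_i in zip(gt, pred):
        top_k = set(pred_i[:k])
        hits_by_predicate = defaultdict(list)
        for triplet in gt_i:
            hits_by_predicate[triplet[1]].append(triplet in top_k)
        for predicate, hits in hits_by_predicate.items():
            per_predicate[predicate].append(sum(hits) / len(hits))
    if not per_predicate:
        return 0.0
    class_recalls = [sum(v) / len(v) for v in per_predicate.values()]
    return sum(class_recalls) / len(class_recalls)


# One image: two 'rel_a' relations and one 'rel_b' relation in the ground truth.
gt = [{("door1", "rel_a", "cabinet"),
       ("door2", "rel_a", "cabinet"),
       ("drawer", "rel_b", "cabinet")}]
pred = [[("door1", "rel_a", "cabinet"),
         ("door2", "rel_a", "cabinet"),
         ("drawer", "rel_b", "cabinet")]]
```

With `k=2` only the two `rel_a` relations are retrieved, so `recall_at_k` gives 2/3 while `mean_recall_at_k` gives 0.5, because the missed `rel_b` class weighs as much as `rel_a`. The table reports such values as percentages, so $mR@K$ rewards models that do not neglect rare relation classes.
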

On Swin Transformer with class balance:

| Image-Mask Branch | $mAP_{50}$ | $mR@10/20/40$ |
|---|---|---|
| Yes | | |
| No | | |


| VLM | VI | VR | unVR |
|---|---|---|---|
| Gemini 2.5 Flash | | | |
| Pixtral 12B | | | |