
Cascaded Pyramid Network for Multi-Person Pose Estimation

땽뚕 2021. 9. 9. 16:17

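1. Introduction
Many situations hurt accurate detection: for example, keypoints overlap one another, or a keypoint is occluded by another object and cannot be seen. To overcome these limitations and reliably detect such hard keypoints, which are difficult to localize, the paper proposes CPN, a network built from two stages.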



Human Detector + GlobalNet + RefineNet

Human Detector: The base object detector used in the paper is FPN (Feature Pyramid Network) with RoI pooling replaced by the RoIAlign from Mask R-CNN.
GlobalNet: A ResNet-based network that localizes keypoints from global features. It localizes easy keypoints well, but can get crowded or invisible keypoints wrong.
GlobalNet in detail

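If we treat the outputs of the last residual blocks as feature maps at different resolutions, we can obtain keypoint heatmaps by applying a 3x3 convolution filter to each feature map.

However, this runs into the same trade-off that motivates FPN: the shallow, high-resolution feature maps keep spatial detail but carry little semantic information, while the deep, low-resolution feature maps are semantically rich but spatially coarse. GlobalNet therefore adopts an FPN-like top-down pathway that combines the two.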
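To make the resolution side of that trade-off concrete, here is a small sketch in plain Python. The 256x192 crop size and the strides 4, 8, 16 and 32 are assumptions chosen for illustration (typical ResNet stage strides), and the two keypoint positions are made up. It only shows how coarse each level is spatially; it says nothing about semantic quality.

```python
# Assumed values: a 256x192 (height x width) person crop and ResNet stage
# strides of 4, 8, 16 and 32 for C2..C5. Neither comes from this post.
INPUT_HW = (256, 192)
STRIDES = {"C2": 4, "C3": 8, "C4": 16, "C5": 32}


def pyramid_shapes(input_hw, strides):
    """Return the (height, width) of the feature map at each pyramid level."""
    h, w = input_hw
    return {name: (h // s, w // s) for name, s in strides.items()}


def to_cell(xy, stride):
    """Map an (x, y) pixel position in the crop to its feature-map cell."""
    x, y = xy
    return (x // stride, y // stride)


shapes = pyramid_shapes(INPUT_HW, STRIDES)
for name, (fh, fw) in shapes.items():
    # One cell at this level covers stride x stride input pixels.
    print(f"{name}: {fh}x{fw} map, each cell spans {STRIDES[name]} px")

# Two hypothetical keypoints that sit 10 px apart in the crop.
wrist, elbow = (100, 120), (110, 120)
for name, s in STRIDES.items():
    merged = to_cell(wrist, s) == to_cell(elbow, s)
    print(f"{name}: keypoints fall in the same cell -> {merged}")
```

With these numbers the two keypoints stay in separate cells at C2 and C3 but collapse into one cell at C4 and C5, which is why heatmaps read off the deepest maps alone cannot separate nearby keypoints.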

RefineNet: A network that uses an online hard keypoint mining loss to localize the hard keypoints that GlobalNet failed to localize properly. It simply concatenates the features from all pyramid levels (after bringing them to a common resolution).
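The idea behind online hard keypoint mining can be sketched as follows: compute a loss per keypoint, keep only the largest few, and average those. This is a minimal plain-Python sketch, not the authors' implementation. The function names, the nested-list heatmap layout and the top_k value used in the example are all assumptions for illustration.

```python
def keypoint_l2_losses(pred, target):
    """Mean squared error per keypoint.

    pred and target are nested lists shaped [K][H][W]: one heatmap per keypoint.
    """
    losses = []
    for p_map, t_map in zip(pred, target):
        sq = [
            (p - t) ** 2
            for p_row, t_row in zip(p_map, t_map)
            for p, t in zip(p_row, t_row)
        ]
        losses.append(sum(sq) / len(sq))
    return losses


def ohkm_loss(pred, target, top_k):
    """Average the top_k largest per-keypoint losses (the 'hard' keypoints)."""
    if top_k <= 0:
        raise ValueError("top_k must be positive")
    per_keypoint = keypoint_l2_losses(pred, target)
    hardest = sorted(per_keypoint, reverse=True)[:top_k]
    return sum(hardest) / len(hardest)


# Toy example: 3 keypoints with 2x2 heatmaps, all targets peak at the top-left.
target = [[[1.0, 0.0], [0.0, 0.0]] for _ in range(3)]
pred = [
    [[1.0, 0.0], [0.0, 0.0]],  # easy keypoint: perfect prediction
    [[0.5, 0.0], [0.0, 0.0]],  # right place, weak response
    [[0.0, 0.0], [0.0, 1.0]],  # hard keypoint: peak in the wrong corner
]

per_keypoint = keypoint_l2_losses(pred, target)
loss = ohkm_loss(pred, target, top_k=2)  # top_k=2 is an illustrative choice
print(per_keypoint, loss)
```

Because the easy keypoint's loss is dropped before averaging, the gradient signal concentrates on the keypoints the network currently gets wrong.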

 

 

2. Cascaded Pyramid Network
A top-down architecture.
Top-down: the detector first extracts the bounding boxes it judges to contain a person from the given image, and then the keypoints are localized within each bounding box.
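The top-down flow described above can be sketched as a loop over detected boxes. This is only a sketch under stated assumptions: detect_people and predict_heatmaps are hypothetical callables standing in for the human detector and the keypoint network, and the box format (x0, y0, x1, y1) and the cell-centre mapping back to image coordinates are illustrative choices, not something the paper or this post specifies.

```python
def heatmap_argmax(heatmap):
    """Return the (x, y) cell with the highest response in a 2D heatmap."""
    best_xy, best_val = (0, 0), heatmap[0][0]
    for y, row in enumerate(heatmap):
        for x, val in enumerate(row):
            if val > best_val:
                best_xy, best_val = (x, y), val
    return best_xy


def to_image_coords(peak_xy, heatmap_hw, box):
    """Map a heatmap cell (its centre) back to pixel coordinates in the image."""
    x0, y0, x1, y1 = box
    hm_h, hm_w = heatmap_hw
    px, py = peak_xy
    return (
        x0 + (px + 0.5) * (x1 - x0) / hm_w,
        y0 + (py + 0.5) * (y1 - y0) / hm_h,
    )


def top_down_pose(image, detect_people, predict_heatmaps):
    """Stage 1: find person boxes. Stage 2: localize keypoints inside each box."""
    poses = []
    for box in detect_people(image):
        heatmaps = predict_heatmaps(image, box)  # one heatmap per keypoint
        keypoints = [
            to_image_coords(heatmap_argmax(hm), (len(hm), len(hm[0])), box)
            for hm in heatmaps
        ]
        poses.append({"box": box, "keypoints": keypoints})
    return poses


# Stand-ins for the real models, used only to exercise the pipeline.
def fake_detector(image):
    return [(10, 20, 50, 100)]  # one person box: x0, y0, x1, y1


def fake_keypoint_net(image, box):
    # A single 4x2 (height x width) heatmap peaking at x=1, y=2.
    return [[[0.0, 0.1], [0.0, 0.2], [0.1, 0.9], [0.0, 0.0]]]


poses = top_down_pose(image=None, detect_people=fake_detector,
                      predict_heatmaps=fake_keypoint_net)
print(poses)
```

Since every person is processed independently, the keypoint network only ever has to solve single-person pose estimation, at the cost of depending on the quality of the detected boxes.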

 
