I have implemented the below algorithm to solve a Depth AI use case - Social Distance Monitoring - using monocular images. The output of the algorithm is shown at the bottom.
a) Feed the input image frames every couple of seconds
b) Fuse the disparity map with the object detection output, similar to visual sensor fusion. More concretely, find the depth of those pixels inside each object bounding box to compute the median to estimate the object depth.
c) Find the centroid of each object bounding box and map the corresponding depth to the object.
d) Across each object in the image, find the depth difference and also (x, y) axis difference. Multiply depth difference with a scaling factor.
e)Use the Pythagoras theorem to compute the Euclidean distance between each bounding box considering depth difference as one axis. The scaling factor for depth needs to be estimated during the initial camera calibration.
# Detections contains bounding boxes using object detection model
boxcount = 0
depths = []
bboxMidXs = []
bboxMidYs = []
# This is computed to reflect real distance during initial camera calibration
scalingFactor = 1000
# Depth scaling factor is based on one-time cam calibration
for detection in detections:
xmin, ymin, xmax, ymax = detection
depths.append(np.median(disp[ymin:ymax, xmin:xmax]))
bboxMidXs.append((xmin+xmax)/2)
bboxMidYs.append((ymin+ymax)/2)
size = disp.shape[:2]
# disp = draw_detections(disp, detection)
xmin = max(int(detection[0]), 0)
ymin = max(int(detection[1]), 0)
xmax = min(int(detection[2]), size[1])
ymax = min(int(detection[3]), size[0])
boxcount = boxcount + 1
cv2.rectangle(disp, (xmin, ymin), (xmax, ymax), (0,255,0), 2)
cv2.putText(disp, '{} {}'.format('person', boxcount),
(xmin, ymin - 7), cv2.FONT_HERSHEY_COMPLEX, 0.6, (0,255,0), 1)
for i in range(len(bboxMidXs)):
for j in range(i+1, len(bboxMidXs)):
dist = np.square(bboxMidXs[i] - bboxMidXs[j]) +
np.square((depths[i]-depths[j])*scalingFactor)
# check whether less than 200 to detect
# social distance violations
if np.sqrt(dist) < 200:
color = (0, 0, 255)
thickness = 3
else:
color = (0, 255, 0)
thickness = 1
cv2.line(original_img, (int(bboxMidXs[i]), int(bboxMidYs[i])),
(int(bboxMidXs[j]), int(bboxMidYs[j])), color, thickness)
Input Image:
Disparity Map - Object Detection Fusion:
Output Image:
The people who don't adhere to minimum threshold distance are identified to be violating the social distancing norm, using a monocular camera image.
Discussions
Become a Hackaday.io Member
Create an account to leave a comment. Already have an account? Log In.