Pose2Art Project
Jerry Isdale, MIOAT.com
Maui Institute of Art and Technology
notes started oct 12 2022
This Page is now superseded by the Project
Basic idea
Create a low-cost, (mostly) open source 'smart' camera system to capture human pose and use it to drive interactive immersive art installations. Yes, it's kinda like the Microsoft Kinect/Azure 'product', but DIY and open to upgrading.
- use one or more smart cameras to capture Human Pose from video stream
- stream that data (multicast?) – pose points (OSC), raw frames, skeleton overlay (video), outline, etc.
- receive the stream to drive a CGI rendering engine using the skeleton data, etc.
- project that stream on a wall (or use all of the above streams as input to a video switcher/overlay)
Hardware:
- Edge Computing: Raspberry Pi 4 and Nvidia Jetson Nano are the target platforms I have. Google Coral may be a better low-cost alternative to the Raspberry Pi 4.
- Camera: Does not need to be high resolution; a USB webcam or a CSI-interface camera (ribbon cable, rPi camera, Arducam, etc.) will do.
- Network: Wired Ethernet is preferred over WiFi for installations to avoid interference. A single cable can connect the edge device directly to the PC, though the software configuration for that is a bit tricky.
- Rendering Engine: a decently powerful computer with a good graphics card running TouchDesigner, Unity, Unreal, or similar visual software
- Display: either a video wall or a projection setup
Options:
Multiple cameras could be used to create 3D pose tracking.
Stream video from the edge cameras to the rendering engine; not yet sure which protocol is usable (NDI, below, is one candidate).
Track multiple people, including people in contact with each other (dancing, acro-yoga, etc.).
Depth camera: cameras that provide point cloud depth data could be used.
STATUS:
This is very much a work in progress (with uneven progress).
18 Nov: I have gotten the camera/pose capture working and feeding points over the network to the PC via OSC, which feeds the data into TouchDesigner.
Currently I'm taking notes on both the Pi and the PC (with multiple boot SD cards for different OSes on the Pi).
Example Art Installations
(insert links to still/video of pose tracking in interactive environments)
----
Oct 14
10 steps in Pose2Art process
(make a graphic of this flow)
- image capture
- pose extract
- pose render (optional)
- stream send (pose data, optionally video)
- physical send (transport)
- physical rcv
- stream receive
- stream process
- render/overlay
- project/display
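A very rough sketch of how the edge-device half of that flow hangs together in code. Everything below is a stub/placeholder rather than actual Pose2Art code; the real capture, pose, and OSC pieces are surveyed further down.

```python
# Placeholder outline of steps 1-5 (capture -> pose -> send) on the edge device.
# All three helpers are stubs standing in for the real camera, model, and OSC code.
import time

def capture_frame():
    return None                        # 1. image capture (webcam / CSI camera)

def extract_pose(frame):
    return [("nose", 0.5, 0.5, 0.9)]   # 2. pose extract (e.g. a TFLite model): (name, x, y, score)

def stream_pose(points):
    pass                               # 4-5. pack as OSC messages and push them out the wire

for _ in range(300):                   # ~10 seconds at 30 fps
    points = extract_pose(capture_frame())
    stream_pose(points)
    time.sleep(1 / 30)                 # pace roughly at camera frame rate
```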
Project Plan and this Page
- This doc surveys the tech options for each of the 10 stages listed above
- survey existing solutions, focus on newer ones with multi-person options
- find one that runs on my rPi4
- build out a demo using rPi4 and TouchDesigner for rendering.
Nov 20 status:
The QEngineering Raspberry Pi image comes with TensorFlow Lite properly installed, along with a C++ demo of pose capture. Adding Libosc++ got it emitting OSC data. A fair bit of mucking around with static IPs, routes, and firewalls was required, but it finally got talking to the PC. Found at least one TouchDesigner example of reading OSC pose data and got it working. Looking into other demos, like a Kinect driving a TD Theremin simulator.
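The static-IP part of that boils down to something like the following on Raspberry Pi OS. This is a generic sketch (example addresses only, not necessarily the exact config used here); the PC end gets a matching static IP (e.g. 192.168.10.1) and a firewall rule allowing the OSC/UDP port.

```
# /etc/dhcpcd.conf on the Pi (example addresses only)
interface eth0
static ip_address=192.168.10.2/24
static routers=192.168.10.1
```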
OSC (OpenSoundControl) is currently chosen as the data transport. Its messages are VERY much user defined, and I have yet to see any 'standard' for how to name the pose data; the Kinect tracked point names might be useful.
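For now a home-grown scheme seems unavoidable. One possible layout (purely our own convention, one message per tracked point, person index in the address, using the 17 COCO keypoint names):

```
/pose/p0/nose           x y score
/pose/p0/left_shoulder  x y score
...
/pose/p0/right_ankle    x y score
/pose/numPersons        1
```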
Survey of System Demos
Web searching turned up a LOT of links on pose estimation using machine learning. Some include source code repositories and documentation; others are academic papers or other non-replicable demos. This section summarizes some of them, in the hope that one can be made to actually work.
30 oct 2022: links below this update
Attempting to run the demos has been interesting, with lots of classic dependency issues. Some Python pose examples were made to work, but alas very slowly. The QEngineering rPi example is in C++ and its basics ran much faster (6-10 fps) than the Python ones. It (and many other examples) uses the TensorFlow Lite implementations to run on the rPi. TFLite seems decent, and there are both pretrained and reduced models available, as well as the TF blog on how to train a new set on something more than the COCO yoga and dance poses. Those are options to explore after the basics are done.
Next steps are putting the pose data into a Message (OSC based) and sending that over network (UDP) to a 'server'.
More dependency issues over Socket/ASIO and OSC libraries, but some progress.
There is no 'standard' for the OSC pose messages. There are examples of pose data as OSC messages, JSON, XML, etc., with either the 17-point or 33-point models, and even some showing multi-person tracking data. Since we are writing our own code, we can define the format on both ends. The receiver will likely be TouchDesigner, at least for the first prototype.
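A minimal sketch of what the sending side could look like in Python with osc4py3 (linked below under OSC). Our C++ edge code uses Libosc++ instead, and the address scheme, IP, and port here are just the home-grown choices from above, not a standard.

```python
# Minimal OSC sender sketch using osc4py3 (addresses/IP/port are our own choices).
from osc4py3.as_eventloop import osc_startup, osc_udp_client, osc_send, osc_process, osc_terminate
from osc4py3 import oscbuildparse

osc_startup()
osc_udp_client("192.168.10.1", 5005, "render_pc")   # the PC running TouchDesigner

def send_keypoints(keypoints):
    """keypoints: iterable of (name, x, y, score) for one tracked person."""
    for name, x, y, score in keypoints:
        msg = oscbuildparse.OSCMessage("/pose/p0/" + name, ",fff", [x, y, score])
        osc_send(msg, "render_pc")
    osc_process()   # actually push the queued messages onto the wire

send_keypoints([("nose", 0.51, 0.22, 0.93)])
osc_terminate()
```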
Links to Pose Tracking demos with code
- SAT LivePose
https://sat-mtl.gitlab.io/documentation/livepose/
https://sat-mtl.gitlab.io/documentation/livepose/en/contents.html
- LivePose is a command line tool which tracks people's skeletons from an RGB or grayscale video feed (live or not), applies various filters to them (for detection, selection, improving the data, etc.) and sends the results through the network (OSC and Websocket are currently supported).
- Requires: Ubuntu 20.04, nvidia gpu (jetson, pc-rtx, etc)
- Other parts of SAT-MTL's GitLab site mention rpi distribution: https://gitlab.com/sat-mtl/distribution/mpa-bullseye-arm64-rpiL
- there are indications that LivePose has an rPi distribution and outputs OSC, so we started from that
- unfortunately it seems LivePose may not work on the Jetson Nano or rPi4, so we moved on to other options
- rpi TensorFlowLite, PoseNet
Ethan Dell's rpi_pose_estimation builds on TensorFlow Lite and seems to be simple Python with a webcam
https://github.com/ecd1012/rpi_pose_estimation
https://medium.com/analytics-vidhya/pose-estimation-on-the-raspberry-pi-4-83a02164eb8e
uses OpenPose
ActionAi and YogAI - Jetson
ActionAI is the follow-on to YogAI. The latter was touted as using the rPi, while the newer ActionAI uses the Jetson Nano.
https://github.com/smellslikeml/ActionAI
https://www.hackster.io/yogai/yogai-smart-personal-trainer-f53744
- web TensorFlow OpenPose js
There are several projects that use browser-based (JavaScript) webcam pose estimation. These might be worth looking into, though more for their use of the underlying pose tools.
https://github.com/nishagandhi/OpenPose_PythonOpenCV
https://www.youtube.com/watch?v=DpGHWa2gOcc phoneCam+touchdesigner
April Tags https://github.com/ju1ce/April-Tag-VR-FullBody-Tracker
MediaPipe https://github.com/ju1ce/Mediapipe-VR-Fullbody-Tracking
FreeMoCap
Active Oct 2022; pre-alpha
The FreeMoCap Project: A free-and-open-source, hardware-and-software-agnostic, minimal-cost, research-grade, motion capture system and platform for decentralized scientific research, education, and training
https://github.com/freemocap/freemocap
OpenPose
used in Ethan rPi pose
https://cmu-perceptual-computing-lab.github.io/openpose/web/html/doc/
https://viso.ai/deep-learning/openpose/
https://www.geeksforgeeks.org/openpose-human-pose-estimation-method/
https://github.com/CMU-Perceptual-Computing-Lab/openpose Active early 2022
https://www.youtube.com/watch?v=d3VrS4kgTn0
https://www.arxiv-vanity.com/papers/1812.08008/ OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
Steam webcam
https://www.youtube.com/watch?v=bQCC2HQX2u8
https://store.steampowered.com/app/1366950/Driver4VR/
Capture Systems:
- USB webcam
- rPi (CSI bus) camera
- Arducam stereo (CSI bus)
- Intel RealSense (USB cam with depth sensor)
ML Pose Engines/Systems
- MobileNet v2 (ML network architecture)
- TensorFlow
- PyTorch
TensorFlow/TensorFlow Lite
https://pimylifeup.com/raspberry-pi-tensorflow-lite/
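To give a feel for how little code the TFLite side needs, here is a minimal single-frame sketch. The model filename is a placeholder; the input size and dtype depend on which pose .tflite you grab (e.g. MoveNet Lightning uses 192x192 and returns a (1, 1, 17, 3) array of y, x, score per COCO keypoint).

```python
# Minimal single-frame pose inference sketch with TensorFlow Lite on the Pi.
# "movenet_singlepose_lightning.tflite" is a placeholder filename.
import cv2
import numpy as np
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="movenet_singlepose_lightning.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

cap = cv2.VideoCapture(0)                    # USB webcam
ok, frame = cap.read()
if ok:
    size = inp['shape'][1]                   # e.g. 192 for MoveNet Lightning
    img = cv2.resize(frame, (size, size))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # note: some float32 model variants also expect normalized input values
    img = np.expand_dims(img, 0).astype(inp['dtype'])
    interpreter.set_tensor(inp['index'], img)
    interpreter.invoke()
    keypoints = interpreter.get_tensor(out['index'])  # (1, 1, 17, 3): y, x, score per keypoint
cap.release()
```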
PoseNet OpenCV
nVidia BodyPoseNet, TensorRT
https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/purpose_built_models/bodyposenet.html
SAT LivePose https://gitlab.com/sat-mtl/tools/livepose
MMPose https://github.com/open-mmlab/mmpose
stream protocols - video, data (osc)
Video Streaming: NDI
https://www.newtek.com/ndi/applications/
https://www.mgraves.org/2020/05/dicaffiene-using-a-raspberry-pi-4-to-display-an-ndi-stream/
https://github.com/rbalykov/ndi-rpi
Data: OpenSoundControl (OSC)
OpenSoundControl (OSC) is a data transport specification (an encoding) for realtime message communication among applications and hardware.
- Python, C++, C# bindings
- available for Unreal, Unity and TouchDesigner
- text based, hierarchical tags for messages
- encoding requires agreement on both ends
- no single standard for MoCap pose messages
https://opensoundcontrol.stanford.edu/
osc4py: https://osc4py3.readthedocs.io/en/latest/
Transport - memory, multicast, disk
The transport layer moves Pose2Art (P2A) assets between machines. This may be in-memory on the same system or across the network.
render engines - TouchDesigner, Unity, Unreal, Resolume
Rendering engines should accept at least one of raw video, video with a pose overlay, or pose data; using only the OSC pose data would drive avatars and/or animation/synthesis.
likely first demo: a TouchDesigner variant of a Kinect demo:
skeleton interaction with particles
https://www.google.com/search?q=touchdesigner+interactive+particles
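On the TouchDesigner side, an OSC In CHOP can pull the pose values in with no code at all (each address just shows up as a channel). If you want the raw messages, an OSC In DAT with a callbacks DAT also works; here is a sketch (the /pose/p0/... addresses are just this project's home-grown scheme from the sketches above).

```python
# callbacks DAT attached to an OSC In DAT listening on the chosen UDP port (e.g. 5005)
def onReceiveOSC(dat, rowIndex, message, bytes, timeStamp, address, args, peer):
    # e.g. address = '/pose/p0/nose', args = [0.51, 0.22, 0.93]
    if address.startswith('/pose/p0/') and len(args) == 3:
        point = address.rsplit('/', 1)[-1]
        x, y, score = args
        parent().store(point, (x, y, score))  # other operators can read this back with fetch()
    return
```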