Interactive Robot Hand with Google Mediapipe

I have been fascinated by all things robots, sci-fi movies and motion tracking for as long as I can remember. Combining these into fun interactives is what drives my creativity.

I spent much of the pandemic era working with Python, Raspberry Pi and TouchDesigner for use in computer vision projects. I also ran the company and produced many projects, but this is the stuff that makes me feel alive. One of my goals was to create a motion-tracked robotic hand that mimics the user's hand movements, to emulate a user-controlled Terminator-type hand.

After much research and a string of 3D printing tests aimed at building a robotic hand from scratch, I gave up: too many failed prints and annoying fabrication issues. Instead, I found a robot hand on Amazon that comes with micro servos already attached, making it quite easy to work with.

Using a Raspberry Pi with a Servo Hat from Adafruit, I was able to quickly connect the servos and get to work programming some test sequences. After installing the necessary libraries, I wrote a quick Python script on the Pi that cycles each finger, while the Terminator 2 theme played to set the mood.
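For reference, the cycling test looked roughly like the sketch below. It's a minimal sketch: the channel numbers and the 0–130 sweep are assumptions you would tune to your own servos and linkages.

from adafruit_servokit import ServoKit
import time

kit = ServoKit(channels=16)  # Adafruit 16-channel Servo Hat

# Sweep each finger from one end of its travel to the other and back.
# The 0-130 range is an assumption; tune it to what your linkages allow.
for finger in range(5):
    kit.servo[finger].angle = 130
    time.sleep(0.5)
    kit.servo[finger].angle = 0
    time.sleep(0.5)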

There are many methods to track hands and finger joints, but I found Google Mediapipe to be the easiest and fastest to implement. After spending so much time working through tutorials for other libraries that spat out random errors, Mediapipe worked on the first try. I use PyCharm Community Edition as my Python IDE of choice.

It’s easy to create new environments and install multiple libraries. After watching many of Murtaza’s Workshop tutorials on Google Mediapipe hand tracking, especially this particular one, I got the hang of Mediapipe and started focusing on how to track the hand data within the video frame and print the results.
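A stripped-down version of that tracking loop looks something like this; the camera index and confidence threshold here are illustrative, not gospel.

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
cap = cv2.VideoCapture(0)  # default webcam; adjust the index for your setup

with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.7) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # Mediapipe wants RGB, OpenCV delivers BGR
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            h, w, _ = frame.shape
            lm = results.multi_hand_landmarks[0].landmark
            # Landmark 0 is the wrist; 4, 8, 12, 16, 20 are the finger tips
            for idx in (0, 4, 8, 12, 16, 20):
                print(idx, int(lm[idx].x * w), int(lm[idx].y * h))
        cv2.imshow('hand tracking', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

cap.release()
cv2.destroyAllWindows()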

By referencing the wrist (landmark 0) as the center point, it’s possible to keep the hand centered in the frame. In the code snippet below, I store the wrist coordinates as refX and refY and offset everything against them, which keeps the hand in full view in the center of the screen at all times. Note: to keep it simple, I’m only using the data from the finger tips, since I only have one servo per finger.

# Distance of each finger tip from the wrist reference point. Working with
# relative offsets avoids worrying about absolute pixel positions.
# The thumb offset is measured along X, the other finger tips along Y.
thumbDx = thumbX - refX
foreDx = refY - foreY
middleDx = refY - middleY
ringDx = refY - ringY
pinkyDx = refY - pinkyY
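For context, those coordinates come straight out of the Mediapipe landmark list. This is a sketch assuming the frame and results objects from the tracking loop above; landmark 0 is the wrist, and 4, 8, 12, 16 and 20 are the finger tips.

h, w, _ = frame.shape
lm = results.multi_hand_landmarks[0].landmark

# Convert Mediapipe's normalized coordinates to pixel positions
refX, refY = int(lm[0].x * w), int(lm[0].y * h)          # wrist
thumbX, thumbY = int(lm[4].x * w), int(lm[4].y * h)
foreX, foreY = int(lm[8].x * w), int(lm[8].y * h)
middleX, middleY = int(lm[12].x * w), int(lm[12].y * h)
ringX, ringY = int(lm[16].x * w), int(lm[16].y * h)
pinkyX, pinkyY = int(lm[20].x * w), int(lm[20].y * h)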

I attempted to run Mediapipe directly on the Pi, with no luck after three days of trying. I've set that aside for now: my PC tracks the hand in real time and sends the data to the Pi's IP over a simple UDP connection. It took a while to figure out the messaging, but the trick was formatting the data as integers with this one line so the Pi could read it.

MESSAGE = struct.pack('iiiii', new_thumbDx, new_foreDx, new_middleDx, new_ringDx, new_pinkyDx)

The ‘iiiii’ format string packs one integer for each of the five finger tips being tracked. On the Pi, I do the opposite and read the values back with a very simple script. Here’s a video of the final implementation of it all working.
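The send side is just a few lines on top of that pack call. A minimal sketch, assuming the new_*Dx values have already been scaled to usable servo angles on the PC:

import socket
import struct

PI_IP = "192.168.1.28"    # the Pi's address on my network
PI_PORT = 50002

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Pack the five scaled finger values as plain integers and fire them at the Pi
MESSAGE = struct.pack('iiiii', new_thumbDx, new_foreDx, new_middleDx,
                      new_ringDx, new_pinkyDx)
sock.sendto(MESSAGE, (PI_IP, PI_PORT))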

The code below takes the incoming data stream, parses it, and feeds the servos values that worked well. Most of this is from the example code for the Pi Hat. struct.unpack turns the byte string back into the five integers, which go straight to each kit.servo[n].angle.

from adafruit_servokit import ServoKit
import time
import socket
import struct
import RPi.GPIO as GPIO
from pygame import mixer

#UDP_IP = "127.0.0.1" #for internal
UDP_IP = "192.168.1.28" #This Pi's IP
UDP_PORT = 50002 #just set outgoing to this
 
sock = socket.socket(socket.AF_INET, # Internet
    socket.SOCK_DGRAM) # UDP
sock.bind((UDP_IP, UDP_PORT))
 
kit = ServoKit(channels=16)
# Initialize pygame mixer
mixer.init()
# Load the sounds
sound = mixer.Sound('terminator2theme.wav')
#sound.play()

# Move the servos to their start positions
kit.servo[0].angle = 130
kit.servo[1].angle = 0
kit.servo[2].angle = 0
kit.servo[3].angle = 0
kit.servo[4].angle = 0

time.sleep(1)

while True:
    data, addr = sock.recvfrom(1024) # buffer size is 1024 bytes
    #print("received message: %s" % data) 
    #i = struct.unpack('36b', data) #36bytes total length of stream. Need to parse data via an array
    #19,23,27,31,35 #These are the array numbers to pull from, when sending from Max patch
    #i = struct.unpack('20b', data) #20bytes total length of stream.
    i = struct.unpack('iiiii', data) # unpack the data into 5 integers, basically an array

    thumb = i[0]
    fore = i[1]
    middle = i[2]
    ring = i[3]
    pinky = i[4]
    print(thumb, fore, middle, ring, pinky)

    kit.servo[0].angle = thumb
    kit.servo[1].angle = fore
    kit.servo[2].angle = middle
    kit.servo[3].angle = ring
    kit.servo[4].angle = pinky
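One thing worth guarding against: the servo library will complain if an angle lands outside the servo's configured range, so clamping the incoming values before assigning them is cheap insurance. A minimal sketch of what that could look like:

def clamp(angle, lo=0, hi=180):
    # Keep an incoming value inside the servo's safe travel range
    return max(lo, min(hi, angle))

kit.servo[0].angle = clamp(thumb)
kit.servo[1].angle = clamp(fore)
kit.servo[2].angle = clamp(middle)
kit.servo[3].angle = clamp(ring)
kit.servo[4].angle = clamp(pinky)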

Just a little message for Covid-19

Next steps for this project: add another hand for two-hand control, plus face/head tracking to control a robot head with eye tracking. Mediapipe is pretty much built for this! I am pretty excited to build a life-size creepy robot that mimics a user's movements in real time. It's probably a useless machine, but it should be fun to build.

Author: Brian Dressel
