Automated Typing: The Mercury X1's Precision and Efficiency

Introduction

With the advancement of modern technology, robots are being applied in an increasingly wide range of daily life and work scenarios. This article introduces the Mercury X1, an advanced robot capable of recognizing and locating keyboard keys through its arm-mounted camera and performing precise typing operations. Through this case study, we demonstrate the Mercury X1's potential in automated office work, highlighting its advantages in improving efficiency and reducing human error.

We begin with a brief introduction to the products used.

Product

Mercury X1

The Mercury X1 is a wheeled humanoid robot composed of the Mercury B1 and a high-performance mobile base, with 19 degrees of freedom in total; each arm is a 7-DOF humanoid robotic arm. The entire machine is equipped with the NVIDIA Jetson Xavier as its main controller.

The mobile base has extensive sensory capabilities, including high-performance LiDAR, ultrasonic sensors, and 2D vision sensors. Its direct-drive motor system allows it to reach a maximum speed of 1.2 m/s, climb obstacles up to 2 cm high, and handle inclines up to 15 degrees. The maximum battery life of the whole machine is up to 8 hours.

Moreover, the Mercury X1 supports mainstream development and simulation frameworks such as ROS, MoveIt, Gazebo, and MuJoCo, enhancing the robot's ability for autonomous learning and rapid iteration.

myCobot Pro Adaptive Gripper

The myCobot Pro Adaptive Gripper is designed to be compatible with the Mercury X1. It provides substantial gripping force and features a standard M8 aviation connector interface.

Camera Flange

The camera module compatible with the Mercury X1 can be mounted at the end of the robot's arms and used in conjunction with the gripper. The camera captures relevant information about objects, which is then used by the machine vision recognition algorithms. Through a USB interface, the data is transmitted to the robot's Jetson Xavier main controller for further processing.

Technical Highlights

Next, we will introduce the key technologies used in this project.

pymycobot

pymycobot is a control library designed by Elephant Robotics specifically for their robotic arm products. This library allows users to conveniently and quickly call APIs to achieve precise control and operation of the robot. pymycobot provides a wide range of functional interfaces, simplifying the programming process and enabling developers to focus on application development.
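
As a minimal sketch of how the library is used (the serial port path matches the one used later in this project, but may differ on your system):

from pymycobot import Mercury

# Connect to one arm of the Mercury X1 (serial port path taken from the project code below)
ml = Mercury("/dev/ttyTHS0")

# Read the current joint angles of the 7-DOF arm
print(ml.get_angles())

# Move all joints back to the zero position at speed 50
ml.send_angles([0, 0, 0, 0, 0, 0, 0], 50)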


OpenCV

OpenCV is an open-source computer vision library. Through this library, users can easily and quickly call various image processing and computer vision APIs to achieve image recognition, object detection, and image transformation. OpenCV offers a rich set of functional interfaces, streamlining the development process and allowing developers to concentrate on application implementation.
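
For example, a minimal OpenCV sketch (with an illustrative image path) that loads a frame and converts it to grayscale, the same preprocessing this project applies before marker detection:

import cv2

# Load an image from disk (the file name is illustrative)
frame = cv2.imread("keyboard.png")
# Convert to grayscale, the format used for marker detection later in this project
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
# Display the result until a key is pressed
cv2.imshow("gray", gray)
cv2.waitKey(0)
cv2.destroyAllWindows()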

STag

STag is a stable fiducial marker system widely used in computer vision and robot positioning. With STag, users can achieve reliable object recognition and tracking. STag provides robust performance and easily integrable interfaces, simplifying developers' tasks in positioning and recognition applications.
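
Detecting STag markers from Python takes a single call; the sketch below mirrors the detection call used later in this project (the image path is illustrative):

import cv2
import stag

# Library HD 11 is the marker family used throughout this project
gray = cv2.cvtColor(cv2.imread("keyboard.png"), cv2.COLOR_BGR2GRAY)
(corners, ids, rejected) = stag.detectMarkers(gray, 11)
print("Detected STag IDs:", ids)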

Project

The primary functions of the entire project are robotic arm motion control, hand-eye calibration (coordinate system transformation), and machine vision recognition. Let’s first introduce the most important aspect: machine vision recognition.

Machine Vision Recognition

To type, the X1 must first recognize the keyboard. Since the robot has no inherent knowledge of it, we need to teach it the keyboard's layout and key positions; this is where STag and OpenCV come into play. The STag markers pin down the keyboard's position and provide its coordinates.

The following code snippet refreshes the camera interface to obtain a real-time view and detect the position of the STag code:

def stag_identify_loop(self):
    while True:
        self.camera.update_frame()  # Refresh the camera interface
        frame = self.camera.color_frame()  # Get the current frame
        (corners, ids, rejected_corners) = stag.detectMarkers(frame, 11)  # Detect the corners and IDs of the STag markers in the frame
        marker_pos_pack = self.calc_markers_base_position(corners, ids)  # Get the coordinates of the marker (camera frame)
        print("Camera coords = ", marker_pos_pack)
        cv2.imshow("Press any key on the keyboard to exit", frame)
        # Press any key on the keyboard to exit
        if cv2.waitKey(1) & 0xFF != 255:
            break


Two STag markers are used to determine the keyboard's position, and the draw function marks its location in the image. Both hands must be considered: although single-arm operation is possible, we aim to simulate human typing, with the left hand responsible for the left half of the keyboard and the right hand for the right half.

def draw(frame, arm):
    global per_right_corners, per_left_corners
    # Convert the image to grayscale
    imGray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Get the corner points of the STag markers in the image
    (corners, ids, rejected_corners) = stag.detectMarkers(imGray, 11)
    # Get the displacement and rotation vector of the marker relative to the camera from the corner points
    marker_pos_pack = calc_markers_base_position(corners, ids, marker_size, mtx, dist)
    if arm == "left":
        # Get the current coordinates of the left arm's end effector
        stag_cur_coords = np.array(ml.get_base_coords())
        stag_cur_bcl = stag_cur_coords.copy()
        stag_cur_bcl[-3:] *= (np.pi / 180)
        # Get the marker's coordinates relative to the arm base through the hand-eye matrix
        stag_fact_bcl = Eyes_in_hand(stag_cur_bcl, marker_pos_pack, "left")
        stag_coord = stag_cur_coords.copy()
        stag_coord[0] = stag_fact_bcl[0]
        stag_coord[1] = stag_fact_bcl[1]
        stag_coord[2] = stag_fact_bcl[2]
        # Store the three-dimensional coordinates of the marker (anchored at the "," key)
        keyboard_coords[","] = stag_coord
    else:
        # Get the current coordinates of the right arm's end effector
        stag_cur_coords = np.array(mr.get_base_coords())
        stag_cur_bcl = stag_cur_coords.copy()
        stag_cur_bcl[-3:] *= (np.pi / 180)
        stag_fact_bcl = Eyes_in_hand(stag_cur_bcl, marker_pos_pack, "right")
        stag_coord = stag_cur_coords.copy()
        stag_coord[0] = stag_fact_bcl[0]
        stag_coord[1] = stag_fact_bcl[1]
        stag_coord[2] = stag_fact_bcl[2]
        # Store the marker's coordinates (anchored at the "." key)
        keyboard_coords["."] = stag_coord
    # Draw the marker's coordinate axes onto the frame
    rvecs, tvecs = solve_marker_pnp(corners, marker_size, mtx, dist)
    cv2.drawFrameAxes(frame, mtx, dist, rvecs, tvecs, 50)
    draw_img = frame

    # Pixel offsets between two corner points of the first marker
    # (the remainder of the function is omitted in this excerpt)
    x1 = corners[0][0][0][0]
    y1 = corners[0][0][0][1]
    x2 = corners[0][0][1][0]
    y2 = corners[0][0][1][1]

    x = x1 - x2
    y = y1 - y2



Next, the algorithm determines the coordinates of each key. The identified keys are stored in arrays that define the left-hand and right-hand areas; a hypothetical sketch of how individual key positions can be estimated from the anchors follows the arrays below.

# Left-arm keyboard layout
left_keyboard_txt = [
    ["q", "w", "e", "r", "t", "y", "u", "i", "o", "p"],
    ["a", "s", "d", "f", "g", "h", "j", "k", "l"],
    ["z", "x", "c", "v", "b", "n", "m"]
]
# Right-arm keyboard layout
right_keyboard_txt = [
    ["m", "n", "b", "v", "c", "x", "z"],
    ["l", "k", "j", "h", "g", "f", "d", "s", "a"],
    ["p", "o", "i", "u", "y", "t", "r", "e", "w", "q"],
]
# Letters tapped by the left arm
left_control = ["q", "w", "e", "r", "t",
                "a", "s", "d", "f", "g",
                "z", "x", "c", "v", ","]
# Letters tapped by the right arm
right_control = ["y", "u", "i", "o", "p",
                 "h", "j", "k", "l",
                 "b", "n", "m", "."]

With this setup, the X1 can recognize the keyboard and identify the corresponding key positions. The next step is hand-eye calibration of the robotic arm, which converts the target object's coordinates from the camera frame into the robotic arm's coordinate system.

Hand-Eye Calibration: Eye-in-Hand

  1. Marker Detection

Use the camera to capture images and detect STag markers, obtaining the 3D coordinates of the markers. Call the solve_marker_pnp function to calculate the position and orientation of the markers in the camera coordinate system.

def calc_markers_base_position(corners: NDArray, ids: T.List, marker_size: int, mtx: NDArray, dist: NDArray) -> T.List:
    if len(corners) == 0:
        return []
    # Solve PnP to obtain each marker's rotation and translation vectors in the camera frame
    rvecs, tvecs = solve_marker_pnp(corners, marker_size, mtx, dist)
    target_coords = []
    for i, tvec, rvec in zip(ids, tvecs, rvecs):
        tvec = tvec.squeeze().tolist()
        rvec = rvec.squeeze().tolist()
        # Convert the rotation vector to a rotation matrix, then to Euler angles
        rotvector = np.array([[rvec[0], rvec[1], rvec[2]]])
        Rotation = cv2.Rodrigues(rotvector)[0]
        Euler = CvtRotationMatrixToEulerAngle(Rotation)
        # Combine the translation and rotation vectors into a single 6-element coordinate
        cam_coords = tvec + rvec
        target_coords = cam_coords
    return target_coords

  2. Coordinate Transformation

Convert the marker's rotation vector into a rotation matrix, and then into Euler angles for further calculation and analysis. Combine the translation vector and rotation vector to obtain the target coordinates. The cv2.Rodrigues function handles the conversion between rotation vectors and rotation matrices in either direction.

rotvector = np.array([[rvec[0], rvec[1], rvec[2]]])
Rotation = cv2.Rodrigues(rotvector)[0]

# Conversion between Euler angles and rotation matrices
def CvtRotationMatrixToEulerAngle(pdtRotationMatrix):
    pdtEulerAngle = np.zeros(3)
    pdtEulerAngle[2] = np.arctan2(pdtRotationMatrix[1, 0], pdtRotationMatrix[0, 0])
    fCosRoll = np.cos(pdtEulerAngle[2])
    fSinRoll = np.sin(pdtEulerAngle[2])
    pdtEulerAngle[1] = np.arctan2(-pdtRotationMatrix[2, 0], (fCosRoll * pdtRotationMatrix[0, 0]) + (fSinRoll * pdtRotationMatrix[1, 0]))
    pdtEulerAngle[0] = np.arctan2((fSinRoll * pdtRotationMatrix[0, 2]) - (fCosRoll * pdtRotationMatrix[1, 2]), (-fSinRoll * pdtRotationMatrix[0, 1]) + (fCosRoll * pdtRotationMatrix[1, 1]))
    return pdtEulerAngle

def CvtEulerAngleToRotationMatrix(ptrEulerAngle):
    ptrSinAngle = np.sin(ptrEulerAngle)
    ptrCosAngle = np.cos(ptrEulerAngle)
    ptrRotationMatrix = np.zeros((3, 3))
    ptrRotationMatrix[0, 0] = ptrCosAngle[2] * ptrCosAngle[1]
    ptrRotationMatrix[0, 1] = ptrCosAngle[2] * ptrSinAngle[1] * ptrSinAngle[0] - ptrSinAngle[2] * ptrCosAngle[0]
    ptrRotationMatrix[0, 2] = ptrCosAngle[2] * ptrSinAngle[1] * ptrCosAngle[0] + ptrSinAngle[2] * ptrSinAngle[0]
    ptrRotationMatrix[1, 0] = ptrSinAngle[2] * ptrCosAngle[1]
    ptrRotationMatrix[1, 1] = ptrSinAngle[2] * ptrSinAngle[1] * ptrSinAngle[0] + ptrCosAngle[2] * ptrCosAngle[0]
    ptrRotationMatrix[1, 2] = ptrSinAngle[2] * ptrSinAngle[1] * ptrCosAngle[0] - ptrCosAngle[2] * ptrSinAngle[0]
    ptrRotationMatrix[2, 0] = -ptrSinAngle[1]
    ptrRotationMatrix[2, 1] = ptrCosAngle[1] * ptrSinAngle[0]
    ptrRotationMatrix[2, 2] = ptrCosAngle[1] * ptrCosAngle[0]
    return ptrRotationMatrix

# Combine the rotation and translation vectors
cam_coords = tvec + rvec
target_coords = cam_coords

By transforming these coordinates into a common coordinate system, the robotic arm and the keyboard can “understand” each other’s positions. This allows the robot to directly obtain the results for the keyboard key positions.
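
The Eyes_in_hand function referenced in draw is not listed in this article; the following is a rough sketch of the eye-in-hand idea under simplifying assumptions (a fixed flange-to-camera offset, camera axes aligned with the flange, and poses given as [x, y, z, rx, ry, rz] with rotations in radians), not the project's exact implementation:

import numpy as np

# Hypothetical, pre-calibrated flange-to-camera translation (mm); a real system
# obtains this offset from hand-eye calibration, so these values are placeholders.
EYE_HAND_OFFSET = np.array([0.0, 60.0, 90.0])

def eyes_in_hand_sketch(end_bcl, marker_cam_coords):
    # end_bcl: [x, y, z, rx, ry, rz] pose of the arm end in the base frame (angles in radians)
    # marker_cam_coords: [x, y, z, ...] marker position in the camera frame
    R_base_end = CvtEulerAngleToRotationMatrix(np.array(end_bcl[3:6]))
    # Marker position expressed in the flange frame (camera axes assumed aligned with the flange)
    p_end = EYE_HAND_OFFSET + np.array(marker_cam_coords[:3])
    # Rotate into the base frame and add the flange position
    p_base = R_base_end @ p_end + np.array(end_bcl[:3])
    return p_base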

Robotic Arm Motion Control

Once we have the coordinates of the target object, it’s time for the X1 to shine and start executing movements. Here, we use pymycobot to control the movements of the robotic arms.

# 'mr' is the right-arm controller, 'ml' is the left-arm controller
mr = Mercury("/dev/ttyACM0")
ml = Mercury("/dev/ttyTHS0")

# Send joint angles and speed to the robotic arms
mr.send_angles(mr_pos, sp)
ml.send_angles(ml_pos, sp)

# Send coordinates and speed to the robotic arms
mr.send_coords(mr_pos, sp)
ml.send_coords(ml_pos, sp)

By transmitting the target coordinates to the robotic arms, we can achieve the typing operation. Let’s see how the movement performs.

Explanation

  1. Initialization: We initialize two instances of the Mercury class for the right and left arms, specifying the respective device paths for communication.

  2. Angle Positions and Speed: Using the send_angles method, we send the desired angles and speed to the robotic arms.

  3. Coordinates and Speed: Using the send_coords method, we send the target coordinates and speed to the robotic arms.

By following these steps, the X1 can precisely move its arms to the target coordinates, enabling it to type on the keyboard.
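
Putting the pieces together, a single key press can be sketched as hovering above the key, dipping down, and retracting. The helper below is a hypothetical illustration: the approach height, press depth, and speed are assumptions, and keyboard_coords is assumed to hold a base-frame coordinate for every key after recognition.

# Hypothetical sketch of a single key press; heights, depth, and speed are assumptions
APPROACH_HEIGHT = 30  # mm above the key before pressing
PRESS_DEPTH = 5       # mm of downward travel to actuate the key
SPEED = 30

def tap_key(arm, key):
    coord = list(keyboard_coords[key])    # [x, y, z, rx, ry, rz] of the key
    hover = coord.copy()
    hover[2] += APPROACH_HEIGHT
    arm.send_coords(hover, SPEED)         # move above the key
    press = coord.copy()
    press[2] -= PRESS_DEPTH
    arm.send_coords(press, SPEED)         # press down
    arm.send_coords(hover, SPEED)         # retract
    # A real routine would wait for each motion to finish before sending the next command

# Example: type "it", choosing the arm responsible for each key
for ch in "it":
    tap_key(ml if ch in left_control else mr, ch)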

Even if the keyboard's position is changed, the robot can still recognize it, no matter how the keyboard is placed.

Summary

In this article, we provided a detailed introduction to the application of the Mercury X1 wheeled humanoid robot in a typing task. With its 19 degrees of freedom, extensive sensory capabilities, and high-performance control system, Mercury X1 has demonstrated significant potential in the field of automated office work.

By integrating a compatible camera module and advanced machine vision algorithms, Mercury X1 can accurately recognize and operate a keyboard, significantly improving work efficiency and accuracy. As technology continues to advance, we look forward to seeing Mercury X1 showcase its exceptional performance in more fields, bringing even more possibilities to intelligent automation.
