
AI


AI-based robotic grasp

Robotic grasping using neural networks and stochastic machine learning methods

 

- Research Background

  • Logistics systems handle a wide variety of objects, so technology that can deal with many different objects is needed. However, because of the lack of object manipulation technology, many workers are still needed for transportation, packaging, and other tasks.

- Research Objectives

  • Enable a robot to deal with various types of objects using artificial intelligence techniques.

- Research Output

1. Object detection

Recognizing objects in given RGB camera images using Mask R-CNN.

 

*  Mask R-CNN

1) Bounding box detection based on Faster R-CNN, plus object segmentation

2) Uses ResNet and an RPN (Region Proposal Network)

3) ResNet extracts the feature map of the given RGB camera images

4) The RPN finds the bounding boxes of objects

 

< Example of Mask R-CNN >

 

                              < Structure of Mask R-CNN >
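
As a rough illustration of the detection step, the sketch below runs an off-the-shelf Mask R-CNN (ResNet backbone + RPN) from torchvision on an RGB image. The file name and score threshold are placeholders, not the values used in this work.

```python
# Minimal Mask R-CNN inference sketch with torchvision (not the lab's exact code).
import torch
import torchvision
from torchvision.transforms import functional as F
from PIL import Image

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = Image.open("scene.jpg").convert("RGB")   # hypothetical RGB camera frame
x = F.to_tensor(image)                           # HWC uint8 -> CHW float in [0, 1]

with torch.no_grad():
    outputs = model([x])[0]                      # dict: boxes, labels, scores, masks

keep = outputs["scores"] > 0.7                   # illustrative confidence threshold
boxes = outputs["boxes"][keep]                   # bounding boxes from the RPN + box head
masks = outputs["masks"][keep] > 0.5             # per-object binary segmentation masks
```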

 

* Fine-tuned Mask R-CNN

1) 45 kinds of objects

 

            < Examples of target objects: ACRV picking benchmark dataset >

 

 

2) Demonstration

YouTube video

                                    < Fine-tuned Mask R-CNN>
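
A minimal sketch of how such fine-tuning is typically done in torchvision, replacing the box and mask heads for 45 object classes plus background; the hidden-layer size and training details are assumptions, not the lab's exact configuration.

```python
# Sketch of adapting Mask R-CNN to 45 target classes (plus background).
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

num_classes = 45 + 1  # 45 object kinds + background

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

# Replace the box classification head.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Replace the mask prediction head.
in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
hidden = 256  # illustrative hidden size
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask, hidden, num_classes)

# The modified model is then trained on images such as the ACRV picking benchmark.
```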

 

2. Object Grasping based on ANN

* Learning the grasping pose of the robot

* Disadvantages of collecting training data in a real working environment:

1) A lot of time

2) Robot operation cost

3) Supervision

 

* Implementing a realistic working environment on the simulator

1) Collecting large amounts of data through simulation

 

                    < System configuration: (a) robot in simulation and (b) real robot >

 

* Learning to grasp based on AlexNet

1) Input: image of the detected object

2) Output: pose of gripper with the highest grasping success rate


                                                         < Structure of fine-tuned AlexNet >
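
A hedged sketch of one way to realize this with a fine-tuned AlexNet: the final fully connected layer is replaced so the network scores a set of candidate gripper poses (here, discretized rotation angles, an illustrative choice), and the highest-scoring pose is selected.

```python
# Sketch of a grasp-pose classifier built on AlexNet (details are assumptions).
# Input: cropped image of the detected object; output: scores over discretized
# gripper angles, and the highest-scoring angle is used as the grasp pose.
import torch
import torch.nn as nn
import torchvision

NUM_ANGLES = 18  # e.g., 10-degree bins over 180 degrees (illustrative choice)

model = torchvision.models.alexnet(pretrained=True)
model.classifier[6] = nn.Linear(4096, NUM_ANGLES)  # replace the last FC layer

def best_grasp_angle(crop):            # crop: (1, 3, 224, 224) tensor of the object
    with torch.no_grad():
        scores = model(crop)           # predicted grasp success per angle bin
    return scores.argmax(dim=1) * (180.0 / NUM_ANGLES)
```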

 

* The ensemble learning method combines various classifiers to achieve better performance than a single classifier.

1) Combining the results of learning models

2) Obtaining more reliable results than a single learning model

 

                 < Concept of ensemble learning >

 

* Ensemble of four classifiers, each trained on a different data set:

1) Real data

2) Simulation data

3) Another simulation data set

4) Real data + simulation data

 

    < Application of ensemble learning to grasp >
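
A minimal sketch of the ensemble step, assuming the four classifiers share the same output space and that their softmax outputs are simply averaged; the actual combination rule used here may differ.

```python
# Combining the four classifiers (real, sim A, sim B, real+sim) by averaging
# their softmax outputs (illustrative combination rule).
import torch
import torch.nn.functional as F

def ensemble_grasp(models, crop):
    """models: list of the four trained classifiers; crop: object image tensor."""
    probs = []
    with torch.no_grad():
        for m in models:
            probs.append(F.softmax(m(crop), dim=1))   # per-model grasp-angle probabilities
    mean_probs = torch.stack(probs).mean(dim=0)       # average the predictions
    return mean_probs.argmax(dim=1)                   # most reliable grasp angle
```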



YouTube video

                              < Example of object grasping >

 

3. Object Grasping based on stochastic machine learning

* The object pose (x, y, θ) is necessary to grasp objects.

* The location (x, y) is obtained using Mask R-CNN.

* The angle θ is predicted using the masks from Mask R-CNN and PCA (principal component analysis).

 

* Extracting depth images using masks

 

 < Example of extracted depth image using mask >

 

* Grasping pose estimation based on PCA

1) Predicting the shortest direction (minor axis) of the objects at the center of the depth images

 

  < Result of predicted grasping pose >
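
The sketch below illustrates this pipeline under stated assumptions: the Mask R-CNN mask cuts the object out of the depth image, the mask pixels are fed to PCA, and the minor axis at the object center gives the grasp angle θ.

```python
# Sketch of grasp-pose estimation from an object mask with PCA (assumed
# implementation; variable names are illustrative).
import numpy as np

def grasp_pose_from_mask(mask, depth):
    """mask: HxW boolean object mask; depth: HxW depth image in the same frame."""
    obj_depth = np.where(mask, depth, 0.0)            # depth image restricted to the object
    ys, xs = np.nonzero(mask)                         # pixel coordinates of the object
    pts = np.stack([xs, ys], axis=1).astype(float)
    center = pts.mean(axis=0)                         # (x, y) grasp location
    cov = np.cov((pts - center).T)                    # 2x2 covariance of the mask pixels
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvectors = principal axes
    minor_axis = eigvecs[:, np.argmin(eigvals)]       # shortest direction of the object
    theta = np.arctan2(minor_axis[1], minor_axis[0])  # gripper closing direction
    return center[0], center[1], theta, obj_depth
```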

 

* Grasping demonstration

YouTube video

                               < Example of object grasping >




4. Bin picking demonstration for 3 scenarios

Target objects: 20 known objects + 5 unknown objects

Known objects: 20 objects = 7 foods + 7 toys + 6 tools

   - Stage 1: Objects in blue boxes

   - Stage 2: All objects

   - Stage 3: Objects in red boxes + 5 unknown objects




 

Scenarios

Stage 1: Grasping 10 known objects in 2D clutter

Stage 2: Grasping 20 known objects in 3D bin

Stage 3: Grasping 15 known objects and 5 unknown objects in 3D bin, and placing foods at designated locations in a certain pose



Demonstration for each stage

 




AI-based robotic assembly

 

- Research Background

  • Recent robot learning based on deep reinforcement learning acquires various robot tasks through DNNs without using task-specific control or recognition algorithms. However, it is difficult to apply this approach to contact tasks, because the random exploration of reinforcement learning can generate excessive contact forces. Therefore, when applying reinforcement learning to contact tasks, it is necessary to handle the contact problem with an existing force controller.

 

- Research Objective

  • Trajectory generation based on reinforcement learning algorithm for force-based robotic assembly

 

- Research Outputs

1) Reinforcement learning method based on DMP and PoWER

2) Reinforcement learning method based on NNMP and DDPG

 

- Reinforcement learning method based on DMP and PoWER

  • DMP is used to create complex trajectories that generate the contact force required for assembly through a force controller. Then, PoWER is applied to optimize the DMP-based trajectory for the assembly task.

 

                                          < Control system using DMP & PoWER >

 

* Dynamic movement primitive (DMP)

1) A motor primitive based on the dynamical system formulation proposed by Stefan Schaal

2) DMP can generate complex trajectories using a minimal number of linear parameters.

3) The shape of the trajectory is determined by the linear parameters → suitable for reinforcement learning (a minimal sketch follows below)
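
A minimal single-DoF DMP sketch following the standard Ijspeert/Schaal formulation; the gains, number of basis functions, and time step are illustrative defaults, not the values used in this work.

```python
# Discrete dynamic movement primitive (one DoF): a spring-damper system toward
# the goal plus a learned forcing term built from Gaussian basis functions.
import numpy as np

class DMP:
    def __init__(self, n_basis=20, alpha=25.0, beta=6.25, alpha_x=3.0, tau=1.0):
        self.w = np.zeros(n_basis)                               # linear shape parameters
        self.c = np.exp(-alpha_x * np.linspace(0, 1, n_basis))   # basis centers
        self.h = 1.0 / np.gradient(self.c) ** 2                  # basis widths
        self.alpha, self.beta, self.alpha_x, self.tau = alpha, beta, alpha_x, tau

    def rollout(self, y0, g, dt=0.001, T=1.0):
        y, yd, x = y0, 0.0, 1.0
        traj = []
        for _ in range(int(T / dt)):
            psi = np.exp(-self.h * (x - self.c) ** 2)            # Gaussian basis functions
            f = (psi @ self.w) / (psi.sum() + 1e-10) * x * (g - y0)  # forcing term
            ydd = (self.alpha * (self.beta * (g - y) - self.tau * yd) + f) / self.tau ** 2
            yd += ydd * dt
            y += yd * dt
            x += -self.alpha_x * x / self.tau * dt               # canonical system
            traj.append(y)
        return np.array(traj)
```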

 

* Policy learning by weighting exploration with the returns (PoWER)

1) Episode-based reinforcement learning algorithm applicable to linear deterministic policy functions

2) Reinforcement learning algorithm using expectation maximization (EM) → no learning rate required

3) PoWER generally has an excellent learning speed, but the applicable form of the policy function is limited (a simplified update sketch follows below)
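
A simplified, episode-level sketch of the PoWER-style update: exploration in parameter space, importance weighting over the best rollouts, and no learning rate. The exploration noise and reward function are assumptions.

```python
# Simplified PoWER update for the DMP weight vector (illustrative implementation).
import numpy as np

def power_update(theta, sigma, rollout, reward, n_iters=100, n_best=10):
    """theta: DMP weights; rollout(theta) -> trajectory; reward(traj) -> scalar return."""
    history = []                                       # (return, exploration) pairs
    for _ in range(n_iters):
        eps = np.random.randn(*theta.shape) * sigma    # parameter-space exploration
        R = reward(rollout(theta + eps))
        history.append((R, eps))
        history.sort(key=lambda t: t[0], reverse=True)
        best = history[:n_best]                        # importance sampling: keep best episodes
        num = sum(R_i * eps_i for R_i, eps_i in best)
        den = sum(R_i for R_i, _ in best) + 1e-10
        theta = theta + num / den                      # EM-style update, no learning rate
    return theta
```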

 

* Demonstration

1) Robot: SCORA-V (Safe Collaborative Robot Arm – Vertical type) developed in the laboratory

2) Control system: PC-based controller with a current control cycle of 1 ms through EtherCAT communication

3) Force control algorithm: torque-based impedance controller

4) Assembly parts: square peg-in-hole, size: 50.0 x 50.0 x 30.0 mm, tolerance: 0.1 mm

 

YouTube video


                    < Assembly demo before and after learning >

 

- Reinforcement learning method based on NNMP and DDPG

  • NNMP is used to create complex trajectories that generate the contact force required for assembly through a force controller. Then, DDPG is applied to optimize the trajectory generated by NNMP for the assembly task.

 

                                                    < Control system using NNMP & DDPG >

 

* Neural network-based movement primitive (NNMP)

1) A DNN is used to generate complex trajectories from various input signals (measured force and position).

2) The velocity and position are calculated by integrating the acceleration to generate a continuous trajectory.

3) The size and motion time of the trajectory can be changed by adjusting the normalization matrix.

4) A DAgger-based imitation learning algorithm is developed for the proposed NNMP.

 

                       < Neural network-based movement primitive >
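
A hedged sketch of the NNMP idea: a small network maps the measured force and the current position/velocity to an acceleration, which is integrated to produce a continuous velocity and position trajectory. Dimensions and network size are illustrative, not the lab's configuration.

```python
# Sketch of an NNMP-style trajectory generator (assumed structure).
import torch
import torch.nn as nn

class NNMP(nn.Module):
    def __init__(self, state_dim=18, hidden=64, dof=6):
        super().__init__()
        # state = measured force (6) + position (6) + velocity (6) -> state_dim = 18
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, dof),            # output: acceleration command
        )

    def forward(self, force, pos, vel, dt=0.001):
        state = torch.cat([force, pos, vel], dim=-1)
        acc = self.net(state)                  # DNN output (acceleration)
        vel_next = vel + acc * dt              # integrate acceleration -> velocity
        pos_next = pos + vel_next * dt         # integrate velocity -> position
        return pos_next, vel_next
```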

 

* Deep deterministic policy gradient (DDPG) for NNMP

1) The measured force and position are added as the state of the NNMP to reflect the contact state.

2) In order to apply DDPG, the neural network of NNMP is regarded as an actor network.

3) Ornstein–Uhlenbeck (OU) noise is added to the action for exploration in reinforcement learning.

 

                 < Structure of robot system for reinforcement learning with force controller and NNMP >
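
A minimal sketch of the Ornstein–Uhlenbeck exploration noise added to the actor (NNMP) output in DDPG; theta, sigma, and the time step are common default values, not necessarily those used in this work.

```python
# Ornstein-Uhlenbeck process for temporally correlated exploration noise in DDPG.
import numpy as np

class OUNoise:
    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-3):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.x = np.full(dim, mu)

    def sample(self):
        dx = self.theta * (self.mu - self.x) * self.dt \
             + self.sigma * np.sqrt(self.dt) * np.random.randn(*self.x.shape)
        self.x = self.x + dx
        return self.x

# Usage: perturb the actor (NNMP) action before it is sent to the force controller.
# noise = OUNoise(dim=6)
# action = actor_output + noise.sample()
```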

 

* Demonstration

1) Robot: SCORA-V (Safe Collaborative Robot Arm – Vertical type) developed in the laboratory

2) Control system: PC-based controller with a current control cycle of 1 ms through EtherCAT communication

3) Force control algorithm: torque-based impedance controller

4) Assembly parts: square peg-in-hole, size: 50.0 x 50.0 x 30.0 mm, tolerance: 0.1 mm


YouTube video


                       < Assembly demo before and after learning >




Simulator to the real-world transfer of manipulation policy


- Research Background

  • Reinforcement learning of a robot's manipulation policy with deep-learning-based function approximation requires expensive data. To overcome this, simulators that reflect the real-world robot and its surrounding environment have been used to generate Markov transitions for training deep-RL networks.
  • However, the discrepancy between the simulator and the real world (e.g., friction model, visual rendering, robot dynamics model) deters the direct transfer of simulator-trained deep-RL networks to the real-world robot agent. We aim to solve this issue while preserving the sample efficiency of using the simulator and without deteriorating the trained networks' generalization error on various manipulation tasks.

 

- Research Objective

  • Create an effective simulated environment and training method to transfer deep-RL networks trained in the simulator to a real-world robot agent.

 

- Research Outputs

  • A demonstration data collection system for aggregating the robot's motion Markov transitions
  • Asymmetric Actor-Critic approach for manipulating a real-world robot in a POMDP
  • Embracement of recent successful extensions in deep-RL (e.g., prioritized experience replay (PER), n-step transitions)

 

 * A demonstration data collection system for aggregating the robot's motion Markov transitions


YouTube video


                < Transition data collection via velocity kinematics >


  1. We are currently using the Gazebo simulator, which is fully compatible with the Sawyer robotic arm through ROS communication.
  2. However, to our knowledge, recent robot dynamics simulators (e.g., Gazebo, V-REP, MuJoCo) are not equipped with built-in inverse velocity kinematics.
  3. From the viewpoint of reinforcement learning in robotics, defining joint velocity/torque commands is essential for end-to-end deep RL, which also makes it important to aggregate Markov transitions that use velocity/torque commands as actions. Inspired by this, we incorporated our closed-loop robot control system into the simulator, which currently enables us to use velocity command data as actions in Markov transitions (a pseudoinverse-based sketch is given below).
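
A minimal sketch of inverse velocity kinematics for producing joint velocity actions: the desired end-effector twist is mapped to joint velocities through a damped pseudoinverse of the Jacobian. The damping value and the source of the Jacobian are assumptions, not details of the lab's controller.

```python
# Damped least-squares inverse velocity kinematics (illustrative implementation).
import numpy as np

def joint_velocity_command(jacobian, ee_twist, damping=1e-3):
    """jacobian: 6xN manipulator Jacobian; ee_twist: desired 6D end-effector velocity."""
    J = np.asarray(jacobian)
    # Damped pseudoinverse for robustness near singularities.
    JJt = J @ J.T + (damping ** 2) * np.eye(J.shape[0])
    q_dot = J.T @ np.linalg.solve(JJt, ee_twist)
    return q_dot            # joint velocity action stored in the Markov transition
```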

 

* Asymmetric Actor-Critic approach for manipulating a real-world robot in a POMDP

  1. Inspired by the work of [Pinto, Lerrel, et al., 2017], we aim to implement a learning system that utilizes both the simulator's fully observable state and the independence of off-policy learning algorithms in sampling Markov transition data. Therefore, we have modified several recently successful off-policy learning algorithms (e.g., DQN, DDPG, ACER) to fit the learning environment described above.
  2. It remains a challenge to resolve the POMDP characteristic of the Actor network (in DDPG) so that the visual observation it receives is enough to infer the latent states of the robot (e.g., the joint positions, velocities, and efforts that the Critic network receives). In addition, we tackle effective exploration methods for the actor to explore the environment through joint velocity commands.
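
A rough sketch of the asymmetric actor-critic idea (after Pinto et al., 2017): the actor receives only the camera image (the partially observable side), while the critic is trained on the simulator's full state. Network sizes and the action dimension are illustrative.

```python
# Asymmetric actor-critic sketch: image-only actor, full-state critic.
import torch
import torch.nn as nn

class Actor(nn.Module):                 # input: RGB observation only (POMDP side)
    def __init__(self, action_dim=7):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.LazyLinear(action_dim)        # joint velocity command
    def forward(self, image):
        return torch.tanh(self.head(self.conv(image)))

class Critic(nn.Module):                # input: full simulator state + action
    def __init__(self, state_dim=32, action_dim=7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),                        # Q-value estimate
        )
    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```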