Cutting edge technology

Introduction: The Human Protein Atlas Image Classification (HPAIC) competition, hosted by Kaggle and sponsored by Leica Microsystems and NVIDIA, recently came to an official close. The competition ran for three months and drew 2,236 teams from around the world. The Polar Chain AI Institute and the Engineering Academy won a gold medal in the challenge.


Introduction to the competition

Proteins are the "actors" in human cells, carrying out many functions that together sustain life. Protein classification has so far been limited to single patterns in one or a few cell types, but to fully understand the complexity of human cells, a model must classify mixed patterns across a range of different human cell types.

Images that visualize proteins in cells are widely used in biomedical research, and these cells could hold the key to the next medical breakthrough. However, thanks to advances in high-throughput microscopy, these images are now generated far faster than they can be evaluated manually. The need for automated biomedical image analysis to accelerate our understanding of human cells and disease is therefore greater than ever.

Although this is a biology competition, it is at its core a multi-label image classification problem in computer vision. The participating teams included many experienced competitors from computer vision and machine learning.

Data Analysis

The organizers provided two versions of the dataset: 512x512 PNG images, and TIFF images of 2048x2048 or 3072x3072. The dataset is approximately 268 GB in total, with a training set of 31,072 x 4 images and a test set of 11,702 x 4 images.

Each protein sample consists of four stained images (red, green, blue, yellow). An example is shown below:

We merged the 4 channels into a 3-channel (RYB) image for visualization, as shown below:

There are 28 categories in this competition, for example Nucleoplasm and Nuclear membrane, and each image can have one or more labels. The number of labels per image is distributed as follows:

Most images have 1 to 3 labels, but some have as many as 5, which adds to the difficulty of the competition.
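The label structure described above can be made concrete with a small sketch. It assumes each image's labels arrive as a space-separated string of class indices (for example "16 0"); that format, the helper names `to_multi_hot` and `labels_per_image`, and the sample strings are illustrative assumptions, not something the article specifies.

```python
from collections import Counter
from typing import List

NUM_CLASSES = 28  # number of categories stated in the article


def to_multi_hot(target: str, num_classes: int = NUM_CLASSES) -> List[int]:
    """Turn a space-separated label string such as "16 0" into a 0/1 vector."""
    vec = [0] * num_classes
    for token in target.split():
        vec[int(token)] = 1
    return vec


def labels_per_image(targets: List[str]) -> Counter:
    """Count how many images carry 1, 2, 3, ... distinct labels."""
    return Counter(len(set(t.split())) for t in targets)


# Hypothetical label strings, for illustration only.
sample_targets = ["16 0", "7 1 2 0", "5", "25 0"]
encoded = [to_multi_hot(t) for t in sample_targets]
histogram = labels_per_image(sample_targets)
```

A multi-hot vector of this kind is the usual target for a multi-label classifier, where each of the 28 outputs is treated as an independent yes/no decision.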

Another difficulty is that the dataset is highly imbalanced: the most common category has 12,885 images, while the rarest has only 11. This made the competition considerably harder. The distribution of sample counts per category can be seen in the figure.

During the competition, participants gradually discovered the additional official dataset HPAv18 and obtained official permission to use it. This dataset contains 105,678 samples, which greatly expanded the amount of training data and was a great help to us.

Environment Resources

For hardware, we used 4 NVIDIA Tesla P100 GPUs, and we used PyTorch as our model training framework.

Image Preprocessing

The HPAv18 images differ somewhat from the officially provided images. Each sample also consists of 4 stained images, but each stained image is an RGB image rather than a single-channel image as in the official data, and the values of the three RGB channels differ considerably. We preprocessed these images by keeping only one channel from each RGB image (r_out=r, g_out=g, b_out=b, y_out=b) and scaled the results to 512x512 and 1024x1024.
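The channel selection and scaling described above might look like the following NumPy sketch. It reads the mapping as "red channel of the red-stain image, green channel of the green-stain image, blue channel of the blue-stain image, blue channel of the yellow-stain image", which is an interpretation of the article's shorthand. The article does not say which resampling method was used; block averaging is chosen here only because it needs no extra libraries. The function names and the small random demo arrays are hypothetical.

```python
import numpy as np


def extract_channels(red_rgb, green_rgb, blue_rgb, yellow_rgb):
    """Keep one channel per stain image (r_out=r, g_out=g, b_out=b, y_out=b).

    Each input is an HxWx3 uint8 array; the result is HxWx4.
    """
    r_out = red_rgb[..., 0]
    g_out = green_rgb[..., 1]
    b_out = blue_rgb[..., 2]
    y_out = yellow_rgb[..., 2]  # blue channel of the yellow stain, per the text
    return np.stack([r_out, g_out, b_out, y_out], axis=-1)


def downscale_by_block_mean(img, size):
    """Downscale an HxWxC image to size x size by averaging pixel blocks.

    Assumes H and W are exact multiples of `size` (e.g. 2048 -> 512).
    """
    h, w, c = img.shape
    if h % size or w % size:
        raise ValueError("image sides must be multiples of the target size")
    fh, fw = h // size, w // size
    blocks = img.reshape(size, fh, size, fw, c).astype(np.float32)
    return blocks.mean(axis=(1, 3)).round().astype(np.uint8)


# Small random stand-ins for the four stain images (illustration only).
rng = np.random.default_rng(0)
stains = [rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8) for _ in range(4)]
merged = extract_channels(*stains)         # shape (64, 64, 4)
small = downscale_by_block_mean(merged, 16)  # shape (16, 16, 4)
```

In practice an image library's resize routine would handle arbitrary source sizes; the block-mean version only covers the case where the source size is an integer multiple of the target.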

For the TIFF files, we spent a week downloading the dataset and then scaled all of the images to 1024x1024.

Data augmentation

The augmentation methods we used in the competition were rotation, flip, and shear. Because we did not know whether there is any relationship between the multiple cells in an image, we did not use random cropping as an augmentation.
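A minimal NumPy sketch of crop-free augmentation along these lines is given below. The article does not state rotation angles, shear ranges, or probabilities, so the choices here (rotations by multiples of 90 degrees, a 50% flip chance, a `max_shear` of 0.2, zero fill for pixels sheared out of frame) are assumptions, as are the function names. It is not the team's actual pipeline.

```python
import numpy as np


def shear_x(img, shift_per_row):
    """Horizontal shear: shift each row in proportion to its distance from
    the vertical center, filling vacated pixels with zeros."""
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    for y in range(h):
        s = int(round(shift_per_row * (y - h / 2)))
        s = max(-w, min(w, s))  # clamp so the slices below stay valid
        if s > 0:
            out[y, s:] = img[y, :w - s]
        elif s < 0:
            out[y, :w + s] = img[y, -s:]
        else:
            out[y] = img[y]
    return out


def augment(img, rng, max_shear=0.2):
    """Apply a random rotation, random flips, and a random shear.

    The output keeps the full input size; no cropping is performed.
    """
    k = int(rng.integers(0, 4))          # rotate by k * 90 degrees
    out = np.rot90(img, k, axes=(0, 1))
    if rng.random() < 0.5:
        out = out[:, ::-1]               # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1]                  # vertical flip
    out = shear_x(out, rng.uniform(-max_shear, max_shear))
    return np.ascontiguousarray(out)


# Hypothetical 4-channel square image, for illustration only.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 4), dtype=np.uint8)
augmented = augment(image, rng)
```

Every transform here acts on the whole image, so all cells in the field of view stay in the sample, which matches the stated reason for avoiding random crops.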

This article was written by the author Cutting Edge Technology. The views expressed are the author's alone and do not represent the position of OFweek. If there is any infringement or other problem, please contact us.
