Please refer to the following.
The character and image of junior high school students can be chosen according to the character.
Image captioning requires not only understanding the content of the image, but also translating that understanding into natural language.
Neural network models for generating image captions involve two main elements.
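The text does not name the two elements. A common reading, assumed here, is an image feature extractor followed by a language model that emits the caption one word at a time. The sketch below illustrates only that two-part structure: every name in it (extract_features, next_word, generate_caption, TRANSITIONS) is hypothetical, the "feature extractor" is a pair of pixel statistics rather than a convolutional network, and the "language model" is a hand-written word table rather than a trained recurrent network.

```python
from typing import Dict, List

Image = List[List[float]]  # toy grayscale image, pixel values in [0, 1]


def extract_features(image: Image) -> List[float]:
    """Element 1 (toy stand-in for a CNN): reduce the image to a small vector."""
    pixels = [p for row in image for p in row]
    mean_brightness = sum(pixels) / len(pixels)
    contrast = max(pixels) - min(pixels)
    return [mean_brightness, contrast]


# Element 2 (toy stand-in for a trained language model): the words that may
# follow each previous word. A real model would learn this from data.
TRANSITIONS: Dict[str, List[str]] = {
    "<start>": ["a"],
    "a": ["bright", "dark"],
    "bright": ["photo"],
    "dark": ["photo"],
    "photo": ["<end>"],
}


def score(word: str, features: List[float]) -> float:
    """Score a candidate word against the image features."""
    mean_brightness = features[0]
    if word == "bright":
        return mean_brightness
    if word == "dark":
        return 1.0 - mean_brightness
    return 0.0  # words that do not depend on the image


def next_word(features: List[float], previous: str) -> str:
    """Pick the best-scoring next word given the features and previous word."""
    candidates = TRANSITIONS[previous]
    return max(candidates, key=lambda w: score(w, features))


def generate_caption(image: Image, max_len: int = 10) -> str:
    """Run both elements: extract features once, then decode word by word."""
    features = extract_features(image)
    words: List[str] = []
    previous = "<start>"
    for _ in range(max_len):
        previous = next_word(features, previous)
        if previous == "<end>":
            break
        words.append(previous)
    return " ".join(words)


light = [[0.9, 0.8], [1.0, 0.7]]
shadow = [[0.1, 0.0], [0.2, 0.1]]
print(generate_caption(light))   # a bright photo
print(generate_caption(shadow))  # a dark photo
```

The features are computed once per image, while the word-by-word loop consults them at every step. That separation is the point of the sketch, not the toy scoring rule.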
Image captioning involves generating a human-readable text description for a given image, such as a photograph.
Deep learning methods have displaced classical methods and now achieve state-of-the-art results on the problem of automatically generating image descriptions, called "captions".