JPEG - JPEG AI

JPEG AI Dataset

The JPEG AI dataset was constructed to evaluate the performance of state-of-the-art learning-based image coding solutions. It can also be used for training, validation and testing of novel learning-based image coding solutions.

The JPEG AI dataset is organized into three parts:

Training dataset: The training dataset aims to provide a set of images to create a model suitable for a learning-based image codec solution.
Validation dataset: The validation dataset aims to provide a set of images to be used during training to validate the convergence of the training algorithm used by a learning-based image codec solution.
Test dataset (hidden): The test dataset can be used neither for training nor for validation; it will be used to evaluate the final performance of learning-based image coding solutions. Test images are kept hidden until an appropriate stage so that they cannot be used for training.

The images in the JPEG AI dataset are highly diverse, notably in content and spatial resolution. The datasets have the following characteristics:

Format: PNG images (RGB color components, non-interlaced);
Spatial resolution: from 256×256 to 8K (8-bit samples);
Training/validation dataset: 5264/350 images.
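
The format characteristics listed above can be checked on a local copy of the images by reading the PNG header. The following is a minimal sketch, not part of the JPEG AI tooling: it assumes that "RGB" corresponds to PNG color type 2 (truecolor without alpha) and that "8 bit" means 8 bits per sample. The function names `read_png_header`, `matches_listed_format` and `check_file` are hypothetical and introduced only for illustration.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_header(data: bytes) -> dict:
    """Parse the signature and IHDR chunk from the first 29 bytes of a PNG."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # The first chunk must be IHDR and carry exactly 13 bytes of data.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing or malformed IHDR chunk")
    (width, height, bit_depth, color_type,
     _compression, _filter, interlace) = struct.unpack(">IIBBBBB", data[16:29])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": color_type,
        "interlace": interlace,
    }


def matches_listed_format(header: dict) -> bool:
    """Assumed reading of the list above: RGB, 8 bits per sample, non-interlaced."""
    return (
        header["color_type"] == 2      # 2 = truecolor (RGB), no alpha
        and header["bit_depth"] == 8   # assumption: "8 bit" is per sample
        and header["interlace"] == 0   # 0 = non-interlaced
    )


def check_file(path: str) -> bool:
    """Read only the header bytes of a file and test them against the format."""
    with open(path, "rb") as f:
        return matches_listed_format(read_png_header(f.read(29)))
```

Calling `check_file` on each image in a downloaded folder would flag files that differ from the listed format. The resolution range is left unchecked here because the source does not define "8K" in pixels.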

The numbers of images above allow for efficient training and validation, and they are typically larger than the numbers used in other works. The training and validation datasets will be available at sftp://jpeg-cfe@amalia.img.lx.it.pt. The password is given by sending an email to .

The test images should provide a well-balanced set of diverse images that can be used for a representative evaluation of learning-based image coding solutions. The test images used in the JPEG AI Call for Evidence are available here.