JPEG DNA - JPEG Committee explores standardization of media storage in DNA
July 23, 2020

DNA is a macromolecule and an essential component of a large number of living entities. It is made of simple units that line up in a particular order within this large molecule. The order of these units carries the genetic information of a specific living organism, much as the order of bits in a binary stream carries information. This means that artificial DNA molecules can be created to carry any desired information. It has recently been demonstrated that a rather large amount of information can be stored through the synthesis of DNA molecules and retrieved by sequencing them. Although the cost of this process remains prohibitive for many applications, it is decreasing rapidly and is on the verge of becoming feasible in practice.

Storage in the form of DNA offers a number of attractive benefits. Among others, its density is orders of magnitude higher than that of conventional storage technologies; its longevity spans thousands of years, compared with decades for current storage approaches; and it is very energy efficient. A distinctive property of DNA is that the information stored in it uses a quaternary representation based on adenine (A), guanine (G), cytosine (C) and thymine (T), instead of the usual binary representation of digital information as “0” and “1”. This opens the door to efficient coding of information directly in quaternary rather than binary form. In addition, DNA storage and retrieval have intrinsic properties and error characteristics that differ from those of the conventional binary transmission and storage mechanisms used so far.
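To make the quaternary representation concrete, the following minimal Python sketch maps each pair of bits to one nucleotide and back. The specific assignment (A=00, G=01, C=10, T=11) and the function names `bytes_to_dna` and `dna_to_bytes` are assumptions chosen purely for illustration; they are not taken from any JPEG specification. A practical scheme would also need to account for the error characteristics of DNA synthesis and sequencing, which this sketch ignores.

```python
# Hypothetical 2-bits-per-nucleotide mapping, for illustration only:
# index 0 -> A (00), 1 -> G (01), 2 -> C (10), 3 -> T (11).
NUCLEOTIDES = "AGCT"


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four nucleotides, most significant bit pair first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(NUCLEOTIDES[(byte >> shift) & 0b11])
    return "".join(bases)


def dna_to_bytes(strand: str) -> bytes:
    """Decode a strand produced by bytes_to_dna back into bytes."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            # str.index raises ValueError for any symbol outside A, G, C, T.
            byte = (byte << 2) | NUCLEOTIDES.index(base)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    strand = bytes_to_dna(b"JPEG")
    print(strand)
    print(dna_to_bytes(strand))
```

Under this mapping every byte becomes exactly four nucleotides, so the conversion is lossless and reversible, but it adds no redundancy or protection against sequencing errors.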

The JPEG Committee has decided to explore efficient image coding approaches for applications where DNA is used as the storage medium. One such application is long-term preservation of media content. Given its successful history of providing efficient image and image sequence formats for storage and archival applications, the JPEG Committee is well positioned to address standardization challenges related to efficient representations of multimedia content, in particular images and image sequences, in the context of DNA storage.

At a minimum, the JPEG Committee could launch an activity to convert its existing image coding formats from a compressed binary representation to a compressed “AGCT” quaternary representation. Standardized image coding approaches, along with appropriate tools such as error resiliency and associated metadata that are particularly suited to the requirements of DNA digital information storage, are also good directions for JPEG to explore. As an immediate step, DNA digital information storage applications need to be explored in more detail, with particular emphasis on images and image sequences as the stored information. They should then be prioritized in terms of time to market and maturity, with efforts focusing primarily on specific use cases that can gather a critical mass of stakeholders.

To this end, an Ad Hoc Group (AHG) called JPEG DNA has been created to carry out this exploration. A first draft of the findings of the JPEG DNA AHG can be found here.

The JPEG DNA AHG mailing list is open to the public. All stakeholders and interested parties in media technologies and applications that can benefit from DNA storage technologies are invited to join the mailing list and take part in the activities by registering here.