The Image Data Model

Images are ubiquitous objects used on all electronic media, implemented in a large number of ways with different file formats, different access strategies and different notions of information. The most general notion of an image is a 2D raster of values representing gray scale or colour values at particular sampled points within the image. An extension of the basic image model is the time-evolving model (i.e., movie or animation), with time representing a third dimension. In structural biology, we deal with 3D structures, often viewed as static, but in reality subject to time-dependent variation. In electron microscopy, the images taken are approximations of 2D projections of 3D structures. The problem with most available software is that none supports the concept of an image in its most general form.

The image as a five-dimensional data set

The structures studied in structural biology are inherently 3D, so the basis of the image model is also 3D. In addition, we would like to pack multiple 2D images into a single file. However, packing 2D images as the sections of a 3D map blurs the distinction between 2D and 3D and should be avoided. Furthermore, we also want to be able to store multiple 3D maps, which requires an additional dimension.

Most of the data sets we work with have a single value at each pixel or voxel (gray scale values), but situations arise where each voxel has multiple values. The most common uses of multiple values per pixel are to represent complex numbers and colour, such as RGB (Red-Green-Blue) and CMYK (Cyan-Magenta-Yellow-blacK). A more extensive use is to associate a list of values or a spectrum with each pixel. This requires yet another dimension in the image model. The meaning of the channels is captured in the notion of a compound type, including simple (one value), complex (two values), RGB colour (three values), etc.

The five dimensions are therefore (in storage order): channels (c), x, y, z, and images (n).
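
As a rough illustration of what a storage order implies, the following Python sketch computes the offset of a value in a flat array. It assumes that the channel index varies fastest, followed by x, y, z and finally the image index. The function name `linear_index` and its arguments are hypothetical and are not taken from Bsoft.

```python
def linear_index(c, x, y, z, n, size):
    """Return the offset of value (c, x, y, z, n) in a flat array.

    `size` is a tuple (nc, nx, ny, nz, nn) giving the extent of each
    dimension. Assumption: channels vary fastest and images slowest.
    """
    nc, nx, ny, nz, nn = size
    for idx, ext in zip((c, x, y, z, n), size):
        if not 0 <= idx < ext:
            raise IndexError("index out of range")
    return (((n * nz + z) * ny + y) * nx + x) * nc + c


# Two RGB images of 4 x 3 pixels (a single z section each).
size = (3, 4, 3, 1, 2)
first = linear_index(0, 0, 0, 0, 0, size)         # start of the first image
second_image = linear_index(0, 0, 0, 0, 1, size)  # start of the second image
```

With this layout, all channel values of one pixel are adjacent in memory, and each complete image occupies one contiguous block.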

Data types

Each channel in the image contains a single value. The data types supported in Bsoft are listed in Table 1. Images with multiple channels can have any data type. This makes the image data model more general and, as an example, allows colour images to be specified as floating point values.

Table 1. Bsoft image data types
Enumerated data type | C data type | Size (bytes) | Single letter code
UChar | unsigned char | 1 | b
SChar | signed char | 1 | c
UShort | unsigned short | 2 | u

Compound types

The meaning of the channels is captured in the compound type. The supported compound types are listed in Table 2.

Table 2. Bsoft image compound types
Enumerated compound type | Elements | Size (values) | Single letter code
TSimple | gray value | 1 | S
TComplex | real, imaginary | 2 | C
TVector2 | x, y | 2 | V
TVector3 | x, y, z | 3 | V
TView | x, y, z, a | 4 | O
TRGB | r, g, b | 3 | R
TRGBA | r, g, b, a | 4 | A
TCMYK | c, m, y, k | 4 | K
TMulti | array of values | n | M
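
To make the relation between the data type and the compound type concrete, the sketch below estimates the size of the image data in bytes. It includes only the data types shown in Table 1 and a subset of the compound types in Table 2 (`TMulti` is left out because its number of values is variable). The dictionary and function names are hypothetical and are not part of Bsoft.

```python
# Bytes per value for the data types listed in Table 1.
DATA_TYPE_BYTES = {"UChar": 1, "SChar": 1, "UShort": 2}

# Values per pixel/voxel for a subset of the compound types in Table 2.
COMPOUND_VALUES = {"TSimple": 1, "TComplex": 2, "TRGB": 3, "TRGBA": 4, "TCMYK": 4}


def data_size_bytes(data_type, compound_type, nx, ny, nz=1, nimages=1):
    """Return the number of bytes needed for the image data (no header)."""
    values = COMPOUND_VALUES[compound_type] * nx * ny * nz * nimages
    return values * DATA_TYPE_BYTES[data_type]


# A single 512 x 512 RGB image stored as unsigned bytes.
rgb_bytes = data_size_bytes("UChar", "TRGB", 512, 512)
```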

Image file formats

It seems that every image processing software package has one or more image file formats of its own. Even where packages have adopted external formats, changes to those formats have effectively turned them into different formats. There are many conversion programs dealing with specific pairwise conversions, which is not a particularly efficient solution for the user. Bsoft attempts to deal with images as generalized constructs, encapsulating most of the information embedded in the image files in an internal structure. Conversion then becomes trivial, because reading and writing of multiple file formats are supported. The remaining limits are those of each file format itself. For example, you cannot expect file formats designed for single images (such as MRC and EM) to store multiple images (whether 2D or 3D).

Table 3. Image file format features (as implemented in Bsoft)
Image format | Extensions | Data types | Dimensions | Fourier/Complex | Sampling info | Remarks
ASCII | .asc, .txt | (text) | 3D, single | List | No |
BioRad | .pic | b, u | 3D, single | No | No | Confocal microscopy
Brix | .brx | b | 3D, single | No | Indirect | O package, Xtal
Brookhaven STEM | .dat | b | 2D, double interleaved | No | One value | STEM corrections applied on reading
CCP4 | .ccp, .ccp4 | c, s, f, S, F | 3D, single | Centered hermitian | Indirect | Xtal
Digital Instruments | .di | s | 2D, double | No | No | No write support
Digital Micrograph | .dm3, .dm4 | b, s, i, f, F | 2D, single | No | No | Proprietary format
Ditabis image plate reader | .IPL, .IPH, .IPR, .IPC | s, i | 2D, single | No | Two values | Micron package
DSN6 | .dsn6, .dn6, .omap | b | 3D, single | No | Indirect | O package, Xtal
DX | .dx | f | 3D, single | No | Three values | OpenDX, visualization
EM | .em | b, s, i, f | 3D, single | Hermitian | No | EM package
Goodford | .pot | f | 3D, single | No | One value | Electrostatic potential
GRD | .grd | (all) | 3D, multiple | No | Three values | Complete Bsoft image data model
HKL | .hkl | (text) | 3D, single | List | No | Structure factor format
Imagic | .img (.hed) | b, s, f, F | 2D, multiple | Centered | No | Header in a separate file
Image Magick | .miff | b (RGB) | 2D, multiple | No | No | X-window display program
JPEG | .jpg, .jpeg | b (RGB) | 2D, single | No | No | Web image format
MFF | .mff | b, f | 3D, single | No | Three values | Whatif package
MRC | .mrc | b, s, f, S, F | 3D, multiple | Centered hermitian | Indirect | MRC package
PIC BP | .bp | b | 2D, single | No | No | PIC package
PIF | .pif | b, s, i, f, S, F | 3D, multiple | Binary list | Three values | PFT/EM3DR package
PNG | .png | b, s (RGB) | 2D, single | No | Two values | Network image format
PNM | .pbm, .pgm, .ppm | b (RGB) | 2D, single | No | None | Simple image format
Ser | .ser | f | 2D, multiple | No | None | FEI series format
Situs | .situs | f | 3D, single | No | One value | Situs package
SPE | .spe | f | 2D, single | No | None | SPE CCD format
Spider | .spi | f | 3D, multiple | Hermitian | One value | Spider package
Suprim | .spm, .sup, .f | b, s, i, f (RGB) | 3D, single | Standard | One value | Suprim package
TIFF | .tif, .tiff | b, s, i, f (RGB) | 3D, multiple | No | Two values | Only the byte data type is common

Sampling information: The sampling, or voxel/pixel size, is represented as three values (for x, y and z), two values (TIFF only provides for sampling information in the x and y directions), or one value (for all three directions). Crystallographic formats (such as CCP4 and MRC) give the sampling indirectly: it is calculated as the ratio of the unit cell dimensions to the size of the unit cell in voxels, which leads to inaccuracies due to round-off.
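
The indirect case can be sketched as follows: the sampling along each axis is the unit cell length divided by the number of voxels spanning the cell along that axis. The function and argument names are hypothetical and do not correspond to the fields of any particular file header.

```python
def sampling_from_unit_cell(cell_lengths, cell_voxels):
    """Return the (x, y, z) sampling in angstrom/voxel.

    cell_lengths: unit cell edge lengths in angstrom.
    cell_voxels: number of voxels spanning the unit cell along each axis.
    """
    if any(n <= 0 for n in cell_voxels):
        raise ValueError("voxel counts must be positive")
    return tuple(length / n for length, n in zip(cell_lengths, cell_voxels))


# A 240 x 240 x 110 angstrom cell sampled on a 120 x 120 x 55 grid.
sampling = sampling_from_unit_cell((240.0, 240.0, 110.0), (120, 120, 55))
```

Because the sampling is derived rather than stored, any loss of precision in the stored cell lengths carries over to the result.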

Raw files - custom interpretation of image files

Bsoft offers a "raw" format for loading image files whose format is not supported, or whose header information is faulty. Appending a series of tag-value pairs (described in Table 4) to any input file name invokes an attempt to read the file based on the command-line information given by the user, ignoring any information in the file header itself. The image file name must be followed by a string that uses the hash character, "#", as the delimiter between tag-value pairs. For example, to interpret the file "input.file" according to particular data type and size parameters:

bimg -verbose 7 input.file#d=f#x=120,120,55#h=1024

This line interprets the file as containing a 3D image of 120 x 120 x 55 voxels in floating point format, with the data starting at byte 1024. Typically, the minimum necessary to interpret a file is the data type, the size, and the number of header bytes to skip.

Table 4. Tag-value descriptions for custom interpretation of image files
Tag | Value | Description
h | bytes | Header size = initial number of bytes to skip
d | datatype_letter | Data type character (1,b,c,u,s,j,i,k,l,f,d,S,I,F,D)
x | size_x,size_y,size_z | Image size in voxels
p | page_x,page_y,page_z | Page size in voxels
a | bytes | Number of bytes to pad between pages
s | sampling_x,sampling_y,sampling_z | Sampling/voxel size in angstrom/voxel
c | number_channels | Number of channels (gray scale = 1, RGB = 3)
n | number_images | Number of images in the file
i | selected_image | Select one image to read
f | transform_type | n=NoTransform, s=Standard, c=Centered, h=Hermitian, q=CentHerm
b | 0/1 | Byte swapping flag
v | 0/1 | VAX floating point flag
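
To illustrate the syntax, the following Python sketch splits such a file name into the actual file name and its tag-value pairs. It only mimics the string format described above and is not the Bsoft parser; the function name `parse_raw_spec` is hypothetical.

```python
def parse_raw_spec(spec):
    """Split 'name#tag=value#...' into (file name, {tag: value})."""
    name, *pairs = spec.split("#")
    tags = {}
    for pair in pairs:
        tag, sep, value = pair.partition("=")
        if not sep or not tag:
            raise ValueError(f"malformed tag-value pair: {pair!r}")
        # Comma-separated values (such as the image size) become lists.
        tags[tag] = value.split(",") if "," in value else value
    return name, tags


name, tags = parse_raw_spec("input.file#d=f#x=120,120,55#h=1024")
size = [int(v) for v in tags["x"]]   # image size in voxels
header_bytes = int(tags["h"])        # bytes to skip before the data
```

This sketch treats everything before the first "#" as the file name, so it does not handle file names that themselves contain a "#".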