The Image Data Model
Images are ubiquitous in electronic media and are implemented in many ways, with different file formats, different access strategies and different notions of what information an image carries. In its most common form, an image is a 2D raster of values representing gray scale or color values at sampled points within the image. A common extension of this basic model is the time-evolving image (a movie or animation), with time as a third dimension. In structural biology, we deal with 3D structures that are often viewed as static but in reality are subject to time-dependent variation. In electron microscopy, the recorded images are approximations of 2D projections of 3D structures. The problem with most available software is that it does not support the concept of an image in its most general form.
The image as a five-dimensional data set
The structures studied in structural biology are inherently 3D, so the basis of the image model is also 3D. In addition, we would like to pack multiple 2D images into a single file. However, packing 2D images as the sections of a 3D map blurs the distinction between 2D and 3D and should be avoided. Furthermore, we also want to be able to store multiple 3D maps in one file, which requires an additional dimension.
Most of the data sets we work with have a single value at each pixel or voxel (gray scale values), but situations may arise where each voxel carries multiple values. The most common uses of multiple values per pixel are complex numbers and color, such as RGB (Red-Green-Blue) and CMYK (Cyan-Magenta-Yellow-blacK). A more extensive use is to associate a list of values, or a spectrum, with each pixel. This requires yet another dimension in the image model. The meaning of the channels is captured in the notion of a compound type, including simple (one value), complex (two values), RGB color (three values), etc.
The five dimensions are therefore (in storage order):
Channels - one or more values associated with each voxel
X-dimension
Y-dimension
Z-dimension
Images - a series of images, typically with some relationship (such as 2D projections of the same particle, a tilt series, or a time series)
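To make the storage order concrete, here is a minimal sketch of how a flat element index could be computed from five-dimensional coordinates. It assumes, as the order listed above suggests, that channels vary fastest and images slowest, and that the data are stored contiguously without padding. The function `element_index` is hypothetical and not part of the Bsoft API.

```python
def element_index(c, x, y, z, n, channels, size_x, size_y, size_z):
    """Return the flat index of channel c of voxel (x, y, z) in image n.

    Hypothetical helper: assumes contiguous storage without padding, with
    channels varying fastest, then x, y and z, and images slowest.
    """
    for name, value, limit in (("c", c, channels), ("x", x, size_x),
                               ("y", y, size_y), ("z", z, size_z)):
        if not 0 <= value < limit:
            raise IndexError(f"{name}={value} is outside [0, {limit})")
    if n < 0:
        raise IndexError(f"n={n} must be non-negative")
    # Nest from the slowest dimension (images) to the fastest (channels).
    return (((n * size_z + z) * size_y + y) * size_x + x) * channels + c


# Example: a stack of 64 x 64 RGB images (3 channels, z size 1).
dims = dict(channels=3, size_x=64, size_y=64, size_z=1)
print(element_index(0, 0, 0, 0, 0, **dims))  # -> 0
print(element_index(2, 1, 0, 0, 0, **dims))  # -> 5 (blue value of the second voxel)
print(element_index(0, 0, 1, 0, 0, **dims))  # -> 192 (start of the second row)
print(element_index(0, 0, 0, 0, 1, **dims))  # -> 12288 (start of the second image)
```

Keeping channels innermost means all values of one voxel (for example r, g and b) sit next to each other in memory.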
Data types
Each channel in the image contains a single value of one of the data types supported in Bsoft, listed in Table 1. Because the data type is independent of the number of channels, images with multiple channels can have any data type. This makes the image data model more general; for example, color images can be specified as floating point values.
Enumerated data type | C data type | Size (bytes) | Single letter code |
---|---|---|---|
UChar | unsigned char | 1 | b |
SChar | signed char | 1 | c |
UShort | unsigned short | 2 | u |
Short | short | 2 | s |
Int | int | 4 | i |
Long | long | 8 | l |
Float | float | 4 | f |
Double | double | 8 | d |
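As an illustration of Table 1, the sketch below maps the single-letter codes to Python `struct` format characters and decodes a raw byte buffer. The `<` and `>` prefixes select standard sizes, which match the table (including 8 bytes for `long`). Defaulting to little-endian byte order is an assumption; files written on other machines may need byte swapping. The names `STRUCT_FORMAT`, `type_size` and `decode_values` are hypothetical.

```python
import struct

# Single-letter codes from Table 1 mapped to struct format characters.
STRUCT_FORMAT = {
    "b": "B",  # UChar,  unsigned char,  1 byte
    "c": "b",  # SChar,  signed char,    1 byte
    "u": "H",  # UShort, unsigned short, 2 bytes
    "s": "h",  # Short,  short,          2 bytes
    "i": "i",  # Int,    int,            4 bytes
    "l": "q",  # Long,   long,           8 bytes
    "f": "f",  # Float,  float,          4 bytes
    "d": "d",  # Double, double,         8 bytes
}


def type_size(code):
    """Size in bytes of one value of the given data type code (standard sizes)."""
    return struct.calcsize("<" + STRUCT_FORMAT[code])


def decode_values(buffer, code, big_endian=False):
    """Decode a raw byte buffer into a tuple of values of one data type."""
    size = type_size(code)
    if len(buffer) % size:
        raise ValueError("buffer length is not a multiple of the type size")
    order = ">" if big_endian else "<"
    return struct.unpack(f"{order}{len(buffer) // size}{STRUCT_FORMAT[code]}", buffer)


raw = struct.pack("<3h", -1, 0, 300)
print(decode_values(raw, "s"))              # -> (-1, 0, 300)
print([type_size(c) for c in "bcusilfd"])   # -> [1, 1, 2, 2, 4, 8, 4, 8]
```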
Compound types
The meaning of the channels is defined by the compound type; the supported compound types are listed in Table 2.
Enumerated compound type | Elements | Size (values) | Single letter code |
---|---|---|---|
TSimple | gray value | 1 | S |
TComplex | real, imaginary | 2 | C |
TVector2 | x, y | 2 | V |
TVector3 | x, y, z | 3 | V |
TView | x, y, z, a | 4 | O |
TRGB | r, g, b | 3 | R |
TRGBA | r, g, b, a | 4 | A |
TCMYK | c, m, y, k | 4 | K |
TMulti | array of values | n | M |
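Combining Table 1 and Table 2, the amount of pixel data follows directly from the five dimensions: channels × x × y × z × images × bytes per value. The sketch below assumes no padding and counts the listed elements of each compound type (so TVector2 has two channels); TMulti needs an explicit channel count. The names `CHANNELS`, `TYPE_SIZE` and `data_bytes` are hypothetical.

```python
# Number of values (channels) per voxel for each fixed-size compound type.
CHANNELS = {
    "TSimple": 1,
    "TComplex": 2,
    "TVector2": 2,  # elements x, y
    "TVector3": 3,
    "TView": 4,
    "TRGB": 3,
    "TRGBA": 4,
    "TCMYK": 4,
}

# Bytes per value for the data types in Table 1, keyed by single-letter code.
TYPE_SIZE = {"b": 1, "c": 1, "u": 2, "s": 2, "i": 4, "l": 8, "f": 4, "d": 8}


def data_bytes(compound, datatype, x, y, z=1, images=1, multi_channels=None):
    """Bytes needed for the pixel data of an image in the five-dimensional model."""
    if compound == "TMulti":
        if not multi_channels:
            raise ValueError("TMulti requires an explicit channel count")
        channels = multi_channels
    else:
        channels = CHANNELS[compound]
    return channels * x * y * z * images * TYPE_SIZE[datatype]


print(data_bytes("TComplex", "f", 128, 128))                 # -> 131072
print(data_bytes("TRGB", "b", 640, 480))                      # -> 921600
print(data_bytes("TMulti", "f", 10, 10, multi_channels=5))    # -> 2000
```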
Image file formats
It seems that every image processing software package has one or more image file formats of its own. Even in packages that adopted external formats, later changes to those formats effectively turned them into different formats. Many conversion programs deal with specific pairwise conversions, which is not a particularly efficient solution for the user. Bsoft instead treats images as generalized constructs, encapsulating most of the information embedded in the image files in an internal structure. Because reading and writing of multiple file formats are supported, conversion becomes trivial: an image is read from one format into the internal structure and written out in another. The limiting factor is still the capabilities of each file format. For example, you cannot expect file formats designed for single images (such as MRC and EM) to store multiple images (whether 2D or 3D).
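Converting through one internal structure means N formats need one reader and one writer each (2N routines) rather than N(N-1) pairwise converters. What remains is checking whether a target format can hold the image at all. The sketch below illustrates such a check for a few formats, reading their limits off the "Dimensions" column of the format table. The `Image` class, `FORMAT_LIMITS` and `format_problem` are hypothetical stand-ins, not Bsoft's own classes or functions.

```python
from dataclasses import dataclass

# Hypothetical capability table: (max dimensionality, max images),
# where None means multiple images are allowed.
FORMAT_LIMITS = {
    ".em": (3, 1),      # EM: 3D, single
    ".spi": (3, None),  # Spider: 3D, multiple
    ".grd": (3, None),  # GRD: 3D, multiple
    ".jpg": (2, 1),     # JPEG: 2D, single
}


@dataclass
class Image:
    """Minimal stand-in for an in-memory image (hypothetical)."""
    channels: int
    x: int
    y: int
    z: int
    images: int


def format_problem(image, extension):
    """Return a reason the image cannot be written in this format, or None."""
    limits = FORMAT_LIMITS.get(extension.lower())
    if limits is None:
        return f"unsupported format: {extension}"
    max_dims, max_images = limits
    if max_dims == 2 and image.z > 1:
        return f"{extension} holds only 2D images; z size is {image.z}"
    if max_images is not None and image.images > max_images:
        return f"{extension} holds at most {max_images} image(s); got {image.images}"
    return None


stack = Image(channels=1, x=100, y=100, z=1, images=50)
print(format_problem(stack, ".spi"))  # -> None
print(format_problem(stack, ".em"))   # -> .em holds at most 1 image(s); got 50
```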
Image format | Extensions | Data types | Dimensions | Fourier/Complex | Sampling Info | Remarks |
---|---|---|---|---|---|---|
ASCII | .asc, .txt | (text) | 3D, single | List | No | |
BioRad | .pic | b, u | 3D, single | No | No | Confocal microscopy |
Brix | .brx | b | 3D, single | No | Indirect | O package, Xtal |
Brookhaven STEM | .dat | b | 2D, double interleaved | No | One value | STEM corrections applied on reading |
CCP4 | .map, .ccp, .ccp4 | c, s, f, S, F | 3D, single | Centered hermitian | Indirect | Xtal |
Digital Instruments | .di | s | 2D, double | No | No | No write support |
Digital Micrograph | .dm, .dm3, .dm4 | b, s, i, f, F | 2D, single | No | No | Proprietary format |
Ditabis image plate reader | .IPL, .IPH, .IPR, .IPC | s, i | 2D, single | No | Two values | Micron package |
DSN6 | .dsn6, .dn6, .omap | b | 3D, single | No | Indirect | O package, Xtal |
DX | .dx | f | 3D, single | No | Three values | OpenDX, visualization |
EM | .em | b, s, i, f | 3D, single | Hermitian | No | EM package |
Goodford | .pot | f | 3D, single | No | One value | Electrostatic potential |
GRD | .grd | (all) | 3D, multiple | No | Three values | Complete Bsoft image data model |
HKL | .hkl | (text) | 3D, single | List | No | Structure factor format |
Imagic | .img (.hed) | b, s, f, F | 2D, multiple | Centered | No | Header in a separate file |
ImageMagick | .miff | b (RGB) | 2D, multiple | No | No | X Window System display program |
JPEG | .jpg, .jpeg | b (RGB) | 2D, single | No | No | Web image format |
MFF | .mff | b, f | 3D, single | No | Three values | Whatif package |
MRC | .mrc | b, s, f, S, F | 3D, multiple | Centered hermitian | Indirect | MRC package |
PIC BP | .bp | b | 2D, single | No | No | PIC package |
PIF | .pif | b, s, i, f, S, F | 3D, multiple | Binary list | Three values | PFT/EM3DR package |
PNG | .png | b, s (RGB) | 2D, single | No | Two values | Network image format |
PNM | .pbm, .pgm, .ppm | b (RGB) | 2D, single | No | No | Simple image format |
Ser | .ser | f | 2D, multiple | No | No | FEI series format |
Situs | .situs | f | 3D, single | No | One value | Situs package |
SPE | .spe | f | 2D, single | No | No | SPE CCD format |
Spider | .spi | f | 3D, multiple | Hermitian | One value | Spider package |
Suprim | .spm, .sup, .f | b, s, i, f (RGB) | 3D, single | Standard | One value | Suprim package |
TIFF | .tif, .tiff | b, s, i, f (RGB) | 3D, multiple | No | Two values | Only the byte data type is common |
Sampling information: The sampling, or voxel/pixel size, is represented as three values (one each for x, y and z), two values (e.g., TIFF only provides sampling information in the x and y directions), or one value (used for all three directions). Crystallographic formats (such as CCP4 and MRC) give the sampling indirectly: it is calculated as the ratio of each unit cell dimension to the number of voxels (grid intervals) along that cell edge, which leads to inaccuracies due to round-off.
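The round-off mentioned above can be illustrated with a small sketch. It assumes the cell length is stored as a 32-bit float (as in MRC/CCP4-style headers) and is computed as sampling times the number of grid intervals; reading the sampling back as cell length divided by intervals then no longer returns the exact original value. The helper names are hypothetical, and this shows only one way such inaccuracies can arise.

```python
import struct


def to_float32(value):
    """Round a Python float to the nearest 32-bit float, as a header would store it."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def stored_cell_length(sampling, intervals):
    """Cell edge written to the header: sampling times the number of grid intervals."""
    return to_float32(sampling * intervals)


def sampling_from_cell(cell_length, intervals):
    """Sampling recovered on reading: cell edge divided by grid intervals."""
    return cell_length / intervals


cell = stored_cell_length(1.1, 3)     # 1.1 angstrom/voxel, 3 intervals
recovered = sampling_from_cell(cell, 3)
print(recovered == 1.1)               # -> False
print(abs(recovered - 1.1) < 1e-6)    # -> True
```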
Raw files - custom interpretation of image files
Bsoft offers a "raw" format for loading image files whose format is either not supported or whose header information is faulty. Appending a series of tag-value pairs, as described below, to any input file name invokes an attempt to read the file based on the information given by the user on the command line, ignoring any information in the file header itself. The image file name must be followed by the tag-value pairs, using the sharp character, "#", as the delimiter between them. E.g., to interpret the file "input.file" according to particular data type and size parameters:
bimg -verbose 7 input.file#d=f#x=120,120,55#h=1024 output.map
This line interprets the file as containing a 3D floating point image of 120 x 120 x 55 voxels, with the data starting at byte 1024. Typically, the minimum needed to interpret a file is the data type, the size, and the number of header bytes to skip.
Tag | Value | Description |
---|---|---|
h | bytes | Header size = initial number of bytes to skip |
d | datatype_letter | Data type character (1,b,c,u,s,j,i,k,l,f,d,S,I,F,D) |
x | size_x,size_y,size_z | Image size in voxels |
p | page_x,page_y,page_z | Page size in voxels |
a | bytes | Number of bytes to pad between pages |
s | sampling_x,sampling_y,sampling_z | Sampling/voxel size in angstrom/voxel |
c | number_channels | Number of channels (gray scale = 1, RGB = 3) |
n | number_images | Number of images in the file |
i | selected_image | Select one image to read |
f | transform_type | n=NoTransform, s=Standard, c=Centered, h=Hermitian, q=CentHerm |
b | 0/1 | Byte swapping flag |
v | 0/1 | VAX floating point flag |
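The following sketch shows how a "file#tag=value#..." specification like the one in the example above could be split into a file name and typed tag values, and how the expected file size could then be checked. It is a hypothetical illustration, not Bsoft's parser: it assumes file names contain no "#", pads a short size list with 1s, ignores page padding (tags p and a), and only knows the sizes of the data types in Table 1 (other codes raise a KeyError).

```python
def parse_raw_spec(filename):
    """Split 'file#tag=value#...' into the file name and a dict of tags."""
    name, *pairs = filename.split("#")
    tags = {}
    for pair in pairs:
        tag, sep, value = pair.partition("=")
        if not sep or not tag:
            raise ValueError(f"malformed tag-value pair: {pair!r}")
        if tag in ("x", "p"):                  # voxel counts per dimension
            tags[tag] = [int(v) for v in value.split(",")]
        elif tag == "s":                       # sampling in angstrom/voxel
            tags[tag] = [float(v) for v in value.split(",")]
        elif tag in ("h", "a", "c", "n", "i", "b", "v"):
            tags[tag] = int(value)
        else:                                  # d and f stay as letters
            tags[tag] = value
    return name, tags


TYPE_SIZE = {"b": 1, "c": 1, "u": 2, "s": 2, "i": 4, "l": 8, "f": 4, "d": 8}


def expected_file_size(tags):
    """Header bytes plus data bytes, ignoring page padding (tags p and a)."""
    sx, sy, sz = (tags["x"] + [1, 1])[:3]      # pad missing y/z sizes with 1
    channels = tags.get("c", 1)
    images = tags.get("n", 1)
    return tags.get("h", 0) + sx * sy * sz * channels * images * TYPE_SIZE[tags["d"]]


name, tags = parse_raw_spec("input.file#d=f#x=120,120,55#h=1024")
print(name, tags)                # -> input.file {'d': 'f', 'x': [120, 120, 55], 'h': 1024}
print(expected_file_size(tags))  # -> 3169024
```

Comparing this number with the actual file size is a quick sanity check before trusting the tag values.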