Coding style
Bsoft is written to facilitate the rapid development of image and
molecular processing applications. The coding style is kept simple and
designed to avoid ambiguities. There is a certain formalization and
discipline required to code this way, as laid out in the following
guidelines:
- Modularity:
- Keep globals to a minimum: The main code in Bsoft has only one
package-wide global meant mostly for reporting purposes,
"verbose".
- Separate program front ends from functionality: In Bsoft, each
program is viewed as a user interface to process options and drive
processing. The Bsoft library can therefore be used in the scope of
any program.
- Encapsulate units of functionality in actual functions:
In Bsoft, each function is written to perform one specific task.
- Every source file in the library has its own header file
(don't merge header files!). E.g.: utilities.c and utilities.h.
- Separate I/O from processing: All the functions reading
and writing files and dealing with specific formats feed into a small
number of interface functions:
- Images: read_img and write_img
- Molecules: read_molecule and write_molecule
- Models: read_model and write_model
- Parameters: read_project and write_project
- Generality:
- Bsoft handles only four forms of information, each with associated
objects (structures or classes) in the code encapsulating all the relevant data:
- Images
- Molecules
- Models
- Parameters
- Every function is written to deal with all incarnations of the
data form it processes. For images, this means that every function
needs to address all data types. For molecules, both atomic coordinate
and sequence data are encoded in the same structural hierarchy, and
each function needs to take this into account.
- A typical function should be written to provide a general
solution to a problem posed, rather than just returning a specific
result.
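A minimal sketch of the generality rule for images, assuming a hypothetical tagged image structure. The names SketchImage, DataType, image_average and make_image are illustrative only and are not Bsoft identifiers. The entry point dispatches on the stored data type, so a caller never needs to know which type the image holds:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical data type tag; not the Bsoft enumeration.
enum class DataType { UChar, Short, Float };

// Hypothetical image: raw bytes plus a tag saying how to interpret them.
struct SketchImage {
    DataType type;
    std::size_t count;                // number of data elements
    std::vector<unsigned char> data;  // raw storage
};

// Typed worker: averages `count` elements interpreted as type T.
template <typename T>
double average_typed(const SketchImage& img) {
    assert(img.data.size() >= img.count * sizeof(T));
    if (img.count == 0) return 0.0;
    double sum = 0.0;
    for (std::size_t i = 0; i < img.count; ++i) {
        T v;
        std::memcpy(&v, img.data.data() + i * sizeof(T), sizeof(T));
        sum += static_cast<double>(v);
    }
    return sum / static_cast<double>(img.count);
}

// General entry point: handles every data type the image may hold.
double image_average(const SketchImage& img) {
    switch (img.type) {
        case DataType::UChar: return average_typed<unsigned char>(img);
        case DataType::Short: return average_typed<std::int16_t>(img);
        case DataType::Float: return average_typed<float>(img);
    }
    return 0.0;  // not reached for valid tags
}

// Helper that builds an image from typed values (for demonstration only).
template <typename T>
SketchImage make_image(DataType type, const std::vector<T>& values) {
    SketchImage img{type, values.size(),
                    std::vector<unsigned char>(values.size() * sizeof(T))};
    if (!values.empty())
        std::memcpy(img.data.data(), values.data(), img.data.size());
    return img;
}
```

Adding a new data type in this sketch means adding one enumeration value and one case in the dispatch, while the typed worker stays unchanged.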
- Command line option handling:
- Old model: The original Bsoft option handling was managed
in typical Unix fashion as single letter tags followed in some cases
by an option value or argument (using the getopt function).
- New model: The use of single letter tags proved to be too
restrictive as Bsoft grew, and a new mechanism was introduced allowing
the user to use truncated versions of long option tags provided they were
unambiguous. This is largely compatible with the old style options as
long as a space is used between the option tag and value.
- New model and the usage block: This model uses the "usage"
block of strings to determine option mappings, making the design of the
usage strings important, as set out in the following rules:
- Any line starting with '-' is assumed to indicate an option
description.
- The option tag can be at most 15 characters long.
- The option tag must be separated from the example value
by whitespace.
- The presence of an example value indicates that the
option requires a value.
- New model mechanism: The command line argument list is parsed
for options indicated by '-' as the first character. An argument deemed
an option tag is scanned against the usage block to find the full tag
and determine whether it takes a value. The option tag-value pairs are
stored in a linked list and returned. These tag-value pairs are then
evaluated to set command-line parameters.
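The mechanism described above can be sketched roughly as follows. This is not the Bsoft implementation: OptionSpec, OptionPair, parse_usage, resolve_tag and parse_options are hypothetical names, a vector stands in for the linked list, and two assumptions are made that the rules above do not state: the 15-character limit is taken to exclude the leading '-', and an example value is taken to follow the tag after exactly one whitespace character (with the description following after two or more).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One entry derived from a usage line (hypothetical structure).
struct OptionSpec {
    std::string tag;   // full tag without the leading '-'
    bool takes_value;  // true if the usage line shows an example value
};

// A parsed tag-value pair.
struct OptionPair {
    std::string tag;
    std::string value;
};

static bool is_space(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Build the option table from the usage block.
std::vector<OptionSpec> parse_usage(const std::vector<std::string>& usage) {
    std::vector<OptionSpec> specs;
    for (const std::string& line : usage) {
        if (line.empty() || line[0] != '-') continue;  // not an option line
        std::size_t end = 1;
        while (end < line.size() && !is_space(line[end])) ++end;
        std::string tag = line.substr(1, end - 1);
        if (tag.empty() || tag.size() > 15) continue;  // assumed tag limit
        // Assumption: a value directly follows a single whitespace character.
        bool value = end + 1 < line.size() && !is_space(line[end + 1]);
        specs.push_back({tag, value});
    }
    return specs;
}

// Find the unique spec whose tag starts with `abbrev`; an exact match wins.
// Returns nullptr if nothing matches or the abbreviation is ambiguous.
const OptionSpec* resolve_tag(const std::vector<OptionSpec>& specs,
                              const std::string& abbrev) {
    const OptionSpec* found = nullptr;
    int matches = 0;
    for (const OptionSpec& s : specs) {
        if (s.tag == abbrev) return &s;
        if (!abbrev.empty() && s.tag.compare(0, abbrev.size(), abbrev) == 0) {
            found = &s;
            ++matches;
        }
    }
    return matches == 1 ? found : nullptr;
}

// Parse arguments into tag-value pairs. Returns false on an unknown or
// ambiguous tag, or when a required value is missing.
bool parse_options(const std::vector<std::string>& args,
                   const std::vector<OptionSpec>& specs,
                   std::vector<OptionPair>& out) {
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& a = args[i];
        if (a.size() < 2 || a[0] != '-') continue;  // not an option
        const OptionSpec* spec = resolve_tag(specs, a.substr(1));
        if (!spec) return false;
        assert(spec->tag.size() <= 15);  // guaranteed by parse_usage
        std::string value;
        if (spec->takes_value) {
            if (i + 1 >= args.size()) return false;
            value = args[++i];  // consumed as a value, even if it starts with '-'
        }
        out.push_back({spec->tag, value});
    }
    return true;
}
```

Because a value is consumed immediately after its tag, a negative number such as "-0.1,5.2" is not mistaken for an option. With tags "rescale" and "resolution" in the table, "-resc" resolves to "rescale", whereas "-res" is rejected as ambiguous.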
- Error handling:
- Function return values: Functions in the Bsoft package return
three types of values, each of which can be used to indicate an error:
- An integer used as error code: Error codes are always less
than zero.
- A calculated value: An error may be indicated as an
implausible value for the return variable.
- A pointer to a structure: A NULL return value indicates an
error.
- Handling: To make an error condition as useful as possible, the
point of failure in each function in the calling hierarchy should be
identified by propagating the error condition back to the top level.
This means that an error should not let the program exit at a low level
function.
- Warnings: A warning is required to indicate an unexpected but
usually non-fatal condition, or a corrective action that may be counter
to what the user expects.
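The propagation of an error condition up the calling hierarchy can be illustrated with a small sketch. The function names, error codes and messages are hypothetical and chosen only for this example; the points taken from the guidelines above are that error codes are negative, that each level reports its own point of failure, and that no low-level function exits the program.

```cpp
#include <cassert>
#include <cmath>
#include <iostream>
#include <vector>

// Hypothetical error codes; per the guidelines they are less than zero.
const int ERR_EMPTY = -1;
const int ERR_BAD_VALUE = -2;

// Low level: reports the failure and returns a code instead of exiting.
int check_values(const std::vector<double>& data) {
    if (data.empty()) {
        std::cerr << "Error in check_values: no data" << std::endl;
        return ERR_EMPTY;
    }
    for (double v : data) {
        if (!std::isfinite(v)) {
            std::cerr << "Error in check_values: non-finite value" << std::endl;
            return ERR_BAD_VALUE;
        }
    }
    return 0;
}

// Middle level: adds its own context and passes the code upward.
int normalize(std::vector<double>& data) {
    int err = check_values(data);
    if (err < 0) {
        std::cerr << "Error in normalize: input check failed" << std::endl;
        return err;
    }
    assert(!data.empty());  // guaranteed by check_values
    double sum = 0.0;
    for (double v : data) sum += v;
    if (sum == 0.0) {
        // Unexpected but non-fatal: warn and leave the data unchanged.
        std::cerr << "Warning in normalize: sum is zero, data not scaled"
                  << std::endl;
        return 0;
    }
    for (double& v : data) v /= sum;
    return 0;
}

// Top level: the only place that decides what to do with the failure.
int run(std::vector<double>& data) {
    int err = normalize(data);
    if (err < 0)
        std::cerr << "Error in run: processing failed with code " << err
                  << std::endl;
    return err;
}
```

A failure in check_values produces one message per level, so the output traces the path from the point of failure back to the top-level caller.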
- Memory tracking:
- All memory is allocated and deallocated explicitly, usually
within the function of use, unless the function returns an allocated structure.
- In previous versions of Bsoft, memory allocation and deallocation
were tracked and unaccounted instances reported for debugging purposes.
- With expanded conversion to C++ objects, memory tracking is now
done with a tool such as valgrind.
- Image processing model:
- An image is read as a whole, processed, and the output written
as a whole. This ensures modularity in the code, avoiding mixing I/O
and processing issues. Due to the possibly prohibitive size of a
multi-image file, a facility has been provided to access individual
images from a multi-image file.
- Functions may process image data in place (i.e., replacing the
old data) to limit memory requirements, or generate new image
structures, depending on the requirements of the algorithm.
- Very large images can be divided into tiles, processed and reassembled.
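The tiling approach mentioned above can be sketched as follows, assuming a hypothetical two-dimensional image stored row by row. Sketch2D, extract_tile, insert_tile and process_tiled are illustrative names, not Bsoft functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical 2D image stored row by row.
struct Sketch2D {
    std::size_t nx, ny;
    std::vector<float> data;  // nx * ny values
};

// Copy the tile with origin (x0, y0) and size up to tx by ty out of the image.
Sketch2D extract_tile(const Sketch2D& img, std::size_t x0, std::size_t y0,
                      std::size_t tx, std::size_t ty) {
    assert(x0 < img.nx && y0 < img.ny);
    Sketch2D tile;
    tile.nx = std::min(tx, img.nx - x0);  // edge tiles may be smaller
    tile.ny = std::min(ty, img.ny - y0);
    tile.data.resize(tile.nx * tile.ny);
    for (std::size_t y = 0; y < tile.ny; ++y)
        for (std::size_t x = 0; x < tile.nx; ++x)
            tile.data[y * tile.nx + x] = img.data[(y0 + y) * img.nx + (x0 + x)];
    return tile;
}

// Write a processed tile back into the image at its origin.
void insert_tile(Sketch2D& img, const Sketch2D& tile,
                 std::size_t x0, std::size_t y0) {
    assert(x0 + tile.nx <= img.nx && y0 + tile.ny <= img.ny);
    for (std::size_t y = 0; y < tile.ny; ++y)
        for (std::size_t x = 0; x < tile.nx; ++x)
            img.data[(y0 + y) * img.nx + (x0 + x)] = tile.data[y * tile.nx + x];
}

// Divide the image into tiles, process each one, and reassemble the result
// into a new image of the same size.
Sketch2D process_tiled(const Sketch2D& img, std::size_t tx, std::size_t ty,
                       const std::function<void(Sketch2D&)>& process) {
    assert(tx > 0 && ty > 0);
    Sketch2D out{img.nx, img.ny, std::vector<float>(img.nx * img.ny)};
    for (std::size_t y0 = 0; y0 < img.ny; y0 += ty) {
        for (std::size_t x0 = 0; x0 < img.nx; x0 += tx) {
            Sketch2D tile = extract_tile(img, x0, y0, tx, ty);
            process(tile);  // only one tile is held in addition to in/out
            insert_tile(out, tile, x0, y0);
        }
    }
    return out;
}
```

This simple form is only correct for operations that treat each pixel independently. An operation that uses neighboring pixels, such as a filter, would need overlapping tiles, which this sketch does not provide.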
- Documentation:
- Bsoft used to have its own documentation system, but has now switched
to a common format so that it can use the
Doxygen documentation generator:
- The comment block must precede the function and start with
"/**" and end with "**/" on their own lines.
- All keywords within the block must start with "@" as the
first character on a line.
- Many keywords can be used; the following are typical:
- @brief
- @param
- @return
- @file
- @author
- @date
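As an illustration of the rules above, a comment block for a file and for a function might look like this. The file name, author, date and the function sketch_sum are placeholders invented for the example and do not come from the Bsoft sources.

```cpp
/**
@file   sketch_utilities.cpp
@brief  Placeholder file description for this example.
@author Placeholder Author
@date   Placeholder date
**/

#include <cassert>
#include <cstddef>

/**
@brief  Calculates the sum of an array of values.
@param  data    array of values.
@param  n       number of values in the array.
@return sum of the values, 0 if the array is empty.

    A longer free-text description may follow the keyword lines.
**/
double sketch_sum(const double* data, std::size_t n)
{
    assert(data != nullptr || n == 0);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += data[i];
    return sum;
}
```

Each block opens with "/**" and closes with "**/" on its own line, sits directly before the item it documents, and starts every keyword with "@" in the first column.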
Image file formats
The variety of image formats, and the even greater variety of programs
producing files in these formats, leads to problems when programmers do
not adhere to a complete and up-to-date specification of a format and
instead take shortcuts to avoid dealing with all the issues a file
format involves. This generates problems such as poor data type support,
omission of statistical information, and even garbage in some fields
that makes well-behaved programs crash. Here are some of the policies
in Bsoft for dealing with such sloppiness in image format handling:
- The principle in Bsoft is that of access to all images,
regardless of format. The notion of an image format converter as
standalone functionality is therefore considered outdated.
- The Bsoft policy is to adhere as closely as possible to the file
format specification. The priority is therefore to follow published
specifications, and then to try to deal with the I/O of other
packages. In the case of TIFF files, Bsoft provides for many datatypes
(including short and float) described in the version 6 specification.
- The image input functions in Bsoft attempt to clean up image
header problems as best they can, and
such problems experienced with other programs can often be resolved by
passing the file through a Bsoft program such as "bimg".
- Endianness is handled on reading images based on the byte order found
in particular header fields. When writing images, the native byte order
of the processor is imposed.
- The data type is preserved as far as possible, changed only on
user request (option) or when the receiving file format does not
support the data type.
- Due to numerous problems encountered with reading date and time
fields in image files, Bsoft programs no longer read these fields and
only write the date and time into them.
- Labels and titles in image headers may contain garbage with
control characters detrimental to program execution. Bsoft programs
write their own strings into these fields.
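The byte order policy above (detect on reading from header fields, write in native order) can be sketched as follows. The detection rule used here is an assumption for illustration, not necessarily what Bsoft does for any particular format: a 32-bit size field is taken as valid if it lies in the range 1 to max_size, and a value outside that range whose byte-swapped form is inside it is taken to indicate the opposite byte order. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reverse the byte order of a 32-bit value.
std::uint32_t swap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Decide from one size field whether the header is in the opposite byte order.
// Assumption: valid sizes lie in [1, max_size].
bool needs_swap(std::uint32_t size_field, std::uint32_t max_size = 65536) {
    bool plausible = size_field >= 1 && size_field <= max_size;
    std::uint32_t s = swap32(size_field);
    bool swapped_plausible = s >= 1 && s <= max_size;
    return !plausible && swapped_plausible;
}

// Read a 32-bit field from raw header bytes in the machine's native order.
std::uint32_t read_field(const unsigned char* bytes) {
    assert(bytes != nullptr);
    std::uint32_t v;
    std::memcpy(&v, bytes, sizeof v);
    return v;
}

// Return the size field in native order, swapping only when the header
// appears to be in the opposite byte order.
std::uint32_t read_size(const unsigned char* bytes) {
    std::uint32_t v = read_field(bytes);
    return needs_swap(v) ? swap32(v) : v;
}
```

With this rule a size of 512 is recovered from both big-endian and little-endian header bytes, regardless of the byte order of the machine reading the file. Once the data are held in native order, writing requires no swapping.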