Common Practices
With its extensive use, a number of common practices and conventions have evolved to help avoid errors in C programs. These are simultaneously a demonstration of the application of good software engineering principles to a language and an indication of the limitations of C. Although few are used universally, and some are controversial, each of these enjoys wide use.
Dynamic multidimensional arrays
Although one-dimensional arrays are easy to create dynamically using malloc, and fixed-size multidimensional arrays are easy to create using the built-in language feature, dynamic multidimensional arrays are trickier. There are a number of different ways to create them, each with different tradeoffs. The two most popular ways to create them are:
- They can be allocated as a single block of memory, just like static multidimensional arrays. This requires that the array be rectangular (i.e. subarrays of lower dimensions are static and have the same size). The disadvantage is that the syntax of declaration the pointer is a little tricky for programmers at first. For example, if one wanted to create an array of ints of 3 columns and rows rows, one would do
int (*multi_array)[3] = malloc(rows * sizeof(int[3]));
- (Note that here multi_array is a pointer to an array of 3 ints.)
- Because of array-pointer interchangeability, you can index this just like static multidimensional arrays, i.e. multi_array[5][2] is the element at the 6th row and 3rd column.
- Dynamic multidimensional arrays can be allocated by first allocating an array of pointers, and then allocating subarrays and storing their addresses in the array of pointers.[1] (This approach is also known as an Iliffe vector). The syntax for accessing elements is the same as for multidimensional arrays described above (even though they are stored very differently). This approach has the advantage of the ability to make ragged arrays (i.e. with subarrays of different sizes). However, it also uses more space and requires more levels of indirection to index into, and can have worse cache performance. It also requires many dynamic allocations, each of which can be expensive.
For more information, see the comp.lang.c FAQ, question 6.16.
In some cases, the use of multi-dimensional arrays can best be addressed as an array of structures. Before user-defined data structures were available, a common technique was to define a multi-dimensional array, where each column contained different information about the row. This approach is also frequently used by beginner programmers. For example, columns of a two-dimensional character array might contain last name, first name, address, etc.
In cases like this, it is better to define a structure that contains the information that was stored in the columns, and then create an array of pointers to that structure. This is especially true when the number of data points for a given record might vary, such as the tracks on an album. In these cases, it is better to create a structure for the album that contains information about the album, along with a dynamic array for the list of songs on the album. Then an array of pointers to the album structure can be used to store the collection.
- Another useful way to create a dynamic multi-dimensional array is to flatten the array and index manually. For example, a 2-dimensional array with sizes x and y has x*y elements, therefore can be created by
int dynamic_multi_array[x*y];
The index is slightly trickier than before, but can still be obtained by y*i+j. You then access the array with
static_multi_array[i][j];
dynamic_multi_array[y*i+j];
Some more examples with higher dimensions:
int dim1[w];
int dim2[w*x];
int dim3[w*x*y];
int dim4[w*x*y*z];
dim1[i]
dim2[w*j+i];
dim3[w*(x*i+j)+k] // index is k + w*j + w*x*i
dim4[w*(x*(y*i+j)+k)+l] // index is w*x*y*i + w*x*j + w*k + l
Note that w*(x*(y*i+j)+k)+l is equal to w*x*y*i + w*x*j + w*k + l, but uses fewer operations (see Horner's Method). It uses the same number of operations as accessing a static array by dim4[i][j][k][l], so should not be any slower to use.
The advantage to using this method is that the array can be passed freely between functions without knowing the size of the array at compile time (since C sees it as a 1-dimensional array, although some way of passing the dimensions is still necessary), and the entire array is contiguous in memory, so accessing consecutive elements should be fast. The disadvantage is that it can be difficult at first to get used to how to index the elements.
Constructors and destructors
In most object-oriented languages, objects cannot be created directly by a client that wishes to use them. Instead, the client must ask the class to build an instance of the object using a special routine called a constructor. Constructors are important because they allow an object to enforce invariants about its internal state throughout its lifetime. Destructors, called at the end of an object's lifetime, are important in systems where an object holds exclusive access to some resource, and it is desirable to ensure that it releases these resources for use by other objects.
Since C is not an object-oriented language, it has no built-in support for constructors or destructors. It is not uncommon for clients to explicitly allocate and initialize records and other objects. However, this leads to a potential for errors, since operations on the object may fail or behave unpredictably if the object is not properly initialized. A better approach is to have a function that creates an instance of the object, possibly taking initialization parameters, as in this example:
struct string {
size_t size;
char *data;
};
struct string *create_string(const char *initial) {
assert (initial != NULL);
struct string *new_string = malloc(sizeof(*new_string));
if (new_string != NULL) {
new_string->size = strlen(initial);
new_string->data = strdup(initial);
}
return new_string;
}
Similarly, if it is left to the client to destroy objects correctly, they may fail to do so, causing resource leaks. It is better to have an explicit destructor which is always used, such as this one:
void free_string(struct string *s) {
assert (s != NULL);
free(s->data); ''/* free memory held by the structure */''
free(s); ''/* free the structure itself */''
}
It is often useful to combine destructors with #Nulling freed pointers.
Sometimes it is useful to hide the definition of the object to ensure that the client does not allocate it manually. To do this, the structure is defined in the source file (or a private header file not available to users) instead of the header file, and a forward declaration is put in the header file:
struct string;
struct string *create_string(const char *initial);
void free_string(struct string *s);
Nulling freed pointers
As discussed earlier, after free()
has been called on a pointer, it becomes a dangling pointer. Worse still, most modern platforms cannot detect when such a pointer is used before being reassigned.
One simple solution to this is to ensure that any pointer is set to a null pointer immediately after being freed: [2]
free(p);
p = NULL;
Unlike dangling pointers, a hardware exception will arise on many modern architectures when a null pointer is dereferenced. Also, programs can include error checks for the null value, but not for a dangling pointer value. To ensure it is done at all locations, a macro can be used:
#define FREE(p) do { free(p); (p) = NULL; } while(0)
(To see why the macro is written this way, see #Macro conventions.) Also, when this technique is used, destructors should zero out the pointer that they are passed, and their argument must be passed by reference to allow this. For example, here's the destructor from #Constructors and destructors updated:
void free_string(struct string **s) {
assert(s != NULL && *s != NULL);
FREE((*s)->data); ''/* free memory held by the structure */''
FREE(*s); ''/* free the structure itself */''
*s=NULL; ''/* zero the argument */''
}
Unfortunately, this idiom will not do anything to any other pointers that may be pointing to the freed memory. For this reason, some C experts regard this idiom as dangerous due to creating a false sense of security.
Macro conventions
Because preprocessor macros in C work using simple token replacement, they are prone to a number of confusing errors, some of which can be avoided by following a simple set of conventions:
- Placing parentheses around macro arguments wherever possible. This ensures that, if they are expressions, the order of operations does not affect the behavior of the expression. For example:
- Wrong:
#define square(x) x*x
- Better:
#define square(x) (x)*(x)
- Wrong:
- Placing parentheses around the entire expression if it is a single expression. Again, this avoids changes in meaning due to the order of operations.
- Wrong:
#define square(x) (x)*(x)
- Better:
#define square(x) ((x)*(x))
- Dangerous, remember it replaces the text in verbatim. Suppose your code is
square (x++)
, after the macro invocation will x be incremented by 2
- Wrong:
- If a macro produces multiple statements, or declares variables, it can be wrapped in a do { ... } while(0) loop, with no terminating semicolon. This allows the macro to be used like a single statement in any location, such as the body of an if statement, while still allowing a semicolon to be placed after the macro invocation without creating a null statement.[3][4][5][6][7][8] Care must be taken that any new variables do not potentially mask portions of the macro's arguments.
- Wrong:
#define FREE(p) free(p); p = NULL;
- Better:
#define FREE(p) do { free(p); p = NULL; } while(0)
- Wrong:
- Avoiding using a macro argument twice or more inside a macro, if possible; this causes problems with macro arguments that contain side effects, such as assignments.
- If a macro may be replaced by a function in the future, considering naming it like a function.
- By convention, preprocessor values and macros defined by
#define
are named in all uppercase letters.[9][10][11][12][13]
Further reading
- Adam N. Rosenberg. [http://www.the-adam.com/adam/rantrave/st02.pdf "A Description of One Programmer’s Programming Style Revisited"]. 2001. p. 19-20.
- comp.lang.c FAQ list: "Why isn't a pointer null after calling free?" mentions that "it is often useful to set [pointer variables] to NULL immediately after freeing them".
- "comp.lang.c FAQ: What's the best way to write a multi-statement macro?".
- "The C Preprocessor: Swallowing the Semicolon"
- "Why use apparently meaningless do-while and if-else statements in macros?"
- "do {...} while (0) in macros"
- "KernelNewbies: FAQ / DoWhile0".
- "PRE10-C. Wrap multistatement macros in a do-while loop".
- "What is the history for naming constants in all uppercase?"
- "The Preprocessor".
- "C Language Style Guide".
- "non capitalized macros are always evil".
- "Exploiting the Preprocessor for Fun and Profit".
There are a huge number of C style guidelines.
- "C and C++ Style Guides" by Chris Lott lists many popular C style guides.
- The Motor Industry Software Reliability Association (MISRA) publishes "MISRA-C: Guidelines for the use of the C language in critical systems". (Wikipedia: MISRA C; [1]).