Revision as of 12:20, 13 May 2024 by Admin

Variables

Like most programming languages, C uses and processes variables. In C, variables are human-readable names for the memory addresses used by a running program. Variables make it easier to store, read, and change the data in the computer's memory by letting you associate easy-to-remember labels with the memory addresses that store your program's data. The memory addresses associated with variables aren't determined until the program is compiled and running on the computer.

At first, it's easiest to imagine variables as placeholders for values, much like in mathematics. You can think of a variable as being equivalent to its assigned value. So, if you have a variable i that is initialized (set equal) to 4, then it follows that i + 1 will equal 5. However, a skilled C programmer is more mindful of the invisible layer of abstraction going on just under the hood: that a variable is a stand-in for the memory address where the data can be found, not the data itself. You will gain more clarity on this point when you learn about pointers.

Since C is a relatively low-level programming language, before a C program can utilize memory to store a variable it must claim the memory needed to store the values for a variable. This is done by declaring variables. Declaring variables is the way in which a C program shows the number of variables it needs, what they are going to be named, and how much memory they will need.

Within the C programming language, when managing and working with variables, it is important to know the type of variables and the size of these types. A type’s size is the amount of computer memory required to store one value of this type. Since C is a fairly low-level programming language, the size of types can be specific to the hardware and compiler used – that is, how the language is made to work on one type of machine can be different from how it is made to work on another.

All variables in C are typed. That is, every variable declared must be assigned as a certain type of variable.

Declaring, Initializing, and Assigning Variables

Here is an example of declaring an integer, which we've called some_number. (Note the semicolon at the end of the line; that is how your compiler separates one program statement from another.)

int some_number;

This statement tells the compiler to create a variable called some_number and associate it with a memory location on the computer. We also tell the compiler the type of data that will be stored at that address, in this case an integer. Note that in C we must specify the type of data that a variable will store. This lets the compiler know how much total memory to set aside for the data (on most modern machines an int is 4 bytes in length). We'll look at other data types in the next section.

Multiple variables can be declared with one statement, like this:

int anumber, anothernumber, yetanothernumber;

In early versions of C, variables had to be declared at the beginning of a block. Since C99, declarations and statements may be mixed freely. Many programmers still declare variables at the top of a block, because mixing is rarely necessary, some older compilers do not support C99 (a portability concern), and the style may be unfamiliar to fellow programmers (a maintainability concern).

After declaring variables, you can assign a value to a variable later on using a statement like this:

some_number = 3;

This is an assignment. The statement directs the compiler to store an integer representation of the number 3 in the memory location associated with some_number. Giving a variable its first value is called initialization, and we can save a bit of typing by declaring and initializing a variable in one statement:

int some_new_number = 4;

You can also assign a variable the value of another variable, like so:

some_number = some_new_number;

Or assign multiple variables the same value with one statement:

anumber = anothernumber = yetanothernumber = 8;

This works because an assignment is itself an expression: x = y yields the value that was stored in x. For example, some_number = 4 yields 4. So x = y = z is really shorthand for x = (y = z).

Naming Variables

Variable names in C are made up of letters (upper and lower case) and digits. The underscore character ("_") is also permitted. Names must not begin with a digit. Unlike some languages (such as Perl and some BASIC dialects), C does not use any special prefix characters on variable names.

Some examples of valid (but not very descriptive) C variable names:

foo
Bar
BAZ
foo_bar
_foo42
_
QuUx

Some examples of invalid C variable names:

2foo    (must not begin with a digit)
my foo  (spaces not allowed in names)
$foo    ($ not allowed -- only letters, digits, and _)
while   (language keywords cannot be used as names)

As the last example suggests, certain words are reserved as keywords in the language, and these cannot be used as variable names.

It is not allowed to use the same name for multiple variables in the same scope. When working with other developers, you should therefore take steps to avoid using the same name for global variables or function names. Some large projects adhere to naming guidelines[1] to avoid duplicate names and for consistency.

In addition there are certain sets of names that, while not language keywords, are reserved for one reason or another. For example, a C compiler might use certain names "behind the scenes", and this might cause problems for a program that attempts to use them. Also, some names are reserved for possible future use in the C standard library. The rules for determining exactly what names are reserved (and in what contexts they are reserved) are too complicated to describe here, and as a beginner you don't need to worry about them much anyway. For now, just avoid using names that begin with an underscore character.

The naming rules for C variables also apply to naming other language constructs such as function names, struct tags, and macros, all of which will be covered later.

Literals

Any time you specify a value explicitly in a program, instead of referring to a variable or some other form of data, that value is referred to as a literal. In the assignment example above, 3 is a literal. Literals usually take a form defined by their type (more on that soon). Integer literals can also be written in hexadecimal (hex) notation, which is convenient when the exact bit pattern of a value matters. Hex numbers are always preceded by 0x. For now, though, you probably shouldn't be too concerned with hex.

The Four Basic Data Types

In Standard C there are four basic data types. They are int, char, float, and double.

The int type

The int type stores integers, that is, whole numbers. The C standard only requires an int to be at least 16 bits wide; on most modern desktop systems it is 32 bits (4 octets), even when the machine word is 64 bits. Examples of literals are whole numbers such as 1, 2, 3, 10, 100... When int is 32 bits, it can store any whole number between -2147483648 and 2147483647. A 32-bit value can represent any one of 4294967296 possibilities (2 to the power of 32).


If you want to declare a new int variable, use the int keyword. For example:

int numberOfStudents, i, j = 5;

This declaration declares three variables: numberOfStudents, i, and j. Only j is initialized, with the literal 5; numberOfStudents and i are left uninitialized.

The char type

The char type is capable of holding any member of the execution character set. It stores the same kind of data as an int (i.e. integers), but by definition has a size of exactly one byte. The number of bits in a byte is given by the macro CHAR_BIT (defined in <limits.h>). In standard C it can never be less than 8. A variable of type char is most often used to store character data, hence its name. Most implementations use the ASCII character set as the execution character set, but it's best not to know or care about that unless the actual values are important.

Examples of character literals are 'a', 'b', '1', etc., as well as some special characters such as '\0' (the null character) and '\n' (newline, recall "Hello, World"). Note that the char value must be enclosed within single quotations.

When we initialize a character variable, we can do it two ways. One is preferred, the other way is bad programming practice.

The first way is to write:

char letter1 = 'a';

This is good programming practice in that it allows a person reading your code to understand that letter1 is being initialized with the letter 'a' to start off with.

The second way, which should not be used when you are coding letter characters, is to write:

char letter2 = 97; /* in ASCII, 97 = 'a' */

Some consider this extremely bad practice when the variable stores a character rather than a small number, because most readers are forced to look up which character corresponds to the number 97 in the encoding scheme. On an ASCII system, letter1 and letter2 both store the same thing, the letter 'a', but the first method is clearer, easier to debug, and much more straightforward.

One important thing to mention is that the characters for numerals are represented differently from their corresponding numbers, i.e. '1' is not equal to 1. In short, any single character enclosed within single quotes is a character literal, and its value is that character's code in the execution character set.

There is one more kind of literal that needs to be explained in connection with chars: the string literal. A string is a series of characters, usually intended to be displayed. They are surrounded by double quotations (" ", not ' '). An example of a string literal is the "Hello, World!\n" in the "Hello, World" example.

A string literal can be used to initialize a character array (arrays are described later). Example:

const char MY_CONSTANT_PEDANTIC_ITCH[] = "learn the usage context.\n";
printf("%s", MY_CONSTANT_PEDANTIC_ITCH);

The square brackets after the variable name declare an array. Here the compiler sizes the array to hold the characters of the literal plus the terminating null character.

The float type

float is short for floating point. It stores inexact representations of real numbers, both integer and non-integer values. It can be used with numbers that are much greater than the greatest possible int. float literals must be suffixed with F or f. Examples are: 3.1415926f, 4.0f, 6.022e+23f.

It is important to note that floating-point numbers are inexact. Some numbers like 0.1f cannot be represented exactly as floats but will have a small error. Very large and very small numbers will have less precision and arithmetic operations are sometimes not associative or distributive because of a lack of precision. Nonetheless, floating-point numbers are most commonly used for approximating real numbers and operations on them are efficient on modern microprocessors.[2] Floating-point arithmetic is explained in more detail on Wikipedia.

float variables can be declared using the float keyword. A float is typically 4 bytes, half the size of a typical double. It is therefore used when memory is tight and the precision of a double is not required.

The double type

The double and float types are very similar. The float type allows you to store single-precision floating-point numbers, while the double type allows you to store double-precision floating-point numbers. A double is typically 8 bytes on most machines. Examples of double literals are 3.1415926535897932, 4.0, 6.022e+23 (scientific notation). If you use 4 instead of 4.0, the 4 will be interpreted as an int.

The distinction between floats and doubles was made because of the differing sizes of the two types. When C was first used, space was at a minimum and so the judicious use of a float instead of a double saved some memory. Nowadays, with memory more freely available, you rarely need to conserve memory like this – it may be better to use doubles consistently. Indeed, some C implementations use doubles instead of floats when you declare a float variable.

If you want to use a double variable, use the double keyword.

sizeof

If you have any doubts as to the amount of memory actually used by any variable (and this goes for types we'll discuss later, also), you can use the sizeof operator to find out for sure. (For completeness, it is important to mention that sizeof is a unary operator, not a function.) Its syntax is:

sizeof object
sizeof(type)

The two expressions above yield the size of the object and type specified, in bytes. The result has type size_t (defined in the header <stddef.h>), which is an unsigned integer type. Here's an example usage:

size_t size;
int i;
size = sizeof(i);

size will be set to 4, assuming CHAR_BIT is defined as 8, and an integer is 32 bits wide. The value of sizeof's result is the number of bytes.

Note that when sizeof is applied to a char, the result is 1; that is:

sizeof(char)

always returns 1.

Data type modifiers

One can alter the data storage of any data type by preceding it with certain modifiers.

long and short are modifiers that make it possible for a data type to use either more or less memory. The int keyword need not follow the short and long keywords, and it is most commonly omitted. A short can be used where the values fall within a lesser range than that of an int, typically -32768 to 32767. A long can be used to contain an extended range of values. It is not guaranteed that a short uses less memory than an int, nor is it guaranteed that a long takes up more memory than an int. It is only guaranteed that sizeof(short) <= sizeof(int) <= sizeof(long). Typically a short is 2 bytes, an int is 4 bytes, and a long either 4 or 8 bytes. Since C99 there is also long long, which is at least 64 bits wide and typically an 8-byte integer.

In all of the signed integer types described above, one bit is used to indicate the sign (positive or negative) of a value. If you decide that a variable will never hold a negative value, you may use the unsigned modifier to use that bit for storing data instead, roughly doubling the largest value that can be stored while restricting the values to be non-negative. The unsigned specifier may also be used without a trailing int, in which case the type is unsigned int. There is also a signed modifier, which is the opposite, but it is seldom used because all integer types except char are signed by default. It only makes a difference for char, whose signedness is left to the implementation.

The long modifier can also be used with double to create a long double type. This floating-point type may (but is not required to) have greater precision than the double type.

To use a modifier, just declare a variable with the data type and relevant modifiers:

unsigned short int usi;  /* fully qualified -- unsigned short int */
short si;                /* short int */
unsigned long uli;       /* unsigned long int */

const qualifier

When the const qualifier is used, the declared variable must be initialized at declaration. It is then not allowed to be changed.

While the idea of a variable that never changes may not seem useful, there are good reasons to use const. For one thing, many compilers can perform some small optimizations on data when they know that data will never change. For another, const protects values from accidental modification: if you need the value of π in your calculations, you can declare a const variable pi, so that code written later or by someone else cannot change its value by mistake.

Note that a standard-conforming compiler must issue a diagnostic (usually an error or a warning) if an attempt is made to change a const variable by assigning to it, but after doing so the compiler is free to ignore the const qualifier.

Magic numbers

When you write C programs, you may be tempted to write code that will depend on certain numbers. For example, you may be writing a program for a grocery store. This complex program has thousands upon thousands of lines of code. The programmer decides to represent the cost of a can of corn, currently 99 cents, as a literal throughout the code. Now, assume the cost of a can of corn changes to 89 cents. The programmer must now go in and manually change each entry of 99 cents to 89. While this is not that big a problem, considering the "global find-replace" function of many text editors, consider another problem: the cost of a can of green beans is also initially 99 cents. To reliably change the price, you have to look at every occurrence of the number 99.

C provides two mechanisms to avoid this. They are approximately equivalent, though each is more useful than the other in certain circumstances.

Using the const keyword

The const keyword helps eradicate magic numbers. By declaring a variable const corn at the beginning of a block, a programmer can simply change that const and not have to worry about setting the value elsewhere.

There is also another method for avoiding magic numbers. It is much more flexible than const, and also much more problematic in many ways. It also involves the preprocessor, as opposed to the compiler. Behold...

#define

When you write programs, you can create what is known as a macro. Before the compiler proper sees your code, the preprocessor replaces every instance of the macro's name with the specified expression.

Here's an example. If you write

#define PRICE_OF_CORN 0.99

when you want to, for example, print the price of corn, you use the word PRICE_OF_CORN instead of the number 0.99. The preprocessor will replace all instances of PRICE_OF_CORN with 0.99, which the compiler will interpret as the literal double 0.99. Because the preprocessor performs plain text substitution, the #define line needs no semicolon; a semicolon would become part of the replacement text.

It is important to note that #define has basically the same functionality as the "find-and-replace" function in a lot of text editors/word processors.

#define can cause harm when misused, and it is usually preferable to use const where a macro is not needed. It is possible, for instance, to #define a macro DOG as the number 3, but if you try to print the macro, thinking that DOG represents a string that you can show on the screen, the program will have an error. #define also has no regard for type. It disregards the structure of your program, replacing the text everywhere that follows (in effect, disregarding scope), which can be advantageous in some circumstances but can also be the source of problematic bugs.

You will see further instances of the #define directive later in the text. It is good convention to write #defined words in all capitals, so a programmer will know that this is not a variable that you have declared but a #defined macro. It is not necessary to end a preprocessor directive such as #define with a semicolon; in fact, some compilers may warn you about unnecessary tokens in your code if you do.

Scope

In the Basic Concepts section, the concept of scope was introduced. It is important to revisit the distinction between local and global variables, and how to declare each. To declare a local variable, you place the declaration inside the block to which the variable is to be local; before C99 it had to appear at the beginning of the block, i.e. before any non-declarative statements. To declare a global variable, declare the variable outside of any block. If a variable is global, it can be read and written from anywhere in your program.

Global variables are not considered good programming practice, and should be avoided whenever possible. They inhibit code readability, create naming conflicts, waste memory, and can create difficult-to-trace bugs. Excessive usage of globals is usually a sign of laziness or poor design. However, if there is a situation where local variables would create more obscure and unreadable code, there's no shame in using globals.

Other Modifiers

Included here, for completeness, are more of the modifiers that standard C provides. For the beginning programmer, static and extern may be useful. volatile is more of interest to advanced programmers. register and auto are largely deprecated and are generally not of interest to either beginning or advanced programmers.

static

static is sometimes a useful keyword. It is a common misconception that its only purpose is to make a variable stay in memory.

When you declare a function or global variable as static, it is visible only within the file in which it is defined; other files in your project cannot access it, even through the extern keyword (see below). This is called internal linkage.

When you declare a local variable as static, it is created just like any other variable. However, when the variable goes out of scope (i.e. the block it was local to is finished) the variable stays in memory, retaining its value. The variable stays in memory until the program ends. While this behaviour resembles that of global variables, static variables still obey scope rules and therefore cannot be accessed outside of their scope. This is called static storage duration.

Variables declared static are initialized to zero (or for pointers, NULL[3][4]) by default. They can be initialized explicitly on declaration to any constant value. The initialization is made just once, at compile time.

You can use static in (at least) two different ways. Consider this code, and imagine it is in a file called jfile.c:

#include <stdio.h>
 
static int j = 0;
 
void up(void)
{
   /* k is set to 0 when the program starts. The line is then "ignored"
    * for the rest of the program (i.e. k is not set to 0 every time up()
    * is called)
    */
   static int k = 0;
   j++;
   k++;
   printf("up() called.   k= %2d, j= %2d\n", k , j);
}
 
void down(void)
{
   static int k = 0;
   j--;
   k--;
   printf("down() called. k= %2d, j= %2d\n", k , j);
}
 
int main(void)
{
   int i;
     
   /* call the up function 3 times, then the down function 2 times */
   for (i = 0; i < 3; i++)
      up();
   for (i = 0; i < 2; i++)
      down();
    
   return 0;
}

The j variable is accessible by both up and down and retains its value. The k variables also retain their value, but they are two different variables, one in each of their scopes. Static variables are a good way to implement encapsulation, a term from the object-oriented way of thinking that effectively means not allowing changes to be made to a variable except through function calls.

Running the program above will produce the following output:

up() called.   k=  1, j=  1
up() called.   k=  2, j=  2
up() called.   k=  3, j=  3
down() called. k= -1, j=  2
down() called. k= -2, j=  1

Features of static variables :

    1. Keyword used        - static
    2. Storage             - Memory
    3. Default value       - Zero
    4. Scope               - Local to the block in which it is declared
    5. Lifetime            - Value persists between different function calls
    6. Keyword optionality - Mandatory to use the keyword

extern

extern is used when a file needs to access a variable defined in another file, which it may not have #included directly. An extern declaration does not allocate memory for a new variable; it just provides the compiler with sufficient information (the name and type) to access a variable defined elsewhere, and the linker connects the two.

Features of extern variable :

    1. Keyword used        - extern
    2. Storage             - Memory
    3. Default value       - Zero
    4. Scope               - Global (all over the program)
    5. Lifetime            - Value persists till the program's execution comes to an end
    6. Keyword optionality - Optional if declared outside all the functions

volatile

volatile is a special type of modifier which informs the compiler that the value of the variable may be changed by external entities other than the program itself. This is necessary for certain programs compiled with optimizations: if a variable were not declared volatile, the compiler might assume that certain operations involving the variable are safe to optimize away when in fact they aren't. volatile is particularly relevant when working with embedded systems (where a program may not have complete control of a variable) and with signal handlers. It also appears in multi-threaded applications, but volatile by itself does not make accesses atomic or synchronize threads.

auto

auto is a modifier which specifies an "automatic" variable that is automatically created when in scope and destroyed when out of scope. If you think this sounds like pretty much what you've been doing all along when you declare a variable, you're right: all declared items within a block are implicitly "automatic". For this reason, the auto keyword is more like the answer to a trivia question than a useful modifier, and there are lots of very competent programmers that are unaware of its existence.

Features of automatic variables :

    1. Keyword used        - auto
    2. Storage             - Memory
    3. Default value       - Garbage value (random value)
    4. Scope               - Local to the block in which it is defined
    5. Lifetime            - Value persists while the control remains within the block
    6. Keyword optionality - Optional

register

register is a hint to the compiler to attempt to optimize the storage of the given variable by storing it in a register of the computer's CPU when the program is run. Most optimizing compilers do this anyway, so use of this keyword is often unnecessary. In fact, ANSI C states that a compiler can ignore this keyword if it so desires – and many do. Microsoft Visual C++ is an example of an implementation that completely ignores the register keyword.

Features of register variables :

    1. Keyword used        - register
    2. Storage             - CPU registers (values can be retrieved faster than from memory)
    3. Default value       - Garbage value
    4. Scope               - Local to the block in which it is defined
    5. Lifetime            - Value persists while the control remains within the block
    6. Keyword optionality - Mandatory to use the keyword

References

  1. Examples of naming guidelines are those of the GNOME Project or the parts of the Python interpreter that are written in C.
  2. Representations of real numbers other than floating-point numbers exist but are not fundamental data types in C. Some C compilers support fixed-point arithmetic data types, but these are not part of standard C. Libraries such as the GNU Multiple Precision Arithmetic Library offer more data types for real numbers and very large numbers.
  3. What is NULL and how is it defined?
  4. NULL or 0, which should you use?