Getting Started with C: Variables
What are variables and how to use them in C.
Use Case
Before I explain what a variable is, it might help to know why we use them.
Fundamentally, a program is a state that gets output in some way. If we modify the state with inputs the state changes and therefore the output changes.
For example, a text editor’s state may contain the text data from the currently open file. If we change the text, the characters displayed on the screen will also change reflecting that. These will also be stored in the open file when we save the document, and reloaded into the application’s state when we open the file in future.
We use variables to store an application’s state in our computer’s memory.
What is a variable?
A variable is a value that is defined by the programmer and given a name to refer back to later.
Essentially you’re telling the compiler: “Hey here’s this thing I want to talk about, whenever I use this word, I’m specifically referring to this specific thing.”
It’s a pretty broad definition, so it might not be clear at first. To be more specific, it’s a way for us to map a location and size in memory to a name, so we can describe what we want to do with it more easily.
This might also be a bit difficult to understand at first if you aren’t already familiar with how a computer works. So to give a brief overview:
- In your computer you have, at the very least, Physical Memory (RAM) and a CPU.
- Physical Memory is used to store the current state of applications. This state is lost when you close the application or the computer is turned off.
- The CPU runs instructions. Some of these instructions will tell it to grab some point in memory. Further instructions will tell the CPU to modify that memory in some way and return it to where it found it.
In order to tell our computer which memory to grab, we use variables as a human readable shorthand.
We won’t go into it too much here but there’s also another step in-between your CPU and Physical Memory. Your OS will have what’s called Virtual Memory which is used to manage physical allocations.
There are quite a few terms that are used interchangeably when talking about variables, and people will sometimes use different terminology based on the context. For example, “parameter” usually refers to a variable named in a function’s definition, while “argument” refers to the value passed in for it when the function is called. In practice people often use the two words interchangeably. These are all the same underlying concept.
Variables in C
C and C++ are “Statically Typed” languages. This means the type of every variable, and therefore its size and layout, has to be known when the program is compiled, so we have to be explicit about it.
Other languages will let you describe variables in a looser way, often just using the word var or let to say you’re defining a variable. In C we need to tell the compiler what kind, or type, of variable we’re describing.
A type will tell the compiler what, and how big, a variable is.
To declare a variable in C and C++ use a type and a variable name, formatted like this: type VariableName = value;
This will create a variable of type type with the name VariableName and then assign it the initial value value.
For a more concrete example, if we wanted to create an unsigned 8-bit integer value called Byte, with the initial value 0, we would write the following:
unsigned char Byte = 0;
We can then later change the value this variable stores like this: Byte = 2;
After which, the variable Byte will equal 2.
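As a minimal sketch, the two steps above can be put together like this. The wrapper function DeclareThenReassign is a made-up name for illustration only:

```c
// Declares Byte with an initial value, then assigns it a new one.
unsigned char DeclareThenReassign(void)
{
    unsigned char Byte = 0; // type, name, initial value

    Byte = 2;               // same variable, new value

    return Byte;
}
```

Calling DeclareThenReassign() gives back 2, the last value that was stored in Byte.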
We’ll get more into what types of variables you can use in the following sections.
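The capitalisation of variable and type names is a stylistic choice. A name will compile as long as it contains only letters, digits, and underscores, does not start with a digit, and is not a reserved keyword. Names are case sensitive, so a type must be written with the same case as its definition.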
Bits and Bytes
To properly describe types in C, it will be useful to have a basic understanding of how memory is used.
In a conventional computer, data is stored as either 0 or 1, this is called a bit.
Since we want to represent numbers that are bigger than 1, computers have a way of expressing larger numbers by using multiple bits. A block of 8 bits is called a byte.
All memory in C is accessed in bytes, meaning the minimum number of bits a variable can have is 8.
The computer assigns a whole number value to each of the bits in a byte. Adding together the values of the bits that are set to 1 gives every whole number in between.
Bits are read right to left, and the value each bit represents is double that of the bit to its right. For example:
In a single byte variable each bit maps to
the following values
0 0 0 0 0 0 0 0
^ ^ ^ ^ ^ ^ ^ ^
128 64 32 16 8 4 2 1
so:
0 0 0 0 0 0 0 1 = 1,
0 0 0 0 0 0 1 0 = 2,
0 0 0 0 0 0 1 1 = 3,
...
1 1 1 1 1 1 1 1 = 255
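The mapping in the diagram can be sketched in code by adding up the place value of every bit that is set. ByteValueFromBits is a hypothetical helper written for this explanation, and it takes the bits as an array of 8 ints purely to mirror the diagram (arrays are covered later):

```c
// Sums the place values (128, 64, ..., 2, 1) of every bit that is set.
// Bits[0] is the left-most bit in the diagram, Bits[7] the right-most.
unsigned int ByteValueFromBits(const int Bits[8])
{
    unsigned int Value = 0;
    unsigned int PlaceValue = 128;
    int Index;

    for (Index = 0; Index < 8; ++Index)
    {
        if (Bits[Index])
        {
            Value += PlaceValue;
        }
        PlaceValue /= 2; // each bit to the right is worth half as much
    }

    return Value;
}
```

With only the two right-most bits set this returns 3, and with all eight bits set it returns 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = 255.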
The computer can then string multiple bytes together to express larger numbers, continuing to double the value from the last bit of the previous byte. This makes it possible to store even bigger values.
Byte Boundary
00000000 | 0 0 0 0 0 0 0 0
^ ^ ^ ^ ^ ^ ^ ^ ^
256 128 64 32 16 8 4 2 1
Values that require more than 1 byte can be stored in different byte orders. This is referred to as Endianness. Big-Endian means the byte that represents the largest values is stored first (left-most). Little-Endian means the byte that represents the smallest values is stored first. E.g. a 2 byte (16 bit) value with the bytes 00000001 00000000 would be 256 in Big-Endian, and 1 in Little-Endian. Most desktop CPUs (x86 and typical ARM configurations) are Little-Endian, while Big-Endian is commonly used for network protocols and some file formats.
The bit containing the largest value in a series of one or more bytes is often referred to as the most significant bit.
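The two-byte example can be checked with a small sketch. ReadBigEndian16 and ReadLittleEndian16 are hypothetical helper names, and the code uses plain multiplication by 256 to move a byte past the byte boundary:

```c
// Interprets two bytes as one 16-bit value, first byte most significant.
unsigned int ReadBigEndian16(const unsigned char Bytes[2])
{
    return Bytes[0] * 256u + Bytes[1];
}

// Interprets the same two bytes with the first byte least significant.
unsigned int ReadLittleEndian16(const unsigned char Bytes[2])
{
    return Bytes[1] * 256u + Bytes[0];
}
```

For the bytes 00000001 00000000, the first function returns 256 and the second returns 1.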
Variable Types
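The base types in C, with their typical sizes on desktop platforms, are:
| Type | Typical size | Description |
|---|---|---|
| bool | 1 Byte | true or false, stored as 1 or 0 (needs the stdbool.h header before C23) |
| char | 1 Byte | 8-bit integer |
| short | 2 Bytes | 16-bit integer |
| long | 4 or 8 Bytes | 32-bit integer on Windows, 64-bit integer on most 64-bit Linux and macOS systems |
| long long | 8 Bytes | 64-bit integer |
| float | 4 Bytes | 32-bit floating point |
| double | 8 Bytes | 64-bit floating point |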
An Integer is a whole number, or a number without a decimal point.
A Floating point is a number that contains a decimal point.
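There is also the type int. The C standard only guarantees that it is at least 16 bits; it is 32 bits on most modern desktop platforms, but 16 bits on some embedded ones. It’s usually best to be explicit about the size you need so there’s no mistaking what the type actually is. Since long also varies between platforms, the fixed-width types in the stdint.h header (such as int32_t), or your own typedefs checked against your target platform, are the safest choice when the exact size matters.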
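Because the exact sizes depend on the compiler and platform, it can be worth checking them rather than assuming. The sketch below uses the standard sizeof operator and the CHAR_BIT constant from limits.h; the helper name BitsInBytes is made up for this example:

```c
#include <limits.h>
#include <stddef.h>

// Converts a size in bytes (as reported by sizeof) into a number of bits.
// CHAR_BIT is the number of bits in one byte on the current platform.
size_t BitsInBytes(size_t SizeInBytes)
{
    return SizeInBytes * CHAR_BIT;
}
```

For example, BitsInBytes(sizeof(long)) gives 32 when compiling with MSVC on Windows and 64 with GCC on 64-bit Linux, while BitsInBytes(sizeof(long long)) is at least 64 everywhere.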
For integers, there are also signed and unsigned keywords that can go before the type. These determine if the value can be negative (signed) or if it can only be positive (unsigned).
For example an unsigned short is a 16-bit integer that ranges from 0 to 65,535.
Whereas a signed short is a 16-bit integer that ranges from -32,768 to 32,767.
The signed and unsigned keywords do not change the size of the type, which is why the maximum value of an unsigned integer is roughly double the maximum value of the signed version. The signed type uses its most significant bit to represent the sign (whether the number is negative or positive). As a result, if you assign -1 to an unsigned short and then read it back, you will get 65,535.
This may seem strange. Since 1 in binary is 00000001, you might assume -1 is represented as 10000001, but this is not the case. For reasons outside the scope of this article, something called Two’s Complement is used to store negative numbers, where you negate a number by inverting all of its bits and then adding 1. So in 8 bits, -1 is actually represented as 11111111, and if you interpret those bits as an unsigned integer, you’ll get the maximum possible value for the unsigned variant.
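A minimal sketch of this wrap-around, using hypothetical function names. USHRT_MAX and UCHAR_MAX come from the standard limits.h header and hold the largest values an unsigned short and an unsigned char can store:

```c
#include <limits.h>

// Stores -1 in an unsigned short and returns what is read back.
// Converting to an unsigned type wraps around, so the result is
// the largest value the type can hold (USHRT_MAX).
unsigned short StoreNegativeOne(void)
{
    unsigned short Value = (unsigned short)-1;
    return Value;
}

// Does the same with a single byte: a signed char holding -1
// reads back as UCHAR_MAX once converted to unsigned char.
unsigned char ByteOfNegativeOne(void)
{
    signed char Value = -1;
    return (unsigned char)Value;
}
```

On a platform with 16-bit shorts and 8-bit chars, these return 65,535 and 255 respectively.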
But getting back on topic…
There are other keywords that tell the compiler what to expect when using this variable. These are:
- const: Tells the compiler that this value will not change after being set. The compiler will reject code that tries to assign to it, and it lets other programmers better understand the intent of the variable’s use.
- static: Inside a function, tells the compiler that this variable uses the same memory on every call. This means its value is persistent even when it would ordinarily be removed from memory. On a global variable or function, static instead limits its visibility to the current source file.
- volatile: Tells the compiler that this variable may change in ways it cannot see, so the value must be re-read from memory on every access, even if that wouldn’t be needed under regular circumstances. This might not be obviously useful at first, but it matters for things like memory-mapped hardware registers. Note that volatile on its own does not make a variable safe to share between threads.
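To make const and static a little more concrete, here is a small sketch. NextId and ClampToLimit are hypothetical functions invented for this example, and functions themselves are covered later:

```c
// Counter is static, so it keeps its value between calls.
// An ordinary local variable would start from 0 every time.
int NextId(void)
{
    static int Counter = 0;
    Counter += 1;
    return Counter;
}

// Limit is const, so any later attempt to assign to it
// would be rejected by the compiler.
int ClampToLimit(int Value)
{
    const int Limit = 100;

    if (Value > Limit)
    {
        return Limit;
    }
    return Value;
}
```

The first call to NextId() returns 1, the second returns 2, and so on, because Counter is not reset between calls.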
In addition to these types there is a special non-type called void. void stores no information and is a way to tell the compiler that nothing is expected. This will become more relevant later on.
When assigning values to variables, there are suffixes that tell the compiler what kind it is. Take the number 10 for example. 10 by itself implies it’s an integer, 10.0 implies it’s a double, and 10.0f implies it’s a float. There are more, but these are the most common.
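You can use the word long before some types to ask for a wider version. E.g. long long is an integer of at least 64 bits, and long double is a floating point type at least as precise as double (often 80 or 128 bits, depending on the compiler and platform). This cannot be repeated indefinitely: long long long is not a valid type.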
Operators
Operators are the name we use for mathematical symbols, or characters that function like mathematical symbols, in programming. For example -, +, /, *, and = are all used as operators in C.
- + is used for addition.
- - is used for subtraction, or to make the number to the right of it negative.
- / is used for division.
- % is used for modulo, the remainder of an integer division.
- * is used for multiplication, as well as defining and accessing pointers.
- = is used to set the value of a variable.
In addition to this, you can use any of the +, -, /, % and * operators in conjunction with the = operator as a shorthand for:
Value = Value [operator] OtherValue
For example, Value += OtherValue is the same operation as: Value = Value + OtherValue
You can also use ++ or -- as a shorthand for adding 1 to, or subtracting 1 from, a value.
E.g. Value++ is the same as: Value = Value + 1
Value-- is the same as: Value = Value - 1
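Here is a short sketch that runs several of these shorthands one after another, with the long form of each in a comment. ShorthandDemo is a hypothetical function name:

```c
// Applies a few shorthand operators in sequence and returns the result.
int ShorthandDemo(int Value)
{
    Value += 5; // Value = Value + 5
    Value *= 2; // Value = Value * 2
    Value -= 4; // Value = Value - 4
    Value /= 3; // Value = Value / 3 (integer division drops the remainder)
    Value++;    // Value = Value + 1
    Value--;    // Value = Value - 1

    return Value;
}
```

Starting from 1, the value goes 6, 12, 8, 2, 3, and finally 2.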
All of these you may have used before when typing into a calculator, but C also has some that you might not have seen.
Comparisons
- == Equal, used for comparing values. Results in true (non-zero) if the values to the left and right are equal, otherwise results in false. E.g. 2 == 2 results in true, 1 == 2 results in false.
- != Not equal, used for comparing values. Results in false if the left and right values are the same, results in true if they are different. E.g. 1 != 2 results in true, 2 != 2 results in false.
- > Greater than, used for comparing values. If the left value is greater than the right, results in true, otherwise results in false. E.g. 1 > 0 results in true, 1 > 2 results in false.
- < Less than, used for comparing values. If the left value is less than the right value, results in true, otherwise results in false. E.g. 0 < 1 results in true, 1 < 0 results in false.
- >= Greater than or equal, used for comparing values. If the left value is greater than, or the same as, the right value, results in true, otherwise false. E.g. 1 >= 1 results in true, 0 >= 1 results in false.
- <= Less than or equal, used for comparing values. If the left value is less than or equal to the right value, results in true, otherwise results in false. E.g. 1 <= 1 results in true, 1 <= 0 results in false.
- && Logical AND, results in true if the statements to the left and right are both true, otherwise results in false. E.g. (2 == 2) && (1 == 1) results in true, (2 == 1) && (1 == 1) results in false.
- || Logical OR, results in true if either of the statements to the left or right is true. E.g. (1 == 0) || (2 == 2) results in true, (1 == 0) || (2 == 0) results in false.
- ! Inverts the result of a comparison. E.g. !(1 == 1) results in false, !(1 == 2) results in true.
true and false are actually just 1 and 0 respectively. This means you can use the result of a comparison as a multiplier, e.g.: Value = (long)(OtherValue > 0) * Multiplier. The cast is optional in C, since a comparison already produces an int. This will only factor in the multiplier value if OtherValue is greater than 0. Useful for what’s called “branchless” programming, which is a more advanced topic, but may be useful to know. Also worth noting: true and false exist in C++ as standard, while in C they come from the stdbool.h header (since C99) or have to be user defined, and only became keywords in C23.
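As a sketch of the multiplier trick, here it is next to the same logic written with an ordinary branch. Both function names are hypothetical:

```c
// Returns Multiplier when OtherValue is greater than 0, otherwise 0.
// The comparison itself evaluates to the int 1 or 0.
long ScaleIfPositive(long OtherValue, long Multiplier)
{
    return (OtherValue > 0) * Multiplier;
}

// The same result written with an explicit branch, for comparison.
long ScaleIfPositiveBranching(long OtherValue, long Multiplier)
{
    if (OtherValue > 0)
    {
        return Multiplier;
    }
    return 0;
}
```

Whether the first version is actually faster depends on the compiler and CPU, so treat it as something to measure rather than assume.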
Bitwise operations
Bitwise operations are a bit more of an advanced topic. I wanted to include them here to encourage thinking about data as bits and bytes, rather than a higher level concept. Long term this mindset can help you come up with solutions that exploit the layout of memory to gain performance or reduce code complexity. So don’t worry too much if you don’t 100% internalise the following information.
- & Used for AND-ing bits together. Results in a value containing only the bits that are set in both values. E.g. 0010 & 0110 will result in a value with the bits 0010.
- | Used for OR-ing bits together. Results in a value containing the bits that are set in either value. E.g. 1010 | 0110 will result in a value with the bits 1110.
- ^ Used for XOR-ing bits together. Results in a value containing the bits that are set in one of the values but not both. E.g. 1011 ^ 1001 will result in a value with the bits 0010.
- ~ Used for flipping the bits of a single value. E.g. ~1011 will result in a value with the bits 0100.
- << Used for left shifting the bits of a value. This is when all the bits are moved a number of places to the left. E.g. 0010 << 1 results in 0100, 0010 << 2 results in 1000. Each place shifted doubles an unsigned integer, as long as no set bits fall off the left-hand end.
- >> Used for right shifting the bits of a value. This is when all the bits are moved a number of places to the right. E.g. 0100 >> 1 results in 0010, 0100 >> 2 results in 0001. Each place shifted halves an unsigned integer, rounding down.
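A common use of these operators is treating each bit of an integer as its own on/off switch. The sketch below is one way to do that; the helper names are made up, and it assumes Index is smaller than the number of bits in an unsigned int:

```c
// Turns on the bit at position Index (0 is the right-most bit).
unsigned int SetBit(unsigned int Value, unsigned int Index)
{
    return Value | (1u << Index);
}

// Turns off the bit at position Index, leaving the others untouched.
unsigned int ClearBit(unsigned int Value, unsigned int Index)
{
    return Value & ~(1u << Index);
}

// Flips the bit at position Index.
unsigned int ToggleBit(unsigned int Value, unsigned int Index)
{
    return Value ^ (1u << Index);
}

// Results in 1 if the bit at position Index is set, otherwise 0.
int IsBitSet(unsigned int Value, unsigned int Index)
{
    return ((Value >> Index) & 1u) != 0;
}
```

For example, SetBit applied to 0010 with Index 2 gives 0110, and ClearBit applied to 0110 with Index 2 gives 0010 again.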
Typedef
It can be useful to define shorthand for these types so we don’t have to keep typing out things like unsigned long long or try to parse all the visual noise this would produce when reading through your code.
We can do this by using the typedef keyword.
typedef allows you to assign an alias to another type, telling the compiler that when you use that name, you’re referring to a variable that has the same size and layout as this type.
The syntax for this is: typedef type new_type;
For example, if I want to map unsigned short to u16 (shorthand for an unsigned integer with 16 bits), I would do the following:
typedef unsigned short u16;
Now I can use u16 throughout my code in place of unsigned short.
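As a sketch of how a small set of aliases might look, assuming a platform where char, short and int are 8, 16 and 32 bits (true for common desktop compilers, but not guaranteed by the standard). The alias names and AddU16 are illustrative choices, and _Static_assert (C11) makes the compiler check the assumption:

```c
#include <limits.h>

typedef unsigned char  u8;
typedef unsigned short u16;
typedef unsigned int   u32;

// Compile-time checks that each alias has the size its name claims.
// If one fails, the alias needs a different underlying type on this platform.
_Static_assert(sizeof(u8)  * CHAR_BIT == 8,  "u8 is not 8 bits");
_Static_assert(sizeof(u16) * CHAR_BIT == 16, "u16 is not 16 bits");
_Static_assert(sizeof(u32) * CHAR_BIT == 32, "u32 is not 32 bits");

// An alias behaves exactly like the type it names.
u16 AddU16(u16 A, u16 B)
{
    return (u16)(A + B); // wraps around once the sum passes 65,535
}
```

AddU16(65535, 1) wraps around to 0, just as it would with a plain unsigned short.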
The standard library has some typedefs for common types already, but personally I prefer to write my own.
In base C, if you don’t use typedef for the custom types we create (more on those later), you have to repeat the struct, enum, or union keyword every time you use the type, for example when declaring a function argument.
Enum
enum, short for Enumerated Type, is a way of grouping constant integer values, when it is beneficial to map these integers to a more descriptive name.
For example:
u32 Type = entity_type_enemy;
Is more descriptive and easier to remember than:
u32 Type = 3;
While you could do this just by defining a series of constant integers like this:
const u32 entity_type_none = 0;
const u32 entity_type_player = 1;
const u32 entity_type_trader = 2;
const u32 entity_type_enemy = 3;
const u32 entity_type_physics_object = 4;
In my experience it’s often better to create an enum like this:
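enum entity_type
{
    entity_type_none,
    entity_type_player,
    entity_type_trader,
    entity_type_enemy,
    entity_type_physics_object
};
// the typedef follows the definition: ISO C does not allow
// referring to an enum before it has been defined
typedef enum entity_type entity_type;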
The typedef is not required in C++. You define an enum by just using the word enum and then the contents, like this: enum entity_type { ... };
You can then use the enum type entity_type as a variable for storing data and passing into functions.
Like so: entity_type Type = entity_type_enemy;
This makes it easier to follow what the code is doing. If you see something that has an entity_type you know that it’s probably an entity, and you know exactly what to search for when looking for all possible entity types.
Due to their enumerated nature, if you don’t explicitly set a value, each entry is 1 greater than the one that came before it. This lets us reorder, add, or remove entries without having to renumber them by hand.
The compiler assigns the values for us, as if we had written:
enum entity_type
{
    entity_type_none = 0,
    entity_type_player = 1,
    entity_type_trader = 2,
    entity_type_enemy = 3,
    entity_type_physics_object = 4
};
typedef enum entity_type entity_type;
If unlabelled, the first entry is assigned the value 0.
Because of this, we can use another trick that helps when we want to iterate over each type.
Iteration will be explained in more depth later, but in this context it means looping over a group of values and doing something for each one. Most commonly we give the loop a start count and a maximum number of iterations.
With enums, we can add a _count or _max value to the end like this:
enum entity_type
{
    entity_type_none,
    entity_type_player,
    entity_type_trader,
    entity_type_enemy,
    entity_type_physics_object,
    entity_type_count
};
typedef enum entity_type entity_type;
Then keep adding entity types just before this last value:
enum entity_type
{
    entity_type_none,
    entity_type_player,
    entity_type_trader,
    entity_type_enemy,
    entity_type_physics_object,
    entity_type_item,
    entity_type_count
};
typedef enum entity_type entity_type;
entity_type_count will always accurately reflect how many entity types there are as long as it’s the last value specified.
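To show why the count entry is handy, here is a sketch that uses it both to size a lookup table and as the upper limit of a loop. The name strings, EntityTypeName and CountEntityTypes are all hypothetical additions for this example:

```c
typedef enum entity_type
{
    entity_type_none,
    entity_type_player,
    entity_type_trader,
    entity_type_enemy,
    entity_type_physics_object,
    entity_type_item,
    entity_type_count
} entity_type;

// One name per entity type, indexed by the enum value.
static const char *const EntityTypeNames[entity_type_count] =
{
    "none", "player", "trader", "enemy", "physics_object", "item"
};

// Looks up the name for a type, guarding against out-of-range values.
const char *EntityTypeName(entity_type Type)
{
    int Index = (int)Type;

    if (Index < 0 || Index >= entity_type_count)
    {
        return "unknown";
    }
    return EntityTypeNames[Index];
}

// Visits every entity type once and returns how many were visited.
int CountEntityTypes(void)
{
    int Visited = 0;
    int Type;

    for (Type = 0; Type < entity_type_count; ++Type)
    {
        Visited += 1;
    }
    return Visited;
}
```

If a new entity type is added before entity_type_count, the loop picks it up automatically; only the name table needs a new entry.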
Struct
We can also build up more complex types from these basic types in data structures called structs.
A struct is a way of grouping multiple variables together in the same type. This lets you pass around one variable rather than 6 or 7 every time you want to operate on that data, making common operations on sets of data easier to perform.
For example, let’s take a look at a struct we might expect to see in game development: vec3
typedef struct vec3 vec3;
struct vec3
{
    float X;
    float Y;
    float Z;
};
The typedef is not required in C++. You define a struct by just using the word struct and then the contents, like this: struct vec3 { ... };
A vec3, or 3D Vector, is a mathematical concept used to store both direction and distance (magnitude). I won’t go into it too much here, but they are fairly common in game development (and other 3D applications) due to their ability to represent locations.
They are made up of 3 floating point values describing translation in the X, Y and Z axes.
Since vectors are used frequently, and you’ll want to perform operations on them to calculate a new state for the game, it makes sense to group them like this for convenience.
You can declare a struct variable like you would any other variable: vec3 Position;
However when it comes to setting the initial value, how do we know which value to set? There are 3 values, so how do we tell the compiler which one we want to define?
To do this, you use what is called an “initialiser list”:
vec3 Position = { 10.0f, 0.0f, 1.0f };
This sets the values of the contained variables in the order they appear in the list. So in this case X = 10.0f, Y = 0.0f, and Z = 1.0f.
You can also assign them one by one, over multiple lines, like this:
// = {0}; sets all contained values to 0 in C
// = {}; does the same thing in C++
vec3 Position = {0};
Position.X = 10.0f;
Position.Y = 0.0f;
Position.Z = 1.0f;
You can also set initial values in an initialiser list by the name of the variable inside the struct like this:
vec3 Position = { .X = 10.0f, .Y = 0.0f, .Z = 1.0f };
Here we explicitly set each of the values, so if we change the order of the data in the struct, we won’t create any unforeseen bugs.
You can also assign a struct the value of another struct if they are of the same type. For example, if I wanted to copy the value of Position into a new variable so I can modify it without changing the original, I could do this:
// not quite a real use case, simplified for explanation
float Speed = 10.0f;
vec3 Position = { 10.0f, 0.0f, 1.0f };
vec3 PredictedPosition = Position;
PredictedPosition.X += Speed;
//... check predicted location if it's safe to move in
In this case, Position still has the value X = 10.0f, Y = 0.0f, Z = 1.0f, and PredictedPosition has the values X = 20.0f, Y = 0.0f, Z = 1.0f.
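The same copying happens when a struct is passed into a function, which is covered later. As a sketch, with PredictPosition being a hypothetical function name:

```c
typedef struct vec3 vec3;
struct vec3
{
    float X;
    float Y;
    float Z;
};

// Position is passed by value, so this function works on its own copy
// and the caller's variable is left untouched.
vec3 PredictPosition(vec3 Position, float Speed)
{
    vec3 Predicted = Position; // copies X, Y and Z

    Predicted.X += Speed;

    return Predicted;
}
```

Calling PredictPosition with a Position of { 10.0f, 0.0f, 1.0f } and a Speed of 10.0f returns a vec3 with X = 20.0f, while the original Position keeps X = 10.0f.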
You can initialise structs from const values like this: const vec3 ZeroVec3 = {0}; ... vec3 Position = ZeroVec3; This can be useful for easily resetting variables to a zeroed state.
Union
A union is pretty similar to a struct. However, rather than containing one instance of each variable, it is only as big as its largest member. Every entry in the union shares the same memory, so when you modify one variable in the union you’re modifying the memory behind all of them.
Ok so I lied a little in the last section, I don’t use struct vec3 in my code. I actually use union vec3.
I can explain, I promise.
The mathematical concept of a Vector allows you to scale the number of axes to any value you’d like. Most commonly I find myself using Vector 2D, Vector 3D and Vector 4D, depending on the problem I’m trying to solve. As a result I would end up with the structs vec2, vec3 and vec4, and if I just wanted to get the X and Y components from a vec3 and use them as a vec2, I’d have to create a new vec2 and assign it like this:
vec3 Position = { 10.0f, 0.0f, 1.0f };
vec2 Position2D = { Position.X, Position.Y };
//... do stuff, pass vec2 into functions,
// assign it to a new value
Position.X = Position2D.X;
Position.Y = Position2D.Y;
While this only adds a couple lines of code, if you’re doing it often, it can quickly build up. It also adds extra steps for the CPU.
Using a union solves this problem, because we can actually just say that a vec2 is a union containing a struct with X and Y values, and a vec3 is a union containing a struct with a vec2 and a Z component.
Like this:
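typedef union vec2 vec2;
union vec2
{
    struct // anonymous struct, standard since C11
    {
        float X;
        float Y;
    };
    float Array[2];
};
typedef union vec3 vec3;
union vec3
{
    struct
    {
        float X;
        float Y;
        float Z;
    };
    struct
    {
        vec2 XY;
        float IgnoredZ_; // placeholder name: Z is already declared above
    };
    struct
    {
        float IgnoredX_; // placeholder name: X is already declared above
        vec2 YZ;
    };
    float Array[3];
};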
The typedef is not required in C++. You define a union by just using the word union and then the contents, like this: union vec3 { ... };
The float Array[3]; and arrays in general will be explained later but I wanted to include them to show you can use them inside a union.
Using the union vec3 we can access the values X, Y, and Z as we could with a struct, but we can also use XY to get the values of X and Y in the form of a vec2.
Since in a union the memory is the same for each entry, and since each entry covers the same 3 floats, anything we do to XY will change the value of both X and Y.
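To see the shared memory in action, here is a sketch using trimmed-down versions of the unions. ScaleVec2, DoubleXY and the IgnoredZ_ member are hypothetical names for this example, and the anonymous structs need a C11 compiler:

```c
typedef union vec2 vec2;
union vec2
{
    struct
    {
        float X;
        float Y;
    };
    float Array[2];
};

typedef union vec3 vec3;
union vec3
{
    struct
    {
        float X;
        float Y;
        float Z;
    };
    struct
    {
        vec2 XY;
        float IgnoredZ_; // occupies the same memory as Z
    };
    float Array[3];
};

// Works on any vec2, including the XY view of a vec3.
vec2 ScaleVec2(vec2 Value, float Scale)
{
    Value.X *= Scale;
    Value.Y *= Scale;
    return Value;
}

// Doubles X and Y through the XY view; Z is left alone.
vec3 DoubleXY(vec3 Position)
{
    Position.XY = ScaleVec2(Position.XY, 2.0f);
    return Position;
}
```

No temporary vec2 has to be created and copied back by hand: writing to XY writes straight into the memory that X and Y occupy.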
And that about wraps it up for variables in C!… mostly… There will be more topics regarding variables and how to use them (like pointers) coming up.
