Tokens are the smallest lexical components of a programming language. EGs: A keyword, an operator, a variable name, a scalar constant. Tokens are separated by positioning, white space, or syntax. A compiler typically groups the source characters into tokens first and then checks whether the sequence of tokens follows the syntax of the language.

In C, there are six kinds of tokens: keywords, identifiers, operators, punctuators, constants, and string constants.

In C#, there are seven kinds of tokens: keywords, identifiers, operators/punctuators, integer-literals, real-literals, character-literals, string-literals.


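The characters allowed in C programs are the printing ASCII characters (except `) plus white space. Of these, $ and @ (like `) are not part of the basic source character set in the C standard, though most compilers accept them inside string literals, character constants, and comments.

  • Upper case letters: ABCDEFGHIJKLMNOPQRSTUVWXYZ.
  • Lower case letters: abcdefghijklmnopqrstuvwxyz.
  • Digits: 0123456789.
  • Other characters (in ASCII order): !"#$%&'()*+,-./:;<=>?@[\]^_{|}~.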
  • Whitespace (in ASCII order):
    • Tab = \t = 9 = x9 = HT = Horizontal Tab
    • New Line = NL = \n = 10 = xA = LF = Line Feed = EOL = End Of Line
    • Vertical Tab = VT = \v = 11 = xB
    • Form Feed = FF = \f = 12 = xC = NP = New Page
    • Carriage Return = CR = \r = 13 = xD
    • Space =   = 32 = x20

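The Cs are free-form. Outside of string literals, character constants, and preprocessing directives, any run of white space between tokens is equivalent to a single space, and it may be left out where the tokens remain unambiguous. EG: 2+2 is the same as 2 + 2. However, care should be taken to write human-readable code through the proper use of white space.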

The Cs are case sensitive! EG: In x=3; X=4;, x and X are two different variables.

Some characters have different meanings depending on context. EG: In x = y % 3, the % symbol is the modulus operator, which returns the integer remainder of y divided by 3, but in printf("%d",a);, the % symbol sits inside a string and starts a conversion specification that printf replaces with the value of a.

Comments are not tokens. The compiler replaces each comment with a single space, so a comment is treated as white space.

/* ... */   Multi-line or delimited comment.
// ...      Single-line comment. Introduced with C++ and C99.

Identifiers give unique names to the variables, functions, types, and other objects in a program.

  • The first character must be a letter (a-z or A-Z) or an underscore (_).
  • The other characters can be letters, underscores, or digits (0-9).
  • An identifier must not be a reserved word (keyword).

Punctuators usually organize tokens.

  • Punctuators include:
    • (). Usually groups. EG: 2*3+4 is not the same as 2*(3+4).
    • {}. Usually enclose multiple statements so that they are seen as one block of statements. EG: main(){ many_lines }.
    • ,. Usually separates items in a list. EG: (a,b,c).
    • ;. Found at the end of most statements. EG: x=3;.
  • In some instances a punctuator may be an operator. EG: In a function call such as main(), the pair of parentheses is the function call operator.

Preprocessing directives

Preprocessing directives are lines that begin with #. The preprocessor carries them out before the rest of the program is compiled.

#include <headerFile>

#include inserts the contents of a header file at that point, so the functions, types, and macros it declares are available to the program.

The C header files include:

<assert.h>, <ctype.h>, <errno.h>, <float.h>, <iso646.h>, <limits.h>, <locale.h>, <math.h>, <setjmp.h>, <signal.h>, <stdarg.h>, <stddef.h>, <stdio.h>, <stdlib.h>, <string.h>, <time.h>, <wchar.h>, <wctype.h>.

C++ has all the C header files, except that each name is prefixed with a c and the .h extension is removed (EG: stdio.h in C is cstdio in C++). C++ also has the following header files:

<algorithm>, <bitset>, <deque>, <exception>, <fstream>, <functional>, <iomanip>, <ios>, <iosfwd>, <iostream>, <istream>, <iterator>, <limits>, <list>, <locale>, <map>, <memory>, <new>, <numeric>, <ostream>, <queue>, <set>, <sstream>, <stack>, <stdexcept>, <streambuf>, <string>, <typeinfo>, <utility>, <valarray>, <vector>.

C++ uses namespaces, and the names declared by the standard header files are part of the std namespace. EG:

#include <stdio.h>    //In C

#include <cstdio>     //In C++
using namespace std;

Flow Control

[TODO: Fill out later.]


A function definition has the general form:

returnType functionName(parameterList)
{
   return variableOfReturnType;
}

where returnType is void if the function returns nothing (such a function does not need a return statement), and parameterList is void (the function takes no arguments) or a comma-separated list:

dataType var1, dataType var2, ..., dataType varN

Parameters are passed by value by default: the function works with copies of the values. To pass by reference in C, the function works with pointers instead: the parameter is declared with the * modifier, and the caller passes an address obtained with & (a unary operator).

GeorgeHernandez.com. Some rights reserved.