C Crash Course

Hello World

Why C?

A computer is a physical machine.

Hello World

hello.c
#include <stdio.h>

int main(void) {
  printf("Hello World\n");
  return 0;
}
  • printf is “imported” from stdio
  • main is special
  • main takes void
  • main returns int
  • { }
  • ;

Running at command line

$ gcc hello.c
$ ./a.out
Hello World
$

Newline must be explicit

Comparison

Python

  • dynamic typing
  • compiles to byte code
  • run by virtual machine

C

  • static typing
  • compiles to machine code
  • run by CPU

Standard implementation of Python - “CPython” - is written in C.

Compilation Example

int increment(int num) {
    return num + 1;
}

compiles to (https://godbolt.org)

GCC -O0 on x86-64
push    rbp
mov     rbp, rsp
mov     DWORD PTR [rbp-4], edi
mov     eax, DWORD PTR [rbp-4]
add     eax, 1
pop     rbp
ret

Compilation Example

  • no variables!
  • function is just a sequence of instructions
  • each instruction maps to sequence of bits
  • C is a shorthand for writing assembly

Basic Syntax

Variables

  • lower snake case
  • historically, very short
  • but you can be more descriptive
  • fixed memory location
  • But how much memory?

Types

Basic Types

  • char - one byte, i.e., a small integer
  • int - typically 4 bytes (Python 3 int has arbitrary size)
  • float - typically 4 bytes, single precision floating point
  • double - double precision floating point (typically 8 bytes)

Modifiers for integer types

  • signed - negative to positive
  • unsigned - zero to positive

Exact sizes and ranges somewhat platform dependent :(

Types

Because type corresponds to memory size, be careful to stay within bounds:

  • (low + high) / 2 may overflow and give gibberish when low + high exceeds the int range
  • low + (high - low) / 2 is safe (for 0 <= low <= high)
  • This bug went undetected in Java’s binary search until 2006!
  • sizeof operator gives size in bytes

Literals

a char is an integer but written as a single quoted character

  • '0' is same as 48 (ASCII value)
  • '0' is conventional and preferred
  • char most often used for textual stuff, not math
  • special characters begin with \, e.g. '\n'
  • '\0' is null character (equal to 0)

Literals

strings are denoted by double quotes

  • stored in memory as array of characters
  • '\0' at the end
  • "a" is not same as 'a' !

Literals

enum for autoincrementing constants

enum.c
#include <stdio.h>

int main(void) {
  enum fruits { APPLE, BERRY, CHERRY };
  printf("%d %d %d\n", APPLE, BERRY, CHERRY);
  return 0;
}
$ gcc enum.c
$ ./a.out
0 1 2

(You can also explicitly assign values.)

Declarations

Variables must be declared (and initialized) before use.

  • int x, y;
  • char msg[] = "hello";
  • const double pi = 3.1415;

const: value can’t change (compiler can enforce)

Declarations

Global variables

  • initialized exactly once
  • may need to be declared within function with extern keyword

Local variables

  • initialized with each function call
  • value does not persist

Operators

Arithmetic Operators

  • +, -, *, /, %
  • / is integer division when both operands are integers

Relational Operators

  • >, >=, <, <=
  • ==, !=

Logical Operators

  • && (and), || (or), ! (not)

Operators

Boolean expression

  • evaluates to 1 (true) or 0 (false)
  • ! turns 0 to 1, and nonzero to 0

compound expressions separated by &&/||

  • evaluated left to right
  • with short circuit

left-to-right evaluation is not guaranteed in general

  • in f(x) + g(x), f or g may be evaluated first

Type Casting

Usually, C automatically converts types

  • 1 + 1.0 is a double
  • sqrt(2) works (2 is converted to 2.0)

To explicitly cast:

(type) expression

e.g., (float) 5 / 2 evaluates to 2.5

Increment / Decrement

  • ++ increases variable by 1
  • -- decreases variable by 1
  • prefix: change value then get it
  • postfix: get value then change it
prefix_postfix.c
#include <stdio.h>

int main(void) {
  char x = 'a';

  printf("%c\n", ++x);
  printf("%c\n", x);

  printf("%c\n", x++);
  printf("%c\n", x);

  return 0;
}

Increment / Decrement

$ gcc prefix_postfix.c
$ ./a.out
b
b
b
c
$

Increment / Decrement

Often idiomatic (but confusing) to increment within expression.

ones.c
#include <stdio.h>

int main(void) {
  int i = 0, ones[3];

  while (i < 3) {
    ones[i++] = 1; // ++i does not work
  }

  for (i = 0; i < 3; i++) {
    printf("%d: %d\n", i, ones[i]);
  }

  return 0;
}

Increment / Decrement

$ gcc ones.c
$ ./a.out
0: 1
1: 1
2: 1
$

Assignment

Assignment Operators

  • x = x + 3 same as x += 3
  • similarly, -=, *=, /=, ...
  • in Python, += is not same as reassignment!

Assignment

Assignment Expressions

  • printf("%d\n", x = 42) outputs 42
  • common pattern: while ((c = getchar()) != EOF), with c declared as int
  • nasty bug: if (x=1) always true
  • Python added this in 3.8 (:=)

Control Flow

Statement and Block

  • code is made of statements
  • each statement ended by ;
  • block is multiple statements within { }
  • block is also a statement
  • no semicolon after block

Conditionals

General Syntax

if (expression)
    statement

else if (expression)
    statement

...

else
    statement
  • evaluation halts when an expression is true (nonzero)
  • else is optional

else if is not a separate keyword like elif. The if simply begins another conditional.

Switch

switch (expression) {
    case const-expr: statements
    case const-expr: statements
    ...
    default: statements
}
  • expression must be integer valued
  • case values must be unique
  • optional default executes if no condition is met
  • execution starts where expression matches value
  • continues until end or break

Switch

case.c
#include <stdio.h>

int main(void) {
  int n = 2;

  switch (n) {
  case 1:
    printf("one\n");
  case 2:
    printf("two\n");
  case 3:
    printf("three\n");
    break;
  case 4:
    printf("four\n");
  }

  return 0;
}

Switch

$ gcc case.c
$ ./a.out
two
three
$

Loops

while loop runs as long as expression is nonzero

while (expression)
    statement

for loop

for (expr1; expr2; expr3)
    statement

is equivalent to

expr1;
while (expr2) {
    statement
    expr3;
}

Loops

for loop is idiomatic for iterating thru array

for (i = 0; i < n; i++) {
    ...
}
  • for(;;){ ... } is an infinite loop!

Loops

do-while loop

do
    statement
while (expression);

do-while is relatively uncommon

Break / Continue

Same as Python! Within a loop, break immediately exits the loop, and continue goes back to the top of the loop.

Goto

You can label code and jump to a label.

goto.c
#include <stdio.h>

int main(void) {

  goto yay;

  printf("hello\n");

yay:
  printf("world\n");

  return 0;
}

Goto

$ gcc goto.c
$ ./a.out
world
$

“Although we are not dogmatic about the matter, it does seem that goto statements should be used rarely, if at all.”

– K&R

Functions

General Structure

return-type name(parameters) {
    declarations and statements
}
  • return returns value (or nothing)
  • if no return then returns nothing
  • void denotes no return or no parameter
  • no inner functions in C

Multiple Files

C program is a bunch of variable declarations and functions. Source code can be split among multiple files.

half_main.c
#include <stdio.h>

int main(void) {

  double half(double);

  printf("%f\n", half(5));
  return 0;
}
half.c
double half(double x) { return x / 2; }

Multiple Files

$ gcc half_main.c half.c
$ ./a.out
2.500000
$

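The function prototype double half(double); is necessary. Under the older C89 rules, a compiler seeing a function used for the first time assumes it returns int. Since C99, calling an undeclared function is invalid, and compilers warn or refuse to compile.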

This is the main purpose of header (.h) files - get all the prototypes and other global declarations out of the way.

Global Variables

  • AKA external variables
  • defined outside of any function
  • useful for shared information among functions
  • use with caution, for the usual reasons

Scope

Where can a variable be used?

  • local (“automatic”) variable: within the same function
  • parameters become local variables
  • global (“external”) variable: within the same file
  • extern declaration may be necessary
  • when a local and a global variable have the same name, local wins

Static Variables

  • static declaration before global variable: visible within file only
  • static local variable: persist between function calls
counter.c
#include <stdio.h>

int counter() {
  static int n = 0;
  ++n;
  return n;
}

int main(void) {
  printf("%d\n", counter());
  printf("%d\n", counter());
  printf("%d\n", counter());

  return 0;
}

Static Variables

$ gcc counter.c
$ ./a.out
1
2
3
$

Block Scope

Variable can be local to any block, not just function block.

block_scope.c
#include <stdio.h>

int main(void) {
  int x = 2;

  if (1) {
    int x = 3;
    printf("%d\n", x);
  }

  printf("%d\n", x);
  return 0;
}

Block Scope

$ gcc block_scope.c
$ ./a.out
3
2
$

Probably should exploit this sparingly...

Initialization

  • global variables default to zero
  • local variables must be initialized, or have garbage values

Array can be initialized with a literal

  • int nums[] = {1, 2, 3};

Initialization

Strings are arrays of characters

  • char message[] = "hello";

is short for

  • char message[] = {'h', 'e', 'l', 'l', 'o', '\0'};

Recursion

Yes.

fib.c
#include <stdio.h>

int fib(int n) {
  if (n <= 2)
    return 1;

  return fib(n - 1) + fib(n - 2);
}

int main(void) {
  int n = 1;

  while (n <= 10) {
    printf("%d ", fib(n));
    n++;
  }
  printf("\n");
}

Recursion

$ gcc fib.c
$ ./a.out
1 1 2 3 5 8 13 21 34 55
$

Preprocessor

The C preprocessor manipulates source code before compilation.

#include injects content of another file

  • #include <stdio.h> adds standard IO library headers
  • #include "my_headers.h" adds file from current directory

Preprocessor

#define creates substitution rule

  • #define MAX 1000 substitutes 1000 wherever MAX occurs
  • most often used to define constants
  • can also take arguments

Preprocessor

#ifndef and #endif often wrap a header file

  • make sure the header is included only once
#ifndef MY_HEADER_H
#define MY_HEADER_H

header content goes here

#endif
  • pattern is called “#include guards”
  • Output preprocessed source code with -E command line option.

Pointers and Arrays

Pointers

A pointer is a variable whose value is the address of another variable.

pointer.c
#include <stdio.h>

int main(void) {
  int x;
  int *p;

  p = &x;
  x = 42;

  printf("address %p contains %d\n", p, *p);
  return 0;
}

Pointers

$ gcc pointer.c
$ ./a.out
address 0x7ffee1317858 contains 42
$

Pointers

  • pointers are declared with *
  • * operator also gives pointed-at value (dereferencing)
  • & operator gives address of a variable
  • a pointer may be assigned zero (or NULL conventionally)
  • pointers make possible dynamic data structures
  • void *p; declares a generic pointer: it can hold any object address, but must be converted to a typed pointer before dereferencing

Pointers and Functions

Because C functions pass by value, pointers allow “in-place” changes.

triple.c
#include <stdio.h>

void triple(int *p) { 
    *p = *p * 3;
}

int main(void) {
  int x = 5;
  triple(&x);
  printf("%d\n", x);

  return 0;
}

Pointers and Functions

$ gcc triple.c
$ ./a.out
15
$

Pointers and Arrays

Arrays and pointers are surprisingly close.

pointer_and_array.c
#include <stdio.h>

int main(void) {
  int arr[] = {5, 4, 3, 2, 1};
  int *p;

  p = &arr[0];
  printf("%d\n", *(p + 2));

  p = arr;
  printf("%d\n", *(p + 2));
  printf("%d\n", *(arr + 2));
  printf("%d\n", p[2]);

  return 0;
}

Pointers and Arrays

$ gcc pointer_and_array.c
$ ./a.out
3
3
3
3
$
  • adding integer to pointer advances to next element
  • one reason a pointer is to a specific type
  • name of array is alias for address of first element

Pointers and Arrays

Because object addresses are the same size on typical platforms, a pointer can be cast to another pointer type.

little_endian.c
#include <stdio.h>

int main(void) {

  unsigned long n = 256 * 256;
  unsigned char *p = (unsigned char *)&n;

  int i;
  for (i = 0; i < sizeof n; i++) {
    printf("%d ", *(p + i));
  }

  printf("\n");
  return 0;
}

Pointers and Arrays

$ gcc little_endian.c
$ ./a.out
0 0 1 0 0 0 0 0
$

On most computers, the bytes of an integer are stored from least significant to most significant (“little-endian”).

Pointers and Arrays

When passing array name to function, it is the address that is passed.

print_array.c
#include <stdio.h>

void print_array(int *p, int len) {
  int i;
  for (i = 0; i < len; ++i)
    printf("%d ", p[i]);
  printf("\n");
}

int main(void) {
  int arr[] = {5, 4, 3, 2, 1};
  print_array(arr, 5);

  return 0;
}

Pointers and Arrays

$ gcc print_array.c
$ ./a.out
5 4 3 2 1
$

This is known as “array decay”.

Character Pointers

Since strings are arrays, functions that take a string receive pointer.

string_copy.c (adapted from K&R, p.106)
#include <stdio.h>

void string_copy(char *target, char *source) {
  while ((*target++ = *source++)) // copy each char, stop after '\0'
    ;
}

int main(void) {
  char original[] = "hello";
  char copy[6];
  string_copy(copy, original);
  printf("%s\n", copy);
  return 0;
}

Character Pointers

$ gcc string_copy.c
$ ./a.out
hello
$

Other Uses

  • array of pointers (Python list)
  • array of arrays (matrices)

Command Line Arguments

Command line arguments can be passed to main function with the signature

int main(int argc, char *argv[])

  • argc (argument count) is number of command line arguments
  • argv (argument vector) is array of strings, i.e., the arguments
  • argv[0] is the name of the program itself
  • argv[argc] is NULL pointer

Command Line Arguments

factorial.c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {

  static int acc = 1;

  if (argc < 2) {
    printf("error: need argument\n");
    return 1; // nonzero exit status signals failure
  }

  int n = atoi(argv[1]);

  if (n <= 1) {
    printf("%d\n", acc);
    return 0; // zero exit status signals success
  } else {
    acc = n * acc;
    sprintf(argv[1], "%d", --n);
    return main(argc, argv);
  }
}

Command Line Arguments

$ gcc factorial.c
$ ./a.out 5
120
$
  • static int to persist result
  • atoi converts string to integer
  • sprintf prints to string
  • Yes, main is a function like any other
  • No, don’t make main recursive in real life

Function Pointers

Since a function is in memory, pointer can point to it

function_pointer.c
#include <stdio.h>

int add_two(int n) { return n + 2; }

int triple(int n) { return n * 3; }

int main(void) {

  int (*fp)(int);

  fp = &add_two;
  printf("%d\n", (*fp)(42));

  fp = triple;
  printf("%d\n", fp(42));

  return 0;
}

& and * are optional, but clarifying

Function Pointers

$ gcc function_pointer.c
$ ./a.out
44
126
$
  • fp points to a function that takes int and returns int
  • function pointer useful for passing small functions
  • example: comparison function for sorting

Complicated Declarations

With arrays, pointers, and functions, declarations can get complex

  • char ( * ( *x () ) [ ] ) () ???
  • function returning pointer to array of pointer to function returning char
  • there are rules, but barely human understandable
  • keep it simple

Structures

Structures

Structures are basically classes but without methods

cat.c
#include <stdio.h>

int main(void) {

  struct cat {
    char *name;
    char *greeting;
    int hunger;
  } grumpy;

  grumpy.name = "Grumpy";
  grumpy.greeting = "Meh";
  grumpy.hunger = 80;

  printf("%s, I am %s. My hunger is %d.\n",
          grumpy.greeting, grumpy.name, grumpy.hunger);

  struct cat bub = {"Lil Bub", "Meow", 50};
  printf("%s, I am %s. My hunger is %d.\n",
          bub.greeting, bub.name, bub.hunger);

  return 0;
}

Structures

$ gcc cat.c
$ ./a.out
Meh, I am Grumpy. My hunger is 80.
Meow, I am Lil Bub. My hunger is 50.
$
  • struct defines a type or class
  • cat is (optional) structure tag, a shorthand for { ... }
  • variable in structure is a member
  • you can have structures within structures

Structure Pointers

Like other types, structures are passed by value. It may be more efficient to pass a pointer to the structure.

pointer_cat.c
#include <stdio.h>

struct cat {
  char *name;
  char *greeting;
  int hunger;
};

void speak(struct cat *cp) {
  printf("%s, I am %s. My hunger is %d.\n",
          (*cp).greeting, cp->name, cp->hunger);
}

int main(void) {
  struct cat grumpy = {"Grumpy", "Meh", 80};
  struct cat *cp = &grumpy;

  speak(cp);

  return 0;
}

Structure Pointers

$ gcc pointer_cat.c
$ ./a.out
Meh, I am Grumpy. My hunger is 80.
$
  • pointer->member is shorthand for (*pointer).member
  • structures allow for a quasi-object oriented style
  • function pointer member to mock up method?
  • array of structures store uniform data, like list of dictionaries

Nested Structures

A member of a structure can be a pointer to another structure (or to a structure of the same type), so you can build a larger data structure out of them.

  • binary search tree, with node.left and node.right
  • hash table: collisions resolved by chaining
  • Python objects!

typedef

You can alias a type by using typedef

typedef_cat.c
#include <stdio.h>

typedef struct cat {
  char *name;
  char *greeting;
  int hunger;
} Cat;

void speak(Cat *cp) {
  printf("%s, I am %s. My hunger is %d.\n",
        (*cp).greeting, cp->name, cp->hunger);
}

int main(void) {
  Cat grumpy = {"Grumpy", "Meh", 80};
  Cat *cp = &grumpy;

  speak(cp);

  return 0;
}

typedef

$ gcc typedef_cat.c
$ ./a.out
Meh, I am Grumpy. My hunger is 80.
$
  • alias appears where a variable would be
  • conventionally capitalized (like classes?)
  • cleaner code and more semantic

I/O

Standard I/O

Input / output not part of C language but handled by library functions, mostly in stdio.h.

  • int getchar(void) returns one character or EOF (-1) from STDIN
  • int putchar(int) sends one character to STDOUT and returns it or EOF (error)
  • a lot of command line utilities are based on stream processing

Formatted I/O

Formatted output with printf

int printf(const char *format, ...)

printf returns count of characters printed

common format options:

  • %d - integer
  • %c - char
  • %s - string
  • %f - double
  • %p - pointer value

Formatted I/O

printf is a variadic function. Use the <stdarg.h> header to implement your own.

Formatted I/O

Formatted input with scanf

int scanf(const char *format, ...)

scanf returns number of assigned items. Arguments must be pointers!

File Access

To access files, use the FILE type (a structure)

$ gcc open_file.c
$ ./a.out
#include <stdio.h>

int main(void) {
  FILE *fp = fopen("open_file.c", "r");

  int c;
  while ((c = getc(fp)) != EOF) {
    putc(c, stdout);
  }
  printf("\n");
  fclose(fp);
  return 0;
}
$

File Access

  • fopen returns pointer to FILE (opened here in read mode, "r"), or NULL on failure
  • getc gets next character from file
  • putc sends character to file
  • stdin, stdout, stderr are included file pointers
  • STDIN, STDOUT, STDERR are “file-like objects” provided by OS
  • limit on number of open files, so fclose when done

Memory Management

The compiler allocates memory for variables, but often more memory is required during execution. For example, instantiate an object in Python.

C provides the malloc function to dynamically allocate memory.

Memory Management

malloc_cat.c
#include <stdio.h>
#include <stdlib.h>

typedef struct cat {
  char *name;
  char *greeting;
  int hunger;
} Cat;

void speak(Cat *cp) {
  printf("%s, I am %s. My hunger is %d.\n",
          cp->greeting, cp->name, cp->hunger);
}
int main(void) {
  int i;

  for (i = 0; i < 3; i++) {
    Cat *cp;
    cp = (Cat *)malloc(sizeof(Cat));
    if (cp == NULL) // allocation failed
      return 1;

    cp->name = "Grumpy";
    cp->greeting = "Meh";
    cp->hunger = 15;

    speak(cp);
    free(cp);
  }
  return 0;
}

Memory Management

$ gcc malloc_cat.c
$ ./a.out
Meh, I am Grumpy. My hunger is 15.
Meh, I am Grumpy. My hunger is 15.
Meh, I am Grumpy. My hunger is 15.
$
  • malloc takes an integer and reserves that number of bytes
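  • malloc returns a pointer to void (or NULL on failure); C converts it to the target pointer type automatically, so the cast is optional (C++ requires it)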
  • free takes a pointer and frees up memory it points to
  • neglecting to free often leads to memory leak
  • free only memory that is mallocated!

Operating System Interface

File Descriptors

  • all input / output is thru files
  • files include normal files and peripherals
  • program requests file, gets a file descriptor integer
  • 0, 1, 2 represent STDIN, STDOUT, STDERR, respectively
  • program communicates with OS thru special functions - system calls

Low Level I/O

readwrite.c (adapted from K&R p.171)
#include <stdio.h>
#include <unistd.h>

int main(void) {
  char buffer[BUFSIZ];
  int n;

  while ((n = read(0, buffer, BUFSIZ)) > 0)
    write(1, buffer, n);

  return 0;
}

Low Level I/O

$ gcc readwrite.c
$ echo "hello" | ./a.out
hello
$
  • read and write take file descriptor, char array, transfer size
  • return number of bytes transferred
  • large buffer means fewer calls (BUFSIZ is platform dependent, e.g., 1024 or 8192)
  • unistd.h declares many system calls

Some Other System calls

  • open - open file by name, returns descriptor
  • creat - make new file
  • close - frees up descriptor
  • unlink - removes file by name
  • lseek - changes current position in file (random access)

Higher level I/O functions such as printf ultimately make system calls, which are implemented by OS.

Memory Allocation

How do malloc and free work? Here’s one possible way:

  • blocks of memory available to program form a circular linked list
  • each block has a header - block size, pointer to next block, ...
  • malloc traverses linked list for large enough block
  • if found, return pointer to needed portion, remainder is new block
  • if not found, make system call for additional memory and add to list
  • system call is expensive, so request a big amount
  • free simply adds block back to list, merging if possible

See K&R (8.7) for detailed implementation. It is a masterpiece.

Further Study

What we left out

  • bitwise operations
  • ternary operator
  • unions - like struct but only one value at a time
  • bit-fields - a struct that accesses individual bits
  • error handling - no try/except, just STDERR and fast exit

Where to go next

  • standard library and POSIX APIs - networking, threading, ...
  • CPython, C extensions, ctypes, Cython, CFFI, ...
  • assembly
  • C++
  • Go, Rust
  • operating system
  • compilers!