This is the portion of data structures and algorithms that I got completely lost on as a third year ECE major. It was also THE FIRST thing they taught us in that class (why they wouldn’t start with linear data structures like arrays and lists is beyond me). Fortunately there are TONS of resources to brush up on tree data structures and thus the implementation of this was not as painful as I thought it was going to be.
Linear vs Tree Data Structures — Why Shape Matters
Up until now, every data structure we’ve implemented has been linear.
Linked lists, stacks, queues, and deques all share one core idea:
There is exactly one path through the data.
If you imagine them visually, they look like a chain:
[ A ] -> [ B ] -> [ C ] -> [ D ] -> [ E ] // should remind you of our previous linked list implementations
Each element knows about at most one next element (and sometimes one previous element).
This makes them simple and predictable, but it also limits how efficiently we can search them.
If you want to find something in a linked list, you usually have to walk element by element:
Worst case:
O(N) // for other ops as we have shown, we can augment the linked list data structure with a tail or previous pointer to make other ops O(1)
This is fine for small datasets, but once data gets large, linear traversal becomes expensive.
Enter Tree Data Structures
Tree data structures break away from the idea of a single path.
Instead of each element pointing to just one next element, tree elements can branch (hence the tree name):
A
/ \
B C
/ \
D E
The advantage here should become pretty clear. Let’s see we want to find the E Node. For our linked list, that means walking the entire list until we hit the E node. That is 5 ops. For this tree above, we simply visit A, then B, then we hit E (how this actually occurs will become more clear as we step into the specific definition of what makes a binary search tree a binary search tree).
Introducing the Binary Search Tree (BST)
So, how does that specific case above actually occur and optimize our search for a node? A Binary Search Tree is a specific kind of tree with two rules:
Rule 1 — Binary
Each node has at most two children:
left child
right child
Rule 2 — Ordered (This Is The Important Part)
For every node:
left subtree < node value < right subtree // for out letters, it means being in alphabetical order. for numbers, we can just use more than or less than operators
It is this second rule that really solidifies why our search for E case is faster for a BST vs a linear data structure.
Another Example:
Search example:
50
/ \
30 70
/ \ / \
20 40 60 80
Everything left of 50 is smaller.
Everything right of 50 is larger.
This rule is applied recursively throughout the tree.
So lets say we want to look for the value 60 in this tree:
Start at 50 → 60 > 50 → go right
At 70 → 60 < 70 → go left
Found 60
Instead of checking every element, you eliminate half the tree at each step (in ideal cases where we have a balanced tree).
The Three Core BST Operations
BSTs are defined primarily by three operations:
Insert
Place a new value in the correct position while maintaining ordering.
Search
Find whether a value exists by following left/right decisions.
Delete
Remove a value while preserving BST structure.
This is the hardest operation because removing nodes can break ordering if done incorrectly.
The Hidden Fourth Operation: Traversal
Traversal isn’t always listed as a “core BST operation,” but it is extremely important.
Traversal means visiting every node in a specific order.
For BSTs, the most important traversal is:
In-Order Traversal
Left → Node → Right
This prints BST values in sorted order.
This is one of the biggest reasons BSTs are useful.
Important Reality: BSTs Are Not Always Fast
BST performance depends on tree shape.
If the tree stays balanced:
Search ≈ O(log N)
If the tree becomes skewed (like inserting sorted data):
Search ≈ O(N)
Worst case BST becomes a linked list.
This is exactly what our benchmarks later will demonstrate.
The Actual Challenge: Implementation in C:
So like before, I will show you the header file and then we will discuss each function. This code is FULL of comments since the concepts here are sometimes a bit rough to grasp (or at least I found them hard to grasp upon first look):
bst.h
#ifndef BST_H
#define BST_H
#include <stdio.h>
#include <stdlib.h>
struct Node {
int key;
struct Node* l;
struct Node* r;
};
struct BST {
struct Node* root;
};
struct Node* alloc_node(int key);
int insert(int key, struct BST* tree);
struct Node* search(int key, struct BST* tree);
struct Node** search_for_delete(int key, struct BST* tree);
int delete(int key, struct BST* tree);
void free_subtree(struct Node* node);
void destroy_bst(struct BST* tree);
void print_sub_tree_in_order(struct Node* node);
void print_tree_in_order(struct BST* tree);
#endif
bst.c
#include "bst.h"
struct Node* alloc_node(int key) {
struct Node* new_node = malloc(sizeof(struct Node));
if (!new_node) return NULL; //malloc failed, return NULL
new_node->key = key;
new_node->l = NULL;
new_node->r = NULL;
return new_node;
}
int insert(int key, struct BST* tree) {
if (!tree) return 1; //error, bst pointer is invalid!
//first, we handle the case where we are inserting into an empty tree
if (!tree->root) {
struct Node* new_root = alloc_node(key);
tree->root = new_root;
return 0;
}
//okay so now, if the tree is not empty, we have to traverse the tree
//to start, we grab the current root node
struct Node* curr_node = tree->root;
while (1) {
if (key < curr_node->key) {
if (curr_node->l == NULL) {
struct Node* new_node = alloc_node(key);
if (!new_node) return 1; //node allocation failed
curr_node->l = new_node;
return 0;
}
curr_node = curr_node->l; //if the node is populated, we keep traversing
}
else if (key > curr_node->key) {
if (curr_node->r == NULL) {
struct Node* new_node = alloc_node(key);
if (!new_node) return 1; //node allocation failed
curr_node->r = new_node;
return 0;
}
curr_node = curr_node->r; //if the node is populated, we keep traversing
}
else {
//the key is equal... we will exlcude duplicate
return 2; //duplicate error code!
}
}
return 0;
}
struct Node* search(int key, struct BST* tree) {
if (!tree || !tree->root) return NULL; //invalid tree pointer or tree is empty
struct Node* curr_node = tree->root;
while (1) {
if (key < curr_node->key) { //if less, we need to search down the left node
if (curr_node->l == NULL) return NULL; //this means there is no left node, which means node is not in the tree, return NULL
curr_node = curr_node->l;
}
else if (key > curr_node->key) { //if more, we need to search down the right node
if (curr_node->r == NULL) return NULL; //same as above, just for the right node
curr_node = curr_node->r;
}
else {
return curr_node;
}
}
return NULL; //node not found, return NULL
}
struct Node** search_for_delete(int key, struct BST* tree) {
if (!tree || !tree->root) return NULL;
struct Node** curr_node = &tree->root;
while (1) {
if (key < (*curr_node)->key) {
if ((*curr_node)->l == NULL) return NULL;
curr_node = &(*curr_node)->l;
}
else if (key > (*curr_node)->key) {
if ((*curr_node)->r == NULL) return NULL;
curr_node = &(*curr_node)->r;
}
else {
return curr_node;
}
}
return NULL;
}
int delete(int key, struct BST* tree) {
struct Node** node_link = search_for_delete(key, tree);
if (!node_link) return 1; //node not found, return 1
struct Node* node = *node_link;
//this node is leaf node, deletion is easy
if (!node->l && !node->r) {
*node_link = NULL;
free(node);
return 0;
}
else if (node->l && !node->r) { //only left node
*node_link = node->l;
free(node);
return 0;
}
else if (!node->l && node->r) { //only right node
*node_link = node->r;
free(node);
return 0;
}
else if (node->l && node->r) {
//if node has both children, it gets complicated, we need to preserve the correctness of the tree
//to do this, we have to use a little trick
//its called successor (another apporach is precessor)
//we take the node to be deleted (which in this case, is *node_link or node as shown above)
//then, we go down the RIGHT subtree of that node
//once we go down the RIGHT subtree, we find the minimum node of that right subtree
//since we know this subtree is valid (or we at least assume it is), we simply keep following the left nodes
//once we hit the last left node, IE the mimimum, we stop
//using that miminum value (the key of the node), we insert that value into the node we wish to delete
//so now, instead of free(node), we ACTUALLY free the minimum node of the right subtree (the node we stole the key from)
//so, grab the right node
struct Node** curr_node_link = &node->r;
//then, we iterate until we hit the minimum value by going deeper and deeper down the left path
while ((*curr_node_link)->l != NULL) {
curr_node_link = &(*curr_node_link)->l;
}
//the node we wish to delete is not deleted
//instead we copy the value of minimum node into the node we wish to delete
struct Node* successor = *curr_node_link;
node->key = successor->key;
//if our successor node has a right subtree, we need to make sure it stays connected to our BST
*curr_node_link = successor->r;
//THEN, we actually delete the minimum node
free(successor);
return 0;
}
return 0;
}
void free_subtree(struct Node* node) {
if (node == NULL) return;
free_subtree(node->l);
free_subtree(node->r);
free(node);
}
void destroy_bst(struct BST* tree) {
if (tree == NULL) return;
free_subtree(tree->root);
tree->root = NULL;
}
void print_sub_tree_in_order(struct Node* node) {
if (node == NULL) return; //we have reached the end, early exit
print_sub_tree_in_order(node->l);
printf("%d ", node->key);
print_sub_tree_in_order(node->r);
}
//and to close, we are going to write a function that traverses the tree in order and prints the keys
//doing this should just print a sorted list of numbers from low to high!
//that is what makes BSTs so useful, their definition forces useful properties
void print_tree_in_order(struct BST* tree) {
//to do this, we need print in this order:
//left subtree -> parent node -> right subtree
//this will be done recursively for this example
//in the future we might use a stack or other structure to implement this functionality, havent made my mind up yet
//therefore for this to work, we need to go all the way left (the smallest value), THEN PRINT
if (tree == NULL || tree->root == NULL) return; //invalid tree pointer, early exit dont even try and print
print_sub_tree_in_order(tree->root);
printf("\n");
}
And here is what we get when we benchmark our tree structure:
=== Binary Search Tree Benchmark ===
Real implementation timing (includes malloc/free)
=============================
BENCH N=1000
=============================
--- CASE: SORTED INSERT (worst-case shape) ---
insert worst O(N)/op | 0.819 ms | 818.6 ns/op
search worst O(N)/op | 914.651 ms | 457.3 ns/op (REPS=2000000)
delete worst O(N)/op | 0.008 ms | 8.1 ns/op (OPS=1000)
--- CASE: PSEUDO-RANDOM INSERT (average-ish shape) ---
insert avg ~O(log N)/op | 0.055 ms | 54.8 ns/op
search avg ~O(log N)/op | 89.296 ms | 44.6 ns/op (REPS=2000000)
delete avg ~O(log N)/op | 0.069 ms | 69.0 ns/op (OPS=1000)
=============================
BENCH N=3000
=============================
--- CASE: SORTED INSERT (worst-case shape) ---
insert worst O(N)/op | 5.284 ms | 1761.4 ns/op
search worst O(N)/op | 3737.766 ms | 1868.9 ns/op (REPS=2000000)
delete worst O(N)/op | 0.021 ms | 7.0 ns/op (OPS=3000)
--- CASE: PSEUDO-RANDOM INSERT (average-ish shape) ---
insert avg ~O(log N)/op | 0.157 ms | 52.4 ns/op
search avg ~O(log N)/op | 121.852 ms | 60.9 ns/op (REPS=2000000)
delete avg ~O(log N)/op | 0.237 ms | 78.9 ns/op (OPS=3000)
=============================
BENCH N=10000
=============================
--- CASE: SORTED INSERT (worst-case shape) ---
insert worst O(N)/op | 99.876 ms | 9987.6 ns/op
search worst O(N)/op | 16909.552 ms | 8454.8 ns/op (REPS=2000000)
delete worst O(N)/op | 0.088 ms | 8.8 ns/op (OPS=10000)
--- CASE: PSEUDO-RANDOM INSERT (average-ish shape) ---
insert avg ~O(log N)/op | 0.661 ms | 66.1 ns/op
search avg ~O(log N)/op | 152.629 ms | 76.3 ns/op (REPS=2000000)
delete avg ~O(log N)/op | 0.977 ms | 97.7 ns/op (OPS=10000)
Some Important Notes on the Benchmark
Throughout this entire blog post I have hinted at the idea of a balanced tree. This tree we implemented is not “balanced” by any means depending on how you insert elements. If we were to insert elements that are already in order, we literally would have a BST that looks like a linked list at a 45 degree angle:
Insert: 1, 2, 3, 4, 5
1
\
2
\
3
\
4
\
5
Pretty funny if you ask me, and the result of this is that our complexity table for operations ends up being identical to a linked list. The REAL power comes when we insert the values in a random order:
Insert Order: 3, 1, 5, 2, 4
3
/ \
1 5
\ /
2 4
I apologize for the really bad formatting on that above tree, but it should get the point across that the operations on this tree are faster than our unbalanced tree from before. It results in O(log(n)) complexity at best, and at worst, O(n) complexity since it essentially is a linked list.
So… what if there was a way to insert into a tree to force it to be balanced? There is (AVL tree), and that is going to be our next post which will be delayed since the Mardi Gras season is now upon us. For those of you unaware, Mardi Gras is Louisiana’s excuse to drop everything we are doing and party for a week straight 🙂
Leave a Reply