Posts:

  • Engineering Real Roll-Marks Using My Laser Engraver

    Well, I survived Mardi Gras, and in lieu of writing another post about C data structures, I have decided to showcase some code I wrote to simulate roll-markings using my MOPA laser engraver.


    Why Standard Grayscale Engraving Fails — and How Geometry Fixes It

    When engraving roll-marks with a laser, the common workflow is simple:

    • White = no cut
    • Black = maximum cut
    • Midtones = proportional depth

    This works for artistic relief.
    It does not produce authentic roll mark geometry.

    The issue isn’t power.
    It’s shape.

    Here is a photo of an authentic roll-mark:

    And here are the ACTUAL dies used to produce said roll-mark:

    And finally, here is a photo of an MP5 build I engraved:

    And here is where our problem begins…


    The Hidden Assumption Behind Grayscale Engraving

    Most grayscale workflows assume:

    Pixel intensity directly maps to depth.

    But that mapping produces a depth field — not controlled wall geometry.

    If you engrave a solid black shape using standard grayscale or binary engraving, the cross-section looks like this:

    The walls are vertical.
    The edge is a cliff.
    The geometry is boxy.

    That is not what a real stamped roll mark looks like.


    What a Real Roll-Mark Looks Like

    Real stamped or roll-marked engravings have controlled wall taper.

    Typical cross-section:

    Or more accurately, a blended geometry:

    The wall is not purely vertical.
    There is a transition from edge to interior.

    That transition affects:

    • Edge sharpness
    • Shadow behavior
    • Light reflection
    • Perceived depth
    • Authenticity

    Laser engraving often fails here because as stated before, it gives us flat walls that don’t really give the effect of an authentic roll-mark.


    Why Grayscale Alone Cannot Fix This

    You can attempt to fake taper using gradients in the artwork itself.

    But that has problems:

    • Gradient width does not scale with stroke thickness.
    • Depth becomes power-curve dependent.
    • You lose geometric consistency in millimeters.
    • Complex shapes produce unpredictable transitions.

    The fundamental issue:

    Grayscale encoding does not know where the edge of the shape is.

    It only knows brightness.

    To control wall shape, we need edge-aware depth generation.


    Distance Transform as a Geometric Tool

    Instead of mapping pixel intensity to depth, we can compute:

    For every pixel inside the shape:
    Distance to the nearest boundary.

    That gives us a scalar field that describes how far each point is from the edge.

    If we normalize that distance over a defined wall width, we can generate a linear inward ramp.

    Mathematically:

    • Let d(x,y) be distance to nearest edge.
    • Let w be wall width in pixels.
    • Ramp factor t = clamp(d / w, 0, 1)
    • Depth fraction = 1 - t

    Result:

    • Edge pixels stay shallow.
    • Interior pixels reach full depth.
    • Transition width is controlled in millimeters.

    This produces a physically meaningful wall taper.


    Comparing Profiles

    1. Binary Engraving

    2. Distance-Based Ramp

    3. Square + Ramp Blen

    The third profile most closely resembles real roll mark geometry.

    It preserves a defined vertical component while adding a controlled taper.


    Controlling Geometry in Real Units

    One of the key improvements is unit consistency.

    Instead of defining ramp width in pixels, we define:

    Wall width in millimeters.

    Using export DPI:

    mm = (pixels / DPI) * 25.4

    That means:

    • Ramp geometry scales correctly.
    • Stroke thickness does not break taper width.
    • The profile is material-aware.

    Now geometry is independent of artwork resolution and this makes our code a lot easier to reason about.


    Blending Straight and Sloped Depth

    Real markings often have a square drop before taper.

    We can represent total depth as:

    Total depth = straight_wall_depth + sloped_wall_depth

    We generate two images:

    1. Full-depth mask (binary ink)
    2. Sloped ramp (distance-based)

    Then blend them proportionally:

    w_full = straight / (straight + sloped)

    The result is a controllable square-on-trapezoid profile.


    Why This Matters Visually

    When you engrave metal:

    • Light grazes edges.
    • Micro-shadowing enhances perception.
    • Taper affects highlight width.

    A vertical wall reflects differently than a tapered wall.

    Even small taper changes dramatically improve authenticity.

    The difference is subtle in isolation, but obvious side-by-side.


    Depth Calibration and Pass Control

    Laser depth is rarely linear.

    Instead of relying on grayscale power mapping, we treat the grayscale output as:

    A heightmap target.

    Then calculate required passes using:

    passes = ceil(total_depth_mm / mm_per_pass)

    This decouples geometry from laser power tuning.

    Now:

    • Geometry is computed.
    • Laser simply executes depth.

    And from there, Lightburn (the software that most people use to engrave with Galvo lasers) handles the rest.


    Github Repo:

    To see the ACTUAL code that does all of this, click here

    Yes, it is in python. The good thing about python is that its image processing libraries are really nice to work with and for that reason and that reason alone I decided to implement all of this math and logic in python. It also made getting to an MVP incredibly quick. All of this took me about a day of reasoning (lots of whiteboard shuffling and schizo lecturing to my girlfriend) and then about 2 hours to write the actual code. Put any other problem in front of me that requires actual speed and complexity and I for sure wouldn’t get within 10 feet of python.

    You will also notice the lack of results photos. This is not because the code does not work, but because if I post a copy of a Colt roll-mark I did for a customer, I will get a very prompt C&D from Colt 🙂

  • Data Structure Every Day, Day 6: Binary Search Tree

    This is the portion of data structures and algorithms that I got completely lost on as a third year ECE major. It was also THE FIRST thing they taught us in that class (why they wouldn’t start with linear data structures like arrays and lists is beyond me). Fortunately there are TONS of resources to brush up on tree data structures and thus the implementation of this was not as painful as I thought it was going to be.


    Linear vs Tree Data Structures — Why Shape Matters

    Up until now, every data structure we’ve implemented has been linear.
    Linked lists, stacks, queues, and deques all share one core idea:

    There is exactly one path through the data.

    If you imagine them visually, they look like a chain:

    [ A ] -> [ B ] -> [ C ] -> [ D ] -> [ E ] // should remind you of our previous linked list implementations 

    Each element knows about at most one next element (and sometimes one previous element).
    This makes them simple and predictable, but it also limits how efficiently we can search them.

    If you want to find something in a linked list, you usually have to walk element by element:

    Worst case:

    O(N) // for other ops as we have shown, we can augment the linked list data structure with a tail or previous pointer to make other ops O(1)

    This is fine for small datasets, but once data gets large, linear traversal becomes expensive.


    Enter Tree Data Structures

    Tree data structures break away from the idea of a single path.

    Instead of each element pointing to just one next element, tree elements can branch (hence the tree name):

            A
           / \
          B   C
         / \
        D   E

    The advantage here should become pretty clear. Let’s see we want to find the E Node. For our linked list, that means walking the entire list until we hit the E node. That is 5 ops. For this tree above, we simply visit A, then B, then we hit E (how this actually occurs will become more clear as we step into the specific definition of what makes a binary search tree a binary search tree).


    Introducing the Binary Search Tree (BST)

    So, how does that specific case above actually occur and optimize our search for a node? A Binary Search Tree is a specific kind of tree with two rules:

    Rule 1 — Binary

    Each node has at most two children:

    left child
    right child

    Rule 2 — Ordered (This Is The Important Part)

    For every node:

    left subtree  <  node value  <  right subtree // for out letters, it means being in alphabetical order. for numbers, we can just use more than or less than operators

    It is this second rule that really solidifies why our search for E case is faster for a BST vs a linear data structure.

    Another Example:

    Search example:

            50
           /  \
         30    70
        / \   / \
       20 40 60 80

    Everything left of 50 is smaller.
    Everything right of 50 is larger.

    This rule is applied recursively throughout the tree.

    So lets say we want to look for the value 60 in this tree:

    Start at 50 → 60 > 50 → go right
    At 70 → 60 < 70 → go left
    Found 60

    Instead of checking every element, you eliminate half the tree at each step (in ideal cases where we have a balanced tree).


    The Three Core BST Operations

    BSTs are defined primarily by three operations:

    Insert

    Place a new value in the correct position while maintaining ordering.


    Search

    Find whether a value exists by following left/right decisions.


    Delete

    Remove a value while preserving BST structure.

    This is the hardest operation because removing nodes can break ordering if done incorrectly.


    The Hidden Fourth Operation: Traversal

    Traversal isn’t always listed as a “core BST operation,” but it is extremely important.

    Traversal means visiting every node in a specific order.

    For BSTs, the most important traversal is:

    In-Order Traversal

    Left → Node → Right

    This prints BST values in sorted order.

    This is one of the biggest reasons BSTs are useful.


    Important Reality: BSTs Are Not Always Fast

    BST performance depends on tree shape.

    If the tree stays balanced:

    Search ≈ O(log N)

    If the tree becomes skewed (like inserting sorted data):

    Search ≈ O(N)

    Worst case BST becomes a linked list.

    This is exactly what our benchmarks later will demonstrate.


    The Actual Challenge: Implementation in C:

    So like before, I will show you the header file and then we will discuss each function. This code is FULL of comments since the concepts here are sometimes a bit rough to grasp (or at least I found them hard to grasp upon first look):

    bst.h

    #ifndef BST_H
    #define BST_H
    
    #include <stdio.h>
    #include <stdlib.h>
    
    struct Node {
    
            int key;
            struct Node* l;
            struct Node* r;
    };
    
    struct BST {
            struct Node* root;
    };
    
    struct Node* alloc_node(int key);
    
    int insert(int key, struct BST* tree);
    
    struct Node* search(int key, struct BST* tree);
    
    struct Node** search_for_delete(int key, struct BST* tree);
    
    int delete(int key, struct BST* tree);
    
    void free_subtree(struct Node* node);
    
    void destroy_bst(struct BST* tree);
    
    void print_sub_tree_in_order(struct Node* node);
    
    void print_tree_in_order(struct BST* tree);
    
    #endif

    bst.c

    #include "bst.h"
    
    struct Node* alloc_node(int key) {
        struct Node* new_node = malloc(sizeof(struct Node));
        if (!new_node) return NULL; //malloc failed, return NULL
        new_node->key = key;
        new_node->l = NULL;
        new_node->r = NULL;
        return new_node;
    }
    
    int insert(int key, struct BST* tree) {
    
        if (!tree) return 1; //error, bst pointer is invalid!
    
        //first, we handle the case where we are inserting into an empty tree
        if (!tree->root) {
            struct Node* new_root = alloc_node(key);
            tree->root = new_root;
            return 0;
        }
    
        //okay so now, if the tree is not empty, we have to traverse the tree
        //to start, we grab the current root node
        struct Node* curr_node = tree->root;
    
        while (1) {
            if (key < curr_node->key) {
                if (curr_node->l == NULL) {
                    struct Node* new_node = alloc_node(key);
                    if (!new_node) return 1; //node allocation failed
                    curr_node->l = new_node;
                    return 0;
                }
                curr_node = curr_node->l; //if the node is populated, we keep traversing
            }
            else if (key > curr_node->key) {
                if (curr_node->r == NULL) {
                    struct Node* new_node = alloc_node(key);
                    if (!new_node) return 1; //node allocation failed
                    curr_node->r = new_node;
                    return 0;
                }
                curr_node = curr_node->r; //if the node is populated, we keep traversing
            }
            else {
                //the key is equal... we will exlcude duplicate
                return 2; //duplicate error code!
            }
        }
    
        return 0;
    }
    
    struct Node* search(int key, struct BST* tree) {
    
        if (!tree || !tree->root) return NULL; //invalid tree pointer or tree is empty
    
        struct Node* curr_node = tree->root;
    
        while (1) {
            if (key < curr_node->key) { //if less, we need to search down the left node
                if (curr_node->l == NULL) return NULL; //this means there is no left node, which means node is not in the tree, return NULL
                curr_node = curr_node->l;
            }
            else if (key > curr_node->key) { //if more, we need to search down the right node
                if (curr_node->r == NULL) return NULL; //same as above, just for the right node
                curr_node = curr_node->r;
            }
            else {
                return curr_node;
            }
        }
    
        return NULL; //node not found, return NULL
    }
    
    struct Node** search_for_delete(int key, struct BST* tree) {
    
        if (!tree || !tree->root) return NULL;
    
        struct Node** curr_node = &tree->root;
    
        while (1) {
            if (key < (*curr_node)->key) {
                if ((*curr_node)->l == NULL) return NULL;
                curr_node = &(*curr_node)->l;
            }
            else if (key > (*curr_node)->key) {
                if ((*curr_node)->r == NULL) return NULL;
                curr_node = &(*curr_node)->r;
            }
            else {
                return curr_node;
            }
        }
    
        return NULL;
    }
    
    int delete(int key, struct BST* tree) {
    
        struct Node** node_link = search_for_delete(key, tree);
        if (!node_link) return 1; //node not found, return 1
    
        struct Node* node = *node_link;
    
        //this node is leaf node, deletion is easy
        if (!node->l && !node->r) {
            *node_link = NULL;
            free(node);
            return 0;
        }
        else if (node->l && !node->r) { //only left node
            *node_link = node->l;
            free(node);
            return 0;
        }
        else if (!node->l && node->r) { //only right node
            *node_link = node->r;
            free(node);
            return 0;
        }
        else if (node->l && node->r) {
    
            //if node has both children, it gets complicated, we need to preserve the correctness of the tree
            //to do this, we have to use a little trick
            //its called successor (another apporach is precessor)
            //we take the node to be deleted (which in this case, is *node_link or node as shown above)
            //then, we go down the RIGHT subtree of that node
            //once we go down the RIGHT subtree, we find the minimum node of that right subtree
            //since we know this subtree is valid (or we at least assume it is), we simply keep following the left nodes
            //once we hit the last left node, IE the mimimum, we stop
            //using that miminum value (the key of the node), we insert that value into the node we wish to delete
            //so now, instead of free(node), we ACTUALLY free the minimum node of the right subtree (the node we stole the key from)
    
            //so, grab the right node
            struct Node** curr_node_link = &node->r;
    
            //then, we iterate until we hit the minimum value by going deeper and deeper down the left path
            while ((*curr_node_link)->l != NULL) {
                curr_node_link = &(*curr_node_link)->l;
            }
    
            //the node we wish to delete is not deleted
            //instead we copy the value of minimum node into the node we wish to delete
            struct Node* successor = *curr_node_link;
            node->key = successor->key;
    
            //if our successor node has a right subtree, we need to make sure it stays connected to our BST
            *curr_node_link = successor->r;
    
            //THEN, we actually delete the minimum node
            free(successor);
            return 0;
        }
    
        return 0;
    }
    
    void free_subtree(struct Node* node) {
    
        if (node == NULL) return;
    
        free_subtree(node->l);
        free_subtree(node->r);
        free(node);
    }
    
    void destroy_bst(struct BST* tree) {
    
        if (tree == NULL) return;
    
        free_subtree(tree->root);
        tree->root = NULL;
    }
    
    void print_sub_tree_in_order(struct Node* node) {
    
        if (node == NULL) return; //we have reached the end, early exit
    
        print_sub_tree_in_order(node->l);
        printf("%d ", node->key);
        print_sub_tree_in_order(node->r);
    }
    
    //and to close, we are going to write a function that traverses the tree in order and prints the keys
    //doing this should just print a sorted list of numbers from low to high!
    //that is what makes BSTs so useful, their definition forces useful properties
    void print_tree_in_order(struct BST* tree) {
    
        //to do this, we need print in this order:
        //left subtree -> parent node -> right subtree
        //this will be done recursively for this example
        //in the future we might use a stack or other structure to implement this functionality, havent made my mind up yet
    
        //therefore for this to work, we need to go all the way left (the smallest value), THEN PRINT
    
        if (tree == NULL || tree->root == NULL) return; //invalid tree pointer, early exit dont even try and print
    
        print_sub_tree_in_order(tree->root);
        printf("\n");
    }

    And here is what we get when we benchmark our tree structure:

    === Binary Search Tree Benchmark ===
    Real implementation timing (includes malloc/free)
    
    =============================
    BENCH N=1000
    =============================
    
    --- CASE: SORTED INSERT (worst-case shape) ---
    insert       worst O(N)/op        |      0.819 ms |      818.6 ns/op
    search       worst O(N)/op        |    914.651 ms |      457.3 ns/op (REPS=2000000)
    delete       worst O(N)/op        |      0.008 ms |        8.1 ns/op (OPS=1000)
    
    --- CASE: PSEUDO-RANDOM INSERT (average-ish shape) ---
    insert       avg ~O(log N)/op     |      0.055 ms |       54.8 ns/op
    search       avg ~O(log N)/op     |     89.296 ms |       44.6 ns/op (REPS=2000000)
    delete       avg ~O(log N)/op     |      0.069 ms |       69.0 ns/op (OPS=1000)
    
    =============================
    BENCH N=3000
    =============================
    
    --- CASE: SORTED INSERT (worst-case shape) ---
    insert       worst O(N)/op        |      5.284 ms |     1761.4 ns/op
    search       worst O(N)/op        |   3737.766 ms |     1868.9 ns/op (REPS=2000000)
    delete       worst O(N)/op        |      0.021 ms |        7.0 ns/op (OPS=3000)
    
    --- CASE: PSEUDO-RANDOM INSERT (average-ish shape) ---
    insert       avg ~O(log N)/op     |      0.157 ms |       52.4 ns/op
    search       avg ~O(log N)/op     |    121.852 ms |       60.9 ns/op (REPS=2000000)
    delete       avg ~O(log N)/op     |      0.237 ms |       78.9 ns/op (OPS=3000)
    
    =============================
    BENCH N=10000
    =============================
    
    --- CASE: SORTED INSERT (worst-case shape) ---
    insert       worst O(N)/op        |     99.876 ms |     9987.6 ns/op
    search       worst O(N)/op        |  16909.552 ms |     8454.8 ns/op (REPS=2000000)
    delete       worst O(N)/op        |      0.088 ms |        8.8 ns/op (OPS=10000)
    
    --- CASE: PSEUDO-RANDOM INSERT (average-ish shape) ---
    insert       avg ~O(log N)/op     |      0.661 ms |       66.1 ns/op
    search       avg ~O(log N)/op     |    152.629 ms |       76.3 ns/op (REPS=2000000)
    delete       avg ~O(log N)/op     |      0.977 ms |       97.7 ns/op (OPS=10000)
    

    Some Important Notes on the Benchmark

    Throughout this entire blog post I have hinted at the idea of a balanced tree. This tree we implemented is not “balanced” by any means depending on how you insert elements. If we were to insert elements that are already in order, we literally would have a BST that looks like a linked list at a 45 degree angle:

    Insert: 1, 2, 3, 4, 5
    
    1
     \
      2
       \
        3
         \
          4
           \
            5

    Pretty funny if you ask me, and the result of this is that our complexity table for operations ends up being identical to a linked list. The REAL power comes when we insert the values in a random order:

    Insert Order: 3, 1, 5, 2, 4
            3
           / \
          1    5
          \   /
           2  4

    I apologize for the really bad formatting on that above tree, but it should get the point across that the operations on this tree are faster than our unbalanced tree from before. It results in O(log(n)) complexity at best, and at worst, O(n) complexity since it essentially is a linked list.

    So… what if there was a way to insert into a tree to force it to be balanced? There is (AVL tree), and that is going to be our next post which will be delayed since the Mardi Gras season is now upon us. For those of you unaware, Mardi Gras is Louisiana’s excuse to drop everything we are doing and party for a week straight 🙂

  • Data Structure Every Day, Day 5: Queue and Deque

    In my last post, I showed how a simple Stack data structure could be implemented by simply reusing our already existing tailless linked list code. In today’s entry, we are are going to knock out TWO data structures following the same approach. The queue and the deque.


    Little Refresher:

    For those unfamiliar, here is a general look at how these data structures compare to each other:

    Stack:

    One end only → LIFO

    Queue:

    Insert at tail
    Remove at head → FIFO

    Deque:

    Insert/remove at both ends

    What Is a Queue?

    A queue follows a simple rule:

    First In, First Out (FIFO)

    Example:

    enqueue A
    enqueue B
    enqueue C
    
    dequeue → A
    dequeue → B
    dequeue → C

    Queue API

    Typical queue operations:

    enqueue()
    dequeue()
    front()
    is_empty()

    What Is a Deque?

    Deque = Double Ended Queue

    You can:

    push_front()
    push_back()
    pop_front()
    pop_back()

    Why Our Previous Linked Lists Are Perfect For These

    Let’s think in terms of requirements.

    Queue Needs

    Add at tail → O(1)
    Remove at head → O(1)

    That means we need:

    head pointer
    tail pointer

    Ring a bell? That is our tailed link list implementation, which can be found here.


    Deque Needs

    Remove at tail → O(1)

    To do that efficiently, we need:

    prev pointer

    Our previous doubly linked list implementation fulfills this need! That can be found here.


    Implementation Strategy

    Just like with our stack implementation, we are simply writing wrappers that use our pre-existing tailed linked list and doubly linked list code:

    queue.h

    #ifndef QUEUE_H
    #define QUEUE_H
    
    #include "linked_list.h"
    #include <stddef.h>
    
    struct Queue {
            struct LinkedList list; 
            size_t size;            
    };
    
    
    void queue_init(struct Queue* q);
    void queue_destroy(struct Queue* q); 
    
    
    int queue_enqueue(struct Queue* q, struct Node* n); 
    int queue_dequeue(struct Queue* q);                 
    struct Node* queue_peek(const struct Queue* q);     
    
    int queue_is_empty(const struct Queue* q);
    size_t queue_size(const struct Queue* q);
    
    #endif

    queue.c

    #include "queue.h"
    
    void queue_init(struct Queue* q) {
            if (!q) return;
            list_init(&q->list);
            q->size = 0;
    }
    
    void queue_destroy(struct Queue* q) {
            if (!q) return;
            free_list(&q->list); /* frees nodes, leaves list usable */
            q->size = 0;
    }
    
    /*
     * enqueue = push_back
     */
    int queue_enqueue(struct Queue* q, struct Node* n) {
            if (!q || !n) return -1;
            push_back(&q->list, n);
            q->size++;
            return 0;
    }
    
    /*
     * dequeue = pop_front
     */
    int queue_dequeue(struct Queue* q) {
            if (!q) return -1;
            if (q->list.head == NULL) return -1;
    
            pop_front(&q->list);
            q->size--;
            return 0;
    }
    
    struct Node* queue_peek(const struct Queue* q) {
            if (!q) return NULL;
            /* front() takes non-const in your header, so cast is easiest */
            return front((struct LinkedList*) &q->list);
    }
    
    int queue_is_empty(const struct Queue* q) {
            return (!q || q->list.head == NULL);
    }
    
    size_t queue_size(const struct Queue* q) {
            return q ? q->size : 0;
    }

    deque.h

    #ifndef DEQUE_H
    #define DEQUE_H
    
    #include <stddef.h>
    #include "linked_list.h"
    
    struct Deque {
            struct LinkedList list; 
            size_t size;            
    };
    
    
    void deque_init(struct Deque* d);
    void deque_destroy(struct Deque* d); 
    
    int deque_push_front(struct Deque* d, struct Node* n); 
    int deque_push_back(struct Deque* d, struct Node* n); 
    
    int deque_pop_front(struct Deque* d); 
    int deque_pop_back(struct Deque* d);  
    
    struct Node* deque_peek_front(const struct Deque* d);
    struct Node* deque_peek_back(const struct Deque* d);
    
    int deque_is_empty(const struct Deque* d);
    size_t deque_size(const struct Deque* d);
    
    #endif

    deque.c

    #include "deque.h"
    
    void deque_init(struct Deque* d) {
            if (!d) return;
            list_init(&d->list);
            d->size = 0;
    }
    
    void deque_destroy(struct Deque* d) {
            if (!d) return;
            free_list(&d->list); 
            d->size = 0;
    }
    
    int deque_push_front(struct Deque* d, struct Node* n) {
            if (!d || !n) return -1;
            push_front(&d->list, n);
            d->size++;
            return 0;
    }
    
    int deque_push_back(struct Deque* d, struct Node* n) {
            if (!d || !n) return -1;
            push_back(&d->list, n);
            d->size++;
            return 0;
    }
    
    int deque_pop_front(struct Deque* d) {
            if (!d) return -1;
            if (d->list.head == NULL) return -1;
    
            pop_front(&d->list);
            d->size--;
            return 0;
    }
    
    int deque_pop_back(struct Deque* d) {
            if (!d) return -1;
            if (d->list.tail == NULL) return -1;
    
            pop_back(&d->list);
            d->size--;
            return 0;
    }
    
    struct Node* deque_peek_front(const struct Deque* d) {
            if (!d) return NULL;
            return front((struct LinkedList*) &d->list);
    }
    
    struct Node* deque_peek_back(const struct Deque* d) {
            if (!d) return NULL;
            return back((struct LinkedList*) &d->list);
    }
    
    int deque_is_empty(const struct Deque* d) {
            return (!d || d->list.head == NULL);
    }
    
    size_t deque_size(const struct Deque* d) {
            return d ? d->size : 0;
    }

    The Big Takeaway

    A stack, queue, and deque are not fundamentally different storage systems when compared to their corresponding linked list implementations.

    For my next post, I promise I will NOT be hitting you with another rehash of old code 🙂

  • Data Structure Every Day, Day 4: Stack

    Over the last three posts, we created a tailless linked list, extended it into a tailed list, and finally we implemented a double linked list that solved the O(n) complexity hurdles. Each step added speed, but with that came some tradeoffs along with complexity.

    For Day 4, I have decided that the next logical step is to implement a Stack data structure, but instead of starting from raw memory and building everything from scratch, we’re going to focus on behavior and benchmarks first (this will all make sense at the end of the blog post so hang with me).


    What Is a Stack?

    A stack is one of the simplest and most fundamental data structures in computer science.

    It follows a single rule:

    Last In, First Out (LIFO)

    If you push items onto a stack:

    Push A
    Push B
    Push C

    Then popping removes them in reverse order:

    Pop → C
    Pop → B
    Pop → A

    Core Stack Operations

    A stack usually exposes a very small API, and we will keep the operations to just these to make our lives easy:

    push()      // Add item to top
    pop()       // Remove item from top
    peek()      // Read top without removing
    is_empty()  // Check if stack is empty

    Notice something important here:

    A stack only ever interacts with one end of the structure — the top.

    This turns out to matter a lot when choosing how to implement one. For the astute among you, you can probably see where we are going with this given our last few posts.


    Why Stacks Matter in Real Systems

    Stacks show up everywhere:

    • Function call stacks
    • Undo/redo systems
    • Expression parsing
    • DFS graph traversal
    • Embedded buffering patterns
    • Temporary working memory in algorithms


    Performance First — Benchmarking the Implementation

    Before we talk about how this stack is implemented, let’s look at real performance numbers.

    These benchmarks are:

    • push
    • peek
    • pop

    Each push allocates memory.
    Each pop frees memory.

    This is the “honest” cost of dynamic allocation, and this performance readout follows the same general structure as the ones in our last posts:

    === Stack (L****d L**t) Benchmark ===
    Real implementation timing (includes malloc/free)
    
    =============================
    BENCH N=1000
    =============================
    push         O(1)/op total O(N)   |      0.031 ms |       30.9 ns/op
    peek         O(1)/op              |      0.987 ms |        0.5 ns/op (REPS=2000000)
    pop          O(1)/op (+free)      |      0.008 ms |        8.3 ns/op
    
    =============================
    BENCH N=3000
    =============================
    push         O(1)/op total O(N)   |      0.089 ms |       29.6 ns/op
    peek         O(1)/op              |      1.112 ms |        0.6 ns/op (REPS=2000000)
    pop          O(1)/op (+free)      |      0.026 ms |        8.7 ns/op
    
    =============================
    BENCH N=10000
    =============================
    push         O(1)/op total O(N)   |      0.272 ms |       27.2 ns/op
    peek         O(1)/op              |      1.023 ms |        0.5 ns/op (REPS=2000000)
    pop          O(1)/op (+free)      |      0.087 ms |        8.7 ns/op
    
    =============================
    BENCH N=30000
    =============================
    push         O(1)/op total O(N)   |      0.699 ms |       23.3 ns/op
    peek         O(1)/op              |      0.939 ms |        0.5 ns/op (REPS=2000000)
    pop          O(1)/op (+free)      |      0.235 ms |        7.8 ns/op
    
    =============================
    BENCH N=100000
    =============================
    push         O(1)/op total O(N)   |      2.360 ms |       23.6 ns/op
    peek         O(1)/op              |      0.935 ms |        0.5 ns/op (REPS=2000000)
    pop          O(1)/op (+free)      |      0.786 ms |        7.9 ns/op
    

    Ignore the censored part of the metrics 🙂

    The key takeaways:

    OperationComplexity
    pushO(1)
    popO(1)
    peekO(1)

    This is exactly what we want from a stack, and it is also identical to the previous charts for our tailless, tailed, and doubly linked list implementations.


    The Important Engineering Idea

    Here’s the key concept for this post:

    A data structure is defined by its behavior and guarantees — not by how it is stored internally.

    If another structure already provides the complexity guarantees we need…

    We can reuse it.


    Surprise! — The Stack Is Just Our Linked List!

    This entire stack implementation is built on top of our tailless singly linked list.

    No new node structures.
    No new memory management.
    No new traversal logic.

    Just restricted access.


    Mapping Stack Operations → Linked List Operations

    StackLinked List
    pushpush_front
    poppop_front
    peekfront

    That’s it.

    A stack is literally:

    A linked list where you are only allowed to touch the head.

    Because of this, our tailless linked list fits the bill perfectly. Here is ALL the code for your reference:


    stack.h

    #ifndef STACK_H
    #define STACK_H
    
    #include "linked_list.h"
    
    // our stack is literally just wrapping our tailless linked list
    struct Stack {
            struct Node* head;
            size_t size;
    };
    
    void stack_init(struct Stack* s);
    void stack_destroy(struct Stack* s);
    
    int stack_push(struct Stack* s, struct Node* n);
    int stack_pop(struct Stack* s);
    struct Node* stack_peek(const struct Stack* s);
    
    int stack_is_empty(const struct Stack* s);
    size_t stack_size(const struct Stack* s);
    
    #endif

    stack.c

    #include "stack.h"
    
    void stack_init(struct Stack* s) {
            if (!s) return;
            s->head = NULL;
            s->size = 0;
    }
    
    void stack_destroy(struct Stack* s) {
            if (!s) return;
            list_destroy(&s->head); // uses linked_list.c
            s->size = 0;
    }
    
    int stack_push(struct Stack* s, struct Node* n) {
            if (!s || !n) return -1;
            push_front(n, &s->head); // uses linked_list.c
            s->size++;
            return 0;
    }
    
    int stack_pop(struct Stack* s) {
            if (!s || !s->head) return -1;
            pop_front(&s->head); // uses linked_list.c (and frees node)
            s->size--;
            return 0;
    }
    
    struct Node* stack_peek(const struct Stack* s) {
            if (!s) return NULL;
            return front(s->head); // uses linked_list.c
    }
    
    int stack_is_empty(const struct Stack* s) {
            return (!s || s->head == NULL);
    }
    
    size_t stack_size(const struct Stack* s) {
            return s ? s->size : 0;
    }

    And for the code we reference in the above files, you can simply visit my post on tailless linked lists.

    So, did I cheat for today by using a bunch of code I already wrote? Kind of. However, later on, I promise it will all make sense when we eventually revisit some of these data structures in the future, but that is far, far away.

  • Data Structure Every Day, Day 3: Doubly Linked List

    From Tailed Lists to Doubly Linked Lists — Killing O(n) pop_back

    In yesterday’s post, we improved a basic singly linked list by adding a tail pointer:

    struct LinkedList {
        struct Node* head;
        struct Node* tail;
    };

    That change alone gave us:

    OperationComplexity
    push_frontO(1)
    pop_frontO(1)
    frontO(1)
    push_backO(1)
    pop_backO(n)
    backO(1)

    The big remaining problem was pop_back().

    Because singly linked lists only store next, removing the tail requires walking the list to find the node before it:

    O(n) traversal → find prev → remove tail

    Today we fix that permanently.


    The Upgrade: Doubly Linked Nodes

    We add a prev pointer:

    struct Node {
        void* data;
        size_t data_size;
        struct Node* next;
        struct Node* prev;
    };
    

    Now every node knows:

    • Who comes after it
    • Who comes before it

    That single extra pointer changes everything.


    New Complexity Table

    OperationComplexity
    push_frontO(1)
    pop_frontO(1)
    frontO(1)
    push_backO(1)
    pop_backO(1)
    backO(1)

    No more traversal. Ever.


    Why pop_back Becomes O(1)

    Old singly linked list:

    HEAD → A → B → C → D → NULL
                            ↑ tail

    To remove D, we had to find C by scanning from HEAD.


    New doubly linked list:

    HEAD → A ↔ B ↔ C ↔ D ← tail

    Now:

    tail->prev = C

    So removal becomes:

    free(tail)
    tail = prev

    Constant time.


    Key Implementation Details

    push_front (Must Update Old Head’s prev)

    //onyl diff is that we enforce the standard that the head node has a NULL previous pointer and account for empty lists in a diff way
    void push_front(struct LinkedList* list, struct Node* new_head) {
            if (!list || !new_head) return;
    
            new_head->next = list->head;
            new_head->prev = NULL; //as stated in our header, head previous pointer is NULL since its the head
    
            if (list->head) {
                    list->head->prev = new_head;
            } else {
                    list->tail = new_head;
            }
    
            list->head = new_head;
    }

    pop_front (Must Handle Single Node Case)

    void pop_front(struct LinkedList* list) {
            if (!list || !list->head) return;
    
            struct Node* old = list->head;
    
            //now we grab the new head so that we can set the previous pointer to NULL
            struct Node* new_head = old->next;
    
            //this is how we handle the condition of one node
            if (new_head) {
                    new_head->prev = NULL;
                    list->head = new_head;
            } else {
                    //we rmeoved the last node, empty list
                    list->head = NULL;
                    list->tail = NULL;
            }
    
            free_node(old);
    }

    pop_back (Now Trivial)

    void pop_back(struct LinkedList* list) {
            if (!list || !list->head) return;
    
            // single node
            if (list->head == list->tail) {
                    free_node(list->head);
                    list->head = NULL;
                    list->tail = NULL;
                    return;
            }
    
            // find node right before tail - before, we had to iterate and thus it was O(n), now we can just use the previous pointer!
            struct Node* prev = list->tail->prev;
    
            free_node(list->tail);
            prev->next = NULL;
            list->tail = prev;
    }

    This now brings us to the new benchmark profile:

    === Doubly Linked List Benchmark ===
    Implementation: struct LinkedList { head, tail }
    Benchmarks: push_front, push_back, pop_front, pop_back, front, back
    Note: pop_* includes free_node() time.
    
    =============================
    BENCH N=1000
    =============================
    push_front   O(1)/op total O(N)   |      0.061 ms |       61.1 ns/op
    front        O(1)/op              |      2.876 ms |        1.4 ns/op (REPS=2000000)
    back         O(1)/op              |      2.635 ms |        1.3 ns/op (REPS=2000000)
    pop_front    O(1)/op (+free)      |      0.012 ms |       11.8 ns/op
    push_back    O(1)/op total O(N)   |      0.014 ms |       14.0 ns/op
    pop_back     O(1)/op total O(N) |      0.011 ms |       11.0 ns/op
    
    =============================
    BENCH N=3000
    =============================
    push_front   O(1)/op total O(N)   |      0.113 ms |       37.6 ns/op
    front        O(1)/op              |      1.779 ms |        0.9 ns/op (REPS=2000000)
    back         O(1)/op              |      1.611 ms |        0.8 ns/op (REPS=2000000)
    pop_front    O(1)/op (+free)      |      0.026 ms |        8.8 ns/op
    push_back    O(1)/op total O(N)   |      0.031 ms |       10.3 ns/op
    pop_back     O(1)/op total O(N) |      0.025 ms |        8.2 ns/op
    
    =============================
    BENCH N=10000
    =============================
    push_front   O(1)/op total O(N)   |      0.293 ms |       29.3 ns/op
    front        O(1)/op              |      1.614 ms |        0.8 ns/op (REPS=2000000)
    back         O(1)/op              |      1.590 ms |        0.8 ns/op (REPS=2000000)
    pop_front    O(1)/op (+free)      |      0.086 ms |        8.6 ns/op
    push_back    O(1)/op total O(N)   |      0.108 ms |       10.8 ns/op
    pop_back     O(1)/op total O(N) |      0.089 ms |        8.9 ns/op
    
    =============================
    BENCH N=30000
    =============================
    push_front   O(1)/op total O(N)   |      0.826 ms |       27.5 ns/op
    front        O(1)/op              |      1.525 ms |        0.8 ns/op (REPS=2000000)
    back         O(1)/op              |      1.613 ms |        0.8 ns/op (REPS=2000000)
    pop_front    O(1)/op (+free)      |      0.251 ms |        8.4 ns/op
    push_back    O(1)/op total O(N)   |      0.300 ms |       10.0 ns/op
    pop_back     O(1)/op total O(N) |      0.264 ms |        8.8 ns/op
    
    =============================
    BENCH N=100000
    =============================
    push_front   O(1)/op total O(N)   |      2.745 ms |       27.5 ns/op
    front        O(1)/op              |      1.530 ms |        0.8 ns/op (REPS=2000000)
    back         O(1)/op              |      1.553 ms |        0.8 ns/op (REPS=2000000)
    pop_front    O(1)/op (+free)      |      0.854 ms |        8.5 ns/op
    push_back    O(1)/op total O(N)   |      1.109 ms |       11.1 ns/op
    pop_back     O(1)/op total O(N) |      0.863 ms |        8.6 ns/op
    
    TOTAL TIME: 26.819 ms

    Which compared to yesterday’s post, is WAY faster, but this comes at a cost:

    • We now store an extra pointer for every node

    The classic trade-off of allocating more memory to save CPU time… this is a tradeoff that you will run into in just about every piece of code you write. In this instance, the speed is well worth it however.

  • Data Structure Every Day, Day 2: Linked List with Tail!

    Yesterday we went over a basic implementation of a Linked List that lacked a tail. In other words, the entire list was implicitly stored through the use of a single head node and all subsequent list operations used this head pointer. From this, we derived a complexity table:


    Complexity Table (Tailless List)

    OperationComplexity
    push_frontO(1)
    pop_frontO(1)
    frontO(1)
    push_backO(n)
    pop_backO(n)
    backO(n)

    Also in that last post, I hinted at the idea of removing some of the O(n) complexity entries in this table by implementing a linked list that stores the tail node of the list. If we store a tail of the list, we no longer have to walk the entire list to perform push_back and back functionality.

    We do this by removing the implicit linked list by head paradigm and instead store the linked list as a struct that contains a head node (just like before), but also a tail node! We end up getting code that looks like this:

    #ifndef LINKED_LIST_H
    #define LINKED_LIST_H
    
    #include <stddef.h>
    
    struct Node {
            void* data;
            size_t data_size;
            struct Node* next;
    };
    
    // Caller-owned list container (stack-allocated is recommended)
    struct LinkedList {
            struct Node* head;
            struct Node* tail;
    };
    
    // Node allocation / creation
    struct Node* alloc_node(size_t data_size);
    struct Node* create_node_from_data(void* data, size_t data_size);
    
    // Freeing
    void free_node(struct Node* node);
    
    // List operations
    void list_init(struct LinkedList* list);
    void free_list(struct LinkedList* list); // frees all nodes, leaves list usable
    
    void push_front(struct LinkedList* list, struct Node* new_head);
    void pop_front(struct LinkedList* list);
    
    void push_back(struct LinkedList* list, struct Node* new_tail);
    void pop_back(struct LinkedList* list);
    
    struct Node* front(struct LinkedList* list);
    struct Node* back(struct LinkedList* list);
    
    #endif
    

    Of note is that in this declaration file, we have the EXACT same node structure as before, but the overall LinkedList datatype represents not only a head node but also a tail node. Because of this, the shape of our functions change a fair bit. We no longer pass a head pointer around to refer to the entire list. Instead, we pass the pointer pointing to the LinkedList struct and perform operations on this structure instead. The resulting implementation code looks like this:

    #include "linked_list.h"
    #include <stdlib.h>
    #include <string.h>
    
    struct Node* alloc_node(size_t data_size) {
            struct Node* node = malloc(sizeof *node);
            if (!node) return NULL;
    
            node->data = malloc(data_size);
            if (!node->data) {
                    free(node);
                    return NULL;
            }
    
            node->data_size = data_size;
            node->next = NULL;
            return node;
    }
    
    struct Node* create_node_from_data(void* data, size_t data_size) {
            struct Node* node = alloc_node(data_size);
            if (!node) return NULL;
            memcpy(node->data, data, data_size);
            return node;
    }
    
    void free_node(struct Node* node) {
            if (!node) return;
    
            if (node->data) {
                    free(node->data);
                    node->data = NULL;
            }
    
            free(node);
    }
    
    void list_init(struct LinkedList* list) {
            if (!list) return;
            list->head = NULL;
            list->tail = NULL;
    }
    
    void free_list(struct LinkedList* list) {
            if (!list) return;
    
            struct Node* cur = list->head;
            while (cur) {
                    struct Node* next = cur->next;
                    free_node(cur);
                    cur = next;
            }
    
            list->head = NULL;
            list->tail = NULL;
    }
    
    void push_front(struct LinkedList* list, struct Node* new_head) {
            if (!list || !new_head) return;
    
            new_head->next = list->head;
            list->head = new_head;
    
            // if list was empty, tail must also point to the new node
            if (list->tail == NULL) {
                    list->tail = new_head;
            }
    }
    
    void pop_front(struct LinkedList* list) {
            if (!list || !list->head) return;
    
            struct Node* old = list->head;
            list->head = old->next;
    
            // if we removed the last node, tail must become NULL too
            if (list->head == NULL) {
                    list->tail = NULL;
            }
    
            free_node(old);
    }
    
    void push_back(struct LinkedList* list, struct Node* new_tail) {
            if (!list || !new_tail) return;
    
            new_tail->next = NULL;
    
            // empty list
            if (list->head == NULL) {
                    list->head = new_tail;
                    list->tail = new_tail;
                    return;
            }
    
            // non-empty list: O(1) append
            list->tail->next = new_tail;
            list->tail = new_tail;
    }
    
    void pop_back(struct LinkedList* list) {
            if (!list || !list->head) return;
    
            // single node
            if (list->head == list->tail) {
                    free_node(list->head);
                    list->head = NULL;
                    list->tail = NULL;
                    return;
            }
    
            // find node right before tail - this is where we cannot reduce down to O(1) unfortunately unless we implemeneted a doubly linked list (next thing to do)
            struct Node* prev = list->head;
            while (prev->next != list->tail) {
                    prev = prev->next;
            }
    
            free_node(list->tail);
            prev->next = NULL;
            list->tail = prev;
    }
    
    struct Node* front(struct LinkedList* list) {
            if (!list) return NULL;
            return list->head;
    }
    
    struct Node* back(struct LinkedList* list) {
            if (!list) return NULL;
            return list->tail;
    }

    The changes to pay attention to:

    Of note are the “back” and “push_back” function. Now, instead of having to traverse the entire list which results in O(n) complexity, we simply use our tail pointer to implement the same functionality. The result is that we end up with a complexity chart that looks like this:

    Complexity Table (Tailed Singly Linked List)

    OperationComplexity
    push_frontO(1)
    pop_frontO(1)
    frontO(1)
    push_backO(1)
    pop_backO(n)
    backO(1)

    I have highlighted the differences, and below we have a simple profiling test bed that I ran to compare the real world speed differences of the implementations:

    Tailless Linked List Implementation Benchmark:

    === Tailless Linked List Benchmark ===
    Real implementation timing (includes malloc/free)
    
    =============================
    BENCH N=1000
    =============================
    push_front   O(1)/op total O(N)   |      0.060 ms |       60.3 ns/op
    front        O(1)/op              |      3.136 ms |        1.6 ns/op (REPS=2000000)
    back         O(N)/op              |      7.174 ms |     3587.1 ns/op (REPS=2000)
    pop_front    O(1)/op (+free)      |      0.008 ms |        8.5 ns/op
    push_back    O(N)/op total O(N^2) |      0.904 ms |      903.8 ns/op
    pop_back     O(N)/op total O(N^2) |      0.930 ms |      930.4 ns/op
    
    =============================
    BENCH N=3000
    =============================
    push_front   O(1)/op total O(N)   |      0.090 ms |       29.9 ns/op
    front        O(1)/op              |      1.770 ms |        0.9 ns/op (REPS=2000000)
    back         O(N)/op              |     14.655 ms |     7327.6 ns/op (REPS=2000)
    pop_front    O(1)/op (+free)      |      0.025 ms |        8.2 ns/op
    push_back    O(N)/op total O(N^2) |     12.537 ms |     4178.9 ns/op
    pop_back     O(N)/op total O(N^2) |      9.181 ms |     3060.4 ns/op
    
    =============================
    BENCH N=10000
    =============================
    push_front   O(1)/op total O(N)   |      0.300 ms |       30.0 ns/op
    front        O(1)/op              |      1.756 ms |        0.9 ns/op (REPS=2000000)
    back         O(N)/op              |     23.471 ms |    11735.7 ns/op (REPS=2000)
    pop_front    O(1)/op (+free)      |      0.117 ms |       11.7 ns/op
    push_back    O(N)/op total O(N^2) |    124.918 ms |    12491.8 ns/op
    pop_back     O(N)/op total O(N^2) |    111.918 ms |    11191.8 ns/op
    
    =============================
    BENCH N=30000
    =============================
    push_front   O(1)/op total O(N)   |      0.772 ms |       25.7 ns/op
    front        O(1)/op              |      1.740 ms |        0.9 ns/op (REPS=2000000)
    back         O(N)/op              |     95.027 ms |    47513.3 ns/op (REPS=2000)
    pop_front    O(1)/op (+free)      |      0.441 ms |       14.7 ns/op
    push_back    O(N)/op total O(N^2) |   1301.376 ms |    43379.2 ns/op
    pop_back     O(N)/op total O(N^2) |   1083.715 ms |    36123.8 ns/op
    
    =============================
    BENCH N=100000
    =============================
    push_front   O(1)/op total O(N)   |      2.597 ms |       26.0 ns/op
    front        O(1)/op              |      1.755 ms |        0.9 ns/op (REPS=2000000)
    back         O(N)/op              |    291.028 ms |   145514.2 ns/op (REPS=2000)
    pop_front    O(1)/op (+free)      |      1.592 ms |       15.9 ns/op
    push_back    O(N)/op total O(N^2) |  14328.900 ms |   143289.0 ns/op
    pop_back     O(N)/op total O(N^2) |  12886.971 ms |   128869.7 ns/op

    Tailed Linked List Implementation Benchmark:

    === Tailed Linked List Benchmark ===
    Implementation: struct LinkedList { head, tail }
    Benchmarks: push_front, push_back, pop_front, pop_back, front, back
    Note: pop_* includes free_node() time.
    
    =============================
    BENCH N=1000
    =============================
    push_front   O(1)/op total O(N)   |      0.033 ms |       33.3 ns/op
    front        O(1)/op              |      1.594 ms |        0.8 ns/op (REPS=2000000)
    back         O(1)/op              |      1.551 ms |        0.8 ns/op (REPS=2000000)
    pop_front    O(1)/op (+free)      |      0.009 ms |        9.4 ns/op
    push_back    O(1)/op total O(N)   |      0.011 ms |       10.7 ns/op
    pop_back     O(N)/op total O(N^2) |      0.834 ms |      833.9 ns/op
    
    =============================
    BENCH N=3000
    =============================
    push_front   O(1)/op total O(N)   |      0.075 ms |       24.9 ns/op
    front        O(1)/op              |      1.626 ms |        0.8 ns/op (REPS=2000000)
    back         O(1)/op              |      1.564 ms |        0.8 ns/op (REPS=2000000)
    pop_front    O(1)/op (+free)      |      0.026 ms |        8.5 ns/op
    push_back    O(1)/op total O(N)   |      0.032 ms |       10.7 ns/op
    pop_back     O(N)/op total O(N^2) |     11.561 ms |     3853.5 ns/op
    
    =============================
    BENCH N=10000
    =============================
    push_front   O(1)/op total O(N)   |      0.257 ms |       25.7 ns/op
    front        O(1)/op              |      1.625 ms |        0.8 ns/op (REPS=2000000)
    back         O(1)/op              |      1.612 ms |        0.8 ns/op (REPS=2000000)
    pop_front    O(1)/op (+free)      |      0.089 ms |        8.9 ns/op
    push_back    O(1)/op total O(N)   |      0.110 ms |       11.0 ns/op
    pop_back     O(N)/op total O(N^2) |    127.046 ms |    12704.6 ns/op
    
    =============================
    BENCH N=30000
    =============================
    push_front   O(1)/op total O(N)   |      0.775 ms |       25.8 ns/op
    front        O(1)/op              |      1.611 ms |        0.8 ns/op (REPS=2000000)
    back         O(1)/op              |      1.623 ms |        0.8 ns/op (REPS=2000000)
    pop_front    O(1)/op (+free)      |      0.281 ms |        9.4 ns/op
    push_back    O(1)/op total O(N)   |      0.313 ms |       10.4 ns/op
    pop_back     O(N)/op total O(N^2) |   1197.998 ms |    39933.3 ns/op
    
    =============================
    BENCH N=100000
    =============================
    push_front   O(1)/op total O(N)   |      2.449 ms |       24.5 ns/op
    front        O(1)/op              |      1.588 ms |        0.8 ns/op (REPS=2000000)
    back         O(1)/op              |      1.570 ms |        0.8 ns/op (REPS=2000000)
    pop_front    O(1)/op (+free)      |      1.006 ms |       10.1 ns/op
    push_back    O(1)/op total O(N)   |      1.133 ms |       11.3 ns/op
    pop_back     O(N)/op total O(N^2) |  15171.713 ms |   151717.1 ns/op

    We get HUGE speedups here in the back and push_back functionality, but what about that darned pop_back function?!

    Well, this occurs because despite us storing the tail of the list, a pop_back operation still requires that we retrieve the node BEFORE the current tail so that we can set it as the new tail. There are a few ways to approach this, but the one that transitions the best into the next day of our challenge is implementing a DOUBLY Linked List. This will allow us to show a chart that is fully O(1), but with some trade-offs that we will discuss when the time comes.

  • Relearning C After Time Away — Building a New Data Structure Every Day (Day 1: Tailless Linked List)

    I feel like I keep harping on my time as a Computer Engineering student, but I think it is important to keep reiterating the fact that we truly only narrow in on certain languages and lines of thinking, and then we just stop. For ECE students, that means learning a few basic C++ principles, a bit of ARM/MIPS Assembly, and then a little sprinkle of C whenever we take Operating Systems.

    The class that stuck with me the most was not Data Structures, intro to C++, or any of that. Instead, it was Operating Systems. Never before had I really delved deep into systems programming, kernel development, or anything of the sort. However, my professor for this class, Dr. Hartmut Kaiser, was extremely skillful in taking the tiny bit of C++ we learned as ECEs and slowly introducing us to the wild-west of C systems programming and kernel development. He still goes down as THE BEST professor I have had the pleasure of learning under to this day, and it really does go to show just how influential a single class/professor can be on a student’s development.

    But that was over a year ago, and since then I have really only been messing with .NET code for my internship and a fair bit of PHP for FFLHub. Throughout all this time, I always wanted to revisit the principles and ideas that I learned in that class. Therefore, I have decided to settle on a simple challenge for myself:

    Relearn C by implementing one data structure from scratch every day.

    No libraries.
    No header helpers.
    No hiding memory behavior behind abstractions.

    Just raw C, deliberate design, and intentional tradeoffs.

    Today is Day 1, and we’re starting with something classic but deceptively deep:

    A tailless singly linked list.

    Why Start With a Tailless Linked List?

    Because it forces you to think about:

    • Pointer mutation
    • Memory ownership
    • Traversal cost
    • Edge cases (empty list, single node list)
    • Allocation failure handling
    • Free order correctness

    And importantly — it’s incomplete in ways that matter.

    We are not using:

    • Tail pointer
    • Size tracking
    • Container wrapper struct

    And that’s intentional. Because in later posts, we’ll add those and see exactly just how those things aid us in complexity times and such. But first, a few ground rules and general design principles…


    Memory Model — Caller Owned

    This implementation uses a caller-owned memory model.

    That means:

    The list:

    • Does NOT automatically allocate nodes
    • Does NOT automatically free nodes
    • Does NOT manage payload lifetime beyond copying it into the node

    The caller:

    • Allocates nodes
    • Inserts nodes
    • Removes nodes
    • Destroys nodes

    Why?

    Because this is extremely common in systems programming. It gives maximum flexibility and makes ownership boundaries explicit.


    Node Structure:

    Each node stores:

    • Pointer to payload
    • Size of payload
    • Pointer to next node

    This allows the list to store heterogeneous data — not just one fixed type.

    //For this linked list, I am using a caller owned approach, aka, all memeory is allocated and freed by the caller, not the list or list node itself
    struct Node {
            void* data;
            size_t data_size;
            struct Node* next;
    };

    Allocation and Freeing Strategy:

    Allocation happens in two steps:

    1. Allocate node struct
    2. Allocate payload buffer

    If payload allocation fails, we roll back and free the node.

    This avoids partial allocations and leaks.

    I also implemented a function that allows us to allocate and populate a node with data:

    //this is how we allocate the memory for nodes
    struct Node* alloc_node(size_t data_size) {
    
            //first, we allocate the memory required for the base data size of each node
            struct Node* node = malloc(sizeof *node);
            if (!node) return NULL; //if malloc fails, return NULL
    
            //second, we allocate the memory for the data stored inside our node using our data size passed in
            node->data = malloc(data_size);
    
            //if that malloc fails, we free the previous base allocation and return null
            if (!node->data) {
                    free(node);
                    return NULL;
            }
    
            //set the data size member
            node->data_size = data_size;
    
            //make sure the next pointer is nll
            node->next = NULL;
    
            //return the allocation pointer
            return node;
    }
    
    //we simply allocate the node using our function above, then we memcpy the data into the node
    struct Node* create_node_from_data(void* data, size_t data_size) {
            struct Node* node = alloc_node(data_size);
            if (!node) return NULL;
            memcpy(node->data, data, data_size);
            return node;
    }

    And for freeing, we free in reverse allocation order:

    //we free a singular node
    void free_node(struct Node* node) {
            if (!node) return; //if node is invalid, return
    
            //we only want to free the data portion of our node if the data is valid, IE its been allocated properly
            if (node->data) {
                    free(node->data);
                    node->data = NULL;
            }
    
            //only after this we free the memeory allocated for the base node size
            //we deallocate in REVERSE order of allocation
            free(node);
    }

    Using these functions, we can create general Linked List helper functions that free and destroy the entire list:

    //free the linked list using our free node function, we step through the linked list
    //we save the next ptr, then free that node, then we use that next ptr to iterate to the next node
    void free_list(struct Node* head) {
            while (head) {
                    struct Node* next = head->next;
                    free_node(head);
                    head = next;
            }
    }
    
    //uses the method above then also sets the base pointer to null to avoid a dangling pointer
    void list_destroy(struct Node** head) {
            if (!head || !*head) return;
    
            free_list(*head);
            *head = NULL;
    }
    

    Core Operations Implemented

    I also wrote some functions to handle the most basic of operations that you would want to perform on a linked list:

    Insertion

    • push_front — add a node to the front of the linked list
    • push_back — add a node to the back of the linked list

    Removal

    • pop_front — remove a node from the front of the linked list
    • pop_back — remove a node from the back of the linked list

    Access

    • front — grab the head of the linked list
    • back — grab the tail of the linked list
    void push_front(struct Node* new_head, struct Node** head) {
            //if our new head is invalid, or if our current head pointer is invalid, early exit
            if (new_head == NULL || head == NULL) return;
            new_head->next = *head; //the new heads next pointer must point to the old head
            *head = new_head;       //now we set the head equal to our new head that we want
    }
    
    void pop_front(struct Node** head) {
            if (head == NULL || *head == NULL) return; //if list is empty or if the head pointer is NULL, early exit
            struct Node* new_head = (*head)->next;     //we store the new head ptr, which is the node that occurs after the current head
            free_node(*head);                          //free the old head node, deallocate its memeory
            *head = new_head;                          //set the head pointer equal to the new head we want, aka the next node in the list
    }
    
    void push_back(struct Node* new_tail, struct Node** head) {
    
            if (head == NULL || new_tail == NULL) return; //invalid head or tail node, retunrn early
            //
            ////if list is empty, then the head is just the node we are appending
            if (*head == NULL) {
                    *head = new_tail;
                    new_tail->next = NULL;
                    return;
            }
    
            //otherwise, we have to iterate through the entire loop
            struct Node* cur = (*head);
    
            //we need to keep iterating until we reach the end, IE the next pointer is NULL
            while (cur->next != NULL) cur = cur->next;
    
            //once we are at the end, set the next pointer to point to our new tail node
            cur->next = new_tail;
            new_tail->next = NULL;
    }
    
    void pop_back(struct Node** head) {
            if (head == NULL) return; //invalid header pointer, return early
    
            if (*head == NULL) return; //list is empty, nothing to pop
    
            //if we have a one node list, we need to handle that situation since our while loop iterator checks TWO nodes ahead, not one like the above function
            if ((*head)->next == NULL) {
                    free_node(*head);
                    *head = NULL;
                    return;
            }
    
            //same iteration as the function above, except we check TWO nodes ahead
            struct Node* prev = (*head);
            while (prev->next->next != NULL) prev = prev->next;
    
            //okay so now that we are at the end of the node, we have to free the node
            free_node(prev->next);
            prev->next = NULL;
    }
    
    struct Node* front(struct Node* head) {
            if (head == NULL) return NULL; //invalid head pointer, return NULL, or if the list is empty, return null
            return head;                   //so yeah even if *head is NULL itll still return NULL here, but whateever its extra safe I guess
    }
    
    struct Node* back(struct Node* head) {
            if (head == NULL) return NULL;       //invalid head pointer, return NULL
            if (head->next == NULL) return head; //we are already at the endm return head - like before, this is redudant, but oh well maybe the compiler will make something useful out of this
    
            //now we iterate through the entire list until we reach the end
            struct Node* cur = head;
            while (cur->next != NULL) cur = cur->next;
    
            return cur;
    }

    Now, the astute among you probably see why I mentioned we are making a TAILLESS linked list. Because we don’t store the tail of our linked list, any function that has to do with mutating the end of the our linked list has to traverse the entire list first. This yields the following complexity charge for each function we implemented:

    Complexity Table (Tailless List)

    OperationComplexity
    push_frontO(1)
    pop_frontO(1)
    frontO(1)
    push_backO(n)
    pop_backO(n)
    backO(n)

    We get a O(n) complexity for every tail associated function. In the next post, we will add a tail and see what this yields (hint: it makes for a super boring chart).


    And to top everything off, I wrote a little test-bed that tests our functions:

    int main(void) {
            const int MAX_ITERS = 10000; // change this one value to scale all tests
            struct Node* head = NULL;
    
            printf("=== Linked List Stress Test (MAX_ITERS=%d) ===\n\n", MAX_ITERS);
    
            // -------------------------
            // TEST 1: push_front
            // -------------------------
            printf("TEST 1: push_front %d ints (0..%d)\n", MAX_ITERS, MAX_ITERS - 1);
            for (int i = 0; i < MAX_ITERS; i++) {
                    int v = i;
                    struct Node* n = create_node_from_data(&v, sizeof v);
                    if (!n) {
                            printf("alloc failed in TEST 1 at i=%d\n", i);
                            list_destroy(&head);
                            return 1;
                    }
                    push_front(n, &head);
            }
    
            struct Node* f = front(head);
            struct Node* b = back(head);
            if (!f || !b) {
                    printf("FAIL: front/back returned NULL after push_front\n");
                    list_destroy(&head);
                    return 1;
            }
    
            int front_v = *(int*) f->data;
            int back_v = *(int*) b->data;
    
            printf("Check: front=%d (expected %d), back=%d (expected %d)\n",
                   front_v, MAX_ITERS - 1, back_v, 0);
    
            if (front_v != MAX_ITERS - 1 || back_v != 0) {
                    printf("FAIL: unexpected front/back values in TEST 1\n");
                    list_destroy(&head);
                    return 1;
            }
    
            // Pop everything and ensure order is descending MAX_ITERS-1 .. 0
            printf("Pop all with pop_front and verify descending order...\n");
            for (int expected = MAX_ITERS - 1; expected >= 0; expected--) {
                    struct Node* cur = front(head);
                    if (!cur) {
                            printf("FAIL: list became empty early at expected=%d\n", expected);
                            list_destroy(&head);
                            return 1;
                    }
    
                    int cur_v = *(int*) cur->data;
                    if (cur_v != expected) {
                            printf("FAIL: pop_front order mismatch: got %d expected %d\n", cur_v, expected);
                            list_destroy(&head);
                            return 1;
                    }
    
                    pop_front(&head);
            }
    
            if (head != NULL) {
                    printf("FAIL: head not NULL after popping all nodes in TEST 1\n");
                    list_destroy(&head);
                    return 1;
            }
    
            printf("PASS: TEST 1\n\n");
    
            // -------------------------
            // TEST 2: push_back
            // -------------------------
            printf("TEST 2: push_back %d ints (0..%d)\n", MAX_ITERS, MAX_ITERS - 1);
            for (int i = 0; i < MAX_ITERS; i++) {
                    int v = i;
                    struct Node* n = create_node_from_data(&v, sizeof v);
                    if (!n) {
                            printf("alloc failed in TEST 2 at i=%d\n", i);
                            list_destroy(&head);
                            return 1;
                    }
                    push_back(n, &head);
            }
    
            f = front(head);
            b = back(head);
            if (!f || !b) {
                    printf("FAIL: front/back returned NULL after push_back\n");
                    list_destroy(&head);
                    return 1;
            }
    
            front_v = *(int*) f->data;
            back_v = *(int*) b->data;
    
            printf("Check: front=%d (expected %d), back=%d (expected %d)\n",
                   front_v, 0, back_v, MAX_ITERS - 1);
    
            if (front_v != 0 || back_v != MAX_ITERS - 1) {
                    printf("FAIL: unexpected front/back values in TEST 2\n");
                    list_destroy(&head);
                    return 1;
            }
    
            // Pop everything and ensure order is descending from the back: MAX_ITERS-1 .. 0
            printf("Pop all with pop_back and verify descending-from-back order...\n");
            for (int expected = MAX_ITERS - 1; expected >= 0; expected--) {
                    struct Node* cur = back(head);
                    if (!cur) {
                            printf("FAIL: list became empty early at expected=%d\n", expected);
                            list_destroy(&head);
                            return 1;
                    }
    
                    int cur_v = *(int*) cur->data;
                    if (cur_v != expected) {
                            printf("FAIL: pop_back order mismatch: got %d expected %d\n", cur_v, expected);
                            list_destroy(&head);
                            return 1;
                    }
    
                    pop_back(&head);
            }
    
            if (head != NULL) {
                    printf("FAIL: head not NULL after popping all nodes in TEST 2\n");
                    list_destroy(&head);
                    return 1;
            }
    
            printf("PASS: TEST 2\n\n");
    
            // -------------------------
            // TEST 3: mixed operations with exact expectations
            // -------------------------
            printf("TEST 3: mixed ops with exact expected front/back\n");
    
            // Step A: build 0..MAX_ITERS-1 using push_back
            for (int i = 0; i < MAX_ITERS; i++) {
                    int v = i;
                    struct Node* n = create_node_from_data(&v, sizeof v);
                    if (!n) {
                            printf("alloc failed in TEST 3A at i=%d\n", i);
                            list_destroy(&head);
                            return 1;
                    }
                    push_back(n, &head);
            }
    
            // Step B: pop_front K times (removes 0..K-1)
            const int K = MAX_ITERS / 10; // 10% of max iterations
            for (int expected = 0; expected < K; expected++) {
                    struct Node* cur = front(head);
                    if (!cur) {
                            printf("FAIL: empty early during TEST 3B at expected=%d\n", expected);
                            list_destroy(&head);
                            return 1;
                    }
                    int cur_v = *(int*) cur->data;
                    if (cur_v != expected) {
                            printf("FAIL: TEST 3B pop_front mismatch: got %d expected %d\n", cur_v, expected);
                            list_destroy(&head);
                            return 1;
                    }
                    pop_front(&head);
            }
    
            // After popping K from front, expected front is K, expected back is MAX_ITERS-1
            f = front(head);
            b = back(head);
            if (!f || !b) {
                    printf("FAIL: front/back NULL after TEST 3B\n");
                    list_destroy(&head);
                    return 1;
            }
            if (*(int*) f->data != K || *(int*) b->data != MAX_ITERS - 1) {
                    printf("FAIL: after TEST 3B expected front=%d back=%d, got front=%d back=%d\n",
                           K, MAX_ITERS - 1, *(int*) f->data, *(int*) b->data);
                    list_destroy(&head);
                    return 1;
            }
    
            // Step C: push_front K values (100000..100000+K-1)
            // Because push_front reverses order, final front should be 100000+K-1
            for (int i = 0; i < K; i++) {
                    int v = 100000 + i;
                    struct Node* n = create_node_from_data(&v, sizeof v);
                    if (!n) {
                            printf("alloc failed in TEST 3C at i=%d\n", i);
                            list_destroy(&head);
                            return 1;
                    }
                    push_front(n, &head);
            }
    
            f = front(head);
            b = back(head);
            if (!f || !b) {
                    printf("FAIL: front/back NULL after TEST 3C\n");
                    list_destroy(&head);
                    return 1;
            }
            int expected_front = 100000 + (K - 1);
            int expected_back = MAX_ITERS - 1;
            if (*(int*) f->data != expected_front || *(int*) b->data != expected_back) {
                    printf("FAIL: after TEST 3C expected front=%d back=%d, got front=%d back=%d\n",
                           expected_front, expected_back, *(int*) f->data, *(int*) b->data);
                    list_destroy(&head);
                    return 1;
            }
    
            // Step D: pop_back K times (removes MAX_ITERS-1 down to MAX_ITERS-K)
            for (int j = 0; j < K; j++) {
                    int expected = (MAX_ITERS - 1) - j;
                    struct Node* cur = back(head);
                    if (!cur) {
                            printf("FAIL: empty early during TEST 3D at expected=%d\n", expected);
                            list_destroy(&head);
                            return 1;
                    }
                    int cur_v = *(int*) cur->data;
                    if (cur_v != expected) {
                            printf("FAIL: TEST 3D pop_back mismatch: got %d expected %d\n", cur_v, expected);
                            list_destroy(&head);
                            return 1;
                    }
                    pop_back(&head);
            }
    
            // After popping K from back, expected back is MAX_ITERS-K-1
            f = front(head);
            b = back(head);
            if (!f || !b) {
                    printf("FAIL: front/back NULL after TEST 3D\n");
                    list_destroy(&head);
                    return 1;
            }
            expected_front = 100000 + (K - 1);
            expected_back = (MAX_ITERS - K - 1);
            if (*(int*) f->data != expected_front || *(int*) b->data != expected_back) {
                    printf("FAIL: after TEST 3D expected front=%d back=%d, got front=%d back=%d\n",
                           expected_front, expected_back, *(int*) f->data, *(int*) b->data);
                    list_destroy(&head);
                    return 1;
            }
    
            // Cleanup
            list_destroy(&head);
            if (head != NULL) {
                    printf("FAIL: head not NULL after list_destroy at end\n");
                    return 1;
            }
    
            printf("PASS: TEST 3\n\n");
            printf("ALL TESTS PASSED!\n");
            return 0;
    }

    Which yields the following result:

    === Linked List Stress Test (MAX_ITERS=10000) ===
    
    TEST 1: push_front 10000 ints (0..9999)
    Check: front=9999 (expected 9999), back=0 (expected 0)
    Pop all with pop_front and verify descending order...
    PASS: TEST 1
    
    TEST 2: push_back 10000 ints (0..9999)
    Check: front=0 (expected 0), back=9999 (expected 9999)
    Pop all with pop_back and verify descending-from-back order...
    PASS: TEST 2
    
    TEST 3: mixed ops with exact expected front/back
    PASS: TEST 3
    
    ALL TESTS PASSED!

    Retrospect on this challenge:

    Like I said before, I mainly deal with C#, web-code, etc. We rarely have to actually think about the memory we are allocating because all of that is handled behind the scenes for us whether it be through GC or other means (Rust uses a borrow checker that I want to mess with at some point).

    C makes you honest.

    It makes you think about:

    • Where memory comes from
    • Who owns memory
    • When memory dies
    • What pointer actually means
    • How data really moves through systems

    And honestly — it’s refreshing.


    Closing

    If you haven’t touched C in a while, I highly recommend doing something like this.

    Pick a data structure.
    Write it from scratch.
    Test it hard.
    Then improve it iteratively.

    Tomorrow:

    Another data structure.
    More pointer pain.
    More learning.

  • It’s Not Your Language Slowing You Down — It’s Your Architecture

    Recently I made the decision to rewrite a portion of my Lipsey’s distributor integration in good ole C using libcurl. I didn’t need to do this, but I was under the “it must be faster because it’s C” mindset.

    So I went ahead and and flexed my C muscles, something that I haven’t done in a LONG time, and put together a small little implementation that calls that Catalog data endpoint using a POST request:

    #include <curl/curl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    
    // ----------------------
    // Timing helpers (because "it feels slow" is not a metric)
    // ----------------------
    static double now_ms(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1000000.0;
    }
    
    // ----------------------
    // Dynamic response buffer (cheap, simple, works)
    // ----------------------
    typedef struct {
        char *data;
        size_t size;
    } Memory;
    
    // Curl streams data into here. We just keep realloc’ing like animals.
    // Totally fine for API responses.
    static size_t write_cb(void *contents, size_t size, size_t nmemb, void *userp) {
        size_t realsize = size * nmemb;
        Memory *mem = (Memory *)userp;
    
        char *ptr = realloc(mem->data, mem->size + realsize + 1);
        if (!ptr) return 0; // malloc said nope
    
        mem->data = ptr;
        memcpy(mem->data + mem->size, contents, realsize);
        mem->size += realsize;
        mem->data[mem->size] = '\0';
    
        return realsize;
    }
    
    // ----------------------
    // Extremely lazy JSON helpers (string searching > writing parser today)
    // ----------------------
    
    static int json_contains(const char *json, const char *needle) {
        return (json && needle && strstr(json, needle) != NULL);
    }
    
    // We only care if econtact.success is true or 1. Not building a schema validator today.
    static int json_econtact_success_is_true_or_one(const char *json) {
        const char *p = strstr(json, "\"econtact\"");
        if (!p) return 0;
    
        // Only scan a window after econtact so we don't match random success fields later
        const size_t WINDOW = 800;
        size_t len = strlen(p);
        if (len > WINDOW) len = WINDOW;
    
        char *tmp = (char *)malloc(len + 1);
        if (!tmp) return 0;
    
        memcpy(tmp, p, len);
        tmp[len] = '\0';
    
        int ok = 0;
        if (strstr(tmp, "\"success\":true") ||
            strstr(tmp, "\"success\": true") ||
            strstr(tmp, "\"success\":1") ||
            strstr(tmp, "\"success\": 1")) {
            ok = 1;
        }
    
        free(tmp);
        return ok;
    }
    
    // Pulls token out of `"token":"value"`
    // Yes this is brittle. No I do not care for this use case.
    static int extract_token(const char *json, char *out_token, size_t max_len) {
        if (!json || !out_token || max_len == 0) return -1;
    
        const char *t = strstr(json, "\"token\"");
        if (!t) return -1;
    
        t = strchr(t, ':');
        if (!t) return -1;
        t++;
    
        while (*t == ' ' || *t == '\t') t++;
    
        if (*t != '"') return -1;
        t++;
    
        const char *end = strchr(t, '"');
        if (!end) return -1;
    
        size_t n = (size_t)(end - t);
        if (n == 0 || n >= max_len) return -1;
    
        memcpy(out_token, t, n);
        out_token[n] = '\0';
        return 0;
    }
    
    // Try to dump API error array if present. Otherwise just dump tail and move on.
    static void print_api_errors_best_effort(const char *json) {
        const char *p = strstr(json, "\"errors\"");
        if (!p) {
            fprintf(stderr, "No \"errors\" array found.\n");
            return;
        }
    
        const char *lb = strchr(p, '[');
        const char *rb = lb ? strchr(lb, ']') : NULL;
    
        if (lb && rb && rb >= lb) {
            fprintf(stderr, "API errors: ");
            fwrite(lb, 1, (size_t)(rb - lb + 1), stderr);
            fputc('\n', stderr);
        } else {
            fprintf(stderr, "API errors (raw tail): %s\n", p);
        }
    }
    
    // ----------------------
    // Curl helpers
    // ----------------------
    
    // Because env vars are nice when you don’t want creds in source
    static int env_truthy(const char *name, int default_val) {
        const char *v = getenv(name);
        if (!v || !*v) return default_val;
    
        if (strcmp(v, "1") == 0 || strcasecmp(v, "true") == 0 || strcasecmp(v, "yes") == 0)
            return 1;
    
        if (strcmp(v, "0") == 0 || strcasecmp(v, "false") == 0 || strcasecmp(v, "no") == 0)
            return 0;
    
        return default_val;
    }
    
    // Applies shared curl config so we don't copy/paste 40 lines everywhere
    static void apply_common_opts(CURL *curl, int insecure, Memory *out) {
        curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
        curl_easy_setopt(curl, CURLOPT_MAXREDIRS, 10L);
        curl_easy_setopt(curl, CURLOPT_TIMEOUT, 60L);
        curl_easy_setopt(curl, CURLOPT_USERAGENT, "bickhamfirearms-lipseys/0.1");
        curl_easy_setopt(curl, CURLOPT_HTTP_VERSION, (long)CURL_HTTP_VERSION_1_1);
    
        // Equivalent to PHP CURLOPT_ENCODING ""
        curl_easy_setopt(curl, CURLOPT_ACCEPT_ENCODING, "");
    
        // Match vendor PHP behavior exactly (yes this disables TLS validation)
        if (insecure) {
            curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 0L);
            curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);
        } else {
            curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 2L);
            curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 1L);
        }
    
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, out);
    }
    
    // ----------------------
    // MAIN PROGRAM
    // ----------------------
    
    int main(void) {
    
        const double t_program_start = now_ms();
    
        // You can swap these to getenv if you want later
        const char *email = "[email protected]";
        const char *password = "Lacro$$e1";
    
        if (!email || !password) {
            fprintf(stderr,
                "Missing credentials.\n"
                "Set:\n"
                "export LIPSEYS_EMAIL=...\n"
                "export LIPSEYS_PASSWORD=...\n");
            return 1;
        }
    
        // Default insecure to match vendor PHP exactly
        int insecure = env_truthy("LIPSEYS_INSECURE", 1);
    
        curl_global_init(CURL_GLOBAL_DEFAULT);
    
        CURL *curl = curl_easy_init();
        if (!curl) {
            fprintf(stderr, "curl init failed\n");
            curl_global_cleanup();
            return 1;
        }
    
        const char *BASE = "https://api.lipseys.com/api/";
        const char *LOGIN_PATH = "integration/authentication/login";
        const char *CATALOG_PATH = "integration/items/CatalogFeed";
    
        // ---------------- LOGIN ----------------
        const double t_login_start = now_ms();
    
        char login_url[512];
        snprintf(login_url, sizeof(login_url), "%s%s", BASE, LOGIN_PATH);
    
        char json_body[1024];
        snprintf(json_body, sizeof(json_body),
            "{\"Email\":\"%s\",\"Password\":\"%s\"}",
            email, password);
    
        Memory login_resp = {0};
    
        curl_easy_reset(curl);
        apply_common_opts(curl, insecure, &login_resp);
    
        struct curl_slist *login_headers = NULL;
        login_headers = curl_slist_append(login_headers, "Content-Type: application/json");
        login_headers = curl_slist_append(login_headers, "Accept: application/json");
        login_headers = curl_slist_append(login_headers, "Token: "); // vendor sends even if blank
    
        curl_easy_setopt(curl, CURLOPT_URL, login_url);
        curl_easy_setopt(curl, CURLOPT_HTTPHEADER, login_headers);
        curl_easy_setopt(curl, CURLOPT_CUSTOMREQUEST, "POST");
        curl_easy_setopt(curl, CURLOPT_POSTFIELDS, json_body);
    
        printf("Logging in...\n");
    
        CURLcode rc = curl_easy_perform(curl);
    
        long http_code = 0;
        curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &http_code);
    
        const double t_login_end = now_ms();
    
        if (rc != CURLE_OK) {
            fprintf(stderr, "Login failed: %s\n", curl_easy_strerror(rc));
            fprintf(stderr, "[PROFILE] login %.2f ms\n", t_login_end - t_login_start);
            goto cleanup;
        }
    
        printf("[HTTP] Login status: %ld\n", http_code);
        printf("[PROFILE] Login took %.2f ms\n", t_login_end - t_login_start);
    
        char token[512] = {0};
    
        int token_ok = extract_token(login_resp.data, token, sizeof(token)) == 0;
        int econtact_ok = json_econtact_success_is_true_or_one(login_resp.data);
    
        if (!token_ok || !econtact_ok) {
            fprintf(stderr, "Login rejected\n");
            print_api_errors_best_effort(login_resp.data);
            goto cleanup;
        }
    
        printf("Token extracted OK (len=%zu)\n", strlen(token));
    
        // ---------------- CATALOG ----------------
        const double t_catalog_start = now_ms();
    
        char catalog_url[512];
        snprintf(catalog_url, sizeof(catalog_url), "%s%s", BASE, CATALOG_PATH);
    
        Memory catalog_resp = {0};
    
        curl_easy_reset(curl);
        apply_common_opts(curl, insecure, &catalog_resp);
    
        char token_header[1024];
        snprintf(token_header, sizeof(token_header), "Token: %s", token);
    
        struct curl_slist *catalog_headers = NULL;
        catalog_headers = curl_slist_append(catalog_headers, token_header);
        catalog_headers = curl_slist_append(catalog_headers, "cache-control: no-cache");
        catalog_headers = curl_slist_append(catalog_headers, "Accept: application/json");
    
        curl_easy_setopt(curl, CURLOPT_URL, catalog_url);
        curl_easy_setopt(curl, CURLOPT_HTTPHEADER, catalog_headers);
        curl_easy_setopt(curl, CURLOPT_CUSTOMREQUEST, "GET");
        curl_easy_setopt(curl, CURLOPT_POSTFIELDS, "");
    
        printf("\nFetching catalog...\n");
    
        rc = curl_easy_perform(curl);
    
        long catalog_http = 0;
        curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &catalog_http);
    
        const double t_catalog_end = now_ms();
    
        if (rc == CURLE_OK) {
            printf("[HTTP] Catalog status: %ld\n", catalog_http);
            printf("[PROFILE] Catalog took %.2f ms\n", t_catalog_end - t_catalog_start);
    
            printf("\nCatalog preview (first 2000 chars):\n");
            if (catalog_resp.data) {
                size_t n = catalog_resp.size > 2000 ? 2000 : catalog_resp.size;
                fwrite(catalog_resp.data, 1, n, stdout);
                printf("\n");
            }
        }
    
        // ---------------- TOTAL ----------------
        const double t_program_end = now_ms();
        printf("\n[PROFILE] TOTAL PROGRAM TIME: %.2f ms (%.2f sec)\n",
            t_program_end - t_program_start,
            (t_program_end - t_program_start) / 1000.0);
    
    cleanup:
        curl_slist_free_all(login_headers);
        curl_easy_cleanup(curl);
        curl_global_cleanup();
        free(login_resp.data);
        return 0;
    }
    

    So, all of this code… what did it result in? Here ya go:

    Catalog response preview (first 2000 chars):
    {"success":true,"authorized":true,"errors":[],"data":[{"itemNo":"RU1022RB","description1":"10/22 CARBINE 22LR BL/WD 10+1#","description2":"1103","upc":"736676011032","manufacturerModelNo":"1103","msrp":349.0,"model":"10/22 Carbine","caliberGauge":"22 LR","manufacturer":"Ruger","type":"Rifle","action":"Semi-Auto","barrelLength":"18.5\"","capacity":"10 + 1","finish":"Satin Black","overallLength":"37\"","receiver":null,"safety":null,"sights":"Gold Bead Front/Folding Rear","stockFrameGrips":"Wood Stock Hardwood","magazine":"1 10 rd.","weight":"5 lbs.","imageName":"1103534d.jpg","chamber":null,"drilledAndTapped":true,"rateOfTwist":"1-in-16","itemType":"Firearm","additionalFeature1":"Includes Scope Base","additionalFeature2":"Barrel Band Stock","additionalFeature3":null,"shippingWeight":6.25,"boundBookManufacturer":"Ruger","boundBookModel":"10/22","boundBookType":"Rifle","nfaThreadPattern":null,"nfaAttachmentMethod":null,"nfaBaffleType":null,"silencerCanBeDisassembled":false,"silencerConstructionMaterial":null,"nfaDbReduction":null,"silencerOutsideDiameter":null,"nfaForm3Caliber":null,"opticMagnification":null,"maintubeSize":null,"adjustableObjective":false,"objectiveSize":null,"opticAdjustments":null,"illuminatedReticle":false,"reticle":null,"exclusive":false,"quantity":0,"allocated":false,"canDropship":false,"onSale":false,"price":199.0,"currentPrice":199.0,"retailMap":0.0,"fflRequired":true,"sotRequired":false,"exclusiveType":"","scopeCoverIncluded":false,"special":null,"sightsType":"Adjustable Sights","case":null,"choke":null,"dbReduction":null,"family":"10/22 Series","finishType":"Blued","frame":null,"gripType":"Hardwood","handgunSlideMaterial":null,"countryOfOrigin":"US","itemLength":"40.50","itemWidth":"6.50","itemHeight":"14.50","packageLength":"40.0000","packageWidth":"5.9000","packageHeight":"2.9000","itemGroup":"Sporting Semi-Auto Rimfire Rifles"},{"itemNo":"SP0011710305-F","description1":"PH3 6MMCR MOUNTAIN SHADOW 20\"","description2":"0011710305-F","upc":"811
    
    [PROFILE] TOTAL PROGRAM TIME: 7309.60 ms (7.31 sec)

    A total run time of 7.3 seconds… the exact same as the PHP implementation that is within our current code. But how can this be?

    Well, like most software engineers, I took the arrogant perspective that I TOTALLY could write something faster than what the default PHP implementation could be. In reality, the PHP implementation uses the SAME EXACT curl calls that are all C under the hood. I spent all this time writing this code only to write the exact code that was running this entire time. Funny. At the end of the day, this Catalog refresh pipeline is dominated by the 7 second response time it takes to grab all that sweet, sweet catalog data.


    The Reality Most Developers Don’t Want to Hear

    If your program:

    • Calls remote APIs
    • Waits on databases
    • Pulls files over FTP / HTTPS
    • Talks to third-party vendors

    Then your performance is dominated by:

    • Network latency
    • Remote server processing
    • TLS handshakes
    • Payload transfer time

    Not your language.

    You can rewrite PHP → C → Rust → Assembly → Hand-written transistor logic.

    If the server takes 7 seconds to respond, you’re still waiting 7 seconds. Period.


    The Hidden Truth: You’re Probably Already Running C Anyway

    Here’s the funny part.

    Lipsey’s PHP example uses the PHP cURL extension as I stated above.

    The PHP cURL extension is just a wrapper around:

    • libcurl (C)
    • OpenSSL (C)
    • OS networking stack (C / kernel space)

    So the real execution path looks like:

    PHP Version

    PHP Script
     ↓
    PHP cURL Extension
     ↓
    libcurl (C)
     ↓
    OpenSSL (C)
     ↓
    Kernel TCP Stack
     ↓
    Internet

    My C Version

    My C Program
     ↓
    libcurl (C)
     ↓
    OpenSSL (C)
     ↓
    Kernel TCP Stack
     ↓
    Internet

    So once the request leaves your process…

    You’re in C anyway.


    Where The Time Actually Goes

    When you break down HTTP requests, time is usually spent in:

    DNS Lookup

    Usually tiny.

    TCP Connect

    Usually small unless cross-region.

    TLS Handshake

    Moderate but predictable.

    Time To First Byte (TTFB)

    This is usually the killer.

    TTFB includes:

    • Vendor backend processing
    • Cache lookups
    • DB queries
    • Internal microservice calls
    • Queue waits

    If TTFB is 5 seconds, you’re done. That’s your bottleneck. No more speedups or optimizations for you past this hard wall. Think of it like having 10 seconds of work, but only 5 seconds of it can be parallelizable. You are now stuck with 5 seconds of sequential runtime that wont be going away.


    What Actually Makes Systems Faster

    Not rewriting in C.

    Usually:

    Fewer Requests

    Don’t poll APIs that update every 4 hours… every 5 minutes.

    Smarter Scheduling

    Run ingestion when vendor data actually changes.

    Caching

    Token caching. Response caching. Etc.

    Connection Reuse

    Keep TCP + TLS sessions alive.

    Streaming Instead of Buffering

    Especially for giant feeds. I ended up getting massive memory savings as explained in my last post by streaming to a TSV.


    The Dangerous Developer Trap

    Developers love optimizing:

    • Language choice
    • Micro-allocations
    • Loop performance
    • Branch prediction
    • SIMD tricks

    Meanwhile:

    They make 5 redundant API calls per request.


    The Architecture Mindset Shift

    Good performance thinking sometimes actually boils down to this:

    Instead of:

    “How do I make this code faster?”

    Ask:

    “How do I do less work?”

    Instead of:

    “How do I optimize this loop?”

    Ask:

    “Why does this loop exist?”

    Instead of:

    “Should I rewrite this in C?”

    Ask:

    “Why am I making this network call at all?”


    The Real Lesson

    My goal was to beat PHP with C. Then it quickly was not my goal when I realized the folly of my ways.

    In reality, we answered:

    “Is my architecture the bottleneck?”

    The answer was yes.

    And that’s valuable.

    Because now I know where my real speed-ups occur:

    • Caching vendor feeds
    • Reducing unnecessary logins
    • Scheduling ingestion intelligently

    Thankfully, I already had the foresight to implement these things, but only after banging my head against the wall when I saw that my VPS CPU time was being eaten 24/7. But you live and you learn I suppose.


    Final Thought

    Languages matter.

    But architecture matters more.

    If you’re I/O bound:
    Language gains are noise.

    If you’re CPU bound:
    Language matters.

    Most modern backend systems?

    Are I/O bound.

  • FFL Hub, SQL Batched Inserts vs Joins, and the CS/CE Knowledge Divide

    Before diving into this monster of a post (that I am sure is going to ruffle some feathers), its probably helpful to give some context.

    Alongside being a Computer Engineering student, I have also been a long-time gun owner, machinist, and, since last year, the owner of a firearms business called Bickham Firearms. This business has an online presence, and I decided that it would perhaps be lucrative to delve into the firearms drop-shipping space as another source of revenue for the business.

    Which brings me to the first challenge that needed to be solved: How do we manage and create a website that facilitates ordering firearms-related products online (think Amazon, but for guns)? Turns out, there are tons of solutions already available! Do they fit my needs? No!

    Why the pre-existing solutions don’t fit my needs:

    1. They cost money. Upwards of $80/month just for the entry-level option. I am a poor college student. I don’t even have $80/month to spend on chicken nuggets and mac and cheese, let alone a piece of software that only MIGHT make money.
    2. They have authority over your business, your offered items, etc. I don’t like this. The entire spirit of gun ownership is self-reliance and independence.

    So, based on these two gripes, I decided to roll my own! For one, it’s free. This solves issue #1. However, we also have FULL control over everything. I choose how catalogs are imported and updated, I choose the shipping cost rules, etc.

    I call it FFL Hub, and it is a WordPress plugin that integrates into WooCommerce that handles external distributors, large product catalogs, the syncing of the pricing and quantity data of these catalogs, and automated ordering/shipping processing. Take all of these things that are already hard on their base level, and then add the fact that the firearms industry is regulated into the dirt. In total, it makes this entire ordeal a big one to tackle.


    The Archecture:

    At a high level, the system pulls large product catalogs and inventory data from external vendor APIs and FTP feeds, normalizes that data into a consistent internal format, and keeps an online storefront synchronized with real distributor inventory.

    On the order side, it builds structured order placement jobs and dispatches them to the appropriate distributor systems through background processing pipelines.

    From an engineering perspective, this means dealing with:

    • Large periodic data refreshes
    • Inconsistent external schemas
    • Partial API failures
    • Retry and backoff logic
    • Idempotency guarantees to prevent duplicate external orders

    In this post, we’ll primarily focus on the catalog syncing and inventory quantity synchronization portion of the system.


    The Catalog Syncing System: A Tale of Two APIs

    When I first started to write this code, I had two main distributors that I dealt with: Lipsey’s and RSR. Lipsey’s mainly supplies me with my firearms, whilst RSR supplies me with my ammo/accessories (RSR bans small-time businesses from purchasing firearms).

    Both of these distributors offer ways in which you can grab catalog data, just in two VERY different ways.

    Lipsey’s

    Lipsey’s serves up two API endpoints that are of interest to us:

    1. Catalog
    2. PricingAndQuantity

    These two are pretty self-explanatory. They are both retrieved through a POST request to the endpoint URLs they provide, but each one behaves a bit differently.

    The Catalog endpoint updates once every 4 hours. It provides no context as to when the next update will be. This means that staying in phase with catalog updates is impossible. Fortunately, this is not an issue because of our second endpoint.

    The PricingAndQuantity endpoint updates every hour. This is stated in their API docs. This is a lie. It actually updates every 30 minutes, and I know this because they DO provide a field in the POST request return message called “nextUpdate”. We use this nextUpdate value to time our API calls so that it only really runs when there is new data for us to churn through.

    It is CRUCIAL that we time the pricing and quantity updates in phase so that we aren’t serving stale pricing and quantity data to the customer. We don’t want a customer ordering something that is out of stock. That is bad.

    RSR

    RSR serves up API endpoints, but they are non-functional. Their documentation does not state this (and it also does not state the proper calling parameters for their ordering calls, but that is another issue).

    Instead, they provide two files on an FTP server. One is for the overall catalog (updated once every 2 hours), and the other one is for the quantity data (updated every 5 minutes).

    There is ZERO heads up as to when the next update will be; therefore, we use a heat-up and cooldown rule to log in to the FTP server only when we can safely assume a file update is reasonably close. This is to save on the time it takes to establish an FTP connection. Once we have established a connection, we check the last modified date along with file size to make sure it is fresh data that is not currently being edited.


    How this ties into SQL Optimizations

    So, with all of these methods of grabbing product data figured out, how do we efficiently update our SQL tables to reflect the latest pricing and quantity info?

    I can tell you how to NOT do it: Batched Inserts.

    And I learned this one the hard way. For the Lipsey’s catalog/quantity update cron jobs, I was seeing runtimes of up to 15 seconds and memory usage of up to 200MB. The results were similar for RSR, minus the memory usage.

    Here is a little rule that I have learned through this entire ordeal that will stick with me until the day I die:

    Don’t trust code you didn’t write without reading it first.

    As it turns out, the default PHP implementation that Lipsey’s provides streams the entire endpoint result straight into memory and lets it sit there as a giant keyed array until you do something with it (like taking each item and batch inserting the new data into your SQL table).

    This problem was solved by taking their code, throwing it out the window, and instead writing a little snippet that streams the endpoint result into a TSV file (for both Catalog updates and for Quantity updates). This solved the memory usage problem, but what about the run time?

    Enter: Join statements

    Here’s another rule that I picked up:

    For things such as SQL, its better to rely on the low-level optimized code for massive table edits rather than rolling your own (most of the time).

    The thing is, inserting the new data into the table as batches was SLOW. REALLY SLOW.

    What ended up being the silver bullet to this problem was instead of batching inserts, we simply take that TSV file (whether it be a Catalog file or a quantity update file), use file loading to load it into a staging table, then we INSERT JOIN that table into our authoritative catalog table (in reality, there is a bit more to it, but this was the BIG gotcha that ended up halving our update times).


    The Results:

    By doing this, here is what we achieved speed wise:

    Lispsey’s:

    • Old Catalog Update: 15s (7s from endpoint + 8s SQL updates)
    • New Catalog Update: 7.1s (7s from endpoint + 100ms SQL updates)

    RSR:

    • Old Catalog Update: 5s (2s FTP download + 3s SQL updates)
    • New Catalog Update: 2.1s (2s FTP download + 100ms SQL updates)

    Pretty astonishing speedups, and now most the time spent is spent on things out of our hands (which we mitigate as stated early through heat up and cooldown rules).


    The ECE vs CS Knowledge Divide

    If you have read this far, you have probably been scratching your head as to why I never knew that JOIN statements were preferred in situations like this in the first place.

    This is because as an ECE (Computer Engineering) major, we learn NOTHING about SQL, databases, or anything web stack related. I come from the land of Assembly, C, and systems programming.

    In contrast, CS (Computer Science) majors deal with this sort of stuff all the time. It is just about every other day that I hear a CS major walking down the hall talking about their new React project or their new C# web app that deals with databases. On the flip side, when they do end up having to take Operating Systems (a class that is exclusively taught through C and PintOS), they get absolutely walloped. So I guess in the end the learning struggles just about even out.

  • Why I Built This Dev Blog

    This blog has been a long time coming, and to be perfectly honest, LONG overdue. At the time of writing, it is February 6th, 2026. This puts me in my fifth year as an LSU Computer Engineering student and about three months away from graduation (I am very, very scared).

    And like a lot of engineering students approaching graduation, I started asking myself a simple question: What have I actually built?

    LSU gave me a strong technical foundation in some ways. There have been a few classes, such as Computer Organization, Operating Systems, and ARM Assembly, that really challenged me and forced me to learn.

    But what I’ve learned over the last few years is that engineering isn’t about getting a good grade and completing coursework — it’s about building real products and real systems, breaking them, debugging them at 2 AM, and figuring out how to make them reliable when real users and real money are involved.

    And when I look back at the work I’m most proud of, almost all of it came from projects I started on my own, work I did at my job, or things I pursued through extracurricular activities early on.

    To be honest, most of the technical depth I have today didn’t come from the classroom.

    A lot of it started in high school through FIRST Robotics, where I got my first exposure to real engineering constraints: hardware that fails, deadlines that don’t move, and systems that have to work outside of a controlled lab environment. More posts are to come on each of these aspects whenever I get a shower-thought to post about them.

    That accelerated during my internship and through personal projects, where I had to design, build, and maintain systems that people actually depended on.

    College gave me exposure to a wide range of topics, but the moments where I grew the fastest were always when I was forced to build something real — something that had to work outside of a textbook or a grading rubric.

    That’s when the idea of engineering really clicked for me.

    At the end of the day, it’s not about getting the diploma and walking the stage. I know plenty of fellow engineering graduates who will get the diploma but have no real interest in pursuing their careers, and who will fall behind in this ever-changing world (with AI being a major driver of that change).

    In reality, it’s about the brass tacks — getting real things done and gaining lessons and skills that make the next engineering challenge easier to tackle.

    That is what this dev blog is about: showing the real, actionable work that I’ve put on the table.