
Lesson 2: Hash-Based & Sorted Data Structures

Master non-linear data structures, hashing fundamentals, and their Big O tradeoffs.

How Hashing Works in .NET

Before diving into Dictionary and HashSet, it's essential to understand the hashing mechanism that powers them. Every object in .NET has a GetHashCode() method (inherited from System.Object). When you store a key in a hash-based collection, the runtime calls GetHashCode() to compute a numeric hash, which is then mapped to a "bucket" — a slot in an internal array.

Retrieving an element means: compute the hash, find the bucket, then compare with Equals() to locate the exact item. Because this lookup doesn't depend on how many elements are in the collection (it goes directly to the right bucket), it runs in O(1) average time.

Hash collisions: When two different keys produce the same hash code, they land in the same bucket — this is called a collision. .NET handles this with chaining (storing a linked list of entries per bucket). In the worst case (all keys collide), operations degrade to O(n). In practice, the default hash functions distribute well, and you rarely encounter this.
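The hash-to-bucket mapping described above can be sketched in a few lines. The bucketCount value and the modulo step below are an illustrative simplification; the runtime's actual bucket logic is more elaborate (it also handles resizing and collision chains), but the core idea is the same:

```csharp
using System;

class HashingDemo
{
    static void Main()
    {
        // Simplified model: hash code -> bucket index via modulo.
        // (Illustrative only — not the runtime's exact algorithm.)
        int bucketCount = 16; // size of the internal bucket array

        string key = "Alice";
        int hash = key.GetHashCode();

        // Mask off the sign bit so the index is non-negative.
        int bucket = (hash & 0x7FFFFFFF) % bucketCount;

        Console.WriteLine($"Hash: {hash}, Bucket: {bucket}");
        // Lookup repeats this same computation, then compares with
        // Equals() inside that one bucket — which is why the cost does
        // not depend on the total number of elements: O(1) on average.
    }
}
```

Note that string hash codes are randomized per process in modern .NET, so the printed hash differs between runs — another reason never to persist or depend on specific hash values.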

The Contract: GetHashCode() and Equals()

If you use custom objects as dictionary keys or in hash sets, you must correctly implement both GetHashCode() and Equals(). The rules are: (1) two objects that are Equals() must return the same hash code; (2) the hash code must not change while the object is stored in a hash-based collection, so base it on immutable state; (3) unequal objects should produce well-distributed hash codes to minimize collisions. For example:

public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; }

    public override bool Equals(object obj)
    {
        return obj is Employee other && Id == other.Id;
    }

    public override int GetHashCode()
    {
        return Id.GetHashCode(); // Consistent with Equals
    }
}
Modern shortcut: In C# 9+, record types automatically generate correct Equals() and GetHashCode() implementations based on all their properties. Using a record for dictionary keys is a clean, safe approach.
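The record approach can be sketched as follows; EmployeeKey is a hypothetical type used only for illustration:

```csharp
using System;
using System.Collections.Generic;

// The record keyword generates value-based Equals() and GetHashCode()
// from the positional properties — no manual overrides needed. (C# 9+)
public record EmployeeKey(int Id, string Department);

class RecordKeyDemo
{
    static void Main()
    {
        var salaries = new Dictionary<EmployeeKey, decimal>
        {
            [new EmployeeKey(1, "Sales")] = 55000m
        };

        // A structurally equal record hashes to the same bucket and
        // compares equal, so it finds the existing entry.
        bool found = salaries.ContainsKey(new EmployeeKey(1, "Sales"));
        Console.WriteLine(found); // True
    }
}
```

Because record equality is based on the properties, mutating a key's state is simply not an option with positional records (the properties are init-only), which sidesteps rule (2) of the contract entirely.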

Dictionary<TKey, TValue>

The workhorse of key-value storage in C#. It maps a unique key to a value using hashing. It is unordered — you cannot rely on the order of enumeration.

Core Operations and Complexity

Operation | Complexity | Notes
--- | --- | ---
Add(key, value) | O(1) | Amortized; O(n) on resize
[key] (get/set) | O(1) | Get throws KeyNotFoundException on a missing key; set adds or overwrites
ContainsKey(key) | O(1) | Checks existence without retrieving the value
ContainsValue(value) | O(n) | Must scan all values (values are not hashed)
Remove(key) | O(1) | Removes by key
TryGetValue(key, out value) | O(1) | Safe lookup — no exception on a missing key

Best Practices

// ALWAYS prefer TryGetValue over ContainsKey + indexer
// BAD: Two hash lookups
if (userMap.ContainsKey(userId))
{
    var user = userMap[userId]; // Second lookup!
}

// GOOD: Single hash lookup
if (userMap.TryGetValue(userId, out var user))
{
    // Use 'user' directly
}

// Pre-define capacity when you know the size
var lookup = new Dictionary<string, int>(50000);
Interview favorite: "Why use TryGetValue instead of checking ContainsKey and then accessing the indexer?" Because ContainsKey + [key] performs two hash lookups, while TryGetValue does it in one. In hot loops, this doubles your hash computation cost unnecessarily.

Initializer Syntax and Safe Lookups

// Index initializer syntax (C# 6+)
var capitals = new Dictionary<string, string>
{
    ["France"] = "Paris",
    ["Japan"] = "Tokyo",
    ["Brazil"] = "Brasília"
};

// GetValueOrDefault (extension method, available since .NET Core 2.0)
string capital = capitals.GetValueOrDefault("Italy", "Unknown");

HashSet<T>

HashSet<T> is a collection of unique elements. Think of it as a Dictionary where you only care about the keys. It uses the same hashing mechanism, giving it O(1) for Add, Contains, and Remove.
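A minimal example of the uniqueness and O(1) membership behavior described above:

```csharp
using System;
using System.Collections.Generic;

class HashSetDemo
{
    static void Main()
    {
        var seen = new HashSet<int>();

        // Add returns false when the element is already present,
        // so a check and an insert happen in one O(1) operation.
        Console.WriteLine(seen.Add(42));      // True  (newly added)
        Console.WriteLine(seen.Add(42));      // False (duplicate rejected)
        Console.WriteLine(seen.Contains(42)); // True
        Console.WriteLine(seen.Count);        // 1
    }
}
```

The bool return of Add is worth remembering: it replaces the Contains + Add pattern (two hash lookups) with a single lookup, the same optimization TryGetValue gives you on Dictionary.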

Set Operations

What makes HashSet<T> truly powerful are its mathematical set operations:

Method | Description | Math Equivalent
--- | --- | ---
UnionWith(other) | Adds all elements from other | A ∪ B
IntersectWith(other) | Keeps only elements also in other | A ∩ B
ExceptWith(other) | Removes elements that are in other | A \ B
SymmetricExceptWith(other) | Keeps elements in one set but not both | A △ B
IsSubsetOf(other) | Checks if all elements are in other | A ⊆ B
IsSupersetOf(other) | Checks if all elements of other are here | A ⊇ B
Overlaps(other) | Checks if any element is shared | A ∩ B ≠ ∅

HashSet<string> teamA = new HashSet<string> { "Alice", "Bob", "Charlie" };
HashSet<string> teamB = new HashSet<string> { "Bob", "Diana", "Eve" };

// Who is on BOTH teams?
var both = new HashSet<string>(teamA);
both.IntersectWith(teamB); // {"Bob"}

// Who is on team A but NOT team B?
var onlyA = new HashSet<string>(teamA);
onlyA.ExceptWith(teamB); // {"Alice", "Charlie"}

// Case-insensitive set
var uniqueNames = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
uniqueNames.Add("Alice");
uniqueNames.Add("ALICE"); // Returns false — already exists
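The comparer idea extends beyond strings: any type can get custom hashing behavior by passing an IEqualityComparer<T> to the constructor, instead of overriding Equals/GetHashCode on the type itself. A sketch, where the Point type and PointComparer are illustrative:

```csharp
using System;
using System.Collections.Generic;

public class Point
{
    public int X { get; set; }
    public int Y { get; set; }
}

// External comparer: the set hashes and compares through this class,
// so Point itself needs no Equals/GetHashCode overrides.
public class PointComparer : IEqualityComparer<Point>
{
    public bool Equals(Point a, Point b) =>
        a is not null && b is not null && a.X == b.X && a.Y == b.Y;

    public int GetHashCode(Point p) => HashCode.Combine(p.X, p.Y);
}

class ComparerDemo
{
    static void Main()
    {
        var points = new HashSet<Point>(new PointComparer());
        points.Add(new Point { X = 1, Y = 2 });
        points.Add(new Point { X = 1, Y = 2 }); // duplicate under comparer

        Console.WriteLine(points.Count); // 1
    }
}
```

An external comparer is useful when you don't own the key type, or when different collections need different notions of equality for the same type.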

Sorted Collections

When you need your data to stay in sorted order at all times, .NET provides several options. They trade the O(1) average performance of hash-based structures for O(log n) operations, gaining guaranteed ordering in return.

SortedList<TKey, TValue> vs. SortedDictionary<TKey, TValue>

Both keep keys sorted, but their internal implementations differ significantly:

Feature | SortedList | SortedDictionary
--- | --- | ---
Internal structure | Two parallel arrays (keys + values) | Red-black tree
Memory usage | Lower (contiguous arrays) | Higher (tree node overhead)
Insert / Remove | O(n) (array shifting) | O(log n)
Lookup by key | O(log n) (binary search) | O(log n)
Access by index | O(1) via .Keys[i] | Not supported
Best for | Data loaded once, queried often | Frequent insertions/removals

// SortedList — keeps entries sorted by key
var scores = new SortedList<string, int>();
scores.Add("Charlie", 92);
scores.Add("Alice", 88);
scores.Add("Bob", 95);

// Iterating produces: Alice=88, Bob=95, Charlie=92
foreach (var kvp in scores)
{
    Console.WriteLine($"{kvp.Key}: {kvp.Value}");
}

// Access by index (unique to SortedList)
string firstKey = scores.Keys[0]; // "Alice"
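For comparison, the same data in a SortedDictionary — identical sorted iteration, but inserts and removals stay O(log n) because nothing is shifted in an array:

```csharp
using System;
using System.Collections.Generic;

class SortedDictionaryDemo
{
    static void Main()
    {
        // Backed by a red-black tree: each Add is O(log n).
        var scores = new SortedDictionary<string, int>
        {
            ["Charlie"] = 92,
            ["Alice"] = 88,
            ["Bob"] = 95
        };

        scores.Remove("Bob");  // O(log n) — no array shifting
        scores["Diana"] = 77;  // lands in sorted position automatically

        // Iterates in key order: Alice, Charlie, Diana
        foreach (var kvp in scores)
        {
            Console.WriteLine($"{kvp.Key}: {kvp.Value}");
        }

        // Note: no scores.Keys[i] indexed access here — that is
        // a SortedList-only feature.
    }
}
```

This is the "frequent insertions/removals" scenario from the table above: if the collection churns constantly, the tree's O(log n) updates beat the array's O(n) shifting.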

SortedSet<T>

The sorted counterpart of HashSet<T>. It keeps elements in sorted order using a red-black tree. It supports the same set operations (UnionWith, IntersectWith, etc.) and also offers GetViewBetween(lower, upper) to get a live view of a range of elements.

var sortedNums = new SortedSet<int> { 50, 10, 30, 20, 40 };
// Iteration order: 10, 20, 30, 40, 50

// Get a live range view (20 to 40 inclusive)
var range = sortedNums.GetViewBetween(20, 40);
// range: {20, 30, 40}
// If you add 25 to sortedNums, range automatically includes it

Decision Guide: Which Structure to Use?

Quick decision tree:
- Need key-value pairs? → Dictionary or SortedDictionary.
- Just unique values? → HashSet or SortedSet.
- Need ordering? → Use the "Sorted" variant.
- Need max speed? → Use the hash-based variant.
- Need indexed access to sorted data? → SortedList.

Bonus: ConcurrentDictionary<TKey, TValue>

In multi-threaded scenarios, a regular Dictionary is not thread-safe. ConcurrentDictionary (in System.Collections.Concurrent) uses fine-grained locking (striped locks across buckets) to allow safe concurrent reads and writes without locking the entire collection.

Key methods:

var cache = new ConcurrentDictionary<string, int>();

// Thread-safe "get or compute" pattern
int count = cache.GetOrAdd("pageViews", key => LoadFromDatabase(key));

// Atomic increment
cache.AddOrUpdate("pageViews", 1, (key, old) => old + 1); // update delegate may retry under contention

// Safe pattern: Lazy<T> ensures factory runs only once
var safeCache = new ConcurrentDictionary<string, Lazy<ExpensiveObject>>();
var value = safeCache.GetOrAdd(
    "key",
    k => new Lazy<ExpensiveObject>(() => new ExpensiveObject(k))
).Value;
Interview trap: GetOrAdd is not fully atomic. The GetOrAdd and AddOrUpdate factory delegates may execute more than once under thread contention. Only one result is stored, but if your factory allocates resources (database connections, file handles), you can leak the duplicates. The standard fix is to wrap the value in Lazy<T> so the expensive work only happens once, when .Value is accessed.

Coding Challenge

Given an array of integers with many duplicates, write a function that returns a Dictionary<int, int> where the key is the number and the value is its frequency. Do this in O(n) time.

Bonus: Make the function generic so it works with any IEnumerable<T>.

Solution
// Basic solution — int-specific
public static Dictionary<int, int> CountFrequencies(int[] numbers)
{
    var freq = new Dictionary<int, int>(numbers.Length);

    foreach (int num in numbers)
    {
        // TryGetValue: single lookup instead of ContainsKey + indexer
        if (freq.TryGetValue(num, out int count))
        {
            freq[num] = count + 1;
        }
        else
        {
            freq[num] = 1;
        }
    }

    return freq;
}

// BONUS: Generic version using CollectionsMarshal (since .NET 6)
using System.Runtime.InteropServices;

public static Dictionary<T, int> CountFrequencies<T>(
    IEnumerable<T> items) where T : notnull
{
    var freq = new Dictionary<T, int>();

    foreach (T item in items)
    {
        // GetValueRefOrAddDefault returns a ref to the value slot
        ref int count = ref CollectionsMarshal
            .GetValueRefOrAddDefault(freq, item, out _);
        count++;
    }

    return freq;
}

// Usage:
int[] data = { 1, 3, 2, 3, 1, 1, 4, 3 };
var result = CountFrequencies(data);
// {1: 3, 3: 3, 2: 1, 4: 1}

Analysis: We iterate once through the array, performing one O(1) dictionary lookup per element. Total: O(n). Pre-allocating the dictionary capacity avoids internal resizing. The bonus solution uses CollectionsMarshal.GetValueRefOrAddDefault for a single-lookup increment — the fastest possible pattern in .NET 6+.