
Lesson 4: Advanced LINQ Transformations & Set Operations

Learn to project, flatten, group, and aggregate complex data using LINQ's most powerful operators.

Select — Projection (One-to-One)

Select transforms each element into exactly one new element. It's the LINQ equivalent of a map in functional programming. The shape of the output can be completely different from the input:

var employees = GetEmployees();

// Project into anonymous type (common for DTOs)
var summaries = employees.Select(e => new
{
    FullName = $"{e.FirstName} {e.LastName}",
    AnnualPay = e.MonthlySalary * 12,
    IsManager = e.DirectReports.Count > 0
});

// Select with index overload: the lambda also receives the element's position
var names = new[] { "Alice", "Bob", "Charlie" };
var indexed = names.Select((name, i) => $"{i + 1}. {name}");
// "1. Alice", "2. Bob", "3. Charlie"

Overload with index: Many LINQ methods have a second overload that also passes the element's index to the lambda. Select, Where, SkipWhile, and TakeWhile all support this. It is handy and often overlooked.
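
The index overloads of the other operators follow the same pattern as Select. The snippet below is a minimal sketch; the names array is made-up sample data, not part of the lesson's data set.

```csharp
using System;
using System.Linq;

var names = new[] { "Alice", "Bob", "Charlie", "Diana", "Eve" };

// Where with index: keep elements at even positions (0, 2, 4)
var everyOther = names.Where((name, i) => i % 2 == 0).ToList();

// TakeWhile with index: take elements while the position is below 3
var firstThree = names.TakeWhile((name, i) => i < 3).ToList();

// SkipWhile with index: skip elements while the position is below 3
var rest = names.SkipWhile((name, i) => i < 3).ToList();

Console.WriteLine(string.Join(", ", everyOther)); // Alice, Charlie, Eve
Console.WriteLine(string.Join(", ", firstThree)); // Alice, Bob, Charlie
Console.WriteLine(string.Join(", ", rest));       // Diana, Eve
```

The index is always the position in the source sequence of that operator, so it restarts from zero at each stage of a chain.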

SelectMany — Flattening (One-to-Many)

SelectMany is one of the most powerful — and most confusing — LINQ operators. Where Select produces one output per input, SelectMany produces zero, one, or many outputs per input, then flattens them all into a single sequence.

The Problem SelectMany Solves

public class Order
{
    public int OrderId { get; set; }
    public List<string> Items { get; set; }
}

List<Order> orders = new List<Order>
{
    new Order { OrderId = 1, Items = new List<string> { "Laptop", "Mouse" } },
    new Order { OrderId = 2, Items = new List<string> { "Keyboard" } },
    new Order { OrderId = 3, Items = new List<string> { "Monitor", "Cable", "Stand" } }
};

// Select gives you a LIST OF LISTS:
var nested = orders.Select(o => o.Items);
// Type: IEnumerable<List<string>>
// [["Laptop","Mouse"], ["Keyboard"], ["Monitor","Cable","Stand"]]

// SelectMany FLATTENS into a single sequence:
var allItems = orders.SelectMany(o => o.Items);
// Type: IEnumerable<string>
// ["Laptop","Mouse","Keyboard","Monitor","Cable","Stand"]

SelectMany with Result Selector

There's a powerful overload that gives you access to both the parent element and each child element, allowing you to combine them:

// Get each item paired with its order ID
var itemDetails = orders.SelectMany(
    order => order.Items,               // collection selector
    (order, item) => new               // result selector
    {
        order.OrderId,
        ItemName = item
    }
);
// [{OrderId=1, ItemName="Laptop"}, {OrderId=1, ItemName="Mouse"}, ...]
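
In query syntax, the same flattening is written as two from clauses, which the compiler translates into a SelectMany call with a result selector. The sketch below uses anonymous types as a stand-in for the Order class, and adds a hypothetical fourth order with no items to show that an input can contribute zero outputs.

```csharp
using System;
using System.Linq;

// Stand-in for the Order list, using anonymous types (an assumption for brevity)
var orders = new[]
{
    new { OrderId = 1, Items = new[] { "Laptop", "Mouse" } },
    new { OrderId = 2, Items = new[] { "Keyboard" } },
    new { OrderId = 3, Items = new[] { "Monitor", "Cable", "Stand" } },
    new { OrderId = 4, Items = Array.Empty<string>() } // hypothetical empty order
};

// Two 'from' clauses compile to SelectMany(collectionSelector, resultSelector)
var pairs = (from order in orders
             from item in order.Items
             select new { order.OrderId, ItemName = item }).ToList();

foreach (var p in pairs)
    Console.WriteLine($"{p.OrderId}: {p.ItemName}");
// Order 4 produces no lines because its Items collection is empty
```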

Visualizing Select vs. SelectMany

  Select(o => o.Items):
    Order 1 ──→  [ Laptop, Mouse ]
    Order 2 ──→  [ Keyboard ]
    Order 3 ──→  [ Monitor, Cable, Stand ]
    Result: IEnumerable<List<string>>   (nested — each element is a list)

  SelectMany(o => o.Items):
    Order 1 ──→  Laptop, Mouse
    Order 2 ──→  Keyboard
    Order 3 ──→  Monitor, Cable, Stand
    Result: IEnumerable<string>          (flat — all items in one sequence)

GroupBy — Organizing Data by Key

GroupBy partitions a sequence into groups based on a key you define. Each group is an IGrouping<TKey, TElement>, which is itself an IEnumerable<TElement> with a .Key property.

var employees = new List<Employee>
{
    new("Alice",   "Engineering", 95000),
    new("Bob",     "Marketing",   72000),
    new("Charlie", "Engineering", 88000),
    new("Diana",   "Marketing",   68000),
    new("Eve",     "Engineering", 110000)
};

// Basic GroupBy
var byDept = employees.GroupBy(e => e.Department);

foreach (var group in byDept)
{
    Console.WriteLine($"--- {group.Key} ---");
    foreach (var emp in group)
    {
        Console.WriteLine($"  {emp.Name}: ${emp.Salary:N0}");
    }
}

GroupBy with Element and Result Selectors

// Element selector: choose what goes INTO each group
var namesByDept = employees.GroupBy(
    e => e.Department,   // Key selector
    e => e.Name          // Element selector — group contains only names
);

// Result selector: transform each group into a final shape
var deptStats = employees.GroupBy(
    e => e.Department,
    (dept, emps) => new  // Result selector
    {
        Department = dept,
        Count = emps.Count(),
        AvgSalary = emps.Average(e => e.Salary),
        TopEarner = emps.OrderByDescending(e => e.Salary).First().Name
    }
);

GroupBy vs. ToDictionary vs. ToLookup: GroupBy is deferred, so it doesn't execute until iterated. ToLookup is the immediate-execution equivalent that creates an ILookup<TKey, TElement> (like a read-only dictionary of lists). ToDictionary creates a Dictionary and throws on duplicate keys, whereas ToLookup naturally handles multiple values per key.
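
The difference in execution timing is easiest to see by changing the source after the query is built. The following is a small sketch with made-up (Name, Department) tuples rather than the Employee type used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: (Name, Department) tuples
var staff = new List<(string Name, string Department)>
{
    ("Alice", "Engineering"),
    ("Bob", "Marketing"),
    ("Charlie", "Engineering")
};

var groups = staff.GroupBy(s => s.Department);   // deferred: nothing runs yet
var lookup = staff.ToLookup(s => s.Department);  // immediate: built right now

staff.Add(("Diana", "Marketing"));               // mutate the source afterwards

// GroupBy runs when iterated, so it sees Diana; the lookup does not
int groupedMarketing = groups.Single(g => g.Key == "Marketing").Count();
int lookupMarketing = lookup["Marketing"].Count();

// Indexing a lookup with a missing key yields an empty sequence
int lookupSales = lookup["Sales"].Count();

// ToDictionary needs unique keys, so keying by Department throws
bool threw = false;
try
{
    _ = staff.ToDictionary(s => s.Department);
}
catch (ArgumentException)
{
    threw = true;
}

Console.WriteLine($"{groupedMarketing} {lookupMarketing} {lookupSales} {threw}");
// 2 1 0 True
```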

Aggregate — Custom Folding

Aggregate reduces a sequence to a single value using a custom accumulator function. It's the most flexible aggregation method: Sum, Count, Min, and Max can all be expressed as special cases of Aggregate.
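
To make that relationship concrete, here is a sketch that rewrites the four built-in aggregates as Aggregate calls over a made-up array. It illustrates the idea only; it is not a claim about how the built-in methods are implemented.

```csharp
using System;
using System.Linq;

int[] numbers = { 5, 3, 8, 1 };

// Seeded form: start from 0 and fold each element into the accumulator
int sum   = numbers.Aggregate(0, (acc, n) => acc + n);
int count = numbers.Aggregate(0, (acc, _) => acc + 1);

// Unseeded form: the first element acts as the starting accumulator
int min = numbers.Aggregate((acc, n) => n < acc ? n : acc);
int max = numbers.Aggregate((acc, n) => n > acc ? n : acc);

Console.WriteLine($"{sum} {count} {min} {max}"); // 17 4 1 8
```

In everyday code the dedicated methods are still the better choice, because they state the intent directly.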

The Three Overloads

var words = new[] { "LINQ", "is", "incredibly", "powerful" };

// Overload 1: Accumulator only (the first element is the initial seed;
// throws InvalidOperationException on an empty sequence)
string sentence = words.Aggregate((acc, w) => acc + ", " + w);
// "LINQ, is, incredibly, powerful"

// Overload 2: Seed + Accumulator
int totalLength = words.Aggregate(
    0,                         // seed
    (acc, w) => acc + w.Length  // accumulator
);
// 0 + 4 + 2 + 10 + 8 = 24

// Overload 3: Seed + Accumulator + Result Selector
string result = words.Aggregate(
    new StringBuilder(),                    // seed
    (sb, w) => sb.Append(w).Append(' '),   // accumulator
    sb => sb.ToString().Trim()              // result selector
);
// "LINQ is incredibly powerful"

Performance trap: The first overload with string concatenation (acc + ", " + w) creates a new string on every iteration and copies everything accumulated so far, so the total cost grows as O(n²). For joining strings, prefer string.Join(", ", words): it is the idiomatic solution and, given an array, computes the total length up front and allocates the result once. Use StringBuilder via overload 3 when your aggregation logic is more complex than simple concatenation. Reserve Aggregate for non-string reductions (numeric computations, building custom objects, etc.).
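
A short sketch comparing the two approaches on the same words array. It also shows a behavioral difference on empty input, which is a further reason to prefer string.Join for this job.

```csharp
using System;
using System.Linq;

var words = new[] { "LINQ", "is", "incredibly", "powerful" };

string viaAggregate = words.Aggregate((acc, w) => acc + ", " + w);
string viaJoin = string.Join(", ", words);
Console.WriteLine(viaAggregate == viaJoin); // True

// On an empty sequence, string.Join returns an empty string...
var none = Array.Empty<string>();
string emptyJoin = string.Join(", ", none);

// ...while the unseeded Aggregate overload throws
bool threw = false;
try
{
    none.Aggregate((acc, w) => acc + ", " + w);
}
catch (InvalidOperationException)
{
    threw = true;
}
Console.WriteLine(threw); // True
```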

Set Operations: Union, Concat, Intersect, Except

LINQ provides mathematical set operations as extension methods. Understanding the difference between Union and Concat is a common interview question:

Method     Duplicates?  Description
Concat     Keeps all    Appends the second sequence to the first. Like SQL UNION ALL.
Union      Removes      Combines and deduplicates. Like SQL UNION.
Intersect  Removes      Elements present in both sequences.
Except     Removes      Elements in the first sequence but not the second.

int[] a = { 1, 2, 3, 4 };
int[] b = { 3, 4, 5, 6 };

var concat    = a.Concat(b);    // [1, 2, 3, 4, 3, 4, 5, 6]
var union     = a.Union(b);     // [1, 2, 3, 4, 5, 6]
var intersect = a.Intersect(b); // [3, 4]
var except    = a.Except(b);    // [1, 2]

// .NET 6+ "By" variants: use a key selector for comparisons
// (assumes contractors has the same element type as employees)
var uniqueByEmail = employees.UnionBy(contractors, e => e.Email);
var onlyInFirst  = employees.ExceptBy(
    contractors.Select(c => c.Email), e => e.Email
);
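
Here is a self-contained sketch of the "By" variants (.NET 6 or later). The tuples and email addresses are hypothetical sample data; both sequences deliberately share one element type, which UnionBy requires.

```csharp
using System;
using System.Linq;

// Hypothetical data: both arrays hold (Name, Email) tuples
var employees = new[]
{
    (Name: "Alice", Email: "alice@example.com"),
    (Name: "Bob",   Email: "bob@example.com")
};
var contractors = new[]
{
    (Name: "Bob B.", Email: "bob@example.com"),
    (Name: "Carol",  Email: "carol@example.com")
};

// UnionBy: distinct by key; the first occurrence of each key wins
var everyone = employees.UnionBy(contractors, p => p.Email).ToList();

// ExceptBy and IntersectBy take a sequence of KEYS as the second argument
var contractorEmails = contractors.Select(c => c.Email);
var employeesOnly = employees.ExceptBy(contractorEmails, e => e.Email).ToList();
var inBoth = employees.IntersectBy(contractorEmails, e => e.Email).ToList();

Console.WriteLine(string.Join(", ", everyone.Select(p => p.Name)));      // Alice, Bob, Carol
Console.WriteLine(string.Join(", ", employeesOnly.Select(p => p.Name))); // Alice
Console.WriteLine(string.Join(", ", inBoth.Select(p => p.Name)));        // Bob
```

Note that the result keeps the employee record "Bob" rather than the contractor record "Bob B.", because elements from the first sequence are seen first.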

Bonus: Join and GroupJoin

LINQ can join two sequences, similar to SQL JOINs. Join performs an inner join. GroupJoin pairs each outer element with the collection of its matching inner elements (possibly empty), which makes it the building block for a left outer join.

var departments = new[]
{
    new { Id = 1, Name = "Engineering" },
    new { Id = 2, Name = "Marketing" }
};

var employees = new[]
{
    new { Name = "Alice", DeptId = 1 },
    new { Name = "Bob", DeptId = 2 },
    new { Name = "Charlie", DeptId = 1 }
};

// Inner Join: employees with their department names
var joined = departments.Join(
    employees,             // inner sequence
    d => d.Id,             // outer key selector
    e => e.DeptId,         // inner key selector
    (d, e) => new          // result selector
    {
        Employee = e.Name,
        Department = d.Name
    }
);

// Query syntax is often cleaner for joins:
var joined2 = from d in departments
              join e in employees on d.Id equals e.DeptId
              select new { e.Name, Dept = d.Name };
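
GroupJoin is described above but not shown, so here is a minimal sketch. It reuses the shape of the sample data and adds a hypothetical "Legal" department with no employees, so the empty-group case is visible. Flattening the groups with DefaultIfEmpty is the usual way to obtain a left outer join.

```csharp
using System;
using System.Linq;

var departments = new[]
{
    new { Id = 1, Name = "Engineering" },
    new { Id = 2, Name = "Marketing" },
    new { Id = 3, Name = "Legal" }       // hypothetical: has no employees
};

var employees = new[]
{
    new { Name = "Alice", DeptId = 1 },
    new { Name = "Bob", DeptId = 2 },
    new { Name = "Charlie", DeptId = 1 }
};

// GroupJoin: one result per department, with its (possibly empty) staff list
var grouped = departments.GroupJoin(
    employees,
    d => d.Id,
    e => e.DeptId,
    (d, emps) => new { Department = d.Name, Staff = emps.Select(e => e.Name).ToList() }
).ToList();

// Left outer join: flatten each group, substituting null when it is empty
var leftOuter = departments
    .GroupJoin(employees, d => d.Id, e => e.DeptId, (d, emps) => new { d, emps })
    .SelectMany(
        x => x.emps.DefaultIfEmpty(),
        (x, e) => new { Department = x.d.Name, Employee = e?.Name ?? "(none)" })
    .ToList();

foreach (var row in leftOuter)
    Console.WriteLine($"{row.Department}: {row.Employee}");
// Engineering: Alice
// Engineering: Charlie
// Marketing: Bob
// Legal: (none)
```

An inner Join would drop the Legal row entirely, since it has no matching employee.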

Chaining It All Together

The real power of LINQ comes from chaining multiple operators into a readable pipeline. Here's a realistic example that combines most of the operators from this lesson:

// Real-world pipeline: Build a department salary report
var report = company.Departments
    .SelectMany(d => d.Employees, (dept, emp) => new
    {
        dept.DepartmentName,
        emp.Name,
        emp.Salary,
        emp.JobTitle
    })
    .Where(x => x.Salary > 50000)
    .GroupBy(x => x.DepartmentName)
    .Select(g => new
    {
        Department = g.Key,
        HeadCount = g.Count(),
        AverageSalary = g.Average(x => x.Salary),
        TotalPayroll = g.Sum(x => x.Salary),
        TopEarner = g.OrderByDescending(x => x.Salary).First().Name
    })
    .OrderByDescending(x => x.TotalPayroll);

Coding Challenge

Create a class Department containing a Name property and a List<Employee>. The Employee class should have Name, JobTitle, and Salary properties. Create a list of several departments with employees. Then:

  1. Use SelectMany to get a flat list of all employees across all departments.
  2. Use Where to filter those earning over $50,000.
  3. Use GroupBy to group these high earners back by their JobTitle.

Solution

public class Employee
{
    public string Name { get; set; }
    public string JobTitle { get; set; }
    public decimal Salary { get; set; }
}

public class Department
{
    public string Name { get; set; }
    public List<Employee> Employees { get; set; } = new();
}

// Setup test data
var departments = new List<Department>
{
    new Department
    {
        Name = "Engineering",
        Employees = new List<Employee>
        {
            new Employee { Name = "Alice",   JobTitle = "Senior Dev",   Salary = 95000m },
            new Employee { Name = "Charlie", JobTitle = "Junior Dev",   Salary = 45000m },
            new Employee { Name = "Eve",     JobTitle = "Senior Dev",   Salary = 110000m }
        }
    },
    new Department
    {
        Name = "Marketing",
        Employees = new List<Employee>
        {
            new Employee { Name = "Bob",   JobTitle = "Manager",      Salary = 72000m },
            new Employee { Name = "Diana", JobTitle = "Junior Dev",   Salary = 48000m },
            new Employee { Name = "Frank", JobTitle = "Manager",      Salary = 85000m }
        }
    }
};

// The LINQ pipeline
var highEarnersByTitle = departments
    .SelectMany(d => d.Employees)          // Step 1: Flatten
    .Where(e => e.Salary > 50000m)          // Step 2: Filter
    .GroupBy(e => e.JobTitle);              // Step 3: Group

// Print results
foreach (var group in highEarnersByTitle)
{
    Console.WriteLine($"\n--- {group.Key} ---");
    foreach (var emp in group)
    {
        Console.WriteLine($"  {emp.Name}: ${emp.Salary:N0}");
    }
}

// Output (each group header is preceded by a blank line, omitted here):
// --- Senior Dev ---
//   Alice: $95,000
//   Eve: $110,000
// --- Manager ---
//   Bob: $72,000
//   Frank: $85,000