Lesson 4: Advanced LINQ Transformations & Set Operations
Learn to project, flatten, group, and aggregate complex data using LINQ's most powerful operators.
Select — Projection (One-to-One)
Select transforms each element into exactly one new element. It's the LINQ equivalent of a map in functional programming. The shape of the output can be completely different from the input:
    var employees = GetEmployees();

    // Project into an anonymous type (common for DTOs)
    var summaries = employees.Select(e => new
    {
        FullName = $"{e.FirstName} {e.LastName}",
        AnnualPay = e.MonthlySalary * 12,
        IsManager = e.DirectReports.Count > 0
    });

    // Select with index overload: the lambda receives the index as well
    var names = new[] { "Alice", "Bob", "Charlie" };
    var indexed = names.Select((name, i) => $"{i + 1}. {name}");
    // "1. Alice", "2. Bob", "3. Charlie"
Select, Where, SkipWhile, and TakeWhile all have an overload whose lambda receives the element's index as a second argument. It is handy and often overlooked.
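As a minimal sketch of the index-aware overloads on the other operators, the snippet below applies Where, TakeWhile, and SkipWhile to a small array. The names array is sample data invented for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical sample data
var names = new[] { "Alice", "Bob", "Charlie", "Diana", "Eve" };

// Where with index: keep elements at even positions (0, 2, 4)
var everyOther = names.Where((name, i) => i % 2 == 0).ToList();

// TakeWhile with index: take elements while the index is below 3
var firstThree = names.TakeWhile((name, i) => i < 3).ToList();

// SkipWhile with index: skip elements while the index is below 3
var rest = names.SkipWhile((name, i) => i < 3).ToList();

Console.WriteLine(string.Join(", ", everyOther)); // Alice, Charlie, Eve
Console.WriteLine(string.Join(", ", firstThree)); // Alice, Bob, Charlie
Console.WriteLine(string.Join(", ", rest));       // Diana, Eve
```

The index is always the position in the source sequence of that operator, so chaining a filter before an indexed call changes which positions the lambda sees.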
SelectMany — Flattening (One-to-Many)
SelectMany is one of the most powerful — and most confusing — LINQ operators. Where Select produces one output per input, SelectMany produces zero, one, or many outputs per input, then flattens them all into a single sequence.
The Problem SelectMany Solves
    public class Order
    {
        public int OrderId { get; set; }
        public List<string> Items { get; set; }
    }

    List<Order> orders = new List<Order>
    {
        new Order { OrderId = 1, Items = new List<string> { "Laptop", "Mouse" } },
        new Order { OrderId = 2, Items = new List<string> { "Keyboard" } },
        new Order { OrderId = 3, Items = new List<string> { "Monitor", "Cable", "Stand" } }
    };

    // Select gives you a LIST OF LISTS:
    var nested = orders.Select(o => o.Items);
    // Type: IEnumerable<List<string>>
    // [["Laptop","Mouse"], ["Keyboard"], ["Monitor","Cable","Stand"]]

    // SelectMany FLATTENS into a single sequence:
    var allItems = orders.SelectMany(o => o.Items);
    // Type: IEnumerable<string>
    // ["Laptop","Mouse","Keyboard","Monitor","Cable","Stand"]
SelectMany with Result Selector
There's a powerful overload that gives you access to both the parent element and each child element, allowing you to combine them:
    // Get each item paired with its order ID
    var itemDetails = orders.SelectMany(
        order => order.Items,   // collection selector
        (order, item) => new    // result selector
        {
            order.OrderId,
            ItemName = item
        }
    );
    // [{OrderId=1, ItemName="Laptop"}, {OrderId=1, ItemName="Mouse"}, ...]
Visualizing Select vs. SelectMany
    Select(o => o.Items):
      Order 1 ──→ [ Laptop, Mouse ]
      Order 2 ──→ [ Keyboard ]
      Order 3 ──→ [ Monitor, Cable, Stand ]
      Result: IEnumerable<List<string>> (nested: each element is a list)

    SelectMany(o => o.Items):
      Order 1 ──→ Laptop, Mouse
      Order 2 ──→ Keyboard
      Order 3 ──→ Monitor, Cable, Stand
      Result: IEnumerable<string> (flat: all items in one sequence)
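In query syntax, the same flattening is written with two from clauses, which the compiler translates into a SelectMany call with a result selector. The sketch below uses tuples instead of the Order class to stay short, and it includes an order with no items (an assumption added for illustration) to show the "zero outputs" case.

```csharp
using System;
using System.Linq;

// Hypothetical sample data, modeled as tuples for brevity
var orders = new[]
{
    (OrderId: 1, Items: new[] { "Laptop", "Mouse" }),
    (OrderId: 2, Items: new[] { "Keyboard" }),
    (OrderId: 3, Items: Array.Empty<string>()) // contributes nothing to the result
};

// Query syntax: two 'from' clauses become SelectMany
var pairs = (from order in orders
             from item in order.Items
             select $"{order.OrderId}:{item}").ToList();

// Equivalent method syntax using the result-selector overload
var pairs2 = orders
    .SelectMany(o => o.Items, (o, item) => $"{o.OrderId}:{item}")
    .ToList();

Console.WriteLine(string.Join(", ", pairs)); // 1:Laptop, 1:Mouse, 2:Keyboard
```

Order 3 has an empty collection, so it simply disappears from the flattened output.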
GroupBy — Organizing Data by Key
GroupBy partitions a sequence into groups based on a key you define. Each group is an IGrouping<TKey, TElement>, which is itself an IEnumerable<TElement> with a .Key property.
    // Assumes a type such as: record Employee(string Name, string Department, decimal Salary);
    var employees = new List<Employee>
    {
        new("Alice", "Engineering", 95000),
        new("Bob", "Marketing", 72000),
        new("Charlie", "Engineering", 88000),
        new("Diana", "Marketing", 68000),
        new("Eve", "Engineering", 110000)
    };

    // Basic GroupBy
    var byDept = employees.GroupBy(e => e.Department);

    foreach (var group in byDept)
    {
        Console.WriteLine($"--- {group.Key} ---");
        foreach (var emp in group)
        {
            Console.WriteLine($"  {emp.Name}: ${emp.Salary:N0}");
        }
    }
GroupBy with Element and Result Selectors
    // Element selector: choose what goes INTO each group
    var namesByDept = employees.GroupBy(
        e => e.Department,  // Key selector
        e => e.Name         // Element selector: each group contains only names
    );

    // Result selector: transform each group into a final shape
    var deptStats = employees.GroupBy(
        e => e.Department,
        (dept, emps) => new  // Result selector
        {
            Department = dept,
            Count = emps.Count(),
            AvgSalary = emps.Average(e => e.Salary),
            TopEarner = emps.OrderByDescending(e => e.Salary).First().Name
        }
    );
GroupBy is deferred — it doesn't execute until iterated. ToLookup is the immediate-execution equivalent that creates an ILookup<TKey, TElement> (like a read-only dictionary of lists). ToDictionary creates a Dictionary and throws on duplicate keys, whereas ToLookup naturally handles multiple values per key.
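The difference between ToLookup and ToDictionary can be sketched as follows. The staff array is invented sample data, and tuples stand in for the Employee type.

```csharp
using System;
using System.Linq;

// Hypothetical sample data
var staff = new[]
{
    (Name: "Alice", Department: "Engineering"),
    (Name: "Bob", Department: "Marketing"),
    (Name: "Charlie", Department: "Engineering")
};

// ToLookup executes immediately and accepts repeated keys
var lookup = staff.ToLookup(s => s.Department, s => s.Name);
Console.WriteLine(string.Join(", ", lookup["Engineering"])); // Alice, Charlie

// Indexing a lookup with a missing key yields an empty sequence, not an exception
Console.WriteLine(lookup["Sales"].Count()); // 0

// ToDictionary throws ArgumentException because "Engineering" appears twice
bool threw = false;
try
{
    _ = staff.ToDictionary(s => s.Department, s => s.Name);
}
catch (ArgumentException)
{
    threw = true;
}
Console.WriteLine(threw); // True
```

A lookup is a reasonable choice when the grouped data will be queried by key several times, since the grouping work is done once up front.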
Aggregate — Custom Folding
Aggregate reduces a sequence to a single value using a custom accumulator function. It's the most flexible aggregation method: Sum, Count, Min, and Max can each be expressed as an Aggregate call, although the built-in methods are clearer when they fit.
The Three Overloads
    // Overload 3 below needs: using System.Text;
    var words = new[] { "LINQ", "is", "incredibly", "powerful" };

    // Overload 1: Accumulator only (first element is the initial seed;
    // throws InvalidOperationException on an empty sequence)
    string sentence = words.Aggregate((acc, w) => acc + ", " + w);
    // "LINQ, is, incredibly, powerful"

    // Overload 2: Seed + Accumulator
    int totalLength = words.Aggregate(
        0,                          // seed
        (acc, w) => acc + w.Length  // accumulator
    );
    // 0 + 4 + 2 + 10 + 8 = 24

    // Overload 3: Seed + Accumulator + Result Selector
    string result = words.Aggregate(
        new StringBuilder(),                    // seed
        (sb, w) => sb.Append(w).Append(' '),    // accumulator
        sb => sb.ToString().Trim()              // result selector
    );
    // "LINQ is incredibly powerful"
Warning: string concatenation inside Aggregate (acc + ", " + w) creates a new string object on every iteration, which makes the total work O(n²) in the length of the output. For joining strings, prefer string.Join(", ", words): it is the idiomatic solution and avoids the repeated intermediate allocations. Use a StringBuilder seed (overload 3) when the string-building logic is more complex than simple concatenation. Otherwise, reserve Aggregate for non-string reductions such as numeric computations or building custom objects.
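As an illustration of a non-string reduction, the sketch below computes the minimum, maximum, and total of a sequence in a single pass by using a tuple as the seed. The salaries array is sample data; the figures reuse the salaries from the GroupBy example.

```csharp
using System;
using System.Linq;

var salaries = new[] { 95000m, 72000m, 88000m, 68000m, 110000m };

// The seed is a tuple holding the running minimum, maximum, and total
var stats = salaries.Aggregate(
    (Min: decimal.MaxValue, Max: decimal.MinValue, Total: 0m),
    (acc, s) => (Math.Min(acc.Min, s), Math.Max(acc.Max, s), acc.Total + s));

Console.WriteLine($"{stats.Min} {stats.Max} {stats.Total}"); // 68000 110000 433000
```

Calling Min, Max, and Sum separately would enumerate the source three times, which matters mainly when the source is expensive to enumerate (for example, a deferred query).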
Set Operations: Union, Concat, Intersect, Except
LINQ provides mathematical set operations as extension methods. Understanding the difference between Union and Concat is a common interview question:
| Method | Duplicates? | Description |
|---|---|---|
| Concat | Keeps all | Appends second sequence to first. Like SQL UNION ALL. |
| Union | Removes | Combines and deduplicates. Like SQL UNION. |
| Intersect | Removes | Elements present in both sequences. |
| Except | Removes | Elements in the first but not the second. |
    int[] a = { 1, 2, 3, 4 };
    int[] b = { 3, 4, 5, 6 };

    var concat = a.Concat(b);       // [1, 2, 3, 4, 3, 4, 5, 6]
    var union = a.Union(b);         // [1, 2, 3, 4, 5, 6]
    var intersect = a.Intersect(b); // [3, 4]
    var except = a.Except(b);       // [1, 2]

    // .NET 6+ "By" variants: use a key selector for comparisons
    // (UnionBy requires both sequences to have the same element type)
    var uniqueByEmail = employees.UnionBy(contractors, e => e.Email);

    // ExceptBy takes the KEYS to exclude as its second sequence
    var onlyInFirst = employees.ExceptBy(
        contractors.Select(c => c.Email),
        e => e.Email
    );
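What counts as a duplicate depends on the equality comparer. The sketch below, using made-up email addresses, shows how passing StringComparer.OrdinalIgnoreCase changes the result of Union, and that Except deduplicates its own output as well.

```csharp
using System;
using System.Linq;

// Hypothetical sample data
var first = new[] { "alice@example.com", "BOB@example.com", "alice@example.com" };
var second = new[] { "bob@example.com", "carol@example.com" };

// Default comparer is case-sensitive: "BOB@..." and "bob@..." are different values
var ordinal = first.Union(second).ToList();
Console.WriteLine(ordinal.Count); // 4

// With a case-insensitive comparer they collapse; the first occurrence wins
var ignoreCase = first.Union(second, StringComparer.OrdinalIgnoreCase).ToList();
Console.WriteLine(string.Join(", ", ignoreCase));
// alice@example.com, BOB@example.com, carol@example.com

// Except returns distinct elements, so the repeated "alice@..." appears once
var onlyFirst = first.Except(second, StringComparer.OrdinalIgnoreCase).ToList();
Console.WriteLine(string.Join(", ", onlyFirst)); // alice@example.com
```

For custom types, the same overloads accept any IEqualityComparer implementation; on .NET 6 and later, the "By" variants are often simpler.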
Bonus: Join and GroupJoin
LINQ can join two sequences, similar to SQL JOINs. Join performs an inner join. GroupJoin pairs each element of the outer sequence with a collection of its matching inner elements (empty when there are none), which makes it the building block for a left outer join.
    var departments = new[]
    {
        new { Id = 1, Name = "Engineering" },
        new { Id = 2, Name = "Marketing" }
    };

    var employees = new[]
    {
        new { Name = "Alice", DeptId = 1 },
        new { Name = "Bob", DeptId = 2 },
        new { Name = "Charlie", DeptId = 1 }
    };

    // Inner Join: employees with their department names
    var joined = departments.Join(
        employees,      // inner sequence
        d => d.Id,      // outer key selector
        e => e.DeptId,  // inner key selector
        (d, e) => new   // result selector
        {
            Employee = e.Name,
            Department = d.Name
        }
    );

    // Query syntax is often cleaner for joins:
    var joined2 = from d in departments
                  join e in employees on d.Id equals e.DeptId
                  select new { e.Name, Dept = d.Name };
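GroupJoin itself can be sketched as below, along with the common pattern of flattening it with DefaultIfEmpty to get a left outer join. The "Legal" department is an addition to the sample data so that one department has no employees.

```csharp
using System;
using System.Linq;

var departments = new[]
{
    new { Id = 1, Name = "Engineering" },
    new { Id = 2, Name = "Marketing" },
    new { Id = 3, Name = "Legal" } // hypothetical department with no employees
};

var employees = new[]
{
    new { Name = "Alice", DeptId = 1 },
    new { Name = "Bob", DeptId = 2 },
    new { Name = "Charlie", DeptId = 1 }
};

// GroupJoin: one result per department, each with its (possibly empty) staff list
var grouped = departments.GroupJoin(
    employees,
    d => d.Id,
    e => e.DeptId,
    (d, emps) => new { Department = d.Name, Staff = emps.Select(e => e.Name).ToList() }
).ToList();

// Left outer join: 'into' groups the matches, DefaultIfEmpty yields null when there are none
var leftJoin = (from d in departments
                join e in employees on d.Id equals e.DeptId into emps
                from emp in emps.DefaultIfEmpty()
                select new { Department = d.Name, Employee = emp?.Name ?? "(none)" }).ToList();

foreach (var row in leftJoin)
    Console.WriteLine($"{row.Department}: {row.Employee}");
// Engineering: Alice
// Engineering: Charlie
// Marketing: Bob
// Legal: (none)
```

An inner Join would drop the Legal department entirely, since it has no matching employees.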
Chaining It All Together
The real power of LINQ comes from chaining multiple operators into a readable pipeline. Here's a realistic example that combines most of the operators from this lesson:
    // Real-world pipeline: Build a department salary report
    var report = company.Departments
        .SelectMany(d => d.Employees, (dept, emp) => new
        {
            dept.DepartmentName,
            emp.Name,
            emp.Salary,
            emp.JobTitle
        })
        .Where(x => x.Salary > 50000)
        .GroupBy(x => x.DepartmentName)
        .Select(g => new
        {
            Department = g.Key,
            HeadCount = g.Count(),
            AverageSalary = g.Average(x => x.Salary),
            TotalPayroll = g.Sum(x => x.Salary),
            TopEarner = g.OrderByDescending(x => x.Salary).First().Name
        })
        .OrderByDescending(x => x.TotalPayroll);
Coding Challenge
Create a class Department containing a Name property and a List<Employee>. The Employee class should have Name, JobTitle, and Salary properties. Create a list of several departments with employees. Then:
- Use SelectMany to get a flat list of all employees across all departments.
- Use Where to filter those earning over $50,000.
- Use GroupBy to group these high-earners back by their JobTitle.
Solution
    // Note: in a single-file program with top-level statements,
    // these class declarations must come after the statements.
    public class Employee
    {
        public string Name { get; set; }
        public string JobTitle { get; set; }
        public decimal Salary { get; set; }
    }

    public class Department
    {
        public string Name { get; set; }
        public List<Employee> Employees { get; set; } = new();
    }

    // Setup test data
    var departments = new List<Department>
    {
        new Department
        {
            Name = "Engineering",
            Employees = new List<Employee>
            {
                new Employee { Name = "Alice", JobTitle = "Senior Dev", Salary = 95000m },
                new Employee { Name = "Charlie", JobTitle = "Junior Dev", Salary = 45000m },
                new Employee { Name = "Eve", JobTitle = "Senior Dev", Salary = 110000m }
            }
        },
        new Department
        {
            Name = "Marketing",
            Employees = new List<Employee>
            {
                new Employee { Name = "Bob", JobTitle = "Manager", Salary = 72000m },
                new Employee { Name = "Diana", JobTitle = "Junior Dev", Salary = 48000m },
                new Employee { Name = "Frank", JobTitle = "Manager", Salary = 85000m }
            }
        }
    };

    // The LINQ pipeline
    var highEarnersByTitle = departments
        .SelectMany(d => d.Employees)   // Step 1: Flatten
        .Where(e => e.Salary > 50000m)  // Step 2: Filter
        .GroupBy(e => e.JobTitle);      // Step 3: Group

    // Print results
    foreach (var group in highEarnersByTitle)
    {
        Console.WriteLine($"\n--- {group.Key} ---");
        foreach (var emp in group)
        {
            Console.WriteLine($"  {emp.Name}: ${emp.Salary:N0}");
        }
    }

    // Output:
    // --- Senior Dev ---
    //   Alice: $95,000
    //   Eve: $110,000
    // --- Manager ---
    //   Bob: $72,000
    //   Frank: $85,000