Every other week, the Principle Studios engineering team gets together and we challenge ourselves to accomplish a tiny task given by another team member. When we’re done, we share the results of the challenge and try to learn from each other - since there’s rarely one “right” way to accomplish even technical tasks. Sometimes these are simple, such as “write the types for this concept in TypeScript”, “use this feature in C# to accomplish this task”… and sometimes they’re a bit more open-ended.
This week, we tackled at full outer joins using functional programming:
Functional programming is a popular paradigm these days; proponents of it use the terms “correctness” and “simplicity”, but for those of us who don’t work in that methodology frequently, “simplicity” doesn’t resonate. I prefer to think of it as a paradigm that enforces many of the SOLID principles. Some examples we use every day include many (but not all) of JS’s Array.prototype functions (like
map
,filter
, andreduce
), rxjs, and C#‘s LINQ extensions. Write a program in as functional a way as you can manage that solves the following problem, given the pseudocode also provided. (Note, there’s not a “right” solution!)Given two lists of user records from two different systems, create a unified list that has both original objects, joining on the email address, which you may assume to be unique in each list. (Perform an in-memory full outer join, for those of you with a SQL background.)The pseudo-code is going to be intentionally bad to give you the idea:
List<(User1, User2)> FullOuterJoin(User1[] first, User2[] second) { List<User2> secondCopy = new(second); List<(User1?, User2?)> result = new(); foreach (var user1 in first) { bool found = false; foreach (var user2 in secondCopy) { if (user1.Email == user2.Email) { found = true; result.Add((user1, user2)); secondCopy.Remove(user2); break; } } if (!found) result.Add((user1, null)); } foreach (var user2 in secondCopy) { result.Add((null, user2)); } return result; }
The above code is easy to make a mistake (I may have made one!), somewhat difficult to read, and inefficient in more than one way. Optimize it and make it functional.
After some discussion, I provided a bit of clarification:
Okay, I didn’t explain a “full outer join” well enough yet. Given these two example lists (thanks, Tony Mishler!):
const first = [ { email: 'test@test.com', id: 1 }, { email: 'fizz@buzz.com', id: 2 }, ]; const second = [ { email: 'test@test.com', id: 1337 }, { email: 'bip@bop.com', id: 42 }, ];
I want a result list that basically tells me:
[test@test.com](mailto:test@test.com)
is infirst
with id 1 andsecond
with id 1337[fizz@buzz.com](mailto:fizz@buzz.com)
is infirst
with id 2 and missing fromsecond
[bip@bop.com](mailto:bip@bop.com)
is missing fromfirst
but insecond
with id 42My sample code provided the full objects, so your return result might look like this (but I’m really looking for the pseudo-information in my bullet points above!):
[ [{ email: 'test@test.com', id:1 }, { email:'test@test.com', id:1337 }], [{ email: 'fizz@buzz.com', id:2 }, null], [null, { email:'bip@bop.com', id:42 }], ]
Entries
This week was another busy week for our teams, but a few team members shared some excellent entries.
TypeScript entries
Tony Mishler shared a rather elegant solution, optimized to run in O(N)
time by using a Map
:
type User = {
email: string;
id: number;
};
function fullOuterJoin(first: User[], second: User[]) {
const [firstMap, secondMap] = [first, second].map(
(f) => new Map(f.map((i) => [i.email, i]))
);
return first
.map((current) => [current, secondMap.get(current.email) ?? null])
.concat(second.filter((f) => !firstMap.has(f.email)).map((f) => [null, f]));
}
Aaron White submitted a solution that implemented a basic pipe
function, but
both because of length and because it nerd-sniped me into writing a more
complete implementation of pipe
, I’m not going to share the full
implementation here, saving it instead for a future post. That said, Aaron’s
final use was fairly elegant:
const join = outerJoinPipe(inBoth, inFirstOnly, inSecondOnly);
let result = join([user1, user2, []]);
console.log('In Both: ', result[2]);
console.log('In First Only: ', result[0]);
console.log('In Second Only: ', result[1]);
Nate Grandelis also shared his first entry in our challenges, also using maps with a more imperative approach.
C#
Jordan Rhode shared a C# solution that is very similar to Tony’s solution above. I appreciate that Jordan’s implementation used different types for the two lists, keeping with the spirit of a full outer join!
using System;
using System.Collections.Generic;
using System.Linq;
public class Program
{
public static void Main()
{
var list1 = new []
{
new User1("example1@example.com", "4181c201-3d20-4361-80ea-56f9079b53cb"),
new User1("example2@example.com", "029945dc-78f7-4211-a0ca-4e47a40cf5a5"),
new User1("example3@example.com", "65823177-8392-49ed-aa87-e0cb82d7bd34"),
new User1("example5@example.com", "0285cfd0-a693-4063-a8ca-a742b4e80e60"),
new User1("example9@example.com", "699d9e2e-c7bb-4e85-9295-70ed170a3ac8"),
new User1("example10@example.com", "2ce1ad3c-40ea-4f89-a9fe-d290ebf77e0a"),
};
var list2 = new []
{
new User2("example2@example.com", "6d16e6a3-82f8-47a8-8d11-1016e1bd1097"),
new User2("example4@example.com", "e6fa658f-6340-4c7a-9b28-f148f3aa60b9"),
new User2("example5@example.com", "6b988303-3e96-470d-940e-d741713049df"),
new User2("example7@example.com", "6a868c31-ccfa-4b8e-915a-9be383c8e08e"),
new User2("example10@example.com", "8b94de62-9da7-4fc2-9573-cdddb9958389"),
};
var results = FullOuterJoin(list1, list2);
foreach (var result in results)
Console.WriteLine($"Left: {result.Item1}, Right: {result.Item2}");
}
public static IEnumerable<(User1?, User2?)> FullOuterJoin(User1[] left, User2[] right) =>
left.Select(l => ((User1?)l, right.FirstOrDefault(r => r.email == l.email)))
.Concat(right.Where(l2 => !left.Any(l1 => l1.email == l2.email))
.Select(r => ((User1?)null, (User2?)r)));
public record User1(string email, string Id);
public record User2(string email, string Id);
}
Josh Berres also shared an elegant C# solution that also takes it down to O(N)
performance using the ToLookup
function (which I’d never seen before) from Linq:
var keys = userInfosFirst.Select(x => x.Email)
.Union(userInfosSecond.Select(x => x.Email));
var lookupFirst = userInfosFirst.ToLookup(x => x.Email);
var lookupSecond = userInfosSecond.ToLookup(x => x.Email);
var results = keys.Select(x => new { x = lookupFirst[x].DefaultIfEmpty(), y = lookupSecond[x].DefaultIfEmpty() });
Thanks to all our participants!
Thanks to our participants (Tony Mishler, Aaron White, Jordan Rhode, Josh
Berres, and Nate Grandelis) on this challenge. Thanks for reading, and I hope
everyone learned something - I know I’ll be using that ToLookup
in the near
future!