Tuesday, June 16, 2009

The Intersection of a List of Strings in O(N)

Given k lists of strings the intersection of the lists is the set of strings that appear in every list.

The intersection is useful for, among other things, displaying options or values common to multiple selections. An action can then be performed on the common item in each list. For example, to provide a way to delete keywords that are common to objects that each maintain their own list of keywords, the intersection of those keyword lists is needed.

The naive algorithm is quadratic (O(n^2)).

To do this in O(n) use a Dictionary where the string is used as the key and int is a count of the number of times the key has been encountered.

Go through each list of strings. If a string hasn't been added to the Dictionary then add it with count=1. If the string already exists in the Dictionary then increment its counter. Finally iterate over every Dictionary entry accumulating only those keys with counter values equal to k. The keys with counter values equal to k form the intersection of the string lists.

As long as the underlying hashing function executes in constant time then this algorithm will execute in O(n).

No comments :

Post a Comment