The Least k Numbers - Interview Question&Answer
Question: Please find out the least k numbers out of n numbers. For example, if given the 8 numbers 4, 5, 1, 6, 2, 7, 3 and 8, please return the least 4 numbers 1, 2, 3 and 4.
Analysis: The naïve solution is sort the n input numbers increasingly, and the least k numbers should be the first k numbers. Since it needs to sort, its time complexity is O(nlogn). Interviewers will ask us explore more efficient solutions.
Solution 1: O(nlogk) time efficiency, be suitable for data with huge size
A data container with capacity k is firstly created to store the least k numbers, and then a number is read out of the n input numbers at each time. If the container has less than k numbers, the number read at current round (denoted as num) is inserted into container directly. If it contains knumbers already, num cannot be inserted directly any more. However, it may replace an existing number in the container. We get the maximum number of the k numbers in the container, and compare it with num. If num is less than the maximum number, we replace the maximum number with num. Otherwise we discard num, since we already have k numbers in the container which are all less than num and it cannot be one of the least k numbers.
Three steps may be required when a number is read and the container is full: The first step is to find the maximum number, secondly we may delete the maximum number, and at last we may insert a new number. The second and third steps are optional, which depend on whether the number read at current round is greater than the maximum number in container or not. If we implement the data container as a binary tree, it costs O(logk) time for these three steps. Therefore, the overall time complexity is O(nlogk) for n input numbers.
We have different choices for the data container. Since we need to get the maximum number out of k numbers, it intuitively might a maximum heap. In a maximum heap, its root is always greater than its children, so it costs O(1) time to get the maximum number. However, it takes O(logk)time to insert and delete a number.
We have to write a lot of code for a maximum heap, and it is too difficult in the dozens-of-minute interview. We can also implement it as a red-black tree. A red-black tree classifies its nodes into red and black categories, and assure that it is somewhat balanced based on a set of rules. Therefore, it costs O(logk) time to find, insert and delete a number. The classes set and multiset in STL are all based on red-black trees. We may use data containers in STL directly if our interviewers are not against it. The following sample code is based on the multiset in STL:
typedef multiset<int, greater<int> > intSet;
typedef multiset<int, greater<int> >::iterator setIterator;
void GetLeastNumbers(const vector<int>& data, intSet& leastNumbers, int k)
{
leastNumbers.clear();
if(k < 1 || data.size() < k)
return;
vector<int>::const_iterator iter = data.begin();
for(; iter != data.end(); ++ iter)
{
if((leastNumbers.size()) < k)
leastNumbers.insert(*iter);
else
{
setIterator iterGreatest = leastNumbers.begin();
if(*iter < *(leastNumbers.begin()))
{
leastNumbers.erase(iterGreatest);
leastNumbers.insert(*iter);
}
}
}
}
Solution 2: O(n) time efficiency, be suitable only when we can reorder the input
We can also utilize the function Partition in quick sort to solve this problem with a hypothesis. It assumes that n input numbers are contained in an array. If it takes the k-th number as a pilot to partition the input array, all of numbers less than the k-th number should be at the left side and other greater ones should be at the right side. The k numbers at the left side are the least knumbers after the partition. We can develop the following code according to this solution:
void GetLeastNumbers(int* input, int n, int* output, int k)
{
if(input == NULL || output == NULL || k > n || n <= 0 || k <= 0)
return;
int start = 0;
int end = n - 1;
int index = Partition(input, n, start, end);
while(index != k - 1)
{
if(index > k - 1)
{
end = index - 1;
index = Partition(input, n, start, end);
}
else
{
start = index + 1;
index = Partition(input, n, start, end);
}
}
for(int i = 0; i < k; ++i)
output[i] = input[i];
}
Comparison between two solutions
The second solution based on the function Partition costs only O(n) time, so it is more efficient than the first one. However, it has two obvious limitations: One limitation is that it needs to load all input numbers into an array, and the other is that we have to reorder the input numbers.
Even though the first takes more time, the second solution does have the two limitations as the first one. It is not required to reorder the input numbers (data in the code above). We read a number from data at each round, and all write operations are taken in the containerleastNumbers. It does not require loading all input number into memory at one time, so it is suitable for huge-size data. Supposing our interview asks us get the least k numbers from a huge-size input. Obviously we cannot load all data with huge size into limited memory at one time. We can read a number from auxiliary space (such as disk) at each round with the first solution, and determine whether we need to insert it into the container leastNumbers. It works once memory can accommodate leastNumbers, so it is especially works when n is huge and k is small.
The characters of these two solutions can be summarized in Table 1:
First Solution
|
Second Solution
| |
Time complexity
|
O(n*logk)
|
O(n)
|
Reorder input numbers?
|
No
|
Yes
|
Suitable for huge-size data?
|
Yes
|
No
|
Table 1: Pros and cons of two solutions
Since each solution has its own pros and cons, candidates had better to ask interviews for more requirements and details to choose the most suitable solution, including the input data size and whether it is allowed to reorder the input numbers.
Comments
Post a Comment