https://app.laicode.io/app/problem/67
Given a composition with different kinds of words, return a list of the top K most frequent words in the composition.
Assumptions
- the composition is not null and is not guaranteed to be sorted
- K >= 1 and K could be larger than the number of distinct words in the composition, in this case, just return all the distinct words
Return
- a list of words ordered from most frequent one to least frequent one (the list could be of size K or smaller than K)
Examples
- Composition = ["a", "a", "b", "b", "b", "b", "c", "c", "c", "d"], top 2 frequent words are ["b", "c"]
- Composition = ["a", "a", "b", "b", "b", "b", "c", "c", "c", "d"], top 4 frequent words are ["b", "c", "a", "d"]
- Composition = ["a", "a", "b", "b", "b", "b", "c", "c", "c", "d"], top 5 frequent words are ["b", "c", "a", "d"]
Medium
Hashtable
Heap
Partition
Sort
The input string array should not be null or empty and the value of k should be a positive integer.
HashMap + Min Heap
- Iterate over the entire String array, one string at a time, counting the number of occurrence of each string. Store the <word, frequency> relationship in a HashMap
- Using a min heap of size k, for each entry in the HashMap, find the k most frequent words
public class Solution {
public String[] topKFrequent(String[] combo, int k) {
// Write your solution here
if (k < 0) {
return new String[] {};
}
if (combo == null || combo.length == 0) {
return combo;
}
// Count the frequency of each word in the input
// Use a HashMap to store the <word, frequency> relationship
Map<String, Integer> frequency = getFrequencyOfWords(combo);
// Use a min heap of size k to store the k most frequent words
Queue<Map.Entry<String, Integer>> minHeap = new PriorityQueue<>(
k, new Comparator<Map.Entry<String, Integer>>() {
@Override
public int compare(Map.Entry<String, Integer> one,
Map.Entry<String, Integer> two) {
if (one.getValue().equals(two.getValue())) {
return 0;
}
return one.getValue() < two.getValue() ? -1 : 1;
}
}
);
return getTopKWords(frequency, minHeap, k);
}
private Map<String, Integer> getFrequencyOfWords(String[] combo) {
Map<String, Integer> frequency = new HashMap<>();
for (String word : combo) {
frequency.put(word, frequency.getOrDefault(word, 0) + 1);
}
return frequency;
}
private String[] getTopKWords(Map<String, Integer> frequency,
Queue<Map.Entry<String, Integer>> minHeap,
int k) {
for (Map.Entry<String, Integer> entry : frequency.entrySet()) {
if (minHeap.size() < k) {
minHeap.offer(entry);
} else if (entry.getValue() > minHeap.peek().getValue()) {
minHeap.poll();
minHeap.offer(entry);
}
}
// K could be greater than the number of distinct words
// So the size of the result array should be determined
// by the size of the heap rather than its capacity
String[] result = new String[minHeap.size()];
for (int i = minHeap.size() - 1; i >= 0; i--) {
result[i] = minHeap.poll().getKey();
}
return result;
}
}
- Get the frequency of each word: Iterate over all n strings in the array to get the frequency map ⇒ O(n)
- Get the most k frequent words from the min heap:
- offer the first k entries to the heap ⇒ O(k)
- update the heap when the incoming entry's value is greater than the top element's value on the heap ⇒ O(log(k))
- in worst-scenario, all the rest of the strings could cause this update, (n - k) in total
- O(k + (n - k) log(k))
- In total, O(n) + O((n - k) log(k)) ⇒ O((n - k) log(k))
A map and a heap is used ⇒ O(n)