Problem
Write a bash script to calculate the frequency of each word in a text file words.txt.
For simplicity's sake, you may assume:
- words.txt contains only lowercase characters and space ' ' characters.
- Each word must consist of lowercase characters only.
- Words are separated by one or more whitespace characters.
Example:
Assume that words.txt has the following content:
    the day is sunny the the
    the sunny is is
Your script should output the following, sorted by descending frequency:
    the 4
    is 3
    sunny 2
    day 1
Note:
- Don’t worry about handling ties; it is guaranteed that each word’s frequency count is unique.
- Could you write it in one line using Unix pipes?
Explanation
- grep -oE '[a-z]+' words.txt: print each word of the text file on its own line.
    the
    day
    is
    sunny
    the
    the
    the
    sunny
    is
    is
- sort: sort the output so that identical words end up on adjacent lines.
    day
    is
    is
    is
    sunny
    sunny
    the
    the
    the
    the
- uniq -c: collapse adjacent duplicate lines; the -c option prefixes each word with its number of occurrences.
    1 day
    3 is
    2 sunny
    4 the
- sort -nr: sort numerically (-n) in descending order (-r) by the number of occurrences.
    4 the
    3 is
    2 sunny
    1 day
- awk '{print $2 " " $1}': print each line with the count and the word swapped, so the word comes first.
    the 4
    is 3
    sunny 2
    day 1
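A few of the steps above are easy to get subtly wrong, so here is a small sketch of why they are written the way they are. The sample inputs are made up for illustration and do not come from words.txt. It checks three things: uniq only merges adjacent lines (hence the sort before it), sort needs -n to compare counts as numbers rather than as text, and awk ignores the leading padding that uniq -c puts in front of each count.

```shell
# Hypothetical three-word sample, one word per line.
words='the
day
the'

# uniq only merges adjacent equal lines, so the unsorted input yields 3 groups.
printf '%s\n' "$words" | uniq -c | wc -l

# After sort, the two "the" lines are adjacent and the input collapses to 2 groups.
printf '%s\n' "$words" | sort | uniq -c | wc -l

# Without -n, sort compares text, so "9" ranks above "10".
printf '9\n10\n2\n' | sort -r

# With -n the values are compared as numbers: 10, 9, 2.
printf '9\n10\n2\n' | sort -nr

# awk splits fields on runs of blanks and skips leading padding,
# so $1 is the count and $2 is the word.
printf '      4 the\n' | awk '{print $2 " " $1}'
```

With only a handful of words the text-versus-number distinction rarely matters, but it starts to once a count reaches two digits.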
Solution
    grep -oE '[a-z]+' words.txt | sort | uniq -c | sort -nr | awk '{print $2 " " $1}'
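To try the approach without touching a real words.txt, the five steps from the Explanation section can be chained and wrapped in a function. This is a minimal sketch: the function name word_freq and the temporary file are assumptions made for the demo, and the input is the example word list from the walkthrough.

```shell
#!/usr/bin/env bash

# word_freq is a hypothetical helper name; it takes the input file as $1.
word_freq() {
  grep -oE '[a-z]+' "$1" |        # one word per line
    sort |                        # make identical words adjacent
    uniq -c |                     # count each run of identical words
    sort -nr |                    # highest count first
    awk '{print $2 " " $1}'       # print "word count"
}

# Build a throwaway input file with the example text and run the pipeline on it.
tmp=$(mktemp)
printf 'the day is sunny the the\nthe sunny is is\n' > "$tmp"
word_freq "$tmp"
rm -f "$tmp"
```

Taking the file name as an argument keeps the pipeline identical to the one-liner while making it easy to run against other inputs.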