[LeetCode] 192. Word Frequency

Problem

Write a bash script to calculate the frequency of each word in a text file words.txt.

For simplicity's sake, you may assume:

  • words.txt contains only lowercase characters and space ' ' characters.
  • Each word must consist of lowercase characters only.
  • Words are separated by one or more whitespace characters.

Example:

Assume that words.txt has the following content:

the day is sunny the the
the sunny is is

Your script should output the following, sorted by descending frequency:

the 4
is 3
sunny 2
day 1

Note:

  • Don’t worry about handling ties; it is guaranteed that each word’s frequency count is unique.
  • Could you write it in one line using Unix pipes?

Explanation

  1. grep -oE '[a-z]+' words.txt prints each word of the text file on its own line: -o outputs only the matched text, and -E enables extended regular expressions.

     the
     day
     is
     sunny
     the
     the
     the
     sunny
     is
     is
    
  2. sort sorts the words alphabetically, which puts identical words on adjacent lines (uniq only merges adjacent duplicates).

     day
     is
     is
     is
     sunny
     sunny
     the
     the
     the
     the
    
  3. uniq -c collapses each run of identical adjacent lines into a single line; the -c option prefixes each line with its number of occurrences.

     1 day
     3 is
     2 sunny
     4 the
    
  4. sort -nr sorts the lines numerically (-n) in reverse order (-r), so the word with the most occurrences comes first.

     4 the
     3 is
     2 sunny
     1 day
    
  5. awk '{print $2 " " $1}' swaps the two columns, so each line reads as the word followed by its count.

     the 4
     is 3
     sunny 2
     day 1
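
The five steps above can be reproduced end to end with a short script. This is only a sketch: the temporary directory and the variable names (workdir, counts, result) are illustrative assumptions, and the sample text is the one from the problem statement.

```shell
# Recreate the example input in a scratch directory (location is an assumption).
workdir=$(mktemp -d)
printf 'the day is sunny the the\nthe sunny is is\n' > "$workdir/words.txt"

# Steps 1-3: one word per line, sorted, then counted per distinct word.
counts=$(grep -oE '[a-z]+' "$workdir/words.txt" | sort | uniq -c)

# Steps 4-5: order by count (descending), then swap count and word.
result=$(printf '%s\n' "$counts" | sort -nr | awk '{print $2 " " $1}')

printf '%s\n' "$result"
rm -rf "$workdir"
```

Storing the intermediate counts in a variable makes it easy to print and inspect the output of any single stage while experimenting.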
    

Solution

grep -oE '[a-z]+' words.txt | sort | uniq -c | sort -nr | awk '{print $2 " " $1}'
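
For comparison, here is a sketch of a different technique, not the solution above: the counting can be done in a single awk pass with an associative array, leaving only the final ordering to sort. The function name word_freq is hypothetical, and the sketch relies on the problem's assumption that words are separated by whitespace.

```shell
# word_freq FILE: print "word count" lines, most frequent first.
# Hypothetical helper name; assumes whitespace-separated lowercase words.
word_freq() {
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }        # tally every field
       END { for (w in count) print w, count[w] }' "$1" |
    sort -k2,2nr                                         # order by count, descending
}
```

Calling word_freq words.txt should then print the same four lines as the pipeline. The awk loop visits words in an unspecified order, so the trailing sort is still required.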