AWK for Data Analysis: Practical Examples

October 16, 2024


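In this post, we apply what we’ve learned in previous lessons to common data analysis tasks: calculating values from each row, extracting fields from a log file, summarizing a column, and filtering records. Each example shows how AWK can pull useful information out of a plain text file.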

Example 1: Analyzing Sales Data

Let’s say we have a sales data file named sales.txt with the following structure:

Product,Quantity,Price
Apple,10,0.5
Banana,20,0.3
Cherry,15,0.6

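We want to calculate the revenue for each product (Quantity multiplied by Price). The first line of the file is a header, so we skip it. We can do this using AWK as follows:

awk -F, 'NR > 1 {revenue = $2 * $3; print $1, revenue}' sales.txt

In this command:

  • -F, sets the field separator to a comma.
  • NR > 1 skips the header line (NR is the number of the current line).
  • $2 refers to the Quantity field.
  • $3 refers to the Price field.
  • We calculate the revenue and print the product name along with it.

This command will output:

Apple 5
Banana 6
Cherry 9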
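The command above prints one line per row, which matches a per-product total only while each product appears once. If sales.txt could contain several rows for the same product, an associative array keyed by product name can accumulate the totals. The sketch below assumes a POSIX shell with mktemp; the helper name revenue_by_product and the extra Apple row are hypothetical, added only so that the totals differ from the per-row values.

```shell
#!/bin/sh
# Work in a throwaway directory so no real sales.txt is overwritten.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical helper: total revenue per product, header skipped.
revenue_by_product() {
  # NR > 1 skips the header; total[] is keyed by product name.
  # for (p in total) visits keys in an unspecified order, hence sort.
  awk -F, 'NR > 1 { total[$1] += $2 * $3 }
    END { for (p in total) print p, total[p] }' "$1" | sort
}

# The rows from sales.txt plus one hypothetical extra Apple row.
cat > "$tmpdir/sales.txt" <<'EOF'
Product,Quantity,Price
Apple,10,0.5
Banana,20,0.3
Cherry,15,0.6
Apple,4,0.5
EOF

revenue_by_product "$tmpdir/sales.txt"
```

With this sample data the script should print Apple 7, Banana 6 and Cherry 9, one per line. The pipe through sort is there because AWK does not define the order in which a for-in loop visits array keys.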

Example 2: Extracting Specific Data

Suppose we have a log file named server.log that contains the following entries:

192.168.1.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326
192.168.1.2 - - [10/Oct/2000:13:56:01 -0700] "POST /submit HTTP/1.1" 404 512

We want to extract the IP addresses and the status codes. We can achieve this with the following AWK command:

awk '{print $1, $9}' server.log

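AWK splits each line on whitespace by default, so $1 is the IP address. Because the bracketed timestamp and the quoted request both contain spaces, the status code lands in $9. This command will output: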

192.168.1.1 200
192.168.1.2 404
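Field positions also make it easy to filter a log on a value. The sketch below, assuming every line keeps the same layout as the sample, prints only requests whose status code is 400 or higher and counts them in an END block. The helper name failed_requests is hypothetical, and the script writes the two sample lines to a temporary file rather than reading a real server.log.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical helper: client IP and status of failed requests,
# followed by how many there were.
failed_requests() {
  # $9 is the status code only while each line keeps the layout above;
  # a request string with extra spaces would shift the fields.
  awk '$9 >= 400 { print $1, $9; errors++ }
    END { print "errors:", errors + 0 }' "$1"
}

cat > "$tmpdir/server.log" <<'EOF'
192.168.1.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326
192.168.1.2 - - [10/Oct/2000:13:56:01 -0700] "POST /submit HTTP/1.1" 404 512
EOF

failed_requests "$tmpdir/server.log"
```

For the two sample lines this should print 192.168.1.2 404 followed by errors: 1. The + 0 in the END block makes AWK print 0, rather than an empty string, when no line matches.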

Example 3: Summarizing Data

Consider a file named grades.txt that contains student grades:

John,85
Jane,90
Doe,75

We want to calculate the average grade. We can do this with the following AWK script:

awk -F, '{sum += $2; count++} END {print "Average Grade:", sum/count}' grades.txt

In this command:

  • sum accumulates the total of the grades.
  • count keeps track of the number of entries.
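  • In the END block, which runs after the last line has been read, we divide sum by count and print the average.

This command will output:

Average Grade: 83.3333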
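The one-liner divides by count even when the file is empty; some AWK implementations, gawk among them, treat that as a fatal division-by-zero error. A slightly longer sketch can guard against that and track the highest and lowest grades in the same pass. The helper name grade_summary and the two-decimal output format are our own choices, not part of the original example.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical helper: average, highest and lowest grade in one pass.
grade_summary() {
  awk -F, '
    NF >= 2 {                      # skip lines without at least two fields
      g = $2 + 0                   # force a numeric value
      sum += g; count++
      if (count == 1 || g > max) { max = g; top = $1 }
      if (count == 1 || g < min) { min = g; low = $1 }
    }
    END {
      if (count == 0) { print "No grades found"; exit }
      printf "Average Grade: %.2f\n", sum / count
      print "Highest:", top, max
      print "Lowest:", low, min
    }' "$1"
}

printf 'John,85\nJane,90\nDoe,75\n' > "$tmpdir/grades.txt"
grade_summary "$tmpdir/grades.txt"
```

For the grades above this should print Average Grade: 83.33, Highest: Jane 90 and Lowest: Doe 75; for an empty file it prints No grades found instead of dividing by zero.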

Example 4: Filtering Data

Imagine a file named employees.txt with the following structure:

Name,Department,Salary
Alice,HR,60000
Bob,Engineering,70000
Charlie,HR,50000

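To list the name and salary of every employee in the HR department, we compare the second field with "HR". The header line never matches, because its second field is "Department":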

awk -F, '$2 == "HR" {print $1, $3}' employees.txt

This command will output:

Alice 60000
Charlie 50000
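Hard-coding "HR" in the pattern means editing the program for every department. One option, sketched below, is to pass the department in with awk's -v option and total the matching salaries as well. The function name employees_in is hypothetical, and NR > 1 is added so that the header is skipped explicitly rather than relying on it not matching.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical helper: employees of one department plus its payroll.
# Usage: employees_in DEPARTMENT FILE
employees_in() {
  # -v sets the awk variable dept from the first shell argument;
  # NR > 1 skips the header row.
  awk -F, -v dept="$1" '
    NR > 1 && $2 == dept { print $1, $3; total += $3 }
    END { print "Total:", total + 0 }' "$2"
}

cat > "$tmpdir/employees.txt" <<'EOF'
Name,Department,Salary
Alice,HR,60000
Bob,Engineering,70000
Charlie,HR,50000
EOF

employees_in HR "$tmpdir/employees.txt"
```

Running employees_in HR on this file should print Alice 60000, Charlie 50000 and Total: 110000; a department with no employees prints only Total: 0.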

Conclusion

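AWK handles many everyday data analysis tasks directly on text files: it splits each line into fields, selects lines with patterns such as $2 == "HR", and aggregates values in an END block. In this post, we used those features to compute revenue per product, extract IP addresses and status codes from a server log, average a column of grades, and filter employees by department. With these skills, you can start extracting insights from your own datasets!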