
Chapter 11 : Filters

In this chapter we will look at simple filters - commands that accept data from standard input, manipulate it and write the results to standard output. We will see their use both in standalone mode and in combination with other tools using redirection and piping.

Rather than discussing much theory, let's create a CSV file emp.csv (comma-separated file) containing a few details of employees - emp_id, name, salary, department, designation and date of joining. Let's run the cat command to display how our file looks.

cat emp.csv
100, Mangesh Pande, 5000, IT, Consultant, 12/02/2014
101, Makarand Bhaleka, 10000, IT, Manager, 12/05/2013
102, Nikhil Muthal, 6000, IT, Associate Consultant, 13/04/2014
103, Amey Deshpande, 7000, Production Support, Consultant, 07/03/2012
104, Sanket Deoghare, 6000, Design, Software Engineer, 01/01/2015
105, Gunjan Verma, 5000, Quality Assurance, Software Engineer, 01/01/2012

The file is a comma (,) separated file which has 6 fields. We will use this file to perform different manipulations using the filter commands.

pr : Paginating Files - 

The pr command prepares a file for printing by adding suitable headers, footers and formatted text. The command can be invoked with just the filename as argument.

pr emp.csv

pr adds five lines of margin at the top and five at the bottom. The header shows the date and time of last modification of the file along with the filename and page number. Like other commands, pr has a few options; we will see some that can help us format our file.

pr : options -

Sometimes we need to split the data of a file from a single column into multiple columns to format the result. A hyphen followed by a count (e.g. -5) splits the file into that many columns, and the -t option suppresses the headers and margins.

pr -t -5 num.txt

If you want to change the header of the file, use the -h option followed by the header string. There are some more options that can help us format the file with pr -

  • -d ( Double-spaces input, reduces clutter )
  • -n ( Numbers lines )
  • -o n ( Offsets lines by n spaces, increases left margin of page )

You can combine these options to produce just the format you need -

pr -t -n -d -o 5 num.txt

There is one more option (+page_number) that allows you to start printing from a specific page. Another option (-l) sets the page length.

   pr +10 num.txt -------------------------------------- starts printing from page 10 
   pr -l 50 num.txt -------------------------------------- sets the page length to 50 lines
For numbering lines, we can use the nl command.
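
As a quick sketch, nl numbers the lines of a file without invoking pr at all. The file demo.txt and its contents are invented here purely for illustration:

```shell
# Create a small sample file (invented data, just for this demo)
printf 'alpha\nbeta\ngamma\n' > demo.txt

# nl numbers the non-empty lines by default
nl demo.txt
```

By default nl right-aligns the numbers and separates them from the text with a tab.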

head : Displaying the Beginning of a File -

The head command displays the first 10 lines of a file when used without any option.

head num.txt

We can use the -n option to display a specific number of lines on the terminal, e.g. head -n 5 <file_name> will display the first five lines of the file.

head -n 5 num.txt

tail : Displaying the End of a File -

tail is the counterpart of the head command; it displays the last 10 lines of the file when used without any options -

tail num.txt

Like head, we can specify the number of lines to be displayed with the -n option. The command below displays the last 3 lines of the file.

tail -n 3 num.txt

You can also address lines from the beginning of the file instead of the end. The +count option allows you to do that, where count represents the line number from where the selection should begin.

tail +3 num.txt -----------------------------------prints from the 3rd line onward (modern versions require tail -n +3)

tail has one more option than the head command -

Monitoring a file (-f) -

Many UNIX/Linux programs constantly write to the system's log files as long as they are running. System administrators need to monitor the growth of these files to view the latest messages. tail offers the -f (follow) option for this purpose. This is how we can monitor the file log.txt -

tail -f log.txt -----------------------------------Monitors file log.txt

The prompt doesn't return even after the writing is over. With this option, you have to use the interrupt key to abort the process and return to the shell.

cut : Splitting Files Vertically -

Sometimes we need to extract specific data from a file and store it in some other file. We can extract both columns and fields from any file with the cut command. Columns are specified with the -c option and fields with the -f option.

cutting columns (-c) -

To extract specific columns, you need to follow the -c option with a list of column numbers, delimited by commas. Ranges can also be specified with a hyphen. Here is how we extract portions of the name and designation from emp.csv -

   $ cut -c 6-22,24-32 emp.csv

Each output line joins the characters found at column positions 6-22 and 24-32 of the corresponding input line.

Note that there should be no whitespace in the column list. Moreover, cut uses a special form for selecting a column range from the beginning of a line, or up to its end:

cut -c -5,6-22,24-32,50- emp.csv

The specification 50- indicates column 50 to the end of the line. Similarly, -5 is the same as 1-5.
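
To make the column forms concrete, here is a self-contained sketch on a small fixed-width file (fixed.txt and its contents are invented for this example):

```shell
# Build a tiny fixed-width file (invented sample data)
printf '100 Mangesh  5000\n101 Makarand 9000\n' > fixed.txt

cut -c -3 fixed.txt     # columns 1-3: the ids
cut -c 5-12 fixed.txt   # columns 5-12: the names
cut -c 14- fixed.txt    # column 14 to end of line: the salaries
```

Because the file is fixed-width, the same column positions line up on every row, which is exactly when -c is the right tool.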

cutting fields (-f) -

The -c option is useful when the data in the file is in fixed-length format. To extract useful data from delimited files we need to cut fields rather than columns. cut uses the tab as the default field delimiter, but can also work with a different one. Two options need to be used here: -d for the delimiter and -f for the field list. This is how we can cut the second and third fields of our sample file -

cut -d "," -f 2,3 emp.csv

As our file is comma-separated, we have passed , (comma) as the delimiter to -d. We want to extract fields 2 and 3 from the file, hence we have given -f 2,3.

Suppose you want to get only the list of users currently logged in to the system; we can use the who command's output as input to cut with space as the delimiter. Here is the command that will do this task for us -

who | cut -d " " -f1

cut is a very powerful text manipulator, often used in combination with other commands or filters. We will see more usage and examples of cut in upcoming chapters.
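
Another classic use of -d and -f is extracting fields from a passwd-style file, where ':' is the delimiter (the sample lines below are invented):

```shell
# A passwd-style file uses ':' as the field delimiter (invented sample lines)
printf 'root:x:0:0:root:/root:/bin/bash\nguest:x:1001:1001::/home/guest:/bin/sh\n' > passwd.txt

# Extract the login name (field 1) and shell (field 7)
cut -d ":" -f1,7 passwd.txt
```

cut joins the selected fields with the same delimiter, so each output line looks like root:/bin/bash.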

While using cut, one of the options -f and -c must be specified. These options are not really optional; one of them is compulsory.

sort : Ordering a file -

The sort command orders a file; it arranges the data in ascending or descending order. Like other commands, sort also has options that sort the file in different ways -

sort emp.csv

By default, sort reorders lines in ASCII collating sequence - whitespace first, then numerals, uppercase letters and finally lowercase letters. We can change this default sorting sequence using certain options. We can also sort on one or more fields, or use a different ordering rule.

sort options -

Sorting on fields - let's now use the -k option to sort the file on a field. To sort the emp.csv file on the second field (the name field), the option used should be -k 2.

sort -t "," -k 2 emp.csv

Here -t specifies the delimiter - as our file is comma-separated, we have used , as the delimiter.

We can even reverse the sort order with the -r (reverse) option. The following command reverses the sequence of the previous sort.

sort -t "," -r -k 2 emp.csv

The above sort command can be written as -

sort -t "," -k 2r emp.csv

Sorting on columns -

We can also specify a character position within a field as the beginning of the sort. To sort the file by the year of joining, you need to sort on the character positions holding the year within the date field - the sixth field of emp.csv. With dates in dd/mm/yyyy form and a leading space after the comma, the year occupies character positions 8 to 11 of that field.

sort -t "," -k 6.8,6.11 emp.csv

Sorting numerals -

Strange things happen when you sort numbers without the -n option. Try sorting the numbers without -n and observe the results. To sort a file of numerals correctly, run the command as -

sort -n num.txt
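
The difference is easy to see on a small invented set of numbers (num_demo.txt is created here just for the demo):

```shell
# Invented sample numbers
printf '10\n9\n100\n2\n' > num_demo.txt

LC_ALL=C sort num_demo.txt   # ASCII order: 10, 100, 2, 9
sort -n num_demo.txt         # numeric order: 2, 9, 10, 100
```

In ASCII order the lines are compared character by character, so "100" sorts before "2"; -n compares them as numbers instead.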

Removing Repeated Lines : (-u) -

The -u (unique) option lets you remove duplicate lines from a file. Now if you want to store the designations from emp.csv into another file, here is a combination of commands that will fetch the unique designations from emp.csv -

cut -d "," -f5 emp.csv | sort -u | tee designtn.txt

To store the unique sorted results, we have piped the output to the tee command along with a filename.

To check whether a file has actually been sorted in the default order, use the -c (check) option -

sort -c designtn.txt

The table below summarises the options that we have used so far -

Option       Description
-t char      Uses delimiter char to identify fields
-k n         Sorts on the nth field
-k m.n       Starts sort on the nth column of the mth field
-u           Removes repeated lines
-n           Sorts numerically
-r           Reverses sort order
-f           Folds lowercase to equivalent uppercase (case-insensitive sort)
-c           Checks if file is sorted
-o filename  Places output in filename
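
Two of the tabled options deserve a quick sketch: -f for case-insensitive ordering and -o for writing the result to a file (fruits.txt is an invented sample):

```shell
# Invented sample data with mixed case
printf 'Banana\napple\nCherry\n' > fruits.txt

sort -f fruits.txt               # case-insensitive: apple, Banana, Cherry
sort -f -o fruits.txt fruits.txt # -o may safely name the input file itself
```

Note that redirecting with > to the input file would destroy it before sort reads it; -o avoids that problem.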

uniq : Locating Repeated and Non-repeated Lines -

When uniq is run on a file, it simply fetches one copy of each line and writes it to the standard output -

uniq emp.csv

Since uniq requires a sorted file as input, the general procedure is to sort the file with the sort command and pipe its output to uniq. The following command produces the same output, except that it is stored in the file uniqlist -

sort emp.csv | uniq - uniqlist

If you provide two filenames as arguments, uniq will read the first file and write its output to the second.

uniq : options -

To select unique lines, we have already seen that sort -u does the job in a single command. But uniq has a couple of features that are useful for file manipulation -

Selecting non repeated lines (-u) -

In our previous example we copied the designations into the file designtn.txt - to see which designations occur only once, run the following command.

cut -d "," -f5 emp.csv | sort | uniq -u

Selecting duplicate lines (-d) -

The -d (duplicate) option selects only one copy of each repeated line -

cut -d "," -f5 emp.csv | sort | uniq -d

Counting frequency of occurrence (-c) -

The -c (count) option displays the frequency of occurrence of each line, along with the line itself:

cut -d "," -f5 emp.csv | sort | uniq -c
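
A common idiom builds on this: pipe the counts to a numeric reverse sort to rank lines by frequency. The department names below are invented stand-ins for real file data:

```shell
# Invented sample: department of each employee, one per line
printf 'IT\nDesign\nIT\nIT\nDesign\nQA\n' | sort | uniq -c | sort -nr
```

The most frequent line comes out on top, with its count in the first column.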

tr : Translating Characters -

The tr (translate) filter manipulates individual characters in a line. More specifically, it translates characters using one or two compact expressions.

Note that tr takes input only from standard input; it doesn't take a filename as argument.

Let's use tr to replace the , with a |. Simply specify two expressions containing these characters in the proper sequence.

tr ',' '|' < emp.csv

Note that the lengths of the two expressions should be equal. If they are not, the longer expression will have unmapped characters (though GNU tr on Linux handles unequal sets).

Changing case of text -

Since tr doesn't accept a filename as argument, the input has to be redirected from a file or a pipe. The following sequence changes the case of the first three lines from lower to upper.

head -n 3 emp.csv | tr '[a-z]' '[A-Z]'
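
POSIX character classes are a more portable way to write the same translation, since they work regardless of the character set in use:

```shell
# [:lower:] and [:upper:] are POSIX character classes understood by tr
echo 'hello world' | tr '[:lower:]' '[:upper:]'   # HELLO WORLD
```

Unlike the range form '[a-z]', the class form also handles accented letters correctly in non-ASCII locales.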

tr : Options -

Deleting Characters (-d) -

The emp.csv file is delimited by , (comma); to remove the delimiters we need to delete the , character. The following does this on the file -

tr -d ',' < emp.csv

Compressing multiple consecutive characters (-s) -

Unix tools work best with fields rather than columns, so it's preferable to use files with delimited fields. We can eliminate all redundant spaces with the -s (squeeze) option.

tr -s ' ' < emp.csv

Complementing values of expression (-c) -

Finally, the -c (complement) option complements the set of characters in the expression. Thus, to delete all characters except the , (comma), we can combine the -d and -c options.

tr -cd ',' < emp.csv

The command deletes all characters except , from the file - including the newlines, since they too are absent from the expression.
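
Putting the chapter's filters together, a single pipeline can clean and reshape a delimited file. The file mini.csv below is an invented two-line sample:

```shell
# Invented sample with ragged spacing after the commas
printf '100,  Mangesh Pande,  5000\n101, Makarand Bhaleka, 10000\n' > mini.csv

# Squeeze runs of spaces to one, then turn commas into pipes
tr -s ' ' < mini.csv | tr ',' '|'
```

Each tr in the pipeline does one small job; chaining them is usually clearer than trying to do everything in a single invocation.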
