Eli's Blog

1. find

find PATH -option [-print] [-exec|-ok cmd {} \;]
-print    separate results with \n
-print0   separate results with \0 (NUL), for use with xargs -0
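
A minimal sketch of both output styles (the path and pattern are only illustrative):

# run one command per matched file with -exec
find /var/log -name '*.log' -exec gzip -9 {} \;
# NUL-separated output is safe for names that contain spaces
find /var/log -name '*.log' -print0 | xargs -0 gzip -9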

1.1 time

-atime N    accessed N*24 hours ago
-ctime N    status changed N*24 hours ago
-mtime N    modified N*24 hours ago
-newer FILE modified more recently than FILE
-amin M     accessed M minutes ago
-mmin M     modified M minutes ago

Examples:

find . -amin -10 	# accessed within the last 10 minutes
find . -mtime -2 # modified within the last 2 days
find . -mmin +60 # modified more than 60 minutes ago
find /etc -mmin -120 # modified within the last 2 hours
find / -mtime 0 # contents changed within the last 24 hours
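# -newer has no example above; a minimal sketch (reference.txt is a hypothetical file)
find . -newer reference.txt # modified more recently than reference.txt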

1.2 user & group

-uid n
-gid n
-user name
-group name
-nouser
-nogroup

Examples:

find /var -uid +1048 	# uid greater than 1048
find /home -user test
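# -nouser/-nogroup are not shown above; a hedged example (useful after deleting accounts)
find / -xdev \( -nouser -o -nogroup \) -ls # files owned by no known user or group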

1.3 file

-name FILENAME
-type TYPE (f,b,c,d,l,s,p)

-size SIZE
-empty

-inum n
-links NUM

-depth process a directory's contents before the directory itself
-maxdepth LEVEL descend at most LEVEL levels of directories below the starting points
-mindepth LEVEL do not apply tests or actions at levels less than LEVEL

-xdev do not descend into directories on other filesystems
-mount same as -xdev (not supported by some Unix versions of find)
-fstype TYPE (ext3, proc, ...)

-follow dereference symbolic links (deprecated)
-L dereference symbolic links

-prune do not descend into the matched directory

-perm MODE permission bits are exactly MODE
-perm -MODE all of the bits in MODE are set
-perm /MODE any of the bits in MODE is set (+MODE is the old, deprecated spelling)

Examples:

find . -maxdepth 1  	
find . -maxdepth 2
find . -mindepth 4
find . -mindepth 2 -maxdepth 3

find . -path ./.svn

find / -path /var/log -prune -o -print
find / \( -path /var/log -o -path /var/spool \) -prune -o -print

find . -size 1000 # 1000 blocks, 1Block=512Bytes
find . -size +70M
find . -size 20c # 20 bytes
find . -size +10M -a -size -20M

find . -empty # empty files or directories
find . -links +2 # more than 2 links

find /mnt -name a.txt -fstype vfat
find / ! -fstype proc '(' -name '.??*' -o -name '.[^.]' ')'

find / -perm 664 # permission bits exactly 664
find / -perm -664 # all of the 664 bits set; extra bits allowed, e.g. 665, 777
find / -perm +664 # any of the 664 bits set, e.g. 660 or 600 (newer find: /664)

find . -perm -007 # others have rwx
find . -perm -100 # the owner has at least x

Special permission bits: SUID, SGID and the sticky bit (---s--s--t)
find . -perm +7000 # at least one special bit set
find . -perm -7000 # all special bits set

# delete core files not accessed for more than 7 days, staying on the / filesystem (-xdev)
find / -xdev -type f '(' -name core -o -name 'core.[0-9]*' ')' -atime +7 -exec rm -f {} ';'

# delete files starting with #, .# or .nfs, or ending with ~ or .CKP, not accessed in the last 3 days
find / -xdev -atime +3 '(' -name '#*' -o -name '.#*' -o -name '*.CKP' -o -name '*~' -o -name '.nfs*' ')' -exec rm -f {} ';'

# delete all subdirectories of /tmp not modified for more than 3 days
cd /tmp; find . ! -name . ! -name lost+found -type d -mtime +3 -exec /bin/rm -rf {} \;

# embed a timestamp in the file name
BACKUPFILE=backup-$(date +%m-%d-%Y)

# back up files modified within the last day; drawback: fails if too many files are found or names contain spaces
tar cvf - `find . -mtime -1 -type f -print` > $BACKUPFILE.tar

# improved: GNU find, this one is better
find . -mtime -1 -type f -print0 | xargs -0 tar rvf "$BACKUPFILE.tar"

# improved: portable Unix-style find, slower
find . -mtime -1 -type f -exec tar rvf "$BACKUPFILE.tar" {} \;

# remove a file or directory whose name contains special characters, by inode number
ls -il
find . -inum inode_num -exec rm -rf {} \;

2. xargs

xargs reads data from stdin, splits it on whitespace or newlines, and turns the pieces into command-line arguments.
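
A minimal illustration of that splitting (the directory names are arbitrary):

echo "one two three" | xargs mkdir # runs: mkdir one two three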

2.1 Options

1) input items are terminated by a null character; every character is taken literally (quotes, backslashes, whitespace are not special)
--null
-0

2) terminated by the specified character
--delimiter=delim
-d delim

3) use at most max-lines nonblank input lines per command line
--max-lines[=max-lines]
-l[max-lines]

4) use at most max-args arguments per command line.
--max-args=max-args
-n max-args

5) replace occurrences of replace-str({} by default) in the initial-arguments with names read from standard input
-I replace-str
--replace[=replace-str]
-i[replace-str] (deprecated; use -I)

6) a placeholder for output text
{}

7) run up to max-procs processes at a time; 1 by default; 0: run as many as possible
--max-procs=max-procs
-P max-procs

8) set the end of file string to EOF
--eof=[EOF]
-e[EOF]

9) prompt
--interactive
-p

10) print the command line on the standard error output before executing it.
--verbose
-t

2.2 Examples

find -print0 | xargs -0

# output on a single line
cat a.txt | xargs

# 3 arguments per line
cat a.txt | xargs -n3

# list files in 8 columns
ls | xargs -n 8 echo

# activity of the first 3 users
cut -d":" -f1 /etc/passwd | head -n 3 | xargs -p finger
cut -d":" -f1 /etc/passwd | xargs -p -e"lp" finger

# file names containing spaces, etc.
find /home -size +1M -print0 | xargs -0 ls -l

# list of core files
find / -name "core" -print | xargs echo "" > core.log

# remove execute permission for others
find . -perm -7 -print | xargs chmod o-x

# grep across multiple files
find . -name \* -type f | xargs grep "abc"

# delimiter
echo "aXbXc" | xargs -dX

# copy everything in the current directory to /tmp/eli
ls | xargs -i -t cp ./{} /tmp/eli

# kill all mysql processes
ps ax | grep mysql | awk '{print $1}' | xargs -i kill {}

# options -I and -i
cat a.sh
echo $*

cat args.txt
aaa
bbb
ccc

cat args.txt | xargs -I {} ./a.sh -p {} -l
-p aaa -l
-p bbb -l
-p ccc -l

cat args.txt | xargs -i ./a.sh -p {} -l

cat args.txt | xargs -I % ./a.sh -p % -l

# copy images one at a time
ls *.jpg | xargs -n 1 -i cp {} /home/images

# compress files, one at a time
ls | xargs -p -l gzip

# to handle arguments containing whitespace or quotes
find / -type f -print0 | xargs -0 grep -liwZ GUI | xargs -0 rm -f

# run two processes, compressing one file at a time
ls *.txt | xargs -t -n1 -P2 gzip

# grep outputs file names terminated by \0
grep -lZ 'abc' *.txt | xargs -0

3. sed

sed, the stream editor, processes its input line by line, in order:

  • i. read one line of input into a temporary buffer, the pattern space

  • ii. apply the specified sed editing commands to the contents of the buffer

  • iii. send the pattern space to the screen and delete the line from the pattern space

  • iv. read the next line and repeat until all input has been processed

  • pattern space: the working buffer

  • hold space: an auxiliary buffer
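
A small demonstration of the cycle, assuming a file named lines.txt: -n suppresses step iii, so nothing reaches the screen unless a p command prints it explicitly:

sed -n 'p' lines.txt # print each pattern space explicitly (same output as cat)
sed -n '2p' lines.txt # only line 2 is printed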

3.1 Options

sed [OPTION] [-e] cmd1 [[-e cmd2] [-e cmd3] ... [-e cmdn]] [input-file]
sed [OPTION] -f script-file [input-file]

-n do not auto-print the pattern space
-r use extended regular expressions
-i edit the file in place
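
A brief sketch combining the three options (the file names are only illustrative):

sed -n '/error/p' app.log # -n plus p behaves like grep
sed -r 's/(foo|bar)+/X/' in.txt # extended regex
sed -i.bak 's/old/new/g' conf.txt # edit in place, keeping conf.txt.bak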

3.2 Address

1
2
3
4
5
6
7
8
1
1,100
1,+5 <=> 1,6
5,10!
1~2 <=> 1,3,5,...

/pattern/
/pattern1/,/pattern2/
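
Hedged examples of these address forms (myfile is a placeholder):

sed -n '1,+5p' myfile # lines 1 through 6
sed -n '1~2p' myfile # every second line, starting at line 1 (GNU sed)
sed '5,10!d' myfile # keep only lines 5-10
sed -n '/BEGIN/,/END/p' myfile # from the first BEGIN to the next END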

3.3 Command

d  	delete all lines in the pattern space
D delete the first line of the pattern space

p print all lines in the pattern space
P print the first line of the pattern space

sed '3,$d'
sed '/line/'d
sed -n '$p'

# delete leading spaces on each line
sed 's/^[ ]*//g'
sed 's/^ *//g'
sed 's/^[[:space:]]*//g'

# grep 'pattern'
sed -n '/pattern/p'
sed '/pattern/!d'
r file 	read a file and append its contents after the matching line
w file write the matching lines to a file

a\string append a line of text after the matching line
i\string insert a line of text before the matching line


s/pattern/string/ replace pattern with string
sed 's/abc/123/g'
sed 's#\(abc\)defg#\1#'

sed 's/2/*/8' # replace the 8th "2" with "*"
h  	cat pattern-space > hold-space
H cat "\n"pattern-space >> hold-space

g cat hold-space > pattern-space
G cat "\n"hold-space >> pattern-space

x exchange hold-space with pattern-space

! negate the address: apply the command to non-matching lines

# tac
sed '1!G;h;$!d'

# add a blank line after every line
sed G

# squeeze multiple blank lines into one
sed '/^$/d;G'

# add a blank line before the matching one
sed '/python/{x;p;x}'

# add a blank line after the matching one
sed '/python/G'

# add a blank line before and after the matching one
sed '/python/{x;p;x;G}'

# print the line before the matching one
sed -n '/python/{g;1!p};h' # h saves the pattern space into the hold space on every line; on a match, g restores the hold space into the pattern space and, unless it is line 1, prints it

# print the line after the matching one
sed -n '/python/{n;p}'
n  	print the pattern space (unless -n is in effect), then read the next line into it; subsequent commands operate on the newly read line
N append the next input line to the pattern space with an embedded \n, so the two lines are handled as one

# substitute on the line after the matching one
sed '/python/{n;s/job/task/}'

# delete even-numbered lines
sed 'n;d'
sed -n '1~2p'
y	transliterate characters, similar to tr
sed '1y/abcdef/ABCDEF/'
=  	print the line number
sed -n '/python/='
sed -n '$='
q	quit
# head -2
sed 2q

# tail -2
sed '$!N;$!D'

# tail -1
sed -e :a -e '$q;N;2,$D;ba'
sed '$!d'
sed -n '$p'

3.4 Examples

# replace Unix with Unix/Linux
sed -e 's#Unix#&/Linux#g'

# squeeze continuous c to single c
sed 's/cc*/c/g'

# delete space at head of line
sed 's/^[ \t]*//'

# delete dot at end of line
sed 's/\.$//g'

# delete the first character of each line
sed 's/^.//'

# delete space at end of line
sed 's/ *$//g'

# insert two space before head each line
sed 's/^/ /g'

# remove punctuation (. , ? !)
sed -e 's/\.//g' -e 's/,//g' -e 's/?//g' -e 's/!//g'

# more efficient: restrict the substitution to matching lines
sed 's/foo/bar/g'
sed '/foo/ s/foo/bar/g'
sed '/foo/ s//bar/g'

# substitute only once, then quit (output stops at the matching line)
sed '/foo/{s/foo/bar/;q}'

# delete blank lines
sed '/^$/d'
sed '/./!d'

4. awk

awk [options] 'BEGIN{action}pattern{action}...END{action}' file
awk [options] -f program.awk file

options:
-F fs use fs for the input field separator
-v var=val assign the value val to the variable var before execution of the program begins

pattern:
/regex/: extended regular expression
relational expression: e.g. $1 > 10, NF == 7
pattern1, pattern2: pattern range
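
A minimal sketch of the BEGIN / pattern{action} / END structure, counting login shells in /etc/passwd (the shell value is just an example):

awk -F: -v sh=/bin/bash 'BEGIN{n=0} $7==sh{n++} END{print n, "users run", sh}' /etc/passwd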

4.1 Built-in Variables:

1. NF, NR
NF The number of fields
NR The total number of input records seen so far

2. FS, RS, OFS, ORS (see the sketch after this list)
FS The input field separator, a space by default
RS The input record separator, by default a newline
OFS The output field separator, a space by default
ORS The output record separator, by default a newline

3. IGNORECASE case-insensitive matching when set to non-zero (gawk)

4. ENVIRON
# awk 'BEGIN{for(i in ENVIRON) print i, ENVIRON[i]}'
# awk 'BEGIN{print ENVIRON["JAVA_HOME"]}'

5. ARGC, ARGV, ARGIND
ARGC: the number of arguments
ARGIND: the index in ARGV of the current file being processed
ARGV: Array of arguments

# awk 'BEGIN{print "ARGC="ARGC; for(i in ARGV) print i"="ARGV[i]}' /etc/passwd
ARGC=2, 0=awk, 1=/etc/passwd

6. FILENAME : the name of the current input file
# awk 'BEGIN{print FILENAME}{print FILENAME; exit}'

7. OFMT : number output format, "%.6g" by default
# awk 'BEGIN{printf("%.2f %.2f\n", 1/6, 3.1415926)}'
# awk 'BEGIN{OFMT="%.2f"; print 1/6, 3.1415926}'

8. FIELDWIDTHS : set fields by fixed width
# date +"%Y%m%d%H%M%S" | awk 'BEGIN{FIELDWIDTHS="4 2 2 2 2 2"}{print $1"-"$2"-"$3, $4":"$5":"$6}'

9. RSTART, RLENGTH
RSTART: the index of the first character matched by match(); 0 if no match
RLENGTH: the length of the string matched by match(); -1 if no match

# awk 'BEGIN{start=match("this is match test", /m[a-z]+/); print start, RSTART, RLENGTH}'
9 9 5
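
A small sketch tying NR, NF, FS and OFS together (the output separator is arbitrary):

# awk 'BEGIN{FS=":"; OFS=" | "} {print NR, NF, $1, $NF}' /etc/passwd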

4.2 Built-in Functions:

1. Numeric
int(x)
sqrt(x)
rand(): return a random number, [0-1)
srand([expr]): use expr as the seed for the random generator; if not provided, the current system time is used

# awk 'BEGIN{print int(2.3), int(012), int(0xFF), int("3a"), int("a3")}'
2 10 255 3 0

# awk 'BEGIN{print rand(), 10*rand()}'

# awk 'BEGIN{srand(); print rand(), 10*rand()}'


2. String
sub(regex, replacement[, target]): use the replacement to replace the regex matched in string target(by default $0)
gsub(regex, replacement[, target]): global sub

gensub(regex, replacement, how[, target]): gawk
gensub(regex, replacement, "g"|"G"[, target]) => like gsub
gensub(regex, replacement, 0, target) => gsub, but with warning
gensub(regex, replacement, N, target) => N is a digit from 1 to 9, index of the matched sub-expression

# awk 'BEGIN{info="this is a test2010test!"; gsub(/[0-9]+/,"!",info); print info}'

# gawk 'BEGIN{a="abc def"; b=gensub(/(.+) (.+)/, "\\2 \\1", "g", a); print b}'
def abc

# echo "a b c a b c" | gawk '{print gensub(/a/,"AA",2)}'
a b c AA b c

index(string, find): index of the find in the string, or 0 if not present
match(string, regex[, array]): position of the regex occurring in the string

length([string])
substr(string, position[, len])
split(string, array[, regexp]): split the string into the array on the regex

# awk 'BEGIN{s="this is a test"; print index(s, "a")}'

# awk 'BEGIN{s="this is a test"; print index(s, "a") ? "ok" : "not found"}'

# awk 'BEGIN{s="this is a match test"; pos=match(s, /m[a-z]+/, array); print pos; for(i in array) print i, array[i]}'
11
0start 11
0length 5
0 match

# awk 'BEGIN{s="this is a test"; print substr(s, 9, 1)}'
a

# awk 'BEGIN{s="this is a test"; print substr(s, 9)}'
a test

# awk 'BEGIN{s="this is a split test"; len=split(s,array); print len; for(i in array) print i, array[i]}'
5
4 split
5 test
1 this
2 is
3 a

# awk 'BEGIN{FS=":"}/^root/{split($0,array); for(i in array) print i, array[i]}' /etc/passwd
# awk '/^root/{split($0,array,/:/); for(i in array) print i, array[i]}' /etc/passwd
4 0
5 root
6 /root
7 /bin/bash
1 root
2 x
3 0

Associative Arrays
a. sorting by values
len = asort(s) # s: changed, the indexes are replaced with sequential integers
len = asort(s, d) # s: unchanged; d: a sorted duplicate array of s

b. sorting by indexes
len = asorti(s) # s: changed; its values are replaced by its former indexes, sorted
len = asorti(s, d) # s: unchanged; d: a new array of sorted indexes

# awk '{a[$1]=$2}END{for(i in a) print i, a[i]}' abc.txt
10 35
12 30
22 13
24 20

# awk '{a[$1]=$2}END{for(i=1;i<=asort(a,b);i++) print i, b[i]}' abc.txt
1 13
2 20
3 30
4 35

# awk '{a[$1]=$2}END{for(i=1;i<=asorti(a,b);i++) print i, b[i]}' abc.txt
1 10
2 12
3 22
4 24

sprintf(format, expr): return the printed expr according to the format
tolower(string)
toupper(string)

# awk 'BEGIN{s=sprintf("%.2g %s %d", 3.1415926, 3.1415926, 3.1415926); print s}'
3.1 3.14159 3


3. Time
mktime("YYYY MM DD HH MM SS[ DST]"): return a time stamp
systime(): return the current time stamp
strftime([format[, timestamp]])

# awk 'BEGIN{print mktime("2014 12 20 14 25 32")}'
# awk 'BEGIN{print systime()}'
# awk 'BEGIN{print strftime()}'
# awk 'BEGIN{print strftime("%c", systime())}' # date +%c


4. IO
close(file[, how]): close file, pipe or co-process; how is either "from" or "to"

getline set $0 from next input record, set NF, NR, FNR
getline <file set $0 from next record of file, set NF
getline var set var from next input record, set NR, FNR
getline var <file set var from next record of file

command | getline [var] run command piping the output either into $0 or var
command | & getline [var] run command as a co-process piping the output either into $0 or var. co-processes are a gawk extension

next: stop processing the current input record

print [expr-list [>file] ]
printf [format, expr-list [>file] ]

system("cmd") execute the command, and return the exit status

fflush([file]) flush any buffers

print ... | command write on a pipe
print ... |& command write on a co-process


# awk 'BEGIN{while("cat /etc/passwd" | getline) print; close("cat /etc/passwd")}'
# awk 'BEGIN{while(getline <"/etc/passwd") print; close("/etc/passwd")}'

# awk 'BEGIN{"date" | getline d; print d}'
# awk 'BEGIN{"date" | getline d; split(d,mon); print mon[2]}'

# awk 'BEGIN{while("ls" | getline) print}'

# awk 'BEGIN{printf("Enter your account: "); getline name; print name}'
awk 'BEGIN{l=system("ls -l"); print l}'

// prompting, wait for input
# awk 'BEGIN{printf "What is your name? "; getline name <"/dev/tty"} $1~name {print "Found " name " on line ", NR"."} END{print "See you," name "."}' /etc/passwd

// count the number of users in the file
# awk 'BEGIN{while(getline <"/etc/passwd" >0) lc++; print lc}'

// sort
# awk '{print $1, $2 | "sort"} END{close("sort")}' abc.txt

4.3 FILE SPACING:

# insert a blank line
awk '1; {print ""}'
awk 'BEGIN{ORS="\n\n"}; 1'

# insert two blank lines
awk '1;{print "\n"}'


NUMBERING AND CALCULATIONS:

# using a TAB instead of a space will preserve margins
awk '{print FNR "\t" $0}'
awk '{print NR "\t" $0}'

# number each line of a file
awk '{printf("%5d : %s\n", NR, $0)}'

# number each line of a file, but only print numbers if line is not blank
awk 'NF{$0=++a ":" $0};{print}'
awk '{print (NF ? ++a ":" : "") $0}'

# wc -l
awk 'END{print NR}'

# wc -w
awk '{total=total+NF}END{print total}'

# print the sum of the fields of every line
awk '{s=0; for(i=1;i<=NF;i++) s=s+$i; print s}'

# add all fields in all lines and print the sum
awk '{for(i=1;i<=NF;i++) s=s+$i}END{print s}'

# print absolute value of fields
awk '{for(i=1;i<=NF;i++) if($i<0) $i=-$i; print}'
awk '{for(i=1;i<=NF;i++) $i = ($i<0) ? -$i : $i; print}'

# print the total number of lines that contains "Beth"
awk '/Beth/{n++}; END{print n+0}'

# print the largest first field
awk '$1>max{max=$1; maxline=$0}; END{print max, maxline}'

# print the last field
awk '{print $NF}'

# print the last field of the last line
awk '{field=$NF}; END{print field}'


TEXT CONVERSION AND SUBSTITUTION:

# dos2unix
awk '{sub(/\r$/,""); print}'

# unix2dos
awk '{sub(/$/,"\r"); print}'

# delete leading whitespace
awk '{sub(/^[ \t]+/,""); print}'

# delete trailing whitespace
awk '{sub(/[ \t]+$/,""); print}'

# delete both leading and trailing whitespace
awk '{gsub(/^[ \t]+|[ \t]+$/,""); print}'

# insert 5 whitespace at the beginning of line
awk '{sub(/^/, "     "); print}'

# align all text flush right on a 79-column width
awk '{printf "%79s\n", $0}'

# center all text on a 79-character width
awk '{l=length(); s=int((79-l)/2); printf "%"(s+l)"s\n", $0}'

# substitute
awk '{sub(/foo/,"bar"); print}' # 1st
awk '{$0=gensub(/foo/,"bar",4); print}' # 4th
awk '{gsub(/foo/,"bar"); print}' # all

#
awk '{gsub(/scarlet|ruby|puce/, "red"); print}'

# tac
awk '{a[i++]=$0}END{for(j=i-1;j>=0;j--) print a[j]}'

# append the next line, if the line ends with a backslash (fails to handle multiple lines ending with backslash)
awk '/\\$/{sub(/\\$/,""); getline t; print $0 t; next}; 1'

# sort
awk -F":" '{print $1 | "sort"}' /etc/passwd

# delete the 2nd field
awk '{$2=""; print}'

# print in reverse order the fields
awk '{for(i=NF;i>0;i--) printf("%s ",$i); print ""}'

# remove duplicate, consecutive lines, uniq
awk 'a!~$0; {a=$0}'

# remove duplicate, nonconsecutive lines
awk '!a[$0]++'
awk '!($0 in a){a[$0]; print}' # most efficient

# concatenate every 5 lines of input, using a comma separator
awk 'ORS=NR%5 ? "," : "\n"'

4.4 SELECTIVE PRINTING OF CERTAIN LINES:

# head
awk 'NR<11'

# head -1
awk 'NR>1{exit};1'

# tail -2
awk '{y=x "\n" $0; x=$0} END{print y}'

# tail -1
awk 'END{print}'

# grep
awk '/regex/'

# print the line immediately before a regex
awk '/regex/{print x};{x=$0}' # grep 'regex' -B1

# print the line immediately after a regex
awk '/regex/{getline; print}' # grep 'regex' -A1

# grep -E "AAA|BBB|CCC"
awk '/AAA/;/BBB/;/CCC/'

# print only lines of 65 characters or longer
awk 'length>64'

# print section of file from regular expression to end of file
awk '/regex/,0'
awk '/regex/,EOF'

# print section of file based on line numbers(lines 8-12)
awk 'NR==8,NR==12'

# print line 8 & 12
awk 'NR==8;NR==12'

# print line number 52
awk 'NR==52'
awk 'NR==52{print; exit}' # more efficient

4.5 SELECTIVE DELETION OF CERTAIN LINES:

# delete all blank lines
awk NF
awk '/./'

# build a string of 512 "x" characters
awk 'BEGIN{while(a++<512) s=s "x"; print s}'

# merge two files without duplicating lines
awk 'NR==FNR{a[$0]=1; print} NR>FNR{if(!a[$0]) print}' file1 file2
awk '{a[$0]}END{for(i in a) print i}' file1 file2

# sort
awk '{a[j++]=$0} END{len=asort(a); for(i=1;i<=len;i++) print a[i]}'

# convert decimal to octal
echo 37 | awk '{printf "%o\n", $0}'

# print single quotes
awk 'BEGIN{print "\0472004-12-12\047"}'


# obtain total memory
cat /proc/meminfo | grep -i memtotal | awk -F':' '{print $2}'
cat /proc/meminfo | grep -i memtotal | awk -F\: '{print $2}'

# display all NICs except the loopback interface
ifconfig -a | grep '^\w' | awk '!/lo/{print $1}'

# obtain IP address of NIC eth0
ifconfig eth0 | grep 'inet '
ifconfig eth0 | grep 'inet ' | awk -F':' '{print $2}' | awk '{print $1}'
ifconfig eth0 | grep 'inet ' | tr ':' ' ' | awk '{print $3}'
ifconfig eth0 | awk -F':' '/inet / {print $2}' | awk '{print $1}'

# the line starts with spaces, so $1 is empty and the address is $4
ifconfig eth0 | grep 'inet ' | awk -F'[ :]+' '{print $4}'

# kill all processes named foo
kill `ps -ax | grep 'foo' | grep -v 'grep' | awk '{print $1}'`

# print the output of the command "date"
awk 'BEGIN{"date" | getline d;print d}'

# print month
awk 'BEGIN{"date" | getline d; split(d, mon); print mon[2]}'

# print the output of the command "ls"
awk 'BEGIN{while("ls" | getline) print}'

# prompting, wait for input
awk 'BEGIN{printf "What is your name? "; getline name <"/dev/tty" } $1~name {print "Found " name " on line ", NR"."} END{print "See you," name "."}' /etc/passwd

# count number of linux users
awk 'BEGIN{while(getline<"/etc/passwd" >0) lc++; print lc}' # the file name must be quoted

awk '{print $1, $2 | "sort" }END{close("sort")}' myfile

awk 'BEGIN{system("clear")}'

awk '{gsub(/test/, "xxxx", $2); print}' myfile

awk 'BEGIN{print index("mytest", "test")}' # 3


# multi-dimensional arrays: array[index1,index2,...]; SUBSEP separates the subscripts, "\034" by default
awk 'BEGIN{SUBSEP=":"; array["a","b"]=1; for(i in array) print i}'
awk 'BEGIN{array["a"":""b"]=1;for(i in array) print i}'


# cat file1
g1.1 2
g2.2 4
g2.1 5
g4.1 3
# cat file2
g1.1 2
g1.2 3
g4.1 4
# cat file3
g1.2 3
g5.1 3

# awk '{a[ARGIND" "$1]=$2; b[$1]}
END {
for(i in b) {
printf i" ";
for(j=1;j<=ARGIND;j++)
printf "%s ", a[j" "i] ? a[j" "i] : "-";
print "";
}
}' file1 file2 file3
g2.2 4 - -
g5.1 - - 3
g1.1 2 2 -
g1.2 - 3 3
g4.1 3 4 -
g2.1 5 - -



5. chattr

chattr +i /etc/passwd   # immutable: the file cannot be modified, deleted or renamed
chattr +a /etc/passwd   # append-only
lsattr /etc/passwd      # list the file's attributes

6. grep

# NIC information
dmesg | grep -n --color=auto eth

# show 3 lines before and 2 lines after the keyword
dmesg | grep -n -A2 -B3 --color=always eth0

# extended regular expressions
egrep '^$|^#' myfile

# the preceding character occurs 1 or more times
egrep -n --color=auto 'go+d' myfile

# the preceding character occurs 0 or 1 times
egrep -n --color=auto 'go?d' myfile

# alternation (OR)
egrep -n --color=auto 'gd|good|dog' myfile

# grouping
egrep -n --color=auto 'g(la|oo)d' myfile

# repeated group
echo 'AxyzxyzxyzxyzC' | egrep 'A(xyz)+C'


man grep | col -b > grep.txt

7. sort

# output sorted result
sort -o result.out video.txt

# split the fields by ':'
sort -t: -r video.txt

# test whether it has been sorted
sort -c video.txt

# sort by 2nd field
sort -t: +1 video.txt

# sort 3rd field using ascii order
sort -t: +2 video.txt
sort -t: -k3 video.txt

# sort 3rd field using number order
sort -t: +2n video.txt
sort -t: -k3n video.txt

# uniq
sort -u video.txt

# sort 4th field, then sort 1st field
sort -t: -k4 -k1 video.txt

# sort +field_number.characters_in
# sort by the 2nd field, beginning with the 3rd character
sort -t: +1.2 video.txt

# list all unix users
cat /etc/passwd | sort -t: +0 | awk -F":" '{print $1}'

# only lines that appear exactly once (sort -u keeps one copy of each instead)
sort video.txt | uniq -u

# only duplicate
sort video.txt | uniq -d

# count occurrences of each line
sort video.txt | uniq -c


# sort ignoring case
sort -f

8. join

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
# join: the files must share a common field, and the fields must be separated by a single tab or space
join [options] input-file1 input-file2
-a n n is the file number; also print unpairable lines from file n (e.g. -a1 keeps unmatched lines of file 1)
-o n.m n is the file number, m the field number; -o 1.3 outputs field 3 of file 1
-j n m join on field m of file n
-t delimiter

# inner join on the common field
join names.txt town.txt

# also print unpairable lines from both files (full outer join)
join -a1 -a2 names.txt town.txt

# also print unpairable lines from file 1 (left join)
join -a1 names.txt town.txt

# select which fields to output
join -o 1.1,2.2 names.txt town.txt

# different field join
# extract 3rd field of file 1, 2nd field of file 2, then join them together
join -j1 3 -j2 2 file1 file2

cat pers
P.Jones Office Runner ID897
S.Round UNIX admin ID666
L.Clip Personl Chief ID982

cat pers2
Dept2C ID897 6 years
Dept3S ID666 2 years
Dept5Z ID982 1 year

join -j1 4 -j2 2 pers pers2
ID897 P.Jones Office Runner Dept2C 6 years
ID666 S.Round UNIX admin Dept3S 2 years
ID982 L.Clip Personl Chief Dept5Z 1 year

9. cut

cut [options] file1 file2
-c LIST, select only these characters
-f LIST, select only these fields
-d, delimiter

cut -d: -f4 pers
cut -d: -f1,3 pers
cut -d: -f1-3 pers

cut -d: -f1,6 /etc/passwd

# file permissions
ls -l | cut -c1-10

echo $PATH | tr ":" "\n" | nl
echo $PATH | cut -d":" -f3,5

10. paste

paste [-s] [-d delimiters] file1 file2
-d, delimiter
-s, paste one file at a time instead of in parallel

paste -d: pas1 pas2

# list file names, 3 per row
ls | paste -d" " - - -

# list file name, ls -l|awk 'NF>3{print $8}'
ls | paste -d"" -

11. split

split [-lines] input-file output-prefix
-lines, the number of lines per output file, 1000 by default
output-prefix, output files are named xaa ... xzz by default
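
A hedged example (big.log is a hypothetical input file):

split -l 2000 big.log part_ # creates part_aa, part_ab, ... with 2000 lines each
cat part_* > big.log.restored # reassemble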

12. dos2unix (a literal ^M is typed as Ctrl+V then Ctrl+M, or Ctrl+V then Enter)

dos2unix dosfile

sed -e 's/^M//' dosfile

tr -s "\r\n" "\n" < dosfile
tr -d "\015" < dosfile

col -bx < dosfile

# delete ^M in vim
:set ff=unix
:%s/\r//g
:%s/^M//gc

13. lsof, list open files

lsof  filename 	list all processes that have the file open
lsof -a 	AND the selection conditions together
lsof -c string 	list files opened by processes whose COMMAND contains string
lsof -u username 	list files opened by the user's processes
lsof -g gid 	list files opened by processes belonging to the gid
lsof +d /DIR/ 	list files opened under the directory
lsof +D /DIR/ 	same, but also searches all subdirectories, so it takes longer
lsof -d FD 	list processes using the given file descriptor
lsof -n 	do not resolve IP addresses to hostnames (resolution is on by default)
lsof -i 	list files matching the given network condition
lsof -i[46] [protocol][@hostname|hostaddr][:service|port]


lsof -i tcp:22
lsof -i :22
lsof -i @10.40.53.22

lsof /etc/passwd
lsof /etc/cdrom

lsof `which httpd`

lsof -c bash

lsof -u apache

lsof +D /tmp

lsof -i :80 # port 80

14. echo

echo -n     do not output the trailing newline
echo -E disable interpretation of backslash escapes
echo -e enable interpretation of backslash escapes

\c suppress the trailing newline
\n newline => \012
\r carriage return
\t TAB => \011
\num character with octal ASCII code num
\xnum character with hexadecimal ASCII code num
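
A short illustration of -e and the escapes above:

echo -e "col1\tcol2\nrow2" # a TAB, then a newline
echo -e "\x41 \101" # hex 41 and octal 101 both print A
echo -n "no trailing newline"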

15. Delete blank lines

sed "/^\s*$/d"
sed '/^$/d'
sed -i '/^$/d'
awk 'NF>0'
perl -i.backup -n -e "print if /\S/"
grep -v '^$'
