A summary and explanation of several ways to remove duplicate lines from a file at the Bash command line on Linux machines/servers.
If you look at the following reference, you'll see there are many commands that can remove duplicate lines from a file.
shell - Remove duplicate entries using a Bash script - Stack Overflow
This time, comparing the output of each command produced some interesting results, so I'd like to introduce the commands along with those findings.
First, let's test with fuga.txt, which contains a duplicated hoge line as shown below.
$ cat > fuga.txt
hoge
fuga
foo
hoge
bar
$ cat fuga.txt | sed '$!N; /^\(.*\)\n\1$/!P; D'
hoge
fuga
foo
hoge
bar
$ cat fuga.txt | sort -u
bar
foo
fuga
hoge
$ cat fuga.txt | awk '!a[$0]++'
hoge
fuga
foo
bar
With sed '$!N; /^\(.*\)\n\1$/!P; D', the duplicate line 'hoge' was not removed, because the two occurrences are not consecutive.
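For the record, the sed one-liner behaves like uniq: it only collapses *consecutive* duplicates. A quick sketch with adjacent duplicates (my own test input, not from the original):

```shell
# Adjacent duplicates collapse; the later, non-adjacent hoge survives.
printf 'hoge\nhoge\nfuga\nhoge\n' | sed '$!N; /^\(.*\)\n\1$/!P; D'
# hoge
# fuga
# hoge

# uniq shows the same adjacency-only behavior.
printf 'hoge\nhoge\nfuga\nhoge\n' | uniq
```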
sort -u sorts before removing duplicate lines, so the result differs significantly from the original file content.
awk '!a[$0]++' seems to work without any particular problems.
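The idiom works because awk evaluates the pattern !a[$0]++ for every input line: a[$0] is zero (false) the first time a line is seen, so the negation is true and the default action, printing the line, fires; the post-increment then marks the line as seen. A minimal sketch (the longhand form is my own equivalent, not from the original):

```shell
# Short form: print each line only the first time it appears.
printf 'hoge\nfuga\nhoge\nhoge\n' | awk '!a[$0]++'
# hoge
# fuga

# Longhand equivalent of the same logic.
printf 'hoge\nfuga\nhoge\nhoge\n' \
  | awk '{ if (!seen[$0]) { print; seen[$0] = 1 } }'
```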
Next, let's verify with the following input. The results were interesting.
$ cat > fuga.txt
ใใใ
ใฑใใ
[ใใใ]
ใใใ
ใฑใทใใ
[ใใใ]
ใชใ
$ cat fuga.txt | sed '$!N; /^\(.*\)\n\1$/!P; D'
ใใใ
ใฑใใ
[ใใใ]
ใใใ
ใฑใทใใ
[ใใใ]
ใชใ
$ cat fuga.txt | sort -u
[ใใใ]
ใชใ
ใใใ
ใฑใทใใ
$ cat fuga.txt | awk '!a[$0]++'
ใใใ
ใฑใใ
[ใใใ]
ใฑใทใใ
ใชใ
Did you notice? The number of output lines differs between sort -u and awk '!a[$0]++'. The non-duplicate line 'ใฑใใ', which the awk version keeps, has disappeared from the sort version.
I thought about it, but couldn't pin down the cause; it is tempting to write this off as a bug.
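One plausible explanation, offered here as an assumption rather than a confirmed diagnosis: GNU sort compares lines using the locale's collation rules (LC_COLLATE), and with -u it keeps only one line from each run that compares equal, so in some UTF-8 locales distinct byte sequences can collate as equal and get merged. Forcing the C locale makes the comparison byte-wise, shown here on the ASCII sample from earlier:

```shell
# LC_ALL=C makes sort compare raw bytes, so distinct lines never merge.
printf 'hoge\nfuga\nfoo\nhoge\nbar\n' | LC_ALL=C sort -u
# bar
# foo
# fuga
# hoge
```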
Based on these results, awk '!a[$0]++' appears to be the most suitable command for removing duplicate lines from a file: it preserves the original order and showed none of the surprises above.
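If you do want to rely on sort while keeping the original order, a classic decorate/sort/undecorate trick (variants of which appear in the linked Stack Overflow thread) is to number the lines, deduplicate on the content field, then restore the original numbering. A sketch:

```shell
# Number lines, keep the first line for each distinct content (field 2+),
# then sort back into original order and strip the numbers.
printf 'hoge\nfuga\nfoo\nhoge\nbar\n' \
  | cat -n | sort -uk2 | sort -nk1 | cut -f2-
# hoge
# fuga
# foo
# bar
```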