
Summary of Methods to Remove Duplicate Lines from Files Using Linux Bash Commands

A summary and explanation of several methods for removing duplicate lines from files on the Bash command line of Linux computers/servers. When you look at the following reference, you'll notice there are many ways to remove duplicate lines from files using commands.

Shou Arisaka
2 min read
Oct 5, 2025

shell - Remove duplicate entries using a Bash script - Stack Overflow

This time, after comparing the output of each command, I got some interesting verification results, so I'd like to introduce them along with those findings.

First, let's verify using fuga.txt as an example, which has non-adjacent duplicate hoge entries as shown below.


$ cat > fuga.txt
hoge
fuga
foo
hoge
bar

$ cat fuga.txt | sed '$!N; /^\(.*\)\n\1$/!P; D'
hoge
fuga
foo
hoge
bar

$ cat fuga.txt | sort -u
bar
foo
fuga
hoge

$ cat fuga.txt | awk '!a[$0]++'
hoge
fuga
foo
bar

With sed '$!N; /^\(.*\)\n\1$/!P; D', the duplicate line "hoge" was not removed, because this one-liner only deletes *consecutive* duplicate lines and the two "hoge" entries are not adjacent.
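To confirm that the one-liner does work on adjacent duplicates (it behaves like uniq), here is a minimal check with made-up input:

```shell
# Adjacent duplicates: the sed one-liner collapses each run to one line,
# like uniq. Output: hoge, fuga, foo
printf 'hoge\nhoge\nfuga\nfuga\nfoo\n' | sed '$!N; /^\(.*\)\n\1$/!P; D'
```

So the sed approach is only useful if the input is already sorted or duplicates are otherwise guaranteed to be adjacent.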

sort -u sorts the lines before removing duplicates, so the output order differs significantly from the original file content.
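If you want sort-based deduplication but need to keep the original order, a common workaround is to number the lines, de-duplicate on the text only, restore the numbering, then strip it. This is a sketch assuming GNU coreutils, where sort -u keeps the first line of each equal run:

```shell
# Number lines (cat -n), de-dup on the text (field 2 onward),
# re-sort by line number, drop the numbers.
# For fuga.txt this prints: hoge, fuga, foo, bar
cat -n fuga.txt | sort -uk2 | sort -nk1 | cut -f2-
```

It is much more machinery than the awk one-liner below, but it shows that sort's reordering can be undone.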

awk '!a[$0]++' handled this case correctly: the non-adjacent duplicate "hoge" was removed and the original line order was preserved.
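The awk idiom is terse but easy to unpack: a is an associative array keyed by the whole line ($0), the expression !a[$0]++ is true only the first time a given line is seen, and a pattern with no action prints the line. An equivalent long form (seen is just an illustrative name):

```shell
# Same logic as awk '!a[$0]++', written out explicitly:
# print a line only if it has not been seen before
awk '{ if (!seen[$0]) { seen[$0] = 1; print } }' fuga.txt
```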

Next, let's verify with the following input. The results were interesting.


$ cat > fuga.txt
りんご
ぱいん
[りんご]
りんご
ぱぷりか
[りんご]
なす

$ cat fuga.txt | sed '$!N; /^\(.*\)\n\1$/!P; D'
りんご
ぱいん
[りんご]
りんご
ぱぷりか
[りんご]
なす

$ cat fuga.txt | sort -u
[りんご]
なす
りんご
ぱぷりか

$ cat fuga.txt | awk '!a[$0]++'
りんご
ぱいん
[りんご]
ぱぷりか
なす

Did you notice? The number of output lines differs between sort -u and awk '!a[$0]++'. The non-duplicate line "ぱいん", which the awk version keeps, has disappeared from the sort -u output.

At first I couldn't figure out the cause, and it is tempting to call this a bug. More likely it is a locale effect: in a UTF-8 locale, GNU sort compares lines using the locale's collation rules (strcoll), and lines whose collation weights compare equal are treated as duplicates by -u even when their bytes differ. Forcing byte-wise comparison with LC_ALL=C avoids this.
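One way to test whether locale collation is responsible is to force byte-wise comparison with LC_ALL=C. If all five distinct lines survive, the disappearance was a collation effect rather than a sort bug (a sketch, assuming GNU coreutils sort):

```shell
# Byte-wise comparison sidesteps locale collation;
# all five distinct lines of fuga.txt should survive
LC_ALL=C sort -u fuga.txt
```

The output is still sorted (now in byte order, with "[りんご]" first), but no unique line is lost.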

Based on these results, awk '!a[$0]++' appears to be the most suitable command for removing duplicate lines from files: it removes duplicates wherever they occur, preserves the original order, and compares lines byte-for-byte, so it does not depend on locale collation.
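Note that awk has no portable in-place flag, so to de-duplicate a file on disk you typically write to a temporary file and move it back (gawk also offers -i inplace as a GNU extension):

```shell
# De-duplicate fuga.txt "in place" via a temporary file
awk '!a[$0]++' fuga.txt > fuga.txt.tmp && mv fuga.txt.tmp fuga.txt
```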
