Using uniq alone is not viable, because repeated lines need to be next to each other for uniq to identify them. Using sort first will just mix the lines up, losing their original placement within the file. So here's a workaround (originally from Linux Journal).
Let's say we have a document called file:
$ cat file
a
b
c
d
a
b
c
d
Here are the steps we will take:
1- Use nl (or cat -n) to add a number to each line;
2- Use “sort -k 2” to place equal lines next to each other (we have to sort by the second column);
3- “uniq -f 1” will remove the repeated lines (it skips the first field, so only the second column is compared);
4- “sort -n” will put the remaining lines back in their original order, as per the first field (the numbers);
5- “sed 's/[0-9]//g'” will remove the numbers.
This is what your command should look like:
$ nl file | sort -k 2 | uniq -f 1 | sort -n | sed 's/[0-9]//g'
a
b
c
d
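The same steps can also be wrapped in a small function. This is only a sketch under a few assumptions that are not in the original: the name dedupe_keep_order is made up, cat -n is used so that empty lines get a number too, the first sort gets a secondary numeric key so that the earliest copy of each line is the one uniq keeps, and cut replaces the final sed so that digits inside the lines are left alone.

```shell
# Sketch: drop repeated lines but keep the first occurrence in its original position.
# dedupe_keep_order is a hypothetical name, not something from the original post.
dedupe_keep_order() {
    tab=$(printf '\t')
    cat -n |                                  # 1. number every line: "     1<TAB>text"
    LC_ALL=C sort -t "$tab" -k 2 -k 1,1n |    # 2. group equal text, lowest number first
    LC_ALL=C uniq -f 1 |                      # 3. skip the number field, keep the first line of each group
    sort -n |                                 # 4. restore the original order
    cut -f 2-                                 # 5. drop the number, keep everything after the first TAB
}

# Same sample data as the file above.
printf '%s\n' a b c d a b c d | dedupe_keep_order
```

With the sample data this prints a, b, c and d, one per line.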
On my machine I had a problem where nl kept adding empty fields (I'm still trying to find out why), so I had to modify my expression a little bit:
$ nl file | expand | tr -s '[:blank:]' | sed 's/^ *//g' | sort -k 2 | uniq -f 1 | sort -n | sed 's/[0-9]//g' | sed 's/^ *//g'
a
b
c
d
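One possible explanation, offered as a guess rather than a diagnosis of that machine: by default nl right-aligns the number in a six-character column and follows it with a tab, and it does not number empty lines at all. The sketch below makes that whitespace visible; show_nl_padding is a hypothetical helper name.

```shell
# Sketch: reveal the whitespace that nl puts around its line numbers.
# show_nl_padding is a hypothetical name used only for this illustration.
show_nl_padding() {
    # sed -n l prints each line unambiguously: a tab appears as \t
    # and the end of the line is marked with $
    nl | sed -n l
}

printf '%s\n' a b | show_nl_padding
```

sort -k and uniq -f count leading blanks as part of a field rather than as an empty field, so the padding on its own should be harmless. Empty lines that nl leaves unnumbered may be the more likely source of trouble; nl -b a numbers those as well.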
Now, let's say your sources.list got mixed up somehow, and all lines are now duplicated. We can apply the same concept like this:
$ grep -v '^#' sources.list | nl | expand | tr -s '[:blank:]' | sed 's/^ *//g' | sort -k 2 | uniq -f 1 | sort -n | sed 's/[0-9] *//g'
deb cdrom:[Ubuntu ._Gutsy Gibbon_ - Release i()]/ gutsy main restricted
deb http://ca.archive.ubuntu.com/ubuntu/ gutsy multiverse
deb-src http://ca.archive.ubuntu.com/ubuntu/ gutsy multiverse
deb http://ca.archive.ubuntu.com/ubuntu/ gutsy-updates multiverse
deb-src http://ca.archive.ubuntu.com/ubuntu/ gutsy-updates multiverse
deb http://ca.archive.ubuntu.com/ubuntu/ gutsy-backports main restricted universe multiverse
deb-src http://ca.archive.ubuntu.com/ubuntu/ gutsy-backports main restricted universe multiverse
deb http://archive.canonical.com/ubuntu gutsy partner
deb-src http://archive.canonical.com/ubuntu gutsy partner
deb http://security.ubuntu.com/ubuntu gutsy-security main restricted
deb-src http://security.ubuntu.com/ubuntu gutsy-security main restricted
deb http://security.ubuntu.com/ubuntu gutsy-security universe
deb-src http://security.ubuntu.com/ubuntu gutsy-security universe
deb http://security.ubuntu.com/ubuntu gutsy-security multiverse
deb-src http://security.ubuntu.com/ubuntu gutsy-security multiverse
deb http://archive.ubuntu.com/ubuntu gutsy universe multiverse
deb-src http://archive.ubuntu.com/ubuntu gutsy universe multiverse
deb http://wine.budgetdedicated.com/apt edgy main
deb http://ca.archive.ubuntu.com/ubuntu/ gutsy main restricted
deb-src http://ca.archive.ubuntu.com/ubuntu/ gutsy main restricted
deb http://ca.archive.ubuntu.com/ubuntu/ gutsy-updates main restricted
deb-src http://ca.archive.ubuntu.com/ubuntu/ gutsy-updates main restricted
deb http://ca.archive.ubuntu.com/ubuntu/ gutsy universe
deb-src http://ca.archive.ubuntu.com/ubuntu/ gutsy universe
deb http://ca.archive.ubuntu.com/ubuntu/ gutsy-updates universe
deb-src http://ca.archive.ubuntu.com/ubuntu/ gutsy-updates universe
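Note that the cdrom line in the output above has lost the digits that were part of its text, because the last sed removes every digit on the line and not only the line number. A minimal sketch of a stricter final step follows; strip_numbering is a hypothetical helper name and the sample line is made up for illustration.

```shell
# Sketch: remove only the leading line number added by nl, not digits in the text.
# strip_numbering is a hypothetical name; the sample input below is invented.
strip_numbering() {
    # Anchored at the start of the line: optional blanks, the number,
    # then the blanks that separate the number from the text.
    sed 's/^[[:blank:]]*[0-9][0-9]*[[:blank:]]*//'
}

printf '%s\n' 'deb http://example.org/ubuntu v2 main' | nl | strip_numbering
```

The "v2" in the sample line survives, since only the number at the very start of the line matches.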
Vic.
5 comments:
Great, sort of what I was looking for, I just have a problem.
I need the same action but with an array.
If I have, for example:
array=(1 2 3 4 1 2 3 4 1 2 3 4)
and I want to create a list for every "unique" value on that array.
I tried this to no avail:
array2=( `echo ${array[@]} | sort | uniq -u` )
It doesn't work of course because the array values are printed all in a single line.
Do you have an idea of how I can achieve this? I'd prefer it to be done on the fly, without saving it to a file and reading from it later...
Thanks in advance
thanks for the interesting information
Many thanks.
Just one small thing, the sed [0-9] bit won't just remove the line numbers at the start, it will remove all numbers anywhere in each line.
sed 's/^ *[0-9]*//g'
worked for me without trashing the rest of the line.
The standard way to do this, at least for old-timers, is
awk '!s[$0]++'
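For anyone not used to the idiom, here is a short sketch of why it works. The function name first_seen and the array name seen are illustrative choices, not part of the comment above.

```shell
# Sketch: the awk one-liner wrapped in a function, with the reasoning spelled out.
# first_seen is a hypothetical name.
first_seen() {
    # seen[$0]++ yields the previous count for the current line (0 the first time),
    # so !seen[$0]++ is true only on a line's first appearance. A true pattern
    # with no action makes awk print the line.
    awk '!seen[$0]++'
}

printf '%s\n' a b c d a b c d | first_seen
```

No sorting is involved, so the original order is kept without any numbering. For the bash array in the earlier comment, printing one element per line first, for example printf '%s\n' "${array[@]}" | awk '!s[$0]++', gives the filter separate lines to work on. Also note that uniq -u prints only values that occur exactly once, which is none of them in that array.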