
[RESOLVED] sort command not having desired effect

Posted: 2011/11/11 15:03:56
by reformat
I cannot get the sort command in CentOS 5.7 to replicate a sort previously done on SCO OpenServer 5.0.5.
I have transferred everything from the SCO box to the CentOS 5.7 box without much trouble, but one application requires an external text file to be sorted prior to processing. The file is large and each record has many fields, but the problem is simple to explain.

One of the fields I must sort on contains either numbers or alpha characters (no spaces), and both are enclosed in double quotes, i.e. "12345" or "ABCD". The file cannot be modified (e.g. by removing the quotes) as the subsequent process requires them.

I need to use this field to get the records into alphanumeric order, but with smaller numbers before larger numbers, e.g.:
IN-FILE
"51393"
"51393"
"5172"
"5172"
"2703"
"2703"
"2703"
"5172"
"2703"
"CASH"
"CASH"
"62175"
"5172"
"62175"
"CASH"
"CASH"
"62175"
"62175"

OUT-FILE
"2703"
"2703"
"2703"
"2703"
"5172"
"5172"
"5172"
"5172"
"51393"
"51393"
"62175"
"62175"
"62175"
"62175"
"CASH"
"CASH"
"CASH"
"CASH"

SCO sorts the records correctly but nothing I do with CentOS 5.7 works.

Can anyone assist, even if it means using a different language to sort on CentOS 5.7? At the moment I am having to keep the old SCO server running just for this task.

[RESOLVED] sort command not having desired effect

Posted: 2011/11/14 10:52:08
by r_hartman
Welcome to the CentOS fora.

[code]$ sort -n -k1.2[/code]
delivers:
[code]"CASH"
"CASH"
"CASH"
"CASH"
"2703"
"2703"
"2703"
"2703"
"5172"
"5172"
"5172"
"5172"
"51393"
"51393"
"62175"
"62175"
"62175"
"62175"[/code]
Substitute your column number for the '1' in '-k1.2'; my test file had only a single column.
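
For anyone wanting to reproduce this, here is a minimal sketch on a cut-down, single-column version of the demo data. The scratch file from mktemp and the LC_ALL=C setting are assumptions added here to keep the run repeatable; they are not part of the original command.

```shell
# Hypothetical scratch file holding a cut-down copy of the demo data.
demo=$(mktemp)
printf '%s\n' '"51393"' '"5172"' '"2703"' '"CASH"' '"62175"' > "$demo"

# -k1.2 starts the key at the 2nd character of field 1, i.e. just after
# the opening quote; -n compares the digits found there as a number.
# LC_ALL=C (an assumption) only pins down how ties would be ordered.
sorted=$(LC_ALL=C sort -n -k1.2 "$demo")
printf '%s\n' "$sorted"

rm -f "$demo"
```

With -n, a key that does not start with a digit, such as CASH, compares as zero, which is why those lines come out first.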

Re: sort command not having desired effect

Posted: 2011/11/17 17:06:58
by reformat
Thanks for your reply. Your instruction worked on my demo data, which was confusing as I had previously been trying many variations of -kn.n on the real file. A bit more testing reveals that the required sort only seems to work if the data being sorted is in the 1st column. If I insert a dummy column 1 and try the sort again on column 2, it stops working. Sort amended to: sort -n -k2.2

Revised INPUT file:

1 "51393"
1 "51393"
1 "5172"
1 "5172"
1 "2703"
1 "2703"
1 "2703"
1 "5172"
1 "2703"
1 "CASH"
1 "CASH"
1 "62175"
1 "5172"
1 "62175"
1 "CASH"
1 "CASH"
1 "62175"
1 "62175"

Re: sort command not having desired effect

Posted: 2011/11/18 08:40:46
by r_hartman
[code]$ sort -n -k2.3
-- or --
$ sort -k2.3n[/code]
Yes, I'm aware this does not seem to make sense.

Input file (I 'diversified' your column 1 a bit):
[code]1 "51393"
1 "51393"
1 "5172"
"2" "5172"
1 "2703"
1 "2703"
2 "2703"
1 "5172"
1 "2703"
1 "CASH"
1 "CASH"
1 "62175"
1 "5172"
"1" "62175"
"1" "CASH"
"1" "CASH"
"1" "62175"
"1" "62175"[/code]

Output:
[code]$ sort -k2.3n test.txt
"1" "CASH"
"1" "CASH"
1 "CASH"
1 "CASH"
1 "2703"
1 "2703"
1 "2703"
2 "2703"
1 "5172"
1 "5172"
1 "5172"
"2" "5172"
1 "51393"
1 "51393"
"1" "62175"
"1" "62175"
"1" "62175"
1 "62175"[/code]
I added another 1st column and changed the sort to '$ sort -k3.3n'. Works.
Don't ask. Maybe this qualifies as a bug. :-o

EDIT: Not convinced this is safe.
Try -k1.3n on the original file and the result is not what I'd expect.
Same for -k2.4n on this example.
You'll need to do some thorough testing before accepting this will be reliable.
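
A possible explanation, offered as a sketch rather than a verdict: with the default separator, sort treats the blanks in front of a field as part of that field, so in field 2 character 1 is the space, character 2 is the quote and character 3 is the first digit. The example below assumes GNU sort and a hypothetical cut-down sample file. The 'b' key modifier and '-t' are standard sort options, but whether they reproduce the SCO ordering on the real files is untested here.

```shell
# Hypothetical two-column sample in the style of the revised input file.
sample=$(mktemp)
printf '%s\n' '1 "51393"' '1 "5172"' '1 "2703"' '1 "CASH"' '1 "62175"' > "$sample"

# Default separators: field 2 is ' "5172"' (leading blank included),
# so character 3 is the first digit.
by_offset=$(LC_ALL=C sort -k2.3n "$sample")

# The 'b' modifier on the key start skips leading blanks before counting
# characters, so character 2 is the first digit again.
by_blank_flag=$(LC_ALL=C sort -k2.2bn "$sample")

# An explicit single-space separator keeps the blank out of the field.
by_separator=$(LC_ALL=C sort -t ' ' -k2.2n "$sample")

printf '%s\n' "$by_offset"
rm -f "$sample"
```

If that reading is right, -k1.3n on the single-column file and -k2.4n on this example both start the key at the second digit, which would account for the odd results.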

I suggest sorting a 'real' file, and then extracting the sorted column only, without quotation marks.
Then extract the sort column from the 'real' file, without quotation marks, sort that, and finally run a diff over both single column files.
Repeat this for a number of 'real' files before accepting the sort-result as reliable.
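
The check described above could be scripted roughly as follows. This is only a sketch: the file names, the sample data and the assumption that the key sits in column 2 are all hypothetical, and awk is used here merely as one convenient way to pull out the column and drop the quotes.

```shell
# Hypothetical stand-in for a 'real' file; column 2 is the quoted sort key.
work=$(mktemp -d)
printf '%s\n' '1 "51393"' '1 "5172"' '2 "2703"' '1 "CASH"' '1 "62175"' > "$work/real.txt"

# (a) Sort the whole file on the quoted key, then keep only that column, unquoted.
LC_ALL=C sort -k2.3n "$work/real.txt" |
    awk '{ gsub(/"/, "", $2); print $2 }' > "$work/a.txt"

# (b) Extract the column first, unquoted, then sort it numerically on its own.
awk '{ gsub(/"/, "", $2); print $2 }' "$work/real.txt" |
    LC_ALL=C sort -n > "$work/b.txt"

# Identical files mean the key-based sort ordered the column the same way
# a plain numeric sort of the bare values does.
if diff "$work/a.txt" "$work/b.txt" > /dev/null; then
    verdict=match
else
    verdict=differ
fi
echo "$verdict"

rm -rf "$work"
```

One caveat: lines whose keys compare equal (for instance several different non-numeric values) are ordered by the whole line in (a) but by the bare value in (b), so a difference confined to such lines is not necessarily a sorting fault.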

[RESOLVED] sort command inconsistent by field

Posted: 2011/11/20 19:18:43
by reformat
Your non-spec field identification appears to work; many thanks!
The easiest way to confirm the new sort syntax was to perform the same file sorts on SCO and then diff them against the complete files sorted on CentOS 5.7. There were no differences on this month's files, and I can backtrack over the last 6 months to gain some confidence.
Like you, I think this is a bug, as the results differ when the syntax used for field 1 is replicated on a different field; such inconsistency can hardly be intended. Also, the syntax needed to get the desired results is not as per the manual.
I wonder what the sort command does in later versions; I had better be careful at future upgrades.

Re: [RESOLVED] sort command inconsistent by field

Posted: 2011/11/21 08:34:48
by r_hartman
[quote]I wonder what the sort command is doing in later versions and I had better be careful at future upgrades.[/quote]
sort in CentOS 5.7 is version 5.97
I did my testing in CentOS 6.1 (well, CentOS 6.0 + CR), which has sort version 8.4

But I was also wondering whether this might bite somewhere in the future.