BabelPad Help : Sort Lines



Sort Lines Dialog Box

To sort some or all lines in the BabelPad edit window, select one or more whole lines, then choose "Sort Lines..." from the Edit menu. Note that if the start or end position of the selection is not at the start of a line (e.g. if you select all but there is no line break after the last character in the document) then the "Sort Lines..." menu option will be disabled. When "Sort Lines..." is selected the following dialog box will be displayed:



Sort Method

The following sort methods are supported:


If you select the Unicode Collation Algorithm or CLDR Collation Algorithm then you can choose to customize the collation order for any of the listed languages (Neutral is the default collation). At present only a very few languages are supported, as a proof of concept. It is unlikely that additional languages will be added in the future (and possible that this feature will be removed), as language-specific customizations can now be applied using user-defined customizations (see below).


Sort Direction


Casing Options

This option is only available with the Windows default collation method.


UCA Options

These options are only applicable to UCA or CLDR sort methods.


Other Options

This option is only available with the UCA and CLDR collation methods.



Customize UCA / CLDR Collations

This option is only available with the UCA and CLDR collation methods. When this option is enabled you may customize collation elements by clicking on the "Define Customizations" button, which opens this dialog box:



This dialog enables you to define one collation element ["Source"] (character or string) as equivalent to another collation element ["Target"] (character or string or null). The following buttons are available:

Pressing the "Add" or "Edit" button opens this dialog box:



In this dialog box enter the character or string to be redefined in the "Actual collation element" edit box, and enter the character or string it is to be processed as in the "Process as equivalent to" edit box (e.g. enter "ph" in the first box and "f" in the second box to treat "ph" as if it were "f", and so sort "sulphur" and "sulfur" the same). The "Process as equivalent to" box may be left blank, in which case the character or string in the first box will be ignored when sorting. To enter Unicode characters that are not on your keyboard, either copy and paste from BabelPad or BabelMap, or enter the Unicode character as a Universal Character Name (e.g. \u00C6 for Æ) which will be automatically converted to a Unicode character after entry.
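The effect of such a customization can be pictured as a substitution applied to each line before it is compared. The following Python sketch illustrates that idea only; it is not BabelPad's implementation, and the names decode_ucn and apply_customizations are hypothetical. It assumes that Universal Character Names take the four-digit \uXXXX form shown above and that longer source strings take precedence over shorter ones.

```python
import re


def decode_ucn(text: str) -> str:
    """Replace each \\uXXXX escape with the character it names (assumed form)."""
    return re.sub(
        r"\\u([0-9A-Fa-f]{4})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


def apply_customizations(line: str, rules: dict[str, str]) -> str:
    """Return the line with every source string replaced by its target.

    An empty target removes the source string, so it is ignored when sorting.
    Longer sources are tried first (an assumption made for this sketch).
    """
    if not rules:
        return line
    sources = sorted(rules, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(s) for s in sources))
    return pattern.sub(lambda m: rules[m.group(0)], line)


# Treat "ph" as "f", and ignore hyphens entirely.
rules = {"ph": "f", "-": ""}

same_key = apply_customizations("sulphur", rules) == apply_customizations("sulfur", rules)
ignored = apply_customizations("co-operate", rules)
ae = decode_ucn(r"\u00C6")
```

Here same_key is True because both spellings reduce to "sulfur", ignored is "cooperate", and ae is the single character Æ.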

The file format for loading and saving customizations is a text file encoded as UTF-8 with two tab-separated columns. The first column specifies the source character or string, and the second column specifies the target character or string (or may be empty to ignore the character or string in the first column). An optional third column with a comment may be included. Sample customization files for Welsh and Spanish are available. In the file for Welsh customization the Welsh digraphs "ch", "dd", "ff", "ll", "ng", "ph", "rh", and "th" have been redefined as equivalent to Unicode characters that sort after "c", "d", "f", "l", "g", "p", "r", and "t" respectively:


CH Ↄ ROMAN NUMERAL REVERSED ONE HUNDRED
Ch Ↄ ROMAN NUMERAL REVERSED ONE HUNDRED
ch ↄ LATIN SMALL LETTER REVERSED C
DD Ɖ LATIN CAPITAL LETTER AFRICAN D
Dd Ɖ LATIN CAPITAL LETTER AFRICAN D
dd ɖ LATIN SMALL LETTER D WITH TAIL
FF Ꞙ LATIN CAPITAL LETTER F WITH STROKE
Ff Ꞙ LATIN CAPITAL LETTER F WITH STROKE
ff ꞙ LATIN SMALL LETTER F WITH STROKE
LL Ꝇ LATIN CAPITAL LETTER BROKEN L
Ll Ꝇ LATIN CAPITAL LETTER BROKEN L
ll ꝇ LATIN SMALL LETTER BROKEN L
NG Ɡ LATIN CAPITAL LETTER SCRIPT G
Ng Ɡ LATIN CAPITAL LETTER SCRIPT G
ng ɡ LATIN SMALL LETTER SCRIPT G
PH Ᵽ LATIN CAPITAL LETTER P WITH STROKE
Ph Ᵽ LATIN CAPITAL LETTER P WITH STROKE
ph ᵽ LATIN SMALL LETTER P WITH STROKE
RH Ʀ LATIN LETTER YR
Rh Ʀ LATIN LETTER YR
rh ʀ LATIN LETTER SMALL CAPITAL R
TH Ŧ LATIN CAPITAL LETTER T WITH STROKE
Th Ŧ LATIN CAPITAL LETTER T WITH STROKE
th ŧ LATIN SMALL LETTER T WITH STROKE
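For readers who want to generate or check such a file with a script, the following Python sketch reads the format described above: one rule per line, tab-separated, with an optional second column and an optional third comment column. The function name parse_customizations is hypothetical, and the handling of blank lines is an assumption; BabelPad's own loader may behave differently.

```python
def parse_customizations(text: str) -> dict[str, str]:
    """Parse tab-separated customization rules into a source -> target mapping."""
    rules: dict[str, str] = {}
    for raw in text.splitlines():
        if not raw.strip():
            continue  # assumption: blank lines are skipped
        columns = raw.split("\t")
        source = columns[0]
        # A missing or empty second column means "ignore the source when sorting".
        target = columns[1] if len(columns) > 1 else ""
        # Any third column is a comment and is discarded.
        rules[source] = target
    return rules


sample = (
    "CH\t\u2183\tROMAN NUMERAL REVERSED ONE HUNDRED\n"
    "ch\t\u2184\tLATIN SMALL LETTER REVERSED C\n"
    "-\t\n"
)
rules = parse_customizations(sample)
```

When reading from disk, opening the file with encoding="utf-8-sig" lets Python accept UTF-8 files with or without a byte order mark.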

If you load this customization then a list of Welsh words sorts as below (words affected by the customization highlighted in bold):


bach
cadwyn
cywydd
chwaeth
da
dysgu
dda
edn
fagddu
fyny
fferllyd
gaeaf
gwaed
gynt
ngwaed
hafod
lafant
lwc
llaeth
mab
pab
pys
philosophi
ras
rwber
rhyd
saith
tad
tywysog
thus
ubain
ŵyll
ysgol

Please note that at present there is no way for the user to specify something like "sort 'dd' between 'd' and 'e'" (d < dd < e), and the only way to get the desired sort order is to redefine the source collation element as some other Unicode character or sequence of characters. For the example above, this means redefining Welsh digraphs as Unicode characters which are modified forms of the letter after which the digraph is to sort (and which are not used for Welsh). Unfortunately, choosing appropriate substitutions requires some understanding of Unicode and/or the DUCET.
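To make the target order concrete, the following Python sketch expresses "d < dd < e" directly by ranking each letter of the Welsh alphabet, digraphs included. This is a different technique from BabelPad's substitution approach and is not how BabelPad sorts; it is only a way to check what order a customization should produce. The names WELSH_ALPHABET and welsh_key are hypothetical, and the sketch assumes that any two letters forming a digraph are always read as that digraph, which is not true of every Welsh word.

```python
import unicodedata

# Welsh alphabet in dictionary order, with digraphs as single letters.
WELSH_ALPHABET = [
    "a", "b", "c", "ch", "d", "dd", "e", "f", "ff", "g", "ng", "h", "i", "j",
    "l", "ll", "m", "n", "o", "p", "ph", "r", "rh", "s", "t", "th", "u", "w", "y",
]
RANK = {letter: index for index, letter in enumerate(WELSH_ALPHABET)}


def strip_marks(text: str) -> str:
    """Remove combining marks so that, for example, ŵ is ranked as w."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def welsh_key(word: str) -> list[int]:
    """Build a sort key of alphabet ranks, matching digraphs greedily."""
    text = strip_marks(word.lower())
    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in RANK:
            key.append(RANK[pair])
            i += 2
        else:
            # Characters outside the alphabet sort after it, by code point.
            key.append(RANK.get(text[i], len(WELSH_ALPHABET) + ord(text[i])))
            i += 1
    return key


ordered = sorted(["edn", "dda", "dysgu", "da"], key=welsh_key)
```

With this key, ordered is ["da", "dysgu", "dda", "edn"], which matches the position of "dda" in the list above.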

If the "Set as default text sort" checkbox is checked after defining customizations then the customizations will be cached until changed or until BabelPad is closed. This means that if you do another UCA or CLDR sort during the current BabelPad session you will not need to respecify or reload the customizations.


Default Text Sort

You may specify one of the UCA, CLDR or Windows collation methods as the default text sort method by checking the "Set as default text sort" checkbox. When you make any changes to the parameters for the default sort then the "Set as default text sort" checkbox will become unchecked, and you will need to recheck it if you want the new parameters to be the new default.


