RevMedia FKB

Document: V3I9A6
Title: Sorting out Collation Sequences by Mike Pope
Keywords: COLLATION, MAP, SEQUENCE, LANGUAGE, SET, ASCII, CM, CM_US, CM_ISO,
%LOCAL_GROUP%, @LOWER.CASE, @UPPER.CASE

The introduction of language sets in Version 2.1 of Advanced Revelation was
a great step forward in allowing the product to be 'localised' for specific
languages and countries. Along with allowing language-specific attributes
such as date format, time format, etc., the system now enables you to create
language-specific alphabetic (or 'collation') sequences.

In earlier versions of Advanced Revelation, sorting was done strictly
according to the ASCII sequence of the characters involved. For instance,
the character 'A' (ASCII 65) would be followed by 'B' (ASCII 66). However,
'Ä' is ASCII 142, so it would sort incorrectly for German speakers, who
expect 'Ä' to fall immediately after 'A'.

As of Version 2.10, Advanced Revelation allows resequencing using 'collation
maps' (CMs). A CM is a table that stores the desired sequence of a set of
characters, regardless of the actual ASCII sequence value of the characters.

Collation Maps in Advanced Revelation
A CM is 256 bytes long. Each byte of the map tells Advanced Revelation what
the relative (ordinal) sort value is for that position in the ASCII chart.
For example, the position for ASCII 65 (the 66th byte, because the map starts
at ASCII 0) tells the system what the sort order is for the character 'A'
relative to any other character in the map.

When sorting using a CM, Advanced Revelation follows these steps:

* extract the next character to be sorted
* look into the CM at that character's ASCII sequence value (+1 to account
for ASCII 0)
* extract the sort sequence of that character
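
The lookup steps above can be sketched in a few lines. This is an
illustration in Python, not R/BASIC and not the system's own code; the
function names and the toy map are assumptions made for the example. Python
indexes from 0, so the byte value is used directly and the +1 offset needed
for 1-based string positions disappears.

```python
def toy_map(after: int, moved: int) -> bytes:
    """Build a 256-byte map in which byte `moved` sorts right after byte `after`.

    Hypothetical helper: every other byte keeps its ASCII-relative order.
    """
    order = [b for b in range(256) if b != moved]
    order.insert(order.index(after) + 1, moved)
    cm = bytearray(256)
    for rank, byte in enumerate(order):
        cm[byte] = rank          # position = character, content = sort value
    return bytes(cm)


def sort_key(raw: bytes, cm: bytes) -> bytes:
    """Replace each character by the sort value stored at its position in the map."""
    if len(cm) != 256:
        raise ValueError("a collation map must be exactly 256 bytes long")
    return bytes(cm[b] for b in raw)


# 0x8E (142) is 'Ä' in the DOS code page the article assumes.
CM = toy_map(after=ord("A"), moved=0x8E)
WORDS = [b"B", b"\x8e", b"A"]
PLAIN = sorted(WORDS)                                   # strict ASCII order
MAPPED = sorted(WORDS, key=lambda w: sort_key(w, CM))   # order via the map
```

With the toy map, MAPPED places 'Ä' between 'A' and 'B', whereas PLAIN
leaves it after 'B'.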

For example, here is an extract from a system CM:

( 65 chars here)FILO

which translates into this:

+--------+-----+---------------+
| Char   | Pos | Sort Sequence |
+--------+-----+---------------+
| A      | 66  | F (70)        |
| B      | 67  | I (73)        |
| C      | 68  | L (76)        |
| D      | 69  | O (79)        |
| (etc.) |     |               |
+--------+-----+---------------+

Why doesn't the sort sequence for 'B' (73) immediately follow that for 'A'
(70)? Because the characters 'Ä' and 'b' come between them. If you look into
the CM, indeed, you would find that the position for 'Ä' (143) is the
character 'G' (71) and that at the position for 'b' (99) is the character
'H' (72).

Advanced Revelation actually implements two types of CM: one for
case-sensitive sorting (in which 'a' and 'A' sort differently) and another
for case-insensitive sorting (in which 'a' and 'A' sort the same). The
example above is of a case-sensitive CM, in which each character has a unique
sort sequence value. A case-insensitive map looks identical but repeats the
relative sort value for as many characters as necessary.
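
To make the difference between the two kinds of map concrete, here is a
small Python sketch (an illustration only; build_pair, key and the grouping
format are assumptions, not part of Advanced Revelation). Both maps are
built from the same groups of characters: the case-sensitive map gives every
character its own sort value, while the case-insensitive map repeats the
value of the first character in each group.

```python
def build_pair(groups):
    """Return (case_sensitive, case_insensitive) 256-byte maps.

    `groups` is a list of byte strings; each group lists, in the desired
    order, the characters that count as the same letter.
    """
    listed = [b for group in groups for b in group]
    first = min(listed)
    # Desired overall order: the listed characters are spliced in where the
    # lowest of them would normally fall; everything else keeps ASCII order.
    order = ([b for b in range(first) if b not in listed]
             + listed
             + [b for b in range(first, 256) if b not in listed])
    rank = {byte: i for i, byte in enumerate(order)}
    sensitive = bytes(rank[b] for b in range(256))
    insensitive = bytearray(sensitive)
    for group in groups:
        for b in group:
            insensitive[b] = rank[group[0]]   # repeat the group's first value
    return sensitive, bytes(insensitive)


def key(raw: bytes, cm: bytes) -> bytes:
    """Sort key for `raw` under collation map `cm`."""
    return bytes(cm[b] for b in raw)


SENSITIVE, INSENSITIVE = build_pair([b"Aa", b"Bb"])
```

Under SENSITIVE the strings "aB" and "Ab" produce different keys; under
INSENSITIVE they produce the same key.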

Storing Collation Maps
Advanced Revelation stores CMs as two fields. In Version 2.10, these were
the 10th (case-sensitive) and 11th (case-insensitive) fields of the LND_
records in the SYSTEM file. This meant that the CMs were bound to individual
language sets, which led to difficulties. In Version 2.11, CMs have been
isolated into stand-alone records in the SYSTEM file, named with a CM_
prefix, again with two fields (1/2 = sensitive/insensitive). The language
set records, instead of containing the CMs themselves, just store in <10>
the CM record key for that language.

Version 2.11 ships with two CMs: CM_US and CM_ISO. The former produces sort
values that match the ASCII table; the latter treats alternates of a
character (e.g. a, à, â, ä) as having sequential sort values.

Using Collation Maps
At login or level 1 reset, the system looks at the current language set
(from the environment) and attempts to load the CMs associated with it. If
there is any error (e.g. LEN(map) NE 256), Advanced Revelation reverts to
using the default (ASCII) CM.
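
The 2.11 storage layout and the load-with-fallback behaviour can be modelled
roughly as follows. This is a Python sketch under stated assumptions: the
SYSTEM file is represented as a dictionary of records, each record as a list
of fields, and the record keys LND_DEMO, LND_BAD, CM_DEMO and CM_SHORT are
invented for the example. Falling back to the ASCII map for both fields is
also an assumption; the text only says the system reverts to the default CM.

```python
ASCII_MAP = bytes(range(256))   # default map: sort value equals ASCII value


def load_cms(system, language_key):
    """Return (case_sensitive, case_insensitive) maps for a language set.

    Any lookup failure or wrongly sized map falls back to the ASCII map.
    """
    try:
        cm_key = system[language_key][9]      # field <10> holds the CM key
        record = system[cm_key]
        sensitive, insensitive = record[0], record[1]   # fields <1> and <2>
    except (KeyError, IndexError):
        return ASCII_MAP, ASCII_MAP
    if len(sensitive) != 256 or len(insensitive) != 256:
        return ASCII_MAP, ASCII_MAP
    return sensitive, insensitive


# Hypothetical stand-in for the SYSTEM file.
SYSTEM = {
    "LND_DEMO": [""] * 9 + ["CM_DEMO"],
    "LND_BAD": [""] * 9 + ["CM_SHORT"],
    "CM_DEMO": [bytes(reversed(range(256))), bytes(256)],
    "CM_SHORT": [b"too short", b""],
}
```

Here load_cms(SYSTEM, "LND_DEMO") returns the two stored maps, while
"LND_BAD" (wrong length) and any unknown language set both yield the ASCII
map.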

In R/BASIC, CMs are used for all character comparisons (e.g. IF 'B' LT 'ä').
Because of this, specific sort operations such as LOCATE BY and V119 will
use the current CMs. Obviously, various system routines use LOCATE BY and
V119 and therefore also implicitly take advantage of CMs. For example,
R/LIST uses V119 for BY clauses, and indexes, of course, use LOCATE BY.

To ensure that all users use the same CM to update indexes, the system
stamps CM information on the index when you create it and checks it against
the current CM when you attempt an update. The key to the CM is stored in
the 8th field of the field record in the !file. It is this latter that
caused problems in 2.10. In 2.10, this CM key was the language set name. If
you changed language sets, even if the collation sequence remained the same,
the indexing system prohibited modifications. In 2.11 the problem was
resolved by storing only the actual CM name, which remains the same across
many language sets. Of course, if you load up a new CM altogether, indexing
will insist on a rebuild in order to resequence the index according to the
new CM. For Quick/Rightdexes, the CM name is stored in the dict of the file
in %LOCAL_GROUP% (value 1 = dict CM, value 2 = data CM).
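
The difference between the 2.10 and 2.11 stamping schemes can be shown with
a toy model. The Python below is a sketch only: the language set names
(LND_ONE and so on) and both function names are hypothetical, and the real
check lives inside the indexing system.

```python
# Hypothetical language sets and the CM each one points to.
LANGUAGE_SETS = {"LND_ONE": "CM_ISO", "LND_TWO": "CM_ISO", "LND_THREE": "CM_US"}


def stamp(language_set, by_cm_name=True):
    """Value written to the index when it is created.

    by_cm_name=True models 2.11 (store the CM name);
    by_cm_name=False models 2.10 (store the language set name).
    """
    return LANGUAGE_SETS[language_set] if by_cm_name else language_set


def update_allowed(index_stamp, language_set, by_cm_name=True):
    """An update is allowed only if the stamp matches the current setting."""
    return index_stamp == stamp(language_set, by_cm_name)


# Index created under LND_ONE, later updated under LND_TWO (same CM).
OLD_STYLE = update_allowed(stamp("LND_ONE", False), "LND_TWO", False)
NEW_STYLE = update_allowed(stamp("LND_ONE"), "LND_TWO")
# Index created under LND_ONE, later updated under a different CM.
DIFFERENT_CM = update_allowed(stamp("LND_ONE"), "LND_THREE")
```

OLD_STYLE is False even though the collation sequence is unchanged, which is
the 2.10 problem described above; NEW_STYLE is True; DIFFERENT_CM is False,
corresponding to the case where a rebuild is required.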

So when are case-insensitive CMs used? The only place is when you use the
special case-insensitive comparison operators (e.g. _EQC). You might think
that they'd be used in indexes. However, indexing simply converts
@LOWER.CASE to @UPPER.CASE for case-insensitive indexes (which means that
you should check and adjust those two variables in the LND record). (Editor's
note: This can especially be a problem when using the B CATALYST lookup
logic. If @LOWER.CASE and @UPPER.CASE do not contain the foreign language
set characters, then lookups involving them may fail if entered in lowercase.
AMcA)

Creating Your Own Collation Maps
It is not difficult (just confusing) to create your own CMs. Following is a
utility to do this. To use it, create two records in the LISTS file called
NEW_CM_CASE and NEW_CM_NOCASE. For the case-sensitive list, put the
characters, one per field, in the order that you want to re-sequence (e.g.
A Ä a ä B b, where ' ' = @FM). In the second (case-insensitive) list, each
field can contain multiple characters to indicate that these have the same
sort sequence (e.g. AÄaä Bb). Include just the characters to be resequenced.
Any missing characters are added into the CMs in ASCII order where you might
have left any holes.

The program works as a reverse of how Advanced Revelation uses the CMs. For
each character in the desired sequence, it goes to that ASCII value of the
character in the new map and inserts its sort sequence (relative position)
there. Additional features of the utility are that it assumes standard sort
order for ASCII values 1-32 and that it checks for duplicate assignments.
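
The 'reverse' construction can be sketched compactly in Python. This is a
simplified illustration and not a port of the R/BASIC utility: build_cm is a
hypothetical name, the special handling of the high-end delimiter characters
is left out, and the backfill rule used here (unlisted characters receive
the remaining sort values in ASCII order, after all listed ones) is an
assumption about one reasonable reading of "added in ASCII order".

```python
def build_cm(fields):
    """Build a 256-byte collation map from a desired sort order.

    `fields` is a list of byte strings in the desired order. A field holds
    one character (case-sensitive list) or several characters that share a
    sort value (case-insensitive list).
    """
    fields = [f for f in fields if f]      # ignore empty fields
    cm = [None] * 256                      # None marks a hole
    for value in range(33):                # 0-32 keep their standard order
        cm[value] = value
    for field_no, field in enumerate(fields, start=1):
        sort_seq = field_no + 32           # offset to skip the first 32 values
        for value in field:
            if cm[value] is not None:      # position already filled
                raise ValueError(f"character {value} is duplicated or reserved")
            cm[value] = sort_seq           # go to the character, store its rank
    next_seq = len(fields) + 33
    for value in range(33, 256):           # backfill the holes in ASCII order
        if cm[value] is None:
            cm[value] = next_seq
            next_seq += 1
    return bytes(cm)


# The article's example order: A Ä a ä B b (0x8E = Ä, 0x84 = ä in the DOS code page).
CASE = build_cm([b"A", b"\x8e", b"a", b"\x84", b"B", b"b"])
NOCASE = build_cm([b"A\x8ea\x84", b"Bb"])
```

In CASE the six listed characters receive the consecutive values 33 to 38;
in NOCASE the first four share 33 and the last two share 34. Listing the
same character twice raises an error, mirroring the duplicate check.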

The program creates a new sort order record called CM_NEW. To create one of
a different name, change the line which reads WRITEV NEW_CM ON SYSTEM_FILE,
'CM_NEW', FCTR

/*
program: reset_cm
author: mike pope RT(UK)Ltd
date: 11 Feb 1992
notes: SOURCE user's desired sort order
NEW_CM new CM created by this program
ASCII_TABLE ascii chart values used to backfill
*/

OPEN 'SYSTEM' TO SYSTEM_FILE ELSE CALL FSMSG() ; STOP
SOURCE_CASE = XLATE('LISTS', 'NEW_CM_CASE', '', 'X')
SOURCE_NOCASE = XLATE('LISTS', 'NEW_CM_NOCASE', '', 'X')

/*
REMEMBER! The records from the LISTS file should be @FM delimited
*/
SOURCE = SOURCE_CASE ; FCTR = 1 ; GOSUB MAKE_MAP
SOURCE = SOURCE_NOCASE ; FCTR = 2 ; GOSUB MAKE_MAP
STOP
/* */

MAKE_MAP:
  IF SOURCE ELSE RETURN
  NEW_CM = STR( @FM, 256 ) ; * initialize to all high values
  ASCII_TABLE = NEW_CM
  * Now build table of ASCII values > space
  FOR CTR = 33 TO 256
    ASCII_TABLE[ CTR, 1 ] = CHAR(CTR)
  NEXT
  /*
  initialize NEW_CMs to have the 1st 32 characters already in place
  */
  FOR CTR = 1 TO 32
    NEW_CM[ CTR, 1 ] = CHAR(CTR)
  NEXT
  FIELD_CNT = COUNT( SOURCE, @FM ) + 1 ; * We know Source is not null
  FOR FIELD_CTR = 1 TO FIELD_CNT
    NEXT_FIELD = SOURCE< FIELD_CTR >
    CHAR_CNT = LEN(NEXT_FIELD) ;* loop thru each char (req for case insens)
    FOR PTR = 1 TO CHAR_CNT
      NEXT_CHAR = NEXT_FIELD[ PTR, 1 ]
      VALUE = SEQ( NEXT_CHAR )
      SORT_SEQ = CHAR(FIELD_CTR+32) ;* offset to skip 1st 32 char
      /*
      see if this value has already been put into NEW_CM; if there is not an
      @FM at this position, it has already been used
      */
      IF NEW_CM[ VALUE, 1 ] NE @FM THEN
        CALL MSG('Char %1% is duplicated!', '', '', NEXT_CHAR )
        STOP
      END ELSE
        NEW_CM[ VALUE, 1 ] = SORT_SEQ
        ASCII_TABLE[ VALUE, 1 ] = @FM ; * @fm means 'already used'
      END
    NEXT PTR
  NEXT FIELD_CTR
  /* backfill characters not in new CM */
  NEW_CM_PTR = 1
  FOR CTR = 33 TO 256
    NEXT_CHAR = ASCII_TABLE[ CTR, 1 ]
    IF NEXT_CHAR NE @FM THEN
      * scan NEW_CM (from NEW_CM_PTR) for holes (marked by @FM)
      HOLE_FOUND = 0
      LOOP
        NEXT_CM_CHAR = NEW_CM[ NEW_CM_PTR, 1 ]
        IF NEXT_CM_CHAR EQ @FM THEN
          HOLE_FOUND = 1
        END ELSE
          NEW_CM_PTR += 1
        END ; * if next_cm_char eq @fm
      UNTIL HOLE_FOUND REPEAT
      NEW_CM[ NEW_CM_PTR, 1 ] = CHAR( CTR )
      NEW_CM_PTR += 1
    END ; * if next_char ne @fm
  NEXT CTR

  * shift CM 1 char right to add in ascii 0; convert hi end chars to @VM
  CONVERT @FM:@RM TO '' IN NEW_CM
  NEW_CM[ 254, 2 ] = @VM:@VM ; * positions 254/255 are taken by @VMs in CMs
  * note that this means that @FMs and @RMs cannot be remapped
  NEW_CM = CHAR(0) : NEW_CM
  IF LEN(NEW_CM) = 256 THEN
    WRITEV NEW_CM ON SYSTEM_FILE, 'CM_NEW', FCTR
  END ELSE
    * Call appropriate (900 or 901) SYS.MESSAGES error message
    CALL MSG( 899+FCTR, '', '', 'CM_NEW' ) ; STOP
  END
RETURN


(Volume 3, Issue 9, Pages 7-10)

Page last modified: 31/01/03