| Viewer |
| QTIPS - Using INIT.VIEW with Printers |
| REVMEDIA Revisited |
| QTIPS - Sub-Headings in RLIST (Revisited) |
| SecureUser |
| VERBatim - V25 |
| @ATTACK - @Files.System |
| Advanced Revelation Initialisation Sequence (Overview) by Mike Pope |
| REVMEDIA Revisted |
| Reader's Clinic - Different Id Same Record |
| RTP Series - RTP25 |
| QTIPS - String Space |
| Reader's Letters - Jim Owen |
| QTIPS - Finding/Replacing Spaces With The Editor |
| Networked %SK% |
| RTP Series - RTP25 |
| A RevTechie Replies - And Miscellaneous Jottings - Mike Pope - Revelation Technologies (UK) Ltd |
| Gas Bar |
| VERBatim - V9 |
| @ATTACK - @Rec.Count |
| QTIPS - Replacing GAS.BAR routine during PERFORM "SELECT" |
| A RevTI Techie Replies - Mike Pope - Revelation Technologies (UK) Ltd |
| QTIPS - @Date.Format |
| @ATTACK - @Date.Format |
| QTIPS - Short Cut Implicit Formatting |
| Utility Diskette # 4 |
| Playing with Scan Codes |
| QTIPS - Compiling Protection Code |
| QTIPS - Invalid Code and Command |
| QTIPS - Code/Command Help |
| Compiling 64K on a Shoestring by Blaise Wrenn (LexStat Systems Ltd) |
RevMedia FKB
| Document | V3I1A1 |
| Title | LINEAR HASH FILE STRUCTURES - Part 1 |
| Keywords | LINEAR HASH FILE STRUCTURE LK OV FRAMES |
| Text | Dave Harmacek of Harmacek Database SYSTEMS had a recent problem where his client corrupted his Linear Hash File To recover the remaining information it was necessary to deal WITH the files at the DOS level The file structure was not self evident so we had to undertake some research The starting point was Technical Bulletin # 29 by Ron Phillips In this he sets out most of the salient facts relating to the way in which Linear Hashing works Unfortunately it was not written WITH a view to providing all the information required to produce an LH fixing utility so some elements of it lack detail whilst others are a little misleading Over the next two issues the physical structure of Linear Hash files will be addressed In this the first article attention will be given primarily to the LK portion of the file and the structure of pointers The second article will consider the OV portion and how frames are linked Header Information Every LH file is split physically into two files the LK file and the OV file The LK file contains the primary data frames and the OV contains the overflow Each of these files is split up into a number of "frames" Normally these frames are 1024 bytes in length although this size can be changed when the file is created or recreated Each frame is made up of some header information followed by the actual data in the frame The header information can be one of four types LK frame First Group Header LK frame Subsequent Group Headers OV frame Free Frames Group Header OV frame Subsequent Group Headers A frame can be identified by looking at the first byte in the frame which will have one of the following values ASCII(26) First LK Group Header ASCII(13) Subsequent LK Group Headers ASCII(7) Free Frames OV Header ASCII(14) Subsequent OV Group Headers Header Structure The structure of the LK header is described in the following table (Where (x) is length) ÚÄÄÄÄÄÂÄÄÄÄÂÄÄÄÄÄÄÄÂÄÄÄÄÂÄÄÄÄÄÄÂÄÄÄÄÄÂÄÄÄÄÄÄÂÄÄÄÄÄÄÄÄÄÂÄÄÄÄÄÂÄÄÄÄÄÄÄÄÄ¿ ³Byte ³ 1 ³ 2 5 ³6 9 ³10 13 ³14 15³16 19 ³ 20 ³21 22³23 26 ³ ÃÄÄÄÄÄÅÄÄÄÄÅÄÄÄÄÄÄÄÅÄÄÄÄÅÄÄÄÄÄÄÅÄÄÄÄÄÅÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄ´ ³Group³Type³Forward³Skip³Modulo³Frame³In Use³Threshold³Size ³Rec Count³ ³1 LK ³ (1)³ (4) ³(4) ³ (4) ³Size ³ (4) ³ (1) ³Lock ³ (4) ³ ³ ³ ³ ³ ³ ³(2) ³ ³ ³ (2) ³ ³ ÃÄÄÄÄÄÅÄÄÄÄÅÄÄÄÄÄÄÄÅÄÄÄÄÅÄÄÄÄÄÄÅÄÄÄÄÄÅÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄ´ ³LK ³Type³Forward³Skip³Modulo³ ³ ³ ³ ³ ³ ³Rest ³ ³ ³ ³ ³ ³ ³ ³ ³ ³ ÀÄÄÄÄÄÁÄÄÄÄÁÄÄÄÄÄÄÄÁÄÄÄÄÁÄÄÄÄÄÄÁÄÄÄÄÄÁÄÄÄÄÄÄÁÄÄÄÄÄÄÄÄÄÁÄÄÄÄÄÁÄÄÄÄÄÄÄÄÄÙ Type Is a single byte identifying the frame type as above Forward Is a pointer to the next OVERFLOW frame Skip Is a pointer to the OVERFLOW frame where the next record actually begins Hence if the CURRENT record is very large the system can skip out the intervening OVERFLOW frames and go straight to the OVERFLOW frame containing the next record Modulo The modulo (number of primary frames) in the file Note that this value may change between Group 1 and other Groups Group 1 is always the most CURRENT modulo but other groups have a modulo which reflects the modulo of the file when the particular group was relocated (as the file grows or shrinks) Framesize The size of each frame This must be the same for each frame in the file but can change FROM file to file All utilities written to access LH files at the DOS level ought to take account of this fact and not assume a standard 1024 byte frame size In practice frame sizes seem to have little effect on PERFORMANCE In Use Records how much space is currently taken up by the file This figure is used to work out when the file ought to be resized Threshold This figure is the THRESHOLD figure at which resizing will commence If the percentage utilisation drops below this figure the file will have its modulo decreased if it goes above this figure the file will have its modulo increased Size Lock Set by system processes to indicate whether the file may be permitted to expand/contract Rec Count Number of records in the file Converting FROM Base 256 to Base 10 The information is stored in a Base 256 format WITH the least significant byte first Thus to convert the information FROM Base 256 to Base 10 one could use the following code segment CONVERT: RET_VALUE = seq(VALUE[1 1]) RET_VALUE += 256 * seq(VALUE[2 1]) RET_VALUE += 65536 * seq(VALUE[3 1]) RET_VALUE += 16777216 * seq(VALUE[4 1]) VALUE = RET_VALUE RETURN The above code segment works for all lengths of number up to 4 digits However remembering that V118 is used for exactly these SORT of conversions but only to THREE figures it is actually approximately three times quicker to use the following code segment (on a machine without an 80x87) (for four digit numbers) CONVERT: CALL V118(VALUE[1 3] LOW_BITS) HIGH_BIT = 16777216 * seq(VALUE[4 1]) RET_VALUE = HIGH_BIT + LOW_BITS RETURN Note that if the number being converted is less than three digits in length (as WITH frame size) and size lock the value to convert must be padded to 3 digits WITH CHAR(0)s (i e REPLACE the V118 call with V118(FMT(VALUE[1 3] "L(" : CHAR(0) : ")#3" LOW_BITS)) Records Within The Frame After the header block within the frame the data records begin These are stored sequentially and at the end of each record is a record marker (CHAR(255) At the end of all the records within a group is an "End of Group" marker a CHAR(128) To enable the system to MOVE FROM record to record rapidly within a frame (and ALSO to be able to detect frame format errors) each record is preceded by a STRING of between 2 and 6 bytes combining the length of the record WITH the length of the record id The way in which these are stored is a little confusing The maximum record size is 64K and the maximum record id size is theoretically the same 64K can be represented in Base 256 by two bytes It might therefore seem reasonable to assume that each record would be preceded by four bytes two for the record length and two for the id length This is not the CASE In storing the record and ID lengths AREV tries to optimise the information and only stores as much information as it needs to If a record id or a record length is short only one byte is needed to represent it so AREV only stores one byte To ensure that the system knows that only one byte is being used (or two or three) AREV FLAGS the last byte in the length chain To do this the MSB (Most Significant Bit the eighth bit of the number byte) is set to 1 If the MSB is not used the largest size the byte can represent is CHAR(127) Thus if a byte is greater than CHAR(127) the MSB must be set and this must be the last byte in the length chain Finding out the Record length and Id length is then achieved by starting at the first byte in the data frame and reading one byte at a TIME until the byte is greater than CHAR(127) The bytes READ up to this point represent the record length The Id length is found by continuing to take a byte at a time until the next byte greater than CHAR(127) is found The bytes read up to this point represent the Id length It is traditional to TEST for bits being set by masking WITH other bytes Thus to check for the eighth bit being set one would BITAND with 128 This will only return true if the eighth bit is set The approach of only storing the minimum required has one problem Since the MSB is used as a flag one bit per byte cannot be used to store information There are thus only seven data bits per byte meaning that each byte can only store up to 127 The maximum that can be stored in two bytes is therefore 16 256 (127 + (127 * 127)) To store a record length over 16 256 then three bytes are required giving 127 + (127 * 127) + (127 * 127 * 127) = 2 064 639 It is thus possible that in some circumstances this approach may use more bytes per record than two eight bit bytes Normally this will not be the case Unlike the information stored in the header block this information is stored WITH the most significant byte first When the bytes making up the lengths have been identified they can be converted into DECIMAL USING the following code fragment CONVERT_7_BITS: BEGIN CASE CASE LEN(V) = 3 V = (SEQ(V[1 1]*16384) + (SEQ(V[2 1]*128) V += SEQ(V[3 1]) 128 ; * Strip flag bit CASE LEN(V) = 2 V = SEQ(V[1 1]*128 + SEQ(V[2 1] 128 CASE LEN(V) = 1 V = SEQ(V) 128 END CASE RETURN (Volume 3 Issue 1 Pages 4 7) |
Page last modified: 08/02/03