Sprezzatura :: Making Databases Happen

Client Server Corner - SQL Server Data Loading Part I - Bill Wolff

SQL Server offers excellent performance benefits for very large files. Arev tasks that normally take hours are reduced to minutes. Many corporations have pilot projects under way that exercise the client server relationship. The Arev SQL Server bond is uniquely positioned to handle that challenge. Most downsizing projects involve large data sets from a mainframe or mini environment. The data is downloaded and converted to ASCII for input to the server. Another popular approach is upsizing swollen Arev applications that are slowing under the weight of excessive data entry. In either case, client programs are typically written to port the data to SQL Server. We will explore four of these techniques and compare the efficiency of each.

All four methods rely on R/BASIC programming. The first uses bond technology to map read and write statements to SQL inserts. The second and third build the SQL insert statements in code and pass them to the server with a subroutine call. EXEC_TRANS_SQL and MSSQL_C are the routines supplied with the bond package that perform this task. The last method builds an ASCII file with OSBWRITE that is compatible with the SQL Server bulk copy utility. A fair test might include the Arev IMPORT.EXPORT utility as a fifth method, but that alternative was considered too slow.

SQL Server Basics

A few points on SQL Server architecture are worth mentioning before we code. Accounts are called databases, data files are called tables, fields are called columns, and records are called rows. This nomenclature is generic to Structured Query Language and is central to the Arev and Open Engine SQL implementations. Statements are passed to the server for data definition (DDL) and data manipulation (DML). In all cases, statements are text strings with proper syntax. INSERT is the verb for creating new rows. In Arev, a row might be added to VERBS as follows:

   INSERT  INTO VERBS VALUES ('LISTDICT', 'RBASIC', 'VERBS', NULL, 'V45')

Note that key fields in SQL are one or more unique columns. SQL Server defines unique fields with an index statement. LISTDICT would normally be the key in Arev programming but appears here as the first column in the list. The other columns follow in order. The keyword NULL is used to imply value unknown instead of the Arev convention of "". For comparison, the same result would be generated with the following R/BASIC statements:

0001    OPEN 'VERBS' TO F_VERBS THEN
0002      WRITE 'RBASIC': @FM:'VERBS':@FM:@FM: 'V45' ON F_VERBS, 'LISTDICT'
0003    END

Once the syntactical conventions are understood, data loading can begin. At some point, Arev or any other front end must connect to the server and speak with this syntax. The Arev bond provides the translation and connectivity functions necessary for this to work. The bond has several layers including a base filing system (BFS) and supporting subroutines. By addressing the bond at the appropriate layer, performance and ease of use can be balanced as needed.

Bonding Principles

Arev bonds to a SQL Server table by attaching it with the proper volume pointer. SQL Server volume pointers are defined in the SETVOLUME window. A name is defined and stored as a key in VOC. The record has the following structure:

<1>  VOLUME
<2>  MSSQL_BFS
<3>  server name: @VM: database name: @VM: user login: @VM: password
<4>  control file location

The parameters in field three are accessed in the SETVOLUME window with the F2 option key. They refer to the server name on the LAN, the database, and the login/password combination defined on SQL Server. This combination connects to the server and provides specific rights. These rights are controlled by the server administrator and may prevent you from accessing desired tables and procedures.

Once this volume pointer is defined, LISTMEDIA and ATTACH work the same way they do with linear hash files. Attaching a file will create a phantom Arev dictionary in the directory listed in the control file location mentioned above. This dictionary has field definitions in standard Arev format. Key fields are determined by querying SQL Server for a unique clustered index on the table. Symbolic and group fields can be added as needed and will work as advertised. If the table structure is changed on the server, the field definitions will need to be regenerated.

The bonded table and its dictionary are also listed in the FILES file. The file handle of the SQL table has an interesting structure. You can find it in the FILES record field 5. The names and datatypes of the columns are stored there for use in SQL statement generation.

A connection is also established for SQL Server. This is similar to an open file and uses the network named pipe feature. This connection has a process number and data segment which are stored in a labelled common variable. The latter has data required for the supporting C routines and takes about 10K of string space. Note that each workstation can have multiple connections to SQL Server which is common in multi-user environments like Windows and OS/2.

With all of these pointers established, SQL tables can be listed, edited, and programmed like any other Arev file. The MSSQL_BFS portion of the bond translates each open, read, write, delete, lock, etc. into appropriate SQL statements and transmits them over the process connection. The query results are stored in a buffer and converted to @id and @record one row at a time. Arev routines never see the SQL statements and consequently work with the data in a normal fashion.

Bonded Files

Applying this discussion to data loading means writing an R/BASIC program that reads a linear hash or ASCII source file and writes records to a bonded SQL table. The source code follows:

0001    /*
0002    Wolff Data Systems
0003    Arev copy to SQL example
0004    Bonded SQL Server file with R/BASIC WRITE record to table
0005  minutes
0006    */
0007    open 'MASTER' to f_master then
0008      open 'MASTER_SQL' to f_master_sql then
0009       select f_master
0010       done = '' ; count = 0 ; error = ''
0011       call dostime( start)
0012       loop until done
0013        count += 1
0014        print @( 0, 21): count:
0015        readnext k_master then
0016         read r_master from f_master, k_master then
0017           r_master_sql = field( r_master, @fm, 1, 7)
0018           r_master_sql< 8> = r_master< 8, 1>
0019           r_master_sql< 9> = r_master< 8, 2>
0020           r_master_sql< 10> = r_master< 8, 3>
0021           r_master_sql< 11> = field( r_master, @fm, 9, 5)
0022           r_master_sql< 16> = field( r_master, @fm, 15, 2)
0023           write r_master_sql on f_master_sql, k_master else
0024                 error< -1> = k_master
0025           end
0026         end
0027        end else done = 1
0028       repeat
0029       call dostime( finish)
0030       call msg( finish - start, '', '', '')
0031       oswrite error on 'error.sql'
0032      end
0033    end

In this example, MASTER is a linear hash file on a network drive. MASTER_SQL is a SQL Server table attached with the bond. Each record in the source file is read, normalised, and counted. The elapsed time is recorded and displayed with a call to message. Lines 20 through 22 convert a multi-valued address field to three SQL columns.

This method is easy to program for an Arev veteran but generates excessive overhead for the bond. Each write forces the bond to call a subroutine that builds a SQL insert statement. The dictionary must be read to handle date and numeric conversions. This query is transmitted over the volume connection and tested for errors. @FILE.ERROR is set if there is a problem. This routine converted 8,500 records in 64 minutes on a 386/33 system with Ethernet adapters.

EXEC_TRANS_SQL Bypasses the Bond

Since each source record is parsed and converted to a new string for writing to the target file, we can skip a step by building the INSERT statement at the same time. The bond routine will not be called so overhead is reduced. The EXEC_TRANS_SQL subroutine provides this facility. It accepts a SQL statement and transmits it to SQL Server using a volume connection for a previously attached SQL table. The code is similar with some notable exceptions:

0001     /*
0002      Wolff Data Systems
0003      Arev copy with exec_trans example
0004      Exec_Trans_SQL call to insert each record - 27 minutes
0005    */
0006    declare subroutine exec_trans_sql
0007    open 'MASTER' to f_master then
0008      * read records
0009      select f_master
0010      done = ''
0011      count = 0
0012      p_dos = 0
0013      error = ''
0014      exec_trans_sql( 'TEST_SQL', 'use test', results)
0015      call dostime( start)
0016      loop until done
0017       count += 1
0018       readnext k_master then
0019        read r_master from f_master, k_master then
0020  
0021         r_master_sql='(':k_master:',"':field(r_master,@fm,1,7)
0022         r_master_sql< 8> = r_master< 8, 1>
0023         r_master_sql< 9> = r_master< 8, 2>
0024         r_master_sql< 10> = r_master< 8, 3>
0025         r_master_sql< 11> = field( r_master, @fm, 9, 3)
0026         r_master_sql< 14> = oconv( r_master< 12>, 'D4-')
0027         r_master_sql< 15> = oconv( r_master< 13>, 'D4-')
0028         r_master_sql< 16> = r_master< 15>
0029         r_master_sql< 17> = r_master< 16>: '")'
0030         swap @fm with '","' in r_master_sql
0031         r_master_sql[ 1, 0] = 'insert into master_sql values '
0032         exec_trans_sql( 'TEST_SQL', r_master_sql, results)
0033         print @( 0, 21): count 'R#5':
0034         if @file.error then error< -1> = @file.error
0035        end
0036       end else done = 1
0037      repeat
0038      call dostime( finish)
0039      call msg( finish - start, '', '', '')
0040      oswrite error on 'error.sql'
0041    end

Line 16 is a simple call to SQL Server that ensures the current database is correct. This is similar to an Arev logto. Line 22 incorporates the key into the data values. Lines 27 and 28 supply the necessary date conversion directly without referencing the data dictionary. Line 32 adds the SQL verb and table name to the beginning of the string. Line 33 transmits the completed statement to the server. RESULTS is a variable used for query results. INSERT statements do not generate results so this variable is not tested, but @FILE.ERROR is.

Generating the SQL statement directly in the conversion routine cuts processing time in half. Applying the date conversion right in the code loop also saves time. The test took 27 minutes.

(Volume 4, Issue 1, Pages 6-8)

Making Databases Happen

Registered Address: 12A Marlborough Place, Brighton, BN1 1WN
UK +44 (0) 20 8912 1010
USA +1 215 939 3400

RevMedia

Client Server Corner - SQL Server Data Loading Part I - Bill Wolff

SQL Server Basics

Bonding Principles

Bonded Files

EXEC_TRANS_SQL Bypasses the Bond