The Tokenizer

 
timing :-  statistics(cputime, T0),
           test,
	   statistics(cputime, T1),
	   TR is T1 - T0,
	   format('It took ~f sec.~n', [TR]).

timing(X) :- statistics(cputime, T0),
           call(X),
	   statistics(cputime, T1),
	   TR is T1 - T0,
	   format('It took ~f sec.~n', [TR]).

timing_all :- timing(test), nl, nl,
              timing(test93), nl, nl,
              timing(testams), nl, nl.

timing_lex :-  statistics(cputime, T0),
               test_lex,
	       statistics(cputime, T1),
	       TR is T1 - T0,
	       format('VHDL-93 Tokenizer took ~f sec.~n', [TR]).

ttest :-
	write('Testing Individual VHDL-87 Files'),nl,
        test1([ 'tp_1', 'tp_2', 'tp_3', 'tp_4', 'tp_5' ]),
	write('Testing Composite Files (* = one design unit)'), nl,
        test1([ 'tp_6', 'tp_7' ]),
	nl, write(':):):):)All VHDL-87 Tests Done.'),nl,nl.
						    
test :-
	nl, write('Testing on  Individual VHDL-87 Files'),nl,
        test1([ 'tp_1', 'tp_2', 'tp_3', 'tp_4', 'tp_5' ]),
	write('Testing on Composite Files (* = one design unit)'), nl,
        test1([ 'tp_6', 'tp_7' ]),
	nl,write('Testing Formatter on File tp_5'),nl,
        tell('pretty.vhdl'), test(tp_5), told,
        vhdl_read('pretty.vhdl'),
	nl,write('Testing Formatter on File tp_7'),nl,
        tell('pretty.vhdl'), test(tp_7), told,
        vhdl_read('pretty.vhdl'),
	nl,write(':):):):)All VHDL-87 Tests Done.'),nl,nl.

test93 :- 
	write('Testing on  VHDL-93 Files'),nl,
        test1('tkp_0'), test1('tkp_1'), 
        test1('tkp_2'), test1('tkp_3'), 
        write('--------The quirky one ...(requires peek_char/peek_byte)'), 
        nl, test1('tkp_4'), 
	write(':):):):)All VHDL-93 Tests Done.'),nl,nl.

testams :-
	write('Testing on VHDL-AMS Validation Suite'),nl,
        test1([ 'ams1', 'ams2', 'ams3', 'ams4', 'ams5', 'ams6', 'ams7']),
	nl,write('Testing Formatter on File ams3'),nl,
        tell('pretty.vhdl'), test(ams3), told,
	write(':):):):)All VHDL-AMS Tests Done.'),nl,nl.

testall :- 
        test, test93, testams.

test1([]) :- !.                                       

test1([Test|Tests]) :-
        write('VHDL test file: '),write(Test),write(' ... '),ttyflush,
        vhdl_read(Test),  !,
        write('Test passed.'),nl,
        test1(Tests).

test1([T|Ts]) :- !,
        write('Test Failed on '),write(T),nl,
        test1(Ts).

test1(F) :-
        write('VHDL test file: '), write(F), write(' ... '),ttyflush,
        vhdl_read(F),	!,
        write('Test passed.'),nl.

test1(F) :- !,  write('Test Failed on '), write(F), nl.


test(File) :-
        vhdl_read(File,Design_Units),
        write_vhdl_design_units(Design_Units,0,L,[]),
        !,  write_list(L,0).

testpp([]) :- !.
testpp([F|FL]) :- !, testpp(F), testpp(FL).

testpp(X) :- tell('pretty.vhdl'), test(X), told,
             vhdl_read('pretty.vhdl'), nl.

testppall :- testpp([tp_1,tp_2,tp_3,tp_4,tp_5,tp_6,tp_7]),
             testpp([tkp_0,tkp_1,tkp_2,tkp_3]),
             testpp([ams1,ams2,ams3,ams4,ams5,ams6,ams7]).

avoidtestpp(L) :- 
     append([tkp_0,tkp_1,tkp_2,tkp_3],[ams1,ams2,ams3,ams4,ams5,ams6,ams7],L1),
     append([tp_1,tp_2,tp_3,tp_4,tp_5,tp_6,tp_7],L1,L2), 
     member(E,L2), (\+ member(E,L)), testpp(E), fail.
avoidtestpp(_). 


test_lex :- 
	write('Testing Lex on  VHDL Files'),nl,
        test_lex('tp_1'), nl, write('one done'), nl,
        test_lex('tp_2'), nl, write('one more done'), nl,
        test_lex('tp_3'), nl, write('three done'), nl,
        test_lex('tp_4'), nl, write('four done'), nl,
        test_lex('tp_5'), nl, write('five done'), nl,
        test_lex('tp_6'), nl, write('six done'), nl,
        test_lex('tp_7'),
        nl, write('seventh -- large one -- done'), nl,
        nl, write('the quirky one ... (requires peek_char)'), nl,
        test_lex('tlex'),
        nl, write('the quirky one -- done too'), nl,
	nl, write(':) :) :) ALL TESTS DONE :):):).'),nl.

test_lex(Name) :-
        file_path(Name,File),
        exists_file(File),
        see(File),
            vhdl_get_token_line(Tokens),
            vhdl_lex_test(Tokens),
        seen.

vhdl_lex_test([]) :- !.
vhdl_lex_test(Tokens) :-
        write('TOKEN LINE: '),write(Tokens),nl,
        vhdl_get_token_line(Next_Tokens),
        vhdl_lex_test(Next_Tokens).

file_path(Name,File) :- concat_atom(['data/',Name,'.vhdl'],File).
file_path(Name,File) :- concat_atom(['test_data/',Name,'.vhdl'],File).
file_path(Name,File) :- concat_atom([Name,'.vhdl'],File).
file_path(Name,File) :- concat_atom([Name,'.vhd'],File).
file_path(Name,Name).                                  

% Replaced with peek_byte built-in for SWI-Prolog version  > 2.5
peek_char(Ch) :-   current_input(Stream),
                   stream_position(Stream,Old,Old), get0(Ch),
                   stream_position(Stream,_,Old).

2.1 Lexical Analysis

Prolog clauses are a natural way to represent finite-state machines, which in turn are a well-established way to construct lexical analysers (tokenizers). The basic strategy used here is to read a character and pass it as the first argument to a predicate that has one clause for each case of interest. First-argument indexing lets this dispatch run with the efficiency of a case statement. Since nearly all commercial Prologs provide first-argument indexing, we can practically consider it to be part of the language.
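
As a small illustration of this dispatch style, here is a minimal sketch that works on a list of character codes instead of an input stream. The predicate names delimiter/2 and delimiters/2 are hypothetical and are not part of the parser; the sketch assumes a Prolog that reads the 0'c notation, such as SWI-Prolog.

```prolog
% Hypothetical sketch, not part of the VHDL parser.
% delimiter(+Code, -Token): the character code is the first argument,
% so a call with a bound code can be dispatched directly to its clause.
delimiter(0'(, '(').
delimiter(0'), ')').
delimiter(0'[, '[').
delimiter(0'], ']').
delimiter(0',, ',').

% delimiters(+Codes, -Tokens): map a list of codes to tokens.
% Fails on a code that has no delimiter/2 clause.
delimiters([], []).
delimiters([C|Cs], [T|Ts]) :-
    delimiter(C, T),
    delimiters(Cs, Ts).
```

For example, the goal atom_codes('([])', Cs), delimiters(Cs, Ts) binds Ts to the four tokens '(', '[', ']' and ')' in that order.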

VHDL-87 is based on the ISO seven-bit coded character set (ISO 646-1983), while VHDL-93 is based on the ISO eight-bit coded character set (ISO 8859-1987). The Prolog predicate get/1, which returns the next non-layout character, therefore needs to be redefined as g_get/1 as follows. (The ISO character-set predicates used here are also part of the Quintus Prolog library called ctypes.)


%  g_get(-Char) returns the next graphic (non-control, non-blank) 
%  character on the input. If end_of_file then return -1. 

g_get(Char) :-
   repeat, get0(Char), (is_graph(Char) ; is_endfile(Char)),!.
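
The same skipping behaviour can be tried out without an input stream. The following is a minimal sketch over a list of character codes; graphic/1 and next_graphic/3 are hypothetical names, and graphic/1 is only an assumed approximation of is_graph/1 for the eight-bit range.

```prolog
% Hypothetical sketch, not part of the VHDL parser.
% graphic(+Code): printable and not a space, in the eight-bit range.
% Assumption: this approximates is_graph/1 from the ctypes library.
graphic(C) :- C > 32, C < 127.
graphic(C) :- C > 160, C =< 255.

% next_graphic(+Codes, -Code, -Rest): first graphic code in Codes and
% the codes after it. Code is -1 when the list runs out, mirroring
% the end-of-file convention of g_get/1.
next_graphic([], -1, []).
next_graphic([C|Cs], C, Cs) :- graphic(C), !.
next_graphic([_|Cs], C, Rest) :- next_graphic(Cs, C, Rest).
```

Blanks, tabs and line breaks are passed over, so next_graphic([32,9,10,0'a,0'b], C, Rest) yields the code of a and leaves the code of b in Rest.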

One of the unpleasant things about VHDL is that while keywords and (ordinary) identifiers are case-insensitive, all other strings must have their case preserved. This means that not only do BEGIN and begin have to be recognized as the same token, but so do Begin, BeGin, begiN, begIN, and every other mix of upper and lower case. So we introduce an additional predicate c_get0/1 that normalizes letters to their lower-case equivalents.


% c_get0(-Char) returns the next character on the input converting 
% an upper case letter to the corresponding lower case one. 

c_get0(CharLow) :-
  get0(Char), to_lower(Char,CharLow).
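
The effect of this normalization can be sketched on atoms rather than on the input stream. The names lower_code/2, fold_codes/2 and fold_case/2 are hypothetical, and the sketch assumes ASCII letters only, whereas the to_lower/2 used above also covers the eight-bit letters.

```prolog
% Hypothetical sketch, not part of the VHDL parser.
% lower_code(+Code, -Lower): fold an ASCII upper-case letter to lower
% case; every other code is returned unchanged.
lower_code(C, L) :- C >= 0'A, C =< 0'Z, !, L is C + 32.
lower_code(C, C).

% fold_codes(+Codes, -LowerCodes): apply lower_code/2 to each code.
fold_codes([], []).
fold_codes([C|Cs], [L|Ls]) :- lower_code(C, L), fold_codes(Cs, Ls).

% fold_case(+Atom, -Folded): Atom with its ASCII letters in lower case.
fold_case(Atom, Folded) :-
    atom_codes(Atom, Codes),
    fold_codes(Codes, Lower),
    atom_codes(Folded, Lower).
```

With this, fold_case('BeGin', X) and fold_case('BEGIN', Y) both produce the atom begin, which is what lets the grammar match a single spelling of each reserved word.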

2.2 Representing Characters

We will define the tokenizer from the bottom up, starting with the tests for the space characters and the format effectors.


blank(32).  /* Space */
blank(160). /* NBSP */

format_effector(9).   /* Horizontal Tab */
format_effector(X) :- non_tab_format_effector(X).  /* others */

non_tab_format_effector(10).  /* Line Feed */
non_tab_format_effector(11).  /* Vertical Tab */
non_tab_format_effector(12).  /* Form Feed */
non_tab_format_effector(13).  /* Carriage Return */

These characters are defined by the decimal values of their ISO (ASCII) codes. For all other characters, we will use the "zero-quote" notation, which allows us to type the characters as they normally appear. In Quintus Prolog, an integer followed immediately by a single quote directs the Prolog parser to interpret the digits that follow as a number in that base. Thus, 2'1010 is the decimal number ten, and 16'0f is the decimal number fifteen. As an extension to this syntax, a preceding integer of zero tells the system that the next object is a character and that we want the value of its ISO (ASCII) code. This would not have worked well above for the space and the format effector characters, since they are invisible in most text editors; for that reason we use the numeric codes 32, 9, 10, etc. For the other characters, we will use the more readable forms 0'a, 0'b, 0'*, etc.

If we are dealing not with a single character but with a list of characters, we can do better than [ 0'a, 0'b, 0'c, 0'd ] by typing "abcd". There is no special string datatype in Prolog, so this quoted string of characters is just another way of writing a list of ISO (ASCII) codes.
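
The notations can be checked directly. This is a minimal sketch with a hypothetical predicate notation_demo/0; it assumes a system that accepts both the zero-quote and the radix notation (SWI-Prolog does, but radix syntax is dialect-dependent). It uses atom_codes/2 rather than a double-quoted string, because some modern Prologs no longer read "abcd" as a list of codes by default.

```prolog
% Hypothetical sketch, not part of the VHDL parser.
% notation_demo: succeeds if the notations agree with the text.
notation_demo :-
    0'a =:= 97,              % zero-quote: code of the letter a
    0'* =:= 42,              % zero-quote: code of the asterisk
    2'1010 =:= 10,           % radix notation, base 2
    16'0f =:= 15,            % radix notation, base 16
    atom_codes(abcd, [0'a, 0'b, 0'c, 0'd]).
```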

2.3 How the Tokenizer Works

The token recognizer is a "look-ahead" predicate vhdl_get_token/3 that expects the current character as its first argument. In Prologs that use first-argument indexing, this results in a direct call to the appropriate clause, selected by the character value. In particular, vhdl_get_token(+LookAheadCh,-Token,-NextCh) is always invoked with LookAheadCh instantiated to a character that can start a token; it returns the token starting with LookAheadCh in Token and the character following the token in NextCh. To be precise, LookAheadCh can be any (non-blank) graphic character except the characters in {'#','$','?','@','^','_','`','{','}','~'}, which can never start a valid token, and the characters in {'-',''',';'}, which need special treatment as explained later (see the clauses for vhdl_get_token_line/3).

Each token or lexical element is either a delimiter, an operator, an identifier (which may be a reserved word), an abstract literal (number), a character literal, a string literal, a bit string literal, or a comment.
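
The look-ahead protocol can be sketched on a list of character codes. Everything here is hypothetical and heavily simplified (lower-case identifiers and two delimiters only); the extra list arguments stand in for the input stream that the real vhdl_get_token/3 reads from.

```prolog
% Hypothetical sketch, not part of the VHDL parser.
% la_token(+LookAhead, +Rest0, -Token, -Next, -Rest)
% LookAhead is the code that starts the token, Next is the code that
% follows the token (-1 at end of input), Rest is the input after Next.
la_token(0'(, Rest0, '(', Next, Rest) :- !, la_next(Rest0, Next, Rest).
la_token(0'), Rest0, ')', Next, Rest) :- !, la_next(Rest0, Next, Rest).
la_token(C, Rest0, identifier(Id), Next, Rest) :-
    la_letter(C),
    la_next(Rest0, C2, Rest1),
    la_id_codes(C2, Rest1, Cs, Next, Rest),
    atom_codes(Id, [C|Cs]).

% la_next(+Codes, -Code, -Rest): list analogue of get0/1.
la_next([], -1, []).
la_next([C|Cs], C, Cs).

la_letter(C) :- C >= 0'a, C =< 0'z.

% la_id_codes(+Code, +Rest0, -IdCodes, -Next, -Rest): collect letters
% and underscores; the first other code becomes the new look-ahead.
la_id_codes(C, Rest0, [C|Cs], Next, Rest) :-
    ( la_letter(C) ; C =:= 0'_ ), !,
    la_next(Rest0, C2, Rest1),
    la_id_codes(C2, Rest1, Cs, Next, Rest).
la_id_codes(C, Rest, [], C, Rest).
```

Calling atom_codes('abc(', [C|Cs]), la_token(C, Cs, T, Next, Rest) gives T = identifier(abc) with the code of the opening parenthesis in Next, so the caller can pass Next straight back in as the look-ahead for the following token.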


/* 
% To get the source line number where error occurred ADD  

line_number_message :- 
       seeing(File), current_input(Stream),
       stream_position(Stream,'$stream_position'(A,Line,B),
                              '$stream_position'(A,Line,B)),
       put(7),put(7),
       format("Error on line ~d of file ~a~n~n", [ Line, File]).

lex_error(_) :- line_number_message.
lex_error(_,_) :- line_number_message.
lex_error(_,_,_) :- line_number_message.
*/
lex_error(illegal_char,Ch) :- !,
    format(' LEX-ERROR: Invalid Token Starter. ~c (~d) Found.',[Ch,Ch]),
    nl, fail.

lex_error(illegal_char_after_tick,Ch) :- !,
    format(' LEX-ERROR: Invalid Char after Tick. ~c (~d) Found.',[Ch,Ch]),
    nl, fail.

lex_error(unmatched_tick,Ch) :- !,
    format(' LEX-ERROR: Unmatched Tick ('') around ~c (~d) ?.',[Ch,Ch]),
    nl, fail.

lex_error(unexpected_char_lit,Ch) :- !,
    format(' LEX-ERROR: Unexpected Char Lit. ''~c'' (~d) Found.',[Ch,Ch]),
    nl, fail.

lex_error(illegal_char_in_string,Ch) :- !,
    format(' LEX-ERROR: Invalid Char in String/Id. ~c (~d) Found.',[Ch,Ch]),
    nl, fail.

lex_error(line_break,Ch) :-  !,
    format(' LEX-ERROR: Line_Break in String. ASCII ~d found. ~n',[Ch]),
    write(' Probably Missing Matching  Delimiter? '), nl, fail.

lex_error(illegal_base,Ch) :- !,
    format(' LEX-ERROR: Invalid Base(b/o/x). ''~c'' (~d) Found.',[Ch,Ch]), 
    nl, fail.

lex_error(illegal_digit,Ch) :- !,
    format(' LEX-ERROR: Invalid Digit. ''~c'' (~d) Found.',[Ch,Ch]),
    nl, fail.

lex_error(missing_exponent,Ch) :- !,
    format(' LEX-ERROR: No Digit in Expo. '' ~c'' (~d) Found.', [Ch,Ch]), 
    nl, fail.

lex_error(illegal_digit,Digit,Base) :- !,
    format(' LEX-ERROR: Digit ~c > Base ~d',[Digit,Base]), nl, fail.

lex_warning(start_underscores) :- !,
    write(' LEX-WARNING: underscore ''_'' at start of exponent. '), nl.

lex_warning(empty_id) :- !,
    write(' LEX-WARNING: Empty Extended Identifier. '), nl.

lex_warning(underscores, Where) :- !,
    write(' LEX-WARNING: successive underscores ''_'' in '), write(Where),
    format('~c ~n',0'.).

lex_warning(nofraction,Ch) :- !,
    format(' LEX-WARNING: no digit in frac. ''~c'' (~d) Found.',[Ch,Ch]),
    nl.

The VHDL-93 (simple and compound) operators are handled by the next set of clauses.


vhdl_get_token(0'<,T,NC) :- !, get0(C2), vhdl_operator(0'<,C2,T,NC).
vhdl_get_token(0'=,T,NC) :- !, get0(C2), vhdl_operator(0'=,C2,T,NC).
vhdl_get_token(0'>,T,NC) :- !, get0(C2), vhdl_operator(0'>,C2,T,NC).
vhdl_get_token(0':,T,NC) :- !, get0(C2), vhdl_operator(0':,C2,T,NC).
vhdl_get_token(0'*,T,NC) :- !, get0(C2), vhdl_operator(0'*,C2,T,NC).
vhdl_get_token(0'/,T,NC) :- !, get0(C2), vhdl_operator(0'/,C2,T,NC).
vhdl_get_token(0'&,'&',NC) :- !, get0(NC).
vhdl_get_token(0'+,'+',NC) :- !, get0(NC).
vhdl_get_token(0'.,'.',NC) :- !, get0(NC).
vhdl_get_token(0'|,'|',NC) :- !, get0(NC).
vhdl_get_token(0'!,'|',NC) :- !, get0(NC).

Note that the minus sign '-' in VHDL-93 can begin either a comment or a (unary or binary) operator. We handle '-' separately because a comment is stripped off by the tokenizer, while an operator contributes a token to the parser.
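
The decision can be sketched on a list of character codes. The predicates minus_or_comment/3 and skip_line/2 are hypothetical; the sketch assumes that a comment ends at a line feed (code 10) and ignores the other format effectors.

```prolog
% Hypothetical sketch, not part of the VHDL parser.
% minus_or_comment(+CodesAfterMinus, -Result, -Rest)
% Called after a '-' has been seen. A second '-' starts a comment that
% runs to the end of the line; anything else makes the '-' an operator.
minus_or_comment([0'-|Cs], comment, Rest) :- !, skip_line(Cs, Rest).
minus_or_comment(Cs, token('-'), Cs).

% skip_line(+Codes, -Rest): drop codes up to and including a line feed.
skip_line([], []).
skip_line([10|Cs], Cs) :- !.
skip_line([_|Cs], Rest) :- skip_line(Cs, Rest).
```

In the comment case nothing is handed to the parser and scanning resumes after the line break; in the operator case the codes after the '-' are left untouched for the next token.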

Leading digits indicate that we are reading an abstract literal. Abstract literals come in two flavors: decimal literals (numbers in base 10) and based literals (numbers in base 2 to 16). Note that the sign and the magnitude of a number are recognized as two separate tokens. Irrespective of whether the abstract literal represents a real number or an integer, the token returned is number(Value). In particular, no check is made that the exponent of an integer literal is non-negative.


%vhdl_abs_lit(+Base_or_first_digit,+Digit_or_delimiter,-Token,-Next_Ch)

vhdl_get_token(0'0,T,C) :- !, get0(C2), vhdl_get_number(C2,R,[0'0|R],T,C).
vhdl_get_token(0'1,T,C) :- !, get0(C2), vhdl_abs_lit(1,C2,T,C).
vhdl_get_token(0'2,T,C) :- !, get0(C2), vhdl_abs_lit(2,C2,T,C).
vhdl_get_token(0'3,T,C) :- !, get0(C2), vhdl_abs_lit(3,C2,T,C).
vhdl_get_token(0'4,T,C) :- !, get0(C2), vhdl_abs_lit(4,C2,T,C).
vhdl_get_token(0'5,T,C) :- !, get0(C2), vhdl_abs_lit(5,C2,T,C).
vhdl_get_token(0'6,T,C) :- !, get0(C2), vhdl_abs_lit(6,C2,T,C).
vhdl_get_token(0'7,T,C) :- !, get0(C2), vhdl_abs_lit(7,C2,T,C).
vhdl_get_token(0'8,T,C) :- !, get0(C2), vhdl_abs_lit(8,C2,T,C).
vhdl_get_token(0'9,T,C) :- !, get0(C2), vhdl_abs_lit(9,C2,T,C).

The code for vhdl_get_number/5, vhdl_abs_lit/4 and the other predicates that recognize abstract literals (both decimal and based literals) is listed further below, after the string-reading predicates; it is complex but rather uninteresting. Note that the original VHDL-87 parser recognized only decimal literals, and did not handle based literals.
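
The core idea behind reading a decimal literal, collecting the digits while dropping the underscores and then converting the collected text, can be sketched as follows. The names decimal_value/2 and drop_underscores/2 are hypothetical, the sketch does no validation, and it relies on number_codes/2 accepting the cleaned-up text.

```prolog
% Hypothetical sketch, not part of the VHDL parser.
% decimal_value(+Atom, -Number): value of a decimal literal written as
% an atom, such as '1_000' or '2.5e1', ignoring underscores.
decimal_value(Atom, Number) :-
    atom_codes(Atom, Codes),
    drop_underscores(Codes, Clean),
    number_codes(Number, Clean).

% drop_underscores(+Codes, -Clean): remove every underscore code.
drop_underscores([], []).
drop_underscores([0'_|Cs], Ds) :- !, drop_underscores(Cs, Ds).
drop_underscores([C|Cs], [C|Ds]) :- drop_underscores(Cs, Ds).
```

For instance, decimal_value('1_000', N) binds N to the integer 1000, and decimal_value('2.5e1', X) binds X to a float equal to 25.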

And now, we recognize the remaining punctuation characters. These are the single characters which are not considered operators (though they could have been handled by that rule). Note that we do not need to consider ';', ''' and blank characters here, because the precondition on calls to vhdl_get_token/3 (guaranteed by the invoking context) excludes them.


vhdl_get_token(0',,',',C) :- !, g_get(C).
vhdl_get_token(0'(,'(',C) :- !, g_get(C).
vhdl_get_token(0'),')',C) :- !, g_get(C).
vhdl_get_token(0'[,'[',C) :- !, g_get(C).
vhdl_get_token(0'],']',C) :- !, g_get(C).

The string literals and the VHDL-93 extended identifiers are recognized as follows. (The original VHDL-87 parser did not recognize strings with embedded double quotes.)


vhdl_get_token(0'",string(Cs),NC) :-
        !, get0(C), read_to_next_double_quote(C,Cs,NC).

vhdl_get_token(0'\,identifier(T),NC) :-
        !, get0(C), read_to_next_backslash(C,Cs,NC),
        ( Cs = [_,_|_] -> true ; lex_warning(empty_id) ),
        name(T,[0'\|Cs]).

%Allow the reasonable string  %abc#def% (language design issue) 

vhdl_get_token(0'%,string(Cs),NC) :-
        !, get0(C), read_to_next_percent(C,Cs,NC).
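
The treatment of a doubled delimiter inside a string can be sketched on a list of character codes. The predicate string_body/3 is hypothetical and omits the checks for line breaks and non-printing characters that the real reader performs.

```prolog
% Hypothetical sketch, not part of the VHDL parser.
% string_body(+Codes, -Chars, -Rest)
% Codes start just after the opening double quote. A doubled quote
% stands for one quote character; a single quote ends the literal.
% Fails if the closing quote is missing.
string_body([0'", 0'"|Cs], [0'"|Chars], Rest) :- !,
    string_body(Cs, Chars, Rest).
string_body([0'"|Rest], [], Rest) :- !.
string_body([C|Cs], [C|Chars], Rest) :-
    string_body(Cs, Chars, Rest).
```

Given the codes of a""b"c (that is, the input after the opening quote), Chars becomes the codes of a"b and Rest holds the code of c.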

Finally, the remaining clause handles all tokens which begin with an alphabetic character (including the non-ASCII alphabetic ones). With first argument indexing, the call to this predicate comes directly here if the look-ahead character is not one handled above. Note the use of case conversion predicates to normalize the identifier names.


vhdl_get_token(C,T,NC) :-
    (is_alpha(C) -> 
        ( to_lower(C,CL), 
          c_get0(C2), 
          vhdl_get_token_aux(C2,CL,T,NC) )
      ;  lex_error(illegal_char,C) ).

vhdl_get_token_aux(0'",CL,T,NC) :-
               !, get_bit_string_lit_quotes(CL,T,NC).

vhdl_get_token_aux(0'%,CL,T,NC) :-
               !, get_bit_string_lit_percent(CL,T,NC).

vhdl_get_token_aux(C2,CL,T,NC) :-
                get_id_chars(C2,Cs,NC),
                name(Token,[CL|Cs]),
                T = identifier(Token).

All identifier tokens are temporarily tagged with identifier/1. This tag has been introduced for efficient branching and correct recognition of character literals and attributes as explained later. This tag is stripped off when the token is passed to the parser.


read_to_next_double_quote(0'",Cs,C) :- 
        !, get0(NC),
        (NC == 0'" -> 
                  get0(C2), 
                  read_to_next_double_quote(C2,CCs,C), 
                  Cs = [0'"|CCs]
               ;  Cs = [], C = NC).
 
read_to_next_double_quote(Ch,[Ch|Cs],C) :- 
        is_print(Ch) -> 
                      get0(NC),                         
                      read_to_next_double_quote(NC,Cs,C)
                   ;  (non_tab_format_effector(Ch) ->
                                  lex_error(line_break,Ch)
                               ;  lex_error(illegal_char,Ch)
                      ).

read_to_next_percent(0'%,Cs,C) :-
        !, get0(NC),
        (NC == 0'% -> 
                  get0(C2), 
		  read_to_next_percent(C2,CCs,C), 
		  Cs = [0'%|CCs]
               ;  Cs = [], C = NC).

read_to_next_percent(Ch,[Ch|Cs],C) :-
        is_print(Ch) -> 
                      get0(NC),                         
                      read_to_next_percent(NC,Cs,C)
                   ;  (non_tab_format_effector(Ch) ->
                                  lex_error(line_break,Ch)
                               ;  lex_error(illegal_char,Ch)
                      ).

read_to_next_backslash(0'\,Cs,C) :-
        !, get0(NC),
        (NC == 0'\ -> 
                  get0(C2), 
		  read_to_next_backslash(C2,CCs,C), 
		  Cs = [0'\|CCs]
               ;  Cs = [0'\], C = NC).

read_to_next_backslash(Ch,[Ch|Cs],C) :-
        is_print(Ch) -> 
                      get0(NC),                         
                      read_to_next_backslash(NC,Cs,C)
                   ;  (non_tab_format_effector(Ch) ->
                                  lex_error(line_break,Ch)
                               ;  lex_error(illegal_char_in_string,Ch)
                      ).
 
% VHDL-93 does not allow successive underscores; 
% allow them for efficiency (language design issue) 

vhdl_get_number(0'_,In,All,Token,C) :-
        !,   get0(NC), 
        ((NC == 0'_)-> lex_warning(underscores,number) ; true),
        vhdl_get_number(NC,In,All,Token,C).

% VHDL-93 requires  one digit after period; 
% can support zero digits for efficiency (language design issue) 

vhdl_get_number(0'.,[0'.|In],All,Token,C) :-
        !,   get0(NC),
        (is_digit(NC) -> true  ; lex_warning(nofraction,NC)),
         vhdl_get_fraction(NC,In,All,Token,C).

vhdl_get_number(0'e,[0'e|In],All,Token,C) :-
        !,   get0(NC),
        vhdl_get_exponent(NC,In,All,Token,C).

vhdl_get_number(0'E,[0'e|In],All,Token,C) :-
        !,   get0(NC),
        vhdl_get_exponent(NC,In,All,Token,C).

vhdl_get_number(D,[D|In],All,Token,C) :-
        is_digit(D),
        !,   get0(NC),
        vhdl_get_number(NC,In,All,Token,C).

vhdl_get_number(C,[],All,number(Num),C) :- name(Num,All).

vhdl_get_fraction(0'_,In,All,Token,C) :-
        !,    get0(NC),
        ((NC == 0'_)-> lex_warning(underscores,fraction); true),
        vhdl_get_fraction(NC,In,All,Token,C).

vhdl_get_fraction(0'e,[0'e|In],All,Token,C) :-
        !,    get0(NC),
        vhdl_get_exponent(NC,In,All,Token,C).

vhdl_get_fraction(0'E,[0'e|In],All,Token,C) :-
        !,    get0(NC),
        vhdl_get_exponent(NC,In,All,Token,C).

vhdl_get_fraction(D,[D|In],All,Token,C) :-
        is_digit(D),
        !,    get0(NC),
        vhdl_get_fraction(NC,In,All,Token,C).

vhdl_get_fraction(C,[],All,number(Num),C) :- 
        name(Num,All).

vhdl_get_exponent(0'_,In,All,Token,C) :-
        !, lex_warning(start_underscores),  get0(NC),
        vhdl_get_exponent(NC,In,All,Token,C).

vhdl_get_exponent(0'-,[0'-|In],All,Token,C) :-
        !,    get0(NC),
        vhdl_get_rest_exponent(NC,In,All,Token,C).

vhdl_get_exponent(0'+,[0'+|In],All,Token,C) :-
        !,    get0(NC),
        vhdl_get_rest_exponent(NC,In,All,Token,C).

vhdl_get_exponent(D,[D|In],All,Token,C) :-
        is_digit(D),
        !,    get0(NC),
        vhdl_get_rest_exponent(NC,In,All,Token,C).

vhdl_get_rest_exponent(0'_,In,All,Token,C) :-
        !,    get0(NC),
        ((NC == 0'_) -> lex_warning(underscores,exponent); true),
        vhdl_get_rest_exponent(NC,In,All,Token,C).

vhdl_get_rest_exponent(D,[D|In],All,Token,C) :-
        is_digit(D),
        !,    get0(NC),
        vhdl_get_rest_exponent(NC,In,All,Token,C).

vhdl_get_rest_exponent(C,[],All,number(Num),C) :- 
        name(Num,All).

% vhdl_get_based_lit(+Base,+Digit,+Value_In,-Token,-Next_Ch)

% Both '#' and ':' can delimit a based literal mantissa.
% Accept all possible pairings ("#...:", "#...#", ":...#", ":...:")
% to avoid needless complication (language design issue).

vhdl_abs_lit(2,C2,T,C) :- 
        !, ( ((C2 == 0'#) ; (C2 == 0':))  ->
                      ( get0(C3), vhdl_get_based_lit(2,C3,0,T,C) )
                      ; vhdl_get_number(C2,R,[0'2|R],T,C) ).

vhdl_abs_lit(3,C2,T,C) :- 
        !, ( ((C2 == 0'#) ; (C2 == 0':))  ->
                      ( get0(C3), vhdl_get_based_lit(3,C3,0,T,C) )
                      ; vhdl_get_number(C2,R,[0'3|R],T,C) ).

vhdl_abs_lit(4,C2,T,C) :- 
        !, ( ((C2 == 0'#) ; (C2 == 0':))  ->
                      ( get0(C3), vhdl_get_based_lit(4,C3,0,T,C) )
                      ; vhdl_get_number(C2,R,[0'4|R],T,C) ).

vhdl_abs_lit(5,C2,T,C) :- 
        !, ( ((C2 == 0'#) ; (C2 == 0':))  ->
                      ( get0(C3), vhdl_get_based_lit(5,C3,0,T,C) )
                      ; vhdl_get_number(C2,R,[0'5|R],T,C) ).

vhdl_abs_lit(6,C2,T,C) :- 
        !, ( ((C2 == 0'#) ; (C2 == 0':))  ->
                      ( get0(C3), vhdl_get_based_lit(6,C3,0,T,C) )
                      ; vhdl_get_number(C2,R,[0'6|R],T,C) ).

vhdl_abs_lit(7,C2,T,C) :- 
        !, ( ((C2 == 0'#) ; (C2 == 0':))  ->
                      ( get0(C3), vhdl_get_based_lit(7,C3,0,T,C) )
                      ; vhdl_get_number(C2,R,[0'7|R],T,C) ).

vhdl_abs_lit(8,C2,T,C) :- 
        !, ( ((C2 == 0'#) ; (C2 == 0':))  ->
                      ( get0(C3), vhdl_get_based_lit(8,C3,0,T,C) )
                      ; vhdl_get_number(C2,R,[0'8|R],T,C) ).

vhdl_abs_lit(9,C2,T,C) :- 
        !, ( ((C2 == 0'#) ; (C2 == 0':))  ->
                      ( get0(C3), vhdl_get_based_lit(9,C3,0,T,C) )
                      ; vhdl_get_number(C2,R,[0'9|R],T,C) ).

vhdl_abs_lit(1,0'0,T,C)  :-  
        !, get0(C3), (( (C3 == 0'#) ; (C3 == 0':) ) ->
                         get0(C4), vhdl_get_based_lit(10,C4,0,T,C)
                       ; vhdl_get_number(C3,R,[0'1,0'0|R],T,C) ).

vhdl_abs_lit(1,0'1,T,C)  :-  
        !, get0(C3), (( (C3 == 0'#) ; (C3 == 0':) ) -> 
                         get0(C4), vhdl_get_based_lit(11,C4,0,T,C)
                       ; vhdl_get_number(C3,R,[0'1,0'1|R],T,C) ).

vhdl_abs_lit(1,0'2,T,C)  :-  
        !, get0(C3), (( (C3 == 0'#) ; (C3 == 0':) ) -> 
                         get0(C4), vhdl_get_based_lit(12,C4,0,T,C)
                       ; vhdl_get_number(C3,R,[0'1,0'2|R],T,C) ).

vhdl_abs_lit(1,0'3,T,C)  :-  
        !, get0(C3), (( (C3 == 0'#) ; (C3 == 0':) ) -> 
                         get0(C4), vhdl_get_based_lit(13,C4,0,T,C)
                       ; vhdl_get_number(C3,R,[0'1,0'3|R],T,C) ).

vhdl_abs_lit(1,0'4,T,C)  :-  
        !, get0(C3), (( (C3 == 0'#) ; (C3 == 0':) ) -> 
                         get0(C4), vhdl_get_based_lit(14,C4,0,T,C)
                       ; vhdl_get_number(C3,R,[0'1,0'4|R],T,C) ).

vhdl_abs_lit(1,0'5,T,C)  :-  
        !, get0(C3), (( (C3 == 0'#) ; (C3 == 0':) ) -> 
                         get0(C4), vhdl_get_based_lit(15,C4,0,T,C)
                       ; vhdl_get_number(C3,R,[0'1,0'5|R],T,C) ).

vhdl_abs_lit(1,0'6,T,C)  :-  
        !, get0(C3), (( (C3 == 0'#) ; (C3 == 0':) ) -> 
                         get0(C4), vhdl_get_based_lit(16,C4,0,T,C)
                       ; vhdl_get_number(C3,R,[0'1,0'6|R],T,C) ).

vhdl_abs_lit(1,C2,T,C)  :-  
        !, vhdl_get_number(C2,R,[0'1|R],T,C).


vhdl_get_based_lit(Base,0'_,Vin,T,C) :- 
     !, get0(NC), 
     ((NC == 0'_)-> lex_warning(underscores,based_lit); true),
     vhdl_get_based_lit(Base,NC,Vin,T,C).

vhdl_get_based_lit(Base,0'#,Vin,T,C) :- 
      !, get0(NC), 
      ( ((NC == 0'e) ; (NC == 0'E))  -> 
      get0(NNC),vhdl_get_based_lit_exp(Base,NNC,Vin,T,C)
      ; T = number(Vin), C = NC ).

vhdl_get_based_lit(Base,0':,Vin,T,C) :- 
      !, get0(NC), 
      ( ((NC == 0'e) ; (NC == 0'E))  -> 
      get0(NNC),vhdl_get_based_lit_exp(Base,NNC,Vin,T,C)
      ; T = number(Vin), C = NC ).

vhdl_get_based_lit(Base,0'.,Vin,T,C) :- 
     !, get0(NC),vhdl_get_based_lit_frac(Base,NC,Base,Vin,0,T,C).

vhdl_get_based_lit(Base,D,Vin,T,C) :- 
     char_value(D,DVal), 
     ( DVal >= Base -> 
                      lex_error(illegal_digit,D,Base) 
                   ;  V is Vin*Base + DVal, get0(NC),
                      vhdl_get_based_lit(Base,NC,V,T,C) ).

vhdl_get_based_lit_exp(Base,0'+,Vin_Man,number(Atom),C) :-
     !, get0(NC), 
     ( is_digit(NC) ->
           ( vhdl_get_based_lit_rest_exp(Base,NC,0,Vout_Exp,C),
             Atom is Vin_Man * (Base ^ Vout_Exp) )
        ; lex_error(missing_exponent,NC) ).

vhdl_get_based_lit_exp(Base,0'-,Vin_Man,number(Atom),C) :-
     !, get0(NC), 
     ( is_digit(NC) -> 
           ( vhdl_get_based_lit_rest_exp(Base,NC,0,Vout_Exp,C),
             NVout_Exp is - Vout_Exp,
             Atom is Vin_Man * (Base ^ NVout_Exp) )
        ; lex_error(missing_exponent,NC) ).

vhdl_get_based_lit_exp(Base,D,Vin_Man,number(Atom),C) :-
     is_digit(D) -> 
           ( vhdl_get_based_lit_rest_exp(Base,D,0,Vout_Exp,C),
             Atom is Vin_Man * (Base ^ Vout_Exp) )
        ; lex_error(missing_exponent,D).

vhdl_get_based_lit_rest_exp(Base,0'_,Vin_Exp,Vout_Exp,C) :- 
     !, get0(NC),
     ((NC == 0'_)-> lex_warning(underscores,based_exp) ; true),
     vhdl_get_based_lit_rest_exp(Base,NC,Vin_Exp,Vout_Exp,C).

vhdl_get_based_lit_rest_exp(Base,D,Vin_Exp,Vout_Exp,C) :- 
    is_digit(D), !, V_Exp is Vin_Exp*10 + (D - 0'0),
    get0(NC), vhdl_get_based_lit_rest_exp(Base,NC,V_Exp,Vout_Exp,C).

vhdl_get_based_lit_rest_exp(_,C,Vin_Exp,Vin_Exp,C).

vhdl_get_based_lit_frac(Base,0'_,Weight,Vin_int,Vin_frac,T,C) :- 
     !, get0(NC),
     ((NC == 0'_)-> lex_warning(underscores,based_frac) ; true),
     vhdl_get_based_lit_frac(Base,NC,Weight,Vin_int,Vin_frac,T,C).

% The value of the literal is the integer part plus the fraction part,
% whether or not an exponent follows the closing delimiter.

vhdl_get_based_lit_frac(Base,0'#,_,Vin_int,Vin_frac,T,C) :- 
     !, get0(NC), 
     Vin_num is Vin_int + Vin_frac,
     ( ((NC == 0'e) ; (NC == 0'E))  -> 
           get0(NNC), vhdl_get_based_lit_exp(Base,NNC,Vin_num,T,C)
        ;  T = number(Vin_num), C = NC ).

vhdl_get_based_lit_frac(Base,0':,_,Vin_int,Vin_frac,T,C) :- 
     !, get0(NC), 
     Vin_num is Vin_int + Vin_frac,
     ( ((NC == 0'e) ; (NC == 0'E))  -> 
           get0(NNC), vhdl_get_based_lit_exp(Base,NNC,Vin_num,T,C)
        ;  T = number(Vin_num), C = NC ).

% Weight is Base^k for the k-th fraction digit, so it grows by a
% factor of Base at every step.

vhdl_get_based_lit_frac(Base,D,Weight,Vin_int,Vin_frac,T,C) :- 
     char_value(D,DVal), 
     ( DVal >= Base ->  lex_error(illegal_digit,D,Base)
            ;  V is Vin_frac + DVal / Weight, get0(NNC),
               New_Wt is Weight * Base,
               vhdl_get_based_lit_frac(Base,NNC,New_Wt,Vin_int,V,T,C)).

char_value(0'0,0) :- !.   char_value(0'1,1) :- !.   
char_value(0'2,2) :- !.   char_value(0'3,3) :- !.   
char_value(0'4,4) :- !.   char_value(0'5,5) :- !.
char_value(0'6,6) :- !.   char_value(0'7,7) :- !.   
char_value(0'8,8) :- !.   char_value(0'9,9) :- !.   
char_value(0'A,10) :- !.  char_value(0'B,11) :- !.
char_value(0'C,12) :- !.  char_value(0'D,13) :- !.  
char_value(0'E,14) :- !.  char_value(0'F,15) :- !.  
char_value(0'a,10) :- !.  char_value(0'b,11) :- !.
char_value(0'c,12) :- !.  char_value(0'd,13) :- !.  
char_value(0'e,14) :- !.  char_value(0'f,15) :- !.
char_value(D,_) :- lex_error(illegal_digit,D).
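
The arithmetic behind a based literal can be sketched separately from the character handling. The predicates based_value/4, int_part/4 and frac_part/5 are hypothetical; they take digit values (as char_value/2 would produce them) rather than characters, and the sketch assumes that dividing two integers with / yields a float when the result is not whole, as SWI-Prolog does by default.

```prolog
% Hypothetical sketch, not part of the VHDL parser.
% based_value(+Base, +IntDigits, +FracDigits, -Value)
% Digits are lists of digit values (0..15), most significant first.
% Each integer digit multiplies the running value by Base; the k-th
% fraction digit is divided by Base^k.
based_value(Base, IntDigits, FracDigits, Value) :-
    int_part(IntDigits, Base, 0, Int),
    frac_part(FracDigits, Base, Base, 0, Frac),
    Value is Int + Frac.

% int_part(+Digits, +Base, +Acc, -Value): fails on a digit >= Base.
int_part([], _, V, V).
int_part([D|Ds], Base, V0, V) :-
    D < Base,
    V1 is V0 * Base + D,
    int_part(Ds, Base, V1, V).

% frac_part(+Digits, +Base, +Weight, +Acc, -Value): Weight starts at
% Base and is multiplied by Base after every digit.
frac_part([], _, _, F, F).
frac_part([D|Ds], Base, Weight, F0, F) :-
    D < Base,
    F1 is F0 + D / Weight,
    Weight1 is Weight * Base,
    frac_part(Ds, Base, Weight1, F1, F).
```

As a worked example, 16#FF# corresponds to based_value(16, [15,15], [], V), giving 15*16 + 15 = 255, and 2#1.11# corresponds to based_value(2, [1], [1,1], V), giving 1 + 1/2 + 1/4 = 1.75.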

Now, we define the get_id_chars/3 predicate to read an identifier after the first character has been read. Note that both reserved words and ordinary identifiers are treated similarly.


% is_csym/1 is true of Ch if Ch is a letter, digit or underscore.
% successive underscores permitted (language design issue) 

get_id_chars(C,Cs,NC) :- 
      is_csym(C) -> 
           ( c_get0(C2), get_id_chars(C2,CCs,NC), Cs = [C|CCs] )
        ;  Cs = [], NC = C.

The bit string literals are recognized as follows.


get_bit_string_lit_quotes(Base,bit_string(Val),NC) :-
           get0(C), read_to_next_double_quote(C,BinStr,NC),
           vhdl_scan_constant(BinStr,Base,Val).

get_bit_string_lit_percent(Base,bit_string(Val),NC) :-
           get0(C), read_to_next_percent(C,BinStr,NC),
           vhdl_scan_constant(BinStr,Base,Val).

vhdl_scan_constant([],_,[]).
vhdl_scan_constant([C|Cs],Base,RV) :-
            char_value(Base,C,StrV),
            vhdl_scan_constant(Cs,Base,NV),
            append(StrV,NV,RV).

char_value(0'b,C,StrV) :- binary_value(C,StrV).
char_value(0'o,C,StrV) :- octal_value(C,StrV).
char_value(0'x,C,StrV) :- hex_value(C,StrV).

binary_value(0'0,"0") :- !.   binary_value(0'1,"1") :- !.   
binary_value(C,_) :-     lex_error(illegal_digit,C,2).

octal_value(0'0,"000") :- !.   octal_value(0'1,"001") :- !.   
octal_value(0'2,"010") :- !.   octal_value(0'3,"011") :- !.   
octal_value(0'4,"100") :- !.   octal_value(0'5,"101") :- !.
octal_value(0'6,"110") :- !.   octal_value(0'7,"111") :- !.   
octal_value(C,_) :-     lex_error(illegal_digit,C,8).

hex_value(0'0,"0000") :- !.   hex_value(0'1,"0001") :- !.   
hex_value(0'2,"0010") :- !.   hex_value(0'3,"0011") :- !.   
hex_value(0'4,"0100") :- !.   hex_value(0'5,"0101") :- !.
hex_value(0'6,"0110") :- !.   hex_value(0'7,"0111") :- !.   
hex_value(0'8,"1000") :- !.   hex_value(0'9,"1001") :- !.   
hex_value(0'A,"1010") :- !.  hex_value(0'B,"1011") :- !.
hex_value(0'C,"1100") :- !.  hex_value(0'D,"1101") :- !.  
hex_value(0'E,"1110") :- !.  hex_value(0'F,"1111") :- !.  
hex_value(0'a,"1010") :- !.  hex_value(0'b,"1011") :- !.
hex_value(0'c,"1100") :- !.  hex_value(0'd,"1101") :- !.  
hex_value(0'e,"1110") :- !.  hex_value(0'f,"1111") :- !.  
hex_value(C,_) :-     lex_error(illegal_digit,C,16).
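
The tables above can also be viewed arithmetically: each digit of a bit string literal expands to a fixed number of bits. The following is a minimal sketch with hypothetical predicates digit_bits/3, expand_codes/3 and expand_bits/3; it takes digit values instead of characters and returns an atom so that the result is easy to read.

```prolog
:- use_module(library(lists)).   % append/3

% Hypothetical sketch, not part of the VHDL parser.
% digit_bits(+Width, +Value, -Bits): Value as Width binary digit codes,
% most significant bit first.
digit_bits(0, _, []) :- !.
digit_bits(W, V, [B|Bs]) :-
    W1 is W - 1,
    B is 0'0 + ((V >> W1) /\ 1),
    digit_bits(W1, V, Bs).

% expand_codes(+DigitValues, +Width, -Codes): concatenate the bits of
% every digit.
expand_codes([], _, []).
expand_codes([V|Vs], Width, Codes) :-
    digit_bits(Width, V, Bits),
    expand_codes(Vs, Width, Rest),
    append(Bits, Rest, Codes).

% expand_bits(+BitsPerDigit, +DigitValues, -Atom)
% BitsPerDigit is 1 for B, 3 for O and 4 for X bit strings.
expand_bits(Width, Values, Atom) :-
    expand_codes(Values, Width, Codes),
    atom_codes(Atom, Codes).
```

For the hexadecimal literal X"A5" the digit values are 10 and 5, and expand_bits(4, [10,5], A) binds A to '10100101'; for the octal literal O"70", expand_bits(3, [7,0], A) binds A to '111000'.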

The predicate called when the comment characters (--) are recognized consumes the rest of the current line. If this parser is used as part of a translation system, this must be changed to preserve comments so that they can be reproduced in the output description.


vhdl_consume(C) :-  non_tab_format_effector(C), !. % end-of-line
vhdl_consume(_) :-  get0(C), vhdl_consume(C).

error(MSG) :- write('error: '), write(MSG), write(statement),nl,fail.

The following list of reserved words of VHDL 1076.1 is used to generate a set of facts representing tables vhdl_reserved/3 , vhdl_keyword/1 and vhdl_token/1 (see appendix).


'Reserved Words'(vhdl,
       [abs, access, across, after, alias, all, and,
        architecture, array, assert, attribute,
        begin, block, body,
        break, 
        buffer, bus,
        case, component, configuration, constant,
        disconnect, downto,
        else, elsif, end, entity, exit,
        file, for, function,
        generate, generic, group, guarded,
        if, impure, in, inertial, inout, is,
        label, library, linkage, literal, loop,
        map, mod,
        nand, 
        nature, 
        new, next, noise, nor, not, null,
        of, on, open, or, others, out,
        package, port, postponed, 
        procedural, 
        procedure, process, pure,
        quantity,
        range, record, register, rem, report, 
        return, rol, ror,
        select, severity, signal,
        shared, sla, sll, 
        spectrum, 
        sra, srl, 
        subnature, 
        subtype,
        terminal, 
        then, 
        through, 
        to, 
        tolerance, 
        transport, type,
        unaffected, units, until, use,
        variable,
        wait, when, while, with,
        xnor, xor,
        '+', '-', '**', '*', 
        '&', '|', '!',
        '/', '/=', ':', ':=', '=',
        '<=', '<', '>=', '=>', '>','<>',
        '(',  ')', ';', ',','[',']',
        token(number), token(string), token(bit_string), token(char)]).

This includes four special token terms which are created by the tokenizer and must be treated as language tokens by the grammar.
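
The actual table generator is given in the appendix and is not reproduced here. As a rough illustration of the idea only, the sketch below walks such a list and asserts one fact per entry. The names gen_tables/1, classify/1, demo_keyword/1 and demo_token/1 are hypothetical, and the classification rule (an entry starting with a lowercase letter is a keyword, everything else is a token) is an assumption.

```prolog
:- dynamic demo_keyword/1, demo_token/1.

% gen_tables(+Words): rebuild the two demo tables from a word list.
gen_tables(Words) :-
    retractall(demo_keyword(_)),
    retractall(demo_token(_)),
    forall(member(W, Words), classify(W)).

% Special token terms created by the tokenizer, e.g. token(number).
classify(token(T)) :- !,
    assertz(demo_token(T)).
% Assumption: reserved words are atoms that start with a lowercase letter.
classify(W) :-
    atom_codes(W, [C|_]), C >= 0'a, C =< 0'z, !,
    assertz(demo_keyword(W)).
% Everything else is an operator or a punctuation mark.
classify(W) :-
    assertz(demo_token(W)).
```

After calling gen_tables([abs, '**', token(number), xor]), the goals demo_keyword(abs) and demo_token('**') both succeed.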

Finally, here are the rules for single/double character operators recognized by the tokenizer.


vhdl_operator(0'*,0'*,'**',NC) :- !, get0(NC).
vhdl_operator(0'*,  C, '*', C) :- !.
vhdl_operator(0'/,0'=,'/=',NC) :- !, get0(NC).
vhdl_operator(0'/,  C, '/', C) :- !.
vhdl_operator(0':,0'=,':=',NC) :- !, get0(NC).
vhdl_operator(0':,  C, ':', C) :- !.
vhdl_operator(0'<,0'=,'<=',NC) :- !, get0(NC).
vhdl_operator(0'<,0'>,'<>',NC) :- !, get0(NC).
vhdl_operator(0'<,  C,  '<',C) :- !.
vhdl_operator(0'>,0'=,'>=',NC) :- !, get0(NC).
vhdl_operator(0'>,  C,  '>',C) :- !.
vhdl_operator(0'=,0'>,'=>',NC) :- !, get0(NC).
vhdl_operator(0'=,0'<,'=<',NC) :- !, get0(NC).
vhdl_operator(0'=,0'=,'==',NC) :- !, get0(NC).
vhdl_operator(0'=,  C, '=', C).
% clauses for +, -, &, and | not needed
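
The longest-match behaviour of these clauses can be tried without an input stream. The sketch below is a stand-alone variant over a list of character codes, not the tokenizer's own code: scan_op/3, two_char_op/3 and one_char_op/2 are hypothetical names, and only a subset of the operators above is covered.

```prolog
% scan_op(+Codes, -Op, -Rest): read one operator, preferring the
% two-character form whenever the next two codes form one.
scan_op([C1,C2|Rest], Op, Rest) :-
    two_char_op(C1, C2, Op), !.
scan_op([C|Rest], Op, Rest) :-
    one_char_op(C, Op).

two_char_op(0'*, 0'*, '**').
two_char_op(0'/, 0'=, '/=').
two_char_op(0':, 0'=, ':=').
two_char_op(0'<, 0'=, '<=').
two_char_op(0'<, 0'>, '<>').
two_char_op(0'>, 0'=, '>=').
two_char_op(0'=, 0'>, '=>').

% Single-character fallback for the same leading characters.
one_char_op(C, Op) :-
    atom_codes('*/:<>=', Singles),
    memberchk(C, Singles),
    atom_codes(Op, [C]).
```

For the input codes of ':= 1' this yields the operator ':=', while for '<b' it yields '<' and leaves b in the remaining input. This mirrors the way vhdl_operator/4 hands back the unused look-ahead character.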

2.4 Reading Lines of Tokens

Finally, we come full circle to the predicate mentioned in the first section, the token line reader. The vhdl_get_token_line/1 predicate reads tokens up to a semicolon or end-of-file. It calls vhdl_get_token_line/3 with a blank look-ahead character. The mode argument, which is required for the appropriate handling of ticks, is explained later.


vhdl_get_token_line(Line) :-
        vhdl_get_token_line(0' ,charlit_mode,Line).

The vhdl_get_token_line/3 predicate does the actual work by indexing on the look-ahead character, which is either a semicolon, a blank, a dash/minus, a tick, the end-of-file indicator, or a normal character. Basically, vhdl_get_token_line/3 skips all the input characters that do not contribute to a token (such as blanks and comments), and then invokes vhdl_get_token/3 to read a token. The dash/minus and the tick need special treatment. The characters {'#','$','.','?','@','^','_','`','{','}','~'} can never start a valid VHDL-93 token, and so cause an error.

vhdl_get_token_line(0'#,_,[eRrOr]) :- !, lex_error(illegal_char,0'#).
vhdl_get_token_line(0'$,_,[eRrOr]) :- !, lex_error(illegal_char,0'$).
vhdl_get_token_line(0'?,_,[eRrOr]) :- !, lex_error(illegal_char,0'?).
vhdl_get_token_line(0'@,_,[eRrOr]) :- !, lex_error(illegal_char,0'@).
vhdl_get_token_line(0'^,_,[eRrOr]) :- !, lex_error(illegal_char,0'^).
vhdl_get_token_line(0'_,_,[eRrOr]) :- !, lex_error(illegal_char,0'_).
vhdl_get_token_line(0'`,_,[eRrOr]) :- !, lex_error(illegal_char,0'`).
vhdl_get_token_line(0'{,_,[eRrOr]) :- !, lex_error(illegal_char,0'{).
vhdl_get_token_line(0'},_,[eRrOr]) :- !, lex_error(illegal_char,0'}).
vhdl_get_token_line(0'~,_,[eRrOr]) :- !, lex_error(illegal_char,0'~).

vhdl_get_token_line(-1,_,[]) :- !.  % end-of-file
vhdl_get_token_line(0';,_,[]) :- !.  % end of VHDL statement

vhdl_get_token_line(0'-,Mode,Tokens) :-
        !, get0(NC),
        ( (NC == 0'-)
            -> (get0(Ch), vhdl_consume(Ch), get0(Next),
                vhdl_get_token_line(Next,Mode,Tokens) )
            ;  (vhdl_get_token_line(NC,charlit_mode,Ts), 
                Tokens = ['-'|Ts] )
        ).

This clause handles comments correctly. The original VHDL-87 parser would give an error if the program file ended with a comment because it did not treat this case separately.

The handling of ticks in Ada-based languages is relatively complex. This is because ticks can occur in several different contexts that can "interact" in complicated ways, as explained below. A single tick can be used as a separator in a qualified expression (such as typename'(value)) and in attribute names (such as prefix'attribute). A pair of ticks can be used as delimiters in character literals. Given these facts, the lexical analysis of constructions such as character'('a'), A'B'C, T'range of range'a'to'b', and "or"'A'B (borrowed from internet postings of Joerg Lohse and Jacques Rouillard) becomes rather involved. However, the following disambiguation rule (adapted from Gary Beihl's posting on the internet) can be used to guide the tokenizer:

A tick begins a character literal if and only if (1) the previous token is none of the following: a non-keyword identifier, an attribute name, a close paren, or a string literal; and (2) there is a matching tick to terminate the character literal exactly two characters ahead in the input stream. In all other cases, the tick either flags a potential attribute designator or is a token by itself.

To implement this rule, we need to know the previous token type and have the ability to peek one character ahead (in addition to the one lookahead we already have in the tokenizer). The information about the previous token is captured by the mode argument. In particular, vhdl_get_token_line/3 operates in one of two modes, charlit_mode or attr_mode, depending on whether or not it should expect a character literal after a tick. In summary, with the additional peeking capability, it is possible to make the tokenizer (lexical analyzer) an independent module of the parser.
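
The disambiguation rule can be stated as a small stand-alone predicate. This is a sketch for illustration, not the tokenizer's code: tick_starts_char_literal/2, tick_code/1 and expects_attribute/1 are hypothetical names, the previous token is reduced to a hypothetical kind atom, and the input after the tick is assumed to be available as a list of character codes.

```prolog
% tick_starts_char_literal(+PrevKind, +CodesAfterTick)
% Succeeds when the tick just read begins a character literal:
% the previous token must not call for an attribute or qualified
% expression, and the closing tick must be the second code ahead.
tick_starts_char_literal(PrevKind, [_Char, Tick|_]) :-
    tick_code(Tick),
    \+ expects_attribute(PrevKind).

tick_code(39).                   % character code of the tick

% Previous-token kinds after which a tick is never a literal delimiter.
expects_attribute(identifier).   % non-keyword identifier
expects_attribute(attribute).
expects_attribute(close_paren).
expects_attribute(string).
```

In character'('a') the first tick follows an identifier, so the predicate fails and the tick is a token by itself. The second tick follows an open paren and is matched two characters ahead, so it begins the character literal 'a'.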



% Handles quirky VHDL-93 examples involving ticks. uses peek_byte/1.
% in charlit_mode, it recognizes a character literal token.
% in attr_mode, it recognizes attribute and tick tokens.

vhdl_get_token_line(0'',charlit_mode,Tokens) :-
           !, get0(C1), 
           (is_print(C1) -> 
                    peek_byte(C2),
                    (C2 == 0'' -> 
                          ( get0(C2), get0(NC), 
                            vhdl_get_token_line(NC,charlit_mode,Ts), 
                            Tokens  = [char(C1)|Ts] )
                       ;  
                          lex_error(unmatched_tick,C1) )
                ;
                    lex_error(illegal_char_after_tick,C1) ).

vhdl_get_token_line(0'',attr_mode,Tokens) :-
                  !, get0(C), 
                  vhdl_get_token_line_after_tick(C,Tokens).

vhdl_get_token_line(Blank,Mode,Tokens) :-
        is_space(Blank),
        !,  g_get(C),
        vhdl_get_token_line(C,Mode,Tokens).

vhdl_get_token_line(C,_,[Token|Ts]) :-
        vhdl_get_token(C,T,Next), 
        strip_id_tag(T,Token),
        (special_tick_token(T) -> 
                        vhdl_get_token_line(Next,attr_mode,Ts)
                    ;   vhdl_get_token_line(Next,charlit_mode,Ts) ).

strip_id_tag(identifier(Name),Name) :- !.
strip_id_tag(T,T).

special_tick_token(identifier(Name)) :- \+ vhdl_keyword(Name).
special_tick_token(')').
special_tick_token(string(_)).
special_tick_token(attr(_)).

vhdl_get_token_line_after_tick(0'(,Tokens) :-
             !, (get0(C2),
                   (C2 == 0'' -> 
                        (get0(C3), peek_byte(C4),
                           (C4 == 0'' -> 
                                get0(C4), get0(NC),
                                vhdl_get_token_line(NC,charlit_mode,Ts),
                                Tokens = ['''','(',char(C3)|Ts]
                              ;  
                                % vhdl_get_token_line(C3,charlit_mode,Ts),
                                % Tokens = [char(0'()|Ts]
                                lex_error(unexpected_char_lit,0'()
                           )
                        )
                     ; 
                         vhdl_get_token_line(C2,charlit_mode,Ts), 
                         Tokens = ['''','('|Ts]
                    )
                ).

vhdl_get_token_line_after_tick(0'-,Tokens) :-
        !, get0(NC),
        ( (NC == 0'-)
            -> ( get0(Ch), vhdl_consume(Ch), get0(Next),
	         vhdl_get_token_line_after_tick(Next,Tokens)
               )
            ;  (NC = 0''  ->
	               %  vhdl_get_token_line(NC,charlit_mode,Ts), 
                       %  Tokens = [char('_')|Ts]
                       lex_error(unexpected_char_lit,0'-)
                    ;  lex_error(illegal_char_after_tick,NC)
               )
        ).

vhdl_get_token_line_after_tick(C1,Tokens) :-
        (is_alpha(C1) ->
               to_lower(C1,CL), 
               get_id_chars(CL,Cs,NC), name(Token,Cs),
               vhdl_get_token_line(NC,attr_mode,Ts),
               Tokens = [attr(Token)|Ts]
             ;
              (is_space(C1) -> 
                       g_get(C),
                       vhdl_get_token_line_after_tick(C,Tokens)
                    ;  
                       (peek_byte(C2),
                          (C2 == 0'' ->
                               % (is_print(C1) -> 
                               % get0(C2), get0(C3),
                               % vhdl_get_token_line(C3,charlit_mode,Ts),
                               % Tokens = [char(C1)|Ts]
                               % ;
                               % lex_error(illegal_char_after_tick,C1)
                               % )
                               lex_error(unexpected_char_lit,C1)
                            ;
                               vhdl_get_token_line(C1,charlit_mode,Ts),
                               Tokens = [''''|Ts]
                          )
                       )
             )
      ).

2.5 Summary of Changes

We now highlight the revisions made to the original VHDL-87 tokenizer, written in Quintus Prolog, to conform to the VHDL 1076.1 standard.

2.6 Lexical Overhead

We now discuss certain elements of the lexical structure of VHDL-93 that seem to complicate the implementation without really contributing much to programmer convenience.

Identifiers and Abstract literals

In the syntax of identifiers and abstract literals (numbers), successive underscores are not permitted. The implementation can be simplified by permitting sequences of underscores without sacrificing compatibility with VHDL-87. Similarly, not allowing an underscore to start the fractional part or the exponent part of a number seems unnecessary. If the fractional part is zero, a zero need not be required to follow the period. Banning an empty extended identifier does not seem justified.
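
To make the cost of the strict rule concrete, here is a sketch of a checker for basic identifiers that rejects successive and trailing underscores. It is not taken from the tokenizer: basic_identifier/1 and its helpers are hypothetical names, and only ASCII letters are considered (an assumption, since VHDL-93 also admits other ISO 8859-1 letters).

```prolog
% basic_identifier(+Codes): a letter followed by letters or digits,
% where each underscore must be followed by a letter or digit.
basic_identifier([C|Cs]) :-
    letter(C),
    id_rest(Cs).

id_rest([]).
id_rest([0'_, C|Cs]) :- !,       % an underscore needs a successor
    letter_or_digit(C),
    id_rest(Cs).
id_rest([C|Cs]) :-
    letter_or_digit(C),
    id_rest(Cs).

letter(C) :- ( C >= 0'a, C =< 0'z ; C >= 0'A, C =< 0'Z ), !.
letter_or_digit(C) :- ( letter(C) ; C >= 0'0, C =< 0'9 ), !.
```

The relaxed rule suggested above would amount to dropping the second id_rest/1 clause and treating the underscore like any other identifier character.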

Replacement Characters

In the syntax of based literals, the two matching double-quote delimiters can be replaced by two matching percent characters. The implementation can be simplified by allowing a double quote to be matched with a percent. However, this simplification will not work for strings.

In the syntax of strings, VHDL-93 does not treat the double-quote delimiter and the percent delimiter alike. In particular, the "reasonable" string %...#...% is illegal.

The Tick Problem

The use of the tick as a separator in attribute names, as a token in qualified expressions, and as a delimiter in character literals permits us to create VHDL-93 constructs that are "difficult" to tokenize. Replacing the tick with another character in attribute names and qualified expressions could have simplified matters. However, such a change would break backward compatibility.

Porting to SWI-Prolog

The original parser required the following Quintus Prolog libraries: library(basics), library(files), library(strings), library(math), and library(ctypes). In particular, it imported append/3 and member/2 from library(basics), file_exists/1 from library(files), pow/3 from library(math), concat_atom/2 from library(strings), and a host of character classification predicates from library(ctypes). Furthermore, we require the built-in peek_char/1 to tokenize the tick character in the VHDL input correctly.

SWI-Prolog supports the append/3, member/2 and concat_atom/2 predicates as built-ins. The file_exists/1 predicate of Quintus Prolog is the same as the SWI-Prolog built-in exists_file/1, while the pow/3 and atom_chars/2 predicates can be obtained using the built-in exponentiation operator "^" and name/2 respectively. SWI-Prolog also supports library(ctypes). Furthermore, the user need not know the precise "location" of these predicates because SWI-Prolog supports automatic loading of library predicates. There are a few Quintus Prolog predicates such as statistics/2, format/2, stream_position/2 etc. that can be made to work with minor modifications. The Quintus Prolog built-in peek_char/1 is missing but can be defined in SWI-Prolog as follows:


peek_char(Ch) :-   current_input(Stream),
                   stream_position(Stream,Old,Old), get0(Ch),
                   stream_position(Stream,_,Old).

However, SWI-Prolog version 2.5 and above provides peek_byte/1.
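
The substitutions described in this section can be collected into a few compatibility clauses. The sketch below assumes a recent SWI-Prolog (version 7 or later, for open_string/2) and uses the ISO predicate peek_code/2 for the one-character peek on a text stream. peek_then_read/3 is a hypothetical helper that only demonstrates that peeking does not consume input.

```prolog
% Quintus-style names defined on top of SWI-Prolog built-ins.
file_exists(File) :- exists_file(File).
pow(X, Y, Z) :- Z is X ^ Y.

% peek_then_read(+Text, -Peeked, -Read)
% Peeks at the first character code of Text, then reads it; both
% results are the same code because peeking leaves the stream untouched.
peek_then_read(Text, Peeked, Read) :-
    open_string(Text, In),       % assumption: SWI-Prolog 7 or later
    peek_code(In, Peeked),
    get_code(In, Read),
    close(In).
```

With these clauses loaded, pow(2.0, 3.0, Z) binds Z to 8.0, and peek_then_read(ab, P, R) binds both P and R to the code of the character a.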