• Welcome to PowerBasic Museum 2020-A.
 

Features of the x86 Processors

Started by Donald Darden, May 21, 2007, 08:44:45 PM

Donald Darden

Right now I am thinking more of "limitations of the x86 Processors", but this could
be a good place to discuss specific features as well.

In working on an algorithm for an enhanced INSTR() function, one that would
support case-insensitive matching, I found it very hard to make the test really
efficient, because it always requires several steps and each step takes a finite
amount of time. The results are also strongly influenced by the data to which it
is applied.

This is because the basic architecture of the x86 family does not have any
instructions optimized for case-insensitive operations, though it does have some
instructions specifically designed to work with strings. If I wrote a sub to assist
in the testing, it would likely have a low limit (say the value of "A") in the lower
byte of the AX register, and a high limit (the value of "Z") in the upper byte of
the AX register. What I would want to know is whether some unknown byte lies
within that range, which would make it a capital letter. If it does, I could then
force it to lower case with an OR 32 instruction before attempting to match it
to a character set that was already set to lower case.
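
The range test and OR 32 step described above can be sketched in plain PowerBASIC, without inline assembler. This is only a minimal sketch under stated assumptions: the names FoldAscii and InstrNoCase are hypothetical, the search string is expected to be lower case already, and only the ASCII letters A to Z are folded.

```powerbasic
' Hypothetical helper: fold one ASCII byte value to lower case.
FUNCTION FoldAscii (BYVAL c AS LONG) AS LONG
   ' "A".."Z" are 65..90; OR 32 sets bit 5, which gives "a".."z".
   IF c >= 65 AND c <= 90 THEN c = c OR 32
   FUNCTION = c
END FUNCTION

' Hypothetical helper: case-insensitive search for ASCII text.
' sFind must already be lower case. Returns the 1-based position, or 0.
FUNCTION InstrNoCase (sMain AS STRING, sFind AS STRING) AS LONG
   LOCAL i AS LONG, j AS LONG
   LOCAL nMain AS LONG, nFind AS LONG

   nMain = LEN(sMain)
   nFind = LEN(sFind)
   IF nFind = 0 OR nFind > nMain THEN EXIT FUNCTION   ' result stays 0

   FOR i = 1 TO nMain - nFind + 1
      FOR j = 1 TO nFind
         ' Fold only the text byte; the search byte is lower case already.
         IF FoldAscii(ASC(sMain, i + j - 1)) <> ASC(sFind, j) THEN EXIT FOR
      NEXT
      IF j > nFind THEN          ' inner loop ran to completion: match
         FUNCTION = i
         EXIT FUNCTION
      END IF
   NEXT
END FUNCTION
```

Under these assumptions, InstrNoCase("Hello World", "world") would return 7. Every byte of the main string still costs two compares and a conditional OR, which is exactly the per-step overhead discussed here.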

Now the problem with case-insensitive matching, and with string operations in
general, is that they presume the standard A to Z alphabet and a limited range
of additional symbols, and do not support Unicode. The fact is, Unicode is still
evolving, so who knows where it will eventually lead.

I don't see any way to address the unknown future of Unicode, but range fixing
within a character set would be a very useful tool if it were a part of the
instruction repertoire of the x86. To emulate this functionality in individual
steps would be laborious and time-consuming.
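
One common way to reduce the per-byte work is a 256-entry translation table that is indexed by each byte, so the range test is paid for when the table is built rather than on every character (the x86 XLAT instruction performs this kind of table lookup). What follows is a minimal sketch, not a tuned implementation: the name FoldWithTable is hypothetical and the table covers ASCII letters only.

```powerbasic
' Hypothetical helper: return a lower-cased copy of s using a lookup table.
FUNCTION FoldWithTable (BYVAL s AS STRING) AS STRING
   LOCAL i AS LONG
   DIM fold(255) AS BYTE          ' one entry per possible byte value

   ' Identity mapping first, then redirect "A".."Z" to "a".."z".
   ' Real code would build this table once and reuse it.
   FOR i = 0 TO 255
      fold(i) = i
   NEXT
   FOR i = 65 TO 90
      fold(i) = i OR 32
   NEXT

   ' Each byte now costs one table lookup instead of two compares.
   FOR i = 1 TO LEN(s)
      MID$(s, i, 1) = CHR$(fold(ASC(s, i)))
   NEXT
   FUNCTION = s
END FUNCTION
```

A table like this could also be filled with mappings for accented characters of a particular code page, which a fixed A to Z range test cannot express.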

José Roca

 
A solution for a powerful INSTR, with Unicode support, is the use of the Microsoft VBScript regular expressions engine, available to PB through COM.

It lets you set the search pattern, the scope and an ignore-case flag, and returns a collection of matches.

Each Match object represents an occurrence of the pattern, and exposes properties such as Value (the occurrence found), Length (number of characters in the occurrence), and FirstIndex (the zero-based position of the occurrence in the source string).

Donald Darden

That's likely welcome news, José.  Perhaps you have a working example of how to
do this, so that people can benefit from your knowledge and experience?

José Roca

 
Here is a quick and dirty one. Better performance will be obtained using direct interface calls instead of Automation and a collection's enumerator instead of the Item property. But it illustrates the idea:


' SED_PBWIN
#COMPILE EXE
#DIM ALL
#INCLUDE "WIN32API.INC"

$PROGID_VBScriptRegExp = "VBScript.RegExp"

INTERFACE DISPATCH VBScriptRegExp
   MEMBER GET  Pattern<&H00002711>() AS STRING
   MEMBER LET  Pattern<&H00002711>()   ' Parameter Type AS STRING
   MEMBER GET  IgnoreCase<&H00002712>() AS INTEGER
   MEMBER LET  IgnoreCase<&H00002712>()   ' Parameter Type AS INTEGER
   MEMBER GET  Global<&H00002713>() AS INTEGER
   MEMBER LET  Global<&H00002713>()   ' Parameter Type AS INTEGER
   MEMBER GET  Multiline<&H00002717>() AS INTEGER
   MEMBER LET  Multiline<&H00002717>()   ' Parameter Type AS INTEGER
   MEMBER CALL Execute<&H00002714>(IN sourceString AS STRING<&H00000000>) AS VARIANT
   MEMBER CALL Test<&H00002715>(IN sourceString AS STRING<&H00000000>) AS INTEGER
   MEMBER CALL Replace<&H00002716>(IN sourceString AS STRING<&H00000000>, _
               IN replaceVar AS VARIANT<&H00000001>) AS STRING
END INTERFACE

INTERFACE DISPATCH VBScriptMatch
   MEMBER GET  Value<&H00000000>() AS STRING
   MEMBER GET  FirstIndex<&H00002711>() AS LONG
   MEMBER GET  Length<&H00002712>() AS LONG
   MEMBER GET  SubMatches<&H00002713>() AS VARIANT
END INTERFACE

INTERFACE DISPATCH VBScriptMatchCollection
   MEMBER GET  Item<&H00000000>(IN index AS LONG<&H00000000>) AS VARIANT
   MEMBER GET  Count<&H00000001>() AS LONG
END INTERFACE

INTERFACE DISPATCH VBScriptSubMatches
   MEMBER GET  Item<&H00000000>(IN index AS LONG<&H00000000>) AS VARIANT
   MEMBER GET  Count<&H00000001>() AS LONG
END INTERFACE

FUNCTION VBInstr (vText AS VARIANT, vPattern AS VARIANT) AS STRING

   LOCAL vRes AS VARIANT
   LOCAL i AS LONG
   LOCAL nCount AS LONG
   LOCAL vItem AS VARIANT
   LOCAL vIdx AS VARIANT
   LOCAL vTRUE AS VARIANT
   LOCAL vFALSE AS VARIANT
   LOCAL strOutput AS STRING

   LOCAL oMatch AS VBScriptMatch
   LOCAL oRegEx AS VBScriptRegExp
   LOCAL oMatches AS VBScriptMatchCollection

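   ' Automation booleans: -1 is true, 0 is false
   vTRUE = -1
   vFALSE = 0

   oRegEx = NEW VBScriptRegExp IN "VBScript.RegExp"

   OBJECT LET oRegEx.Pattern = vPattern
   OBJECT LET oRegEx.Global = vTRUE       ' find every match, not just the first
   OBJECT LET oRegEx.IgnoreCase = vTRUE   ' case-insensitive matching
   OBJECT LET oRegEx.MultiLine = vTRUE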

   OBJECT CALL oRegEx.Execute(vText) TO vRes
   oMatches = vRes
   vRes = EMPTY

   OBJECT GET oMatches.Count TO vRes
   nCount = VARIANT#(vRes)

   FOR i = 0 TO nCount - 1
      vIdx = i AS LONG
      OBJECT GET oMatches.Item(vIdx) TO vItem
      IF VARIANT#(vItem) <> %NULL THEN
         oMatch = vItem
         vItem = EMPTY
         OBJECT GET oMatch.Value TO vRes
         strOutput = strOutput & "Found " & VARIANT$(vRes)
         OBJECT GET oMatch.FirstIndex TO vRes
         strOutput = strOutput & " at index " & FORMAT$(VARIANT#(vRes)) & $CRLF
         oMatch = NOTHING
      END IF
   NEXT

   oRegEx = NOTHING
   oMatches = NOTHING
   oMatch = NOTHING

   FUNCTION = strOutput

END FUNCTION

FUNCTION PBMAIN

   LOCAL vText AS VARIANT
   LOCAL vPattern AS VARIANT
   LOCAL strOutput AS STRING

   vText = "blah blah a234 blah blah x345 blah blah"
   vPattern = "[A-Z][0-9][0-9][0-9]"

   strOutput = VBInstr(vText, vPattern)
   MSGBOX strOutput

END FUNCTION


Theo Gottwald

Quote: the use of the Microsoft VBScript regular expressions engine

I am sure it is always available under Vista.
Are there OS versions that need additional updates before this call can be used?

Will it run, for example, on any W2K SP1 system, or will it need updates first?
What about Windows NT?

Win 95 and 98 are of no interest.


José Roca

 
Windows 2000 ships with version 5.1. If you don't update, you will be able to run the above code, but you won't be able to use the new methods added to the RegExp2 interface or the SubMatches collection.

Latest version is 5.6. You can download it at:

http://www.microsoft.com/downloads/details.aspx?familyid=C717D943-7E4B-4622-86EB-95A22B832CAA&displaylang=en

Theo Gottwald

Thanks for the link and the info, José.

My app has to work under W2K "as-is", without the need to get Internet updates first.
That was the reason for the question.

Now I know that your code can be used even under this condition.