VBspeed / String / Tokenize
VBspeed © 2000-10, updated: 24-Nov-2000
Tokenize


The Definition
Function Tokenize
Returns a zero-based, one-dimensional array containing a specified number of substrings. This array is returned in an argument.
The function itself returns Ubound(asToken), or -1 if asToken is not bound (empty array) which is the case if Expression is a zero-length string.

Tokenize is the bigger brother of Split since it can take more than one delimiter per parse. Note, however, that in Tokenize each delimiter can be only a one char string, while the delimiter in Split can be of any length.
Declaration:
Tokenize(Expression, asToken(), Delimiters[, IncludeEmpty])
Arguments:
ExpressionRequired. String expression containing substrings and delimiters. If Expression is a zero-length string, Tokenize returns an empty array, that is, an array with no elements and no data.
asToken()Required. One-dimensional string array that will hold the returned substrings. Does not have to be bound before calling Tokenize.
DelimitersRequired. String containing a sequence of delimiter characters used to identify substring limits. If Delimiters is a zero-length string, a single-element array containing the entire Expression string is returned.
IncludeEmptyOptional. Boolean flag: if True, zero-length tokens are returned, too. Is False by default, which means that adjoining delimiter chars count as one.
Example:
lRet = Tokenize("http://www.xbeat.net/vbspeed/index.htm", asToken, "/.:")
' count tokens: lRet + 1 = 7
' asToken elements: "http", "www", "xbeat", "net", "vbspeed", "index", "htm"


The Charts
Calls
 cntTokens = Tokenize(sExpression, asToken, sDelim)
Call 1 sExpression = "http://www.xbeat.net/vbspeed/index.htm" (28 chars, 7 tokens)
Delim = "/.:"
Call 2 sExpression = "12345 67890, 12345 67 ... [etc.]" (12998 chars, 2000 tokens)
Delim = " ," (blank and comma)
 VB5
CodeAuthorDopingNotes
Tokenize01 Donald  
Tokenize02 Donald  
Tokenize03 cmdX  
Tokenize04 GuidoAPI 
Call 1
46.12177µs
34.73137µs
21.7651µs
11.0029µs
Call 2
44.4826,663µs
21.478,743µs
31.9911,825µs
11.005,956µs
 VB6
CodeAuthorDopingNotes
Tokenize01 Donald  
Tokenize02 Donald  
Tokenize03 cmdX  
Tokenize04 GuidoAPI 
Call 1
46.24187µs
34.71141µs
21.7452µs
11.0030µs
Call 2
44.7529,325µs
21.368,403µs
32.1913,523µs
11.006,178µs
Conclusions
Well, doping does it.
Mail your code! How to read all those numbers


The Code
Tokenize01
submitted 24-Sep-2000 by Donald Lessau  
Doping: none
Public Function Tokenize01(Expression As String, _
    asToken() As String, _
    Delimiters As String, _
    Optional IncludeEmpty As Boolean) As Long

' returns Ubound(asToken), or -1 if asToken is not bound (empty array)
' by Donald, donald@xbeat.net, 20000924, revision 001
  Const BUFFERDIM As Long = 1024
  
  Dim iPos As Long
  Dim cntToken As Long
  Dim posStart As Long
  Dim ubArray As Long
  
  ubArray = -1
  posStart = 1
  
  For iPos = 1 To Len(Expression)
    ' is char among delimiters?
    If InStr(Delimiters, Mid$(Expression, iPos, 1)) Then
      If iPos > posStart Or IncludeEmpty Then
        If cntToken > ubArray Then
          ubArray = ubArray + BUFFERDIM
          ReDim Preserve asToken(ubArray)
        End If
        ' cut token out of Expression
        asToken(cntToken) = Mid$(Expression, posStart, iPos - posStart)
        cntToken = cntToken + 1
      End If
      posStart = iPos + 1
    End If
  Next
  ' remainder
  If iPos > posStart Or IncludeEmpty Then
    If cntToken > ubArray Then
      ReDim Preserve asToken(cntToken)
    End If
    asToken(cntToken) = Mid$(Expression, posStart)
    cntToken = cntToken + 1
  End If
  
  ' erase or shrink
  If cntToken = 0 Then
    Erase asToken
  Else
    ReDim Preserve asToken(cntToken - 1)
  End If
  
  ' return ubound
  Tokenize01 = cntToken - 1
  
End Function
Author's comments:
Donald's comments:

top | charts


Tokenize02
submitted 24-Sep-2000 by Donald Lessau  
Doping: none
Public Function Tokenize02(Expression As String, _
    asToken() As String, _
    Delimiters As String, _
    Optional IncludeEmpty As Boolean) As Long

' returns Ubound(asToken), or -1 if asToken is not bound (empty array)
' by Donald, donald@xbeat.net, 20000924, revision 001
  Const BUFFERDIM As Long = 1024
  
  Dim bExpr() As Byte, iExpr As Long, ubExpr As Long
  Dim bDelim() As Byte, iDelim As Long, ubDelim As Long
  Dim cntToken As Long
  Dim posStart As Long
  Dim ubArray As Long
  
  bExpr = Expression
  bDelim = Delimiters
  ubExpr = UBound(bExpr)
  ubDelim = UBound(bDelim)
  
  ubArray = -1
  
  For iExpr = 0 To ubExpr Step 2
    ' is char among delimiters?
    For iDelim = 0 To ubDelim Step 2
      ' compare low unicode byte
      If bExpr(iExpr) = bDelim(iDelim) Then
        ' compare high unicode byte
        If bExpr(iExpr + 1) = bDelim(iDelim + 1) Then
          If iExpr > posStart Or IncludeEmpty Then
            If cntToken > ubArray Then
              ubArray = ubArray + BUFFERDIM
              ReDim Preserve asToken(ubArray)
            End If
            ' cut token out of Expression
            asToken(cntToken) = MidB$(Expression, posStart + 1, iExpr - posStart)
            cntToken = cntToken + 1
          End If
          posStart = iExpr + 2
          Exit For
        End If
      End If
    Next
  Next
  ' remainder
  If iExpr > posStart Or IncludeEmpty Then
    If cntToken > ubArray Then
      ReDim Preserve asToken(cntToken)
    End If
    asToken(cntToken) = MidB$(Expression, posStart + 1)
    cntToken = cntToken + 1
  End If
  
  ' erase or shrink
  If cntToken = 0 Then
    Erase asToken
  Else
    ReDim Preserve asToken(cntToken - 1)
  End If
  
  ' return ubound
  Tokenize02 = cntToken - 1
    
End Function
Author's comments:
Donald's comments:

top | charts


Tokenize03
submitted 28-Sep-2000 by cmdX  
Doping: none
Public Function Tokenize03(Expression As String, _
    asToken() As String, _
    Delimiters As String, _
    Optional IncludeEmpty As Boolean) As Long
  
  ' by cmdX, cmdX@online.ru, 20000928, revision 001
   
  ' --- Pseudo code ---
  ' For Each Delimiter In Delimeters
  '   For Each OccurenceOf Delimiter In Expression
  '     Store Delimiter's position
  '   Next [Occurence]
  ' Next [Delimiter]
  ' Store Each Token Between Delimeters
  ' Return [ByRef asToken], asToken.Count | -1
   
  ' --- Declare section ---
  Dim strCurDelim As String          ' Current delimiter (e.g "/")
  Dim strCurToken As String
  Dim nDelim As Long                 ' Delim's position in the Delimiters string (from 1 on)
  Dim pDelim As Long                 ' Position of a current delimiter in the Expression
  Dim pDelimPrev As Long             ' Previous position
  Dim Index As Long
  Dim lCounter As Long
  Dim nTokenCount As Long
  Dim nLen As Long
   
  ' --- Initialization ---
  Tokenize03 = -1
  pDelimPrev = 1
  nLen = Len(Expression)
   
  ' --- Body ---
  If nLen = 0 Then Exit Function
  ReDim apDelimiter(Index To nLen + 1) As Byte ' ~Sorted array of pointers to delimiters
  apDelimiter(Index) = 1
   
  For nDelim = 1 To Len(Delimiters)
    strCurDelim = Mid$(Delimiters, nDelim, 1)
    pDelim = InStr(pDelimPrev, Expression, strCurDelim)
    Do While pDelim <> 0
      Index = Index + 1
      apDelimiter(pDelim) = 1                  ' Mark position
      pDelimPrev = pDelim
      pDelim = InStr(pDelimPrev + 1, Expression, strCurDelim)
    Loop
    pDelimPrev = 1
  Next ' nDelim
  apDelimiter(nLen + 1) = 1
   
  ReDim asToken(0 To Index)
  ReDim apDelim(0 To Index + 1) As Long
   
  For lCounter = 0 To nLen + 1
    If apDelimiter(lCounter) Then
      apDelim(nTokenCount) = lCounter
      nTokenCount = nTokenCount + 1
    End If
  Next ' lCounter
  nTokenCount = 0
   
  If IncludeEmpty Then
    For lCounter = 0 To Index
      pDelimPrev = apDelim(lCounter)
      pDelim = apDelim(lCounter + 1)
      asToken(lCounter) = Mid$(Expression, pDelimPrev + 1, pDelim - pDelimPrev - 1)
    Next ' lCounter
  Else
    For lCounter = 0 To Index
      pDelimPrev = apDelim(lCounter)
      pDelim = apDelim(lCounter + 1)
        If pDelim - pDelimPrev > 1 Then
          asToken(nTokenCount) = Mid$(Expression, pDelimPrev + 1, pDelim - pDelimPrev - 1)
          nTokenCount = nTokenCount + 1
        End If ' Empty Token
    Next ' lCounter
  End If ' IncludeEmpty
   
  
  If IncludeEmpty Then
    If Index > 1 Then Tokenize03 = Index
  Else
    If nTokenCount > 0 Then
      ReDim Preserve asToken(0 To nTokenCount - 1)
      Tokenize03 = nTokenCount - 1
    End If
  End If ' IncludeEmpty
 
End Function
Author's comments:
Donald's comments:

top | charts


Tokenize04
submitted 24-Nov-2000 by Guido Beckmann  
Doping: API
' VB5 -> msvbvm50.dll
Private Declare Function VarPtrArray& Lib "msvbvm60.dll" Alias "VarPtr" (ptr() As Any)
Private Declare Sub RtlMoveMemory Lib "kernel32" (dst As Any, src As Any, ByVal nBytes&)
Private Declare Sub RtlZeroMemory Lib "kernel32" (dst As Any, ByVal nBytes&)
Private Type SAFEARRAY1D
    cDims           As Integer
    fFeatures       As Integer
    cbElements      As Long
    cLocks          As Long
    pvData          As Long
    cElements       As Long
    lLbound         As Long
End Type

Public Function Tokenize04&(Expression$, ResultTokens$(), Delimiters$, Optional IncludeEmpty As Boolean) ' Tokenize02 by Donald, donald@xbeat.net ' modified by G.Beckmann, G.Beckmann@NikoCity.de Const ARR_CHUNK& = 1024 Dim cExp&, ubExpr& Dim cDel&, ubDelim& Dim aExpr%(), aDelim%() Dim sa1 As SAFEARRAY1D, sa2 As SAFEARRAY1D Dim cTokens&, iPos& ubExpr = Len(Expression) ubDelim = Len(Delimiters) sa1.cbElements = 2: sa1.cElements = ubExpr sa1.cDims = 1: sa1.pvData = StrPtr(Expression) RtlMoveMemory ByVal VarPtrArray(aExpr), VarPtr(sa1), 4 sa2.cbElements = 2: sa2.cElements = ubDelim sa2.cDims = 1: sa2.pvData = StrPtr(Delimiters) RtlMoveMemory ByVal VarPtrArray(aDelim), VarPtr(sa2), 4 If IncludeEmpty Then ReDim Preserve ResultTokens(ubExpr) Else ReDim Preserve ResultTokens(ubExpr \ 2) End If ubDelim = ubDelim - 1 For cExp = 0 To ubExpr - 1 For cDel = 0 To ubDelim If aExpr(cExp) = aDelim(cDel) Then If cExp > iPos Then ResultTokens(cTokens) = Mid$(Expression, iPos + 1, cExp - iPos) cTokens = cTokens + 1 ElseIf IncludeEmpty Then ResultTokens(cTokens) = vbNullString cTokens = cTokens + 1 End If iPos = cExp + 1 Exit For End If Next cDel Next cExp '/ remainder If (cExp > iPos) Or IncludeEmpty Then ResultTokens(cTokens) = Mid$(Expression, iPos + 1) cTokens = cTokens + 1 End If '/ erase or shrink If cTokens = 0 Then Erase ResultTokens() Else ReDim Preserve ResultTokens(cTokens - 1) End If '/ return ubound Tokenize04 = cTokens - 1 '/ tidy up RtlZeroMemory ByVal VarPtrArray(aExpr), 4 RtlZeroMemory ByVal VarPtrArray(aDelim), 4 End Function
Author's comments:
Donald's comments:

top | charts




VBspeed © 2000-10 by Donald Lessau