VBspeed / String / Compress
VBspeed © 2000-10, updated: 16-Nov-2001
Compress


The Definition
Function Compress
Returns a string where multiple adjacent occurrences of a specified substring are compressed to just one occurrence. For example, the function will compress multiple spaces within a string down to single spaces.
Declaration:
Compress(sExpression, sCompress[, Compare])
Arguments:
sExpressionRequired. String expression containing substring sequences to be compressed. If sExpression is a zero-length string, Compress returns a zero-length string as well.
sCompressRequired. The single string whereof sequences are to be compressed. Read sCompress as "compress multiples of". Can be longer than one char. If sCompress is a zero-length string, the function returns sExpression unmodified.
CompareOptional. Numeric value indicating the kind of comparison to use when evaluating substrings.
If omitted, the default value is 0, which means a binary comparison is performed.
Examples:
Compress("abbbbcba", "b") => "abcba"
Compress("wackwackwack!", "wack") => "wack!"
You may use this function (VB5/6-compatible) to verify the correctness of your emulation code.


The Charts
Calls
 sRet = Compress(sExpression, sCompress)
Call 1 sExpression = "abbc"
sCompress = "b"
Call 2 sExpression = "a" & String$(10000, "b") & "c"
sCompress = "b"
Call 3 sExpression = Replicate(100, "abcabcabc,")
sCompress = "abc"
Call 4
sExpression = Replicate(100, "Too   many  blanks in  here! ")
sCompress = " "
[probably the most real-world test case]
 VB5
CodeAuthorDopingNotes
Compress01 Donald  
Compress02 Donald  
Compress03 PeterWeighill  
Compress04 PeterWeighill  
Compress05 PeterNierop  
Compress06 TomWinters  
Compress07 GuyAPI,TLB 
Call 1
61.298.279µs
41.107.051µs
51.127.200µs
31.086.923µs
X3.1920.399µs
21.026.520µs
11.006.403µs
Call 2
644.077,346µs
526.474,413µs
426.464,411µs
326.124,353µs
X6.541,090µs
21.72287µs
11.00167µs
Call 3
45.85546µs
33.39316µs
612.421,159µs
511.931,113µs
X2.11196µs
21.68156µs
11.0093µs
Call 4
47.411,582µs
34.721,008µs
647.9510,246µs
547.3910,127µs
X2.24478µs
21.95417µs
11.00214µs
 VB6
CodeAuthorDopingNotes
Compress01 Donald  
Compress02 Donald  
Compress03 PeterWeighill  
Compress04 PeterWeighill  
Compress05 PeterNierop  
Compress06 TomWinters  
Compress07 GuyAPI,TLB 
Call 1
61.599.494µs
41.307.762µs
51.307.761µs
31.237.340µs
X3.5421.097µs
21.197.116µs
11.005.959µs
Call 2
646.187,645µs
524.574,067µs
423.863,949µs
323.333,863µs
X6.921,146µs
22.76456µs
11.00166µs
Call 3
46.82647µs
33.26310µs
612.171,155µs
511.781,118µs
X2.07197µs
21.70162µs
11.0095µs
Call 4
47.371,590µs
34.24915µs
647.1310,166µs
546.5910,049µs
X2.25485µs
21.92414µs
11.00216µs
Notes & Conclusions
Replace (used in Compress01) just isn't happening here. Compress03/04 are looking good but have a severe problem with Calls 3 and 4. Unfortunately Call 4 is probably a typical real-world compress-job. Compress05 is king. Compress06 is the king's boss. But the king is dead and the boss, too. Long live the king, long live Compress07.

Compress05 has severe problems with Unicode and some non-textual chars ("[", "{") in TextCompare mode. Hint: ANDing with 223 to convert a char to uppercase needs a careful hand.
Compress06 and Compress07 fail at the Stooges in TextCompare mode -- but we tolerate this for the time being.
Mail your code! How to read all those numbers


The Code
Compress01
submitted 04-Dec-2000 by Donald Lessau  
Doping: none
Public Function Compress01( _
                sExpression As String, _
                sCompress As String, _
                Optional Compare As VbCompareMethod = vbBinaryCompare) As String
' by Donald, donald@xbeat.net, 20001204
' probably the most obvious but slow solution
  
  Dim sExpanded As String
  
  If Len(sExpression) = 0 Then Exit Function
  Compress01 = sExpression
  If Len(sCompress) = 0 Then Exit Function
  
  ' reduce searching for multiples to searching for doubles
  sExpanded = sCompress & sCompress
  
  Do While InStr(1, Compress01, sExpanded, Compare)
    ' note: If using VB6 you can use the native Replace function
    '       but it's much slower than Replace05.
    '       See http://www.xbeat.net/vbspeed/ for the best emulations.
    Compress01 = Replace05(Compress01, sExpanded, sCompress, , , Compare)
  Loop
                  
End Function
Author's comments:
Donald's comments: see Replace05

top | charts


Compress02
submitted 04-Dec-2000 by Donald Lessau  
Doping: none
Public Function Compress02( _
                sExpression As String, _
                sCompress As String, _
                Optional Compare As VbCompareMethod = vbBinaryCompare) As String
' by Donald, donald@xbeat.net, 20001204

  Dim sExpanded As String
  Dim Start As Long
  Dim posSource As Long
  Dim posTarget As Long
  Dim lenCompress As Long
  Dim lenCopy As Long
  
  If Len(sExpression) = 0 Then Exit Function
  Compress02 = sExpression
  lenCompress = Len(sCompress)
  If lenCompress = 0 Then Exit Function
  
  ' reduce searching for multiples to searching for doubles
  sExpanded = sCompress & sCompress
  Start = 1
  posTarget = 1
  
  Do
    ' search next expanded section
    posSource = InStr(Start, sExpression, sExpanded, Compare)
    If posSource > 0 Then
      ' insert
      lenCopy = posSource - Start + lenCompress
      Mid$(Compress02, posTarget) = Mid$(sExpression, Start, lenCopy)
      posTarget = posTarget + lenCopy
      ' search end of compressed section
      Start = posSource + lenCompress * 2
      Do While InStr(Start, sExpression, sCompress, Compare) = Start
        Start = Start + lenCompress
      Loop
    Else
      ' no more expanded sequences
      ' insert remainder
      Mid$(Compress02, posTarget) = Mid$(sExpression, Start)
      ' trim to actual size
      Compress02 = Left$(Compress02, posTarget + Len(sExpression) - Start)
      ' done
      Exit Function
    End If
  Loop
                  
End Function
Author's comments:
Donald's comments:

top | charts


Compress03
submitted 06-Dec-2000 by Peter Weighill  
Doping: none
Public Function Compress03( _
                sExpression As String, _
                sCompress As String, _
                Optional Compare As VbCompareMethod = vbBinaryCompare) As String
' by Peter Weighill, pweighill@btinternet.com, 20001206
    
    Dim sExpanded As String
    Dim Start As Long
    Dim Start2 As Long
    Dim Start3 As Long
    Dim lenCompress As Long
    
    If Len(sExpression) = 0 Then Exit Function
    Compress03 = sExpression
    lenCompress = Len(sCompress)
    If lenCompress = 0 Then Exit Function
    
    sExpanded = sCompress & sCompress
    Start = 1
    
    Do
        Start = InStr(Start, Compress03, sExpanded, Compare)
        If Start = 0 Then Exit Do
            
        Start3 = Start
        Do
            Start2 = InStr(Start3 + lenCompress, Compress03, sCompress, Compare)
            If Start2 = 0 Or Start2 > Start3 + lenCompress Then Exit Do
            Start3 = Start2
        Loop
        Compress03 = Left$(Compress03, Start + lenCompress - 1) & Mid$(Compress03, Start3 + lenCompress)
        Start = Start + lenCompress + 1
    Loop
End Function
Author's comments:
Donald's comments:

top | charts


Compress04
submitted 06-Dec-2000 by Peter Weighill  
Doping: none
Public Function Compress04( _
                sExpression As String, _
                sCompress As String, _
                Optional Compare As VbCompareMethod = vbBinaryCompare) As String
' by Peter Weighill, pweighill@btinternet.com, 20001206

    Dim sExpanded As String
    Dim Start As Long
    Dim Start2 As Long
    Dim Start3 As Long
    Dim lenCompress As Long

    If Len(sExpression) = 0 Then Exit Function
    Compress04 = sExpression
    lenCompress = Len(sCompress)
    If lenCompress = 0 Then Exit Function

    sExpanded = sCompress & sCompress
    Start = 1

    Do
        Start = InStr(Start, Compress04, sExpanded, Compare)
        If Start = 0 Then Exit Do

        Start3 = Start + lenCompress + lenCompress
        Do
            Start2 = InStr(Start3, Compress04, sCompress, Compare)
            If Start2 = 0 Or Start2 > Start3 Then Exit Do
            Start3 = Start2 + lenCompress
        Loop
        Compress04 = Left$(Compress04, Start + lenCompress - 1) & Mid$(Compress04, Start3)
        Start = Start + lenCompress + 1
    Loop
End Function
Author's comments:
Donald's comments:

top | charts


Compress05
submitted 18-Dec-2000 by Peter Nierop  
Doping: none
Public Function Compress05( _
                sExpression As String, _
                sCompress As String, _
                Optional Compare As VbCompareMethod = vbBinaryCompare _
                ) As String

' by Peter Nierop, pnierop.pnc@inter.nl.net, 20001217

  Dim aOrg() As Byte
  Dim aFind() As Byte
  Dim aOut() As Byte

  Dim lMaxOrg&, lCurOrg&, lMaxFind&, lCurFind&, lFind&, lCurOut&, lMaxOut&
  Dim lFoundTwice&, lCopy&, lComp&

  'prepare Original String
  If Len(sExpression) = 0 Then Exit Function
  aOrg = sExpression
  lMaxOrg = UBound(aOrg)

  'prepare Compressed Output
  ReDim aOut(lMaxOrg)

  'Character or String to find
  If Len(sCompress) = 0 Then
    Compress05 = sExpression
    Exit Function
  End If

  aFind = sCompress
  lMaxFind = UBound(aFind)

  ' if test capitals
  If Compare = vbBinaryCompare Then
    lComp = &HFF
  Else
    lComp = &HDF   'to uppercase
    For lCurFind = 0 To lMaxFind Step 2
      aFind(lCurFind) = aFind(lCurFind) And &HDF
    Next
    lCurFind = 0
  End If

  ' preload the first character to find
  lFind = aFind(0)

  '==========  With one character to find -> shorter loop =====================
  If lMaxFind = 1 Then

    ' step through lowest bytes of unicode string array
    For lCurOrg = 0 To lMaxOrg Step 2

      'look for match with find character
      If lFind = (aOrg(lCurOrg) And lComp) Then
        lFoundTwice = lFoundTwice + 1
      Else
        lFoundTwice = 0
      End If

      'copy only if not a second match
      If lFoundTwice < 2 Then
        aOut(lCurOut) = aOrg(lCurOrg)
        lCurOut = lCurOut + 2
      End If

    Next

  Else
  '============ Longer loop if multiple characters to find ======================

    ' step through lowest bytes of unicode string array
    For lCurOrg = 0 To lMaxOrg Step 2

      'look for match with current find character
      If lFind = (aOrg(lCurOrg) And lComp) Then

        lCurFind = lCurFind + 2
        ' if no more characters to test -> match with string happened
        If lCurFind >= lMaxFind Then
          lFoundTwice = lFoundTwice + 1
          lCurFind = 0  'and start over
        End If
        ' now load next character from string to find
        lFind = aFind(lCurFind)

      Else
        ' no match so clean up
        lFoundTwice = 0
        lCurFind = 0
        lFind = aFind(lCurFind)
      End If

      ' copy character in output
      aOut(lCurOut) = aOrg(lCurOrg)
      lCurOut = lCurOut + 2

      'reset pointer to copy to if we had a second match
      If lFoundTwice = 2 Then
        lFoundTwice = 1
        lCurOut = lCurOut - lMaxFind - 1
      End If


    Next

  End If

  'shorten compressed string to real length
  ReDim Preserve aOut(lCurOut - 1)
  Compress05 = aOut  'array to string conversion
End Function
Author's comments:
Donald's comments:

top | charts


Compress06
submitted 04-Nov-2001 by Tom Winters  
Doping: none
The complete code, a function of 16KB wrapped up in a module,
is a bit too long to be displayed here ... download it to have a look.
Author's comments:
Donald's comments:

top | charts


Compress07
submitted 21-Nov-2001 by Guy Gervais  
Doping: API, TLB StringHelpers
Private Type SafeArray1D
    cDims As Integer
    fFeatures As Integer
    cbElements As Long
    cLocks As Long
    pvData As Long
    cElements As Long
    lLbound As Long
End Type

Private Const FADF_AUTO As Long = &H1
Private Const FADF_FIXEDSIZE As Long = &H10

' UnRem if not declared elsewhere
'Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (dst As Any, src As Any, ByVal nBytes As Long)
'Private Declare Function VarPtrArray Lib "msvbvm60.dll" Alias "VarPtr" (Ptr() As Any) As Long


Public Function Compress07(Text As String, ByVal Comp As String, Optional Compare As VbCompareMethod = vbBinaryCompare) As String
' By Guy Gervais, ggervais@videotron.ca, 20 Nov 2001
'   - Requires a reference to the Split03.tlb

Dim txt()   As Integer
Dim SAtxt   As SafeArray1D

Dim cmp()   As Integer
Dim SAcmp   As SafeArray1D

Dim ret()   As Integer
Dim SAret   As SafeArray1D

Dim iLtxt   As Long
Dim iLcmp   As Long

Dim iUBtxt  As Long
Dim iUBcmp  As Long
Dim iDiff   As Long

Dim i       As Long
Dim j       As Long
Dim Ptr     As Long

Dim iMatch  As Long
Dim iPoison As Long
    
        
    ' Init
    iLtxt = Len(Text)
    iLcmp = Len(Comp)
    
    If iLtxt = 0 Then Exit Function
    If iLcmp = 0 Then
        Compress07 = Text
        Exit Function
    End If
    
    ' Text
    With SAtxt
        .cDims = 1
        .cbElements = 2&
        .cElements = iLtxt
        .fFeatures = FADF_AUTO Or FADF_FIXEDSIZE
        .pvData = StrPtr(Text)
    End With
    CopyMemory ByVal VarPtrArray(txt), VarPtr(SAtxt), 4
    iUBtxt = UBound(txt)
    
    ' Comp
    With SAcmp
        .cDims = 1
        .cbElements = 2&
        .cElements = iLcmp
        .fFeatures = FADF_AUTO Or FADF_FIXEDSIZE
        .pvData = StrPtr(Comp)
    End With
    CopyMemory ByVal VarPtrArray(cmp), VarPtr(SAcmp), 4
    iUBcmp = UBound(cmp)
    
    ' Alloc return string
    Compress07 = StringHelpers.SysAllocStringLen(ByVal 0&, iLtxt)
    
    ' Return
    With SAret
        .cDims = 1
        .cbElements = 2&
        .cElements = iLtxt
        .fFeatures = FADF_AUTO Or FADF_FIXEDSIZE
        .pvData = StrPtr(Compress07)
    End With
    CopyMemory ByVal VarPtrArray(ret), VarPtr(SAret), 4
    
    iDiff = iUBtxt - iUBcmp
    Ptr = 0
    iPoison = -1
    If Compare Then
        
        ' Switch Comp string to Uppercase
        For i = 0 To iUBcmp
            Select Case cmp(i)
                Case 97 To 122, 224 To 246, 248 To 254
                    cmp(i) = cmp(i) And 223&
            End Select
        Next
        
        ' Text compress
        For i = 0 To iUBtxt
            
            Select Case txt(i)
                Case 97 To 122, 224 To 246, 248 To 254
                    iMatch = ((txt(i) And 223&) = cmp(0))
                Case Else
                    iMatch = (txt(i) = cmp(0))
            End Select
            
            If iMatch Then
                If iLcmp > 1 Then
                    ' Check for complete match
                    For j = 1 To iUBcmp
                        Select Case txt(i + j)
                            Case 97 To 122, 224 To 246, 248 To 254
                                If (txt(i + j) And 223&) <> cmp(j) Then GoTo TxtSkip
                            Case Else
                                If txt(i + j) <> cmp(j) Then GoTo TxtSkip
                        End Select
                    Next
                Else
                    ' Special optimization when Len(Comp) = 1
                    Do
                        If i = iDiff Then
                            i = i + 1
                            Exit Do
                        End If
                        i = i + 1
                        Select Case txt(i)
                            Case 97 To 122, 224 To 246, 248 To 254
                                If (txt(i) And 223&) <> cmp(0) Then Exit Do
                            Case Else
                                If txt(i) <> cmp(0) Then Exit Do
                        End Select
                    Loop
                    i = i - 1
                End If
                If i = iPoison Then
                    Ptr = Ptr - iLcmp
                End If
                If iPoison <= i Then iPoison = i + iLcmp
            End If
TxtSkip:
            ret(Ptr) = txt(i)
            Ptr = Ptr + 1
        Next
    
    Else
        
        ' Binary compress
        For i = 0 To iUBtxt
            If (txt(i) = cmp(0)) Then
                If iLcmp > 1 Then
                    ' Check for complete match
                    If i <= iDiff Then
                        For j = 1 To iUBcmp
                            If txt(i + j) <> cmp(j) Then GoTo BinSkip
                        Next
                    Else
                        GoTo BinSkip
                    End If
                Else
                    ' Special optimization when Len(Comp) = 1
                    Do
                        If i = iDiff Then
                            i = i + 1
                            Exit Do
                        End If
                        i = i + 1
                        If txt(i) <> cmp(0) Then Exit Do
                    Loop
                    i = i - 1
                End If
                If i = iPoison Then
                    Ptr = Ptr - iLcmp
                End If
                If iPoison <= i Then iPoison = i + iLcmp
            End If
BinSkip:
            ret(Ptr) = txt(i)
            Ptr = Ptr + 1
        Next
    End If

    ' Clean up
    SAtxt.pvData = 0
    SAcmp.pvData = 0
    SAret.pvData = 0

    ' Trim
    Compress07 = Left$(Compress07, Ptr)

End Function
Author's comments :
Donald's comments :

top | charts




VBspeed © 2000-10 by Donald Lessau