Guidance Forums / Reginald Rexx / Process large text files

Message1. Process large text files
Posted by: PeterJ 2009-05-14 04:59:51 Last edited by: guidance 2010-07-16 21:06:55 (Total edited 2 times)
One of my recent activities was to search through large text files of up to 1 million records and about 20 MB in size. After some experimenting it turned out that using "LOADTEXT" to load such files into a stem and then searching the stem is too slow. I therefore implemented a tiny DLL in PureBasic, which reads and processes the file in memory at an amazing speed.
If you like, you can try it yourself using FILELOAD.REX and the small sample file of 1500 records. For documentation purposes I have also added the source code of the DLL:
Message2. Looks really useful
Posted by: Michael S 2009-05-14 16:02:13
One thing struck me immediately, though. Since it's a DLL, do you HAVE to "access" the code via FUNCDEF? Can't you achieve the same thing via the library call?
Posted by: PeterJ 2009-05-14 16:25:48 Last edited by: PeterJ 2009-05-14 16:28:42 (Total edited 2 times)
You need to use FUNCDEF, as the I/O parameters need to be defined. The LIBRARY directive can only be used for Reginald-conforming DLLs, and this DLL is a standalone one. In terms of performance and flexibility this isn't a disadvantage.
Message4. Had another look at your code
Posted by: Michael S 2009-05-27 00:50:30
Is there any "documentation" on how it works, what parms it expects, etc.? For example, does FIND allow PREV as the first argument? What is the SEM in the second argument for, and what does it mean? If NOCASE is omitted, is the default CASE?
(It sure as hell is like greased lightning)
Posted by: PeterJ 2009-05-27 21:54:22 Last edited by: PeterJ 2009-05-27 21:59:02 (Total edited 1 time)
SEM is the search string: the call means "return the record with the first occurrence of the search string SEM".
No, PREV is not possible. What do you want to achieve?

Brief documentation:
If you FUNCDEF with the same Rexx names as I did in the example I posted, the following functions are available:

loads a text file into a memory area and keeps it there until the script has ended. The file must be an ASCII text
file; a new line begins after the character combination CR+NL or a single NL.
If the file was read successfully, the function returns the number of records fetched.
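
The record-splitting rule described above can be modelled in a few lines. This is only an illustrative Python sketch, not the DLL's PureBasic code; the name `split_records` is hypothetical, and the handling of a trailing line terminator (no extra empty record) is an assumption the thread does not confirm.

```python
def split_records(data: bytes) -> list[bytes]:
    """Split raw file content into records at CR+NL or at a lone NL."""
    if not data:
        return []
    # Normalise CR+NL to NL first, then split on NL.
    records = data.replace(b"\r\n", b"\n").split(b"\n")
    # Assumption: a terminator at the very end does not start an empty record.
    if records[-1] == b"":
        records.pop()
    return records


sample = b"red apple\r\ngreen pear\npurple plum\n"
records = split_records(sample)
print(len(records))  # number of fetched records, as the load function reports
```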


FIND takes three arguments:
      mode:              <FIRST/NEXT>
                         FIRST returns the first record in which the search string occurs;
                         NEXT returns the next matching record after the previously found one
      search-string:     string to be found in the fetched records
      search-options:    <CASE/NOCASE> the search is case sensitive (CASE) or case insensitive (NOCASE)
      The function returns the entire record containing the search string. The initial search must be performed
      with the FIRST keyword; any subsequent search must use NEXT.
      If the search string is not found, or not found after the last successful search, an empty string is returned.
      e.g. record=FIND('FIRST','purple','NOCASE')  /* return first record containing purple */
           record=FIND('NEXT','purple','NOCASE')   /* return second record containing purple */
           record=FIND('NEXT','purple','NOCASE')   /* return next record containing purple */
      If the string occurs more than once in a record, the record is returned only once!
GETR returns the content of the requested record: 1 is the first record, 2 the second, etc. If the requested record number is not
available, GETR returns an empty string.
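
For readers who want to check their understanding of the FIRST/NEXT protocol and of GETR, here is a small Python model of the behaviour documented above. It is a sketch of the described semantics only, not the DLL or its real interface: the class `RecordSearch` and its method names are hypothetical, and what happens when NEXT is used without a preceding FIRST is an assumption.

```python
class RecordSearch:
    """Toy model of the documented FIND/GETR behaviour (not the DLL's code)."""

    def __init__(self, records):
        self.records = list(records)
        self.last_hit = -1  # 0-based index of the record returned by the last hit

    def find(self, mode, search, option):
        # FIRST restarts at the top; NEXT resumes after the previously found record.
        if mode.upper() == "FIRST":
            self.last_hit = -1
        nocase = option.upper() == "NOCASE"
        needle = search.lower() if nocase else search
        for i in range(self.last_hit + 1, len(self.records)):
            hay = self.records[i].lower() if nocase else self.records[i]
            if needle in hay:
                self.last_hit = i
                return self.records[i]  # the whole record, returned once per record
        return ""  # not found, or no further match

    def getr(self, number):
        # Record numbers are 1-based; out-of-range requests give an empty string.
        if 1 <= number <= len(self.records):
            return self.records[number - 1]
        return ""


s = RecordSearch(["Purple rain", "green grass", "purple, purple haze", "blue sky"])

# Collect every matching record: FIRST once, then NEXT until an empty string.
hits = []
record = s.find("FIRST", "purple", "NOCASE")
while record != "":
    hits.append(record)
    record = s.find("NEXT", "purple", "NOCASE")
print(hits)
```

The same loop shape (one FIRST call, then NEXT until an empty string comes back) carries over directly to a Rexx script that calls FIND.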
Message6. Thanks Peter
Posted by: Michael S 2009-05-29 14:29:53
"No, PREV is not possible. What do you want to achieve?"
Nothing as yet. It was more about understanding what is possible and what is not.