Guidance Forums / Reginald Rexx / Comma delimited text conversion

1. Comma delimited text conversion

#1319

Posted by: jwebb 2002-11-10 08:50:58

I need to read a .csv file, format the data a bit (sorting, etc.) then produce an HTML file for output.

Would someone please give me a nudge in the right direction? Perhaps a sample of similar scope?

2. How to tokenize a string?

#1321

Posted by: guidance 2002-11-10 11:29:31

I also want to know how to tokenize a string by a certain delimiter and return an array. There is no built-in string/word function that does exactly what I want.

3. Re: Comma delimited text conversion

#1324

Posted by: 2002-11-10 13:22:17

Because a .csv file is a text file, you can read all of the lines into a stem variable easily with Reginald's LOADTEXT(). Each line would be in a separate variable.
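For readers on an interpreter without LOADTEXT(), here is a minimal portable sketch that does the same job with the standard LINES() and LINEIN() built-ins. The file name 'data.csv' and the stem name row. are assumptions for illustration; this is not Reginald's LOADTEXT() itself.

```rexx
/* Portable sketch: read every line of a text file into a stem.
 * 'data.csv' is a hypothetical file name; 'row.' is a hypothetical stem.
 */
filename = 'data.csv'

/* row.0 holds the number of lines read so far */
row.0 = 0

/* LINES() reports whether any lines remain to be read */
DO WHILE LINES(filename) > 0
   n = row.0 + 1
   row.n = LINEIN(filename)
   row.0 = n
END

/* Close the file now that we're done with it */
CALL STREAM filename, 'C', 'CLOSE'

SAY row.0 'line(s) read'
```

After the loop, row.1 to row.XXX hold the lines, where XXX is the value of row.0.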

To tokenize a string, look at the PARSE keyword. When the string you want to parse is the value of some variable, use PARSE VAR. (Most of the time, that's the case.) If the string is the return of some function, use PARSE VALUE... WITH. If you're getting the string directly from the user, then check out PARSE PULL.
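A small sketch of the three forms might look like this. The variable names and sample strings are made up for illustration, and QUEUE is used only as a stand-in for something the user would type, since PARSE PULL reads queued lines first.

```rexx
/* PARSE VAR: the string is the value of a variable */
record = 'apple pear plum'
PARSE VAR record first rest
SAY first        /* the first word */
SAY rest         /* everything after it */

/* PARSE VALUE ... WITH: the string is the result of an expression.
 * DATE('S') returns yyyymmdd, so the columns split it into
 * 4, 2 and 2 characters.
 */
PARSE VALUE DATE('S') WITH year 5 month 7 day
SAY year month day

/* PARSE PULL: the string comes from the user (or from the queue).
 * The QUEUE line stands in for typed input.
 */
QUEUE 'Jane Doe'
PARSE PULL givenname surname
SAY surname',' givenname
```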

A PARSE statement can break apart a string in all sorts of ways. It can break it apart by pattern matching (i.e., search strings), or by blank spaces, or by a certain number of characters or offset in the string, or any combination of those.
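To make those modes concrete, here is a rough sketch that parses one made-up string four ways. The sample string and all variable names are assumptions for illustration only.

```rexx
entry = '2002-11-10 jwebb Comma delimited text conversion'

/* By blank spaces: each variable gets one word, the last gets the rest */
PARSE VAR entry stamp author title

/* By search strings: break at the first '-' and then at the next '-' */
PARSE VAR stamp year '-' month '-' day

/* By column positions: columns 1-4, 6-7, and 9 to the end.
 * The periods are placeholders that throw away the '-' characters.
 */
PARSE VAR stamp yyyy 5 . 6 mm 8 . 9 dd

/* A combination: the first 10 columns, skip column 11, then from
 * column 12 break at the next blank.
 */
PARSE VAR entry datepart 11 . 12 who ' ' what

SAY year month day
SAY who 'wrote:' what
```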

If you want to break the string apart by a certain delimiter (such as a comma), then check out the page (in my REXX book) "Using search strings (to break apart tokens)". There's an example that breaks apart a string at a semi-colon and then at a following comma.
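This is not the book's example, just a minimal sketch of the same idea with a made-up record: break at a semi-colon first, then at the comma that follows it.

```rexx
/* A hypothetical record: surname; given name, age */
record = 'Smith; John, 42'

/* Everything before the ';' goes to surname, everything between
 * the ';' and the next ',' goes to givenname, the rest goes to age.
 */
PARSE VAR record surname ';' givenname ',' age

/* The pieces keep the blanks around the delimiters, so trim them */
surname = STRIP(surname)
givenname = STRIP(givenname)
age = STRIP(age)

SAY givenname surname 'is' age
```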

4. Re: Comma delimited text conversion

#1329

Posted by: guidance 2002-11-10 13:54:51

.csv stands for "comma-separated values"; it is a text file.

PARSE seems like the best solution, but if the input string has an unknown number of fields, PARSE can't split it in one statement; you need a loop.

By the way, the documentation could use a cross-link between the PARSE section and the string/word functions section. I spent more than 10 minutes finding the example you mentioned. PARSE is a special statement in REXX. I didn't notice that at first, and just looked through the string functions for one that does what PARSE does.

5. Re: Comma delimited text conversion

#1334

Posted by: 2002-11-10 21:22:23

There is an entire section named "Parsing" in the REXX book because PARSE is a very, very useful and important keyword. I don't know how you could have missed that section.

A csv file is a text file where each line contains numerous "values" separated by a comma, something like this:

item 1, item 2, item 3

In that case, if you have an entire line in the variable MyVar, you can break off each "value" like so:

/* Keep going until we empty out the original line */
DO WHILE myvar \= ""

   /* Break off the next value into the variable "val"
    * and update MyVar
    */
   PARSE VAR myvar val ',' myvar

   /* Remove leading and trailing spaces */
   val = STRIP(val)

   /* Here you'd do something with val. I simply SAY it */
   SAY val

END

If you wanted, you could collect the pieces in a stem variable:

/* Initially, no pieces */
count = 0

DO WHILE myvar \= ""

   PARSE VAR myvar val ',' myvar

   /* Increment number of pieces */
   count = count + 1

   /* Store the piece */
   pieces.count = STRIP(val)

END

/* Here, "count" is the number of pieces, and pieces.1 to pieces.XXX are those pieces */
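Since the original question was about producing HTML, here is one rough sketch of turning the collected pieces into a table row. The sample values stand in for a parsed line, the variable name "row" is made up, and the sketch does not escape characters such as < or & inside the values.

```rexx
/* Sample data standing in for the pieces collected from one CSV line */
pieces.1 = 'item 1'
pieces.2 = 'item 2'
pieces.3 = 'item 3'
count = 3

/* Build one HTML table row with one cell per piece */
row = '<tr>'
DO i = 1 TO count
   row = row || '<td>' || pieces.i || '</td>'
END
row = row || '</tr>'

/* Here you'd write the row to your output file. I simply SAY it */
SAY row
```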

Just one point. The above works unless you're dealing with some value that has embedded commas. For example, some csv files apparently have values that are quoted in order to allow embedded commas. For example, the line to parse may be:

item 1, "item 2, with an embedded comma", item 3

Furthermore, it appears that some databases embed a double quote character inside of a quoted string by putting two of them back to back (just like you can do in a REXX literal string).

So, to account for these extra gotchas, you could call the following function to parse one line of a CSV file:

/* ================== ParseCSV ====================
 * This is passed a line (from a CSV text file) to
 * parse. It breaks up the fields of the line into
 * an array where (arrayname).0 is a count of how
 * many fields are broken off, and those fields
 * are stored in (arrayname).1 to (arrayname).XXX
 * where XXX is however many fields there are.
 *
 * This returns an empty string if success, or an
 * error message if an error.
 *
 * error = ParseCSV(line)
 *
 * line is the CSV line to parse.
 *
 * The name of the array (stem) must be assigned to the
 * variable named 'Array'. This will be the stem variable
 * where the fields are to be stored.
 *
 * EXAMPLE:
 *
 * Array = 'MyArray.'
 * error = ParseCSV('Field 1, "Field 2", Field 3')
 * IF error == "" THEN
 *    DO i = 1 TO MyArray.0
 *       SAY "Field" i "=" MyArray.i
 *    END
 */

parsecsv: PROCEDURE EXPOSE (array)

/* This is a '1' when we're within a quoted string,
 * or '0' otherwise. Initially, 0.
 */
inside = 0

/* We haven't yet parsed any fields. */
count = 0

/* Let's trim off any leading and trailing spaces, and make
 * sure that we have something to parse.
 */
orig = STRIP(ARG(1))
IF orig \== "" THEN DO

   /* If the last char is not a comma, then just stick a comma
    * on the end of it so that the last field has a definitive
    * end.
    */
   IF RIGHT(orig, 1) \== ',' THEN orig = orig || ','

   /* We have to slog through each character ourselves, because
    * we want to account for quoted fields. So, we first need to
    * determine how many chars we need to slog through.
    */
   totallength = LENGTH(orig)

   /* This is the position within our original string
    * where the next field starts. Initially at the
    * start of the string.
    */
   startpos = 1

   /* Do each char of the original line */ 
   DO i = 1 TO totallength

      SELECT

         /* Have we gotten to the end of the field? This happens
          * when we encounter a comma that falls outside of
          * any quotes, or if we get to the end of the line.
          */
         WHEN SUBSTR(orig, i , 1) == ',' & ~inside THEN
            DO

               /* Extract the text of this field, with leading/trailing spaces
                * trimmed. That text begins at an offset of 'startpos' within the
                * original line, and it ends at an offset of 'i'.
                */
               piece = STRIP(SUBSTR(orig, startpos, i - startpos))

               /* Update 'startpos' to where the next field should start. It
                * starts after the comma.
                */
               startpos = i + 1

               /* Remove any pair of quotes around the field. */
               IF LEFT(piece, 1) == '"' & RIGHT(piece, 1) == '"' & piece \== '"' THEN
                  piece = SUBSTR(piece, 2, LENGTH(piece) - 2)

               /* Another field will be stored in our array, so increment the count. */
               count = count + 1

               /* Find any instances of double quote characters back to
                * back, and replace them with a single double quote.
                * NOTE: The literal strings below are delimited by single
                * quotes, so the double quote characters inside of them
                * are taken as-is and do not need to be doubled.
                *
                * Store the final text in the next field of our array.
                */
               CALL VALUE array || count, CHANGESTR('""', piece, '"')

            END /* The end of a field. */

         /* Do we have a quote char? If so, then toggle 'inside'. This
          * variable will be '1' when we're inside of a quoted string,
          * and '0' when we're outside of the quotes. We need to
          * know whether we're inside or out so that we know whether
          * any comma we encounter will be regarded as an embedded
          * comma (inside the quotes) or the end of a field (outside
          * quotes).
          */
         WHEN SUBSTR(orig, i , 1) = '"' THEN inside = 1 - inside

         /* If we're parsing in the middle of a field, just keep looking
          * for any quote characters inside of it, or a comma outside of
          * quotes (ie, the end of the field).
          */
         OTHERWISE NOP

      END /* SELECT */

   END /* All chars processed */

END /* Not a blank original string */

/* Store how many fields we have. */
CALL VALUE array || "0", count

/* If we got an odd number of quote characters, then something is
 * amiss with this string.
 */
IF inside THEN RETURN "The original line has an odd number of double quote characters!"

/* Done */
RETURN ""
