Discussion:
Definition of string length
August Karlstrom
2008-10-12 18:16:16 UTC
The Oberon language report, revision 27.7.2008, has the following
definition of the length of a string (section 3.4)*:

"The number of characters in a string (including the null character) is
called the length of the string."

On the other hand section 9.1.3 says

"Strings can be assigned to any array of characters, provided the length
of the string is less than that of the array."

If the string length includes the null character I think "less than"
should be replaced by "no greater than".

Comments?


August


*http://www.inf.ethz.ch/personal/wirth/Articles/Oberon/Oberon07.Report.pdf
B Smith-Mannschott
2008-11-02 11:35:45 UTC
Post by August Karlstrom
The Oberon language report, revision 27.7.2008, has the following
"The number of characters in a string (including the null character)
is called the length of the string."
That's an odd definition. This is how I've always thought of things:

VAR s: ARRAY m OF CHAR;
BEGIN s := "helloworld";

index: 0 1 2 3 4 5 . . . n . . . m
-------------------------------------------
array: h e l l o w . . . 0X . . . .

The array s has length m, that is, indexes 0 through m-1 are valid; all
others are out of bounds.

The string stored in s has length n-1. The string's contents are at
indices 0 through n-2 inclusive. The string's null terminator is at
index n-1.

An array must have at least length n to store the contents of string s.

So my "length" (of a string) would normally be n-1 in this example,
while the quoted standard considers the "length" to be n. Odd, but,
let's go with that and see where it leads...
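
To make the two counts concrete, here is a minimal sketch in C, chosen only because it uses the same null-terminated representation. The names content_length and report_length are invented for this illustration and appear in neither the report nor any Oberon library.

```c
#include <assert.h>
#include <stddef.h>

/* Usual definition: number of characters before the null terminator. */
size_t content_length(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/* Definition quoted from the report: the terminator is counted too. */
size_t report_length(const char *s)
{
    return content_length(s) + 1;
}
```

For "helloworld", content_length gives 10 and report_length gives 11, so under either convention the array s needs at least 11 elements.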
Post by August Karlstrom
On the other hand section 9.1.3 says
"Strings can be assigned to any array of characters, provided the
length of the string is less than that of the array."
If the string length includes the null character I think "less than"
should be replaced by "no greater than".
Yup. We could get a string with "length" n into an array of length "n"
since the report's definition of string length already allows for the
null.
Post by August Karlstrom
Comments?
I believe you are correct in suspecting an error here, though I
suspect the error lies in the definition of "length" above.

As I recall, Component Pascal went to considerable lengths to
distinguish between the length of the string contained within an array and
the length of the array as container. This is a distinction worth
making clearly.

// Ben
August Karlstrom
2008-11-03 22:49:00 UTC
Post by B Smith-Mannschott
Post by August Karlstrom
The Oberon language report, revision 27.7.2008, has the following
"The number of characters in a string (including the null character)
is called the length of the string."
That's an odd definition,
Exactly, it is not the usual definition in which the null character is
excluded.

[...]
Post by B Smith-Mannschott
I believe you are correct in suspecting an error here, though I
suspect the error lies in the definition of "length" above.
If you are used to thinking about string lengths with the null character
included, there is less risk of forgetting to make room for the null
character; with the new terminology the declaration

VAR s: ARRAY maxlen + 1 OF CHAR;

becomes

VAR s: ARRAY maxlen OF CHAR;

Maybe this was part of the motivation for the new definition.
Post by B Smith-Mannschott
As I recall, Component Pascal went to considerable lengths to
distinguish between the length of the string contained within an array and
the length of the array as container. This is a distinction worth
making clearly.
Of course.


August
Christian Luginbühl
2008-11-04 20:55:58 UTC
Post by August Karlstrom
Post by B Smith-Mannschott
That's an odd definition,
If you are used to thinking about string lengths with the null character
included, there is less risk of forgetting to make room for the null
character; with the new terminology the declaration
VAR s: ARRAY maxlen + 1 OF CHAR;
becomes
VAR s: ARRAY maxlen OF CHAR;
The very interesting point is that with this new definition, the
empty string ("") has a length of 1 (one).

Cool! Very odd indeed.
Post by August Karlstrom
Maybe this was part of the motivation for the new definition.
Probably. The result is at least ... questionable.

But it has one advantage: It is different from the definition in the "C" world ;-)

Christian.
August Karlstrom
2008-11-04 21:54:23 UTC
Post by Christian Luginbühl
The very interesting point is that with this new definition, the
empty string ("") has a length of 1 (one).
...which it really has - storage wise.


August
Christian Luginbühl
2008-11-05 21:37:35 UTC
Post by August Karlstrom
Post by Christian Luginbühl
The very interesting point is that with this new definition, the
empty string ("") has a length of 1 (one).
...which it really has - storage wise.
Well, for me, the length of a string comes from the model domain
and therefore should be defined as the number of characters in the
string. The terminator is _not_ part of the string.

I prefer to give the number of bytes required to store a particular
representation of a string a different name, e.g. allocation size. Given
the alignment properties of modern architectures, this might as well
differ from the actual number of bytes which will be allocated to
a particular string instance.


My Oberon is a little rusty, how do you determine the length of a string
at runtime, and how is it defined?

Christian.
August Karlstrom
2008-11-06 12:35:29 UTC
Post by August Karlstrom
Post by Christian Luginbühl
The very interesting point is that with this new definition, the
empty string ("") has a length of 1 (one).
...which it really has - storage wise.
Well, for me, the length of a string comes from the model domain and
therefore should be defined as the number of characters in the string.
I tend to agree.
Post by Christian Luginbühl
The terminator is _not_ part of the string.
It depends on how the concept "string" is defined. In general terms, a
string is a sequence of symbols from a predetermined alphabet. If A is
the set of ASCII characters we can define this alphabet as A or as A\{null}.
Post by Christian Luginbühl
I prefer to give the number of bytes required to store a particular
representation of a string a different name, e.g. allocation size. Given
the alignment properties of modern architectures, this might as well
differ from the actual number of bytes which will be allocated to
a particular string instance.
Indeed.
Post by Christian Luginbühl
My Oberon is a little rusty, how do you determine the length of a string
at runtime,
To determine the length of a string stored in a character array s we do

len := 0;
WHILE s[len] # 0X DO INC(len) END;

We could also use a library function such as Strings.Length. Note also
that there is no way (programmatically) to determine the length of a
string constant.
Post by Christian Luginbühl
and how is it defined?
Oberon, pre Oberon rev. 27.7.2008, has the usual definition of string
length - the number of characters before the null character.


August
Chris Burrows
2008-11-07 03:43:40 UTC
Post by August Karlstrom
Post by Christian Luginbühl
The terminator is _not_ part of the string.
It depends on how the concept "string" is defined. In general terms, a
string is a sequence of symbols from a predetermined alphabet. If A is the
set of ASCII characters we can define this alphabet as A or as A\{null}.
Indeed. In Oberon-07 the character set is defined to be the Latin-1 set
which *does* contain the null character. Consequently, from an entirely
logical, as opposed to intuitive, point of view, it could be argued that the
null terminator should be included when counting the number of characters in
a string. Just because everybody else does it differently does not mean that
they are right ;-)
Post by August Karlstrom
Post by Christian Luginbühl
My Oberon is a little rusty, how do you determine the length of a string
at runtime,
To determine the length of a string stored in a character array s we do
len := 0;
WHILE s[len] # 0X DO INC(len) END;
Note that the above would give a length which is one less than the length as
defined in the Oberon-07 report. If you include the null terminator in the
count a suitable Length function would be:

PROCEDURE Length(CONST s: ARRAY OF CHAR): INTEGER;
  VAR i: INTEGER;
BEGIN
  i := 0;
  (* advance to the null terminator *)
  WHILE s[i] # 0X DO INC(i) END;
  RETURN i + 1  (* count the terminator as well *)
END Length;
Post by August Karlstrom
We could also use a library function such as Strings.Length. Note also
that there is no way (programmatically) to determine the length of a
string constant.
I assume by this that you mean

a) there is no intrinsic (e.g. STRLEN) function and

b) you need to assign the string to an array before you can count the
characters

which is fair enough. However, using the above function it is possible to
write:

len := Length('ABC');

I inspected the ARM code generated by the Oberon-07 compiler and there is no
string copying involved - just address passing. To your mind does this count
as a programmatic way to determine the length of a string constant?
Post by August Karlstrom
Post by Christian Luginbühl
and how is it defined?
Oberon, pre Oberon rev. 27.7.2008, has the usual definition of string
length - the number of characters before the null character.
Well, not exactly ...

The earliest publicly available Oberon-07 Report that I have (Revision
1.12.2007) says:

"Every string is terminated by a (invisible) null character. The number of
characters (including the null character) in a string is called the length
of the string."

Neither the earlier Oberon (1990) nor Oberon-2 (1995) reports mention a null
character in the definition of string. The null character is only introduced
as a consequence of the process of assigning a string to an array of
characters.

Getting back to your question at the beginning of this whole discussion -
what has changed is that Oberon-07 Report Revision 1.12.2007 used to say:

9.1.3. Strings can be assigned to any array of characters, provided
the length of the string is not greater than that of the array.

whereas in the latest Revision 27.7.2008 (and some intermediate revisions)
it now says:

9.1.3. Strings can be assigned to any array of characters, provided
the length of the string is less than that of the array.

You may be interested to know that the implementation of Oberon-07 that I am
using does accept the following:

VAR
s: ARRAY 4 OF CHAR;
...
s := 'ABC';

However, it would not be reasonable to conclude from this simple experiment
that all conceivable possibilities might be correspondingly acceptable in
all implementations, either now or in the future. Having said that, I have
so far been unable to think of any technical reason why a potential Oberon-07
implementation might require the character array length (i.e. the number of
its elements) to be greater than the string length (including the null
character).
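
The difference between the two wordings can be sketched as a pair of checks. This is an illustration in C rather than Oberon, and the names report_length, fits_not_greater and fits_less_than are hypothetical; it models only the wording of 9.1.3 and says nothing about what any particular compiler does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length as defined in the Oberon-07 report: terminator included. */
static size_t report_length(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;
}

/* Revision 1.12.2007 wording: length "not greater than" that of the array. */
bool fits_not_greater(const char *s, size_t array_len)
{
    return report_length(s) <= array_len;
}

/* Revision 27.7.2008 wording: length "less than" that of the array. */
bool fits_less_than(const char *s, size_t array_len)
{
    return report_length(s) < array_len;
}
```

With s declared as ARRAY 4 OF CHAR and the string 'ABC' (length 4 by the report's definition), fits_not_greater("ABC", 4) holds while fits_less_than("ABC", 4) does not. That is the case accepted by the implementation mentioned above.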

--
Chris Burrows
Armaide: Oberon-07 ARM Development System for Windows
http://www.armaide.com
August Karlstrom
2008-11-07 14:33:09 UTC
Post by Chris Burrows
Post by August Karlstrom
Post by Christian Luginbühl
The terminator is _not_ part of the string.
It depends on how the concept "string" is defined. In general terms, a
string is a sequence of symbols from a predetermined alphabet. If A is the
set of ASCII characters we can define this alphabet as A or as A\{null}.
Indeed. In Oberon-07 the character set is defined to be the Latin-1 set
which *does* contain the null character. Consequently, from an entirely
logical, as opposed to intuitive, point of view, it could be argued that the
null terminator should be included when counting the number of characters in
a string. Just because everybody else does it differently does not mean that
they are right ;-)
The only restriction on a string, according to the Oberon-07 report, is
that it cannot contain both a double quote and a single quote. So with a
hex editor it would even be possible to include more than one null
character in a string. Though, when such a string is assigned to a
character array or compared to another string or character array I guess
only the characters up to the first null character will be considered.
Anyway, I think the report implicitly disallows more than one null
character in a string.
Post by Chris Burrows
Post by August Karlstrom
Post by Christian Luginbühl
My Oberon is a little rusty, how do you determine the length of a string
at runtime,
To determine the length of a string stored in a character array s we do
len := 0;
WHILE s[len] # 0X DO INC(len) END;
Note that the above would give a length which is one less than the length as
defined in the Oberon-07 report.
Indeed. I was thinking in Oberon pre Oberon-07. In both cases the null
character is used as a sentinel.
Post by Chris Burrows
If you include the null terminator in the count a suitable Length function
would be:
PROCEDURE Length(CONST s: ARRAY OF CHAR): INTEGER;
VAR
i: INTEGER;
BEGIN
i := 0;
WHILE s[i] # 0X DO INC(i) END;
RETURN i+1
END Length;
Or with one operation less:

i := 1;
WHILE s[i] # 0X DO INC(i) END;
RETURN i
Post by Chris Burrows
Post by August Karlstrom
We could also use a library function such as Strings.Length. Note also
that there is no way (programmatically) to determine the length of a
string constant.
I assume by this that you mean
a) there is no intrinsic (e.g. STRLEN) function and
b) you need to assign the string to an array before you can count the
characters
which is fair enough. However, using the above function it is possible to
write:
len := Length('ABC');
I inspected the ARM code generated by the Oberon-07 compiler and there is no
string copying involved - just address passing. To your mind does this count
as a programmatic way to determine the length of a string constant?
I was again thinking in Oberon pre Oberon-07. With the introduction of
constant parameters it really is possible (but not without using
procedures ;-). This is in my opinion the most important new feature in
Oberon-07.
Post by Chris Burrows
Post by August Karlstrom
Post by Christian Luginbühl
and how is it defined?
Oberon, pre Oberon rev. 27.7.2008, has the usual definition of string
length - the number of characters before the null character.
Well, not exactly ...
The earliest publicly available Oberon-07 Report that I have (Revision
1.12.2007) says:
"Every string is terminated by a (invisible) null character. The number of
characters (including the null character) in a string is called the length
of the string."
Neither the earlier Oberon (1990) nor Oberon-2 (1995) reports mention a null
character in the definition of string. The null character is only introduced
as a consequence of the process of assigning a string to an array of
characters.
To me, this seems to be the most sensible way to define strings and
their relation to character arrays.
Post by Chris Burrows
Getting back to your question at the beginning of this whole discussion -
what has changed is that Oberon-07 Report Revision 1.12.2007 used to say:
9.1.3. Strings can be assigned to any array of characters, provided
the length of the string is not greater than that of the array.
whereas in the latest Revision 27.7.2008 (and some intermediate revisions)
it now says:
9.1.3. Strings can be assigned to any array of characters, provided
the length of the string is less than that of the array.
The first version is the correct one. I have been in contact with Prof.
Wirth himself and he confirms it.


August
Chris Burrows
2008-11-08 00:38:46 UTC
Post by August Karlstrom
If you include the null terminator in the count a suitable Length
function would be:
PROCEDURE Length(CONST s: ARRAY OF CHAR): INTEGER;
VAR
i: INTEGER;
BEGIN
i := 0;
WHILE s[i] # 0X DO INC(i) END;
RETURN i+1
END Length;
i := 1;
WHILE s[i] # 0X DO INC(i) END;
RETURN i
No! It does not allow for the empty string:

i := Length("");
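
A sketch of why the shortcut fails, written in C with hypothetical names length_from_zero and length_from_one. The capacity parameter is an addition of this sketch, there only to keep the C version from reading past the buffer; it is not part of either Oberon fragment.

```c
#include <assert.h>
#include <stddef.h>

/* Scans from index 0 and counts the terminator, like the i+1 version. */
size_t length_from_zero(const char *s, size_t capacity)
{
    assert(s != NULL);
    size_t i = 0;
    while (i < capacity && s[i] != '\0') {
        i++;
    }
    return i + 1;
}

/* "One operation less" variant: starts at index 1, so s[0] is never tested. */
size_t length_from_one(const char *s, size_t capacity)
{
    assert(s != NULL);
    size_t i = 1;
    while (i < capacity && s[i] != '\0') {
        i++;
    }
    return i;
}
```

If a 4-element array that previously held other text has its first element set to the terminator, so that it holds 0X, 'x', 'y', 0X, then length_from_zero returns 1 but length_from_one returns 3, because element 0 is never examined. Tracing by hand also gives 4 versus 3 for "ABC", so the shortcut appears to differ for non-empty strings as well.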
Post by August Karlstrom
I was again thinking in Oberon pre Oberon-07. With the introduction of
constant parameters it really is possible (but not without using
procedures ;-). This is in my opinion the most important new feature in
Oberon-07.
It certainly contributes to the aim of writing efficient and secure
software. I find that more often than not in my recent work CONST
parameters are more appropriate than value parameters.
Post by August Karlstrom
9.1.3. Strings can be assigned to any array of characters,
provided the length of the string is not greater than that of the array.
whereas in the latest Revision 27.7.2008 (and some intermediate
revisions) it now says:
9.1.3. Strings can be assigned to any array of characters,
provided the length of the string is less than that of the array.
The first version is the correct one. I have been in contact with Prof.
Wirth himself and he confirms it.
Good - thank you for confirming that.

--
Chris Burrows
Armaide: Oberon-07 ARM Development System for Windows
http://www.armaide.com
August Karlstrom
2008-11-08 17:09:55 UTC
Post by Chris Burrows
Post by August Karlstrom
i := 1;
WHILE s[i] # 0X DO INC(i) END;
RETURN i
i := Length("");
Ah, of course. How stupid of me. (A note to myself: Never post code that
you have not tested.)
Post by Chris Burrows
Post by August Karlstrom
I was again thinking in Oberon pre Oberon-07. With the introduction of
constant parameters it really is possible (but not without using
procedures ;-). This is in my opinion the most important new feature in
Oberon-07.
It certainly contributes to the aim of writing efficient and secure
software. I find that more often than not in my recent work CONST
parameters are more appropriate than value parameters.
It also contributes to clarity - with previous versions of Oberon the
intention of making a parameter a variable parameter can be either to
enable modification of it or just to prevent unnecessary copying of data.
Post by Chris Burrows
Post by August Karlstrom
Post by Chris Burrows
9.1.3. Strings can be assigned to any array of characters,
provided the length of the string is not greater than that of the array.
whereas in the latest Revision 27.7.2008 (and some intermediate
revisions) it now says:
9.1.3. Strings can be assigned to any array of characters,
provided the length of the string is less than that of the array.
The first version is the correct one. I have been in contact with Prof.
Wirth himself and he confirms it.
Good - thank you for confirming that.
Hopefully Prof. Wirth will revise the report soon. There is also a
mistake in the example in section 11 where LONGINT is used instead of
INTEGER. He knows about this too.


August
August Karlstrom
2008-11-08 19:14:42 UTC
Post by August Karlstrom
The only restriction on a string, according to the Oberon-07 report, is
that it cannot contain both a double quote and a single quote.
...and not line feed, carriage return or form feed either.


August
