Functions on strings
Introduction
Character strings (text, string) can occur as part of a device protocol, e.g. as an error message from a subsystem or when measurement data are transmitted via a serial interface. Many web-based communication protocols are also based on strings.
The math module supports the scalar type string, which typically contains characters encoded with UTF-8.
All data types can be converted into the string type, for example with str(). Strings can likewise be converted into the corresponding target types with the other cast operators (e.g. dbl()), provided that no syntactical errors occur.
Operators on text
The following operators are supported with the String data type:
- all comparison operators; the numerical order of the ASCII characters applies, e.g. '4' < 'A' < 'a'
- the + operator, which concatenates two strings: s_connected = s1 + s2;
Tip: str(x) converts any data type into a string representation, including multidimensional objects such as vectors and matrices. For more control over the formatting, the function sFormat(f, ...) can be helpful.
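The ASCII ordering used by the comparison operators can be checked with any language that compares strings by character code. The following Python lines are only an illustration and are not math-module syntax.

```python
# Illustration in Python: string comparison by character code.
codes = [ord(ch) for ch in "4Aa"]
print(codes)               # [52, 65, 97]
print("4" < "A" < "a")     # True
print("Zebra" < "apple")   # True, upper-case letters sort before lower-case ones
```

A practical consequence is that a plain comparison sorts all upper-case letters before all lower-case letters.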
Properties and basic functions
Length of a character string "sLength"
The length of a character string s in bytes is calculated as follows
l = sLength(s);
In UTF-8 encoding, special characters can also occupy several bytes with the appropriate encoding. This function determines the number of bytes, not the number of (printable) characters!
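The difference between bytes and characters can be made visible with a short Python illustration. The helper name utf8_byte_length is hypothetical and only mirrors what sLength is described to return; it is not part of the math module.

```python
# Illustration in Python: bytes versus characters in UTF-8.
def utf8_byte_length(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))

for sample in ["abc", "°C", "Größe", "€"]:
    # columns: text, number of characters, number of UTF-8 bytes
    print(sample, len(sample), utf8_byte_length(sample))
# abc 3 3
# °C 2 3
# Größe 5 7
# € 1 3
```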
Extraction of a substring as a copy "sCopy"
A substring r of the string s can be extracted as follows
r = sCopy(s, b); // Copy of the character string s from position b
r = sCopy(s, b, c); // Copy of the string s from position b of length c
r = sCopy(s, BC); // BC is a vector of length 2 that contains b and c (behavior as with sCopy(s, b, c))
r = sCopy(s, BC, sel); // BC is a matrix with two columns corresponding to b and c,
// where sel represents a selective row index
In many places, the matrix BC is the result of search functions, e.g. sFind(), which can return several hits in the string. The corresponding text part can be extracted directly via the index sel.
Example:
// 0         1         2         3
// 0123456789012345678901234567890123456789
geo = 'lat: 51.234567, long: 12.3456789';
p6 = sFind(geo, 'lat:\s*([0-9.+-]+)', {subex: true});
// => p6 := [ [ 0,14] // refers to 'lat: 51.234567'
// , [ 5, 9]]; // refers to '51.234567'
lat = dbl(sCopy(geo, p6, 1));
// => lat := 51.234567;
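The position/length convention of the matrix BC can be modelled in Python as a rough sketch. The functions find_subex and copy_row are hypothetical stand-ins for the subex mode of sFind() and for sCopy(s, BC, sel); they assume zero-based positions and count characters, which coincides with bytes only for pure ASCII text.

```python
import re

def find_subex(s, pattern):
    """Return [[start, length], ...] for the full match and each (...) group, or -1."""
    m = re.search(pattern, s)
    if m is None:
        return -1
    return [[m.start(g), m.end(g) - m.start(g)] for g in range(m.re.groups + 1)]

def copy_row(s, bc, sel):
    """Extract the text part described by row sel of a position/length matrix."""
    start, length = bc[sel]
    return s[start:start + length]

geo = "lat: 51.234567, long: 12.3456789"
p6 = find_subex(geo, r"lat:\s*([0-9.+-]+)")
lat = float(copy_row(geo, p6, 1))
print(p6, lat)   # [[0, 14], [5, 9]] 51.234567
```

Row 0 describes the complete match and row 1 the first bracketed sub-expression, which is why sel = 1 selects the number.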
Extraction of the start and end of the string "sLeft", "sRight"
The beginning b and the end e of the string s, each of length l, can be obtained as copies as follows
b = sLeft(s,l);
e = sRight(s,l);
Remove a substring "sErase"
To obtain a copy of a string s in which a substring has been removed, the following can be used
r = sErase(s, b); // Copy of the character string s without characters from position b
r = sErase(s, b, c); // Copy of the string s without characters from position b of length c
r = sErase(s, BC); // BC is a vector of length 2 that contains b and c (behavior as with sErase(s, b, c))
r = sErase(s, BC, sel); // BC is a matrix with two columns corresponding to b and c,
// where sel represents a selective row index
In many places, the matrix BC is the result of search functions, e.g. sFind(), which can return several hits in the string. The corresponding text part can be removed directly via the index sel.
Insert a string "sInsert"
To insert a string t within a string s at position b and return a copy r, the following can be used
r = sInsert(s, b, t);
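As a plain-Python sketch, the basic functions of this section correspond to simple slicing. The names s_left, s_right, s_erase and s_insert are hypothetical models, not the math-module implementation. They assume zero-based positions, as in the sFind() examples, and they count characters rather than bytes.

```python
def s_left(s, l):
    """First l characters of s."""
    return s[:l]

def s_right(s, l):
    """Last l characters of s (max() avoids a negative start for l > len(s))."""
    return s[max(len(s) - l, 0):]

def s_erase(s, b, c=None):
    """Copy of s without c characters starting at b (to the end if c is omitted)."""
    return s[:b] if c is None else s[:b] + s[b + c:]

def s_insert(s, b, t):
    """Copy of s with t inserted at position b."""
    return s[:b] + t + s[b:]

s = "Hello world"
print(s_left(s, 5))          # Hello
print(s_right(s, 5))         # world
print(s_erase(s, 5))         # Hello
print(s_erase(s, 0, 6))      # world
print(s_insert(s, 5, ","))   # Hello, world
```

All four helpers return a new string and leave s unchanged, matching the "copy" wording used above.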
Create, format, read, clean up
Extraction of numerical values from a string "sScan"
This function is in preparation.
To extract numerical values from a string s and return a vector v of these values, the following can be used
v = sScan(f, s);
Here, f represents a format string of the type scanf().
Formatting of a string "sFormat"
The following function can be used to format values, e.g. numerical values, into a string r according to the format string f
r = sFormat(f, ...);
Here, f represents a format string of the type printf(). Each placeholder has the form %[<width>][.<precision>]<type>, and a corresponding argument must be specified for each placeholder.
- <width>: optional field width for the output; width < 0: left-aligned
- .<precision>: optional for f/g/e; number of decimal places or significant digits
- <type>: the arguments are converted according to the selected output type. Additional length specifications for the data type are therefore not required.

<type> | Data type | Output |
---|---|---|
b | <bool> | Output of 'false' or 'true' |
d | <int> | Signed integer, base 10 |
u | <uint> | Unsigned integer, base 10 |
x, X | <uint> | Unsigned integer, base 16, with lower-case (x) or upper-case (X) digits A-F |
o | <uint> | Unsigned integer, base 8 |
f | <dbl> | Floating point number without 10-exponent. Example: 1234.5678 |
g | <dbl> | Floating point number in optimized representation, like f or e depending on the order of magnitude of the number. Examples: 1234.45678, 1.2345e9, 1.2345e-12 |
e | <dbl> | Floating point number with 10's exponent. Example: 1.2345678e3 |
s | <str> | Text |
Example:
template = 'alt: %6.2f, lat: %.9f, lon: %.9f';
tx = sFormat(template, 140.4, 49.8765432, -3.14);
// tx := 'alt: 140.40, lat: 49.876543200, lon: -3.140000000';
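Because the placeholders follow the printf() conventions, the effect of width and precision can be tried out with Python's %-operator, which uses the same notation. This is only an illustration. Details such as the b type or the exact exponent layout are specific to sFormat() and are not reproduced here.

```python
# Illustration in Python: printf-style placeholders.
template = "alt: %6.2f, lat: %.9f, lon: %.9f"
tx = template % (140.4, 49.8765432, -3.14)
print(tx)   # alt: 140.40, lat: 49.876543200, lon: -3.140000000

print("[%8.3f]" % 3.14159)          # [   3.142]   width 8, 3 decimal places
print("[%-8d]" % 42)                # [42      ]   negative width: left-aligned
print("%x %X %o" % (255, 255, 8))   # ff FF 10
print("%g" % 1234.5678)             # 1234.57      6 significant digits by default
```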
Access to the content of a resource "sResource"
The function returns the content of the designated resource as a constant (binary) string.
resStr = sResource({@ref:'myResource'});
Depending on the configuration, the content of the resource may not only contain pure UTF-8 text (e.g. an INI file or a JSON object), but also binary data such as double arrays, jpg images, etc. Binary data in particular may only be further processed by suitable special functions. Please contact us for the implementation of such functions.
Remove whitespace "sTrim"
This function removes all whitespace at the beginning and end of its argument, e.g.
s_trimmed = sTrim(s);
Conversion to upper/lower case "sUpper", "sLower"
The conversion of a string with regard to upper/lower case is carried out using
s_uppercase = sUpper(s);
s_lowercase = sLower(s);
Simplification "sSimplify" of strings
The function sSimplify() replaces all multiple occurrences of white space (spaces, tabs, CR, LF, ...) with a single space.
s_simple = sSimplify(s);
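A minimal Python model of this behaviour is a single regular-expression substitution. The function simplify is a hypothetical stand-in. It assumes that a single tab or line break is also turned into a space and that leading and trailing white space is collapsed but not removed, neither of which the description above states explicitly.

```python
import re

def simplify(s):
    """Collapse every run of whitespace (spaces, tabs, CR, LF) into one space."""
    return re.sub(r"\s+", " ", s)

print(repr(simplify("a \t b\r\n\r\nc")))   # 'a b c'
print(repr(simplify("  padded  ")))         # ' padded '
```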
Normalization "sNormalize" of strings
The function sNormalize() removes matching quotation marks "..." or '...' at the beginning and end of the string and replaces all backslash (\) escape sequences with their designated code.
s_normalized = sNormalize(s);
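The two steps, removing quotation marks and resolving escapes, can be sketched in Python as follows. The function normalize and the ESCAPES table are hypothetical. In particular, the set of escape sequences that sNormalize() supports is not listed here, so the table below is an assumption limited to a few common ones.

```python
import re

# Assumed subset of escape sequences; the real list may differ.
ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\", '"': '"', "'": "'"}

def normalize(s):
    # Strip one pair of matching quotation marks, if present.
    if len(s) >= 2 and s[0] == s[-1] and s[0] in "\"'":
        s = s[1:-1]
    # Replace each known backslash sequence; unknown ones are left untouched.
    return re.sub(r"\\(.)", lambda m: ESCAPES.get(m.group(1), m.group(0)), s)

print(repr(normalize('"line1\\nline2"')))   # 'line1\nline2'
print(repr(normalize("'it\\'s'")))          # "it's"
```

Quotation marks that do not match, e.g. a leading " with a trailing ', are left in place by this sketch.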
Search, find and disassemble
Search for substrings "sFind"
The search function sFind returns the index r, or a matrix with the position and length of each text part found, for the pattern p in the string s. It can be used as follows
r1 = sFind(s, {...}); // configuration of the 'pattern' is mandatory
r1 = sFind(s,p); // direct search from the start of the string
r2 = sFind(s,p,b); // direct search from position b
// Optional configuration object for all variants
rx = sFind(..., { pattern: <string>
, case: <bool>
, all: <bool>
, regex: <bool>
, subex: <bool>
});
If the pattern is not found, this function returns -1.
A matrix with the search results has one row per hit; each row contains the start position s_k and the length c_k of the text part found.
This matrix can be passed directly, together with the row index of the individual result, to the functions sCopy() or sErase().
A single result can be further processed with the function GetRow() or by directly specifying the indexing:
part4 = sCopy(s, rx, 3); // counting from zero
part4a = sCopy(s, rx[3,0], rx[3,1]); // equivalent
part4_sc = GetRow(rx, 3); // vector [s3, c3]
part4b = sCopy(s, part4_sc); // equivalent
Property | Value | Description |
---|---|---|
pattern | <str> | Constant search pattern if parameter p is not used. It is somewhat more performant for regex modes, as the expression does not have to be translated again and again. |
case | <bool> | Case-sensitive search (def.: false, i.e. case-insensitive) |
all | <bool> | Returns a matrix with the start positions and lengths of all matching text parts; cannot be combined with subex |
regex | <bool> | Interprets pattern or p as a regular expression; the result is a matrix with the position and length of the expressions/sub-expressions found |
subex | <bool> | Returns position and length of the extracted text parts in a result matrix, sets regex automatically |
Examples
// 0         1         2         3
// 0123456789012345678901234567890123456789
str = 'Hello world and hello my dear friends!';
p0 = sFind(str, "dog");
// => p0 := -1; // not found, independent of selected modes
// test with (p0 < 0) ? ... : ...
// or isMatrix(p0) ? ... : ... for 'all'- or 'regex'-modes
p1 = sFind(str, "hello");
// => p1 := 0;
p2 = sFind(str, "hello", {case: true});
// => p2 := 16; // first one is now skipped
p3 = sFind(str, "hello", {all: true});
// => p3 := [ [ 0, 5]
// , [16, 5]];
p4 = sFind(str, "(and|dear)", { regex: true, all: true});
// => p4 := [ [12, 3]
// , [25, 4]];
p5 = sFind(str, 'dear\s+(\w+)', { subex: true });
// => p5 := [ [25, 12] // first row is complete match
// , [30, 7]]; // then (...) extractions follow
dear= sCopy(str, p5, 1);
// => dear:= 'friends';
// 0         1         2         3
// 0123456789012345678901234567890123456789
geo = 'lat: 51.234567, long: 12.3456789';
p6 = sFind(geo, 'lat:\s*([0-9.+-]+)', {subex: true});
// => p6 := [ [ 0,14]
// , [ 5, 9]];
lat = dbl(sCopy(geo, p6, 1));
// => lat := 51.234567;
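The results for the 'all' mode can be reproduced with a small Python model. The function find_all is hypothetical and mirrors only what the examples show: case-insensitive matching by default, one [start, length] row per hit, and -1 when nothing is found. Its keyword arguments are modelled on the configuration properties and are not the real call syntax.

```python
import re

def find_all(s, pattern, regex=False, case=False):
    """Return [[start, length], ...] for every hit, or -1 if there is none."""
    flags = 0 if case else re.IGNORECASE
    compiled = re.compile(pattern if regex else re.escape(pattern), flags)
    hits = [[m.start(), m.end() - m.start()] for m in compiled.finditer(s)]
    return hits if hits else -1

text = "Hello world and hello my dear friends!"
print(find_all(text, "dog"))                      # -1
print(find_all(text, "hello"))                    # [[0, 5], [16, 5]]
print(find_all(text, "hello", case=True))         # [[16, 5]]
print(find_all(text, "(and|dear)", regex=True))   # [[12, 3], [25, 4]]
```

Since the return value is either a number or a matrix, callers need a type or sign check before indexing, as the comment on p0 suggests.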
Replace substrings "sReplace"
To replace one or more character strings t_k within the character string s with the corresponding replacement text s_k, the following can be used
r = sReplace(s,t_1,s_1, ..., t_N,s_N);
Here, a sequential replacement takes place in the order of the arguments, and the values s_k are implicitly converted into a text representation. The sFormat() function can be used for more control during conversion.
Example:
template = 'alt: <alt>, lat: <lat>, lon: <lon>';
tx = sReplace(template, '<alt>', 140.4, '<lat>', 49.8765432, '<lon>', -3.14);
// tx := 'alt: 140.400000, lat: 49.876543, lon: -3.140000';
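The effect of sequential replacement can be sketched in Python. The function replace_sequential is a hypothetical model. It assumes that every occurrence of a target is replaced and that floating point values are converted with six decimal places, which reproduces the output shown above; the actual conversion rule is not specified here.

```python
def replace_sequential(s, *pairs):
    """Apply (target, value) replacements one after another, in argument order."""
    if len(pairs) % 2 != 0:
        raise ValueError("expected target/value pairs")
    for target, value in zip(pairs[0::2], pairs[1::2]):
        # Assumed conversion: floats with six decimals, everything else via str().
        text = "%f" % value if isinstance(value, float) else str(value)
        s = s.replace(target, text)
    return s

template = "alt: <alt>, lat: <lat>, lon: <lon>"
tx = replace_sequential(template, "<alt>", 140.4, "<lat>", 49.8765432, "<lon>", -3.14)
print(tx)   # alt: 140.400000, lat: 49.876543, lon: -3.140000

# Order matters: the second pair also rewrites text produced by the first.
print(replace_sequential("a-b", "a", "b", "b", "c"))   # c-c
```

Placeholders such as <alt> that cannot occur in the replacement values avoid this kind of unintended chaining.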
Search for key-value combination "sGetKV"
This function is in preparation.
This function searches a string (text block) for a key-value pair and returns the value.
v1 = sGetKV(s, k);
// Optional configuration block
vx = sGetKV(..., { format: <enum>
, assign: <str>
, ending: <str>
, quotes: <bool>
, trim: <bool>
});
Property | Value | Description |
---|---|---|
format | <enum> | This input defines the format of the string. Possible values are: json, ini, csv |
assign | <regex> | separator between key and value, (def.: : ) |
ending | <regex> | separator between key-value entries, (def.: ,] ) |
quotes | <bool> | Automatic removal of quotation marks from keys and values. (def.: true) |
trim | <bool> | Automatic removal of white space at the beginning and end of a value (def.: true) |
Splitting character strings "sSplit"
The sSplit function can be used to split a character string at a character ch. It returns a two-column matrix containing the start position and length of each resulting substring
r1 = sSplit(s, ch);
// Optional configuration block
rx = sSplit(..., { trim: <bool>
});
pos = rx[0, 0];
len = rx[0, 1];
Property | Value | Description |
---|---|---|
trim | <bool> | Removes white space at the beginning and end of elements |
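How such a position/length matrix comes about, and how trim changes it, can be sketched in Python. The function split_positions is a hypothetical model with zero-based character positions. It assumes that trim narrows each row to the element without its surrounding white space.

```python
def split_positions(s, ch, trim=False):
    """Return [[start, length], ...] for the parts of s separated by ch."""
    rows, start = [], 0
    for part in s.split(ch):
        pos, length = start, len(part)
        if trim:
            # Skip leading white space and shorten to the stripped element.
            pos += len(part) - len(part.lstrip())
            length = len(part.strip())
        rows.append([pos, length])
        start += len(part) + len(ch)
    return rows

record = "a, b ,c"
print(split_positions(record, ","))              # [[0, 1], [2, 3], [6, 1]]
print(split_positions(record, ",", trim=True))   # [[0, 1], [3, 1], [6, 1]]
```

Each row can then be used to slice the original string, in the same way as the result matrix of sFind() is passed to sCopy().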
Number of lines within a string "sLines"
The number of lines in a string s can be determined as follows
lines = sLines(s);
All line break variants are supported (LF, CRLF, LFCR, CR).
Extraction of a line from a multi-line string "sLine"
The following function can be used to extract the i-th line from a string s
lineI = sLine(s,i);
// Optional configuration object
lineX = sLine(..., { trim: <bool>
});
Property | Value | Description |
---|---|---|
trim | <bool> | Removes white space at the beginning and end of the line |
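Line counting and line extraction with mixed line-break variants can be modelled in Python with one regular expression. The names line_count and line_at are hypothetical. The sketch assumes a zero-based line index and treats text after a final line break as a further (possibly empty) line; how sLines() and sLine() handle these two points is not stated here.

```python
import re

# CRLF and LFCR must be tried before the single-character variants.
LINE_BREAK = re.compile(r"\r\n|\n\r|\r|\n")

def lines_of(s):
    return LINE_BREAK.split(s)

def line_count(s):
    return len(lines_of(s))

def line_at(s, i, trim=False):
    line = lines_of(s)[i]
    return line.strip() if trim else line

text = "first\r\n  second \nthird\rfourth\n\rfifth"
print(line_count(text))               # 5
print(repr(line_at(text, 1)))         # '  second '
print(repr(line_at(text, 1, True)))   # 'second'
```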
Read in complete lines "sGetLine"
The sGetLine() function collects the individual values from the string channel s and buffers them until a complete line can be output. That line is then removed from the buffer, and accumulation continues with the remainder.
To obtain a complete line from a stream of characters, the following can be used
line1 = sGetLine(s);
// Optional configuration object
lineX = sGetLine(..., { trim: <bool>
, eoln: <str>
, timeout: <dbl>
});
Property | Value | Description |
---|---|---|
trim | <bool> | The output line is additionally stripped of white space at the beginning and end |
eoln | <str> | Definition of the line terminator |
timeout | <dbl> | Time in seconds after the last fragment until a line end is inserted. |
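The buffering behaviour described for sGetLine() can be illustrated with a small Python class. LineBuffer and its feed method are hypothetical. The sketch covers only the trim and eoln options and leaves out the timeout, and it returns None while no complete line is available, which is an assumption about how "no line yet" is signalled.

```python
class LineBuffer:
    """Accumulates text fragments and hands out one complete line at a time."""

    def __init__(self, eoln="\n", trim=False):
        self.eoln = eoln
        self.trim = trim
        self.buffer = ""

    def feed(self, fragment):
        """Append a fragment; return a complete line, or None if none is ready yet."""
        self.buffer += fragment
        line, found, rest = self.buffer.partition(self.eoln)
        if not found:
            return None
        self.buffer = rest   # keep the remainder for the next line
        return line.strip() if self.trim else line

buf = LineBuffer(eoln="\r\n", trim=True)
results = [buf.feed(f) for f in ["TEMP ", "21.5\r", "\nHUM 4", "0\r\n"]]
print(results)   # [None, None, 'TEMP 21.5', 'HUM 40']
```

Note that the line terminator itself may be split across two fragments, as in the second and third fragment above, which is why the search runs over the whole buffer and not over the newest fragment only.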