How to identify proper substring length


I'm trying to read column values from this file starting at the arrow position:
Here's my error:
I'm guessing it's because the length values are wrong.
Say I have a column with the value "Dog   ": the word Dog followed by a few spaces. Do I have to set the length parameter to 3 (for Dog), or can I set it to 6 to accommodate the spaces after Dog? I ask because each column has a fixed length. As you can see, some words are shorter than others, and to be consistent I just want to set the length to the maximum column length. For example, 28 is the length of the 3rd column of my file, but not all 28 positions are used every time: the word client is only 6 characters long.

Robert Levy's answer is correct for the issue you're seeing - you've attempted to pull a substring from a string with a starting position that is greater than the length of the string.
You're parsing a fixed-length field file, where each field has a certain amount of characters, whether or not it uses all of them, and the pos and len arrays are intended to define those field lengths for use with Substring. As long as the line you're reading matches the expected field starts and lengths, you will be ok. As soon as you come to a line that doesn't match (for example, what appears to be the totals line - 0TotalRecords: 3,390,315) the field length definitions you've been using won't work, as the format has changed (and the line length may not even be the same).
There are a couple of things I would change to make this work. First, I would change your pos and len arrays so that they take the entirety of the field, not part of it. You can use Trim() to get rid of any leading or trailing blanks. As defined, your first field will only take the last number of the Seq# (pos 4, len 1), and your second field will only take the first 5 characters of the field, even though it appears to have space for ~12 characters.
Take a look at this (it's hard to be exact working from the picture, but for purposes of demonstration it will work):
          1         2         3         4
01234567890123456789012345678901234567890
Seq#  Field       Description
    3 BELNR       ACCOUNTING DOCUMENT NBR
The numbers are the position of each character in the line. I would define the pos array to be the start of the field (0 for the first field, and then the position of the first letter of the field heading for each field after that), so you would have:
Seq# = 0
Field = 6
Description = 18
The len array would hold the length of the field, which I would define as the amount of characters up to the beginning of the next field, like this:
Seq# = 6
Field = 12
Description = 28 (using what you have, as it is hard to tell from the picture)
This would make your array initialization the following:
int[] pos = new int[3] { 0, 6, 18 };
int[] len = new int[3] { 6, 12, 28 };
If you wanted the fourth field, it would start at position 46 (pos 18 + len 28 = 46).
The second thing is I would check in the loop to see if the Total Records line is there, and skip that line (most likely it's the last line):
foreach (string line in textBox1.Lines)
{
    if (!line.Contains("Total Records"))
    {
        // Pull each fixed-width field out of the line
        for (int j = 0; j < pos.Length; j++)
        {
            val[j] = line.Substring(pos[j], len[j]).Trim();
        }
    }
}
Another way to do this would be to modify the original query and add a TakeWhile clause to it to only take lines until you hit the Total Records one:
string[] lines = File.ReadAllLines(ofd.FileName).Skip(8)
.TakeWhile(l => !l.Contains("Total Records")).ToArray();
The above would skip the first 8 lines and take all the remaining lines up to, but not including, the first line to contain "Total Records" in the string.
Then you could do something like this:
string[] lines = File.ReadAllLines(ofd.FileName).Skip(8)
.TakeWhile(l => !l.Contains("Total Records")).ToArray();
textBox1.Lines = lines;
string[] val = new string[3];
int[] pos = new int[3] { 0, 6, 18 };
int[] len = new int[3] { 6, 12, 28 };
foreach (string line in textBox1.Lines)
{
    // Pull each fixed-width field out of the line
    for (int j = 0; j < pos.Length; j++)
    {
        val[j] = line.Substring(pos[j], len[j]).Trim();
    }
}
Now you don't have to check for the "Total Records" line.
Of course, if there are other lines in your file, or there are records after the "Total Records" line (which I rather doubt) you'll have to handle those cases as well.
In short, the code for pulling out the substrings will only work for lines that match that particular format (or more specifically, have fields that match those positions/lengths). Anything outside of that will either give you incorrect values or throw an error (if the start position is greater than the length of the string, or the start plus the length runs past its end).
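One way to keep Substring from throwing on short or odd lines is to clamp the requested range to what the line actually contains. This is only a sketch: the FixedWidth.Field helper and the sample line are made up for illustration, and the positions and lengths are the estimates from above, not values confirmed against the real file.

```csharp
using System;

static class FixedWidth
{
    // Hypothetical helper: returns the trimmed field at [start, start + length),
    // or "" when the line is too short to contain any part of it.
    public static string Field(string line, int start, int length)
    {
        if (start >= line.Length) return string.Empty;
        int available = Math.Min(length, line.Length - start);
        return line.Substring(start, available).Trim();
    }

    static void Main()
    {
        int[] pos = { 0, 6, 18 };   // estimated field starts
        int[] len = { 6, 12, 28 };  // estimated field lengths
        string line = "    3 BELNR       ACCOUNTING DOCUMENT NBR"; // assumed sample row
        for (int j = 0; j < pos.Length; j++)
            Console.WriteLine("[" + Field(line, pos[j], len[j]) + "]");
    }
}
```

With this approach a line such as the totals line yields empty or partial fields instead of an exception, so you would still want to skip or validate lines that do not follow the column layout.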

That exception is complaining about the first parameter, which suggests that your file contains a row that is shorter than 18 characters.

Related

Filtering on full string match but not on substrings

So I've got a long string of numbers and characters and I'd like to filter out a substring. The thing I'm struggling with is that I need a full match on a certain value (starting with S) but this may not be matched in another value.
Input:
S10 1+0000000297472+00EURS100 1+0000000297472+00EURS1023P 1+0000000816072+00EUR
The input is exactly like this.
Breakdown of input:
S10 1+0000000297472+00EUR
Every part starts with a tag S and ends with EUR
There are spaces in between because every part has a fixed length
=>
index 0 : tag 'S' with length 1
index 1 : code with length 7
index 8 : numbertype with length 1
index 9 : sign with length 1
index 10 : value with length 13
index 23 : sign with length 1
index 24 : exponent with length 2
index 26 : unit with length 3
I need to match on for example S10 and I only want this substring till EUR. I don't want it to match on S100 or S1023P or any other combination. Only on exactly S10
Output:
S10 1+0000000297472+00EUR
I'm trying to use Regex to find my match on 'S + code'. I'm doing a full match on my search query and then as soon as anything follows I don't want it anymore. But doing it like this also discards the actual match as after the S10 the value will follow which will match with [^\d|^\D])+\w
foreach (var field in fieldList)
{
    var query = "S" + field.BallanceCode;
    var index = Regex.Match(values, Regex.Escape(query) + @"([^\d|^\D])+\w").Index;
}
For example when looking for S10
needs to match:
S10 1+0000000297472+00EUR
may not match:
S10/15 1+0000001748447+00EUR
S1023P 1+0000000816072+00EUR
S10000001+0000000546546+00EUR
Update:
Using this code
var index = Regex.Match(values, Regex.Escape(query) + @"\p{Zs}.*?EUR").Index;
will yield S10, S10/15, etc. when looked for. However, looking for S1000000 in the string doesn't work, because there is no whitespace between the code and 1+
S10000001+0000000546546+00EUR
For example when looking for S1000000
needs to match:
S10000001+0000000297472+00EUR
may not match:
S10 1+0000001748447+00EUR
S1023P 1+0000000816072+00EUR
S10/15 1+0000000546546+00EUR

You can use a regex that requires a space (or whitespace) to appear right after the field.BallanceCode:
var index = Regex.Match(values, Regex.Escape(query) + (field.BallanceCode.Length < 7 ? @"\p{Zs}" : "") + ".*?EUR").Index;
The regex will match the S10, then any horizontal whitespace (\p{Zs}), then any 0 or more characters other than a newline (as few as possible due to *?) up to the first EUR.
The (field.BallanceCode.Length < 7 ? @"\p{Zs}" : "") check is necessary to support a 7-digit BallanceCode. If it contains 7 digits or more, we do not check if there is a whitespace after it. If the length is less than 7, we check for a space.
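Because every part has a fixed length, an alternative to the regex is to walk the input in 29-character records and compare the 7-character code field exactly. The sketch below assumes the real input keeps its padding spaces (they appear collapsed in the post) and that records follow each other without separators; RecordFinder and Find are hypothetical names.

```csharp
using System;

static class RecordFinder
{
    // 1 + 7 + 1 + 1 + 13 + 1 + 2 + 3, taken from the field table in the question
    const int RecordLength = 29;
    const int CodeLength = 7;

    // Returns the record whose code field equals the given code exactly,
    // or null when no record matches.
    public static string Find(string values, string code)
    {
        for (int i = 0; i + RecordLength <= values.Length; i += RecordLength)
        {
            string record = values.Substring(i, RecordLength);
            if (record[0] == 'S' && record.Substring(1, CodeLength).TrimEnd() == code)
                return record;
        }
        return null;
    }

    static void Main()
    {
        // Sample data padded to the fixed widths (an assumption about the real input).
        string values = "S10     1+0000000297472+00EUR"
                      + "S1023P  1+0000000816072+00EUR"
                      + "S10000001+0000000546546+00EUR";
        Console.WriteLine(Find(values, "10"));
        Console.WriteLine(Find(values, "1000000"));
    }
}
```

Comparing the whole code field sidesteps the special case for 7-character codes, since S10 can never be confused with S100 or S1000000 when the padding is part of the comparison.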

So you just want the start (S...) and end (...EUR) of each line and skip everything in between?
^([sS]\d+).*?([\d\+]+EUR)$
http://regexr.com/3c1ob

How Can I read From Line number() to line Starts with in C#

Let's say I have text file like this
<pre>----------------
hPa m C
---------------------
1004.0 28 13.6
1000.0 62 16.2
998.0 79 17.2
992.0 131 18.0
<pre>----------------
Sometext here
1000.0 10 10.6
1000.0 10 11.2
900.0 10 12.2
900.0 100 13.0
<aaa>----------------
How can I create an array in C# that reads the text file from line number 5 (1004.0) to just before the line that starts with the string <pre>-
I used string[] lines = System.IO.File.ReadAllLines(Filepath);
to put each line into the array.
The problem is that I want only the numbers of the first section in the array, so that I can later separate them into another 3 arrays (hPa, m, C).

Here's a possible solution. It's probably way more complicated than it should be, but that should give you an idea of possible mechanisms to further refine your data.
string[] lines = System.IO.File.ReadAllLines("test.txt");
List<double> results = new List<double>();
foreach (var line in lines.Skip(4))
{
    if (line.StartsWith("<pre>"))
        break;
    Regex numberReg = new Regex(@"\d+(\.\d){0,1}"); //will find any number ending in ".X" - it's primitive, and won't work for something like 0.01, but no such data showed up in your example
    var result = numberReg.Matches(line).Cast<Match>().FirstOrDefault(); //use only the first number from each line. You could use Cast<Match>().Skip(1).FirstOrDefault to get the second, and so on...
    if (result != null)
        results.Add(Convert.ToDouble(result.Value, System.Globalization.CultureInfo.InvariantCulture)); //Note the use of InvariantCulture, otherwise you may need to worry about , or . in your numbers
}

Do you mean this?
System.IO.StreamReader file = new System.IO.StreamReader(FILE_PATH);
int skipLines = 4; // skip 4 lines so that the next ReadLine() returns line 5
for (int i = 0; i < skipLines; i++)
{
    file.ReadLine();
}
// Do what you want here.
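The two ideas above (skip the header, stop at the next <pre> line) can also be combined with LINQ and a whitespace split to get the three columns directly. This is a sketch under assumptions: SoundingParser is a made-up name, and the number of header lines is a parameter because the excerpt shows three header lines while the question speaks of data starting at line 5.

```csharp
using System;
using System.Globalization;
using System.Linq;

static class SoundingParser
{
    // Returns one double[3] per data row between the header and the next "<pre>" line.
    public static double[][] ParseFirstSection(string[] lines, int headerLines)
    {
        return lines.Skip(headerLines)
            .TakeWhile(l => !l.StartsWith("<pre>", StringComparison.Ordinal))
            .Select(l => l.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
            .Where(parts => parts.Length == 3) // ignore anything that is not a 3-column row
            .Select(parts => parts.Select(p => double.Parse(p, CultureInfo.InvariantCulture)).ToArray())
            .ToArray();
    }

    static void Main()
    {
        string[] lines =
        {
            "<pre>----------------", "hPa m C", "---------------------",
            "1004.0 28 13.6", "1000.0 62 16.2",
            "<pre>----------------", "Sometext here", "1000.0 10 10.6"
        };
        double[][] rows = ParseFirstSection(lines, 3);
        double[] hPa = rows.Select(r => r[0]).ToArray();
        double[] m = rows.Select(r => r[1]).ToArray();
        double[] c = rows.Select(r => r[2]).ToArray();
        Console.WriteLine(hPa.Length + " rows read");
    }
}
```

double.Parse throws on malformed numbers; if the file may contain such rows, double.TryParse with the same culture would be the more forgiving choice.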

How can I get the IndexOf() method to return the correct values?

I have been working with Google Maps and I am now looking to format coordinates.
I get the coordinates in the following format:
Address(coordinates)zoomlevel.
I use the IndexOf method to get the index of "(" + 1, so that I get the first digit of the coordinate, and store this value in a variable that I call "start".
I then do the same thing, but this time I get the index of ")" - 2 to get the last digit of the last coordinate, and store this value in a variable that I call "end".
I get the following error:
"Index and length must refer to a location within the string. Parameter name: length"
I get the following string as an input parameter:
"Loddvägen 155, 840 80 Lillhärdal, Sverige (61.9593214318303,14.0585965625)5"
by my calculations i should get the value 36 in the start variable and the value 65 in the end variable
but for some reason i get the values 41 in start and 71 in end.
why?
public string RemoveParantheses(string coord)
{
int start = coord.IndexOf("(")+1;
int end = coord.IndexOf(")")-2;
string formated = coord.Substring(start,end);
return formated;
}
I then tried hardcoding the correct values
string Test = coord.Substring(36,65);
I then get the following error:
startindex cannot be larger than length of string. parameter name startindex
I understand what both of the errors mean but in this case they are incorrect since im not going beyond the strings length value.
Thanks!

The second parameter of Substring is a length (MSDN source). Since you are passing in 65 for the second parameter, your call is trying to get the characters between 36 and 101 (36+65). Your string does not have 101 characters in it, so that error is thrown. To get the data between the parentheses, use this:
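public string RemoveParantheses(string coord)
{
    int start = coord.IndexOf("(") + 1; // first character after "("
    int end = coord.IndexOf(")");       // index of ")" itself, which is excluded
    string formated = coord.Substring(start, end - start); // second argument is a length
    return formated;
}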
Edit: The reason it worked with only the coordinates was that the total string was shorter, and since the coordinates started at the first position, the requested range still fell inside the string. For example...
// using "Loddvägen 155, 840 80 Lillhärdal, Sverige (61.9593214318303,14.0585965625)5" (75 characters)
int start = coord.IndexOf("(") + 1; // 43
int end = coord.IndexOf(")") - 2; // 71
coord.Substring(start, end); // asks for characters 43 through 113, past the end of the string
// using "(61.9593214318303,14.0585965625)5" (33 characters)
int start = coord.IndexOf("(") + 1; // 1
int end = coord.IndexOf(")") - 2; // 29
coord.Substring(start, end); // asks for characters 1 through 29, all inside the string
The second instance was valid because the whole range up to index 29 actually exists in that string. Once you added the address to the beginning of the string, your code no longer worked.
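To make the start/length distinction concrete, here is a small sketch that also guards against a missing parenthesis. Coordinates.Between is a hypothetical helper name, not part of the original code.

```csharp
using System;

static class Coordinates
{
    // Returns the text between the first "(" and the next ")",
    // or null if either one is missing.
    public static string Between(string s)
    {
        int open = s.IndexOf('(');
        if (open < 0) return null;
        int close = s.IndexOf(')', open + 1);
        if (close < 0) return null;
        int start = open + 1;
        // Substring takes (startIndex, length), so convert the end index to a length.
        return s.Substring(start, close - start);
    }

    static void Main()
    {
        string input = "Loddvägen 155, 840 80 Lillhärdal, Sverige (61.9593214318303,14.0585965625)5";
        Console.WriteLine(Between(input));
    }
}
```

Because the length is derived from the two indexes, the result does not depend on how long the address in front of the parentheses is.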

Extracting parts of a string is a good use for regular expressions:
var match = Regex.Match(locationString, @"\((?<lat>[\d\.]+),(?<long>[\d\.]+)\)");
string latitude = match.Groups["lat"].Value;
string longitude = match.Groups["long"].Value;

You probably forgot to count newlines and other whitespace; a \r\n newline is 2 "invisible" characters. The other mistake is that you are calling Substring with (Start, End) while it's (Start, Count), i.e. (Start, End - Start).

by my calculations i should get the value 36 in the start variable and the value 65 in the end variable
Then your calculations are wrong. With the string above I also see (and LinqPad confirms) that the open paren is at position 42 and the close paren is at index 73.
The error you're getting when using Substring is because the parameters to Substring are a beginning position and the length, not the ending position. Assuming end holds the index of the last character you want (that is, coord.IndexOf(")") - 1 rather than - 2), you should be using:
string formated = coord.Substring(start,(end-start+1));

That overload of Substring() takes two parameters, a start index and a length. You've provided the second value as the index of the occurrence of ) when really you want the length of the string you wish to extract. In this case you could subtract the start index (the index of ( plus one) from the index of ). For example: -
string foo = "Loddvägen 155, 840 80 Lillhärdal, Sverige (61.9593214318303,14.0585965625)5";
int start = foo.IndexOf("(") + 1;
int end = foo.IndexOf(")");
Console.Write(foo.Substring(start, end - start));
Console.Read();
Alternatively, you could parse the string using a regular expression, for example: -
Match r = Regex.Match(foo, @"\(([^)]*)\)");
Console.Write(r.Groups[1].Value);
This is more tolerant of variations in the input, although it is unlikely to be faster than the IndexOf approach in the previous example.

string input =
    "Loddvägen 155, 840 80 Lillhärdal, Sverige (61.9593214318303,14.0585965625)5";
var groups = Regex.Match(input,
    @"\(([\d\.]+),([\d\.]+)\)(\d{1,2})").Groups;
var lat = groups[1].Value;
var lon = groups[2].Value;
var zoom = groups[3].Value;

C# loading data from text file

So I'm trying to load some data from a text file when the user types 'Load'. I have it sort of working, but there's a bug with it, it seems.
Let's say you buy an axe, which is 70 coins. You start with 100 coins, so you're left with 30 coins. You decide to save and exit the game. You come back and load your saved game, and you have 49 coins and 48 health points.
It also does the exact same thing when you save straight away with all the default values. I don't have any idea where it's getting the values 49 and 48 from.
const string f = "savedgame.txt";
using (StreamReader r = new StreamReader(f))
{
string line;
while ((line = r.ReadLine()) != null)
{
player.gold = line[0];
player.health = line[1];
Console.WriteLine("Your game has been loaded.");
menu.start(menu, shop, player, fishing, woodcut, mine, monster, quests, save, load);
}
}
This is my text file that I've just saved now.
100
20
1
0
0
5
I've tried examples off Google, but they did the same thing. So I did a bit of research and made one myself.... Did the same thing.
I don't know how much I can show you guys, but I can show more if needed.
Am I doing something wrong?

There are two problems with your current approach -
First, your logic is currently working on the characters of the first line -
The first time through your loop, line is "100", so line[0] is '1'. This is the char '1', not the value 1, so player.gold = line[0] translates into setting player.gold to the numerical equivalent of the character '1', which is 49. Likewise, line[1] is the char '0', whose numerical equivalent is 48.
Second, you're starting a loop and reading line by line, and then doing your work instead of reading all of the lines at once.
You can solve these issues by reading all of the lines at once, and working line by line. Then you also need to convert the entire line to a number, not read one character:
const string f = "savedgame.txt";
var lines = File.ReadAllLines(f);
player.gold = Convert.ToInt32(lines[0]);
player.health = Convert.ToInt32(lines[1]);
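The difference between a character code and a parsed number can be seen in isolation with the values from the save file. This is a minimal sketch: SaveFile.TryReadInt is a hypothetical helper, and it uses int.TryParse instead of Convert.ToInt32 so that a damaged save file does not throw.

```csharp
using System;

static class SaveFile
{
    // Parses one line of the save file as a whole number; returns false
    // when the line is missing or is not a valid integer.
    public static bool TryReadInt(string[] lines, int index, out int value)
    {
        value = 0;
        return index < lines.Length && int.TryParse(lines[index], out value);
    }

    static void Main()
    {
        string[] lines = { "100", "20", "1", "0", "0", "5" };

        int goldCode = lines[0][0];   // character code of '1', which is 49
        int healthCode = lines[0][1]; // character code of '0', which is 48
        Console.WriteLine(goldCode + " " + healthCode);

        if (TryReadInt(lines, 0, out int gold) && TryReadInt(lines, 1, out int health))
            Console.WriteLine(gold + " " + health);
    }
}
```

The first line printed is 49 48, the values seen in the game, and the second is 100 20, the values that were actually saved.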

line is a string, not an array of lines. You're setting gold to the first character of the first line and health to the second character of that same line.

Parse string with encapsulated int to an array that contains other numbers to be ignored

During my coding I've come across a problem that involves parsing a string like this:
{15} there are 194 red balloons, {26} there are 23 stickers, {40} there are 12 jacks, ....
My code pulls the sentences and the numbers into two separate arrays.
I've solved parsing the sentence into its own array by using *.Remove(0, 5) to strip the leading part. The problem with that approach is that the file always has to follow a standard where the tag is {##}. It is not as elegant as I would like: sometimes the number is {3}, and I am forced to write it as { 3}.
As the string may also contain other numbers, I wasn't able to simply parse out the integers first.
int?[] array = y.Split(',')
.Select(z =>
{
int value;
return int.TryParse(z, out value) ? value : (int?)null;
})
.ToArray();
So, back to the problem at hand: I need to parse each "{##}" into an array, with each number as its own element.

Here's one way to do it using positive lookaheads/lookbehinds:
string s = "{15} there are 194 red balloons, {26} there are 23 stickers, {40} there are 12 jacks";
// Match all digits preceded by "{" and followed by "}"
int[] matches = Regex.Matches(s, @"(?<={)\d+(?=})")
    .Cast<Match>()
    .Select(m => int.Parse(m.Value))
    .ToArray();
// Yields [15, 26, 40]
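Since the question also needs the sentences, one pattern with two capture groups could fill both arrays in a single pass and tolerate {3} as well as { 3}, which would make the fixed Remove(0, 5) unnecessary. This is a sketch, assuming a sentence never contains a "{" itself; TaggedSentences and Parse are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class TaggedSentences
{
    // "{digits}" (optional spaces inside the braces), then text up to the next "{".
    static readonly Regex Entry = new Regex(@"\{\s*(\d+)\s*\}\s*([^{]*)");

    public static void Parse(string s, out int[] ids, out string[] texts)
    {
        var matches = Entry.Matches(s).Cast<Match>().ToList();
        ids = matches.Select(m => int.Parse(m.Groups[1].Value)).ToArray();
        // Strip the separating comma and surrounding spaces from each sentence.
        texts = matches.Select(m => m.Groups[2].Value.Trim(' ', ',')).ToArray();
    }

    static void Main()
    {
        string s = "{15} there are 194 red balloons, {26} there are 23 stickers, { 3} there are 12 jacks";
        int[] ids;
        string[] texts;
        Parse(s, out ids, out texts);
        for (int i = 0; i < ids.Length; i++)
            Console.WriteLine(ids[i] + ": " + texts[i]);
    }
}
```

Numbers inside the sentences (194, 23, 12) are left alone because only digits enclosed in braces are captured by the first group.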
