C# How to organise an array with breaks [closed]

I have a spreadsheet with 12 columns and between 1 and 50 rows - I do not know how many rows there will be.
If I copy the data out of the spreadsheet I can create an array with any number of Rows and all of the data is separated into an array no problem.
I have a further 6 pieces of data from other sources within the program ‘a’ to ‘f’ - taken from various TextBoxes and DatePickers.
I need to take data from cells: 1, 2, 3, 4, 5, 6 from the array - this becomes 13, 14, 15, 16, 17, 18 from the second row and so on (as there are 6 cells containing data I do not need in each row).
I need to order the data:
a, b, 3, 2, c, d, 1, 4, 5, 6, e, f
a, b, 15, 14, c, d, 13, 16, 17, 18, e, f
And so on.
This new string of data needs to be copied into a different spreadsheet that I cannot change.
I would like to be able to add more than 1 row at a time.
With a lot of help from Stack Overflow, which improved the code I was using to add 1 row at a time, I created this code:
string phrase = Value.Text;
string[] words = Value.Text.Split(new char[] { '\t', '\r' });
List<string> values = new List<string>();
values.Add(a.Text);
values.Add(b.Text);
values.Add(words[3]);
values.Add(words[2]);
values.Add(c.Text);
values.Add(d.Text);
values.Add(words[1]);
values.Add(words[4]);
values.Add(words[5]);
values.Add(words[6]);
values.Add(e.Text);
values.Add(f.Text);
string outPut = String.Join("\t", values);
this.OutPutValue.Text = outPut;
This works for a single row.
I can add a string:
String newLine = "\r";
So I have this code:
values.Add(e.Text);
values.Add(f.Text);
values.Add(newLine);
values.Add(a.Text);
values.Add(b.Text);
values.Add(words[15]);
values.Add(words[14]);
values.Add(c.Text);
And so on...
If I try this, in the receiving spreadsheet the second Row starts on Column B instead of Column A because of the extra Tab from:
String.Join("\t", values);
Is there a way to introduce a line break so that the next line starts on Column A, rather than Column B?
Someone offered StringBuilder when I raised this before, but I failed to give enough information and I do not think that would work in this scenario (or at least I could not get my head around it).
Thanks for any help.

Here's how you can produce the two lines of data:
List<string> values = new List<string>();
StringBuilder sb = new System.Text.StringBuilder();
string[] words = Value.Text.Split(new char[] { '\t', '\r' });
values.Add(a.Text);
values.Add(b.Text);
values.Add(words[3]);
values.Add(words[2]);
values.Add(c.Text);
values.Add(d.Text);
values.Add(words[1]);
values.Add(words[4]);
values.Add(words[5]);
values.Add(words[6]);
values.Add(e.Text);
values.Add(f.Text);
sb.AppendLine(String.Join("\t", values));
values.Clear();
values.Add(a.Text);
values.Add(b.Text);
values.Add(words[15]);
values.Add(words[14]);
values.Add(c.Text);
values.Add(d.Text);
values.Add(words[13]);
values.Add(words[16]);
values.Add(words[17]);
values.Add(words[18]);
values.Add(e.Text);
values.Add(f.Text);
sb.AppendLine(String.Join("\t", values));
OutPut.AppendText(sb.ToString());
But does "Value.Text" above represent all of the rows, or just a single row?
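If Value.Text holds every copied row, the same idea extends to any number of rows with a loop. The following is a minimal sketch rather than the asker's actual code: the RowReorder class and its Build method are hypothetical names, and it assumes the pasted text uses tabs between cells and line breaks between rows, with the wanted cells at indices 1 to 6 of each row.

```csharp
using System;
using System.Collections.Generic;

static class RowReorder
{
    // Hypothetical helper: 'pasted' is the tab-separated text copied from the
    // source spreadsheet; a..f are the six extra values from the form.
    public static string Build(string pasted, string a, string b, string c,
                               string d, string e, string f)
    {
        var outputRows = new List<string>();

        // Split into rows first, so each row can be handled on its own.
        string[] rows = pasted.Split(new[] { "\r\n", "\r", "\n" },
                                     StringSplitOptions.RemoveEmptyEntries);
        foreach (string row in rows)
        {
            string[] cells = row.Split('\t');
            if (cells.Length < 7) continue; // assumption: skip rows that are too short

            // Tabs go between the cells of one row only.
            outputRows.Add(string.Join("\t", new[]
            {
                a, b, cells[3], cells[2], c, d,
                cells[1], cells[4], cells[5], cells[6], e, f
            }));
        }

        // Line breaks go between rows, so no row starts with a stray tab.
        return string.Join("\r\n", outputRows);
    }
}
```

It could then be called as this.OutPutValue.Text = RowReorder.Build(Value.Text, a.Text, b.Text, c.Text, d.Text, e.Text, f.Text);. Because the rows are joined separately from the cells, the second row starts in Column A.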

Related

Regex to extract time information from a string [closed]

I'm receiving data from a third party device. I need to extract two pieces of information. I think I need to use a regular expression, but I don't know anything about them.
Below you can find a few example strings:
TN 12 1 17:45:19.90400 7173
TN 4 4 17:45:20.51800 7173
TN 13 1 17:45:24.03200 7173
TN 5 4 17:45:26.06300 7173
TN 6 4 17:45:29.28700 7173
TN 14 1 17:45:31.03200 7173
From each of these strings I need to extract two pieces of data:
the time
the number before the time
So the data I'm looking for is this:
1 and 17:45:19.90400
4 and 17:45:20.51800
1 and 17:45:24.03200
4 and 17:45:26.06300
4 and 17:45:29.28700
1 and 17:45:31.03200
The number will always be present and it will always be 1, 2, 3 or 4.
The time will always be in the same format, but I'm not sure whether there will be single-digit hours, so I don't know if 9 o'clock will be displayed as 9 or 09.
Any suggestions on how I can extract this using a RegEx?
Thanks
My usual approach to this is to create a class that represents the data we want to capture, and give it a static Parse method that takes in an input string and returns an instance of the class populated with data from the string. Then we can just loop through the lines and populate a list of our custom class with data from each line.
For example:
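using System;
using System.Globalization;

class TimeData
{
    public TimeSpan Time { get; set; }
    public int Number { get; set; }

    // Parses one line such as "TN 12 1 17:45:19.90400 7173".
    // Fields that cannot be parsed keep their default values.
    public static TimeData Parse(string input)
    {
        var timeData = new TimeData();
        int number;
        TimeSpan time;
        if (string.IsNullOrWhiteSpace(input)) return timeData;
        // Splitting on an empty char array splits on any whitespace.
        var parts = input.Split(new char[0], StringSplitOptions.RemoveEmptyEntries);
        // The third field is the number.
        if (parts.Length > 2 && int.TryParse(parts[2], out number))
        {
            timeData.Number = number;
        }
        // The fourth field is the time, with five fractional-second digits.
        if (parts.Length > 3 && TimeSpan.TryParseExact(parts[3], "hh\\:mm\\:ss\\.fffff",
            CultureInfo.CurrentCulture, out time))
        {
            timeData.Time = time;
        }
        return timeData;
    }
}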
Now we can just loop through the list of strings, call Parse on each line, and end up with a new list of objects that contain the Time and associated Number for each line. Also note that, by using a TimeSpan to represent the time, we now have properties for all the parts, like Hours, Minutes, Seconds, Milliseconds, TotalMinutes, etc.:
var fileLines = new List<string>
{
    "TN 12 1 17:45:19.90400 7173",
    "TN 4 4 17:45:20.51800 7173",
    "TN 13 1 17:45:24.03200 7173",
    "TN 5 4 17:45:26.06300 7173",
    "TN 6 4 17:45:29.28700 7173",
    "TN 14 1 17:45:31.03200 7173",
};
List<TimeData> allTimeData = fileLines.Select(TimeData.Parse).ToList();
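Since the question asked specifically for a regular expression, here is one possible sketch of that route. The pattern and the TimeLineRegex and TryExtract names are illustrative assumptions, not part of the answer above; the pattern accepts one or two hour digits, so it should cope with both 9 and 09.

```csharp
using System;
using System.Text.RegularExpressions;

static class TimeLineRegex
{
    // Group 1: a single digit from 1 to 4 standing on its own.
    // Group 2: the time that follows it, e.g. 17:45:19.90400 or 9:05:20.51800.
    static readonly Regex Pattern =
        new Regex(@"\b([1-4])\s+(\d{1,2}:\d{2}:\d{2}\.\d+)\b");

    // Returns false when the line does not contain the expected pair.
    public static bool TryExtract(string line, out int number, out string time)
    {
        number = 0;
        time = null;
        Match m = Pattern.Match(line ?? string.Empty);
        if (!m.Success) return false;

        number = int.Parse(m.Groups[1].Value);
        time = m.Groups[2].Value;
        return true;
    }
}
```

The time is returned as text here; it could be handed to TimeSpan parsing afterwards if the individual parts are needed.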

Binary Writer/Reader extra character [closed]

I am converting some legacy VB6 code to C# and this just has me a little baffled. The VB6 code wrote certain data sequentially to a file. This data is always 110 bytes. I can read this file just fine in the converted code, but I'm having trouble when I write the file from the converted code.
Here is a stripped down sample I wrote real quick in LINQPad:
void Main()
{
    int[,] data = new[,]
    {
        {
            0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
        },
        {
            20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39
        }
    };
    using ( MemoryStream stream = new MemoryStream() )
    {
        using ( BinaryWriter writer = new BinaryWriter( stream, Encoding.ASCII, true ) )
        {
            for( var i = 0; i < 2; i++ )
            {
                byte[] name = Encoding.ASCII.GetBytes( "Blah" + i.ToString().PadRight( 30, ' ' ) );
                writer.Write( name );
                for( var x = 0; x < 20; x++ )
                {
                    writer.Write( data[i,x] );
                }
            }
        }
        using ( BinaryReader reader = new BinaryReader( stream ) )
        {
            // Note the extra +4 is because of the problem below.
            reader.BaseStream.Seek( 30 + ( 20 * 4 ) + 4, SeekOrigin.Begin );
            string name = new string( reader.ReadChars(30) );
            Console.WriteLine( name );
            // This is the problem..This extra 4 bytes should not be here.
            //reader.ReadInt32();
            for( var x = 0; x < 20; x++ )
            {
                Console.WriteLine( reader.ReadInt32() );
            }
        }
    }
}
As you can see, I have a 30 character string written first. The string is NEVER longer than 30 characters and is padded with spaces if it is shorter. After that, twenty 32-bit integers are written. It is always 20 integers. So I know each character in a string is one byte. I know a 32 bit integer is four bytes. So in my reader sample, I should be able to seek 110 bytes ( 30 + (4 * 20) ), read 30 chars, and then read 20 ints and that's my data. However, for some reason, there is an extra 4 bytes being written after the string.
Am I just missing something completely obvious (as is normally the case for myself)? Strings aren't null terminated in .Net and this is four bytes anyway, not just an extra byte? So where is this extra 4 bytes coming from? I'm not directly calling Write(string) so it can't be a prefixed length, which it's obviously not since it's after my string. If you uncomment the ReadInt32(), it produces the desired result.
The extra 4 bytes are from the extra 4 characters you're writing. Change the string you're encoding as ASCII to this:
("Blah" + i.ToString()).PadRight(30, ' ')
That is, pad the string after you've concatenated the prefix and the integer.
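The difference comes from precedence: the method call binds more tightly than +, so in the original expression PadRight applies only to i.ToString(). A small sketch makes the resulting lengths visible (the PaddingDemo class and its method names are made up for illustration):

```csharp
using System;

static class PaddingDemo
{
    // Pads only the number to 30 characters, then prepends "Blah":
    // 4 + 30 = 34 characters in total.
    public static string PadNumberOnly(int i)
    {
        return "Blah" + i.ToString().PadRight(30, ' ');
    }

    // Concatenates first, then pads the whole name to 30 characters.
    public static string PadWholeName(int i)
    {
        return ("Blah" + i.ToString()).PadRight(30, ' ');
    }
}
```

Comparing PaddingDemo.PadNumberOnly(0).Length with PaddingDemo.PadWholeName(0).Length shows 34 against 30, which accounts for the four extra bytes per record.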
Your extra four bytes are whitespace, because you aren't subtracting the length of 'Blah'. You don't know where you are in your stream. So basically, you think you're writing only 30 chars, but you really wrote 34 chars.
I know you didn't ask this - but you're writing garbage data to a file that doesn't need to be there.
Instead of padding your string with whitespace, you should just include a header or pointer that indicates the length of the next field in your file.
For example, say you have a 120 byte file. The first 4 bytes of the file indicate that the length of the following string is 96 bytes. So you read 4 bytes, get the length and then read 96 bytes. The next 4 bytes say that you have a string that's 16 bytes long, so you read the next 16 bytes and get your next string. This is pretty much how every well defined protocol works.
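A length-prefixed field along those lines might be sketched as follows. This only applies if you control the file format (the existing VB6 layout is fixed-width and would need to stay that way), and the LengthPrefixed class and its methods are hypothetical names, with a 4-byte Int32 header assumed as in the example above.

```csharp
using System;
using System.IO;
using System.Text;

static class LengthPrefixed
{
    // Writes a 4-byte length header followed by the ASCII bytes of the value.
    public static void WriteField(BinaryWriter writer, string value)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(value);
        writer.Write(bytes.Length); // Int32 header: length of the field that follows
        writer.Write(bytes);        // raw bytes, with no padding
    }

    // Reads the header first, then exactly that many bytes.
    public static string ReadField(BinaryReader reader)
    {
        int length = reader.ReadInt32();
        byte[] bytes = reader.ReadBytes(length);
        return Encoding.ASCII.GetString(bytes);
    }
}
```

Writing "hello" and then "ok" this way takes 4 + 5 + 4 + 2 = 15 bytes, and the reader never needs to know the field widths in advance.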

Parsing Plain Text Table

I'm trying to parse a table in plain text format. The program is written in Visual Studio using C#. I need to parse through the table and insert the data into the database.
Below is a sample table I will be reading in:
ID Name Value1 Value2 Value3 Value4 //header
1 nameA 3.0 0.2 2 6.2
2 nameB
3 nameC 2.9 3.0 7.3
4 nameD 1.5 3.0 1.8 1.1
5 nameE
6 nameF 1.2 2.4 3.3 2.5
7 nameG 3.0 3.2 2.1 4.5
8 nameH 88 12.4 28.9
In the example, I will need to capture data for id 1, 3, 4, 6, 7, and 8.
I thought of two ways to approach this, but neither of them works 100%.
Method 1:
By reading in the header, I can get the start index for each column. I will then use Substring to collect data for each row.
ISSUE: once it is past a certain row (and I have no way of knowing which row that will be), the columns shift, and Substring no longer collects the correct data.
This method will only collect correct data for 1, 3, and 4.
Method 2:
Using Regex to collect all the matches. I'm hoping this can collect ID, Name, Value1, Value2, Value3, Value4, in this order.
My pattern is (\d*?)\s\s\s+(.*?)\s\s\s+(\d*\.*\d*)\s\s\s+(\d*\.*\d*)\s\s\s+(\d*\.*\d*)\s\s\s+(\d*\.*\d*)
ISSUE: data that are collected are shifted left for some rows. For example, on ID 3, Value2 should be blank, but the regex will be reading Value2 = 3.0, Value3 = 7.3, and Value4 = blank. Same thing goes for ID 8.
Question:
How can I read in the whole table and parse them correctly?
(1) I do not know starting from which row the values will be shifted and
(2) I do not know how many cells it will be shifted by and if they are consistent.
Additional Information
The table is in a PDF file, I converted the PDF to text file so I can read in the data. The shifting data happens when a table goes across multiple pages, but it is not consistent.
EDIT
Below are some actual data:
68 BENZYL ALCOHOL 6.0 0.4 1 7.4
91 EVERNIA PRUNASTRI (OAK MOSS) 34 3 3 10
22 test 2323 23 12
OK, here you go. Use this regex pattern:
NOTE: you have to match this to any single line, not to the whole document! If you want to do it for your whole document then you have to add the 'multiline' modifier ('m'). You can do this by adding (?m) at the beginning of the regex pattern!
EDIT:
You provided some lines of your real data. Here's my updated regex pattern:
^(?<id>\d+)(?:\s{2,25})(?<name>.+?)(?:\s{2,45})(?<val1>\d+(?:\.\d+)?)?(?:\s{2,33})(?<val2>\d+(?:\.\d+)?)?(?:\s{2,14})(?<val3>\d+(?:\.\d+)?)?(?:\s{2,19})(?<val4>\d+(?:\.\d+)?)?$
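To show how the named groups in that pattern can be read from C#, here is a minimal sketch. The TableRowRegex class and its Parse method are hypothetical names, and the sample line in the usage note assumes that columns are separated by at least two spaces, as the pattern requires.

```csharp
using System;
using System.Text.RegularExpressions;

static class TableRowRegex
{
    // The pattern from the answer, split across lines only for readability.
    static readonly Regex Row = new Regex(
        @"^(?<id>\d+)(?:\s{2,25})(?<name>.+?)(?:\s{2,45})(?<val1>\d+(?:\.\d+)?)?" +
        @"(?:\s{2,33})(?<val2>\d+(?:\.\d+)?)?(?:\s{2,14})(?<val3>\d+(?:\.\d+)?)?" +
        @"(?:\s{2,19})(?<val4>\d+(?:\.\d+)?)?$");

    // Returns the six captured fields in column order, or null when the line
    // does not match. A group the pattern leaves unmatched yields "".
    public static string[] Parse(string line)
    {
        Match m = Row.Match(line);
        if (!m.Success) return null;

        return new[]
        {
            m.Groups["id"].Value, m.Groups["name"].Value,
            m.Groups["val1"].Value, m.Groups["val2"].Value,
            m.Groups["val3"].Value, m.Groups["val4"].Value
        };
    }
}
```

For a line such as "68   BENZYL ALCOHOL   6.0   0.4   1   7.4" (three spaces between columns), Parse returns the id, the full name including its single inner space, and the four values. How rows with blank columns are split depends on the exact spacing in the converted text, so those are worth checking against real lines.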
How about treating this file like a fixed-length file, where you define each column by an index and a length? Once you have defined your fixed-length columns, you can get the value for a column with Substring, then Trim to clean it up.
You can wrap all this up in a LINQ statement that projects to an anonymous type and filters for the IDs you want.
Something like this:
using System;
using System.IO;
using System.Linq;

static void Main(string[] args)
{
    int[] select = new int[] { 1, 3, 4, 6, 7, 8 };
    string[] lines = File.ReadAllLines("TextFile1.txt");
    // Skip the header line, then cut each row into fixed-width columns.
    var q = lines.Skip(1).Select(l => new {
        Id = Int32.Parse(GetValue(l, 0, 6)),
        Name = GetValue(l, 6, 11),
        Value1 = GetValue(l, 17, 11),
        Value2 = GetValue(l, 28, 13),
        Value3 = GetValue(l, 41, 14),
        Value4 = GetValue(l, 55, 13),
    }).Where(o => select.Contains(o.Id));
    var r = q.ToArray();
}

static string GetValue(string line, int index, int length)
{
    string value = null;
    int lineLength = line.Length;
    // Take as much of the line as we can up to column length
    if (lineLength > index)
        value = line.Substring(index, Math.Min(length, lineLength - index)).Trim();
    // Return null if we just have whitespace
    return String.IsNullOrWhiteSpace(value) ? null : value;
}

Convert a string to multidimensional array

I have a matrix, which is read from the console. The elements are separated by spaces and new lines. How can I convert it into a multidimensional int array in c#? I have tried:
String[][] matrix = (Console.ReadLine()).Split( '\n' ).Select( t => t.Split( ' ' ) ).ToArray();
but when I click enter, the program ends and it doesn't allow me to enter more lines.
The example is:
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9
// n_rows and n_columns must be known before reading starts
int[,] Matrix = new int[n_rows, n_columns];
for (int i = 0; i < n_rows; i++) {
    String input = Console.ReadLine();
    String[] inputs = input.Split(' ');
    for (int j = 0; j < n_columns; j++) {
        Matrix[i, j] = Convert.ToInt32(inputs[j]);
    }
}
You can try the code above to load the matrix.
First things first, Console.ReadLine() reads a single line from the input. So in order to accept multiple lines you need to do 2 things:
Have a loop which allows the user to continue entering lines of data. You could let them go until they leave a line blank, or you could fix it to 5 lines of input.
Store these "lines" of data for future processing.
Assuming the user can enter any number of lines, and a blank line (just hitting enter) indicates the end of data entry, something like this will suffice:
List<string> inputs = new List<string>();
var endInput = false;
while (!endInput)
{
    var currentInput = Console.ReadLine();
    if (String.IsNullOrWhiteSpace(currentInput))
    {
        endInput = true;
    }
    else
    {
        inputs.Add(currentInput);
    }
}
// when code continues here you have all the user's input in separate entries in "inputs"
Now for turning that into an array of arrays:
var result = inputs.Select(i => i.Split(' ').ToArray()).ToArray();
That will give you an array of arrays of strings (which is what your example had). If you wanted these to be integers you could parse them as you go:
var result = inputs.Select(i => i.Split(' ').Select(v => int.Parse(v)).ToArray()).ToArray();
// incoming single-string matrix:
String input = @"1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9";
// processing:
String[][] result = input
    // Divide into rows by \n or \r (but remove empty entries)
    .Split(new[]{ '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries)
    // now divide each row into columns based on spaces
    .Select(x => x.Split(new[]{ ' ' }, StringSplitOptions.RemoveEmptyEntries))
    // convert from IEnumerable<String[]> to String[][]
    .ToArray();
Result:
String[][] result = new string[][]{
    new string[]{ "1","2","3","4","5" },
    new string[]{ "2","3","4","5","6" },
    new string[]{ "3","4","5","6","7" },
    new string[]{ "4","5","6","7","8" },
    new string[]{ "5","6","7","8","9" }
};
It can be done in multiple ways.
You can read a single line containing multiple numbers separated by a character, split on that character to obtain an array of ints, and then fill a matrix from that array.
With out-of-the-box LINQ there is no trivial way to do the filling step, and I don't think it is worth pulling in a third-party library such as LinqLib from CodePlex just for this.
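Since the question asks for a multidimensional int array rather than an array of arrays, the filling step can be done with two plain loops. The following is a sketch only: MatrixReader and ToMatrix are hypothetical names, and it assumes the lines have already been collected and that every line holds the same number of space-separated integers.

```csharp
using System;
using System.Collections.Generic;

static class MatrixReader
{
    // Converts lines such as "1 2 3" into a rectangular int[,].
    public static int[,] ToMatrix(IList<string> lines)
    {
        if (lines.Count == 0) return new int[0, 0];

        char[] separators = { ' ' };
        // The first line decides how many columns the matrix has.
        int columns = lines[0].Split(separators, StringSplitOptions.RemoveEmptyEntries).Length;
        var matrix = new int[lines.Count, columns];

        for (int i = 0; i < lines.Count; i++)
        {
            string[] parts = lines[i].Split(separators, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length != columns)
                throw new FormatException("Row " + i + " has " + parts.Length +
                                          " values, expected " + columns + ".");
            for (int j = 0; j < columns; j++)
            {
                matrix[i, j] = int.Parse(parts[j]);
            }
        }
        return matrix;
    }
}
```

Called with the lines gathered from Console.ReadLine(), this gives an int[,] whose dimensions come from the input, so the row and column counts do not have to be known beforehand.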

How to identify proper substring length

I'm trying to read column values from this file starting at the arrow position:
Here's my error:
I'm guessing it's because the length values are wrong.
Say I have a column with the value "Dog   ", that is, the word Dog followed by a few spaces. Do I have to set the length parameter to 3 (for Dog), or can I set it to 6 to accommodate the spaces after Dog? I ask because each column has a fixed length. As you can see, some words are shorter than others, and to be consistent I just want to set the length to the maximum column length (for example, 28 is the length of the 3rd column of my file, but not all 28 positions are used every time: the word client is only 6 characters long).
Robert Levy's answer is correct for the issue you're seeing - you've attempted to pull a substring from a string with a starting position that is greater than the length of the string.
You're parsing a fixed-length field file, where each field has a certain amount of characters, whether or not it uses all of them, and the pos and len arrays are intended to define those field lengths for use with Substring. As long as the line you're reading matches the expected field starts and lengths, you will be ok. As soon as you come to a line that doesn't match (for example, what appears to be the totals line - 0TotalRecords: 3,390,315) the field length definitions you've been using won't work, as the format has changed (and the line length may not even be the same).
There are a couple of things I would change to make this work. First, I would change your pos and len arrays so that they take the entirety of the field, not part of it. You can use Trim() to get rid of any leading or trailing blanks. As defined, your first field will only take the last number of the Seq# (pos 4, len 1), and your second field will only take the first 5 characters of the field, even though it appears to have space for ~12 characters.
Take a look at this (it's hard to be exact working from the picture, but for purposes of demonstration it will work):
          1         2         3         4
01234567890123456789012345678901234567890
Seq#  Field       Description
    3 BELNR       ACCOUNTING DOCUMENT NBR
The numbers are the position of each character in the line. I would define the pos array to be the start of the field (0 for the first field, and then the position of the first letter of the field heading for each field after that), so you would have:
Seq# = 0
Field = 6
Description = 18
The len array would hold the length of the field, which I would define as the amount of characters up to the beginning of the next field, like this:
Seq# = 6
Field = 12
Description = 28 (using what you have, as it is hard to tell)
This would make your array initialization the following:
int[] pos = new int[3] { 0, 6, 18 };
int[] len = new int[3] { 6, 12, 28 };
If you wanted the fourth field, it would start at position 46 (pos 18 + len 28 = 46).
The second thing is I would check in the loop to see if the Total Records line is there, and skip that line (most likely it's the last line):
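foreach (string line in textBox1.Lines)
{
    if (!line.Contains("Total Records"))
    {
        // Pull each field out of the line by its start position and length
        for (int j = 0; j < pos.Length; j++)
        {
            val[j] = line.Substring(pos[j], len[j]).Trim();
        }
    }
}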
Another way to do this would be to modify the original query and add a TakeWhile clause to it to only take lines until you hit the Total Records one:
string[] lines = File.ReadAllLines(ofd.FileName).Skip(8)
.TakeWhile(l => !l.Contains("Total Records")).ToArray();
The above would skip the first 8 lines and take all the remaining lines up to, but not including, the first line to contain "Total Records" in the string.
Then you could do something like this:
string[] lines = File.ReadAllLines(ofd.FileName).Skip(8)
    .TakeWhile(l => !l.Contains("Total Records")).ToArray();
textBox1.Lines = lines;
string[] val = new string[3];
int[] pos = new int[3] { 0, 6, 18 };
int[] len = new int[3] { 6, 12, 28 };
foreach (string line in textBox1.Lines)
{
    // Pull each field out of the line by its start position and length
    for (int j = 0; j < pos.Length; j++)
    {
        val[j] = line.Substring(pos[j], len[j]).Trim();
    }
}
Now you don't have to check for the "Total Records" line.
Of course, if there are other lines in your file, or there are records after the "Total Records" line (which I rather doubt) you'll have to handle those cases as well.
In short, the code for pulling out the substrings will only work for lines that match that particular format (or more specifically, have fields that match those positions/lengths). Anything outside of that will either give you incorrect values or throw an error (if the start position is greater than the length of the string).
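One way to avoid the exception on short lines is to clamp the requested length to what the line actually contains. This is a sketch under the field positions assumed above (0, 6 and 18), and FixedWidth.Field is a hypothetical helper, not something from the original code:

```csharp
using System;

static class FixedWidth
{
    // Returns the trimmed field, or an empty string when the line is too
    // short to contain any part of it.
    public static string Field(string line, int start, int length)
    {
        if (line == null || start >= line.Length) return string.Empty;

        // Never ask Substring for more characters than remain in the line.
        int available = Math.Min(length, line.Length - start);
        return line.Substring(start, available).Trim();
    }
}
```

With this in place, val[j] = FixedWidth.Field(line, pos[j], len[j]); returns an empty string for a field that lies beyond the end of a short row instead of throwing, and a final column that does not fill its full width is still read correctly.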
That exception is complaining about the first parameter, which suggests that your file contains a row that is shorter than 18 characters.
