how to get text file rows with no delimiter into array

I have a text file that I'm trying to input into an array called columns.
Each row in the text file belongs to a different attribute in a sub-class I have created.
For example, row 2 in my text file is a date that I would like to pass over. I do not want to use Split because I do not have a delimiter, but I do not know an alternative. I do not fully understand the code below; could someone help? When I try to run it, it says that columns[1] is out of range. Thank you.
StreamReader textIn =
new StreamReader(
new FileStream(path, FileMode.OpenOrCreate, FileAccess.Read));
//create the list
List<Event> events = new List<Event>();
while (textIn.Peek() != -1)
{
string row = textIn.ReadLine();
string[] columns = row.Split(' ');
Event special = new Event();
special.Day = Convert.ToInt32(columns[0]);
special.Time = Convert.ToDateTime(columns[1]);
special.Price = Convert.ToDouble(columns[2]);
special.StrEvent = columns[3];
special.Description = columns[4];
events.Add(special);
}
Input file sample:
1
8:00 PM
25.00
Beethoven's 9th Symphony
Listen to the ninth and final masterpiece by Ludwig van Beethoven.
2
6:00 PM
15.00
Baseball Game
Come watch the championship team play their archrival--No work stoppages, guaranteed.

Well, one way to do it (though it is a bit ugly) would be to use File.ReadAllLines, and then loop through the array, something like this:
string[] lines = File.ReadAllLines(path);
int index = 0;
// Each event occupies five consecutive lines, so stop before an incomplete group
while (index + 4 < lines.Length)
{
    Event special = new Event();
    special.Day = Convert.ToInt32(lines[index]);
    special.Time = Convert.ToDateTime(lines[index + 1]);
    special.Price = Convert.ToDouble(lines[index + 2]);
    special.StrEvent = lines[index + 3];
    special.Description = lines[index + 4];
    events.Add(special);
    index += 5; // move to the first line of the next event
}
This is very brittle code - a lot can break it. What if one of the events is missing a line? What if there are multiple blank lines in it? What if one of the Convert.Toxxx methods throws an error?
If you have the option to change the format of the file, I strongly recommend you make it delimited at least. If you can't change the format, you'll need to make the code sample above more robust so that it can handle blank lines, failed conversions, missing lines, etc.
Much, much, much easier to use a delimited file. Even easier to use an XML or JSON file.
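As a rough illustration of what "more robust" could look like for the one-field-per-line format, the sketch below drops blank lines, stops before an incomplete trailing record, and uses TryParse instead of Convert. The Event class here is a hypothetical stand-in for yours, and the invariant culture is an assumption about how the numbers and times are written. It cannot repair a file where a record is missing a line in the middle, because every record after that point shifts.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical stand-in for the asker's Event class.
public class Event
{
    public int Day { get; set; }
    public DateTime Time { get; set; }
    public double Price { get; set; }
    public string StrEvent { get; set; }
    public string Description { get; set; }
}

public static class EventFileParser
{
    // Treats every five consecutive non-blank lines as one event.
    // Records whose Day, Time or Price cannot be converted are skipped.
    public static List<Event> Parse(IEnumerable<string> rawLines)
    {
        // Drop blank lines so stray empty rows do not shift the fields.
        string[] lines = rawLines.Where(l => !string.IsNullOrWhiteSpace(l)).ToArray();
        var culture = CultureInfo.InvariantCulture; // assumption: "25.00", "8:00 PM" style input
        var events = new List<Event>();

        // "i + 4 < lines.Length" ignores an incomplete record at the end of the file.
        for (int i = 0; i + 4 < lines.Length; i += 5)
        {
            int day;
            DateTime time;
            double price;
            if (!int.TryParse(lines[i].Trim(), NumberStyles.Integer, culture, out day) ||
                !DateTime.TryParse(lines[i + 1].Trim(), culture, DateTimeStyles.None, out time) ||
                !double.TryParse(lines[i + 2].Trim(), NumberStyles.Float, culture, out price))
            {
                continue; // conversion failed: skip this record
            }

            events.Add(new Event
            {
                Day = day,
                Time = time,
                Price = price,
                StrEvent = lines[i + 3],
                Description = lines[i + 4]
            });
        }
        return events;
    }

    public static void Main()
    {
        string[] sample =
        {
            "1", "8:00 PM", "25.00", "Beethoven's 9th Symphony", "Listen to the ninth symphony.",
            "",
            "2", "6:00 PM", "15.00", "Baseball Game", "Come watch the championship team."
        };
        foreach (Event e in Parse(sample))
        {
            Console.WriteLine("{0}: {1} ({2})", e.Day, e.StrEvent, e.Price);
        }
    }
}
```

With a real file you would call EventFileParser.Parse(File.ReadAllLines(path)).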
Delimited File (CSV)
Let's say you have the same sample input, but this time it's a CSV file, like this:
1,8:00 PM,25.00,"Beethoven's 9th Symphony","Listen to the ninth and final masterpiece by Ludwig van Beethoven."
2,6:00 PM,15.00,"Baseball Game","Come watch the championship team play their archrival--No work stoppages, guaranteed"
I put quotes on the last two items in case there's ever a comma in there, it won't break the parsing.
For CSV files, I like to use the Microsoft.VisualBasic.FileIO.TextFieldParser class, which despite its name can be used in C#. Don't forget to add a reference to Microsoft.VisualBasic and a using directive (using Microsoft.VisualBasic.FileIO;).
The following code will allow you to parse the above CSV sample:
using (TextFieldParser parser = new TextFieldParser(path))
{
parser.Delimiters = new string[] {","};
parser.TextFieldType = FieldType.Delimited;
parser.HasFieldsEnclosedInQuotes = true;
string[] parsedLine;
while (!parser.EndOfData)
{
parsedLine = parser.ReadFields();
Event special = new Event();
special.Day = Convert.ToInt32(parsedLine[0]);
special.Time = Convert.ToDateTime(parsedLine[1]);
special.Price = Convert.ToDouble(parsedLine[2]);
special.StrEvent = parsedLine[3];
special.Description = parsedLine[4];
events.Add(special);
}
}
This still has some issues though - you would need to handle cases where there are missing fields, and I would recommend using TryParse methods instead of Convert.Toxxx, but it's a little easier (I think) than the non-delimited sample.
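Here is one way the missing-field and TryParse handling could look. This is only a sketch: the Event class is a hypothetical stand-in, the invariant culture is an assumption about the data, and rows that are malformed or fail conversion are simply skipped (you may prefer to log them). It reads from any TextReader so it can be tried without a file; it still needs the reference to Microsoft.VisualBasic.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical stand-in for the Event class used in the question.
public class Event
{
    public int Day { get; set; }
    public DateTime Time { get; set; }
    public double Price { get; set; }
    public string StrEvent { get; set; }
    public string Description { get; set; }
}

public static class CsvEventReader
{
    // Reads comma-delimited events, skipping rows that are malformed,
    // have fewer than five fields, or contain values that do not convert.
    public static List<Event> Read(TextReader input)
    {
        var events = new List<Event>();
        var culture = CultureInfo.InvariantCulture; // assumption about the number/time format

        using (var parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                string[] fields;
                try
                {
                    fields = parser.ReadFields();
                }
                catch (MalformedLineException)
                {
                    continue; // e.g. an unbalanced quote on this row
                }

                if (fields == null || fields.Length < 5) continue; // missing fields

                int day;
                DateTime time;
                double price;
                if (!int.TryParse(fields[0], NumberStyles.Integer, culture, out day) ||
                    !DateTime.TryParse(fields[1], culture, DateTimeStyles.None, out time) ||
                    !double.TryParse(fields[2], NumberStyles.Float, culture, out price))
                {
                    continue; // failed conversion
                }

                events.Add(new Event
                {
                    Day = day,
                    Time = time,
                    Price = price,
                    StrEvent = fields[3],
                    Description = fields[4]
                });
            }
        }
        return events;
    }

    public static void Main()
    {
        string csv =
            "1,8:00 PM,25.00,\"Beethoven's 9th Symphony\",\"Listen to the ninth symphony.\"\n" +
            "2,6:00 PM,15.00,\"Baseball Game\",\"No work stoppages, guaranteed\"\n";
        foreach (Event e in Read(new StringReader(csv)))
        {
            Console.WriteLine("{0}: {1}", e.Day, e.Description);
        }
    }
}
```

For a file on disk, pass new StreamReader(path) as the input.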
XML File (Using LINQ to XML)
Now let's try it with an XML file and use LINQ to XML to get the data:
<Events>
<Event>
<Day>1</Day>
<Time>8:00 PM</Time>
<Price>25.00</Price>
<Title><![CDATA[Beethoven's 9th Symphony]]></Title>
<Description><![CDATA[Listen to the ninth and final masterpiece by Ludwig van Beethoven.]]></Description>
</Event>
<Event>
<Day>2</Day>
<Time>6:00 PM</Time>
<Price>15.00</Price>
<Title><![CDATA[Baseball Game]]></Title>
<Description><![CDATA[Come watch the championship team play their archrival--No work stoppages, guaranteed]]></Description>
</Event>
</Events>
I've used CDATA for the title and description so that special characters won't break the XML parsing.
This is easily parsed into your Events by the following code:
XDocument doc = XDocument.Load(path);
List<Event> events = (from x in doc.Descendants("Event")
select new Event {
Day = Convert.ToInt32(x.Element("Day").Value),
Time = Convert.ToDateTime(x.Element("Time").Value),
Price = Convert.ToDouble(x.Element("Price").Value),
StrEvent = x.Element("Title").Value,
Description = x.Element("Description").Value
}).ToList();
Of course, this is still not perfect as you still have the possibility of conversion failures or missing elements.
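A sketch of how missing elements and failed conversions could be tolerated with LINQ to XML. Casting an XElement to string returns null when the element is missing instead of throwing, and the TryParse methods return false for null input. The Event class is again a hypothetical stand-in, the invariant culture is an assumption, and skipping bad entries (rather than reporting them) is just one possible policy.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Xml.Linq;

// Hypothetical stand-in for the Event class used in the question.
public class Event
{
    public int Day { get; set; }
    public DateTime Time { get; set; }
    public double Price { get; set; }
    public string StrEvent { get; set; }
    public string Description { get; set; }
}

public static class XmlEventReader
{
    // Builds events from <Event> elements, skipping any whose
    // Day, Time or Price is missing or cannot be converted.
    public static List<Event> Read(XDocument doc)
    {
        var culture = CultureInfo.InvariantCulture; // assumption about the data format
        var events = new List<Event>();

        foreach (XElement x in doc.Descendants("Event"))
        {
            // (string)element yields null for a missing element.
            string dayText = (string)x.Element("Day");
            string timeText = (string)x.Element("Time");
            string priceText = (string)x.Element("Price");

            int day;
            DateTime time;
            double price;
            if (!int.TryParse(dayText, NumberStyles.Integer, culture, out day) ||
                !DateTime.TryParse(timeText, culture, DateTimeStyles.None, out time) ||
                !double.TryParse(priceText, NumberStyles.Float, culture, out price))
            {
                continue;
            }

            events.Add(new Event
            {
                Day = day,
                Time = time,
                Price = price,
                StrEvent = (string)x.Element("Title") ?? string.Empty,
                Description = (string)x.Element("Description") ?? string.Empty
            });
        }
        return events;
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(
            "<Events><Event><Day>1</Day><Time>8:00 PM</Time><Price>25.00</Price>" +
            "<Title>Beethoven's 9th Symphony</Title><Description>Listen.</Description></Event></Events>");
        foreach (Event e in Read(doc))
        {
            Console.WriteLine("{0}: {1}", e.Day, e.StrEvent);
        }
    }
}
```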
Pipe-Delimited File Example
Per our discussion in the comments, if you want to use the pipe (|), you need to put each event (in its entirety) on one line, like this:
1|8:00 PM|25.00|Beethoven's 9th Symphony|Listen to the ninth and final masterpiece by Ludwig van Beethoven.
2|6:00 PM|15.00|Baseball Game|Come watch the championship team play their archrival--No work stoppages, guaranteed
You can still use the TextFieldParser example above if you like (just change the delimiter from , to |), or if you want you can use your original code with Split('|') instead of Split(' ').
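For the "original code" route, a minimal sketch of splitting one pipe-delimited row. It assumes no field ever contains a | character and that numbers and times are written in an invariant-culture style; SplitEventLine is a hypothetical helper name.

```csharp
using System;
using System.Globalization;

public static class PipeLineParser
{
    // Splits one pipe-delimited row into exactly five fields,
    // or returns null when the row does not have five fields.
    public static string[] SplitEventLine(string line)
    {
        string[] columns = line.Split('|');
        return columns.Length == 5 ? columns : null;
    }

    public static void Main()
    {
        string row = "1|8:00 PM|25.00|Beethoven's 9th Symphony|Listen to the ninth and final masterpiece by Ludwig van Beethoven.";
        string[] columns = SplitEventLine(row);
        if (columns != null)
        {
            var culture = CultureInfo.InvariantCulture;
            int day = Convert.ToInt32(columns[0], culture);
            DateTime time = Convert.ToDateTime(columns[1], culture);
            double price = Convert.ToDouble(columns[2], culture);
            Console.WriteLine("{0} {1:t} {2} {3}", day, time, price, columns[3]);
        }
    }
}
```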
Some Final Thoughts
I wanted to also address the original code and show why it wasn't working. The main reason was that you were reading one line at a time, and then splitting on ' '. This would have been a good start if all the fields were on the same line (although it still would have had problems because of spaces in the Time, StrEvent and Description fields), but they weren't.
So when you read the first line (which was 1) and split on ' ', you got one value back (1). When you tried to access the next element of the split array, you got the index out of range error because there was no columns[1] for that line.
Essentially, you were trying to treat each line as if it had all the fields in it, when in reality it was one field per line.

For your given sample file something like
string[] lines = File.ReadAllLines(path);
for (int index = 4; index < lines.Length; index += 5)
{
Event special = new Event();
special.Day = Convert.ToInt32(lines[index - 4]);
special.Time = Convert.ToDateTime(lines[index - 3]);
special.Price = Convert.ToDouble(lines[index - 2]);
special.StrEvent = lines[index - 1];
special.Description = lines[index];
events.Add(special);
}
Would do the job, but like Tim already mentioned, you should consider changing your file format.

Delimiters can be omitted if the column values never contain the separator character or if the columns have a fixed size. Under those conditions you can read the file and split the fields accordingly.
If you want to read from a file and load the data automatically into variables, I suggest serializing and deserializing the variables to a file, but that file won't be a plain text file!

Related

Using StreamReader to read a .text file and split it up into arrays/classes

I need to use something like StreamReader to read a .text file and spit it out into arrays that could be used for a pictureviewer and option boxes, etc. The layout of the text file is something like:
PhotoURL PAGEURL SKU# Option1 Option2 Option3 .etc
[Edit]:
example of text file
http://image.com/book.jpg google.com PG52389 Hardcover Ebook
http://item.com/shirt.jpg google.com SH34920 Small Medium Large
ExamplePhotoUrlHere google.com SE39270 Grey Black Red Blue
Not every item has every single option, so there are some blanks on certain columns.
I know I need to use streamreader to read the text file, but I'm not sure how to split it into a class with arrays and all that.

Possible starting point (just splits on spaces):
var linesAsArrays = File.ReadAllLines(absoluteFilePath).Select(line => line.Split()).ToArray();
You'll need to figure out what "column" means, because from your sample it is unclear how to separate one from another.
Note that it may be better option to find existing CSV reader instead of inventing your own.

I am not going to write a complete solution, but you could do something like this (untested)
char[] separators = { ' ', '\t' };
while(!streamReader.EndOfStream)
{
string line = streamReader.ReadLine();
string[] fields = line.Split(separators, 4);
var result = new
{
PhotoUrl = fields[0],
PageUrl = fields[1],
Sku = fields[2],
Options = fields[3].Split(separators),
};
// Use result
}
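To get from the anonymous type above to "a class with arrays", something along these lines might work. Product and ProductParser are hypothetical names, and the sketch assumes whitespace-separated columns with everything after the SKU being an option. Note that plain whitespace splitting cannot tell which option column is blank, so if the position of an option matters you would need a real delimiter (for example tabs) in the file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class holding one row of the file.
public class Product
{
    public string PhotoUrl { get; set; }
    public string PageUrl { get; set; }
    public string Sku { get; set; }
    public string[] Options { get; set; }
}

public static class ProductParser
{
    private static readonly char[] Separators = { ' ', '\t' };

    // Returns null when the row has fewer than the three mandatory columns.
    public static Product ParseLine(string line)
    {
        string[] fields = line.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        if (fields.Length < 3) return null;

        return new Product
        {
            PhotoUrl = fields[0],
            PageUrl = fields[1],
            Sku = fields[2],
            Options = fields.Skip(3).ToArray() // zero or more options
        };
    }

    public static void Main()
    {
        string[] rows =
        {
            "http://image.com/book.jpg google.com PG52389 Hardcover Ebook",
            "http://item.com/shirt.jpg google.com SH34920 Small Medium Large"
        };
        List<Product> products = rows.Select(ParseLine).Where(p => p != null).ToList();
        foreach (Product p in products)
        {
            Console.WriteLine("{0}: {1}", p.Sku, string.Join(", ", p.Options));
        }
    }
}
```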

How to traverse back and forth of text file line

I have a text file which contains lines that I need to process. Here is the format of the lines in my text file:
07 IVIN 15:37 06/03 022 00:00:14 600 2265507967 0:03
08 ITRS 15:37 06/03 022 00:00:09 603 7878787887 0:03
08 ITRS 15:37 06/03 022 00:00:09 603 2265507967 0:03
I have to read this text file line by line. As soon as I find ITRS in a line, I have to search the lines immediately above it for the number 2265507967. As soon as that number is found in one of the lines above, that line should be read.
I am currently reading each line into a string and splitting it on spaces. Here is my code:
var strings = line.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
My problem is that I cannot find a way to traverse back up through the text file and search for the substring, i.e. 2265507967. Please help.

I am not aware of being able to go backwards when reading a file (other than using the seek() method) but I might be wrong...
A simpler approach would be to:
1. Create a dictionary whose keys are the long numeric values and whose values are the lines they belong to: <2265507967,07 IVIN 15:37 06/03 022 00:00:14 600 2265507967 0:03>
2. Go through the file one line at a time and:
a. If the line contains ITRS, get the number from the line and look it up in your dictionary. Once you have found it, clear the dictionary and go back to step 1.
b. If it does not contain ITRS, simply add the number and the line as a key-value pair.
This should be quicker than going through one line at a time and also simpler. The drawback would be that it could be quite memory intensive.
EDIT: I do not have a .NET compiler handy, so treat the following as an untested sketch that better explains my answer:
// Requires: using System; using System.Collections.Generic; using System.IO;
// Initialization
Dictionary<string, string> previousLines = new Dictionary<string, string>();
using (StreamReader reader = new StreamReader(filePath))
{
    string line;
    // Read the file one line at a time
    while ((line = reader.ReadLine()) != null)
    {
        // In the sample lines the long number is the 8th space-separated field (index 7)
        string[] fields = line.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (fields.Length < 8) continue; // not a line in the expected format
        string number = fields[7];
        if (line.Contains("ITRS"))
        {
            // Use the dictionary to fetch a line you have previously read.
            string previousLine;
            if (previousLines.TryGetValue(number, out previousLine))
            {
                // Do your logic with previousLine here.
                // Remove the elements so that they do not interfere with the next searches. I am assuming that you want to search between lines which are found between ITRS tags. If you do not want this, simply omit this line.
                previousLines.Clear();
            }
        }
        else
        {
            // The indexer overwrites an earlier line with the same number instead of throwing like Add
            previousLines[number] = line;
        }
    }
}

Comparing excel sheet to text file

I have the following data from an excel sheet:
06:07:00 6:07
Data1
Data2
Data3
Data4
06:15:00 06:15
Data5
Data6
Data7
Data8
I want to compare this to the following data from text file:
XXXXXXXXXX 06:08:32 13.0 Data1
XXXXXXXXXX 06:08:45 6.0 Data2
xxxxxxxxxx 06:08:51 5.0 Data3
xxxxxxxxxx 06:08:56 13.0 Data4
xxxxxxxxxx 06:13:44 9.0 Data5
xxxxxxxxxx 06:13:53 11.0 Data6
xxxxxxxxxx 06:14:04 6.0 Data7
xxxxxxxxxx 06:14:10 13.0 Data8
I want to use the time to compare the two files (Excel with text), but the time is different for each group: group 1 (Data1 to Data4) and group 2 (Data5 to Data8).
Does anyone have any idea how to approach this?
EDIT1:
Here is what I tried to do:
private void doTest(string time)
{
TimeSpan ts = TimeSpan.Parse(time);
int hours = ts.Hours;
int min = ts.Minutes;
int sec = ts.Seconds;
int minstart, minend;
string str;
minstart = min - 5;
minend = min + 5;
while (min != minend)
{
sec = sec + 1;
if (sec < 60)
{
if (hours < 10)
str = hours.ToString().PadLeft(2, '0');
else str = hours.ToString();
if (minstart < 10)
str = str + minstart.ToString().PadLeft(2, '0');
else str = str + minstart.ToString();
if (sec < 10)
str = str + sec.ToString().PadLeft(2, '0');
else str = str + sec.ToString();
chkwithtext(str);
}
else if (sec == 60)
{
sec = 00;
min = min + 1;
str = hours.ToString() + min.ToString() + sec.ToString();
chkwithtext(str);
}
}
}
private void chkwithtext(string str)
{
// check with the text file here if time doesn't match go
// back increment the time with 1sec and then check here again
}

It's not precisely clear how you are 'comparing' the times, but for this answer I'll make the assumption that data from the text file is to be compared if, and only if, its timestamp is within x minutes (defaulting to x = 5) of the Excel timestamp.
My recommendation would be to use an Excel add-in called Schematiq for this - you can download this (approx. 9MB) from http://schematiq.htilabs.com/ (see screenshots below). It's free for personal, non-commercial use. (Disclaimer: I work for HTI Labs, the authors of Schematiq.)
However, I'd do the time handling in Excel. First we'll calculate the start/stop limits for the Excel timestamps. For example, for the first time (06:07:00) we want the range 6:02-6:12. We'll also break the actual, 'start' and 'end' times into hours, minutes and seconds for ease later on. The Excel data sheet looks like this:
Next we need a Schematiq 'template function' which will take the start and end times and return us a range of times. This template is shown here:
The input values to this function are effectively 'dummy' values - the function is compiled internally by Schematiq and can then be called with whatever inputs are required. The 'Result' cell contains text starting with '~#...' (and likewise several of the previous cells) - this indicates a Schematiq data-link containing a table, function or other structure. To view it, you can click the cell and look in the Schematiq Viewer which appears as a task pane within Excel like this:
In other words, Schematiq allows you to hold an entire table of data within a single cell.
Now everything is set up, we simply import the text file and get Schematiq to do the work for us. For each 'time group' within the Excel data, a suitable range of times is generated and this is matched against the text file. You are returned all matching data, plus any unmatched data from both Excel and the text file. The necessary calculations are shown here:
Your Excel worksheet is therefore tiny, and clicking on the final cell will display the final results in the Schematiq Viewer. The results, including the Excel data and the 'template calculation', are shown here:
To be clear, what you see in this screenshot is the entire contents of the workbook - there are no other calculations taking place anywhere other than in the actual cells you see.
The 'final results' themselves are shown enlarged here:
This is exactly the comparison you're after (with a deliberately introduced error - Data9 - in the text file, to demonstrate the matching). You can then carry out whatever comparisons or further analysis you need to.
All of the data-links represent the use of Schematiq functions - the syntax is very similar to Excel and therefore easy to pick up. As an example, the call in the final cell is:
=tbl.SelectColumns(D21, {"Data","Text file"}, TRUE)
This selects all columns from the Schematiq table in cell D21 apart from the 'Data' and 'Text file' columns (the final Boolean argument to this function indicates 'all but').
I'd recommend downloading Schematiq and trying this for yourself - I'd be very happy to email you a copy of the workbook I've put together, so it should just run immediately.

I'm not sure I understand what you mean, but I'd start by exporting the Excel file to CSV with a ; separator, which is much easier to work with. Then a simple container class:
public class DataTimeContainer
{
public string Data;
public string TimeValue1 = string.Empty;
public string TimeValue2 = string.Empty;
}
And use it this way:
//Processing first file
List<DataTimeContainer> Container1 = new List<DataTimeContainer>();
string[] lines = File.ReadAllLines("c:\\data1.csv");
string groupTimeValue1 = string.Empty;
string groupTimeValue2 = string.Empty;
foreach (string[] fields in lines.Select(l => l.Split(';')))
{
//iterating over every line, split by the ';' delimiter
if (!string.IsNullOrWhiteSpace(fields[0]))
{
//we're in a line having both values, like:
//06:07:00 ; 6:07
groupTimeValue1 = fields[0];
groupTimeValue2 = fields[1];
}
else
//we're in line looking like this:
// ; DataX
Container1.Add(new DataTimeContainer(){Data = fields[1], TimeValue1 = groupTimeValue1, TimeValue2 = groupTimeValue2});
}
//Processing second file
List<DataTimeContainer> Container2 = new List<DataTimeContainer>();
lines = File.ReadAllLines("c:\\data2.txt");
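//the sample text file is space-separated, so split on spaces here
foreach (string[] fields in lines.Select(l => l.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)))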
{
Container2.Add(new DataTimeContainer() { TimeValue1 = fields[1], TimeValue2 = fields[2], Data = fields[3]});
}
DoSomeComparison();
Of course I'm using strings as data types because I do not know what kind of objects they're supposed to be. Let me know how's that working for you.

If this is a one-time comparison, I would recommend just pulling the text file into Excel (using the Text-to-Columns tools if needed) and running a comparison there with the built-in functions.
If however you need to do this frequently, something like Tarec suggested would be a good start. It seems like you're trying to compare separate event logs within a given timespan (?) - your life will be easier if you parse to objects with DateTime properties instead of comparing text strings.
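To illustrate the "parse to objects" suggestion, here is a small sketch that parses the times as TimeSpan values and keeps the text-file entries that fall within a window around an Excel group time. The 5-minute window, the names TimeWindowMatcher and LabelsNear, and the assumed "id time value label" layout of the text file are all assumptions taken from the sample data, not a definitive reading of the requirement.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class TimeWindowMatcher
{
    // True when the two times differ by no more than the window.
    public static bool IsWithinWindow(TimeSpan groupTime, TimeSpan logTime, TimeSpan window)
    {
        return (groupTime - logTime).Duration() <= window;
    }

    // Returns the labels (last column) of the log lines whose time lies
    // within the window around groupTime. Lines that do not parse are skipped.
    public static List<string> LabelsNear(string groupTime, IEnumerable<string> logLines, TimeSpan window)
    {
        TimeSpan target = TimeSpan.Parse(groupTime, CultureInfo.InvariantCulture);
        var labels = new List<string>();

        foreach (string line in logLines)
        {
            string[] fields = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            TimeSpan logTime;
            if (fields.Length < 4 ||
                !TimeSpan.TryParse(fields[1], CultureInfo.InvariantCulture, out logTime))
            {
                continue;
            }
            if (IsWithinWindow(target, logTime, window))
            {
                labels.Add(fields[3]);
            }
        }
        return labels;
    }

    public static void Main()
    {
        string[] log =
        {
            "XXXXXXXXXX 06:08:32 13.0 Data1",
            "XXXXXXXXXX 06:08:45 6.0 Data2",
            "xxxxxxxxxx 06:13:44 9.0 Data5"
        };
        List<string> near = LabelsNear("06:07:00", log, TimeSpan.FromMinutes(5));
        Console.WriteLine(string.Join(", ", near));
    }
}
```

The labels returned for each group can then be compared with the Data rows listed under that group in the Excel sheet.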

Populate the data from your two sources (Excel and the text file) into two lists.
Make sure that the lists are of the same type.
I would recommend converting your Excel data to the text file format, and then reading each line of the text file and of the Excel data into a List<string>.
You can then compare the lists using LINQ or the Enumerable methods.
Quickest way to compare two List<>
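A minimal sketch of that comparison using Enumerable.Except and Enumerable.Intersect on two string lists. ListComparer and its method names are hypothetical. Both methods treat the lists as sets, so duplicates collapse and order within the second list is ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ListComparer
{
    // Items that appear in first but not in second.
    public static List<string> OnlyInFirst(IEnumerable<string> first, IEnumerable<string> second)
    {
        return first.Except(second).ToList();
    }

    // Items that appear in both lists.
    public static List<string> InBoth(IEnumerable<string> first, IEnumerable<string> second)
    {
        return first.Intersect(second).ToList();
    }

    public static void Main()
    {
        var fromExcel = new List<string> { "Data1", "Data2", "Data3", "Data4" };
        var fromText = new List<string> { "Data1", "Data2", "Data4", "Data9" };

        Console.WriteLine("Missing from text file: " + string.Join(", ", OnlyInFirst(fromExcel, fromText)));
        Console.WriteLine("Unexpected in text file: " + string.Join(", ", OnlyInFirst(fromText, fromExcel)));
        Console.WriteLine("Matched: " + string.Join(", ", InBoth(fromExcel, fromText)));
    }
}
```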

How to handle quotation marks within CSV files?

To read a CSV file, I use the following statement:
var query = from line in rawLines
let data = line.Split(';')
select new
{
col01 = data[0],
col02 = data[1],
col03 = data[2]
};
The CSV file I want to read is malformed in that an entry can contain the separator ; itself as data when it is surrounded by quotation marks.
Example:
col01;col02;col03
data01;"data02;";data03
My read statement above does not work here, since it interprets the second row as four columns.
Question: Is there an easy way to handle this malformed CSV correctly? Perhaps with another LINQ query?

Just use a CSV parser and STOP ROLLING YOUR OWN:
using (var parser = new TextFieldParser("test.csv"))
{
parser.CommentTokens = new string[] { "#" };
parser.SetDelimiters(new string[] { ";" });
parser.HasFieldsEnclosedInQuotes = true;
// Skip over header line.
parser.ReadLine();
while (!parser.EndOfData)
{
string[] fields = parser.ReadFields();
Console.WriteLine("{0} {1} {2}", fields[0], fields[1], fields[2]);
}
}
TextFieldParser is built in .NET. Just add reference to the Microsoft.VisualBasic assembly and you are good to go. A real CSV parser will happily handle this situation.

Parsing CSV files manually can always lead to issues like this. I would advise that you use a third party tool like CsvHelper to handle the parsing.
Furthermore, it's not a good idea to explicitly parse commas, as your separator can be overridden in your computer's environment options.
Let me know if I can help further,
Matt

It's not very elegant, but after using your method you can check whether any colxx contains an unfinished (single) quotation mark and, if so, join it with the next colxx.
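A sketch of that joining idea: split on the separator, then keep gluing pieces together while the accumulated text contains an odd number of quotation marks. QuotedSplitter and SplitQuoted are hypothetical names, and the handling of doubled quotes ("") as an escaped quote is an assumption about the file. It does not handle quoted fields that span several lines, which a real CSV parser would.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QuotedSplitter
{
    // Splits a line on the separator, re-joining pieces that belong
    // to the same quoted field (odd number of quotes so far).
    public static List<string> SplitQuoted(string line, char separator = ';')
    {
        var fields = new List<string>();
        string pending = null;

        foreach (string piece in line.Split(separator))
        {
            pending = pending == null ? piece : pending + separator + piece;

            // An even quote count means every opened quote has been closed.
            if (pending.Count(c => c == '"') % 2 == 0)
            {
                fields.Add(Unquote(pending));
                pending = null;
            }
        }

        // Unterminated quote at end of line: keep the raw text rather than lose it.
        if (pending != null) fields.Add(pending);
        return fields;
    }

    // Removes surrounding quotes and un-escapes doubled quotes.
    private static string Unquote(string field)
    {
        if (field.Length >= 2 && field[0] == '"' && field[field.Length - 1] == '"')
        {
            return field.Substring(1, field.Length - 2).Replace("\"\"", "\"");
        }
        return field;
    }

    public static void Main()
    {
        foreach (string field in SplitQuoted("data01;\"data02;\";data03"))
        {
            Console.WriteLine("[" + field + "]");
        }
    }
}
```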

How to read barcode from text file by specified place in C#?

0000016011071693266104*014482*3 15301 45 VETRO NOVA BLUVETRO NOVA BLUE FLAT STRETCH 115428815150010050 05420 000033 0003
0000072011076993266101*014687*4 15300 45 VETRO NOVA BLUVETRO NOVA BLUE FLAT STRETCH 115428815160010030 05430 000032 0007
I have a text file which includes many barcode codes line by line, and as you see in above string format are company codes and others show other things.
So how can I read this text line by line and character by character in C#?

For reading it line by line you can use a StreamReader - see for example on MSDN http://msdn.microsoft.com/en-us/library/db5x7c0d.aspx
Another option is:
string[] AllLines = File.ReadAllLines(@"C:\MyFile.txt");
This gives you all lines in a string array and you can work with them - this uses more memory but is faster... see for example http://msdn.microsoft.com/en-us/library/s2tte0y1.aspx
When you have a line in a string you can split that line, for example:
string[] MyFields = AllLines[1].Split(null); // since your fields seem to be separated by whitespace
The result is that you have the parts of the line in an array and can access for example the second field in the line with MyFields[1] - see http://msdn.microsoft.com/en-us/library/b873y76a.aspx
EDIT - as per comment another option:
IF you exactly know the positions and lengths of your fields you can do this:
string MyIdentity = AllLines[1].Substring(1, 5); // zero-based start index, so this takes 5 characters beginning at the second character
For MSDN reference see http://msdn.microsoft.com/en-us/library/aka44szs.aspx
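Substring throws an ArgumentOutOfRangeException when the line is shorter than expected, so for fixed-position fields a small guard can help. The helper below is only a sketch: FixedWidth and Slice are hypothetical names, and the positions in Main are made up for illustration, since the real field layout of the barcode lines is not given.

```csharp
using System;

public static class FixedWidth
{
    // Returns up to "length" characters starting at zero-based "start".
    // Returns an empty string when start lies outside the line.
    public static string Slice(string line, int start, int length)
    {
        if (line == null || start < 0 || length < 0 || start >= line.Length)
        {
            return string.Empty;
        }
        int available = Math.Min(length, line.Length - start);
        return line.Substring(start, available);
    }

    public static void Main()
    {
        string line = "0000016011071693266104*014482*3 15301 45 VETRO NOVA BLU";
        // Hypothetical layout: a 5-character field starting at the second character.
        Console.WriteLine(Slice(line, 1, 5));
        // A start position past the end of the line yields an empty string instead of an exception.
        Console.WriteLine(Slice(line, 200, 5).Length);
    }
}
```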

You use Microsoft libraries dedicated to files and streams to open a file, and ReadLine().
Then you use Microsoft libraries dedicated to parsing to parse those lines.
You create, with Microsoft libraries, a regular expression to detect bar codes (not borcod...)
Then you throw away anything that doesn't match your regular expression.
Then you compile and debug (you can use Mono). And voilà, you have a C# program that solves your problem.
Note: you definitely don't need to go "character by character". Microsoft libraries and parsing will be much easier for your simple need.
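A rough sketch of the regular expression step. The question does not say which part of each line is the barcode, so the pattern below is purely an assumption: it treats the digits between the two asterisks (for example *014482*) as the value of interest. BarcodeFinder and FindCode are hypothetical names; swap in whatever pattern matches your real barcode format.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BarcodeFinder
{
    // Assumed pattern: one or more digits enclosed in asterisks.
    private static readonly Regex CodePattern = new Regex(@"\*(\d+)\*");

    // Returns the digits between the asterisks, or null when the line has no match.
    public static string FindCode(string line)
    {
        Match match = CodePattern.Match(line);
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        string[] lines =
        {
            "0000016011071693266104*014482*3 15301 45 VETRO NOVA BLU",
            "a line without the assumed pattern"
        };
        foreach (string line in lines)
        {
            string code = FindCode(line);
            // Lines that do not match are thrown away, as described above.
            if (code != null) Console.WriteLine(code);
        }
    }
}
```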

If all you are after is reading it line-by-line, and character-by-character, then this is a possible solution:
var lines = File.ReadLines(@"pathtotextfile.txt");
foreach (var line in lines)
{
foreach (var character in line)
{
char individualCharacter = character;
}
}
If you need to know which line and character you are on; you can use a for loop instead:
var lines = File.ReadAllLines(@"pathtotextfile.txt");
for (var i = 0; i < lines.Length; i++)
{
var line = lines[i];
for(var j = 0; j < line.Length; j++)
{
var character = line[j];
}
}
Or use SelectMany in LINQ:
var lines = File.ReadLines(@"pathtotextfile.txt");
foreach (char individualCharacter in lines.SelectMany(line => line))
{
}
Now, as far as my opinion goes, doing it "line by line" and "character by character" seems like a difficult choice to me. If you can tell us what exactly each bit of information is in the barcode, we could help you extract it that way.
