reading a CSV issue - c#

I am trying to read a CSV file. Here is a sample line:
"0734306547 ","9780734306548 ","Jane Eyre Pink PP ","Bronte Charlotte ","FRONT LIST",20/03/2013 0:00:00,0,"PAPERBACK","Y","Pen"
Here is the code I am using to read the CSV:
public void readCSV()
{
StreamReader reader = new StreamReader(File.OpenRead(@"C:\abc\21-08-2013\PNZdatafeed.csv"), Encoding.ASCII);
List<string> ISBN = new List<String>();
while (!reader.EndOfStream)
{
string line = reader.ReadLine();
if (!String.IsNullOrWhiteSpace(line))
{
string[] values = line.Split(',');
if (values[9] == "Pen")
{
ISBN.Add(values[1]);
}
}
}
MessageBox.Show(ISBN.Count().ToString());
}
I am not able to compare the value with if (values[9] == "Pen"), because when I debug the code it shows the value of values[9] as "\"Pen\"", that is, the quotation marks are part of the string.
How do I get rid of these extra characters?

The problem here is that you split the line at every comma and then use the pieces exactly as they are. For example, if this is the line you're reading in:
"A","B","C"
and you split it at commas, you'll get "A", "B", and "C" as your data, quotes included. According to your description, you don't want quotes around the data.
To throw away quotes around a string:
Check if the leftmost character is ".
If so, check if the rightmost character is ".
If so, remove the leftmost and rightmost characters.
In pseudocode:
if (data.left(1) == "\"" && data.right(1) == "\"") {
data = data.trimleft(1).trimright(1)
}
At this point you might have a few questions (I'm not sure how much experience you have). If any of these apply to you, feel free to ask them, and I'll explain further.
What does "\"" mean?
How do I extract the leftmost/rightmost character of a string?
How do I extract the middle of a string?
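The steps above could look roughly like this in C#. This is only a sketch: the class and method names (CsvFieldCleaner, SplitAndClean, CleanField) are made up for illustration, and it assumes that no field contains a comma or an escaped quote inside its quotes. It also trims the trailing spaces that the sample line has inside the quotes.

```csharp
using System;
using System.Linq;

public static class CsvFieldCleaner
{
    // Splits one CSV line at commas and cleans every field.
    // Assumption: fields never contain a comma or an escaped quote.
    public static string[] SplitAndClean(string line)
    {
        return line.Split(',').Select(CleanField).ToArray();
    }

    // Removes one pair of surrounding double quotes (if present)
    // and any white space around or just inside them.
    public static string CleanField(string field)
    {
        string value = field.Trim();
        if (value.Length >= 2 && value[0] == '"' && value[value.Length - 1] == '"')
        {
            value = value.Substring(1, value.Length - 2);
        }
        return value.Trim();
    }
}
```

With this, the check in the question would become something like if (CsvFieldCleaner.SplitAndClean(line)[9] == "Pen"). If the real data can contain commas inside quoted fields, a plain Split(',') will not be enough and a proper CSV parser would be the safer choice.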

Related

How to read a portion of a line in a text file?

So, I have a text file with thousands of lines formatted similarly to this:
123456:0.8525000:1590882780:91011
These files are almost always a different length, and I only need to read the first two parts of the line, being 123456:0.8525000.
I know that I can split each line using C#, but I'm unsure how to only read the first 2 parts. Anyone have any idea on how to do this? Sorry if my question doesn't make sense, I can restate it if needed.

The Split function returns a string[], an array of strings.
Just take the first two elements of the result of Split (with : as the separator).
var read = "123456:0.8525000:1590882780:91011";
var values = read.Split(':'); // the char overload works on both .NET Framework and .NET Core
Console.WriteLine(values[0]); // 123456
Console.WriteLine(values[1]); // 0.8525000
Don't forget that the elements of values are strings, not yet int or double values. See "How to convert string to integer in C#" for how to convert a string to a numeric type.
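As a rough illustration of that conversion step, here is a sketch that splits a line and parses the first two parts. The names LineParser and TryParseFirstTwo are hypothetical, and treating the first part as an int and the second as a double is an assumption based on the sample line only.

```csharp
using System;
using System.Globalization;

public static class LineParser
{
    // Parses the first two colon-separated parts of a line.
    // Returns false if there are fewer than two parts or a part is not numeric.
    public static bool TryParseFirstTwo(string line, out int id, out double value)
    {
        id = 0;
        value = 0;
        string[] parts = line.Split(':');
        if (parts.Length < 2)
        {
            return false;
        }
        // InvariantCulture makes "0.8525000" parse the same way on every machine.
        return int.TryParse(parts[0], NumberStyles.Integer, CultureInfo.InvariantCulture, out id)
            && double.TryParse(parts[1], NumberStyles.Float, CultureInfo.InvariantCulture, out value);
    }
}
```

TryParse is used instead of Parse so that a malformed line gives a false return value rather than an exception.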

There are many ways to do this. I am going to suggest options that read the full line, since that is much easier to work with and understand, and your lines vary in length. I did add a suggestion at the end, in an addendum, on reading a file character by character with StreamReader, but with that approach you would need workarounds for skipping the parts you don't want, restarting the character loop on each new line, and so on.
I first demonstrate IAsyncEnumerable, available since .NET Core 3.x, followed by a similar string-based approach. The int example is slightly advanced and asynchronous; I hope it also helps others by showing a fairly modern approach as of 2020. Streaming out only the data you need keeps the code fast and the memory footprint low.
public static async IAsyncEnumerable<int> StreamFileOutAsIntsAsync(string filePathName)
{
if (string.IsNullOrWhiteSpace(filePathName)) throw new ArgumentNullException(nameof(filePathName));
if (!File.Exists(filePathName)) throw new ArgumentException($"{filePathName} is not a valid file path.");
using var streamReader = File.OpenText(filePathName);
string currentLine;
while ((currentLine = await streamReader.ReadLineAsync().ConfigureAwait(false)) != null)
{
if (int.TryParse(currentLine.AsSpan(), out var output))
{
yield return output;
}
}
}
This streams every int out of a file, after checking that the path is not null or blank and that the file exists.
Streaming may be too much for a beginner, and I don't know your level, so you may want to start by just turning the file into a list of strings.
The following modifies the example above into something less complex that also splits the strings for you. I still recommend learning about streaming so that you don't hold every string in memory while you work on it, unless you actually want them all. I am not here to judge.
Once you have a line from the file as a string, you can do whatever else needs to be done with it.
public static async Task<List<string>> GetStringsFromFileAsync(string filePathName)
{
if (string.IsNullOrWhiteSpace(filePathName)) throw new ArgumentNullException(nameof(filePathName));
if (!File.Exists(filePathName)) throw new ArgumentException($"{filePathName} is not a valid file path.");
using var streamReader = File.OpenText(filePathName);
string currentLine;
var strings = new List<string>();
while ((currentLine = await streamReader.ReadLineAsync().ConfigureAwait(false)) != null)
{
var lineAsArray = currentLine.Split(new string[] { ":" }, StringSplitOptions.RemoveEmptyEntries);
// Simple Data Validation
if (lineAsArray.Length == 4)
{
strings.Add($"{lineAsArray[0]}:{lineAsArray[1]}");
strings.Add($"{lineAsArray[2]}:{lineAsArray[3]}");
}
}
return strings;
}
The core of the code is really simple. First, open the file for reading:
using var streamReader = File.OpenText(filePathName);
and then loop through that file...
while ((currentLine = await streamReader.ReadLineAsync()) != null)
{
var lineAsArray = currentLine.Split(new string[] { ":" }, StringSplitOptions.RemoveEmptyEntries);
// Simple Data Validation
if (lineAsArray.Length == 4)
{
// Do whatever you need to do with the first bits of information.
// In this case, we add them all to a list for return.
strings.Add($"{lineAsArray[0]}:{lineAsArray[1]}");
strings.Add($"{lineAsArray[2]}:{lineAsArray[3]}");
}
}
For every line read from the file, the code splits it at the ":" character, discarding empty entries, and only continues if there are exactly four parts.
It then uses a C# feature called string interpolation ($"") to join the first two parts back together with ":", and then the last two. At that point you can do whatever you need with each part of the line.
That's really all there is to it! Hope it helps.
Addendum: if you really need to read only parts of a file, use StreamReader.Read() and StreamReader.Peek():
using (var sr = new StreamReader(path))
{
while (sr.Peek() >= 0)
{
Console.Write((char)sr.Read());
}
}
This reads and prints the file one character at a time.

Some bare bones code:
string fileName = @"c:\some folder\path\file.txt";
using (StreamReader sr = new StreamReader(fileName))
{
while (!sr.EndOfStream)
{
String[] values = sr.ReadLine().Split(":".ToCharArray());
if (values.Length >= 2)
{
// ... do something with values[0] and values[1] ...
Console.WriteLine(values[0] + ", " + values[1]);
}
}
}

Finding multiple semi predictable patterns in a string

I'm writing an application that needs to extract a VAT number from an invoice (https://en.wikipedia.org/wiki/VAT_identification_number).
The biggest challenge is that, as the Wikipedia article I linked shows, each country uses its own format for these VAT numbers (the Netherlands uses a 14-character number, while Germany uses an 11-character number).
To extract these numbers, I put every line of the invoice into an array of strings. For each string I test whether its length equals that of one of the VAT formats and, if so, whether the string also contains a country code ("NL", "DE", etc.).
string[] ProcessedFile = Reader.ProcessFile(Input);
foreach(string S in ProcessedFile)
{
RtBEditor.AppendText(S + "\n");
}
foreach(string X in ProcessedFile)
{
string S = X.Replace(" ", string.Empty);
if (S.Length == 7)
{
if (S.Contains("GBGD"))
{
MessageBox.Show("Land = Groot Britanie (Regering)");
}
}
/*
Repeat for all other lengths and country codes.
*/
}
This code has two problems.
First, if a string happens to have the same length as one of the VAT formats and has a country code embedded in it, the code will incorrectly conclude that it has found a VAT number.
Second, in some cases the VAT number appears as "VAT-number: [VAT-number]". The text that precedes the actual number then counts toward the length, so the program fails to detect the VAT number.
I assume the best fix is to isolate the VAT number from the rest of the string somehow, but I have yet to find a way to do this.
Does anyone know a potential solution?
Many thanks in advance!
EDIT:
Added a dummy invoice to clarify what kind of data is contained within the invoices.

As someone pointed out in the comments, the best way to fix this is to use a regular expression. After some experimenting I arrived at the following solution:
public Regex FilterNormaal = new Regex(@"[A-Z]{2}(\d)+B?\d*");
private void BtnUitlezen_Click(object sender, EventArgs e)
{
RtBEditor.Clear();
/*
Temp dummy vatcodes for initial testing.
*/
Form1.Dummy1.VAT = "NL855291886B01";
Form1.Dummy2.VAT = "DE483270846";
Form1.Dummy3.VAT = "SE482167803501";
OCR Reader = new OCR();
/*
Grab and process image
*/
if(openFileDialog1.ShowDialog() == DialogResult.OK)
{
try
{
Input = new Bitmap(openFileDialog1.FileName);
}
catch
{
MessageBox.Show("Please open an image file.");
}
}
string[] ProcessedFile = Reader.ProcessFile(Input);
foreach(string S in ProcessedFile)
{
string X = S.Replace(" ", string.Empty);
RtBEditor.AppendText(X + "\n");
}
foreach (Match M in FilterNormaal.Matches(RtBEditor.Text))
{
MessageBox.Show(M.Value);
}
}
At first, I attempted to iterate through my array of strings to find a match, but for reasons unknown, this did not yield any results. When applying the regex to the entire textbox, it did output the results I needed.
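For anyone who wants to try the pattern without the OCR and WinForms parts, here is a stripped-down sketch. VatExtractor and FindVatNumbers are names invented for this example; the pattern is the one from the code above (without the capture group), and spaces are removed from each line before matching, as the button handler does before it fills the text box.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VatExtractor
{
    // Two capital letters, one or more digits, an optional 'B', then more digits.
    private static readonly Regex VatPattern = new Regex(@"[A-Z]{2}\d+B?\d*");

    // Returns every substring of the given lines that matches the pattern.
    public static List<string> FindVatNumbers(IEnumerable<string> lines)
    {
        var found = new List<string>();
        foreach (string line in lines)
        {
            // Remove spaces first so that "DE 483270846" still matches.
            string compact = line.Replace(" ", string.Empty);
            foreach (Match match in VatPattern.Matches(compact))
            {
                found.Add(match.Value);
            }
        }
        return found;
    }
}
```

Because the regex looks for the pattern anywhere in the line, a prefix such as "VAT-number: " no longer gets in the way. Keep in mind that the pattern is loose: any two capital letters followed by digits will match, so real invoices may need a stricter, per-country pattern.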

Text file line by line into string array

I need help taking a large text document (about 1,000 lines) and putting it into a string array, line by line.
Example:
string[] s = {firstLineHere, Secondline, etc};
I also want to look at the first word, and only the first word, of each line and, when that word is found, copy the entire line.

You can accomplish this with File.ReadAllLines combined with a little LINQ (this also covers the addition to the question made in the comments on Praveen's answer).
string[] identifiers = { /* Your identifiers for needed lines */ };
string[] allLines = File.ReadAllLines(@"C:\test.txt");
// Split(' ')[0] is the first word of the line (or the whole line if it contains no space).
string[] neededLines = allLines.Where(c => identifiers.Contains(c.Split(' ')[0])).ToArray();
Or as a one-liner:
string[] lines = File.ReadAllLines("your path").Where(c => identifiers.Contains(c.Split(' ')[0])).ToArray();
This gives you an array of all the lines in your document whose first word is one of the keywords you define in the identifiers array. Both snippets need using System.IO; and using System.Linq;.

There is a built-in method that does what you need:
string[] lines = System.IO.File.ReadAllLines(@"C:\sample.txt");
If you want to read the file line by line instead:
List<string> lines = new List<string>();
using (StreamReader reader = new StreamReader(@"C:\sample.txt"))
{
while (reader.Peek() >= 0)
{
string line = reader.ReadLine();
//Add your conditional logic to add the line to an array
if (line.Contains(searchTerm)) {
lines.Add(line);
}
}
}

Another option is to read each line, split it into segments, and compare only the first element against the provided search term. A complete working demonstration follows:
Solution:
class Program
{
static void Main(string[] args)
{
// Get all lines that start with a given word from a file
var result = GetLinesWithWord("The", "temp.txt");
// Display the results.
foreach (var line in result)
{
Console.WriteLine(line + "\r");
}
Console.ReadLine();
}
public static List<string> GetLinesWithWord(string word, string filename)
{
List<string> result = new List<string>(); // A list of strings where the first word of each is the provided search term.
// Create a stream reader object to read a text file.
using (StreamReader reader = new StreamReader(filename))
{
string line = string.Empty; // Contains a single line returned by the stream reader object.
// While there are lines in the file, read a line into the line variable.
while ((line = reader.ReadLine()) != null)
{
// If the line is empty, there are no words to compare against, so move to the next line.
if (line != string.Empty)
{
// Split the line into parts by a white space delimiter.
var parts = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
// Get only the first word of the line, trim off any additional white space
// and convert it to lowercase. Compare the word to the search term provided.
// If they are the same, add the line to the results list.
if (parts.Length > 0)
{
if (parts[0].ToLower().Trim() == word.ToLower().Trim())
{
result.Add(line);
}
}
}
}
}
return result;
}
}
Where the sample text file may contain:
How shall I know thee in the sphere which keeps
The disembodied spirits of the dead,
When all of thee that time could wither sleeps
And perishes among the dust we tread?
For I shall feel the sting of ceaseless pain
If there I meet thy gentle presence not;
Nor hear the voice I love, nor read again
In thy serenest eyes the tender thought.
Will not thy own meek heart demand me there?
That heart whose fondest throbs to me were given?
My name on earth was ever in thy prayer,
Shall it be banished from thy tongue in heaven?
In meadows fanned by heaven's life-breathing wind,
In the resplendence of that glorious sphere,
And larger movements of the unfettered mind,
Wilt thou forget the love that joined us here?
The love that lived through all the stormy past,
And meekly with my harsher nature bore,
And deeper grew, and tenderer to the last,
Shall it expire with life, and be no more?
A happier lot than mine, and larger light,
Await thee there; for thou hast bowed thy will
In cheerful homage to the rule of right,
And lovest all, and renderest good for ill.
For me, the sordid cares in which I dwell,
Shrink and consume my heart, as heat the scroll;
And wrath has left its scar--that fire of hell
Has left its frightful scar upon my soul.
Yet though thou wear'st the glory of the sky,
Wilt thou not keep the same beloved name,
The same fair thoughtful brow, and gentle eye,
Lovelier in heaven's sweet climate, yet the same?
Shalt thou not teach me, in that calmer home,
The wisdom that I learned so ill in this--
The wisdom which is love--till I become
Thy fit companion in that land of bliss?
Suppose you want to retrieve every line whose first word is 'The' (the comparison ignores case). You would call the method like so:
var result = GetLinesWithWord("The", "temp.txt");
Your result should then be the following:
The disembodied spirits of the dead,
The love that lived through all the stormy past,
The same fair thoughtful brow, and gentle eye,
The wisdom that I learned so ill in this--
The wisdom which is love--till I become
Hopefully this answers your question adequately enough.

Alternative to File.AppendAllText for newline

I am trying to read characters from a file and append them to another file after removing the comments (which follow a semicolon).
Sample data from the source file:
Name- Harley Brown ;Name is Harley Brown
Age- 20 ;Age is 20 years
Desired result:
Name- Harley Brown
Age- 20
I am trying the following code:
StreamReader infile = new StreamReader(floc + "G" + line + ".NC0");
while (infile.Peek() != -1)
{
letter = Convert.ToChar(infile.Read());
if (letter == ';')
{
infile.ReadLine();
}
else
{
System.IO.File.AppendAllText(path, Convert.ToString(letter));
}
}
But the output I am getting is:
Name- Harley Brown Age-20
It's because AppendAllText is not working for the newline. Is there any alternative?

Sure, why not use File.AppendAllLines? From the documentation:
Appends lines to a file, and then closes the file. If the specified file does not exist, this method creates a file, writes the specified lines to the file, and then closes the file.
It takes any IEnumerable<string> and appends each element to the specified file as its own line.
Small example:
const string originalFile = @"D:\Temp\file.txt";
const string newFile = @"D:\Temp\newFile.txt";
// Retrieve all lines from the file.
string[] linesFromFile = File.ReadAllLines(originalFile);
List<string> linesToAppend = new List<string>();
foreach (string line in linesFromFile)
{
// 1. Split the line at the semicolon.
// 2. Take the first index, because the first part is your required result.
// 3. Trim the trailing and leading spaces.
string appendAbleLine = line.Split(';').FirstOrDefault().Trim();
// Add the line to the list of lines to append.
linesToAppend.Add(appendAbleLine);
}
// Append all lines to the file.
File.AppendAllLines(newFile, linesToAppend);
Output:
Name- Harley Brown
Age- 20
You could even change the foreach-loop into a LINQ-expression, if you prefer LINQ:
List<string> linesToAppend = linesFromFile.Select(line => line.Split(';').FirstOrDefault().Trim()).ToList();

Why compare character by character when the .NET Framework is full of useful string manipulation functions?
Also, don't call a file-writing function many times when a single call will do; it wastes time and resources.
string str = "";
string line;
using (StreamReader stream = new StreamReader("file1.txt")) {
while ((line = stream.ReadLine()) != null) { // Get every line of the file.
line = line.Split(';')[0].Trim(); // Remove the comment (everything after ;) and surrounding white space.
str += line + "\n"; // Add it to our final file contents.
}
}
File.WriteAllText("file2.txt", str); // Write it to the new file.

You could do this with LINQ, System.IO.File.ReadLines(string), and System.IO.File.WriteAllLines(string, IEnumerable<string>). You could also use System.IO.File.AppendAllLines(string, IEnumerable<string>) in the same way if appending was, in fact, the functionality you were going for. The difference, as the names suggest, is whether it writes everything out as a new file or just appends to an existing one.
System.IO.File.WriteAllLines(newPath, System.IO.File.ReadLines(oldPath).Select(c =>
{
int semicolon = c.IndexOf(';');
if (semicolon > -1)
return c.Remove(semicolon);
else
return c;
}));
In case you aren't super familiar with LINQ syntax, the idea here is to loop through each line in the file, and if it contains a semicolon (that is, IndexOf returns something that is over -1) we cut that off, and otherwise, we just return the string. Then we write all of those to the file. The StreamReader equivalent to this would be:
using (StreamReader reader = new StreamReader(oldPath))
using (StreamWriter writer = new StreamWriter(newPath))
{
string line;
while ((line = reader.ReadLine()) != null)
{
int semicolon = line.IndexOf(';');
if (semicolon > -1)
line = line.Remove(semicolon);
writer.WriteLine(line);
}
}
Both versions end every line, including the last one, with a line terminator (WriteAllLines and WriteLine behave the same way here), so they produce the same file.
Another important thing to note: looking at your original file, you might want to add some Trim calls, since there can be spaces before the semicolons, and I don't imagine you want those copied through.
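To make the Trim suggestion concrete, here is one possible sketch. CommentStripper, StripComment and StripComments are names chosen for this example only; the logic is the same IndexOf/Remove approach as above with a TrimEnd added.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CommentStripper
{
    // Removes everything from the first ';' onward, then trims trailing white space.
    public static string StripComment(string line)
    {
        int semicolon = line.IndexOf(';');
        string content = semicolon > -1 ? line.Remove(semicolon) : line;
        return content.TrimEnd();
    }

    // Applies StripComment to every line and returns the results in order.
    public static List<string> StripComments(IEnumerable<string> lines)
    {
        return lines.Select(StripComment).ToList();
    }
}
```

It could then be combined with the file calls from above, for example System.IO.File.WriteAllLines(newPath, CommentStripper.StripComments(System.IO.File.ReadLines(oldPath))). As a side note on the code in the question: ReadLine() consumes the rest of the line including its line break, so when the comment is skipped that way the newline is never written, which would explain the single-line output.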

parse lines using linq to txt

var t1 = from line in File.ReadAllLines(@"alkahf.txt")
let item = line.Split(new string[] {". "}, StringSplitOptions.RemoveEmptyEntries)
let verse = line.Split(new string[] { "\n. " }, StringSplitOptions.RemoveEmptyEntries)
select new
{
Index = item,
Text = verse
};
I'm having problems with the code above; I'm unsure how to parse the lines properly.
The format of the file is shown below. I would also like to ignore any empty lines;
StringSplitOptions.RemoveEmptyEntries doesn't work for some reason.
1. This is text it might have numbers
2. I skipped a line

Inside the LINQ query you are working with a single line at a time, so you might want to exclude the empty lines first:
from line in File.ReadAllLines(@"alkahf.txt")
where !string.IsNullOrEmpty(line)
You then do two splits, one of them on a newline, which is odd (a newline won't be there, since we know we are reading line by line). I expect you mean something like:
let parts = line.Split('.')
where parts.Length == 2
select new {
Index = parts[0],
Text = parts[1]
};
?
Also, note that ReadAllLines is a buffered operation; if you want true streaming, you might want something like:
public static IEnumerable<string> ReadLines(string path) {
using(var reader = File.OpenText(path)) {
string line;
while((line = reader.ReadLine()) != null) {
yield return line;
}
}
}
which is not buffering (you don't load the entire file at once). Just change the first line to:
from line in ReadLines(@"alkahf.txt")

Thanks to Marc's answer I fixed my issue. Sorry for the late response; I'm working on this as a personal project.
The code is as follows:
var t1 = from line in StreamReaderExtension.ReadLinesFromFile(@"alkahf.txt")
let parts = line.Split(new string[]{". "},
StringSplitOptions.RemoveEmptyEntries)
where !string.IsNullOrEmpty(line)
&& int.Parse(parts[0].ToString()).ToString() != ""
select new
{
Index = parts[0],
Text = parts[1]
};
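The int.Parse call is there to make sure the index really is an integer. Note that int.Parse throws a FormatException on non-numeric input, so if you use this code it's a good idea to handle that case yourself (for example with int.TryParse and a flag) rather than let one bad line abort the whole query.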
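One way to do the integer check without risking an exception is int.TryParse. The following is only a sketch under a few assumptions: VerseParser and Parse are hypothetical names, lines are expected to look like "1. Some text", and a simple counter stands in for the flag.

```csharp
using System;
using System.Collections.Generic;

public static class VerseParser
{
    // Parses lines of the form "1. Some text" into (index, text) pairs.
    // 'skipped' counts non-empty lines that did not start with an integer index.
    public static List<KeyValuePair<int, string>> Parse(IEnumerable<string> lines, out int skipped)
    {
        var result = new List<KeyValuePair<int, string>>();
        skipped = 0;
        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
            {
                continue; // ignore empty lines
            }
            // Split at the first ". " only, so periods inside the text are kept.
            string[] parts = line.Split(new[] { ". " }, 2, StringSplitOptions.None);
            int index;
            if (parts.Length == 2 && int.TryParse(parts[0], out index))
            {
                result.Add(new KeyValuePair<int, string>(index, parts[1]));
            }
            else
            {
                skipped++; // remember that a malformed line was seen
            }
        }
        return result;
    }
}
```

After the call, a non-zero skipped value tells you that some lines did not match the expected format, so they no longer go unnoticed.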
