0000016011071693266104*014482*3 15301 45 VETRO NOVA BLUVETRO NOVA BLUE FLAT STRETCH 115428815150010050 05420 000033 0003
0000072011076993266101*014687*4 15300 45 VETRO NOVA BLUVETRO NOVA BLUE FLAT STRETCH 115428815160010030 05430 000032 0007
I have a text file which contains many barcode strings, one per line. As you can see in the sample above, part of each string is a company code and the rest represents other things.
So how can I read this text line by line, and character by character, in C#?
For reading it line by line you can use a StreamReader - see for example on MSDN http://msdn.microsoft.com/en-us/library/db5x7c0d.aspx
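A minimal sketch of that approach (the path is just a placeholder):
using (StreamReader reader = new StreamReader(@"C:\MyFile.txt")) // placeholder path
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // 'line' is one barcode record; individual characters are available as line[i]
        Console.WriteLine(line);
    }
}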
Another option is:
string[] AllLines = File.ReadAllLines(@"C:\MyFile.txt");
This gives you all the lines in a string array that you can work with - it uses more memory but is faster... see for example http://msdn.microsoft.com/en-us/library/s2tte0y1.aspx
Once you have a line in a string you can split that line, for example:
string[] MyFields = AllLines[1].Split(null); // since your fields seem to be separated by whitespace
The result is that you have the parts of the line in an array and can access for example the second field in the line with MyFields[1] - see http://msdn.microsoft.com/en-us/library/b873y76a.aspx
EDIT - as per the comment, another option:
If you know the exact positions and lengths of your fields, you can do this:
string MyIdentity = AllLines[1].Substring(1, 5); // note: the start index is zero-based
For MSDN reference see http://msdn.microsoft.com/en-us/library/aka44szs.aspx
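For example, a rough sketch that pulls fixed-position fields out of every line - the offsets and lengths below are made up and would need to match your real record layout:
foreach (string line in File.ReadAllLines(@"C:\MyFile.txt")) // placeholder path
{
    if (line.Length < 30)
        continue;                                   // skip short or blank lines
    string companyCode = line.Substring(0, 22);     // hypothetical position and length
    string articleCode = line.Substring(23, 6);     // hypothetical position and length
    Console.WriteLine(companyCode + " / " + articleCode);
}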
You use Microsoft libraries dedicated to files and streams to open a file, and ReadLine().
Then you use Microsoft libraries dedicated to parsing to parse those lines.
You create, with Microsoft libraries, a regular expression to detect barcodes.
Then you throw away anything that doesn't match your regular expression.
Then you compile and debug (you can use Mono). And voilà, you have a C# program that solves your problem.
Note: you definitely don't need to go "character by character". Microsoft libraries and parsing will be much easier for your simple need.
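As a rough sketch of that idea - the pattern below (a long run of digits at the start of the line) is only a guess and would need adjusting to the real barcode format:
var barcodePattern = new System.Text.RegularExpressions.Regex(@"^\d{20,}"); // hypothetical pattern
foreach (string line in File.ReadLines(@"C:\MyFile.txt"))                   // placeholder path
{
    if (barcodePattern.IsMatch(line))
    {
        Console.WriteLine(line); // keep it; anything that doesn't match is thrown away
    }
}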
If all you are after is reading it line-by-line, and character-by-character, then this is a possible solution:
var lines = File.ReadLines(@"pathtotextfile.txt");
foreach (var line in lines)
{
    foreach (var character in line)
    {
        char individualCharacter = character;
    }
}
If you need to know which line and character you are on; you can use a for loop instead:
var lines = File.ReadAllLines(@"pathtotextfile.txt");
for (var i = 0; i < lines.Length; i++)
{
    var line = lines[i];
    for (var j = 0; j < line.Length; j++)
    {
        var character = line[j];
    }
}
Or use SelectMany in LINQ:
var lines = File.ReadLines(@"pathtotextfile.txt");
foreach (char individualCharacter in lines.SelectMany(line => line))
{
}
Now, in my opinion, working "line by line" and "character by character" seems like the hard way to do this. If you can tell us exactly what each bit of information in the barcode is, we could help you extract it that way.
I have a file that contains many lines. There is a line here looking like below:
hello jim jack nina richi sam
I need to add a specific text, salmon, to this line and change it to the below (it could be added anywhere in the line - at the end, the beginning, or in the middle - it doesn't matter):
hello jim jack nina richi sam salmon
I tried:
string path = @"C:\testFolder\newTestLog.txt";
string[] allLines = File.ReadAllLines(path);
foreach (string element in allLines)
{
    if (element.StartsWith("hello"))
    {
        Console.WriteLine(element);
    }
}
Using this I'm able to read the file line by line into an array and print each line that starts with "hello", but I'm not sure how to add text to that line.
You should use what Joel answered (it's nicer), but if you're having trouble implementing it, try this: after adding the salmon to the lines that start with "hello", you can overwrite the text file using File.WriteAllLines.
string filePath = @"C:\testFolder\newTestLog.txt";
string[] allLines = File.ReadAllLines(filePath);
for (int i = 0; i < allLines.Length; i++)
{
    if (allLines[i].StartsWith("hello"))
    {
        allLines[i] += " salmon";
    }
}
File.WriteAllLines(filePath, allLines);
Try this:
string path = @"C:\testFolder\newTestLog.txt";
var lines = File.ReadLines(path).Select(l => l + (l.StartsWith("hello") ? " salmon" : ""));
foreach (string line in lines)
    Console.WriteLine(line);
Note that this still only writes the results to the Console, as your sample does. It's not clear what you really want to happen with the output.
If you want this saved to the original file, you've opened up a small can of worms. Think of all of the data in your file as if it's stored in one contiguous block [1]. If you append text to any line in the file, that text has nowhere to go but to overwrite the beginning of the next. As a practical matter, if you need to modify a file, this often means either writing out a whole new file and then deleting/renaming when done, or alternatively keeping the whole file in memory and writing it all from start to finish.
Using the 2nd approach, where we keep everything in memory, you can do this:
string path = @"C:\testFolder\newTestLog.txt";
var lines = File.ReadAllLines(path).Select(l => l + (l.StartsWith("hello") ? " salmon" : ""));
File.WriteAllLines(path, lines);
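The other route mentioned above (writing out a whole new file and swapping it in afterwards) could be sketched roughly like this; the temporary file name and the delete/move at the end are just one way to do the swap:
string path = @"C:\testFolder\newTestLog.txt";
string tempPath = path + ".tmp"; // hypothetical temporary file next to the original
using (var reader = new StreamReader(path))
using (var writer = new StreamWriter(tempPath))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        writer.WriteLine(line.StartsWith("hello") ? line + " salmon" : line);
    }
}
File.Delete(path);
File.Move(tempPath, path);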
[1] In fact, a file may be split into several fragments on the disk, but even so, each fragment is presented to your program as part of a single whole.
I need help trying to take a large text document (~1000 lines) and put it into a string array, line by line.
Example:
string[] s = {firstLineHere, Secondline, etc};
I also want a way to find the first word (only the first word) of a line, and once that first word is found, copy the entire line. It should check only the first word of each line!
You can accomplish this with File.ReadAllLines combined with a little LINQ (which also covers the addition to the question stated in the comments of Praveen's answer).
string[] identifiers = { /* Your identifiers for needed lines */ };
string[] allLines = File.ReadAllLines(@"C:\test.txt");
string[] neededLines = allLines.Where(c => identifiers.Contains(c.Substring(0, c.IndexOf(' ')))).ToArray();
Or make it more of a one liner:
string[] lines = File.ReadAllLines("your path").Where(c => identifiers.Contains(c.Substring(0, c.IndexOf(' ')))).ToArray();
This will give you array of all the lines in your document that start with the keywords you define within your identifiers string array.
There is an inbuilt method to achieve your requirement.
string[] lines = System.IO.File.ReadAllLines(#"C:\sample.txt");
If you want to read the file line by line:
List<string> lines = new List<string>();
using (StreamReader reader = new StreamReader(@"C:\sample.txt"))
{
    while (reader.Peek() >= 0)
    {
        string line = reader.ReadLine();
        // Add your conditional logic to add the line to an array
        if (line.Contains(searchTerm))
        {
            lines.Add(line);
        }
    }
}
Another option would be to read each line individually, split it into segments, and compare only the first element against the provided search term. I have provided a complete working demonstration below.
Solution:
Solution:
class Program
{
    static void Main(string[] args)
    {
        // Get all lines that start with a given word from a file
        var result = GetLinesWithWord("The", "temp.txt");

        // Display the results.
        foreach (var line in result)
        {
            Console.WriteLine(line + "\r");
        }

        Console.ReadLine();
    }

    public static List<string> GetLinesWithWord(string word, string filename)
    {
        List<string> result = new List<string>(); // A list of strings where the first word of each is the provided search term.

        // Create a stream reader object to read a text file.
        using (StreamReader reader = new StreamReader(filename))
        {
            string line = string.Empty; // Contains a single line returned by the stream reader object.

            // While there are lines in the file, read a line into the line variable.
            while ((line = reader.ReadLine()) != null)
            {
                // If the line is white space, then there are no words to compare against, so move to the next line.
                if (line != string.Empty)
                {
                    // Split the line into parts by a white space delimiter.
                    var parts = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

                    // Get only the first word element of the line, trim off any additional white space
                    // and convert it to lowercase. Compare the word element to the search term provided.
                    // If they are the same, add the line to the results list.
                    if (parts.Length > 0)
                    {
                        if (parts[0].ToLower().Trim() == word.ToLower().Trim())
                        {
                            result.Add(line);
                        }
                    }
                }
            }
        }

        return result;
    }
}
Where the sample text file may contain:
How shall I know thee in the sphere which keeps
The disembodied spirits of the dead,
When all of thee that time could wither sleeps
And perishes among the dust we tread?
For I shall feel the sting of ceaseless pain
If there I meet thy gentle presence not;
Nor hear the voice I love, nor read again
In thy serenest eyes the tender thought.
Will not thy own meek heart demand me there?
That heart whose fondest throbs to me were given?
My name on earth was ever in thy prayer,
Shall it be banished from thy tongue in heaven?
In meadows fanned by heaven's life-breathing wind,
In the resplendence of that glorious sphere,
And larger movements of the unfettered mind,
Wilt thou forget the love that joined us here?
The love that lived through all the stormy past,
And meekly with my harsher nature bore,
And deeper grew, and tenderer to the last,
Shall it expire with life, and be no more?
A happier lot than mine, and larger light,
Await thee there; for thou hast bowed thy will
In cheerful homage to the rule of right,
And lovest all, and renderest good for ill.
For me, the sordid cares in which I dwell,
Shrink and consume my heart, as heat the scroll;
And wrath has left its scar--that fire of hell
Has left its frightful scar upon my soul.
Yet though thou wear'st the glory of the sky,
Wilt thou not keep the same beloved name,
The same fair thoughtful brow, and gentle eye,
Lovelier in heaven's sweet climate, yet the same?
Shalt thou not teach me, in that calmer home,
The wisdom that I learned so ill in this--
The wisdom which is love--till I become
Thy fit companion in that land of bliss?
And you wanted to retrieve every line where the first word of the line is the word 'the' by calling the method like so:
var result = GetLinesWithWord("The", "temp.txt");
Your result should then be the following:
The disembodied spirits of the dead,
The love that lived through all the stormy past,
The same fair thoughtful brow, and gentle eye,
The wisdom that I learned so ill in this--
The wisdom which is love--till I become
Hopefully this answers your question adequately enough.
My method should in theory work; I'm just not getting the result I expect back.
I have a function that creates a new TextReader, reads in a character (as an int) from my text file, and adds it to a list.
The textfile data looks like the following (48 x 30):
111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111
111111111111100000000001111111111000000111111111
111111110000000000000000000000000000000011111111
100000000000000000000000000000000000000001111111
000000000000001111111111111111111111000001111111
100000001111111111111111111112211221000001111111
100000111111122112211221122111111111000001111111
111111111221111111111111111112211110000011111111
111112211111111111111111111111111100000111221111
122111111111111122111100000000000000001111111111
111111111111111111100000000000000000011111111111
111111111111111111000000000000000001112211111111
111111111111221110000001111110000111111111111111
111111111111111100000111112211111122111111111111
111111112211110000001122111111221111111111111111
111122111111000000011111111111111111112211221111
111111110000000011111111112211111111111111111111
111111000000001111221111111111221122111100000011
111111000000011111111111000001111111110000000001
111111100000112211111100000000000000000000000001
111111110000111111100000000000000000000000000011
111111111000011100000000000000000000000011111111
111111111100000000000000111111111110001111111111
111111111110000000000011111111111111111111111111
111111111111100000111111111111111111111111111111
111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111
My method is as follows:
private void LoadReferenceMap(string FileName)
{
    FileName = Path.Combine(Environment.CurrentDirectory, FileName);

    List<int> ArrayMapValues = new List<int>();

    if (File.Exists(FileName))
    {
        // Create a new stream to read from the file
        using (TextReader reader = File.OpenText(FileName))
        {
            for (int i = 0; i < 48; i++)
            {
                for (int j = 0; j < 30; j++)
                {
                    int x = reader.Read();
                    if (x == -1)
                        break;
                    ArrayMapValues.Add(x);
                }
            }
        }

        level.SetFieldMap(ArrayMapValues);
    }
}
It returns the character codes, and as you can see, once it reaches the end of the first line Read() returns 13 and then 10 before moving on to the next row. Why is that?
A different approach that solves both problems: the conversion of chars to integers and the skipping of the Environment.NewLine characters.
private void LoadReferenceMap(string FileName)
{
    List<int> ArrayMapValues = new List<int>();

    if (File.Exists(FileName))
    {
        foreach (string line in File.ReadLines(FileName))
        {
            var lineMap = line.ToCharArray()
                              .Select(x => Convert.ToInt32(x.ToString()));
            ArrayMapValues.AddRange(lineMap);
        }

        level.SetFieldMap(ArrayMapValues);
    }
}
The file is small, so it is convenient to read each line as a string (this removes the Environment.NewLine), convert the line to a char array, and apply the conversion to an integer for each char. Finally, the list of integers for a single line is added to your list of integers for the whole file.
I have not inserted any check on the length of a single line (48 chars) or the total number of lines (30), because you say that every file has this format. However, adding a small check on the total lines loaded and their lengths should be pretty simple, as sketched below.
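If you do want that check, a minimal sketch (placed before the foreach above, assuming System.Linq is available and the expected shape of 30 lines of 48 characters):
var allLines = File.ReadLines(FileName).ToList();
if (allLines.Count != 30 || allLines.Any(l => l.Length != 48))
{
    // handle an unexpected format however you prefer
    throw new InvalidDataException("Expected 30 lines of 48 characters.");
}
// ...then iterate over allLines instead of calling File.ReadLines again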
This is because you need to convert the value you've got to a char, like this:
(char)sr.Read();
After that you can parse it as int with different approach, for example:
int.Parse(((char)sr.Read()).ToString());
More information on MSDN.
As you can see once it reaches the end of the first line Read() returns 13 and then 10 before moving on to the next row?
The line break in .NET looks like this: \r\n, not just \n (check the Environment.NewLine property).
The actual text file has line breaks in it. This means that once you have read the first 48 characters the next thing in the file is a line break. In this case it is a standard windows new line which is a Carriage Return (character 13) followed by a Line Feed (character 10).
You need to deal with these line breaks in your code somehow. My preferred way of doing this would be the method outlined by Steve above (using File.ReadLines). Alternatively, at the end of each of your sets of 48 character reads, you could check for the 13/10 character combination. One thing of note, though, is that some systems use just a Line Feed, without the Carriage Return, to indicate new lines. Depending on the source of these files you may need to code something to deal with possible different line break styles. Using ReadLines will let something else deal with this issue for you, as would using reader.ReadLine().
If you are also unsure why it is returning 49 instead of 1, then you need to understand character encoding. The file is stored as bytes which are interpreted by the reading program. In this case you are reading out the values of the characters as integers (which is how .NET stores them internally). You need to convert each value to a character, which here means simply casting to char (i.e. (char)x); that gives you the char '1'. If you want that as an integer, you then need int.Parse to turn the text into a number.
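Putting those two points together, a sketch of a character-by-character version might look like this (not compiled; it uses c - '0' as a shortcut for the parse step, and int.Parse(c.ToString()) works just as well):
List<int> ArrayMapValues = new List<int>();
using (TextReader reader = File.OpenText(FileName))
{
    int x;
    while ((x = reader.Read()) != -1)
    {
        char c = (char)x;
        if (c == '\r' || c == '\n')
            continue;                // skip the line-break characters entirely
        ArrayMapValues.Add(c - '0'); // '1' (49) becomes 1, '0' (48) becomes 0
    }
}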
Suppose I have this CSV file :
NAME,ADDRESS,DATE
"Eko S. Wibowo", "Tamanan, Banguntapan, Bantul, DIY", "6/27/1979"
I would like to store each token that is enclosed in double quotes in an array. Is there a safe way to do this instead of using the String.Split() function? Currently I load the file into a RichTextBox, and then, using its Lines[] property, I loop over each Lines[] element and do this:
string[] line = s.Split(',');
s is a reference to RichTextBox.Lines[].
And as you can clearly see, the commas inside a token easily mess up the Split() function. So instead of ending up with the three tokens I want, I end up with six tokens.
Any help will be appreciated!
You could use regex too:
string input = "\"Eko S. Wibowo\", \"Tamanan, Banguntapan, Bantul, DIY\", \"6/27/1979\"";
string pattern = #"""\s*,\s*""";
// input.Substring(1, input.Length - 2) removes the first and last " from the string
string[] tokens = System.Text.RegularExpressions.Regex.Split(
input.Substring(1, input.Length - 2), pattern);
This will give you:
Eko S. Wibowo
Tamanan, Banguntapan, Bantul, DIY
6/27/1979
I've done this with my own method. It simply counts the number of " and ' characters.
Adapt it to your needs.
public List<string> SplitCsvLine(string s) {
    int i;
    int a = 0;
    int count = 0;
    List<string> str = new List<string>();
    for (i = 0; i < s.Length; i++) {
        switch (s[i]) {
            case ',':
                if ((count & 1) == 0) {
                    str.Add(s.Substring(a, i - a));
                    a = i + 1;
                }
                break;
            case '"':
            case '\'':
                count++;
                break;
        }
    }
    str.Add(s.Substring(a));
    return str;
}
It's not an exact answer to your question, but why don't you use an already-written library to manipulate CSV files? A good example would be LinqToCsv. CSV can be delimited with various punctuation signs, and there are gotchas which are already addressed by the library creators, such as dealing with the header row, different date formats, and mapping rows to C# objects.
You can replace "," with ; then split by ;
var values = s.Replace("\",\"", ";").Split(';');
If your CSV line is tightly packed, it's easiest to use the head and tail removal mentioned earlier and then a simple split on the joining string:
string[] tokens = input.Substring(1, input.Length - 2).Split("\",\"");
This will only work if ALL fields are double-quoted even if they don't (officially) need to be. It will be faster than RegEx but with given conditions as to its use.
Really useful if your data looks like
"Name","1","12/03/2018","Add1,Add2,Add3","other stuff"
Five years old but there is always somebody new who wants to split a CSV.
If your data is simple and predictable (i.e. never has any special characters like commas, quotes and newlines) then you can do it with split() or regex.
But to support all the nuances of the CSV format properly without code soup you should really use a library where all the magic has already been figured out. Don't re-invent the wheel (unless you are doing it for fun of course).
CsvHelper is simple enough to use:
https://joshclose.github.io/CsvHelper/2.x/
using (var parser = new CsvParser(textReader))
{
    while (true)
    {
        string[] line = parser.Read();
        if (line != null)
        {
            // do something
        }
        else
        {
            break;
        }
    }
}
More discussion / same question:
Dealing with commas in a CSV file
I wrote a C# program to read an Excel .xls/.xlsx file and output to CSV and Unicode text. I wrote a separate program to remove blank records. This is accomplished by reading each line with StreamReader.ReadLine(), and then going character by character through the string and not writing the line to output if it contains all commas (for the CSV) or all tabs (for the Unicode text).
The problem occurs when the Excel file contains embedded newlines (\x0A) inside the cells. I changed my XLS to CSV converter to find these new lines (since it goes cell by cell) and write them as \x0A, and normal lines just use StreamWriter.WriteLine().
The problem occurs in the separate program to remove blank records. When I read in with StreamReader.ReadLine(), by definition it only returns the string with the line, not the terminator. Since the embedded newlines show up as two separate lines, I can't tell which is a full record and which is an embedded newline for when I write them to the final file.
I'm not even sure I can read in the \x0A because everything on the input registers as '\n'. I could go character by character, but this destroys my logic to remove blank lines.
I would recommend that you change your architecture to work more like a parser in a compiler.
You want to create a lexer that returns a sequence of tokens, and then a parser that reads the sequence of tokens and does stuff with them.
In your case the tokens would be:
Column data
Comma
End of Line
You would treat '\n' ('\x0a') by itself as an embedded new line, and therefore include it as part of a column data token. A '\r\n' would constitute an End of Line token.
This has the advantages of:
Doing only 1 pass over the data
Only storing a maximum of one line's worth of data
Reusing as much memory as possible (for the string builder and the list)
It's easy to change should your requirements change
Here's a sample of what the Lexer would look like:
Disclaimer: I haven't even compiled, let alone tested, this code, so you'll need to clean it up and make sure it works.
enum TokenType
{
    ColumnData,
    Comma,
    LineTerminator
}

class Token
{
    public TokenType Type { get; private set; }
    public string Data { get; private set; }

    public Token(TokenType type)
    {
        Type = type;
    }

    public Token(TokenType type, string data)
    {
        Type = type;
        Data = data;
    }
}

private IEnumerable<Token> GetTokens(TextReader s)
{
    var builder = new StringBuilder();

    while (s.Peek() >= 0)
    {
        var c = (char)s.Read();
        switch (c)
        {
            case ',':
            {
                if (builder.Length > 0)
                {
                    yield return new Token(TokenType.ColumnData, ExtractText(builder));
                }
                yield return new Token(TokenType.Comma);
                break;
            }
            case '\r':
            {
                var next = s.Peek();
                if (next == '\n')
                {
                    s.Read();
                }

                if (builder.Length > 0)
                {
                    yield return new Token(TokenType.ColumnData, ExtractText(builder));
                }
                yield return new Token(TokenType.LineTerminator);
                break;
            }
            default:
                builder.Append(c);
                break;
        }
    }

    if (builder.Length > 0)
    {
        yield return new Token(TokenType.ColumnData, ExtractText(builder));
    }
}

private string ExtractText(StringBuilder b)
{
    var ret = b.ToString();
    b.Remove(0, b.Length);
    return ret;
}
Your "parser" code would then look like this:
public void ConvertXLS(TextReader s)
{
    var columnData = new List<string>();
    bool lastWasColumnData = false;
    bool seenAnyData = false;

    foreach (var token in GetTokens(s))
    {
        switch (token.Type)
        {
            case TokenType.ColumnData:
            {
                seenAnyData = true;
                if (lastWasColumnData)
                {
                    //TODO: do some error reporting
                }
                else
                {
                    lastWasColumnData = true;
                    columnData.Add(token.Data);
                }
                break;
            }
            case TokenType.Comma:
            {
                if (!lastWasColumnData)
                {
                    columnData.Add(null);
                }
                lastWasColumnData = false;
                break;
            }
            case TokenType.LineTerminator:
            {
                if (seenAnyData)
                {
                    OutputLine(columnData);
                }
                seenAnyData = false;
                lastWasColumnData = false;
                columnData.Clear();
                break;
            }
        }
    }

    if (seenAnyData)
    {
        OutputLine(columnData);
    }
}
You can't change StreamReader to return the line terminators, and you can't change what it uses for line termination.
I'm not entirely clear about the problem in terms of what escaping you're doing, particularly in terms of "and write them as \x0A". A sample of the file would probably help.
It sounds like you may need to work character by character, or possibly load the whole file first and do a global replace, e.g.
x.Replace("\r\n", "\u0000") // Or some other unused character
.Replace("\n", "\\x0A") // Or whatever escaping you need
.Replace("\u0000", "\r\n") // Replace the real line breaks
I'm sure you could do that with a regex and it would probably be more efficient, but I find the long way easier to understand :) It's a bit of a hack having to do a global replace though - hopefully with more information we'll come up with a better solution.
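Wrapped up end to end, that hack might look something like this; the paths and the chosen escape text are just placeholders:
string text = File.ReadAllText(@"C:\input.csv");    // placeholder path
string fixedText = text
    .Replace("\r\n", "\u0000")   // hide the real record breaks behind an unused character
    .Replace("\n", "\\x0A")      // escape the embedded newlines
    .Replace("\u0000", "\r\n");  // restore the real record breaks
File.WriteAllText(@"C:\output.csv", fixedText);     // placeholder path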
Essentially, a hard-return in Excel (shift+enter or alt+enter, I can't remember) puts a newline that is equivalent to \x0A in the default encoding I use to write my CSV. When I write to CSV, I use StreamWriter.WriteLine(), which outputs the line plus a newline (which I believe is \r\n).
The CSV is fine and comes out exactly how Excel would save it; the problem is that when I read it into the blank record remover, I'm using ReadLine(), which treats an embedded newline inside a record as a line break too.
Here's an example of the file after I convert to CSV...
Reference,Name of Individual or Entity,Type,Name Type,Date of Birth,Place of Birth,Citizenship,Address,Additional Information,Listing Information,Control Date,Committees
1050,"Aziz Salih al-Numan
",Individual,Primary Name,1941 or 1945,An Nasiriyah,Iraqi,,Ba’th Party Regional Command Chairman; Former Governor of Karbala and An Najaf Former Minister of Agriculture and Agrarian Reform (1986-1987),Resolution 1483 (2003),6/27/2003,1518 (Iraq)
1050a,???? ???? ???????,Individual,Original script,1941 or 1945,An Nasiriyah,Iraqi,,Ba’th Party Regional Command Chairman; Former Governor of Karbala and An Najaf Former Minister of Agriculture and Agrarian Reform (1986-1987),Resolution 1483 (2003),6/27/2003,1518 (Iraq)
As you can see, the first record has an embedded new-line after al-Numan. When I use ReadLine(), I get '1050,"Aziz Salih al-Numan' and when I write that out, WriteLine() ends that line with a CRLF. I lose the original line terminator. When I use ReadLine() again, I get the line starting with '1050a'.
I could read the entire file in and replace them, but then I'd have to replace them back afterwards. Basically what I want to do is get the line terminator to determine if it's \x0A or a CRLF, and then, if it's \x0A, use Write() and insert that terminator.
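For what it's worth, one way to get at exactly that distinction is to split the whole file on \r\n only, so a lone \x0A stays inside its record. A rough sketch, assuming the file fits in memory, that blank records are just runs of commas, and with path and outputPath as placeholders:
string text = File.ReadAllText(path);
string[] records = text.Split(new[] { "\r\n" }, StringSplitOptions.None);
using (var writer = new StreamWriter(outputPath))
{
    foreach (string record in records)
    {
        // An embedded \x0A is still inside 'record' here, so the blank check
        // sees the whole logical record rather than each physical line.
        bool isBlank = record.Split(',').All(f => f.Trim().Length == 0);
        if (!isBlank)
        {
            writer.WriteLine(record); // WriteLine puts the CRLF record terminator back
        }
    }
}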
I know I'm a little late to the game here, but I was having the same problem and my solution was a lot simpler than most given.
If you are able to determine the column count (which should be easy, since the first line is usually the column titles), you can check each line's column count against the expected one. If it doesn't match, you simply concatenate the current line with the previous unmatched lines. For example:
string sep = "\",\"";
int columnCount = 0;
while ((currentLine = sr.ReadLine()) != null)
{
    if (lineCount == 0)
    {
        lineData = currentLine.Split(new string[] { sep }, StringSplitOptions.None);
        columnCount = lineData.Length;
        ++lineCount;
        continue;
    }
    string thisLine = lastLine + currentLine;
    lineData = thisLine.Split(new string[] { sep }, StringSplitOptions.None);
    if (lineData.Length < columnCount)
    {
        lastLine += currentLine;
        continue;
    }
    else
    {
        lastLine = null;
    }
    ......
Thank you so much! With your code and some others, I came up with the following solution. I have added a link at the bottom to some code I wrote that uses some of the logic from this page. I figured I'd give honor where honor was due. Thanks!
Below is an explanation of what I needed:
Try this. I wrote it because I have some very large '|'-delimited files that have \r\n inside some of the columns, and I needed to use \r\n as the end-of-line delimiter. I was trying to import some files using SSIS packages, but because of some corrupted data in the files I was unable to. The file was over 5 GB, so it was too large to open and fix manually. I found the answer by looking through lots of forums to understand how streams work, and ended up with a solution that reads each character in a file and spits out the lines based on the definitions I added to it. It is for use in a command-line application, complete with help :). I hope this helps some other people out; I haven't found a solution quite like it anywhere else, although the ideas were inspired by this forum and others.
https://stackoverflow.com/a/12640862/1582188