Delimited string parsing? [closed]


As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I'm looking at parsing a delimited string, something on the order of
a,b,c
But this is a very simple example, and parsing delimited data can get complex; for instance
1,"Your simple algorithm, it fails",True
would blow your naive string.Split implementation to bits. Is there anything I can freely use/steal/copy and paste that offers a relatively bulletproof solution to parsing delimited text? .NET, please.
Update: I decided to go with the TextFieldParser, which is part of VB.NET's pile of goodies hidden away in Microsoft.VisualBasic.DLL.

I use this to read from a file
string filename = textBox1.Text;
string[] fields;
string[] delimiter = new string[] {"|"};
using (Microsoft.VisualBasic.FileIO.TextFieldParser parser =
    new Microsoft.VisualBasic.FileIO.TextFieldParser(filename)) {
    parser.Delimiters = delimiter;
    // Set this to true if fields can be wrapped in quotes (e.g. "a|b" as one field)
    parser.HasFieldsEnclosedInQuotes = false;
    while (!parser.EndOfData) {
        fields = parser.ReadFields();
        //Do what you need
    }
}
I am sure someone here can transform this to parse a string that is in memory.
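Taking up that suggestion: TextFieldParser also has a constructor that takes a TextReader, so wrapping an in-memory string in a StringReader should work without touching the file system. This is a minimal sketch, not a tested library; DelimitedText and Parse are hypothetical names, and unlike the snippet above it turns HasFieldsEnclosedInQuotes on, assuming the input may contain quoted fields like the one in the question. On .NET Framework, the project needs a reference to Microsoft.VisualBasic.dll.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

static class DelimitedText
{
    // Parses delimited text that is already in memory by giving TextFieldParser
    // a StringReader instead of a file name. Returns one string[] per line.
    public static List<string[]> Parse(string text, string delimiter)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(new StringReader(text)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(delimiter);
            // true so that "Your simple algorithm, it fails" stays a single field
            parser.HasFieldsEnclosedInQuotes = true;
            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

For example, DelimitedText.Parse("1,\"Your simple algorithm, it fails\",True", ",") should return a single row with three fields, the middle one still containing its comma.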

A very comprehensive library can be found here: FileHelpers

I am not aware of any framework, but a simple state machine works:
State 1: Read every char until you hit a " or a ,
In case of a ": Move to State 2
In case of a ,: Move to State 3
In case of the end of file: Move to state 4
State 2: Read every char until you hit a "
In case of a ": Move to State 1
In case of the end of the file: Either Move to State 4 or signal an error because of an unterminated string
State 3: Add the current buffer to the output array, move the cursor past the , and go back to State 1.
State 4: This is the final state: add the current buffer to the output array (the last field has no trailing ,) and return the output array.
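One way to turn the states above into code, as a sketch: States 1 and 2 become an enum, State 3 is the "emit field" branch, and State 4 is the code after the loop. The names StateMachineSplitter and Split are hypothetical, the delimiter is hard-coded to a comma, and throwing on an unterminated quote (rather than silently finishing) is one of the two options listed above. Quote characters are dropped from the output, and a doubled quote ("") inside a quoted section simply flips the state twice, so it does not produce a literal quote.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class StateMachineSplitter
{
    // State 1 = Unquoted, State 2 = Quoted; States 3 and 4 are actions, not loop states.
    enum State { Unquoted, Quoted }

    public static List<string> Split(string input)
    {
        var output = new List<string>();
        var buffer = new StringBuilder();
        var state = State.Unquoted;

        foreach (char c in input)
        {
            if (state == State.Unquoted)
            {
                if (c == '"')
                {
                    state = State.Quoted;           // State 1 -> State 2
                }
                else if (c == ',')
                {
                    output.Add(buffer.ToString());  // State 3: emit the field
                    buffer.Clear();
                }
                else
                {
                    buffer.Append(c);
                }
            }
            else if (c == '"')
            {
                state = State.Unquoted;             // State 2 -> State 1
            }
            else
            {
                buffer.Append(c);                   // commas are plain data inside quotes
            }
        }

        // End of input while inside quotes: treat it as an unterminated string.
        if (state == State.Quoted)
            throw new FormatException("Unterminated quoted field.");

        // State 4: add the last field (it has no trailing comma) and return.
        output.Add(buffer.ToString());
        return output;
    }
}
```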

Such as
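using System.Collections.Generic;
using System.Text;
// internalLine holds the line being split
var elements = new List<string>();
var current = new StringBuilder();
var p = 0;
while (p < internalLine.Length) {
    if (internalLine[p] == '"') {
        // Quoted field: skip the opening quote and copy up to the closing one
        p++;
        while ((p < internalLine.Length) && (internalLine[p] != '"')) {
            current.Append(internalLine[p]);
            p++;
        }
        // Skip past the closing " and the , that follows it
        p += 2;
    }
    else {
        // Unquoted field: copy up to the next ,
        while ((p < internalLine.Length) && (internalLine[p] != ',')) {
            current.Append(internalLine[p]);
            p++;
        }
        // Skip past ,
        p++;
    }
    elements.Add(current.ToString());
    current.Length = 0;
}
// Note: a trailing empty field (as in "a,") is not added to elements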

There are some good answers here: Split a string ignoring quoted sections
You might want to rephrase your question to something more precise (e.g. What code snippet or library I can use to parse CSV data in .NET?).

To do a shameless plug, I've been working on a library for a while called fotelo (Formatted Text Loader) that I use to quickly parse large amounts of text based on delimiter, position, or regex. For a quick string it is overkill, but if you're working with logs or large amounts of text, it may be just what you need. It works off a control file model similar to SQL*Loader (kind of the inspiration behind it).

Better late than never (add to the completeness of SO):
http://www.codeproject.com/KB/database/CsvReader.aspx
This one ff-ing rules.
GJ

I am thinking that a generic framework would need to specify between two things:
1. What are the delimiting characters.
2. Under what condition do those characters not count (such as when they are between quotes).
I think you may just be better off writing custom logic every time you need to do something like this.

The simplest way is just to split the string into a char array and look for your string qualifiers (quote characters) and your split char.
It should be relatively easy to unit test.
You can wrap it in an extension method similar to the basic .Split method.
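A sketch of such an extension method, assuming CSV-style rules: the delimiter and quote character are parameters (the two things a generic splitter has to be told, as the previous answer points out), and a doubled quote inside a quoted section becomes one literal quote. SplitQuoted is a hypothetical name, not a framework method, and it works on one line at a time, so quoted fields that span line breaks are not handled.

```csharp
using System.Collections.Generic;
using System.Text;

static class QuotedSplitExtensions
{
    // Like string.Split(char), but a delimiter between quote characters is kept as data,
    // and a doubled quote inside a quoted section ("" in CSV) becomes one literal quote.
    public static string[] SplitQuoted(this string s, char delimiter = ',', char quoteChar = '"')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (c == quoteChar)
            {
                if (inQuotes && i + 1 < s.Length && s[i + 1] == quoteChar)
                {
                    current.Append(quoteChar); // escaped quote
                    i++;                       // skip its partner
                }
                else
                {
                    inQuotes = !inQuotes;      // opening or closing quote
                }
            }
            else if (c == delimiter && !inQuotes)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());        // last field has no trailing delimiter
        return fields.ToArray();
    }
}
```

Usage mirrors string.Split: "a|\"b|c\"|d".SplitQuoted('|') should give three fields, with "b|c" kept intact.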

Related

Replace Characters in .txt file with Another Character from a Defined List [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I am creating a simple WPF Program that will read and write .txt files for a simple character switcher/scrambler.
I want to be able to create a list of definitions in a file that maps every character to another character, which will be used to switch the characters when it runs.
for example:
a = x
b = r
c = e
d = u
and so on
So essentially I want to be able to select a .txt file and, when my button is pressed, run C# code that replaces every character in the .txt file with its predefined replacement from the list above and saves the changed version as a .txt file. I also want to be able to select the changed .txt file and revert it to the original characters, if possible, when another button is pressed.
I already have the WPF program set up with the buttons and the file directory stuff; I need help with the C# implementation of actually making this idea work, as I can't find any resources online that cover what I am describing. Any guidance on where to look and/or what to do next would be appreciated.

First you need to read your mapping file and create a map. I pondered a few ways of doing this, but the easiest is probably to use a dictionary:
var config = File.ReadAllLines(...);
//declare these at class level probably, I only put them here to make my code work and show the definition
var scramble = new Dictionary<char, char>();
var unscramble = new Dictionary<char, char>();
foreach(var line in config){
    // each line looks like "a = x": the first char is the original, the last char its replacement
    var a = line.First();
    var b = line.Last();
    scramble[a] = b;
    unscramble[b] = a;
}
Then you have to process your file. It would be more memory efficient to read char by char using a StreamReader than to read the file as a string and then convert it to a char array (which needs twice the memory), but I'll leave that as an exercise for the reader. Here I take the quick and dirty approach of reading the file and turning it into a char array immediately, so it briefly needs about twice as much memory as the file's text:
var chars = File.ReadAllText(...).ToCharArray();
for(int i = 0; i < chars.Length; i++)
    if(scramble.ContainsKey(chars[i]))
        chars[i] = scramble[chars[i]];
Writing the chars array to disk as the newly scrambled file is an exercise for you. Unscrambling follows the same pattern, using the unscramble dictionary.
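One way the exercise might be finished, as a sketch rather than the answerer's code: CharSwapper, LoadMaps and Apply are hypothetical names. It assumes mapping lines look like "a = x" (first character is the original, last character the replacement), skips blank lines, and writes the result with File.WriteAllText. The same Apply call reverts a file when given the unscramble map, but that round trip is only lossless if the mapping is a permutation: every replacement character must also appear as an original, and no two originals may share a replacement.

```csharp
using System.Collections.Generic;
using System.IO;

static class CharSwapper
{
    // Builds forward and reverse maps from lines such as "a = x".
    public static (Dictionary<char, char> Scramble, Dictionary<char, char> Unscramble) LoadMaps(string mappingFile)
    {
        var scramble = new Dictionary<char, char>();
        var unscramble = new Dictionary<char, char>();
        foreach (var line in File.ReadAllLines(mappingFile))
        {
            var trimmed = line.Trim();
            if (trimmed.Length == 0) continue;      // ignore blank lines
            char from = trimmed[0];
            char to = trimmed[trimmed.Length - 1];
            scramble[from] = to;
            unscramble[to] = from;
        }
        return (scramble, unscramble);
    }

    // Replaces every mapped character of inputFile and writes the result to outputFile.
    // Pass the scramble map to encode and the unscramble map to decode.
    public static void Apply(Dictionary<char, char> map, string inputFile, string outputFile)
    {
        char[] chars = File.ReadAllText(inputFile).ToCharArray();
        for (int i = 0; i < chars.Length; i++)
        {
            if (map.TryGetValue(chars[i], out char replacement))
                chars[i] = replacement;
        }
        File.WriteAllText(outputFile, new string(chars));
    }
}
```

In the WPF click handlers this would be a LoadMaps call followed by Apply with whichever paths the file dialogs return.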

Step 1) Let's get the text into a format that we can read through.
var FileText = File.ReadAllText("filename.txt").ToCharArray();
Step 2) We need to decide how to find the characters in the string. A very basic way to achieve this is to loop through the text, looking for our characters.
for (int i = 0; i < FileText.Length; i++)
{
    switch (FileText[i])
    {
        case 'a':
            FileText[i] = 'x';
            break;
        ... // Fill out cases for each substitution
        default:
            // This character is something we don't care about, so keep looping
            continue;
    }
}
There are very likely more efficient solutions, but this will at least get the job done while not being too complex.
Edit:
As per Caius' comment, the assignment to a string indexer doesn't work. It needed to be a character array. I tested the change, and it seems to work now. The downside of this solution is it is not configurable, though.

Counting characters and running an if statement [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I appreciate the time you spend reading this :)
Basically, I'm trying to run an if statement when a string is longer than a certain number of characters.
So I'm running a bunch of line.Contains calls to filter out what I don't want, but I also want to add a character count, so that a line with fewer than 30 characters is filtered out.
So basically, I'm looking for something like this in C# (Visual Studio 2008):
if (line.contains > 30 characters)
{
Run code...
}
I'm not too sure of the right syntax to use, and Google hasn't been very forthcoming.
I appreciate any help. Thanks, Jason
Wow, thanks for the fast response guys, but with lots of trial and error I came up with this:
int num_count = line.Length;
if (num_count > 30) { }
It seems to work.

string data = "fff";
if (data.Length > 30)
{
    // Magic stuff here
}
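Since the question combines the length check with a series of Contains filters, both can live in one predicate. A minimal sketch; LineFilter and Keep are hypothetical names, and the minimum of 30 comes from the question. String.Length counts UTF-16 code units, which is fine for this kind of filtering. The optional parameter needs C# 4 or later; in Visual Studio 2008, pass 30 explicitly instead.

```csharp
using System.Collections.Generic;
using System.Linq;

static class LineFilter
{
    // Keeps a line only if it has at least minLength characters (UTF-16 code units)
    // and contains none of the unwanted substrings.
    public static bool Keep(string line, IEnumerable<string> unwanted, int minLength = 30)
    {
        if (line.Length < minLength) return false;   // too short: filter it out
        return !unwanted.Any(u => line.Contains(u)); // any unwanted text: filter it out
    }
}
```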

This should do what you want.
string str = "yourstring";
int i = str.Length;
Also, try to post code next time you ask something; it helps a lot in determining exactly what you want.

The short, but näive, answer is to use this property:
String.Length
You might want to think about what you mean by character. .NET's String class is a counted sequence of UTF-16 code units, which it types as Char. There are either one or two UTF-16 code units in a Unicode code point, and it's easy to calculate as you step through each Char. Where two code units are used, they are called the low and high surrogates.
But also note that Unicode can represent diacritics and such as separate code points. You might want to exclude them in your count.
Putting them together:
using System;
using System.Globalization;
using System.Linq;
...
var test = "na\u0308ive"; // want to count ä as one character
var categoriesNotToCount = new []
{
    UnicodeCategory.EnclosingMark,
    UnicodeCategory.NonSpacingMark,
    UnicodeCategory.SpacingCombiningMark
};
var length = test
    .Count(c =>
        // Char.GetUnicodeCategory sees one UTF-16 code unit at a time, so this only
        // excludes marks in the BMP; a supplementary-plane mark is still counted once
        !categoriesNotToCount.Contains(Char.GetUnicodeCategory(c))
        & !Char.IsHighSurrogate(c) // don't count the high surrogate because we're already counting the low surrogate
    );
It all comes down to what you're after. If it's the number of UTF-16 code units you want then, for sure, String.Length is your answer.
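If the goal is user-perceived characters rather than code units, the framework already has a counter: System.Globalization.StringInfo.LengthInTextElements treats a base character plus its combining marks, or a surrogate pair, as one text element. This is a swapped-in alternative to the category filter above, not the answerer's method, and TextLength is a hypothetical wrapper name. Be aware that .NET 5 switched text-element segmentation to Unicode extended grapheme clusters, so older frameworks can give different counts for some sequences such as multi-part emoji.

```csharp
using System.Globalization;

static class TextLength
{
    // Counts text elements ("user-perceived characters") instead of UTF-16 code units.
    public static int CountTextElements(string s) => new StringInfo(s).LengthInTextElements;
}
```

For "na\u0308ive", String.Length is 6, while CountTextElements should report 5.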

Is using LINQ against a single object considered a bad practice? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I don't mean this question to be too subjective.
I googled this for some time but got no specific answers addressing this issue. The thing is, I think I'm getting somewhat addicted to LINQ. I already use LINQ to query lists, among other things like LINQ to SQL, LINQ to XML, and so on. But then something struck me: "What if I used it to query a single object?" So I did. It may seem wrong, like trying to kill a fly with a grenade launcher. Though we all agree it would be artistically pleasant to see.
I consider it very readable, and I don't think there are any performance issues with it, but let me show you an example.
In a web application, I need to retrieve a setting from my configuration file (web.config). But this should have a default value if the key is not present. Also, the value I need is a decimal, not a string, which is the default return from ConfigurationManager.AppSettings["myKey"]. Also, my number should not be more than 10 and it should not be negative. I know I could write this:
string cfg = ConfigurationManager.AppSettings["myKey"];
decimal bla;
if (!decimal.TryParse(cfg, out bla))
{
    bla = 0; // 0 is the default value
}
else
{
    if (bla < 0 || bla > 10)
    {
        bla = 0;
    }
}
Which is not complicated, not convoluted, and easy to read. However, this is how I like it done:
// initialize it so the compiler doesn't complain when you select it after
decimal awesome = 0;
// use Enumerable.Repeat to grab a "singleton" IEnumerable<string>
// which is fed with the value from app settings
awesome = Enumerable.Repeat(ConfigurationManager.AppSettings["myKey"], 1)
    // Is it parseable? grab it
    .Where(value => decimal.TryParse(value, out awesome))
    // This is a little trick: select the variable itself since it has been assigned by TryParse
    // Also, from now on I'm working with an IEnumerable<decimal>
    .Select(value => awesome)
    // Check the other constraints
    .Where(number => number >= 0 && number <= 10)
    // If the previous "Where"s weren't matched, the IEnumerable is empty, so get the default value
    .DefaultIfEmpty(0)
    // Return the value from the IEnumerable
    .Single();
Without the comments, it looks like this:
decimal awesome = 0;
awesome = Enumerable.Repeat(ConfigurationManager.AppSettings["myKey"], 1)
    .Where(value => decimal.TryParse(value, out awesome))
    .Select(value => awesome)
    .Where(number => number >= 0 && number <= 10)
    .DefaultIfEmpty(0)
    .Single();
I don't know if I'm the only one here, but I feel the second method is much more "organic" than the first one. It's not easily debuggable, because of LINQ, but it's pretty failproof I guess. At least this one I wrote. Anyway, if you needed to debug, you could just add curly braces and return statements inside the linq methods and be happy about it.
I've been doing this for a while now, and it feels much more natural than doing things "line per line, step by step". Plus, I just specified the default value once. And it's written in a line which says DefaultIfEmpty so it's pretty straightforward.
Another plus, I definitely don't do it if I notice the query will be much larger than the one I wrote up there. Instead, I break into smaller chunks of linq glory so it will be easier to understand and debug.
I find it easier to see a variable assignment and automatically think: this is what you had to do to set this value, rather than look at ifs, elses, switches, etc., and try to figure out whether they're part of the formula or not.
And it prevents developers from writing undesired side effects in wrong places, I think.
But in the end, some could say it looks very hackish, or too arcane.
So I come with the question at hand:
Is using LINQ against a single object considered a bad practice?

I say yes, but it's really up to preference. It definitely has disadvantages, but I will leave that up to you. Your original code can become much simpler though.
string cfg = ConfigurationManager.AppSettings["myKey"];
decimal bla;
if (!decimal.TryParse(cfg, out bla) || bla < 0 || bla > 10)
    bla = 0; // 0 is the default value
This works because of "short circuit" evaluation, meaning that the program will stop checking other conditions once the first true condition is found.
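If several settings need the same parse, range-check and fallback treatment, the short-circuit version can be pulled into a reusable method. A sketch only: SettingParser and ParseBounded are hypothetical names, and passing CultureInfo.InvariantCulture is an added assumption (config values usually use '.' as the decimal separator), whereas the code above parses with the current culture.

```csharp
using System.Globalization;

static class SettingParser
{
    // Returns the parsed value if raw is a decimal within [min, max], otherwise fallback.
    // A null raw value (missing key) fails TryParse, so the fallback is used.
    public static decimal ParseBounded(string raw, decimal min, decimal max, decimal fallback)
    {
        return decimal.TryParse(raw, NumberStyles.Number, CultureInfo.InvariantCulture, out decimal value)
               && value >= min && value <= max
            ? value
            : fallback;
    }
}
```

With it, the original setting would read decimal bla = SettingParser.ParseBounded(ConfigurationManager.AppSettings["myKey"], 0m, 10m, 0m);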

Are there any tricks for counting the number of lines in a text file? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
Say you have a text file - what's the fastest and/or most memory efficient way to determine the number of lines of text in that file?
Is it simply a matter of scanning through it character by character and looking for newline characters?

Probably not the fastest, but it will be the most versatile...
int lines = 0;
/* if you need to use an encoding other than UTF-8 you may want to try...
   new StreamReader("filename.txt", yourEncoding)
   ... instead of File.OpenText("myFile.txt")
*/
using (var fs = File.OpenText("myFile.txt"))
    while (!fs.EndOfStream)
    {
        fs.ReadLine();
        lines++;
    }
... this will probably be faster ...
If you need even more speed, you might try a Duff's device and check 10 or 20 bytes before the branch.
int lines = 0;
var buffer = new byte[32768];
var bufferLen = 1;
using (var fs = File.OpenRead("filename.txt"))
    while (bufferLen > 0)
    {
        bufferLen = fs.Read(buffer, 0, 32768);
        for (int i = 0; i < bufferLen; i++)
            /* this is only known to work for UTF-8/ASCII; other
               encodings (e.g. UTF-16) may need to search for different
               end-of-line bytes */
            if (buffer[i] == 10)
                lines++;
    }

Unless you've got a fixed line length (in terms of bytes) you'll definitely need to read the data. Whether you can avoid converting all the data into text or not will depend on the encoding.
Now the most efficient way will be reinier's - counting line endings manually. However, the simplest code would be to use TextReader.ReadLine(). And in fact, the simplest way of doing that would be to use my LineReader class from MiscUtil, which converts a filename (or various other things) into an IEnumerable<string>. You can then just use LINQ:
int lines = new LineReader(filename).Count();
(If you don't want to grab the whole of MiscUtil, you can get just LineReader on its own from this answer.)
Now that will create a lot of garbage which repeatedly reading into the same char array wouldn't - but it won't read more than one line at a time, so while you'll be stressing the GC a bit, it's not going to blow up with large files. It will also require decoding all the data into text - which you may be able to get away without doing for some encodings.
Personally, that's the code I'd use until I found that it caused a bottleneck - it's a lot simpler to get right than doing it manually. Do you absolutely know that in your current situation, code like the above will be the bottleneck?
As ever, don't micro-optimise until you have to... and you can very easily optimise this at a later date without changing your overall design, so postponing it isn't going to do any harm.
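Since .NET 4.0 the framework has a lazy line enumerator of its own, File.ReadLines, which plays the same role as LineReader here without the extra dependency. A minimal sketch (LineCounting is a hypothetical name); like ReadLine, it treats \r, \n and \r\n as line breaks and does not count an empty line after a trailing newline.

```csharp
using System.IO;
using System.Linq;

static class LineCounting
{
    // File.ReadLines yields one line at a time, so memory use stays flat for large files.
    public static int CountLines(string path) => File.ReadLines(path).Count();
}
```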
EDIT: To convert Matthew's answer to one which will work for any encoding - but which will incur the penalty of decoding all the data, of course, you might end up with something like the code below. I'm assuming that you only care about \n - rather than \r, \n and \r\n which TextReader normally handles:
using System.IO;
using System.Text;
public static int CountLines(string file, Encoding encoding)
{
    using (TextReader reader = new StreamReader(file, encoding))
    {
        return CountLines(reader);
    }
}
public static int CountLines(TextReader reader)
{
    char[] buffer = new char[32768];
    int charsRead;
    int count = 0;
    while ((charsRead = reader.Read(buffer, 0, buffer.Length)) > 0)
    {
        for (int i = 0; i < charsRead; i++)
        {
            if (buffer[i] == '\n')
            {
                count++;
            }
        }
    }
    return count;
}

If the file has fixed-length records, you can get the size of a record and divide the total file size by it to get the number of records. If you're just looking for an estimate, what I've done in the past is read the first x rows (e.g. 200) and use them to come up with an average row size, which you can then use to estimate the total number of records (divide the total file size by the average row size). This works well if your records are fairly uniform and you don't need an exact count. I've used this on large files (do a quick check of the file size; if it's over 20 MB, get an estimate rather than reading the entire file).
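A sketch of the sampling estimate, working on raw bytes so no decoding is needed. LineEstimator and EstimateLineCount are hypothetical names; the default of 200 sample lines mirrors the figure above, and the code assumes an ASCII/UTF-8 file whose lines end in '\n'. If the whole file fits inside the sample, the newline count it returns is exact rather than estimated.

```csharp
using System;
using System.IO;

static class LineEstimator
{
    // Estimates the number of '\n'-terminated lines by sampling the first sampleLines
    // lines and scaling their average byte length up to the file size.
    public static long EstimateLineCount(string path, int sampleLines = 200)
    {
        if (sampleLines < 1) throw new ArgumentOutOfRangeException(nameof(sampleLines));

        using (var fs = File.OpenRead(path))
        {
            long fileLength = fs.Length;
            long bytesSeen = 0;
            int linesSeen = 0;
            int b;
            while (linesSeen < sampleLines && (b = fs.ReadByte()) != -1)
            {
                bytesSeen++;
                if (b == '\n') linesSeen++;
            }

            // The sample covered the whole file, so the newline count is exact.
            if (bytesSeen == fileLength) return linesSeen;

            // Average bytes per line in the sample, scaled to the whole file.
            double averageLineBytes = (double)bytesSeen / linesSeen;
            return (long)Math.Round(fileLength / averageLineBytes);
        }
    }
}
```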
Other than that, the only 100% accurate way is to go through the file line by line using ReadLine.

I'd read it 32 KB at a time (or more), count the number of \r\n pairs in the memory block, and repeat until done.
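One detail to watch with block-wise reading: a \r\n pair can be split across two blocks, so the counter has to remember whether the previous block ended in \r. A minimal sketch, assuming a byte-oriented encoding (ASCII/UTF-8); CrLfCounter is a hypothetical name, and 32 KB is just the block size suggested above.

```csharp
using System.IO;

static class CrLfCounter
{
    // Counts "\r\n" pairs while reading fixed-size blocks. previousWasCr carries
    // across block boundaries, so a pair split between two reads is still counted.
    public static long CountCrLf(Stream stream, int blockSize = 32 * 1024)
    {
        var buffer = new byte[blockSize];
        long count = 0;
        bool previousWasCr = false;
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                byte b = buffer[i];
                if (b == (byte)'\n' && previousWasCr) count++;
                previousWasCr = b == (byte)'\r';
            }
        }
        return count;
    }
}
```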

The simplest:
int lines = File.ReadAllLines(fileName).Length;
This will of course read the whole file into memory, so it's not memory efficient at all. The most memory-efficient approach is reading the file as a stream and looking for the line break characters. This will also be the fastest, as it has minimal overhead.
There is no shortcut that you can use. Files are not line based, so there is no extra information available; one way or the other, you have to read and examine every single byte of the file.

I believe Windows uses two characters to mark the end of a line (CR and LF, i.e. 13 and 10 decimal, or 0x0D and 0x0A), so you only need to check every second character against these two.
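The every-second-character trick holds up under one strong assumption: every line break is exactly \r\n and no lone \r or \n bytes appear. A CRLF pair occupies two adjacent positions, so exactly one of them has an even index, and testing only even indices against both bytes finds each pair once. A sketch over an in-memory byte array; StrideTwoCounter is a hypothetical name and a single-byte encoding is assumed. A streaming version would have to track the parity of the absolute file offset across reads, since Read can return an odd number of bytes.

```csharp
static class StrideTwoCounter
{
    // Assumes every line break is exactly "\r\n" (no lone CR or LF) and one byte per character.
    // Each CRLF pair has exactly one byte at an even index, so checking even indices
    // for either byte counts every pair exactly once.
    public static int CountCrLf(byte[] data)
    {
        int count = 0;
        for (int i = 0; i < data.Length; i += 2)
        {
            if (data[i] == 0x0D || data[i] == 0x0A)
                count++;
        }
        return count;
    }
}
```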

Since counting is a simple scan with no dependencies between locations, consider map/reduce if the data is really huge. In C/C++, you can use OpenMP for parallelism. Each thread reads a chunk and counts the CRLFs in that chunk. Finally, in the reduce step, the individual counts are summed. Intel Threading Building Blocks provides C++ template-based constructs for parallelism. I agree this is a sledgehammer approach for small files, but from a pure performance perspective, this is optimal (divide and conquer).
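A rough C# analogue of that map/reduce idea, using Parallel.For instead of OpenMP: each worker counts newlines in its own byte range through its own FileStream (the "map"), and Interlocked.Add sums the partial counts (the "reduce"). It counts '\n' bytes rather than CRLF pairs so that a pair split across a chunk boundary cannot be missed. A sketch only: ParallelLineCounter is a hypothetical name, it assumes an ASCII/UTF-8 file, and whether parallel reads help at all depends on the storage, so measure before adopting it.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

static class ParallelLineCounter
{
    public static long CountNewlines(string path, int chunkCount)
    {
        if (chunkCount < 1) throw new ArgumentOutOfRangeException(nameof(chunkCount));
        long length = new FileInfo(path).Length;
        long chunkSize = (length + chunkCount - 1) / chunkCount;
        long total = 0;

        Parallel.For(0, chunkCount, chunk =>
        {
            // "Map": count '\n' bytes in [start, end) with a private stream and buffer.
            long start = chunk * chunkSize;
            long end = Math.Min(start + chunkSize, length);
            if (start >= end) return;

            long local = 0;
            var buffer = new byte[64 * 1024];
            using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
            {
                fs.Seek(start, SeekOrigin.Begin);
                long remaining = end - start;
                while (remaining > 0)
                {
                    int read = fs.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
                    if (read == 0) break;
                    for (int i = 0; i < read; i++)
                        if (buffer[i] == (byte)'\n') local++;
                    remaining -= read;
                }
            }

            // "Reduce": add this chunk's count to the shared total.
            Interlocked.Add(ref total, local);
        });

        return total;
    }
}
```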

C# file stream - build a quiz [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
We're trying to use an external file (txt or CSV) in order to create a file stream in C#.
The data in the file is that of a quiz game made of:
1 short question
4 possible answers
1 correct answer
The program should be able to tell the user whether he answered correctly or not.
I'm looking for an example code/algorithm/tutorial on how to use the data in the external file to create a simple quiz in C#.
Also, any suggestions on how to construct the txt file (how do I mark an answer as the correct one?).
Any suggestions or links?
Thanks,

My recommendation would be to use an XML file if you must load your data from a file (as opposed to from a database).
Using a text file would require you to pretty clearly define structure for individual elements of the question. Using a CSV could work, but you'd have to define a way to escape commas within the question or answer itself. It might complicate matters.
So, to reiterate, IMHO, an XML is the best way to store such data. Here is a short sample demonstrating the possible structure you might use:
<?xml version="1.0" encoding="utf-8" ?>
<Test>
  <Problem id="1">
    <Question>Which language am I learning right now?</Question>
    <OptionA>VB 7.0</OptionA>
    <OptionB>J2EE</OptionB>
    <OptionC>French</OptionC>
    <OptionD>C#</OptionD>
    <Answer>OptionA</Answer>
  </Problem>
  <Problem id="2">
    <Question>What does XML stand for?</Question>
    <OptionA>eXtremely Muddy Language</OptionA>
    <OptionB>Xylophone, thy Music Lovely</OptionB>
    <OptionC>eXtensible Markup Language</OptionC>
    <OptionD>eXtra Murky Lungs</OptionD>
    <Answer>OptionC</Answer>
  </Problem>
</Test>
As far as loading an XML file into memory is concerned, .NET provides many intrinsic ways to handle XML files and strings, many of which completely hide the need to interact with FileStreams directly. For instance, XmlDocument's Load method (e.g. doc.Load("myFileName.xml")) will do it for you internally in one line of code. Personally, though, I prefer to use XmlReader and XPathNavigator.
Take a look at the members of the System.Xml namespace for more information.
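As an alternative to the XmlReader/XPathNavigator approach the answer prefers, LINQ to XML (System.Xml.Linq, .NET 3.5 and later) can map the format above onto objects in a few lines. A sketch assuming exactly that layout: Problem and QuizXml are hypothetical names, and options are recognised by element names starting with "Option".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

class Problem
{
    public int Id;
    public string Question;
    public Dictionary<string, string> Options;  // e.g. "OptionC" -> "eXtensible Markup Language"
    public string Answer;                       // name of the correct option, e.g. "OptionC"
}

static class QuizXml
{
    // Reads every <Problem> element under the root <Test> element.
    public static List<Problem> Load(XDocument doc)
    {
        return doc.Root.Elements("Problem")
            .Select(p => new Problem
            {
                Id = (int)p.Attribute("id"),
                Question = (string)p.Element("Question"),
                Options = p.Elements()
                    .Where(e => e.Name.LocalName.StartsWith("Option", StringComparison.Ordinal))
                    .ToDictionary(e => e.Name.LocalName, e => e.Value),
                Answer = (string)p.Element("Answer")
            })
            .ToList();
    }
}
```

Loading from disk would then be QuizXml.Load(XDocument.Load("quiz.xml")), and checking a reply means comparing the chosen option name with the problem's Answer.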

There's really no set way to do this, though I would agree that for a simple database of quiz questions, text files would probably be your best option (as opposed to XML or a proper database, though the former wouldn't be completely overkill).
Here's a little example of a text-based format for a set of quiz questions, and a method to read the questions into code. Edit: I've tried to make it as easy as possible to follow now (using simple constructions), with plenty of comments!
File Format
Example file contents.
Question text for 1st question...
Answer 1
Answer 2
!Answer 3 (correct answer)
Answer 4
Question text for 2nd question...
!Answer 1 (correct answer)
Answer 2
Answer 3
Answer 4
Code
This is just a simple structure for storing each question in code:
struct Question
{
    public string QuestionText; // Actual question text.
    public string[] Choices;    // Array of answers from which user can choose.
    public int Answer;          // Index of correct answer within Choices.
}
You can then read the questions from the file using the following code. There's nothing special going on here other than the object initializer (basically this just allows you to set variables/properties of an object at the same time as you create it).
using System;
using System.Collections.Generic;
// Create new list to store all questions.
var questions = new List<Question>();
// Open file containing quiz questions using StreamReader, which allows you to read text from files easily.
using (var quizFileReader = new System.IO.StreamReader("questions.txt"))
{
    string line;
    Question question;
    // Loop through the lines of the file until there are no more (the ReadLine function returns null at this point).
    // Note that the ReadLine called here only reads question texts (first line of a question), while other calls to ReadLine read the choices.
    while ((line = quizFileReader.ReadLine()) != null)
    {
        // Skip to the next line if this one is empty.
        if (line.Length == 0)
            continue;
        // Create a new question object.
        // The "object initializer" construct is used here by including { } after the constructor to set variables.
        question = new Question()
        {
            // Set the question text to the line just read.
            QuestionText = line,
            // Set the choices to an array containing the next 4 lines read from the file.
            Choices = new string[]
            {
                quizFileReader.ReadLine(),
                quizFileReader.ReadLine(),
                quizFileReader.ReadLine(),
                quizFileReader.ReadLine()
            }
        };
        // Initially set the correct answer to -1, which means that no choice marked as correct has yet been found.
        question.Answer = -1;
        // Check each choice to see if it begins with the '!' char (marked as correct).
        for (int i = 0; i < 4; i++)
        {
            if (question.Choices[i].StartsWith("!"))
            {
                // Current choice is marked as correct. Therefore remove the '!' from the start of the text and store the index of this choice as the correct answer.
                question.Choices[i] = question.Choices[i].Substring(1);
                question.Answer = i;
                break; // Stop looking through the choices.
            }
        }
        // Check if none of the choices was marked as correct. If this is the case, we throw an exception and then stop processing.
        // Note: this is only basic error handling (not very robust) which you may want to later improve.
        if (question.Answer == -1)
        {
            throw new InvalidOperationException(
                "No correct answer was specified for the following question.\r\n\r\n" + question.QuestionText);
        }
        // Finally, add the question to the complete list of questions.
        questions.Add(question);
    }
}
Of course, this code is rather quick and basic (certainly needs some better error handling), but it should at least illustrate a simple method you might want to use. I do think text files would be a nice way to implement a simple system such as this because of their human readability (XML would be a bit too verbose in this situation, IMO), and additionally they're about as easy to parse as XML files. Hope this gets you started anyway...

A good place to start is with Microsoft's documentation on FileStream.
A quick google search will give you pretty much everything you need. Here's a tutorial on reading and writing files in C#. Google is your friend.

any suggestions on how to construct the txt file (how do I remark an answer as the correct one?)
Perhaps the easiest is with a simple text file format - where you have questions and answers on each line (no blank lines). The # sign signifies the correct answer.
Format of the file -
Question
#answer
answer
answer
answer
An example file -
What is 1 + 1?
#2
9
3
7
Who is buried in Grant's tomb?
Ed
John
#Grant
Tim
I'm looking for an example code/algorithm/tutorial on how to use the data in the external file to create a simple quiz in C#.
Here's some code that uses the example file to create a quiz.
using System;
using System.IO;
static void Main(string[] args)
{
    int correct = 0;
    using (StreamReader sr = new StreamReader("C:\\quiz.txt"))
    {
        while (!sr.EndOfStream)
        {
            Console.Clear();
            for (int i = 0; i < 5; i++)
            {
                String line = sr.ReadLine();
                if (i > 0)
                {
                    if (line.StartsWith("#"))
                    {
                        correct = i;
                        line = line.Substring(1); // don't reveal the marker to the player
                    }
                    Console.WriteLine("{0}: {1}", i, line);
                }
                else
                {
                    Console.WriteLine(line);
                }
            }
            for (; ; )
            {
                Console.Write("Select Answer: ");
                ConsoleKeyInfo cki = Console.ReadKey();
                if (cki.KeyChar.ToString() == correct.ToString())
                {
                    Console.WriteLine(" - Correct!");
                    Console.WriteLine("Press any key for next question...");
                    Console.ReadKey();
                    break;
                }
                else
                {
                    Console.WriteLine(" - Try again!");
                }
            }
        }
    }
}
