C++ ">>" and "<<" IO in C#?

Is there a C# library that provides the functionality of ">>" and "<<" for IO in C++? It was really convenient for console apps. Granted, not a lot of console apps are written in C#, but some of us still use it for them.
I know about Console.Read[Line]|Write[Line] and Streams|FileStream|StreamReader|StreamWriter; that's not part of the question.
I don't think I'm being specific enough:
int a,b;
cin >> a >> b;
IS AMAZING!!
string input = Console.ReadLine();
string[] data = input.Split( ' ' );
a = Convert.ToInt32( data[0] );
b = Convert.ToInt32( data[1] );
... long-winded enough? Plus there are other reasons why the C# solution is worse: I must read the entire line or maintain my own buffer for it. If the line I'm working on is, say, the 1000th line of Bell's triangle, I waste a lot of time reading everything at once.
EDIT:
GAR!!!
OK THE PROBLEM!!!
Using IntX to handle HUGE numbers, like the .NET 4.0 BigInteger, to produce the Bell triangle. If you know the Bell triangle, you know it gets freaking huge very, very quickly. The whole point of this question is that I need to deal with each number individually. If you read an entire line, you could easily hit gigs of data. This is kind of the same as reading digits of Pi. For example, 42^1048576 is 1.6 MB! I have neither the time nor the memory to read all the numbers as one string and then pick out the one I want.

No, and I wouldn't. C# != C++
You should try your best to stick with the language convention of whatever language you are working in.

I think I get what you are after: simple, default-formatted input. I think the reason there is no TextReader.ReadXXX() is that this is parsing, and parsing is hard. For example, should ReadFloat():
ignore leading whitespace
require decimal point
require trailing whitespace (123abc)
handle exponentials (12.3a3 parses differently to 12.4e5?)
Not to mention what the heck does ReadString() do? From C++, you would expect "read to the next whitespace", but the name doesn't say that.
Now all of these have good sensible answers, and I agree C# (or rather, the BCL) should provide them, but I can certainly understand why they would choose to not provide fragile, nearly impossible to use correctly, functions right there on a central class.
EDIT:
For the buffering problem, an ugly solution is:
static class TextReaderEx {
    public static string ReadWord(this TextReader reader) {
        int c;
        // Skip leading whitespace
        while (-1 != (c = reader.Peek()) && char.IsWhiteSpace((char)c)) reader.Read();
        // Read up to the next whitespace
        var result = new StringBuilder();
        while (-1 != (c = reader.Peek()) && !char.IsWhiteSpace((char)c)) {
            reader.Read();
            result.Append((char)c);
        }
        return result.ToString();
    }
}
...
int.Parse(Console.In.ReadWord())

Nope. You're stuck with Console.WriteLine. You could create a wrapper that offered this functionality, though.
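A minimal sketch of such a wrapper (the class and method names here are invented for illustration, not a standard API); since C# cannot overload << this way, it chains method calls instead:
using System;

// Hypothetical chaining wrapper; not part of the BCL.
class ChainedWriter
{
    public ChainedWriter Write<T>(T value)
    {
        Console.Write(value);
        return this;          // returning 'this' is what enables cout-style chaining
    }

    public ChainedWriter WriteLine()
    {
        Console.WriteLine();
        return this;
    }
}
Usage then reads almost like chained C++ streams: new ChainedWriter().Write("a = ").Write(a).WriteLine();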

You can use Console.WriteLine and Console.ReadLine for this purpose. Both are in the System namespace.

You have System.IO.Stream(Reader|Writer)
And for console: Console.Write, Console.Read

Not that I know of. If you are interested in chaining outputs, you can use System.Text.StringBuilder.
http://msdn.microsoft.com/en-us/library/system.text.stringbuilder(VS.71).aspx
StringBuilder builder = new StringBuilder();
builder.Append("hello").Append(" world!");
Console.WriteLine(builder.ToString());
Perhaps not as pretty as C++, but as another poster states, C# != C++.

This is not even possible in C#, no matter how hard you try:
The left-hand side and right-hand side of operators are always passed by value; this rules out cin.
The right-hand side of << and >> must be an integer; this rules out cout.
The first point makes sure operator overloading is a little less messy than in C++ (debatable, but it surely makes things a lot simpler), and the second point was specifically chosen to rule out C++'s cin and cout way of dealing with IO, IIRC.


Combine multiple lines of C# code for brevity or separate for clarity

I am fairly new to C# and trying to learn best practices. I've been faced with many situations over the last week in which I need to choose between longer but simpler code, or shorter code that combines several actions into a single statement. What standards do you veteran coders use when trying to write clear, concise code? Here is an example of the code I'm writing, with two options. Which is preferable?
A)
if (ncPointType.StartsWith("A")) // analog points
{
    string[] precisionString = Regex.Split(unitsParam.Last(), ", ");
    precision = int.Parse(precisionString[1]);
}
else
    precision = null;
B)
if (ncPointType.StartsWith("A")) // analog points
    precision = int.Parse(Regex.Split(unitsParam.Last(), ", ")[1]);
else
    precision = null;
There is no right or wrong. This is opinion based really.
However, remember that whether you add braces, add comments, add whitespace or whatever, it doesn't affect the performance or the size of the final assembly because the compiler optimizes it very well. So, why not go with the more verbose so that other programmers can follow it easier?
This is basically subject to the coding standard you choose. There is no right or wrong, just personal preference.
Agree as a team what you prefer.
But even better is:
precision = ncPointType.StartsWith("A") ?
    int.Parse(Regex.Split(unitsParam.Last(), ", ")[1]) :
    null;
This expresses the function being executed (set the precision) and the conditions that control how it is set, in even less code, without creating an unnecessary temporary variable to hold a result that is not used anywhere else.
If you ask me both A and B are bad practice. You should always use braces regardless of whether or not it is one line or multiple lines. This helps to prevent bugs in the future when people add additional lines of code to your if or else blocks and don't notice that the braces are missing.
Sometimes having "more" code, like using temporary variables, etc., will make your code easier to debug as you can hover over symbols. It's all about balance, and this comes with experience. Remember that you may not be the only one working on your code, so clarity above all.
Ultimately it comes down to your choice, because it has nothing to do with performance, but I would also note that you cannot declare variables if you omit the curly braces:
int a = 0;
if (a == 0)
    int b = 3;
Is not valid, whereas:
int a = 0;
if (a == 0)
{
    int b = 3;
}
Is valid.
It's easier to debug option A, but option B is more concise while still being readable. You could make it even more concise with a ternary operator:
precision = ncPointType.StartsWith("A") ? int.Parse(Regex.Split(unitsParam.Last(), ", ")[1]) : null;
Although it's far more readable this way (and still works the same way!):
precision = ncPointType.StartsWith("A") ?
    int.Parse(Regex.Split(unitsParam.Last(), ", ")[1]) :
    null;
It's best to stick with the standards used in your project. Readability is far more important for maintainability than having less lines of code. In both options, the efficiency is the same, so you don't need to worry about speed in this case.
As everyone else here points out, there are no hard and fast "rules" per se, but there should probably be braces around both blocks of code; both A and B have characteristics similar to the Apple goto bug, because they introduce potential ambiguity.
Consider:
void Main()
{
    int x = 1;
    if (x == 1)
        Console.WriteLine("1");
    else if (x == 2)
        Console.WriteLine("NOT 1");
    Console.WriteLine("OK");
}
will produce
1
OK
To someone scanning the code, especially if it's surrounded by other, innocuous-looking code, this could easily be misread as "only print 'OK' if x == 2". Obviously some people will spot it, some won't, but why introduce the danger for the want of a brace of braces :)

'do...while' vs. 'while'

Possible Duplicates:
While vs. Do While
When should I use do-while instead of while loops?
I've been programming for a while now (2 years work + 4.5 years degree + 1 year pre-college), and I've never used a do-while loop short of being forced to in the Introduction to Programming course. I have a growing feeling that I'm doing programming wrong if I never run into something so fundamental.
Could it be that I just haven't run into the correct circumstances?
What are some examples where it would be necessary to use a do-while instead of a while?
(My schooling was almost all in C/C++ and my work is in C#, so if there is another language where it absolutely makes sense because do-whiles work differently, then these questions don't really apply.)
To clarify... I know the difference between a while and a do-while: while checks the exit condition and then performs tasks, whereas do-while performs tasks and then checks the exit condition.
Use it when you always want the loop to execute at least once. It's not common, but I do use it from time to time. One case where you might want to use it is trying to access a resource that could require a retry, e.g.
do
{
    try to access resource...
    put up message box with retry option
} while (user says retry);
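In C#, that might look something like the following console sketch (AccessResource and the prompt are stand-ins for the real resource and message box):
using System;
using System.IO;

class RetryDemo
{
    // Hypothetical operation that may fail, e.g. opening a locked file.
    static void AccessResource() => File.OpenRead("data.bin").Dispose();

    static void Main()
    {
        bool retry;
        do
        {
            try
            {
                AccessResource();
                retry = false;                      // success: stop looping
            }
            catch (IOException ex)
            {
                Console.Write($"Failed: {ex.Message}. Retry? (y/n) ");
                retry = Console.ReadLine() == "y";  // loop again only if the user says retry
            }
        } while (retry);
    }
}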
do-while is better if the compiler isn't competent at optimization. do-while has only a single conditional jump, as opposed to for and while which have a conditional jump and an unconditional jump. For CPUs which are pipelined and don't do branch prediction, this can make a big difference in the performance of a tight loop.
Also, since most compilers are smart enough to perform this optimization, all loops found in decompiled code will usually be do-while (if the decompiler even bothers to reconstruct loops from backward local gotos at all).
I have used this in a TryDeleteDirectory function. It was something like this
do
{
    try
    {
        DisableReadOnly(directory);
        directory.Delete(true);
    }
    catch (Exception)
    {
        retryDeleteDirectoryCount++;
    }
} while (Directory.Exists(fullPath) && retryDeleteDirectoryCount < 4);
Do-while is useful when you want to execute something at least once. As for a good example of using do-while vs. while, let's say you want to make the following: a calculator.
You could approach this by using a loop and checking after each calculation if the person wants to exit the program. Now you can probably assume that once the program is opened the person wants to do this at least once so you could do the following:
do
{
    // do calculator logic here
    // prompt user for continue here
} while (cont == true); // cont is short for continue
This is sort of an indirect answer, but this question got me thinking about the logic behind it, and I thought this might be worth sharing.
As everyone else has said, you use a do ... while loop when you want to execute the body at least once. But under what circumstances would you want to do that?
Well, the most obvious class of situations I can think of would be when the initial ("unprimed") value of the check condition is the same as when you want to exit. This means that you need to execute the loop body once to prime the condition to a non-exiting value, and then perform the actual repetition based on that condition. What with programmers being so lazy, someone decided to wrap this up in a control structure.
So for example, reading characters from a serial port with a timeout might take the form (in Python):
response_buffer = []
char_read = port.read(1)
while char_read:
    response_buffer.append(char_read)
    char_read = port.read(1)
# When there's nothing to read after 1s, there is no more data
response = ''.join(response_buffer)
Note the duplication of code: char_read = port.read(1). If Python had a do ... while loop, I might have used:
do:
    char_read = port.read(1)
    response_buffer.append(char_read)
while char_read
The added benefit for languages that create a new scope for loops: char_read does not pollute the function namespace. But note also that there is a better way to do this, and that is by using Python's None value:
response_buffer = []
char_read = None
while char_read != '':
    char_read = port.read(1)
    response_buffer.append(char_read)
response = ''.join(response_buffer)
So here's the crux of my point: in languages with nullable types, the situation initial_value == exit_value arises far less frequently, and that may be why you do not encounter it. I'm not saying it never happens, because there are still times when a function will return None to signify a valid condition. But in my hurried and briefly-considered opinion, this would happen a lot more if the languages you used did not allow for a value that signifies: this variable has not been initialised yet.
This is not perfect reasoning: in reality, now that null values are common, they simply form one more element of the set of valid values a variable can take. But practically, programmers have a way to distinguish between a variable being in a sensible state, which may include the loop exit state, and it being in an uninitialised state.
I used them a fair bit when I was in school, but not so much since.
In theory they are useful when you want the loop body to execute once before the exit-condition check. The problem is that for the few instances where I don't want the check first, typically I want the exit check in the middle of the loop body rather than at the very end. In that case, I prefer to use the well-known for (;;) with an if (condition) break; somewhere in the body.
In fact, if I'm a bit shaky on the loop exit condition, sometimes I find it useful to start writing the loop as a for (;;) {} with a break statement where needed, and then when I'm done I can see if it can be "cleaned up" by moving initializations, exit conditions, and/or increment code inside the for's parentheses.
A situation where you always need to run a piece of code once, and depending on its result, possibly more times. The same can be produced with a regular while loop as well.
rc = get_something();
while (rc == wrong_stuff)
{
    rc = get_something();
}

do
{
    rc = get_something();
}
while (rc == wrong_stuff);
It's as simple as that:
precondition vs postcondition
while (cond) {...} - precondition, it executes the code only after checking.
do {...} while (cond) - postcondition, code is executed at least once.
Now that you know the secret .. use them wisely :)
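A two-line illustration of the difference (cond starts out false, so only the do-while body ever runs):
bool cond = false;
while (cond) Console.WriteLine("while body");        // condition checked first: never prints
do Console.WriteLine("do-while body"); while (cond); // body runs once, then the check happens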
Do-while is for when you want to run the code block at least once. A while loop, on the other hand, won't always run, depending on the criteria specified.
I see that this question has been adequately answered, but would like to add this very specific use case scenario. You might start using do...while more frequently.
do
{
    ...
} while (0)
is often used for multi-line #defines. For example:
#define compute_values   \
    area = pi * r * r;   \
    volume = area * h
This works alright for:
r = 4;
h = 3;
compute_values;
but there is a gotcha for:
if (shape == circle) compute_values;
as this expands to:
if (shape == circle) area = pi * r * r;
volume = area * h;
If you wrap it in a do ... while(0) loop it properly expands to a single block:
if (shape == circle)
do
{
area = pi * r * r;
volume = area * h;
} while (0);
The answers so far summarize the general use for do-while. But the OP asked for an example, so here is one: Get user input. But the user's input may be invalid - so you ask for input, validate it, proceed if it's valid, otherwise repeat.
With do-while, you get the input while the input is not valid. With a regular while-loop, you get the input once, but if it's invalid, you get it again and again until it is valid. It's not hard to see that the former is shorter, more elegant, and simpler to maintain if the body of the loop grows more complex.
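A sketch of that pattern in C# (the range check is just an example validity rule):
using System;

class PromptDemo
{
    static void Main()
    {
        int value;
        bool valid;
        do
        {
            Console.Write("Enter a number between 1 and 10: ");
            valid = int.TryParse(Console.ReadLine(), out value)
                    && value >= 1 && value <= 10;  // example validity rule
        } while (!valid);                          // re-prompt until the input is valid
        Console.WriteLine($"Got {value}.");
    }
}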
I've used it for a reader that reads the same structure multiple times.
using (IDataReader reader = connection.ExecuteReader())
{
    do
    {
        while (reader.Read())
        {
            // Read record
        }
    } while (reader.NextResult());
}
I can't imagine how you've gone this long without using a do...while loop.
There's one on another monitor right now and there are multiple such loops in that program. They're all of the form:
do
{
    GetProspectiveResult();
}
while (!ProspectIsGood());
I like to understand these two as:
while -> 'repeat until',
do ... while -> 'repeat if'.
I've used a do-while when reading a sentinel value at the beginning of a file, but other than that, I don't think it's abnormal that this structure isn't commonly used; do-whiles are really situational.
-- file --
5
Joe
Bob
Jake
Sarah
Sue
-- code --
int MAX = a.readLine();   // sentinel: how many names follow
int count = 0;
do {
    k[count] = a.readLine();
    count++;
} while (count < MAX);
Here's my theory why most people (including me) prefer while(){} loops to do{}while(): A while(){} loop can easily be adapted to perform like a do..while() loop while the opposite is not true. A while loop is in a certain way "more general". Also programmers like easy to grasp patterns. A while loop says right at start what its invariant is and this is a nice thing.
Here's what I mean about the "more general" thing. Take this do..while loop:
do {
    A;
    if (condition) INV = false;
    B;
} while (INV);
Transforming this in to a while loop is straightforward:
INV = true;
while (INV) {
    A;
    if (condition) INV = false;
    B;
}
Now, we take a model while loop:
while (INV) {
    A;
    if (condition) INV = false;
    B;
}
And transform this into a do..while loop, yields this monstrosity:
if (INV) {
    do
    {
        A;
        if (condition) INV = false;
        B;
    } while (INV);
}
Now we have two checks on opposite ends and if the invariant changes you have to update it on two places. In a certain way do..while is like the specialized screwdrivers in the tool box which you never use, because the standard screwdriver does everything you need.
I have been programming for about 12 years, and only 3 months ago did I meet a situation where it was really convenient to use do-while, as one iteration was always necessary before checking the condition. So I guess your big time is ahead :).
It is a quite common structure in a server/consumer:
DOWHILE (no shutdown requested)
    determine timeout
    wait for work(timeout)
    IF (there is work)
        REPEAT
            process
        UNTIL (wait for work(0 timeout) indicates no work)
        do what is supposed to be done at end of busy period.
    ENDIF
ENDDO
the REPEAT UNTIL(cond) being a do {...} while(!cond)
Sometimes the wait for work(0) can be cheaper CPU-wise (even eliminating the timeout calculation might be an improvement with very high arrival rates). Moreover, there are many queuing-theory results that make the number served in a busy period an important statistic. (See, for example, Kleinrock, Vol. 1.)
Similarly:
DOWHILE (no shutdown requested)
    determine timeout
    wait for work(timeout)
    IF (there is work)
        set throttle
        REPEAT
            process
        UNTIL (--throttle < 0 OR wait for work(0 timeout) indicates no work)
    ENDIF
    check for and do other (perhaps polled) work.
ENDDO
where check for and do other work may be exorbitantly expensive to put in the main loop, or where the kernel does not support an efficient waitany(waitcontrol*, n) type operation, or where a prioritized queue might starve the other work, and throttle is used as starvation control.
This type of balancing can seem like a hack, but it can be necessary. Blind use of thread pools would entirely defeat the performance benefits of a caretaker thread with a private queue for a frequently updated, complicated data structure, because a thread pool, unlike a caretaker thread, requires a thread-safe implementation.
I really don't want to get into a debate about the pseudo code (for example, whether shutdown requested should be tested in the UNTIL) or caretaker threads versus thread pools - this is just meant to give a flavor of a particular use case of the control flow structure.
This is my personal opinion, but this question begs for an answer rooted in experience:
I have been programming in C for 38 years, and I never use do / while loops in regular code.
The only compelling use for this construct is in macros where it can wrap multiple statements into a single statement via a do { multiple statements } while (0)
I have seen countless examples of do / while loops with bogus error detection or redundant function calls.
My explanation for this observation is programmers tend to model problems incorrectly when they think in terms of do / while loops. They either miss an important ending condition or they miss the possible failure of the initial condition which they move to the end.
For these reasons, I have come to believe that where there is a do / while loop, there is a bug, and I regularly challenge newbie programmers to show me a do / while loop where I cannot spot a bug nearby.
This type of loop can be easily avoided: use a for (;;) { ... } and add the necessary termination tests where they are appropriate. It is quite common that there need be more than one such test.
Here is a classic example:
/* skip the line */
do {
    c = getc(fp);
} while (c != '\n');
This will fail if the file does not end with a newline. A trivial example of such a file is the empty file.
A better version is this:
int c; // another classic bug is to define c as char.
while ((c = getc(fp)) != EOF && c != '\n')
    continue;
Alternately, this version also hides the c variable:
for (;;) {
    int c = getc(fp);
    if (c == EOF || c == '\n')
        break;
}
Try searching for while (c != '\n'); in any search engine, and you will find bugs such as this one (retrieved June 24, 2017):
In ftp://ftp.dante.de/tex-archive/biblio/tib/src/streams.c , function getword(stream,p,ignore), has a do / while and sure enough at least 2 bugs:
c is defined as a char and
there is a potential infinite loop while (c!='\n') c=getc(stream);
Conclusion: avoid do / while loops and look for bugs when you see one.
while loops check the condition before the loop body; do...while loops check the condition after it. This is useful if you want to base the condition on side effects from the loop running or, as other posters said, if you want the loop to run at least once.
I understand where you're coming from, but the do-while is something that most use rarely, and I've never used myself. You're not doing it wrong.
You're not doing it wrong. That's like saying someone is doing it wrong because they've never used the byte primitive. It's just not that commonly used.
The most common scenario I run into where I use a do/while loop is in a little console program that runs based on some input and will repeat as many times as the user likes. Obviously it makes no sense for a console program to run no times; but beyond the first time it's up to the user -- hence do/while instead of just while.
This allows the user to try out a bunch of different inputs if desired.
do
{
    int input = GetInt("Enter any integer");
    // Do something with input.
}
while (GetBool("Go again?"));
I suspect that software developers use do/while less and less these days, now that practically every program under the sun has a GUI of some sort. It makes more sense with console apps, as there is a need to continually refresh the output to provide instructions or prompt the user with new information. With a GUI, in contrast, the text providing that information to the user can just sit on a form and never need to be repeated programmatically.
I use do-while loops all the time when reading in files. I work with a lot of text files that include comments in the header:
# some comments
# some more comments
column1 column2
1.234 5.678
9.012 3.456
... ...
I'll use a do-while loop to read up to the "column1 column2" line so that I can look for the column of interest. Here's the pseudocode:
do {
    line = read_line();
} while (line[0] == '#');
/* parse line */
Then I'll do a while loop to read through the rest of the file.
Being a geezer programmer, many of my school programming projects used text menu driven interactions. Virtually all used something like the following logic for the main procedure:
do
    display options
    get choice
    perform action appropriate to choice
while choice is something other than exit
Since school days, I have found that I use the while loop more frequently.
One of the applications I have seen it is in Oracle when we look at result sets.
Once you have a result set, you first fetch from it (do), and from that point on check whether the fetch returned an element or not (while element found...). The same might apply to any other "fetch-like" implementation.
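A sketch of that fetch-then-test shape in C# (Cursor and Fetch() here are hypothetical stand-ins for an Oracle-style result set API):
using System;
using System.Collections.Generic;

class FetchDemo
{
    // Hypothetical cursor standing in for an Oracle-style result set.
    class Cursor
    {
        private readonly Queue<string> rows =
            new Queue<string>(new[] { "row 1", "row 2", "row 3" });
        public string Fetch() => rows.Count > 0 ? rows.Dequeue() : null;
    }

    static void Main()
    {
        var cursor = new Cursor();
        string row;
        do
        {
            row = cursor.Fetch();        // the first fetch happens unconditionally
            if (row != null)
                Console.WriteLine(row);  // process the fetched element
        } while (row != null);           // test *after* fetching, as described above
    }
}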
I've used it in a function that returned the next character position in a UTF-8 string:
char *next_utf8_character(const char *txt)
{
    if (!txt || *txt == '\0')
        return (char *)txt;
    do {
        txt++;
    } while ((((unsigned char) *txt) & 0xc0) == 0x80); /* skip UTF-8 continuation bytes (10xxxxxx) */
    return (char *)txt;
}
Note that this function was written from memory and is not tested. The point is that you have to do the first step anyway, and you have to do it before you can evaluate the condition.
Any sort of console input works well with do-while because you prompt the first time, and re-prompt whenever the input validation fails.
Even though there are plenty of answers, here is my take. It all comes down to optimization. I'll show two examples where one is faster than the other.
Case 1: while
string fileName = string.Empty, fullPath = string.Empty;
while (string.IsNullOrEmpty(fileName) || File.Exists(fullPath))
{
    fileName = Guid.NewGuid().ToString() + fileExtension;
    fullPath = Path.Combine(uploadDirectory, fileName);
}
Case 2: do while
string fileName = string.Empty, fullPath = string.Empty;
do
{
    fileName = Guid.NewGuid().ToString() + fileExtension;
    fullPath = Path.Combine(uploadDirectory, fileName);
}
while (File.Exists(fullPath));
These two will do exactly the same thing. But there is one fundamental difference: the while version requires an extra condition just to enter the loop. Which is ugly, because let's say every possible value the Guid class can produce has already been taken except for one. This means I'll have to loop around 5,316,911,983,139,663,491,615,228,241,121,400,000 times.
Every time I reach the end of my while statement, I need to do the string.IsNullOrEmpty(fileName) check. So this takes a tiny fraction of CPU work. But multiply that very small task by the number of possible combinations the Guid class has, and we are talking about hours, days, or months of extra time.
Of course this is an extreme example, because you probably wouldn't see this in production. But if we think about the YouTube algorithm, it is entirely possible that they encounter ID generation where some IDs have already been taken. So it comes down to big projects and optimization.
Even in educational references you would barely find a do...while example. Only recently, after reading Ethan Brown's beautiful book Learning JavaScript, did I encounter one well-defined do...while example. That being said, I believe it is OK if you don't find an application for this structure in your routine job.
It's true that do/while loops are pretty rare. I think this is because a great many loops are of the form
while (something needs doing)
    do it;
In general, this is an excellent pattern, and it has the usually-desirable property that if nothing needs doing, the loop runs zero times.
But once in a while, there's some fine reason why you definitely want to make at least one trip through the loop, no matter what. My favorite example is: converting an integer to its decimal representation as a string, that is, implementing printf("%d"), or the semistandard itoa() function.
To illustrate, here is a reasonably straightforward implementation of itoa(). It's not quite the "traditional" formulation; I'll explain it in more detail below if anyone's curious. But the key point is that it embodies the canonical algorithm, repeatedly dividing by 10 to pick off digits from the right, and it's written using an ordinary while loop... and this means it has a bug.
#include <stddef.h>

char *itoa(unsigned int n, char buf[], int bufsize)
{
    if (bufsize < 2) return NULL;
    char *p = &buf[bufsize];
    *--p = '\0';
    while (n > 0) {
        if (p == buf) return NULL;
        *--p = n % 10 + '0';
        n /= 10;
    }
    return p;
}
If you didn't spot it, the bug is that this code returns nothing — an empty string — if you ask it to convert the integer 0. So this is an example of a case where, when there's "nothing" to do, we don't want the code to do nothing — we always want it to produce at least one digit. So we always want it to make at least one trip through the loop. So a do/while loop is just the ticket:
do {
    if (p == buf) return NULL;
    *--p = n % 10 + '0';
    n /= 10;
} while (n > 0);
So now we have a loop that usually stops when n reaches 0, but if n is initially 0 — if you pass in a 0 — it returns the string "0", as desired.
As promised, here's a bit more information about the itoa function in this example. You pass it arguments which are: an int to convert (actually, an unsigned int, so that we don't have to worry about negative numbers); a buffer to render into; and the size of that buffer. It returns a char * pointing into your buffer, pointing at the beginning of the rendered string. (Or it returns NULL if it discovers that the buffer you gave it wasn't big enough.) The "nontraditional" aspect of this implementation is that it fills in the array from right to left, meaning that it doesn't have to reverse the string at the end — and also meaning that the pointer it returns to you is usually not to the beginning of the buffer. So you have to use the pointer it returns to you as the string to use; you can't call it and then assume that the buffer you handed it is the string you can use.
Finally, for completeness, here is a little test program to test this version of itoa with.
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int n;
    if (argc > 1)
        n = atoi(argv[1]);
    else {
        printf("enter a number: "); fflush(stdout);
        if (scanf("%d", &n) != 1) return EXIT_FAILURE;
    }
    if (n < 0) {
        fprintf(stderr, "sorry, can't do negative numbers yet\n");
        return EXIT_FAILURE;
    }
    char buf[20];
    printf("converted: %s\n", itoa(n, buf, sizeof(buf)));
    return EXIT_SUCCESS;
}
I ran across this while researching the proper loop to use for a situation I have. I believe this will fully satisfy a common situation where a do...while loop is a better implementation than a while loop (C#, since you stated that is your primary language for work).
I am generating a list of strings based on the results of an SQL query. The object returned by my query is a SqlDataReader. This object has a function called Read(), which advances the object to the next row of data and returns true if there was another row. It will return false if there is not another row.
Using this information, I want to return each row to a list, then stop when there is no more data to return. A do...while loop works best in this situation, as it ensures that adding an item to the list happens BEFORE checking whether there is another row. The reason this must be done BEFORE checking the while (condition) is that when it checks, it also advances. Using a while loop in this situation would cause it to bypass the first row due to the nature of that particular function.
In short:
This won't work in my situation.
// This will skip the first row because Read() returns true after advancing.
while (_read.NextResult())
{
    list.Add(_read.GetValue(0).ToString());
}
return list;
This will.
//This will make sure the currently read row is added before advancing.
do
{
list.Add(_read.GetValue(0).ToString());
}
while (_read.NextResult());
return list;

Performance - Python vs. C#/C++/C reading char-by-char

So I have these giant XML files (and by giant, I mean like 1.5GB+) and they don't have CRLFs. I'm trying to run a diff-like program to find the differences between these files.
Since I've yet to find a diff program that won't explode due to memory exhaustion, I've decided the best bet is to add CRLFs after closing tags.
I wrote a Python script to read char-by-char and add new lines after '>'. The problem is that I'm running this on a single-core PC circa 1995 or something ridiculous, and it's only processing about 20 MB/hour when I have both files converting at the same time.
Any idea if writing this in C#/C/C++ instead will yield any benefits? If not, does anyone know of a diff program that will go byte-by-byte? Thanks.
EDIT:
Here's the code for my processing function...
import codecs

def read_and_format(inputfile, outputfile):
    ''' Open input and output files, then read char-by-char and add new lines after ">" '''
    infile = codecs.open(inputfile, "r", "utf-8")
    outfile = codecs.open(outputfile, "w", "utf-8")
    char = infile.read(1)
    while(1):
        if char == "":
            break
        else:
            outfile.write(char)
            if(char == ">"):
                outfile.write("\n")
        char = infile.read(1)
    infile.close()
    outfile.close()
EDIT2:
Thanks for the awesome responses. Increasing the read size created an unbelievable speed increase. Problem solved.
Reading and writing a single character at a time is almost always going to be slow, because disks are block-based devices, rather than character-based devices - it will read a lot more than just the one byte you're after, and the surplus parts need to be discarded.
Try reading and writing more at a time, say, 8192 bytes (8KB) and then finding and adding newlines in that string before writing it out - you should save a lot in performance because a lot less I/O is required.
As LBushkin points out, your I/O library may be doing buffering, but unless there is some form of documentation that shows this does indeed happen (for reading AND writing), it's a fairly easy thing to try before rewriting in a different language.
Why don't you just use sed?
cat giant.xml | sed 's/>/>\x0d\x0a/g' > giant-with-linebreaks.xml
Rather than reading byte by byte, which incurs a disk access for each byte read, try reading ~20 MB at a time and doing your search + replace on that :)
You can probably do this in Notepad....
Billy3
For the type of problem you describe, I suspect the algorithm you employ for comparing the data will have a much more significant effect than the I/O model or language. In fact, string allocation and search may be more expensive here than anything else.
Some general suggestions before you write this yourself:
Try running on a faster machine if you have one available. That will make a huge difference.
Look for an existing tool online for doing XML diffs ... don't write one yourself.
If you are going to write this in C# (or Java or C/C++), I would do the following:
Read a fairly large block into memory all at once (let's say between 200k and 1M)
Allocate an empty block that's twice that size (this assumes a worst case of every character is a '>')
Copy from the input block to the output block conditionally appending a CRLF after each '>' character.
Write the new block out to disk.
Repeat until all the data has been processed.
Additionally, you could write such a program to run on multiple threads, so that while one thread performs CRLF insertions in memory, a separate thread reads blocks in from disk. This type of parallelization is complicated... so I would only do it if you really need maximum performance.
Here's a really simple C# program to get you started, if you need it. It accepts an input file path and an output path on the command line, and performs the substitution you are looking for ('>' ==> CRLF). This sample leaves much to be improved (parallel processing, streaming, some validation, etc)... but it should be a decent start.
using System;
using System.IO;

namespace ExpandBrackets
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length == 2)
            {
                using (StreamReader input = new StreamReader(args[0]))
                using (StreamWriter output = new StreamWriter(args[1]))
                {
                    int readSize = 0;
                    int blockSize = 100000;
                    char[] inBuffer = new char[blockSize];
                    char[] outBuffer = new char[blockSize * 3];
                    while ((readSize = input.ReadBlock(inBuffer, 0, blockSize)) > 0)
                    {
                        int writeSize = TransformBlock(inBuffer, outBuffer, readSize);
                        output.Write(outBuffer, 0, writeSize);
                    }
                }
            }
            else
            {
                Console.WriteLine("Usage: repchar {inputfile} {outputfile}");
            }
        }

        private static int TransformBlock(char[] inBuffer, char[] outBuffer, int size)
        {
            int j = 0;
            for (int i = 0; i < size; i++)
            {
                outBuffer[j++] = inBuffer[i];
                if (inBuffer[i] == '>') // append CR LF
                {
                    outBuffer[j++] = '\r';
                    outBuffer[j++] = '\n';
                }
            }
            return j;
        }
    }
}
All of the languages mentioned typically, at some point, revert to the C runtime library for byte by byte file access. Writing this in C will probably be the fastest option.
However, I doubt it will provide a huge speed boost. Python is fairly speedy, if you're doing things correctly.
The main way to really get a big speed improvement would be to introduce threading. If you read the data in from the file in a large block in one thread, and had a separate thread that did your newline processing + diff processing, you could dramatically improve the speed of this algorithm. This would probably be easier to implement in C++, C#, or IronPython than in C or CPython directly, since they provide very easy, high-level synchronization tools for handling the threading issues (especially when using .NET).
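For illustration, here is a rough C# sketch of that two-thread split (buffer and queue sizes are arbitrary), using a BlockingCollection to hand blocks from a reader task to the processing thread:
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

class PipelineDemo
{
    static void Main(string[] args)
    {
        // Bounded queue: the reader blocks if processing falls behind.
        var blocks = new BlockingCollection<char[]>(boundedCapacity: 4);

        // Producer: read large blocks from the input file.
        var reader = Task.Run(() =>
        {
            using (var input = new StreamReader(args[0]))
            {
                var buffer = new char[1 << 20]; // 1M chars per block
                int n;
                while ((n = input.ReadBlock(buffer, 0, buffer.Length)) > 0)
                {
                    var block = new char[n];
                    Array.Copy(buffer, block, n);
                    blocks.Add(block);
                }
            }
            blocks.CompleteAdding();
        });

        // Consumer: insert a newline after each '>' and write out.
        using (var output = new StreamWriter(args[1]))
        {
            foreach (var block in blocks.GetConsumingEnumerable())
                foreach (var c in block)
                {
                    output.Write(c);
                    if (c == '>') output.Write('\n');
                }
        }
        reader.Wait();
    }
}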
you could try xmldiff - http://msdn.microsoft.com/en-us/library/aa302294.aspx
I haven't used it for such huge data but I think it would be reasonably optimized
I put this as a comment on another answer, but in case you miss it--you might want to look at The Shootout. It's a highly optimized set of code for various problems in many languages.
According to those results, Python tends to be about 50x slower than c (but it is faster than the other interpreted languages). In comparison Java is about 2x slower than c. If you went to one of the faster compiled languages, I don't see why you wouldn't see a similar increase.
By the way, the figures attained from the shootout are wonderfully un-assailable, you can't really challenge them, instead if you don't believe the numbers are fair because the code to solve a problem in your favorite language isn't optimized properly, then you can submit better code yourself. The act of many people doing this means most of the code on there is pretty damn optimized for every popular language. If you show them a more optimized compiler or interpreter, they may include the results from it as well.
Oh: except C#, which is only represented by Mono, so if Microsoft's compiler is more optimized, it's not shown. All the tests seem to run on Linux machines. My guess is Microsoft's C# should run at about the same speed as Java, but the shootout lists Mono as a bit slower (about 3x as slow as C).
As others said, if you do it in C it will be pretty much unbeatable, because C buffers I/O and getc() is inlined (if memory serves).
Your real performance issue will be in the diff.
Maybe there's a pretty good one out there, but for files that size I doubt it. For fun, I'm a do-it-yourselfer. The strategy I would use is to have a rolling window in each file, several megabytes long. The search strategy for mismatches is diagonal search, which is: if you are at lines i and j, compare in this sequence:
line(i+0) == line(j+0)
line(i+0) == line(j+1)
line(i+1) == line(j+0)
line(i+0) == line(j+2)
line(i+1) == line(j+1)
line(i+2) == line(j+0)
and so on. No doubt there's a better way, but if I'm going to code it myself and manage the rolling windows, that's what I'd try.

Semicolons in C#

Why are semicolons necessary at the end of each line in C#?
Why can't the compiler just know where each line ends?
The statement terminator character lets you break a statement across multiple lines.
On the other hand, languages like VB have a line-continuation character (and may raise a compile error for a semicolon). I personally think it's much cleaner to terminate statements with a semicolon than to continue lines with underscores.
Finally, languages like JavaScript (JS) and Swift have optional semicolons, but at least JS has a convention to always put semicolons (even where not required), which prevents accidents.
No, the compiler doesn't know that a line break is for statement termination, nor should it. It allows you to carry a statement across multiple lines if you like.
See:
string sql = @"SELECT foo
    FROM bar
    WHERE baz=42";
Or how about large method overloads:
CallMyMethod(thisIsSomethingForArgument1,
             thisIsSomethingForArgument2,
             thisIsSomethingForArgument3,
             thisIsSomethingForArgument4,
             thisIsSomethingForArgument5,
             thisIsSomethingForArgument6,
             thisIsSomethingForArgument7);
And the reverse, the semi-colon also allows multi-statement lines:
string s = ""; int i = 0;
How many statements is this?
for (int i = 0; i < 100; i++) // <--- should there be a semi-colon here?
Console.WriteLine("foo")
Semicolons are needed to eliminate ambiguity.
So that whitespace isn't significant except inside identifiers and keywords and such.
I personally agree with having a distinct character as a statement terminator. It makes it much easier for the compiler to figure out what you are trying to do.
And contrary to popular belief, it is not possible 100% of the time for the compiler to figure out where one statement ends and another begins without assistance! There are edge cases where it is ambiguous whether it is a single statement or multiple statements spanning several lines.
Read this article by Paul Vick, the technical lead of Visual Basic, to see why it's not as easy as it sounds.
Strictly speaking, this is true: if a human could figure out where a statement ends, so can the compiler. This hasn't really caught on yet, and few languages implement anything of that kind. The next version of VB will probably be the first language to implement a proper handling of statements that require neither explicit termination nor line continuation [source]. This would allow code like this:
Dim a = OneVeryLongExpression +
        AnotherLongExpression
Dim b = 2 * a
Let's keep our fingers crossed.
On the other hand, this does make parsing much harder and can potentially result in poor error messages (see Haskell).
That said, the reason for C# to use a C-like syntax was probably due to marketing reasons more than anything else: people are already familiar with languages like C, C++ and Java. No need to introduce yet another syntax. This makes sense for a variety of reasons but it obviously inherits a lot of weaknesses from these languages.
It can be done. What you refer to is called "semicolon insertion". JavaScript does it with much success; the reason it is not applied in C# is up to its designers. Maybe they did not know about it, or feared it might cause confusion among programmers.
For more details on semicolon insertion in JavaScript, please refer to the ECMAScript standard, ECMA-262, where JavaScript is specified.
I quote from page 22 (in the PDF, page 34):
When, as the program is parsed from left to right, the end of the input stream of tokens is encountered and the parser is unable to parse the input token stream as a single complete ECMAScript Program, then a semicolon is automatically inserted at the end of the input stream.
When, as the program is parsed from left to right, a token is encountered that is allowed by some production of the grammar, but the production is a restricted production and the token would be the first token for a terminal or nonterminal immediately following the annotation "[no LineTerminator here]" within the restricted production (and therefore such a token is called a restricted token), and the restricted token is separated from the previous token by at least one LineTerminator, then a semicolon is automatically inserted before the restricted token.
However, there is an additional overriding condition on the preceding rules: a semicolon is never inserted automatically if the semicolon would then be parsed as an empty statement or if that semicolon would become one of the two semicolons in the header of a for statement (section 12.6.3).
[...]
The specification document even contains examples!
Another good reason for semicolons is to isolate syntax errors. When syntax errors occur the semicolons allow the compiler to get back on track so that something like
a = b + c = d
can be disambiguated between
a = b + c; = d
with the error in the second statement or
a = b + ; c = d
with the error in the first statement. Without the semicolons, it can be impossible to say where a statement ends in the presence of a syntax error. A missing parenthesis might mean that the entire latter half of your program may be considered one giant syntax error rather than being syntax checked line by line.
It also helps the other way - if you meant to write
a = b; c = d;
but typoed and left out the "c" then without semis it would look like
a = b = d
which is valid and you'd have a running program with a bad and difficult to locate bug so semicolons can often help catch errors that otherwise would look like valid syntax. Also, I agree with everybody on readability. I don't like working in languages without some sort of statement terminator for that reason.
I've been mulling this question a bit and if I may take a guess at the motivations of the language designers:
C# obviously has semicolons because of its heritage from C. I've been rereading the K&R book lately and it's pretty obvious that Dennis Ritchie really didn't want to force programmers to code the way he thought was best. The book is rife with comments like, "Although we are not dogmatic about the matter, it does seem that goto statements should be used rarely, if at all" and in the section on functions they mention that they chose one of many format styles, it doesn't matter which one you pick, just be consistent.
So the use of an explicit statement terminator allows the programmer to format their code however they like. For better or worse, it seems consistent with how C was originally designed: do it your way.
I would say that the biggest reason that semicolons are necessary after each statement is familiarity for programmers already familiar with C, C++, and/or Java. C# inherits many syntactical choices from those languages and is not simply named similarly to them. Semicolon-terminated statements is just one of the many syntax choices borrowed from those languages.
Semi-colons are a remnant from the C language, when programmers often wanted to save space by combining statements on one line. i.e.
int i; for( i = 0; i < 10; i++ ) printf("hello world.\n"); printf("%d instance.\n", i);
It also helped the compiler, which was not smart enough to simply infer the end of a statement. In almost all cases, combining statements on one line is not looked upon favorably by most C# developers, for readability reasons. The above is typically written like so:
int i;
for( i = 0; i < 10; i++ )
{
    printf("hello world.\n");
    printf("%d instance.\n", i);
}
Very verbose! For modern languages, compilers can easily be developed to infer the end of statements. C# could be altered into another language that uses no unnecessary delimiters other than a space and an indenting tab, i.e.
int i
for i=0 i<10 i++
    printf "hello world.\n"
    printf "%d instance.\n" i
That would certainly save some typing, and it looks neater. If indents are used rather than spaces, the code becomes much more readable. We can do one better if we allow types to be inferred and make a special case of for, to read (for [value]=[initial value] to [final value]):
for i=1 to 10 // i is inferred to be an integer
    printf "hello world.\n"
    printf "%d instance.\n" i
Now it's beginning to look like F#, and F#, in some ways, is almost like C# without the unnecessary punctuation. However, F# lacks so many extras (like special .NET language constructs, code completion, and good IntelliSense). So, in the end, F# can be more work than C# or VB.NET to implement, sadly.
Personally, my work required VB.NET, and I have been happier not having to deal with semicolons. C# is a dated language. LINQ has allowed me to cut down on the number of lines of code I have to write. Still, if I had the time, I would write a version of C# that had many of the features of F#.
You could accurately argue that requiring a semicolon to terminate a statement is superfluous. It is technically possible to remove the semicolon from the C# language and still have it work. The problem is that this leaves room for misinterpretation by humans. I would argue that the necessity of semicolons is disambiguation for the sake of humans, not the compiler. Without some form of statement delimitation, it is much harder for humans to interpret concise statements such as this:
int i = someFlag ? 12 : 5 int j = i + 3
The compiler should be able to handle this just fine, but to a human the below looks much better
int i = someFlag ? 12 : 5; int j = i + 3;

working with incredibly large numbers in .NET

I'm trying to work through the problems on projecteuler.net but I keep running into a couple of problems.
The first is a question of storing large quantities of elements in a List<T>. I keep getting OutOfMemoryExceptions when storing large quantities in the list.
Now I admit I might not be doing these things in the best way, but is there some way of defining how much memory the app can consume?
It usually crashes when I get to about 100,000,000 elements :S
Secondly, some of the questions require the addition of massive numbers. I use the ulong data type where I think the number is going to get super big, but I still manage to wrap past the largest supported value and get into negative numbers.
Do you have any tips for working with incredibly large numbers?
Consider System.Numerics.BigInteger.
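For instance, a minimal sketch (BigInteger ships with .NET 4.0; add a reference to the System.Numerics assembly):
using System;
using System.Numerics;

class BigDemo
{
    static void Main()
    {
        BigInteger factorial = 1;
        for (int i = 2; i <= 100; i++)
            factorial *= i;           // never overflows: BigInteger grows as needed
        Console.WriteLine(factorial); // prints 100!, all 158 digits of it
    }
}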
You need to use a large-number class that uses some basic math principles to split these operations up. This implementation of a C# BigInteger library on CodeProject seems to be the most promising. The article has some good explanations of how operations with massive numbers work, as well.
Also see:
Big integers in C#
As far as Project Euler goes, you might be barking up the wrong tree if you are hitting OutOfMemory exceptions. From their website:
Each problem has been designed according to a "one-minute rule", which means that although it may take several hours to design a successful algorithm with more difficult problems, an efficient implementation will allow a solution to be obtained on a modestly powered computer in less than one minute.
As user Jakers said, if you're using Big Numbers, you're probably doing it wrong.
Of the Project Euler problems I've done, none have required big-number math so far.
It's more about finding the proper algorithm to avoid big numbers.
Want hints? Post here, and we might get an interesting Euler thread started.
I assume this is C#? F# has built in ways of handling both these problems (BigInt type and lazy sequences).
You can use both F# techniques from C#, if you like. The BigInt type is reasonably usable from other languages if you add a reference to the core F# assembly.
Lazy sequences are basically just syntax-friendly enumerators. Putting 100,000,000 elements in a list isn't a great plan, so you should rethink your solutions to get around that. If you don't need to keep information around, throw it away! If it's cheaper to recompute it than store it, throw it away!
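The closest C# analogue of a lazy sequence is an iterator. A sketch of computing values on demand instead of materializing 100,000,000 of them in a List<T>:
using System;
using System.Collections.Generic;
using System.Linq;

class LazyDemo
{
    // Infinite lazy sequence: nothing is stored, each value is computed on demand.
    static IEnumerable<long> Squares()
    {
        for (long i = 1; ; i++)
            yield return i * i;
    }

    static void Main()
    {
        // Consumes only what it needs; no giant list is ever built in memory.
        long sum = Squares().Take(1000).Sum();
        Console.WriteLine(sum);
    }
}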
See the answers in this thread. You probably need to use one of the third-party big integer libraries/classes available or wait for C# 4.0 which will include a native BigInteger datatype.
As far as defining how much memory an app will use, you can check the available memory before performing an operation by using the MemoryFailPoint class.
This allows you to preallocate memory before doing the operation, so you can check if an operation will fail before running it.
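A sketch of that pattern (the 512 MB figure is only an example, not a recommendation):
using System;
using System.Runtime;

class MemoryCheckDemo
{
    static void Main()
    {
        try
        {
            // Ask the CLR up front whether ~512 MB is likely to be available.
            using (new MemoryFailPoint(512))
            {
                // ... run the memory-hungry computation here ...
            }
        }
        catch (InsufficientMemoryException)
        {
            Console.WriteLine("Not enough memory; try a smaller problem size.");
        }
    }
}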
string Add(string s1, string s2)
{
    bool carry = false;
    string result = string.Empty;

    if (s1.Length < s2.Length)
        s1 = s1.PadLeft(s2.Length, '0');
    if (s2.Length < s1.Length)
        s2 = s2.PadLeft(s1.Length, '0');

    for (int i = s1.Length - 1; i >= 0; i--)
    {
        var augend = Convert.ToInt64(s1.Substring(i, 1));
        var addend = Convert.ToInt64(s2.Substring(i, 1));
        var sum = augend + addend;
        sum += (carry ? 1 : 0);
        carry = false;
        if (sum > 9)
        {
            carry = true;
            sum -= 10;
        }
        result = sum.ToString() + result;
    }
    if (carry)
    {
        result = "1" + result;
    }
    return result;
}
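For example, assuming the Add method above:
Console.WriteLine(Add("99999999999999999999", "1")); // prints 100000000000000000000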
I am not sure if it is a good way of handling it, but I use the following in my project.
I have a "double theRelevantNumber" variable and an "int powerOfTen" for each item, and in my relevant class I have an "int relevantDecimals" variable.
So... when large numbers are encountered, they are handled like this:
First they are changed to x.yyy form. So if the number 123456.789 was input and the powerOfTen was 10, it would start like this:
theRelevantNumber = 123456.789
powerOfTen = 10
The number is then: 123456.789*10^10
It is then changed to:
1.23456789*10^15
It is then rounded to the number of relevant decimals (for example 5), giving 1.23456, and saved along with powerOfTen = 15.
When adding or subtracting numbers, any digit outside the relevant decimals is ignored. Meaning if you take:
1*10^15 + 1*10^10, it will change to 1.00001 if relevantDecimals is 5, but will not change at all if relevantDecimals is 4.
This method lets you deal with numbers up to doubleLimit*10^intLimit without any problem, and at least for OOP it is not that hard to keep track of.
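A rough sketch of how that scheme might look in code (the names and the rounding policy are my guesses, not the poster's actual implementation):
using System;

// Rough sketch: a mantissa plus a power of ten, normalized so the
// mantissa has a single digit before the decimal point.
struct ScaledNumber
{
    public double Mantissa;   // e.g. 1.23456
    public int PowerOfTen;    // e.g. 15, meaning 1.23456 * 10^15

    public static ScaledNumber Normalize(double value, int powerOfTen, int relevantDecimals)
    {
        while (Math.Abs(value) >= 10) { value /= 10; powerOfTen++; }
        double scale = Math.Pow(10, relevantDecimals);
        return new ScaledNumber
        {
            Mantissa = Math.Round(value * scale) / scale, // keep only the relevant decimals
            PowerOfTen = powerOfTen
        };
    }
}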
You don't need to use BigInteger. You can do this even with a string array of numbers.
class Solution
{
    static void Main(String[] args)
    {
        int n = 5;
        string[] unsorted = new string[6] { "3141592653589793238", "1", "3", "5737362592653589793238", "3", "5" };
        string[] result = SortStrings(n, unsorted);
        foreach (string s in result)
            Console.WriteLine(s);
        Console.ReadLine();
    }

    static string[] SortStrings(int size, string[] arr)
    {
        Array.Sort(arr, (left, right) =>
        {
            if (left.Length != right.Length)
                return left.Length - right.Length;
            return left.CompareTo(right);
        });
        return arr;
    }
}
If you want to work with incredibly large numbers, look here...
MIKI Calculator
I am not a professional programmer; I write for myself, sometimes, so sorry for the unprofessional use of C#, but the program works. I will be grateful for any advice and corrections.
I use this calculator to generate 32-character passwords from numbers that are around 58 digits long.
Since the program adds numbers in string format, you can perform calculations on numbers with the maximum length of a string variable. The program uses long lists for the calculation, so it is possible to calculate on even larger numbers, possibly 18x the maximum capacity of the list.
