I am not able to google for "[/", as Google blocks out symbols.
It appeared in this context:
Console.WriteLine("Usage: findduplicatefiles [/sub] DirectoryName [DirectoryName]...");
Thanks :)
It doesn't mean anything in C#. All that Console.WriteLine() call does is write this string:
"Usage: findduplicatefiles [/sub] DirectoryName [DirectoryName]..."
into the console as output.
However, on the Windows command line, / conventionally introduces a switch (an option), and in usage text [] marks an argument as optional. The usage prompt is telling the user that /sub is an optional switch for the findduplicatefiles program.
Examples:
Run findduplicatefiles.exe against the current directory:
C:\>findduplicatefiles .
Run findduplicatefiles.exe against the current directory with the sub argument:
C:\>findduplicatefiles /sub .
Run findduplicatefiles.exe against two directories, C:\abc and C:\def, with the sub argument:
C:\>findduplicatefiles /sub abc def
"[/" in a string doesn't mean anything to C#. It just writes those characters out. But I think you are confused about the meaning of what is being written out.
There is a convention when documenting command-line programs where putting an argument in square brackets means the argument is optional. Thus, the string your program is writing out indicates that the command findduplicatefiles may optionally have an argument /sub, after which it must have at least one directory name, and may optionally have other directory names.
In this case it does not mean anything to C#. It's just a character in the string, like the rest of the string.
It means you can optionally put /sub before DirectoryName. It is not a C# question.
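To make the convention concrete, here is a minimal sketch of how a program could interpret arguments that follow that usage string. The class and method names (UsageArgs, TryParse) are made up for illustration; this is not the real findduplicatefiles code, and it accepts /sub in any position rather than only first.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical parser for "findduplicatefiles [/sub] DirectoryName [DirectoryName]...".
// All names here are assumptions made for illustration.
public static class UsageArgs
{
    public const string Usage =
        "Usage: findduplicatefiles [/sub] DirectoryName [DirectoryName]...";

    // Returns false when no directory was given, i.e. when the usage text should be printed.
    public static bool TryParse(string[] args, out bool includeSubdirectories,
                                out List<string> directories)
    {
        includeSubdirectories = false;
        directories = new List<string>();

        foreach (string arg in args)
        {
            // "/sub" is the optional switch; anything else is treated as a directory name.
            if (string.Equals(arg, "/sub", StringComparison.OrdinalIgnoreCase))
                includeSubdirectories = true;
            else
                directories.Add(arg);
        }

        // The usage string requires at least one DirectoryName.
        return directories.Count > 0;
    }
}
```

A Main method would call UsageArgs.TryParse(args, ...) and print UsageArgs.Usage with Console.WriteLine when it returns false.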
Related
I'm matching words to create a simple lexical analyzer.
Here is my example code and output.
example code:
public class
{
public static void main (String args[])
{
System.out.println("Hello");
}
}
output:
public = identifier
void = identifier
main = identifier
class = identifier
As you can see, my output is not in the same order as the input. void and main come before class in the input, but in the output class comes at the end. I want to print the results in the order in which the input is matched.
C# code:
private void button1_Click(object sender, EventArgs e)
{
if (richTextBox1.Text.Contains("public"))
richTextBox2.AppendText("public = identifier\n");
if (richTextBox1.Text.Contains("void"))
richTextBox2.AppendText("void = identifier\n");
if (richTextBox1.Text.Contains("class"))
richTextBox2.AppendText("class = identifier\n");
if (richTextBox1.Text.Contains("main"))
richTextBox2.AppendText("main = identifier\n");
}
Your code is asking the following questions:
Does the input contain the text "public"? If so, write down "public = identifier".
Does the input contain the text "void"? If so, write down "void = identifier".
Does the input contain the text "class"? If so, write down "class = identifier".
Does the input contain the text "main"? If so, write down "main = identifier".
The answer to all of these questions is yes, and since they're executed in that exact order, the output you get should not be surprising. Note: public, void and class are keywords, not identifiers; only main is an identifier.
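If all you want is the same four words reported in input order, one option is to scan the text once from left to right instead of asking four separate Contains questions. This is only a sketch: the class name OrderedMatcher and the output format are assumptions, and it is still substring matching rather than a real lexer.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: reports the four words in the order they occur in the input.
public static class OrderedMatcher
{
    public static List<string> Describe(string input)
    {
        var results = new List<string>();

        // Regex.Matches walks the input left to right, so results follow input order.
        // \b keeps "main" from matching inside a longer word such as "remainder".
        foreach (Match m in Regex.Matches(input, @"\b(public|void|class|main)\b"))
        {
            string kind = m.Value == "main" ? "identifier" : "keyword";
            results.Add(m.Value + " = " + kind);
        }

        return results;
    }
}
```

In the button handler you would loop over OrderedMatcher.Describe(richTextBox1.Text) and append each line to richTextBox2.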
Splitting on whitespace?
So your approach is not going to help you tokenize that input. Something slightly more in the right direction would be input.Split(), which cuts up the input at whitespace boundaries and gives you an array of strings. Still, there would be a lot of empty entries in there.
input.Split(new char[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries) is a little better, giving us the following output: public, class, {, public, static, void, main, (String, args[]), {, System.out.println("Hello");, } and }.
But you'll notice that some of these strings contain multiple 'tokens': (String, args[]) and System.out.println("Hello");. And if you had a string literal with whitespace in it, it would get split into multiple tokens. Clearly, just splitting on whitespace is not sufficient.
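A quick way to see both problems is to wrap the Split call in a small helper and look at what comes back. The wrapper class name WhitespaceSplit is an assumption; the Split overload is the one quoted above.

```csharp
using System;

// Hypothetical wrapper around the whitespace split discussed above.
public static class WhitespaceSplit
{
    public static string[] Split(string input)
    {
        // Cut at spaces, tabs and line breaks, dropping the empty entries
        // produced by runs of consecutive whitespace.
        return input.Split(new[] { ' ', '\t', '\r', '\n' },
                           StringSplitOptions.RemoveEmptyEntries);
    }
}
```

For the input main (String args[]) this yields main, (String and args[]), so punctuation stays glued to its neighbours. For a made-up line such as System.out.println("Hello World"); it yields two pieces, cutting the string literal in half.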
Tokenizing
At this point, you would start writing a loop that goes over every character in the input, checking if it's whitespace or a punctuation character (such as (, ), {, }, [, ], ., ;, and so on). Those characters should be treated as the end of the previous token, and punctuation characters should also be treated as a token of their own. Whitespace can be skipped.
You'll also have to take things like string literals and comments into account: anything in-between two double-quotes should not be tokenized, but be treated as part of a single 'string' token (including whitespace). Also, strings can contain escape sequences, such as \", that produce a single character (that double quote should not be treated as the end of the string, but as part of its content).
Anything that comes after two forward slashes should be ignored (or parsed as a single 'comment' token, if you want to process comments somehow), until the next newline (newline characters/sequences differ across operating systems). Anything after a /* should be ignored until you encounter a */ sequence.
Numbers can optionally start with a minus sign, can contain a dot (or start with a dot), a scientific notation part (e..), which can also be negative, and there are type suffixes...
In other words, you're writing a state machine, with different behaviour depending on what state you're in: 'string', 'comment', 'block comment', 'numeric literal', and so on.
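As a rough illustration of such a state machine, here is a deliberately small tokenizer sketch. It handles words, single-character punctuation, double-quoted strings with backslash escapes, // comments and /* */ comments, and nothing else: numeric literals such as 1.5 and multi-character operators are not treated specially. The names SimpleTokenizer, Tokenize and Flush are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical minimal tokenizer; a sketch of the approach, not a complete lexer.
public static class SimpleTokenizer
{
    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        int i = 0;

        while (i < input.Length)
        {
            char c = input[i];

            if (c == '"')
            {
                // State 'string': read up to the closing quote, skipping escaped characters.
                Flush(current, tokens);
                int start = i;
                i++;
                while (i < input.Length && input[i] != '"')
                {
                    if (input[i] == '\\' && i + 1 < input.Length) i++;
                    i++;
                }
                i = Math.Min(i + 1, input.Length); // include the closing quote if present
                tokens.Add(input.Substring(start, i - start));
            }
            else if (c == '/' && i + 1 < input.Length && input[i + 1] == '/')
            {
                // State 'comment': ignore everything up to the end of the line.
                Flush(current, tokens);
                while (i < input.Length && input[i] != '\n') i++;
            }
            else if (c == '/' && i + 1 < input.Length && input[i + 1] == '*')
            {
                // State 'block comment': ignore everything up to the closing */.
                Flush(current, tokens);
                int end = input.IndexOf("*/", i + 2, StringComparison.Ordinal);
                i = end < 0 ? input.Length : end + 2;
            }
            else if (char.IsWhiteSpace(c))
            {
                // Whitespace ends the previous token and is skipped.
                Flush(current, tokens);
                i++;
            }
            else if (char.IsLetterOrDigit(c) || c == '_')
            {
                current.Append(c);
                i++;
            }
            else
            {
                // Punctuation ends the previous token and is a token of its own.
                Flush(current, tokens);
                tokens.Add(c.ToString());
                i++;
            }
        }

        Flush(current, tokens);
        return tokens;
    }

    private static void Flush(StringBuilder current, List<string> tokens)
    {
        if (current.Length == 0) return;
        tokens.Add(current.ToString());
        current.Clear();
    }
}
```

With this, (String args[]) becomes the separate tokens (, String, args, [, ] and ), and a quoted string stays in one piece even if it contains spaces.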
Lexing
It's useful to assign a type to each token, either while tokenizing or as a separate step (lexing). public is a keyword, main is an identifier, 1234 is an integer literal, "Hello" is a string literal, and so on. This will help during the next step.
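A sketch of that classification step, done as a separate pass over already-split tokens. The TokenKind enum, the TokenClassifier class and the short keyword list are all assumptions made for illustration; a real lexer would use the full keyword list from the language specification.

```csharp
using System.Collections.Generic;

// Hypothetical token categories, matching the examples in the text above.
public enum TokenKind { Keyword, Identifier, IntegerLiteral, StringLiteral, Punctuation }

public static class TokenClassifier
{
    // Deliberately incomplete keyword list, for illustration only.
    private static readonly HashSet<string> Keywords = new HashSet<string>
    {
        "public", "private", "protected", "static", "void", "class", "while"
    };

    public static TokenKind Classify(string token)
    {
        if (Keywords.Contains(token)) return TokenKind.Keyword;

        // A token wrapped in double quotes is treated as a string literal.
        if (token.Length >= 2 && token[0] == '"' && token[token.Length - 1] == '"')
            return TokenKind.StringLiteral;

        if (token.Length > 0 && IsAllDigits(token)) return TokenKind.IntegerLiteral;

        if (token.Length > 0 && (char.IsLetter(token[0]) || token[0] == '_'))
            return TokenKind.Identifier;

        return TokenKind.Punctuation;
    }

    private static bool IsAllDigits(string token)
    {
        foreach (char c in token)
            if (c < '0' || c > '9') return false;
        return true;
    }
}
```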
Parsing
You can now move on to parsing: turning a list of tokens into an abstract syntax tree (AST). At this point you can check if a list of tokens is actually valid code. You basically repeat the above step, but at a higher level.
For example, public, protected and private are keyword tokens, and they're all access modifiers. As soon as you encounter one of these, you know that either a class, a function, a field or a property definition must follow. If the next token is a while keyword, then you signal an error: public while is not a valid C# construct. If, however, the next token is a class keyword, then you know it's a class definition and you continue parsing.
So you've got a state machine once again, but this time you've got states like 'class definition', 'function definition', 'expression', 'binary expression', 'unary expression', 'statement', 'assignment statement', and so on.
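The public while example can be sketched as a single rule over a token list. This only shows the validity check for that one rule; it does not build an AST. The class name DeclarationCheck and the small keyword sets are assumptions.

```csharp
using System.Collections.Generic;

// Hypothetical single-rule check: an access modifier must not be followed by a statement keyword.
public static class DeclarationCheck
{
    private static readonly HashSet<string> AccessModifiers = new HashSet<string>
    {
        "public", "protected", "private"
    };

    // Small, assumed sample of keywords that can never start a declaration.
    private static readonly HashSet<string> StatementKeywords = new HashSet<string>
    {
        "while", "if", "for", "return"
    };

    // Returns an error message for the first violation, or null if none is found.
    public static string FindError(IList<string> tokens)
    {
        for (int i = 0; i < tokens.Count; i++)
        {
            if (!AccessModifiers.Contains(tokens[i])) continue;

            if (i + 1 >= tokens.Count)
                return "'" + tokens[i] + "' must be followed by a declaration";

            if (StatementKeywords.Contains(tokens[i + 1]))
                return "'" + tokens[i] + " " + tokens[i + 1] + "' is not a valid construct";
        }

        return null;
    }
}
```

A real parser would have one such routine per grammar rule and would return tree nodes instead of just reporting errors.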
Conclusion
This is by no means complete, but hopefully it'll give you a better idea of all the steps involved and how to approach this. There are also tools available that can generate parsing code from a grammar specification, which can ease the job somewhat (though you still need to learn how to write such grammars).
You may also want to read the C# language specification, specifically the part about its grammar and lexical structure. The spec can be downloaded for free from one of Microsoft's websites.
CodeCaster is right. You are not on the right path.
I have a lexical analyzer that I wrote some time ago as a project.
I know I'm not supposed to hand things over on a plate here, but the analyzer is for C++, so you'll have to change a few things.
Take a look at the source code and please try to understand how it works at least: C++ Lexical Analyzer
In the strictest sense, the reason for the described behaviour is that in the evaluating code, the search for void comes before the search for class. However, the approach in total seems far too simple for a lexical analysis, as it simply checks for substrings. I totally second the comments above; depending on what you are trying to achieve in the big picture, a more sophisticated approach might be necessary.
I asked a similar question recently about using regex to retrieve a URL or folder path from a string. I was looking at this comment by Dour High Arch, where he says:
"I recommend you do not use regexes at all; use separate code paths
for URLs, using the Uri class, and file paths, using the FileInfo
class. These classes already handle parsing, matching, extracting
components, and so on."
I never really tried this, but now I am looking into it and can't figure out if what he said actually is useful to what I'm trying to accomplish.
I want to be able to parse a string message that could be something like:
"I placed the files on the server at http://www.thewebsite.com/NewStuff, they can also
be reached on your local network drives at J:\Downloads\NewStuff"
And extract out the two strings http://www.thewebsite.com/ and J:\Downloads\NewStuff. I don't see any methods on the Uri or FileInfo class that parse a Uri or FileInfo object from a string like I think Dour High Arch was implying.
Is there something I'm missing about using the Uri or FileInfo class that will allow this behavior? If not is there some other class in the framework that does this?
I'd say the easiest way is splitting the strings into parts first.
The first delimiter would be spaces, to get each word; the second would be quotes (double and single).
Then use Uri.IsWellFormedUriString on each token.
So something like:
foreach (var part in someRandomText.Split(new char[] { '\'', '"', ' ' }))
{
if(Uri.IsWellFormedUriString(part, UriKind.RelativeOrAbsolute))
doSomethingWith(part);
}
I just saw in the documentation for Uri.IsWellFormedUriString that it may be a bit too strict for your needs.
It returns false if www.Whatever.com is missing the http://
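A variation on the same idea is to use Uri.TryCreate on each word and keep only http and https results. This is a sketch under assumptions: the helper name UriFinder is made up, trailing punctuation is trimmed with a fixed character list, and it deliberately ignores file paths.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: pulls absolute http/https URLs out of free text, word by word.
public static class UriFinder
{
    public static List<Uri> ExtractHttpUris(string text)
    {
        var found = new List<Uri>();
        var separators = new[] { ' ', '\'', '"', '\t', '\r', '\n' };

        foreach (string part in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            // Drop punctuation that commonly trails a URL inside a sentence.
            string candidate = part.TrimEnd(',', '.', ';');

            Uri uri;
            if (Uri.TryCreate(candidate, UriKind.Absolute, out uri) &&
                (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps))
            {
                found.Add(uri);
            }
        }

        return found;
    }
}
```

Like IsWellFormedUriString, this will not pick up www.Whatever.com without a scheme.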
It was not clear from your earlier question that you wanted to extract URL and file path substrings from larger strings. In that case, neither Uri.IsWellFormedUriString nor Regex.Match will do what you want. Indeed, I do not think any simple method can do what you want because you will have to define rules for ambiguous strings like httX://wasThatAUriScheme/andAre/these part/of/aURL or/are they/separate.strings?andIsThis%20a%20Param?
My suggestion is to define a recursive descent parser and create states for each substring you need to distinguish.
You can use:
(?<type>[^ ]+?:)(?<path>//[^ ]*|\\.+\\[^ ]*)
that will give you 2 groups on each result
type : "http:"
path : //www.thewebsite.com/NewStuff
and
type : "J:"
path : \Downloads\NewStuff
out of the string
"I placed the files on the server at
http://www.thewebsite.com/NewStuff, they can also be reached on your
local network drives at J:\Downloads\NewStuff"
You can use the "type" group to see whether the type is http: or not, and act on that.
EDIT
or use regex below if you are sure there is no whitespace in your filepath :
(?<type>[^ ]+?:)(?<path>//[^ ]*|\\[^ ]*)
Try \w+:\S+ and see how well that fits your purposes.
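Here is a small sketch of how that pattern behaves on the message from the question. The class name PathAndUrlRegex is an assumption, and the TrimEnd call is my addition: without it the first match keeps the comma that follows the URL in the sentence.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper around the \w+:\S+ pattern suggested above.
public static class PathAndUrlRegex
{
    public static List<string> Extract(string text)
    {
        var results = new List<string>();

        // \w+: matches a scheme or drive letter followed by a colon;
        // \S+ then takes everything up to the next whitespace.
        foreach (Match m in Regex.Matches(text, @"\w+:\S+"))
        {
            // Strip sentence punctuation that \S+ swallowed at the end of the match.
            results.Add(m.Value.TrimEnd(',', '.', ';'));
        }

        return results;
    }
}
```

Note that this stops at the first space, so a file path containing spaces would be cut short.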
I am trying to pass two directory paths via the command line to a C# application. These paths will likely contain spaces, and given that C# populates args[] by splitting across spaces, this is giving me problems.
What I tried was passing the paths wrapped in quotes, like this:
myprogram.exe "C:\aa a\bbb\" "C:\ppp\ll l\"
This, however, creates a problem because the backslash at the end of each path is being interpreted as an escape character, so the double quote after it is treated as part of the argument. When I run the app with these arguments, args[] contains only one entry:
C:\aa a\bbb" C:\ppp\ll l"
The easy solution would be to only enter directory paths without the final backslash, but it is not optimal and will likely frustrate users of the program.
Is there an easy solution to this?
Did you try doubling the backslashes?
myprogram.exe "C:\\aa a\\bbb\\" "C:\\ppp\\ll l\\"
I'm working on a function whose string output is modified according to some settings, such as line spacing. To test such scenarios, I'm using string literals for the expected result, as shown below.
The method, using a string builder, (AppendLine) generates the said output. One issue I have run into is that of comparing such strings. In the example below, both are equal in terms of what they represent. The result is the area which I care about, however when comparing two strings, one literal, one not, equality naturally fails. This is because one of the strings emits line spacing, while the other only demonstrates the formatting it contains.
What would be the best way of solving this equality problem? I do care about formatting such as new lines from the result of the method, this is crucially important.
Code:
string expected = @"Test\n\n\nEnd Test.";
string result = "Test\n\n\nEnd Test";
Console.WriteLine(expected);
Console.WriteLine(result);
Output:
Test\n\n\nEnd Test.
Test


End Test
The @ prefix tells the compiler to take the string exactly as it is written. So, it doesn't convert the \n escape sequences into line feed characters.
Since you don't have the same prefix for the string assigned to your result variable, the compiler processes its escape sequences. If you would like to continue to use the @ prefix, just do the following:
string expected = @"Test


End Test";
You'll have to enter the line breaks directly inside the string, as actual (invisible) characters.
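The difference is easy to check by comparing lengths. In this sketch (the class name LiteralKinds is an assumption) the regular literal contains three line feed characters, while the verbatim (@-prefixed) literal contains three backslash-plus-n pairs.

```csharp
// Hypothetical constants showing how the same source text produces two different strings.
public static class LiteralKinds
{
    // Regular literal: each \n becomes a single line feed character.
    public const string Regular = "Test\n\n\nEnd Test";

    // Verbatim literal: each \n stays as two characters, a backslash and an 'n'.
    public const string Verbatim = @"Test\n\n\nEnd Test";
}
```

LiteralKinds.Regular.Length is 15 and LiteralKinds.Verbatim.Length is 18, so the two can never compare equal.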
You're using the term "literal" incorrectly. "Literal" simply means an actual value that exists in code. In other words, values exist in code either as variables (for the sake of simplicity I'm including constants in this group) or as literals. Variables are an abstract notion of a value, whereas literals are a value.
All this is to say that both of your strings are string literals, as they're hard-coded into your application. The @ prefix simply tells the compiler to keep backslashes (indeed, anything other than a double-quote) in the string as written, rather than evaluating the escape sequences when compiling the string literal into the assembly.
First of all, whatever your function returns (either a string that contains standard escape sequences for newlines rather than newlines themselves, or a string that actually contains newlines) is what your test variable should contain. Make your tests as close to the actual output as possible, as the more work you do to massage the values into a comparable form the more code paths you have to test. If you're looking to be able to compare a string with formatting escape sequences embedded into it to a string where those sequences have been evaluated (essentially comparing the two strings in your example), then I would say this:
1. Be sure that this is really what you want to do.
2. You'll have to duplicate the functionality of the C# compiler in interpreting these values and turning your "format string" into a "formatted string".
For doing #2, a RegEx processor is probably going to be the simplest option. See this page for a list of C# string escape sequences.
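One ready-made option for that second point is Regex.Unescape, sketched below. Be aware that it follows regular-expression escape rules, which overlap with C#'s for common sequences like \n, \r and \t but are not identical, so treat this as an approximation. The wrapper name EscapeExpander is an assumption.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper: turns a string containing escape sequences into the evaluated string.
public static class EscapeExpander
{
    public static string Expand(string withEscapes)
    {
        // Regex.Unescape converts sequences such as \n and \t into the characters they stand for.
        return Regex.Unescape(withEscapes);
    }
}
```

With this, EscapeExpander.Expand(@"Test\n\n\nEnd Test") compares equal to "Test\n\n\nEnd Test".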
I feel somewhat enlightened, yet annoyed at what I discovered.
This is my first project using MSTest, and after a failing test I was selecting View Test Details to see how and why my test failed. The formatting for string output in this details display is very poor, for example you get:
Assert.AreEqual failed. Expected:<TestTest End>. Actual:<TestTest End>.
This is for formatted text. The strange thing is that if you have \r (carriage returns) instead of \n (line feeds), the formatting is actually somewhat correct.
It turns out to view the correct output you need to run the tests in debug mode. In other words, when you have a failing test, run the test in debug and the exception will be caught and displayed as follows:
Assert.AreEqual failed. Expected:<Test
Test End>. Actual:<Test
Test End>.
The above obviously containing the correct formatting.
In the end, it turns out my initial method of storing the expectations (with formatting) in strings was correct, but my unfamiliarity with MSTest made me question it, because the test details view displayed the strings without their formatting.
Use a regex to strip white space before you do your compare?
I'm getting the following error message when running the code below from a C# console program:
"The system cannot find the file specified"
Here is the code:
System.Diagnostics.Process.Start("C:\Windows\System32\cmd.exe /c");
Strangely, when I omit the /c switch the command runs!?!
Any ideas what I'm doing wrong?
Process.Start takes a filename as an argument. Pass the argument as the second parameter:
System.Diagnostics.Process.Start(@"C:\Windows\System32\cmd.exe", "/c");
Well, for one thing, you're hard-coding a path, which is already destined to break on somebody's system (not every Windows install is in C:\Windows).
But the problem here is that those backslashes are being used as an escape character. There are two ways to write a path string like this - either escape the backslashes:
Process.Start("C:\\Windows\\System32\\cmd.exe", "/c");
Or use the @ prefix to disable backslash escaping:
Process.Start(@"C:\Windows\System32\cmd.exe", "/c");
You also need to pass /c as an argument, not as part of the path - use the second overload of Process.Start as shown above.
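To convince yourself that the two spellings produce the same path, and that the file name and the arguments travel separately, here is a small sketch that builds a ProcessStartInfo without launching anything. The class name CmdLaunch is an assumption, and the hard-coded path is kept only to mirror the question.

```csharp
using System.Diagnostics;

// Hypothetical helper: builds the start info for cmd.exe without starting a process.
public static class CmdLaunch
{
    // Escaped backslashes and a verbatim string yield the same characters.
    public const string Escaped = "C:\\Windows\\System32\\cmd.exe";
    public const string Verbatim = @"C:\Windows\System32\cmd.exe";

    public static ProcessStartInfo Build(string arguments)
    {
        // The file name and the arguments are two separate values.
        return new ProcessStartInfo(Verbatim, arguments);
    }
}
```

Passing the result of CmdLaunch.Build("/c") to Process.Start would then launch it.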
There is an overload of Start that takes arguments. Use that one instead.
System.Diagnostics.Process.Start(@"C:\Windows\System32\cmd.exe", "/c");
I can see three problems with code you posted:
1) You aren't escaping your path string correctly
2) You need to pass the /c argument separately from the path you want to execute
3) You are assuming every machine this code runs on has a c:\windows installation
I'd propose writing it as follows:
string cmdPath = System.IO.Path.Combine(Environment.SystemDirectory,"cmd.exe");
System.Diagnostics.Process.Start(cmdPath, "/c");
You need to add @ before the path, like this: @"C:\Windows\System32\cmd.exe /c"
I believe that the issue is you're attempting to pass an Argument (/c) as a part of the path.
The arguments and the file name are two distinct items in the Process class.
Try
System.Diagnostics.Process.Start(@"C:\Windows\System32\cmd.exe", "/c");
http://msdn.microsoft.com/en-us/library/h6ak8zt5.aspx
Easiest way is to add the program to the solution with ADD EXISTING ITEM and type
System::Diagnostics::Process::Start("ccsetup305.exe");