I'm trying to write a program in C# that takes a table of strings (variable names) from a database and searches a directory of ~30,000 Fortran 77 source files to determine where each variable is calculated. A variable is typically calculated only once, in one of the Fortran files, but used many times in other files. The variables in the database table are all explicitly defined somewhere in the Fortran files. So far I've accomplished most of this by first building a list of the files each variable appears in, and then searching the files in that list line by line. I've been checking which side of the "=" sign the variable appears on by doing something like this:
CompareInfo ci = CultureInfo.CurrentCulture.CompareInfo;
for (int k = 0; k < fullpaths.Count; k++)
{
string line;
// Read the file and display it line by line.
System.IO.StreamReader FortranFile = new System.IO.StreamReader(fullpaths[k]);
while ((line = FortranFile.ReadLine()) != null)
{
// Search the file line-by-line for the variable
if (ci.IndexOf(line, Variable, CompareOptions.IgnoreCase) > 0)
{
// Search for the equals sign
int equalLocation = ci.IndexOf(line, "=");
if (equalLocation > 0)
{
// substring LHS
string subLineLHS = line.Substring(0, equalLocation+1);
// is the line commented out?
if (Convert.ToString(subLineLHS[0]) == "C" ||
Convert.ToString(subLineLHS[0]) == "!" ||
Convert.ToString(subLineLHS[0]) == "c" ||
Convert.ToString(subLineLHS[0]) == "*")
{
continue;
}
// ignore if the line contains a DO, IF, or WHILE loop,
// to prevent reading IF [Variable] = xxxx as being calculated.
else if ( (ci.IndexOf(subLineLHS, "IF", CompareOptions.IgnoreCase) > 0) ||
(ci.IndexOf(subLineLHS, "DO", CompareOptions.IgnoreCase) > 0) ||
(ci.IndexOf(subLineLHS, "WHILE", CompareOptions.IgnoreCase) > 0))
{
continue;
}
// find where the variable is used in the line
else if (ci.IndexOf(subLineLHS, Variable, CompareOptions.IgnoreCase) > 0 )
{
isCalculated[k] = true;
calculatedLine[k] = counter;
}
}
} //if loop
counter++;
} //while loop
FortranFile.Close();
}
The problem I'm having is with IF statements, e.g.:
IF(something == xx .AND.
1 variable == xx) THEN
...
This method would tell me that the variable is calculated on that line ("variable = xx"). One-line IF statements such as IF(something) variable=xx are also ignored. Lines with multiple = signs may give me problems too.
Any suggestions on how I could get around this? Is there a better method of doing this? Please go easy on me - I'm not a programmer.
Thanks!
The most error-proof approach would be to parse the Fortran code and work from the syntax tree.
My suggestion: use ctags. See for instance Exuberant ctags; it has support for Fortran.
ctags generates an index of all named entities in a set of source code files. The index is stored in a data structure (tags) that can be read from most file editors/IDEs.
If you import that tags file in your favourite text editor, you will be able to jump to the definition of a variable when you position your cursor on it and take proper action.
The tags file is also very easy to read and parse: it is structured like this.
named_entity<Tab>file_where_it_is_defined<Tab>location_in_the_file
For instance, from a set of Fortran files (this is on Linux, but Exuberant ctags offers Windows binaries):
gpar remlf90.f90 /^ xrank,npar,gpar,/;" v program:REMLF90
hashia1 ../libs/sparse2.f /^ subroutine hashia1(/;" s
hashv1 ../libs/sparse3.f /^ integer function hashv1(/;" f
hashvr_old ../libs/sparse2.f /^ integer function hashvr_old(/;" f
We can observe that the gpar variable is defined in remlf90.f90, hashia1 is defined in ../libs/sparse2.f, etc.
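Since the question is about C#, reading that tags file could look roughly like the sketch below. It assumes the tab-separated layout shown above (name, file, search pattern, then a `;"` marker followed by a kind letter such as `v` for a variable). The `TagEntry` and `TagsFile` names are made up for illustration. Keep in mind that ctags points at where a name is declared or defined, which may not be the line where it is assigned a value.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical holder for one line of a ctags "tags" file.
public sealed class TagEntry
{
    public string Name;
    public string File;
    public string Address; // search pattern (or line number) as written by ctags
    public string Kind;    // e.g. "v", "s", "f"; empty if the line carries no kind
}

public static class TagsFile
{
    // Parses one line; returns null for header lines ("!_TAG_...") and malformed lines.
    public static TagEntry ParseLine(string line)
    {
        if (string.IsNullOrEmpty(line) || line.StartsWith("!_TAG_", StringComparison.Ordinal))
            return null;

        // Split on the first two tabs only: the search pattern may itself contain tabs.
        string[] parts = line.Split(new[] { '\t' }, 3);
        if (parts.Length < 3)
            return null;

        string address = parts[2];
        string kind = "";

        // Extended-format lines end the address with ;" followed by tab-separated fields.
        int marker = parts[2].IndexOf(";\"\t", StringComparison.Ordinal);
        if (marker >= 0)
        {
            address = parts[2].Substring(0, marker);
            kind = parts[2].Substring(marker + 3).Split('\t')[0];
            if (kind.StartsWith("kind:", StringComparison.Ordinal))
                kind = kind.Substring(5);
        }

        return new TagEntry { Name = parts[0], File = parts[1], Address = address, Kind = kind };
    }

    // Collects the entries of kind "v" whose name matches (Fortran is case-insensitive).
    public static List<TagEntry> FindVariable(IEnumerable<string> lines, string variable)
    {
        var hits = new List<TagEntry>();
        foreach (string line in lines)
        {
            TagEntry entry = ParseLine(line);
            if (entry != null && entry.Kind == "v" &&
                string.Equals(entry.Name, variable, StringComparison.OrdinalIgnoreCase))
            {
                hits.Add(entry);
            }
        }
        return hits;
    }
}
```

Something like `TagsFile.FindVariable(File.ReadLines("tags"), "gpar")` would then give the candidate files to inspect, which narrows the line-by-line search considerably.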
I'm trying to determine if an error message on a generated text file contains the word "multiple", "Database", or both. For each text file I have I'm evaluating which word(s) it contains and for now am just using a messaging box to see how it evaluates. From what I can tell the first if statement is the only one that returns true, even though both text .txt files I have in the folder each only have one of the key words. Code below.
I've searched to see if I have the Exclamation point in the wrong position in the else if statements, but from what I've found it looks right.
string[] files = Directory.GetFiles(@"C:\temp\test", "*.txt");
var multi = "multiple";
var data = "Database";
for (int i = 0; i < files.Length; i++)
{
var sheet = File.ReadAllText(files[i]);
if (multi.Any(sheet.Contains) && data.Any(sheet.Contains))
{
MessageBox.Show("Both");
}
else if (multi.Any(sheet.Contains) && !data.Any(sheet.Contains))
{
MessageBox.Show("Just Multiple");
}else if(!multi.Any(sheet.Contains) && data.Any(sheet.Contains))
{
MessageBox.Show("Just Database");
}
So the first file only has the word "multiple" in it. The first if statement should return false since both conditions aren't met since the first any method returned true while the second shouldn't. But from what I can tell both are returning true.
Let's break down your problem:
multi is a string.
LINQ extension methods (of which .Any() is one) operate on IEnumerable<>.
the enumerable for string is IEnumerable<char>, so .Any() will operate on this.
The string Contains() method will accept a string or a char.
So what's happening? You are checking if any of the characters in multi (i.e. m, u, l, t, i, p, l, e) are found in the string sheet.
What you actually want to write is simply if (sheet.Contains(multi)), etc.
Fixing your current code, it should look like this:
if (sheet.Contains(multi) && sheet.Contains(data))
{
MessageBox.Show("Both");
}
else if (sheet.Contains(multi) && !sheet.Contains(data))
{
MessageBox.Show("Just Multiple");
}
else if(!sheet.Contains(multi) && sheet.Contains(data))
{
MessageBox.Show("Just Database");
}
Though I'd probably avoid doing sheet.Contains over and over (especially on bigger files), and instead do those calculations first:
bool containsMulti = sheet.Contains(multi);
bool containsData = sheet.Contains(data);
if (containsMulti && containsData)
{
MessageBox.Show("Both");
}
else if (containsMulti && !containsData)
{
MessageBox.Show("Just Multiple");
}
else if (!containsMulti && containsData)
{
MessageBox.Show("Just Database");
}
And as @Kristianne Nerona notes, you could simply change the last else if to an else (assuming every file contains at least one of the two words), since if the prior two conditions were both false, only one possibility remains.
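To make the difference concrete, here is a small sketch; the `KeywordCheck` class and its method names are invented for illustration. `AnyCharFound` mimics what `multi.Any(sheet.Contains)` effectively tests, one character at a time, while `Classify` does the whole-word check.

```csharp
using System;
using System.Linq;

public static class KeywordCheck
{
    // Roughly what keyword.Any(sheet.Contains) asks:
    // is ANY single character of keyword present somewhere in sheet?
    public static bool AnyCharFound(string sheet, string keyword)
    {
        return keyword.Any(c => sheet.IndexOf(c) >= 0);
    }

    // Whole-word check, computing each Contains only once.
    public static string Classify(string sheet)
    {
        bool containsMulti = sheet.Contains("multiple");
        bool containsData = sheet.Contains("Database");

        if (containsMulti && containsData) return "Both";
        if (containsMulti) return "Just Multiple";
        if (containsData) return "Just Database";
        return "Neither"; // kept explicit in case a file has neither word
    }
}
```

For a file containing only "Error: multiple rows returned", `AnyCharFound(sheet, "Database")` is already true because the letter `t` occurs in it, which is why the first branch fired; `Classify` reports "Just Multiple" for the same text.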
I have a (hopefully) simple C# question.
I am parsing arguments in a program where a file will be read from command line, I've allowed for both short and long arguments as input (so for my scenario /f and file are both valid)
The value after either of the above arguments should be the file name to be read.
What I want to do is find this file name in the array based off whichever argument is chosen and copy it in to a string while not leaving any loopholes.
I have functioning code, but I'm not really sure it's "efficient" (and secure).
Code (comments and writes removed):
if ( args.Contains("/f") || args.Contains("file"))
{
int pos = Array.IndexOf(args, "/f");
if (pos == -1)
pos = Array.IndexOf(args, "file");
if (pos > -1)
pos++;
inputFile = (args[pos]);
if (File.Exists(inputFile) == false)
{
Environment.Exit(0);
}
}
Is there a more efficient way to do this, perhaps using some nifty logic in the initial if statement to check which parameter is valid and then do a single check on that parameter?
Using 4 ifs and 2 Array.IndexOf's seems horrible just to support 2 different ways to allow someone to say they want to input a file...
Thanks! And I'm sorry if this seems trivial or is not what SO is meant for. I just don't have any real way to get feedback on my coding practises unfortunately.
Your solution won't scale well. Imagine you have two different arguments with a short and long form. How many conditionals and index checks would that be?
You'd be better off using an existing tool (e.g. Command Line Parser Library) for argument parsing.
One problem I see with the code you provided is that it will fail if /f or file is the last argument.
If you don't want to write or use complete argument parsing code, the following code will work slightly better.
var fileArguments = new string[] { "/f", "file" };
int fileArgIndex = Array.FindIndex(args,
arg => fileArguments.Contains(arg.ToLowerInvariant()));
if (fileArgIndex != -1 && fileArgIndex < args.Length - 1)
{
inputFile = args[fileArgIndex + 1];
if (!File.Exists(inputFile))
{
Environment.Exit(0);
}
}
You could write a simple argument parser for your specific need and still have support for "new" scenarios. For example, in your entry method have
// The main entry point for the application.
[STAThread]
static void Main(string[] args)
{
// Parse input args
var parser = new InputArgumentsParser();
parser.Parse(args);
....
}
Where your InputArgumentsParser could be something similar to
public class InputArgumentsParser
{
private const char ArgSeparator = ':';
private Dictionary<string[],Action<string>> ArgAction =
new Dictionary<string[],Action<string>>();
public InputArgumentsParser()
{
// Supported actions to take, based on args
ArgAction.Add(new[] { "/f", "/file" }, (param) =>
Console.WriteLine(@"Received file argument '{0}'", param));
}
/// Parse collection, expected format is "<key>:<value>"
public void Parse(ICollection<string> args)
{
if (args == null || !args.Any())
return;
// Iterate over arguments, extract key/value pairs
foreach (string arg in args)
{
string[] parts = arg.Split(ArgSeparator);
if (parts.Length != 2)
continue;
// Find related action and execute if found
var action = ArgAction.Keys.Where(key =>
key.Contains(parts[0].ToLowerInvariant()))
.Select(key => ArgAction[key]).SingleOrDefault();
if (action != null)
action.Invoke(parts[1]);
else
Console.WriteLine(@"No action for argument '{0}'", arg);
}
}
}
In this case /f:myfile.txt or /file:myfile.txt would spit out to console
Received file argument 'myfile.txt'
Please note, the 'C#' tag was included intentionally, because I could accept C# syntax for my answer here, as I have the option of doing this both client-side and server-side. Read the 'Things You May Want To Know' section below. Also, the 'regex' tag was included because there is a strong possibility that the use of regular expressions is the best approach to this problem.
I have the following highlight Plug-In found here:
http://johannburkard.de/blog/programming/javascript/highlight-javascript-text-higlighting-jquery-plugin.html
And here is the code in that plug-in:
/*
highlight v4
Highlights arbitrary terms.
<http://johannburkard.de/blog/programming/javascript/highlight-javascript-text-higlighting-jquery-plugin.html>
MIT license.
Johann Burkard
<http://johannburkard.de>
<mailto:jb@eaio.com>
*/
jQuery.fn.highlight = function(pat) {
function innerHighlight(node, pat) {
var skip = 0;
if (node.nodeType == 3) {
var pos = node.data.toUpperCase().indexOf(pat);
if (pos >= 0) {
var spannode = document.createElement('span');
spannode.className = 'highlight';
var middlebit = node.splitText(pos);
var endbit = middlebit.splitText(pat.length);
var middleclone = middlebit.cloneNode(true);
spannode.appendChild(middleclone);
middlebit.parentNode.replaceChild(spannode, middlebit);
skip = 1;
}
}
else if (node.nodeType == 1 && node.childNodes && !/(script|style)/i.test(node.tagName)) {
for (var i = 0; i < node.childNodes.length; ++i) {
i += innerHighlight(node.childNodes[i], pat);
}
}
return skip;
}
return this.length && pat && pat.length ? this.each(function() {
innerHighlight(this, pat.toUpperCase());
}) : this;
};
jQuery.fn.removeHighlight = function() {
return this.find("span.highlight").each(function() {
this.parentNode.firstChild.nodeName;
with (this.parentNode) {
replaceChild(this.firstChild, this);
normalize();
}
}).end();
};
This plug-in works pretty easily.
If I wanted to highlight all instances of the word "Farm" within the following element...(cont.)
<div id="myDiv">Farmers farm at Farmer's Market</div>
...(cont.) all I would need to do is use:
$("#myDiv").highlight("farm");
And then it would highlight the first four characters in "Farmers" and "Farmer's", as well as the entire word "farm" within the div#myDiv
No problem there, but I would like it to use this:
$("#myDiv").highlight("Farmers");
And have it highlight both "Farmers" AND "Farmer's". The problem is, of course, that I don't know the value of the search term (The term "Farmers" in this example) at runtime. So I would need to detect all possibilities of no more than one apostrophe at each index of the string. For instance, if I called $("#myDiv").highlight("Farmers"); like in my code example above, I would also need to highlight each instance of the original string, plus:
'Farmers
F'armers
Fa'rmers
Far'mers
Farm'ers
Farme'rs
Farmer's
Farmers'
Instances where two or more apostrophes are found side by side, like "Fa''rmers", should, of course, not be highlighted.
I suppose it would be nice if I could include (to be highlighted) words like "Fa'rmer's", but I won't push my luck, and I would be doing well just to get matches like those found in my bulleted list above, where only one apostrophe appears in the string, at all.
I thought about regex, but I don't know the syntax that well, not to mention that I don't think I could do anything with a true/false return value.
Is there any way to accomplish what I need here?
Things You May Want To Know:
The highlight plug-in takes care of all the case insensitive requirements I need, so no need to worry about that, at all.
Syntax provided in JavaScript, jQuery, or even C# is acceptable, considering the hidden input fields I use the values from, client-side, are populated, server-side, with my C# code.
The C# code that populates the hidden input fields uses Razor (i.e., I am in a C#.Net Web-Pages w/ WebMatrix environment). This code is very simple, however, and looks like this:
for (var n = 0; n < searchTermsArray.Length; n++)
{
<input class="highlightTerm" type="hidden" value="@searchTermsArray[n]" />
}
I'm copying this answer from your earlier question.
I think after reading the comments on the other answers, I've figured out what it is you're going for. You don't need a single regex that can do this for any possible input, you already have input, and you need to build a regex that matches it and its variations. What you need to do is this. To be clear, since you misinterpreted in your question, the following syntax is actually in JavaScript.
var re = new RegExp("'?" + "farmers".split("").join("'?") + "'?", "i")
What this does is take your input string, "farmers" and split it into a list of the individual characters.
"farmers".split("") == [ 'f', 'a', 'r', 'm', 'e', 'r', 's' ]
It then stitches the characters back together again with "'?" between them. In a regular expression, this means that the ' character will be optional. I add the same particle to the beginning and end of the expression to match at the beginning and end of the string as well.
This will create a regex that matches in the way you're describing, provided it's OK that it also matches the original string.
In this case, the above line builds this regex:
/'?f'?a'?r'?m'?e'?r'?s'?/
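Since C# was also acceptable, the same construction could be sketched server-side as follows. The `ApostropheRegex` class name is hypothetical, and escaping each character with `Regex.Escape` is an addition of mine that the JavaScript one-liner above does not do; it matters only if a search term contains regex metacharacters.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ApostropheRegex
{
    // Builds a case-insensitive regex that allows at most one apostrophe
    // before, between, and after the characters of the search term.
    public static Regex Build(string term)
    {
        if (string.IsNullOrEmpty(term))
            throw new ArgumentException("Search term must not be empty.", "term");

        // Escape every character so that e.g. '.' or '(' is matched literally.
        var escaped = term.Select(c => Regex.Escape(c.ToString()));

        // "farmers" becomes '?f'?a'?r'?m'?e'?r'?s'?
        string pattern = "'?" + string.Join("'?", escaped) + "'?";
        return new Regex(pattern, RegexOptions.IgnoreCase);
    }
}
```

With `var re = ApostropheRegex.Build("farmers");`, both "Farmers" and "Farmer's" match, while "Fa''rmers" does not, because each `'?` accepts at most one apostrophe.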
EDIT
After looking at this a bit, and the function you're using, I think your best bet will be to modify the highlight function to use a regex instead of a straight string replacement. I don't think it'll even be that hard to deal with. Here's a completely untested stab at it.
function innerHighlight(node, pat) {
var skip = 0;
if (node.nodeType == 3) {
var matchResult = pat.exec(node.data); // exec the regex instead of toUpperCase-ing the string
var pos = matchResult !== null ? matchResult.index : -1; // index is the location of where the matching text is found
if (pos >= 0) {
var spannode = document.createElement('span');
spannode.className = 'highlight';
var middlebit = node.splitText(pos);
var endbit = middlebit.splitText(matchResult[0].length); // matchResult[0] is the full text matched by the regex.
var middleclone = middlebit.cloneNode(true);
spannode.appendChild(middleclone);
middlebit.parentNode.replaceChild(spannode, middlebit);
skip = 1;
}
}
else if (node.nodeType == 1 && node.childNodes && !/(script|style)/i.test(node.tagName)) {
for (var i = 0; i < node.childNodes.length; ++i) {
i += innerHighlight(node.childNodes[i], pat);
}
}
return skip;
}
What I'm attempting to do here is keep the existing logic, but use the Regex that I built to do the finding and splitting of the string. Note that I'm not doing the toUpper call anymore, but that I've made the regex case insensitive instead. As noted, I didn't test this at all, but it seems like it should be pretty close to a working solution. Enough to get you started anyway.
Note that this won't get you your hidden fields. I'm not sure what you need those for, but this will (if it's right) take care of highlighting the string.
Taking Hilgreth's answer made the code better; however, the actual problem was a variable's data type. Changing the type from string to bool to allow for files with no data fixed the issue.
I can't seem to find an answer to this here or on other forums; I'm just wondering if there is a way to ignore files of length 0.
My program searches through a directory and returns all the files. I then want to search through the directory and find the most recent file; if a file's length is 0 I want to skip to the next file, but the program keeps crashing. My code so far looks like this:
if(fileinfo.Length > 0)
{
GetLatestWritenFileFileInDirectory(directoryInfo, keywordEH, keywordINTER, keywordM&M);
}
else if(result.Length == 0)
{
}
It's very rough at the moment, and I'm not looking for it to be written for me (obviously); I just want to know if I can skip the empty files in some way without using LINQ, as I'm using framework 1.0.
thanks
Use fileinfo.Length instead of result.Length in the else if statement.
You should do:
if(File.ReadAllText(yourfileName).Length == 0)
continue;
And this should be all.
I think you can drop the else if statement, because FileInfo.Length can't be less than 0.
if(fileInfo.Length > 0)
{
////Do your stuff with non empty files
}
This will be enough unless you want to handle the files with length == 0, which seems you don't.
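Putting the length check together with the "most recent file" search, a LINQ-free loop might look like the sketch below. `LatestFile.FindLatestNonEmpty` is a hypothetical helper, not the `GetLatestWritenFileFileInDirectory` method from the question, and it only looks at files directly inside the given directory. It uses nothing newer than plain arrays and `FileInfo` properties, though I have not verified it against framework 1.0 specifically.

```csharp
using System;
using System.IO;

public static class LatestFile
{
    // Returns the most recently written non-empty file directly in dir,
    // or null when every file is empty (or there are no files at all).
    public static FileInfo FindLatestNonEmpty(DirectoryInfo dir)
    {
        FileInfo latest = null;
        FileInfo[] files = dir.GetFiles();

        for (int i = 0; i < files.Length; i++)
        {
            // Skip empty files instead of processing them.
            if (files[i].Length == 0)
                continue;

            if (latest == null || files[i].LastWriteTime > latest.LastWriteTime)
                latest = files[i];
        }

        return latest;
    }
}
```

The caller still has to handle the `null` result, which is the case where the directory contains only empty files.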
You can get all the files by calling the GetFiles method with SearchOption.AllDirectories:
FileInfo[] allfiles = directoryInfo.GetFiles("*.*", SearchOption.AllDirectories);
Then do as below:
for (int i = 0; i < allfiles.Length; i++)
{
// Check the current file, not always the first one
if (allfiles[i].Length > 0)
{
GetLatestWritenFileFileInDirectory(allfiles[i], keywordEH, keywordINTER, keywordM & M);
}
}
I'm trying to calculate the number of success cases within a recursive function in C#, but I'm astonished by the fact that my variable is shared between all the function calls!
[update 2]
More than strange this time. doing so
i = i + validTreesFun(tree.Nodes, newWords.ToList()) ;
resets i to 0
doing this
i = validTreesFun(tree.Nodes, newWords.ToList()) + i ;
gives some results (I'm not sure if it's correct)
[updated : the full code]
public static int validTreesFun(List<Tree<char>> nodes, List<string> words)
{
int i = 0;
if (nodes == null && (words == null || words.Count == 0 || (words.Count == 1 && words.First() == "")))
return 1;
else
if (nodes == null)
return 0;
foreach (Tree<char> tree in nodes)
{
var validWords = words.Where(w => w.ToCharArray()[0] == tree.Root)
.Select(w => w);
if (validWords.Count() == 0)
return 0;
else
{
var newWords = validWords.Select(w => join( w.ToCharArray().Skip(1).ToArray()));
i += validTreesFun(tree.Nodes, newWords.ToList());
}
}
return i;
}
When debugging, the variable i takes the value 1, but it resets to 0 on the next iteration,
despite the use of
i = i + ....
What is the problem in that piece of code?
Thank you
if (validWords.Count() == 0)
return 0;
Should be
if (validWords.Count() == 0)
continue;
Also, in general, I personally think it is nicer looking to only send in one element at a time to a recursive function.
public static int validTreesFun(Tree<char> node, List<string> words)
That way you don't get the same kind of mistake like above. Finally, a minor note.
w => w.ToCharArray()[0] == tree.Root
can be written as
w => w[0] == tree.Root
No, local variables are not shared between recursive calls at all; you should look for some other design problem. Inside and after your foreach loop I don't see any return statements; can you post the full code?
OK, when debugging you will always observe the value of i in the current method. Debugging recursive functions is a little hard to follow: you have to move down the Call Stack to observe the value held by the earlier caller of the current function.
I would advise you to write Trace output or a log file that includes the level of the node; that will help your actual debugging.
Please use a Trace statement as follows:
Trace.WriteLine(string.Format("{0},{1}",tree.Name,i));
Local variables are not being shared.
What you are seeing (the reset to 0) is the value of i in the (recursively) called function validTreesFun (i gets set to 0 at the start of the function).
Just looking at your code, I think a possible bug might be in someTestHere - if that is never true, then i will stay 0 in the outer scope. Otherwise it should increment by 1 for each true test.
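A stripped-down illustration of that point: in the sketch below, every call to `CountLeaves` gets its own `i`, so the 0 seen when stepping into a nested call belongs to the callee, and the caller's running total is intact once the call returns. The `Node` class is a hypothetical stand-in for the `Tree<char>` type in the question, and counting leaves is only a placeholder for the real validity test.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal tree node, standing in for Tree<char>.
public sealed class Node
{
    public List<Node> Children = new List<Node>();
}

public static class RecursionDemo
{
    public static int CountLeaves(Node node)
    {
        int i = 0; // a fresh local for every call, never shared

        if (node.Children.Count == 0)
            return 1; // a leaf counts as one

        foreach (Node child in node.Children)
        {
            // The callee starts with its own i = 0; this frame's i keeps
            // its value and grows by whatever the callee returns.
            i += CountLeaves(child);
        }

        return i;
    }
}
```

Stepping through this in a debugger shows the same "reset to 0" on every nested call, yet the final result is the correct sum.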
When you are in debug mode, you do see i reset for the called function, but it keeps the expected value in the caller. E.g., the stack:
validTreesFun --i = 0 for this one
validTreesFun --i = x for this one, but if you do not go through the call stack, you will see 0, which is the correct value for the top of the stack