I'm retrieving the names of all the classes in the current runtime in .NET, in order to identify object and function declarations in source code given as input to my program. But templates seem a bit off; for example, I've got this among the output:
hashset`1+elementcount[t]
hashset`1+slot[t]
hashset`1+enumerator[t]
which I obtain more or less from this simple code (there's similar code that walks the referenced assemblies, but it essentially does the following for each of those instead of the executing assembly):
foreach (Type t in Assembly.GetExecutingAssembly().GetTypes())
{
    types.Add(t.ToString().Split('.').Last().ToLower());
}
For now I can of course just Split() the strings to get the first part, before the ` mark, but I was wondering if anyone knows exactly what this might be. The three lines above are consecutive, and a couple more entries for HashSet in my results also have this 1+something thing, so I'm positive it's the actual HashSet class. (note: I'm turning everything to lowercase currently but disabling it doesn't seem to change anything.)
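The workaround I mean is just something like this (rough sketch, untested):

foreach (Type t in Assembly.GetExecutingAssembly().GetTypes())
{
    // Drop everything from the '`' onwards (and with it the "+Something[T]" suffix).
    string simpleName = t.ToString().Split('.').Last();
    int tick = simpleName.IndexOf('`');
    if (tick >= 0)
        simpleName = simpleName.Substring(0, tick);
    types.Add(simpleName.ToLower());
}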
So... does anyone know what this notation is? I'm not sure how to google it; copy-pasting some lines into Google and enclosing them in quotes returns either no results or very random threads that don't end up containing the line. Thanks in advance.
I'm currently working on a form with a bunch of textboxes that have quite specific requirements. For example, one textbox contains a cadastral number and should look like ##:##:#######:~ (the last group of digits varies in length), and it would also be nice to see the pattern before you even type anything (if I recall correctly, that's called a mask). Also, per the requirements, the first two digits should always be 24, so the end result should look something like 24:##:#######:~. Another example is a numeric textbox with units and spaces between digit groups (e.g. 1 000 000 m2). In short, this textbox and the others have both static elements (which the user should not be able to edit) and dynamic ones (which makes masked textboxes and similar approaches quite hard to deal with).
So, I've tried different things:
Using MaskedTextBox from the toolkit package, which turned out badly, because it handled the last part (the variable-length group of digits) poorly, and also, when a key was pressed in the middle of the static mask, it just pushed the caret along without actually adding anything to the text;
Using converters proved quite challenging at first, but gave remarkable results: it handles the variable-length part perfectly (partly because of the custom converter), but it was difficult to manage things when the user deleted text, because the static parts were integrated into the converter itself;
Using StringFormat on the textbox's bound Text property was almost useless; although it handled the static part quite well, in the end I couldn't make it work.
Intuition tells me a combination of a custom converter (handling the dynamic part) and StringFormat (handling the uneditable static part) should do the job; a rough sketch of what I mean is below. Maybe the number of requirements is just too much for a simple textbox. One more thing is bugging me: there could be a suitable existing general solution that I'm not aware of (at first I didn't even know converters were a thing).
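A rough, untested sketch of the converter half I have in mind (all of the names are mine; the only real piece is WPF's IValueConverter):

using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Windows.Data;

public class CadastralNumberConverter : IValueConverter
{
    // Source -> UI: re-apply the static "24" prefix and the colon grouping.
    public object Convert(object value, Type targetType, object parameter, CultureInfo culture)
    {
        return "24:" + FormatGroups(OnlyDigits(value as string));
    }

    // UI -> source: keep only the dynamic digits for the bound property.
    public object ConvertBack(object value, Type targetType, object parameter, CultureInfo culture)
    {
        string text = value as string ?? "";
        if (text.StartsWith("24:")) text = text.Substring(3); // strip the static prefix
        return OnlyDigits(text);
    }

    private static string OnlyDigits(string s) =>
        new string((s ?? "").Where(char.IsDigit).ToArray());

    private static string FormatGroups(string digits)
    {
        // 2-digit group, 7-digit group, then a variable-length tail.
        var parts = new List<string>();
        if (digits.Length > 0) parts.Add(digits.Substring(0, Math.Min(2, digits.Length)));
        if (digits.Length > 2) parts.Add(digits.Substring(2, Math.Min(7, digits.Length - 2)));
        if (digits.Length > 9) parts.Add(digits.Substring(9));
        return string.Join(":", parts);
    }
}

Something like this handles the dynamic part, but it still doesn't stop the user from editing the static pieces, which is the part I can't work out.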
Now the question: how would you generally approach this problem? Are there any existing solutions that would work with a bit of tweaking?
So right now in my lexer I'm trying to skip certain tokens like comments and whitespace, except I need to add them to my "skipped list", rather than hiding them altogether.
In my Scanner frame I have
public int Skip(int sym) {
    Token t = _InitToken();
    t.SymbolId = sym;
    t.Line = Current.Line;
    t.Column = Current.Column;
    t.Position = Current.Position;
    t.Value = yytext;
    t.Skipped = null;
    _skipped.Add(t);
    return yylex(); // re-enters the scanner to fetch the next token
}
Keep in mind this is C#, but the interface isn't much different from the C one in lex/flex.
I then use the function above in my scanner like so:
"/*" { if(!_TryReadUntilBlockEnd("*/")) return -1; return Skip(478); }
\/\/[^\n]* { return Skip(477); }
where 477 and 478 are my symbol ids (the lex file is generated, hence the lack of constants)
All _TryReadUntilBlockEnd("*/") does is read until it finds a trailing */, consuming it. It's a well-tested method and can be ignored for the purposes of this question, except as an explanation of how I match the end of a comment. It takes over the underlying input from gplex and advances the underlying input stream itself (like fgetc() or whatever it is in C, I forget). Basically it's neutral here other than reading the entire comment. Skip(478) is the relevant bit, not this.
It works fine in many cases. The only problem is that I'm using it in a recursive descent parser that's parsing C#, so the stack gets heavy, and when I have a huge run of line comments it overflows the stack.
I could solve it by finding some way to run a match without invoking a lex action again, instead of calling yylex(), if that's even possible; that way I could rewrite it to be iterative. But I have no idea how, and what I've seen of the generated code suggests it isn't possible.
The other way I can solve it - and this is my preferred way - is to match multiple C# line comments in one match, perhaps with a pattern like the one below. That way I only recurse once.
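Something along these lines, maybe (untested, and I don't even know whether gplex accepts it):

\/\/[^\n]*(\r?\n[ \t]*\/\/[^\n]*)* { return Skip(477); }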
But that would be a multiline match expression, which is disabled by default, I think?
How do I enable multiline matching in flex, lex, or gplex? Or is there another solution to the above problem? gplex 1.2.2 is preferred, but it's completely undocumented.
I'll take anything at this point. Thanks in advance!
I shouldn't have been calling yylex() at all. Thanks Jonathan and rici in the comments.
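For anyone who lands here later: the fix amounts to recording the skipped token and simply not returning from the rule action, so the generated scanner keeps looping instead of recursing. Roughly (a sketch of my change, with the scanner details simplified):

public void Skip(int sym) {
    Token t = _InitToken();
    t.SymbolId = sym;
    t.Line = Current.Line;
    t.Column = Current.Column;
    t.Position = Current.Position;
    t.Value = yytext;
    t.Skipped = null;
    _skipped.Add(t);
    // no yylex() here - with no return in the rule action, scanning just continues
}

and the rules become:

"/*" { if(!_TryReadUntilBlockEnd("*/")) return -1; Skip(478); }
\/\/[^\n]* { Skip(477); }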
Good afternoon,
I'm hoping I can get an assist on this from someone; if not some example code, then at least a general direction I should be going in.
Essentially, I have two large lists (roughly 10-20,000 records each) of string terms and IDs. These lists come from two different data providers. The lists are obviously related to one another topically, but each data provider has slight variations in its term naming conventions. For example, list1 would have the term "The Term (Some Subcategory)" and list2 would have "the term - some subcategory". Additionally, list1 could have "The Term (Some Subcategory)" and "The Term (Some Subcategory 2)" while list2 only has "the term - some subcategory".
Both lists have the following properties: "term" and "id". What I need to do is compare every term in both lists and, if a reasonable match is found, generate a new list containing "term", "list1id", and "list2id" properties. If no match is found for a term, it also needs to be added to the list with either "list1id" or "list2id" null/blank (which will indicate the origin of the unmatched term).
I'm willing to use a NuGet package to accomplish this, or if anyone has a good example of what I need, that would be helpful too. Essentially I'm attempting to generate a new merged list based on fuzzy matching of the terms in each list while retaining the IDs of the matched terms somehow.
My research has dug up some similar articles and source, such as https://matthewgladney.com/blog/data-science/using-levenshtein-distance-in-csharp-to-associate-lists-of-terms/ and https://github.com/wolfgarbe/symspell, but neither seems to fit what I need.
Where do I go from here with this? Any help would be awesome!
Nugs
Your question is pretty broad, but I will attempt a broad answer to at least get you started. I've done this sort of thing before.
Do it in two stages: first normalize, then match. Normalizing eliminates known but irrelevant causes of differences: for example, make everything uppercase, remove whitespace, remove non-alphanumeric characters, etc. You'll need to be a little creative and work within any constraints you might have (is "Amy (pony)" the same thing as "Amy pony"?). Then calculate the distance.
Create a class with a few properties to contain the value from the left list, the value from the right list, the normalized values, the score, etc.
When you get a match, create an instance of that class, add it to a list or equivalent, remove the old values from the original lists, then keep going.
Try to write your code so you keep track of intermediate values (e.g. the normalized values, the scores, etc.). This will make it easier to debug and will allow you to log everything after you've finished processing.
Once you're done, you can then throw away intermediate values and keep just the things you identified as a match.
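To make that concrete, here is a rough, untested sketch of that shape (all of the names are mine, and the 'distance' delegate stands in for whatever Levenshtein implementation or NuGet package you end up using):

using System;
using System.Collections.Generic;
using System.Linq;

public class TermMatch
{
    public string LeftTerm, RightTerm;
    public string LeftNormalized, RightNormalized;
    public int Score;
    public string List1Id, List2Id;
}

public static class TermMatcher
{
    // Stage 1: normalize -- uppercase and strip everything non-alphanumeric.
    public static string Normalize(string s) =>
        new string((s ?? "").ToUpperInvariant().Where(char.IsLetterOrDigit).ToArray());

    // Stage 2: match. 'distance' is whatever edit-distance function you plug in.
    public static List<TermMatch> Match(
        List<(string Term, string Id)> list1,
        List<(string Term, string Id)> list2,
        Func<string, string, int> distance,
        int maxDistance)
    {
        var results = new List<TermMatch>();
        var remaining = new List<(string Term, string Id)>(list2);

        foreach (var left in list1)
        {
            string leftNorm = Normalize(left.Term);

            // Closest remaining right-hand term, scored on the normalized values.
            var best = remaining
                .Select(r => new { Item = r, Norm = Normalize(r.Term) })
                .Select(r => new { r.Item, r.Norm, Score = distance(leftNorm, r.Norm) })
                .OrderBy(r => r.Score)
                .FirstOrDefault();

            if (best != null && best.Score <= maxDistance)
            {
                results.Add(new TermMatch
                {
                    LeftTerm = left.Term, RightTerm = best.Item.Term,
                    LeftNormalized = leftNorm, RightNormalized = best.Norm,
                    Score = best.Score, List1Id = left.Id, List2Id = best.Item.Id
                });
                remaining.Remove(best.Item); // don't match the same term twice
            }
            else
            {
                // No reasonable match: keep it with only the list1 id set.
                results.Add(new TermMatch { LeftTerm = left.Term, LeftNormalized = leftNorm, List1Id = left.Id });
            }
        }

        // Whatever is left in list2 never matched anything.
        foreach (var right in remaining)
            results.Add(new TermMatch { RightTerm = right.Term, RightNormalized = Normalize(right.Term), List2Id = right.Id });

        return results;
    }
}

The greedy "take the closest remaining term" step is the simplest thing that works; whether it is good enough depends on how messy the data is, so keep the normalized values and scores around (as above) so you can inspect the borderline cases.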
I have two strings (they're going to be descriptions in a simple database eventually), let's say they're
String A: "Apple orange coconut lime jimmy buffet"
String B: "Car
bicycle skateboard"
What I'm looking for is this: I want a function that takes the input "cocnut" and returns "String A".
We could have differences in capitalization, and the spelling won't always be spot on. The goal is a 'quick and dirty' search if you will.
Are there any .NET (or third-party) 'likeness algorithms' you'd recommend for strings, so I could check that the input is a 'pretty close fragment' of one of them and return it? My database is going to have like 50 entries, tops.
What you’re searching for is known as the edit distance between two strings. There exist plenty of implementations – here’s one from Stack Overflow itself.
Since you’re searching for only part of a string what you want is a locally optimal match rather than a global match as computed by this method.
This is known as the local alignment problem and once again it’s easily solvable by an almost identical algorithm – the only thing that changes is the initialisation (we don’t penalise whatever comes before the search string) and the selection of the optimum value (we don’t penalise whatever comes after the search string).
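To make the idea concrete, here is a minimal, untested sketch of that local variant with unit costs (not from any library; the names are mine):

using System;

static int BestSubstringDistance(string needle, string haystack)
{
    int m = needle.Length, n = haystack.Length;
    var d = new int[m + 1, n + 1];

    // No penalty for whatever comes before the match: the first row stays 0.
    for (int j = 0; j <= n; j++) d[0, j] = 0;
    // Leaving needle characters unmatched still costs.
    for (int i = 1; i <= m; i++) d[i, 0] = i;

    for (int i = 1; i <= m; i++)
        for (int j = 1; j <= n; j++)
        {
            int cost = char.ToLowerInvariant(needle[i - 1]) ==
                       char.ToLowerInvariant(haystack[j - 1]) ? 0 : 1;
            d[i, j] = Math.Min(Math.Min(
                d[i - 1, j] + 1,          // needle character not found
                d[i, j - 1] + 1),         // extra character in the haystack
                d[i - 1, j - 1] + cost);  // match or substitution
        }

    // No penalty for whatever comes after the match: best cell in the last row.
    int best = int.MaxValue;
    for (int j = 0; j <= n; j++) best = Math.Min(best, d[m, j]);
    return best;
}

Call it once per description and return the string with the smallest score; with around 50 entries, a brute-force scan over all of them is perfectly fine.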
For the project that I'm currently on, I have to deliver specially formatted strings to a 3rd party service for processing. And so I'm building up the strings like so:
string someString = string.Format("{0}{1}{2}: Some message. Some percentage: {3}%", token1, token2, token3, number);
Rather than hardcode the string, I was thinking of moving it into the project resources:
string someString = string.Format(Properties.Resources.SomeString, token1, token2, token3, number);
The second option is, in my opinion, not as readable as the first one, i.e. the person reading the code would have to pull up the string resources to work out what the final result should look like.
How do I get around this? Is the hardcoded format string a necessary evil in this case?
I do think this is a necessary evil, and one I've used frequently. Something smelly that I do is:
// "{0}{1}{2}: Some message. Some percentage: {3}%"
string someString = string.Format(Properties.Resources.SomeString
,token1, token2, token3, number);
...at least until the code is stable enough that I might be embarrassed to have that seen by others.
There are several reasons that you would want to do this, but the only great reason is if you are going to localize your application into another language.
If you are using resource strings there are a couple of things to keep in mind.
Include format strings whenever possible in the set of resource strings you want localized. This will allow the translator to reorder the position of the formatted items to make them fit better in the context of the translated text.
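For instance (the resource name and the German text here are only illustrative):

int copied = 3, total = 10;
// Resources.resx (en):    CopyProgress = "Copied {0} of {1} files"
// Resources.de.resx (de): CopyProgress = "{1} Dateien insgesamt, {0} davon kopiert"
string msg = string.Format(Properties.Resources.CopyProgress, copied, total);

The German value uses the arguments in a different order than the English one; because the whole format string lives in the resource, the translator can do that without any code change.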
Avoid having strings in your format tokens that are in your language; it is better to use these for numbers. For instance, the message:
"The value you specified must be between {0} and {1}"
is great if {0} and {1} are numbers like 5 and 10. If you are formatting in strings like "five" and "ten" this is going to make localization difficult.
You can get around the readability problem you are talking about by simply naming your resources well.
string someString = string.Format(Properties.Resources.IntegerRangeError, minValue, maxValue);
Evaluate whether you are generating user-visible strings at the right abstraction level in your code. In general, I try to group all the user-visible strings in the code closest to the user interface. If some low-level file I/O code needs to report errors, it should do so with exceptions, which you handle in your application and provide consistent error messages for. This will also consolidate all of your strings that require localization instead of having them peppered throughout your code.
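A small illustration of the idea (all of these names are made up):

// Low-level code: no user-visible text, it just throws.
static void LoadSettings(string path)
{
    if (!System.IO.File.Exists(path))
        throw new System.IO.FileNotFoundException("Settings file missing", path);
    // ... read the file ...
}

// Near the UI: the only place that touches the localized resource string.
try
{
    LoadSettings(settingsPath);
}
catch (System.IO.FileNotFoundException ex)
{
    ShowError(string.Format(Properties.Resources.FileMissingError, ex.FileName));
}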
One thing that can help with extracting hard-coded strings, or even speed up adding strings to a resource file, is CodeRush Xpress, which you can download for free here: http://www.devexpress.com/Products/Visual_Studio_Add-in/CodeRushX/
Once you write your string you can access the CodeRush menu and extract to a resource file in a single step. Very nice.
Resharper has similar functionality.
I don't see why including the format string in the program is a bad thing. Unlike traditional undocumented magic numbers, it is quite obvious what it does at first glance. Of course, if you are using the format string in multiple places it should definitely be stored in an appropriate read-only variable to avoid redundancy.
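For example (StatusFormat is just an illustrative name), reusing the format string from the question:

// Defined once, used from every place that builds the message:
private const string StatusFormat = "{0}{1}{2}: Some message. Some percentage: {3}%";

string someString = string.Format(StatusFormat, token1, token2, token3, number);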
I agree that keeping it in the resources is unnecessary indirection here. A possible exception would be if your program needs to be localized, and you are localizing through resource files.
Yes, you can. Let's see how:
String.Format(Resource_en.PhoneNumberForEmployeeAlreadyExist, letterForm.EmployeeName[i])
This gives me a dynamic message every time.
By the way, I'm using ResXManager.