I have a search method in my custom A* algorithm. It uses a collection to keep track of what the search is doing.
For a given path, I know I am doing the following with the collection:
Contains 860x (lookup)
Remove 91x
Add 270x
The order or sorting does not really matter, unless I can find a way to order it specifically. It is possible to generate a unique ID for each node from its X and Y values, making a dictionary lookup possible.
Is there any way to calculate, based on my method, which collection would be best in this specific case?
Thanks in advance,
Smiley
The general consensus says:
If you don't run into performance issues, leave it alone.
If you do, but you can get away with it, leave it alone.
If you do, but you can't (or you just love your code to be tight), benchmark it, and you'll find out.
(Clarification: I didn't use the "premature optimization is the root of all evil" reference, because I do think there is a place for optimization. Here's a good article about the subject.)
From what you're saying, I doubt it will make much difference unless you're running on a device with next to no resources. Again, unless you need it, I doubt you'll see any difference at the numbers above.
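To act on the "benchmark it" advice for this case, one option is to replay the operation counts from the question (270 Add, 860 Contains, 91 Remove) against each candidate collection and time it with Stopwatch. This is a rough sketch: the key pattern (distinct added keys, random lookups and removals in 0..999) is an assumption, and CollectionBenchmark is a hypothetical name. Timings taken from the real search would be more meaningful.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Rough micro-benchmark sketch. The workload is an assumption that only
// mirrors the operation counts from the question.
public static class CollectionBenchmark
{
    // Runs the same operation sequence against any ICollection<long> and
    // returns the elapsed ticks plus the final element count.
    public static (long Ticks, int FinalCount) Run(ICollection<long> collection, int seed, int repetitions)
    {
        var sw = Stopwatch.StartNew();
        for (int rep = 0; rep < repetitions; rep++)
        {
            collection.Clear();
            var rng = new Random(seed); // same seed => identical workload for every collection

            // 270 adds of distinct keys, so List<T> and HashSet<T> end up with the same contents.
            for (int i = 0; i < 270; i++) collection.Add(i * 3L);

            // 860 lookups and 91 removals with pseudo-random keys.
            for (int i = 0; i < 860; i++) collection.Contains(rng.Next(0, 1000));
            for (int i = 0; i < 91; i++) collection.Remove(rng.Next(0, 1000));
        }
        sw.Stop();
        return (sw.ElapsedTicks, collection.Count);
    }
}
```

For example, compare CollectionBenchmark.Run(new List<long>(), 42, 10000).Ticks with CollectionBenchmark.Run(new HashSet<long>(), 42, 10000).Ticks, ideally after a warm-up call so JIT compilation does not skew the first measurement.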
Edit:
As per the chat room continuation, I would suggest looking into Hashtable and Dictionary. To be more specific, a SortedDictionary :)
For an interesting read about Hashtable vs. Dictionary in C#, you can look at this question and at this one.
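Since each node can get a unique ID from its X and Y values, a hash-based collection keyed on that ID fits this workload well. The sketch below uses HashSet<long> (a hash-based set, rather than a full Dictionary, because only membership is needed): Contains, Add and Remove are average O(1), whereas List<T>.Contains scans linearly. It assumes integer grid coordinates that each fit in 32 bits; GridKey and ClosedSet are hypothetical names, not library types.

```csharp
using System.Collections.Generic;

// Hypothetical helper: packs integer grid coordinates into one 64-bit key.
// Assumes X and Y each fit in a 32-bit int.
public static class GridKey
{
    public static long Make(int x, int y)
    {
        // High 32 bits hold X; low 32 bits hold Y, reinterpreted as unsigned
        // so a negative Y does not overwrite the X bits.
        return ((long)x << 32) | unchecked((uint)y);
    }

    public static (int X, int Y) Split(long key)
    {
        return ((int)(key >> 32), unchecked((int)(key & 0xFFFFFFFFL)));
    }
}

// Hypothetical wrapper around HashSet<long>: Contains, Add and Remove
// are average O(1), unlike List<T>.Contains, which is a linear scan.
public sealed class ClosedSet
{
    private readonly HashSet<long> _keys = new HashSet<long>();

    public bool Contains(int x, int y) => _keys.Contains(GridKey.Make(x, y));
    public bool Add(int x, int y) => _keys.Add(GridKey.Make(x, y));       // false if already present
    public bool Remove(int x, int y) => _keys.Remove(GridKey.Make(x, y)); // false if not present
    public int Count => _keys.Count;
}
```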
Good luck, and feel free to post your results for others to learn.
I am trying to store the parts of a long string in an efficient tree-like structure. I have searched, but most of the implementations are for searching within words. Let me try to explain what I mean with an example. If I have:
/potato/carrot/tomato
/potato/carrot/pea
/potato/lettuce
My initial thought was that this should look like this:
potato
- carrot
  - tomato
  - pea
- lettuce
As far as I have searched, the really efficient search trees (such as DAWGs and tries) store words as sequences of characters, and I am not sure how to go about it with whole path segments. Any ideas?
Thanks a lot in advance!
Edit: As far as persistence is concerned, I don't need to store the tree, so I thought of keeping it in memory for as long as the program is running.
Edit2: As far as storing the children is concerned, I ended up using HybridDictionary (which uses a ListDictionary while a collection is small and switches to a Hashtable as it grows). It was more efficient than Dictionary in my case, and everything works pretty fast now. Thanks a lot, guys!
To keep it in memory, you might use this pattern I recently encountered:
class Vegetable : Dictionary<string, List<Vegetable>> { }
Depending on what you want to do with it (search, count, sort) you can implement helper methods inside that class.
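To make the idea concrete, here is a sketch of a tree keyed by whole path segments rather than by characters, built on the same inherit-from-Dictionary idea. It is a variation rather than the pattern above verbatim: each segment maps to a single child node (Dictionary<string, PathNode>) instead of a List, and the PathNode class and its helper methods are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of a trie keyed by path segments: "/potato/carrot/tomato"
// becomes potato -> carrot -> tomato.
public class PathNode : Dictionary<string, PathNode>
{
    private static string[] Segments(string path) =>
        path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);

    // Adds every segment of the path, reusing nodes that already exist.
    public void Insert(string path)
    {
        PathNode node = this;
        foreach (string segment in Segments(path))
        {
            if (!node.TryGetValue(segment, out PathNode child))
            {
                child = new PathNode();
                node.Add(segment, child);
            }
            node = child;
        }
    }

    // True if this chain of segments exists, as a full path or as a prefix.
    public bool ContainsPath(string path)
    {
        PathNode node = this;
        foreach (string segment in Segments(path))
        {
            if (!node.TryGetValue(segment, out PathNode next)) return false;
            node = next;
        }
        return true;
    }

    // Total number of nodes below this one.
    public int CountNodes() => Values.Sum(child => 1 + child.CountNodes());
}
```

Inserting the three example paths gives a root with the single child potato, which has children carrot and lettuce; carrot in turn has children tomato and pea.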
I think a DAWG is a good way to go. Have a look at this project: although it could do with some optimisation, it's in C#, and although it's several years old, the algorithm hasn't changed in that time.
DawgSharp
It contains methods such as MakeDawg, FindNodeDepth, etc.
"This program takes a plain text file, consisting of only lines of the 26 lowercase English characters. It generates a prefix- and suffix-combined tree that is more efficient than many data structures that would store the same information. This generator code is very slow, but it does work. It can take minutes. It needs to be optimized, and the algorithm does need improvement, but I haven't put a lot of effort into it because it is not run often in my programs."
Here's another interesting approach in tutorial format.
And here's another interesting breakdown.
Forgive me if this is a silly question, but I think back to my Comp. Sci. classes and I distinctly remember learning (and being quizzed on) several sorting algorithms and their corresponding 'Big O' notation.
Outside of the classroom though, I've never actually written code to sort.
When I get results from a database, I use 'Order By'. Otherwise, I use a collection class that implements a sort. I have implemented IComparable to allow sorting, but I've never gone beyond that.
Was sorting always just an academic pursuit for those of us who don't implement languages/frameworks? Or is it just that modern languages running on modern hardware make it a trivial detail to worry about?
Finally, when I call .Sort on a List(Of String), for example, what sort algorithm is being used under the hood?
While you might rarely need to implement a sorting algorithm yourself, understanding the different algorithms and their complexity can help you solve more complex problems.
Finally, when I call .Sort on a List(Of String), for example, what sort algorithm is being used under the hood?
Quick Sort
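I've never implemented my own sorting algorithm since I took my CS classes in college, and if I were ever even contemplating writing my own, I'd want my head examined first.
List<T> uses Quicksort per the MSDN documentation (since .NET Framework 4.5, the documentation instead describes an introspective sort: insertion sort for small partitions, heapsort past a recursion-depth limit, and Quicksort otherwise):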
http://msdn.microsoft.com/en-us/library/b0zbh7b6.aspx
You probably won't implement your own sorting algorithm if you are using high-level languages...
What you learnt in the classroom was merely there to teach you about the existence and importance of big O (omicron) notation.
It was there to make you aware that optimization is always a goal in programming, and that when you code something you must always think about how it will execute.
It teaches you that loops inside loops and recursion can lead to big performance problems if they are not analyzed/optimized well before coding starts.
It gives you a way to check your design beforehand and approximate its execution speed.
It is important for a programmer to know how these algorithms work. One reason is that, in certain conditions, certain algorithms are better, although sorting is rarely the bottleneck.
In some frameworks, the .Sort function uses various methods, depending on the situation.
Modern languages running on modern hardware make it a trivial detail to worry about, unless a profiler shows that sorting is the bottleneck of your code.
According to this, List.Sort uses Array.Sort, which uses QuickSort.
IMO, it's become a bit of an academic exercise. You need to understand algorithmic complexity, and sorting is a good example for working through it because you can easily see the results and calculate the different complexities. In real life, though, there's almost certainly a library call that sorts your range faster than you could if you tried to roll your own.
I don't know what the .NET libraries use for their default implementation, but I'd guess it's Quicksort or Shellsort. I'd be interested to find out if it's something else.
I've occasionally had to write my own sort methods, but only when I was writing for a relatively immature and underpowered platform (like .NET 1.1 on Windows CE, embedded Java, or some such).
On anything resembling a modern computer, the best sorting algorithm is the ORDER BY clause in T-SQL.
Implementing your own sort is the kind of thing you do to gain insight into how algorithms work, what the trade-offs are, which tried-and-true approaches are known to solve a wide array of problems efficiently, and so on.
As Darin Dimitrov's answer states, library sort routines need to have a very competitive average-case performance, so quicksort is typically chosen.
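For the "gain insight" side, a textbook quicksort is short enough to write out. This sketch uses Lomuto partitioning with the last element as the pivot; it is for illustration only and is not the .NET library's implementation. Library sorts typically add safeguards (better pivot choice, or falling back to another algorithm) against the O(n^2) worst case that this version hits on already sorted input. QuickSortDemo is a hypothetical name.

```csharp
using System.Collections.Generic;

// Textbook quicksort (Lomuto partition), for illustration only.
// Average O(n log n); worst case O(n^2), e.g. on already sorted input
// with this last-element pivot. Not stable.
public static class QuickSortDemo
{
    public static void Sort<T>(IList<T> items, IComparer<T> comparer = null)
    {
        comparer = comparer ?? Comparer<T>.Default;
        SortRange(items, 0, items.Count - 1, comparer);
    }

    private static void SortRange<T>(IList<T> items, int lo, int hi, IComparer<T> comparer)
    {
        if (lo >= hi) return; // zero or one element: already sorted
        int p = Partition(items, lo, hi, comparer);
        SortRange(items, lo, p - 1, comparer);
        SortRange(items, p + 1, hi, comparer);
    }

    // Moves every element <= pivot (items[hi]) to the left part of the
    // range and returns the pivot's final index.
    private static int Partition<T>(IList<T> items, int lo, int hi, IComparer<T> comparer)
    {
        T pivot = items[hi];
        int i = lo;
        for (int j = lo; j < hi; j++)
        {
            if (comparer.Compare(items[j], pivot) <= 0)
            {
                Swap(items, i, j);
                i++;
            }
        }
        Swap(items, i, hi);
        return i;
    }

    private static void Swap<T>(IList<T> items, int a, int b)
    {
        T tmp = items[a];
        items[a] = items[b];
        items[b] = tmp;
    }
}
```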
Was sorting always just an academic pursuit for those of us who don't implement languages/frameworks? Or is it just that modern languages running on modern hardware make it a trivial detail to worry about?
Even if you're not implementing your own routine, you may know how your data is likely to be arranged, and may want to choose a suitable library algorithm. For example, see this discussion on nearly sorted data.
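As an example of matching the algorithm to the data: insertion sort does work proportional to the number of inversions (out-of-order pairs), so on nearly sorted input it approaches O(n), while reversed input costs O(n^2). This sketch counts the element shifts to make that visible; InsertionSortDemo is a hypothetical name.

```csharp
using System.Collections.Generic;

// Insertion sort that also reports how many element shifts it made.
// The shift count equals the number of inversions in the input.
public static class InsertionSortDemo
{
    public static long Sort<T>(IList<T> items, IComparer<T> comparer = null)
    {
        comparer = comparer ?? Comparer<T>.Default;
        long shifts = 0;
        for (int i = 1; i < items.Count; i++)
        {
            T current = items[i];
            int j = i - 1;
            // Shift larger elements one slot to the right until current fits.
            while (j >= 0 && comparer.Compare(items[j], current) > 0)
            {
                items[j + 1] = items[j];
                j--;
                shifts++;
            }
            items[j + 1] = current;
        }
        return shifts;
    }
}
```

On { 1, 2, 4, 3, 5, 6, 8, 7 } it makes only 2 shifts, one per out-of-order pair.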
I think there are times when you need to have a custom sorting method.
What if you wanted to sort by the make of cars, but not alphabetically?
For example, you have a database with the makes: Honda, Lexus, Toyota, Acura, Nissan, Infiniti
If you use a plain alphabetical sort, you get the order: Acura, Honda, Infiniti, Lexus, Nissan, Toyota
What if you wanted to sort them so that each car company's standard and luxury brands appear together? Lexus, Toyota, Honda, Acura, Nissan, Infiniti.
I think you would need a custom sort method if that's the case.
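One way to express such an ordering in C# is a comparison that looks up an explicit rank for each make. This is a sketch: the rank table simply encodes the example order above, makes missing from the table are assumed to sort last (alphabetically among themselves), and CarMakeOrder is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Custom (non-alphabetical) ordering via an explicit rank table.
public static class CarMakeOrder
{
    // Each standard brand is followed by its luxury counterpart, as in the example.
    private static readonly Dictionary<string, int> Rank =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
        {
            ["Lexus"] = 0, ["Toyota"] = 1,
            ["Honda"] = 2, ["Acura"] = 3,
            ["Nissan"] = 4, ["Infiniti"] = 5,
        };

    // Usable as a Comparison<string>, e.g. makes.Sort(CarMakeOrder.Compare).
    public static int Compare(string a, string b)
    {
        int ra = Rank.TryGetValue(a, out int x) ? x : int.MaxValue; // unknown makes go last
        int rb = Rank.TryGetValue(b, out int y) ? y : int.MaxValue;
        int byRank = ra.CompareTo(rb);
        return byRank != 0 ? byRank : string.Compare(a, b, StringComparison.OrdinalIgnoreCase);
    }
}
```

With the six makes from the example, makes.Sort(CarMakeOrder.Compare) yields Lexus, Toyota, Honda, Acura, Nissan, Infiniti.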
I'm writing a bot that will analyse posts and reply with vaguely related strings from a database. I'm not aiming for coherence, just for vague similarity that could pass as coming from someone ignorant of the topic (but knowledgeable enough to try to reply). What are some methods that would help me choose the right reply?
One thing I've come up with is to create a vocabulary list, check which elements of the list are in the post, and get a reply from the database based on these results. This crude method has been successful about 10% of the time (based on 100 replies to random posts). I might expand the list with more words, but this method has its limits. Any better ones?
(P.S. The database is sizeable: about 500,000 replies.)
First of all, I think the best you can hope for will be about a 50% answer rate, unless you're prepared to write a lot of code.
If you're willing to get your hands dirty with some statistics, check out term frequency–inverse document frequency (tf-idf). Basically, tf-idf weights each word by how often it appears in a post and how rare it is across all documents, so uncommon words stand out as the keywords critical to the post; you can then use those keywords to pull out other replies that contain them.
You can then combine this further with whitelisting and blacklisting techniques to ignore common words and prioritize certain keywords. You can then keep tuning those lists to enhance the algorithm as you see it work.
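A minimal tf-idf sketch, under these assumptions: text is tokenized into lowercase letter/digit runs, idf(t) = ln(N / df(t)) over the reply corpus, and tf is the raw count of the term in the post. It returns the post's highest-weighted terms, which could then drive the reply lookup. TfIdf is a hypothetical class; it keeps all document frequencies in memory and is not tuned for a 500,000-reply database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Minimal tf-idf keyword extractor.
public sealed class TfIdf
{
    private readonly Dictionary<string, int> _docFreq = new Dictionary<string, int>();
    private readonly int _docCount;

    public TfIdf(IEnumerable<string> corpus)
    {
        foreach (string doc in corpus)
        {
            _docCount++;
            // Count each term at most once per document (document frequency).
            foreach (string term in Tokenize(doc).Distinct())
                _docFreq[term] = _docFreq.TryGetValue(term, out int n) ? n + 1 : 1;
        }
    }

    public static IEnumerable<string> Tokenize(string text) =>
        Regex.Matches(text.ToLowerInvariant(), @"[\p{L}\p{N}]+").Cast<Match>().Select(m => m.Value);

    // The post's terms ordered by tf * idf, highest first. Terms that never occur
    // in the corpus are skipped, since they cannot match any stored reply.
    public IList<(string Term, double Weight)> Keywords(string post, int top)
    {
        return Tokenize(post)
            .GroupBy(t => t)
            .Where(g => _docFreq.ContainsKey(g.Key))
            .Select(g => (Term: g.Key, Weight: g.Count() * Math.Log((double)_docCount / _docFreq[g.Key])))
            .OrderByDescending(p => p.Weight)
            .ThenBy(p => p.Term, StringComparer.Ordinal)
            .Take(top)
            .ToList();
    }
}
```

A word that appears in every document (such as "the") gets idf = ln(1) = 0, which is the effect the blacklisting described above pushes further.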
There are also simpler string metrics you can use to test basic similarity. Take a look at this list of string metrics.
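As an example of one such metric, Levenshtein (edit) distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. This sketch uses the standard two-row dynamic programming table; the normalization in Similarity (dividing by the longer string's length) is one common choice rather than the only one, and StringMetrics is a hypothetical name.

```csharp
using System;

public static class StringMetrics
{
    // Levenshtein distance with two rows of the DP table:
    // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j; // j insertions from the empty string

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i; // i deletions down to the empty string
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(prev[j] + 1,  // deletion
                                            curr[j - 1] + 1), // insertion
                                   prev[j - 1] + cost);       // substitution or match
            }
            (prev, curr) = (curr, prev);
        }
        return prev[b.Length];
    }

    // Normalized similarity in [0, 1]; 1 means identical.
    public static double Similarity(string a, string b)
    {
        int longest = Math.Max(a.Length, b.Length);
        return longest == 0 ? 1.0 : 1.0 - (double)Levenshtein(a, b) / longest;
    }
}
```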
You might want to look into vector-space mapping and resemblance. The "vaguely related" problem could most likely be handled by statistical resemblance analysis.
Check out this novel use of resemblance:
http://www.cromwell-intl.com/security/attack-study/
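A simple instance of vector-space resemblance: turn each text into a bag-of-words vector of term counts and compare two texts by the cosine of the angle between their vectors. The tokenization (lowercase letter/digit runs) is an assumption, VectorSpace is a hypothetical name, and in practice the raw counts would often be replaced by tf-idf weights.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class VectorSpace
{
    // Bag-of-words vector: term -> number of occurrences.
    public static Dictionary<string, int> ToVector(string text)
    {
        var vector = new Dictionary<string, int>();
        foreach (Match m in Regex.Matches(text.ToLowerInvariant(), @"[\p{L}\p{N}]+"))
            vector[m.Value] = vector.TryGetValue(m.Value, out int n) ? n + 1 : 1;
        return vector;
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|); 0 when either text has no words.
    public static double Cosine(string x, string y)
    {
        var a = ToVector(x);
        var b = ToVector(y);
        double dot = a.Where(kv => b.ContainsKey(kv.Key)).Sum(kv => (double)kv.Value * b[kv.Key]);
        double normA = Math.Sqrt(a.Values.Sum(v => (double)v * v));
        double normB = Math.Sqrt(b.Values.Sum(v => (double)v * v));
        return normA == 0 || normB == 0 ? 0.0 : dot / (normA * normB);
    }
}
```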
There is a PHP function called similar_text(). For example, similar_text($str1, $str2, $percent_similar); returns the number of matching characters and stores the similarity as a percentage in $percent_similar. This works fairly well, but I couldn't find anything similar in C#. If you could get hold of the source for the PHP function, you might try to translate it. I think there may be a Java version also.
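Translating it is feasible, because the underlying approach is short: find the longest common substring of the two strings (the first one found, if there are ties), then recurse on the parts to its left and to its right, and sum the matched lengths. The sketch below is a C# port written from that description; SimilarText is a hypothetical name, and its results should be checked against PHP before relying on exact agreement.

```csharp
// Port of the approach behind PHP's similar_text().
public static class SimilarText
{
    // Number of matching characters, like the return value of similar_text($a, $b).
    public static int Similar(string a, string b)
    {
        if (a.Length == 0 || b.Length == 0) return 0;

        // Find the first longest common substring: start in a, start in b, length.
        int posA = 0, posB = 0, max = 0;
        for (int i = 0; i < a.Length; i++)
        {
            for (int j = 0; j < b.Length; j++)
            {
                int k = 0;
                while (i + k < a.Length && j + k < b.Length && a[i + k] == b[j + k]) k++;
                if (k > max) { max = k; posA = i; posB = j; }
            }
        }
        if (max == 0) return 0;

        // Matched block + matches to its left + matches to its right.
        return max
            + Similar(a.Substring(0, posA), b.Substring(0, posB))
            + Similar(a.Substring(posA + max), b.Substring(posB + max));
    }

    // Percentage, like the third (by-reference) argument of similar_text().
    public static double Percent(string a, string b)
    {
        int total = a.Length + b.Length;
        return total == 0 ? 0.0 : Similar(a, b) * 2.0 * 100.0 / total;
    }
}
```

The result can depend on argument order: this sketch gives Similar("bafoobar", "barfoo") = 5 but Similar("barfoo", "bafoobar") = 3.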
Does anyone know how to implement the algorithm for this problem using the Knapsack algorithm?
The method I'm using at present makes extensive use of LINQ, collections of collections and a few dictionaries. For those who don't know what I'm talking about, check out The Cutting Stock Problem.
As mentioned in the link you gave, this problem is in fact an instance of integer linear programming (ILP), which is NP-hard in general.
Directly from Wikipedia: advanced algorithms for solving integer linear programs include:
cutting-plane method
branch and bound
branch and cut
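Where the knapsack part fits in: in the classic Gilmore–Gomory column-generation approach to cutting stock, a knapsack problem is solved repeatedly to find the next cutting pattern, with the LP dual prices used as piece values. The sketch below covers only that pattern-generation step, as an unbounded integer knapsack solved by dynamic programming; it assumes positive integer piece lengths, CuttingPattern is a hypothetical name, and the LP master problem is not shown.

```csharp
using System;

// Unbounded knapsack over one stock length: choose how many of each piece
// to cut so the total value (e.g. sum of dual prices) is maximal.
public static class CuttingPattern
{
    public static (double Value, int[] Counts) Best(int stockLength, int[] pieceLengths, double[] values)
    {
        if (pieceLengths.Length != values.Length)
            throw new ArgumentException("Each piece length needs a value.");
        foreach (int len in pieceLengths)
            if (len <= 0) throw new ArgumentException("Piece lengths must be positive.");

        var best = new double[stockLength + 1]; // best[c] = max value within capacity c
        var choice = new int[stockLength + 1];  // piece cut last at capacity c (-1 = one unit wasted)
        for (int c = 1; c <= stockLength; c++)
        {
            best[c] = best[c - 1]; // option: leave one unit of length unused
            choice[c] = -1;
            for (int i = 0; i < pieceLengths.Length; i++)
            {
                int len = pieceLengths[i];
                if (len <= c && best[c - len] + values[i] > best[c])
                {
                    best[c] = best[c - len] + values[i];
                    choice[c] = i;
                }
            }
        }

        // Walk back through the choices to count the pieces in the pattern.
        var counts = new int[pieceLengths.Length];
        int cap = stockLength;
        while (cap > 0)
        {
            int i = choice[cap];
            if (i < 0) { cap--; continue; }
            counts[i]++;
            cap -= pieceLengths[i];
        }
        return (best[stockLength], counts);
    }
}
```

For a stock length of 10 and pieces of length 3, 4 and 5 valued 3, 5 and 6, the best pattern is two pieces of length 5, with a value of 12.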
Sometimes, to make a variable/method/class name descriptive, I need to make it longer. But I don't want to; I'd like to have short names that are easy to read. So I thought of a special addin for an IDE like Visual Studio that would let you write short names for classes, methods and fields but attach long names to them. If you need to, you can make all the names long or expand a single name; if you want to shorten them, you switch back, like two views of the same code. I'd like to know what others think about it. Do you think it is useful? Would anybody use this kind of addin?
Why not just use the standard XML commenting system built into Visual Studio?
If you type /// above the Class/Method/variable etc, it creates the comment stub.
These comments pop up through IntelliSense/Code Completion with extra info.
This way you keep your naming conventions short and descriptive whilst commenting your code.
You can run a process to then create documentation for your code using these comments.
See: http://msdn.microsoft.com/en-us/magazine/cc302121.aspx
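A short example of what such comments look like: the member name stays short, and the fuller description lives in the /// block, which IntelliSense displays. Pricing and Gross are hypothetical names. To produce the documentation file, the C# compiler's /doc option (or GenerateDocumentationFile in an SDK-style project) writes these comments out as XML.

```csharp
using System;

/// <summary>
/// Hypothetical example: short member names, with the longer explanation
/// kept in XML documentation comments.
/// </summary>
public static class Pricing
{
    /// <summary>
    /// Calculates the gross amount of an order by applying sales tax.
    /// </summary>
    /// <param name="net">Order total before tax.</param>
    /// <param name="taxRate">Tax rate as a fraction, e.g. 0.2m for 20%.</param>
    /// <returns>The gross amount, rounded to two decimal places.</returns>
    public static decimal Gross(decimal net, decimal taxRate) =>
        Math.Round(net * (1 + taxRate), 2);
}
```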
A variable name should be as long as required to make it identifiable. Does it matter if it's a bit longer than you would prefer? As long as the code is readable and understandable, surely this makes no difference?
Use comments for names that would be far too long to use as a variable/class name. This would be a lot more appropriate.
If a method name is too long, then it shouldn't be a single method...
I wouldn't use an addin like that.
I never worry about long names. If a method name becomes too long, it may also indicate that the method does too much (unless it happens to include a really long word). On the other hand, I also try to avoid repeating myself. I would not have Account.AccountId for instance, but rather Account.Id. I also lean back on the namespace; if the namespace is clear about what domain I am in, I usually try to not repeat that in class- or member names.
Bottom line; I can't see myself using such an addin.
Other programmers without this addin would find themselves in trouble: if you give names that are too short, they will not fully understand the code, and if you give long names, they will lose time reading and eventually get angry because long names are difficult to remember :P
One has to find the best name for everything one writes; IMHO there is no need for a switch to turn the verbosity of identifiers on and off.
I would not use that addin.
Nor would I. The fact is you are talking about Visual Studio, which takes on the heavy load of remembering most variable names (long and short) through IntelliSense. As Power said, as long as the code is readable and understandable, that's all that matters.
With ReSharper 4 and above, you can get automatic expansion of type and variable names that are camel or Pascal cased:
So you could call your variable myExtremelyLongAndDescriptiveVariableName but then just type mELADVN to use it.
I don't think I'd want it.
The overhead of switching between different views would be as much work as hitting F12 and reading the comment for the function, which will always be more descriptive than the long name.
I won't.
Long function names could be handy in some cases, for example if you have a special case.
Some examples:
What would you favor for multiplication, mul or multiply? multiply is my choice.
Choosing function names is a matter of making your code clear to use. If your names are so short that you have to read a comment to know what a function does, then you're doing it wrong.
IDEs, text editors and compilers already support a limited form of the described functionality: source code comments. I think comments do very well and don't see any need for the described addin. If comments are too long, they can be folded. If you need source code with no comments, you can easily strip them off with a regex or something similar.
I'd like to have short names that are easy to read.
That is often a contradiction in terms.
Take for example a name like oScBf, if you don't already know what it's for it's practically unreadable. Is it outputScreenBuffer, onlineSourceBitflag, openScannerBrowsefile, outdoorSpecialBikinifavorites...?
Longer identifier names are usually preferable. Even though there's more to read, they're still easier to understand.
Reading code is in some ways similar to reading text. You expect it to follow a certain pattern to be easy to read, if you start to add a lot of abbrev. and non-std words in da text u hav 2 stop n think what it means, and u lose da flow. :)
It's a bad idea. Variable names don't usually need to be long to be adequately descriptive, you'll waste a lot of time writing two versions of every name, and many programmers will probably find it rather confusing to have multiple names for the same thing.
With XMLDoc and IntelliSense help, you can add any extra detail required to fully describe a code element; the name doesn't need to describe the minutiae, only give a clear and distinctive idea of what the code element's purpose is.
With name auto-completion readily available, there is no longer any reason to complain of long names requiring lots of typing.
Also, good coding style is all about making code easy to read, understand and maintain, not about packing more code into a smaller space.
OO design should help break functionality down hierarchically into namespaces and classes, reducing the need for such long names at the class/method level.
Lastly, if you really must shorten names, most languages provide easy ways to strip off namespaces and/or add completely new aliases for names (e.g. 'typedef' and 'using' in C++, 'using' in C#), so in a localised region you can easily refer to a long name via a shortened variant or alias if you wish.
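For example, a C# using alias directive gives a long type name a short local name for the rest of the file. ScoreTable and AliasDemo are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
// Alias directive: ScoreTable is a file-local short name for the long generic type.
using ScoreTable = System.Collections.Generic.Dictionary<string, System.Collections.Generic.List<int>>;

public static class AliasDemo
{
    public static ScoreTable Build()
    {
        var table = new ScoreTable();
        table["alice"] = new List<int> { 90, 85 };
        return table;
    }
}
```

Outside this file, callers still see the full Dictionary<string, List<int>> type, because the alias only affects how the name is written locally.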
I like the idea. It's really good, and I congratulate you and hope you're successful in developing it. Although I would never use such an add-on.