I'm looking for a .NET library with the following graph algorithms:
an algorithm for finding a minimum spanning tree;
an algorithm for partitioning a graph into N subgraphs with a minimal number of connections between them.
I could write my own implementation, but I don't have much time. Could you please name any existing libraries that can do this? Thanks.
yWorks provides several products for .NET.
Depending on where you plan on deploying, you can choose your flavor and perform lots of analyses on the graph.
Although I'm not fond of the API (a bit too Java-ish), the biggest disadvantage is definitely the price.
The best solution was to use the QuickGraph library. It already has an algorithm for finding a minimum spanning tree, and I used its graph implementation to write my own partitioning algorithm.
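For anyone finding this later, the MST part boils down to roughly the following sketch. The extension method name MinimumSpanningTreeKruskal is from memory, so check it against the QuickGraph version you install:

    // Sketch only: assumes QuickGraph's AlgorithmExtensions expose a
    // MinimumSpanningTreeKruskal extension method; verify against your version.
    using System;
    using System.Collections.Generic;
    using QuickGraph;
    using QuickGraph.Algorithms;

    class MstExample
    {
        static void Main()
        {
            var graph = new UndirectedGraph<int, TaggedEdge<int, double>>();
            graph.AddVerticesAndEdge(new TaggedEdge<int, double>(1, 2, 1.0));
            graph.AddVerticesAndEdge(new TaggedEdge<int, double>(2, 3, 2.0));
            graph.AddVerticesAndEdge(new TaggedEdge<int, double>(1, 3, 4.0));

            // Edge weights are taken from the tag of each edge.
            IEnumerable<TaggedEdge<int, double>> mst =
                graph.MinimumSpanningTreeKruskal(e => e.Tag);

            foreach (var edge in mst)
                Console.WriteLine($"{edge.Source} - {edge.Target} ({edge.Tag})");
        }
    }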
I am in a bit of a crisis here. I would really appreciate your help on the matter.
My Final Year Project is a "Location Based Product Recommendation Service". Now, due to some communication gap, we got stuck with an extremely difficult algorithm. Here is how it went:
We had done some research about recommendation systems prior to the project defense. We knew there were two approaches, "Collaborative Filtering" and "Content Based Recommendation". We had planned on using whichever technique gave us the best results. So, in essence, we were more focused on the end product than the actual process.

The HOD asked us what algorithms OUR product would use. But my group members thought that he meant the algorithms used for "Content Based Recommendations", and they answered with "Rule Mining, Classification and Clustering". He was astonished that we planned on using all these algorithms for our project. He told us that he would accept our project proposal if we used his algorithm in our project. He gave us his research paper, without any other resources such as data, simulations, samples, etc. The algorithm is named "Context Based Positive and Negative Spatio-Temporal Association Rule Mining". In the paper, this algorithm was used to recommend sites for hydrocarbon taps and mining with extremely accurate results. Now here are a few issues I face:
I am not sure how, or IF, this algorithm fits our project scenario.
I cannot find spatio-temporal data, market-basket datasets, documentation, or indeed any helpful resources.
I tried asking the HOD for the data he used for the paper, as a reference, but he was unable to provide it to me.
I tried coding the algorithm myself, in an incremental fashion, but found I was completely out of my depth. I divided the algorithm into three phases: Positive Spatio-Temporal Association Rule Mining, Negative Spatio-Temporal Association Rule Mining, and Context Based Adjustments. Alas! The code I write is not mature enough; I couldn't even generate frequent itemsets properly. I understand the theory quite well, but I am not able to translate it into efficient code.
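(For context, a plain Apriori-style frequent-itemset pass, without any of the spatio-temporal or context extensions, is roughly the sketch below; the toy transactions and the support threshold are made up for illustration.)

    // Minimal Apriori-style frequent itemset sketch (illustrative only;
    // no spatio-temporal or context-based extensions).
    using System;
    using System.Collections.Generic;
    using System.Linq;

    class AprioriSketch
    {
        static void Main()
        {
            // Toy transactions; in practice these come from the market-basket file.
            var transactions = new List<HashSet<string>>
            {
                new HashSet<string> { "milk", "bread", "butter" },
                new HashSet<string> { "milk", "bread" },
                new HashSet<string> { "bread", "butter" },
                new HashSet<string> { "milk", "butter" },
            };
            int minSupport = 2; // absolute support threshold (assumed for the example)

            // Level 1: frequent single items.
            var frequent = transactions
                .SelectMany(t => t)
                .GroupBy(item => item)
                .Where(g => g.Count() >= minSupport)
                .Select(g => new HashSet<string> { g.Key })
                .ToList();

            // Level k: join frequent (k-1)-itemsets, keep candidates with enough support.
            while (frequent.Count > 0)
            {
                foreach (var itemset in frequent)
                    Console.WriteLine("{ " + string.Join(", ", itemset) + " }");

                var candidates = new List<HashSet<string>>();
                for (int i = 0; i < frequent.Count; i++)
                    for (int j = i + 1; j < frequent.Count; j++)
                    {
                        var union = new HashSet<string>(frequent[i]);
                        union.UnionWith(frequent[j]);
                        if (union.Count == frequent[i].Count + 1 &&
                            !candidates.Any(c => c.SetEquals(union)))
                            candidates.Add(union);
                    }

                frequent = candidates
                    .Where(c => transactions.Count(t => c.IsSubsetOf(t)) >= minSupport)
                    .ToList();
            }
        }
    }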
Once the algorithm has been coded, I need to develop a web service, and we also need a client website to access it. But with the code not even 10% done, I really am panicking. The project submission is in a fortnight.
Our supervisor is an expert in Artificial Intelligence, but he cannot guide us in the algorithm development. He stresses the importance of reuse and of utilizing open-source resources, but I am unable to find anything of actual use.
My group members are waiting on me to deliver the algorithm, so they can deploy it as a web service. There are other adjustments that need to be done, but with the algorithm not available, there is nothing we can do.
I have found a data set of market baskets. It's a simple Excel file, with about 9000 transactions. There is no spatial or temporal data in it, and I fear that adding artificial data would compromise its integrity.
I would appreciate it if somebody could guide me. I guess the best approach would be to use an open-source API to partially implement the algorithm and then build the service and client application. We need to demonstrate something on the 17th of June. I am really looking forward to your help, guidance, and constructive criticism. Some solutions that I have considered are:
Use "User Clustering" as a "Collaborate Filtering" technique. Then
recommend the products from similar users via an alternative "Rule
Mining" algorithm. I need all these algorithms to be openly available
either as source code or an API, if I have any chance of making this
project on time.
Drop the algorithm altogether and make a project that actually works as we intended, using available resources. I am 60% certain that we would fail or be marked extremely low.
Pay a software house to develop the algorithm for us and then over-fit it into our project. I am not inclined to do this because it would be unethical.
As you can clearly see, my situation is quite dire. I really do need extensive help and guidance if I am to complete this project properly and in time. The project needs to be completely deployed and operational. I really am in a loop here.
"Collaborative Filtering", "Content Based Recommendation", "Rule Mining, Classification and Clustering"
None of these are algorithms. They are tasks or subtasks, for each of which several algorithms exist.
I think you got off to a bad start by not really knowing well enough what you were proposing... but granted, the advice from your advisor was not at all helpful either.
What are the best clustering algorithms to use in order to cluster data with more than 100 dimensions (sometimes even 1,000)? I would appreciate it if you know of any implementation in C, C++, or especially C#.
It depends heavily on your data. See the curse of dimensionality for common problems. Recent research (Houle et al.) showed that you can't really go by the numbers: there may be thousands of dimensions and the data may still cluster well, and of course there is one-dimensional data that just doesn't cluster. It's mostly a matter of signal-to-noise ratio.
This is why for example clustering of TF-IDF vectors works rather well, in particular with cosine distance.
But the key point is that you first need to understand the nature of your data. You then can pick appropriate distance functions, weights, parameters and ... algorithms.
In particular, you also need to know what constitutes a cluster for you. There are many definitions, in particular for high-dimensional data. Clusters may live in subspaces, they may or may not be arbitrarily rotated, and they may or may not overlap (k-means, for example, doesn't allow overlaps or subspaces).
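As a tiny illustration of the cosine-distance remark above, a sketch in C# (the sparse TF-IDF vectors are assumed to be dictionaries from term id to weight):

    // Cosine distance between two sparse TF-IDF vectors (illustrative sketch).
    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class CosineExample
    {
        // Vectors are maps from term id to TF-IDF weight.
        public static double CosineDistance(Dictionary<int, double> a, Dictionary<int, double> b)
        {
            double dot = a.Where(kv => b.ContainsKey(kv.Key))
                          .Sum(kv => kv.Value * b[kv.Key]);
            double normA = Math.Sqrt(a.Values.Sum(v => v * v));
            double normB = Math.Sqrt(b.Values.Sum(v => v * v));
            if (normA == 0 || normB == 0) return 1.0; // treat empty vectors as maximally distant
            return 1.0 - dot / (normA * normB);
        }

        static void Main()
        {
            var doc1 = new Dictionary<int, double> { [0] = 0.5, [3] = 1.2 };
            var doc2 = new Dictionary<int, double> { [0] = 0.4, [2] = 0.9 };
            Console.WriteLine(CosineDistance(doc1, doc2));
        }
    }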
Well, I know something called vector quantization; it's a nice algorithm for clustering data with many dimensions.
I've used k-means on data with hundreds of dimensions. It is very common, so I'm sure there's an implementation in any language; worst case, it is very easy to implement yourself (see the sketch below).
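To give an idea of how little code it takes, here is a rough from-scratch sketch in C# (Euclidean distance, random initialization, fixed iteration cap; illustrative only):

    // Naive k-means on dense vectors (illustrative sketch).
    using System;
    using System.Linq;

    static class KMeansSketch
    {
        public static int[] Cluster(double[][] data, int k, int maxIterations = 100, int seed = 0)
        {
            var rnd = new Random(seed);
            int dims = data[0].Length;
            // Initialize centroids with k random points.
            double[][] centroids = data.OrderBy(_ => rnd.Next()).Take(k)
                                       .Select(p => (double[])p.Clone()).ToArray();
            int[] assignment = new int[data.Length];

            for (int iter = 0; iter < maxIterations; iter++)
            {
                bool changed = false;

                // Assignment step: nearest centroid by squared Euclidean distance.
                for (int i = 0; i < data.Length; i++)
                {
                    int best = 0;
                    double bestDist = double.MaxValue;
                    for (int c = 0; c < k; c++)
                    {
                        double d = 0;
                        for (int j = 0; j < dims; j++)
                        {
                            double diff = data[i][j] - centroids[c][j];
                            d += diff * diff;
                        }
                        if (d < bestDist) { bestDist = d; best = c; }
                    }
                    if (assignment[i] != best) { assignment[i] = best; changed = true; }
                }
                if (!changed && iter > 0) break;

                // Update step: move each centroid to the mean of its points.
                for (int c = 0; c < k; c++)
                {
                    var members = Enumerable.Range(0, data.Length)
                                            .Where(i => assignment[i] == c).ToList();
                    if (members.Count == 0) continue; // leave empty clusters where they are
                    for (int j = 0; j < dims; j++)
                        centroids[c][j] = members.Average(i => data[i][j]);
                }
            }
            return assignment;
        }
    }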
It might also be worth trying some dimensionality reduction technique like Principal Component Analysis or an auto-associative neural net before you try to cluster the data. It can turn a huge problem into a much smaller one.
After that, go with k-means or a mixture of Gaussians.
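Very roughly, the reduction step could look like the following sketch (centering the data and projecting onto the single top principal component found by power iteration; a real implementation would use a linear-algebra library and keep several components):

    // Sketch: project centered data onto its first principal component,
    // found by power iteration on the covariance matrix.
    using System;
    using System.Linq;

    static class PcaSketch
    {
        public static double[] ProjectOntoFirstComponent(double[][] data)
        {
            int n = data.Length, d = data[0].Length;

            // Center each dimension.
            double[] mean = new double[d];
            for (int j = 0; j < d; j++) mean[j] = data.Average(row => row[j]);
            double[][] centered = data
                .Select(row => row.Select((v, j) => v - mean[j]).ToArray()).ToArray();

            // Covariance matrix (d x d).
            double[,] cov = new double[d, d];
            for (int a = 0; a < d; a++)
                for (int b = 0; b < d; b++)
                    cov[a, b] = centered.Sum(row => row[a] * row[b]) / (n - 1);

            // Power iteration for the dominant eigenvector.
            var rnd = new Random(0);
            double[] v = Enumerable.Range(0, d).Select(_ => rnd.NextDouble()).ToArray();
            for (int iter = 0; iter < 200; iter++)
            {
                double[] next = new double[d];
                for (int a = 0; a < d; a++)
                    for (int b = 0; b < d; b++)
                        next[a] += cov[a, b] * v[b];
                double norm = Math.Sqrt(next.Sum(x => x * x));
                for (int a = 0; a < d; a++) v[a] = next[a] / norm;
            }

            // Projection of each point onto that component.
            return centered.Select(row => row.Zip(v, (x, w) => x * w).Sum()).ToArray();
        }
    }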
The EM-tree and K-tree algorithms in the LMW-tree project can cluster high-dimensional problems like this. It is implemented in C++ and supports many different representations.
We have novel algorithms for clustering binary vectors created by LSH / random projections, or anything else that emits binary vectors that can be compared via Hamming distance for similarity.
I would like to find an appropriate algorithm to solve this problem:
Suppose we have N projects, and we know how much money we will earn from each project and how much time is required for each project to be done (an estimated time per project). We have a certain amount of available time to do the projects. We want to select projects so that our profit is maximized and the overall estimated time does not exceed the available time. Can you please advise which optimization algorithm I should use? Are there any ready-made implementations that I could use with C#/.NET or Java?
This sounds like a straightforward knapsack problem:
Given a set of items, each with a weight and a value, determine the count of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible.
In your case, the weight is the time required for each project, the value is its profit, and the limit is the time limit. Since each project can be selected at most once, this is the 0/1 variant.
Normally, if you are doing this for the real world, brute force would suffice for the small cases, and a greedy approximation with some randomization should be good enough for the large cases if you don't really care about the exact maximum. However, I doubt anyone would use such a strict model for the real world.
In terms of theoretical interest, the knapsack problem is NP-hard and an active field of algorithm research.
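If the available time can be expressed as a reasonably small integer number of units, the textbook dynamic-programming solution for the 0/1 variant is only a few lines; here is a rough C# sketch (the toy numbers are just for illustration):

    // 0/1 knapsack by dynamic programming: maximize total profit
    // subject to total time <= available time (times must be integers).
    using System;

    static class KnapsackSketch
    {
        public static int MaxProfit(int[] time, int[] profit, int availableTime)
        {
            // best[t] = maximum profit achievable using at most t time units.
            int[] best = new int[availableTime + 1];
            for (int i = 0; i < time.Length; i++)
                for (int t = availableTime; t >= time[i]; t--) // downwards: each project used once
                    best[t] = Math.Max(best[t], best[t - time[i]] + profit[i]);
            return best[availableTime];
        }

        static void Main()
        {
            int[] time = { 3, 4, 2 };      // estimated time per project
            int[] profit = { 30, 50, 15 }; // expected earnings per project
            Console.WriteLine(MaxProfit(time, profit, 6)); // 65: the 4-unit and 2-unit projects
        }
    }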
What you're looking for is called the Knapsack problem.
In your case, the "weight" limit is the time limit, and the value is the value.
Simplified, this looks like a weighted knapsack problem (http://en.wikipedia.org/wiki/Knapsack_problem): the time required would be each project's weight, and the profit its value.
Coming at this problem from an Operations Research perspective, you are looking at some form of mixed-integer program (MIP). The knapsack approach might be sufficient, but without more specifics on the problem I can't suggest a more detailed formulation.
Once you've decided on your formulation, there are a couple of C# solutions available to solve the MIP. Microsoft has the Microsoft Solver Foundation, which is capable of solving simple MIPs and has a nice C# API.
IBM recently purchased the OPL optimization package (considered industry leading), which you can use to develop your MIP formulation. Once you have the formulation, OPL offers .NET APIs that you can call to run your models.
Having gone the OPL route myself, I would avoid using the OPL .NET APIs if possible, because they are very cumbersome. If your problem is simple, you may want to look into Solver Foundation, because it offers a modern and clean API compared to OPL.
I wonder if you could help me with a simple implementation for detecting cycles in a directed graph in C#.
I've read about the algorithms but I'd like to find something already implemented, very simple and short.
I don't care about the performance because the data size is limited.
Check out QuickGraph - it has loads of the algorithms implemented and it's quite a nice library to use.
Run a DFS on G and check for back edges.
At every node you expand, just check whether it is already on the current path.
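A rough C# sketch of that idea (plain adjacency lists, recursive DFS; illustrative rather than optimized):

    // Detect a cycle in a directed graph with DFS: an edge back to a node
    // already on the current path means there is a cycle.
    using System.Collections.Generic;

    static class CycleDetection
    {
        public static bool HasCycle(Dictionary<int, List<int>> adjacency)
        {
            var visited = new HashSet<int>(); // fully explored nodes
            var onPath = new HashSet<int>();  // nodes on the current DFS path

            bool Dfs(int node)
            {
                visited.Add(node);
                onPath.Add(node);
                if (adjacency.TryGetValue(node, out var neighbors))
                    foreach (int next in neighbors)
                    {
                        if (onPath.Contains(next)) return true; // back edge -> cycle
                        if (!visited.Contains(next) && Dfs(next)) return true;
                    }
                onPath.Remove(node);
                return false;
            }

            foreach (int node in adjacency.Keys)
                if (!visited.Contains(node) && Dfs(node))
                    return true;
            return false;
        }
    }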
What would be the best library choice for finding similar parts in images and similarity matching?
Thank you.
It sounds like the Scale Invariant Feature Transform (SIFT) is probably the algorithm you're really looking for. Offhand, I don't know of any general-purpose image processing library that includes it, but there are definitely standalone implementations to be found (and knowing the name should make Googling for it relatively easy).
ImageJ is the fastest image processing library in Java.
OpenCV is certainly a solid choice as always.
That said, VLFeat is also very good. It includes many popular feature detectors (including SIFT, MSER, Harris, etc.) as well as clustering algorithms (like kd-trees and quickshift). You can piece together something like a bag-of-words classifier using it very quickly.
While SIFT is certainly a solid general purpose solution, it actually is a pipeline composed of a feature detector (which points are interesting in the image), a feature descriptor (for each interesting point in the image, what's a good representation), and a feature matcher (given a descriptor and a database of descriptors, how do I determine what is the best match).
Depending upon your application, you may want to break apart this pipeline and swap in different components. VLFeat's SIFT implementation is very modular and lets you experiment with doing so easily.
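To make the detector / descriptor / matcher split concrete, here is a rough sketch using OpenCV's .NET wrapper OpenCvSharp; the class names (SIFT.Create, BFMatcher) are from memory and may differ between versions, and in older OpenCV builds SIFT lives in the non-free/contrib module:

    // Rough SIFT matching sketch with OpenCvSharp (API names assumed; check your version).
    using System;
    using System.Linq;
    using OpenCvSharp;
    using OpenCvSharp.Features2D;

    class SiftMatchSketch
    {
        static void Main()
        {
            using var img1 = Cv2.ImRead("scene.png", ImreadModes.Grayscale);
            using var img2 = Cv2.ImRead("part.png", ImreadModes.Grayscale);

            // 1) Detector + descriptor: find interesting points and describe each one.
            using var sift = SIFT.Create();
            using var desc1 = new Mat();
            using var desc2 = new Mat();
            sift.DetectAndCompute(img1, null, out KeyPoint[] kp1, desc1);
            sift.DetectAndCompute(img2, null, out KeyPoint[] kp2, desc2);
            Console.WriteLine($"{kp1.Length} / {kp2.Length} keypoints detected");

            // 2) Matcher: brute-force nearest neighbour over descriptors (L2 norm).
            using var matcher = new BFMatcher(NormTypes.L2);
            DMatch[] matches = matcher.Match(desc1, desc2);

            // Keep the strongest matches as a crude similarity signal.
            var best = matches.OrderBy(m => m.Distance).Take(20).ToList();
            Console.WriteLine($"{best.Count} best matches, mean distance {best.Average(m => m.Distance):F2}");
        }
    }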
I've never done image processing, but I've heard from friends that OpenCV is quite good; they usually use C++.