C# data structure for subways - c#

What would be the best way to store subway data in the application?
Data consists of subway station positions, length of the tunnels between stations, alignment of the labels while rendering, types of arcs to draw while rendering tunnels, junctions, etc...
Right now I'm thinking of a severely extended graph, but (just curious) maybe there is something more convenient? (obviously, subway model is used for path finding and routing).

I would suggest creating different data models that treat different parts of your problem (because you have different bounded contexts).
Using a directed graph is a no-brainer. You should implement it in a very abstract manner, so you can reuse decent, proven path finding algorithms. Depending on the algorithm you choose (A* is likely a good candidate), your data model needs to be optimized for that algorithm. In the case of A*, this starts with defining a meaningful, practically relevant heuristic, i.e. an estimate of the remaining distance between two subway stations (Euclidean distance is fine for a start, but by analyzing the nature of your data and tuning the estimate you are likely to gain a decent boost in performance). Another aspect is using caches for various calculations and quickly discarding stations that are out of the question.
For presentation, you want to create another model of your graph that can carry all information relevant to rendering (colors, texts, etc.).
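To make the split between a routing model and a presentation model concrete, here is a minimal sketch in Python. All names (`Station`, `SubwayGraph`, `StationStyle`, `a_star`) and the sample data are hypothetical and chosen only for illustration. It assumes station positions and tunnel lengths use the same unit and that no tunnel is shorter than the straight line between its stations, so the Euclidean heuristic never overestimates.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Station:
    name: str
    x: float  # position, used only by the A* heuristic
    y: float


class SubwayGraph:
    """Routing model: directed graph with tunnel lengths as edge weights."""

    def __init__(self):
        self.edges = {}  # Station -> list of (neighbour, tunnel_length)

    def add_tunnel(self, a, b, length, both_ways=True):
        self.edges.setdefault(a, []).append((b, length))
        back = self.edges.setdefault(b, [])
        if both_ways:
            back.append((a, length))


@dataclass
class StationStyle:
    """Presentation model: kept apart from routing, keyed by station name."""
    label_align: str = "right"
    color: str = "#000000"


def euclidean(a, b):
    return math.hypot(a.x - b.x, a.y - b.y)


def a_star(graph, start, goal):
    """Return (total_length, path), or None if the goal is unreachable."""
    counter = 0  # tie-breaker so the heap never has to compare stations
    frontier = [(euclidean(start, goal), counter, 0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        _, _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best.get(node, math.inf):
            continue  # stale queue entry, a cheaper route was found already
        for nxt, length in graph.edges[node]:
            new_cost = cost + length
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                counter += 1
                estimate = new_cost + euclidean(nxt, goal)
                heapq.heappush(frontier, (estimate, counter, new_cost, nxt, path + [nxt]))
    return None


# Hypothetical sample network
a, b = Station("A", 0, 0), Station("B", 3, 0)
c, d = Station("C", 3, 4), Station("D", 0, 4)
graph = SubwayGraph()
graph.add_tunnel(a, b, 3.0)
graph.add_tunnel(b, c, 4.0)
graph.add_tunnel(a, d, 4.5)
graph.add_tunnel(d, c, 3.5)
styles = {"A": StationStyle(label_align="left", color="#d00000")}

cost, path = a_star(graph, a, c)
```

The routing code never touches `styles`, so label alignment, arc types and colors can change without affecting path finding.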

Related

Ensemble learning, multiple classifier system

I am trying to use an MCS (multiple classifier system) to get better results from limited data, i.e. to become more accurate.
I am using K-means clustering at the moment but may switch to FCM (fuzzy c-means). The data is clustered into groups (clusters); the data could represent anything, colours for example. I first cluster the data after pre-processing and normalization and get some distinct clusters with a lot in between. I then use the clusters as the data for Bayes classifiers: each cluster represents a distinct colour, and the data from each cluster is put through a separate Bayes classifier, so each Bayes classifier is trained on only one colour. As an example, take a colour spectrum where 3 - 10 is blue and 13 - 20 is red; in the range 0 - 3 the colour is white up to 1.5 and then gradually turns blue through 1.5 - 3, and the same goes for the transition from blue to red.
What I would like to know is what kind of aggregation method (if that is what you would use) could be applied so that the Bayes classifier becomes stronger, and how it works. Does the aggregation method already know the answer, or would a human correct the outputs, with those answers then going back into the Bayes training data? Or a combination of both? Bootstrap aggregating (bagging) has each model in the ensemble vote with equal weight, so I am not sure I would use bagging as my aggregation method in this particular instance. Boosting, however, builds an ensemble incrementally by training each new model instance to emphasize the training instances that previous models misclassified; I am not sure whether this would be a better alternative to bagging, as I don't understand how it incrementally builds upon new instances. The last option would be Bayesian model averaging, an ensemble technique that seeks to approximate the Bayes optimal classifier by sampling hypotheses from the hypothesis space and combining them using Bayes' law; however, I am completely unsure how you would sample hypotheses from the search space.
I know that usually you would use a competitive approach that bounces between the two classification algorithms (one says yes, one says maybe, a weighting is applied, and if it is correct you get the best of both classifiers), but for simplicity's sake I don't want a competitive approach.
My other question is whether using these two methods together in this way would be beneficial. I know the example I provided is very primitive and the benefit may not show there, but could it be beneficial for more complex data?
I have some concerns about the method you are following:
K-means puts each point in the cluster whose centre is nearest to it, and you then train a classifier on the output. I think that the classifier may outperform the implicit classification done by the clustering, but only by taking into account the number of samples in each cluster. For example, if after clustering your training data consists of typeA (60%), typeB (20%) and typeC (20%), your classifier will prefer to assign ambiguous samples to typeA in order to reduce the classification error.
K-means depends on which "coordinates"/"features" you take from the objects. If you use features in which objects of different types are mixed, K-means performance will decrease. Removing these kinds of features from the feature vector may improve your results.
The "features"/"coordinates" that represent the objects you want to classify may be measured in different units. This can affect your clustering algorithm, since you are implicitly setting a unit conversion between them through the clustering error function. The final set of clusters is selected from multiple clustering trials (obtained from different cluster initializations) using an error function. Thus, an implicit comparison is made across the different coordinates of your feature vector (potentially introducing the implicit conversion factor).
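One common way to avoid the implicit unit conversion described in the last point is to standardize every feature before clustering. The following is only a sketch of z-score scaling under that assumption; the function name `standardize` and the sample rows are made up for illustration, and other scalings (min-max, domain-specific weights) may suit the data better.

```python
import statistics


def standardize(rows):
    """Rescale each column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has zero deviation; divide by 1.0 instead of 0.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stdevs)]
        for row in rows
    ]


# Hypothetical data: first feature in metres, second in millimetres
rows = [[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0]]
scaled = standardize(rows)
```

After scaling, both columns contribute equally to a Euclidean distance, regardless of the unit they were recorded in.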
Taking these three points into account, you will probably increase the overall performance of your algorithm by adding preprocessing stages. For example, in object recognition for computer vision applications, most of the information taken from the images comes only from borders (edges) in the image. All the color information and part of the texture information are not used. The borders are extracted by processing the image to obtain Histogram of Oriented Gradients (HOG) descriptors. This descriptor gives back "features"/"coordinates" that separate the objects better, thus increasing classification (object recognition) performance. In theory, descriptors throw away information contained in the image. However, they have two main advantages: (a) the classifier deals with lower-dimensional data and (b) descriptors calculated from test data can be more easily matched with training data.
In your case, I suggest that you try to improve your accuracy taking a similar approach:
Give richer features to your clustering algorithm
Take advantage of prior knowledge in the field to decide what features you should add and delete from your feature vector
Always consider the possibility of obtaining labeled data, so that supervised learning algorithms can be applied
I hope this helps...

High Dimensional Data Clustering

What are the best clustering algorithms for data with more than 100 dimensions (sometimes even 1000)? I would appreciate pointers to any implementation in C, C++ or especially C#.
It depends heavily on your data. See curse of dimensionality for common problems. Recent research (Houle et al.) showed that you can't really go by the numbers. There may be thousands of dimensions and the data clusters well, and of course there is even one-dimensional data that just doesn't cluster. It's mostly a matter of signal-to-noise.
This is why for example clustering of TF-IDF vectors works rather well, in particular with cosine distance.
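For reference, cosine distance compares only the direction of two vectors, not their length, which is part of why it suits sparse term-weight vectors. A minimal sketch (the function name and the convention of returning 1.0 for an all-zero vector are assumptions of this example):

```python
import math


def cosine_distance(u, v):
    """1 - cos(angle between u and v); 0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # assumption: treat an empty vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)
```

Two documents with the same term proportions but different lengths get a distance of (nearly) zero, while documents with no terms in common get a distance of one.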
But the key point is that you first need to understand the nature of your data. You then can pick appropriate distance functions, weights, parameters and ... algorithms.
In particular, you also need to know what constitutes a cluster for you. There are many definitions, in particular for high-dimensional data. They may be in subspaces, they may or may not be arbitrarily rotated, they may overlap or not (k-means for example, doesn't allow overlaps or subspaces).
Well, I know of something called vector quantization; it's a nice algorithm for clustering data with many dimensions.
I've used k-means on data with hundreds of dimensions. It is very common, so I'm sure there's an implementation in any language; worst case, it is very easy to implement yourself.
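To illustrate how little code a basic k-means needs, here is a sketch of Lloyd's algorithm in Python. It works for any number of dimensions, takes the initial centroids as an argument (how to choose them is left open), and leaves an empty cluster's centroid where it was; all of these are simplifying assumptions of the example.

```python
import math


def kmeans(points, centroids, iterations=100):
    """Lloyd's algorithm; `centroids` are the initial guesses."""
    centroids = [list(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged, nothing moved
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, assignment = kmeans(points, [(0, 0), (10, 10)])
```

In practice the result depends on the starting centroids, so it is usual to run the algorithm several times and keep the best run.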
It might also be worth trying some dimensionality reduction techniques like Principal Component Analysis or an auto-associative neural net before you try to cluster it. It can turn a huge problem into a much smaller one.
After that, go with k-means or a mixture of Gaussians.
The EM-tree and K-tree algorithms in the LMW-tree project can cluster high dimensional problems like this. It is implemented in C++ and supports many different representations.
We have novel algorithms clustering binary vectors created by LSH / Random Projections, or anything else that emits binary vectors that can be compared via Hamming distance for similarity.
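As a rough illustration of the kind of input meant here (this is not the EM-tree or K-tree algorithm itself), the sketch below turns a real-valued vector into a binary signature with random hyperplanes and compares signatures by Hamming distance. The function names, the 16-bit signature length and the fixed seed are assumptions of this example.

```python
import random


def random_hyperplanes(dim, bits, seed=0):
    """One random Gaussian direction per signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]


def signature(vector, planes):
    """Pack the sign of each projection into the bits of an integer."""
    sig = 0
    for plane in planes:
        dot = sum(a * b for a, b in zip(vector, plane))
        sig = (sig << 1) | (1 if dot >= 0 else 0)
    return sig


def hamming(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


planes = random_hyperplanes(dim=5, bits=16)
v = [0.3, -1.2, 0.8, 2.0, -0.5]
same_direction = [2 * x for x in v]
opposite = [-x for x in v]
```

Vectors pointing in similar directions tend to agree on most bits, so a small Hamming distance is a cheap stand-in for a small angle between the original vectors.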

Should entities have the capability to draw themselves? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
A bit of a simple question but it's one I'm never too sure of. This is mostly in context of game development. Let's say you had an entity, like a ship or a character. Should (and I know this is a matter of opinion) this object contain a Draw method? (perhaps implement an IDrawable interface)
When looking at source code in some game projects I see this done in very many ways. Sometimes there is a separate class that may take in an Entity base object and draw it, given its properties (texture, location, et cetera). Other times the entity is given its own draw method and they are able to render themselves.
I hope this question is not too subjective.. Sometimes I hear people say things like drawing should be completely separate from objects and they should not know how to draw themselves or have the capability to. Some are more pragmatic and feel this kind of design where objects have this capability are fine.
I'm not too sure myself. I didn't find any similar questions to this because it may be a bit subjective or either method is fine but I'd like to hear what SO has to say about it because these kinds of design decisions plague my development daily. These small decisions like whether object A should have the capability to perform function B or whether or not that would violate certain design principles.
I'm sure some will say to just go with my gut and refactor later if necessary but that's what I have been doing and now I'd like to hear the actual theory behind deciding when certain objects should maintain certain capabilities.
I suppose what I am looking for are some heuristics for determining how much authority a given object should have.
You should go with whatever makes your implementation the easiest. Even if it turns out you made the wrong choice, you gain first-hand experience of why one method is better than the other and can apply that knowledge in the future. This is valuable knowledge that will help you make decisions later. I find that one of the best ways to learn the merits of something is to do it wrong a few times (not on purpose, mind you) so you get a good understanding of the pitfalls of an approach.
The way I handle it is this: An entity has all the information on it that is required for it to be drawn. e.g. The sprites that make it up, where they are located relative to the center of the entity. The entity itself does very little, if anything at all. It's actually just a place to store information that other systems operate on.
The world rendering code handles drawing the world as well as all the entities in it. It looks at a given entity and draws each of its sprites in the right spot, applying any necessary camera transformations or effects or what-have-you.
This is how most of my game works. The entity has no idea about physics either, or anything. All it is is a collection of data, which other systems operate on. It doesn't know about physics, but it does have some Box2D structures that hang off of it which Box2D operates on during the physics updating phase. Same with AI or Input and everything else.
Each method has its pros and cons.
Advantage of letting objects draw themselves:
It is more comfortable for both parties. The person writing the engine just has to call a function, and the person using it, who writes these IDrawable classes, has low-level access to everything they need. If you wanted to make a system where each object has to define what it looks like, which shaders and textures it will use, and how to apply any special effects, you would end up with either a pretty complicated system or a very limited one.
Advantage of having a renderer manage and draw objects
All modern 3D game engines do it this way, and for a very good reason. If the renderer has full control over which textures are bound, which models to use, and which shaders are active, it can optimize. Say you had a lot of items rendering with the same shader. The renderer can set the shader once and render them all in one batch! You can't do this if each object draws itself.
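The batching argument can be sketched without any graphics API. In the toy example below, `Sprite`, `BatchingRenderer` and the `shader_binds` counter are hypothetical stand-ins; the counter represents an expensive GPU state change, and no real rendering takes place.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sprite:
    name: str
    shader: str


class BatchingRenderer:
    """Groups drawables by shader so each shader is bound once per frame."""

    def __init__(self):
        self.shader_binds = 0
        self.draw_log = []

    def render(self, sprites):
        batches = defaultdict(list)
        for sprite in sprites:
            batches[sprite.shader].append(sprite)
        for shader, batch in batches.items():
            self.shader_binds += 1  # stand-in for a costly state change
            for sprite in batch:
                self.draw_log.append((shader, sprite.name))


renderer = BatchingRenderer()
renderer.render([
    Sprite("ship", "lit"),
    Sprite("lake", "water"),
    Sprite("rock", "lit"),
    Sprite("river", "water"),
])
```

Four sprites are drawn with two shader binds. If each sprite drew itself in submission order, the shader would be switched four times.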
Conclusion
For small projects, it's fine to have each object draw itself. It's more comfortable, in fact.
If you really want performance and want to squeeze everything out of your graphics card, you need a renderer that has strict control over what is rendered and how, so that it can optimize.
It depends what you mean by 'draw itself'. If you mean should it contain low-level routines involving pixels or triangles, then probably not.
My suggestion:
One class to represent the behavior of the object (movement, AI, whatever)
One class to represent the appearance of the object
One class to define an object as combination of an appearance and a behavior
If the 'appearance' involves some custom drawing routine, that could be in that class. On the whole, though, drawing would be abstracted from the underlying API, and the object would be drawn by some rendering manager, using the strategy pattern, the visitor pattern or some IoC pattern. This is especially true in game design, where things like swapping textures in and out of video memory and drawing things in the correct order are important; something needs to be able to optimize things above the object level.
To try to be more specific, one part of your object's graph (object itself or appearance subdivision) implements IRenderable, has a method Draw(IRenderEngine), and that IRenderEngine gives the IRenderable access to methods like DrawSprite, CreateParticleEffect or whatever. Optimising or measuring your engine and algorithms will be far easier this way, and if you need to swap out engines or re-write in an unmanaged, high-performance way (or port to another platform), just create a new implementation of IRenderEngine.
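A minimal sketch of that arrangement, written in Python rather than C#: `IRenderable` and `IRenderEngine` follow the names used above, while `ShipAppearance`, `RecordingEngine` and the `draw_sprite` signature are hypothetical and exist only to show how an engine implementation can be swapped.

```python
from abc import ABC, abstractmethod


class IRenderEngine(ABC):
    """What an appearance is allowed to ask of the renderer."""

    @abstractmethod
    def draw_sprite(self, texture, x, y):
        ...


class IRenderable(ABC):
    @abstractmethod
    def draw(self, engine):
        ...


class ShipAppearance(IRenderable):
    """Knows what the ship looks like, not how pixels reach the screen."""

    def __init__(self, texture, x=0.0, y=0.0):
        self.texture = texture
        self.x = x
        self.y = y

    def draw(self, engine):
        engine.draw_sprite(self.texture, self.x, self.y)


class RecordingEngine(IRenderEngine):
    """Test double: records calls instead of touching a graphics API."""

    def __init__(self):
        self.calls = []

    def draw_sprite(self, texture, x, y):
        self.calls.append((texture, x, y))


engine = RecordingEngine()
ShipAppearance("ship.png", 10.0, 20.0).draw(engine)
```

Replacing `RecordingEngine` with an implementation backed by a real graphics library would not require any change to `ShipAppearance`.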
Hint: if you do this, you can have many things that look the same/similar but act differently by swapping out the behavior, or lots of things that look different but act the same. You can also have the behavior stuff happen on a server, and the appearance stuff happen on a client.
I've done something very similar to this before. Each class of object had a base class that contained properties used on the client, but on the server it was a derived instance of that class that was instantiated, and it included the behavior (movement, simulation, etc). The class also had a factory method for providing its associated IRenderable (appearance) interface. Actually had a pretty effective multiplayer game engine in the end with that approach.
"I suppose what I am looking for are some heuristics for determining how much authority a given object should have."
I'd replace authority with responsibility and then the answer is ONE. Responsibility both for the behaviour and drawing of a ship seems too much to me.

Setting up filter-based graph framework in C#

I need a little push in the right direction.
I want to code a framework in C# that allows me to create graphs that process (mostly numerical) data. I've been looking for the right nomenclature, and for other projects with the same goal, but found no satisfactory results. I'm pretty sure code like this already exists, and I don't want to completely reinvent the wheel. Also, more experienced programmers will probably use techniques (templates, interfaces, ...) that I would love to learn by examining their code.
The framework should process data much like the DirectShow framework handles video. Some components produce data (eg. a file reader or a sensor), some components manipulate data (eg. add, average) and some components render data (eg. a file writer or a chart drawing control). The components/nodes are connected using edges/lines.
Nodes can have multiple inputs (sinks) and outputs (sources). The framework should encompass the base classes that allow filter graphs to be constructed. Applications using the framework must subclass to implement the actual source, transform and render components.
An example: a GPS device produces latitude and longitude values (2 output pins). A calculator transforms these values into cartesian coordinates. The next component takes two consecutive coordinates and calculates the distance.
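The GPS example can be sketched as a small push-based pipeline. This is only an illustration in Python, not a design for the requested framework: every class name is hypothetical, each node has a single input that carries a tuple (instead of separate latitude and longitude pins), and the cartesian conversion is a flat-earth (equirectangular) approximation around a reference latitude.

```python
import math


class Node:
    """Base class: a node pushes every value it emits to its outputs."""

    def __init__(self):
        self.outputs = []

    def connect(self, other):
        self.outputs.append(other)
        return other  # allows chaining: a.connect(b).connect(c)

    def emit(self, value):
        for node in self.outputs:
            node.receive(value)

    def receive(self, value):
        raise NotImplementedError


class GpsSource(Node):
    """Source: replays a list of (latitude, longitude) fixes in degrees."""

    def __init__(self, fixes):
        super().__init__()
        self.fixes = fixes

    def run(self):
        for fix in self.fixes:
            self.emit(fix)


class ToCartesian(Node):
    """Transform: degrees to metres on a locally flat plane."""
    EARTH_RADIUS_M = 6371000.0

    def __init__(self, ref_lat_deg):
        super().__init__()
        self.cos_ref = math.cos(math.radians(ref_lat_deg))

    def receive(self, value):
        lat, lon = value
        x = self.EARTH_RADIUS_M * math.radians(lon) * self.cos_ref
        y = self.EARTH_RADIUS_M * math.radians(lat)
        self.emit((x, y))


class ConsecutiveDistance(Node):
    """Transform: distance between each point and the previous one."""

    def __init__(self):
        super().__init__()
        self.previous = None

    def receive(self, point):
        if self.previous is not None:
            self.emit(math.dist(self.previous, point))
        self.previous = point


class Collector(Node):
    """Renderer stand-in: stores whatever arrives."""

    def __init__(self):
        super().__init__()
        self.values = []

    def receive(self, value):
        self.values.append(value)


source = GpsSource([(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)])
sink = Collector()
source.connect(ToCartesian(ref_lat_deg=0.0)).connect(ConsecutiveDistance()).connect(sink)
source.run()
```

Three fixes produce two distances in `sink.values`. A real framework would also need typed pins, multiple inputs per node and some form of scheduling, which this sketch leaves out.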
I am looking for tips, references and example code that will help me code the framework. Thanks!
UPDATE: Pipes.NET looks promising.
UPDATE: Dataflow is a relevant term.
I suppose you could use QuickGraph for all your graph needs. If none of the built-in algorithms are of use, it's always possible to simply iterate over the tree and invoke whatever custom logic you want along the way.

Finding most proper standard for image [Image recognition]

We have some example pictures.
We are also given a set of input pictures. Every input picture is one of the examples after a combination of the following operations:
1) Rotating
2) Scaling
3) Cutting part of it
4) Adding noise
5) Using filter of some color
It is guaranteed that a human can recognize the picture easily.
I need a simple but effective algorithm to determine which of the base examples an input picture was derived from.
I am writing in C# and Java.
I don't think there is a single simple algorithm which will enable you to recognise images under all the conditions you mention.
One technique which might cover most is to Fourier transform the image, but this can't be described as simple by any stretch of the imagination, and will involve some pretty heavy mathematical concepts.
You might find it useful to search in the field of Digital Signal Processing which includes image processing since they're just two dimensional signals.
EDIT: Apparently the problem is limited to recognising MONEY (notes and coins) so the first problem of searching becomes avoiding websites which mention money as the result of using their image-recognition product, rather than as the source of the images.
Anyway, I found more useful hits by searching for 'Currency Image Recognition'. Including some which mention Hidden Markov Models (whatever that means). It may be the algorithm you're searching for.
The problem is simplified by having a small set of target images, but complicated by the need to detect counterfeits.
I still don't think there's a 'simple algorithm' for this job. Good luck in your searching.
There is some good research going on in the field of computer vision. One of the problems being solved is identifying an object irrespective of scale changes, added noise and the skew introduced when a photo is taken from a different viewpoint. I did a small assignment on this two years ago as part of a computer vision course. There is a transformation called the scale-invariant feature transform (SIFT) with which you can extract various features for corner points. Corner points are those that differ from all of their neighboring pixels. As you can observe, if a photo is taken from two different views, some edges may disappear or look like something else, but corners remain almost the same. The transform explains how a feature vector of size 128 can be extracted for each corner point and how to use these feature vectors to measure the similarity between two images. In your case:
You can extract those features for each of the currency notes you have and check for the existence of these corner points in the image you are supposed to test.
As this transformation is robust to rotation, scaling, cropping, noise addition and color filtering, I guess this is the best I can suggest. You can check this demo to get a better picture of what I explained.
OpenCV has lots of algorithms and features, so I guess it should be suitable for your problem. However, you'll have to use P/Invoke to consume it from C# (it's a native C/C++ library). It's doable, but requires some work.
You would need to build a set of functions that compute the probability of a particular transform between two images f(A,B). A number of transforms have previously been suggested as answers, e.g. Fourier. You would probably not be able to compute the probability of multiple transforms in one go fgh(A,B) with any reliability. So, you would compute the probability that each transform was independently applied f(A,B) g(A,B) h(A,B) and those with P above a threshold are the solution.
If the order is important, i.e. you need to know that f(A,B) then g(f,B) then h(g,B) was performed, then you would need to adopt a state-based probability framework such as Hidden Markov Models or a Bayesian network (which is a generalization of HMMs) to model the likelihood of moving between states. See the BNT toolbox for Matlab (http://people.cs.ubc.ca/~murphyk/Software/BNT/bnt.html) or any good modern AI book for more details on these.
