Finding peaks in an image histogram - C#

I am writing an image-processing project in C# .NET.
For one part of the project I need to find the peaks and valleys of an image's histogram in order to pick a good threshold value.
I need an algorithm or sample code in any language (Java, C, C++, ...) so I can understand the logic; I can convert it to C# myself.
Any document, algorithm or piece of code would help.
Thanks

It's hard to beat Otsu's method for binary thresholding. Even if you insist on implementing the local-extrema search yourself, Otsu's method will give you a good result to compare against.
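For reference, here is a minimal C# sketch of Otsu's method over a 256-bin greyscale histogram (the histogram is assumed to be already computed; the variance term below is kept as raw counts, which is proportional to the usual between-class variance and gives the same argmax):

```csharp
// Minimal Otsu's method: pick the threshold that maximizes between-class variance.
public static int OtsuThreshold(int[] histogram) // 256 bins, already computed
{
    long total = 0, weightedSum = 0;
    for (int i = 0; i < 256; i++)
    {
        total += histogram[i];
        weightedSum += (long)i * histogram[i];
    }

    long sumBackground = 0, countBackground = 0;
    double bestVariance = -1.0;
    int bestThreshold = 0;

    for (int t = 0; t < 256; t++)
    {
        countBackground += histogram[t];
        if (countBackground == 0) continue;

        long countForeground = total - countBackground;
        if (countForeground == 0) break;

        sumBackground += (long)t * histogram[t];

        double meanBackground = (double)sumBackground / countBackground;
        double meanForeground = (double)(weightedSum - sumBackground) / countForeground;

        // Between-class variance (up to a constant factor of 1/total^2).
        double diff = meanBackground - meanForeground;
        double betweenVariance = (double)countBackground * countForeground * diff * diff;

        if (betweenVariance > bestVariance)
        {
            bestVariance = betweenVariance;
            bestThreshold = t;
        }
    }
    return bestThreshold;
}
```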

If you have already computed your histogram, finding peaks and valleys is computationally trivial (loop over it and find the local extrema). What is not trivial is finding "good" peaks and valleys for segmentation/thresholding. But that is not a matter of coding, it's a matter of modelling. You can google for it.
If you want a simple recipe, and if you know that your histogram is "essentially" bimodal (two peaks with a valley in the middle) and you want to locate that valley, I once implemented the following ad-hoc procedure with relative success (a rough sketch follows the steps below):
Compute all the extrema of the histogram (relative maxima/minima, including borders)
If there are only two maxima, AND if in between those maxima there is only one local minimum, we've found the valley. Return it.
Else, smooth the histogram (e.g. with a moving average) and go back to the first step.
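A rough C# sketch of that loop, assuming the histogram is given as an array of doubles and treating the array borders as extremum candidates (plateaus are ignored for simplicity, and the cap on smoothing passes is arbitrary):

```csharp
using System.Collections.Generic;
using System.Linq;

public static int FindValley(double[] histogram)
{
    double[] h = (double[])histogram.Clone();

    for (int pass = 0; pass < 100; pass++)   // arbitrary cap on smoothing passes
    {
        var maxima = new List<int>();
        var minima = new List<int>();

        // Collect relative extrema, including the borders.
        for (int i = 0; i < h.Length; i++)
        {
            double left  = i > 0 ? h[i - 1] : double.MinValue;
            double right = i < h.Length - 1 ? h[i + 1] : double.MinValue;
            if (h[i] > left && h[i] > right) maxima.Add(i);

            double leftLo  = i > 0 ? h[i - 1] : double.MaxValue;
            double rightLo = i < h.Length - 1 ? h[i + 1] : double.MaxValue;
            if (h[i] < leftLo && h[i] < rightLo) minima.Add(i);
        }

        // Exactly two peaks with a single valley between them: done.
        if (maxima.Count == 2)
        {
            var between = minima.Where(m => m > maxima[0] && m < maxima[1]).ToList();
            if (between.Count == 1) return between[0];
        }

        // Otherwise smooth with a 3-tap moving average and try again.
        var smoothed = new double[h.Length];
        for (int i = 0; i < h.Length; i++)
        {
            double sum = h[i];
            int n = 1;
            if (i > 0) { sum += h[i - 1]; n++; }
            if (i < h.Length - 1) { sum += h[i + 1]; n++; }
            smoothed[i] = sum / n;
        }
        h = smoothed;
    }
    return -1;   // no clear bimodal structure found
}
```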

Related

3D Data Interpolation in C#

I'm looking for a simple function in C# to interpolate my 3D data.
I already have a list of around 100-150 data sets, each consisting of 3 double values:
-25.000000 -0.770568 2.444945
-20.000000 -0.726583 2.467809
-15.000000 -0.723274 2.484167
-10.000000 -0.723114 2.506445
and so on...
The chart created from these values usually looks like this; I'm not sure whether that counts as scattered data or still as gridded data ...
In the end I want to hand over two double values and get the third then from the interpolation function. It shouldn't flatten the surface, it should still go through all the given data points.
Since I'm not given the time to look into all the possible algorithms and lack the mathematical background, I'm a bit overwhelmed by all the possibilities being thrown at me: Kriging, Delaunay triangulation, NURBS and many more ...
In addition, most solutions I found on the net were either for a different language, outdated, or paid products (e.g. ILNumerics; I'm still not sure if they have a solution).
In MATLAB there is a griddata function that does exactly this (and is based on a Kriging algorithm, as far as I know), but in my case C# is mandatory.
Thank you for your help; criticism and suggestions are welcome.

Pattern recognition inside a matrix

Say I have these boxes, some of which are black and some white.
The image shows a U shape drawn with the black boxes. Now say I have a matrix of 1s and 0s (it can be a huge matrix) like this:
111111111111111111
111111111111111111
111111111111111111
111111111101111111
111111111101111111
111011111101111111
111011111101111111
111011111101111111
111011111101111111
111011111101111111
111011111101111111
111100000011111111
111111111111111111
which shows zeros roughly forming the shape shown in the image. The image and the matrix are just examples. The image is a screenshot of a piece of software in which we draw patterns, which then need to be located in given matrices (stored in simple text files).
What I'm looking for is guidance on how to get started, because I have never programmed anything related to pattern recognition, which this problem clearly is. This is all I have to do: given a pattern, match it against a matrix of 0s and 1s. I don't think I can write it on my own in a few days. I'm writing code in C# with VS 2013, so I'm hoping I can find some libraries that would let me achieve this with minimal dependencies. Thanks
I think you need to provide a bit more information on what exactly you're looking for. Are the shapes all letters or arbitrary shapes?
Whatever you're looking for, I'd start with EmguCV. It's a pretty comprehensive library that isn't too difficult to use.
EmguCV has a lot of OCR (optical character recognition) functions which should be able to pick out letters pretty well.
I don't have as much experience using it for arbitrary shape detection, but I think SURF detection, which EmguCV also does, might be a good way to go. It attempts to match a given image against features in another image.
People never draw at exactly the same place and scale as your stored data.
The things you want are often done with neural networks (there is one in AForge as well).
But it might be hard to (a) understand them and (b) use them in your code.
So maybe you could try it like this: get the first position, then record the delta positions.
Try to find long lines and their direction, and store the general direction changes.
The sample above would be "down, right, up"; you might also store some length info.
Then there is some math to check how different two sets are, for example the edit distance between strings (like PHP's levenshtein function). I can't think of a built-in Levenshtein function in C#, though; C# isn't that rich in string functions, but it's easy to write one yourself (a sketch follows).
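A standard dynamic-programming Levenshtein distance in C#, which could be used to compare two direction strings such as the "down, right, up" sequence above encoded as "DRU" against a drawn "DRRU":

```csharp
using System;

public static int Levenshtein(string a, string b)
{
    var d = new int[a.Length + 1, b.Length + 1];

    for (int i = 0; i <= a.Length; i++) d[i, 0] = i;   // cost of deleting all of a
    for (int j = 0; j <= b.Length; j++) d[0, j] = j;   // cost of inserting all of b

    for (int i = 1; i <= a.Length; i++)
    {
        for (int j = 1; j <= b.Length; j++)
        {
            int cost = a[i - 1] == b[j - 1] ? 0 : 1;
            d[i, j] = Math.Min(
                Math.Min(d[i - 1, j] + 1,        // deletion
                         d[i, j - 1] + 1),       // insertion
                d[i - 1, j - 1] + cost);         // substitution (or match)
        }
    }
    return d[a.Length, b.Length];
}
```

The smaller the distance relative to the pattern length, the closer the drawn shape is to the stored one.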

Path Smoothing/Point reducing algorithm

I am currently writing an application which displays saved GPS paths on a map (I am using greatmaps for the map). Link
I am looking to run some path smoothing and point reducing algorithm on the path to produce a cleaner looking path on the map. I have been looking at the Ramer–Douglas–Peucker algorithm and possibly a spline.
Can anyone advise me on what approach to take? Any help on this issue would be great.
The key part of the algorithm is the recursion.
If you understand how it works, it is the same thing regardless of language.
So basically you just take the points and send them to a function that holds the logic (and does the recursion).
Since you already have the map set up, pick up the points from the control (e.g. via this.MainMap.Position; play with the control to learn about it) and call that function. :)
This might give you a start
Good luck!
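For completeness, here is a minimal recursive Ramer–Douglas–Peucker sketch in C#. The epsilon parameter is the maximum allowed perpendicular deviation (in the same units as the coordinates), and the simple Pt struct is only illustrative; it could be swapped for GMap.NET's PointLatLng:

```csharp
using System;
using System.Collections.Generic;

public struct Pt { public double X, Y; public Pt(double x, double y) { X = x; Y = y; } }

public static List<Pt> Simplify(List<Pt> points, double epsilon)
{
    if (points.Count < 3) return new List<Pt>(points);

    // Find the point farthest from the line joining the first and last points.
    int index = 0;
    double maxDist = 0;
    for (int i = 1; i < points.Count - 1; i++)
    {
        double d = PerpendicularDistance(points[i], points[0], points[points.Count - 1]);
        if (d > maxDist) { maxDist = d; index = i; }
    }

    if (maxDist > epsilon)
    {
        // Keep that point and recurse on the two halves.
        var left = Simplify(points.GetRange(0, index + 1), epsilon);
        var right = Simplify(points.GetRange(index, points.Count - index), epsilon);
        left.RemoveAt(left.Count - 1);   // avoid duplicating the split point
        left.AddRange(right);
        return left;
    }

    // Everything in between is close enough to the straight line: drop it.
    return new List<Pt> { points[0], points[points.Count - 1] };
}

private static double PerpendicularDistance(Pt p, Pt a, Pt b)
{
    double dx = b.X - a.X, dy = b.Y - a.Y;
    double length = Math.Sqrt(dx * dx + dy * dy);
    if (length == 0)
        return Math.Sqrt((p.X - a.X) * (p.X - a.X) + (p.Y - a.Y) * (p.Y - a.Y));
    // Twice the triangle area (a, b, p) divided by the base length.
    return Math.Abs(dy * p.X - dx * p.Y + b.X * a.Y - b.Y * a.X) / length;
}
```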

Playable Heightmap

I have a game with infinite procedurally generated terrain. I'm using 1/f noise for the height (I think this is Perlin noise?). Anyway, it looks nice, but it's not very playable since it doesn't really have flat areas. Just decreasing the amplitude won't work, since I still want a large variation in height. Does anyone know of a filter I can apply to the heightmap to encourage flat areas while keeping a large range of heights?
Written in C#
EDIT: I've realised that what I want is for steep gradients to become steeper, and for flat gradients to become flatter. The terrain needn't be realistic, just "fun" for an FPS.
I believe you need to use a smoothing function to get rid of the jaggedness of the terrain, if that seems to be your problem.
I only glanced through this page, but it may be a decent guide: http://www.float4x4.net/index.php/2010/06/generating-realistic-and-playable-terrain-height-maps/
Not sure if this would help, but you could transform a range of your function's output into a flat surface with high probability. For example, all results between 0.1 and 0.3 could have an 80% probability of ending up as a 0.1 surface. This way you encourage flat surfaces but keep the high variability you want.
Simple noise is not enough to generate a good looking terrain. It's just one of the intermediate steps in a way more complicated process. You need to simulate some real world phenomena: temperature, erosion, precipitation, that sort of thing. It's a CPU-heavy process, usually, but well worth the effort. Here are some interesting links:
Dungeon League - read all of it. Great stuff.
http://www.dungeonleague.com/
World generation articles on The Chronicles of Doryen:
http://doryen.eptalys.net/2010/01/back-to-the-caves-world-generator/
http://doryen.eptalys.net/2010/01/the-cave-map-with-ice-floe/
http://doryen.eptalys.net/2010/01/the-caves-biome-map/
http://doryen.eptalys.net/2010/01/nifty-debug-maps/
http://doryen.eptalys.net/2010/01/improved-precipitation-map/
http://doryen.eptalys.net/2010/01/biomes-balancing-and-rivers/
http://doryen.eptalys.net/2010/01/rrt-rivers-until-i-get-something-better/
http://doryen.eptalys.net/2010/01/disco-time/
(You can download the generator too, but it's written in C++)
You need a "master random generator" that will decide what a new area should look like, with a frequency of your choosing. For mountains choose what you have already. For flats choose less noise.
You could then filter with a median filter; it will flatten your surface, but it will also destroy mountains. This is fine for relatively flat areas like hollows and plateaus, but if you want sharp mountains (with fast, large height differences) you should apply the filter selectively.
You should look for more material, especially on procedural textures and noise; the three are closely related. Think about using more than one noise function with different parameters and combining them with different functions or operators.
For your case, you can use one function to generate high-frequency noise and then multiply it by low-frequency noise. This will produce peaks where the low-frequency noise is close to 1 and flats where it is close to 0 (see the sketch below). Some kind of smoothing/erosion algorithm helps too, but you will still need a lot of trial and error and parameter tuning to get even usable results.
Some more complex terrains may need over 10 noise functions combined with alpha blending, smoothing and the like. Don't expect to get nice-looking terrain from applying a single simple filter.
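A minimal sketch of that multiplication, assuming you already have some 2D noise function returning values in roughly [0, 1]; the frequencies and the squaring of the mask are arbitrary knobs to tune:

```csharp
using System;

// "noise" stands in for whatever noise function you already have (Perlin, 1/f, ...).
public static float SampleHeight(Func<float, float, float> noise, float x, float y)
{
    float lowFreq  = noise(x * 0.01f, y * 0.01f);   // broad, slowly varying mask
    float highFreq = noise(x * 0.10f, y * 0.10f);   // fine detail

    // Flat where the mask is near 0, mountainous where it is near 1;
    // squaring the mask pushes more of the map towards flat terrain.
    return highFreq * lowFreq * lowFreq;
}
```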
For an easy and decent-looking solution, you can evaluate the height you get from your noise map through a custom curve function.
For example, your curve could map noise values from 0.1-0.3 down to 0.1-0.15 and then stretch 0.3-1.0 over 0.15-1.0.
This way you still keep the actual roughness of the terrain but make the low ground flatter.
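A sketch of that piecewise-linear remap, using the break points from the example above and assuming heights normalized to [0, 1]:

```csharp
public static float RemapHeight(float h)
{
    if (h < 0.1f) return h;                                      // untouched
    if (h < 0.3f) return 0.1f + (h - 0.1f) * (0.05f / 0.2f);     // 0.1..0.3 -> 0.1..0.15 (flattened)
    return 0.15f + (h - 0.3f) * (0.85f / 0.7f);                  // 0.3..1.0 -> 0.15..1.0 (stretched)
}
```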

Removing Duplicate Images [closed]

We have a collection of photo images totalling a few hundred gigabytes. A large number of the photos are visual duplicates, but with differing file sizes, resolutions, compression, etc.
Is it possible to use any specific image processing methods to search out and remove these duplicate images?
I recently wanted to accomplish this task for a PHP image gallery. I wanted to be able to generate a "fuzzy" fingerprint for an uploaded image, and check a database for any images that had the same fingerprint, indicating they were similar, and then compare them more closely to determine how similar.
I accomplished it by resizing the uploaded image to 150 pixels wide, reducing it to greyscale, rounding each pixel's value to the nearest multiple of 16 (giving 17 possible shades of grey between 0 and 255), normalising the counts and storing them in an array, thereby creating a "fuzzy" colour histogram, and then creating an md5sum of the histogram which I could search for in my database. This was extremely effective in narrowing down images which were very visually similar to the uploaded file.
Then, to compare the uploaded file against each "similar" image in the database, I took both images, resized them to 16x16, and analysed them pixel by pixel, subtracting the RGB value of each pixel from the value of the corresponding pixel in the other image, summing all the differences and dividing by the number of pixels to get an average colour deviation. Anything less than a specific value was deemed a duplicate.
The whole thing is written in PHP using the GD module, and a comparison against thousands of images takes only a few hundred milliseconds per uploaded file.
My code, and methodology is here: http://www.catpa.ws/php-duplicate-image-finder/
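The same fingerprinting idea, translated to a rough C# sketch using System.Drawing (GDI+, so Windows or a compatible implementation is assumed); the 150-pixel width, the 17 grey levels and the normalisation factor are taken from the description above:

```csharp
using System;
using System.Drawing;
using System.Security.Cryptography;
using System.Text;

public static string FuzzyFingerprint(Image source)
{
    const int width = 150;
    int height = Math.Max(1, source.Height * width / source.Width);

    using (var small = new Bitmap(width, height))
    {
        using (var g = Graphics.FromImage(small))
            g.DrawImage(source, 0, 0, width, height);      // resize to a fixed width

        // Quantize each greyscale pixel to one of 17 levels (multiples of 16).
        var histogram = new int[17];
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                Color c = small.GetPixel(x, y);
                int grey = (c.R + c.G + c.B) / 3;
                histogram[(int)Math.Round(grey / 16.0)]++;
            }
        }

        // Normalise by pixel count so differently sized originals are comparable,
        // then hash the histogram to get a compact, searchable fingerprint.
        var sb = new StringBuilder();
        foreach (int count in histogram)
            sb.Append((count * 1000 / (width * height)).ToString("D4"));

        using (var md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.ASCII.GetBytes(sb.ToString()));
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```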
Try PerceptualDiff for comparing two images with the same dimensions. It allows thresholds such as considering images with only X pixels different to be visually indistinguishable.
If visual duplicates may have different dimensions due to scaling, or different file types, you may want to make a standard format for comparisons. For example, I might use ImageMagick to scale all images to 100x100 and save them as PNG files.
A very simple approach is the following:
Convert the image to greyscale in memory, so every pixel is only a number between 0 (black) and 255 (white).
Scale the image to a fixed size. Finding the right size is important, you should play around with different sizes. E.g. you could scale each image to 64x64 pixels, but you may get better or worse results with either smaller or bigger pictures.
Once you've done this for all images (yes, that will take a while), always load two images into memory and subtract them from each other. That is, subtract the value of pixel (0,0) in image A from the value of pixel (0,0) in image B, then do the same for (0,1) in both, and so on. The resulting value might be positive or negative; you should always store the absolute value (so 5 results in 5, and -8 also results in 8).
Now you have a third image, the "difference image" (delta image) of image A and B. If they were identical, the delta image is all black (all values subtract to zero). The "less black" it is, the less identical the images are. You need to find a good threshold, because even if the images are in fact identical (to your eyes), scaling, altered brightness and so on mean the delta image will not be totally black; it will, however, contain only very dark grey tones. So you need a threshold that says "if the average error (delta image brightness) is below a certain value, there is still a good chance they are identical; if it is above that value, they most likely are not". Finding the right threshold is as hard as finding the right scaling size. You will always have false positives (images deemed identical though they are not at all) and false negatives (images deemed not identical although they are).
This algorithm is ultra slow. Actually, just creating the greyscale images takes tons of time. Then you need to compare each greyscale image to every other one; again, tons of time. Storing all the greyscale images also takes a lot of disk space. So this algorithm is very bad, but the results are not that bad, even though it's that simple. While the results are not amazing, they are better than I had initially thought.
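The core of that comparison boils down to a mean absolute error over the scaled greyscale pixels. A minimal sketch, assuming both images have already been scaled to the same fixed size and converted to greyscale byte arrays:

```csharp
using System;

public static double MeanAbsoluteDifference(byte[] a, byte[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Scale both images to the same size first.");

    long totalError = 0;
    for (int i = 0; i < a.Length; i++)
        totalError += Math.Abs(a[i] - b[i]);   // per-pixel absolute difference

    // 0 means identical; duplicates are anything below a threshold found by experiment.
    return (double)totalError / a.Length;
}
```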
The only way to get even better results is to use advanced image processing, and here it starts getting really complicated. It involves a lot of math (a real lot of it); there are good applications (dupe finders) for many systems that have this implemented, so unless you must program it yourself, you are probably better off using one of those solutions. I read a lot of papers on this topic, but I'm afraid most of it goes beyond my horizon. Even the algorithms I might be able to implement according to those papers are beyond it; that means I understand what needs to be done, but I have no idea why or how it actually works, it's just magic ;-)
I actually wrote an application that does this very thing.
I started with a previous application that used a basic Levenshtein Distance algorithm to compute image similarity, but that method is undesirable for a number of reasons. Without a doubt, the fastest algorithm you're going to find for determining image similarity is either mean squared error or mean absolute error (both have a running time of O(n), where n is the number of pixels in the image, and it'd also be trivial to thread an implementation of either algorithm in a number of different ways). Mecki's post is actually just a Mean Absolute Error implementation, which my application can perform (code is also available for your browsing pleasure, should you so desire).
In any event, in our application, we first down-sample images (e.g. everything is scaled to, say, 32*32 pixels), then convert to gray scale, and then run the resulting images through our comparison algorithms. We're also working on some more advanced pre-processing algorithms to further normalize images, but...not quite there yet.
There are definitely better algorithms than MSE/MAE (in fact, the problems with these two algorithms as applied to visual information have been well documented), like SSIM, but they come at a cost. Other people attempt to compare other visual qualities of the image, such as luminance, contrast, colour histograms, etc., but it's all pricey compared to simply measuring the error signal.
My application might work, depending on how many images are in those folders. It's multi-threaded (I've seen it fully load eight processor cores performing comparisons), but I've never tested it against an image database larger than a few hundred images. A few hundred gigs of images sounds prohibitively large; simply reading them from disk, downsampling, converting to greyscale and storing them in memory (assuming you have enough memory to hold everything, which you probably don't) could take a couple of hours.
This is still a research area, I believe. If you have some time in your hands, some relevant keywords are:
Image copy detection
Content based image retrieval
Image indexing
Image duplicate removal
Basically, each image is processed (indexed) to produce an "image signature". Similar images have similar signatures. If your images are just rescaled, then their signatures are probably nearly identical, so they cluster well. Some popular signatures are the MPEG-7 descriptors. For clustering, I think k-means or any of its variants may be enough.
However, if you need to deal with millions of images, this may be a problem.
Here is a link to the main Wikipedia entry:
http://en.wikipedia.org/wiki/CBIR
Hope this helps.
Image similarity is probably a sub-field of image processing/AI.
Be prepared to implement algorithms/formulae from papers if you're looking for an excellent (i.e. performant and scalable) solution.
If you want something quick n dirty, search google for Image Similarity
Here's a C# image similarity app that might do what you want.
Basically, all algorithms extract and compare features. How they define "feature" depends on the math model they're based on.
A quick hack at this is to write a program that calculates the average greyscale pixel value of each image, sorts by that value, and then lets you compare them visually. Very similar images should occur near each other in the sorted order.
You will need a command line tool to deal with so much data.
Comparing every possible pair of images will not scale to such a large set of images. You need to sort the entire set of images according to some metric so that further comparisons are only needed on neighbouring images.
An example of a simple metric is the average value of all of the pixels in an image, expressed as a single greyscale value. This should work only if the duplicates have not had any visual alterations; using a lossy file format can also result in visual alterations.
Thinking outside the box, you may be able to use image metadata to narrow down your dataset. For example, your images may have fields showing the date and time the image was taken, down to the nearest second. Duplicates are likely to have identical values. A tool such as exiv2 could be used to dump this data to a more convenient and sortable text format (with a little knowledge of batch/shell scripting). Even fields such as the camera manufacturer and model could be used to reduce a set of 1,000,000 images to, say, 100 sets of 10,000 images, a significant improvement.
The gqview program has an option for finding duplicates, so you might try looking there. However, it's not foolproof, so it'd only be suitable as a heuristic to present duplicates to a human, for manual confirmation.
The most important part is to make the files comparable.
A generic solution might be to scale all images to a certain fixed size and greyscale. Then save the resulting images in a separate directory with same name for later reference. It would then be possible to sort by filesize and visually compare neighboring entries.
The resulting pictures could then be quantified in certain ways to detect similarities programmatically (averaging of blocks, lines, etc.).
I would imagine the most scalable method would be to store a fingerprint with each image. Then when a new image is added, it's a simple case of SELECT id FROM photos WHERE fingerprint='uploaded_image_fingerprint' to check for duplicates (or fingerprint all the existing images first, then run a query for duplicates).
Obviously a simple file hash wouldn't work, as the actual content differs.
Acoustic fingerprinting/this paper may be a good start on the concept, as there are many implementations of this. Here is a paper on image fingerprinting.
That said, you may be able to get away with something simpler. Something as basic as resizing the images to equal width or height, subtracting image_a from image_b, and summing the differences. If the total difference is below a threshold, the images are duplicates.
The problem with this is that you need to compare every image to every other one, so the time required grows quadratically with the number of images.
If you can come up with a way of comparing images that obeys the triangle inequality (e.g., if d(a,b) is the difference between images a and b, then d(a,b) < d(a,c) + d(b,c) for all a, b, c), then a BK-tree would be an effective way of indexing the images such that you can find matches in O(log n) time instead of O(n) time per image.
If your matches are restricted to the same image after varying amounts of compression/resizing/etc, then converting to some canonical size/color balance/etc and simply summing the squares-of-differences of each pixel may be a good metric, and this obeys the triangle inequality, so you could use a BK-tree for efficient access.
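A minimal generic BK-tree sketch in C#, assuming an integer-valued metric that obeys the triangle inequality (such as the sum of squared pixel differences between two canonically resized greyscale images); the class and method names are illustrative only:

```csharp
using System;
using System.Collections.Generic;

public class BkTree<T>
{
    private readonly Func<T, T, int> _distance;
    private Node _root;

    private class Node
    {
        public T Item;
        public Dictionary<int, Node> Children = new Dictionary<int, Node>();
    }

    public BkTree(Func<T, T, int> distance) { _distance = distance; }

    public void Add(T item)
    {
        if (_root == null) { _root = new Node { Item = item }; return; }

        Node current = _root;
        while (true)
        {
            int d = _distance(item, current.Item);
            Node child;
            if (!current.Children.TryGetValue(d, out child))
            {
                current.Children[d] = new Node { Item = item };
                return;
            }
            current = child;
        }
    }

    // Return every stored item within "radius" of the query.
    public List<T> Search(T query, int radius)
    {
        var results = new List<T>();
        if (_root == null) return results;

        var stack = new Stack<Node>();
        stack.Push(_root);
        while (stack.Count > 0)
        {
            Node node = stack.Pop();
            int d = _distance(query, node.Item);
            if (d <= radius) results.Add(node.Item);

            // The triangle inequality lets us skip subtrees whose edge label
            // lies outside [d - radius, d + radius].
            foreach (var kv in node.Children)
                if (kv.Key >= d - radius && kv.Key <= d + radius)
                    stack.Push(kv.Value);
        }
        return results;
    }
}
```

Insertion compares the new item against one node per level, and search only descends into subtrees whose edge distance is within the search radius, which is where the triangle inequality does the work.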
If you have a little bit of money to spend, and maybe once you run a first pass to determine which images are maybe matches, you could write a test for Amazon's Mechanical Turk.
https://www.mturk.com/mturk/welcome
Essentially, you'd create a small widget that AMT would show to real human users, who would then basically just have to answer the question "Are these two images the same?". Or you could show them a grid of, say, 5x5 images and ask "Which of these images match?". You'd then collect the data.
Another approach would be to use the principles of human computation, most famously espoused by Luis von Ahn (http://www.cs.cmu.edu/~biglou/) with reCAPTCHA, which uses CAPTCHA answers to determine unreadable words that have been run through optical character recognition, thus helping to digitize books. You could make a CAPTCHA that asks users to help refine the images.
It sounds like a procedural problem rather than a programming problem. Who uploads the photos, you or the customers? If you are uploading the photos, standardize the dimensions to a fixed scale and file format; that way comparisons will be easier. However, as it stands, unless you have days or even weeks of free time, I suggest that you instead remove the duplicate images manually, either yourself or with your team, by visually comparing them.
Perhaps you could also group the images by location, since they are tourist images.
