Before I start, I must say that for those with a background of linear algebra, this is NOT matrix decomposition as you know it. Please read the following paragraphs to get a clearer understanding of the problem I am trying to solve.
Here are the salient properties/definitions of the matrix and its submatrices:
I have an SxP matrix which forms a grid like structure of S.P "boxes". This is the main matrix.
This is what the (empty) main matrix looks like. Each square in the matrix is simply referred to as a box. The matrix can be viewed as a a kind of "gameboard" e.g. a chess board. The vertical axis is measured using an interval scale (i.e. real numbers), and the horizontal axis is measured using monotonically increasing non-negative integers.
There is an additional concept of submatrices (as explained earlier). A submatrix is simply a collection of boxes in a particular configuration, and with specific numbers and piece types (see black and white pieces below), assigned to the boxes. I have a finite set of these sub matrices - which I refer to as my lexicon or vocabulary for carrying out valid matrix composition/decompositions.
The "formal" definition of a sub matrix is that it is a configuration of M boxes contained within the main matrix, that satisfy the criteria:
1 <=M<= 4
the "gap" G (i.e. distance) between any two adjacent boxes satisfies: 1<= G<= 2*(vertical units).
A vertical unit is the gap between the vertical axis lines in the main matrix. In the image below, the vertical unit is 100.
The image immediately above illustrates a simple sub matrix addition. The units with orange boarders/boxes are sub matrices - the recognized units that form part of my lexicon. You will notice that I have introduced further annotation in my sub matrices. This is because (using the chess analogy), I have two types of pieces I can use on the board. B means a black piece, and W (not shown in the image), represents a white piece. A recognized unit (or lexeme/sub matrix) There is a simple equivalence relation that allows conversion between a white piece and a black piece. This relationship can be used to further decompose a submatrix to use either exclusively black pieces, white pieces or a combination of both.
For the sake of simplicity, I have omitted specifying the equivalence relationship. However, if someone feels that the problem as posed is not "too difficult" without additional details, I shall gladly broaden the scope. For now, I am trying to keep things as simple as possible, to avoid confusing people with "information overload".
Each box in a sub matrix contains a signed integer, indicating a number of units of an item. Each "configuration" of boxes (along with its signed integers and piece type i.e. black or white pieces) is said to be a "recognized unit".
Submatrices can be placed in the main matrix in a way such that they overlap. Wherever the "boxes" overlap, the number of units in the resulting submatrix box is the sum of the number of units in the constituent boxes (as illustrated in the second image above).
The problem becomes slightly difficult because, the "recognized units" defined above themselves are sometimes combined with other "recognized units" to form another "recognized unit" - i.e. the sub matrices (i.e.recognized units) are "holons". For example, in the second image above, the recognized unit being added to the matrix can itself be further decomposed into "smaller" submatrices.
This sort of holarchy is similar to how (in Physical chemistry), elements form compounds, which then go on to form ever more complicated compounds (amino acids, proteins etc).
Back to our problem, given a main matrix M, I want to be able to do the following:
i. identify the submatrices (or recognized units) that are contained within the main matrix. This is the first "matrix decomposition". (Note: a submatrix has to satisfy the criteria given above)
ii. For each identified submatrix, I want to be able to recognize whether it can be decomposed further into 2 or more recognized submatrices. The idea is to iteratively decompose submatrices found in step i above, until either a specified hierarchy level is reached, or until we have a finite set of submatrices that can not be decomposed further.
I am trying to come up with an algorithm to help me do (i) and (ii) above. I will implement the logic in either C++, Python or C# (in increasing level of preference), depending on which ever is the easiest to do and/or in which I happen to get snippets to get me started in implementing the algorithm.
I am not sure if i have a understand correctly the problem.
So first ypu want to find all submatrixes that conform with your 2 criterea.
Thats like a graph decomposition problem or a set coverage problem i think, where you can have a recursive function and iterate the matrix to find all available submatrixes.
enum PieceTypes
{
White,
Black
}
class Box
{
public PieceTypes PieceType { get; set; }
public uint Units { get; set; }
public int s, p;
public Box(PieceTypes piecetype, uint units)
{
PieceType = piecetype;
Units = units;
}
}
class Matrix
{
public Box[,] Boxes;
public int Scale, S, P, MaxNum, MaxDist;
public List<List<Box>> Configurations;
public Matrix(int s, int p, int scale, int maxnum, int maxdist)
{
S = s;
P = p;
Scale = scale;
Boxes = new Box[S, P];
MaxNum = maxnum;
MaxDist = maxdist;
Configurations = new List<List<Box>>();
}
public void Find(List<Box> Config, int s, int p)
{
// Check the max number thats valid for your configuration
// Check that the current p and s are inside matrix
if (Config.Count() < MaxNum && s >= 0 && s < S && p >= 0 && p < P)
{
foreach (Box b in Config)
{
if (Valid(b, Boxes[s, p]))
{
Boxes[s, p].s = s;
Boxes[s, p].p = p;
Config.Add(Boxes[s, p]);
break;
}
}
Find(Config, s + 1, p);
Find(Config, s - 1, p);
Find(Config, s, p + 1);
Find(Config, s, p - 1);
}
if (Config.Count() > 0) Configurations.Add(Config);
Config.Clear();
}
public bool Valid(Box b1, Box b2)
{
// Create your dist funtion here
// or add your extra validation rules like the PieceType
if (Math.Sqrt((b1.s - b2.s) ^ 2 + (b1.p - b2.p) ^ 2) <= MaxDist && b1.PieceType == b2.PieceType) return true;
else return false;
}
}
I haven't used the best data structures and i have simplified the solution. I hope its some way helpful.
Related
I'm trying to get the corners of the following shape:
By corners I mean this (red dots):
The minimum quantity of points that can define this shape.
And I have implemented the following:
public Shape Optimize()
{
// If the vertices are null or empty this can't be executed
if (vertices.IsNullOrEmpty())
return this; // In this case, return the same instance.
if (!edges.IsNullOrEmpty())
edges = null; //Reset edges, because a recalculation was requested
// The corners available on each iteration
var corners = new Point[] { Point.upperLeft, Point.upperRight, Point.downLeft, Point.downRight };
//The idea is to know if any of the following or previous vertice is inside of the the array from upside, if it is true then we can add it.
Point[] vs = vertices.ToArray();
for (int i = 0; i < vertices.Count - 1; ++i)
{
Point backPos = i > 0 ? vs[i - 1] : vs[vertices.Count - 1],
curPos = vs[i], //Punto actual
nextPos = i < vertices.Count - 1 ? vs[i + 1] : vs[0];
// We get the difference point between the actual point and the back & next point
Point backDiff = backPos - curPos,
nextDiff = nextPos - curPos,
totalDiff = nextPos - backPos;
if (corners.Contains(backDiff) || corners.Contains(nextDiff) || corners.Contains(totalDiff))
AddEdge(curPos, center); // If any of the two points are defined in the corners of the point of before or after it means that the actual vertice is a edge/corner
}
return this;
}
This works rectangled shapes, but rotated shapes are very sharp, so, this code doesn't work well:
Blue pixels (in this photo and the following) are the vertices variable processed on Optimize method.
Green pixels are the detected corners/edges (on both photos).
But sharpness in a shape only defines the side inclination, so what can I do to improve this?
Also, I have tested Accord.NET BaseCornersDetector inherited classes, but the best result is obtained with HarrisCornersDetector, but:
Many edges/corners are innecesary, and they aren't in the needed place (see first photo).
Well, after hours of research I found a library called Simplify.NET that internally runs the Ramer–Douglas–Peucker algorithm.
Also, you maybe interested on the Bresenham algorithm, with this algorithm you can draw a line using two Points.
With this algorithm, you can check if your tolerance is too high, comparing the actual points and the points that this algorithm outputs and making some kind of percentage calculator of similarity.
Finally, is interesting to mention Concave Hull and Convex Hull algorithms.
All this is related to Unity3D.
My outputs:
And my implementation.
It's very important to say, that points needs to be sorted forcing them to be connected. If the shape is concave as you can see on the second photo maybe you need to iterate walls of the shape.
You can see an example of implementation here. Thanks to #Bunny83.
I want to create a 2D map of tiles. Example:
Cell[,] cells;
for(int x = 0; x < columns; x++)
{
for(int y = 0; y < rows; y++)
{
cells[x, y] = new Cell();
}
}
The first cell would be at (0|0). What if I want to have this cell as my center and create new cells on the left and top side? These cells would have negative indices.
One way to fix this would be a value that determines the maximum length of one direction. Having a map of 100 tiles per side would place the center of the map at (50|50).
Let's say there would be no hardware limitations and no maximum length per side, what is the best way to create a 2D map with a (0|0) center? I can't image a better way than accessing a cell by its x and y coordinate in a 2D array.
Well, Arrays are logical constructs, not physical ones.
This means that the way we look at the the 0,0 as the top left corner, while might help visualize the content of a 2-D array (and in fact, a 2-D array is also somewhat of a visualization aid), is not accurate at all - the 0,0 "cell" is not a corner, and indexes are not coordinates, though it really helps to understand them when you think about them like they are.
That being said, there is nothing stopping you from creating your own class, that implement an indexer that can take both positive and negative values - in fact, according to Indexers (C# Programming Guide) -
Indexers do not have to be indexed by an integer value; it is up to you how to define the specific look-up mechanism.
Since you are not even obligated to use integers, you most certainly can use both positive and negative values as your indexer.
I was testing an idea to use a list of lists for storage and dynamically calculate the storage index based on the class indexer, but it's getting too late here and I guess I'm too tired to do it right. It's kinda like the solution on the other answer but I was attempting to do it without making you set the final size in the constructor.
Well, you can't use negative indices in an array or list, they're just not the right structure for a problem like this... You could, however, write your own class that handles something like this.
Simply pass in the size of the grid into the constructor, and then use the index operator to return a value based off of an an adjusted index... Something like this... Wrote it up really fast, so it probably isn't ideal in terms of optimization.
public class Grid<T> {
T[,] grid { get; }
int adjustment { get; }
int FindIndex(int provided) {
return provided + adjustment;
}
public Grid(int dimension) {
if (dimension <= 0)
throw new ArgumentException("Grid dimension cannot be <= 0");
if (dimension % 2 != 0)
throw new ArgumentException("Grid must be evenly divisible");
adjustment = dimension / 2;
grid = new T[dimension, dimension];
}
public T this[int key, int key2] {
get {
return grid[FindIndex(key), FindIndex(key2)];
}
set {
grid[FindIndex(key), FindIndex(key2)] = value;
}
}
}
I used these to test it:
var grid = new Grid<int>(100);
grid[-50, -50] = 5;
grid[0, 1] = 10;
You can just switch it to:
var grid = new Grid<Cell>(100);
This only works for a grid with equal dimensions... If you need separate dimensions, you'll need to adjust the constructor and the FindIndex method.
I think that an infinitely sized grid would be dangerous. If you increase the size to the right, you'd have to reposition the center.. Which means, what you think will be at 0,0 will now be shifted as the grid is no longer properly centered.
Additionally, performance of such a structure would be a nightmare as you cannot rely on an array to be infinite (as it inherently isn't). So you'd either have to continuously copy the array (like how a list works) or use a linked list.. If using a linked list, you would have to do enormous amounts of iteration to get whatever value you want.
I have implemented a version of the AdaBoost boosting algorithm, where I use decision stumps as weak learners. However often I find that after training the AdaBoost algorithm, a series of weak learners is created, such that this series is recurring in the whole set. For example, after training, the set of weak learners looks like A,B,C,D,E,D,E,D,E,D,E,F,E,D,E,D,E etc.
I believe I am updating the weights of the data properly after each assignment of a new weak learner. Here I classify each data point and then set the weight of this data point.
// After we have chosen the weak learner which reduces the weighted sum error by the most, we need to update the weights of each data point.
double sumWeights = 0.0f; // This is our normalisation value so we can normalise the weights after we have finished updating them
foreach (DataPoint dataP in trainData) {
int y = dataP.getY(); // Where Y is the desired output
Object[] x = dataP.getX();
// Classify the data input using the weak learner. Then check to see if this classification is correct/incorrect and adjust the weights accordingly.
int classified = newLearner.classify(x);
dataP.updateWeight(y, finalLearners[algorithmIt].getAlpha(), classified);
sumWeights += dataP.getWeight();
}
Here is my classify method in the WeakLearner class
// Method in the WeakLearner class
public int classify(Object[] xs) {
if (xs[splitFeature].Equals(splitValue))
return 1;
else return -1;
}
Then I have a method which updates the weight of a DataPoint
public void updateWeight(int y, double alpha, int classified) {
weight = (weight * (Math.Pow(e, (-y * alpha * classified))));
}
And I'm not sure why this is happening, are there any common factors why the same weak learners would generally be chosen?
You could increase the value of alpha and check. Maybe, not enough weight is being given to the misclassified samples, hence ,they are showing up again and again.
For each word I am creating an object of LocationTextExtractionStrategy class to get its coordinates but the problem is each time I pass a word it is returning coordinates of all the chunks of that word present in pdf. How can i get coordinates of the word present at specific position or in a specific line?
I found a code somewhere
namespace PDFAnnotater
{
public class RectAndText
{
public iTextSharp.text.Rectangle Rect;
public string Text;
public RectAndText(iTextSharp.text.Rectangle rect, string text)
{
this.Rect = rect;
this.Text = text;
}
}
public class MyLocationTextExtractionStrategy : LocationTextExtractionStrategy
{
public List<RectAndText> myPoints = new List<RectAndText>();
public string TextToSearchFor { get; set; }
public System.Globalization.CompareOptions CompareOptions { get; set; }
public MyLocationTextExtractionStrategy(string textToSearchFor, System.Globalization.CompareOptions compareOptions = System.Globalization.CompareOptions.None)
{
this.TextToSearchFor = textToSearchFor;
this.CompareOptions = compareOptions;
}
public override void RenderText(TextRenderInfo renderInfo)
{
base.RenderText(renderInfo);
var startPosition = System.Globalization.CultureInfo.CurrentCulture.CompareInfo.IndexOf(renderInfo.GetText(), this.TextToSearchFor, this.CompareOptions);
//If not found bail
if (startPosition < 0)
{
return;
}
var chars = renderInfo.GetCharacterRenderInfos().Skip(startPosition).Take(this.TextToSearchFor.Length).ToList();
//Grab the first and last character
var firstChar = chars.First();
var lastChar = chars.Last();
//Get the bounding box for the chunk of text
var bottomLeft = firstChar.GetDescentLine().GetStartPoint();
var topRight = lastChar.GetAscentLine().GetEndPoint();
//Create a rectangle from it
var rect = new iTextSharp.text.Rectangle(
bottomLeft[Vector.I1],
bottomLeft[Vector.I2],
topRight[Vector.I1],
topRight[Vector.I2]
);
this.myPoints.Add(new RectAndText(rect, this.TextToSearchFor));
}
}
}
I am passing words from an array to check for its coordinates. The problem is that RenderText() method is automatically called again and again for each chunk and returns the list of coordinates of the word present at different places in the pdf. For example if i need coordinate of '0' it is returning 23 coordinates. What should I do or modify in the code to get the exact coordinate of the word?
Your question is a bit confusing.
How can I get coordinates of the word present at specific position
In that statement you're basically saying "How can I get the coordinates of something that I already know the coordinates of?" Which is redundant.
I'm going to interpret your question as "How can I get the coordinates of a word, if I know the approximate location?"
I'm not familiar with C#, but I assume there are methods similar to the ones in Java for working with Rectangle objects.
Rectangle#intersects(Rectangle other)
Determines whether or not this Rectangle and the specified Rectangle intersect.
and
Rectangle#contains(Rectangle other)
Tests if the interior of the Shape entirely contains the specified Rectangle2D.
Then the code becomes trivially easy.
You use LocationTextExtractionStrategy to fetch all the iText based rectangles
you convert them to native rectangle objects (or write your own class)
for every rectangle you test whether the given search region contains that rectangle, keeping only those that are within the search region
If you want to implement your second use-case (getting the location of a word if you know the line) then there are two options:
you know the rough coordinates of the line
you want this to work given a line number
For option 1:
build a search region. Use the bounds of the page to get an idea of the width (since the line could stretch over the entire width), and add some margin y)-coordinates (to account for font differences, subscript and superscript, etc)
Now that you have a search region, this reverts to my earlier answer.
For option 2:
you already have the y coordinate of every word
round those (to the nearest multiple of fontsize)
build a Map where you keep track of how many times a certain y-coordinate is used
remove any statistical outliers
put all these values in a List
sort the list
This should give you a rough idea of where you can expect a given line(number) to be.
Of course, similar to my earlier explanation, you will need to take into account some padding and some degree of flexibility to get the right answer.
I have a function where I try to find a matching Point between 2 collections of 4 Points each, but sometimes the function reports the collections do not share a common Point even though in the debugger I see they do. is the debugger not showing me the full precision of the points so I do not see the difference? or is there something else going on here? here's the code to blame:
public static Point CorrectForAllowedDrawArea(Point previousDisplayLocation, Point newDisplayLocation, Rect displayLimitedArea, Rect newBoundingBox)
{
// get area that encloses both rectangles
Rect enclosingRect = Rect.Union(displayLimitedArea, newBoundingBox);
// get corners of outer rectangle, index matters for getting opposite corner
var outsideCorners = new[] { enclosingRect.TopLeft, enclosingRect.TopRight, enclosingRect.BottomRight, enclosingRect.BottomLeft }.ToList();
// get corners of inner rectangle
var insideCorners = new[] { displayLimitedArea.TopLeft, displayLimitedArea.TopRight, displayLimitedArea.BottomRight, displayLimitedArea.BottomLeft }.ToList();
// get the first found corner that both rectangles share
Point sharedCorner = outsideCorners.FirstOrDefault((corner) => insideCorners.Contains(corner));
// find the index of the opposite corner
int oppositeCornerIndex = (outsideCorners.IndexOf(sharedCorner) + 2) % 4;
on the last line 'sharedCorner' is sometimes set to default(Point) even though both Point collections appear to share 1 Point.
EDIT: I should mention if I place the debugger back to the top of the function and restart it still does not find the matching point. I should also mention that this function uses the Point class of the System.Windows namespace and not of the System.Drawing namespace! Thanks for pointing this out to me in the comments.
We really need to see what the definition of insideCorners.Contains(corner) is, but I suspect that your problem is due to the inherent inaccuracies with floating point numbers.
You cannot compare two floating point values like this:
if (a == b)
{
// Values are equal
}
especially if either a or b are calculated values.
You'll need to implement something along the lines of:
if (Math.Abs(a - b) < some_small_value)
{
// Values are equal
}