sorting strings containing numbers and letters

sorting strings containing numbers and letters - c#

I need to sort a list of scene numbers which are in fact a list of string and contain numbers and letters.
this is the list
101-11
102-1
101-10
101-8
103-10
101-8A
101-9
103-4
103-4B
I've made following a Comparer
public class SceneComparer : IComparer
{
private readonly Regex sceneRegEx = new Regex(#"(\D*)(\w*)", RegexOptions.Compiled);
public int Compare(object x, object y)
{
Scene sceneX = x as Scene;
Scene sceneY = y as Scene;
var firstSceneMatch = this.sceneRegEx.Match(sceneX.SceneNumber);
var firstSceneNumeric = Convert.ToInt32(firstSceneMatch.Groups[1].Value);
var firstSceneAlpha = firstSceneMatch.Groups[2].Value;
var secondSceneMatch = this.sceneRegEx.Match(sceneY.SceneNumber);
var secondSceneNumeric = Convert.ToInt32(secondSceneMatch.Groups[1].Value);
var secondSceneAlpha = secondSceneMatch.Groups[2].Value;
if (firstSceneNumeric < secondSceneNumeric)
{
return -1;
}
if (firstSceneNumeric > secondSceneNumeric)
{
return 1;
}
return string.CompareOrdinal(firstSceneAlpha, secondSceneAlpha);
}
}
Which gives me following result
101-8
101-8A
101-9
102-1
103-4
103-4B
101-10
101-11
103-10
It looks like it's only sorting the first number before the dash and the alphanumeric but it doesn't sort the number after the dash to get following desired result.
101-8
101-8A
101-9
101-10
101-11
102-1
103-4
103-4B
103-10
Any idea on how to get the desired result.

You have to Split the string and extract numbers in it to request sort on those fields.
// Assuming you have this...
List<string> list = new List<string>()
{
"101-11",
"102-1",
"101-10",
"101-8",
"103-10",
"101-8A",
"101-9",
"103-4",
"103-4B"
};
// You could do this.
var result = list.Select(x=>
{
var splits = x.Split('-');
return new
{
first = int.Parse(splits[0]),
second = int.Parse(Regex.Match(splits[1], #"\d+").Value),
item =x
};
})
.OrderBy(x=>x.first)
.ThenBy(x=>x.second)
.Select(x=>x.item)
.ToList();
Check this Demo

You are very close. Use Matches instead of Match.
Also, correct your Regex as \D captures non-digit characters.
public class SceneComparer : IComparer
{
private readonly Regex sceneRegEx = new Regex(#"(\d+)(\w+)?", RegexOptions.Compiled);
public int Compare(object x, object y)
{
Scene sceneX = x as Scene;
Scene sceneY = y as Scene;
var firstSceneMatches = this.sceneRegEx.Matches(sceneX.SceneNumber);
var secondSceneMatches = this.sceneRegEx.Matches(sceneY.SceneNumber);
var matchesCount = Math.Min(firstSceneMatches.Count, secondSceneMatches.Count);
for (var i = 0; i < matchesCount; i++)
{
var firstSceneMatch = firstSceneMatches[i];
var secondSceneMatch = secondSceneMatches[i];
var firstSceneNumeric = Convert.ToInt32(firstSceneMatch.Groups[1].Value);
var secondSceneNumeric = Convert.ToInt32(secondSceneMatch.Groups[1].Value);
if (firstSceneNumeric != secondSceneNumeric)
{
return firstSceneNumeric - secondSceneNumeric;
}
var firstSceneAlpha = firstSceneMatch.Groups[2].Value;
var secondSceneAlpha = secondSceneMatch.Groups[2].Value;
if (firstSceneAlpha != secondSceneAlpha)
{
return string.CompareOrdinal(firstSceneAlpha, secondSceneAlpha);
}
}
return firstSceneMatches.Count - secondSceneMatches.Count;
}
}

Related

C# examining and replacing tuple values based on other tuple

I'm starting with programming and C# and I have two tuples. One tuple is representing a list of points:
static List<(string, string, string)> PR { get; set; } = new List<(string, string, string)>()
{
("P1", "0", "0"),
("P2", "P1", "P1+Height"),
("P3", "P1+Width", "P2"),
("P4", "P3", "P3+Height")
};
where Item1 in the list of tuples stands for a Point name (P1, P2, P3, P4) and Item2 and Item3 represent a parametric formula for respectively the x- and y-value of a point.
"P1" in the second item in the above list should look for the tuple starting with "P1", and then for the second item in that tuple, in this case, 0.
I have a second list of tuples that represent the parameters that I need to calculate the above point values.
static List<(string, double)> PAR { get; set; } = new List<(string, double)>()
{
("Height", 500),
("Width", 1000)
};
Say I want to calculate the value of the parametric formula "P3+Height" as follows:
P3+Height --> P2 (+Height) --> P1+Height (+Height) --> 0 (+Height (+Height) --> 0 + Height + Height;
In the end I want to replace the parameter strings with the actual values (0 + 500 + 500 -> P3+Height = 1000) but thats of later concern.
Question: I'm trying to make a function that recursively evaluates the list of tuples and keeps the parameter names, but also looks for the corresponding tuple until we reach an end or exit situation. This is where I'm at now but I have a hard time getting my thought process in actual working code:
static void Main(string[] args)
{
//inputString = "P3+Height"
string inputString = PR[3].Item3;
string[] returnedString = GetParameterString(inputString);
#region READLINE
Console.ReadLine();
#endregion
}
private static string[] GetParameterString(string inputString)
{
string[] stringToEvaluate = SplitInputString(inputString);
for (int i = 0; i < stringToEvaluate.Length; i++)
{
//--EXIT CONDITION
if (stringToEvaluate[0] == "P1")
{
stringToEvaluate[i] = "0";
}
else
{
if (i % 2 == 0)
{
//Check if parameters[i] is point string
var value = PAR.Find(p => p.Item1.Equals(stringToEvaluate[i]));
//Check if parameters[i] is double string
if (double.TryParse(stringToEvaluate[i], out double result))
{
stringToEvaluate[i] = result.ToString();
}
else if (value == default)
{
//We have a point identifier
var relatingPR = PR.Find(val => val.Item1.Equals(stringToEvaluate[i])).Item2;
//stringToEvaluate[i] = PR.Find(val => val.Item1.Equals(pointId)).Item2;
stringToEvaluate = SearchParameterString(relatingPR);
}
else
{
//We have a parameter identifier
stringToEvaluate[i] = value.Item2.ToString();
}
}
}
}
return stringToEvaluate;
}
private static string[] SplitInputString(string inputString)
{
string[] splittedString;
splittedString = Regex.Split(inputString, Delimiters);
return splittedString;
}
Can anyone point me in the right direction of how this could be done with either recursion or some other, better, easier way?
In the end, I need to get a list of tuples like this:
("P1", "0", "0"),
("P2", "0", "500"),
("P3", "1000", "500"),
("P4", "1000", "1000")
Thanks in advance!

I wrote something that does this - I changed a bit of the structure to simplify the code and runtime, but it still returns the tuple you expect:
// first I used dictionaries so we can search for the corresponding value efftiantly:
static Dictionary<string, (string width, string height)> PR { get; set; } =
new Dictionary<string, (string width, string height)>()
{
{ "P1", ("0", "0") },
{ "P2", ("P1", "P1+Height")},
{ "P3", ("P1+Width", "P2") },
{ "P4", ("P3", "P3+Height") }
};
static Dictionary<string, double> PAR { get; set; } = new Dictionary<string, double>()
{
{ "Height", 500 },
{ "Width", 1000 }
};
static void Main(string[] args)
{
// we want to "translate" each of the values height and width values
List<(string, string, string)> res = new List<(string, string, string)>();
foreach (var curr in PR)
{
// To keep the code generic we want the same code to run for height and width-
// but for functionality reasons we need to know which it is - so sent it as a parameter.
res.Add((curr.Key,
GetParameterVal(curr.Value.width, false).ToString(),
GetParameterVal(curr.Value.height, true).ToString()));
}
#region READLINE
Console.ReadLine();
#endregion
}
private static double GetParameterVal(string inputString, bool isHeight)
{
// for now the only delimiter is + - adapt this and the code when \ as needed
// this will split string with the delimiter ("+height", "-500", etc..)
string[] delimiters = new string[] { "\\+", "\\-" };
string[] stringToEvaluate =
Regex.Split(inputString, string.Format("(?=[{0}])", string.Join("", delimiters)));
// now we want to sum up each "part" of the string
var sum = stringToEvaluate.Select(x =>
{
double result;
int factor = 1;
// this will split from the delimiter, we will use it as a factor,
// ["+", "height"], ["-", "500"] etc..
string[] splitFromDelimiter=
Regex.Split(x, string.Format("(?<=[{0}])", string.Join("|", delimiters)));
if (splitFromDelimiter.Length > 1) {
factor = int.Parse(string.Format($"{splitFromDelimiter[0]}1"));
x = splitFromDelimiter[1];
}
if (PR.ContainsKey(x))
{
// if we need to continue recursively translate
result = GetParameterVal(isHeight ? PR[x].height : PR[x].width, isHeight);
}
else if (PAR.ContainsKey(x))
{
// exit condition
result = PAR[x];
}
else
{
// just in case we didnt find something - we should get here
result = 0;
}
return factor * result;
}).Sum();
return sum;
}
}
}
I didnt add any validity checks, and if a value wasn't found it recieves a val of 0, but go ahead and adapt it to your needs..

Here a a working example for your question... It took me a lot of time so I hope you appreciate it: The whole code is comented line by line. If you have any question do not hesitate to ask me !
First of all we create a class named myEntry that will represent an entry. The name has to be unique e.g P1, P2, P3
public class myEntry
{
public string Name { get; private set; } //this field should be unique
public object Height { get; set; } //Can contain a reference to another entry or a value also as many combinations of those as you want.
// They have to be separated with a +
public object Width { get; set; } //same as for height here
public myEntry(string name, object height, object width)
{
//Set values
this.Name = name;
this.Height = height;
this.Width = width;
}
}
Now I create a dummy Exception class for an exception in a further class (you will see the use of this further on. Just ignore it for now)
public class UnknownEntry : Exception
{
//Create a new Class that represents an exception
}
Now we create the important class that will handle the entries and do all the work for us. This might look complicated but if you don't want to spend time understanding it you can just copy paste it, its a working solution!
public class EntryHolder
{
private Dictionary<string, double> _par = new Dictionary<string, double>(); //Dictionary holding our known variables
private List<myEntry> _entries; //List holding our entries
public EntryHolder()
{
_entries = new List<myEntry>(); //Create list
//Populate dictionary
_par.Add("Height", 500);
_par.Add("Width", 1000);
}
public bool Add(myEntry entry)
{
var otherEntry = _entries.FirstOrDefault(x => x.Name.Equals(entry.Name)); //Get entry with same name
if(otherEntry != null)
{
//Entry with the same name as another entry
//throw new DuplicateNameException(); //Throw an exception if you want
return false; //or just return false
}
//Entry to add is valid
_entries.Add(entry); //Add entry
return true; //return success
}
public void Add(List<myEntry> entries)
{
foreach (var entry in entries) //Loop through entries
{
Add(entry);
}
}
public myEntry GetEntry(string uniqueName)
{
var entry = GetRawEntry(uniqueName); //Get raw entry
var heightToCalculate = entry.Height.ToString(); //Height to calculate to string
var widthToCalculate = entry.Width.ToString(); //Width to calculate to string
entry.Height = Calculate(heightToCalculate, true); //Calculate height
entry.Width = Calculate(widthToCalculate, false); //Calculate width
return entry; //return entry
}
public List<myEntry> CalculateAllEntries()
{
List<myEntry> toReturn = new List<myEntry>(); //Create list that we will return after the calculation finished
foreach (var entryToCalculate in _entries) //Loop through all entries
{
toReturn.Add(GetEntry(entryToCalculate.Name)); //calculate entry values and add them to the list we will return after
}
return toReturn; //return list after the whole calculation finished
}
private double Calculate(string toCalculate, bool isHeight)
{
if (!toCalculate.Contains("+"))
{
//String doesn't contain a + that means it has to be a number or a key in our dictionary
object toConvert = toCalculate; //Set the object we want to convert to double
var entryCorrespondingToThisValue = _entries.FirstOrDefault(x => x.Name.Equals(toCalculate)); //Check if the string is a reference to another entry
if (entryCorrespondingToThisValue != null)
{
//It is the name of another object
toConvert = isHeight ? entryCorrespondingToThisValue.Height : entryCorrespondingToThisValue.Width; //Set object to convert to the height or width of the object in entries
}
try
{
return Convert.ToDouble(toConvert); //try to convert and return if success
}
catch (Exception e)
{
//the given height object has the wrong format
//Format: (x + Y + z ...)
throw new FormatException();
}
}
//Contains some +
var spitedString = toCalculate.Split(new char[] {'+'}); //Split
double sum = 0d; //Whole sum
foreach (var splited in spitedString) //Loop through all elements
{
double toAdd = 0; //To add default = 0
if (_par.ContainsKey(splited)) //Check if 'splited' is a key in the dictionary
{
//part of object is in the par dictionary so we get the value of it
toAdd = _par[splited]; //get value corresponding to key in dictionary
}
else
{
//'splited' is not a key in the dictionary
object toConvert = splited; //set object to convert
var entryCorrespondingToThisValue = _entries.FirstOrDefault(x => x.Name.Equals(splited)); //Does entries contain a reference to this value
if (entryCorrespondingToThisValue != null)
{
//It is a reference
toConvert = isHeight ? entryCorrespondingToThisValue.Height : entryCorrespondingToThisValue.Width; //Set to convert to references height or width
}
try
{
toAdd = Convert.ToDouble(toConvert); //Try to convert object
}
catch (Exception e)
{
//A part of the given height is not a number or is known in the par dictionary
throw new FormatException();
}
}
sum += toAdd; //Add after one iteration
}
return sum; //return whole sum
}
public myEntry GetRawEntry(string uniqueName)
{
var rawEntry = _entries.FirstOrDefault(x => x.Name.Equals(uniqueName)); //Check for entry in entries by name (unique)
if (rawEntry == null)
{
//Entry is not in the list holding all entries
throw new UnknownEntry(); //throw an exception
return null; //Or just return null
}
return rawEntry; //return entry
}
}
And here the end, the test and prove that it works:
public void TestIt()
{
List<myEntry> entries = new List<myEntry>()
{
new myEntry("P1", 0, 0),
new myEntry("P2", "P1", "P1+Height"),
new myEntry("P3", "P1+Height", "P2"),
new myEntry("P4", "P3", "P3+Height"),
};
EntryHolder myEntryHolder = new EntryHolder();
myEntryHolder.Add(entries);
var calculatedEntries = myEntryHolder.CalculateAllEntries();
}
Here an image of how it looks like:

C# Dictionary, Sum of value from one key?

I have a Dictionary and use it as a save game.
public Dictionary<string, inventoryvars> inventar = new Dictionary<string, inventoryvars>();
public bool Additem(string Planetname, int WWlvl, int AKlvl, int BaGebLvl, int UwLvl, int Exlvl)
{
inventoryvars ip = new inventoryvars();
if (!inventar.ContainsKey(Planetname))
{
ip.name = Name;
ip.WWlvl = WWlvl;
ip.AKlvl = AKlvl;
ip.UwLvl = UwLvl;
ip.Exlvl = Exlvl;
inventar.Add(name, ip);
return true;
}
else
{
inventar[name].anzahl += 1;
return true;
}
return true;
}
Now i need to get the sum of all Exlvl. Lets say, there are 5 items, every item has Exlvl with a different value.
Sorry for my english, it's not my first language.
The Solution is: inventar.Sum(x => x.Value.BaGebLvl);
Thanks everybody!

You can use the Values property to get all instances of inventoryvars and use LINQ Sum() against them
var result = inventar.Values.Sum(x => x.Exlvl)
(OR)
var result = inventar.Sum(x => x.Value.Exlvl)

you can make a funtion that return the sum of keys in your collection?
public int sumOfKeysValues(public Dictionary input)
{ int result = 0;
foreach(var item in input)
{
var key = (inventoryvars)item.key;
result += key.Exlvl;
}
}

Check if an stl file may contain two models

An stl file may contain 2 3D models. Is there any way I can detect if there are 2 or more models stored in one stl file?
In my current code, it can detect that there are 2 models in the example, but there are instances that it detects a lot of model even though it only has one.
The Triangle class structure has Vertices that contains 3 points (x, y, z)..
Sample STL File:
EDIT: Using #Gebb's answer this is how I implemented it:
private int GetNumberOfModels(List<TopoVertex> vertices)
{
Vertex[][] triangles = new Vertex[vertices.Count() / 3][];
int vertIdx = 0;
for(int i = 0; i < vertices.Count() / 3; i++)
{
Vertex v1 = new Vertex(vertices[vertIdx].pos.x, vertices[vertIdx].pos.y, vertices[vertIdx].pos.z);
Vertex v2 = new Vertex(vertices[vertIdx + 1].pos.x, vertices[vertIdx + 1].pos.y, vertices[vertIdx + 1].pos.z);
Vertex v3 = new Vertex(vertices[vertIdx + 2].pos.x, vertices[vertIdx + 2].pos.y, vertices[vertIdx + 2].pos.z);
triangles[i] = new Vertex[] { v1, v2, v3 };
vertIdx += 3;
}
var uniqueVertices = new HashSet<Vertex>(triangles.SelectMany(t => t));
int vertexCount = uniqueVertices.Count;
// The DisjointUnionSets class works with integers, so we need a map from vertex
// to integer (its id).
Dictionary<Vertex, int> indexedVertices = uniqueVertices
.Zip(
Enumerable.Range(0, vertexCount),
(v, i) => new { v, i })
.ToDictionary(vi => vi.v, vi => vi.i);
int[][] indexedTriangles =
triangles
.Select(t => t.Select(v => indexedVertices[v]).ToArray())
.ToArray();
var du = new XYZ.view.wpf.DisjointUnionSets(vertexCount);
// Iterate over the "triangles" consisting of vertex ids.
foreach (int[] triangle in indexedTriangles)
{
int vertex0 = triangle[0];
// Mark 0-th vertexes connected component as connected to those of all other vertices.
foreach (int v in triangle.Skip(1))
{
du.Union(vertex0, v);
}
}
var connectedComponents =
new HashSet<int>(Enumerable.Range(0, vertexCount).Select(x => du.Find(x)));
return connectedComponents.Count;
}
In some cases, it produces the correct output, but for the example image above, it outputs 3 instead of 2. I am now trying to optimize the snippet #Gebb gave to use float values since I believe that the floating points are necessary to the comparisons. Does anyone have a way to do that as well? Maybe I need another perspective.

You could do this by representing vertices and connections between them as a graph and finding the number of connected components of the graph with the help of the Disjoint-set data structure.
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using Vertex = System.ValueTuple<double,double,double>;
namespace UnionFindSample
{
internal class DisjointUnionSets
{
private readonly int _n;
private readonly int[] _rank;
private readonly int[] _parent;
public DisjointUnionSets(int n)
{
_rank = new int[n];
_parent = new int[n];
_n = n;
MakeSet();
}
// Creates n sets with single item in each
public void MakeSet()
{
for (var i = 0; i < _n; i++)
// Initially, all elements are in
// their own set.
_parent[i] = i;
}
// Finds the representative of the set
// that x is an element of.
public int Find(int x)
{
if (_parent[x] != x)
{
// if x is not the parent of itself, then x is not the representative of
// his set.
// We do the path compression by moving x’s node directly under the representative
// of this set.
_parent[x] = Find(_parent[x]);
}
return _parent[x];
}
// Unites the set that includes x and
// the set that includes x
public void Union(int x, int y)
{
// Find representatives of two sets.
int xRoot = Find(x), yRoot = Find(y);
// Elements are in the same set, no need to unite anything.
if (xRoot == yRoot)
{
return;
}
if (_rank[xRoot] < _rank[yRoot])
{
// Then move x under y so that depth of tree remains equal to _rank[yRoot].
_parent[xRoot] = yRoot;
}
else if (_rank[yRoot] < _rank[xRoot])
{
// Then move y under x so that depth of tree remains equal to _rank[xRoot].
_parent[yRoot] = xRoot;
}
else
{
// if ranks are the same
// then move y under x (doesn't matter which one goes where).
_parent[yRoot] = xRoot;
// And increment the result tree's
// rank by 1
_rank[xRoot] = _rank[xRoot] + 1;
}
}
}
internal class Program
{
private static void Main(string[] args)
{
string file = args[0];
Vertex[][] triangles = ParseStl(file);
var uniqueVertices = new HashSet<Vertex>(triangles.SelectMany(t => t));
int vertexCount = uniqueVertices.Count;
// The DisjointUnionSets class works with integers, so we need a map from vertex
// to integer (its id).
Dictionary<Vertex, int> indexedVertices = uniqueVertices
.Zip(
Enumerable.Range(0, vertexCount),
(v, i) => new {v, i})
.ToDictionary(vi => vi.v, vi => vi.i);
int[][] indexedTriangles =
triangles
.Select(t => t.Select(v => indexedVertices[v]).ToArray())
.ToArray();
var du = new DisjointUnionSets(vertexCount);
// Iterate over the "triangles" consisting of vertex ids.
foreach (int[] triangle in indexedTriangles)
{
int vertex0 = triangle[0];
// Mark 0-th vertexes connected component as connected to those of all other vertices.
foreach (int v in triangle.Skip(1))
{
du.Union(vertex0, v);
}
}
var connectedComponents =
new HashSet<int>(Enumerable.Range(0, vertexCount).Select(x => du.Find(x)));
int count = connectedComponents.Count;
Console.WriteLine($"Number of connected components: {count}.");
var groups = triangles.GroupBy(t => du.Find(indexedVertices[t[0]]));
foreach (IGrouping<int, Vertex[]> g in groups)
{
Console.WriteLine($"Group id={g.Key}:");
foreach (Vertex[] triangle in g)
{
string tr = string.Join(' ', triangle);
Console.WriteLine($"\t{tr}");
}
}
}
private static Regex _triangleStart = new Regex(#"^\s+outer loop");
private static Regex _triangleEnd = new Regex(#"^\s+endloop");
private static Regex _vertex = new Regex(#"^\s+vertex\s+(\S+)\s+(\S+)\s+(\S+)");
private static Vertex[][] ParseStl(string file)
{
double ParseCoordinate(GroupCollection gs, int i) =>
double.Parse(gs[i].Captures[0].Value, CultureInfo.InvariantCulture);
var triangles = new List<Vertex[]>();
bool isInsideTriangle = false;
List<Vertex> triangle = new List<Vertex>();
foreach (string line in File.ReadAllLines(file))
{
if (isInsideTriangle)
{
if (_triangleEnd.IsMatch(line))
{
isInsideTriangle = false;
triangles.Add(triangle.ToArray());
triangle = new List<Vertex>();
continue;
}
Match vMatch = _vertex.Match(line);
if (vMatch.Success)
{
double x1 = ParseCoordinate(vMatch.Groups, 1);
double x2 = ParseCoordinate(vMatch.Groups, 2);
double x3 = ParseCoordinate(vMatch.Groups, 3);
triangle.Add((x1, x2, x3));
}
}
else
{
if (_triangleStart.IsMatch(line))
{
isInsideTriangle = true;
}
}
}
return triangles.ToArray();
}
}
}
I'm also using the fact that System.ValueTuple implements Equals and GetHashCode in an appropriate way, so we can easily compare vertices (this is used implicitly by HashSet) and use them as keys in a dictionary.

Substract shortest string containing all search criteria

I have a problem to solve where given a string source and a collection of search criteria criteria, the algorithm has to return the shortest possible substring of source that contains all items of criteria.
=================================
UPDATE
The same search criteria might be in the source string multiple
times. In that case, it is required to return the sub-string
containing the particular instance of the search criteria such that
it is the shortest among all possible sub-strings.
The search items can contain spaces in them such as hello world
The order in which the search criteria are found does not matter as long as they are all in the resultant sub-string
==================================
String source = "aaa wwwww fgffsd ththththt sss sgsgsgsghs bfbfb hhh sdfg kkk dhdhtrherhrhrthrthrt ddfhdetehehe kkk wdwd aaa vcvc hhh zxzx sss nbnbn";
List<String> criteria = new List<string> { "kkk", "aaa", "sss", "hhh" };
The input above should return the following substring: kkk wdwd aaa vcvc hhh zxzx sss
Unfortunately, I spent a lot of time trying to write such an algorithm but I couldn't get it just right. Below is the code I have got so far:
public struct Extraction
{
public int Start { get; set; }
public int End { get; set; }
public int Length
{
get
{
var length = this.End - this.Start;
return length;
}
}
public Extraction(int start, int end)
{
this.Start = start;
this.End = end;
}
}
public class TextExtractor
{
private String _source;
private Dictionary<String, List<Int32>> _criteriaIndexes;
private Dictionary<String, int> _entryIndex;
public TextExtractor(String source, List<String> searchCriteria)
{
this._source = source;
this._criteriaIndexes = this.ExtractIndexes(source, searchCriteria);
this._entryIndex = _criteriaIndexes.ToDictionary(x => x.Key, v => 0);
}
public String Extract()
{
List<Extraction> possibleExtractions = new List<Extraction>();
int index = 0;
int min = int.MaxValue;
int max = 0;
bool shouldStop = false;
while (index < _criteriaIndexes.Count && !shouldStop)
{
Boolean compareWithAll = index == _criteriaIndexes.Count - 1;
if (!compareWithAll)
{
var current = _criteriaIndexes.ElementAt(index);
this.CalculateMinMax(current, ref min, ref max);
index++;
}
else
{
var entry = _criteriaIndexes.Last();
while (_entryIndex[entry.Key] < entry.Value.Count)
{
int a = min;
int b = max;
this.CalculateMinMax(entry, ref a, ref b);
_entryIndex[entry.Key]++;
Extraction ext = new Extraction(a, b);
possibleExtractions.Add(ext);
}
int k = index - 1;
while (k >= 0)
{
var prev = _criteriaIndexes.ElementAt(k);
if (prev.Value.Count - 1 > _entryIndex[prev.Key])
{
_entryIndex[prev.Key]++;
break;
}
else
{
k--;
}
}
shouldStop = _criteriaIndexes.All(x => x.Value.Count - 1 <= _entryIndex[x.Key]);
_entryIndex[entry.Key] = 0;
index = 0;
min = int.MaxValue;
max = 0;
}
}
Extraction shortest = possibleExtractions.First(x => x.Length.Equals(possibleExtractions.Min(p => p.Length)));
String result = _source.Substring(shortest.Start, shortest.Length);
return result;
}
private Dictionary<String, List<Int32>> ExtractIndexes(String source, List<String> searchCriteria)
{
Dictionary<String, List<Int32>> result = new Dictionary<string, List<int>>();
foreach (var criteria in searchCriteria)
{
Int32 i = 0;
Int32 startingIndex = 0;
var indexes = new List<int>();
while (i > -1)
{
i = source.IndexOf(criteria, startingIndex);
if (i > -1)
{
startingIndex = i + 1;
indexes.Add(i);
}
}
if (indexes.Any())
{
result.Add(criteria, indexes);
}
}
return result;
}
private void CalculateMinMax(KeyValuePair<String, List<int>> current, ref int min, ref int max)
{
int j = current.Value[_entryIndex[current.Key]];
if (j < min)
{
min = j;
}
int indexPlusWordLength = j + current.Key.Length;
if (indexPlusWordLength > max)
{
max = indexPlusWordLength;
}
}
}
I would appreciate it if someone could point out where did I go wrong in my algorithm. Moreover, I kinda feel this is a very naive implementation. Maybe there is a better approach to solve this problem than trying to try out combinations of indexes?
Thanks!

This is a much simpler algorithm that will give you the shortest substring.
void Main()
{
String source = "aaa wwwww fgffsd ththththt sss ww sgsgsgsghs bfbfb hhh sdfg kkk " +
"dhdhtrherhrhrthrthrt ddfhdetehehe kkk wdwd aaa vcvc hhh zxzx sss ww nbnbn";
List<String> criteria = new List<string> { "kkk", "aaa", "sss ww", "hhh" };
var result = GetAllSubstringContainingCriteria(source, criteria)
.OrderBy(sub => sub.Length).FirstOrDefault();
// result is "kkk wdwd aaa vcvc hhh zxzx sss ww"
}
private IEnumerable<string> GetAllSubstringContainingCriteria(
string source, List<string> criteria)
{
for (int i = 0; i < source.Length; i++)
{
var subString = source.Substring(i);
if (criteria.Any(crit => subString.StartsWith(crit)))
{
var lastWordIndex =
GetLastCharacterIndexFromLastCriteriaInSubstring(subString, criteria);
if (lastWordIndex >= 0)
yield return string.Join(" ", subString.Substring(0, lastWordIndex));
}
else
continue;
}
}
private int GetLastCharacterIndexFromLastCriteriaInSubstring(
string subString, List<string> criteria)
{
var results = criteria.Select(crit => new {
index = subString.IndexOf(crit),
criteria = crit});
return results.All(result => result.index >= 0)
? results.Select(result => result.index + result.criteria.Length).Max()
: -1;
}

Let the Java built-in classes do the work. How about converting your criteria to a regular expression Pattern. If the criteria are X or Y or Z . . ., convert this into a regular expression of the form "(X)|(Y)|(Z)|...", compile it, and execute it against the source string.
This, of course, returns the leftmost match. You could code a very straightforward loop that iterates across all occurrences, caches them, and chooses the shortest--or the leftmost shortest--or, if two or more are equally short, then all of those.

How to compare two "numbers" with multiple dots?

I have an unordered list that can look something like this:
1
2.2
1.1.1
3
When i sort the list, 1.1.1 becomes greater than 3 and 2.2, and 2.2 becomes greater than 3.
This is because Double.Parse removes the dots and makes it a whole number.
This is the method i use to sort with:
public class CompareCategory: IComparer<Category>
{
public int Compare(Category c1, Category c2)
{
Double cat1 = Double.Parse(c1.prefix);
Double cat2 = Double.Parse(c2.prefix);
if (cat1 > cat2)
return 1;
else if (cat1 < cat2)
return -1;
else
return 0;
}
}
How can i fix this?
Thanks

Are these version #s by chance? Can you use the Version class? It sorts each part as you seem to want, although it only works up to 4 parts. I would not recommend parsing into a numeric value like you are doing.
It has an IComparable interface. Assuming your inputs are strings, here's a sample:
public class CompareCategory: IComparer<Category>
{
public int Compare(Category c1, Category c2)
{
var cat1 = new Version(c1.prefix);
var cat2 = new Version(c2.prefix);
if (cat1 > cat2)
return 1;
else if (cat1 < cat2)
return -1;
else
return 0;
}
}
If you need something with more than 4 "parts", I think I would create a comparer which split the strings at the dots, and then parse each element as an integer and compare them numerically. Make sure to consider cases like 1.002.3 and 1.3.3 (what do you want the sort order to be?).
Update, here is a sample of what I mean. Lightly tested:
public class CategoryComparer : Comparer<Category>
{
public override int Compare(Category x, Category y)
{
var xParts = x.prefix.Split(new[] { '.' });
var yParts = y.prefix.Split(new[] { '.' });
int index = 0;
while (true)
{
bool xHasValue = xParts.Length > index;
bool yHasValue = yParts.Length > index;
if (xHasValue && !yHasValue)
return 1; // x bigger
if (!xHasValue && yHasValue)
return -1; // y bigger
if (!xHasValue && !yHasValue)
return 0; // no more values -- same
var xValue = decimal.Parse("." + xParts[index]);
var yValue = decimal.Parse("." + yParts[index]);
if (xValue > yValue)
return 1; // x bigger
if (xValue < yValue)
return -1; // y bigger
index++;
}
}
}
public static void Main()
{
var categories = new List<Category>()
{
new Category { prefix = "1" },
new Category { prefix = "2.2" },
new Category { prefix = "1.1.1" },
new Category { prefix = "1.1.1" },
new Category { prefix = "1.001.1" },
new Category { prefix = "3" },
};
categories.Sort(new CategoryComparer());
foreach (var category in categories)
Console.WriteLine(category.prefix);
}
Output:
1
1.001.1
1.1.1
1.1.1
2.2
3

public class CodeComparer : IComparer<string>
{
public int Compare(string x, string y)
{
var xParts = x.Split(new char[] { '.' });
var yParts = y.Split(new char[] { '.' });
var partsLength = Math.Max(xParts.Length, yParts.Length);
if (partsLength > 0)
{
for (var i = 0; i < partsLength; i++)
{
if (xParts.Length <= i) return -1;// 4.2 < 4.2.x
if (yParts.Length <= i) return 1;
var xPart = xParts[i];
var yPart = yParts[i];
if (string.IsNullOrEmpty(xPart)) xPart = "0";// 5..2->5.0.2
if (string.IsNullOrEmpty(yPart)) yPart = "0";
if (!int.TryParse(xPart, out var xInt) || !int.TryParse(yPart, out var yInt))
{
// 3.a.45 compare part as string
var abcCompare = xPart.CompareTo(yPart);
if (abcCompare != 0)
return abcCompare;
continue;
}
if (xInt != yInt) return xInt < yInt ? -1 : 1;
}
return 0;
}
// compare as string
return x.CompareTo(y);
}
}

Maybe you could just string compare it?

I'm surprised that Double.Parse doesn't throw an exception with those numbers with more than one decimal place.
You really need to write some rules about how to compare these strings.
I would split the strings using String.Split() on the dot character, then iterate through the two lists created and as soon as one of the levels contained a lower or higher number than the other, or if you ran out of items in one of the lists then you wold return 1 or -1 as appropriate. If you get to the end of both lists in the same iteration of the loop then they are the same and return 0.
I would write the code but I don't have VS in front of me.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

sorting strings containing numbers and letters - c#

Related

C# examining and replacing tuple values based on other tuple

C# Dictionary, Sum of value from one key?

Check if an stl file may contain two models

Substract shortest string containing all search criteria

How to compare two "numbers" with multiple dots?

Categories

Resources