Cleanly analyzing and handling integers

Cleanly analyzing and handling integers - c#

I am reverse engineering an old game format. For their textures, they store display type information in a single integer. I initially though the different bits were boolean flags. For example, if bit 1 was set, the texture is transparent. The more I have worked through the different display types, the less I think that is the case. So now I am handling them as just whole integers.
Here is some sample data:
1. When the texture is invisible, the flag integer is 0.
2. Diffuse - flag integer: -2147483647. Hex: 0x80000001
3. Semi Transparent - flag integer: -2147483643. Hex: 0x80000005
4. Masked - flag integer: -2147483629, Hex: 0x80000013
There are quite a few more types and they have their own unique integer value as well.
What is the easiest way to handle this data? An if/else statement with the huge negative integers is super ugly and hard to read. It looks like it would be easier to analyze them as hex values but each if/else statement requires an int.Parse statement like so:
if (parameters == int.Parse("0x80000001", NumberStyles.HexNumber))
{
// Handle this case
}
Is there any other suggested clean way of doing this and analyzing the data? Is it possible to just analyze the end of the hex value somehow?

There are quite a few more types and they have their own unique integer value as well...What is the easiest way to handle this data?
What you are describing is known as magic numbers. i.e. whenever your code needs to deal with arbitrary almost random numbers and it is not clear as to their meaning.
The best way to deal with them is to define them as constants once with comments and then to use the constants in code rather than repeating the “ugly” numbers throughout the code base.
Perhaps something like:
public static class TextureConstants
{
// TODO comments
public static int Diffuse = -2147483647;
public static int SemiTransparent = -2147483643;
public static int Masked= -2147483629;
}
Also their is no reason why the above could not have been done with an enum but if you are going to do lots of int comparison, comparing int with an enum requires pesky casting.
An if/else statement with the huge negative integers is super ugly and hard to read
Later you can use it in if or switch statements as so:
switch (someNumber)
{
case TextureConstants.Diffuse:
// do something
break;
case TextureConstants.SemiTransparent:
// do something
break;
case TextureConstants.Masked:
// do something
break;
}
Dictionary
If you don't like switch statements you can use a Dictionary<>:
public enum Textures
{
Diffuse,
SemiTransparent,
Masked
}
// initialise the dictionary
Dictionary<int, Textures> dict;
dict[TextureConstants.Diffuse] = Textures.Diffuse;
dict[TextureConstants.SemiTransparent] = Textures.SemiTransparent;
dict[TextureConstants.Masked] = Textures.Masked;
int someNumber = // ...
if (dict.TryGetValue (someNumber, out var texture))
{
// got one
if (texture == Textures.Diffuse) // ...
}

Related

Is numbers comparisons as uint faster than normal int in c#

I was looking how some of the .Net core libraries were implemented and one of many things that caught my eye was that in Dictionary<TKey, TValue> class some numeric comparisons where done doing a casting to (uint) even though at my naive eyes this didn't impacted the logic.
For example on
do { // some magic } while (collisionCount <= (uint)entries.Length);
collisionCount was initialized at 0 and always incremented (collisionCount++) and thus entries being an array its length will not be negative either see source code
as opposed to
if ((uint)i >= (uint)entries.Length) { // some code }
source code line
where i could become negative in some occasions when doing the following, see debug img
i = entry.next;
and thus using it as positive would change the program flow (due to two's complement)
See an extract of the class code:
// Some code and black magic
uint hashCode = (uint)key.GetHashCode();
int i = GetBucket(hashCode);
Entry[]? entries = _entries;
uint collisionCount = 0;
if (typeof(TKey).IsValueType)
{
i--;
do
{
if ((uint)i >= (uint)entries.Length) // <--- Workflow impact
{
goto ReturnNotFound;
}
entry = ref entries[i];
if (entry.hashCode == hashCode && EqualityComparer<TKey>.Default.Equals(entry.key, key))
{
goto ReturnFound;
}
i = entry.next;
collisionCount++;
} while (collisionCount <= (uint)entries.Length);
}
// More cool stuffs
Is there any performace gain or what is the reason for this?

The linked Dictionary source contains this comment;
// Should be a while loop https://github.com/dotnet/runtime/issues/9422
// Test in if to drop range check for following array access
if ((uint)i >= (uint)entries.Length)
{
goto ReturnNotFound;
}
entry = ref entries[i];
The uint comparison here isn't faster, but it helps speed up the array access. The linked github issue talks about a limitation in the runtime compiler, and how this loop structure allows further optimisations. Since this uint comparison has been performed explicitly, the compiler can prove that 0 <= i < entries.Length. This allows the compiler to leave out the array boundary test and throw of IndexOutOfRangeException that would otherwise be required.
In other words, at the time this code was written, and performance profiling was performed. The compiler wasn't smart enough to make simpler, more readable code, run as fast as possible. So someone with a deep understanding of the limitations of the compiler tweaked the code to make it better.

Is it possible to compare multiple values with one field? [duplicate]

Any easier way to write this if statement?
if (value==1 || value==2)
For example... in SQL you can say where value in (1,2) instead of where value=1 or value=2.
I'm looking for something that would work with any basic type... string, int, etc.

How about:
if (new[] {1, 2}.Contains(value))
It's a hack though :)
Or if you don't mind creating your own extension method, you can create the following:
public static bool In<T>(this T obj, params T[] args)
{
return args.Contains(obj);
}
And you can use it like this:
if (1.In(1, 2))
:)

A more complicated way :) that emulates SQL's 'IN':
public static class Ext {
public static bool In<T>(this T t,params T[] values){
foreach (T value in values) {
if (t.Equals(value)) {
return true;
}
}
return false;
}
}
if (value.In(1,2)) {
// ...
}
But go for the standard way, it's more readable.
EDIT: a better solution, according to #Kobi's suggestion:
public static class Ext {
public static bool In<T>(this T t,params T[] values){
return values.Contains(t);
}
}

C# 9 supports this directly:
if (value is 1 or 2)
however, in many cases: switch might be clearer (especially with more recent switch syntax enhancements). You can see this here, with the if (value is 1 or 2) getting compiled identically to if (value == 1 || value == 2).

Is this what you are looking for ?
if (new int[] { 1, 2, 3, 4, 5 }.Contains(value))

If you have a List, you can use .Contains(yourObject), if you're just looking for it existing (like a where). Otherwise look at Linq .Any() extension method.

Using Linq,
if(new int[] {1, 2}.Contains(value))
But I'd have to think that your original if is faster.

Alternatively, and this would give you more flexibility if testing for values other than 1 or 2 in future, is to use a switch statement
switch(value)
{
case 1:
case 2:
return true;
default:
return false
}

If you search a value in a fixed list of values many times in a long list, HashSet<T> should be used. If the list is very short (< ~20 items), List could have better performance, based on this test
HashSet vs. List performance
HashSet<int> nums = new HashSet<int> { 1, 2, 3, 4, 5 };
// ....
if (nums.Contains(value))

Generally, no.
Yes, there are cases where the list is in an Array or List, but that's not the general case.

An extensionmethod like this would do it...
public static bool In<T>(this T item, params T[] items)
{
return items.Contains(item);
}
Use it like this:
Console.WriteLine(1.In(1,2,3));
Console.WriteLine("a".In("a", "b"));

You can use the switch statement with pattern matching (another version of jules's answer):
if (value switch{1 or 3 => true,_ => false}){
// do something
}

Easier is subjective, but maybe the switch statement would be easier? You don't have to repeat the variable, so more values can fit on the line, and a line with many comparisons is more legible than the counterpart using the if statement.

In vb.net or C# I would expect that the fastest general approach to compare a variable against any reasonable number of separately-named objects (as opposed to e.g. all the things in a collection) will be to simply compare each object against the comparand much as you have done. It is certainly possible to create an instance of a collection and see if it contains the object, and doing so may be more expressive than comparing the object against all items individually, but unless one uses a construct which the compiler can explicitly recognize, such code will almost certainly be much slower than simply doing the individual comparisons. I wouldn't worry about speed if the code will by its nature run at most a few hundred times per second, but I'd be wary of the code being repurposed to something that's run much more often than originally intended.
An alternative approach, if a variable is something like an enumeration type, is to choose power-of-two enumeration values to permit the use of bitmasks. If the enumeration type has 32 or fewer valid values (e.g. starting Harry=1, Ron=2, Hermione=4, Ginny=8, Neville=16) one could store them in an integer and check for multiple bits at once in a single operation ((if ((thisOne & (Harry | Ron | Neville | Beatrix)) != 0) /* Do something */. This will allow for fast code, but is limited to enumerations with a small number of values.
A somewhat more powerful approach, but one which must be used with care, is to use some bits of the value to indicate attributes of something, while other bits identify the item. For example, bit 30 could indicate that a character is male, bit 29 could indicate friend-of-Harry, etc. while the lower bits distinguish between characters. This approach would allow for adding characters who may or may not be friend-of-Harry, without requiring the code that checks for friend-of-Harry to change. One caveat with doing this is that one must distinguish between enumeration constants that are used to SET an enumeration value, and those used to TEST it. For example, to set a variable to indicate Harry, one might want to set it to 0x60000001, but to see if a variable IS Harry, one should bit-test it with 0x00000001.
One more approach, which may be useful if the total number of possible values is moderate (e.g. 16-16,000 or so) is to have an array of flags associated with each value. One could then code something like "if (((characterAttributes[theCharacter] & chracterAttribute.Male) != 0)". This approach will work best when the number of characters is fairly small. If array is too large, cache misses may slow down the code to the point that testing against a small number of characters individually would be faster.

Using Extension Methods:
public static class ObjectExtension
{
public static bool In(this object obj, params object[] objects)
{
if (objects == null || obj == null)
return false;
object found = objects.FirstOrDefault(o => o.GetType().Equals(obj.GetType()) && o.Equals(obj));
return (found != null);
}
}
Now you can do this:
string role= "Admin";
if (role.In("Admin", "Director"))
{
...
}

public static bool EqualsAny<T>(IEquatable<T> value, params T[] possibleMatches) {
foreach (T t in possibleMatches) {
if (value.Equals(t))
return true;
}
return false;
}
public static bool EqualsAny<T>(IEquatable<T> value, IEnumerable<T> possibleMatches) {
foreach (T t in possibleMatches) {
if (value.Equals(t))
return true;
}
return false;
}

I had the same problem but solved it with a switch statement
switch(a value you are switching on)
{
case 1:
the code you want to happen;
case 2:
the code you want to happen;
default:
return a value
}

Converting a method to use any Enum

My Problem:
I want to convert my randomBloodType() method to a static method that can take any enum type. I want my method to take any type of enum whether it be BloodType, DaysOfTheWeek, etc. and perform the operations shown below.
Some Background on what the method does:
The method currently chooses a random element from the BloodType enum based on the values assigned to each element. An element with a higher value has a higher probability to be picked.
Code:
public enum BloodType
{
// BloodType = Probability
ONeg = 4,
OPos = 36,
ANeg = 3,
APos = 28,
BNeg = 1,
BPos = 20,
ABNeg = 1,
ABPos = 5
};
public BloodType randomBloodType()
{
// Get the values of the BloodType enum and store it in a array
BloodType[] bloodTypeValues = (BloodType[])Enum.GetValues(typeof(BloodType));
List<BloodType> bloodTypeList = new List<BloodType>();
// Create a list where each element occurs the approximate number of
// times defined as its value(probability)
foreach (BloodType val in bloodTypeValues)
{
for(int i = 0; i < (int)val; i++)
{
bloodTypeList.Add(val);
}
}
// Sum the values
int sum = 0;
foreach (BloodType val in bloodTypeValues)
{
sum += (int)val;
}
//Get Random value
Random rand = new Random();
int randomValue = rand.Next(sum);
return bloodTypeList[randomValue];
}
What I have tried so far:
I have tried to use generics. They worked out for the most part, but I was unable to cast my enum elements to int values. I included a example of a section of code that was giving me problems below.
foreach (T val in bloodTypeValues)
{
sum += (int)val; // This line is the problem.
}
I have also tried using Enum e as a method parameter. I was unable to declare the type of my array of enum elements using this method.

(Note: My apologies in advance for the lengthy answer. My actual proposed solution is not all that long, but there are a number of problems with the proposed solutions so far and I want to try to address those thoroughly, to provide context for my own proposed solution).
In my opinion, while you have in fact accepted one answer and might be tempted to use either one, neither of the answers provided so far are correct or useful.
Commenter Ben Voigt has already pointed out two major flaws with your specifications as stated, both related to the fact that you are encoding the enum value's weight in the value itself:
You are tying the enum's underlying type to the code that then must interpret that type.
Two enum values that have the same weight are indistinguishable from each other.
Both of these issues can be addressed. Indeed, while the answer you accepted (why?) fails to address the first issue, the one provided by Dweeberly does address this through the use of Convert.ToInt32() (which can convert from long to int just fine, as long as the values are small enough).
But the second issue is much harder to address. The answer from Asad attempts to address this by starting with the enum names and parsing them to their values. And this does indeed result in the final array being indexed containing the corresponding entries for each name separately. But the code actually using the enum has no way to distinguish the two; it's really as if those two names are a single enum value, and that single enum value's probability weight is the sum of the value used for the two different names.
I.e. in your example, while the enum entries for e.g. BNeg and ABNeg will be selected separately, the code that receives these randomly selected value has no way to know whether it was BNeg or ABNeg that was selected. As far as it knows, those are just two different names for the same value.
Now, even this problem can be addressed (but not in the way that Asad attempts to…his answer is still broken). If you were, for example, to encode the probabilities in the value while still ensuring unique values for each name, you could decode those probabilities while doing the random selection and that would work. For example:
enum BloodType
{
// BloodType = Probability
ONeg = 4 * 100 + 0,
OPos = 36 * 100 + 1,
ANeg = 3 * 100 + 2,
APos = 28 * 100 + 3,
BNeg = 1 * 100 + 4,
BPos = 20 * 100 + 5,
ABNeg = 1 * 100 + 6,
ABPos = 5 * 100 + 7,
};
Having declared your enum values that way, then you can in your selection code divide the enum value by 100 to obtain its probability weight, which then can be used as seen in the various examples. At the same time, each enum name has a unique value.
But even solving that problem, you are still left with problems related to the choice of encoding and representation of the probabilities. For example, in the above you cannot have an enum that has more than 100 values, nor one with weights larger than (2^31 - 1) / 100; if you want an enum that has more than 100 values, you need a larger multiplier but that would limit your weight values even more.
In many scenarios (maybe all the ones you care about) this won't be an issue. The numbers are small enough that they all fit. But that seems like a serious limitation in what seems like a situation where you want a solution that is as general as possible.
And that's not all. Even if the encoding stays within reasonable limits, you have another significant limit to deal with: the random selection process requires an array large enough to contain for each enum value as many instances of that value as its weight. Again, if the values are small maybe this is not a big problem. But it does severely limit the ability of your implementation to generalize.
So, what to do?
I understand the temptation to try to keep each enum type self-contained; there are some obvious advantages to doing so. But there are also some serious disadvantages that result from that, and if you truly ever try to use this in a generalized way, the changes to the solutions proposed so far will tie your code together in ways that IMHO negate most if not all of the advantage of keeping the enum types self-contained (primarily: if you find you need to modify the implementation to accommodate some new enum type, you will have to go back and edit all of the other enum types you're using…i.e. while each type looks self-contained, in reality they are all tightly coupled with each other).
In my opinion, a much better approach would be to abandon the idea that the enum type itself will encode the probability weights. Just accept that this will be declared separately somehow.
Also, IMHO is would be better to avoid the memory-intensive approach proposed in your original question and mirrored in the other two answers. Yes, this is fine for the small values you're dealing with here. But it's an unnecessary limitation, making only one small part of the logic simpler while complicating and restricting it in other ways.
I propose the following solution, in which the enum values can be whatever you want, the enum's underlying type can be whatever you want, and the algorithm uses memory proportionally only to the number of unique enum values, rather than in proportion to the sum of all of the probability weights.
In this solution, I also address possible performance concerns, by caching the invariant data structures used to select the random values. This may or may not be useful in your case, depending on how frequently you will be generating these random values. But IMHO it is a good idea regardless; the up-front cost of generating these data structures is so high that if the values are selected with any regularity at all, it will begin to dominate the run-time cost of your code. Even if it works fine today, why take the risk? (Again, especially given that you seem to want a generalized solution).
Here is the basic solution:
static T NextRandomEnumValue<T>()
{
KeyValuePair<T, int>[] aggregatedWeights = GetWeightsForEnum<T>();
int weightedValue =
_random.Next(aggregatedWeights[aggregatedWeights.Length - 1].Value),
index = Array.BinarySearch(aggregatedWeights,
new KeyValuePair<T, int>(default(T), weightedValue),
KvpValueComparer<T, int>.Instance);
return aggregatedWeights[index < 0 ? ~index : index + 1].Key;
}
static KeyValuePair<T, int>[] GetWeightsForEnum<T>()
{
object temp;
if (_typeToAggregatedWeights.TryGetValue(typeof(T), out temp))
{
return (KeyValuePair<T, int>[])temp;
}
if (!_typeToWeightMap.TryGetValue(typeof(T), out temp))
{
throw new ArgumentException("Unsupported enum type");
}
KeyValuePair<T, int>[] weightMap = (KeyValuePair<T, int>[])temp;
KeyValuePair<T, int>[] aggregatedWeights =
new KeyValuePair<T, int>[weightMap.Length];
int sum = 0;
for (int i = 0; i < weightMap.Length; i++)
{
sum += weightMap[i].Value;
aggregatedWeights[i] = new KeyValuePair<T,int>(weightMap[i].Key, sum);
}
_typeToAggregatedWeights[typeof(T)] = aggregatedWeights;
return aggregatedWeights;
}
readonly static Random _random = new Random();
// Helper method to reduce verbosity in the enum-to-weight array declarations
static KeyValuePair<T1, T2> CreateKvp<T1, T2>(T1 t1, T2 t2)
{
return new KeyValuePair<T1, T2>(t1, t2);
}
readonly static KeyValuePair<BloodType, int>[] _bloodTypeToWeight =
{
CreateKvp(BloodType.ONeg, 4),
CreateKvp(BloodType.OPos, 36),
CreateKvp(BloodType.ANeg, 3),
CreateKvp(BloodType.APos, 28),
CreateKvp(BloodType.BNeg, 1),
CreateKvp(BloodType.BPos, 20),
CreateKvp(BloodType.ABNeg, 1),
CreateKvp(BloodType.ABPos, 5),
};
readonly static Dictionary<Type, object> _typeToWeightMap =
new Dictionary<Type, object>()
{
{ typeof(BloodType), _bloodTypeToWeight },
};
readonly static Dictionary<Type, object> _typeToAggregatedWeights =
new Dictionary<Type, object>();
Note that the work of actually selecting a random value is simply a matter of choosing a non-negative random integer less than the sum of the weights, and then using a binary search to find the appropriate enum value.
Once per enum type, the code will build the table of values and weight-sums that will be used for the binary search. This result is stored in a cache dictionary, _typeToAggregatedWeights.
There are also the objects that have to be declared and which will be used at run-time to build this table. Note that the _typeToWeightMap is just in support of making this method 100% generic. If you wanted to write a different named method for each specific type you wanted to support, that could still used a single generic method to implement the initialization and selection, but the named method would know the correct object (e.g. _bloodTypeToWeight) to use for initialization.
Alternatively, another way to avoid the _typeToWeightMap while still keeping the method 100% generic would be to have the _typeToAggregatedWeights be of type Dictionary<Type, Lazy<object>>, and have the values of the dictionary (the Lazy<object> objects) explicitly reference the appropriate weight array for the type.
In other words, there are lots of variations on this theme that would work fine. But they will all have essentially the same structure as above; semantics would be the same and performance differences would be negligible.
One thing you'll notice is that the binary search requires a custom IComparer<T> implementation. That is here:
class KvpValueComparer<TKey, TValue> :
IComparer<KeyValuePair<TKey, TValue>> where TValue : IComparable<TValue>
{
public readonly static KvpValueComparer<TKey, TValue> Instance =
new KvpValueComparer<TKey, TValue>();
private KvpValueComparer() { }
public int Compare(KeyValuePair<TKey, TValue> x, KeyValuePair<TKey, TValue> y)
{
return x.Value.CompareTo(y.Value);
}
}
This allows the Array.BinarySearch() method to correct compare the array elements, allowing a single array to contain both the enum values and their aggregated weights, but limiting the binary search comparison to just the weights.

Assuming your enum values are all of type int (you can adjust this accordingly if they're long, short, or whatever):
static TEnum RandomEnumValue<TEnum>(Random rng)
{
var vals = Enum
.GetNames(typeof(TEnum))
.Aggregate(Enumerable.Empty<TEnum>(), (agg, curr) =>
{
var value = Enum.Parse(typeof (TEnum), curr);
return agg.Concat(Enumerable.Repeat((TEnum)value,(int)value)); // For int enums
})
.ToArray();
return vals[rng.Next(vals.Length)];
}
Here's how you would use it:
var rng = new Random();
var randomBloodType = RandomEnumValue<BloodType>(rng);
People seem to have their knickers in a knot about multiple indistinguishable enum values in the input enum (for which I still think the above code provides expected behavior). Note that there is no answer here, not even Peter Duniho's, that will allow you to distinguish enum entries when they have the same value, so I'm not sure why this is being considered as a metric for any potential solutions.
Nevertheless, an alternative approach that doesn't use the enum values as probabilities is to use an attribute to specify the probability:
public enum BloodType
{
[P=4]
ONeg,
[P=36]
OPos,
[P=3]
ANeg,
[P=28]
APos,
[P=1]
BNeg,
[P=20]
BPos,
[P=1]
ABNeg,
[P=5]
ABPos
}
Here is what the attribute used above looks like:
[AttributeUsage(AttributeTargets.Field, AllowMultiple = false)]
public class PAttribute : Attribute
{
public int Weight { get; private set; }
public PAttribute(int weight)
{
Weight = weight;
}
}
and finally, this is what the method to get a random enum value would like:
static TEnum RandomEnumValue<TEnum>(Random rng)
{
var vals = Enum
.GetNames(typeof(TEnum))
.Aggregate(Enumerable.Empty<TEnum>(), (agg, curr) =>
{
var value = Enum.Parse(typeof(TEnum), curr);
FieldInfo fi = typeof (TEnum).GetField(curr);
var weight = ((PAttribute)fi.GetCustomAttribute(typeof(PAttribute), false)).Weight;
return agg.Concat(Enumerable.Repeat((TEnum)value, weight)); // For int enums
})
.ToArray();
return vals[rng.Next(vals.Length)];
}
(Note: if this code is performance critical, you might need to tweak this and add caching for the reflection data).

Some of this you can do and some of it isn't so easy. I believe the following extension method will do what you describe.
static public class Util {
static Random rnd = new Random();
static public int PriorityPickEnum(this Enum e) {
// The approved types for an enum are byte, sbyte, short, ushort, int, uint, long, or ulong
// However, Random only supports a int (or double) as a max value. Either way
// it doesn't have the range for uint, long and ulong.
//
// sum enum
int sum = 0;
foreach (var x in Enum.GetValues(e.GetType())) {
sum += Convert.ToInt32(x);
}
var i = rnd.Next(sum); // get a random value, it will form a ratio i / sum
// enums may not have a uniform (incremented) value range (think about flags)
// therefore we have to step through to get to the range we want,
// this is due to the requirement that return value have a probability
// proportional to it's value. Note enum values must be sorted for this to work.
foreach (var x in Enum.GetValues(e.GetType()).OfType<Enum>().OrderBy(a => a)) {
i -= Convert.ToInt32(x);
if (i <= 0) return Convert.ToInt32(x);
}
throw new Exception("This doesn't seem right");
}
}
Here is an example of using this extension:
BloodType bt = BloodType.ABNeg;
for (int i = 0; i < 100; i++) {
var v = (BloodType) bt.PriorityPickEnum();
Console.WriteLine("{0}: {1}({2})", i, v, (int) v);
}
This should work pretty well for enum's of type byte, sbyte, ushort, short and int. Once you get beyond int (uint, long, ulong) the problem is the Random class. You can adjust the code to use doubles generated by Random, which would cover uint, but the Random class just doesn't have the range to cover long and ulong. Of course you could use/find/write a different Random class if this is important.

Specifying the length of Int containing 0s

I want to convert a string which contains 5 zeros ("00000") into a Int so it can be incremented.
Whenever I convert the string to an integer using Convert.ToInt32(), the value becomes 0.
How can I ensure that the integer stays at a fixed length once converted?
I want to be able to increment the value from "00000" to "00001" and so on so that the value appears with that many digits in a database instead of 0 or 1.
If you are going to down vote question, the least you could do is leave feedback on why you did it...

An integer is an integer. Nothing more.
So the int you have has value zero, there is no way to add any "length" metadata or similar.
I guess what you actually want is, that - at some point - there will be a string again. You want that string to represent your number, but with 5 characters and leading zeroes.
You can achieve that using:
string s = i.ToString("00000")
EDIT
You say you want the numbers to appear like that in your DB.
Ask yourself:
Does it really need to be like that in the DB or is it sufficient to format the numbers as soon as you read them from the database?
Depending on that, format them (as shown above), either when writing to or reading from the DB. Of course, the former case will require more storage space and you cannot really use SQL or similar to perform arithmetic operations!

An integer doesn't have a length, it's purely a numerical value. If you want to keep the length, you have to store the information about the length somewhere else.
You can wrap the value in an object that keeps the length in a property. Example:
public class FormattedInt {
private int _value, _length;
public FormattedInt(string source) {
_length = source.Length;
_value = Int32.Parse(source);
}
public void Increase() {
_value++;
}
public override string ToString() {
return _value.ToString(new String('0', _length));
}
}
Usage:
FormattedInt i = new FormattedInt("0004");
i.Increase();
string s = i.ToString(); // 0005

As olydis says above, an int is just a number. It doesn't specify the display format. I don't recommend storing it padded out with zeroes in a database either. It's an int, store it as an int.
For display purposes, in the app, a report, whatever, then pad out the zeroes.
string s = i.ToString("00000");

Recursive woes - reducing an input string

I'm working on a portion of code that is essentially trying to reduce a list of strings down to a single string recursively.
I have an internal database built up of matching string arrays of varying length (say array lengths of 2-4).
An example input string array would be:
{"The", "dog", "ran", "away"}
And for further example, my database could be made up of string arrays in this manner:
(length 2) {{"The", "dog"},{"dog", "ran"}, {"ran", "away"}}
(length 3) {{"The", "dog", "ran"}.... and so on
So, what I am attempting to do is recursively reduce my input string array down to a single token. So ideally it would parse something like this:
1) {"The", "dog", "ran", "away"}
Say that (seq1) = {"The", "dog"} and (seq2) = {"ran", "away"}
2) { (seq1), "ran", "away"}
3) { (seq1), (seq2)}
In my sequence database I know that, for instance, seq3 = {(seq1), (seq2)}
4) { (seq3) }
So, when it is down to a single token, I'm happy and the function would end.
Here is an outline of my current program logic:
public void Tokenize(Arraylist<T> string_array, int current_size)
{
// retrieve all known sequences of length [current_size] (from global list array)
loc_sequences_by_length = sequences_by_length[current_size-min_size]; // sequences of length 2 are stored in position 0 and so on
// escape cases
if (string_array.Count == 1)
{
// finished successfully
return;
}
else if (string_array.Count < current_size)
{
// checking sequences of greater length than input string, bail
return;
}
else
{
// split input string into chunks of size [current_size] and compare to local database
// of known sequences
// (splitting code works fine)
foreach (comparison)
{
if (match_found)
{
// update input string and recall function to find other matches
string_array[found_array_position] = new_sequence;
string_array.Removerange[found_array_position+1, new_sequence.Length-1];
Tokenize(string_array, current_size)
}
}
}
// ran through unsuccessfully, increment length and try again for new sequence group
current_size++;
if (current_size > MAX_SIZE)
return;
else
Tokenize(string_array, current_size);
}
I thought it was straightforward enough, but have been getting some strange results.
Generally it appears to work, but upon further review of my output data I'm seeing some issues. Mainly, it appears to work up to a certain point...and at that point my 'curr_size' counter resets to the minimum value.
So it is called with a size of 2, then 3, then 4, then resets to 2.
My assumption was that it would run up to my predetermined max size, and then bail completely.
I tried to simplify my code as much as possible, so there are probably some simple syntax errors in transcribing. If there is any other detail that may help an eagle-eyed SO user, please let me know and I'll edit.
Thanks in advance

One bug is:
string_array[found_array_position] = new_sequence;
I don't know where this is defined, and as far as I can tell if it was defined, it is never changed.
In your if statement, when if match_found ever set to true?
Also, it appears you have an extra close brace here, but you may want the last block of code to be outside of the function:
}
}
}
It would help if you cleaned up the code, to make it easier to read. Once we get past the syntactic errors it will be easier to see what is going on, I think.

Not sure what all the issues are, but the first thing I'd do is have your "catch-all" exit block right at the beginning of your method.
public void Tokenize(Arraylist<T> string_array, int current_size)
{
if (current_size > MAX_SIZE)
return;
// Guts go here
Tokenize(string_array, ++current_size);
}
A couple things:
Your tokens are not clearly separated from your input string values. This makes it more difficult to handle, and to see what's going on.
It looks like you're writing pseudo-code:
loc_sequences_by_length is not used
found_array_position is not defined
Arraylist should be ArrayList.
etc.
Overall I agree with James' statement:
It would help if you cleaned up the
code, to make it easier to read.
-Doug

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.