Is there a way to make string a reference type in a collection? - c#

I want to modify some strings that are contained in an object, say an array, or the text nodes in an XDocument (((XText)node).Value).
I want to gather a subset of strings from these objects and modify them, but I don't know at runtime what type of object they come from.
Put another way, let's say I have objects like this:
List<string> fruits = new List<string>() {"apple", "banana", "cantaloupe"};
XDocument _xmlObject;
I want to be able to add a subset of values from the original collections to new lists like this:
List<ref string> myStrings1 = new List<ref string>();
myStrings1.Add(ref fruits[1]);
myStrings1.Add(ref fruits[2]);
List<ref string> myStrings2 = new List<ref string>();
IEnumerable<XNode> xTextNodes = getTargetTextNodes(targetPath); //some function returns a series of XNodes in the XDocument
foreach (XNode node in xTextNodes)
{
myStrings2.Add(((XText)node).Value);
}
Then change the values using a general purpose method like this:
public void Modify(List<ref string> mystrings){
foreach (ref string item in mystrings)
{
item = "new string";
}
}
Such that I can pass that method any string collection, and modify the strings in the original object without having to deal with the original object itself.
static void Main(string[] args)
{
Modify(myStrings1);
Modify(myStrings2);
}
The important part here is the mystrings collection. That can be special. But I need to be able to use a variety of different kinds of strings and string collections as the originals source data to go in that collection.
Of course, the above code doesn't work, and neither does any variation I've tried. Is this even possible in c#?

What you want is possible with C#, but only if you can fix (pin in memory) every possible source for your strings. That would allow you to use pointers to the original strings, at a terrible cost, however, in terms of memory management and unsafe code throughout your application.
I encourage you to pursue a different direction for this.
Based on your edits, it looks like you're always working with an entire collection, and always modifying the entire collection at once. Also, this might not even be a string collection at the outset. I don't think you'll be able to get the exact result you want, because of the base XDocument type you're working with. But one possible direction to explore might look like this:
public IEnumerable<string> Modify(IEnumerable<string> items)
{
foreach(string item in items)
{
yield return "blah";
}
}
You can use a projection to get strings from any collection type, and get your modified text back:
fruits = Modify(fruits).ToList();
var nodes = Modify(xTextNodes.Select(n => ((XText)n).Value));
And once you understand how to make a projection, you may find that the existing .Select() method already does everything you need.
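As a rough sketch of the projection idea (not the original poster's code): the Modify transformation here simply upper-cases each string, and the ProjectionSketch class, the ModifyTextNodes helper and the sample XML are assumptions made for illustration. Because strings are immutable, the new values have to be written back through something that owns a settable slot, which for text nodes is XText.Value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ProjectionSketch
{
    // Hypothetical transformation: yields new strings and never touches the source.
    public static IEnumerable<string> Modify(IEnumerable<string> items)
    {
        return items.Select(s => s.ToUpperInvariant());
    }

    // Projects the text nodes to strings, transforms them, then writes each result back.
    public static void ModifyTextNodes(XDocument doc)
    {
        List<XText> textNodes = doc.DescendantNodes().OfType<XText>().ToList();
        List<string> newValues = Modify(textNodes.Select(n => n.Value)).ToList();
        for (int i = 0; i < textNodes.Count; i++)
        {
            textNodes[i].Value = newValues[i];
        }
    }

    public static void Main()
    {
        var fruits = new List<string> { "apple", "banana", "cantaloupe" };
        fruits = Modify(fruits).ToList();   // replace the list with the projected copy
        Console.WriteLine(string.Join(", ", fruits));

        XDocument doc = XDocument.Parse("<root><a>one</a><b>two</b></root>");
        ModifyTextNodes(doc);
        Console.WriteLine(doc);
    }
}
```

The list case ends up with a new list, while the XML case keeps the same document and only replaces the text inside it.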
What I really suggest, though, is that rather than working with an entire collection, think about working in terms of one record at a time. Create a common object type that all of your data sources understand. Create a projection from each data source into the common object type. Loop through each of the objects in your projection and make your adjustment. Then have another projection back to the original record type. This will not be the original collection. It will be a new collection. Write your new collection back to disk.
Used appropriately, this also has the potential for much greater performance than your original approach. This is because working with one record at a time, using these LINQ projections, opens the door to streaming the data, such that only the one current record is ever held in memory at a time. You can open a stream from the original and a stream for the output, and write to the output just as fast as you can read from the original.
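A minimal sketch of that streaming idea, assuming the data is a plain text file with one record per line. The StreamingSketch class, the Transform method and the temporary files are hypothetical stand-ins. File.ReadLines enumerates lazily, so each line is read, transformed and written before the next one is read. The input and output must be different files.

```csharp
using System;
using System.IO;
using System.Linq;

public static class StreamingSketch
{
    // Hypothetical per-record change; stands in for whatever adjustment is needed.
    public static string Transform(string line)
    {
        return line.Trim().ToUpperInvariant();
    }

    // Streams from inputPath to outputPath one line at a time.
    public static void Rewrite(string inputPath, string outputPath)
    {
        File.WriteAllLines(outputPath, File.ReadLines(inputPath).Select(Transform));
    }

    public static void Main()
    {
        string input = Path.GetTempFileName();
        string output = Path.GetTempFileName();
        File.WriteAllLines(input, new[] { " apple ", "banana" });

        Rewrite(input, output);
        Console.WriteLine(string.Join("|", File.ReadAllLines(output)));

        File.Delete(input);
        File.Delete(output);
    }
}
```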

The easiest way to achieve this is by doing the looping outside of the method. This allows you to pass the strings by reference which will replace the existing reference with the new one (don't forget that strings are immutable).
An example of this (Dump() is LINQPad's output helper):
void Main()
{
string[] arr = new[] {"lala", "lolo"};
arr.Dump();
for(var i = 0; i < arr.Length; i++)
{
ModifyStrings(ref arr[i]);
}
arr.Dump();
}
public void ModifyStrings(ref string item)
{
item = "blah";
}

Related

What is the optimal data structure for storing objects with a string key and a bool auxiliary value?

I need a data structure like the one below, but I need to be able to change the bool value. The other two stay as they were when they were initialized. What would you use for best performance?
Dictionary<string, (object, bool)> dic = new Dictionary<string, (object, bool)>();
I was thinking of a hashtable. But a hashtable is like a dictionary with key/value. The object and bool in my example are, in concept, not like a key/value, because other values of the external dictionary can have the same object (or better yet, object type). I don't want someone looking at my code later on to think that the object and bool are more related than they really are.
EDIT: object in this example is just a placeholder. In reality it's a complex object containing other objects, and so on. The procedure before this one creates a bunch of these objects, some of which are deep copies of others, and passes them to this procedure. Here, all of the objects are named according to some rules and stored in the dictionary. Names are obviously unique. The procedure that comes after will take this dictionary and switch the bool values on and off based on the values in the objects themselves and on the values of other bools. That procedure will recurse until some state is reached.
Number of objects (or dic. entries) is arbitrary but expected to be >100 && <500. Time complexity is O(n).
I am targeting .NET7 (standard).
but I need to be able to change the bool value.
You can just reassign value for the key:
var tuples = new Dictionary<string, (object Obj, bool Bool)>
{
{ "1", (new object(), true) }
};
tuples["1"] = (tuples["1"].Obj, false); // or tuples["1"] = (tuples["1"].Item1, false);
Or
if (tuples.TryGetValue("1", out var c))
{
tuples["1"] = (c.Obj, false);
}
Personally I would leave it at that, but for really high-performance scenarios you can look into CollectionsMarshal (in System.Runtime.InteropServices) instead of the second snippet:
ref var v = ref CollectionsMarshal.GetValueRefOrNullRef(tuples, "1");
if (!Unsafe.IsNullRef(ref v))
{
v.Bool = false;
}
A bit more info - here.
For the 'performance' aspect:
The .NET Dictionary uses hashes to look up the item you need, which is very fast (comparable to a Hashtable). I don't expect many performance issues related to this, or at least nothing that can be improved on with other data structures.
Also, you shouldn't worry about performance unless you are doing things a million times in a row and it turns out (in practice) that something is taking a measurable amount of time.
For the 'changing a bool' aspect:
... that is quite a long story.
There are 2 tuple variants in .NET:
The value tuple, created by doing var x = (myObj, myBool), like you are doing.
The x is a struct, and therefore a Value Type. You can actually change x.Item1 or x.Item2 to a new value just fine.
However... if you put x into a Dictionary then you actually put a copy of x (with a copy of its values) into the dictionary, because that is the nature of value types.
When you retrieve it again from the Dictionary, yet another copy is made - which makes modifying the actual tuple inside the Dictionary impossible; any attempt to do so would only modify the last copy you got.
Side story: The compiler knows this, which is why it refuses to compile code like dic[yourKey].Item2 = newBool; because such code wouldn't do what you might hope it would do. You're basically telling the compiler to create a copy, modify the copy, and then discard the copy. The compiler requires a variable to store the copy before the rest can even start, but we provided no variable.
The Tuple generic class, or rather a range of generic classes, an instance of which can be created using calls like var x = Tuple.Create(myObj, myBool). These classes however forbid that you change any of their properties, they are always readonly. Tuple class instances can be put in a Dictionary, but they will still be readonly.
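The copy semantics of value tuples described above can be seen in a small sketch. The TupleCopySketch class and the tuple element names Obj and Flag are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class TupleCopySketch
{
    // Edits a copy taken from the dictionary and reports what the dictionary still holds.
    public static bool FlagAfterEditingCopy()
    {
        var dict = new Dictionary<string, (object Obj, bool Flag)>();
        dict["abc"] = (new object(), true);

        var copy = dict["abc"];   // the indexer returns a copy of the struct
        copy.Flag = false;        // only the local copy changes

        return dict["abc"].Flag;  // the stored tuple is untouched
    }

    public static void Main()
    {
        Console.WriteLine(FlagAfterEditingCopy()); // prints True
    }
}
```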
So what options are there really to 'modify a value in a tuple' in a Dictionary?
Keep using a value tuple, but accept that in order to "change" the tuple inside the Dictionary you'll have to make a new instance (either a copy, or from scratch), set it to the values that you want, and put that instance (or actually a copy) into the dictionary:
// initialize it
var dict = new Dictionary<string, (object, bool)>();
var obj = new object();
dict["abc"] = (obj, true);
// change it
var tmpTuple = dict["abc"]; // get copy
tmpTuple.Item2 = false; // alter copy
dict["abc"] = tmpTuple; // store another copy
// or if you want to avoid the tmp variable
dict["abc"] = (dict["abc"].Item1, false);
Use a custom class instead of the value tuple or a Tuple class, and then put that into the Dictionary:
public class MyPair
{
public object O { get; set; }
public bool B { get; set; }
}
// initialize it
var dict = new Dictionary<string, MyPair>();
var obj = new object();
dict["abc"] = new MyPair { O = obj, B = true };
// change it
dict["abc"].B = false;
So both types of Tuples are OK for objects that you don't want to do a lot with. But both have certain limits in their usage, and sooner or later you may need to start using classes.

Is it better to create a List of new Objects or Dictionary?

I have a file with 2 columns and multiple rows. 1st column is ID, 2nd column is Name. I would like to display a Dropdown where I will show only all the names from this file.
I will only iterate through the collection. So what is the better approach? Is creating the objects more readable for other developers? Or is creating new objects too slow to be worth it?
while (!reader.EndOfStream)
{
var row = reader.ReadLine();
var values = row.Split(' ');
list.Add(new Object { Id = int.Parse(values[0]), Name = values[1] });
}
or
while (!reader.EndOfStream)
{
var row = reader.ReadLine();
var values = row.Split(' ');
dict.Add(int.Parse(values[0]), values[1]);
}
Do I lose the speed in the case if I will create new objects?
You create new objects, so to speak, when adding to a Dictionary<TKey, TValue> as well: each Add creates a new key-value pair of the desired type.
As you already mentioned in your question, the decision rests primarily on the expected access pattern and on performance considerations, which are themselves a function of that access pattern.
If you need a read-only collection to iterate over, use List<T> (or, even better, if the size of the data is known up front, use an array such as Type[] data, and just read it where you need it).
If you need key-wise access to your data, use Dictionary<TKey, TValue>.
If you want to only iterate objects, then use List. No need to use Dictionary class at all.
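For the iterate-only case, a minimal sketch might look like the following. The Item class, the DropdownSketch.Parse helper and the in-memory sample rows are assumptions standing in for the file reader in the question. Splitting into at most two parts is an extra choice made here so that names containing spaces survive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Item
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public static class DropdownSketch
{
    // Turns "ID Name" rows into a list that is only ever iterated.
    public static List<Item> Parse(IEnumerable<string> rows)
    {
        var list = new List<Item>();
        foreach (string row in rows)
        {
            // At most two parts: the ID, then the rest of the line as the name.
            string[] values = row.Split(new[] { ' ' }, 2);
            list.Add(new Item { Id = int.Parse(values[0]), Name = values[1] });
        }
        return list;
    }

    public static void Main()
    {
        List<Item> items = Parse(new[] { "1 Alice", "2 Bob Smith" });
        foreach (string name in items.Select(i => i.Name))
        {
            Console.WriteLine(name);
        }
    }
}
```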

Passing List<Domain Object> to a Method Expecting List<object>

I know I'm missing something fundamental with either generics or covariance, and I was hoping there is a better way to do what I am doing.
I have a method that takes a list of domain objects and turns it into an HTML table:
public String GenerateTable(List<object> Data, String[] Properties,
String[] ColumnHeaders = null)
{
}
When I call the method, I find myself having to do this:
List<Customer> cust = GetCustomers();
List<object> oCust = new List<object>();
foreach (Customer c in cust)
oCust.Add((object)c);
string table = GenerateTable(oCust, new string[] { "CustNbr", "CustName" });
I believe with covariance I can simply:
List<object> oCust = cust;
But I'm looking for a better solution all-around -- eliminate the necessity to create a completely new list each time I run this method. It's not a performance or memory issue, as these lists are always relatively small, but I'd like to understand what is the best (or at least better) way to accomplish this.
You should change GenerateTable to accept an IEnumerable of objects instead of a list. Then you won't have to convert your Customer list to a list of objects.
public String GenerateTable(IEnumerable<object> Data, String[] Properties, String[] ColumnHeaders = null)
The problem with your original version is that GenerateTable could attempt to add a non-Customer object to the List. IEnumerable works because it is read only. You can read more about it here, if you are interested.
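A minimal sketch of the IEnumerable<object> version. The Customer class (with the CustNbr and CustName properties named in the question's call) and the reflection-based property lookup are assumptions about how the original method might work. HTML encoding and column headers are left out for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public sealed class Customer
{
    public int CustNbr { get; set; }
    public string CustName { get; set; } = "";
}

public static class CovarianceSketch
{
    // Reads each named property via reflection and emits one table row per object.
    public static string GenerateTable(IEnumerable<object> data, string[] properties)
    {
        var sb = new StringBuilder("<table>");
        foreach (object row in data)
        {
            sb.Append("<tr>");
            foreach (string name in properties)
            {
                // Unknown property names produce an empty cell.
                object value = row.GetType().GetProperty(name)?.GetValue(row);
                sb.Append("<td>").Append(value).Append("</td>");
            }
            sb.Append("</tr>");
        }
        return sb.Append("</table>").ToString();
    }

    public static void Main()
    {
        var customers = new List<Customer> { new Customer { CustNbr = 1, CustName = "Ann" } };
        // List<Customer> converts implicitly to IEnumerable<object>; no copy is made.
        Console.WriteLine(GenerateTable(customers, new[] { "CustNbr", "CustName" }));
    }
}
```

Note that this implicit conversion only applies to reference types; a List<int> would not convert to IEnumerable<object>.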
Covariance is only supported in generic interfaces and delegates, not in classes such as List<T>. Since it looks like an IEnumerable would be sufficient, you can try not using a generic at all.
public string GenerateTable(IEnumerable data, string[] properties, string[] columnHeaders = null)
Alternately, you could set up a generic transformation method
public string GenerateRow(Customer customer) { /* convert one object to a row here */ }
public string GenerateTable<T>(List<T> objects, Func<T, string> rowGenerator)
{
var output = new StringBuilder(); // needs using System.Text;
// table boilerplate
foreach (var obj in objects)
{
output.Append(rowGenerator(obj));
}
return output.ToString();
}
and then call it with
var table = GenerateTable(customerList, GenerateRow);
to generate your table.

Creation and recalling of objects in a foreach loop?

Beginner programmer here so please keep (explanation of) answers as simple as possible.
For an assignment we got a text file that contains a large amount of lines.
Each line is a different 'Order' with an ordernumber, location, frequency (amount of times per week), and a few other arguments.
Example:
18; New York; 3PWK; ***; ***; etc
I've made it so that each line is read and stored in a string with
string[] orders = System.IO.File.ReadAllLines(@"<filepath here>");
And I've made a separate class called "Order" which has a get+set for all the properties like ordernumber, etc.
Now I'm stuck on how to actually get the values in there. I know how string splitting works but not how I can make unique objects in a loop.
I'd need something that created a new Order for every line and then assign Order.ordernumber, Order.location etc.
Any help would be MUCH appreciated!
An easy approach will be to make a class to define the orders like this:
public class Order{
public string OrderNumber{get;set;}
public string OrderId{get;set;}
public string OrderSomeThingElse{get;set;}
}
Then initialize a List:
var orderList = new List<Order>();
Then loop through and populate it:
foreach( var order in orders ){
var splitString = order.Split(';');
orderList.Add( new Order{
OrderNumber = splitString[0],
OrderId = splitString[1],
OrderSomeThingElse = splitString[2]
});
}
If you want an easy, but not that elegant approach, this is it.
In addition to all the good answers you've already received, I recommend that you use File.ReadLines() instead of File.ReadAllLines(), because you are reading a large file.
The ReadLines and ReadAllLines methods differ as follows: When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned; when you use ReadAllLines, you must wait for the whole array of strings be returned before you can access the array. Therefore, when you are working with very large files, ReadLines can be more efficient. MSDN
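Putting the pieces together, a sketch of the parsing loop might look like this. The Order properties, the OrderParser class and the sample lines are assumptions based on the example line in the question. The Trim call is there because the sample has a space after each semicolon. In real use the lines would come from File.ReadLines(path) rather than an in-memory array.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Order
{
    public int OrderNumber { get; set; }
    public string Location { get; set; } = "";
    public string Frequency { get; set; } = "";
}

public static class OrderParser
{
    // Creates one new Order object from one line of text.
    public static Order ParseLine(string line)
    {
        string[] parts = line.Split(';').Select(p => p.Trim()).ToArray();
        return new Order
        {
            OrderNumber = int.Parse(parts[0]),
            Location = parts[1],
            Frequency = parts[2]
        };
    }

    // Skips blank lines and builds a separate Order for every remaining line.
    public static List<Order> ParseAll(IEnumerable<string> lines)
    {
        return lines.Where(l => !string.IsNullOrWhiteSpace(l)).Select(ParseLine).ToList();
    }

    public static void Main()
    {
        List<Order> orders = ParseAll(new[] { "18; New York; 3PWK", "19; Boston; 1PWK" });
        foreach (Order o in orders)
        {
            Console.WriteLine($"{o.OrderNumber}: {o.Location} ({o.Frequency})");
        }
    }
}
```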
Unless I misunderstand... do you mean something like this?
var ordersCollection = new List<Order>();
foreach (var order in orders)
{
var o = new Order();
o.PropertyName = ""; // Assign your property values like this
ordersCollection.Add(o);
}
// ordersCollection is now full of your orders.

how to add an associative index to an array. c#

I have an array of custom objects. I'd like to be able to reference this array by a particular data member, for instance myArray["Item1"].
"Item1" is actually the value stored in the Name property of this custom type, and I can write a predicate to mark the appropriate array item. However, I am unclear as to how to let the array know I'd like to use this predicate to find the array item.
I'd like to just use a dictionary or hashtable or NameValuePair for this array and get around this whole problem, but it's generated and it must remain as CustomObj[]. I'm also trying to avoid loading a dictionary from this array, as it's going to happen many times and there could be many objects in it.
For clarification
myArray[5] = new CustomObj() // easy!
myArray["ItemName"] = new CustomObj(); // how to do this?
Can the above be done? I'm really just looking for something similar to how DataRow.Columns["MyColumnName"] works
Thanks for the advice.
What you really want is an OrderedDictionary. The version that .NET provides in System.Collections.Specialized is not generic - however there is a generic version on CodeProject that you could use. Internally, this is really just a hashtable married to a list ... but it is exposed in a uniform manner.
If you really want to avoid using a dictionary - you're going to have to live with O(n) lookup performance for an item by key. In that case, stick with an array or list and just use the LINQ Where() method to lookup a value. You can use either First() or Single() depending on whether duplicate entries are expected.
var myArrayOfCustom = ...
var item = myArrayOfCustom.Where( x => x.Name == "yourSearchValue" ).First();
It's easy enough to wrap this functionality into a class so that external consumers are not burdened by this knowledge, and can use simple indexers to access the data. You could then add features like memoization if you expect the same values are going to be accessed frequently. In this way you could amortize the cost of building the underlying lookup dictionary over multiple accesses.
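One way such a wrapper could look, as a sketch only: the NamedArray class and its members are hypothetical, and CustomObj is reduced to a single Name property. The name-to-index map is built lazily on the first lookup by name, so its cost is paid once and then shared by later lookups. The map goes stale if the underlying array or the Name values change afterwards.

```csharp
using System;
using System.Collections.Generic;

public sealed class CustomObj
{
    public string Name { get; set; } = "";
}

public sealed class NamedArray
{
    private readonly CustomObj[] _items;
    private readonly Lazy<Dictionary<string, int>> _indexByName;

    public NamedArray(CustomObj[] items)
    {
        _items = items;
        _indexByName = new Lazy<Dictionary<string, int>>(BuildIndex);
    }

    // Runs once, on the first access by name.
    private Dictionary<string, int> BuildIndex()
    {
        var map = new Dictionary<string, int>();
        for (int i = 0; i < _items.Length; i++)
        {
            map[_items[i].Name] = i; // with duplicate names, the last one wins
        }
        return map;
    }

    // Positional access, same as the plain array.
    public CustomObj this[int index]
    {
        get { return _items[index]; }
    }

    // Associative access; throws KeyNotFoundException for an unknown name.
    public CustomObj this[string name]
    {
        get { return _items[_indexByName.Value[name]]; }
    }
}

public static class NamedArrayDemo
{
    public static void Main()
    {
        var raw = new[] { new CustomObj { Name = "Item1" }, new CustomObj { Name = "Item2" } };
        var wrapped = new NamedArray(raw);
        Console.WriteLine(ReferenceEquals(wrapped["Item2"], raw[1]));
    }
}
```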
If you do not want to use "Dictionary", then you should create your own class (for example "MyArray") that wraps the data storage, and add an indexer of type "int" for positional access and one of type "string" for associative access.
public CustomObj this [string index]
{
get
{
return data[searchIdxByName(index)];
}
set
{
data[searchIdxByName(index)] = value;
}
}
First link in google for indexers is: http://www.csharphelp.com/2006/04/c-indexers/
You could use a dictionary for this. It might not be the best solution in the world, but it is the first one I came up with.
Dictionary<string, int> d = new Dictionary<string, int>();
d.Add("cat", 2);
d.Add("dog", 1);
d.Add("llama", 0);
d.Add("iguana", -1);
the ints could be objects, what you like :)
http://dotnetperls.com/dictionary-keys
Perhaps OrderedDictionary is what you're looking for.
You can use a Hashtable:
System.Collections.Hashtable o_Hash_Table = new Hashtable();
o_Hash_Table.Add("Key", "Value");
There is a class in the System.Collections.Generic namespace called Dictionary<K,V> that you should use.
var d = new Dictionary<string, MyObj>();
MyObj o = d["a string variable"];
Another way would be to code two methods/a property:
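public MyObj this[string index]
{
get
{
foreach (var o in My_Enumerable)
{
if (o.Name == index)
{
return o;
}
}
// No item has that name.
throw new KeyNotFoundException(index);
}
set
{
// Assumes My_Enumerable is a List<MyObj>. The match is replaced in place,
// because adding or removing items inside a foreach over the list throws.
var i = My_Enumerable.FindIndex(o => o.Name == index);
if (i >= 0)
{
My_Enumerable[i] = value;
}
}
}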
I hope it helps!
It depends on the collection: some collections allow access by name and some don't. Access by string is only meaningful when the collection stores a name for each item. The column collection identifies columns by their names, which is what lets you select a column by name. In a plain array this does not work, because items are identified only by their index.
My best recommendation, if you can't change it to use a dictionary, is to either use a Linq expression:
var item1 = myArray.Where(x => x.Name == "Item1").FirstOrDefault();
or, make an extension method that uses a linq expression:
public static class CustomObjExtensions
{
public static CustomObj Get(this CustomObj[] Array, string Name)
{
return Array.Where(x => x.Name == Name).FirstOrDefault();
}
}
then in your app:
var item2 = myArray.Get("Item2");
Note however that performance wouldn't be as good as using a dictionary, since behind the scenes .NET will just loop through the list until it finds a match, so if your list isn't going to change frequently, then you could just make a Dictionary instead.
I have two ideas:
1) I'm not sure you're aware but you can copy dictionary objects to an array like so:
Dictionary<string, int> dict = new Dictionary<string, int>();
dict.Add("tesT",40);
int[] myints = new int[dict.Count];
dict.Values.CopyTo(myints, 0);
This might allow you to use a Dictionary for everything while still keeping the output as an array.
2) You could also actually create a DataTable programmatically if that's the exact functionality you want:
DataTable dt = new DataTable();
DataColumn dc1 = new DataColumn("ID", typeof(int));
DataColumn dc2 = new DataColumn("Name", typeof(string));
dt.Columns.Add(dc1);
dt.Columns.Add(dc2);
DataRow row = dt.NewRow();
row["ID"] = 100;
row["Name"] = "Test";
dt.Rows.Add(row);
You could also create this outside of the method so you don't have to make the table over again every time.
