Why is there still a reference onto this string? - c#

I was toying around with WeakReference and WeakReference<T>. They only work with classes (obviously, since they hold references), so I tried an example with a string (string is a class in .NET).
When I ran the following code snippet, it didn't give the result I expected: the WeakReference still contained the string.
string please = "wat";
WeakReference<string> test = new WeakReference<string>(please);
string testresult;
please = null;
GC.Collect();
bool worked = test.TryGetTarget(out testresult);
Console.WriteLine("it is " + worked);
result: "it is true"
Now when I created a simple wrapper class around the string:
class TestWeakStuff
{
public string Test { get; set; }
}
and used it instead of the string, it did return my expected result:
TestWeakStuff testclass = new TestWeakStuff() { Test = "wat" };
WeakReference<TestWeakStuff> test2 = new WeakReference<TestWeakStuff>(testclass);
TestWeakStuff testresult2;
testclass = null;
GC.Collect();
bool worked2 = test2.TryGetTarget(out testresult2);
Console.WriteLine("2nd time is " + worked2);
Result: "2nd time is false"
I tried the same with the non generic WeakReference class, and the result is the same.
Why is the String not being claimed by the garbage collector?
(GC.Collect() does claim all generations, external GC call is with -1 (all generations))

String literals are not a good candidate for testing GC behavior. String literals are added to the CLR's intern pool, so only one object exists in memory for each distinct string literal. This is an optimization. Strings in the intern pool are referenced forever and never collected.
Strings are not an ordinary class. They are intrinsic to the runtime.
You should be able to test it with new string('x', 10), which creates a new object each time. This is guaranteed. That guarantee is sometimes relied on to write to a string with unsafe code before publishing it to other code. It can be used with native code as well.
It's probably best to drop testing with strings entirely. The results you obtain are not particularly interesting, nor are they guaranteed to remain stable across runtime changes.
You could test with new object(), which would be the simplest way to test it.
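A minimal sketch of the comparison described above, assuming a helper class named WeakStringDemo (a hypothetical name, not from the question). The target is created in a separate, non-inlined method so that no local variable in the caller keeps it alive. The literal is expected to survive because the intern pool references it; whether the dynamically created string is actually collected depends on the runtime and build configuration, so no result is claimed for that case.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class WeakStringDemo
{
    // Creates the target inside a non-inlined helper so the caller
    // never holds a strong reference to it in a local variable.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference<string> MakeWeak(bool useLiteral)
    {
        // "wat" is a literal and therefore lives in the intern pool;
        // new string('x', 10) is a fresh heap object on every call.
        string target = useLiteral ? "wat" : new string('x', 10);
        return new WeakReference<string>(target);
    }

    // Forces a full collection and reports whether the target is still reachable.
    public static bool IsAliveAfterCollect(WeakReference<string> weak)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        string ignored;
        return weak.TryGetTarget(out ignored);
    }
}
```

Calling WeakStringDemo.IsAliveAfterCollect(WeakStringDemo.MakeWeak(true)) should report that the literal is still alive, while passing false lets you observe what your runtime does with a non-interned string.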

Related

Do references to collections cause any trouble with threads?

I have something like the following code:
public class MainAppClass : BaseClass
{
public IList<Token> TokenList
{
get;
set;
}
// This is executed before any thread is created
public override void OnStart()
{
MyDataBaseContext dbcontext = new MyDataBaseContext();
this.TokenList = dbcontext.GetTokenList();
}
// After this the application will create a list of many items to be iterated
// and will create as many threads as are defined in the configuration (5 at the moment),
// then it will distribute those items among the threads for parallel processing.
// The OnProcessItem will be executed for every item and could be running on different threads
protected override void OnProcessItem(AppItem processingItem)
{
string expression = getExpressionFromItem();
expression = Utils.ReplaceTokens(processingItem, expression, this);
}
}
public class Utils
{
public static string ReplaceTokens(AppItem currentProcessingItem, string expression, MainAppClass mainAppClass)
{
Regex tokenMatchExpression = new Regex(@"\[[^+~][^$*]+?\]", RegexOptions.IgnoreCase);
Match tokenMatch = tokenMatchExpression.Match(expression);
if(tokenMatch.Success == false)
{
return expression;
}
string tokenName = tokenMatch.Value;
// This line is my principal suspect of messing in some way with the multiple threads
Token tokenDefinition = mainAppClass.TokenList.Where(x => x.Name == tokenName).First();
Regex tokenElementExpression = new Regex(tokenDefinition.Value);
MyRegexSearchResult evaluationResult = Utils.GetRegexMatches(currentProcessingItem, tokenElementExpression).FirstOrDefault();
string tokenValue = string.Empty;
if (evaluationResult != null && evaluationResult.match.Groups.Count > 1)
{
tokenValue = evaluationResult.match.Groups[1].Value;
}
else if (evaluationResult != null && evaluationResult.match.Groups.Count == 1)
{
tokenValue = evaluationResult.match.Groups[0].Value;
}
expression = expression.Replace("[" + tokenName + "]", tokenValue);
return expression;
}
}
The problem I have right now is that, for some reason, the token value replaced in the expression gets mixed up with one from another thread, resulting in an incorrect replacement where a different value was expected, e.g.:
Expression: Hello [Name]
Expected result for item 1: Hello Nick
Expected result for item 2: Hello Sally
Actual result for item 1: Hello Nick
Actual result for item 2: Hello Nick
The actual result is not always the same: sometimes it is the expected one, sometimes both expressions are replaced with the value expected for item 1, and sometimes both are replaced with the value expected for item 2.
I can't find what's wrong with the code. I expected all the variables within the static method to have their own scope in every thread, but that doesn't seem to be the case.
Any help will be much appreciated!
Yeah, static objects only have one instance throughout the program - creating new threads doesn't create separate instances of those objects.
You've got a couple different ways of dealing with this.
Door #1. If the threads need to operate on different instances, you'll need to un-static the appropriate places. Give each thread its own instance of the object you need it to modify.
Door #2. Thread-safe objects (as mentioned by Fildor). I'll admit, I'm a bit less familiar with this door, but it's probably the right approach if you can get it to work (less complexity in code is awesome).
Door #3. Lock on the object directly. One option, when modifying the global static, is to put the modification inside a lock(myObject) { } block. Locks are pretty simple and straightforward (so much simpler than in the old C/C++ days), and they ensure that multiple concurrent modifications don't corrupt the object.
Door #4. Padlock the encapsulated class. Don't allow outside callers to modify the static variable at all. Instead, they have to call global getters/setters. Then, have a private object inside the class that serves simply as a lockable object - and have the getters/setters lock that lockable object whenever they're reading/writing it.
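As a rough illustration of Door #4, here is a minimal sketch assuming a hypothetical TokenStore class (the name and its members are made up for this example and are not part of the question's code). The shared list is private, and every read or write goes through a method that takes the same private lock object.

```csharp
using System.Collections.Generic;

public static class TokenStore
{
    // Private object used only for locking; callers can never lock on it.
    private static readonly object Padlock = new object();

    // The shared state is private, so all access goes through the methods below.
    private static readonly List<string> Tokens = new List<string>();

    public static void Add(string token)
    {
        lock (Padlock)
        {
            Tokens.Add(token);
        }
    }

    // Returns a copy, so callers never enumerate the list while it is being modified.
    public static string[] Snapshot()
    {
        lock (Padlock)
        {
            return Tokens.ToArray();
        }
    }
}
```

Returning a copy from Snapshot costs an allocation per call, but it means no caller ever holds a reference to the list itself.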
The tokenValue that you're replacing the token with is coming from evaluationResult.
evaluationResult is based on Utils.GetRegexMatches(currentProcessingItem, tokenElementExpression).
You might want to check GetRegexMatches to see if it's using any static resources, but my best guess is that it's being passed the same currentProcessingItem value in multiple threads.
Look at the code that splits up the AppItems. You may have an "access to modified closure" in there. For example:
for(int i = 0; i < appItems.Length; i++)
{
var thread = new Thread(() => {
// Since the variable `i` is shared across all of the
// iterations of this loop, `appItems[i]` is going to be
// based on the value of `i` at the time that this line
// of code is run, not at the time when the thread is created.
var appItem = appItems[i];
...
});
...
}
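A common way to avoid the modified closure is to copy the loop variable into a local declared inside the loop body, so that each lambda captures its own copy. The sketch below assumes a hypothetical ClosureDemo.ProcessAll method and uses plain strings in place of AppItem; it is an illustration, not the asker's actual thread-distribution code.

```csharp
using System.Collections.Generic;
using System.Threading;

public static class ClosureDemo
{
    public static string[] ProcessAll(string[] appItems)
    {
        var results = new string[appItems.Length];
        var threads = new List<Thread>();

        for (int i = 0; i < appItems.Length; i++)
        {
            // A fresh variable per iteration: each lambda captures its own `index`,
            // not the single `i` that the loop keeps incrementing.
            int index = i;
            var thread = new Thread(() =>
            {
                results[index] = "Hello " + appItems[index];
            });
            threads.Add(thread);
            thread.Start();
        }

        // Wait for every thread before reading the results.
        foreach (var thread in threads)
        {
            thread.Join();
        }
        return results;
    }
}
```

Each thread writes only to its own slot of the results array, so no lock is needed here.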

Looking for better understanding on the coding standards

I installed CodeCracker
This is my original method.
//Add
public bool AddItemToMenu(MenuMapper mapperObj)
{
using (fb_databaseContext entities = new fb_databaseContext())
{
try
{
FoodItem newItem = new FoodItem();
newItem.ItemCategoryID = mapperObj.ItemCategory;
newItem.ItemName = mapperObj.ItemName;
newItem.ItemNameInHindi = mapperObj.ItemNameinHindi;
entities.FoodItems.Add(newItem);
entities.SaveChanges();
return true;
}
catch (Exception ex)
{
//handle exception
return false;
}
}
}
This is the recommended method by CodeCracker.
public static bool AddItemToMenu(MenuMapper mapperObj)
{
using (fb_databaseContext entities = new fb_databaseContext())
{
try
{
var newItem = new FoodItem
{
ItemCategoryID = mapperObj.ItemCategory,
ItemName = mapperObj.ItemName,
ItemNameInHindi = mapperObj.ItemNameinHindi,
};
entities.FoodItems.Add(newItem);
entities.SaveChanges();
return true;
}
catch (Exception ex)
{
//handle exception
return false;
}
}
}
As far as I know, static methods occupy memory when the application initializes, irrespective of whether they are called or not.
When I already know the return type, why should I use the var keyword?
Why is this way of object initialization better?
I am very curious to get these answers, as they can guide me a long way.
Adding one more method:-
private string GeneratePaymentHash(OrderDetailMapper order)
{
var payuBizzString = string.Empty;
payuBizzString = "hello|" + order.OrderID + "|" + order.TotalAmount + "|FoodToken|" + order.CustomerName + "|myemail@gmail.com|||||||||||10000";
var sha1 = System.Security.Cryptography.SHA512Managed.Create();
var inputBytes = Encoding.ASCII.GetBytes(payuBizzString);
var hash = sha1.ComputeHash(inputBytes);
var sb = new StringBuilder();
for (var i = 0; i < hash.Length; i++)
{
sb.Append(hash[i].ToString("X2"));
}
return sb.ToString().ToLower();
}
As far as I know, static methods occupy memory when the application initializes, irrespective of whether they are called or not.
All methods do that. You are probably confusing this with static fields, which occupy memory even when no instances of the class are created. Generally, if a method can be made static, it should be made static, except when it is an implementation of an interface.
When I already know the return type, why should I use the var keyword?
To avoid specifying the type twice on the same line of code.
Why is this way of object initialization better?
Because it groups the assignments visually and reduces the clutter around them, making the code easier to read.
Static methods don't occupy any more memory than instance methods. Additionally, your method should be static because it doesn't rely in any way on accessing itself (this) as an instance.
Using var is most likely for readability. var is always only 3 letters while many types are much longer and can force the name of the variable much further along the line.
The object initializer is, again, most likely for readability by not having the variable name prefix all the attributes. It also means all your assignments are done at once.
In most cases, this tool you're using seems to be about making code more readable and clean. There may be certain cases where changes will boost performance by hinting to the compiler about your intentions, but generally, this is about being able to understand the code at a glance.
Only concern yourself with performance if you're actually experiencing performance issues. If you are experiencing performance issues then use some profiling tools to measure your application performance and find out which parts of your code are running slowly.
As far as I know, static methods occupy memory when the application initializes, irrespective of whether they are called or not.
This is true for all kinds of methods, so it's irrelevant.
When I already know the return type, why should I use the var keyword?
var is a personal preference (it is syntactic sugar). The analyzer presumably reasons that since the type is already evident from the right-hand side, there is no need to spell it out explicitly, so it recommends var instead. Personally, I use var as much as possible. On this topic, you might want to read "Use of var keyword in C#".
Why is this way of object initialization better?
I can't say an object initializer is always better, but an object initializer guarantees that your newItem will either be null or fully initialized, since your
var newItem = new FoodItem
{
ItemCategoryID = mapperObj.ItemCategory,
ItemName = mapperObj.ItemName,
ItemNameInHindi = mapperObj.ItemNameinHindi,
};
is actually equivalent to
var temp = new FoodItem();
temp.ItemCategoryID = mapperObj.ItemCategory;
temp.ItemName = mapperObj.ItemName;
temp.ItemNameInHindi = mapperObj.ItemNameinHindi;
var newItem = temp;
so this is not the same as your first version. There is a nice answer on Code Review about this subject: https://codereview.stackexchange.com/a/4330/6136 You might also want to check: http://community.bartdesmet.net/blogs/bart/archive/2007/11/22/c-3-0-object-initializers-revisited.aspx
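The practical difference shows up when one of the assignments throws. The sketch below uses a simplified, hypothetical FoodItem class and a made-up FailingLookup helper (neither is the asker's real code) to contrast the two forms: with the initializer, the variable is only assigned once every property has been set, so it stays null on failure; with separate assignments, a half-initialized object is left behind.

```csharp
using System;

// Simplified stand-in for the FoodItem entity, for illustration only.
public class FoodItem
{
    public int ItemCategoryID { get; set; }
    public string ItemName { get; set; }
}

public static class InitializerDemo
{
    // Hypothetical helper that always fails, to simulate an error mid-initialization.
    private static string FailingLookup()
    {
        throw new InvalidOperationException("lookup failed");
    }

    public static FoodItem WithInitializer()
    {
        FoodItem item = null;
        try
        {
            // The object is built in a hidden temporary; `item` is assigned
            // only after all property setters have completed.
            item = new FoodItem { ItemCategoryID = 1, ItemName = FailingLookup() };
        }
        catch (InvalidOperationException)
        {
        }
        return item; // still null
    }

    public static FoodItem WithSeparateAssignments()
    {
        FoodItem item = null;
        try
        {
            item = new FoodItem();
            item.ItemCategoryID = 1;
            item.ItemName = FailingLookup();
        }
        catch (InvalidOperationException)
        {
        }
        return item; // non-null, but ItemName was never set
    }
}
```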
Many of these are personal preferences, but most coding standards exist to make your code easier for other programmers to read.
Changing the static method to an instance method takes more advantage of OO concepts, limits the amount of mixed state, and also allows you to add interfaces so you can mock out the class for testing.
Variables declared with var are still statically typed. Because we should concentrate on giving our objects meaningful names, explicitly declaring the type becomes redundant.
As for the object initializer, it just groups everything that is required to set up the object, which makes it a little easier to read.
As far as I know, static methods occupy memory when the application initializes, irrespective of whether they are called or not.
Methods that are never called may or may not be optimized away, depending on the compiler, debug vs. release and such. Static vs. non-static does not matter.
A method that doesn't need a this reference can (and IMO should) be static.
When I already know the return type, why should I use the var keyword?
No reason. There's no difference; do whatever you prefer.
Why is this way of object initialization better?
The object initializer syntax generates the same code for most practical purposes (see @SonerGönül's answer for the details). Mostly it's a matter of preference. Personally, I find the object initializer syntax easier to read and maintain.

Do array properties cause memory allocation on the heap?

Consider the following:
public class FooBar {
public int[] SomeNumbers {
get { return _someNumbers; }
}
private int[] _someNumbers;
public FooBar() {
_someNumbers = new int[2];
_someNumbers[0] = 1;
_someNumbers[1] = 2;
}
}
// in some other method somewhere...
FooBar foobar = new FooBar();
Debug.Log(foobar.SomeNumbers[0]);
What I am wondering is: does calling the SomeNumbers property cause a heap allocation? Basically, does it cause a copy of the array to be created, or is it just a pointer?
I ask because I am trying to resolve some GC issues caused by functions that return arrays, and I want to make sure my idea of caching some values like this will actually make a difference.
Arrays are always reference types, so yes, it is "basically returning a pointer".
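One way to convince yourself of this is to read the property twice and compare the references. The sketch below is a trimmed-down version of the FooBar class from the question plus a hypothetical ArrayPropertyDemo helper; it only illustrates that both reads return the same array object.

```csharp
public class FooBar
{
    private readonly int[] _someNumbers = { 1, 2 };

    // Returns the reference to the existing array; nothing is copied.
    public int[] SomeNumbers
    {
        get { return _someNumbers; }
    }
}

public static class ArrayPropertyDemo
{
    public static bool ReturnsSameInstance()
    {
        var foobar = new FooBar();
        int[] first = foobar.SomeNumbers;
        int[] second = foobar.SomeNumbers;

        // A write through one reference is visible through the other,
        // because both point at the same array on the heap.
        first[0] = 42;
        return object.ReferenceEquals(first, second) && second[0] == 42;
    }
}
```

The flip side is that callers can modify the cached array, so this kind of caching is only safe if they treat it as read-only.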
If you are trying to debug memory issues I recommend using a memory profiler. There is one built in to Visual Studio or you can use a 3rd party one (I personally like DotMemory, it has a 5 day free trial). Using a profiler will help you identify what is creating memory objects and what is keeping memory objects alive.

Why string pointer position is different?

Why is the string pointer position different each time I run the application when I use StringBuilder, but the same when I declare a variable?
void Main()
{
string str_01 = "my string";
string str_02 = GetString();
unsafe
{
fixed (char* pointerToStr_01 = str_01)
{
fixed (char* pointerToStr_02 = str_02)
{
Console.WriteLine((Int64)pointerToStr_01);
Console.WriteLine((Int64)pointerToStr_02);
}
}
}
}
private string GetString()
{
StringBuilder sb = new StringBuilder();
sb.Append("my string");
return sb.ToString();
}
Output:
40907812
178488268
next time:
40907812
179023248
next time:
40907812
178448964
str_01 holds a reference to constant string. StringBuilder however builds string instances dynamically, so the returned string instance is not referentially the same instance as the constant string with the same content. System.Object.ReferenceEquals() will return false.
Since the str_01 is a reference to a constant string, its data is probably stored in a data section of the executable, which always gets the same address in the application virtual address space.
Edit:
You can see the "my string" text in UTF-8 encoding when you open the compiled .exe file using PE.Explorer or similar software. It is present in the .data section of the file, including a preferred Virtual Address where the section should be loaded in process virtual memory.
However, I have not been able to reproduce str_01 having the same address across multiple runs of the application, probably because my x64 Windows 8.1 performs address space layout randomization (ASLR). Because of that, all pointers will differ across multiple runs of the application, even those that point directly into loaded PE sections.
Just because two strings are equal doesn't mean they refer to the same object (which would mean having the same pointer). C# does not intern all strings automatically, for performance reasons among others. If you want the pointers to be the same for both strings, you can intern str_02 using string.Intern.
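A small sketch of the effect of string.Intern, using reference comparison instead of raw pointers so that no unsafe code is needed. The InternDemo class and its method names are hypothetical; the behavior shown assumes the literal is in the intern pool, which is the usual case for literals compiled into the running assembly.

```csharp
using System.Text;

public static class InternDemo
{
    // Builds "my string" at run time, as GetString() does in the question.
    public static string BuildString()
    {
        var sb = new StringBuilder();
        sb.Append("my string");
        return sb.ToString();
    }

    public static bool SameInstanceBeforeIntern()
    {
        string literal = "my string";
        string built = BuildString();
        // Equal contents, but the built string is a separate object.
        return object.ReferenceEquals(literal, built);
    }

    public static bool SameInstanceAfterIntern()
    {
        string literal = "my string";
        // Intern returns the pooled instance with the same contents,
        // which is expected to be the literal's instance.
        string built = string.Intern(BuildString());
        return object.ReferenceEquals(literal, built);
    }
}
```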
A fixed statement pins the string and gives you a pointer to its memory.
Since str_01 is a constant string, its memory is allocated at startup and the pointer refers to the same location every time:
fixed (char* pointerToStr_01 = str_01)
But in the case of
fixed (char* pointerToStr_02 = str_02)
the memory is allocated dynamically, so the location it points to varies every time.
Hence the difference in the string pointer each time we run the application.
I do not agree that the output of
Console.WriteLine((Int64)pointerToStr_01);
is always the same for you; I tested it myself to make my point clear.
Let's have a look at both cases:
In the case of string str_01 = "my string", printing the pointer value of this variable will not give the same result as in the previous run, because a new String object is created on every run (strings are immutable) and "my string" is assigned to it. Within the fixed statement you are printing the pointer's value, which goes out of scope, and the previous value is not remembered when you execute the program again.
By now, you can probably explain the behavior of StringBuilder yourself.
Also check with:
string str_01 = GetString();
private static string GetString()
{
var sb = new String(new char[] {'m','y',' ','s','t','r','i','n','g'});
return sb;
}

Why does .NET create new substrings instead of pointing into existing strings?

From a brief look using Reflector, it looks like String.Substring() allocates memory for each substring. Am I correct that this is the case? I thought that wouldn't be necessary since strings are immutable.
My underlying goal was to create a IEnumerable<string> Split(this String, Char) extension method that allocates no additional memory.
One reason why most languages with immutable strings create new substrings rather than refer into existing strings is because this will interfere with garbage collecting those strings later.
What happens if a string is kept alive only for its substring, after the larger string has otherwise become unreachable? The larger string will be uncollectable, because collecting it would invalidate the substring. What seemed like a good way to save memory in the short term becomes a memory leak in the long term.
This is not possible without poking around inside .NET's String class. You would have to pass around references to a mutable array and make sure no one corrupted it.
.NET will create a new string every time you ask it to. The only exception is interned strings, which are created by the compiler (and can also be created by you); they are placed in memory once, and references then point to that single string, for memory and performance reasons.
Each string has to have its own string data, given the way the String class is implemented.
You can make your own SubString structure that uses part of a string:
public struct SubString {
private string _str;
private int _offset, _len;
public SubString(string str, int offset, int len) {
_str = str;
_offset = offset;
_len = len;
}
public int Length { get { return _len; } }
public char this[int index] {
get {
if (index < 0 || index >= _len) throw new IndexOutOfRangeException();
return _str[_offset + index];
}
}
public void WriteToStringBuilder(StringBuilder s) {
s.Append(_str, _offset, _len);
}
public override string ToString() {
return _str.Substring(_offset, _len);
}
}
You can flesh it out with other methods, such as comparison, that can also be done without extracting the string.
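Along the same lines, a Split that creates no substrings could yield offset/length pairs into the original string. The following is a minimal sketch with hypothetical names (Segment, SegmentSplitter, SplitSegments). Note the assumption: it avoids allocating substrings, but the iterator itself is still a small heap allocation, so it is not strictly allocation-free.

```csharp
using System.Collections.Generic;

// Hypothetical value type describing a slice of some source string.
public struct Segment
{
    public readonly int Offset;
    public readonly int Length;

    public Segment(int offset, int length)
    {
        Offset = offset;
        Length = length;
    }
}

public static class SegmentSplitter
{
    // Yields one Segment per field; no substring is created until the caller asks for one.
    public static IEnumerable<Segment> SplitSegments(this string text, char separator)
    {
        int start = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == separator)
            {
                yield return new Segment(start, i - start);
                start = i + 1;
            }
        }
        // The final field runs from the last separator to the end of the string.
        yield return new Segment(start, text.Length - start);
    }
}
```

A caller that really needs a string for one of the fields can still call text.Substring(segment.Offset, segment.Length) on just that field.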
Because strings are immutable in .NET, every string operation that results in a new string object will allocate a new block of memory for the string contents.
In theory, it could be possible to reuse the memory when extracting a substring, but that would make garbage collection very complicated: what if the original string is garbage-collected? What would happen to the substring that shares a piece of it?
Of course, nothing prevents the .NET BCL team from changing this behavior in future versions of .NET. It wouldn't have any impact on existing code.
Adding to the point that strings are immutable, you should be aware that the following snippet will generate multiple String instances in memory.
String s1 = "Hello", s2 = ", ", s3 = "World!";
String res = s1 + s2 + s3;
s1+s2 => new string instance (temp1)
temp1 + s3 => new string instance (temp2)
res is a reference to temp2.
