Ignite's cache methods hang - c#

I'm using Ignite.NET 2.7.6, and sometimes it hangs when calling cache methods such as TryGet, or MoveNext on the cache's enumerator.
I have one server and multiple client nodes, the hang occurs on the client-side.
Typical call stack:
Apache.Ignite.Core.dll!Apache.Ignite.Core.Impl.Unmanaged.Jni.Env.CallVoidMethod(Apache.Ignite.Core.Impl.Unmanaged.Jni.GlobalRef obj, System.IntPtr methodId, long* argsPtr) Line 213 C#
Apache.Ignite.Core.dll!Apache.Ignite.Core.Impl.Unmanaged.UnmanagedUtils.TargetOutStream(Apache.Ignite.Core.Impl.Unmanaged.Jni.GlobalRef target, int opType, long memPtr) Line 145 C#
Apache.Ignite.Core.dll!Apache.Ignite.Core.Impl.PlatformJniTarget.OutStream(int type, System.Func readAction) Line 147 C#
Apache.Ignite.Core.dll!Apache.Ignite.Core.Impl.PlatformTargetAdapter.DoInOp(int type, System.Func action) Line 193 C#
Apache.Ignite.Core.dll!Apache.Ignite.Core.Impl.Cache.CacheEnumerator.MoveNext() Line 55 C#
Apache.Ignite.Core.dll!Apache.Ignite.Core.Impl.Cache.CacheEnumeratorProxy.MoveNext() Line 71 C#
AlphaLib.dll!Casino.Table.Enumerator.MoveNext() Line 503 C#
It hangs in CallVoidMethod. I tried to reproduce this on a simple project but failed.
This reproduces much more often if I start the client on the machine where the server node was started.
Any ideas why this happens?
ADDED
I inspected JVM state in case of hanging, here is the full stack: https://pastebin.com/v5HiuQWb
Looks like this thread is stuck:
"Thread-11" #148 prio=5 os_prio=0 tid=0x000001ae99665800 nid=0x34c4 in Object.wait() [0x00000050156bc000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at org.apache.ignite.internal.processors.cache.query.GridCacheQueryFutureAdapter.internalIterator(GridCacheQueryFutureAdapter.java:301)
- locked <0x00000005d494de98> (a org.apache.ignite.internal.processors.cache.query.GridCacheDistributedQueryFuture)
at org.apache.ignite.internal.processors.cache.query.GridCacheQueryFutureAdapter.next(GridCacheQueryFutureAdapter.java:158)
at org.apache.ignite.internal.processors.cache.query.GridCacheDistributedQueryManager$5.onHasNext(GridCacheDistributedQueryManager.java:642)
at org.apache.ignite.internal.util.GridCloseableIteratorAdapter.hasNextX(GridCloseableIteratorAdapter.java:53)
at org.apache.ignite.internal.util.lang.GridIteratorAdapter.hasNext(GridIteratorAdapter.java:45)
at org.apache.ignite.internal.processors.platform.cache.query.PlatformAbstractQueryCursor.processOutStream(PlatformAbstractQueryCursor.java:92)
at org.apache.ignite.internal.processors.platform.PlatformTargetProxyImpl.outStream(PlatformTargetProxyImpl.java:93)
It seems like this code hangs:
long waitTime = timeout == 0 ? Long.MAX_VALUE : timeout - (U.currentTimeMillis() - startTime);
if (waitTime <= 0) {
it = Collections.<R>emptyList().iterator();
break;
}
synchronized (this) {
try {
if (queue.isEmpty() && !isDone())
wait(waitTime); /* HERE!!! */
}
catch (InterruptedException e) {
Thread.currentThread().interrupt();
throw new IgniteCheckedException("Query was interrupted: " + qry, e);
}
}
I suppose that timeout is 0 in my case, so it waits indefinitely. I could set the timeout to a finite value, but that doesn't look like a good solution.
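The wait-time arithmetic in the Java snippet above can be looked at in isolation. The following is a minimal C# sketch of that one expression only; QueryWait and ComputeWaitMillis are hypothetical names introduced for illustration and are not part of Ignite or Ignite.NET.

```csharp
using System;

// Hypothetical helper (not an Ignite API) that mirrors the wait-time
// expression from the Java snippet quoted above.
public static class QueryWait
{
    // timeoutMs == 0 means "no timeout": the result is long.MaxValue,
    // so the caller effectively waits without a deadline.
    // Otherwise the result is the time left until the deadline; a value
    // less than or equal to zero means the deadline has already passed.
    public static long ComputeWaitMillis(long timeoutMs, long startMs, long nowMs)
    {
        return timeoutMs == 0
            ? long.MaxValue
            : timeoutMs - (nowMs - startMs);
    }
}
```

With a timeout of 0 the remaining time never shrinks, which matches the observation that the thread keeps re-entering wait() for as long as the queue stays empty and the future is not done.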

In our network there is a Symantec protection system that unexpectedly started closing the ports used by Ignite, on both the client and the server side. It only began closing these ports some time after launch, or after some event; I never worked out which. I found this in the Symantec log.
After adding the ports in use to the whitelist, the problem was solved.

Related

Foreground services and repetitive tasks which need to be executed on time

I'm developing an app which basically performs some tasks on timer tick (in this case - searching for beacons) and sends results to the server. My goal was to create an app which does its job constantly in the background. Fortunately, I'm using logging all over the code, so when we started to test it we found that sometime later the timer's callback wasn't being called on time. There were some pauses which obviously had been caused by standby and doze mode. At that moment I was using a background service and System.Threading.Timer. Then, after some research, I rewrote the services to use Alarm Manager + Wake locks, but the pauses were still there. The next try was to make the service foreground and use it with a Handler to post delayed tasks and everything seemed to be fine while the device was connected to the computer. When the device is not connected to a charger those pauses are here again. The interesting thing is that we cannot actually predict this behavior. Sometimes it works perfectly fine and sometimes not. And this is really strange because the code to schedule it is pretty simple and straightforward:
...
private int scanThreadsCount = 0;
private Android.OS.Handler handler = new Android.OS.Handler();
private bool LocationInProgress
{
get { return Interlocked.CompareExchange(ref scanThreadsCount, 0, 0) != 0; }
}
public void ForceLocation()
{
if (!LocationInProgress) DoLocation();
}
private async void DoLocation()
{
Interlocked.Increment(ref scanThreadsCount);
Logger.Debug("Location is started");
try
{
// Location...
}
catch (Exception e)
{
Logger.Error(e, "Location cannot be performed due to an unexpected error");
}
finally
{
if (LocationInterval > 0)
{
// It's here. The location interval is 60 seconds
// and the service is running in the foreground!
// But in the screenshot we can see the delay which
// sometimes reaches 10 minutes or even more
handler.PostDelayed(ForceLocation, LocationInterval * 1000);
}
Logger.Debug("Location has been finished");
Interlocked.Decrement(ref scanThreadsCount);
}
}
...
A small delay might be fine in other apps, but I need this service to do its job strictly on time. The callback is being called a few seconds or even a few minutes late, and that's not acceptable.
The Android documentation says that foreground services are not restricted by standby and doze mode, but I cannot find the cause of this strange behavior. Why is the callback not being called on time? Where do these 10-minute pauses come from? It's pretty frustrating because I cannot move further until I have a robust basis. Does anybody know the reason for this behavior, or have any suggestions on how I can get the callback to be executed on time?
P.S. The current version of the app is here. I know, it's quite boring trying to figure out what is wrong with one's code, but there are only 3 files which have to do with that problem:
~/Services/BeaconService.cs
~/Services/BeaconServiceScanFunctionality.cs
~/Services/BeaconServiceSyncFunctionality.cs
The project was provided for those who would probably want to try it in action and figure it out by themselves.
Any help will be appreciated!
Thanks in advance

After using the InternetConnect() API in wininet how can I tell if I'm still connected?

I use the InternetConnect() method from the WinINet APIs. I connect to my FTP server just fine with no issues. After I connect, I wait about 1 min and the server disconnects me because of no activity as expected. I then try to send a file but I'm not connected.
Is there a way to "check" the FTP connection to see if I'm still connected? Or is there some type of way for me to attach an event to tell me when I get disconnected?
I haven't used wininet for FTP, and I use the classes, not the global functions directly. But I suspect that CFtpConnection behaves the same way as CHttpConnection in this respect. Anyway, you might learn something from what I have discovered about the latter.
CHttpConnection seems to be a high and abstract level of connection. When I started out I expected its member functions to throw exceptions once the server closed the underlying socket (for timeout). I now know better or at least believe otherwise. NORMAL closing of the socket does NOT cause exceptions to be thrown at this high level of classy wininet. You might suspect as much from inspecting the wininet error codes: There is no code corresponding to the server having closed the connection.
I experimented with this and found: The server closing the socket (for timeout) is considered normal and does not cause an exception to be thrown. You can go ahead and use CHttpConnection without worrying about this. It will simply reconnect if needed without alerting you. So once you have called GetHttpConnection and got your CHttpConnection object, it will normally last forever!
The exceptions that might be thrown, ERROR_INTERNET_CONNECTION_ABORTED and ERROR_INTERNET_CONNECTION_RESET, are caused by abnormal conditions, e.g. a proxy server crashing or somebody accidentally pulling the power plug on your modem. The server closing the socket for timeout is considered NORMAL and is transparent to the user of wininet classes.
So the tentative conclusion is that you don't have to worry about the connection being closed by the server. If that happens, CHttpConnection will reconnect backstage and you won't be bothered. You can pretend that the connection always stays open - it seems so to the user of wininet classes.
Consider the following experiment. A connection is opened and then a request is sent once a minute. The function returns once an exception is thrown. But if not, then it loops forever. I tried it on two different web sites: An exception is NEVER thrown! Despite a whole minute of inactivity between requests.
int httpclient::test(string host)
{
int flags = INTERNET_FLAG_RELOAD;
int port = INTERNET_DEFAULT_HTTP_PORT;
CHttpConnection *connection = session.GetHttpConnection(host.cstring(),flags,port);
int secs = 0;
while (true)
{
CHttpFile *fil;
try
{
fil = connection->OpenRequest(CHttpConnection::HTTP_VERB_HEAD, "index.htm",
0,1,0, "HTTP/1.1", flags);
}
catch (CInternetException *exc)
{
connection->Close();
int feil = exc->m_dwError;
exc->Delete();
return -feil;
}
fil->AddRequestHeaders("Connection: Keep-Alive");
try
{
fil->SendRequest();
}
catch (CInternetException *exc)
{
connection->Close();
int feil = exc->m_dwError;
exc->Delete();
return feil;
}
fil->Close();
Sleep(60 * 1000);
secs += 60;
printf("%d seconds passed\n", secs);
}
return 0;
}
Take all this with a grain of salt. wininet is poorly documented and all I know is what experiments have taught me.

Detecting unexpected socket disconnect

This is not a question about how to do this, but a question about whether it's wrong what I'm doing. I've read that it's not possible to detect if a socket is closed unexpectedly (like killing the server/client process, pulling the network cable) while waiting for data (BeginReceive), without use of timers or regular sent messages, etc. But for quite a while I've been using the following setup to do this, and so far it has always worked perfectly.
public void OnReceive(IAsyncResult result)
{
try
{
var bytesReceived = this.Socket.EndReceive(result);
if (bytesReceived <= 0)
{
// normal disconnect
return;
}
// ...
this.Socket.BeginReceive...;
}
catch // SocketException
{
// abnormal disconnect
}
}
Now, since I've read it's not easily possible, I'm wondering if there's something wrong with my method. Is there? Or is there a difference between killing processes and pulling cables and similar?
It's perfectly possible and OK to do this. The general idea is:
If EndReceive returns anything other than zero, you have incoming data to process.
If EndReceive returns zero, the remote host has closed its end of the connection. That means it can still receive data you send if it's programmed to do so, but cannot send any more of its own under any circumstances. Usually when this happens you will also close your end of the connection, thus completing an orderly shutdown, but that's not mandatory.
If EndReceive throws, there has been an abnormal termination of the connection (process killed, network cable cut, power lost, etc).
A couple of points you have to pay attention to:
EndReceive can never return less than zero (the test in your code is misleading).
If it throws it can throw other types of exception in addition to SocketException.
If it returns zero you must be careful to stop calling BeginReceive; otherwise you will begin an infinite and meaningless ping-pong game between BeginReceive and EndReceive (it will show in your CPU usage). Your code already does this, so no need to change anything.
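The three cases described above can be sketched as a small helper. This is an illustration only: ReceiveOutcome and ReceiveClassifier are hypothetical names, and the EndReceive call is passed in as a delegate so the sketch stays independent of any particular socket. It catches SocketException and ObjectDisposedException, both of which Socket.EndReceive is documented to throw; treating both as abnormal termination is an assumption of this sketch.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical names, introduced only for this illustration.
public enum ReceiveOutcome { Data, OrderlyShutdown, AbnormalTermination }

public static class ReceiveClassifier
{
    // endReceive wraps the real call, e.g. () => socket.EndReceive(result).
    public static ReceiveOutcome Classify(Func<int> endReceive, out int bytesReceived)
    {
        bytesReceived = 0;
        try
        {
            bytesReceived = endReceive();
        }
        catch (SocketException)
        {
            // Connection reset, network failure and similar.
            return ReceiveOutcome.AbnormalTermination;
        }
        catch (ObjectDisposedException)
        {
            // The socket was closed locally while the receive was pending.
            return ReceiveOutcome.AbnormalTermination;
        }

        // Zero bytes means the remote side has shut down its sending half;
        // the caller should not start another receive in that case.
        return bytesReceived == 0
            ? ReceiveOutcome.OrderlyShutdown
            : ReceiveOutcome.Data;
    }
}
```

In a receive callback this would be used as ReceiveClassifier.Classify(() => this.Socket.EndReceive(result), out var n), with a new BeginReceive issued only when the outcome is Data.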

Timeouts in Silverlight Sockets

I'm using Sockets in my Silverlight application to stream data from a server to a client.
However, I'm not quite sure how timeouts are handled in a Silverlight Socket.
In the documentation, I cannot see anything like ReceiveTimeout for Silverlight.
Are user-defined timeouts possible? How can I set them? How can I get notifications when a send / receive operation times out?
Are there default timeouts? How big are they?
If there are no timeouts: what's the easiest method to implement these timeouts manually?
I've checked the Socket class in Reflector and there's not a single relevant setsockopt call that deals with timeouts - except in the Dispose method. Looks like Silverlight simply relies on the default timeout of the WinSock API.
The Socket class also contains a "SetSocketOption" method which is private that you might be able to call via reflection - though it is very likely that you will run into a security exception.
Since I couldn't find any nice solution, I solved the problem manually by creating a System.Threading.Timer with code similar to the following:
System.Threading.Timer t;
bool timeout;
[...]
// Initialization
t = new Timer((s) => {
lock (this) {
timeout = true;
Disconnected();
}
});
[...]
// Before each asynchronous socket operation
t.Change(10000, System.Threading.Timeout.Infinite);
[...]
// In the callback of the asynchronous socket operations
lock (this) {
t.Change(System.Threading.Timeout.Infinite, System.Threading.Timeout.Infinite);
if (!timeout) {
// Perform work
}
}
This also handles cases where the timeout is caused by simple lag: if the operation took too long, the callback returns immediately without doing any work.
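The same idea can be packaged as a self-contained class. The sketch below is a variation on the fragments above rather than the exact code from this answer: OperationTimeout, Arm and TryComplete are hypothetical names, it resets the timed-out flag for each operation, it uses a generation counter so that a stale timer callback cannot expire a newer operation, and it invokes the timeout handler outside the lock. It is written against the regular .NET System.Threading.Timer; whether every call used here is available in a given Silverlight version is an assumption that would need checking.

```csharp
using System;
using System.Threading;

// Hypothetical helper illustrating the timer-based timeout pattern.
public sealed class OperationTimeout : IDisposable
{
    private readonly object gate = new object();
    private readonly Action onTimeout;
    private Timer timer;
    private int generation;
    private bool pending;
    private bool timedOut;

    public OperationTimeout(Action onTimeout)
    {
        this.onTimeout = onTimeout;
    }

    // Call right before starting an asynchronous socket operation.
    public void Arm(int milliseconds)
    {
        lock (gate)
        {
            int current = ++generation;
            pending = true;
            timedOut = false;
            timer?.Dispose();
            // One-shot timer; the callback remembers which operation it belongs to.
            timer = new Timer(_ => Expire(current), null, milliseconds, Timeout.Infinite);
        }
    }

    // Call first in the completion callback.
    // Returns false if the operation had already timed out.
    public bool TryComplete()
    {
        lock (gate)
        {
            timer?.Dispose();
            timer = null;
            pending = false;
            return !timedOut;
        }
    }

    private void Expire(int firedGeneration)
    {
        lock (gate)
        {
            // Ignore callbacks from operations that already completed or were re-armed.
            if (!pending || firedGeneration != generation) return;
            pending = false;
            timedOut = true;
        }
        onTimeout(); // invoked outside the lock, e.g. to disconnect
    }

    public void Dispose()
    {
        lock (gate)
        {
            timer?.Dispose();
            timer = null;
        }
    }
}
```

Usage follows the original pattern: call Arm(10000) before each asynchronous operation, and in the completion callback do the work only if TryComplete() returns true.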
I solved this issue for my project sharpLightFtp like:
I created a class that is injected into the UserToken property of a System.Net.Sockets.SocketAsyncEventArgs instance and holds a System.Threading.AutoResetEvent, which is used to wait, with a timeout, for a signal after ConnectAsync, ReceiveAsync and SendAsync (like here: line 22 for getting a custom enhanced SocketAsyncEventArgs instance, line 270 for creating and enhancing the SocketAsyncEventArgs instance, line 286 for sending the signal and line 30 for waiting).

DisconnectedContext MDA when calling WMI functions in single-threaded application

I'm writing an app in C#, .NET 3.0 in VS2005, with a feature for monitoring insertion/ejection of various removable drives (USB flash disks, CD-ROMs etc.). I did not want to use WMI, since it can sometimes be ambiguous (e.g. it can spawn multiple insertion events for a single USB drive), so I simply override the WndProc of my main form to catch the WM_DEVICECHANGE message, as proposed here. Yesterday I ran into a problem when it turned out that I will have to use WMI anyway to retrieve some obscure disk details like a serial number. It turns out that calling WMI routines from inside the WndProc throws the DisconnectedContext MDA.
After some digging I ended up with an awkward workaround for that. The code is as follows:
// the function for calling WMI
private void GetDrives()
{
ManagementClass diskDriveClass = new ManagementClass("Win32_DiskDrive");
// THIS is the line I get DisconnectedContext MDA on when it happens:
ManagementObjectCollection diskDriveList = diskDriveClass.GetInstances();
foreach (ManagementObject dsk in diskDriveList)
{
// ...
}
}
private void button1_Click(object sender, EventArgs e)
{
// here it works perfectly fine
GetDrives();
}
protected override void WndProc(ref Message m)
{
base.WndProc(ref m);
if (m.Msg == WM_DEVICECHANGE)
{
// here it throws DisconnectedContext MDA
// (or RPC_E_WRONG_THREAD if MDA disabled)
// GetDrives();
// so the workaround:
DelegateGetDrives gdi = new DelegateGetDrives(GetDrives);
IAsyncResult result = gdi.BeginInvoke(null, "");
gdi.EndInvoke(result);
}
}
// for the workaround only
public delegate void DelegateGetDrives();
which basically means running the WMI-related procedure on a separate thread - but then, waiting for it to complete.
Now, the question is: why does it work, and why does it have to be that way? (or, does it?)
I don't understand why I get the DisconnectedContext MDA or RPC_E_WRONG_THREAD in the first place. How does running the GetDrives() procedure from a button click event handler differ from calling it from a WndProc? Don't they happen on the same main thread of my app? BTW, my app is completely single-threaded, so why all of a sudden an error referring to some 'wrong thread'? Does the use of WMI imply multithreading and special treatment of functions from System.Management?
In the meantime I found another question related to that MDA, it's here. OK, I can take it that calling WMI means creating a separate thread for the underlying COM component - but it still does not occur to me why no-magic is needed when calling it after a button is pressed and do-magic is needed when calling it from the WndProc.
I'm really confused about that and would appreciate some clarification on that matter. There are only a few worse things than having a solution and not knowing why it works :/
Cheers,
Aleksander
There is a rather long discussion of COM Apartments and message pumping here. But the main point of interest is the message pump is used to ensure that calls in a STA are properly marshaled. Since the UI thread is the STA in question, messages would need to be pumped to ensure that everything works properly.
The WM_DEVICECHANGE message can actually be sent to the window multiple times. So in the case where you call GetDrives directly, you effectively end up with recursive calls. Put a break point on the GetDrives call and then attach a device to fire the event.
The first time you hit the break point, everything is fine. Now press F5 to continue and you will hit the break point a second time. This time the call stack is something like:
[In a sleep, wait, or join]
DeleteMeWindowsForms.exe!DeleteMeWindowsForms.Form1.WndProc(ref System.Windows.Forms.Message m) Line 46 C#
System.Windows.Forms.dll!System.Windows.Forms.Control.ControlNativeWindow.OnMessage(ref System.Windows.Forms.Message m) + 0x13 bytes
System.Windows.Forms.dll!System.Windows.Forms.Control.ControlNativeWindow.WndProc(ref System.Windows.Forms.Message m) + 0x31 bytes
System.Windows.Forms.dll!System.Windows.Forms.NativeWindow.DebuggableCallback(System.IntPtr hWnd, int msg, System.IntPtr wparam, System.IntPtr lparam) + 0x64 bytes
[Native to Managed Transition]
[Managed to Native Transition]
mscorlib.dll!System.Threading.WaitHandle.InternalWaitOne(System.Runtime.InteropServices.SafeHandle waitableSafeHandle, long millisecondsTimeout, bool hasThreadAffinity, bool exitContext) + 0x2b bytes
mscorlib.dll!System.Threading.WaitHandle.WaitOne(int millisecondsTimeout, bool exitContext) + 0x2d bytes
mscorlib.dll!System.Threading.WaitHandle.WaitOne() + 0x10 bytes
System.Management.dll!System.Management.MTAHelper.CreateInMTA(System.Type type) + 0x17b bytes
System.Management.dll!System.Management.ManagementPath.CreateWbemPath(string path) + 0x18 bytes
System.Management.dll!System.Management.ManagementClass.ManagementClass(string path) + 0x29 bytes
DeleteMeWindowsForms.exe!DeleteMeWindowsForms.Form1.GetDrives() Line 23 + 0x1b bytes C#
So effectively the window messages are being pumped to ensure the COM calls are properly marshalled, but this has the side effect of calling your WndProc and GetDrives again (as there are pending WM_DEVICECHANGE messages) while still in a previous GetDrives call. When you use BeginInvoke, you remove this recursive call.
Again, put a breakpoint on the GetDrives call and press F5 after the first time it's hit. The next time around, wait a second or two then press F5 again. Sometimes it will fail, sometimes it won't and you'll hit your breakpoint again. This time, your callstack will include three calls to GetDrives, with the last one triggered by the enumeration of the diskDriveList collection. Because again, the messages are pumped to ensure the calls are marshaled.
It's hard to pinpoint exactly why the MDA is triggered, but given the recursive calls it is reasonable to assume the COM context may be torn down prematurely and/or an object is collected before the underlying COM object can be released.
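The nested GetDrives calls described above can also be made visible, and suppressed, with a small reentrancy guard. This is a different technique from the BeginInvoke workaround in the question, and whether it is enough to avoid the MDA is an untested assumption. ReentrancyGuard is a hypothetical name, and the class assumes that all calls arrive on the same (UI) thread, so it does no locking.

```csharp
using System;

// Hypothetical helper; assumes every call is made on the same thread.
public sealed class ReentrancyGuard
{
    private bool running;
    private bool rerunRequested;

    // Runs the action unless it is already running further up the call stack.
    // A nested request is only recorded; the outer call repeats the action
    // once the current run has finished, so no notification is lost.
    public void Run(Action action)
    {
        if (running)
        {
            rerunRequested = true;
            return;
        }

        running = true;
        try
        {
            do
            {
                rerunRequested = false;
                action();
            }
            while (rerunRequested);
        }
        finally
        {
            running = false;
        }
    }
}
```

In the WndProc from the question this would be used as guard.Run(GetDrives) inside the WM_DEVICECHANGE branch, with guard held in a field of the form.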
