Re: XForms: Re: Big time lag using XCopyArea, and get_next_event() event backlog problem.

From: jac@casurgica.com
Date: Wed Apr 23 2003 - 10:49:59 EDT

    # To subscribers of the xforms list from jac@casurgica.com :

    > I suspect that you will find detailed answers hard to come by, for the
    > simple reason that the author of the code (TC) and its maintainer for
    > the last donkey's years (SPL) are the ones with the real, detailed
    > knowledge about this end of the library. Your delvings have probably
    > made you "the" expert.

    This is very unfortunate because in reality, I have absolutely no idea
    what I'm talking about.

    One last thing. Just as an "fyi" for people who are interested, here is
    the email I sent to my boss last night which contains a half-coherent
    summary of the stuff I found (btw, pardon the "some genius" comment, I was
    pretty annoyed at that point):

    after 9 hours of digging through forms library code and all sorts of crap,
    i found the cause of the problem, why it's happening, and why only
    XCopyArea is doing it (XCopyPlane would do it, too). turns out it's a
    combination of a few bugs in the forms library and some unfortunate
    default settings in the [removed].

    if you look under GraphicsExpose and NoExpose events in the xlib reference
    manual, it states:

    "If graphics_exposures is True in the GC used for the copy, either one
    NoExpose event or one or more GraphicsExpose events will be generated for
    every XCopyArea or XCopyPlane call made."

    so what is happening is, every time you call fl_check_forms(), you follow
    it immediately with a call to [removed], which calls XCopyArea(). this
    causes a NoExpose event to be sent to the canvas (GraphicsExpose events
    weren't the problem; they weren't getting sent). via a few other
    functions, fl_check_forms() eventually calls get_next_event(). some bad
    logic in that function leads to the event queue filling up and lagging
    the ui (it peaks around 320 events when it should only be peaking around
    2 or 3... and it only stops at 320 because it can't hold any more).
    here's what get_next_event() does:

    - if there's at least one event in the queue, grab the first event.
    - if this event is not destined for a form window (if it's going to, say,
    a canvas window), do some minor tweaking of the event then put it back on
    the queue so that it can be handled by the next call to get_next_event().

    this is all well and good. now, get_next_event() also calls a function
    fl_watch_io(). this function is only called on every 11th call to
    get_next_event() so that it doesn't eat up too much cpu time.
    fl_watch_io() does a bunch of
    socket stuff that i'm not too clear on. there is a comment in
    get_next_event() that says fl_watch_io() shouldn't be called with xevents
    in the queue because it will delay processing of the events. but this is
    ok because the queue should be empty or only contain 1 or 2 events when
    fl_watch_io() is called (and 1 or 2 events doesn't lag it that much).

    HOWEVER, some genius decided that on every 11th call to get_next_event(),
    when fl_watch_io() is called, event processing should be completely
    skipped! this means that for every 11 calls to get_next_event(), only 10
    events are removed from the queue. now recall that since you are calling
    one XCopyArea() per fl_check_forms(), you are adding 1 event to the queue
    each time. so for every 11 events you add to the queue, only 10 are
    removed. this builds up quickly and, not only does it overflow the queue,
    but it leads to fl_watch_io() being called with over 300 events in the
    queue, which *really* slows things down. by the way, an interesting thing
    to note is that fl_check_forms() will ALWAYS return NULL on every 11th
    call.

    when you don't call XCopyArea(), the only time events get added to the
    queue is in response to mouse, keyboard, expose events and such. in this
    case it's ok that only 10 out of 11 calls to get_next_event() actually
    remove an event, because there are so few events in the queue that they
    all get processed very quickly.

    this is why calling fl_check_forms() twice fixed the problem: for every
    NoExpose event that XCopyArea() added, at least 1 event was removed from
    the queue, so it never backed up. calling fl_check_forms() twice doesn't
    work if you call XCopyArea() twice.
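
    to make the arithmetic concrete, here's a rough toy model in C. this is
    NOT the xforms code, just a sketch of the behaviour described above. the
    names simulate_backlog, QUEUE_CAP and SKIP_EVERY are made up for
    illustration, the 320 cap is simply the number i saw, and i'm assuming a
    full queue just drops new events:

```c
#define QUEUE_CAP  320  /* assumed capacity: the peak observed above */
#define SKIP_EVERY 11   /* every 11th call skips event processing */

/* Toy model: returns the backlog left after `iterations` main-loop passes.
 * Each pass queues `copies` NoExpose events (one per XCopyArea call) and
 * then makes `checks` calls to a stand-in for get_next_event(). */
int simulate_backlog(int iterations, int copies, int checks)
{
    int backlog = 0;   /* events currently waiting in the queue */
    long calls = 0;    /* total calls to the stand-in so far */

    for (int i = 0; i < iterations; i++) {
        for (int c = 0; c < copies; c++) {
            if (backlog < QUEUE_CAP)
                backlog++;      /* assumption: a full queue drops the event */
        }
        for (int k = 0; k < checks; k++) {
            calls++;
            if (calls % SKIP_EVERY == 0)
                continue;       /* the I/O-polling call removes nothing */
            if (backlog > 0)
                backlog--;      /* a normal call removes one event */
        }
    }
    return backlog;
}
```

    in this model, simulate_backlog(1100, 1, 1) comes out to 100 (one event
    of backlog gained per 11 passes), simulate_backlog(1100, 1, 2) comes out
    to 0, and simulate_backlog(1100, 2, 2) comes out to 200, which matches
    the pattern above: two checks per copy keeps up, two checks for two
    copies doesn't.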

    i have many possible solutions to this but i narrowed them down to two
    simple ones. i have tried them both and they both work perfectly
    (in the test program, anyway) with no side effects:

    1) make get_next_event() *not* skip the event processing on every 11th
    call. there is no reason for it to do so. but still make it call
    fl_watch_io() on every 11th call. this way, all the events get processed
    and fl_watch_io() still gets called. about 5% of the time, fl_watch_io()
    is called with 1 or 2 events in the queue, but this is ok and
    fl_watch_io() doesn't noticeably hang.
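
    the control flow for way 1 would look roughly like the sketch below. it
    is a toy stand-in, not a patch against the real get_next_event(): the
    struct and every toy_* name are hypothetical, and toy_watch_io() only
    counts calls instead of doing any socket work. the one point it is meant
    to show is that the 11th call polls i/o and then still falls through to
    the event handling:

```c
#define POLL_EVERY 11   /* poll I/O on every 11th call */

/* Hypothetical bookkeeping; none of these names come from XForms. */
struct toy_state {
    int  backlog;    /* events waiting in the queue */
    long calls;      /* calls made to toy_get_next_event() */
    int  io_polls;   /* how many times the I/O poll ran */
    int  handled;    /* events removed from the queue */
};

/* Placeholder for the socket polling done by the real fl_watch_io(). */
void toy_watch_io(struct toy_state *s)
{
    s->io_polls++;
}

/* Polls I/O on every 11th call but never skips event processing.
 * Returns 1 if an event was handled, 0 if the queue was empty. */
int toy_get_next_event(struct toy_state *s)
{
    s->calls++;
    if (s->calls % POLL_EVERY == 0)
        toy_watch_io(s);    /* no early return: fall through to the queue */

    if (s->backlog == 0)
        return 0;
    s->backlog--;
    s->handled++;
    return 1;
}

/* One main-loop pass: one XCopyArea-style event in, one call out. */
void toy_run(struct toy_state *s, int iterations)
{
    for (int i = 0; i < iterations; i++) {
        s->backlog++;
        toy_get_next_event(s);
    }
}
```

    starting from a zeroed toy_state, toy_run(&s, 1100) ends with the backlog
    at 0, 1100 events handled and the i/o poll run 100 times, so nothing
    piles up and the polling still happens at the same rate.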

    2) make the [removed] GC have its graphics_exposures set to False so
    that NoExpose events aren't generated. do this by modifying the part where
    _gc gets initialized like so:

      XGCValues values;
      values.graphics_exposures = False;
      _gc = XCreateGC(_display, _window, GCGraphicsExposures, &values);

    instead of:

      _gc = XCreateGC(_display, _window, 0, NULL);

    both ways are good for different reasons, and i'd actually recommend doing
    them both. we have the xforms 1.0 source and we can modify it however we
    want, so we can fix it there and stop using 0.89. if we don't do way 1,
    the possibility for this problem to occur is still there -- we've only
    fixed one of the things that lead up to the queue overflow occurring. i
    don't like that.

    another thing is, the bug is kind of "unfixable" in a way... but it's
    weird because the only way to "fix" it would be to hack things in and
    start doing risky things like ignoring events in weird places and such.
    the reason is: fl_check_forms(), for the most part, only processes one
    event each time it is called. so if you are explicitly generating more
    than one event per fl_check_forms() call, you're kind of screwed. there's
    no way around this except to use fl_do_forms() instead of
    fl_check_forms(), which basically processes all the events in the queue
    before returning.

    so, here are my suggestions:

    1) make sure graphics_exposures is False in the [removed], and
    2) fix get_next_event() so it never skips an event, and
    3) use fl_do_forms() whenever possible, which i guess is tough for the way
    [removed] and the [removed] work.

    jason

    _________________________________________________
    To unsubscribe, send the message "unsubscribe" to
    xforms-request@bob.usuhs.mil or see
    http://bob.usuhs.mil/mailserv/xforms.html
    XForms Home Page: http://world.std.com/~xforms
    List Archive: http://bob.usuhs.mil/mailserv/list-archives/
    Development: http://savannah.nongnu.org/files/?group=xforms



    This archive was generated by hypermail 2b29 : Wed Apr 23 2003 - 10:51:30 EDT