Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read_multi 50x slower when one of the keys is not found #106

Closed
orslumen opened this issue Jul 20, 2011 · 22 comments
Closed

read_multi 50x slower when one of the keys is not found #106

orslumen opened this issue Jul 20, 2011 · 22 comments

Comments

@orslumen
Copy link

If all keys are present, the read_multi call is very fast. But surprisingly, when one of the requested keys is missing, it becomes more than 50x slower, see example below. The memcache-client gem does not have that problem.

I found that the time is spend in memcached itself, so I upgraded to the latest version (1.4.6) but that does not help.
I think the problem is in the way Dalli communicates with memcached.

The following example assumes Rails is loaded in the IRB

> mstore = ActiveSupport::Cache.lookup_store(:mem_cache_store, '127.0.0.1:11211', {:namespace => 'mtest'})
 => #<ActiveSupport::Cache::MemCacheStore:0xcab46fc @options={:namespace=>"mtest"}, dataMemCache: 1 servers, ns: nil, ro: false 
> dstore = ActiveSupport::Cache.lookup_store(:dalli_store, '127.0.0.1:11211', {:namespace => 'dtest'})
 => #<ActiveSupport::Cache::DalliStore:0xca9bd00 @options={:namespace=>"dtest"}, data#<Dalli::Client:0xca910f8 @options={:expires_in=>0}, ringnil, servers["127.0.0.1:11211"] 

> mstore.write('key1', 'value'); mstore.write('key2', 'value');
> dstore.write('key1', 'value'); dstore.write('key2', 'value');

> # MemCacheStore.read_multi with all known keys
> t=Time.now; 1000.times{ mstore.read_multi('key1', 'key2') }; Time.now - t
 => 0.662803 

> # DalliStore.read_multi with all known keys
> t=Time.now; 1000.times{ dstore.read_multi('key1', 'key2') }; Time.now - t
 => 0.320749 

> # MemCacheStore.read_multi with one missing key
> t=Time.now; 1000.times{ mstore.read_multi('key1', 'key3') }; Time.now - t
 => 0.605206 

> # Dalli.read_multi with one missing key
> t=Time.now; 1000.times{ dstore.read_multi('key1', 'key3') }; Time.now - t
 => 32.166594
@bootspring
Copy link

Can't reproduce:

> rails c
Loading development environment (Rails 3.0.8)
ruby-1.9.2-p180 :001 > dstore = Rails.cache
 => #<ActiveSupport::Cache::DalliStore:0x0000010204ece8 @options={:namespace=>"cypress_cache-anilla/cypress", :compress=>true, :expires_in=>604800}, @data=#<Dalli::Client:0x0000010204d280 @servers=["localhost"], @options={:expires_in=>0}, @ring=nil>, @thread_local_key=:active_support_cache_dalli_store_local_cache_2164422260, @middleware=ActiveSupport::Cache::Strategy::LocalCache> 
ruby-1.9.2-p180 :002 > dstore.write('key1', 'value'); dstore.write('key2', 'value');
ruby-1.9.2-p180 :003 >   t=Time.now; 1000.times{ dstore.read_multi('key1', 'key2') }; Time.now - t
 => 0.578851 
ruby-1.9.2-p180 :004 > t=Time.now; 1000.times{ dstore.read_multi('key1', 'key3') }; Time.now - t
 => 0.444149 
ruby-1.9.2-p180 :005 > Dalli::VERSION
 => "1.0.5" 

@mperham
Copy link
Collaborator

mperham commented Jul 23, 2011

Sorry, wrong account :-)

@mperham mperham closed this as completed Jul 23, 2011
@phiggins
Copy link
Contributor

FWIW, I can reproduce this on my machine running gentoo linux, ruby 1.9.2-p290, dalli 1.0.5 and memcached 1.4.5.

I rewrote the original example script: https://gist.github.com/1103725

Results were all over the place, but here's two typical runs:

$ ruby benchmark_dalli_multi_get.rb 10
N == 10
Rehearsal -------------------------------------------------
with all keys   0.000000   0.000000   0.000000 (  0.001365)
missing a key   0.010000   0.000000   0.010000 (  0.083601)
---------------------------------------- total: 0.010000sec

                    user     system      total        real
with all keys   0.000000   0.000000   0.000000 (  0.001410)
missing a key   0.010000   0.000000   0.010000 (  0.040388)
$ ruby benchmark_dalli_multi_get.rb 10
N == 10
Rehearsal -------------------------------------------------
with all keys   0.000000   0.000000   0.000000 (  0.001301)
missing a key   0.000000   0.000000   0.000000 (  0.357664)
---------------------------------------- total: 0.000000sec

                    user     system      total        real
with all keys   0.010000   0.000000   0.010000 (  0.002479)
missing a key   0.010000   0.000000   0.010000 (  0.279454)

Then I rewrote that to use ruby-prof, but I didn't see anything exceedingly obvious.

Script: https://gist.github.com/1103732
Output: https://gist.github.com/1103733 (raw html, sorry)

$ time ruby profile_dalli_multi_get.rb 1000
N == 1000

real    0m23.531s
user    0m0.624s
sys     0m0.264s

@orslumen
Copy link
Author

I could reproduce it on Ubuntu 10.4 (linux 2.6.32-33), with dalli 1.0.5, memcached 1.4.2 and 1.4.6, ruby 1.8.7REE and ruby 1.9.2.

When playing around with different configurations I noticed that the slowdown of the 'missing-key-multi-get' fluctuated between a 10x to 60x performance decrease.

A colleague using a Mac saw consistently good performance. Could it be connected to the combination of Linux and Dalli?

@mperham
Copy link
Collaborator

mperham commented Jul 25, 2011

real    0m23.531s
user    0m0.624s
sys     0m0.264s

These times would indicate that ruby is just blocking on I/O, possibly implying that the issue is on the server-side. Can you reproduce it with other clients?

@phiggins
Copy link
Contributor

Yes. Here's a version of the benchmark script using the memcache gem instead: https://gist.github.com/1106068

I know running things once or twice doesn't mean anything statistically, but the memcache version has no unusual slowdown in the missing key case:

$ ruby benchmark_memcache_multi_get.rb 
N == 1000
Rehearsal -------------------------------------------------
with all keys   0.100000   0.010000   0.110000 (  0.131847)
missing a key   0.060000   0.020000   0.080000 (  0.103449)
---------------------------------------- total: 0.190000sec

                    user     system      total        real
with all keys   0.080000   0.030000   0.110000 (  0.130006)
missing a key   0.080000   0.000000   0.080000 (  0.103335)
$ ruby benchmark_dalli_multi_get.rb 
N == 1000
Rehearsal -------------------------------------------------
with all keys   0.120000   0.030000   0.150000 (  0.148742)
missing a key   0.190000   0.060000   0.250000 ( 22.132741)
---------------------------------------- total: 0.400000sec

                    user     system      total        real
with all keys   0.130000   0.030000   0.160000 (  0.177940)
missing a key   0.180000   0.050000   0.230000 ( 14.453573)

@mperham
Copy link
Collaborator

mperham commented Jul 26, 2011

Pete, your benchmark works perfectly for me against HEAD and with the Dalli 1.0.4 gem. Still can't reproduce - one of you will have to track down where that time goes.

@qianthinking
Copy link

I found the benchmark depends on if memcached server is remote or local. I can reproduce it if memcached server at localhost.

@mperham
Copy link
Collaborator

mperham commented Sep 21, 2011

Can one of you hook up perftools.rb, run with the profiler, generate a call graph GIF and post it here?

brew install graphviz ghostscript
gem install perftools.rb
CPUPROFILE_FREQUENCY=200 CPUPROFILE_REALTIME=1 CPUPROFILE=/tmp/dalli_profile RUBYOPT="-r`gem which perftools | tail -1`" ruby profile_dalli_multi_get.rb 1000
pprof.rb --gif /tmp/dalli_profile > /tmp/dalli_profile.gif

@orslumen
Copy link
Author

Here is the callgraph

Thanks for your time.

@mperham
Copy link
Collaborator

mperham commented Sep 27, 2011

That graph says that memcached or your machine is the slow factor, not Dalli.

On Sep 27, 2011, at 0:45, orslumenreply@reply.github.com wrote:

Here is the callgraph

Thanks for your time.

Reply to this email directly or view it on GitHub:
#106 (comment)

@dfens
Copy link
Collaborator

dfens commented Sep 27, 2011

I can reproduce this too (ubuntu)

@mperham
Copy link
Collaborator

mperham commented Sep 27, 2011

Try removing the kgio gem and see if that makes a difference.

@orslumen
Copy link
Author

Already tried that last week. Did not help...

@dfens
Copy link
Collaborator

dfens commented Sep 27, 2011

tried it, it does not depend on kgio

@mperham
Copy link
Collaborator

mperham commented Feb 25, 2012

It's looking more and more like this is an issue with older memcached server versions. 1.4.8 has a bug fix for multi-get processing and I can't reproduce this on newer versions of memcached.

It probably works with memcache-client because it uses the text protocol, not binary.

Could people upgrade their memcached server version and see if the issue goes away?

@timhaines
Copy link

I have reproduced this on Memcached 1.4.13 with Libevent 2.0.17. Using REE and Ubuntu 10.04.3

@mperham
Copy link
Collaborator

mperham commented Feb 25, 2012

@timhaines Does it happen on 1.9.3?

@timhaines
Copy link

Just tried it - no. The problem goes away on 1.9.3.

@orslumen
Copy link
Author

I can also confirm that the problem disappears with Dalli 2.2.1 + ruby 1.9.3.

Note that Dalli 1.1.5 + ruby 1.9.3 is still slow on Ubuntu (not as slow as with ree 1.8.7 though).

@tmm1
Copy link
Contributor

tmm1 commented Oct 31, 2012

Is anyone still able to repro this? I fixed some multi_get performance issues on master recently that are triggered when using large memcached server pools.

@slingamn
Copy link

slingamn commented Jun 2, 2014

We had this problem and it was caused by Nagle's algorithm; it went away when we set TCP_NODELAY. We think the cause is the binary protocol for multigets is write-read-write-read on a hit, but write-write-read on a miss.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants