600buckets
Folders and files
Name | Name | Last commit date | ||
---|---|---|---|---|
parent directory.. | ||||
Oviewview 600-Bucket is for testing scribe configuration which has a large number of buffer stores. E.g. 600 buffer stores within a top level bucket store. In this kind of setup, if down stream receivers are unreachable, buffer stores will start backing incoming messages which could incur such large amount of file i/o that it could bring down the scribe server during buffer store's buffering, and SENDING_BUFFER phases. The test is used to verify the changes and newly added options that are designed to deal with this situation. Network Topology You need at least 3 servers: a sender, a mid tier, and a receiver. Scribe servers will be running on all of them. On the receiver, we will run two scribe servers to simulate bucketing to different hosts. Configuration files for scribe instances will be generated via confgen.pl script. We made some assumptions about the configurations. If you are testing someting differently, or of different scale, please update confgen.pl's configuration accordingly. Run confgen.pl again to generate updated scribe.confs. Configuration generation Before you run confgen.pl, you need to collect the following information: * midtierhost: host name /ip of the mid tier server * midtierport: port number of the mid tier server * receiverhost: hostname/ip of the receiver server * receiverporteven: port number of the receiver server * numbuckets: how many buckets * outputdir: there will be five configuration file generated. outputdir specify under which directory will the output conf file generated. I usually output them to one of the directory under my home nfs mount so that I can use it anywhere. * category: scribe category * scribefilepath: root directory for scribe file store. File basename will be the same as the category. Now that you have all those info, run perl confgen.pl -h to view the latest syntax. Plug in the information listed above into the command line argument. Run "perl confgen.pl" with arguments supplied, and you have generated all necessary configurations. The following files are generated: * scribe.sender.conf * scribe.midtier.conf * scribe.sinker.conf * scribe.sinker.file.even.conf * scribe.sinker.file.odd.conf You need first three conf files for load test. You need the first two conf files and even/odd sinker conf files if you want to test whether messages are bucketed correctly. scribe.sinker.file.even.conf and scribe.sinker.file.odd.conf use file stores to write messages onto the disk so that you can verify bucketing results. Now, copy those conf file to sender, mid tier, and receiver box. (sinker means receiver, if you don't already noticed) Or, copy them to a mounted directory. How to test: 1. Start mid tier: scribed -p midtierport scribe.midtier.conf. 2. Start sender: scribed -p senderport scribe.sender.conf. 3. Use your favorite scribe client to generate scribe messages. Since we are testing bucket store messages, make sure that you read the scribe.midtier.conf file's relevant bucket store configuration regarding which delimiter to use and how many buckets are there. By default, we use ^A, or ascii 1, as delimiter, and there are 600 buckets. You can run your client any where. However, a convenient place is on the sender box. Tune your message load to the max load specified in scribe.conf. Directions are given in confgen.pl within each conf file template. 4. Use ganglia to monitor mid tier and sender servers. Observe the following: 4.1. sender output bandwidth 4.2. mid tier input bandwidth 4.3. mid tier memory usage 4.4. mid tier disk usage. (Avail disk should keep going down) 4.5. mid tier output bandwidth (it should be close to zero as we haven't brought up receivers yet) 5. Let it run for some extended period of time. Watch available disk starts getting lower and lower overtime. 6. On receiver servers, starts two scribe sinker servers: 6.1. perl -p 9998 scribe.conf, and 6.2. perl -p 9999 scribe.conf. 7. Monitor ganglia for sender, mid tier, and receiver hosts. Pay attention to network, memory utilitzation, and avail free disk. With proper tuning, free available disk recovery should be as fast as when it was getting filled in step 4. Misc: 1. iorw.sh: is a shell command that show disk i/o. It mainly use iostat but with a little bit of formatting to only show interesting i/o stats. 2. toggleSinker.sh: is a shell program that automates periodic shutting down sinkers and then bring it up repeatly, every 10 min. More Information: http://www.intern.facebook.com/intern/wiki/index.php/AdScribe2#600_Buckets