Occasionally we have had situations where the FAS server(s) and Media Brokers become unresponsive to service traffic.
Such problems are often due to threading or memory retention issues, so it is worth gathering the following information, in the order suggested, before restarting the services.
1. Capture the start time of the Servers
Run the command:
ps -ef | grep java > java_processes.txt
and send us the resulting java_processes.txt file.
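The STIME column in the ps -ef output can be truncated to a date for long-running processes. If a full start timestamp is needed, the following variant can be captured as well (this assumes a standard procps ps; the output file name is just an example):
ps -eo pid,lstart,cmd | grep [j]ava > java_start_times.txt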
2. Gather thread dumps
Run the following on the unresponsive box, from the directory containing the logcapture script:
./logcapture.sh -t -f fas_threads.tar
If the logcapture script fails to complete, try the following.
For the App server run the command:
kill -3 `ps -ef | grep [a]ppserver | awk '{print $2}'`
For the load balancer run the command:
kill -3 `ps -ef | grep [l]oadbalancer | awk '{print $2}'`
Both outputs will be written to /opt/cafex/FAS-X.X.X/domain/log/console.log
For the Media Broker controller run the command:
kill -3 `ps -ef | grep [m]ulti-rtp-proxy-manager | awk '{print $2}'`
The output will be written to /opt/cafex/FCSDK-X.X.X/media_broker/console.log
Note: You will need to take a copy of the console.log files prior to restarting FAS and Media Broker, as the files will be overwritten on restart.
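If you want to gather all three thread dumps and preserve the console logs in one pass, the following sketch combines the commands above and copies the console.log files to timestamped names so they survive a restart. It is illustrative only: it assumes the app server, load balancer and Media Broker controller are all on the same box (otherwise run the relevant lines on each box) and that the default install paths shown above apply (replace X.X.X with your versions).
#!/bin/bash
# Illustrative sketch only: request thread dumps, then preserve the console logs.
STAMP=$(date +%Y%m%d_%H%M%S)
kill -3 `ps -ef | grep [a]ppserver | awk '{print $2}'`
kill -3 `ps -ef | grep [l]oadbalancer | awk '{print $2}'`
kill -3 `ps -ef | grep [m]ulti-rtp-proxy-manager | awk '{print $2}'`
# Give the JVMs a few seconds to write the dumps into their console logs.
sleep 10
cp /opt/cafex/FAS-X.X.X/domain/log/console.log fas_console_$STAMP.log
cp /opt/cafex/FCSDK-X.X.X/media_broker/console.log mb_console_$STAMP.log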
3. Gather output from TOP
The top thread output is important to collect as it can be correlated with the thread dump to identify spinning threads:
For the app server box run the following command:
top -H -d 2 -n 30 -b > fas_top_thread.txt
This will take 60 seconds to complete; then send us the fas_top_thread.txt file.
It is also worth manually capturing the CPU usage of the individual cores.
Run top, then press "1" to show per-core CPU usage, and take a capture of the values. (A non-interactive alternative is sketched after this step.)
For the Media Broker box run:
top -H -d 2 -n 30 -b > mb_top_thread.txt
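If an interactive top session is not practical, mpstat from the sysstat package (assuming it is installed) can capture the per-core usage non-interactively; the output file name below is just an example:
mpstat -P ALL 2 30 > cpu_per_core.txt
This samples every core every 2 seconds for 30 iterations, matching the 60-second top captures above.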
4. Gather full logs
It is also advisable to gather the full logs (see Capturing Logs).
If the logs are rolling over too quickly due to heavy traffic, and disk space is available, it may be worth increasing the size and number of the app server server.log files. For example, 20 files at 200M each gives 8 times more logging than the default, may capture the problem if it recurs, and uses only 4 GB of disk. If you do alter the logging, a FAS restart is required.
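Assuming the FAS app server uses a JBoss-style logging subsystem (the domain/log path above suggests it does), a size-rotating handler along the following lines would give 20 files of 200M each. The handler name, file location and exact subsystem layout vary by FAS version, so treat this purely as an illustration and check the FAS documentation before editing the configuration:
<size-rotating-file-handler name="FILE" autoflush="true">
    <formatter>
        <named-formatter name="PATTERN"/>
    </formatter>
    <file relative-to="jboss.server.log.dir" path="server.log"/>
    <rotate-size value="200m"/>
    <max-backup-index value="20"/>
    <append value="true"/>
</size-rotating-file-handler>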
5. Take heap dumps
For the app server run the following command:
`which jmap` -J-d64 -dump:format=b,file=as_heap.bin `ps -ef | grep [a]ppserver | awk '{print $2}'`
For the load balancer run the following command:
`which jmap` -J-d64 -dump:format=b,file=lb_heap.bin `ps -ef | grep [l]oadbalancer | awk '{print $2}'`
For the Media Broker controller run the following command:
`which jmap` -J-d64 -dump:format=b,file=mbc_heap.bin `ps -ef | grep [m]ulti-rtp-proxy-manager | awk '{print $2}'`
(remove the -J-d64 flag if running on a 32-bit box)
It will take a few seconds to complete, and you will then have a 512M+ file to get to us, probably via the ftp.cafex.com server. If the file is smaller than this, there isn't a memory problem and the file can be discarded. The above commands can also be run with the -F (force) option if the process does not respond.
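Heap dumps compress well, so gzipping and, if necessary, splitting the file before uploading it can make the transfer easier; a minimal sketch (the chunk size and file names are examples only):
gzip as_heap.bin
split -b 500m as_heap.bin.gz as_heap.bin.gz.part_
# The parts can be reassembled with: cat as_heap.bin.gz.part_* > as_heap.bin.gz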
Important Note: In certain situations the heap dump may take so long to capture that it causes a timeout in FAS which triggers a restart, so make sure you have a copy of the thread dump console.log files and the start time of the servers prior to taking the heap dump.