Agreeing on the Cluster’s information needs is the first step towards identifying the objectives of the Cluster strategy. Once the emergency response begins, common information needs are usually determined by pre-emergency and in-crisis baseline data (e.g. preliminary damage and disaster impact data), along with stakeholder profile data. They are then refined on an ongoing basis to ensure that they remain responsive to stakeholders’ information needs.
Cluster members also have to agree on common standards and tools to work with. If possible, they should use common data collection tools, because this makes it possible to process and collate different types of data. Common data collection mechanisms include joint needs assessments, surveys, tracking and monitoring, mapping, profiling, interviews, and early warning systems.
“Not everything that counts can be counted”
There are a number of pre-existing tools, such as rapid assessments and ‘who is doing what, where, when’ (4W), that help identify needs and support the information management (IM) process. The Displacement Tracking Matrix (DTM) and the Multi-Sectoral Initial Rapid Assessment (MIRA) are two examples of tools that Global Cluster Lead Agencies (GCLAs) have developed for adaptation at country level. Others are the Rapid Assessment Tool (RAT) and the Comprehensive Assessment Tool (CAT).
This diagram is an example of assessment phasing from the Education Cluster.
The Cluster Approach has somehow ensured that initial rapid needs assessments are anything but rapid. What is required is data that allow information to become knowledge useful for planning, monitoring, and evaluating aid interventions. Too often, however, the survey methods used are so complicated, so expensive, and so labour-intensive that the aid environment has moved on by the time the findings become available.
You do not need to be an epidemiologist to run a useful survey and obtain useful numbers. In fact, in the early days of a crisis it is better not to be one, as epidemiologists tend to be fixated on academic rigour so that their findings can be published in peer-reviewed journals.
So, what is needed is a survey protocol that is:
In the first few days of an emergency response, a meta-analysis of other Cluster members’ survey findings can be combined with observational information from drive-bys, walk-throughs, and overflights, and then analysed subjectively through the lens of the experience that the Cluster Coordinator and members of the coordination team and/or Strategic Advisory Group (SAG) bring to the table. Despite epidemiologists having a fit at this point over the lack of empirical objectivity, these findings nevertheless inform the initial strategy for at least the first month or two of relief operations, after which more detailed findings will further focus the response. For all of these, some basic definitions are required.
Cluster Sampling: Sampling in which the population is divided into geographically defined units (clusters, such as villages or camps), a random selection of which is then surveyed.
Confidence Interval: A range around a survey finding that conveys how precise the finding is, at a stated level of confidence (usually 95 percent).
Design Effect: The ratio of the variance under a complex sample design to the variance of a simple random sample of the same size; it adjusts for stratification as well as for the cluster effect. Humanitarian operations normally use a design effect (DE) of 2.
Mean: The arithmetic average, i.e. the sum of the values divided by their number; it approximates the statistical norm or expected value.
Median: The numeric value separating the higher half of a sample from the lower half.
Probability Sampling: Every individual in the sample frame has a known, non-zero chance of being selected into the survey sample.
Range: The difference between the greatest and the least data value.
Sample Frame: The collection of population units from which the sample is selected. This should cover as much of the affected population as possible.
Sample Size: The number of persons or households selected. We look for thirty in each of thirty different randomly selected locations.
Standard Deviation: A measure of how spread out your data are.
Statistical Significance: A survey result is said to be ‘statistically significant’ if the finding is unlikely to have happened by chance. It tells us something about whether the results are “true”. The reliability of the result is measured with a ‘p-value’ (probability value) between 0 and 1: the higher the value, the less we believe that the observed relation between the variables is a reliable indicator.
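To show how the design effect and the confidence level feed into the sample size, here is a minimal Python sketch of the standard formula for estimating a proportion, n = DE × z² × p(1 − p) / d². The 95 percent level (z = 1.96), the ±5 percentage-point precision d and the cautious guess p = 0.5 are illustrative assumptions, and the function names `sample_size` and `per_location` are hypothetical.

```python
import math


def sample_size(p: float, margin: float, deff: float = 2.0, z: float = 1.96) -> int:
    """Persons or households needed to estimate a proportion p to within
    +/- margin, inflated by the design effect (deff) for cluster sampling."""
    if not 0 < p < 1 or margin <= 0 or deff < 1:
        raise ValueError("need 0 < p < 1, margin > 0 and deff >= 1")
    n_srs = z ** 2 * p * (1 - p) / margin ** 2  # size for a simple random sample
    return math.ceil(n_srs * deff)              # inflate by DE, then round up


def per_location(n: int, locations: int = 30) -> int:
    """Interviews needed in each location to reach n in total."""
    return math.ceil(n / locations)


n = sample_size(p=0.5, margin=0.05)  # p = 0.5 is the most cautious guess
print(n, per_location(n))            # 769 26
```

Without the design effect the same precision needs 385 interviews; with a DE of 2 it needs 769, or about 26 in each of thirty locations, which the thirty-by-thirty design (900 in total) comfortably exceeds.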
In second- and third-phase humanitarian operations, a ‘two-stage, thirty-cluster, random sample’ survey is the type most commonly used. What is it, why are we doing it, and how is it done?
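The aim is to maximize the representativeness of the data. In the first stage, thirty locations (clusters) are selected at random from the sample frame; in the second stage, persons or households are selected at random within each selected location.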
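The source does not say exactly how the thirty locations are drawn. A common approach is systematic sampling with probability proportional to size (PPS), sketched below in Python as an illustration rather than a prescribed procedure. The location names, populations and function names are hypothetical, and a real survey would also need a practical rule for picking households on the ground.

```python
import random
from itertools import accumulate


def select_clusters_pps(frame, n_clusters=30, seed=None):
    """Stage 1: systematic sampling with probability proportional to size.

    frame: list of (location_name, population) pairs. A location whose
    population exceeds the sampling interval can be selected more than once.
    """
    rng = random.Random(seed)
    total = sum(pop for _, pop in frame)
    interval = total / n_clusters
    start = rng.random() * interval           # random start in [0, interval)
    cumulative = list(accumulate(pop for _, pop in frame))
    chosen, i = [], 0
    for k in range(n_clusters):
        target = start + k * interval
        # Move to the location whose cumulative population range contains target.
        while i < len(cumulative) - 1 and cumulative[i] <= target:
            i += 1
        chosen.append(frame[i][0])
    return chosen


def select_households(n_households, k=30, seed=None):
    """Stage 2: simple random sample of k household numbers (1..n_households)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, n_households + 1), min(k, n_households)))


# Hypothetical sample frame: the names and populations are invented.
frame = [("Camp A", 12000), ("Village B", 3500), ("Village C", 800),
         ("Town D", 25000), ("Village E", 1500)]
print(select_clusters_pps(frame, seed=1))
print(select_households(240, seed=1))
```

Selecting in proportion to population, then taking the same number of households in each location, gives every household roughly the same overall chance of selection. That is what keeps the results representative.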
Basic qualitative and quantitative demographic data from before the emergency provide the baselines against which the impact of the disaster and of the response can be measured. Such datasets provide information on population (e.g. density and age and gender distributions), poverty (e.g. income of less than $1.25 per day), and vulnerability (e.g. prevalence of severe acute malnutrition), as well as more sector-specific information such as vaccination coverage and access to improved sanitation.
If the country’s ‘emergency preparedness plan’ has been done properly, this information, along with delineated administrative boundaries and place codes, will be available in map form. It is up to the Inter-Cluster Coordination Group to define and agree on the denominator baselines at the onset of the emergency.
In the immediate aftermath of a disaster, it will not be possible to carry out a full probability sample survey, because of a lack of access, time, and resources, and also because there are no population data with which to create a suitable sample frame. Non-probability sampling will therefore be necessary instead. There are several key considerations when preparing to carry out an initial assessment:
Coverage: Identify and define the target population. Aim to cover as wide a cross-section of the affected population and affected geographic area as possible. Do not omit key sections, and try to avoid including just the easy-to-reach elements.
Sample Frame: Establish clearly what your sample frame is. It might be a list of people, villages, camps, or a map.
Sampling Method: Use probability sampling if at all possible. Be honest in reporting your results. Describe clearly what has been done.
Sample Size: Go for as large a sample as you can manage within the resources available. Where there is choice, it is generally better to visit more locations and interview fewer people in each than vice versa.
Secondary Data: Make full use of information provided by the media, the government, and established NGOs. It is not always necessary to collect your own primary data. Even if you do, findings of others can still be incorporated in order to give a more rounded picture of the situation.
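The advice to visit more locations and interview fewer people in each can be checked with Kish's approximation for the design effect, DE = 1 + (m − 1) × ICC, where m is the number of interviews per location and ICC (the intra-cluster correlation) measures how alike people in the same location are. The Python sketch below compares three ways of spending the same 900 interviews. The ICC of 0.03 is a hypothetical value, and real ICCs vary by indicator and setting.

```python
def design_effect(per_location: int, icc: float) -> float:
    """Kish approximation DE = 1 + (m - 1) * ICC, where m is the number of
    interviews per location and ICC is the intra-cluster correlation."""
    return 1 + (per_location - 1) * icc


def effective_sample_size(locations: int, per_location: int, icc: float) -> float:
    """How many simple-random-sample interviews the cluster survey is worth."""
    return locations * per_location / design_effect(per_location, icc)


ICC = 0.03  # hypothetical value, for illustration only
for locations, per_loc in [(15, 60), (30, 30), (60, 15)]:
    de = design_effect(per_loc, ICC)
    eff = effective_sample_size(locations, per_loc, ICC)
    print(f"{locations} locations x {per_loc} interviews: DE = {de:.2f}, effective n = {eff:.0f}")
```

Under this assumption the same 900 interviews are worth roughly 325, 481 and 634 simple-random-sample interviews as the number of locations rises from 15 to 60. With thirty interviews per location, the customary DE of 2 corresponds to an ICC of about 0.034.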
One of the main criticisms of rapid assessments has always concerned sampling, and in particular the use of non-probability sampling. Without a proper sampling frame, it is impossible to calculate the probability of selection, and the sampling method used therefore leaves open the possibility that subgroups of special interest, such as the most isolated or the poorest, are severely under-represented. Such a selection bias would have serious implications for the analysis.
Bias is a key issue in any sampling exercise. In emergencies, displaced populations and time constraints make it impossible to use randomized sampling techniques, so a blend of purposive and convenience sampling has to be used instead. This is particularly problematic when assessors are forced to rely on a small number of more easily accessible informants and observation points, which may not be at all representative of the population or situation as a whole.
One way of reducing bias is through triangulation. The aim here is to use different approaches to collect the same data and then to cross-check the results for inconsistencies. This can be done, for instance, by comparing results from key informant interviews, focus group discussions, and the team’s own observations (though be careful of observation bias).
A confidence interval is a range around a measurement that conveys how precise the measurement is. For most chronic disease and injury programs, the measurement in question is a proportion or a rate (for example, the percentage of New Yorkers who exercise regularly, or the lung cancer incidence rate). Confidence intervals are often seen on the news when the results of polls are released. This is an example from the Associated Press in October 1996:
The latest ABC News-Washington Post poll showed 56 percent favored Clinton while 39 percent would vote for Dole. The ABC News-Washington Post telephone poll of 1,014 adults was conducted March 8-10 and had a margin of error of plus or minus 3.5 percentage points. (Emphasis added).
Although it is not stated, the margin of error presented here was probably for a 95 percent confidence interval. In the simplest terms, this means that we can be 95 percent confident that between 35.5 percent and 42.5 percent of voters would vote for Bob Dole (39 percent plus or minus 3.5 percentage points), and that there is roughly a 1-in-20 risk that the true figure lies outside that range. Strictly speaking, the 95 percent describes the method rather than this one interval: if the poll were repeated many times, about 95 percent of the intervals calculated in this way would contain the true value.
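The arithmetic behind such a margin can be sketched in a few lines of Python using the normal-approximation (Wald) interval for a proportion, z × √(DE × p(1 − p) / n). Treating the poll as a simple random sample (DE = 1) is an assumption made here for illustration, and the function names are hypothetical.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96, deff: float = 1.0) -> float:
    """Half-width of the normal-approximation (Wald) confidence interval
    for a proportion p estimated from n respondents."""
    return z * math.sqrt(deff * p * (1 - p) / n)


def confidence_interval(p: float, n: int, z: float = 1.96, deff: float = 1.0):
    moe = margin_of_error(p, n, z, deff)
    return p - moe, p + moe


low, high = confidence_interval(0.39, 1014)
print(f"Dole: {low:.1%} to {high:.1%}")  # Dole: 36.0% to 42.0%
```

For a simple random sample of 1,014, this gives about ±3.0 points for Dole's 39 percent (about ±3.1 points at p = 0.5, the figure pollsters often quote). The published ±3.5 points is somewhat wider. A design effect from weighting would widen it in this way, but the report does not say how the figure was derived. The `deff` parameter is the same adjustment a cluster survey needs.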
The median is the numeric value separating the higher half of a sample, a population, or a probability distribution, from the lower half. The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one. If there is an even number of observations, then there is no single middle value; the median is then usually defined to be the mean of the two middle values.
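The procedure just described translates directly into a short Python function. It is a sketch for illustration; the standard library's `statistics.median` does the same job.

```python
def median(values):
    """Sort the observations and take the middle one; with an even
    number of observations, average the two middle values."""
    if not values:
        raise ValueError("median of empty data")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 3]))      # 3
print(median([7, 1, 3, 10]))  # 5.0
```

Unlike the mean, the median is not pulled about by a single wild estimate, which is why it suits pooling rough figures from many sources.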
For example, the number of people affected by the earthquake in Pakistan in 2005 was anybody’s guess in early October 2005. The eleven or so Cluster Coordinators gathered estimates from the few members that had them and came to the Inter-Cluster meeting the next evening with the median for their Cluster. Each Cluster then gave its figure. Various other sources, including government representatives, the national Red Crescent Society, and OCHA, added their ‘guesstimates’, and the median figure (2.8 million) was broadcast to the world later that night, along with the ‘interquartile range’, which indicated the level of confidence we had in that conclusion. Needless to say, the media ignored our qualification. The figure was revised upwards, to 3.4 million, three weeks later, once evidence from more sophisticated surveys became available. The lesson here is that the figure might be wrong, but it matters less if everyone is using it.
Clusters should be depicting data in this way, so you should be interested in box plots, too. These are a convenient way of graphically depicting groups of numerical data through their five-number summaries: the smallest observation (sample minimum), lower quartile (Q1), median (Q2), upper quartile (Q3), and largest observation (sample maximum). The spacings between the different parts of the box indicate the degree of dispersion and skewness in the data, and therefore how much confidence can be placed in the central figure. Box plots also help to identify outliers visually, and they can be drawn either horizontally or vertically.
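The five numbers a box plot draws, together with the interquartile range used in the Pakistan example, can be computed with Python's standard `statistics.quantiles`. The estimates below are invented for illustration and are not the 2005 figures. Quartile conventions differ slightly between tools, and this sketch uses the `"inclusive"` method. The outlier test is Tukey's 1.5 × IQR rule, the usual convention for box-plot whiskers.

```python
import statistics


def five_number_summary(data):
    """Minimum, Q1, median, Q3 and maximum: the numbers a box plot draws."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)


def tukey_outliers(data, k=1.5):
    """Points more than k * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


# Hypothetical affected-population estimates (millions) from different sources
estimates = [2.1, 2.5, 2.6, 2.8, 2.8, 3.0, 3.3, 3.5, 5.0]
print(five_number_summary(estimates))
print(tukey_outliers(estimates))  # [5.0]
```

Here the median is 2.8 million and the interquartile range runs from 2.6 to 3.3 million. The single 5.0 million estimate is flagged as an outlier rather than being allowed to drag the headline figure upwards.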
The standard deviation is a measure of how spread out your data are. Computing it by hand is a bit tedious. The steps are:
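One common way to carry out the computation is sketched below in Python, with each step commented. It uses the n − 1 divisor appropriate to a sample; dividing by n instead gives the population standard deviation. The standard library's `statistics.stdev` and `statistics.pstdev` compute the two versions directly.

```python
import math


def sample_standard_deviation(values):
    """Standard deviation of a sample, computed step by step."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n                       # 1. find the mean
    squared = [(x - mean) ** 2 for x in values]  # 2. square each deviation from the mean
    variance = sum(squared) / (n - 1)            # 3. add them up and divide by n - 1
    return math.sqrt(variance)                   # 4. take the square root


print(sample_standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]))
```

For these eight values the mean is 5 and the squared deviations sum to 32, so the sample standard deviation is √(32/7), about 2.14 (the population version, √(32/8), is exactly 2).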
Suppose your data follow the classic bell-shaped curve. One conceptual way to think about the standard deviation is that it measures how spread out the bell is. To the left is a bell-shaped curve with a standard deviation of 1. Notice how tightly concentrated the distribution is.
Shown below is a different bell-shaped curve, one with a standard deviation of 2. Notice that the curve is wider, which implies that the data are less concentrated and more spread out.
Finally, a bell-shaped curve with a standard deviation of 3 appears below. This curve shows the most spread.
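To put numbers on the three curves, this Python sketch uses the standard library's `statistics.NormalDist` to compute what share of a bell-shaped distribution with mean 0 lies within one unit of the mean as the standard deviation grows from 1 to 3. The helper name `share_within` is hypothetical.

```python
from statistics import NormalDist


def share_within(half_width: float, sd: float, mean: float = 0.0) -> float:
    """Fraction of a normal (bell-shaped) distribution lying within
    mean +/- half_width."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + half_width) - dist.cdf(mean - half_width)


for sd in (1, 2, 3):
    print(f"SD = {sd}: {share_within(1, sd):.0%} within +/-1 of the mean, "
          f"{share_within(sd, sd):.0%} within +/-1 SD")
```

The share within one unit of the mean falls from about 68 percent to about 38 percent and then about 26 percent as the curve widens. About 68 percent of a bell-shaped distribution always lies within one standard deviation of the mean, whatever that standard deviation is.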
This is a section from Clusterwise 2. Reproduction is encouraged. It would be nice if the author,
James Shepherd-Barron, and clustercoordination.org were acknowledged when doing so.
http://james.shepherdbarron.com/clusterwise2/20needsassessment/