Methodology - Population, Housing, and Income Estimates
First, a quick overview:
Building population estimates requires several inputs. The population of an area changes through births (added), deaths (subtracted), and people moving in or out (added or subtracted). The starting point is the 2000 Short Form (SF1) block-level data set, which has the most detailed and comprehensive numbers on where the entire population of the US lives, along with their age and race. To progress from the 2000 data to current-year estimates, we use the US Census Bureau's (USCB) annual County and State level estimates to roll the numbers forward. Because the USCB data are available only at the County and State level, the next challenge is distributing the data down to the smaller geographies.
The next step is to work with actuarial tables for births and deaths by age and race, and use them to model the likelihood of dying or of having a child. This model is the engine that drives the increase and decrease in population.
The third step is to look at immigration and emigration: where are people moving to, and where are they moving from? The US Postal Service keeps track of all moves as a "to" and a "from" location.
Now the more detailed explanation:
1. Working with the Census Bureau "estimation base" county level numbers.
These data are processed to obtain "race distribution" coefficients. However, the Census Bureau estimation base data do not include an "other race" category, and the "two or more races" category is much smaller than it is in the SF1/SF3 Census data. By comparing the estimation base to the SF1 county-level data, it is possible to obtain numeric ratios describing how the "other race" and "two or more races" populations were distributed among the remaining races in the USCB's estimation base. These coefficients allow us to remap the SF1 block-level data and redistribute the "other race" population and part of the "two or more races" population among the 6 remaining mutually exclusive races.
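The source does not give the exact formula for these coefficients, so the following is only a minimal sketch of one way they could be derived and applied. It assumes that each race's gain between the SF1 county count and the estimation base reflects its share of the redistributed "other race" population. The function names, the dictionary layout, and all numbers are hypothetical, and the partial redistribution of "two or more races" is left out for brevity.

```python
def redistribution_shares(sf1_county, base_county, races):
    """Share of the redistributed population that each race received.

    Assumption: the gain of a race between SF1 and the estimation base
    is its part of the redistributed "other race" population.
    """
    gains = {r: max(base_county[r] - sf1_county[r], 0.0) for r in races}
    total = sum(gains.values())
    if total == 0:
        return {r: 0.0 for r in races}
    return {r: g / total for r, g in gains.items()}


def remap_block(block, shares, races, pool):
    """Add `pool` people (the block's "other race" count) to the races by share."""
    return {r: block[r] + pool * shares[r] for r in races}


# Made-up county and block counts, for illustration only.
races = ["WA", "BA"]
sf1_county = {"WA": 700.0, "BA": 200.0, "OTHER": 100.0}
base_county = {"WA": 760.0, "BA": 240.0}
shares = redistribution_shares(sf1_county, base_county, races)

block = {"WA": 7.0, "BA": 2.0, "OTHER": 1.0}
remapped = remap_block(block, shares, races, pool=block["OTHER"])
```

The block total is preserved by construction: the people removed from the "other race" category reappear in the remaining races.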
2. The SF1 block level data are processed with these new racial distribution coefficients. The resulting dataset is our estimation base. It includes 8 race/origin groups:
WA - White alone
BA - Black alone
NA - Native American alone
AA - Asian alone
PA - Pacific alone
R2 - Two or more races
HS - Hispanic
WN - White, not Hispanic
A few words on Census analogs: The Total Population count corresponds to Census table P001, count P0010001. The rest correspond to the Census age-race-sex tables P012A to P012I, with P012F (the Other Race table) dropped. We do not have the "Other Race" category in the estimates, even though Census 2000 does, because the USCB dropped the "Other Race" data from its estimates. They switched to 8 races in 2001 and we had to follow. It is worth mentioning that in its estimates the USCB redistributed the "Other Race" counts completely, and the counts for "2 or more Races" partially, among the rest of the races. We did the same, and therefore the racial breakdown differs from Census 2000 but fits the 2001 USCB estimates. We believe that the USCB made these changes because there are no actuarial tables for "other" or "2 or more" races, so they needed to redistribute those people into race categories for which they could create estimates.
3. Having dealt with Race, we then turn to Age. The USCB groups the population into 18 age groups, ranging from age 0 (under 1) to age 108. The age groups are 5-year intervals (0-4, 5-9, etc.), except that ages 85 and up (85-108) are treated as a single group.
4. Now that we have the entire population broken down into age and race categories, we begin building the death-birth model. Using actuarial tables, we calculate the statistical likelihood for any given age/race group to die or to give birth. We then apply these coefficients to the 2000 data to create an estimation base for 2001. The coefficients are reapplied to create 2002, and so on until we get to the current year.
The model includes:
- transformation of the age group distribution to an "exact age" distribution. The resulting data set has population groups for each single year of age from 0 to 108.
- application of death probabilities for a specific age, sex and race group.
- application of birth rates for a specific age, sex and race group. The white population is treated as a mix of the white not Hispanic and Hispanic populations. The mix ratio is determined from the block data.
- a 1-year shift.
- collecting the annual data into 5-year buckets.
- comparison of the results with the Census Bureau estimates for this year.
- use of the comparison results to tweak the birth rates and death probabilities so that the numbers of both newborn and deceased in the model are exactly equal to the Census Bureau numbers for each county. The racial distribution is also tweaked to reflect that of the Census Bureau data. This puts the annual estimates in sync with the USCB data as much as possible.
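The steps listed above can be sketched for a single sex and race group as follows. This is a simplified illustration, not the production model. The even split of each age group into single years, the treatment of survivors at age 108, the timing of births relative to deaths, and all rates are assumptions made for the example, and the helper names are hypothetical.

```python
MAX_AGE = 108
WIDTH = 5
N_GROUPS = 18  # 17 five-year groups (0-4 ... 80-84) plus 85-108


def split_to_single_years(groups):
    """Spread each age group evenly over its single years of age (assumption)."""
    ages = []
    for count in groups[:-1]:
        ages.extend([count / WIDTH] * WIDTH)
    open_width = MAX_AGE - WIDTH * (N_GROUPS - 1) + 1  # ages 85..108 -> 24 years
    ages.extend([groups[-1] / open_width] * open_width)
    return ages


def step_one_year(ages, death_prob, birth_rate):
    """Apply births and deaths, then shift everyone up one year of age."""
    births = sum(n * b for n, b in zip(ages, birth_rate))
    survivors = [n * (1.0 - q) for n, q in zip(ages, death_prob)]
    deaths = sum(ages) - sum(survivors)
    shifted = [births] + survivors[:-1]
    shifted[-1] += survivors[-1]  # assumption: survivors at the top age stay there
    return shifted, births, deaths


def to_buckets(ages):
    """Collect single years of age back into the 18 age groups."""
    buckets = [sum(ages[i * WIDTH:(i + 1) * WIDTH]) for i in range(N_GROUPS - 1)]
    buckets.append(sum(ages[(N_GROUPS - 1) * WIDTH:]))
    return buckets


def tweak_factor(model_value, census_value):
    """Multiplier that would make a modeled total match the Census Bureau total."""
    return census_value / model_value if model_value else 1.0


# Made-up inputs, for illustration only.
groups_2000 = [50.0] * 17 + [24.0]
ages = split_to_single_years(groups_2000)
death_prob = [0.01] * 109                            # flat 1% at every age
birth_rate = [0.0] * 15 + [0.05] * 30 + [0.0] * 64   # births only at ages 15-44
ages_2001, births, deaths = step_one_year(ages, death_prob, birth_rate)
groups_2001 = to_buckets(ages_2001)
```

In a fuller version, `tweak_factor` would be computed per county for births and for deaths and fed back into the rates, which is the role the comparison step plays in the list above.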
5. The same model is applied to produce the results for 2008-2013. This time, however, the "tweaking coefficients" are predicted from the tweaking coefficients for 2002 to 2007, as we do not have any Census Bureau data for comparison. The prediction algorithm is based on a linear regression approach (the coefficients actually fit a straight line very nicely).
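As an illustration of the extrapolation in step 5, here is a minimal ordinary least squares sketch in plain Python. The coefficient values are invented for the example and the function names are hypothetical. The source states only that a linear regression approach is used.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate(xs, ys, future_xs):
    """Predict y for future x values from the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return [slope * x + intercept for x in future_xs]


# Hypothetical tweaking coefficients for one county, 2002-2007.
years = [2002, 2003, 2004, 2005, 2006, 2007]
coefficients = [1.00, 1.01, 1.02, 1.03, 1.04, 1.05]
predicted = extrapolate(years, coefficients, range(2008, 2014))
```

With perfectly linear inputs such as these, the predictions simply continue the same straight line into 2008-2013.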
Methodology - Household Estimates
The household estimates were calculated from:
- the Census data on households
- the estimated data on households
- the Census data on age-race-sex
- the estimated data on age-race-sex.
GeoLytics calculated the ratios of Census household variables to Census age-race-sex data and Census housing data, and then applied these ratios to the estimated data of the same nature to get the estimated values. The underlying assumption is that the average family size by race has not changed dramatically in the years since the 2000 Census was compiled.
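The ratio approach can be illustrated with a short sketch. The function name and the numbers are hypothetical. The idea shown is only that a household variable is assumed to keep the same ratio to its base population as in Census 2000.

```python
def estimate_from_ratio(census_target, census_base, estimated_base):
    """Scale a Census 2000 household variable by the change in its base.

    census_target:  Census 2000 household count (for example, households of one race)
    census_base:    Census 2000 population the households are drawn from
    estimated_base: current-year estimate of that same population
    """
    if census_base == 0:
        return 0.0  # no base population in 2000, so no ratio can be formed
    return census_target / census_base * estimated_base


# Made-up example: 400 households per 1,000 people in 2000, 1,100 people now.
estimated_households = estimate_from_ratio(400.0, 1000.0, 1100.0)
```

Holding the ratio fixed is exactly the stated assumption that average family size has not changed much since 2000.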
Methodology - Housing Estimates
The number of housing units (HU) changes only when new buildings are built or old ones are torn down. Some houses are built on empty lots, but when many houses are built, usually a whole new development is put in. So the first thing that we did was to look at the TIGER/Line files. This is the USCB file that shows each and every street in the US and has the address numbers of the housing units along it. By looking at this dataset we can determine whether new streets have been put in, and by looking at the numbering we can determine about how many units are being built. We can also see whether new numbers have been added to an existing street.
1. The TIGER/Line records for the years 2000 and 2007 were analyzed. For each block, the sum of the associated address ranges was calculated. As a result, each block was assigned a Change Coefficient (CC), a number representing the change in the aggregate number of addresses within the block. The CC is a fraction between -1 and +1. A value of 0 represents a block that did not change within this time interval. A value of +1 represents a block that had no addresses in 2000 and has some in 2007, and a value of -1 represents a block that had some addresses in 2000 and has none in 2007. The block changes were later summarized to the block group (BG) level.
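The exact formula for the Change Coefficient is not given above. The sketch below uses one formula that is consistent with the description (range of -1 to +1, 0 for no change, +1 and -1 at the two extremes). It is an assumption, not necessarily the formula actually used, and the function name is hypothetical.

```python
def change_coefficient(addresses_2000, addresses_2007):
    """Relative change in address count, scaled to the range -1..+1.

    Assumed formula: (new - old) / max(new, old), with 0 for an empty block.
    """
    larger = max(addresses_2000, addresses_2007)
    if larger == 0:
        return 0.0  # no addresses in either year: treated as unchanged
    return (addresses_2007 - addresses_2000) / larger
```

For example, a block that grows from 50 to 100 addresses gets a value of 0.5 under this formula, while a block that goes from 0 to any positive count gets +1.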
2. The Census Bureau Housing Unit Estimates (at the county level) for the years 2000 to 2007 were used to project the number of HU per county for the year 2008 via a linear regression algorithm.
3. For each county, the Census Bureau HU growth/decline was distributed among the BGs of this county so that:
- BGs with CC = 0 did not change their HU counts
- BGs with CC not equal to 0 received parts of the county growth on a proportional basis, so that BGs with CC > 0 gained some HUs and BGs with CC < 0 lost some HUs. The results vary from small changes (a few percent is typical) to some pretty dramatic changes of 3-5 times (rarely). The dramatic changes are evidently where large housing complexes went in and greatly changed the number of housing units in the block group.
Once we had the change in the number of Housing Units, we could then look at the other housing variables such as number of rooms, vacancy status, tenure (own vs. rent) status, etc. People all live in either a household or a group quarter (military barracks, college dorms, nursing homes, prisons, mental institutions, halfway homes, etc.). The group quarters were left stable, so the changes in population were accounted for in the changes in Housing Units that had now been calculated. So, for example, if the housing units stayed the same but the population numbers dropped, then the vacancy rate would go up.
The sum of all changes for all BGs in a county is equal to the Census Bureau HU county growth estimates.
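The distribution in step 3 can be sketched as below. The weighting by 2000 housing units times CC is an assumption, since the source says only that the allocation is proportional. The function name and the data are hypothetical. The sketch also assumes that the net weight has the same sign as the county growth, so that BGs with CC > 0 gain units and BGs with CC < 0 lose them.

```python
def distribute_growth(county_growth, block_groups):
    """Split a county HU change among block groups in proportion to HU * CC.

    block_groups maps a BG id to (housing units in 2000, change coefficient).
    Returns the HU change per BG; the changes sum to county_growth.
    """
    if county_growth == 0:
        return {bg: 0.0 for bg in block_groups}  # assumption: nothing to distribute
    weights = {bg: hu * cc for bg, (hu, cc) in block_groups.items()}
    net = sum(weights.values())
    if net == 0 or (net > 0) != (county_growth > 0):
        raise ValueError("net weight must be nonzero and share the sign of county growth")
    return {bg: county_growth * w / net for bg, w in weights.items()}


# Made-up county with three block groups and a growth of 40 housing units.
block_groups = {"A": (100, 0.0), "B": (200, 0.5), "C": (100, -0.2)}
changes = distribute_growth(40, block_groups)
```

In this example BG "A" is unchanged, "B" gains units, "C" loses units, and the three changes add up to the county growth of 40.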
Methodology - Income Estimates
When calculating Income Estimates there are several components. First we needed to calculate the changes in income from 1990 to 2000 so that we would have a basis for estimating forward. This again required some racial breakout changes because in 1990 the Race grouping was "Asian and Pacific Islanders" whereas in 2000 they are two separate races. Additionally the age changes had to be accounted for (everyone has aged since April 2000 so all of the age categories needed to shift up).
1. The first step was to create an Income Growth by Race number for each Block Group. Luckily, we were able to use both the GeoLytics Census CD 2000 Long Form (SF3) and, for the 1990 data, the CensusCD 1990 in 2000 boundaries Long Form data product. Using this normalized data set means that we have already dealt with the geographic boundary changes from 1990 to 2000 and can look at just the differences in incomes.
2. The BG-level racial growth data were applied to the 2000 Census data to obtain 2008 racial income growth coefficients for each BG area. First, the growth data for 1990-2000 were processed using a compound interest model. Second, the calculated "interest rates" were applied to the 2000 racial income data to get the 2008 growth data.
The Income Growth data by Race were not available for many BGs for some races, because if there are very few households of a given race in a block group, then the numbers were suppressed by the USCB in 1990. For these cases, we used the USCB Median Income Estimates for the years 2000-2006 to get 2008 state median income growth data using a linear regression algorithm, and then used these state growth data for those Block Groups and races.
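The compound interest model in step 2 can be written out as a short sketch. The income figures are invented and the function names are hypothetical. The fallback to state-level growth for suppressed data is not shown.

```python
def annual_growth_rate(income_start, income_end, years):
    """Constant yearly rate that turns income_start into income_end."""
    return (income_end / income_start) ** (1.0 / years) - 1.0


def project_income(income, rate, years):
    """Compound an income forward by a constant yearly rate."""
    return income * (1.0 + rate) ** years


# Made-up block group: median income of 30,000 in 1990 and 40,000 in 2000.
rate = annual_growth_rate(30000.0, 40000.0, 10)
income_2008 = project_income(40000.0, rate, 8)
```

Compounding the 1990 value forward by the same rate for 10 years reproduces the 2000 value, which is a quick check that the rate was derived correctly.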
3. The racial aggregate income data were processed in the same manner as racial median income data.
4. The Householder age distributions were estimated by using estimated Householder totals from our dataset and an age shift model. Namely, for each age group, a calculated number of householders was moved to the next age group. The first and last age groups were processed in a special way to take into account both new and dead householders. The sum of all householder age brackets is equal to our estimated HH total for 2008.
5. The area income range data were estimated using a distribution shift model. First, we assumed that the Census 2000 income brackets represent a "best fit curve" frequency distribution, and then applied a linear stretch transformation to the income scale. Finally, we calculated the new income bracket values produced by this linear stretching of the frequency distribution. The stretch coefficient was equal to the median income growth ratio for the area. What it all means is that the income increase moves some households from their income bracket in 2000 to the next income bracket in 2008. The number of such households can be estimated mathematically if we know the exact number of households for each income value. This exact number can be estimated using the "best fit curve" model.
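A minimal sketch of the stretch idea in step 5 follows. In place of the "best fit curve", which the source does not specify, it assumes that households are spread evenly within each bracket. The bracket edges, the counts, and the function name are hypothetical. It also assumes the lowest edge is 0 and treats the top bracket as open-ended.

```python
def stretch_brackets(edges, counts, k):
    """Re-bin bracket counts after multiplying every income by k.

    edges:  ascending bracket bounds, len(counts) + 1 values, edges[0] == 0
    counts: households per bracket in the base year
    k:      stretch coefficient (median income growth ratio), k > 0
    Assumption: households are uniformly distributed inside each bracket.
    """
    n = len(counts)
    new_counts = [0.0] * n
    for i in range(n):
        lo, hi = edges[i] * k, edges[i + 1] * k  # the stretched source bracket
        for j in range(n):
            b_lo = edges[j]
            b_hi = edges[j + 1] if j < n - 1 else float("inf")  # open top bracket
            overlap = max(0.0, min(hi, b_hi) - max(lo, b_lo))
            new_counts[j] += counts[i] * overlap / (hi - lo)
    return new_counts


# Made-up area: three brackets with 100 households each, incomes up 25%.
edges = [0.0, 10000.0, 20000.0, 30000.0]
stretched = stretch_brackets(edges, [100.0, 100.0, 100.0], 1.25)
```

The total number of households is unchanged. The stretch only moves some of them into higher brackets, as described in step 5.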
6. Finally, the BG data (both medians and aggregates) were tuned so that the summary state median values were exactly equal to the state median data for 2008, as estimated from Census Bureau publications for 2000-2006 (see item 2). This was done using a two-section linear mapping scheme. The scheme:
- moves the actual state median so it becomes equal to the target value;
- leaves the minimum and maximum median values among the state's BGs intact;
- is linear (a*x + b) on each of two segments: a) between the state minimum median value for all state BGs and the state median, and b) between the state median and the state maximum median value for all state BGs (with different a and b in the two segments).
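The two-section mapping described above can be sketched as a piecewise linear function. The function and parameter names and the numbers are hypothetical, and the sketch assumes minimum < median < maximum.

```python
def two_section_map(x, lo, median, hi, target):
    """Piecewise linear map with lo -> lo, median -> target, hi -> hi.

    lo, hi: minimum and maximum BG median values in the state
    median: the state median before tuning
    target: the state median to be matched
    Assumes lo < median < hi.
    """
    if x <= median:
        a = (target - lo) / (median - lo)  # slope of the lower segment
        return lo + a * (x - lo)
    a = (hi - target) / (hi - median)      # slope of the upper segment
    return hi - a * (hi - x)


# Made-up state: BG medians span 20,000-120,000, state median 50,000 -> 53,000.
tuned_median = two_section_map(50000.0, 20000.0, 50000.0, 120000.0, 53000.0)
tuned_low = two_section_map(20000.0, 20000.0, 50000.0, 120000.0, 53000.0)
tuned_high = two_section_map(120000.0, 20000.0, 50000.0, 120000.0, 53000.0)
```

Each segment has its own slope and intercept, so the two end points stay fixed while the median moves to the target value.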
