Normalizing the Data
Re-mapping data from 1980 boundaries to 2000 boundaries is quite complicated. The basic procedure took two steps: first we created the correspondence file between 1990 and 2000, and then we created one from 1980 to 1990. One issue in normalizing the data is deciding which dataset to use. In 1980, for example, the sum of the MCDs in a county does not always equal the county figure. Likewise, the sum of the tracts does not always equal the MCD or county figures. This is due in part to data suppression (see below) and in part to census tabulation irregularities: the computer system the US Census Bureau was running in 1980 was not powerful enough to perform many of the cross-checks that we now take for granted.
We therefore used the smallest level of geography for which there was coverage in each area: tract level data for about 80% of the population coverage, MCD level data for about 19%, and county level data for the remaining 1%. To normalize the dataset we selected the source and then distributed the data to the 2000 geographies. As a result, in the CensusCD 1980 in 2000 Boundaries, Block Groups sum to Tracts, which sum to MCDs, which sum to Counties, which sum to States.
Special Issues Relating to MCDs
There are a few exceptions where the MCDs are not in alignment. The TIGER92 file associates 1990 blocks with 1980 tracts, 1980 counties, and 1980 MCD FIPS codes, but the 1980 MCD data is stored by MCD census code. You therefore need to consult the Census Bureau’s 1980 MCD FIPS to Census code correspondence file, and some of these correspondences are misaligned, which causes the problem. Additionally, the map of the 1980 MCD geography has some holes (see below), even though the MCD level is supposed to cover the U.S. completely. These holes are caused by errors in the TIGER92 relationship file: they are gaps in the data correspondence, so the affected areas cannot be properly mapped. Normalizing the data to the 2000 MCD geographic definitions solves this error as well, as the maps below demonstrate.
MCD map of the U.S. in 1980 Boundaries (created by GeoLytics’ CensusCD 1980 Long Form)
Close up view of MCDs in South and North Dakota in 1980 Boundaries
Close up view of MCDs in South and North Dakota (created by GeoLytics’ CensusCD 1980 in 2000 Boundaries)
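The two-step MCD lookup described above (1990 block to 1980 MCD FIPS code, then FIPS code to MCD census code) can be sketched as a chain of table lookups. This is only an illustration of why a missing or misaligned correspondence leaves a hole on the map. The function name, the dictionary layout, and every identifier and figure below are assumptions made up for the example, not the actual file formats.

```python
def lookup_mcd_value(block_id, block_to_fips, fips_to_census, mcd_data):
    """Return the 1980 MCD value for a 1990 block, or None where the chain breaks."""
    fips = block_to_fips.get(block_id)
    if fips is None:
        return None  # block has no 1980 MCD FIPS association
    census_code = fips_to_census.get(fips)
    if census_code is None:
        return None  # missing correspondence: a "hole" on the map
    return mcd_data.get(census_code)


# All identifiers and figures below are invented for illustration.
block_to_fips = {"block_a": "fips_01", "block_b": "fips_02"}
fips_to_census = {"fips_01": "code_005"}  # no entry for fips_02
mcd_data = {"code_005": 1234}

found = lookup_mcd_value("block_a", block_to_fips, fips_to_census, mcd_data)
hole = lookup_mcd_value("block_b", block_to_fips, fips_to_census, mcd_data)
```

Here `found` carries the MCD value, while `hole` is `None` because the second step has nothing to match against.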
Data Suppression in 1980
Many of the questions on the Long Form (STF-3) are personal, and the US Census Bureau works very hard to ensure that no individual or household is identifiable. When too few people in an area fall into a category, the data for that category is suppressed. Some data, like Race or Age Grouping, is not suppressed, on the reasoning that a person’s age category or race can be judged by looking at them. But other items, like income, are carefully guarded. Thus if there are only 2 Asian families in an area, the Census Bureau will not give you family income by race for Asians; the numbers will just be zeroed out.
Because of this suppression the sum of the tracts in a county that is fully tracted may not be the same as the county level numbers. Likewise, the sum of Block Groups may not be the same as the Tract or County numbers.
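A small worked example may make the mismatch concrete. The figures are invented, and `None` is used here only to mark a suppressed cell that a data user would see as zero.

```python
# Hypothetical figures for one fully tracted county (not real census data).
county_total = 120              # county-level count for some guarded table
tract_values = [70, 48, None]   # None marks a tract whose cell was suppressed


def sum_published(values):
    """Sum tract cells as published: suppressed cells contribute zero."""
    return sum(v or 0 for v in values)


tract_sum = sum_published(tract_values)
shortfall = county_total - tract_sum  # what suppression removed from the tract sum
```

In this sketch the tracts add up to 118 while the county reports 120, so the 2 suppressed units appear only at the county level.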
When we distributed the data from the 1980 MCD or county level to the 2000 tract and block group levels, the suppression of data sometimes affected the sums of these numbers. This means that the MCD or county numbers from the CensusCD 1980 in 2000 Boundaries may differ from the 1980 Long Form MCD or county numbers even if there was no change in the geography.
Weighting and converting the 1980 Census data to 1990 geographies
The 1980 census data was weighted and converted first to 1990 block groups. This 1980 data in 1990 block groups was then weighted and converted to the 12 different 2000 geographic areas.
To normalize the 1980 data to the 1990 geographies we used the relationship tables that the US Census Bureau released in a product called TIGER92. These relationship tables link 1990 block groups with 1980 tracts, MCDs, and counties.
In 1980, tracts were found primarily in urban areas. The Census Bureau had designated tracted areas for only about 50% of the country’s area, which covers about 80% of the population. It is desirable to weight and convert 1980 data from the smallest geography available. There are some Block Groups in the most urban areas, but because of the higher quality of the geographic associations in TIGER92 we used the Tract. For the areas outside of “tracted areas” it became necessary to use the next smallest area, the MCD. The smaller (more detailed) area, the tract, was used for about 80% of the population coverage, and the MCD for about 19%. Where there was no tract coverage and the TIGER92 associations for 1980 MCDs were missing, county level data was used; this was about 1% of the cases.
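The fallback order described above (tract, then MCD, then county) can be sketched as a simple rule. The function and its two flags are hypothetical names for this illustration; how coverage was actually detected is not described here.

```python
def choose_source(has_tract_coverage, has_mcd_association):
    """Pick the smallest 1980 geography available for an area.

    Both arguments are hypothetical flags: whether the area lies in a
    tracted area, and whether a usable 1980 MCD association exists for it.
    """
    if has_tract_coverage:
        return "tract"
    if has_mcd_association:
        return "mcd"
    return "county"  # last resort when neither finer source is usable


# Example areas (invented): urban, rural with MCD link, rural without.
sources = [
    choose_source(True, True),
    choose_source(False, True),
    choose_source(False, False),
]
```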
The specific relationships of 1990 block groups to 1980 tracts, MCDs, and counties allowed the 1980 data to be weighted by 1990 block population and assigned to 1990 block group areas.
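As a rough sketch of population weighting, a 1980 tract figure can be shared out among the 1990 block groups it overlaps in proportion to the 1990 population of the blocks involved. The function name, the input layout, and the numbers are assumptions for illustration, and the zero-population handling is a choice made for this sketch only.

```python
from collections import defaultdict


def allocate_to_block_groups(tract_value, blocks):
    """Distribute one 1980 tract value to 1990 block groups.

    blocks: list of (block_group_id, pop_1990) pairs, one per 1990 block
    that falls inside the 1980 tract (hypothetical layout).
    """
    total_pop = sum(pop for _, pop in blocks)
    allocated = defaultdict(float)
    if total_pop == 0:
        return dict(allocated)  # nothing to weight by; sketch leaves it empty
    for block_group_id, pop in blocks:
        allocated[block_group_id] += tract_value * pop / total_pop
    return dict(allocated)


# Invented example: a tract value of 1000 over three 1990 blocks.
shares = allocate_to_block_groups(
    1000, [("BG1", 300), ("BG1", 100), ("BG2", 600)]
)
```

With these numbers BG1 holds 40% of the 1990 block population and receives 400, while BG2 receives 600, so the tract total is preserved.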
We used the TIGER/Line 2000 relationship between 2000 blocks and 1990 blocks. This allowed us to identify how block boundaries had changed between censuses. For example, some 1990 tracts split into two for the 2000 census, other tracts merged, while some tracts both merged and then split. The table below shows the three scenarios (many to one, one to many, and many to many).
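One way to see the three scenarios in a relationship file is to count, for each link, how many partners the old and new areas have. The sketch below assumes the relationship is available as simple (1990 id, 2000 id) pairs. That layout and the labels are illustrative assumptions, and labeling each link separately is a simplification of how a whole group of related areas would be described.

```python
from collections import defaultdict


def classify_links(pairs):
    """Label each (old_id, new_id) link by how the areas relate."""
    new_for_old = defaultdict(set)
    old_for_new = defaultdict(set)
    for old_id, new_id in pairs:
        new_for_old[old_id].add(new_id)
        old_for_new[new_id].add(old_id)

    labels = {}
    for old_id, new_id in pairs:
        was_split = len(new_for_old[old_id]) > 1   # old area feeds several new ones
        was_merged = len(old_for_new[new_id]) > 1  # new area draws on several old ones
        if was_split and was_merged:
            labels[(old_id, new_id)] = "many to many"
        elif was_split:
            labels[(old_id, new_id)] = "one to many"
        elif was_merged:
            labels[(old_id, new_id)] = "many to one"
        else:
            labels[(old_id, new_id)] = "one to one"
    return labels


# Invented ids: A and B merge into X, C splits into Y and Z, D is unchanged.
labels = classify_links(
    [("A", "X"), ("B", "X"), ("C", "Y"), ("C", "Z"), ("D", "W")]
)
```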
There are 8.2 million blocks in the US in 2000, so these are very small geographic areas. To be more precise, however, we broke blocks down when necessary. When blocks merge, you simply add the numbers; when a block splits, the question is how to divide its population. To determine how to subdivide blocks we looked at the TIGER street files, on the assumption that people live on or near streets, so the number of addresses on a street indicates the approximate weight to give to that area of the block.
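The address-based split can be illustrated with a small proportional calculation. The function name, the part labels, and the equal-split fallback for parts with no addresses are all assumptions of this sketch, not a description of the production procedure.

```python
def split_by_addresses(block_pop, address_counts):
    """Divide a block's population among its parts by address count.

    address_counts: mapping of part label -> number of street addresses
    in that part of the block (hypothetical layout).
    """
    total_addresses = sum(address_counts.values())
    if total_addresses == 0:
        # Assumption for this sketch: with no addresses, split evenly.
        even_share = block_pop / len(address_counts)
        return {part: even_share for part in address_counts}
    return {
        part: block_pop * count / total_addresses
        for part, count in address_counts.items()
    }


# Invented example: 90 people, 20 addresses in one part and 10 in the other.
parts = split_by_addresses(90, {"north": 20, "south": 10})
```

Here the part with two thirds of the addresses receives two thirds of the population, 60 of the 90 people.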
From the blocks or block parts we created a Block Weighting File. These population weights were then applied to the various other counts to convert them to 2000 block boundaries. Once the data had been calculated at the block level we were then able to sum up the blocks to the various other geographies. Testing was then done to assure the accuracy and validity of the weighting method compared with the original numbers.
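Summing blocks up to larger areas can be sketched with the standard 15-character 2000 block identifier (2-digit state, 3-digit county, 6-digit tract, 4-digit block whose first digit is the block group). The sketch covers only the levels that nest by identifier prefix; MCDs need a separate lookup and are left out. The function name and the sample values are assumptions.

```python
from collections import defaultdict

# Prefix lengths of the 15-character block identifier for each nested level.
PREFIX_LENGTHS = {"state": 2, "county": 5, "tract": 11, "block_group": 12}


def roll_up(block_values, level):
    """Sum block-level values to the requested level by identifier prefix."""
    prefix_length = PREFIX_LENGTHS[level]
    totals = defaultdict(float)
    for block_id, value in block_values.items():
        totals[block_id[:prefix_length]] += value
    return dict(totals)


# Invented weighted values for four blocks in two counties of one state.
block_values = {
    "360010001001000": 10.5,
    "360010001001001": 4.5,
    "360010001002000": 5.0,
    "360030002001000": 20.0,
}
by_tract = roll_up(block_values, "tract")
by_state = roll_up(block_values, "state")
```

Because every level is built from the same block values, the totals agree at each level, which is the kind of consistency the testing step would check.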
A final weighting consideration should be noted: the weighting of 1980 data to 2000 areas has been done as statistically accurately as possible. The 1980 STF1 and STF3 data is the official Census data, and our methodology presents an accurate and comprehensive method to statistically compare 1980 data with 2000 data. However, the converted 1980 data in 2000 boundaries cannot be considered official census data. While a major obstacle to comparing altered geographic areas has been overcome, even those areas that did not change between 1980 and 2000 may contain rounding differences from the weighting process and may not exactly match the official census figures.