Data issues

From Portalproject


SIPRI Portal Data Issues

a) Designation of countries

SHW: We'll need to decide how we designate countries and territories/borders that are in dispute on the map. We also need to consider how we label non-sovereign territories in the search function. Can this be easily solved by calling them 'countries/territories', with a disclaimer note somewhere on the site?

SPF: Territories in dispute are certainly an issue. I think "countries/territories" is fine.

MB: What about entities that have no territorial connection? We have records for the PKK, FARC, etc. I am guessing that the easiest thing will be simply not to include these 'armed rebel groups' in the portal, since they only apply to the AT Database, but that they will continue to appear in the stand-alone AT Database ...

MB: What about countries that merge or split? How do other databases handle this? The AT database simply has a separate record for each one. So there are separate country records for Yugoslavia, for Serbia and Montenegro, and for Serbia (1996-). The country records are not time-stamped. They do not cease to 'function' after the entity stops existing; they simply have no trade records attached to them for the years before they came into existence or after they ceased to exist. Hence, you can ask the public UI for data on arms exports from Yugoslavia for 1970-2011, but the years after 1992 will be blank.
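A minimal sketch of the AT behaviour MB describes, assuming trade records are keyed by (supplier, year) and that country records carry no start or end dates. The function name, the table layout and the figures are all hypothetical. Any year without a record simply comes back blank (None).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: (supplier, year) -> export value. Not the real AT schema.
TradeTable = Dict[Tuple[str, int], float]


def exports_by_year(trades: TradeTable, supplier: str,
                    start: int, end: int) -> List[Tuple[int, Optional[float]]]:
    """Return (year, value) for every year in [start, end].

    Country records are not time-stamped, so nothing stops the query;
    years with no trade records attached simply come back as None (blank).
    """
    return [(year, trades.get((supplier, year))) for year in range(start, end + 1)]


# Placeholder figures for illustration only, not SIPRI data.
trades: TradeTable = {("Yugoslavia", 1991): 1.0, ("Yugoslavia", 1992): 0.5}
rows = exports_by_year(trades, "Yugoslavia", 1991, 1994)
# rows == [(1991, 1.0), (1992, 0.5), (1993, None), (1994, None)]
```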

SPF: I don’t see a problem with having entities with zero territory. They would also have no entry for milex, etc. I think we need a ‘not applicable’ value as well as a ‘data not available’ value.

For countries that merge or split, we will have to harmonize treatment. We currently have a “Yugoslavia (former)” entry that runs up to 1991, and a “Serbia” entry from then on, which covers the Federal Republic of Yugoslavia (up to, I think, 1999 or thereabouts), then Serbia & Montenegro, then Serbia.

We do have country start and end dates. These are necessary for constructing regional estimates. If a country has missing data for a particular year, we estimate its value for the regional total; but if the country didn’t exist, we don’t. Thus, for our purposes, it is useful to have a third category of ‘no data’, namely ‘didn’t exist’.
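The three kinds of 'no data' discussed above (not available, not applicable, didn't exist) can be made explicit in a small sketch of the regional-total rule. The enum, the function and the `estimates` mapping are hypothetical. `estimates` stands in for whatever estimation method is actually used. Treating 'not applicable' like 'didn't exist' (no estimate) is an assumption.

```python
from enum import Enum
from typing import Dict, Union


class Missing(Enum):
    NOT_AVAILABLE = "data not available"  # country existed, but no figure
    NOT_APPLICABLE = "not applicable"     # e.g. milex for an entity with no territory
    DID_NOT_EXIST = "did not exist"       # year outside the country's start/end dates


Value = Union[float, Missing]


def regional_total(values: Dict[str, Value], estimates: Dict[str, float]) -> float:
    """Sum one region for one year.

    Known figures are summed. NOT_AVAILABLE gaps are filled from `estimates`
    (a hypothetical stand-in for the real estimation method). DID_NOT_EXIST
    and NOT_APPLICABLE contribute nothing, since no estimate is made for them.
    """
    total = 0.0
    for country, value in values.items():
        if isinstance(value, Missing):
            if value is Missing.NOT_AVAILABLE:
                total += estimates[country]
        else:
            total += value
    return total


# Placeholder figures: A reported, B missing but existed, C did not exist yet.
year_values = {"A": 10.0, "B": Missing.NOT_AVAILABLE, "C": Missing.DID_NOT_EXIST}
total = regional_total(year_values, {"B": 5.0})  # 10.0 + estimate 5.0 = 15.0
```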

For a similar reason, since we need to calculate time series and changes over time, we need successor states to be recorded under the same entity: thus the various versions of Serbia, and likewise Russia as successor to the Soviet Union. However, we treat post-1992 Yugoslavia (now called Serbia) as a clean break from Yugoslavia (former), i.e. as a separate entity.

One idea could be to have a variable called ‘former name’ for each country-year. Normally this would be blank. But, for example, the country currently named Serbia would have a ‘former name’ of Serbia & Montenegro for the years before 2006, and of Federal Republic of Yugoslavia back to 1992. Likewise, Russia would have a ‘former name’ of Soviet Union for the years 1988-1991. Then, if it were important for a project, data over a period could be separated by the different ‘former names’.
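A minimal sketch of the ‘former name’ idea, assuming each country-year row stores a stable entity (used for series and changes) plus an optional former name that is blank when the current name applies. The class, the field names and the figures are hypothetical. The years are placeholders chosen to fall clearly inside each naming period, rather than to mark exact boundaries.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CountryYear:
    entity: str                        # stable entity for time series, e.g. "Serbia"
    year: int
    value: float
    former_name: Optional[str] = None  # None (blank) when the current name applies


def split_by_name(rows: List[CountryYear]) -> Dict[str, List[CountryYear]]:
    """Group one entity's series by the name in force in each year.

    The full series stays under one entity for calculating changes; this
    split is only for projects that need the periods kept apart.
    """
    groups: Dict[str, List[CountryYear]] = defaultdict(list)
    for row in sorted(rows, key=lambda r: r.year):
        groups[row.former_name or row.entity].append(row)
    return dict(groups)


# Placeholder values, not SIPRI data.
serbia = [
    CountryYear("Serbia", 2007, 3.0),
    CountryYear("Serbia", 1995, 1.0, "Federal Republic of Yugoslavia"),
    CountryYear("Serbia", 2004, 2.0, "Serbia & Montenegro"),
]
by_name = split_by_name(serbia)
```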

However, that would still leave the question of how things would appear in the searches. This is going to take some thinking about.

This is actually quite a tricky one, if it is important for Arms Transfers to have the successor states treated as distinct entities.


b) Harmonization of region definitions across datasets

SJ: I also realized that I have regions in my DB too. Only mine include regimes or organizations in addition to a few geographic ones (West Europe; CEE; EU; OECD; Developing). There is also one listed as TW, but I am not sure what that is.

MB: With regard to regions, I think we need to focus on the question 'which regions do we want the external users of the database to use?' I think we can focus on a few geographic regions (Middle East, Latin America, SE Asia, NE Asia, etc.) and some major regional organisations whose membership is well defined (EU, NATO, ASEAN ...) and leave it at that. Our internal database has a system that lets us create regions, which we can then use for generating data. We would like to keep this system in place. However, we do not think it needs to be transferred to the public portal.

SJ: I agree with the region comment and think it is a good idea to have two fields: one that we can use across the datasets for public access, and another that we can use for internal customization. The external one should be geographical, I think.
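The two-field proposal could look roughly like the sketch below. Each country carries one public geographic region shared across datasets, plus a free-form set of internal groupings that is never exposed on the portal. The class, the field names and the example assignments are hypothetical and illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Country:
    name: str
    public_region: str                                      # shared, geographic, shown on the portal
    internal_groups: Set[str] = field(default_factory=set)  # custom groupings for internal use only


def public_regions(countries: List[Country]) -> Dict[str, List[str]]:
    """Portal view: countries grouped by the shared public region only."""
    out: Dict[str, List[str]] = {}
    for c in countries:
        out.setdefault(c.public_region, []).append(c.name)
    return out


def internal_group(countries: List[Country], group: str) -> List[str]:
    """Internal view: members of a custom grouping (e.g. one built for a single project)."""
    return [c.name for c in countries if group in c.internal_groups]


# Illustrative assignments only.
countries = [
    Country("Sweden", "Western Europe", {"EU", "OECD"}),
    Country("Norway", "Western Europe", {"OECD"}),
    Country("Jordan", "Middle East"),
]
```

Keeping the internal groupings in a separate field means they can change freely without affecting what external users see.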


c) Budgets

SHW: Is this a potential issue? E.g., Milex presents data in constant dollars, while PKO presents budget figures in current dollars. What about Transfers?

SPF: Not really a problem: these are simply different variables. Any country in any given year will have many different variables, including milex in its various forms, AT TIV value and PK budget. So it is not a problem that they have different units, so long as the unit for each is specified.
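SPF's point that differing units are fine "so long as the units for each are specified" can be enforced by storing every figure with its unit, and refusing to combine figures whose units differ. This is a minimal sketch. The `Quantity` class, the variable keys and the unit labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    value: float
    unit: str  # e.g. "constant USD", "current USD", "TIV" (labels are illustrative)


def add(a: Quantity, b: Quantity) -> Quantity:
    """Add two quantities, refusing to mix units silently."""
    if a.unit != b.unit:
        raise ValueError(f"cannot add {a.unit!r} to {b.unit!r}")
    return Quantity(a.value + b.value, a.unit)


# One hypothetical country-year record holding variables in different units.
record = {
    "milex": Quantity(100.0, "constant USD"),
    "pko_budget": Quantity(5.0, "current USD"),
    "at_exports": Quantity(20.0, "TIV"),
}
```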

One thing that may be an issue with milex is that we have both fiscal-year and calendar-year data. For the milex figures themselves that is not a problem, as again it is simply a different year. But if the GDP data is used by more than one db, it could be an issue. At the moment we have a field saying whether the GDP data is presented as fiscal or calendar year, but it might be better simply to drop that field and ensure that all GDP data is entered as calendar year.

SJ: AP data is also in local currency. It would be useful to be able to present this for individual countries, because recently we have seen a gross distortion in relative change due to exchange-rate effects when converting to dollars.
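The following worked arithmetic shows, with invented figures, how conversion to dollars can distort relative change. Suppose spending in local currency units (LCU) rises from 100 to 110, while the exchange rate moves from 2.0 to 2.5 LCU per USD. The change in local terms is then about +10%, but in dollars it is about -12% (50 USD to 44 USD). The function names and all figures are hypothetical.

```python
def pct_change(old: float, new: float) -> float:
    """Relative change in percent."""
    return (new - old) / old * 100.0


def to_usd(local: float, lcu_per_usd: float) -> float:
    """Convert a local-currency amount to USD at the given exchange rate."""
    return local / lcu_per_usd


# Hypothetical figures for two consecutive years.
local_y1, local_y2 = 100.0, 110.0
rate_y1, rate_y2 = 2.0, 2.5  # LCU per USD

local_change = pct_change(local_y1, local_y2)                                  # about +10%
usd_change = pct_change(to_usd(local_y1, rate_y1), to_usd(local_y2, rate_y2))  # about -12%
```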

Calendar/fiscal year for GDP: I think we should have one variable for all the datasets, and the data should only need to be entered once rather than separately by each of us.
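This sketch shows one shared GDP store, keyed by (country, calendar year) and entered once, that every dataset reads from. It assumes SPF's suggestion of storing calendar-year GDP only, so no fiscal/calendar flag is needed. The class, its methods and the share calculation are hypothetical. A real system would also need an explicit path for corrections.

```python
from typing import Dict, Tuple


class SharedGdp:
    """Hypothetical single GDP store, keyed by (country, calendar year),
    read by every dataset instead of each keeping its own copy."""

    def __init__(self) -> None:
        self._values: Dict[Tuple[str, int], float] = {}

    def enter(self, country: str, calendar_year: int, value: float) -> None:
        key = (country, calendar_year)
        if key in self._values:
            # Entered once only: a second entry signals duplicated work or a conflict.
            raise ValueError(f"GDP already entered for {key}")
        self._values[key] = value

    def get(self, country: str, calendar_year: int) -> float:
        return self._values[(country, calendar_year)]


def milex_share_of_gdp(gdp: SharedGdp, country: str, year: int, milex: float) -> float:
    """Milex as a percentage of GDP.

    Assumes milex and GDP are in the same currency and that `year` is a
    calendar year; fiscal-year milex would need separate handling.
    """
    return milex / gdp.get(country, year) * 100.0


gdp = SharedGdp()
gdp.enter("Country X", 2010, 200.0)  # placeholder country and figure
share = milex_share_of_gdp(gdp, "Country X", 2010, 4.0)  # about 2.0 (%)
```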


d) Companies merging and splitting

SJ: I generated a result for Lockheed Martin. Lockheed Martin was Lockheed and Martin Marietta (two separate companies) until 1994. The 1994 data is not connected to any previous years. For other merged companies (e.g., Northrop Grumman), the data for the merged year is linked to one of the parent companies (Northrop) for prior years. I am unsure whether it is methodologically possible to link a company record to more than one predecessor, or whether it is a database issue that makes it link to only one of the original companies. I am checking on this.
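On the data-structure side, nothing prevents a company record from linking to several predecessors, provided the links are stored as a many-to-one table rather than a single parent field. The sketch below makes that assumption. It walks every predecessor link, then builds one possible combined series by summing the predecessors' pre-merger figures. Summing is only one methodological choice, and it is not necessarily the one the AP database should use. The table layout, the function names and all figures are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical link table: company -> all of its direct predecessors.
PREDECESSORS: Dict[str, List[str]] = {
    "Lockheed Martin": ["Lockheed", "Martin Marietta"],
    "Northrop Grumman": ["Northrop", "Grumman"],
}


def lineage(company: str, predecessors: Dict[str, List[str]]) -> Set[str]:
    """Return the company plus all ancestors, following every predecessor link."""
    seen: Set[str] = set()
    stack = [company]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(predecessors.get(current, []))
    return seen


def combined_series(company: str, predecessors: Dict[str, List[str]],
                    sales: Dict[Tuple[str, int], float]) -> Dict[int, float]:
    """Sum figures for the company and all its ancestors, year by year."""
    members = lineage(company, predecessors)
    totals: Dict[int, float] = {}
    for (name, year), value in sales.items():
        if name in members:
            totals[year] = totals.get(year, 0.0) + value
    return dict(sorted(totals.items()))


# Placeholder figures, not SIPRI data.
sales = {("Lockheed", 1993): 1.0, ("Martin Marietta", 1993): 2.0, ("Lockheed Martin", 1995): 4.0}
series = combined_series("Lockheed Martin", PREDECESSORS, sales)
```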

MB: The point I raise above about countries merging and splitting may be relevant here ...
