Nov 27, 2008

Do not forget data cleansing before modeling

Commit to memory:
1) Values are within the domain range: eliminate illegal or out-of-range values.
a. Example 1: A variable like ‘gender’ is expected to have only two values, either ‘0’ and ‘1’ or ‘Male’ and ‘Female’. Check frequency tables to see whether there are more distinct values than expected.
b. Example 2: Variables like ‘Date-of-Birth’ or ‘Height in Inches’ should be within reasonable limits.
c. Example 3: Variables like Level of Education or Customer Category should not have more levels or categories than are defined.

2) Uniqueness of the data: check for duplicate records across the data.
a. The following examples may arise from programming, typing or phonetic errors and need to be corrected to restore uniqueness. A customer ID should map to a single customer, city name and STD code should correspond, and the misspellings of Chennai should be standardised.
‘Customer ID = 1000089’ ‘Customer Name = John Smith’
‘Customer ID = 1000089’ ‘Customer Name = Peter Miller’.
‘City=Chennai, STDCODE=044’ ‘City=Chennai, STDCODE=055’.

‘City=Chennai, City=chenai, City=CHHENNAI, CITY=Madras’.

‘Customer Name= VIVEKANAND’ ‘Customer Name=VIVEK ANAND’.

b. The following examples must be treated consistently as either “wrong or misfiled” or “missing” values, so that the uniqueness of the field is maintained:
‘phone=000-00000000’ ‘phone=999-99999999’.

‘phone=000-23#45*56’ ‘phone=###-********’.

3) Wrong references: a reference may be defined, but a wrong entry or record exists; it needs to be corrected or cross-checked.
a. Example: A ZIP code may be filled in but not belong to Chennai city.
‘City=Chennai, STDCODE=044, ZIP=600053’
‘City=Chennai, STDCODE=044, ZIP=600653’
4) Corresponding values: values like age should correspond to the given date-of-birth (DOB).
a. Example: In the example below, the given DOB and age do not agree; on the date of this post, a customer born on 10-10-1981 would be 27 years old.

‘DOB: 10-10-1981, Age of customer = 37 years’.
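
The checks above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the record layout, the field names (id, name, gender, phone, dob, age), the allowed gender values and the sample rows are all assumptions made up for the example, not a prescribed method.

```python
from collections import Counter, defaultdict
from datetime import date

ALLOWED_GENDER = {"Male", "Female"}  # assumed domain for this sketch

def out_of_domain(rows, field, allowed):
    """Frequency table of the values that fall outside the allowed domain."""
    return Counter(r[field] for r in rows if r[field] not in allowed)

def conflicting_ids(rows):
    """Customer IDs that appear with more than one distinct name."""
    names = defaultdict(set)
    for r in rows:
        names[r["id"]].add(r["name"])
    return {cid: sorted(n) for cid, n in names.items() if len(n) > 1}

def is_placeholder_phone(phone):
    """True for values such as 000-00000000 or ###-********."""
    digits = phone.replace("-", "")
    if not digits.isdigit():
        return True                 # contains #, * or other non-digit characters
    return len(set(digits)) == 1    # one digit repeated, e.g. all 0s or all 9s

def age_on(dob, as_of):
    """Age in completed years on the date as_of."""
    had_birthday = (as_of.month, as_of.day) >= (dob.month, dob.day)
    return as_of.year - dob.year - (0 if had_birthday else 1)

# Hypothetical sample rows, loosely based on the examples in this post.
rows = [
    {"id": 1000089, "name": "John Smith", "gender": "Male",
     "phone": "044-23456789", "dob": date(1981, 10, 10), "age": 37},
    {"id": 1000089, "name": "Peter Miller", "gender": "M",
     "phone": "000-00000000", "dob": date(1975, 3, 2), "age": 33},
    {"id": 1000090, "name": "Vivek Anand", "gender": "Female",
     "phone": "###-********", "dob": date(1990, 12, 1), "age": 17},
]
as_of = date(2008, 11, 27)  # reference date for the age check

bad_gender = out_of_domain(rows, "gender", ALLOWED_GENDER)
dup_ids = conflicting_ids(rows)
bad_phones = [r["phone"] for r in rows if is_placeholder_phone(r["phone"])]
bad_ages = [(r["id"], r["age"], age_on(r["dob"], as_of))
            for r in rows if r["age"] != age_on(r["dob"], as_of)]

print(bad_gender, dup_ids, bad_phones, bad_ages)
```

Each check only flags suspicious records; whether a flagged value is corrected, set to missing or dropped is still a judgment to be made with knowledge of the data.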

Nov 25, 2008

First step in model building - Data Reading.

Commit to memory:
1) Variables have to be read in the appropriate format, namely:
a. Numeric
b. Character
c. Date
d. Currency (Dollar) or Custom (Comma)
e. Length:
i. Appropriate width and decimals for numerics
ii. Appropriate width for characters
2) Appropriate order and labelling for ‘Ordered’ categorical variables, since the order is important and carries value.
a. Example 1: Strongly agree, somewhat agree, neither agree nor disagree.
b. Example 2: Ratings such as 0, 1, 2, 3, with 0 as worst and 3 as very good.
c. Example 3: Variables on which comparisons such as greater than or less than are meaningful.
3) Appropriate labelling (description) for ‘Nominal’ categorical variables when they are coded as numbers.
a. Example 1: If Gender is given as 1, label whether it means ‘Male’ or ‘Female’.
b. Example 2: Similarly for Brand, Ethnicity etc.
4) Appropriate labelling (scale description) for continuous variables.
a. Example 1: Age of a product/service: whether in weeks, months, quarters, half-years etc.
b. Example 2: Quantity of a product: whether in units, weight or volume (Pounds, Kilograms, Litres etc.).
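
As a rough illustration of the points above, here is a small Python sketch that reads one raw record (all fields arriving as text) into appropriate types. The field names, the date layout, the gender coding and the ordering of the agreement levels are assumptions for the example; in practice they come from the codebook of the data set.

```python
from datetime import datetime
from decimal import Decimal

# Assumed coding schemes (hypothetical); take the real ones from the codebook.
GENDER_LABELS = {0: "Female", 1: "Male"}
AGREEMENT_ORDER = [            # lowest to highest, so the list index is the rank
    "Neither agree nor disagree",
    "Somewhat agree",
    "Strongly agree",
]

def parse_currency(text):
    """Turn a string like '$1,234.50' into an exact decimal number."""
    return Decimal(text.replace("$", "").replace(",", ""))

def parse_row(raw):
    """Read every field in its appropriate format instead of leaving it as text."""
    return {
        "units": int(raw["units"]),                                    # numeric
        "price": parse_currency(raw["price"]),                         # currency
        "purchased": datetime.strptime(raw["purchased"], "%d-%m-%Y").date(),  # date
        "gender": GENDER_LABELS[int(raw["gender"])],                   # nominal label
        "agreement_rank": AGREEMENT_ORDER.index(raw["agreement"]),     # ordered level
    }

raw = {"units": "12", "price": "$1,234.50", "purchased": "10-10-2008",
       "gender": "1", "agreement": "Somewhat agree"}
row = parse_row(raw)
print(row)
```

Storing the ordered variable as a rank keeps comparisons such as greater than or less than meaningful, while the nominal variable only receives a descriptive label.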

Oct 19, 2008

Good Econometrics

Hi,

A good econometrician tries to understand the given business problem and offers the best possible, or at least a workable, solution to it.

Want to be a good econometrician? Understand the given business problem correctly first. Do not worry about which statistical tool to apply at the beginning.

Have a nice day.

Jul 30, 2008

When you have different series with different measurements

Hi,

One of the common mistakes a newcomer to econometrics makes is using series (variables) with different units of measurement together in an analysis such as regression.

This is the wrong approach, since you cannot directly compare, for instance, GDP as a function of CPI, interest rates, etc. GDP is available at current and constant prices in millions or billions of dollars, whereas CPI is an index and the interest rate is a percentage, usually below two digits.

Some people say there is nothing wrong with estimating GDP as a function of CPI and interest rates. True, but how do you calculate your elasticities? Comparing millions with an index is cumbersome.

Hence, convert everything into log terms; then you can read the elasticities directly from the coefficients.

See the next post for more on how to convert into log terms, and on the advantages and disadvantages.
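
To see why logs help, here is a small Python sketch with made-up numbers (not real GDP or CPI data): an index-like regressor and an outcome measured in millions, constructed so that the true elasticity is 0.8. The helper ols_slope_intercept is written just for this example. The slope of the regression in logs recovers the elasticity, and it does not change when the outcome is re-expressed in billions.

```python
import math

def ols_slope_intercept(x, y):
    """Simple one-regressor least squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up data: outcome = 2500 * (index / 100) ** 0.8, so the elasticity is 0.8.
index = [100.0, 105.0, 110.0, 120.0, 135.0, 150.0]
outcome_millions = [2500.0 * (v / 100.0) ** 0.8 for v in index]
outcome_billions = [v / 1000.0 for v in outcome_millions]

log_index = [math.log(v) for v in index]
slope_millions, _ = ols_slope_intercept(
    log_index, [math.log(v) for v in outcome_millions])
slope_billions, _ = ols_slope_intercept(
    log_index, [math.log(v) for v in outcome_billions])

# Both slopes estimate the same elasticity; only the intercept absorbs the units.
print(round(slope_millions, 6), round(slope_billions, 6))
```

Note that logs are defined only for strictly positive values, so a series that can be zero or negative cannot be converted this way without further thought.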

Regards,

Jul 13, 2008

How does this blog help you?

This blog posts, daily or weekly, easy ways of learning and doing practical econometrics. Some posts will cover theory and some will show how to do it using a statistical package or Excel.

The statistical package referred to here is R, which is free software (for more see www.r-project.org); get your free copy today. As for Excel, everybody knows about it and has it.

Watch for new posts daily/weekly.

Why Econometrics?

Hi,

People who are new to this field might wonder what it is and why we need it.

Let me put things simply. Econometrics is the application of mathematics and statistics to the empirical estimation of economic scenarios and existing economic theories. A simple and very common question is what happens if the price of crude oil shoots up to $200. Is the effect only on gasoline consumption, or on the whole economy? Is this thought-provoking?

An economist, with the help of econometrics, offers a solution given the economic conditions of the State.

Now, coming to why econometrics?

General economics is often blamed for not being scientific and is considered one of the social sciences. Econometrics, having become a major branch in the recent past (or, let's say, a better economics), is scientific (since empirical) and practical.

Hence, econometrics is nowadays considered by many a scientific approach to obtaining statistical evidence for descriptions of economic scenarios. Statesmen widely look to econometricians for prescriptions on economic policy problems.