********************************************************************************
***************************** APPLIED ECONOMICS - LEC.1 ************************
********************************************************************************

/*This do-file is based on the official Stata sample session, which can be
found by clicking: Help > PDF Documentation
The specific chapter is 'Introducing Stata—sample session' */

***********************************
******* 1.HELP AND COMMENTS *******
***********************************

// The first thing to know about Stata is how to ask for help. Simply typing..
help
/* in the command window will open the help tab; the search box is in the top
right corner. We can also type the word help followed by a command we wish to
know about, and Stata will automatically open the corresponding page.
For instance, if we type:*/
help comments
// we will find out that there are several ways to type in comments.

***********************************
************ 2.DATASETS ***********
***********************************

/* To use a dataset we can use the menus: click on File > Open and choose the
file among the ones saved on the pc.
Alternatively, we can directly type in the command window
use "____"
where ____ stands for the specific file path, e.g. C:\Desktop\module_c.dta
Today, however, we will use one of the sample datasets already installed in
Stata. We can access it either by clicking on
File > Example datasets > Example datasets installed with Stata > use
or by using the command 'sysuse', followed by the name of the dataset:*/
sysuse auto.dta, clear

/*Let's save this dataset in a specific folder.
To save a dataset we can use the menus: click on File > Save As and decide
where to save our dataset.dta on our pc.
Alternatively, we can first set up a directory where we want Stata to save
or take our data from.
We do it in the following way: type in the command window
cd "____"
where ____ stands for the chosen path (e.g. C:\Desktop), then type */
save dataset_name, replace
/* Once we have closed Stata, we can access our saved data by telling Stata
where to find them using the 'change directory' (cd) command again and typing*/
use dataset_name, clear

/*We can also create and save a log file, a file that keeps a record of
whatever we do while the current Stata session is open.
First, however, we must check that no other log file is in use*/
capture log close
/*Let us call this first log file "first_class". We can create the file either
by clicking on File > Log > Begin or by typing:*/
log using first_class.txt, replace text
/*Remember to close the log file at the end of the session with the command:
capture log close*/

/*We are now using the dataset, and can have a look at it by either pushing
the 'Data Editor' button, or by typing:*/
browse
/*Notice that:
1_ 'make' is a text variable; 'mpg' is a numerical variable; 'foreign' is a
   categorical variable, with value labels (0=Domestic, 1=Foreign)
2_ '.' stands for missing values*/

//We can describe the data by simply typing
describe
//in the command window.

//We can have a summary of the data by typing
sum
/*Notice:
1_ there are no observations if the variable is non-numeric ('make')
2_ there are fewer observations if the variable has missing values ('rep78')*/

//We can have more information on a variable by using
codebook foreign
codebook rep78

/*We can obtain a list of observations for a specific variable using the
'list' command. For example, to list the brand and model of every observation:*/
list make
/*Further, we can use conditional commands. For instance, if we only want to
list the brand and model of cars that have a missing value in the Repair
Record: */
list make if missing(rep78)
//Which, for this variable, is equivalent to:
list make if rep78==.
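
/*A short aside on conditions and missing values (a sketch added to the sample
session, not part of it). Stata treats a numeric missing value '.' as larger
than any number, so a condition such as rep78 > 4 is also true for cars whose
Repair Record is missing. The local macro names n_naive and n_clean are made
up for this illustration; run these lines together, because local macros do
not survive between separately executed selections.*/
count if rep78 > 4
local n_naive = r(N)
count if rep78 > 4 & !missing(rep78)
local n_clean = r(N)
count if missing(rep78)
//The two counts above differ exactly by the number of missing values:
assert `n_naive' - `n_clean' == r(N)
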
***********************************
******* 3.DESCRIPTIVE STATS *******
***********************************

//As we have seen, descriptive statistics can be obtained with the sum command
sum price
//But the command offers further options (sum is short for summarize)
help summarize
/*It says that 'detail produces additional statistics, including skewness,
kurtosis, the four smallest and four largest values, and various
percentiles'. Options are specified after a comma; let's see:*/
sum price, detail
/* As you can see, sum with the detail option provides all the relevant
descriptive statistics: mean, standard deviation, variance, skewness and
kurtosis.
Notice: skewness > 0 (long right tail) and kurtosis > 3 (fat tails) */

//We can also graph it to get a better idea of the price distribution.
//Histogram:
hist price /*density on the y axis*/
hist price, fraction /*fractions on the y axis; frequencies or percentages are also available*/
hist price, fraction normal /*will superimpose a normal density*/
/*You can see that the distribution is not symmetric (longer to the right)
and has a fat tail: the same information contained in skewness and kurtosis*/

/*We can also split the observations, e.g. foreign/domestic, and obtain the
histogram for the two separate distributions*/
twoway (hist price), by(foreign)
twoway (hist price), by(foreign, total)
//Or a boxplot
graph box price, by(foreign)
by foreign, sort : sum price, detail

/*If we want to know more about the linear relation between two variables, we
can ask Stata to compute covariance and correlation coefficients*/
corr price mpg, covariance //To get the variance/covariance matrix
corr price mpg //To get correlation coefficients
//Notice that correlation is scale free, whereas covariance is not..
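/*Why: the correlation coefficient is the covariance divided by the product
of the two standard deviations, corr(x,y) = cov(x,y) / (sd(x)*sd(y)).
If y is multiplied by a constant c > 0, both cov(x,y) and sd(y) are
multiplied by c, so c cancels out.
A minimal check of the formula, assuming the stored results documented for
correlate (r(cov_12), r(Var_1), r(Var_2) with the covariance option, r(rho)
without it); rho_by_hand is a name made up for this sketch. Run the lines
together, since local macros do not survive between separate selections.*/
quietly corr price mpg, covariance
local rho_by_hand = r(cov_12) / sqrt(r(Var_1)*r(Var_2))
quietly corr price mpg
display "by hand: " `rho_by_hand' "   reported by corr: " r(rho)
//The same idea with an actual change of units: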
gen kpg = mpg*1.609344 // 1 mile = 1.609344 km
//This is just a linear transformation of a variable:
corr mpg kpg
corr price kpg, covariance //while covariances changed
corr price kpg //correlation coefficients did not

/*Binary or categorical variables can also be summarized in a table (you can
do it with continuous variables too but it doesn't make much sense) */
tab foreign
tab rep78
//Are foreign cars better kept than domestic ones?
tab rep78 foreign, column //Seems so.. are we sure?
mean rep78
mean rep78 if foreign==1
mean rep78 if foreign==0
/*Conditional means.. If you do not know all the values the variable foreign
can have, you can also type:*/
by foreign, sort : sum rep78
//or:
tab foreign, sum(rep78)

/*We can see conditional means are different, but are they statistically
different?*/
ttest rep78, by(foreign)
/*diff = mean(Domestic) - mean(Foreign)
Ho: diff = 0 is the null hypothesis
Three different tests:
Ha: diff < 0, mean repair record of foreign cars > mean repair record of
    domestic cars
Ha: diff != 0, is a two-sided alternative
Ha: diff > 0, mean repair record of foreign cars < mean repair record of
    domestic cars
Pr(T < t), Pr(|T| > |t|) and Pr(T > t) are the p-values of the three tests:
the probability of getting a test statistic at least as 'extreme' as the one
we get, if the null is true. A small p-value is evidence against the null
(i.e. we can reject the hypothesis that foreign and domestic cars have the
same mean repair record).
Notice however, we did not check the assumptions required for the test
(e.g. there are very few observations)*/

//Remember to close the log file now:
capture log close
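
/*A possible follow-up on the assumptions of the test (a sketch, not part of
the sample session; run it before the log is closed if you want it recorded).
By default ttest assumes equal variances in the two groups. sdtest compares
the two variances, and the unequal option of ttest drops that assumption.
Both still rely on approximate normality, which is doubtful for a 1-5 rating
with this few observations.*/
sdtest rep78, by(foreign)
ttest rep78, by(foreign) unequal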