
Chapter 8: Validating and Cleaning Data
- Data errors occur when data values are not appropriate for the SAS statements that are specified in a program. SAS detects data errors during program execution.
- The `freq` procedure can show whether any genders are not `F` or `M` and whether any countries are not `AU` or `US`.
- The `means` procedure can show whether any salaries fall outside the range 24000 to 500000.
- The `univariate` procedure can show whether any salaries fall outside the range 24000 to 500000.

```sas
data work.nonsales;
   /* Salary needs its own numeric length; without the 8 it would
      share Job_Title's $ 25 and become a character variable */
   length Employee_ID 8 First $ 12
          Last $ 18 Gender $ 1
          Salary 8 Job_Title $ 25
          Country $ 2 Birth_Date
          Hire_Date 8;
   infile 'nonsales.csv' dlm=',';
   input Employee_ID First $ Last $
         Gender $ Salary Job_Title $
         Country $ Birth_Date :date9.
         Hire_Date :date9.;
   format Birth_Date Hire_Date ddmmyy10.;
run;

proc print data=work.nonsales;
   var Employee_ID Job_Title Birth_Date Hire_Date;
   where Job_Title = ' ' or Birth_Date > Hire_Date;
run;

proc freq data=work.nonsales;
   tables Gender Country;
run;

proc means data=work.nonsales n nmiss min max;
   var Salary;
run;

proc univariate data=work.nonsales;
   var Salary;
run;
```
- During the processing of every `data` step, SAS automatically creates the following temporary variables:
  - `_N_`, which counts the number of times the `data` step begins to iterate.
  - `_ERROR_`, which signals the occurrence of an error caused by the data during execution. 0 indicates that no error exists; 1 indicates that one or more errors occurred.
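To see the two automatic variables at work, here is a minimal sketch that reads a small hypothetical inline data set (the name `work.errdemo` and its values are illustrative, not course data). The third record has a non-numeric salary, so SAS writes an invalid-data note to the log, sets `Salary` to missing, and sets `_ERROR_` to 1 for that iteration only. Because automatic variables are never written to the output data set, the sketch copies them into ordinary variables.

```sas
/* Hypothetical inline data: the third Salary value is not numeric */
data work.errdemo;
   input Employee_ID Salary;
   /* _N_ and _ERROR_ are not written out, so copy them */
   Iteration=_N_;
   ErrorFlag=_ERROR_;
   datalines;
120101 30000
120102 45000
120103 abc
120104 28000
;
run;

proc print data=work.errdemo;
run;
```

`_ERROR_` is reset to 0 at the start of each iteration, which is why only the third observation shows `ErrorFlag=1`.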
- Which statement best describes the invalid data? Answer: b.
  - a. The data in the raw data file is bad.
  - b. The programmer incorrectly read the data.
- To write a SAS date constant, enclose a date in quotation marks in the form `ddmmmyy` or `ddmmmyyyy` (two-digit day, three-letter month abbreviation, year) and immediately follow the final quotation mark with the letter `d`. Example: January 1, 1974 is `'01JAN1974'd`.

```sas
proc print data=orion.nonsales;
   var Employee_ID Birth_Date Hire_Date;
   where Hire_Date < '01JAN1974'd;
run;
```
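A SAS date constant is converted to a SAS date value, the number of days since January 1, 1960, so a `where` comparison like the one above is an ordinary numeric comparison. The sketch below assumes a small hypothetical stand-in for `orion.nonsales` (`work.hires`) with made-up hire dates.

```sas
/* Hypothetical stand-in for orion.nonsales */
data work.hires;
   input Employee_ID Hire_Date :date9.;
   format Hire_Date date9.;
   datalines;
120101 15MAR1972
120102 01JAN1974
120103 20JUN1980
;
run;

data work.early;
   set work.hires;
   where Hire_Date < '01JAN1974'd;   /* keeps only 120101 */
   DayNumber=Hire_Date;              /* unformatted: days since 01JAN1960 */
   Cutoff='01JAN1974'd;              /* the constant itself is 5114 */
run;
```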
- The `freq` procedure produces one-way to n-way frequency tables.
  - The `tables` statement specifies the frequency tables to produce. Without it, `proc freq` produces a one-way frequency table for every variable in the data set.
  - The `nlevels` option displays a table that provides the number of distinct values for each variable named in the `tables` statement.

```sas
proc freq data=orion.nonsales nlevels;
   tables Gender Country Employee_ID;
run;
```
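The `nlevels` table is a quick way to spot suspicious values. In the minimal sketch below (hypothetical inline data, not the course file), the invalid gender `G` and the lowercase country `us` each count as an extra distinct value, so both variables report three levels where two are expected.

```sas
/* Hypothetical inline data with one bad Gender and one lowercase Country */
data work.people;
   input Employee_ID Gender $ Country $;
   datalines;
120101 M AU
120102 F US
120103 G us
120104 F AU
;
run;

proc freq data=work.people nlevels;
   tables Gender Country;
run;
```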
- The `means` procedure produces summary reports that display descriptive statistics.
  - The `var` statement specifies the analysis variables and their order in the results.
  - By default, the `means` procedure creates a report with `N`, `mean`, `stddev`, `min`, and `max`.

```sas
proc means data=orion.nonsales n nmiss min max;
   var Salary;
run;
```

- The `univariate` procedure produces summary reports that display descriptive statistics, including an Extreme Observations table that lists the observations with the five lowest and five highest values.
  - The `var` statement specifies the analysis variables and their order in the results.
  - Without the `var` statement, SAS analyzes all numeric variables.

```sas
proc univariate data=orion.nonsales;
   var Salary;
run;
```
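`means` and `univariate` show that out-of-range salaries exist; to list the offending observations themselves, a `where` statement can test the same 24000 to 500000 range. This is a minimal sketch on hypothetical inline salaries (`work.salaries` and `work.bad_salary` are illustrative names). Note that a missing `Salary` is also selected, because SAS treats a missing numeric value as smaller than any number.

```sas
/* Hypothetical inline salaries; 1000 and 900000 are out of range */
data work.salaries;
   input Employee_ID Salary;
   datalines;
120101 30000
120102 1000
120103 900000
120104 .
120105 24000
;
run;

/* Summary view: NMISS, MIN and MAX reveal that problems exist */
proc means data=work.salaries n nmiss min max;
   var Salary;
run;

/* Detail view: keep each observation outside the valid range */
data work.bad_salary;
   set work.salaries;
   where Salary not between 24000 and 500000;
run;
```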
- Interactively cleaning data: the Viewtable window enables you to browse, edit, or create SAS data sets interactively.
- Programmatically cleaning data: the `data` step can be used to programmatically clean the invalid data.
- The assignment statement evaluates an expression and assigns the resulting value to a variable: `variable = expression;`

```sas
Salary = 26960;
Hire_Date = '21JAN1995'd;
Country = upcase(Country);
```
- The `if-then-else` statement executes a SAS statement for observations that meet specific conditions.

```sas
data work.clean;
   set orion.nonsales;
   Country=upcase(Country);
   if Employee_ID=120106 then Salary=26960;
   else if Employee_ID=120115 then Salary=26500;
   else if Employee_ID=120191 then Salary=24015;
   else if Employee_ID=120107 then Hire_Date='21JAN1995'd;
   else if Employee_ID=120111 then Hire_Date='01NOV1978'd;
   else if Employee_ID=121011 then Hire_Date='01JAN1998'd;
run;
```
- What are the two phases of DATA step processing? Compilation and execution.
- What is a program data vector (PDV)? A logical area in memory where SAS holds the current observation.
- What is an instruction that SAS uses to read data values into a variable? An informat.
- When would you use a colon (`:`) format modifier? With nonstandard raw data that requires list input and an informat.
Chapter 9: Manipulating Data
- If an operand is missing for an arithmetic operator, the result is missing. Example: if `var1 = .` and `var2 = 10`, then `num = var1 + var2 / 2` gives `num` a missing value (`.`).
- `sum`: returns the sum of the nonmissing arguments (missing arguments are ignored).
- `year`, `qtr`, `month`, `day`, `weekday`: extract pieces from a SAS date.
- `today()`: returns the current date as a SAS date value.
- `mdy(month, day, year)`: returns a SAS date value.

```sas
AnnivBonus=mdy(month(Hire_Date),15,2008);
```
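A minimal sketch of the rules above, using hypothetical values in a one-row data set (`work.funcdemo` is an illustrative name): the `+` operator propagates the missing value while `sum` ignores it, and the date functions pull pieces out of a date constant.

```sas
data work.funcdemo;
   var1=.;
   var2=10;
   num=var1 + var2 / 2;               /* missing operand: num is missing */
   total=sum(var1, var2 / 2);         /* SUM ignores the missing value: 5 */

   Hire_Date='21JAN1995'd;
   HireYear=year(Hire_Date);          /* 1995 */
   HireQtr=qtr(Hire_Date);            /* 1 */
   HireMonth=month(Hire_Date);        /* 1 */
   HireDay=day(Hire_Date);            /* 21 */
   HireWeekday=weekday(Hire_Date);    /* 7 = Saturday (1 = Sunday) */
   AnnivBonus=mdy(month(Hire_Date),15,2008);   /* 15JAN2008 */
   RunDate=today();                   /* changes every day */
   format Hire_Date AnnivBonus RunDate date9.;
run;
```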
- Given the following code, are the correct results produced when the `drop` statement is placed after the `set` statement?

```sas
data work.comp;
   set orion.sales;
   drop Gender Salary Job_Title Country Birth_Date Hire_Date;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
   BonusMonth=month(Hire_Date);
run;
```

- Yes. The `drop` statement specifies the names of the variables to omit from the output data set, so `Salary` and `Hire_Date` are still available for the calculations even though they are not written out.
- The `drop` and `keep` statements select variables after they are brought into the program data vector.
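- Alternatives to the `drop` and `keep` statements are the `drop=` and `keep=` data set options. In the `data` statement they control which variables are written to the output data set; in the `set` statement they control which variables are read into the program data vector.

```sas
data work.comp(drop=Salary Hire_Date);
   set orion.sales(keep=Employee_ID First_Name Last_Name Salary Hire_Date);
   Bonus=500;
   Compensation=sum(Salary,Bonus);
   BonusMonth=month(Hire_Date);
run;
```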
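The sketch below makes the difference between the two placements visible, assuming a hypothetical inline stand-in for `orion.sales` (the names and values are made up). `keep=` on the `set` statement stops `Gender` and `Country` from ever entering the PDV, while `drop=` on the `data` statement still lets `Salary` and `Hire_Date` be used in calculations and only omits them from `work.comp`.

```sas
/* Hypothetical stand-in for orion.sales */
data work.sales;
   input Employee_ID First_Name $ Last_Name $ Gender $
         Salary Country $ Hire_Date :date9.;
   datalines;
120101 Ann Lee F 30000 AU 15MAR2001
120102 Tom Park M 45000 US 01DEC1999
;
run;

data work.comp(drop=Salary Hire_Date);   /* omitted only on output */
   set work.sales(keep=Employee_ID First_Name Last_Name Salary Hire_Date);
   Bonus=500;
   Compensation=sum(Salary,Bonus);       /* Salary is in the PDV */
   BonusMonth=month(Hire_Date);          /* so is Hire_Date */
run;
```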
- Multiple executable statements are allowed in `if-then do` / `else do` ... `end` statements.

```sas
data work.bonus;
   set orion.sales;
   length Freq $ 12;
   if Country='US' then do;
      Bonus=500;
      Freq='Once a Year';
   end;
   else do;
      Bonus=300;
      Freq='Twice a Year';
   end;
run;
```
- `if-then delete`: an alternative to the subsetting `if` statement is the `delete` statement on an `if-then` statement. `if BonusMonth ne 12 then delete;` is equivalent to `if BonusMonth = 12;`
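One way to check the equivalence is to run both forms on the same hypothetical data (`work.months` and its values are illustrative) and compare the results. Both keep only the December rows; a missing `BonusMonth` is excluded by both, because `. ne 12` is true (so the row is deleted) and `. = 12` is false (so the row is not output).

```sas
/* Hypothetical BonusMonth values, including a missing one */
data work.months;
   input Employee_ID BonusMonth;
   datalines;
120101 12
120102 6
120103 .
120104 12
;
run;

/* Form 1: delete the observations you do not want */
data work.dec_delete;
   set work.months;
   if BonusMonth ne 12 then delete;
run;

/* Form 2: the subsetting IF keeps the observations you want */
data work.dec_subset;
   set work.months;
   if BonusMonth = 12;
run;

proc compare base=work.dec_delete compare=work.dec_subset;
run;
```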
Chapter 10: Combining SAS Data Sets



