sas core code Import files merge tables copy dataset convert some variable from string to number drop unuseful info sort data deal with dates in SAS standardize variables create distinct identification(number) export file

most useful SAS code here.

don’t forget to create a new library before starting your SAS program or it will disappear after you close SAS.
Whenever you make some notes, use “* … ;”

Import files

To import CSV file: general version

PROC IMPORT DATAFILE = 'pathxx.csv' OUT=LIB.xxx ;
DBMS=CSV;
REPLACE;
RUN;

If for your dataset, some variables missing at the beginning, then use guessingrows:

proc import file='pathpatents.csv' 
guessingrows=3000000;  *approximate number;
out=lib.name dbms=csv;
run;

merge tables

Alt text

inner join

proc sql; 
create table lib.name as 
select a.xxx, a.xxx as xxx, b.* from lib1.name as a, lib2.name as b where      
input(a.PatentNo,7.)=b.patentno;
quit;

Above case show when variable Patentno is a string while b.patentno is a number, convert a.Patentno from string to number. (the length of patentno is 7)

left join

proc sql; 
    create table lib.name as 
select a.*,  b.xxx, b.xx from lib1.xxx as a
left join lib2.xxx as b 
on a.xxx=b.xxx;
quit;

notes

to merge table, proc sql is more flexible than merge

copy dataset

Data lib.newname;
set lib.name;
run;

convert some variable from string to number

data lib.newname;
set lib.newname(rename=(var=var1));
var = put(var1, 7.);
drop var1;
run;

drop unuseful info

drop variable

Data lib.newname(drop=xx xx);
set lib.name;
run;

or:

data lib.newname;
set lib.name;
drop xxx;
run;

keep variable you want

Data lib.name(keep=xxx);
set lib.name;
run;

drop duplicates

proc sort data= lib.name nodupkey
out= lib.newname;
by xxx(variable);
run;

drop missing data

data lib.name;
set lib.name;
if cmiss(xxx) then delete;
run;

Data lib.name;
Set lib.name;
if xxx=" " then delete;
run;

sort data

PROC SORT data=linkaf10.assign;
    BY PatentNo;
RUN;

deal with dates in SAS

keep obs after 2010

Data lib.name;
set lib.name;
year= year(xxx);
run;

Data lib.name;
set lib.name;
if year > 2010 then delete;
run;

standardize variables

change to uppercase

data lib.name;
set lib.name;
newvar = upcase(var);
run;

delete all the punctuation characters

data lib.name;
set lib.name;
var1=compress(var,    ,'pd') 
run;
data lib.name;
set lib.name;
rename var1 var;
run;

create distinct identification(number)

data lib.name;
set lib.name;
retain var1 2000; *start from 2000;
var+1;
run;

export file

proc export data=lib.xxx
outfile= "pathxxx.csv" 
dbms=csv replace;
run;