1
Paper BI02-2016
Macro method to use Google Maps™ and SAS® to find the shortest driving
and straight line distances between 2 addresses in the United States
Laurie Bishop, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio
ABSTRACT
Google Maps
is a very useful tool in finding driving distances between addresses and defining a
geographic representation of an address which can be used to find straight line distances between two
addresses. Often distances between addresses and geocoding for locations can be useful in research.
Using macro and data step SAS
®
code, the shortest driving distance between two addresses can be found
by searching the html code resulting from a Google Maps
driving directions search between two addresses
in the United States. In addition, a search of the html code resulting from a singular address Google Maps™
search, will enable one to define the latitude and longitude for a location. This will allow use of the SAS
®
GEODIST function to find the straight line distance between locations. Partial addresses are taken into
consideration with this method, which was developed using SAS
®
v9.3 for Windows
®
.
INTRODUCTION
The methods introduced will allow a SAS
®
Programmer, with basic SAS
®
programming skills, to use
Google
and SAS
®
to find all available driving distances between two US addresses specified in a data set
of address pairs, determine which is the shortest, and retain it. These methods will also provide the ability
to isolate the latitude and longitude of any address search in Google Maps™, and use these values to find
the straight line distance between two US addresses specified in a data set of address pairs.
SHORTEST DRIVING DISTANCE
SAS
®
can use Google Maps™ to find route dependent distances between addresses. Google Maps™
uses specific formats for the address that its search engine will accept, dependent on the parts of the
address present. If the address is partially defined, the format of the address entered into Google Maps™
must be properly defined.
FORMAT ADDRESSES FOR GOOGLE MAPS™
The SAS
®
data set AddressX (Display 1) contains starting address details defined in four separate
variables, the starting address defined with Address (street number with street address), City1 (US city
name), State1 (US State abbreviation), and Zip1 (US 5 digit zip code), and the ending address defined with
Hospital_Address (street number with street address), HCity (US city name), HState (US State
abbreviation), and HZip (US zip code). The following data step formats the address for a Google Maps™
search depending on the parts of the address that are defined, taking into consideration missing variables.
After running the code below against the AddressX data set, the variables of the addresses will be
concatenated with proper commas and plus (+) signs as one address where the starting address will be
Address1 and ending address will be Address2 in data set Address (Display 2). The code also counts the
number of observations in the AddressX for later use in the macros:
2
Display 1. AddressX SAS
®
data set
/*Adjust format of address to be Google Maps™ friendly considering missing
variables*/
data Address;
retain subjid Address1 Address2;
set AddressX nobs=obsnum;
length Address1 Address2 $200.;
format Address1 Address2 $200.;
informat Address1 Address2 $200.;
*Format starting address;
if Address^='' and City1^='' and State1^='' and Zip1^=. then
Address1=tranwrd(left(trim(Address)),"","+")||"+"||
tranwrd(left(trim(City1))," ","+")||","||left(trim(State1))||"+"||
left(strip(put(Zip1,best12.)));
else if Address^='' and State1^='' and Zip1^=. then
Address1=tranwrd(left(trim(Address))," ","+")||"+"||
left(trim(State1))||"+"||left(strip(put(Zip1,best12.)));
else if Address^='' and City1^='' and Zip1^=. then
Address1=tranwrd(left(trim(Address))," ","+")||"+"||
tranwrd(left(trim(City1))," ","+")||"+"||
left(strip(put(Zip1,best12.)));
else if Address^='' and City1^='' and State1^='' then
Address1=tranwrd(left(trim(Address))," ","+")||"+"||
tranwrd(left(trim(City1))," ","+")||","||left(trim(State1));
else if City1^='' and State1^='' and Zip1^=. then
Address1=tranwrd(left(trim(City1))," ","+")||","||left(trim(State1))||
"+"||left(strip(put(Zip1,best12.)));
else if Address^='' and City1^='' then Address1=tranwrd(left(trim(Address)),
" ","+")||"+"||tranwrd(left(trim(City1))," ","+");
else if Address^='' and State1^='' then
Address1=tranwrd(left(trim(Address))," ","+")||"+"||left(trim(State1));
else if Address^='' and Zip1^=. then Address1=tranwrd(left(trim(Address)),
" ","+")||"+"||left(strip(put(Zip1,best12.)));
else if City1^='' and Zip1^=. then Address1=tranwrd(left(trim(City1)),
" ","+")||"+"||left(strip(put(Zip1,best12.)));
3
else if City1^='' and State1^='' then Address1=tranwrd(left(trim(City1)),
" ","+")||","||left(trim(State1));
else if State1^='' and Zip1^=. then Address1=left(trim(State1))||"+"||
left(strip(put(Zip1,best12.)));
else if Zip1^=. then Address1=left(strip(put(Zip1,best12.)));
else if Address^='' then Address1=tranwrd(left(trim(Address))," ","+");
else if City1^='' then Address1=tranwrd(left(trim(City1))," ","+");
else if State1^='' then Address1=left(trim(State1));
*Format ending address;
if Hospital_Address^='' and HCity^='' and HState^='' and HZip^=. then
Address2=tranwrd(left(trim(Hospital_Address))," ","+")||"+"||
tranwrd(left(trim(HCity))," ","+")||","||left(trim(HState))||"+"||
left(strip(put(HZip,best12.)));
else if Hospital_Address^='' and HState^='' and HZip^=. then
Address2=tranwrd(left(trim(Hospital_Address))," ","+")||
"+"||left(trim(HState))||"+"||left(strip(put(HZip,best12.)));
else if Hospital_Address^='' and HCity^='' and HZip^=. then
Address2=tranwrd(left(trim(Hospital_Address))," ","+")||
",+"||tranwrd(left(trim(HCity))," ","+")||"+"||
left(strip(put(HZip,best12.)));
else if Hospital_Address^='' and HCity^='' and HState^='' then
Address2=tranwrd(left(trim(Hospital_Address))," ","+")||
",+"||tranwrd(left(trim(HCity))," ","+")||","||left(trim(HState));
else if HCity^='' and HState^='' and HZip^=. then
Address2=tranwrd(left(trim(HCity))," ","+")||","||left(trim(HState))||
"+"||left(strip(put(HZip,best12.)));
else if Hospital_Address^='' and HCity^='' then
Address2=tranwrd(left(trim(Hospital_Address))," ","+")||",+"||
tranwrd(left(trim(HCity))," ","+");
else if Hospital_Address^='' and HState^='' then
Address2=tranwrd(left(trim(Hospital_Address))," ","+")||"+"||
left(trim(HState));
else if Hospital_Address^='' and HZip^=. then
Address2=tranwrd(left(trim(Hospital_Address))," ","+")||"+"||
left(strip(put(HZip,best12.)));
else if HCity^='' and HZip^=. then Address2=tranwrd(left(trim(HCity)),
" ","+")||"+"||left(strip(put(HZip,best12.)));
else if HCity^='' and HState^='' then Address2=tranwrd(left(trim(HCity)),
" ","+")||","||left(trim(HState));
else if HState^='' and HZip^=. then
Address2=left(trim(HState))||"+"||left(strip(put(HZip,best12.)));
else if HZip^=. then Address2=left(strip(put(HZip,best12.)));
else if Hospital_Address^='' then
Address2=tranwrd(left(trim(Hospital_Address))," ","+");
else if HCity^='' then Address2=tranwrd(left(trim(HCity))," ","+");
else if HState^='' then Address2=left(trim(HState));
*Assign macro variable obsnum=number of observations in the data set;
call symput('obsnum',obsnum);
run;
4
Display 2. Address1 and Address2 variables added to AddressX data set to form Address SAS
®
data set
FIND SHORTEST DRIVING DISTANCE
First, create an empty data set to append all distances (drdist). The ShortDrDist macro loops through each
observation of the Address data set containing the formatted starting and ending dates. The loop begins
with a data step to assign macro variables used throughout the macro. A filename statement for Google
Maps™ url is defined with the starting and ending addresses. The next data step uses an infile statement
to read in the html code from the Google Maps™ page in 1000 character substrings (sitecode). Using the
FIND function, sitecode is searched for the word ‘miles’, which is always proceeded by the driving distance
between the starting and ending addresses in miles, and only these observations are retained. All driving
distances available will be defined as drivdist, whose value is assigned by isolating the numeric distance
using the SCAN function with sitecode. Use SORT procedure to sort driving distance results by distance
(drivdist). Use a data step to retain the smallest of all the distances. Append the observation to final driving
distances data set (drdist). Always be sure to disassociate current assigned url. The resulting data set is
a collection of the shortest driving distances (Display 3):
*Create empty data set to append all distances;
data drdist;
run;
%macro ShortDrDist(j=);
*Create loop to access each observation;
%do i=1 %to &j;
*Assign macro variables needed;
data _null_;
set Address;
if _n_=&i then do;
call symput('addr1',trim(strip(Address1)));
call symput('addr2',trim(strip(Address2)));
call symput('Subjid',Subjid);
5
end;
run;
*Find Road distances;
filename gmaps url
"http://maps.google.com/maps?daddr=&addr1.%nrstr(&saddr)=&addr2";
data drdistX&i (drop=sitecode idrivdist rdstart);
infile gmaps recfm=f lrecl=1000 end=eof;
input sitecode $1000.;
Subjid="&subjid";
AddStart=tranwrd(tranwrd("&addr1","+"," "),",",", ");
AddEnd=tranwrd(tranwrd("&addr2","+"," "),",",", ");
if find(sitecode,'miles');
idrivdist=find(sitecode,'miles');
rdstart=idrivdist-1;
drivdist=input(compress(scan(substr(sitecode,1,rdstart),1,
' "','bc'),','),best12.);
run;
proc sort data=drdistX&i; by drivdist; run;
data drdist&i;
set drdistX&i;
by drivdist;
if _n_=1;
run;
data drdist;
set drdist
drdist&i;
if drivdist^=.;
label AddStart='Starting Address'
AddEnd='Ending Address'
drivdist='Shortest Driving Distance (miles)'
Subjid='Subject Id';
run;
filename gmaps clear;
%end;
%mend;
%ShortDrDist(j=&obsnum);
6
Display 3. drdist SAS
®
data set
STRAIGHT LINE DISTANCE
FIND LATITUDE AND LONGITUDE OF AN ADDRESS (GEOCODE)
First, create empty data sets to append starting address (sldist1) and ending address (sldist2) latitude/
longitude pairs. The StrLineDist macro loops through each observation of the Address data set containing
the formatted starting and ending dates. The loop begins with a data step to assign macro variables used
throughout the macro. A filename statement for Google Maps™ url is defined with a singular address. The
following data step uses an infile statement to read in the html code from the Google Maps™ page in 1000
character substrings (sitecode). Using the FIND function, sitecode is searched for the a unique character
string (",[null,null,) that always proceeds the latitude and longitude corresponding to the address defined
and the observation is retain. The numeric latitude (lat1, lat2) and longitude (long1, long2) are assigned by
using the SCAN function with sitecode to isolate the numeric value of each. Observations with
latitude/longitude pairs are appended to sldist1 for starting addresses and sldist2 for ending addresses
(Display 4). Always be sure to disassociate current assigned url:
*Create empty data sets to append latitudes and longitudes;
data sldist1;
run;
data sldist2;
run;
%macro StrLineDist(j=);
*Create loop to access each observation;
%do i=1 %to &j;
data _null_;
set Address;
if _n_=&i then do;
call symput('addr1',trim(strip(Address1)));
call symput('addr2',trim(strip(Address2)));
call symput('Subjid',Subjid);
end;
run;
/*Find Latitude and longitude in Google Maps™ page code*/
filename gmaps url "http://maps.google.com/maps?daddr=&addr1";
7
*For starting addresses;
data sldist1&i (drop=sitecode markst);
infile gmaps recfm=f lrecl=1000 end=eof;
input sitecode $1000.;
Subjid="&subjid";
AddStart=tranwrd(tranwrd("&addr1","+"," "),",",", ");
if find(sitecode,'",[null,null,') then do;
markst=find(sitecode,'",[null,null,');
lat1=input(scan(substr(sitecode,markst),4,',]'),best12.);
long1=input(scan(substr(sitecode,markst),5,',]'),best12.);
output;
end;
run;
data sldist1;
set sldist1
sldist1&i;
if lat1^=.;
label AddStart='Starting Address'
lat1='Starting Address latitude'
long1='Starting Address longitude'
Subjid='Subject Id';
run;
proc sort data=sldist1; by Subjid; run;
filename gmaps clear;
*For ending addresses;
filename gmaps url "http://maps.google.com/maps?daddr=&addr2";
data sldist2&i (drop=sitecode markst);
infile gmaps recfm=f lrecl=1000 end=eof;
input sitecode $1000.;
Subjid="&subjid";
AddEnd=tranwrd(tranwrd("&addr2","+"," "),",",", ");
if find(sitecode,'",[null,null,') then do;
markst=find(sitecode,'",[null,null,');
lat2=input(scan(substr(sitecode,markst),4,',]'),best12.);
long2=input(scan(substr(sitecode,markst),5,',]'),best12.);
output;
end;
run;
data sldist2;
set sldist2
sldist2&i;
if lat2^=.;
label AddEnd='Ending Address'
lat2='Ending Address latitude'
long2='Ending Address longitude'
Subjid='Subject Id';
run;
proc sort data=sldist2; by Subjid; run;
filename gmaps clear;
%end;
%mend;
%StrLineDist(j=&obsnum);
8
Display 4. sldist1 SAS
®
data set
FIND STRAIGHTLINE DISTANCE BETWEEN ADDRESSES
Create the sldist data set by merging sldist1 and sldist2by subjid created by the StrLineDist macro above.
Define sldistg which is a calculation of straight line distance between the two addresses using
latitude/longitude pairs for each observation in the GEODIST function and define sldisth which is the
calculation of straight line distance between two addresses using latitude/longitude pairs for each
observation and Haversine formula (3949.99 cos

1/180 2/180 1/
180). Create dist data set by merging both drdist and sldist to include both shortest driving distance and
straight line distance in one data set:
*Merge starting and ending data sets by subjid;
data sldist;
merge sldist1
sldist2;
by Subjid;
sldistg=geodist(lat1,long1,lat2,long2,'M');
sldisth=3949.99*arcos(sin(lat1*(constant('pi')/180))*
sin(lat2*(constant('pi')/180))+cos(lat1*(constant('pi')/180))*
cos(lat2*(constant('pi')/180))*cos(long2*(constant('pi')/180)-
long1*(constant('pi')/180)));
label sldistg='GEODIST function distance (miles)'
sldisth='Haversine formula distance (miles)';
run;
*Merge both distance variables into one data set;
data dist;
retain subjid AddStart lat1 long1 AddEnd lat2 long2 drivdist sldistg
9
sldisth;
merge sldist (drop=AddEnd AddStart)
drdist;
by subjid;
run;
proc sort data=dist; by subjid; run;
Display 5. Straight Line Distances SAS
®
data set
CONCLUSION
The macros presented allows the user to find the shortest driving distance between two US addresses, but
more importantly it allows a user to isolate the latitude and longitude of a particular address using SAS
®
and Google Maps™. A couple of things to note: Running this code against 24 observations takes longer
than usual. Also, if a facility is large enough with multiple buildings/businesses at one address, it is possible
that more than one latitude/longitude pair may be found (see subjid 008 and 021 above. The address for
subjid 008 refers to the AZ state Hospital and State Prison and for subjid 021, the address is shared by
Riley Hospital for Children and a doctor’s office. In both of these cases the locations are described by one
address but each facility has its own latitude and longitude.)
REFERENCES
Bekkerman, Anton, Ph.D. 2010. “Going the Distance: Google Maps™ Capabilities in a Friendly SAS
Environment.” http://wuss.org/Proceedings13/100_Paper.pdf
Zdeb, Mike. 2010. “Driving Distances and Drive Times using SAS
®
and Google Maps™.” Coders' Corner.
Seattle, WA: SAS
®
Gobal Forum, http://support.sas.com/resources/papers/proceedings10/050-2010.pdf
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Laurie Bishop
Cincinnati Children’s Hospital Medical Center
10
(513) 803-9001
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of
SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.