Friday, October 30, 2015

IPv6 geolocation in database

IPv6 popularity is grown to the extent that I have about 15% of my visitors to my website and tools download to be using dual stack.  It is time for me to get a handle on these IPv6 using some geolocation to identify the users.

Getting IPv6 lookup on a database is actually not as complicated as it seems.  Due to the enormously large 128-bit number a DBA might fear the large integer index.  When it comes to IPv6 addresses are allocated at a minimum /64 making this problem at least half the size.  You could basically use BINARY(16) which is 16 byte binary to do range matches on. 

Let me just show how you would do this with say MariaDB or MySQL.  First the function that will pull the first 64 bits of the 128-bit IPv6 address and create a binary "string" - making 2601:1c0:c901:5698:2955:f004:2de1:abcd into 260101c0c9015698. Lets get this work started
(Note: ANSI SQL and MariaDB native SQL is in caps compared to user defined variable or functions)
DELIMITER //
CREATE FUNCTION `ipv6_to_64`(ip_v6 VARCHAR(41)) RETURNS text     DETERMINISTIC
BEGIN
    DECLARE i,j,k int;
    DECLARE istring varchar(20);
    DECLARE jstring varchar(41);
    SET istring:="";
    SET jstring:=ip_v6;
    SET i:=0;
    SET k:=0;
    SET j:=0;
    WHILE i < 4 DO
        SET j:=instr(jstring,":");
        IF j < 1 THEN
            SET j:=length(ip_v6)+1;
        END IF;
        SET istring:=CONCAT(istring,lpad(substring(ip_v6,k+1,j-1),4,'0'));
        SET jstring:=substring(ip_v6,j+k+1);
        SET k:=length(ip_v6)-length(jstring);
        set i:=i+1;
    END WHILE;
    RETURN istring;
END //
DELIMITER ;

CREATE TABLE geoip6cc (ipv6_start BINARY(16), ipv6_end BINARY(16), geocode CHAR(2), INDEX range_v6 (ipv6_start,ipv6_end));

Now get the IPv6 CSV database from Maxmind (say) http://geolite.maxmind.com/download/geoip/database/GeoIPv6.csv.gz  gunzip it into your /tmp/ folder

LOAD DATA LOCAL INFILE "/tmp/GeoIPv6.csv" INTO TABLE geoip6cc FIELDS TERMINATED BY ', ' ENCLOSED BY '"'
(@ip_start,@ip_end,@ignore,@ignore,geocode,@ignore)
SET ipv6_start=BINARY(ipv6_to_64(@ip_start)),
ipv6_end=BINARY(ipv6_to_64(@ip_end));

If you want to track 6to4 tunnels, you can insert this row into the SQL database, so those users with unknown GEOIP code can be mapped to country code "64"

--insert into geoip6cc values ('2002000000000000','2002ffffffffffff','64');

mysql> SELECT geocode FROM geoip6cc WHERE ipv6_to_64("2001:200:30::1") BETWEEN  ipv6_start and ipv6_end LIMIT 1;
+---------+
| geocode |
+---------+
| JP      |
+---------+
Doing some visitors lookup in my visitors table I create a ipv6_to_64 version of the IP Address and keep it handy for table joins (mainly for speed)
mysql> SELECT ipv6_to_64(ip),geoip6cc.geocode from visitors inner join geoip6cc  on visitors.ipv6_64 between geoip6cc.ipv6_start and geoip6cc.ipv6_end;
 +------------------+---------+
| ipv6_to_64(ip)   | geocode |
+------------------+---------+
| 20010569bc503a00 | CA      |
| 20010569bc5de200 | CA      |
| 20010569bc5de200 | CA      |
| 20010569bc5de200 | CA      |
| 20010569bc5de200 | CA      |
| 2a02120b2c6c4b20 | CH      |
| 2a02120b2c6c4b20 | CH      |
| 2a02120b2c6c4b20 | CH      |
| 2a02120b2c6c4b20 | CH      |
| 2a02120b2c6c4b20 | CH      |
| 2a02120b2c6c4b20 | CH      |
| 2a02120b2c6c4b20 | CH      |
| 2a02120b2c6c4b20 | CH      |
| 2a02120b2c6c4b20 | CH      |
| 2a02120b2c6c4b20 | CH      |
+------------------+---------+
 If not, you can lookup, note inner joins with function calls are slower in MySQL.  They are much faster in Postgres and Oracle as long as the functions are written well.
mysql> SELECT ipv6_to_64(ip),geoip6cc.geocode FROM visitors INNER JOIN geoip6cc  ON  BINARY(ipv6_to_64(ip)) BETWEEN geoip6cc.ipv6_start AND geoip6cc.ipv6_end;
+------------------+---------+
| ipv6_to_64(ip)   | geocode |
+------------------+---------+
| 20010569bc503a00 | CA      |
| 20010569bc5de200 | CA      |
| 20010569bc5de200 | CA      |
| 20010569bc5de200 | CA      |
| 20010569bc5de200 | CA      |
| 2a02120b2c6c4b20 | CH      |
| 2a02120b2c6c4b20 | CH      |
| 2a02120b2c6c4b20 | CH      |
| 2a02120b2c6c4b20 | CH      |
| 2a02120b2c6c4b20 | CH      |
| 2a02120b2c6c4b20 | CH      |
| 2a02120b2c6c4b20 | CH      |
| 2a02120b2c6c4b20 | CH      |
| 2a02120b2c6c4b20 | CH      |
| 2a02120b2c6c4b20 | CH      |
+------------------+---------+
mysql> SELECT ipv6_to_64(ip),geoip6cc.geocode,count(geoip6cc.geocode) FROM visitors INNER JOIN geoip6cc  on BINARY(ipv6_to_64(ip)) BETWEEN geoip6cc.ipv6_start AND geoip6cc.ipv6_end GROUP BY geoip6cc.geocode;
+------------------+---------+-------------------------+
| ipv6_to_64(ip)   | geocode | count(geoip6cc.geocode) |
+------------------+---------+-------------------------+
| 20010569bc503a00 | CA      |                       5 |
| 2a02120b2c6c4b20 | CH      |                      10 |
+------------------+---------+-------------------------+
Finally just for the curious, top 5 countries IPv6 users coming to my site in the past week:
+---------+----------------+
| geoipcc | count(geoipcc) |
+---------+----------------+
| US      |            674 |
| IE      |            233 |
| DE      |             33 |
| CH      |             32 |
| NL      |             29 |
+---------+----------------+
US probably leads because of Comcast adoption mostly. Catch the IPv6 wave and learn how to monitor your IPv6 user community!





No comments:

Post a Comment