How Base64 encoding works, with a PHP implementation

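Base64 represents binary data using 64 printable characters. Since 2^6 = 64, every 6 bits form one unit that maps to one printable character. Three bytes contain 24 bits, which correspond to four Base64 units, so every 3 bytes are represented by 4 printable characters. Base64 is commonly used to represent, transmit, and store binary data in settings that normally handle text, such as email via MIME or complex data stored in XML. The alphabet consists of the 26 uppercase and 26 lowercase letters, the 10 digits, the plus sign "+" and the slash "/", for 26×2+10+2=64 characters in total; the equals sign "=" is used as a padding suffix.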
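The 3-bytes-to-4-characters rule fixes the encoded length: every started group of 3 input bytes becomes 4 output characters, with "=" filling out the last group. A small sketch (the helper name `base64_encoded_length` is hypothetical) that checks this formula against PHP's built-in `base64_encode()`:

```php
<?php
// Hypothetical helper: encoded length = 4 characters per started 3-byte group
function base64_encoded_length(int $n): int {
    return 4 * intdiv($n + 2, 3);
}

foreach (['', 'b', 'bo', 'boo', 'book'] as $s) {
    // Compare the formula with the length PHP's base64_encode() actually produces
    printf("%-4s -> %-8s (%d chars, formula %d)\n",
        $s, base64_encode($s), strlen(base64_encode($s)), base64_encoded_length(strlen($s)));
}
```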

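To make Base64 safe in URLs and file names, a "web safe" (URL-safe) variant replaces "+" and "/" with "-" and "_" after encoding. This variant is standardized as base64url in RFC 4648, section 5; the "=" padding is often omitted, and some implementations replace it with "*" (or ".") instead.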
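A minimal sketch of the URL-safe variant on top of PHP's `base64_encode()`/`base64_decode()`, assuming the RFC 4648 base64url style (swap "+/" for "-_" and drop the padding). The function names `base64url_encode` and `base64url_decode` are hypothetical, not PHP built-ins:

```php
<?php
// Hypothetical helper: standard Base64, then '+' -> '-', '/' -> '_', padding removed
function base64url_encode(string $data): string {
    return rtrim(strtr(base64_encode($data), '+/', '-_'), '='));
}

// Hypothetical helper: undo the swap, restore '=' to a multiple of 4, decode strictly
function base64url_decode(string $text) {
    $b64 = strtr($text, '-_', '+/');
    $b64 .= str_repeat('=', (4 - strlen($b64) % 4) % 4);
    return base64_decode($b64, true); // false on invalid input
}

echo base64_encode("\xfb\xff"), ' vs ', base64url_encode("\xfb\xff"), "\n";
```

The bytes 0xFB 0xFF were picked because their standard encoding uses both "+" and "/".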

Figure: Base64 URL-safe alphabet

Figure: Base64 index table

Encoding (using the word "book" as an example)

1. Get the ASCII value of each character (byte) and convert it to 8 bits

ascii:98|111|111|107|

bits:01100010011011110110111101101011
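Step 1 can be sketched in PHP with `ord()` and `sprintf('%08b', ...)`, which left-pads the binary form to 8 digits (the function name `to_bits` is hypothetical):

```php
<?php
// Hypothetical helper: concatenate the 8-bit binary form of every byte
function to_bits(string $s): string {
    $bits = '';
    for ($i = 0; $i < strlen($s); $i++) {
        // %08b prints the byte value in binary, zero-padded to 8 digits
        $bits .= sprintf('%08b', ord($s[$i]));
    }
    return $bits;
}

echo to_bits('book'), "\n"; // 01100010011011110110111101101011
```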

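2. Compute the length modulo 3

1) If 1 character (8 bits) remains, append 4 zero bits to the end to make 12 bits, which is divisible by 6; after encoding, append "==".

2) If 2 characters (16 bits) remain, append 2 zero bits to the end to make 18 bits, which is divisible by 6; after encoding, append one "=".

"book" has 4 characters and 4 % 3 = 1, so 4 zero bits are appended: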

011000100110111101101111011010110000
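The padding rule from step 2 as a small lookup, applied to "book" (the helper `padding_for` is hypothetical; it returns the number of zero bits to append and the "=" suffix):

```php
<?php
// Hypothetical helper: zero bits to append and '=' suffix, from length % 3
function padding_for(int $len): array {
    switch ($len % 3) {
        case 1:  return [4, '=='];  // 8 bits + 4 = 12 bits = 2 characters
        case 2:  return [2, '='];   // 16 bits + 2 = 18 bits = 3 characters
        default: return [0, ''];    // complete 3-byte groups need no padding
    }
}

[$zeroBits, $suffix] = padding_for(strlen('book'));
$bits = '01100010011011110110111101101011' . str_repeat('0', $zeroBits);
echo $bits, ' ', $suffix, "\n"; // 011000100110111101101111011010110000 ==
```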

3. Split the bits into groups of 6

011000 100110 111101 101111 011010 110000
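Step 3 maps directly onto `str_split()` with a chunk length of 6:

```php
<?php
// Cut the padded bit string from step 2 into 6-bit groups
$bits = '011000100110111101101111011010110000';
$groups = str_split($bits, 6);
echo implode(' ', $groups), "\n"; // 011000 100110 111101 101111 011010 110000
```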

4. Left-pad each group with 0s to make 8 bits

00011000 00100110 00111101 00101111 00011010 00110000
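Padding a group with leading zeros does not change its value, so step 4 is mainly for display; a quick check with `str_pad()` and `bindec()`:

```php
<?php
// Left-padding to 8 bits leaves the numeric value unchanged
$group  = '011000';
$padded = str_pad($group, 8, '0', STR_PAD_LEFT);  // "00011000"
var_dump(bindec($group) === bindec($padded));     // bool(true)
```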

5. Convert each group to decimal and look it up in the index table

24 => Y 38 => m 61 => 9 47 => v 26 => a 48 => w
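Step 5 as code: the decimal value of each group is an index into the 64-character alphabet, written here as a single string in index-table order:

```php
<?php
// The Base64 alphabet in index order: A-Z (0-25), a-z (26-51), 0-9 (52-61), + (62), / (63)
$alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
$out = '';
foreach (['011000', '100110', '111101', '101111', '011010', '110000'] as $group) {
    $index = bindec($group);                    // 6-bit group -> 0..63
    echo $index, ' => ', $alphabet[$index], "\n";
    $out .= $alphabet[$index];
}
```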

6. Append the "=" suffix for the final encoding: Ym9vaw==
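Putting steps 1-6 together, a compact sketch (the function name `base64_sketch_encode` is hypothetical) whose result can be compared with PHP's built-in `base64_encode()`:

```php
<?php
// Hypothetical helper: steps 1-6 in one function, using bit strings for clarity
function base64_sketch_encode(string $data): string {
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
    $len = strlen($data);
    if ($len === 0) {
        return '';
    }
    // 1. bytes -> 8-bit strings
    $bits = '';
    for ($i = 0; $i < $len; $i++) {
        $bits .= sprintf('%08b', ord($data[$i]));
    }
    // 2. zero bits and '=' suffix from the length modulo 3
    $remain = $len % 3;
    $suffix = '';
    if ($remain === 1) {
        $bits .= '0000';
        $suffix = '==';
    } elseif ($remain === 2) {
        $bits .= '00';
        $suffix = '=';
    }
    // 3-5. 6-bit groups -> indices -> alphabet characters
    $out = '';
    foreach (str_split($bits, 6) as $group) {
        $out .= $alphabet[bindec($group)];
    }
    // 6. append the padding
    return $out . $suffix;
}

echo base64_sketch_encode('book'), "\n"; // Ym9vaw==
```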

With the principle clear, you can study the base64 code below; as an exercise, try writing a base32 encoder yourself.

PHP code:

<?php
$base = array(
    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', //0-9
    'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', //10-19
    'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', //20-29
    'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', //30-39
    'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', //40-49
    'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', //50-59
    '8', '9', '+', '/'//60-63
);
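//shuffle($base); // Shuffling the table gives a "private" Base64 variant (obfuscation, not encryption); it must run before array_flip() so the reverse table matches
$base_reverse = array_flip($base);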

$words = empty($_GET['word']) ? 'I love you!' : $_GET['word'];

$encoded = base64_en($words);
d('', 0);
base64_de($encoded);

function base64_en($words) {
    global $base;

    d('plain:' . $words);

    $len = strlen($words);
    $bits = '';

    d('ascii:', 0);
    for ($i = 0; $i < $len; $i++) {
        $char = $words[$i];
        // Get each character's ASCII value and convert it to 8 bits
        $ascii = ord($char);
        d($ascii . '|', 0);
        $bin = decbin($ascii);
        $bin = str_pad($bin, 8, '0', STR_PAD_LEFT);
        // Concatenate the 8-bit strings of all characters
        $bits .= $bin;
    }
    d('');

    // Length modulo 3
    $remain = $len % 3;
    $suffix = '';
    if ($remain != 0) {
        // 1 character left over: append 4 zero bits to make 12 bits, divisible by 6
        // 2 characters left over: append 2 zero bits to make 18 bits, divisible by 6
        $repeat = ($remain == 1) ? 4 : 2;
        // The '=' characters to append to the final encoding
        $suffix = str_repeat('=', $repeat / 2);
        // Append that many 0s to the end of $bits
        $fill = str_repeat('0', $repeat);
        $bits .= $fill;
    }

    d('bits:', 0);
    d($bits);
    d('chunks:' . chunk_split($bits, 6));

    // Split the bits into 6-bit chunks
    $chunks = str_split($bits, 6);
    foreach ($chunks as $key => $value) {
        // Left-pad each chunk with 0s to 8 bits
        $chunks[$key] = str_pad($value, 8, '0', STR_PAD_LEFT);
    }

    d($chunks);

    $encoded = '';
    foreach ($chunks as $value) {
        // The chunk's decimal value is its position in the base table;
        // look up the character at that position
        $index = bindec($value);
        d("{$index} =====> {$base[$index]}");
        $encoded .= $base[$index];
    }
    $encoded .= $suffix;

    d('encoded:' . $encoded);
    return $encoded;
}

function base64_de($encoded) {
    global $base, $base_reverse;
    // Strip the trailing '=' padding
    $encoded = rtrim($encoded, '=');

    d('encoded:' . $encoded);
    $len = strlen($encoded);
    $bits = '';
    // Look up each character's position in the base table and convert it to 8 bits
    for ($i = 0; $i < $len; $i++) {
        $char = $encoded[$i];
        $index = $base_reverse[$char];
        d("{$char} =====> {$index}");
        $bin = decbin($index);
        $bin = str_pad($bin, 8, '0', STR_PAD_LEFT);
        // Concatenate the 8-bit strings of all characters
        $bits .= $bin;
    }

    // Split the bits into 8-bit chunks
    $chunks = str_split($bits, 8);
    d($chunks);

    foreach ($chunks as $key => $value) {
        // Reduce each chunk back to 6 bits by removing the leading 00
        $value = substr($value, 2);
        $chunks[$key] = $value;
    }

    d('decoded chunks:' . join(' ', $chunks));
    // Concatenate the 6-bit strings of all characters
    $bits = join('', $chunks);
    d('decoded bits:' . $bits);
    // Drop the zero bits the encoder appended (fewer than 8); otherwise
    // they would decode into a spurious trailing NUL byte
    $bits = substr($bits, 0, strlen($bits) - strlen($bits) % 8);

    // Split the bits into 8-bit chunks
    $chunks = str_split($bits, 8);
    $chars = '';
    d('ascii:', 0);
    foreach ($chunks as $value) {
        // Convert each chunk to decimal, i.e. the ASCII code, and get that character
        $ascii = bindec($value);
        d($ascii . '|', 0);
        $char = chr($ascii);
        $chars .= $char;
    }
    d('');
    d('decoded:' . $chars);

    return $chars;
}

// Debug helper: echo strings, var_dump() anything else, then print <br> unless $suffix_type is 0
function d($obj, $suffix_type = 1) {
    if (is_string($obj)) {
        echo $obj;
    } else {
        var_dump($obj);
    }
    echo ($suffix_type == 1) ? '<br>' : '';
}

Output:

plain:I love you!
ascii:73|32|108|111|118|101|32|121|111|117|33|
bits:010010010010000001101100011011110111011001100101001000000111100101101111011101010010000100
chunks:010010 010010 000001 101100 011011 110111 011001 100101 001000 000111 100101 101111 011101 010010 000100 
array(15) { [0]=> string(8) "00010010" [1]=> string(8) "00010010" [2]=> string(8) "00000001" [3]=> string(8) "00101100" [4]=> string(8) "00011011" [5]=> string(8) "00110111" [6]=> string(8) "00011001" [7]=> string(8) "00100101" [8]=> string(8) "00001000" [9]=> string(8) "00000111" [10]=> string(8) "00100101" [11]=> string(8) "00101111" [12]=> string(8) "00011101" [13]=> string(8) "00010010" [14]=> string(8) "00000100" } 
18 =====> S
18 =====> S
1 =====> B
44 =====> s
27 =====> b
55 =====> 3
25 =====> Z
37 =====> l
8 =====> I
7 =====> H
37 =====> l
47 =====> v
29 =====> d
18 =====> S
4 =====> E
encoded:SSBsb3ZlIHlvdSE=
encoded:SSBsb3ZlIHlvdSE
S =====> 18
S =====> 18
B =====> 1
s =====> 44
b =====> 27
3 =====> 55
Z =====> 25
l =====> 37
I =====> 8
H =====> 7
l =====> 37
v =====> 47
d =====> 29
S =====> 18
E =====> 4
array(15) { [0]=> string(8) "00010010" [1]=> string(8) "00010010" [2]=> string(8) "00000001" [3]=> string(8) "00101100" [4]=> string(8) "00011011" [5]=> string(8) "00110111" [6]=> string(8) "00011001" [7]=> string(8) "00100101" [8]=> string(8) "00001000" [9]=> string(8) "00000111" [10]=> string(8) "00100101" [11]=> string(8) "00101111" [12]=> string(8) "00011101" [13]=> string(8) "00010010" [14]=> string(8) "00000100" } 
decoded chunks:010010 010010 000001 101100 011011 110111 011001 100101 001000 000111 100101 101111 011101 010010 000100
decoded bits:010010010010000001101100011011110111011001100101001000000111100101101111011101010010000100
ascii:73|32|108|111|118|101|32|121|111|117|33|
decoded:I love you!
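For the base32 exercise suggested earlier, one possible sketch to check your answer against, assuming the RFC 4648 Base32 alphabet (A-Z, 2-7): each character carries 5 bits, and the output is padded with "=" to a multiple of 8 characters (40 bits = 5 bytes). The function name `base32_sketch_encode` is hypothetical:

```php
<?php
// Hypothetical Base32 encoder in the same bit-string style as the Base64 code above
function base32_sketch_encode(string $data): string {
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';
    $len = strlen($data);
    if ($len === 0) {
        return '';
    }
    // Bytes -> 8-bit strings
    $bits = '';
    for ($i = 0; $i < $len; $i++) {
        $bits .= sprintf('%08b', ord($data[$i]));
    }
    // Zero-fill the last group up to 5 bits
    $rem = strlen($bits) % 5;
    if ($rem !== 0) {
        $bits .= str_repeat('0', 5 - $rem);
    }
    // 5-bit groups -> alphabet characters
    $out = '';
    foreach (str_split($bits, 5) as $group) {
        $out .= $alphabet[bindec($group)];
    }
    // Pad with '=' to a full 8-character block
    $rem = strlen($out) % 8;
    if ($rem !== 0) {
        $out .= str_repeat('=', 8 - $rem);
    }
    return $out;
}

echo base32_sketch_encode('foobar'), "\n";
```

The RFC 4648 test vectors ("f", "fo", "foo", ... "foobar") are a convenient way to check such an implementation.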
