Php 无需 cookie 或本地存储的用户识别

发布于04月20日

我正在构建一个分析工具，目前我可以从用户代理获取用户的IP地址、浏览器和操作系统.

我想知道是否有可能在不使用cookie或本地存储的情况下检测到同一个用户？我不希望这里有代码示例；这只是一个简单的提示，说明在哪里可以更进一步.

忘了提一下，如果是同一台计算机/设备，它需要跨浏览器兼容.基本上，我追求的是设备识别，而不是真正的用户.

推荐答案

Introduction

如果我理解正确的话，您需要识别一个没有唯一标识符的用户，所以您需要通过匹配随机数据来找出他们是谁.您无法可靠地存储用户身份，因为:

Cookies可以被删除
IP地址可以更改
浏览器可以更改
浏览器缓存可能会被删除

Java小程序或Com对象本来是一个使用硬件信息散列的简单解决方案，但现在人们非常安全，很难让人们在系统上安装此类程序.这让你不得不使用cookie和其他类似的工具.

Cookies and other, similar tools

您可能会考虑建立数据配置文件，然后使用概率测试来标识Probable User.可通过以下组合生成对此有用的配置文件:

IP地址
曲奇饼
- HTTP Cookie
- 会话Cookie
- 第三方Cookie
- 闪存Cookie(most people don't know how to delete these)
Web Bugs(可靠性较低，因为Bug已修复，但仍然有用)
- PDF错误
- 闪存错误
- Java错误
Browsers
- 单击跟踪(许多用户每次访问同一系列页面)
- 浏览器 fingerprint
- 缓存图像(人们有时删除cookie，但保留缓存图像)
- 使用水滴
- URL(浏览器历史记录或cookie可能在URL中包含唯一的用户id，例如https://stackoverflow.com/users/1226894或http://www.facebook.com/barackobama?fref=ts)
- System Fonts Detection(这是一个鲜为人知但通常是唯一的密钥签名)
HTML5&；Javascript

当然，我列出的项目只是识别用户的几种可能方式.还有很多.

With this set of Random Data elements to build a Data Profile from, what's next?个

下一步是开发大约Fuzzy Logic个，或者更好的是Artificial Neural Network个(使用模糊逻辑).无论是哪种情况，我们的 idea 都是训练您的系统，然后将其训练与Bayesian Inference相结合，以提高结果的准确性.

人工神经网络

PHP的NeuralMesh库允许您生成人工神经网络.要实现贝叶斯推理，请查看以下链接:

此时，你可能会想:

Why so much Math and Logic for a seemingly simple task?

基本上，因为它是not a simple task.事实上，你想要达到的目标是Pure Probability.例如，给定以下已知用户:

User1 = A + B + C + D + G + K
User2 = C + D + I + J + K + F

当您收到以下数据时:

B + C + E + G + F + K

您实际上在问的问题是:

接收到的数据(B+C+E+G+F+K)实际上是用户1或用户2的概率是多少？这两个匹配中哪一个是most概率？

为了有效地回答这个问题，你需要理解Frequency vs Probability Format以及为什么Joint Probability可能是更好的方法.这里的细节太多了(这就是为什么我给你链接的原因)，但一个很好的例子是Medical Diagnosis Wizard Application，它使用症状的组合来识别可能的疾病.

请考虑一下一系列数据点，它们构成了您的数据配置文件(在上面的示例中，B+C+E+G+F+K)为Symptoms，未知用户为Diseases.通过识别疾病，您可以进一步确定适当的治疗方法(将该用户视为用户1).

显然，我们已经确定了超过1Symptom个的Disease更容易确定.事实上，我们能识别的Symptoms越多，我们的诊断就越容易、越准确，几乎可以肯定.

Are there any other alternatives?个

当然作为一种替代措施，您可以创建自己的简单评分算法，并基于精确匹配.这并不像概率那么有效，但对您来说实施起来可能更简单.

作为一个例子，考虑这个简单的得分图表:

+-------------------------+--------+------------+
|        Property         | Weight | Importance |
+-------------------------+--------+------------+
| Real IP address         |     60 |          5 |
| Used proxy IP address   |     40 |          4 |
| HTTP Cookies            |     80 |          8 |
| 会话Cookie         |     80 |          6 |
| 第三方Cookies       |     60 |          4 |
| Flash Cookies           |     90 |          7 |
| PDF错误                 |     20 |          1 |
| 闪光虫               |     20 |          1 |
| Java错误                |     20 |          1 |
| Frequent Pages          |     40 |          1 |
| Browsers Finger Print   |     35 |          2 |
| Installed Plugins       |     25 |          1 |
| Cached Images           |     40 |          3 |
| URL                     |     60 |          4 |
| System Fonts Detection  |     70 |          4 |
| Localstorage            |     90 |          8 |
| Geolocation             |     70 |          6 |
| AOLTR                   |     70 |          4 |
| 网络信息API |     40 |          3 |
| 电池状态API      |     20 |          1 |
+-------------------------+--------+------------+

对于您可以根据给定请求收集的每一条信息，奖励相关分数，然后在分数相同时使用Importance解决冲突.

Proof of Concept

对于一个简单的概念证明，请看一下Perceptron.感知器是通常用于模式识别应用的RNA Model.甚至有一个旧的PHP Class很好地实现了它，但是您可能需要根据您的目的对其进行修改.

尽管Perceptron是一个很棒的工具，但它仍然可以返回多个结果(可能的匹配)，因此使用分数和差异比较仍然有助于识别这些匹配中的best个.

Assumptions

存储每个用户的所有可能信息(IP、cookies等)
如果结果完全匹配，则将分数增加1
如果结果不完全匹配，则将分数降低1

Expectation

生成RNA标签
生成模拟数据库的随机用户
生成一个未知用户
生成未知用户RNA和值
该系统将合并RNA信息并教授感知器
训练感知机后，系统将有一组权重
现在可以测试未知用户的模式，感知器将生成一个结果集.
存储所有阳性匹配项
首先按分数对匹配项排序，然后按差异排序(如上所述)
输出两个最接近的匹配项，如果未找到匹配项，则输出空结果

Code for Proof of Concept

$features = array(
    'Real IP address' => .5,
    'Used proxy IP address' => .4,
    'HTTP Cookies' => .9,
    '会话Cookie' => .6,
    '第三方Cookies' => .6,
    'Flash Cookies' => .7,
    'PDF错误' => .2,
    '闪光虫' => .2,
    'Java错误' => .2,
    'Frequent Pages' => .3,
    'Browsers Finger Print' => .3,
    'Installed Plugins' => .2,
    'URL' => .5,
    'Cached PNG' => .4,
    'System Fonts Detection' => .6,
    'Localstorage' => .8,
    'Geolocation' => .6,
    'AOLTR' => .4,
    '网络信息API' => .3,
    '电池状态API' => .2
);

// Get RNA Lables
$labels = array();
$n = 1;
foreach ($features as $k => $v) {
    $labels[$k] = "x" . $n;
    $n ++;
}

// Create Users
$users = array();
for($i = 0, $name = "A"; $i < 5; $i ++, $name ++) {
    $users[] = new Profile($name, $features);
}

// Generate Unknown User
$unknown = new Profile("Unknown", $features);

// Generate Unknown RNA
$unknownRNA = array(
    0 => array("o" => 1),
    1 => array("o" => - 1)
);

// Create RNA Values
foreach ($unknown->data as $item => $point) {
    $unknownRNA[0][$labels[$item]] = $point;
    $unknownRNA[1][$labels[$item]] = (- 1 * $point);
}

// Start Perception Class
$perceptron = new Perceptron();

// Train Results
$trainResult = $perceptron->train($unknownRNA, 1, 1);

// Find matches
foreach ($users as $name => &$profile) {
    // Use shorter labels
    $data = array_combine($labels, $profile->data);
    if ($perceptron->testCase($data, $trainResult) == true) {
        $score = $diff = 0;

        // Determing the score and diffrennce
        foreach ($unknown->data as $item => $found) {
            if ($unknown->data[$item] === $profile->data[$item]) {
                if ($profile->data[$item] > 0) {
                    $score += $features[$item];
                } else {
                    $diff += $features[$item];
                }
            }
        }
        // Ser score and diff
        $profile->setScore($score, $diff);
        $matchs[] = $profile;
    }
}

// Sort bases on score and Output
if (count($matchs) > 1) {
    usort($matchs, function ($a, $b) {
        // If score is the same use diffrence
        if ($a->score == $b->score) {
            // Lower the diffrence the better
            return $a->diff == $b->diff ? 0 : ($a->diff > $b->diff ? 1 : - 1);
        }
        // The higher the score the better
        return $a->score > $b->score ? - 1 : 1;
    });

    echo "<br />Possible Match ", implode(",", array_slice(array_map(function ($v) {
        return sprintf(" %s (%0.4f|%0.4f) ", $v->name, $v->score,$v->diff);
    }, $matchs), 0, 2));
} else {
    echo "<br />No match Found ";
}

Output:

Possible Match D (0.7416|0.16853),C (0.5393|0.2809)

Print_r of "D":

echo "<pre>";
print_r($matchs[0]);


Profile Object(
    [name] => D
    [data] => Array (
        [Real IP address] => -1
        [Used proxy IP address] => -1
        [HTTP Cookies] => 1
        [会话Cookie] => 1
        [第三方Cookies] => 1
        [Flash Cookies] => 1
        [PDF错误] => 1
        [闪光虫] => 1
        [Java错误] => -1
        [Frequent Pages] => 1
        [Browsers Finger Print] => -1
        [Installed Plugins] => 1
        [URL] => -1
        [Cached PNG] => 1
        [System Fonts Detection] => 1
        [Localstorage] => -1
        [Geolocation] => -1
        [AOLTR] => 1
        [网络信息API] => -1
        [电池状态API] => -1
    )
    [score] => 0.74157303370787
    [diff] => 0.1685393258427
    [base] => 8.9
)

如果Debug=true，您将能够看到Input (Sensor & Desired), Initial Weights, Output (Sensor, Sum, Network), Error, Correction and Final Weights.

+----+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+-----+----+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----------+
| o  | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12 | x13 | x14 | x15 | x16 | x17 | x18 | x19 | x20 | Bias | Yin | Y  | deltaW1 | deltaW2 | deltaW3 | deltaW4 | deltaW5 | deltaW6 | deltaW7 | deltaW8 | deltaW9 | deltaW10 | deltaW11 | deltaW12 | deltaW13 | deltaW14 | deltaW15 | deltaW16 | deltaW17 | deltaW18 | deltaW19 | deltaW20 | W1 | W2 | W3 | W4 | W5 | W6 | W7 | W8 | W9 | W10 | W11 | W12 | W13 | W14 | W15 | W16 | W17 | W18 | W19 | W20 | deltaBias |
+----+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+-----+----+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----------+
| 1  | 1  | -1 | -1 | -1 | -1 | -1 | -1 | 1  | 1  | 1   | 1   | 1   | 1   | 1   | -1  | -1  | -1  | -1  | 1   | 1   | 1    | 0   | -1 | 0       | -1      | -1      | -1      | -1      | -1      | -1      | 1       | 1       | 1        | 1        | 1        | 1        | 1        | -1       | -1       | -1       | -1       | 1        | 1        | 0  | -1 | -1 | -1 | -1 | -1 | -1 | 1  | 1  | 1   | 1   | 1   | 1   | 1   | -1  | -1  | -1  | -1  | 1   | 1   | 1         |
| -1 | -1 | 1  | 1  | 1  | 1  | 1  | 1  | -1 | -1 | -1  | -1  | -1  | -1  | -1  | 1   | 1   | 1   | 1   | -1  | -1  | 1    | -19 | -1 | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0  | -1 | -1 | -1 | -1 | -1 | -1 | 1  | 1  | 1   | 1   | 1   | 1   | 1   | -1  | -1  | -1  | -1  | 1   | 1   | 1         |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --  | --  | --  | --  | --  | --  | --  | --  | --  | --  | --  | --   | --  | -- | --      | --      | --      | --      | --      | --      | --      | --      | --      | --       | --       | --       | --       | --       | --       | --       | --       | --       | --       | --       | -- | -- | -- | -- | -- | -- | -- | -- | -- | --  | --  | --  | --  | --  | --  | --  | --  | --  | --  | --  | --        |
| 1  | 1  | -1 | -1 | -1 | -1 | -1 | -1 | 1  | 1  | 1   | 1   | 1   | 1   | 1   | -1  | -1  | -1  | -1  | 1   | 1   | 1    | 19  | 1  | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0  | -1 | -1 | -1 | -1 | -1 | -1 | 1  | 1  | 1   | 1   | 1   | 1   | 1   | -1  | -1  | -1  | -1  | 1   | 1   | 1         |
| -1 | -1 | 1  | 1  | 1  | 1  | 1  | 1  | -1 | -1 | -1  | -1  | -1  | -1  | -1  | 1   | 1   | 1   | 1   | -1  | -1  | 1    | -19 | -1 | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0  | -1 | -1 | -1 | -1 | -1 | -1 | 1  | 1  | 1   | 1   | 1   | 1   | 1   | -1  | -1  | -1  | -1  | 1   | 1   | 1         |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --  | --  | --  | --  | --  | --  | --  | --  | --  | --  | --  | --   | --  | -- | --      | --      | --      | --      | --      | --      | --      | --      | --      | --       | --       | --       | --       | --       | --       | --       | --       | --       | --       | --       | -- | -- | -- | -- | -- | -- | -- | -- | -- | --  | --  | --  | --  | --  | --  | --  | --  | --  | --  | --  | --        |
+----+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+-----+----+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----------+

x1到x20表示代码转换的特征.

// Get RNA Labels
$labels = array();
$n = 1;
foreach ( $features as $k => $v ) {
    $labels[$k] = "x" . $n;
    $n ++;
}

这是online demo美元

Class Used:

class Profile {
    public $name, $data = array(), $score, $diff, $base;

    function __construct($name, array $importance) {
        $values = array(-1, 1); // Perception values
        $this->name = $name;
        foreach ($importance as $item => $point) {
            // Generate Random true/false for real Items
            $this->data[$item] = $values[mt_rand(0, 1)];
        }
        $this->base = array_sum($importance);
    }

    public function setScore($score, $diff) {
        $this->score = $score / $this->base;
        $this->diff = $diff / $this->base;
    }
}

Modified Perceptron Class

class Perceptron {
    private $w = array();
    private $dw = array();
    public $debug = false;

    private function initialize($colums) {
        // Initialize perceptron vars
        for($i = 1; $i <= $colums; $i ++) {
            // weighting vars
            $this->w[$i] = 0;
            $this->dw[$i] = 0;
        }
    }

    function train($input, $alpha, $teta) {
        $colums = count($input[0]) - 1;
        $weightCache = array_fill(1, $colums, 0);
        $checkpoints = array();
        $keepTrainning = true;

        // Initialize RNA vars
        $this->initialize(count($input[0]) - 1);
        $just_started = true;
        $totalRun = 0;
        $yin = 0;

        // Trains RNA until it gets stable
        while ($keepTrainning == true) {
            // Sweeps each row of the input subject
            foreach ($input as $row_counter => $row_data) {
                // Finds out the number of columns the input has
                $n_columns = count($row_data) - 1;

                // Calculates Yin
                $yin = 0;
                for($i = 1; $i <= $n_columns; $i ++) {
                    $yin += $row_data["x" . $i] * $weightCache[$i];
                }

                // Calculates Real Output
                $Y = ($yin <= 1) ? - 1 : 1;

                // Sweeps columns ...
                $checkpoints[$row_counter] = 0;
                for($i = 1; $i <= $n_columns; $i ++) {
                    /** DELTAS **/
                    // Is it the first row?
                    if ($just_started == true) {
                        $this->dw[$i] = $weightCache[$i];
                        $just_started = false;
                        // Found desired output?
                    } elseif ($Y == $row_data["o"]) {
                        $this->dw[$i] = 0;
                        // Calculates Delta Ws
                    } else {
                        $this->dw[$i] = $row_data["x" . $i] * $row_data["o"];
                    }

                    /** WEIGHTS **/
                    // Calculate Weights
                    $this->w[$i] = $this->dw[$i] + $weightCache[$i];
                    $weightCache[$i] = $this->w[$i];

                    /** CHECK-POINT **/
                    $checkpoints[$row_counter] += $this->w[$i];
                } // END - for

                foreach ($this->w as $index => $w_item) {
                    $debug_w["W" . $index] = $w_item;
                    $debug_dw["deltaW" . $index] = $this->dw[$index];
                }

                // Special for script debugging
                $debug_vars[] = array_merge($row_data, array(
                    "Bias" => 1,
                    "Yin" => $yin,
                    "Y" => $Y
                ), $debug_dw, $debug_w, array(
                    "deltaBias" => 1
                ));
            } // END - foreach

            // Special for script debugging
             $empty_data_row = array();
            for($i = 1; $i <= $n_columns; $i ++) {
                $empty_data_row["x" . $i] = "--";
                $empty_data_row["W" . $i] = "--";
                $empty_data_row["deltaW" . $i] = "--";
            }
            $debug_vars[] = array_merge($empty_data_row, array(
                "o" => "--",
                "Bias" => "--",
                "Yin" => "--",
                "Y" => "--",
                "deltaBias" => "--"
            ));

            // Counts training times
            $totalRun ++;

            // Now checks if the RNA is stable already
            $referer_value = end($checkpoints);
            // if all rows match the desired output ...
            $sum = array_sum($checkpoints);
            $n_rows = count($checkpoints);
            if ($totalRun > 1 && ($sum / $n_rows) == $referer_value) {
                $keepTrainning = false;
            }
        } // END - while

        // Prepares the final result
        $result = array();
        for($i = 1; $i <= $n_columns; $i ++) {
            $result["w" . $i] = $this->w[$i];
        }

        $this->debug($this->print_html_table($debug_vars));

        return $result;
    } // END - train
    function testCase($input, $results) {
        // Sweeps input columns
        $result = 0;
        $i = 1;
        foreach ($input as $column_value) {
            // Calculates teste Y
            $result += $results["w" . $i] * $column_value;
            $i ++;
        }
        // Checks in each class the test fits
        return ($result > 0) ? true : false;
    } // END - test_class

    // Returns the html code of a html table base on a hash array
    function print_html_table($array) {
        $html = "";
        $inner_html = "";
        $table_header_composed = false;
        $table_header = array();

        // Builds table contents
        foreach ($array as $array_item) {
            $inner_html .= "<tr>\n";
            foreach ( $array_item as $array_col_label => $array_col ) {
                $inner_html .= "<td>\n";
                $inner_html .= $array_col;
                $inner_html .= "</td>\n";

                if ($table_header_composed == false) {
                    $table_header[] = $array_col_label;
                }
            }
            $table_header_composed = true;
            $inner_html .= "</tr>\n";
        }

        // Builds full table
        $html = "<table border=1>\n";
        $html .= "<tr>\n";
        foreach ($table_header as $table_header_item) {
            $html .= "<td>\n";
            $html .= "<b>" . $table_header_item . "</b>";
            $html .= "</td>\n";
        }
        $html .= "</tr>\n";

        $html .= $inner_html . "</table>";

        return $html;
    } // END - print_html_table

    // Debug function
    function debug($message) {
        if ($this->debug == true) {
            echo "<b>DEBUG:</b> $message";
        }
    } // END - debug
} // END - class

Conclusion

识别没有唯一标识符的用户不是一项直接或简单的任务.这取决于收集足够量的随机数据，您可以通过各种方法从用户那里收集这些数据.

即使你 Select 不使用人工神经网络，我建议至少使用一个简单的概率矩阵，带有优先级和可能性——我希望上面提供的代码和示例能够让你继续下go .