Kaggle PM2.5 Prediction

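An attempt at the analysis using sklearn. The data are observation records from the Fengyuan (豐原) station, split into a training set (train set) and a test set (test set). train.csv: all observations from the first 20 days of each month. test_X.csv: sampled from the remaining 10 days of each month. Each record covers 10 consecutive hours; all observations from the first nine hours serve as the features, and the target is the PM2.5 concentration in the tenth hour. In total there are 240 non-overlapping test records. sklearn is very straightforward to use. The current strategy is the most basic one: take every value from the first nine hours as a feature, with no additional feature engineering or simplification, and look at the result directly. The Private leaderboard ranking is around the middle, slightly above the Baseline. Because the model is Linear Regression, the squared-error loss is convex with a single minimum, so the solution can be obtained directly by setting the gradient to zero, with no need for many Gradient Descent iterations. My Github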
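As a rough sketch of the strategy described above (nine hours of raw observations as features, the tenth hour's PM2.5 as the target), the following uses NumPy's least-squares solver on synthetic data. The helper name `make_windows`, the random stand-in data, and the assumption that column 0 holds PM2.5 are illustrative only and are not taken from the original code; real data would also need windows that do not cross month boundaries.

```python
import numpy as np


def make_windows(series, window=9):
    """Build (X, y) pairs from an (n_hours, n_measurements) array.

    Each row of X is `window` consecutive hours flattened into one vector;
    y is the column-0 value (assumed here to be PM2.5) of the next hour.
    """
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for start in range(len(series) - window):
        X.append(series[start:start + window].ravel())
        y.append(series[start + window, 0])
    return np.array(X), np.array(y)


# Synthetic stand-in for the station records: 200 hours, 3 measurements each.
rng = np.random.default_rng(0)
hours = rng.normal(size=(200, 3))

X, y = make_windows(hours, window=9)

# Prepend a bias column and solve the least-squares problem directly,
# without any iterative gradient descent.
X_b = np.hstack([np.ones((len(X), 1)), X])
weights, *_ = np.linalg.lstsq(X_b, y, rcond=None)

# At the least-squares solution the gradient of the squared error is ~0.
gradient = X_b.T @ (X_b @ weights - y)
rmse = float(np.sqrt(np.mean((X_b @ weights - y) ** 2)))
print(X.shape, rmse)
```

With sklearn, the same `X` and `y` could be passed to `LinearRegression().fit(X, y)`, which also solves the least-squares problem directly rather than iterating.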

2017-06-13 · 1 min read · 37 words · KbWen · ZH

Kaggle Titanic

From Kaggle: The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships. One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper-class. ...

2017-06-09 · 2 min read · 224 words · KbWen · EN

Kaggle Digit Recognizer

This is my first Kaggle problem: Kaggle digit recognizer. It is the MNIST problem stored as CSV, so I chose a CNN to solve it. The data format is as follows:

If we omit the “pixel” prefix, the pixels make up the image like this:

    000 001 002 003 ... 026 027
    028 029 030 031 ... 054 055
    056 057 058 059 ... 082 083
     |   |   |   |  ...  |   |
    728 729 730 731 ... 754 755
    756 757 758 759 ... 782 783

The test data set, (test.csv), is the same as the training set, except that it does not contain the “label” column. Your submission file should be in the following format: ...
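The pixel numbering above implies that pixel x sits at row x // 28 and column x % 28 of a 28 x 28 image. A minimal sketch of that mapping, and of turning one flat CSV row into a 2D image before feeding it to a CNN, might look like this; the helper names `pixel_position` and `row_to_image` are hypothetical and not from the original post.

```python
SIDE = 28  # MNIST images are 28 x 28 pixels, 784 values per CSV row


def pixel_position(x):
    """Return (row, column) of pixel number x in the 28 x 28 grid."""
    if not 0 <= x < SIDE * SIDE:
        raise ValueError("pixel index must be in 0..783")
    return divmod(x, SIDE)


def row_to_image(pixels):
    """Split a flat list of 784 pixel values into 28 rows of 28 values."""
    if len(pixels) != SIDE * SIDE:
        raise ValueError("expected 784 pixel values")
    return [pixels[r * SIDE:(r + 1) * SIDE] for r in range(SIDE)]


# Use the pixel numbers themselves as values so the layout is easy to check.
image = row_to_image(list(range(SIDE * SIDE)))
print(pixel_position(28), image[27][0])
```

The nested-list form is only for illustration; an array library would normally do the same reshape in a single call.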

2017-06-05 · 1 min read · 148 words · KbWen · ZH