Python 爬取每日股價(1)

如何取得每日的股價資訊 進入證交所每日收盤行情,選擇全部(不含…),可以看到有許多選項可點。 找到每日收盤行情 110.08.27每日收盤行情 點擊F12進入開發者環境,再點選Network,觀察我們要的數據資訊 Python及時股價 點選XHR找到傳送數據的Requests import requests url = "https://www.twse.com.tw/exchangeReport/MI_INDEX?response=json&date=20210827&type=ALLBUT0999&_=1630244648174" res = requests.get(url) res.json() 得到Json,並在data9找到全部的股票數據 {'alignsStyle1': [['center', 'center', 'center', 'center', 'center', 'center'], ... ... 'data9': [['0050', '元大台灣50', '16,875,047', '9,673', '2,328,482,421', '136.70', '138.50', '136.45', '138.15', '<p style= color:red>+</p>', '1.15', '138.15', '4', '138.20', '103', '0.00'], ['0051', '元大中型100', '17,810', '63', '1,003,448', '56.20', '56.60', '56.20', '56.60', '<p style= color:red>+</p>', '0.35', '56.55', '1', '56.60', '9', '0.00'], ... ... 'subtitle9': '110年08月27日每日收盤行情(全部(不含權證、牛熊證))'} 幾個重要的數據 ['0050', #股票代號 '元大台灣50', '16,875,047', #成交股數 '9,673', '2,328,482,421', '136.70', #開盤價 '138.50', #最高價 '136.45', #最低價 '138.15', #收盤價 '<p style= color:red>+</p>', '1.15', '138.15', '4', '138.20', '103', '0.00'], 輕鬆完成,在爬取過程中還是非常簡單的, ...

2021-08-29 · 1 min read · 96 words · KbWen · ZH

Deep Reinforcement learning

Reinforcement learning (RL) is a framework where agents learn to perform actions in an environment so as to maximize a reward. It’s actually training an AI to learn through every mistake and find the correct path without any label. The two main components are the environment and the agent. Deep Reinforcement learning (DRL) combined with deep learning technology is even more powerful. AlphaGo, is a typical application of deep reinforcement learning. ...

2020-10-26 · 2 min read · 356 words · KbWen · EN

Tensorflow2 -- MNIST

Tensorflow2.X和1.X有多了很多差別和使用方式, 今天用tf2來實作MNIST分類問題 MNIST MNIST是一個很標準的手寫數字分類問題, 數據集下載有很多方式,這次直接使用tf API提供的 28 * 28 且只有黑白的數據 開發 在local 起 jupyter lab 先看看GPU是否啟用 %matplotlib widget import matplotlib.pyplot as plt import tensorflow as tf import numpy as np # check gpu tf.config.list_physical_devices('GPU') tf.test.is_built_with_cuda() # output True 方法一 繼承 tf.keras.model class MLP(tf.keras.Model): def __init__(self): super().__init__() self.flatten = tf.keras.layers.Flatten() self.dense1 = tf.keras.layers.Dense(units=100, activation=tf.nn.relu) self.dense2 = tf.keras.layers.Dense(units=20, activation=tf.nn.leaky_relu) self.dense3 = tf.keras.layers.Dense(units=10) @tf.function def call(self, inputs): # [batch_size, 28, 28, 1] flat1 = self.flatten(inputs) # [batch_size, 784] dens1 = self.dense1(flat1) # [batch_size, 100] dens2 = self.dense2(dens1) # [batch_size, 20] dens3 = self.dense3(dens2) # [batch_size, 10] output = tf.nn.softmax(dens3) return output 使用tf.GradientTape訓練 # @tf.function def one_batch_step(X, y, **kwargs): with tf.GradientTape() as tape: y_pred = model(X) loss = tf.keras.losses.sparse_categorical_crossentropy(y_true=y, y_pred=y_pred) loss = tf.reduce_mean(loss) tf.print(f"{batch_index} loss {loss}", [loss]) with summary_writer.as_default(): tf.summary.scalar("loss", loss, step=batch_index) grads = tape.gradient(loss, model.variables) optimizer.apply_gradients(grads_and_vars=zip(grads, model.variables)) for epoch_index in range(num_epochs): for batch_index in range(num_batches): X, y = data_loader.get_batch(batch_size) one_batch_step(X, y, batch_index=batch_index) with summary_writer.as_default(): tf.summary.trace_export(name="model_trace", step=0, profiler_outdir=log_dir) tf.saved_model.save(model, f"saved/{model_name}") 方法二 使用keras Pipeline來疊每一層要用的函數,彈性較低,但非常適合簡單的Model ...

2020-09-26 · 1 min read · 194 words · KbWen · ZH

python 爬取及時股價

如何取得即時的股價資訊? 進入證交所提供的基本市況報導網站,在右上方輸入股票代號(以 2330 台積電為例)。你可以看到當日的最高、最低、成交價量以及最佳五檔等資訊。 此時在網頁上點選右鍵開啟 Inspect (檢查),打開 DevTools 並切換到 Network 欄位進行觀察: 你會發現網頁會持續發送 Request 到某個網址,名稱開頭是 getStockInfo,這就是我們要的數據源。 import requests url = "https://mis.twse.com.tw/stock/api/getStockInfo.jsp?ex_ch=tse_2330.tw" res = requests.get(url) res.json() 得到一個排列整齊的json {'queryTime': {'stockInfoItem': 4329, 'sessionKey': 'tse_2330.tw_20200908|', 'sessionStr': 'UserSession', 'sysDate': '20200908', 'sessionFromTime': -1, 'stockInfo': 2084673, 'showChart': False, 'sessionLatestTime': -1, 'sysTime': '12:05:35'}, 'referer': '', 'rtmessage': 'OK', 'exKey': 'if_tse_2330.tw_zh-tw.null', 'msgArray': [{'n': '台積電', 'g': '281_174_260_166_385_', 'u': '468.5000', 'mt': '060262', 'o': '428.0000', 'ps': '593', 'tk0': '2330.tw_tse_20200908_B_9998775018', 'a': '430.5000_431.0000_431.5000_432.0000_432.5000_', 'tlong': '1599537930000', 't': '12:05:30', 'it': '12', 'ch': '2330.tw', 'b': '430.0000_429.5000_429.0000_428.5000_428.0000_', 'f': '143_239_162_400_391_', 'w': '383.5000', 'pz': '428.0000', 'l': '427.5000', 'c': '2330', 'v': '16843', 'd': '20200908', 'tv': '-', 'tk1': '2330.tw_tse_20200908_B_9998774678', 'ts': '0', 'nf': '台灣積體電路製造股份有限公司', 'y': '426.0000', 'p': '0', 'i': '24', 'ip': '0', 'z': '-', 's': '-', 'h': '433.0000', 'ex': 'tse'}], 'userDelay': 5000, 'rtcode': '0000', 'cachedAlive': 7891} 在爬取網址時,不要亂刪後面的query parameters,除非你確認過差別是甚麼。 ...

2020-09-08 · 2 min read · 327 words · KbWen · ZH

Google NLP API parsing

使用google 提供的API做語意分析。 語意分析(syntactic analysis)能夠提取語言的訊息,把文章拆成句子,句子在拆成更小的每個分詞,做更進一步的分析,Goole NLP API 會給予每個字詞的詞性以及彼此的關係。 Analyzing syntax 進入GCP新增一個API Key 並確認NLP API狀態為enable;詳細的GCP申請操作步驟可以看官方文件。(或是以後有機會寫。) API Enabled 因為這次是介紹,所以使用google cloud shell;在平常使用下可以把某些步驟改成習慣的語言及IDE。 新增環境變數 export API_KEY=<YOUR_KEY> 確認輸入後,增加要丟進API的文字json檔 text.json { "document":{ "type":"PLAIN_TEXT", "content": "Beirut rescuers search the site for possible survivor 30 days after the explosion." }, "encodingType": "UTF8" } 標準的json檔輸入資訊:https://cloud.google.com/natural-language/docs 使用curl post資料 curl "https://language.googleapis.com/v1/documents:analyzeSyntax?key=${API_KEY}" \ -s -X POST -H "Content-Type: application/json" --data-binary @text.json 會得到解析出來的資訊 { "sentences": [ { "text": { "content": "Beirut rescuers search the site for possible survivor 30 days after the explosion.", "beginOffset": 0 } } ], "tokens": [ { "text": { "content": "Beirut", "beginOffset": 0 }, "partOfSpeech": { "tag": "NOUN", "aspect": "ASPECT_UNKNOWN", "case": "CASE_UNKNOWN", "form": "FORM_UNKNOWN", "gender": "GENDER_UNKNOWN", "mood": "MOOD_UNKNOWN", "number": "SINGULAR", "person": "PERSON_UNKNOWN", "proper": "PROPER", "reciprocity": "RECIPROCITY_UNKNOWN", "tense": "TENSE_UNKNOWN", "voice": "VOICE_UNKNOWN" }, "dependencyEdge": { "headTokenIndex": 1, "label": "NN" }, "lemma": "Beirut" }, { "text": { "content": "rescuers", "beginOffset": 7 }, "partOfSpeech": { "tag": "NOUN", "aspect": "ASPECT_UNKNOWN", "case": "CASE_UNKNOWN", "form": "FORM_UNKNOWN", "gender": "GENDER_UNKNOWN", "mood": "MOOD_UNKNOWN", "number": "PLURAL", "person": "PERSON_UNKNOWN", "proper": "PROPER_UNKNOWN", "reciprocity": "RECIPROCITY_UNKNOWN", "tense": "TENSE_UNKNOWN", "voice": "VOICE_UNKNOWN" }, "dependencyEdge": { "headTokenIndex": 2, "label": "NSUBJ" }, "lemma": "rescuer" }, { "text": { "content": "search", "beginOffset": 16 }, "partOfSpeech": { "tag": "VERB", "aspect": "ASPECT_UNKNOWN", "case": "CASE_UNKNOWN", "form": "FORM_UNKNOWN", "gender": "GENDER_UNKNOWN", "mood": "INDICATIVE", "number": "NUMBER_UNKNOWN", "person": "PERSON_UNKNOWN", "proper": "PROPER_UNKNOWN", "reciprocity": "RECIPROCITY_UNKNOWN", "tense": "PRESENT", "voice": "VOICE_UNKNOWN" } } ...... ], "language": "en" } 觀察一下上面的結果 ...

2020-09-04 · 2 min read · 233 words · KbWen · ZH

Before Data processing: ELT

Before ELT : ETL ETL stands for Extract, Transform, and Load. Historically, ETL has been the best and most reliable way to migrate data from one database to another. In addition to move data from one database to another, it also converts databases into a single format that can be utilized in the final point. Extract: Collecting data from different database. Sometimes using a staging table. Transform: It’s critical. Converting recently extracted data into the correct form so that it can be placed into another database. Sometimes there are other types of transformation involved in this step. Load: Load data into the target database or storage. ...

2020-09-02 · 2 min read · 254 words · KbWen · EN

Python Comments

開發時加入註釋有助於描述思考過程,並幫助自己和其他人了解意圖,可以更輕鬆地發現錯誤、改進程式,以及在其他地方做更多應用。 單行註釋 加入註釋以 # 開頭, # defining the start code startCode = 50 也可加在程式碼後方,會被忽略, startCode = 50 # defining the start code 注意不要加入無用的描述, 如同變數命名時不要取沒意義的名稱。 多行註釋 當要註釋的內容很多,或是撰寫文件、功能之類的,可以使用這種方式。 PEP8中建議單行不要超過79個字,一般情況則是會照公司或是團隊的開發習慣決定。 多行#開頭, # PythonComments version 1.0.3 # -a (--all): show all features # -h (--help): show the help # ..... 或是用""" 包住 """ PythonComments version 1.0.3 -a (--all): show all features -h (--help): show the help ..... """

2020-05-04 · 1 min read · 64 words · KbWen · ZH

Python f-string

python 3.6後,字串多了個處理方法 PEP 498 – Literal String Interpolation 下面直接用例子來比較f-string和我們之前常用的 %-formatting、str.format()語法不同之處 >>> # %-formatting ... >>> text = "Hello" >>> number1 = 10 >>> number2 = 20 >>> print("%s, test numbers are %s and %s" % (text, number1, number2)) Hello, test numbers are 10 and 20 >>> # str.format() ... >>> text = "Hello" >>> number1 = 10 >>> number2 = 20 >>> print("{}, test numbers are {} and {}".format(text, number1, number2)) Hello, test numbers are 10 and 20 >>> print("{0}, test numbers are {2} and {1}".format(text, number1, number2)) Hello, test numbers are 20 and 10 >>> # f-string ... >>> text = "Hello" >>> number1 = 10 >>> number2 = 20 >>> print(f"{text}, test numbers are {number1} and {number2}") Hello, test numbers are 10 and 20 F-string 看起來更python了,也解決了之前會遇到的問題;例如使用 %時的參數限制等等。 在變數變多的情況下更易讀也易改。 嘗試做更多操作 >>> f"{3 + 8}" '11' >>> text = "Literal String Interpolation" >>> f"{text.upper()}" 'LITERAL STRING INTERPOLATION' >>> f"{1/3:.2f}" '0.33' 也可以放入lambda表達式。 ...

2020-04-24 · 1 min read · 152 words · KbWen · ZH

Python lambda

lambda function(匿名函式) 基本語法 lambda arg1, arg2,... : expression fun = lambda x: x + 1 print(fun(5)) >>> 6 lambda function可以看做是一個簡單的function, 有好幾個輸入,但是只能有一個運算式。 適合的使用時機 有幾個時機適合使用lambda function 無法重複使用:“don’t repeat yourself”,因此若知道這個功能簡單且不會在類似的地方重複使用,那這是個好時機。 不想去想變數名稱:在實作功能時,會希望變數名稱就能知道這個東西可能會是甚麼,而不是只有x,y,i,j等等看不出意義或是會搞混的名稱;要注意情況,大多還是乖乖想名字吧。

2020-04-23 · 1 min read · 28 words · KbWen · ZH

Python context manager

內文管理器 python with 語句,能讓我們更輕易的實行資源管理,例如數據、開啟文件,或是各種會lock的行為。 要保證處理完相關事情,資源有被釋放。 簡單行為中,我們會這樣去開啟文件 test_file = open('test.txt', 'w') try: test_file.write('line one') finally: test_file.close() 上述行為除了是非慣用以外, 若try-finally裡面邏輯複雜,還面臨著維護的困難。 這裡有著使用 with 的簡單用法 with open('test.txt', 'w') as test_file: test_file.write('line one') 上述程式碼中,當 with 內的語句執行結束後,會自動關閉該資源,且變數test_file也會結束。 實現context manager 若想實現 context manager的功能,則要定義好__enter__ 與 __exit__ 兩個函式,分別管理with的進入行為和結束行為。

2020-04-14 · 1 min read · 38 words · KbWen · ZH