Python 與 MS SQL 的愛恨情仇

Pei Lee
11 min readOct 2, 2017

--

記錄一下最近面臨的各種失敗。

最近要將公司 A 資料庫的資料,跑一些邏輯後,存進 B 資料庫,感覺是很簡單的程序,但不知為何就是各種失敗,現在甚至還沒有解決辦法,總之先整理一下目前的歷程。

嘗試一:pyodbc 跑 SQL 指令

  1. import 套件,建立 A 資料庫連線:
import pyodbc
import pandas as pd
server = '(IP位置)'
database = '(資料庫名稱)'
username = '(帳號)'
password = '(密碼)'
cnxn = pyodbc.connect('DRIVER={ODBC Driver 13 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+ password)
cursor = cnxn.cursor()

2. 撈 A 資料庫的資料,取出全部:

cursor.execute("SELECT * from (Table名稱) WHERE (條件一) AND (條件二)")rows = cursor.fetchall()

噴 error:

UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 0-1: illegal UTF-16 surrogate

處理方式:逃避現實,fetch 每個 row,有問題就 continue 假裝沒看到

# 取得 column name
colList = []
for colInfo in cursor.description:
colList.append(colInfo[0])
# 讀取得每一個 row
rowList = []
while True:
try:
row = cursor.fetchone()
if row:
rowList.append(list(row))
else:
break
except:
continue;

# 存成 dataframe
data = pd.DataFrame(rowList)
data.columns = colList

3. 連結 B 資料庫:

# 連目的地資料庫
server2 = '(IP位置)'
database2 = '(資料庫名稱)'
username2 = '(帳號)'
password2 = '(密碼)'
cnxn2 = pyodbc.connect('DRIVER={ODBC Driver 13 for SQL Server};SERVER='+server2+';DATABASE='+database2+';UID='+username2+';PWD='+ password2)
cursor2 = cnxn2.cursor()

4. 在 B 資料庫建立 table:

sql_command = """
CREATE TABLE Supplier
(
AccountId CHAR(36) PRIMARY KEY,
ChnName VARCHAR(50),
EngName VARCHAR(100),
TaxIDNumber VARCHAR(10),
ChnBrand VARCHAR(20),
EngBrand VARCHAR(20),
CompanyProfile VARCHAR(MAX),
CityCode VARCHAR(12),
District VARCHAR(5),
EstablishDate DATE,
ChargePerson VARCHAR(5),
LogoImage VARCHAR(MAX),
NumberOfEmployees INT,
ParentAccountId VARCHAR(50),
WebSiteURL VARCHAR(100)
)
"""
try:
cursor2.execute(sql_command)
cnxn2.commit()
except pyodbc.ProgrammingError:
print("Table 'Supplier' already exists.")

5. insert 資料:

for idx, row in data.iterrows():
insertList = row.tolist()
insertString = ""
try:
for item in insertList:
if insertString:
insertString = insertString + ', ' + "\'" + str(item) + "\'"
else:
insertString = "\'" + item + "\'"
cursor2.execute("""insert into Supplier(
AccountId,
ChnName,
EngName,
TaxIDNumber,
ChnBrand,
EngBrand,
CompanyProfile,
CityCode,
District,
EstablishDate,
ChargePerson,
LogoImage,
NumberOfEmployees,
ParentAccountId,
WebSiteURL) values ( %s )""" % insertString)
cnxn2.commit()
except:
continue

6. 確認資料 insert 成功與否:失敗,什麼都沒 print 出來

cursor2.execute("select * from Supplier")for row in cursor2:
print(row)

7. 那先取 10 個 column 看看呢?先建一個 10 個 column 的 table

# 建 table
sql_command = """
CREATE TABLE Supplier2
(
AccountId CHAR(36) PRIMARY KEY,
ChnName VARCHAR(50),
EngName VARCHAR(100),
TaxIDNumber VARCHAR(10),
ChnBrand VARCHAR(20),
EngBrand VARCHAR(20),
CompanyProfile VARCHAR(MAX),
CityCode VARCHAR(12),
District VARCHAR(5),
EstablishDate DATE
)
"""
try:
cursor2.execute(sql_command)
cnxn2.commit()
except pyodbc.ProgrammingError:
print("Table 'Supplier2' already exists.")

8. 擷取資料,insert 進去:

testdf = twdf.iloc[:,:10]for idx, row in testdf.iterrows():
insertList = row.tolist()
insertString = ""
try:
for item in insertList:
if insertString:
insertString = insertString + ', ' + "\'" + str(item) + "\'"
else:
insertString = "\'" + item + "\'"
cursor2.execute("""insert into Supplier2(
AccountId,
ChnName,
EngName,
TaxIDNumber,
ChnBrand,
EngBrand,
CompanyProfile,
CityCode,
District,
EstablishDate) values ( %s )""" % insertString)
cnxn2.commit()
except:
continue

9. 確認資料 insert 成功與否::

cursor2.execute("select * from Supplier2")for row in cursor2:
print(row)

有開始在跑資料了,感覺好像有 insert 進去,先中斷,確認一下有幾筆:

rows = cursor2.fetchall()viewdf = pd.DataFrame(rows)viewdf.shape

只有 32,328 筆……但是來看看我要 insert 進去的資料:

testdf.shape

361,631 筆,覺得投降。

嘗試二:用用看 pandas 內建的 read_sql, to_sql

參考這個影片

  1. 連結資料庫,按照影片上的話:
import pandas as pd
import pyodbc
server = '(IP位置)'
database = '(資料庫名稱)'
conn = pyodbc.connect('DRIVER={SQL Server};SERVER='+server+';DATABASE='+database+';Trusted_Connection=yes')

噴 error:

Error: ('28000', '[28000] [Microsoft][ODBC SQL Server Driver][SQL Server]�n�J���ѡC���n�J�O�Ӧۥ����H��������A����Ω� Windows ���ҡC (18452) (SQLDriverConnect)')

2. 用之前的方式連:

import pyodbc
import pandas as pd
server = '(IP位置)'
database = '(資料庫名稱)'
username = '(帳號)'
password = '(密碼)'
conn = pyodbc.connect('DRIVER={ODBC Driver 13 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+ password)

3. select 資料進 read_sql:

sql="""
SELECT * from (Table名稱) WHERE (條件一) AND (條件二)
"""
df = pd.read_sql(sql, conn)
df.head()

噴 error:

ProgrammingError: ('42000', '[42000] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]��W�\u07b9D���Ѫ�: �L�k�}�� SQL Server �s�u [53].  (53) (SQLExecDirectW)')DatabaseError: Execution failed on sql '
SELECT * from (Table名稱)
WHERE (條件一) AND (條件二)': ('42000', '[42000] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]��W�\u07b9D���Ѫ�: �L�k�}�� SQL Server �s�u [53]. (53) (SQLExecDirectW)')

嘗試三:sqlalchemy

  1. import 套件,連結資料庫:
import pandas as pd
from sqlalchemy import create_engine, Table, Column, Sequence, String, Integer, Unicode, MetaData, ForeignKey
from sqlalchemy.sql import select
from sqlalchemy import text
server = '(IP位置)'
database = '(資料庫名稱)'
username = '(帳號)'
password = '密碼'
driver = '{SQL Server}'
dsn_name = '(SQL Server名稱)' # 需從 Windows ODBC 資料來源管理員確認名稱
# 建立 engine 並連線至 SQL Server
engine = create_engine('mssql+pyodbc://'+username+':'+password+'@'+dsn_name)
connection = engine.connect()

2. select 資料:

# recommended
cmd = 'select * from (伺服器連結名稱).(資料庫名稱).[dbo].(table名稱)'
result = connection.execute(text(cmd))

3. 把資料 append 進 list,塞進 Dataframe裡面,確認筆數:

resultList = []
for row in result:
resultList.append(list(row))
df = pd.DataFrame(resultList)
df.shape

4. 一樣噴編碼的 error。

(未完待續)

--

--

Pei Lee
Pei Lee

Written by Pei Lee

Data Team Lead @ Retail & E-commerce Industry

No responses yet