Subroutines used in area-source calculations

Table of contents

Background
Program description
File download
Reference

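Background

To remove the database's dependence on time and space, its category-county dimension has to be eliminated. This step computes the lookup table (nc_fac.json) that relates the area-source temporal variation coefficients to each category-county pair, so that it can be applied later. The main difficulty is that the temporal-coefficient file provided by the EPA (環保署) identifies locations by Chinese names, which cannot be matched directly against the 2-digit township/district codes used in the main database. The work is described in three programs:
- preprocessing of the temporal variation coefficient (csv) files
- generation of the csv files
- applying the csv files to the area-source emission database, expanding them into a year-long hourly series, and saving the result as nc_fac.json.
For the overall emission-processing principles, see the processing overview (處理程序總綱) and the notes on area-source processing. Reading the large .dbf file and recalculating the grid coordinates are the preprocessing steps for the routines described here.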

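Program description

Imported modules and time-flag conversion: `dt2jul`, `jul2dt`

The time flag TFLAG in m3.nc files is a sequence of integer pairs in Julian form (a YYYYJJJ date and an HHMMSS time). To do date arithmetic on it, it has to be converted to datetime.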

kuang@114-32-164-198 /Users/TEDS/teds10_camx/HourlyWeighted/area
$ cat -n include3.py 
import numpy as np
from pandas import *
import sys, os, subprocess
import netCDF4
from datetime import datetime, timedelta
import twd97
from include2 import rd_ASnPRnCBM_A

def dt2jul(dt):
 # datetime -> (YYYYJJJ, HHMMSS); minutes and seconds are dropped
 yr=dt.year
 deltaT=dt-datetime(yr,1,1)
 deltaH=int((deltaT.total_seconds()-deltaT.days*24*3600)/3600.)
 return (yr*1000+deltaT.days+1,deltaH*10000)

def jul2dt(jultm):
 # (YYYYJJJ, HHMMSS) -> datetime at the whole hour
 jul,tm=jultm[:]
 yr=int(jul/1000)
 ih=int(tm/10000.)
 return datetime(yr,1,1)+timedelta(days=int(jul-yr*1000-1))+timedelta(hours=ih)
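
As a cross-check on the two routines above, the same conversion can be sketched with the standard `strftime`/`strptime` day-of-year code `%j`. This is only an illustrative alternative, not the code in include3.py, and it assumes, as the routines above do, that the time part is HHMMSS with minutes and seconds ignored.

```python
from datetime import datetime, timedelta

def dt2jul(dt):
    """datetime -> (YYYYJJJ, HHMMSS), keeping whole hours only."""
    return (int(dt.strftime('%Y%j')), dt.hour * 10000)

def jul2dt(jultm):
    """(YYYYJJJ, HHMMSS) -> datetime at the whole hour."""
    jul, tm = jultm
    day = datetime.strptime(str(int(jul)), '%Y%j')
    return day + timedelta(hours=int(tm) // 10000)

# Round trip for the last hour of a non-leap year
stamp = datetime(2019, 12, 31, 23)
flag = dt2jul(stamp)
back = jul2dt(flag)
```

A round trip such as `jul2dt(dt2jul(stamp)) == stamp` holds for any datetime that falls exactly on the hour.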


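Gridding the database: `disc`

Source coordinates are shifted by the domain centre (Xcent, Ycent) and converted to the IX, IY indices of the nc file.
Totals per grid cell are then computed with pivot_table, which handles the grouped summation in a single call.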

def disc(dm,nc):
#discretization: map UTM coordinates to grid indices and sum per cell
#NOTE: cole (the list of emission columns to sum) is not defined in this function;
#it must exist as a global in the module where disc is defined
 Latitude_Pole, Longitude_Pole = 23.61000, 120.9900
 Xcent, Ycent = twd97.fromwgs84(Latitude_Pole, Longitude_Pole)
 dm['IX']=np.array((dm.UTME-Xcent-nc.XORIG)/nc.XCELL,dtype=int)
 dm['IY']=np.array((dm.UTMN-Ycent-nc.YORIG)/nc.YCELL,dtype=int)
 #time_const or time_variant df files
 if 'JJJHH' not in dm.columns:
   dmg=pivot_table(dm,index=['nsc2','IX','IY'],values=cole,aggfunc='sum').reset_index()
 else:
   dmg=pivot_table(dm,index=['IX','IY','JJJHH'],values=cole,aggfunc='sum').reset_index()
 return dmg
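
The index arithmetic and per-cell summation can be illustrated without pandas or a real nc file. The sketch below uses a made-up grid origin and cell size and three made-up sources; `cell_index`, `sources` and the numbers are hypothetical and only mirror the `(coordinate - centre - origin) / cell size` step and the grouped sum that pivot_table performs.

```python
from collections import defaultdict

# Hypothetical grid definition in metres; real values come from the nc file attributes.
XORIG, YORIG = -124500.0, -205500.0
XCELL, YCELL = 3000.0, 3000.0

def cell_index(offset, orig, cell):
    """Cell index for a coordinate already expressed relative to the domain centre."""
    return int((offset - orig) / cell)

# (x offset, y offset, emission) for three made-up sources
sources = [
    (-123000.0, -204000.0, 1.5),
    (-122000.0, -203000.0, 2.0),
    (-120000.0, -199000.0, 4.0),
]

# Sum emissions of all sources that fall into the same (IX, IY) cell
totals = defaultdict(float)
for x, y, emis in sources:
    key = (cell_index(x, XORIG, XCELL), cell_index(y, YORIG, YCELL))
    totals[key] += emis

grid = dict(totals)
```

Note that `int()` truncates toward zero, so a point slightly outside the west or south edge would still land in index 0 here; filtering such points beforehand may be advisable.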


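PM speciation

In principle the PM split should also come from the SPECIATE database, but this program does not use it yet. Only four species are produced (CCRS, FCRS, CPRM and FPRM), assigned by simple rules:
- Combustion sources (CO+NOx+SOx > 0): all fine particles go to FPRM, and PM minus PM2.5 goes to CPRM.
- Non-combustion sources (CO+NOx+SOx == 0 and VOC == 0): all fine particles go to FCRS, and PM minus PM2.5 goes to CCRS.
- Non-combustion VOC fugitive sources (which in fact emit no PM, but the logic is kept anyway): half goes to CRS and half to PRM, for both the coarse and the fine fraction.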

#A simple scheme is in place for PM splitting, and SPECIATE is not adopted.
def add_PMS(dm):
 #add the PM columns and reset to zero
 colc=['CCRS','FCRS','CPRM','FPRM']
 for c in colc:
   dm[c]=np.zeros(len(dm))
 #in case of non_PM sources, skip the routines
 if 'EM_PM' not in dm.columns or sum(dm.EM_PM)==0:return dm
 # fugitive sources
 not_burn=dm.loc[dm.EM_NOX+dm.EM_CO+dm.EM_SOX==0]
 crst=not_burn.loc[not_burn.EM_PM>0]
 idx=crst.index
 dm.loc[idx,'FCRS']=np.array(crst.EM_PM25)
 dm.loc[idx,'CCRS']=np.array(crst.EM_PM)-np.array(crst.EM_PM25)
 # combustion sources allocated into ?PRM, not PEC or POA
 burn=dm.loc[(dm.EM_NOX+dm.EM_CO+dm.EM_SOX)>0]
 prim=burn.loc[burn.EM_PM>0]
 idx=prim.index
 dm.loc[idx,'FPRM']=np.array(prim.EM_PM25)
 dm.loc[idx,'CPRM']=np.array(prim.EM_PM)-np.array(prim.EM_PM25)
 # check for left_over sources(NMHC fugitives), in fact no PM emits at all
 boo=(dm.EM_PM!=0) & ((dm.CCRS+dm.FCRS+dm.CPRM+dm.FPRM)==0)
 idx=dm.loc[boo].index
 if len(idx)!=0:
   res=dm.loc[idx]
   dm.loc[idx,'FPRM']=np.array(res.EM_PM25)/2
   dm.loc[idx,'FCRS']=np.array(res.EM_PM25)/2
   dm.loc[idx,'CPRM']=(np.array(res.EM_PM)-np.array(res.EM_PM25))/2.
   dm.loc[idx,'CCRS']=(np.array(res.EM_PM)-np.array(res.EM_PM25))/2.
 return dm
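
To make the first two rules concrete for a single source, here is a minimal scalar sketch. `split_pm` is a hypothetical helper, not part of include3.py, and it leaves out the left-over half-and-half branch that the routine above keeps for VOC fugitive sources.

```python
def split_pm(pm, pm25, nox=0.0, co=0.0, sox=0.0):
    """Split total PM and PM2.5 of one source into the four model species."""
    out = {'CCRS': 0.0, 'FCRS': 0.0, 'CPRM': 0.0, 'FPRM': 0.0}
    if pm <= 0:
        return out                      # no PM emitted, nothing to split
    coarse = pm - pm25                  # coarse fraction = PM minus PM2.5
    if nox + co + sox > 0:              # combustion source -> primary PM
        out['FPRM'], out['CPRM'] = pm25, coarse
    else:                               # non-combustion source -> crustal PM
        out['FCRS'], out['CCRS'] = pm25, coarse
    return out

burner = split_pm(pm=10.0, pm25=6.0, nox=1.0)
road_dust = split_pm(pm=10.0, pm25=3.0)
```

In both cases the four species add up to the total PM of the source, which is a convenient check when modifying the rules.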


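VOCs speciation

How VOCs are split into compounds
- The split is a property of the source category. The mapping is defined in ASSIGN-A.TXT (df_asgn).
- To revise or update it, only this mapping needs to be corrected according to the latest SPECIATE database.
Mapping between VOC compounds and model species
- This depends on the chosen photochemical mechanism and requires two lookup tables:
  - V_PROFIL.TXT (df_prof), which is also taken from the SPECIATE database;
  - CBM.DAT (BASE), the standard table of the carbon-bond mechanism, which depends on the mechanism version.
Profile number
- Because the number of PRO_NO values is limited, there is no need to accumulate the carbon bonds of each compound on the fly.
- The mapping (prof_cbm) can be prepared in advance.
- At run time, the carbon-bond moles per unit weight (prod) for the PRO_NO are looked up and multiplied directly by the emission.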

def add_VOC(dm,n):
 df_asgn,df_prof,df_cbm=rd_ASnPRnCBM_A()
 df_asgn.NSC=[i.strip() for i in df_asgn.NSC]
 MW={i:j for i,j in zip(list(df_cbm['SPE_NO']),list(df_cbm['MW']))}
 BASE={i:j for i,j in zip(list(df_cbm['SPE_NO']),list(df_cbm['BASE']))}
 colv='OLE PAR TOL XYL FORM ALD2 ETH ISOP NR ETHA MEOH ETOH IOLE TERP ALDX PRPA BENZ ETHY ACET KET'.split()
 NC=len(colv)
 try:
   prof_cbm=read_csv('prof_cbm.csv')
   prof_cbm.PRO_NO=['{:04d}'.format(m) for m in prof_cbm.PRO_NO]
 except Exception: #no usable cached prof_cbm.csv, rebuild it below
   HC=1
   prof_cbm=DataFrame({})
   prof_cbm['PRO_NO']=list(set(df_prof.PRO_NO))
   for c in colv:
     prof_cbm[c]=0.
   for i in range(len(prof_cbm)):
     prof=prof_cbm.PRO_NO[i]
     spec=df_prof.loc[df_prof.PRO_NO==prof].reset_index(drop=True)
     for K in range(len(spec)):
       W_K_II,IS=spec.WT[K],spec.SPE_NO[K]
       if W_K_II==0.0 or sum(BASE[IS])==0.0:continue
       VOCwt=HC*W_K_II/100. #in T/Y
       VOCmole=VOCwt/MW[IS] #in Tmole/Y
       for LS in range(NC): #CBM molar ratio
         if BASE[IS][LS]==0.:continue
         prof_cbm.loc[i,colv[LS]]+=VOCmole*BASE[IS][LS]
   prof_cbm.set_index('PRO_NO').to_csv('prof_cbm.csv')

 #matching the category profile number
 if n not in set(df_asgn.NSC):sys.exit('nsc not assigned: '+n)
 prof=df_asgn.loc[df_asgn.NSC==n,'PRO_NO'].values[0]
 if prof not in set(prof_cbm.PRO_NO):sys.exit('prof not found: '+prof)
 prod=prof_cbm.loc[prof_cbm.PRO_NO==prof].iloc[0,1:]
 HC=dm.loc[dm.nsc2==n,'EM_NMHC']
 idx=HC.index
 for LS in range(NC): #CBM molar ratio
   dm.loc[idx,colv[LS]]=HC*prod[colv[LS]] #label-based lookup; positional prod[LS] is deprecated in recent pandas
 return dm
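
The arithmetic behind prof_cbm can be followed on a toy profile: each compound contributes (weight percent / 100 / molecular weight) moles per unit weight of NMHC, and those moles are distributed over the carbon-bond species by the BASE ratios. All numbers below are made up for illustration and are not SPECIATE or CBM.DAT values; `profile_to_cbm` is a hypothetical helper.

```python
# Made-up profile: species id -> weight percent of total NMHC
profile_wt = {1: 60.0, 2: 40.0}
# Made-up species table: id -> (molecular weight, carbon-bond moles per mole of compound)
species = {
    1: (30.0, {'PAR': 2.0}),
    2: (80.0, {'PAR': 1.0, 'TOL': 1.0}),
}

def profile_to_cbm(profile_wt, species):
    """Carbon-bond moles per unit weight of NMHC for one profile."""
    per_unit = {}
    for spe_no, wt in profile_wt.items():
        mw, base = species[spe_no]
        moles = wt / 100.0 / mw          # moles of this compound per unit NMHC weight
        for name, ratio in base.items():
            per_unit[name] = per_unit.get(name, 0.0) + moles * ratio
    return per_unit

prod = profile_to_cbm(profile_wt, species)

# Applying the precomputed factors to an emission of 200 weight units
nmhc = 200.0
emis = {name: nmhc * factor for name, factor in prod.items()}
```

Because the factors depend only on the profile, they can be computed once and reused for every source that shares the same PRO_NO, which is the point of caching prof_cbm.csv.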


Reading the VOCs databases: `rd_ASnPRnCBM_A`

Taken from include2.py

def rd_ASnPRnCBM_A():
   from pandas import DataFrame, read_csv
   import subprocess
   ROOT='/'+subprocess.check_output('pwd',shell=True).decode('utf8').strip('\n').split('/')[1]
   fname=ROOT+'/TEDS/teds10_camx/HourlyWeighted/area/ASSIGN-A.TXT'
   df_asgn=read_csv(fname,header=None,sep=r'\s+') #sep=r'\s+' replaces the deprecated delim_whitespace=True
   df_asgn.columns=['NSC','PRO_NO']+[str(i) for i in range(len(df_asgn.columns)-2)]
   df_asgn.fillna(0,inplace=True)
   df_asgn.PRO_NO=['{:04d}'.format(int(m)) for m in df_asgn.PRO_NO]
   for i in range(len(df_asgn)):
       nsc=df_asgn.NSC[i]
       if not nsc[-1].isalpha():
           df_asgn.loc[i,'NSC']=nsc.strip()+'b'                
   fname=ROOT+'/TEDS/teds10_camx/HourlyWeighted/area/V_PROFIL.TXT'
   with open(fname) as text_file:
       d=[line[:41] for line in text_file]
   PRO_NO,SPE_NO,WT=[i[:4] for i in d],[int(i[11:14]) for i in d],[float(i[24:30]) for i in d]
   df_prof=DataFrame({'PRO_NO':PRO_NO,'SPE_NO':SPE_NO,'WT':WT})
   NC=20
   fname=ROOT+'/TEDS/teds10_camx/HourlyWeighted/line/CBM.DAT'
   with open(fname) as text_file:
       d=[line.strip('\n') for line in text_file]
   d=d[1:]
   SPE_NO,MW=[int(i[41:44]) for i in d],[float(i[57:63]) for i in d]
   BASE=[[i[63+j*6:63+(j+1)*6] for j in range(NC)] for i in d]
   d=BASE
   for i in range(len(d)):
       ii=d[i]
       for j in range(NC):
           s=ii[j].strip(' ')
           if len(s)==0:
               BASE[i][j]=0.
           else:
               BASE[i][j]=float(s)
   df_cbm=DataFrame({'SPE_NO':SPE_NO,'MW':MW,'BASE':BASE})
   return (df_asgn,df_prof,df_cbm)
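
The fixed-width slicing used for CBM.DAT can be tried on a single synthetic record. The column positions (species number in 41:44, molecular weight in 57:63, then 20 fields of 6 characters) are taken from the routine above; the record itself and the helper name `parse_cbm_line` are invented for illustration and do not reproduce a real CBM.DAT line.

```python
NC = 20  # number of carbon-bond species columns, as in the routine above

def parse_cbm_line(line, nc=NC):
    """Return (SPE_NO, MW, BASE list) from one fixed-width record."""
    spe_no = int(line[41:44])
    mw = float(line[57:63])
    base = []
    for j in range(nc):
        field = line[63 + j * 6: 63 + (j + 1) * 6].strip()
        base.append(float(field) if field else 0.0)  # blank field means zero
    return spe_no, mw, base

# Synthetic record built to match the column positions; not real data.
fields = ['  2.00', '      ', '  1.00'] + ['      '] * 17
line = 'EXAMPLE SPECIES'.ljust(41) + ' 12' + ' ' * 13 + ' 30.07' + ''.join(fields)

spe_no, mw, base = parse_cbm_line(line)
```

Treating blank fields as zero at parse time gives the same result as the two-pass conversion in the routine above.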


mostfreqword

Returns the most frequently occurring string in a sequence; used as the aggfunc of pivot_table.

def compareItems(wc1,wc2):
    # Python 2 style comparator: higher count first, ties broken alphabetically
    (w1,c1), (w2,c2)=wc1,wc2
    if c1 > c2:
        return -1
    elif c1 == c2:
        return (w1 > w2) - (w1 < w2)  # replaces Python 2 cmp(w1, w2)
    else:
        return 1
def mostfreqword(list_of_w):
    counts = {}
    for w in list_of_w:
        counts[w] = counts.get(w,0) + 1
    # sort by descending count, then by word, so that it[0] is the most frequent
    it=sorted(counts.items(),key=lambda wc:(-wc[1],wc[0]))
    return it[0][0]
def mostfreq10word(list_of_w):
    if len(list_of_w)<10:
        return []
    counts = {}
    for w in list_of_w:
        counts[w] = counts.get(w,0) + 1
    it=sorted(counts.items(),key=lambda wc:(-wc[1],wc[0]))
    # up to 10 (word, count) pairs; fewer if there are fewer distinct words
    return it[:10]
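
For comparison, the same counting can be sketched with `collections.Counter` from the standard library. The function names `most_frequent` and `top_n` and the sample list are hypothetical; ties are broken alphabetically here so that the result is deterministic.

```python
from collections import Counter

def most_frequent(words):
    """Most common item; ties go to the alphabetically first one."""
    counts = Counter(words)
    return min(counts, key=lambda w: (-counts[w], w))

def top_n(words, n=10):
    """Up to n (word, count) pairs, most frequent first."""
    counts = Counter(words)
    return sorted(counts.items(), key=lambda wc: (-wc[1], wc[0]))[:n]

names = ['Taipei', 'Tainan', 'Taipei', 'Hsinchu', 'Tainan', 'Taipei']
winner = most_frequent(names)
ranking = top_n(names, 2)
```

A function of this shape takes a sequence and returns a single value, so it can be passed as the aggfunc of pivot_table for string columns.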

File download

Download: Python program: include2.py
