Published 2021-05-30 · Updated 2025-04-15 · Technology / python / jupyter

Common Scripts for Data Analysis in Jupyter

This post collects tools and commands that are commonly used for basic data analysis in Jupyter.

DataFrame

Display settings

import pandas as pd

pd.set_option('display.max_columns', None)  # show all columns
pd.set_option('display.max_columns', 10)    # show at most 10 columns
pd.set_option('display.max_rows', None)     # show all rows
pd.set_option('display.max_rows', 10)       # show at most 10 rows

pd.set_option('display.precision', 2)  # display two decimal places
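`pd.set_option` changes the setting for the rest of the session. When a setting is only needed for one cell, `pd.option_context` may be a better fit. The following is a minimal sketch; the `demo` frame and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical demo frame, only for illustration.
demo = pd.DataFrame({"value": [1.23456, 2.34567, 3.45678]})

before = pd.get_option("display.precision")

# Options set here apply only inside the with-block
# and are restored automatically when the block exits.
with pd.option_context("display.max_rows", 10, "display.precision", 2):
    inside = pd.get_option("display.precision")
    print(demo)

after = pd.get_option("display.precision")

# Reset a single option to its pandas default.
pd.reset_option("display.max_rows")
```

After the block, `after` equals `before`, so later cells are not affected.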

Reading data

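# Read from Excel
df = pd.read_excel('test.xlsx')

# Read a table from a database (requires SQLAlchemy)
from sqlalchemy import create_engine
engine = create_engine('sqlite:///test.db')
df = pd.read_sql('tbl_user', engine)

# Build a DataFrame by hand: each dict is one row, its keys become columns
all_data = [{'a': 1}, {'b': 2}, {'c': 1}, {'a': 2}]
df = pd.DataFrame(all_data)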
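Reading a whole table is not always what you want. As a sketch, assuming an in-memory SQLite database and a made-up `tbl_user` schema (`id`, `name`), `pd.read_sql` can also take a query with bound parameters:

```python
import sqlite3

import pandas as pd

# In-memory database with a hypothetical tbl_user table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_user (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO tbl_user VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol")],
)
conn.commit()

# Run a filtered query; the ? placeholder is filled from params.
users = pd.read_sql(
    "SELECT id, name FROM tbl_user WHERE id >= ? ORDER BY id",
    conn,
    params=(2,),
)
conn.close()
```

Passing values through `params` rather than formatting them into the SQL string avoids quoting problems.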

Data processing

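# Group and aggregate; several grouping columns are allowed
df.groupby(["post_source", "host"]).count().reset_index()
df.groupby(["post_source", "host"]).max().reset_index()
df.groupby(["post_source", "host"]).min().reset_index()
# numeric_only=True skips text columns, which mean() cannot average
df.groupby(["post_source", "host"]).mean(numeric_only=True).reset_index()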
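The four aggregations above can also be computed in one pass with named aggregation. This is a sketch only: the `posts` frame, the `score` column and the output column names are assumptions, not taken from the original data.

```python
import pandas as pd

# Hypothetical sample data with the same grouping columns as above.
posts = pd.DataFrame({
    "post_source": ["web", "web", "app", "app"],
    "host": ["h1", "h1", "h2", "h2"],
    "score": [1.0, 3.0, 2.0, 6.0],
})

# Each keyword is an output column: (input column, aggregation function).
summary = (
    posts.groupby(["post_source", "host"])
    .agg(
        n=("score", "count"),
        score_max=("score", "max"),
        score_min=("score", "min"),
        score_mean=("score", "mean"),
    )
    .reset_index()
)
```

The result has one row per group and one clearly named column per statistic.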

# Join two DataFrames (like a SQL LEFT JOIN)
pd.merge(trdf, tgrdf, how='left', left_on='GROUP_ID', right_on='GROUP_ID')
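When a left join silently produces missing values, it helps to see which rows found no match. Below is a minimal sketch using the `indicator` and `validate` parameters of `pd.merge`; the contents of `trdf` and `tgrdf` are invented for illustration.

```python
import pandas as pd

# Hypothetical contents; only the GROUP_ID key comes from the snippet above.
trdf = pd.DataFrame({"GROUP_ID": [1, 2, 3], "amount": [10, 20, 30]})
tgrdf = pd.DataFrame({"GROUP_ID": [1, 2], "group_name": ["A", "B"]})

merged = pd.merge(
    trdf,
    tgrdf,
    how="left",
    on="GROUP_ID",
    validate="many_to_one",  # raise if GROUP_ID is duplicated in tgrdf
    indicator=True,          # add a _merge column: both / left_only / right_only
)

# Rows from trdf that had no matching group.
unmatched = merged[merged["_merge"] == "left_only"]
```

`validate` guards against accidental row duplication when the right-hand key is not unique.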

# Stack DataFrames vertically (like SQL UNION ALL; duplicates are kept)
df = pd.concat([df1, df2], ignore_index=True, sort=True)

# Column processing: apply a custom function to each row
def trim(row, col='name'):
    return row.get(col).strip()

df['name'] = df.apply(trim, axis=1, col='name')
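For a simple string cleanup like this, the vectorized `.str` accessor is usually shorter than a row-wise `apply`, and it passes missing values through instead of raising. A small sketch with a made-up `people` frame:

```python
import pandas as pd

# Hypothetical data including a missing name.
people = pd.DataFrame({"name": ["  alice ", "bob  ", None]})

# Strip leading and trailing whitespace; the missing value stays missing.
people["name"] = people["name"].str.strip()
```

Row-wise `apply` remains useful when the function needs several columns of the same row.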

# Sort by score in descending order and show the top 15 rows
df.sort_values('score', ascending=False).head(15)

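# Filter by substring (fuzzy match); regular expressions are supported
# na=False treats missing names as non-matching instead of raising an error
df[df['name'].str.contains('Africa', na=False)]
df[df['name'].str.contains(r'A.{5}$', na=False)]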
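`str.contains` has a few switches worth knowing: `case` for case-insensitive matching, `regex=False` for literal text, and `na` for missing values. A sketch on invented data:

```python
import pandas as pd

# Hypothetical names, including a missing value.
names = pd.DataFrame(
    {"name": ["South Africa", "africa (region)", "A.B. Corp", None]}
)

# Case-insensitive match; missing values count as "no match".
mask_ci = names["name"].str.contains("africa", case=False, na=False)

# Literal match: with regex=False the dots are plain characters.
mask_lit = names["name"].str.contains("A.B.", regex=False, na=False)

african = names[mask_ci]
```

Without `na=False`, a missing value in the column makes the mask unusable for boolean indexing.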

# Convert strings to dates (format YYYYMMDD)
df['SYS_DATE'] = pd.to_datetime(df['SYS_DATE'], format='%Y%m%d')
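Real exports often contain a few values that are not dates at all. By default `pd.to_datetime` raises on the first one; with `errors="coerce"` it returns `NaT` instead, so the bad rows can be inspected. A sketch with made-up values:

```python
import pandas as pd

# Hypothetical raw column with one placeholder value.
raw = pd.DataFrame({"SYS_DATE": ["20230115", "20230116", "N/A"]})

# Unparseable entries become NaT instead of raising an exception.
parsed = pd.to_datetime(raw["SYS_DATE"], format="%Y%m%d", errors="coerce")

# Rows whose date could not be parsed.
bad_rows = raw[parsed.isna()]
```

Checking `bad_rows` before overwriting the column makes it obvious how much data would be lost.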

Plotting


# Line plot with SYS_DATE on the x-axis
df.plot(x='SYS_DATE', kind='line', figsize=(20, 5))

Exporting data

# Export to Excel
df.to_excel('test_result.xlsx', index=False)

# Export to CSV
df.to_csv('test_result.csv', index=False)
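A quick way to check an export is to read it straight back. The sketch below writes to an in-memory buffer rather than a file so it leaves nothing on disk; the `result` frame is invented for illustration.

```python
import io

import pandas as pd

# Hypothetical result frame.
result = pd.DataFrame({"name": ["a", "b"], "score": [1.5, 2.5]})

# Write CSV text to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
result.to_csv(buffer, index=False)

# Rewind and read the same text back.
buffer.seek(0)
restored = pd.read_csv(buffer)
```

With `index=False` the row index is not written, so the restored frame has the same columns as the original.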

http://www.lephee.net/2021/05/30/jupyter/

Author: LePhee

Published: 2021-05-30

Updated: 2025-04-15

Tags: jupyter, python
