Yu Zhu (祝宇)
Ph.D. Candidate, Information Systems, University of Utah, USA
Ph.D., Finance, Zhejiang University, China
Yu Zhu
published on 2024-03-01
How to average embeddings of a sequence using a mask?
Yu Zhu
published on 2024-02-05
One Figure to Summarize PyTorch Memory Management
When training a language model, there are three possible places to tokenize your text. Which one is the most efficient?
The linking tables in CCM often have consecutive date ranges. How to collapse them?
Yu Zhu
published on 2024-01-05
NAICS classification (as CSV tables) download
Yu Zhu
published on 2024-01-02
A collection of solutions for data wrangling problems, tailored for business school students
Yu Zhu
published on 2023-06-19
What’s an Investor Network, how to build one, and what can we do with it?
One table to compare popular tokenization methods: BPE, WordPiece, and SentencePiece.
This article compares the implementation details of the original Transformer and GPT. These tricks are critical to performance but not always explained in the papers.
In a setting of PEAD (post-earnings-announcement-drift) prediction using earnings call transcripts, I found Transformers (deep learning models) have a larger performance lead on extreme data points (data at the tails of the distribution).
A highly efficient (>10x speedup) and concise (core part is less than 40 lines) event study R program that replicates WRDS’s SAS version.
How date and time are stored/processed in R and SAS, and how to use them wisely in WRDS or in your daily programming.
Yu Zhu
published on 2023-05-26
As someone who is not a Git pro, I found these tips very helpful.
Yu Zhu
published on 2023-05-26
Some facts about earnings announcements and analyst revisions that you may not know.
Yu Zhu
published on 2023-05-26
How to choose the right expected return model?
Quick recipe for a synthetic control study
Quick recipe for a DD (difference-in-differences) study
Under what conditions does a significant coefficient imply causality?
Bypass WRDS’s DUO verification
Downloading with rsync can be 10x to 100x faster than downloading from the web or PostgreSQL