Sijie Zhu

Research Scientist
ByteDance Inc.
Email: zhusijie006 at gmail.com
Feel free to reach out with any questions or potential collaborations.
Google Scholar | GitHub

Sijie Zhu is a Research Scientist at ByteDance Inc. He received his PhD from UCF, supervised by Prof. Chen Chen, and won the award for Excellence in Outstanding Dissertation (College of Engineering & Computer Science, 2022). He received his master's degree from the University of Chinese Academy of Sciences and his bachelor's degree from the University of Science and Technology of China. In summer 2021, he was a research intern at Adobe Research, working with Zhe Lin, Scott Cohen, Zhifei Zhang, and Jason Kuen. In summer 2022, he was a research intern at ByteDance, working with Heng Wang, Linjie Yang, Xiaohui Shen, and Quan Wang.

Research Interests

My research interests include

  • Multimodal LLM for Image/Video Understanding

  • Intelligent/Generative Image/Video Editing

  • Geo-localization / Metric Localization

  • Metric Learning / Image Retrieval

Selected Publications

  1. Multi-Reward as Condition for Instruction-based Image Editing
    Xin Gu, Ming Li, Libo Zhang, Fan Chen, Longyin Wen, Tiejian Luo, Sijie Zhu
    arXiv:2411.04713  

  2. Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model
    Lu Xu, Sijie Zhu, Chunyuan Li, Chia-Wen Kuo, Fan Chen, Xinyao Wang, Guang Chen, Dawei Du, Ye Yuan, Longyin Wen
    arXiv:2406.10484   [Dataset]  


  3. CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
    Jiachen Li, Xinyao Wang, Sijie Zhu, Chia-Wen Kuo, Lu Xu, Fan Chen, Jitesh Jain, Humphrey Shi, Longyin Wen
    arXiv:2405.05949   [Code]  


  4. Edit3K: Universal Representation Learning for Video Editing Components
    Xin Gu, Libo Zhang, Fan Chen, Longyin Wen, Yufei Wang, Tiejian Luo, Sijie Zhu
    arXiv:2403.16048  

  5. R2Former: Unified Retrieval and Reranking Transformer for Place Recognition
    Sijie Zhu, Linjie Yang, Chen Chen, Mubarak Shah, Xiaohui Shen, Heng Wang
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023 (Highlight, top 10% of accepted papers)   [Code]  
    (First place in the MSLS Place Recognition Challenge.)

  6. TopNet: Transformer-based Object Placement Network for Image Compositing
    Sijie Zhu, Zhe Lin, Scott Cohen, Jason Kuen, Zhifei Zhang, Chen Chen
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023  

  7. TransGeo: Transformer Is All You Need for Cross-view Image Geo-localization
    Sijie Zhu, Mubarak Shah, Chen Chen
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022   [Code]  

  8. GALA: Toward Geometry-and-Lighting-Aware Object Search for Compositing
    Sijie Zhu, Zhe Lin, Scott Cohen, Jason Kuen, Zhifei Zhang, Chen Chen
    European Conference on Computer Vision (ECCV), 2022  

  9. Deep Learning-Based Human Pose Estimation: A Survey
    Ce Zheng, Wenhan Wu, Chen Chen, Taojiannan Yang, Sijie Zhu, Ju Shen, Nasser Kehtarnavaz, Mubarak Shah
    ACM Computing Surveys   [Project Page]   [Versions: 1, 2, 3, 4]

  10. 3D Human Pose Estimation with Spatial and Temporal Transformers
    Ce Zheng, Sijie Zhu, Matias Mendieta, Taojiannan Yang, Chen Chen, Zhengming Ding
    IEEE International Conference on Computer Vision (ICCV), 2021   [Code]  

  11. VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval
    Sijie Zhu, Taojiannan Yang, Chen Chen
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021   [Dataset and Code]   [Poster]

  12. A3D: Adaptive 3D Networks for Video Action Recognition
    Sijie Zhu, Taojiannan Yang, Matias Mendieta, Chen Chen
    arXiv:2011.12384

  13. GradAug: A New Regularization Method for Deep Neural Networks
    Taojiannan Yang, Sijie Zhu, Chen Chen
    Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS), 2020   [Poster]   [Code]

  14. Visual Explanation for Deep Metric Learning
    Sijie Zhu, Taojiannan Yang, Chen Chen
    IEEE Transactions on Image Processing (TIP), 2021   [Code]   [Versions: 1, 2, 3, 4]

  15. MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution
    Taojiannan Yang, Sijie Zhu, Chen Chen, Yan Shen, Mi Zhang, Andrew Willis
    European Conference on Computer Vision (ECCV), 2020 (Oral)   [Code]   [Video Presentation]

  16. Revisiting Street-to-Aerial View Image Geo-localization and Orientation Estimation
    Sijie Zhu, Taojiannan Yang, Chen Chen
    Winter Conference on Applications of Computer Vision (WACV), 2021

  17. Density Map Guided Object Detection in Aerial Images
    Changlin Li, Taojiannan Yang, Sijie Zhu, Chen Chen, Shanyue Guan
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (EarthVision Workshop), 2020   [Code]

  18. Video Anomaly Detection for Smart Surveillance
    Sijie Zhu, Chen Chen, Waqas Sultani
    Book chapter in Computer Vision: A Reference Guide, 2020   [arXiv]

 Full list of publications