This paper investigates whether deep learning classification can be performed directly on file bytes, without decoding files at inference time. The authors present ByteFormer, a transformer-based model that achieves 77.33% ImageNet Top-1 accuracy when trained and tested directly on TIFF file bytes, and 95.42% classification accuracy when operating on WAV files. The model also has applications in privacy-preserving inference: it can run on obfuscated input representations with no loss of accuracy. Additionally, the authors demonstrate ByteFormer's ability to perform inference on privacy-preserving camera input. Removing modality-specific input preprocessing simplifies model development, and preserving privacy makes ByteFormer an attractive option for private inference. Code for ByteFormer is available at https://github.com/apple/ml-cvnets/tree/main/examples/byteformer.
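To make the idea of operating on file bytes concrete, the following minimal sketch treats an encoded file as a sequence of integer tokens in the range 0 to 255, which is the kind of input a byte-level model could embed. It is an illustration under stated assumptions, not the ByteFormer implementation: the helper names (`make_wav_bytes`, `bytes_to_tokens`, `make_byte_permutation`, `obfuscate`) are hypothetical, and the fixed byte-value permutation is only one plausible form of input obfuscation, assumed here for illustration.

```python
import io
import random
import wave


def make_wav_bytes(num_frames: int = 8) -> bytes:
    """Build a tiny silent mono 16-bit WAV file in memory (illustrative input)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)       # 2 bytes per sample (16-bit audio)
        w.setframerate(16000)
        w.writeframes(b"\x00\x00" * num_frames)
    return buf.getvalue()


def bytes_to_tokens(data: bytes) -> list[int]:
    """Treat each byte of the encoded file as a token ID in [0, 255].

    No decoding of the file format happens here: the container header
    and the payload are tokenized alike.
    """
    return list(data)


def make_byte_permutation(seed: int) -> list[int]:
    """Hypothetical obfuscation key: a fixed shuffle of the 256 byte values."""
    perm = list(range(256))
    random.Random(seed).shuffle(perm)
    return perm


def obfuscate(tokens: list[int], perm: list[int]) -> list[int]:
    """Remap every token through the permutation (assumed obfuscation scheme)."""
    return [perm[t] for t in tokens]


wav_bytes = make_wav_bytes()
tokens = bytes_to_tokens(wav_bytes)
perm = make_byte_permutation(seed=0)
obfuscated = obfuscate(tokens, perm)

# The first four tokens are the ASCII codes of the "RIFF" container tag.
print(tokens[:4])
print(len(tokens), len(obfuscated))
```

Because the permutation is a bijection on byte values, the obfuscated sequence carries the same information as the original, which gives some intuition for why a model trained on consistently remapped bytes need not lose accuracy.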