Cross-modality interactions in sensory perception are advantageous for animal survival. However, how cortical sensory processing is modulated cross-modally, and which neural circuits underlie this modulation, remain poorly understood. In mouse primary visual cortex (V1), we found that the orientation selectivity of excitatory neurons in layer (L) 2/3, but not L4, was sharpened in the presence of sound or during optogenetic activation of projections from primary auditory cortex (A1) to V1. This effect manifested as decreased average visual responses but increased responses at the preferred orientation. It was more pronounced at lower visual contrast and was diminished when L1 activity was suppressed. L1 neurons were strongly innervated by A1-V1 axons and were excited by sound, whereas the visual responses of L2/3 vasoactive intestinal peptide (VIP) neurons were suppressed by sound; both effects occurred preferentially at each cell's preferred orientation. These results suggest that this cross-modality modulation is achieved primarily through inhibitory and disinhibitory circuits mediated by L1 neurons and L2/3 VIP cells.
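The following minimal Python sketch illustrates how a lower average visual response can coexist with sharper orientation tuning. It assumes a simple orientation selectivity index, OSI = (R_pref − R_orth) / (R_pref + R_orth), computed over four evenly spaced orientations (0°, 45°, 90°, 135°). The index definition, the function name `osi`, and all response values are hypothetical toy numbers chosen for illustration, not data or analysis methods from this study.

```python
from statistics import mean


def osi(responses):
    """Hypothetical orientation selectivity index (illustrative choice).

    Assumes `responses` holds mean responses at an even number of
    orientations evenly spaced over 0-180 degrees, so the orthogonal
    orientation lies half the list away from the preferred one.
    """
    n = len(responses)
    pref = max(range(n), key=lambda i: responses[i])  # preferred orientation index
    orth = (pref + n // 2) % n                        # orthogonal orientation index
    r_pref, r_orth = responses[pref], responses[orth]
    return (r_pref - r_orth) / (r_pref + r_orth)


# Toy tuning curves at 0, 45, 90, 135 degrees (made-up values).
visual_only = [1.0, 0.5, 0.2, 0.5]
visual_plus_sound = [1.2, 0.3, 0.1, 0.3]  # preferred response up, others down

print("mean response:", mean(visual_only), "->", mean(visual_plus_sound))
print("OSI:", round(osi(visual_only), 3), "->", round(osi(visual_plus_sound), 3))
```

In this toy case the mean response falls while the preferred-orientation response rises, and the index increases accordingly; other selectivity measures (for example, ones based on circular variance) could be substituted without changing the qualitative point.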