I took the 254 photos in this dataset that were captioned with GPT-4, added a second copy of those same 254 photos captioned with WD-14 tags, and then added 99 more photos tagged only with WD-14, for 607 image-caption pairs in total. Because the model was trained on both natural-language captions and tag-style captions, it is very responsive: you should get good results however you phrase your prompt, as long as it's in English.
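As a rough illustration of that mix, here is a minimal Python sketch. It assumes each caption source is a dict mapping an image filename to its caption text. The filenames, placeholder captions, and the `build_dataset` helper are all hypothetical and are not the actual training script. It only shows how the shared images appear twice (once per caption style) while the extra images appear once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    image: str    # hypothetical image filename
    caption: str  # caption text paired with that image for training


def build_dataset(gpt4_captions: dict[str, str],
                  wd14_tags: dict[str, str],
                  extra_wd14_tags: dict[str, str]) -> list[Sample]:
    """Pair each shared image with both caption styles, then append WD-14-only images."""
    samples: list[Sample] = []
    for image, caption in gpt4_captions.items():
        samples.append(Sample(image, caption))           # natural-language (GPT-4) caption
        samples.append(Sample(image, wd14_tags[image]))  # same image, tag-style (WD-14) caption
    for image, tags in extra_wd14_tags.items():
        samples.append(Sample(image, tags))              # extra images with WD-14 tags only
    return samples


# Placeholder data matching the counts described above (captions are made up).
gpt4 = {f"img_{i:03d}.png": f"a photo number {i}" for i in range(254)}
wd14 = {name: "outdoors, tree, sky" for name in gpt4}
extra = {f"extra_{i:03d}.png": "landscape, sky" for i in range(99)}

data = build_dataset(gpt4, wd14, extra)
print(len(data))                    # 607 image-caption pairs
print(len({s.image for s in data}))  # 353 unique images
```

The totals follow directly from the counts above: 254 + 254 + 99 = 607 pairs drawn from 254 + 99 = 353 distinct images.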