Optimizing latent goal by learning from trajectory preference