Open Access
Volume 36, Number 3, June 2018
Page(s) 522 - 527
Published online 08 October 2018
