Spatial embedding is a feature learning technique used in spatial analysis in which points, lines, polygons, or other spatial data types[1] representing geographic locations are mapped to vectors of real numbers. Conceptually, it involves a mathematical embedding from a space with many dimensions per geographic object to a continuous vector space of much lower dimension.

Such embedding methods allow complex spatial data to be used in neural networks and have been shown to improve performance in spatial analysis tasks.[2][3]

Embedded data types


Geographic data can take many forms: text,[4][5][6] images,[7][8] graphs,[9][10] trajectories,[11][12][13] and polygons.[14] Depending on the task, multimodal data from different sources may need to be combined.[2][15] The following subsections describe examples of different types of data and their uses.

Text


Geolocated posts on social media can be used to build a collection of documents tied to a given place, which can later be transformed into embedding vectors using word embedding techniques.[4]
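One simple way to turn a place's documents into a vector is to average the word vectors of the words in its posts. The sketch below illustrates that idea only; the tiny `WORD_VECTORS` table, the place identifiers, and the posts are hypothetical, and a real pipeline would load pretrained word vectors rather than hard-code them.

```python
# Hypothetical 3-dimensional word vectors; a real system would load
# pretrained vectors from a word embedding model instead.
WORD_VECTORS = {
    "coffee": [0.9, 0.1, 0.0],
    "espresso": [0.8, 0.2, 0.0],
    "park": [0.0, 0.9, 0.1],
    "trees": [0.1, 0.8, 0.2],
}


def embed_place(posts):
    """Average the vectors of all known words found in a place's posts."""
    dim = len(next(iter(WORD_VECTORS.values())))
    total = [0.0] * dim
    count = 0
    for post in posts:
        for word in post.lower().split():
            vec = WORD_VECTORS.get(word)
            if vec is None:
                continue  # skip out-of-vocabulary words
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    if count == 0:
        return total  # zero vector when no word is known
    return [t / count for t in total]


# Hypothetical geolocated posts grouped by place identifier.
posts_by_place = {
    "cafe_1": ["Great espresso here", "coffee coffee coffee"],
    "park_7": ["Lovely trees in the park"],
}
place_embeddings = {
    place: embed_place(posts) for place, posts in posts_by_place.items()
}
```

Averaging ignores word order and weights frequent words heavily, so it is best read as a baseline rather than as the method used in the cited work.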

Image


Satellites and aircraft collect remotely sensed images, a form of digital spatial data that can be used in machine learning. Such images are sometimes hard to analyse with basic image analysis methods; convolutional neural networks can instead be used to obtain an embedding of the images tied to a given geographical object or region.[7]

 
Example of a satellite image of Seattle acquired using remote sensing methods.

Point


A single point of interest (POI) can be assigned multiple features that can be used in machine learning. These could be demographic, transportation, meteorological, or economic data, for example. When embedding single points, it is common to consider the entire set of available points as nodes in a graph.[10]

 
Example of a point-of-interest map from OpenPOIMap.
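The idea of treating the set of POIs as nodes in a graph can be sketched by linking each point to its nearest neighbours. This is a minimal illustration, not the construction used in the cited work: the choice of a k-nearest-neighbour rule, the haversine distance, and the POI names and coordinates are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def knn_graph(pois, k=2):
    """Link every POI to its k nearest neighbours (directed edges)."""
    graph = {}
    for name, coord in pois.items():
        others = [(haversine_km(coord, other_coord), other)
                  for other, other_coord in pois.items() if other != name]
        graph[name] = [other for _, other in sorted(others)[:k]]
    return graph


# Hypothetical POIs with (latitude, longitude) in degrees.
pois = {
    "museum": (47.6070, -122.3380),
    "market": (47.6097, -122.3422),
    "tower": (47.6205, -122.3493),
    "stadium": (47.5952, -122.3316),
}
poi_graph = knn_graph(pois, k=2)
```

The resulting adjacency lists can then be passed to any graph embedding method, with the per-POI features attached as node attributes.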

Line / multiline


Motion trajectories, among other things, are represented as lines (multilines). Individual trajectories are embedded by taking into account travel time, distances, and the features of points visited along the way. Embedding trajectories improves performance in tasks such as clustering and categorization.[13]

 
Example of mobility trajectories from the GeoLife dataset (Beijing, China).
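The travel time and distance mentioned above can be derived from a timestamped sequence of points before any embedding model is applied. The sketch below computes such per-trajectory features under stated assumptions: the input layout (ISO timestamp, latitude, longitude), the equirectangular distance approximation, and the sample coordinates are illustrative choices, not part of the cited method.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0


def segment_km(a, b):
    """Approximate distance in km between two (lat, lon) pairs in degrees.

    Uses the equirectangular approximation, which is adequate for the
    short hops between consecutive trajectory points.
    """
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    y = lat2 - lat1
    return EARTH_RADIUS_KM * math.hypot(x, y)


def trajectory_features(points):
    """points: list of (iso_timestamp, lat, lon), sorted by time."""
    times = [datetime.fromisoformat(t) for t, _, _ in points]
    coords = [(lat, lon) for _, lat, lon in points]
    length_km = sum(segment_km(p, q) for p, q in zip(coords, coords[1:]))
    duration_h = (times[-1] - times[0]).total_seconds() / 3600
    mean_speed = length_km / duration_h if duration_h > 0 else 0.0
    return {
        "length_km": length_km,
        "duration_h": duration_h,
        "mean_speed_kmh": mean_speed,
        "n_points": len(points),
    }


# Hypothetical trajectory with three timestamped fixes.
trajectory = [
    ("2008-10-23T02:53:04", 39.9842, 116.3184),
    ("2008-10-23T02:58:04", 39.9850, 116.3250),
    ("2008-10-23T03:08:04", 39.9900, 116.3400),
]
features = trajectory_features(trajectory)
```

Features of the visited points (for example, POI categories near each fix) could be appended to the same dictionary before the trajectory is embedded.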

Polygon


The geographic areas analyzed in machine learning may be defined by administrative boundaries or by a top-down division into a grid of regular shapes, such as rectangles. Both types are represented as polygons and, like points, can be assigned different demographic, transportation, or economic features. A polygon can also have features related to the size or shape of the area it represents.

 
Example of a regular hexagonal tiling dividing the San Francisco Bay Area, generated with Uber's H3 library.
 
Map of San Francisco administrative districts.
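The size and shape features of a polygon can be computed directly from its vertices. The following sketch uses the shoelace formula for area and the Polsby-Popper ratio as one possible compactness measure; it assumes vertices are given in a projected, planar coordinate system (for example metres), and the choice of these three features is illustrative.

```python
import math


def polygon_features(vertices):
    """Return [area, perimeter, compactness] for a simple polygon.

    vertices: list of (x, y) pairs in planar coordinates, listed in
    order around the boundary (the closing edge is added implicitly).
    """
    n = len(vertices)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1  # shoelace term
        perimeter += math.hypot(x2 - x1, y2 - y1)
    area = abs(twice_area) / 2
    # Polsby-Popper compactness: 1.0 for a circle, smaller for
    # elongated or ragged shapes.
    compactness = 4 * math.pi * area / perimeter ** 2 if perimeter else 0.0
    return [area, perimeter, compactness]


square = polygon_features([(0, 0), (1, 0), (1, 1), (0, 1)])
strip = polygon_features([(0, 0), (4, 0), (4, 1), (0, 1)])
```

Here the elongated `strip` receives a lower compactness value than `square`, which is the kind of shape signal that can be concatenated with demographic or economic features of the same region.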

Graph


One domain where a graph representation is used is the street layout of a city, where vertices can be intersections and edges can be roads. The vertices can also be destination points such as public transport stops or important points in the city, with edges representing the flow between them. Embedding graphs or single vertices can improve the accuracy of analysis methods in which the geographical domain can be represented as a network.[9]

 
Example of a city network: the Rennes Metro (French: Métro de Rennes). In this example metro stops are vertices and tracks between them are edges.
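A network of stops and tracks like the one above can be stored as an adjacency list, and one common preprocessing step for vertex embedding is to sample random walks over it (the DeepWalk-style approach, named here as an illustrative assumption rather than the method of the cited work). The station names, walk length, and seed below are hypothetical.

```python
import random


def build_adjacency(edges):
    """Build an undirected adjacency list from (u, v) edge pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    return adjacency


def random_walks(adjacency, walk_length=5, walks_per_node=2, seed=0):
    """Sample fixed-length uniform random walks starting at every vertex."""
    rng = random.Random(seed)  # seeded for reproducibility
    walks = []
    for _ in range(walks_per_node):
        for start in adjacency:
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(adjacency[walk[-1]]))
            walks.append(walk)
    return walks


# Hypothetical metro line: stops are vertices, tracks are edges.
edges = [
    ("stop_a", "stop_b"),
    ("stop_b", "stop_c"),
    ("stop_c", "stop_d"),
    ("stop_d", "stop_e"),
]
adjacency = build_adjacency(edges)
walks = random_walks(adjacency)
```

Each walk is a sequence of vertex names that can be treated like a sentence and passed to a word-embedding-style model to obtain one vector per vertex.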

Usage

  • POI recommendation[15][16] - generating personalized point-of-interest recommendations based on user preferences.
  • Next/future location prediction[10][17] - predicting the next location a person will visit based on their historical trajectory.
  • Zone function classification[13] - predicting the function of a given area in a city from the mobility of people or the distribution of POIs.
  • Crime prediction[18] - estimating the crime rate in different regions of a city.
  • Local event detection[6] - studying spatio-temporal changes in embeddings can provide valuable information for detecting local events occurring at a specific location.
  • Regional mobility popularity prediction[11] - analyzing mobility to reveal patterns in the popularity of different regions of a city.
  • Shape matching[14] - finding shapes similar to a given polygon, for example finding buildings with the same shape as an input building.
  • Travel time estimation[19][20][21] - predicting travel time given current traffic conditions and special events.
  • Time estimation for on-demand food delivery[22] - estimating delivery time when an order is placed through a website.
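Tasks such as POI recommendation or shape matching can, in their simplest form, be framed as finding the stored embeddings most similar to a query vector. The sketch below shows that framing with cosine similarity; the embedding values, item names, and the `user_profile` query are hypothetical, and the cited systems use considerably richer models.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # similarity is undefined for zero vectors
    return dot / (norm_u * norm_v)


def most_similar(query, embeddings, top_n=2):
    """Return the names of the top_n embeddings closest to the query."""
    scored = [(cosine_similarity(query, vec), name)
              for name, vec in embeddings.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_n]]


# Hypothetical POI embeddings and a user preference vector.
embeddings = {
    "cafe": [0.9, 0.1, 0.0],
    "bakery": [0.8, 0.3, 0.1],
    "park": [0.0, 0.9, 0.2],
}
user_profile = [1.0, 0.2, 0.0]
recommended = most_similar(user_profile, embeddings, top_n=2)
```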

Temporal aspect


Some of the analyzed data has an associated timestamp. In some analyses this information is omitted; in others it is used to divide the data set into groups. The most common divisions separate weekdays from weekends or split the day into hours. This is particularly important in the analysis of mobility data, because mobility characteristics differ greatly between weekdays and weekends and between times of day.[3][23][24] A division into individual months, for example, can be used in the analysis of tourism in a given region.[16] To take such a split into account, embedding methods either handle the timestamp explicitly or train separate versions of the model for different subgroups of the analyzed set.
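The weekday/weekend and hour-of-day division described above can be sketched as a grouping step applied before embedding. The record layout (ISO timestamp plus an arbitrary payload) and the trip identifiers are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def time_bucket(timestamp):
    """Map an ISO timestamp to a (day_type, hour) bucket."""
    dt = datetime.fromisoformat(timestamp)
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    day_type = "weekend" if dt.weekday() >= 5 else "weekday"
    return (day_type, dt.hour)


def split_by_time(records):
    """Group (timestamp, payload) records by their time bucket."""
    groups = defaultdict(list)
    for timestamp, payload in records:
        groups[time_bucket(timestamp)].append(payload)
    return dict(groups)


# Hypothetical mobility records.
records = [
    ("2021-01-18T08:15:00", "trip_1"),  # a Monday morning
    ("2021-01-18T08:40:00", "trip_2"),
    ("2021-01-23T14:05:00", "trip_3"),  # a Saturday afternoon
]
groups = split_by_time(records)
```

Each group can then be used to train its own model, or the bucket itself can be supplied to a single model as an additional input.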

References

  1. ^ Schneider, Markus (2009), "Spatial Data Types", in Liu, Ling; Özsu, M. Tamer (eds.), Encyclopedia of Database Systems, Boston, MA: Springer US, pp. 2698–2702, doi:10.1007/978-0-387-39940-9_354, ISBN 978-0-387-39940-9, retrieved 2021-01-19
  2. ^ a b Li, Youru; Zhu, Zhenfeng; Kong, Deqiang; Xu, Meixiang; Zhao, Yao (2019-07-17). "Learning Heterogeneous Spatial-Temporal Representation for Bike-Sharing Demand Prediction". Proceedings of the AAAI Conference on Artificial Intelligence. 33: 1004–1011. doi:10.1609/aaai.v33i01.33011004. ISSN 2374-3468.
  3. ^ a b Cao, Hancheng; Xu, Fengli; Sankaranarayanan, Jagan; Li, Yong; Samet, Hanan (2020-05-01). "Habit2vec: Trajectory Semantic Embedding for Living Pattern Recognition in Population". IEEE Transactions on Mobile Computing. 19 (5): 1096–1108. doi:10.1109/TMC.2019.2902403. ISSN 1536-1233. S2CID 86694179.
  4. ^ a b Dassereto, Federico; Di Rocco, Laura; Guerrini, Giovanna; Bertolotto, Michela (2020), Kyriakidis, Phaedon; Hadjimitsis, Diofantos; Skarlatos, Dimitrios; Mansourian, Ali (eds.), "Evaluating the Effectiveness of Embeddings in Representing the Structure of Geospatial Ontologies", Geospatial Technologies for Local and Regional Development, Lecture Notes in Geoinformation and Cartography, Cham: Springer International Publishing, pp. 41–57, doi:10.1007/978-3-030-14745-7_3, ISBN 978-3-030-14744-0, S2CID 147707446, retrieved 2021-01-19
  5. ^ Jin, Jiaqi; Xiao, Zhuojian; Qiu, Qiang; Fang, Jinyun (July 2019). "A Geohash Based Place2vec Model". IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium. Yokohama, Japan: IEEE. pp. 3344–3347. doi:10.1109/IGARSS.2019.8898375. ISBN 978-1-5386-9154-0. S2CID 208033962.
  6. ^ a b Silva, Amila; Karunasekera, Shanika; Leckie, Christopher; Luo, Ling (December 2019). "USTAR: Online Multimodal Embedding for Modeling User-Guided Spatiotemporal Activity". 2019 IEEE International Conference on Big Data (Big Data). Los Angeles, CA, USA: IEEE. pp. 1211–1217. arXiv:1910.10335. doi:10.1109/BigData47090.2019.9005461. ISBN 978-1-7281-0858-2. S2CID 204838325.
  7. ^ a b Zhang, Sen; Li, Shaobo; Li, Xiang; Yao, Yong (2020-04-02). "Representation of Traffic Congestion Data for Urban Road Traffic Networks Based on Pooling Operations". Algorithms. 13 (4): 84. doi:10.3390/a13040084. ISSN 1999-4893.
  8. ^ Dao, Minh-Son; Zettsu, Koji (September 2018). "A Raster-Image-Based Approach for Understanding Associations of Urban Sensing Data". 2018 IEEE First International Conference on Artificial Intelligence and Knowledge Engineering (AIKE). Laguna Hills, CA: IEEE. pp. 134–137. doi:10.1109/AIKE.2018.00029. ISBN 978-1-5386-9555-5. S2CID 53279500.
  9. ^ a b Wu, Ning; Zhao, Xin Wayne; Wang, Jingyuan; Pan, Dayan (2020-08-23). "Learning Effective Road Network Representation with Hierarchical Graph Neural Networks". Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. KDD '20. Virtual Event CA USA: ACM. pp. 6–14. doi:10.1145/3394486.3403043. ISBN 978-1-4503-7998-4. S2CID 221191109.
  10. ^ a b c Xu, Shuai; Cao, Jiuxin; Legg, Phil; Liu, Bo; Li, Shancang (June 2020). "Venue2Vec: An Efficient Embedding Model for Fine-Grained User Location Prediction in Geo-Social Networks". IEEE Systems Journal. 14 (2): 1740–1751. Bibcode:2020ISysJ..14.1740X. doi:10.1109/JSYST.2019.2913080. ISSN 1932-8184. S2CID 181989049.
  11. ^ a b Fu, Yanjie; Wang, Pengyang; Du, Jiadi; Wu, Le; Li, Xiaolin (2019-07-17). "Efficient Region Embedding with Multi-View Spatial Networks: A Perspective of Locality-Constrained Spatial Autocorrelations". Proceedings of the AAAI Conference on Artificial Intelligence. 33 (1): 906–913. doi:10.1609/aaai.v33i01.3301906. ISSN 2374-3468.
  12. ^ Ouyang, Kun; Shokri, Reza; Rosenblum, David S.; Yang, Wenzhuo (July 2018). "A Non-Parametric Generative Model for Human Trajectories". Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence. Stockholm, Sweden: International Joint Conferences on Artificial Intelligence Organization. pp. 3812–3817. doi:10.24963/ijcai.2018/530. ISBN 978-0-9992411-2-7.
  13. ^ a b c Yao, Zijun; Fu, Yanjie; Liu, Bin; Hu, Wangsu; Xiong, Hui (July 2018). "Representing Urban Functions through Zone Embedding with Human Mobility Patterns". Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence. Stockholm, Sweden: International Joint Conferences on Artificial Intelligence Organization. pp. 3919–3925. doi:10.24963/ijcai.2018/545. ISBN 978-0-9992411-2-7.
  14. ^ a b Yan, Xiongfeng; Ai, Tinghua; Yang, Min; Tong, Xiaohua (2020-05-25). "Graph convolutional autoencoder model for the shape coding and cognition of buildings in maps". International Journal of Geographical Information Science. 35 (3): 490–512. doi:10.1080/13658816.2020.1768260. ISSN 1365-8816. S2CID 219469997.
  15. ^ a b Chang, Buru; Park, Yonggyu; Park, Donghyeon; Kim, Seongsoon; Kang, Jaewoo (July 2018). "Content-Aware Hierarchical Point-of-Interest Embedding Model for Successive POI Recommendation". Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence. Stockholm, Sweden: International Joint Conferences on Artificial Intelligence Organization. pp. 3301–3307. doi:10.24963/ijcai.2018/458. ISBN 978-0-9992411-2-7.
  16. ^ a b Bin, Chenzhong; Gu, Tianlong; Jia, Zhonghao; Zhu, Guimin; Xiao, Cihan (June 2020). "A neural multi-context modeling framework for personalized attraction recommendation". Multimedia Tools and Applications. 79 (21–22): 14951–14979. doi:10.1007/s11042-019-08554-5. ISSN 1380-7501. S2CID 209540693.
  17. ^ "Next Location Prediction with a Graph Convolutional Network Based on a Seq2seq Framework". KSII Transactions on Internet and Information Systems. 14 (5). 2020-05-31. doi:10.3837/tiis.2020.05.003.
  18. ^ Qian, Yiting; Pan, Li; Wu, Peng; Xia, Zhengmin (July 2020). "GeST: A Grid Embedding based Spatio-Temporal Correlation Model for Crime Prediction". 2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC). Hong Kong, Hong Kong: IEEE. pp. 1–7. doi:10.1109/DSC50466.2020.00009. ISBN 978-1-7281-9558-2. S2CID 221281815.
  19. ^ Wang, Meng-xiang; Lee, Wang-Chien; Fu, Tao-yang; Yu, Ge (2019-11-05). "Learning Embeddings of Intersections on Road Networks". Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. Chicago IL USA: ACM. pp. 309–318. doi:10.1145/3347146.3359075. ISBN 978-1-4503-6909-1. S2CID 208016230.
  20. ^ Xu, Saijun; Xu, Jiajie; Zhou, Rui; Liu, Chengfei; Li, Zhixu; Liu, An (2020), Nah, Yunmook; Cui, Bin; Lee, Sang-Won; Yu, Jeffrey Xu (eds.), "TADNM: A Transportation-Mode Aware Deep Neural Model for Travel Time Estimation", Database Systems for Advanced Applications, Lecture Notes in Computer Science, vol. 12112, Cham: Springer International Publishing, pp. 468–484, doi:10.1007/978-3-030-59410-7_32, ISBN 978-3-030-59409-1, S2CID 221840073, retrieved 2021-01-19
  21. ^ Xu, Saijun; Zhang, Ruoqian; Cheng, Wanjun; Xu, Jiajie (2020-08-15). "MTLM: a multi-task learning model for travel time estimation". GeoInformatica. 26 (2): 379–395. doi:10.1007/s10707-020-00422-x. ISSN 1384-6175. S2CID 221128832.
  22. ^ Zhu, Lin; Yu, Wei; Zhou, Kairong; Wang, Xing; Feng, Wenxing; Wang, Pengyu; Chen, Ning; Lee, Pei (2020-08-23). "Order Fulfillment Cycle Time Estimation for On-Demand Food Delivery". Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. KDD '20. Virtual Event CA USA: ACM. pp. 2571–2580. doi:10.1145/3394486.3403307. ISBN 978-1-4503-7998-4. S2CID 221191619.
  23. ^ Du, Bowen; Peng, Hao; Wang, Senzhang; Bhuiyan, Md Zakirul Alam; Wang, Lihong; Gong, Qiran; Liu, Lin; Li, Jing (March 2020). "Deep Irregular Convolutional Residual LSTM for Urban Traffic Passenger Flows Prediction". IEEE Transactions on Intelligent Transportation Systems. 21 (3): 972–985. doi:10.1109/TITS.2019.2900481. ISSN 1524-9050. S2CID 116832221.
  24. ^ Hong, Huiting; Lin, Yucheng; Yang, Xiaoqing; Li, Zang; Fu, Kung; Wang, Zheng; Qie, Xiaohu; Ye, Jieping (2020-08-23). "HetETA". Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. KDD '20. Virtual Event CA USA: ACM. pp. 2444–2454. doi:10.1145/3394486.3403294. ISBN 978-1-4503-7998-4. S2CID 221191112.