Postgresql arrays of text to tsvector performance and pitfalls

by Matheus Monteiro   Last Updated March 07, 2018 05:06 AM

I am using Postgresql 9.3 on an Ubuntu 12.4 machine.

I've only recently started studying database and database design, and I am to create a database that includes text that will later be searched through. I've read up on tsvectors and stuff better than WHERE column LIKE 'query', but I don't have much experience with it. I am thinking of building a table where one column is of type text[], and another one will be the tsvector build from the text. I will use and array of text because some of the text will be displayed to the user with different formatting etc. I have never worked with array columns before. My questions are:

  • Is using a text array a good idea at all in this case?
  • Will the text array work well with the tsvectors?
  • Will using text[] compromise my performance?
  • Are there any pitfalls I should stay alert for?

Thanks for the attention, and this is my first post ever here, so if I messed up on this question do tell me. P.S.: I've searched this topic here beforehand, I hope I am not creating a redundant question. P.P.S.: Not a native speaker, sorry for the weird language.

EDIT: Thanks for the answers! The texts in the array won't be of fixed size. They will range from 6 ~ 20 lines, but they need to be editable, and some of the texts inside an array will be formatted in a different manner when displayed to the user. I'm not using multiple text columns because some of the entries in the table will have a single, two, or more elements inside the array. I know how each element will be formatted because every row will also have a "type" column, which will define it's formatting.

Answers 1

Using an array may not be a very good idea, but you haven't given enough information to tell for sure. How large can they be? Are they all a fixed size?

How will you know which member of the array gets what kind of formatting? If that logic is hard-coded into the position of the member in the array, then why not just have multiple columns and key the formatting to the column names?

You can use text arrays with tsvectors, for example:

select to_tsvector(array_to_string(ARRAY['a','b','c'],' '::text)) ;

I can be tricky to figure out which of the array members lead to any given match, but that is going to be the case whether you use an array or a list of column names.

Arrays do have a quite a bit of overhead for storage and access, but if the data you are storing in the array is of substantial size, the array overhead will probably not be important relative to the irreducible overhead of your data.

February 03, 2017 17:18 PM

Related Questions

Database Design and structure

Updated April 09, 2017 12:06 PM

Postgresql FULL TEXT search LEFT JOIN

Updated November 14, 2016 07:31 AM