I am using Postgresql 9.3 on an Ubuntu 12.4 machine.
I've only recently started studying database and database design, and I am to create a database that includes text that will later be searched through. I've read up on tsvectors and stuff better than
WHERE column LIKE 'query', but I don't have much experience with it. I am thinking of building a table where one column is of type
text, and another one will be the tsvector build from the text. I will use and array of text because some of the text will be displayed to the user with different formatting etc. I have never worked with array columns before. My questions are:
textcompromise my performance?
Thanks for the attention, and this is my first post ever here, so if I messed up on this question do tell me. P.S.: I've searched this topic here beforehand, I hope I am not creating a redundant question. P.P.S.: Not a native speaker, sorry for the weird language.
EDIT: Thanks for the answers! The texts in the array won't be of fixed size. They will range from 6 ~ 20 lines, but they need to be editable, and some of the texts inside an array will be formatted in a different manner when displayed to the user. I'm not using multiple
text columns because some of the entries in the table will have a single, two, or more elements inside the array. I know how each element will be formatted because every row will also have a "type" column, which will define it's formatting.
Using an array may not be a very good idea, but you haven't given enough information to tell for sure. How large can they be? Are they all a fixed size?
How will you know which member of the array gets what kind of formatting? If that logic is hard-coded into the position of the member in the array, then why not just have multiple columns and key the formatting to the column names?
You can use text arrays with tsvectors, for example:
select to_tsvector(array_to_string(ARRAY['a','b','c'],' '::text)) ;
I can be tricky to figure out which of the array members lead to any given match, but that is going to be the case whether you use an array or a list of column names.
Arrays do have a quite a bit of overhead for storage and access, but if the data you are storing in the array is of substantial size, the array overhead will probably not be important relative to the irreducible overhead of your data.